WorldWideScience

Sample records for additive classification tree

  1. Boosted classification trees result in minor to modest improvement in the accuracy in classifying cardiovascular outcomes compared to conventional classification trees

    Science.gov (United States)

    Austin, Peter C; Lee, Douglas S

    2011-01-01

    Purpose: Classification trees are increasingly being used to classify patients according to the presence or absence of a disease or health outcome. A limitation of classification trees is their limited predictive accuracy. In the data-mining and machine learning literature, boosting has been developed to improve classification. Boosting with classification trees iteratively grows classification trees in a sequence of reweighted datasets. In a given iteration, subjects that were misclassified in the previous iteration are weighted more highly than subjects that were correctly classified. Classifications from each of the classification trees in the sequence are combined through a weighted majority vote to produce a final classification. The authors' objective was to examine whether boosting improved the accuracy of classification trees for predicting outcomes in cardiovascular patients. Methods: We examined the utility of boosting classification trees for classifying 30-day mortality outcomes in patients hospitalized with either acute myocardial infarction or congestive heart failure. Results: Improvements in the misclassification rate using boosted classification trees were at best minor compared to when conventional classification trees were used. Minor to modest improvements to sensitivity were observed, with only a negligible reduction in specificity. For predicting cardiovascular mortality, boosted classification trees had high specificity, but low sensitivity. Conclusions: Gains in predictive accuracy for predicting cardiovascular outcomes were less impressive than gains in performance observed in the data mining literature. PMID:22254181
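
    The reweighting and weighted-vote steps described in this abstract can be sketched as follows. This is a minimal illustration that assumes AdaBoost-style updates with one-split trees (stumps) as the base learner; the abstract does not state which boosting variant or tree depth the authors used, and the function names and toy data are invented for the example.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump.

    A stump predicts `polarity` when row[feature] <= threshold, else -polarity.
    Labels in y are +1 or -1; w holds one non-negative weight per subject.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                pred = [polarity if row[j] <= t else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def stump_predict(stump, row):
    _, j, t, polarity = stump
    return polarity if row[j] <= t else -polarity

def adaboost(X, y, rounds=3):
    """Grow a sequence of stumps on reweighted data (assumed AdaBoost scheme)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-12), 1 - 1e-12)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, stump))
        # Misclassified subjects are weighted up, correct ones down.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted majority vote over all stumps in the sequence."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Invented toy data: no single split separates the classes.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
single = adaboost(X, y, rounds=1)
boosted = adaboost(X, y, rounds=3)
```

    On this toy set one stump alone misclassifies some points, while the three-stump vote classifies all six correctly; real clinical data, as the abstract reports, may show much smaller gains.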

  2. Dissimilarity-based classification of anatomical tree structures

    DEFF Research Database (Denmark)

    Sørensen, Lauge; Lo, Pechin Chien Pau; Dirksen, Asger

    2011-01-01

    A novel method for classification of abnormality in anatomical tree structures is presented. A tree is classified based on direct comparisons with other trees in a dissimilarity-based classification scheme. The pair-wise dissimilarity measure between two trees is based on a linear assignment betw...

  3. Dissimilarity-based classification of anatomical tree structures

    DEFF Research Database (Denmark)

    Sørensen, Lauge Emil Borch Laurs; Lo, Pechin Chien Pau; Dirksen, Asger

    2011-01-01

    A novel method for classification of abnormality in anatomical tree structures is presented. A tree is classified based on direct comparisons with other trees in a dissimilarity-based classification scheme. The pair-wise dissimilarity measure between two trees is based on a linear assignment...

  4. The decision tree approach to classification

    Science.gov (United States)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  5. Classification and regression trees

    CERN Document Server

    Breiman, Leo; Olshen, Richard A; Stone, Charles J

    1984-01-01

    The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.

  6. DIF Trees: Using Classification Trees to Detect Differential Item Functioning

    Science.gov (United States)

    Vaughn, Brandon K.; Wang, Qiu

    2010-01-01

    A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…

  7. Automated Decision Tree Classification of Corneal Shape

    Science.gov (United States)

    Twa, Michael D.; Parthasarathy, Srinivasan; Roberts, Cynthia; Mahmoud, Ashraf M.; Raasch, Thomas W.; Bullimore, Mark A.

    2011-01-01

    Purpose: The volume and complexity of data produced during videokeratography examinations present a challenge of interpretation. As a consequence, results are often analyzed qualitatively by subjective pattern recognition or reduced to comparisons of summary indices. We describe the application of decision tree induction, an automated machine learning classification method, to discriminate between normal and keratoconic corneal shapes in an objective and quantitative way. We then compared this method with other known classification methods. Methods: The corneal surface was modeled with a seventh-order Zernike polynomial for 132 normal eyes of 92 subjects and 112 eyes of 71 subjects diagnosed with keratoconus. A decision tree classifier was induced using the C4.5 algorithm, and its classification performance was compared with the modified Rabinowitz–McDonnell index, Schwiegerling’s Z3 index (Z3), Keratoconus Prediction Index (KPI), KISA%, and Cone Location and Magnitude Index using recommended classification thresholds for each method. We also evaluated the area under the receiver operating characteristic (ROC) curve for each classification method. Results: Our decision tree classifier performed equal to or better than the other classifiers tested: accuracy was 92% and the area under the ROC curve was 0.97. Our decision tree classifier reduced the information needed to distinguish between normal and keratoconus eyes using four of 36 Zernike polynomial coefficients. The four surface features selected as classification attributes by the decision tree method were inferior elevation, greater sagittal depth, oblique toricity, and trefoil. Conclusions: Automated decision tree classification of corneal shape through Zernike polynomials is an accurate quantitative method of classification that is interpretable and can be generated from any instrument platform capable of raw elevation data output. This method of pattern classification is extendable to other classification
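
    The area under the ROC curve reported in this record can be computed directly from classifier scores. The sketch below uses the rank-based (Mann-Whitney) formulation, which is one standard way to obtain the AUC; it is not taken from the paper, and the function name and scores are illustrative assumptions.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case (label 1)
    receives a higher score than a randomly chosen negative case (label 0),
    counting ties as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented scores: one positive case is ranked below one negative case.
auc = roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```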

  8. Tree Classification with Fused Mobile Laser Scanning and Hyperspectral Data

    Science.gov (United States)

    Puttonen, Eetu; Jaakkola, Anttoni; Litkey, Paula; Hyyppä, Juha

    2011-01-01

    Mobile Laser Scanning data were collected simultaneously with hyperspectral data using the Finnish Geodetic Institute Sensei system. The data were tested for tree species classification. The test area was an urban garden in the City of Espoo, Finland. Point clouds representing 168 individual tree specimens of 23 tree species were determined manually. The classification of the trees was done using first only the spatial data from point clouds, then with only the spectral data obtained with a spectrometer, and finally with the combined spatial and hyperspectral data from both sensors. Two classification tests were performed: the separation of coniferous and deciduous trees, and the identification of individual tree species. All determined tree specimens were used in distinguishing coniferous and deciduous trees. A subset of 133 trees and 10 tree species was used in the tree species classification. The best classification results for the fused data were 95.8% for the separation of the coniferous and deciduous classes. The best overall tree species classification succeeded with 83.5% accuracy for the best tested fused data feature combination. The respective results for paired structural features derived from the laser point cloud were 90.5% for the separation of the coniferous and deciduous classes and 65.4% for the species classification. Classification accuracies with paired hyperspectral reflectance value data were 90.5% for the separation of coniferous and deciduous classes and 62.4% for different species. The results are among the first of their kind and they show that mobile collected fused data outperformed single-sensor data in both classification tests and by a significant margin. PMID:22163894

  9. A new tree classification system for southern hardwoods

    Science.gov (United States)

    James S. Meadows; Daniel A. Jr. Skojac

    2008-01-01

    A new tree classification system for southern hardwoods is described. The new system is based on the Putnam tree classification system, originally developed by Putnam et al., 1960, Management and inventory of southern hardwoods, Agriculture Handbook 181, US For. Serv., Washington, DC, which consists of four tree classes: (1) preferred growing stock, (2) reserve growing...

  10. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis.

    Science.gov (United States)

    Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H

    2016-01-01

    Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
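
    The recursive partitioning step described in this record can be illustrated with a single split search. The sketch assumes the Gini impurity criterion for a binary outcome, which is the usual CART default but is not confirmed by the abstract; the feature columns (a neurological grade and an age) and all values are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (0.0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) of the best split.

    Candidate thresholds are midpoints between consecutive distinct values;
    rows with value <= threshold go left, the rest go right.
    """
    n = len(rows)
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical patients: [neurological grade, age]; outcome 1 = poor, 0 = good.
rows = [[1, 70], [2, 45], [4, 50], [5, 80]]
outcomes = [0, 0, 1, 1]
split = best_split(rows, outcomes)
```

    A full tree applies the same search recursively to each resulting subgroup until a stopping rule is met.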

  11. Transferability of decision trees for land cover classification in a ...

    African Journals Online (AJOL)

    This paper attempts to derive classification rules from training data of four Landsat-8 scenes by using the classification and regression tree (CART) implementation of the decision tree algorithm. The transferability of the ruleset was evaluated by classifying two adjacent scenes. The classification of the four mosaicked scenes ...

  12. Phylogenetic classification and the universal tree.

    Science.gov (United States)

    Doolittle, W F

    1999-06-25

    From comparative analyses of the nucleotide sequences of genes encoding ribosomal RNAs and several proteins, molecular phylogeneticists have constructed a "universal tree of life," taking it as the basis for a "natural" hierarchical classification of all living things. Although confidence in some of the tree's early branches has recently been shaken, new approaches could still resolve many methodological uncertainties. More challenging is evidence that most archaeal and bacterial genomes (and the inferred ancestral eukaryotic nuclear genome) contain genes from multiple sources. If "chimerism" or "lateral gene transfer" cannot be dismissed as trivial in extent or limited to special categories of genes, then no hierarchical universal classification can be taken as natural. Molecular phylogeneticists will have failed to find the "true tree," not because their methods are inadequate or because they have chosen the wrong genes, but because the history of life cannot properly be represented as a tree. However, taxonomies based on molecular sequences will remain indispensable, and understanding of the evolutionary process will ultimately be enriched, not impoverished.

  13. Fast Image Texture Classification Using Decision Trees

    Science.gov (United States)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
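
    The "integral image" transform mentioned in this record is what makes box-filter features cheap with integer arithmetic only. The following is a generic sketch of the transform (a summed-area table), not the author's flight code; the function names and the 3 x 3 image are illustrative.

```python
def integral_image(img):
    """Build a summed-area table with one extra row and column of zeros.

    ii[y][x] holds the sum of all pixels img[r][c] with r < y and c < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1.

    Uses four table lookups regardless of the box size.
    """
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
lower_right = box_sum(ii, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
```

    Because every box sum costs three additions or subtractions, features of this kind need no floating-point convolution, which is consistent with the FPGA suitability the abstract claims.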

  14. Predicting Battle Outcomes with Classification Trees

    National Research Council Canada - National Science Library

    Coban, Muzaffer

    2001-01-01

    ... from the actual battlefield. The models built by using classification trees reveal that the objective variables alone cannot explain the outcome of battles. Relative factors, such as leadership, have deep...

  15. Lidar-based individual tree species classification using convolutional neural network

    Science.gov (United States)

    Mizoguchi, Tomohiro; Ishii, Akira; Nakamura, Hiroyuki; Inoue, Tsuyoshi; Takamatsu, Hisashi

    2017-06-01

    Terrestrial lidar is commonly used for detailed documentation in the field of forest inventory investigation. Recent improvements in point cloud processing techniques have enabled efficient and precise computation of individual tree shape parameters, such as breast-height diameter, height, and volume. However, tree species are still specified manually by skilled workers. Previous work on automatic tree species classification has mainly focused on aerial or satellite images, and few studies have reported classification techniques using ground-based sensor data. Several candidate sensors can be considered for classification, such as RGB or multi/hyperspectral cameras. Among these candidates, we use terrestrial lidar because it can obtain high-resolution point clouds in dark forest. We selected bark texture as the classification criterion, since it clearly represents the unique characteristics of each tree and does not change in appearance under seasonal variation and age-related deterioration. In this paper, we propose a new method for automatic individual tree species classification based on terrestrial lidar using a Convolutional Neural Network (CNN). The key component is the step that creates, from a point cloud, a depth image that describes the characteristics of each species well. We focus on Japanese cedar and cypress, which cover a large part of domestic forest. Our experimental results demonstrate the effectiveness of our proposed method.

  16. Object-based methods for individual tree identification and tree species classification from high-spatial resolution imagery

    Science.gov (United States)

    Wang, Le

    2003-10-01

    Modern forest management poses an increasing need for detailed knowledge of forest information at different spatial scales. At the forest level, the information for tree species assemblage is desired whereas at or below the stand level, individual tree related information is preferred. Remote Sensing provides an effective tool to extract the above information at multiple spatial scales in the continuous time domain. To date, the increasing volume and ready availability of high-spatial-resolution data have led to a much wider application of remotely sensed products. Nevertheless, to make effective use of the improving spatial resolution, conventional pixel-based classification methods are far from satisfactory. Correspondingly, developing object-based methods becomes a central challenge for researchers in the field of Remote Sensing. This thesis focuses on the development of methods for accurate individual tree identification and tree species classification. We develop a method in which individual tree crown boundaries and treetop locations are derived under a unified framework. We apply a two-stage approach with edge detection followed by marker-controlled watershed segmentation. Treetops are modeled from radiometry and geometry aspects. Specifically, treetops are assumed to be represented by local radiation maxima and to be located near the center of the tree-crown. As a result, a marker image was created from the derived treetop to guide a watershed segmentation to further differentiate overlapping trees and to produce a segmented image comprised of individual tree crowns. The image segmentation method developed achieves a promising result for a 256 x 256 CASI image. Then further effort is made to extend our methods to the multiscales which are constructed from a wavelet decomposition. A scale consistency and geometric consistency are designed to examine the gradients along the scale-space for the purpose of separating true crown boundary from unwanted

  17. Building classification trees to explain the radioactive contamination levels of the plants

    International Nuclear Information System (INIS)

    Briand, B.

    2008-04-01

    The objective of this thesis is the development of a method allowing the identification of factors leading to various radioactive contamination levels of the plants. The methodology suggested is based on the use of a radioecological transfer model of the radionuclides through the environment (A.S.T.R.A.L. computer code) and a classification-tree method. Particularly, to avoid the instability problems of classification trees and to preserve the tree structure, a node level stabilizing technique is used. Empirical comparisons are carried out between classification trees built by this method (called R.E.N. method) and those obtained by the C.A.R.T. method. A similarity measure is defined to compare the structure of two classification trees. This measure is used to study the stabilizing performance of the R.E.N. method. The methodology suggested is applied to a simplified contamination scenario. From the results obtained, we can identify the main variables responsible for the various radioactive contamination levels of four leafy vegetables (lettuce, cabbage, spinach and leek). Some rules extracted from these classification trees could be used in a post-accidental context. (author)

  18. Time Series of Images to Improve Tree Species Classification

    Science.gov (United States)

    Miyoshi, G. T.; Imai, N. N.; de Moraes, M. V. A.; Tommaselli, A. M. G.; Näsi, R.

    2017-10-01

    Tree species classification provides valuable information for forest monitoring and management. The high floristic variation of the tree species is a challenging issue in tree species classification because the vegetation characteristics change according to the season. To help monitor this complex environment, imaging spectroscopy has been widely applied since the development of miniaturized sensors attached to Unmanned Aerial Vehicles (UAV). Considering the seasonal changes in forests and the higher spectral and spatial resolution acquired with sensors attached to UAV, we present the use of time series of images to classify four tree species. The study area is an Atlantic Forest area located in the western part of São Paulo State. Images were acquired in August 2015 and August 2016, generating three data sets of images: only with the image spectra of 2015; only with the image spectra of 2016; with the layer stacking of images from 2015 and 2016. Four tree species were classified using Spectral angle mapper (SAM), Spectral information divergence (SID) and Random Forest (RF). The results showed that SAM and SID caused an overfitting of the data whereas RF showed better results and the use of the layer stacking improved the classification achieving a kappa coefficient of 18.26 %.
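
    Of the three classifiers named in this record, the Spectral Angle Mapper is the simplest to state: a pixel is assigned to the reference spectrum with which it forms the smallest angle. The sketch below shows that standard definition only; the reference spectra, class names and pixel values are invented and do not come from the study.

```python
import math

def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(a * b for a, b in zip(pixel, reference))
    norm = (math.sqrt(sum(a * a for a in pixel))
            * math.sqrt(sum(b * b for b in reference)))
    cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.acos(cos)

def classify_sam(pixel, references):
    """Return the name of the reference spectrum with the smallest angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))

# Hypothetical three-band reference spectra for two species.
references = {
    "species_a": [0.1, 0.5, 0.4],
    "species_b": [0.4, 0.3, 0.1],
}
# A brighter pixel with the same spectral shape as species_a.
label = classify_sam([0.2, 1.0, 0.8], references)
```

    The angle ignores overall brightness, so a pixel that is a scaled copy of a reference spectrum has an angle of zero to it.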

  19. The process and utility of classification and regression tree methodology in nursing research.

    Science.gov (United States)

    Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda

    2014-06-01

    This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Discussion paper. English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984-2013. Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. © 2013 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.

  20. Deep Multi-Task Learning for Tree Genera Classification

    Science.gov (United States)

    Ko, C.; Kang, J.; Sohn, G.

    2018-05-01

    The goal of our paper is to classify tree genera using airborne Light Detection and Ranging (LiDAR) data with a Convolutional Neural Network (CNN) - Multi-task Network (MTN) implementation. Unlike a Single-task Network (STN), where only one task is assigned to the learning outcome, MTN is a deep learning architecture for learning a main task (classification of tree genera) with other tasks (in our study, classification of coniferous and deciduous) simultaneously, with shared classification features. The main contribution of this paper is to improve classification accuracy from CNN-STN to CNN-MTN. This is achieved by introducing a concurrence loss (Lcd) to the designed MTN. This term regulates the overall network performance by minimizing the inconsistencies between the two tasks. Results show that we can increase the classification accuracy from 88.7 % to 91.0 % (from STN to MTN). The second goal of this paper is to solve the problem of small training sample size by multiple-view data generation. The motivation for this goal is to address one of the most common problems in implementing deep learning architecture, the insufficient amount of training data. We address this problem by simulating a training dataset with a multiple-view approach. The promising results from this paper provide a basis for classifying larger datasets and a larger number of classes in the future.

  1. Improving Land Use/Land Cover Classification by Integrating Pixel Unmixing and Decision Tree Methods

    Directory of Open Access Journals (Sweden)

    Chao Yang

    2017-11-01

    Decision tree classification is one of the most efficient methods for obtaining land use/land cover (LULC) information from remotely sensed imagery. However, traditional decision tree classification methods cannot effectively eliminate the influence of mixed pixels. This study aimed to integrate pixel unmixing and decision tree to improve LULC classification by removing mixed pixel influence. The abundance and minimum noise fraction (MNF) results that were obtained from mixed pixel decomposition were added to decision tree multi-features using a three-dimensional (3D) terrain model, which was created using an image fusion digital elevation model (DEM), to select training samples (ROIs), and improve ROI separability. A Landsat-8 OLI image of the Yunlong Reservoir Basin in Kunming was used to test this proposed method. Study results showed that the Kappa coefficient and the overall accuracy of integrated pixel unmixing and decision tree method increased by 0.093% and 10%, respectively, as compared with the original decision tree method. This proposed method could effectively eliminate the influence of mixed pixels and improve the accuracy in complex LULC classifications.
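
    The two accuracy measures compared in this record, overall accuracy and the Kappa coefficient, are both derived from a confusion matrix. The sketch below uses the standard Cohen's kappa definition; the two-class matrix is invented and unrelated to the study's results.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of reference class i assigned to class j.
    Kappa rescales the observed agreement by the agreement expected by chance.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical two-class map validation: 85 of 100 samples agree.
oa, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
```

    Kappa is normally reported on a scale where 1 means perfect agreement and 0 means chance-level agreement.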

  2. Decision tree methods: applications for classification and prediction.

    Science.gov (United States)

    Song, Yan-Yan; Lu, Ying

    2015-04-25

    Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. The training dataset is used to build a decision tree model, and the validation dataset is used to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.

  3. VLSI implementation of flexible architecture for decision tree classification in data mining

    Science.gov (United States)

    Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

    2017-07-01

    Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a tremendous rise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. Among the solutions developed for this problem, the most widely accepted is Decision Tree Classification (DTC), which gives high precision while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for decision tree classification in data mining using the C4.5 algorithm.

  4. Modeling time-to-event (survival) data using classification tree analysis.

    Science.gov (United States)

    Linden, Ariel; Yarnold, Paul R

    2017-12-01

    Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.

  5. Crown-Level Tree Species Classification Using Integrated Airborne Hyperspectral and LIDAR Remote Sensing Data

    Science.gov (United States)

    Wang, Z.; Wu, J.; Wang, Y.; Kong, X.; Bao, H.; Ni, Y.; Ma, L.; Jin, J.

    2018-05-01

    Mapping tree species is essential for sustainable planning as well as to improve our understanding of the roles of different trees in providing ecological services. However, automatic crown-level tree species classification is a challenging task due to the spectral similarity among diversified tree species, fine-scale spatial variation, shadow, and underlying objects within a crown. Advanced remote sensing data such as airborne Light Detection and Ranging (LiDAR) and hyperspectral imagery offer a great potential opportunity to derive crown spectral, structure and canopy physiological information at the individual crown scale, which can be useful for mapping tree species. In this paper, an innovative approach was developed for tree species classification at the crown level. The method utilized LiDAR data for individual tree crown delineation and morphological structure extraction, and Compact Airborne Spectrographic Imager (CASI) hyperspectral imagery for pure crown-scale spectral extraction. Specifically, four steps were included: 1) A weighted mean filtering method was developed to improve the accuracy of the smoothed Canopy Height Model (CHM) derived from LiDAR data; 2) The marker-controlled watershed segmentation algorithm was, therefore, also employed to delineate the tree-level canopy from the CHM image in this study, and then individual tree height and tree crown were calculated according to the delineated crown; 3) Spectral features within 3 × 3 neighborhood regions centered on the treetops detected by the treetop detection algorithm were derived from the spectrally normalized CASI imagery; 4) The shape characteristics related to their crown diameters and heights were established, and different crown-level tree species were classified using the combination of spectral and shape characteristics. Analysis of results suggests that the developed classification strategy in this paper (OA = 85.12 %, Kc = 0.90) performed better than LiDAR-metrics method (OA = 79

  6. Tree species classification using within crown localization of waveform LiDAR attributes

    Science.gov (United States)

    Blomley, Rosmarie; Hovi, Aarne; Weinmann, Martin; Hinz, Stefan; Korpela, Ilkka; Jutzi, Boris

    2017-11-01

    Since forest planning is increasingly taking an ecological, diversity-oriented perspective into account, remote sensing technologies are becoming ever more important in assessing existing resources with reduced manual effort. While light detection and ranging (LiDAR) technology provides a good basis for predictions of tree height and biomass, tree species identification based on this type of data is particularly challenging in structurally heterogeneous forests. In this paper, we analyse existing approaches with respect to the geometrical scale of feature extraction (whole tree, within crown partitions or within laser footprint) and conclude that currently features are always extracted separately from the different scales. Since multi-scale approaches have, however, proven successful in other applications, we aim to utilize the within-tree-crown distribution of within-footprint signal characteristics as additional features. To do so, a spin image algorithm, originally devised for the extraction of 3D surface features in object recognition, is adapted. This algorithm relies on spinning an image plane around a defined axis, e.g. the tree stem, collecting the number of LiDAR returns or the mean values of return attributes per pixel as respective values. Based on this representation, spin image features are extracted that comprise only those components of highest variability among a given set of library trees. The relative performance and the combined improvement of these spin image features with respect to non-spatial statistical metrics of the waveform (WF) attributes are evaluated for the tree species classification of Scots pine (Pinus sylvestris L.), Norway spruce (Picea abies (L.) Karst.) and Silver/Downy birch (Betula pendula Roth/Betula pubescens Ehrh.) in a boreal forest environment. This evaluation is performed for two WF LiDAR datasets that differ in footprint size, pulse density at ground, laser wavelength and pulse width. Furthermore, we evaluate the

  7. An object-oriented classification method of high resolution imagery based on improved AdaTree

    International Nuclear Information System (INIS)

    Xiaohe, Zhang; Liang, Zhai; Jixian, Zhang; Huiyong, Sang

    2014-01-01

    With the growing use of high spatial resolution remote sensing imagery, more and more studies have paid attention to object-oriented classification, covering both image segmentation and automatic classification after segmentation. This paper proposed a fast method of object-oriented automatic classification. First, edge-based or FNEA-based segmentation was used to identify image objects, and the values of the attributes of the image objects most suitable for classification were calculated. Then a certain number of samples from the image objects were selected as training data for the improved AdaTree algorithm to obtain classification rules. Finally, the image objects could be classified easily using these rules. In AdaTree, we mainly modified the final hypothesis to obtain classification rules. In the experiment with a WorldView2 image, the method based on AdaTree showed clear improvements in accuracy and efficiency compared with the method based on SVM, with the kappa coefficient reaching 0.9242
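
    The kappa coefficient reported above is computed from a confusion matrix. The following is a minimal sketch of that calculation, not the authors' code; the function name and the row/column convention (rows are reference classes, columns are assigned classes) are assumptions made for illustration.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Assumed layout: confusion[i][j] counts samples whose reference class is i
    and whose assigned class is j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    n_classes = len(confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column totals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

    Kappa discounts the agreement that would be expected by chance, so it is usually lower than the overall accuracy for the same matrix.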

  8. PCA based feature reduction to improve the accuracy of decision tree c4.5 classification

    Science.gov (United States)

    Nasution, M. Z. F.; Sitompul, O. S.; Ramli, M.

    2018-03-01

    Attribute splitting is a major process in Decision Tree C4.5 classification. However, this process does not have a significant impact on the construction of the decision tree in terms of removing irrelevant features. A major problem in the decision tree classification process is over-fitting, which results from noisy data and irrelevant features. In turn, over-fitting creates misclassification and data imbalance. Many algorithms have been proposed to overcome misclassification and over-fitting in Decision Tree C4.5 classification. Feature reduction is one of the important issues in classification modelling; it is intended to remove irrelevant data in order to improve accuracy. The feature reduction framework is used to simplify high-dimensional data to low-dimensional data with non-correlated attributes. In this research, we proposed a framework for selecting relevant and non-correlated feature subsets. We consider principal component analysis (PCA) for feature reduction to perform non-correlated feature selection and the Decision Tree C4.5 algorithm for the classification. In experiments conducted on the cervical cancer data set from the UCI repository, with 858 instances and 36 attributes, we evaluated the performance of our framework based on accuracy, specificity and precision. Experimental results show that our proposed framework robustly enhances classification accuracy, reaching an accuracy rate of 90.70%.
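
    C4.5 ranks candidate splitting attributes by gain ratio. The sketch below illustrates that criterion for a single categorical attribute. It is not the framework from the abstract (no PCA step is shown), and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in a sequence."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0
```

    An attribute that is unrelated to the class has a gain ratio near zero, which is why irrelevant features contribute little to the split choice yet can still add noise in deep, unpruned trees.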

  9. A novel approach to internal crown characterization for coniferous tree species classification

    Science.gov (United States)

    Harikumar, A.; Bovolo, F.; Bruzzone, L.

    2016-10-01

    Knowledge about individual trees in a forest is highly beneficial in forest management. High density small footprint multi-return airborne Light Detection and Ranging (LiDAR) data can provide very accurate information about the structural properties of individual trees in forests. Every tree species has a unique set of crown structural characteristics that can be used for tree species classification. In this paper, we use both the internal and external crown structural information of a conifer tree crown, derived from a high density small footprint multi-return LiDAR data acquisition, for species classification. Considering the fact that branches are the major building blocks of a conifer tree crown, we obtain the internal crown structural information using a branch-level analysis. The structure of each conifer branch is represented using clusters in the LiDAR point cloud. We propose the joint use of k-means clustering and geometric shape fitting, on the LiDAR data projected onto a novel 3-dimensional space, to identify branch clusters. After mapping the identified clusters back to the original space, six internal geometric features are estimated using a branch-level analysis. The external crown characteristics are modeled by using six least correlated features based on cone fitting and convex hull. Species classification is performed using a sparse Support Vector Machines (sparse SVM) classifier.
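
    The k-means step mentioned above can be sketched on plain 3D points as follows. This is only an illustration of standard k-means (Lloyd's algorithm). It does not reproduce the paper's projection onto a new 3-dimensional space or its geometric shape fitting, and the function name and the caller-supplied initial centroids are assumptions.

```python
from math import dist


def kmeans(points, initial_centroids, iterations=20):
    """Cluster 3D points with plain k-means, starting from given centroids.

    Returns (centroids, clusters), where clusters[i] holds the points assigned
    to centroid i in the last assignment step.
    """
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

    In a branch-level analysis each resulting cluster would be a candidate branch, but the number of clusters and the starting centroids have to come from elsewhere.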

  10. Predictive mapping of soil organic carbon in wet cultivated lands using classification-tree based models

    DEFF Research Database (Denmark)

    Kheir, Rania Bou; Greve, Mogens Humlekrog; Bøcher, Peder Klith

    2010-01-01

    the geographic distribution of SOC across Denmark using remote sensing (RS), geographic information systems (GISs) and decision-tree modeling (un-pruned and pruned classification trees). Seventeen parameters, i.e. parent material, soil type, landscape type, elevation, slope gradient, slope aspect, mean curvature...... field measurements in the area of interest (Denmark). A large number of tree-based classification models (588) were developed using (i) all of the parameters, (ii) all Digital Elevation Model (DEM) parameters only, (iii) the primary DEM parameters only, (iv), the remote sensing (RS) indices only, (v......) selected pairs of parameters, (vi) soil type, parent material and landscape type only, and (vii) the parameters having a high impact on SOC distribution in built pruned trees. The best constructed classification tree models (in the number of three) with the lowest misclassification error (ME...

  11. CROWN-LEVEL TREE SPECIES CLASSIFICATION USING INTEGRATED AIRBORNE HYPERSPECTRAL AND LIDAR REMOTE SENSING DATA

    Directory of Open Access Journals (Sweden)

    Z. Wang

    2018-05-01

    Mapping tree species is essential for sustainable planning as well as to improve our understanding of the role different trees play in providing ecological services. However, automatic crown-level tree species classification is a challenging task due to the spectral similarity among diverse tree species, fine-scale spatial variation, shadow, and underlying objects within a crown. Advanced remote sensing data such as airborne Light Detection and Ranging (LiDAR) and hyperspectral imagery offer great potential to derive crown spectral, structural and canopy physiological information at the individual crown scale, which can be useful for mapping tree species. In this paper, an innovative approach was developed for tree species classification at the crown level. The method utilized LiDAR data for individual tree crown delineation and morphological structure extraction, and Compact Airborne Spectrographic Imager (CASI) hyperspectral imagery for pure crown-scale spectral extraction. Specifically, four steps were included: 1) A weighted mean filtering method was developed to improve the accuracy of the smoothed Canopy Height Model (CHM) derived from LiDAR data; 2) The marker-controlled watershed segmentation algorithm was employed to delineate the tree-level canopy from the CHM image, and then individual tree height and tree crown were calculated according to the delineated crown; 3) Spectral features within 3 × 3 neighborhood regions centered on the treetops detected by the treetop detection algorithm were derived from the spectrally normalized CASI imagery; 4) The shape characteristics related to their crown diameters and heights were established, and different crown-level tree species were classified using the combination of spectral and shape characteristics. Analysis of results suggests that the developed classification strategy in this paper (OA = 85.12 %, Kc = 0.90) performed better than Li

  12. Decision tree approach for classification of remotely sensed satellite ...

    Indian Academy of Sciences (India)

    sensed satellite data using open source support. Richa Sharma .... Decision tree classification techniques have been .... the USGS Earth Resource Observation Systems. (EROS) ... for shallow water, 11% were for sparse and dense built-up ...

  13. Crown-level tree species classification from AISA hyperspectral imagery using an innovative pixel-weighting approach

    Science.gov (United States)

    Liu, Haijian; Wu, Changshan

    2018-06-01

    Crown-level tree species classification is a challenging task due to the spectral similarity among different tree species. Shadow, underlying objects, and other materials within a crown may decrease the purity of extracted crown spectra and further reduce classification accuracy. To address this problem, an innovative pixel-weighting approach was developed for tree species classification at the crown level. The method utilized high density discrete LiDAR data for individual tree delineation and Airborne Imaging Spectrometer for Applications (AISA) hyperspectral imagery for pure crown-scale spectra extraction. Specifically, three steps were included: 1) individual tree identification using LiDAR data, 2) pixel-weighted representative crown spectra calculation using hyperspectral imagery, in which pixel-based illuminated-leaf fractions estimated using a linear spectral mixture analysis (LSMA) were employed as weighting factors, and 3) tree species classification based on the representative spectra using a support vector machine (SVM) approach. Analysis of results suggests that the developed pixel-weighting approach (OA = 82.12%, Kc = 0.74) performed better than treetop-based (OA = 70.86%, Kc = 0.58) and pixel-majority methods (OA = 72.26%, Kc = 0.62) in terms of classification accuracy. McNemar tests indicated that the differences in accuracy between the pixel-weighting and treetop-based approaches, as well as between the pixel-weighting and pixel-majority approaches, were statistically significant.
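
    Step 2 of the method above amounts to a weighted average of the pixel spectra within one crown. The sketch below assumes the illuminated-leaf fractions are already available as one weight per pixel. The LSMA step that would produce them is not shown, and the function name is hypothetical.

```python
def pixel_weighted_spectrum(pixel_spectra, weights):
    """Weighted mean spectrum of one crown.

    pixel_spectra: list of per-pixel spectra, each a list of band values.
    weights: one non-negative weight per pixel (here assumed to be the
    illuminated-leaf fraction of that pixel).
    """
    if len(pixel_spectra) != len(weights):
        raise ValueError("need exactly one weight per pixel")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_bands = len(pixel_spectra[0])
    # For each band, sum weight * value over pixels and normalise by total weight.
    return [
        sum(w * spectrum[band] for spectrum, w in zip(pixel_spectra, weights)) / total
        for band in range(n_bands)
    ]
```

    Pixels dominated by shadow or background receive small weights and therefore contribute little to the representative spectrum, in contrast to a plain mean over the crown.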

  14. Decision tree approach for classification of remotely sensed satellite

    Indian Academy of Sciences (India)

    DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, open source ...

  15. Classification of Parkinsonian syndromes from FDG-PET brain data using decision trees with SSM/PCA features.

    Science.gov (United States)

    Mudali, D; Teune, L K; Renken, R J; Leenders, K L; Roerdink, J B T M

    2015-01-01

    Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy for humans to understand. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on increasing the amount of training data, enhancing the decision tree method by bagging, and adding features based on (f)MRI data.
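
    Leave-one-out cross validation, as used above, holds out each subject once and trains on all the others. The sketch below shows that loop. To keep it self-contained it uses a 1-nearest-neighbour classifier as a stand-in, not the C4.5 decision tree from the study, and all names are hypothetical.

```python
from math import dist


def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in classifier: label of the closest training sample."""
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    return train_y[best]


def leave_one_out_accuracy(features, labels, predict=nearest_neighbour_predict):
    """Fraction of samples predicted correctly when each is held out in turn."""
    correct = 0
    for i in range(len(features)):
        # Train on everything except sample i, then test on sample i.
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)
```

    Any classifier with the same `predict(train_x, train_y, x)` shape could be passed in place of the stand-in. The procedure suits small data sets because every sample is used for testing exactly once.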

  16. Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems

    Science.gov (United States)

    Chang, Pei-Chann; Fan, Chin-Yuan; Wang, Yen-Wen

    Data base classification suffers from two well-known difficulties, i.e., the high dimensionality and non-stationary variations within large historic data. This paper presents a hybrid classification model that integrates a case-based reasoning technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system for data classification in various data base applications. The model is mainly based on the idea that the historic data base can be transformed into a smaller case base together with a group of fuzzy decision rules. As a result, the model can respond more accurately to the data currently being classified, using the inductions from these fuzzy decision trees built on the smaller case bases. Hit rate is applied as a performance measure, and the effectiveness of the proposed model is demonstrated by experimental comparison with other approaches on different data base classification applications. The average hit rate of the proposed model is the highest among the compared approaches.

  17. A classification tree for the prediction of benign versus malignant disease in patients with small renal masses.

    Science.gov (United States)

    Rendon, Ricardo A; Mason, Ross J; Kirkland, Susan; Lawen, Joseph G; Abdolell, Mohamed

    2014-08-01

    To develop a classification tree for the preoperative prediction of benign versus malignant disease in patients with small renal masses. This is a retrospective study including 395 consecutive patients who underwent surgical treatment for a renal mass classification tree to predict the risk of having a benign renal mass preoperatively was developed using recursive partitioning analysis for repeated measures outcomes. Age, sex, volume on preoperative imaging, tumor location (central/peripheral), degree of endophytic component (1%-100%), and tumor axis position were used as potential predictors to develop the model. Forty-five patients (11.4%) were found to have a benign mass postoperatively. A classification tree has been developed which can predict the risk of benign disease with an accuracy of 88.9% (95% CI: 85.3 to 91.8). The significant prognostic factors in the classification tree are tumor volume, degree of endophytic component and symptoms at diagnosis. As an example of its utilization, a renal mass with a volume of classification tree to predict the risk of benign disease in small renal masses has been developed to aid the clinician when deciding on treatment strategies for small renal masses.

  18. Hierarchical classification with a competitive evolutionary neural tree.

    Science.gov (United States)

    Adams, R G.; Butchart, K; Davey, N

    1999-04-01

    A new, dynamic, tree structured network, the Competitive Evolutionary Neural Tree (CENT) is introduced. The network is able to provide a hierarchical classification of unlabelled data sets. The main advantage that the CENT offers over other hierarchical competitive networks is its ability to self determine the number, and structure, of the competitive nodes in the network, without the need for externally set parameters. The network produces stable classificatory structures by halting its growth using locally calculated heuristics. The results of network simulations are presented over a range of data sets, including Anderson's IRIS data set. The CENT network demonstrates its ability to produce a representative hierarchical structure to classify a broad range of data sets.

  19. An Evaluation of Different Training Sample Allocation Schemes for Discrete and Continuous Land Cover Classification Using Decision Tree-Based Algorithms

    Directory of Open Access Journals (Sweden)

    René Roland Colditz

    2015-07-01

    Land cover mapping for large regions often employs satellite images of medium to coarse spatial resolution, which complicates mapping of discrete classes. Class memberships, which estimate the proportion of each class for every pixel, have been suggested as an alternative. This paper compares different strategies of training data allocation for discrete and continuous land cover mapping using classification and regression tree algorithms. In addition to measures of discrete and continuous map accuracy, the correct estimation of the area is another important criterion. A subset of the 30 m national land cover dataset of 2006 (NLCD2006) of the United States was used as the reference set to classify NADIR BRDF-adjusted surface reflectance time series of MODIS at 900 m spatial resolution. Results show that sampling of heterogeneous pixels and sample allocation according to the expected area of each class is best for classification trees. Regression trees for continuous land cover mapping should be trained with random allocation, and predictions should be normalized with a linear scaling function to correctly estimate the total area. Among the tested algorithms, random forest classification yields lower errors than boosted trees of C5.0, and Cubist shows higher accuracies than random forest regression.
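
    The abstract does not specify the linear scaling function. One common reading, assumed here purely for illustration, is to rescale the predicted class memberships of each pixel so that they sum to one. The function name and the handling of negative or all-zero predictions are likewise assumptions.

```python
def normalize_memberships(predicted):
    """Rescale one pixel's predicted class proportions so they sum to 1.

    Assumption: negative regression outputs are clipped to zero first, and a
    pixel whose predictions are all zero is returned unchanged.
    """
    clipped = [max(0.0, p) for p in predicted]
    total = sum(clipped)
    if total == 0:
        return clipped
    # Linear scaling: divide every class proportion by the pixel total.
    return [p / total for p in clipped]
```

    Independent regression trees, one per class, give no guarantee that a pixel's proportions add up to 100%, so class areas summed over the map would be biased without a step of this kind.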

  20. Maximizing the Diversity of Ensemble Random Forests for Tree Genera Classification Using High Density LiDAR Data

    Directory of Open Access Journals (Sweden)

    Connie Ko

    2016-08-01

    Recent research into improving the effectiveness of forest inventory management using airborne LiDAR data has focused on developing advanced theories in data analytics. Furthermore, supervised learning as a predictive model for classifying tree genera (and species, where possible) has been gaining popularity in order to minimize this labor-intensive task. However, bottlenecks remain that hinder the immediate adoption of supervised learning methods. With supervised classification, training samples are required for learning the parameters that govern the performance of a classifier, yet the selection of training data is often subjective and the quality of such samples is critically important. For LiDAR scanning in forest environments, the quantification of data quality is somewhat abstract, normally referring to some metric related to the completeness of individual tree crowns; however, this is not an issue that has received much attention in the literature. Intuitively, the choice of training samples having varying quality will affect classification accuracy. In this paper a Diversity Index (DI) is proposed that characterizes the diversity of data quality (Qi) among selected training samples required for constructing a classification model of tree genera. The training sample is diversified in terms of data quality as opposed to the number of samples per class. The diversified training sample allows the classifier to better learn the positive and negative instances and, therefore, has a higher classification accuracy in discriminating the “unknown” class samples from the “known” samples. Our algorithm is implemented within the Random Forests base classifiers with six derived geometric features from LiDAR data. The training sample contains three tree genera (pine, poplar, and maple) and the validation samples contain four labels (pine, poplar, maple, and “unknown”). Classification accuracy improved from 72.8%, when training samples were

  1. Woodland Mapping at Single-Tree Levels Using Object-Oriented Classification of Unmanned Aerial Vehicle (UAV) Images

    Science.gov (United States)

    Chenari, A.; Erfanifard, Y.; Dehghani, M.; Pourghasemi, H. R.

    2017-09-01

    Remotely sensed datasets offer a reliable means to precisely estimate biophysical characteristics of individual species sparsely distributed in open woodlands. Moreover, object-oriented classification has exhibited significant advantages over other classification methods for delineation of tree crowns and recognition of species in various types of ecosystems. However, it is still unclear whether this widely used classification method retains its advantages on unmanned aerial vehicle (UAV) digital images for mapping vegetation cover at single-tree levels. In this study, UAV orthoimagery was classified using an object-oriented classification method for mapping a part of the wild pistachio nature reserve in the Zagros open woodlands, Fars Province, Iran. This research focused on recognizing the two main species of the study area (i.e., wild pistachio and wild almond) and estimating their mean crown area. The orthoimage of the study area consisted of 1,076 images with a spatial resolution of 3.47 cm and was georeferenced using 12 ground control points (RMSE=8 cm) gathered by the real-time kinematic (RTK) method. The results showed that the UAV orthoimagery classified by the object-oriented method efficiently estimated the mean crown area of wild pistachios (52.09±24.67 m2) and wild almonds (3.97±1.69 m2) with no significant difference from their observed values (α=0.05). In addition, the results showed that wild pistachios (accuracy of 0.90 and precision of 0.92) and wild almonds (accuracy of 0.90 and precision of 0.89) were well recognized by image segmentation. In general, we concluded that UAV orthoimagery can efficiently produce precise biophysical data of vegetation stands at single-tree levels and is therefore suitable for assessment and monitoring of open woodlands.
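
    The abstract does not define how accuracy and precision were computed. The sketch below uses the usual per-class definitions from true/false positive and negative counts, which is an assumption, and the function name is hypothetical.

```python
def accuracy_and_precision(tp, fp, fn, tn):
    """Per-class accuracy and precision from confusion counts.

    tp: crowns of the species correctly recognised
    fp: other objects wrongly labelled as the species
    fn: crowns of the species that were missed
    tn: other objects correctly rejected
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("no samples")
    accuracy = (tp + tn) / total
    # Precision is undefined when nothing was labelled as the species;
    # 0.0 is returned in that case as a convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision
```

    Accuracy counts both correct detections and correct rejections, whereas precision looks only at the objects that were labelled as the species, so the two can differ for the same classification.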

  2. Comprehensive decision tree models in bioinformatics.

    Directory of Open Access Journals (Sweden)

    Gregor Stiglic

    PURPOSE: Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. METHODS: This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. RESULTS: The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. CONCLUSIONS: The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets

  3. Comprehensive decision tree models in bioinformatics.

    Science.gov (United States)

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly

  4. WOODLAND MAPPING AT SINGLE-TREE LEVELS USING OBJECT-ORIENTED CLASSIFICATION OF UNMANNED AERIAL VEHICLE (UAV) IMAGES

    Directory of Open Access Journals (Sweden)

    A. Chenari

    2017-09-01

    Remotely sensed datasets offer a reliable means to precisely estimate biophysical characteristics of individual species sparsely distributed in open woodlands. Moreover, object-oriented classification has exhibited significant advantages over other classification methods for delineation of tree crowns and recognition of species in various types of ecosystems. However, it is still unclear whether this widely used classification method retains its advantages on unmanned aerial vehicle (UAV) digital images for mapping vegetation cover at single-tree levels. In this study, UAV orthoimagery was classified using an object-oriented classification method for mapping a part of the wild pistachio nature reserve in the Zagros open woodlands, Fars Province, Iran. This research focused on recognizing the two main species of the study area (i.e., wild pistachio and wild almond) and estimating their mean crown area. The orthoimage of the study area consisted of 1,076 images with a spatial resolution of 3.47 cm and was georeferenced using 12 ground control points (RMSE=8 cm) gathered by the real-time kinematic (RTK) method. The results showed that the UAV orthoimagery classified by the object-oriented method efficiently estimated the mean crown area of wild pistachios (52.09±24.67 m2) and wild almonds (3.97±1.69 m2) with no significant difference from their observed values (α=0.05). In addition, the results showed that wild pistachios (accuracy of 0.90 and precision of 0.92) and wild almonds (accuracy of 0.90 and precision of 0.89) were well recognized by image segmentation. In general, we concluded that UAV orthoimagery can efficiently produce precise biophysical data of vegetation stands at single-tree levels and is therefore suitable for assessment and monitoring of open woodlands.

  5. Wall-to-wall tree type classification using airborne lidar data and CIR images

    DEFF Research Database (Denmark)

    Schumacher, Johannes; Nord-Larsen, Thomas

    2014-01-01

    analysed at the individual tree level (object-based). However, due to computational challenges, most object-based studies cover only smaller areas and experience of larger areas is lacking. We present an approach for an object-based, unsupervised classification of trees into broadleaf or conifer using......-based classification of the TST plots showed an overall accuracy of 84% and a kappa coefficient (κ) of 0.61 when using all plots, and 92% and 0.79, respectively, when leaving out plots with larch. NFI plots were assigned to conifer- or broadleaf-dominated or mixed depending on the area covered by the segments...... of the two tree types. In areas where lidar data were collected specifically during leaf-off conditions, 71% of the NFI plots were assigned correctly into the three categories with κ = 0.53. Using only NFI plots dominated by one type (broadleaf or conifer), 78% were categorized correctly with κ = 0...

  6. Estimating population extinction thresholds with categorical classification trees for Louisiana black bears.

    Science.gov (United States)

    Laufenberg, Jared S; Clark, Joseph D; Chandler, Richard B

    2018-01-01

    Monitoring vulnerable species is critical for their conservation. Thresholds or tipping points are commonly used to indicate when populations become vulnerable to extinction and to trigger changes in conservation actions. However, quantitative methods to determine such thresholds have not been well explored. The Louisiana black bear (Ursus americanus luteolus) was removed from the list of threatened and endangered species under the U.S. Endangered Species Act in 2016, and our objectives were to determine the most appropriate parameters and thresholds for monitoring and management action. Capture-mark-recapture (CMR) data from 2006 to 2012 were used to estimate population parameters and variances. We used stochastic population simulations and conditional classification trees to identify demographic rates for monitoring that would be most indicative of heightened extinction risk. We then identified thresholds that would be reliable predictors of population viability. Conditional classification trees indicated that the annual apparent survival rate for adult females averaged over 5 years ([Formula: see text]) was the best predictor of population persistence. Specifically, population persistence was estimated to be ≥95% over 100 years when [Formula: see text], suggesting that this statistic can be used as a threshold to trigger management intervention. Our evaluation produced monitoring protocols that reliably predicted population persistence and were cost-effective. We conclude that population projections and conditional classification trees can be valuable tools for identifying extinction thresholds used in monitoring programs.
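
    The idea of reading a monitoring threshold off a tree split can be illustrated with a one-split tree (a decision stump) on a single predictor. This is a simplified stand-in, not the conditional classification trees used in the study, which choose splits with statistical tests. The function name and the input layout are assumptions.

```python
def best_threshold(values, persisted):
    """Find the cut on one predictor that best separates simulation outcomes.

    values: one predictor value per simulated population (for example a
    mean survival rate).
    persisted: True if that simulated population persisted, else False.
    Returns (threshold, errors) for the rule "predict persistence when
    value >= threshold", or None if all predictor values are identical.
    """
    ordered = sorted(set(values))
    # Candidate cuts lie midway between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = None
    for t in candidates:
        errors = sum((v >= t) != p for v, p in zip(values, persisted))
        if best is None or errors < best[1]:
            best = (t, errors)
    return best
```

    The returned threshold is the value a monitoring programme would compare field estimates against, under the strong simplification that a single predictor and a single split are enough.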

  7. Estimating population extinction thresholds with categorical classification trees for Louisiana black bears.

    Directory of Open Access Journals (Sweden)

    Jared S Laufenberg

    Monitoring vulnerable species is critical for their conservation. Thresholds or tipping points are commonly used to indicate when populations become vulnerable to extinction and to trigger changes in conservation actions. However, quantitative methods to determine such thresholds have not been well explored. The Louisiana black bear (Ursus americanus luteolus) was removed from the list of threatened and endangered species under the U.S. Endangered Species Act in 2016 and our objectives were to determine the most appropriate parameters and thresholds for monitoring and management action. Capture-mark-recapture (CMR) data from 2006 to 2012 were used to estimate population parameters and variances. We used stochastic population simulations and conditional classification trees to identify demographic rates for monitoring that would be most indicative of heightened extinction risk. We then identified thresholds that would be reliable predictors of population viability. Conditional classification trees indicated that annual apparent survival rates for adult females averaged over 5 years ([Formula: see text]) was the best predictor of population persistence. Specifically, population persistence was estimated to be ≥95% over 100 years when [Formula: see text], suggesting that this statistic can be used as a threshold to trigger management intervention. Our evaluation produced monitoring protocols that reliably predicted population persistence and were cost-effective. We conclude that population projections and conditional classification trees can be valuable tools for identifying extinction thresholds used in monitoring programs.

  8. Estimating population extinction thresholds with categorical classification trees for Louisiana black bears

    Science.gov (United States)

    Laufenberg, Jared S.; Clark, Joseph D.; Chandler, Richard B.

    2018-01-01

    Monitoring vulnerable species is critical for their conservation. Thresholds or tipping points are commonly used to indicate when populations become vulnerable to extinction and to trigger changes in conservation actions. However, quantitative methods to determine such thresholds have not been well explored. The Louisiana black bear (Ursus americanus luteolus) was removed from the list of threatened and endangered species under the U.S. Endangered Species Act in 2016 and our objectives were to determine the most appropriate parameters and thresholds for monitoring and management action. Capture-mark-recapture (CMR) data from 2006 to 2012 were used to estimate population parameters and variances. We used stochastic population simulations and conditional classification trees to identify demographic rates for monitoring that would be most indicative of heightened extinction risk. We then identified thresholds that would be reliable predictors of population viability. Conditional classification trees indicated that annual apparent survival rates for adult females averaged over 5 years ([Formula: see text]) was the best predictor of population persistence. Specifically, population persistence was estimated to be ≥95% over 100 years when [Formula: see text], suggesting that this statistic can be used as a threshold to trigger management intervention. Our evaluation produced monitoring protocols that reliably predicted population persistence and were cost-effective. We conclude that population projections and conditional classification trees can be valuable tools for identifying extinction thresholds used in monitoring programs.

  9. The Hybrid of Classification Tree and Extreme Learning Machine for Permeability Prediction in Oil Reservoir

    KAUST Repository

    Prasetyo Utomo, Chandra

    2011-01-01

    the permeability value. These are based on the well logs data. In order to handle the high range of the permeability value, a classification tree is utilized. A benefit of this innovation is that the tree represents knowledge in a clear and succinct fashion

  10. A ROUGH SET DECISION TREE BASED MLP-CNN FOR VERY HIGH RESOLUTION REMOTELY SENSED IMAGE CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    C. Zhang

    2017-09-01

    Recent advances in remote sensing have produced a great number of very high resolution (VHR) images acquired at sub-metre spatial resolution. These VHR remotely sensed data have posed enormous challenges for effective processing, analysis and classification due to their high spatial complexity and heterogeneity. Although many computer-aided classification methods based on machine learning approaches have been developed over the past decades, most of them are developed toward pixel-level spectral differentiation, e.g. the Multi-Layer Perceptron (MLP), and are unable to exploit the abundant spatial details within VHR images. This paper introduced a rough set model as a general framework to objectively characterize the uncertainty in CNN classification results, and further partition them into correct and incorrect regions on the map. The correct classification regions of CNN were trusted and maintained, whereas the misclassification areas were reclassified using a decision tree with both CNN and MLP. The effectiveness of the proposed rough set decision tree based MLP-CNN was tested using an urban area at Bournemouth, United Kingdom. The MLP-CNN, well capturing the complementarity between CNN and MLP through the rough set based decision tree, achieved the best classification performance both visually and numerically. Therefore, this research paves the way to achieve fully automatic and effective VHR image classification.

  11. A novel transferable individual tree crown delineation model based on Fishing Net Dragging and boundary classification

    Science.gov (United States)

    Liu, Tao; Im, Jungho; Quackenbush, Lindi J.

    2015-12-01

    This study provides a novel approach to individual tree crown delineation (ITCD) using airborne Light Detection and Ranging (LiDAR) data in dense natural forests using two main steps: crown boundary refinement based on a proposed Fishing Net Dragging (FiND) method, and segment merging based on boundary classification. FiND starts with approximate tree crown boundaries derived using a traditional watershed method with Gaussian filtering and refines these boundaries using an algorithm that mimics how a fisherman drags a fishing net. Random forest machine learning is then used to classify boundary segments into two classes: boundaries between trees and boundaries between branches that belong to a single tree. Three groups of LiDAR-derived features (two from the pseudo waveform generated along with crown boundaries and one from a canopy height model, CHM) were used in the classification. The proposed ITCD approach was tested using LiDAR data collected over a mountainous region in the Adirondack Park, NY, USA. Overall accuracy of boundary classification was 82.4%. Features derived from the CHM were generally more important in the classification than the features extracted from the pseudo waveform. A comprehensive accuracy assessment scheme for ITCD was also introduced by considering both area of crown overlap and crown centroids. Accuracy assessment using this new scheme shows the proposed ITCD achieved 74% and 78% as overall accuracy, respectively, for deciduous and mixed forest.

  12. Prediction of Infertility Treatment Outcomes Using Classification Trees

    Directory of Open Access Journals (Sweden)

    Milewska Anna Justyna

    2016-12-01

    Infertility is currently a common problem with causes that are often unexplained, which complicates treatment. In many cases, the use of ART methods provides the only possibility of getting pregnant. Analysis of this type of data is very complex. More and more often, data mining methods or artificial intelligence techniques are appropriate for solving such problems. In this study, classification trees were used for analysis. This identified a group of patients characterized as most likely to get pregnant when using in vitro fertilization.

  13. Iqpc 2015 Track: Tree Separation and Classification in Mobile Mapping LIDAR Data

    Science.gov (United States)

    Gorte, B.; Oude Elberink, S.; Sirmacek, B.; Wang, J.

    2015-08-01

    The European FP7 project IQmulus yearly organizes several processing contests, where submissions are requested for novel algorithms for point cloud and other big geodata processing. This paper describes the set-up and execution of a contest having the purpose to evaluate state-of-the-art algorithms for Mobile Mapping System point clouds, in order to detect and identify (individual) trees. By the nature of MMS these are trees in the vicinity of the road network (rather than in forests). Therefore, part of the challenge is distinguishing between trees and other objects, such as buildings, street furniture, cars etc. Three submitted segmentation and classification algorithms are thus evaluated.

  14. Prospective identification of adolescent suicide ideation using classification tree analysis: Models for community-based screening.

    Science.gov (United States)

    Hill, Ryan M; Oosterhoff, Benjamin; Kaplow, Julie B

    2017-07-01

    Although a large number of risk markers for suicide ideation have been identified, little guidance has been provided to prospectively identify adolescents at risk for suicide ideation within community settings. The current study addressed this gap in the literature by utilizing classification tree analysis (CTA) to provide a decision-making model for screening adolescents at risk for suicide ideation. Participants were N = 4,799 youth (mean age = 16.15 years, SD = 1.63) who completed both Waves 1 and 2 of the National Longitudinal Study of Adolescent to Adult Health. CTA was used to generate a series of decision rules for identifying adolescents at risk for reporting suicide ideation at Wave 2. Findings revealed 3 distinct solutions with varying sensitivity and specificity for identifying adolescents who reported suicide ideation. Sensitivity of the classification trees ranged from 44.6% to 77.6%. The tree with greatest specificity and lowest sensitivity was based on a history of suicide ideation. The tree with moderate sensitivity and high specificity was based on depressive symptoms, suicide attempts or suicide among family and friends, and social support. The most sensitive but least specific tree utilized these factors and gender, ethnicity, hours of sleep, school-related factors, and future orientation. These classification trees offer community organizations options for instituting large-scale screenings for suicide ideation risk depending on the available resources and modality of services to be provided. This study provides a theoretically and empirically driven model for prospectively identifying adolescents at risk for suicide ideation and has implications for preventive interventions among at-risk youth.
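
    The trade-off between sensitivity and specificity described in the abstract above can be illustrated with a short calculation. This is a sketch only: the counts for the two screening trees are hypothetical, chosen to show the pattern, and are not the study's data.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from the four confusion-matrix counts."""
    sensitivity = true_pos / (true_pos + false_neg)  # share of at-risk youth flagged
    specificity = true_neg / (true_neg + false_pos)  # share of not-at-risk youth cleared
    return sensitivity, specificity


# Hypothetical counts for two screening trees applied to the same 1,000 youth,
# 100 of whom later report ideation.
narrow_tree = sensitivity_specificity(true_pos=45, false_neg=55,
                                      true_neg=850, false_pos=50)
broad_tree = sensitivity_specificity(true_pos=78, false_neg=22,
                                     true_neg=600, false_pos=300)

for name, (sens, spec) in [("narrow", narrow_tree), ("broad", broad_tree)]:
    print(f"{name}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

    A tree that flags more adolescents catches more true cases but also refers more who are not at risk, which is the resource consideration the abstract raises.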

  15. Discriminative Hierarchical K-Means Tree for Large-Scale Image Classification.

    Science.gov (United States)

    Chen, Shizhi; Yang, Xiaodong; Tian, Yingli

    2015-09-01

    A key challenge in large-scale image classification is how to achieve efficiency in terms of both computation and memory without compromising classification accuracy. The learning-based classifiers achieve the state-of-the-art accuracies, but have been criticized for the computational complexity that grows linearly with the number of classes. The nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories, but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, i.e., discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of both learning-based and NN-based classifiers. The complexity of the D-HKTree only grows sublinearly with the number of categories, which is much better than the recent hierarchical support vector machines-based methods. The memory requirement is an order of magnitude less than that of the recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves the state-of-the-art accuracies, with significantly lower computation cost and memory requirement.

  16. Classification and Compression of Multi-Resolution Vectors: A Tree Structured Vector Quantizer Approach

    Science.gov (United States)

    2002-01-01

    their expression profile and for classification of cells into tumorous and non-tumorous classes. Then we will present a parallel tree method for... cancerous cells. We will use the same dataset and use tree structured classifiers with multi-resolution analysis for classifying cancerous from non-cancerous ...cells. We have the expressions of 4096 genes from 98 different cell types. Of these 98, 72 are cancerous while 26 are non-cancerous. We are interested

  17. Classification and Optimization of Decision Trees for Inconsistent Decision Tables Represented as MVD Tables

    KAUST Repository

    Azad, Mohammad

    2015-10-11

    Decision trees are a widely used technique for discovering patterns in consistent data sets. But if the data set is inconsistent, i.e. there are groups of examples (objects) with equal values of conditional attributes but different decisions (values of the decision attribute), then discovering the essential patterns or knowledge in the data set is challenging. We consider three approaches (generalized, most common and many-valued decision) to handle such inconsistency. We created different greedy algorithms using various types of impurity and uncertainty measures to construct decision trees. We compared the three approaches based on the decision tree properties of the depth, average depth and number of nodes. Based on the result of the comparison, we chose to work with the many-valued decision approach. To determine which greedy algorithms are efficient, we compared them based on the optimization and classification results. It was found that some greedy algorithms, Mult_ws_entSort and Mult_ws_entML, are good for both optimization and classification.
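
    The three ways of handling an inconsistent decision table named in the abstract above can be sketched on a toy table. This reflects one common reading of the terms (an assumption; the paper's exact definitions may differ): the many-valued approach attaches the set of decisions to each group of equal rows, the most common approach keeps the most frequent decision, and the generalized approach treats the whole set as a single new class label. The function name and tie-breaking rule are illustrative.

```python
from collections import Counter, defaultdict


def resolve_inconsistency(rows):
    """rows: list of (conditional_attributes_tuple, decision) pairs.

    Returns three dicts keyed by the attribute tuple:
    many_valued (set of decisions), most_common (single decision),
    generalized (sorted tuple used as one combined label).
    """
    groups = defaultdict(list)
    for attributes, decision in rows:
        groups[attributes].append(decision)

    many_valued, most_common, generalized = {}, {}, {}
    for attributes, decisions in groups.items():
        counts = Counter(decisions)
        top = max(counts.values())
        # Any decision in the set counts as a correct answer.
        many_valued[attributes] = set(decisions)
        # Ties are broken by the smallest label (an assumption).
        most_common[attributes] = min(d for d, c in counts.items() if c == top)
        # The whole set becomes one new class label.
        generalized[attributes] = tuple(sorted(set(decisions)))
    return many_valued, most_common, generalized


# Toy table: the first three rows share attribute values but disagree on the decision.
table = [((0, 1), 1), ((0, 1), 2), ((0, 1), 2), ((1, 1), 3)]
many_valued, most_common, generalized = resolve_inconsistency(table)
print(many_valued[(0, 1)], most_common[(0, 1)], generalized[(0, 1)])
```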

  18. Classification and Optimization of Decision Trees for Inconsistent Decision Tables Represented as MVD Tables

    KAUST Repository

    Azad, Mohammad; Moshkov, Mikhail

    2015-01-01

    Decision trees are a widely used technique for discovering patterns in consistent data sets. But if the data set is inconsistent, i.e. there are groups of examples (objects) with equal values of conditional attributes but different decisions (values of the decision attribute), then discovering the essential patterns or knowledge in the data set is challenging. We consider three approaches (generalized, most common and many-valued decision) to handle such inconsistency. We created different greedy algorithms using various types of impurity and uncertainty measures to construct decision trees. We compared the three approaches based on the decision tree properties of the depth, average depth and number of nodes. Based on the result of the comparison, we chose to work with the many-valued decision approach. To determine which greedy algorithms are efficient, we compared them based on the optimization and classification results. It was found that some greedy algorithms, Mult_ws_entSort and Mult_ws_entML, are good for both optimization and classification.

  19. Fusion of LiDAR and aerial imagery for the estimation of downed tree volume using Support Vector Machines classification and region based object fitting

    Science.gov (United States)

    Selvarajan, Sowmya

    The study classifies 3D small-footprint full-waveform digitized LiDAR fused with aerial imagery to identify downed trees using the Support Vector Machines (SVM) algorithm. Using small-footprint waveform LiDAR, airborne LiDAR systems can provide better canopy penetration and very high spatial resolution. The small-footprint waveform scanner system Riegl LMS-Q680, in addition to an UltraCamX aerial camera, is used to measure and map downed trees in a forest. The various data preprocessing steps helped to identify ground points in the dense LiDAR dataset and to segment the LiDAR data, reducing the complexity of the algorithm. The haze filtering process helped to differentiate the spectral signatures of the various classes within the aerial image. Such processes helped to better select the features from both sensor data. Six features are utilized: LiDAR height, LiDAR intensity, LiDAR echo, and three image intensities. To do so, LiDAR-derived, aerial-image-derived and fused LiDAR-aerial-image-derived features are used to organize the data for the SVM hypothesis formulation. Several variations of the SVM algorithm with different kernels and soft margin parameter C are tested. The algorithm is implemented to classify downed trees over a pine tree zone. The LiDAR derived features provided an overall accuracy of 98% of downed trees but with no classification error of 86%. The image derived features provided an overall accuracy of 65% and fusion derived features resulted in an overall accuracy of 88%. The results are observed to be stable and robust. The SVM accuracies were accompanied by high false alarm rates, with the LiDAR classification producing 58.45%, image classification producing 95.74% and finally the fused classification producing 93% false alarm rates. The Canny edge correction filter helped control the LiDAR false alarm to 35.99%, image false alarm to 48.56% and fused false alarm to 37.69%. The implemented classifiers provided a powerful tool for

  20. Classification of premalignant pancreatic cancer mass-spectrometry data using decision tree ensembles

    Directory of Open Access Journals (Sweden)

    Wong G William

    2008-06-01

    Background: Pancreatic cancer is the fourth leading cause of cancer death in the United States. Consequently, identification of clinically relevant biomarkers for the early detection of this cancer type is urgently needed. In recent years, proteomics profiling techniques combined with various data analysis methods have been successfully used to gain critical insights into processes and mechanisms underlying pathologic conditions, particularly as they relate to cancer. However, the high dimensionality of proteomics data combined with their relatively small sample sizes poses a significant challenge to current data mining methodology where many of the standard methods cannot be applied directly. Here, we propose a novel methodological framework using machine learning, in which decision tree based classifier ensembles coupled with feature selection methods are applied to proteomics data generated from premalignant pancreatic cancer. Results: This study explores the utility of three different feature selection schemas (Student t test, Wilcoxon rank sum test and genetic algorithm) to reduce the high dimensionality of a pancreatic cancer proteomic dataset. Using the top features selected from each method, we compared the prediction performances of a single decision tree algorithm, C4.5, with six different decision-tree based classifier ensembles (Random forest, Stacked generalization, Bagging, Adaboost, Logitboost and Multiboost). We show that ensemble classifiers always outperform a single decision tree classifier in having greater accuracies and smaller prediction errors when applied to a pancreatic cancer proteomics dataset. Conclusion: In our cross validation framework, classifier ensembles generally have better classification accuracies compared to that of a single decision tree when applied to a pancreatic cancer proteomic dataset, thus suggesting their utility in future proteomics data analysis. Additionally, the use of feature selection

  1. Bayesian and Classical Machine Learning Methods: A Comparison for Tree Species Classification with LiDAR Waveform Signatures

    Directory of Open Access Journals (Sweden)

    Tan Zhou

    2017-12-01

    A plethora of information contained in full-waveform (FW) Light Detection and Ranging (LiDAR) data offers prospects for characterizing vegetation structures. This study aims to investigate the capacity of FW LiDAR data alone for tree species identification through the integration of waveform metrics with machine learning methods and Bayesian inference. Specifically, we first conducted automatic tree segmentation based on the waveform-based canopy height model (CHM) using three approaches including TreeVaW, watershed algorithms and the combination of TreeVaW and watershed (TW) algorithms. Subsequently, the Random forests (RF) and Conditional inference forests (CF) models were employed to identify important tree-level waveform metrics derived from three distinct sources, such as raw waveforms, composite waveforms, the waveform-based point cloud and the combined variables from these three sources. Further, we discriminated tree (gray pine, blue oak, interior live oak) and shrub species through the RF, CF and Bayesian multinomial logistic regression (BMLR) using important waveform metrics identified in this study. Results of the tree segmentation demonstrated that the TW algorithms outperformed other algorithms for delineating individual tree crowns. The CF model overcomes waveform metrics selection bias caused by the RF model which favors correlated metrics and enhances the accuracy of subsequent classification. We also found that composite waveforms are more informative than raw waveforms and waveform-based point cloud for characterizing tree species in our study area. Both classical machine learning methods (the RF and CF) and the BMLR generated satisfactory average overall accuracy (74% for the RF, 77% for the CF and 81% for the BMLR) and the BMLR slightly outperformed the other two methods. However, these three methods suffered from low individual classification accuracy for the blue oak which is prone to being misclassified as the interior live oak due

  2. APPLICATION OF MULTIPLE LOGISTIC REGRESSION, BAYESIAN LOGISTIC AND CLASSIFICATION TREE TO IDENTIFY THE SIGNIFICANT FACTORS INFLUENCING CRASH SEVERITY

    Directory of Open Access Journals (Sweden)

    MILAD TAZIK

    2017-11-01

    Identifying cases in which road crashes result in fatality or injury of drivers may help improve their safety. In this study, datasets of crashes that occurred on the Tehran-Qom freeway, Iran, were examined by three models (multiple logistic regression, Bayesian logistic and classification tree) to analyse the contribution of several variables to fatal accidents. For multiple logistic regression and Bayesian logistic models, the odds ratio was calculated for each variable. The model which best suited the identification of accident severity was determined based on AIC and DIC criteria. Based on the results of these two models, rollover crashes (OR = 14.58, 95% CI: 6.8-28.6), not using a seat belt (OR = 5.79, 95% CI: 3.1-9.9), exceeding speed limits (OR = 4.02, 95% CI: 1.8-7.9) and being female (OR = 2.91, 95% CI: 1.1-6.1) were the most important factors in fatalities of drivers. In addition, the results of the classification tree model have verified the findings of the other models.
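
    The odds ratios and confidence intervals in the abstract above come from fitted logistic models. As a simpler illustration of what an odds ratio with a 95% CI expresses, the sketch below computes an unadjusted odds ratio and a Wald interval from a 2x2 table. The counts are hypothetical, and this is not the adjusted estimation used in the study.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed and fatal        b: exposed and non-fatal
    c: unexposed and fatal      d: unexposed and non-fatal
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: crashes with and without seat belt use.
or_value, ci_low, ci_high = odds_ratio_with_ci(a=20, b=80, c=10, d=190)
print(f"OR = {or_value:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

    An interval that excludes 1 indicates that the factor is associated with the outcome at the chosen confidence level.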

  3. Comparative analysis of tree classification models for detecting fusarium oxysporum f. sp cubense (TR4) based on multi soil sensor parameters

    Science.gov (United States)

    Estuar, Maria Regina Justina; Victorino, John Noel; Coronel, Andrei; Co, Jerelyn; Tiausas, Francis; Señires, Chiara Veronica

    2017-09-01

    Use of wireless sensor networks and smartphone integration design to monitor environmental parameters surrounding plantations is made possible because of readily available and affordable sensors. Providing low cost monitoring devices would be beneficial, especially to small farm owners, in a developing country like the Philippines, where agriculture covers a significant amount of the labor market. This study discusses the integration of wireless soil sensor devices and smartphones to create an application that will use multidimensional analysis to detect the presence or absence of plant disease. Specifically, soil sensors are designed to collect soil quality parameters in a sink node from which the smartphone collects data via Bluetooth. Given these, there is a need to develop a classification model on the mobile phone that will report infection status of a soil. Though tree classification is the most appropriate approach for continuous parameter-based datasets, there is a need to determine whether tree models will yield coherent results or not. Soil sensor data that resides on the phone is modeled using several variations of decision tree, namely: decision tree (DT), best-fit (BF) decision tree, functional tree (FT), Naive Bayes (NB) decision tree, J48, J48graft and LAD tree, where decision tree approaches the problem by considering all sensor nodes as one. Results show that there are significant differences among soil sensor parameters indicating that there are variances in scores between the infected and uninfected sites. Furthermore, analysis of variance in accuracy, recall, precision and F1 measure scores from tree classification models shows homogeneity among the NBTree, J48graft and J48 tree classification models.

  4. Classification tree for the assessment of sedentary lifestyle among hypertensive.

    Science.gov (United States)

    Castelo Guedes Martins, Larissa; Venícios de Oliveira Lopes, Marcos; Gomes Guedes, Nirla; Paixão de Menezes, Angélica; de Oliveira Farias, Odaleia; Alves Dos Santos, Naftale

    2016-04-01

    To develop a classification tree of clinical indicators for the correct prediction of the nursing diagnosis "Sedentary lifestyle" (SL) in people with high blood pressure (HTN). A cross-sectional study was conducted in an outpatient care center specializing in high blood pressure and diabetes mellitus located in northeastern Brazil. The sample consisted of 285 people between 19 and 59 years old diagnosed with high blood pressure, who underwent an interview and physical examination to obtain socio-demographic information, related factors, and signs and symptoms that made up the defining characteristics for the diagnosis under study. The tree was generated using the CHAID algorithm (Chi-square Automatic Interaction Detection). The construction of the decision tree allowed establishing the interactions between clinical indicators that facilitate a probabilistic analysis of multiple situations, making it possible to quantify the probability of an individual presenting a sedentary lifestyle. The tree included the clinical indicator "Choose daily routine without exercise" as the first node. People with this indicator showed a probability of 0.88 of presenting the SL. The second node was composed of the indicator "Does not perform physical activity during leisure", with 0.99 probability of presenting the SL with these two indicators. The predictive capacity of the tree was established at 69.5%. Decision trees help nurses who care for people with HTN in decision-making by assessing the characteristics that increase the probability of the SL nursing diagnosis, optimizing the time for diagnostic inference.

  5. Classification tree for the assessment of sedentary lifestyle among hypertensive

    Directory of Open Access Journals (Sweden)

    Larissa Castelo Guedes Martins

    Objective. To develop a classification tree of clinical indicators for the correct prediction of the nursing diagnosis "Sedentary lifestyle" (SL) in people with high blood pressure (HTN). Methods. A cross-sectional study was conducted in an outpatient care center specializing in high blood pressure and diabetes mellitus located in northeastern Brazil. The sample consisted of 285 people between 19 and 59 years old diagnosed with high blood pressure, who underwent an interview and physical examination to obtain socio-demographic information, related factors, and signs and symptoms that made up the defining characteristics for the diagnosis under study. The tree was generated using the CHAID algorithm (Chi-square Automatic Interaction Detection). Results. The construction of the decision tree allowed establishing the interactions between clinical indicators that facilitate a probabilistic analysis of multiple situations, making it possible to quantify the probability of an individual presenting a sedentary lifestyle. The tree included the clinical indicator "Choose daily routine without exercise" as the first node. People with this indicator showed a probability of 0.88 of presenting the SL. The second node was composed of the indicator "Does not perform physical activity during leisure", with 0.99 probability of presenting the SL with these two indicators. The predictive capacity of the tree was established at 69.5%. Conclusion. Decision trees help nurses who care for people with HTN in decision-making by assessing the characteristics that increase the probability of the SL nursing diagnosis, optimizing the time for diagnostic inference.

  6. A fuzzy decision tree method for fault classification in the steam generator of a pressurized water reactor

    International Nuclear Information System (INIS)

    Zio, Enrico; Baraldi, Piero; Popescu, Irina Crenguta

    2009-01-01

    This paper extends a method previously introduced by the authors for building a transparent fault classification algorithm by combining fuzzy clustering, fuzzy logic and decision tree techniques. The baseline method transforms an opaque, fuzzy clustering-based classification model into a fuzzy logic inference model based on linguistic rules which can be represented by a decision tree formalism. The classification model thereby obtained is transparent in that it allows direct interpretation and inspection of the model. An extension in the procedure for the development of the fuzzy logic inference model is introduced to allow the treatment of more complicated cases, e.g. split and overlapping clusters. The corresponding computational tool developed relies on a number of parameters which can be tuned by the user to optimally balance the level of transparency of the classification process against its efficiency. A numerical application is presented with regard to fault classification in the Steam Generator of a Pressurized Water Reactor.

  7. Multiple Additive Regression Trees a Methodology for Predictive Data Mining for Fraud Detection

    National Research Council Canada - National Science Library

    da

    2002-01-01

    ...) is using new and innovative techniques for fraud detection. Their primary techniques for fraud detection are the data mining tools of classification trees and neural networks as well as methods for pooling the results of multiple model fits...

  8. Tree Species Abundance Predictions in a Tropical Agricultural Landscape with a Supervised Classification Model and Imbalanced Data

    Directory of Open Access Journals (Sweden)

    Sarah J. Graves

    2016-02-01

    Mapping species through classification of imaging spectroscopy data is facilitating research to understand tree species distributions at increasingly greater spatial scales. Classification requires a dataset of field observations matched to the image, which will often reflect natural species distributions, resulting in an imbalanced dataset with many samples for common species and few samples for less common species. Despite the high prevalence of imbalanced datasets in multiclass species predictions, the effect on species prediction accuracy and landscape species abundance has not yet been quantified. First, we trained and assessed the accuracy of a support vector machine (SVM) model with a highly imbalanced dataset of 20 tropical species and one mixed-species class of 24 species identified in a hyperspectral image mosaic (350–2500 nm) of Panamanian farmland and secondary forest fragments. The model, with an overall accuracy of 62% ± 2.3% and F-score of 59% ± 2.7%, was applied to the full image mosaic (23,000 ha) at a 2-m resolution to produce a species prediction map, which suggested that this tropical agricultural landscape is more diverse than what has been presented in field-based studies. Second, we quantified the effect of class imbalance on model accuracy. Model assessment showed a trend where species with more samples were consistently over predicted while species with fewer samples were under predicted. Standardizing sample size reduced model accuracy, but also reduced the level of species over- and under-prediction. This study advances operational species mapping of diverse tropical landscapes by detailing the effect of imbalanced data on classification accuracy and providing estimates of tree species abundance in an agricultural landscape. Species maps using data and methods presented here can be used in landscape analyses of species distributions to understand human or environmental effects, in addition to focusing conservation
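
    The abstract above reports that standardizing sample size reduced over- and under-prediction but does not say how the standardization was done. One simple possibility, shown here purely as an assumption, is random downsampling of every class to the size of the smallest class. The function name and toy labels are illustrative.

```python
import random
from collections import Counter, defaultdict


def standardize_sample_size(samples, labels, seed=0):
    """Randomly downsample every class to the size of the smallest class.

    Returns (balanced_samples, balanced_labels). The fixed seed keeps the
    draw reproducible.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = min(len(items) for items in by_class.values())
    balanced_samples, balanced_labels = [], []
    for label, items in by_class.items():
        # Draw `target` samples without replacement from this class.
        for sample in rng.sample(items, target):
            balanced_samples.append(sample)
            balanced_labels.append(label)
    return balanced_samples, balanced_labels


# Toy data: 50 samples of a common species and 5 of a rare one.
toy_labels = ["common"] * 50 + ["rare"] * 5
toy_samples = list(range(55))
balanced_x, balanced_y = standardize_sample_size(toy_samples, toy_labels)
print(Counter(balanced_y))
```

    Downsampling discards data from common classes, which is consistent with the drop in overall accuracy the abstract describes.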

  9. Classification tree for the assessment of sedentary lifestyle among hypertensive

    OpenAIRE

    Castelo Guedes Martins, Larissa; Venícios de Oliveira Lopes, Marcos; Gomes Guedes, Nirla; Paixão de Menezes, Angélica; de Oliveira Farias, Odaleia; Alves dos Santos, Naftale

    2016-01-01

    Objective. To develop a classification tree of clinical indicators for the correct prediction of the nursing diagnosis "Sedentary lifestyle" (SL) in people with high blood pressure (HTN). Methods. A cross-sectional study was conducted in an outpatient care center specializing in high blood pressure and diabetes mellitus located in northeastern Brazil. The sample consisted of 285 people between 19 and 59 years old diagnosed with high blood pressure and was applied an interview and physical examinat...

  10. Temporal expansion of annual crop classification layers for the CONUS using the C5 decision tree classifier

    Science.gov (United States)

    Friesz, Aaron M.; Wylie, Bruce K.; Howard, Daniel M.

    2017-01-01

    Crop cover maps have become widely used in a range of research applications. Multiple crop cover maps have been developed to suit particular research interests. The National Agricultural Statistics Service (NASS) Cropland Data Layers (CDL) are a series of commonly used crop cover maps for the conterminous United States (CONUS) that span from 2008 to 2013. In this investigation, we sought to contribute to the availability of consistent CONUS crop cover maps by extending temporal coverage of the NASS CDL archive back eight additional years to 2000 by creating annual NASS CDL-like crop cover maps derived from a classification tree model algorithm. We used over 11 million records to train a classification tree algorithm and develop a crop classification model (CCM). The model was used to create crop cover maps for the CONUS for years 2000–2013 at 250 m spatial resolution. The CCM and the maps for years 2008–2013 were assessed for accuracy relative to resampled NASS CDLs. The CCM performed well against a withheld test data set with a model prediction accuracy of over 90%. The assessment of the crop cover maps indicated that the model performed well spatially, placing crop cover pixels within their known domains; however, the model did show a bias towards the ‘Other’ crop cover class, which caused frequent misclassifications of pixels around the periphery of large crop cover patch clusters and of pixels that form small, sparsely dispersed crop cover patches.

  11. Multi-test decision tree and its application to microarray data classification.

    Science.gov (United States)

    Czajkowski, Marcin; Grześ, Marek; Kretowski, Marek

    2014-05-01

    A desirable property of tools used to investigate biological data is easy-to-understand models and predictive decisions. Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have a tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average of 6%. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts. Copyright © 2014 Elsevier B.V. All rights reserved.

  12. Multivariate decision tree designing for the classification of multi-jet topologies in e⁺e⁻ collisions

    CERN Document Server

    Mjahed, M

    2002-01-01

    The binary decision tree method is used to distinguish between several multi-jet topologies in e⁺e⁻ collisions. Instead of the univariate process usually taken, a new design procedure for constructing multivariate decision trees is proposed. The segmentation is obtained by considering some feature functions, where linear and non-linear discriminant functions and a minimal distance method are used. The classification focuses on ALEPH simulated events, with multi-jet topologies. Compared to a standard univariate tree, the multivariate decision trees offer significantly better performance.

  13. Tree Species Classification in Temperate Forests Using Formosat-2 Satellite Image Time Series

    Directory of Open Access Journals (Sweden)

    David Sheeren

    2016-09-01

    Full Text Available Mapping forest composition is a major concern for forest management, biodiversity assessment and for understanding the potential impacts of climate change on tree species distribution. In this study, the suitability of a dense high spatial resolution multispectral Formosat-2 satellite image time-series (SITS) to discriminate tree species in temperate forests is investigated. Based on a 17-date SITS acquired across one year, thirteen major tree species (8 broadleaves and 5 conifers) are classified in a study area of southwest France. The performance of parametric (GMM) and nonparametric (k-NN, RF, SVM) methods is compared at three class hierarchy levels for different versions of the SITS: (i) a smoothed noise-free version based on the Whittaker smoother; (ii) a non-smoothed cloudy version including all the dates; (iii) a non-smoothed noise-free version including only 14 dates. Noise refers to pixels contaminated by clouds and cloud shadows. The results of the 108 distinct classifications show a very high suitability of the SITS to identify the forest tree species based on phenological differences (average κ = 0.93, estimated by cross-validation based on 1235 field-collected plots). SVM is found to be the best classifier with very close results from the other classifiers. No clear benefit of removing noise by smoothing can be observed. Classification accuracy is even improved using the non-smoothed cloudy version of the SITS compared to the 14 cloud-free image time series. However, conclusions of the results need to be considered with caution because of possible overfitting. Disagreements also appear between the maps produced by the classifiers for complex mixed forests, suggesting a higher classification uncertainty in these contexts. Our findings suggest that time-series data can be a good alternative to hyperspectral data for mapping forest types. It also demonstrates the potential contribution of the recently launched Sentinel-2 satellite for

  14. Modeling Ecosystem Services for Park Trees: Sensitivity of i-Tree Eco Simulations to Light Exposure and Tree Species Classification

    Directory of Open Access Journals (Sweden)

    Rocco Pace

    2018-02-01

    Full Text Available Ecosystem modeling can help decision making regarding planting of urban trees for climate change mitigation and air pollution reduction. Algorithms and models that link the properties of plant functional types, species groups, or single species to their impact on specific ecosystem services have been developed. However, these models require a considerable effort for initialization that is inherently related to uncertainties originating from the high diversity of plant species in urban areas. We therefore suggest a new automated method to be used with the i-Tree Eco model to derive light competition for individual trees and investigate the importance of this property. Since competition depends also on the species, which is difficult to determine from increasingly used remote sensing methodologies, we also investigate the impact of uncertain tree species classification on the ecosystem services by comparing a species-specific inventory determined by field observation with a genus-specific categorization and a model initialization for the dominant deciduous and evergreen species only. Our results show how the simulation of competition affects the determination of carbon sequestration, leaf area, and related ecosystem services and that the proposed method provides a tool for improving estimations. Misclassifications of tree species can lead to large deviations in estimates of ecosystem impacts, particularly concerning biogenic volatile compound emissions. In our test case, monoterpene emissions almost doubled and isoprene emissions decreased to less than 10% when species were assigned to only one of two groups instead of being determined by species or genus. We discuss how this uncertainty in emission estimates propagates into further uncertainty in the estimation of potential ozone formation. Overall, we show the importance of using an individual light competition approach and explicitly parameterizing all ecosystem functions at the

  15. Analysis of powered two-wheeler crashes in Italy by classification trees and rules discovery.

    Science.gov (United States)

    Montella, Alfonso; Aria, Massimo; D'Ambrosio, Antonio; Mauriello, Filomena

    2012-11-01

    The aim of the study was to analyze powered two-wheeler (PTW) crashes in Italy in order to detect interdependence as well as dissimilarities among crash characteristics and provide insights for the development of safety improvement strategies focused on PTWs. To this end, data mining techniques were used to analyze the data on the 254,575 crashes involving PTWs that occurred in Italy in the period 2006-2008. Classification tree analysis and rules discovery were performed. Tree-based methods are non-linear and non-parametric data mining tools for supervised classification and regression problems. They do not require a priori probabilistic knowledge about the phenomena under study and consider conditional interactions among input data. Rules discovery is the identification of sets of items (i.e., crash patterns) that occur together in a given event (i.e., a crash in our study) more often than they would if they were independent of each other. Thus, the method can detect interdependence among crash characteristics. Due to the large number of patterns considered, both methods suffer from an extreme risk of finding patterns that appear due to chance alone. To overcome this problem, in our study we randomly split the sample data in two data sets and used well-established statistical practices to evaluate the statistical significance of the results. Both the classification trees and the rules discovery were effective in providing meaningful insights about PTW crash characteristics and their interdependencies. Even though in several cases different crash characteristics were highlighted, the results of the two analysis methods were never contradictory. Furthermore, most of the findings of this study were consistent with the results of previous studies which used different analytical techniques, such as probabilistic models of crash injury severity. Based on the analysis results, engineering countermeasures and policy initiatives to reduce PTW injuries and
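    The record above describes rules discovery as finding items that occur together more often than they would if independent. A standard way to quantify that idea is the lift of an item pair: the observed joint frequency divided by the product of the marginal frequencies. The abstract does not name the measure the authors used, so lift is an assumption here, and the function name and toy crash records below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_lift(records):
    """Return {(item_a, item_b): lift} for every co-occurring item pair.

    lift = P(a and b) / (P(a) * P(b)); a value above 1 means the pair
    occurs together more often than independence would predict.
    """
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = sorted(set(record))  # sorted so pair keys are canonical
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    lifts = {}
    for (a, b), joint in pair_counts.items():
        expected = (item_counts[a] / n) * (item_counts[b] / n)
        lifts[(a, b)] = (joint / n) / expected
    return lifts


# Hypothetical toy crash records, each a set of crash characteristics.
crashes = [
    {"night", "curve"},
    {"night", "curve"},
    {"day", "intersection"},
    {"day", "curve"},
]
lifts = pair_lift(crashes)
```

    With many candidate patterns, some high-lift pairs will arise by chance, which is why the study evaluates significance on split samples; this sketch does not include that step.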

  16. OmniGA: Optimized Omnivariate Decision Trees for Generalizable Classification Models

    KAUST Repository

    Magana-Mora, Arturo

    2017-06-14

    Classification problems from different domains vary in complexity, size, and imbalance of the number of samples from different classes. Although several classification models have been proposed, selecting the right model and parameters for a given classification task to achieve good performance is not trivial. Therefore, there is a constant interest in developing novel robust and efficient models suitable for a great variety of data. Here, we propose OmniGA, a framework for the optimization of omnivariate decision trees based on a parallel genetic algorithm, coupled with deep learning structure and ensemble learning methods. The performance of the OmniGA framework is evaluated on 12 different datasets taken mainly from biomedical problems and compared with the results obtained by several robust and commonly used machine-learning models with optimized parameters. The results show that OmniGA systematically outperformed these models for all the considered datasets, reducing the F score error in the range from 100% to 2.25%, compared to the best performing model. This demonstrates that OmniGA produces robust models with improved performance. OmniGA code and datasets are available at www.cbrc.kaust.edu.sa/omniga/.

  17. OmniGA: Optimized Omnivariate Decision Trees for Generalizable Classification Models

    KAUST Repository

    Magana-Mora, Arturo; Bajic, Vladimir B.

    2017-01-01

    Classification problems from different domains vary in complexity, size, and imbalance of the number of samples from different classes. Although several classification models have been proposed, selecting the right model and parameters for a given classification task to achieve good performance is not trivial. Therefore, there is a constant interest in developing novel robust and efficient models suitable for a great variety of data. Here, we propose OmniGA, a framework for the optimization of omnivariate decision trees based on a parallel genetic algorithm, coupled with deep learning structure and ensemble learning methods. The performance of the OmniGA framework is evaluated on 12 different datasets taken mainly from biomedical problems and compared with the results obtained by several robust and commonly used machine-learning models with optimized parameters. The results show that OmniGA systematically outperformed these models for all the considered datasets, reducing the F score error in the range from 100% to 2.25%, compared to the best performing model. This demonstrates that OmniGA produces robust models with improved performance. OmniGA code and datasets are available at www.cbrc.kaust.edu.sa/omniga/.

  18. Automated method for identification and artery-venous classification of vessel trees in retinal vessel networks.

    Science.gov (United States)

    Joshi, Vinayak S; Reinhardt, Joseph M; Garvin, Mona K; Abramoff, Michael D

    2014-01-01

    The separation of the retinal vessel network into distinct arterial and venous vessel trees is of high interest. We propose an automated method for identification and separation of retinal vessel trees in a retinal color image by converting a vessel segmentation image into a vessel segment map and identifying the individual vessel trees by graph search. Orientation, width, and intensity of each vessel segment are utilized to find the optimal graph of vessel segments. The separated vessel trees are labeled as primary vessel or branches. We utilize the separated vessel trees for arterial-venous (AV) classification, based on the color properties of the vessels in each tree graph. We applied our approach to a dataset of 50 fundus images from 50 subjects. The proposed method resulted in an accuracy of 91.44% correctly classified vessel pixels as either artery or vein. The accuracy of correctly classified major vessel segments was 96.42%.

  19. TREE SPECIES CLASSIFICATION OF BROADLEAVED FORESTS IN NAGANO, CENTRAL JAPAN, USING AIRBORNE LASER DATA AND MULTISPECTRAL IMAGES

    Directory of Open Access Journals (Sweden)

    S. Deng

    2017-10-01

    Full Text Available This study attempted to classify three coniferous and ten broadleaved tree species by combining airborne laser scanning (ALS) data and multispectral images. The study area, located in Nagano, central Japan, is within the broadleaved forests of the Afan Woodland area. A total of 235 trees were surveyed in 2016, and we recorded the species, DBH, and tree height. The geographical position of each tree was collected using a Global Navigation Satellite System (GNSS) device. Tree crowns were manually detected using GNSS position data, field photographs, true-color orthoimages with three bands (red-green-blue, RGB), 3D point clouds, and a canopy height model derived from ALS data. Then a total of 69 features, including 27 image-based and 42 point-based features, were extracted from the RGB images and the ALS data to classify tree species. Finally, the detected tree crowns were classified into two classes for the first level (coniferous and broadleaved trees), four classes for the second level (Pinus densiflora, Larix kaempferi, Cryptomeria japonica, and broadleaved trees), and 13 classes for the third level (three coniferous and ten broadleaved species), using the 27 image-based features, 42 point-based features, all 69 features, and the best combination of features identified using a neighborhood component analysis algorithm, respectively. The overall classification accuracies reached 90% at the first and second levels but less than 60% at the third level. The classifications using the best combinations of features had higher accuracies than those using the image-based and point-based features and the combination of all of the 69 features.

  20. Building classification trees to explain the radioactive contamination levels of the plants; Construction d'arbres de discrimination pour expliquer les niveaux de contamination radioactive des vegetaux

    Energy Technology Data Exchange (ETDEWEB)

    Briand, B

    2008-04-15

    The objective of this thesis is the development of a method allowing the identification of factors leading to various radioactive contamination levels of the plants. The methodology suggested is based on the use of a radioecological transfer model of the radionuclides through the environment (A.S.T.R.A.L. computer code) and a classification-tree method. In particular, to avoid the instability problems of classification trees and to preserve the tree structure, a node-level stabilizing technique is used. Empirical comparisons are carried out between classification trees built by this method (called R.E.N. method) and those obtained by the C.A.R.T. method. A similarity measure is defined to compare the structure of two classification trees. This measure is used to study the stabilizing performance of the R.E.N. method. The methodology suggested is applied to a simplified contamination scenario. From the results obtained, we can identify the main variables responsible for the various radioactive contamination levels of four leafy vegetables (lettuce, cabbage, spinach and leek). Some rules extracted from these classification trees may be usable in a post-accidental context. (author)

  1. 1-Skeletons of the Spanning Tree Problems with Additional Constraints

    Directory of Open Access Journals (Sweden)

    V. A. Bondarenko

    2015-01-01

    Full Text Available In this paper, we study polyhedral properties of two spanning tree problems with additional constraints. In the first problem, it is required to find a tree with a minimum sum of edge weights among all spanning trees with the number of leaves less than or equal to a given value. In the second problem, an additional constraint is the assumption that the degree of all nodes of the spanning tree does not exceed a given value. The recognition versions of both problems are NP-complete. We consider polytopes of these problems and their 1-skeletons. We prove that in both cases it is an NP-complete problem to determine whether the vertices of the 1-skeleton are adjacent. However, it is possible to obtain superpolynomial lower bounds on the clique numbers of these graphs. These values characterize the time complexity in a broad class of algorithms based on linear comparisons. The results indicate a fundamental difference between combinatorial and geometric properties of the considered problems from the classical minimum spanning tree problem.

  2. Unified framework for triaxial accelerometer-based fall event detection and classification using cumulants and hierarchical decision tree classifier.

    Science.gov (United States)

    Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram

    2015-08-01

    In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.

  3. Using spectrotemporal indices to improve the fruit-tree crop classification accuracy

    Science.gov (United States)

    Peña, M. A.; Liao, R.; Brenning, A.

    2017-06-01

    This study assesses the potential of spectrotemporal indices derived from satellite image time series (SITS) to improve the classification accuracy of fruit-tree crops. Six major fruit-tree crop types in the Aconcagua Valley, Chile, were classified by applying various linear discriminant analysis (LDA) techniques on a Landsat-8 time series of nine images corresponding to the 2014-15 growing season. As features we not only used the complete spectral resolution of the SITS, but also all possible normalized difference indices (NDIs) that can be constructed from any two bands of the time series, a novel approach to derive features from SITS. Due to the high dimensionality of this "enhanced" feature set we used the lasso and ridge penalized variants of LDA (PLDA). Although classification accuracies yielded by the standard LDA applied on the full-band SITS were good (misclassification error rate, MER = 0.13), they were further improved by 23% (MER = 0.10) with ridge PLDA using the enhanced feature set. The most important bands to discriminate the crops of interest were mainly concentrated on the first two image dates of the time series, corresponding to the crops' greenup stage. Despite the high predictor weights provided by the red and near infrared bands, typically used to construct greenness spectral indices, other spectral regions were also found important for the discrimination, such as the shortwave infrared band at 2.11-2.19 μm, sensitive to foliar water changes. These findings support the usefulness of spectrotemporal indices in the context of SITS-based crop type classifications, which until now have been mainly constructed by the arithmetic combination of two bands of the same image date in order to derive greenness temporal profiles like those from the normalized difference vegetation index.
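    The record above builds features from all possible normalized difference indices between any two bands of the time series, and reports a 23% improvement in misclassification error rate. The sketch below illustrates both pieces of arithmetic. The index form (a - b) / (a + b) is the usual normalized difference convention and is assumed here, as are the function names and the toy reflectance values.

```python
from itertools import combinations


def normalized_difference(a, b):
    """NDI = (a - b) / (a + b); returns 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def all_ndis(reflectances):
    """Build one NDI per unordered pair of values in a pixel's series.

    `reflectances` is a flat list holding every band of every image date,
    so pairs may combine bands from different dates.
    """
    return {
        (i, j): normalized_difference(reflectances[i], reflectances[j])
        for i, j in combinations(range(len(reflectances)), 2)
    }


# Hypothetical pixel with four band/date values.
pixel = [0.05, 0.08, 0.40, 0.30]
ndis = all_ndis(pixel)

# Relative improvement quoted in the abstract: MER 0.13 -> 0.10.
relative_mer_gain = (0.13 - 0.10) / 0.13
```

    The number of indices grows quadratically with the number of band/date values, which is the dimensionality problem the abstract addresses with lasso and ridge penalized LDA.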

  4. Prediction of survival to discharge following cardiopulmonary resuscitation using classification and regression trees.

    Science.gov (United States)

    Ebell, Mark H; Afonso, Anna M; Geocadin, Romergryko G

    2013-12-01

    To predict the likelihood that an inpatient who experiences cardiopulmonary arrest and undergoes cardiopulmonary resuscitation survives to discharge with good neurologic function or with mild deficits (Cerebral Performance Category score = 1). Classification and Regression Trees were used to develop branching algorithms that optimize the ability of a series of tests to correctly classify patients into two or more groups. Data from 2007 to 2008 (n = 38,092) were used to develop candidate Classification and Regression Trees models to predict the outcome of inpatient cardiopulmonary resuscitation episodes and data from 2009 (n = 14,435) to evaluate the accuracy of the models and judge the degree of over fitting. Both supervised and unsupervised approaches to model development were used. 366 hospitals participating in the Get With the Guidelines-Resuscitation registry. Adult inpatients experiencing an index episode of cardiopulmonary arrest and undergoing cardiopulmonary resuscitation in the hospital. The five candidate models had between 8 and 21 nodes and an area under the receiver operating characteristic curve from 0.718 to 0.766 in the derivation group and from 0.683 to 0.746 in the validation group. One of the supervised models had 14 nodes and classified 27.9% of patients as very unlikely to survive neurologically intact or with mild deficits (Tree models that predict survival to discharge with good neurologic function or with mild deficits following in-hospital cardiopulmonary arrest. Models like this can assist physicians and patients who are considering do-not-resuscitate orders.

  5. A classification model of Hyperion image base on SAM combined decision tree

    Science.gov (United States)

    Wang, Zhenghai; Hu, Guangdao; Zhou, YongZhang; Liu, Xin

    2009-10-01

    Monitoring the Earth using imaging spectrometers has necessitated more accurate analyses and new applications to remote sensing. A very high dimensional input space requires an exponentially large amount of data to adequately and reliably represent the classes in that space. On the other hand, with increase in the input dimensionality the hypothesis space grows exponentially, which makes the classification performance highly unreliable. Classification of hyperspectral images is challenging for traditional classification algorithms. New algorithms have to be developed for hyperspectral data classification. The Spectral Angle Mapper (SAM) is a physically-based spectral classification that uses an n-dimensional angle to match pixels to reference spectra. The algorithm determines the spectral similarity between two spectra by calculating the angle between the spectra, treating them as vectors in a space with dimensionality equal to the number of bands. The key difficulty is that the SAM threshold must be defined manually, and the classification precision depends on how reasonable that threshold is. To resolve this problem, this paper proposes a new automatic classification model for remote sensing images using SAM combined with a decision tree. It can automatically choose the appropriate SAM threshold and improve the classification precision of SAM based on the analysis of field spectra. The test area, located in Heqing, Yunnan, was imaged by the EO-1 Hyperion imaging spectrometer using 224 bands in the visible and near infrared. The area included limestone areas, rock fields, soil and forests. The area was classified into four different vegetation and soil types. The results show that this method chooses the appropriate SAM threshold and effectively eliminates the disturbance and influence of unwanted objects, thereby improving the classification precision. Compared with the likelihood classification by field survey data, the classification precision of this model
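    The record above describes the Spectral Angle Mapper as computing the angle between a pixel spectrum and a reference spectrum, both treated as vectors, and accepting a match only under a threshold. A minimal sketch of that calculation follows. The fixed threshold passed to `classify` is a hand-set placeholder; the paper's decision-tree procedure for choosing it is not reproduced, and the function names and toy spectra are hypothetical.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_p == 0 or norm_r == 0:
        raise ValueError("spectra must be non-zero vectors")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_theta)


def classify(pixel, references, threshold):
    """Return the label of the closest reference spectrum,
    or None if even the smallest angle exceeds the threshold."""
    best_label, best_angle = None, float("inf")
    for label, reference in references.items():
        angle = spectral_angle(pixel, reference)
        if angle < best_angle:
            best_label, best_angle = label, angle
    return best_label if best_angle <= threshold else None


# Hypothetical two-band reference spectra and a hand-set threshold.
references = {"soil": [1.0, 0.0], "forest": [0.0, 1.0]}
label_near = classify([0.9, 0.1], references, threshold=0.2)
label_far = classify([1.0, 1.0], references, threshold=0.2)
```

    Because the angle ignores vector length, a spectrum and a scaled copy of it have an angle of zero, which makes the measure insensitive to overall brightness differences.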

  6. Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques

    Directory of Open Access Journals (Sweden)

    Muhammad Bilal

    2016-07-01

    Full Text Available Sentiment mining is a field of text mining to determine the attitude of people about a particular product, topic, politician in newsgroup posts, review sites, comments on Facebook posts, Twitter, etc. There are many issues involved in opinion mining. One important issue is that opinions could be in different languages (English, Urdu, Arabic, etc.). To tackle each language according to its orientation is a challenging task. Most of the research work in sentiment mining has been done in the English language. Currently, limited research is being carried out on sentiment classification of other languages like Arabic, Italian, Urdu and Hindi. In this paper, three classification models are used for text classification using Waikato Environment for Knowledge Analysis (WEKA). Opinions written in Roman-Urdu and English are extracted from a blog. These extracted opinions are documented in text files to prepare a training dataset containing 150 positive and 150 negative opinions, as labeled examples. A testing data set is supplied to three different models and the results in each case are analyzed. The results show that Naïve Bayesian outperformed Decision Tree and KNN in terms of accuracy, precision, recall and F-measure.

  7. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    Science.gov (United States)

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  8. ARABIC TEXT CLASSIFICATION USING NEW STEMMER FOR FEATURE SELECTION AND DECISION TREES

    Directory of Open Access Journals (Sweden)

    SAID BAHASSINE

    2017-06-01

    Full Text Available Text classification is the process of assignment of unclassified text to appropriate classes based on their content. The most prevalent representation for text classification is the bag of words vector. In this representation, the words that appear in documents often have multiple morphological structures and grammatical forms. In most cases, these morphological variants of a word belong to the same category. In the first part of this paper, a new stemming algorithm was developed in which each term of a given document is represented by its root. In the second part, a comparative study is conducted of the impact of two stemming algorithms namely Khoja’s stemmer and our new stemmer (referred to hereafter by origin-stemmer) on Arabic text classification. This investigation was carried out using chi-square as a feature selection method to reduce the dimensionality of the feature space and decision tree classifier. In order to evaluate the performance of the classifier, this study used a corpus that consists of 5070 documents independently classified into six categories: sport, entertainment, business, Middle East, switch and world on WEKA toolkit. The recall, f-measure and precision measures are used to compare the performance of the obtained models. The experimental results show that text classification using rout stemmer outperforms classification using Khoja’s stemmer. The f-measure was 92.9% in sport category and 89.1% in business category.

  9. Classification and Progression Based on CFS-GA and C5.0 Boost Decision Tree of TCM Zheng in Chronic Hepatitis B.

    Science.gov (United States)

    Chen, Xiao Yu; Ma, Li Zhuang; Chu, Na; Zhou, Min; Hu, Yiyang

    2013-01-01

    Chronic hepatitis B (CHB) is a serious public health problem, and Traditional Chinese Medicine (TCM) plays an important role in the control and treatment for CHB. In the treatment of TCM, zheng discrimination is the most important step. In this paper, an approach based on CFS-GA (Correlation based Feature Selection and Genetic Algorithm) and C5.0 boost decision tree is used for zheng classification and progression in the TCM treatment of CHB. The CFS-GA performs better than the typical method of CFS. By CFS-GA, the acquired attribute subset is classified by C5.0 boost decision tree for TCM zheng classification of CHB, and C5.0 decision tree outperforms two typical decision trees of NBTree and REPTree on CFS-GA, CFS, and nonselection in comparison. Based on the critical indicators from C5.0 decision tree, important lab indicators in zheng progression are obtained by the method of stepwise discriminant analysis for expressing TCM zhengs in CHB, and alterations of the important indicators are also analyzed in zheng progression. In conclusion, all the three decision trees perform better on CFS-GA than on CFS and nonselection, and C5.0 decision tree outperforms the two typical decision trees both on attribute selection and nonselection.

  10. Examination as to the classification of representative tree species at Satoyama coppice forest using multiwavelength range data observed from aircraft

    International Nuclear Information System (INIS)

    Setojima, M.; Imai, Y.; Funahashi, M.; Kawai, M.; Katsuki, T.

    2006-01-01

    In this study, we examined the possibility of classifying representative tree species at Satoyama coppice forest based on spectral reflectance of the tree species. We used the airborne hyperspectral data observed in exhibition leaf stage at the test forest (about 3.4 ha) in Tama Forest Science Garden (Hachioji, Tokyo), where the forest type similar to that of Satoyama is preserved. The classification accuracy was verified by comparing the results of interpretation of color aerial photographs taken in spring and autumn in chronological order and the field survey. As a result, the 534-556 nm (band 6 and band 7) in the visible range and 739-762 nm (band 15 and band 16), 785 nm (band 17) in the near infrared range are effective bands for classification of the species of such trees as Castanopsis sieboldii, Quercus glauca, Zelkova serrata, Quercus serrata, Cryptomeria japonica, and Chamaecyparis obtusa, which are representative trees in Satoyama coppice forest in Tama district.

  11. The Hybrid of Classification Tree and Extreme Learning Machine for Permeability Prediction in Oil Reservoir

    KAUST Repository

    Prasetyo Utomo, Chandra

    2011-06-01

    Permeability is an important parameter of an oil reservoir. Predicting the permeability could save millions of dollars. Unfortunately, petroleum engineers have faced numerous challenges arriving at cost-efficient predictions. Much work has been carried out to solve this problem. The main challenge is to handle the high range of permeability in each reservoir. For about a hundred years, mathematicians and engineers have tried to deliver the best prediction models. However, none of them has produced satisfying results. In the last two decades, artificial intelligence models have been used. The current best prediction model in permeability prediction is the extreme learning machine (ELM). It produces fairly good results, but a clear explanation of the model is hard to come by because it is so complex. The aim of this research is to propose a way out of this complexity through the design of a hybrid intelligent model. In this proposal, the system combines classification and regression models to predict the permeability value. These are based on the well log data. In order to handle the high range of the permeability value, a classification tree is utilized. A benefit of this innovation is that the tree represents knowledge in a clear and succinct fashion and thereby avoids the complexity of all previous models. Finally, it is important to note that the ELM is used as the final predictor. Results demonstrate that the proposed hybrid model performs better than support vector machines (SVM) and ELM in terms of correlation coefficient. Moreover, the classification tree model potentially leads to better communication among petroleum engineers concerning this important process and has wider implications for oil reservoir management efficiency.

  12. Prognostic classification index in Iranian colorectal cancer patients: Survival tree analysis

    Directory of Open Access Journals (Sweden)

    Amal Saki Malehi

    2016-01-01

    Full Text Available Aims: The aim of this study was to determine the prognostic index for separating homogeneous subgroups in colorectal cancer (CRC) patients based on clinicopathological characteristics using survival tree analysis. Methods: The current study was conducted at the Research Center of Gastroenterology and Liver Disease, Shahid Beheshti Medical University in Tehran, between January 2004 and January 2009. A total of 739 patients who had already been diagnosed with CRC based on a pathologic report were enrolled. The data included demographic and clinical-pathological characteristics of patients. Tree-structured survival analysis based on a recursive partitioning algorithm was implemented to evaluate prognostic factors. The probability curves were calculated according to the Kaplan-Meier method, and the hazard ratio was estimated as the effect size of interest. Result: There were 526 males (71.2%) among these patients. The mean survival time (from diagnosis) was 42.46 (± 3.4). The survival tree identified three variables as main prognostic factors, and based on them four prognostic subgroups were constructed. The log-rank test showed good separation of survival curves. Patients with Stage I-IIIA who were treated with surgery as the first treatment showed low risk (median = 34 months), whereas patients with stage IIIB or IV who were older than 68 years had the worst survival outcome (median = 9.5 months). Conclusion: Constructing the prognostic classification index via a survival tree can help researchers assess interactions between clinical variables and determine the cumulative effect of these variables on survival outcome.
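
    The Kaplan-Meier method named in this abstract estimates a survival curve as a running product over observed event times. A minimal sketch follows, assuming follow-up times and event indicators are available as plain lists; the function name `kaplan_meier` and the example values are hypothetical and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if death was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data in months: two patients are censored.
example_curve = kaplan_meier([2, 4, 4, 6, 9], [1, 1, 0, 1, 0])
```

    In a survival tree, one such curve would be computed per terminal subgroup, and the curves would then be compared, for example with a log-rank test.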

  13. Exploiting machine learning algorithms for tree species classification in a semiarid woodland using RapidEye image

    CSIR Research Space (South Africa)

    Adelabu, S

    2013-11-01

    Full Text Available in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for the classification of five tree species in mopane woodland of Botswana using machine learning algorithms with limited training samples. We performed...

  14. Classification of soil respiration in areas of sugarcane renewal using decision tree

    Directory of Open Access Journals (Sweden)

    Camila Viana Vieira Farhate

    Full Text Available ABSTRACT: The use of data mining is a promising alternative to predict soil respiration from correlated variables. Our objective was to build a model using variable selection and decision tree induction to predict different levels of soil respiration, taking into account physical, chemical and microbiological variables of soil as well as precipitation in renewal of sugarcane areas. The original dataset was composed of 19 variables (18 independent variables and one dependent, or response, variable). The variable-target refers to soil respiration as the target classification. Due to the large number of variables, a procedure for variable selection was conducted to remove those with low correlation with the variable-target. For that purpose, four approaches to variable selection were evaluated: no variable selection, correlation-based feature selection (CFS), the chi-square method (χ2), and Wrapper. To classify soil respiration, we used the decision tree induction technique available in the Weka software package. Our results showed that data mining techniques allow the development of a model for soil respiration classification with an accuracy of 81%, resulting in a knowledge base composed of 27 rules for prediction of soil respiration. In particular, the wrapper method for variable selection identified a subset of only five variables out of the 18 available in the original dataset, and they had the following order of influence in determining soil respiration: soil temperature > precipitation > macroporosity > soil moisture > potential acidity.

  15. Identification of pests and diseases of Dalbergia hainanensis based on EVI time series and classification of decision tree

    Science.gov (United States)

    Luo, Qiu; Xin, Wu; Qiming, Xiong

    2017-06-01

    Existing approaches to extracting vegetation information from remote sensing data do not account for phenological features or for the low performance of remote sensing analysis algorithms. To address this, a method for extracting vegetation information based on EVI time series and decision-tree classification with multi-source branch similarity is proposed. First, to improve the stability of recognition accuracy over the time series, the seasonal features of vegetation are extracted based on the fitting span range of the time series. Second, decision-tree similarity is determined by adaptively selecting the path or the probability parameter of component prediction; this similarity serves as an index to evaluate the degree of task association, to decide whether to migrate a multi-source decision tree, and to ensure the speed of migration. Finally, the accuracy of classification and recognition of pests and diseases in commercial Dalbergia hainanensis forest reaches 87%-98%, which is significantly better than the 80%-96% accuracy obtained with MODIS coverage in this area. The results support the validity of the proposed method.

  16. Automated morphological analysis of bone marrow cells in microscopic images for diagnosis of leukemia: nucleus-plasma separation and cell classification using a hierarchical tree model of hematopoesis

    Science.gov (United States)

    Krappe, Sebastian; Wittenberg, Thomas; Haferlach, Torsten; Münzenmayer, Christian

    2016-03-01

    The morphological differentiation of bone marrow is fundamental for the diagnosis of leukemia. Currently, the counting and classification of the different types of bone marrow cells is done manually using bright field microscopy. This is a time-consuming, subjective, tedious and error-prone process. Furthermore, repeated examinations of a slide may yield intra- and inter-observer variances. For that reason a computer assisted diagnosis system for bone marrow differentiation is pursued. In this work we focus (a) on a new method for the separation of nucleus and plasma parts and (b) on a knowledge-based hierarchical tree classifier for the differentiation of bone marrow cells into 16 different classes. Classification trees are easily interpretable and understandable and provide a classification together with an explanation. Using classification trees, expert knowledge (i.e. knowledge about similar classes and cell lines in the tree model of hematopoiesis) is integrated in the structure of the tree. The proposed segmentation method is evaluated with more than 10,000 manually segmented cells. For the evaluation of the proposed hierarchical classifier more than 140,000 automatically segmented bone marrow cells are used. Future automated solutions for the morphological analysis of bone marrow smears could potentially apply such an approach for the pre-classification of bone marrow cells and thereby shorten the examination time.

  17. Using the PDD Behavior Inventory as a Level 2 Screener: A Classification and Regression Trees Analysis

    Science.gov (United States)

    Cohen, Ira L.; Liu, Xudong; Hudson, Melissa; Gillis, Jennifer; Cavalari, Rachel N. S.; Romanczyk, Raymond G.; Karmel, Bernard Z.; Gardner, Judith M.

    2016-01-01

    In order to improve discrimination accuracy between Autism Spectrum Disorder (ASD) and similar neurodevelopmental disorders, a data mining procedure, Classification and Regression Trees (CART), was used on a large multi-site sample of PDD Behavior Inventory (PDDBI) forms on children with and without ASD. Discrimination accuracy exceeded 80%,…

  18. Mastectomy or breast conserving surgery? Factors affecting type of surgical treatment for breast cancer – a classification tree approach

    International Nuclear Information System (INIS)

    Martin, Michael A; Meyricke, Ramona; O'Neill, Terry; Roberts, Steven

    2006-01-01

    A critical choice facing breast cancer patients is which surgical treatment – mastectomy or breast conserving surgery (BCS) – is most appropriate. Several studies have investigated factors that impact the type of surgery chosen, identifying features such as place of residence, age at diagnosis, tumor size, socio-economic and racial/ethnic elements as relevant. Such assessment of 'propensity' is important in understanding issues such as a reported under-utilisation of BCS among women for whom such treatment was not contraindicated. Using Western Australian (WA) data, we further examine the factors associated with the type of surgical treatment for breast cancer using a classification tree approach. This approach deals naturally with complicated interactions between factors, and so allows flexible and interpretable models for treatment choice to be built that add to the current understanding of this complex decision process. Data were extracted from the WA Cancer Registry on women diagnosed with breast cancer in WA from 1990 to 2000. Subjects' treatment preferences were predicted from covariates using both classification trees and logistic regression. Tumor size was the primary determinant of patient choice, with subjects with tumors smaller than 20 mm in diameter preferring BCS. For subjects with tumors greater than 20 mm in diameter, factors such as patient age, nodal status, and tumor histology become relevant as predictors of patient choice. Classification trees perform as well as logistic regression for predicting patient choice, but are much easier to interpret for clinical use. The selected tree can inform clinicians' advice to patients.

  19. Application of classification-tree methods to identify nitrate sources in ground water

    Science.gov (United States)

    Spruill, T.B.; Showers, W.J.; Howe, S.S.

    2002-01-01

    A study was conducted to determine if nitrate sources in ground water (fertilizer on crops, fertilizer on golf courses, irrigation spray from hog (Sus scrofa) wastes, and leachate from poultry litter and septic systems) could be classified with 80% or greater success. Two statistical classification-tree models were devised from 48 water samples containing nitrate from five source categories. Model 1 was constructed by evaluating 32 variables and selecting four primary predictor variables (δ15N, nitrate to ammonia ratio, sodium to potassium ratio, and zinc) to identify nitrate sources. A δ15N value of nitrate plus potassium 18.2 indicated inorganic or soil organic N. A nitrate to ammonia ratio 575 indicated nitrate from golf courses. A sodium to potassium ratio 3.2 indicated spray or poultry wastes. A value for zinc 2.8 indicated poultry wastes. Model 2 was devised by using all variables except δ15N. This model also included four variables (sodium plus potassium, nitrate to ammonia ratio, calcium to magnesium ratio, and sodium to potassium ratio) to distinguish categories. Both models were able to distinguish all five source categories with better than 80% overall success and with 71 to 100% success in individual categories using the learning samples. Seventeen water samples that were not used in model development were tested using Model 2 for three categories, and all were correctly classified. Classification-tree models show great potential in identifying sources of contamination and variables important in the source-identification process.

  20. Comparison of rule induction, decision trees and formal concept analysis approaches for classification

    Science.gov (United States)

    Kotelnikov, E. V.; Milov, V. R.

    2018-05-01

    Rule-based learning algorithms are more transparent and easier to interpret than neural networks and deep learning algorithms. These properties make it possible to use such algorithms effectively for descriptive data mining tasks. The choice of an algorithm also depends on its ability to solve predictive tasks. The article compares the quality of solutions to binary and multiclass classification problems based on experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), and In-Close (formal concept analysis). The results of the experiments show that In-Close demonstrates the best classification quality in comparison with Ripper and C4.5; however, the latter two generate more compact rule sets.

  1. First Experience with Sentinel-2 Data for Crop and Tree Species Classifications in Central Europe

    Directory of Open Access Journals (Sweden)

    Markus Immitzer

    2016-02-01

    Full Text Available The study presents the preliminary results of two classification exercises assessing the capabilities of pre-operational (August 2015) Sentinel-2 (S2) data for mapping crop types and tree species. In the first case study, an S2 image was used to map six summer crop species in Lower Austria as well as winter crops/bare soil. Crop type maps are needed to account for crop-specific water use and for agricultural statistics. Crop type information is also useful to parametrize crop growth models for yield estimation, as well as for the retrieval of vegetation biophysical variables using radiative transfer models. The second case study aimed to map seven different deciduous and coniferous tree species in Germany. Detailed information about tree species distribution is important for forest management and to assess potential impacts of climate change. In our S2 data assessment, crop and tree species maps were produced at 10 m spatial resolution by combining the ten S2 spectral channels with 10 and 20 m pixel size. A supervised Random Forest classifier (RF) was deployed and trained with appropriate ground truth. In both case studies, S2 data confirmed its expected capabilities to produce reliable land cover maps. Cross-validated overall accuracies ranged between 65% (tree species) and 76% (crop types). The study confirmed the high value of the red-edge and shortwave infrared (SWIR) bands for vegetation mapping. Also, the blue band was important in both study sites. The S2-bands in the near infrared were amongst the least important channels. The object based image analysis (OBIA) and the classical pixel-based classification achieved comparable results, mainly for the cropland. As only single date acquisitions were available for this study, the full potential of S2 data could not be assessed. In the future, the two twin S2 satellites will offer global coverage every five days and therefore permit to concurrently exploit unprecedented spectral and temporal

  2. Remote sensing mapping of macroalgal farms by modifying thresholds in the classification tree

    KAUST Repository

    Zheng, Yuhan

    2018-05-07

    Remote sensing is the main approach used to classify and map aquatic vegetation, and classification tree (CT) analysis is superior to various classification methods. Based on previous studies, modified CT can be developed from traditional CT by adjusting the thresholds based on the statistical relationship between spectral features to classify different images without ground-truth data. However, no studies have yet employed this method to resolve marine vegetation. In this study, three Gao-Fen 1 satellite images obtained with the same sensor on January 30, 2014, November 5, 2014, and January 21, 2015 were selected, and two features were then employed to extract macroalgae from aquaculture farms from the seawater background. Besides, object-based classification and other image analysis methods were adopted to improve the classification accuracy in this study. Results show that the overall accuracies of traditional CTs for three images are 92.0%, 94.2% and 93.9%, respectively, whereas the overall accuracies of the two corresponding modified CTs for images obtained on January 21, 2015 and November 5, 2014 are 93.1% and 89.5%, respectively. This indicates modified CTs can help map macroalgae with multi-date imagery and monitor the spatiotemporal distribution of macroalgae in coastal environments.

  3. Remote sensing mapping of macroalgal farms by modifying thresholds in the classification tree

    KAUST Repository

    Zheng, Yuhan; Duarte, Carlos M.; Chen, Jiang; Li, Dan; Lou, Zhaohan; Wu, Jiaping

    2018-01-01

    Remote sensing is the main approach used to classify and map aquatic vegetation, and classification tree (CT) analysis is superior to various classification methods. Based on previous studies, modified CT can be developed from traditional CT by adjusting the thresholds based on the statistical relationship between spectral features to classify different images without ground-truth data. However, no studies have yet employed this method to resolve marine vegetation. In this study, three Gao-Fen 1 satellite images obtained with the same sensor on January 30, 2014, November 5, 2014, and January 21, 2015 were selected, and two features were then employed to extract macroalgae from aquaculture farms from the seawater background. Besides, object-based classification and other image analysis methods were adopted to improve the classification accuracy in this study. Results show that the overall accuracies of traditional CTs for three images are 92.0%, 94.2% and 93.9%, respectively, whereas the overall accuracies of the two corresponding modified CTs for images obtained on January 21, 2015 and November 5, 2014 are 93.1% and 89.5%, respectively. This indicates modified CTs can help map macroalgae with multi-date imagery and monitor the spatiotemporal distribution of macroalgae in coastal environments.

  4. Effects of sample survey design on the accuracy of classification tree models in species distribution models

    Science.gov (United States)

    Thomas C. Edwards; D. Richard Cutler; Niklaus E. Zimmermann; Linda Geiser; Gretchen G. Moisen

    2006-01-01

    We evaluated the effects of probabilistic (hereafter DESIGN) and non-probabilistic (PURPOSIVE) sample surveys on resultant classification tree models for predicting the presence of four lichen species in the Pacific Northwest, USA. Models derived from both survey forms were assessed using an independent data set (EVALUATION). Measures of accuracy as gauged by...

  5. Decision-Tree, Rule-Based, and Random Forest Classification of High-Resolution Multispectral Imagery for Wetland Mapping and Inventory

    Science.gov (United States)

    Efforts are increasingly being made to classify the world’s wetland resources, an important ecosystem and habitat that is diminishing in abundance. There are multiple remote sensing classification methods, including a suite of nonparametric classifiers such as decision-tree...

  6. Identification of Sexually Abused Female Adolescents at Risk for Suicidal Ideations: A Classification and Regression Tree Analysis

    Science.gov (United States)

    Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois

    2013-01-01

    This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…

  7. Malignancy Risk Assessment in Patients with Thyroid Nodules Using Classification and Regression Trees

    Directory of Open Access Journals (Sweden)

    Shokouh Taghipour Zahir

    2013-01-01

    Full Text Available Purpose. We sought to investigate the utility of a classification and regression trees (CART) classifier to differentiate benign from malignant nodules in patients referred for thyroid surgery. Methods. Clinical and demographic data of 271 patients referred to the Sadoughi Hospital during 2006–2011 were collected. In a two-step approach, a CART classifier was employed to differentiate patients with a high versus low risk of thyroid malignancy. The first step served as the screening procedure and was tailored to produce as few false negatives as possible. The second step identified those with the lowest risk of malignancy, chosen from a high risk population. Sensitivity, specificity, positive and negative predictive values (PPV and NPV) of the optimal tree were calculated. Results. In the first step, age, sex, and nodule size contributed to the optimal tree. Ultrasonographic features were employed in the second step with hypoechogenicity and/or microcalcifications yielding the highest discriminatory ability. The combined tree produced a sensitivity and specificity of 80.0% (95% CI: 29.9–98.9) and 94.1% (95% CI: 78.9–99.0), respectively. NPV and PPV were 66.7% (41.1–85.6) and 97.0% (82.5–99.8), respectively. Conclusion. CART classifier reliably identifies patients with a low risk of malignancy who can avoid unnecessary surgery.
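
    The four performance measures reported in this abstract all derive from the same 2x2 table of predicted versus actual status. A minimal sketch of the arithmetic follows; the function name `diagnostic_metrics` and the counts are hypothetical and do not reproduce the study's data, and which outcome is treated as "positive" is an assumption left to the caller.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic measures from confusion-matrix counts.

    tp/fn: positive cases classified as positive/negative
    fp/tn: negative cases classified as positive/negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
        "ppv": tp / (tp + fp),          # reliability of a positive call
        "npv": tn / (tn + fn),          # reliability of a negative call
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=8, fp=5, fn=2, tn=45)
```

    A screening step tuned for few false negatives, as in the first step described above, trades specificity for sensitivity, which shows up here as a small `fn` and a larger `fp`.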

  8. Electronic Nose Odor Classification with Advanced Decision Tree Structures

    Directory of Open Access Journals (Sweden)

    S. Guney

    2013-09-01

    Full Text Available Electronic nose (e-nose) is an electronic device which can measure chemical compounds in air and consequently classify different odors. In this paper, an e-nose device consisting of 8 different gas sensors was designed and constructed. Using this device, 104 different experiments involving 11 different odor classes (moth, angelica root, rose, mint, polis, lemon, rotten egg, egg, garlic, grass, and acetone) were performed. The main contribution of this paper is the finding that using the chemical domain knowledge it is possible to train an accurate odor classification system. The domain knowledge about chemical compounds is represented by a decision tree whose nodes are composed of classifiers such as Support Vector Machines and k-Nearest Neighbor. The overall accuracy achieved with the proposed algorithm and the constructed e-nose device was 97.18 %. Training and testing data sets used in this paper are published online.

  9. Characterization of Escherichia coli isolates from different fecal sources by means of classification tree analysis of fatty acid methyl ester (FAME) profiles.

    Science.gov (United States)

    Seurinck, Sylvie; Deschepper, Ellen; Deboch, Bishaw; Verstraete, Willy; Siciliano, Steven

    2006-03-01

    Microbial source tracking (MST) methods need to be rapid, inexpensive and accurate. Unfortunately, many MST methods provide a wealth of information that is difficult to interpret by the regulators who use this information to make decisions. This paper describes the use of classification tree analysis to interpret the results of a MST method based on fatty acid methyl ester (FAME) profiles of Escherichia coli isolates, and to present results in a format readily interpretable by water quality managers. Raw sewage E. coli isolates and animal E. coli isolates from cow, dog, gull, and horse were isolated and their FAME profiles collected. Leave-one-out cross-validation yielded a low overall correct classification rate of 61%. A higher overall correct classification rate of 85% was obtained when the animal isolates were pooled together and compared to the raw sewage isolates. Bootstrap aggregation or adaptive resampling and combining of the FAME profile data increased correct classification rates substantially. Other MST methods may be better suited to differentiate between different fecal sources, but classification tree analysis has enabled us to distinguish raw sewage from animal E. coli isolates, which previously had not been possible with other multivariate methods such as principal component analysis and cluster analysis.
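
    Leave-one-out cross-validation, used above to obtain the correct classification rates, refits the classifier once per sample with that sample held out. The sketch below illustrates the procedure with a one-split tree (a decision stump) on a single feature; this stump stands in for a full classification tree and is not the authors' model. The names `fit_stump`, `predict_stump`, `leave_one_out_rate` and the values in `TOY_PROFILES` are hypothetical.

```python
def fit_stump(train):
    """Fit a one-split tree on (value, label) pairs with at most two labels.

    Returns (threshold, left_label, right_label): values <= threshold
    receive left_label, all others receive right_label.
    """
    labels = sorted({label for _, label in train})
    values = sorted({value for value, _ in train})
    # Candidate thresholds are midpoints between neighbouring values.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for threshold in thresholds:
        for left, right in ((labels[0], labels[-1]), (labels[-1], labels[0])):
            errors = sum(
                1 for value, label in train
                if (left if value <= threshold else right) != label
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    return best[1:]


def predict_stump(stump, value):
    threshold, left, right = stump
    return left if value <= threshold else right


def leave_one_out_rate(data):
    """Correct classification rate with each sample held out in turn."""
    correct = 0
    for i, (value, label) in enumerate(data):
        stump = fit_stump(data[:i] + data[i + 1:])
        if predict_stump(stump, value) == label:
            correct += 1
    return correct / len(data)


# Hypothetical single-feature summary of FAME profiles, two pooled sources.
TOY_PROFILES = [(1.0, "sewage"), (1.2, "sewage"), (1.4, "sewage"),
                (2.6, "animal"), (2.8, "animal"), (3.0, "animal")]
```

    Because the held-out sample never influences the split that classifies it, the resulting rate is a less optimistic estimate than accuracy measured on the training data.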

  10. Automated Detection of Connective Tissue by Tissue Counter Analysis and Classification and Regression Trees

    Directory of Open Access Journals (Sweden)

    Josef Smolle

    2001-01-01

    Full Text Available Objective: To evaluate the feasibility of the CART (Classification and Regression Tree) procedure for the recognition of microscopic structures in tissue counter analysis. Methods: Digital microscopic images of H&E stained slides of normal human skin and of primary malignant melanoma were overlaid with regularly distributed square measuring masks (elements), and grey value, texture and colour features within each mask were recorded. In the learning set, elements were interactively labeled as representing either connective tissue of the reticular dermis, other tissue components or background. Subsequently, CART models were based on these data sets. Results: Implementation of the CART classification rules into the image analysis program showed that in an independent test set 94.1% of elements classified as connective tissue of the reticular dermis were correctly labeled. Automated measurements of the total amount of tissue and of the amount of connective tissue within a slide showed high reproducibility (r=0.97 and r=0.94, respectively; p < 0.001). Conclusions: CART procedure in tissue counter analysis yields simple and reproducible classification rules for tissue elements.

  11. Discriminant forest classification method and system

    Science.gov (United States)

    Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.

    2012-11-06

    A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or the Anderson-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.

  12. Use of Binary Partition Tree and energy minimization for object-based classification of urban land cover

    Science.gov (United States)

    Li, Mengmeng; Bijker, Wietske; Stein, Alfred

    2015-04-01

    Two main challenges are faced when classifying urban land cover from very high resolution satellite images: obtaining an optimal image segmentation and distinguishing buildings from other man-made objects. For optimal segmentation, this work proposes a hierarchical representation of an image by means of a Binary Partition Tree (BPT) and an unsupervised evaluation of image segmentations by energy minimization. For building extraction, we apply fuzzy sets to create a fuzzy landscape of shadows which in turn involves a two-step procedure. The first step is a preliminary image classification at a fine segmentation level to generate vegetation and shadow information. The second step models the directional relationship between building and shadow objects to extract building information at the optimal segmentation level. We conducted the experiments on two datasets of Pléiades images from Wuhan City, China. To demonstrate its performance, the proposed classification is compared at the optimal segmentation level with Maximum Likelihood Classification and Support Vector Machine classification. The results show that the proposed classification produced the highest overall accuracies and kappa coefficients, and the smallest over-classification and under-classification geometric errors. We conclude first that integrating BPT with energy minimization offers an effective means for image segmentation. Second, we conclude that the directional relationship between building and shadow objects represented by a fuzzy landscape is important for building extraction.
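
    Overall accuracy and the kappa coefficient, the two headline measures in this abstract, are both computed from a confusion matrix of reference versus mapped classes. A minimal sketch, assuming a square matrix given as nested lists; the function name `overall_accuracy_and_kappa` and the counts are hypothetical and are not the study's results.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] = number of samples of reference class i mapped to class j.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(size)
    ) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class matrix (building vs. other), illustration only.
accuracy, kappa = overall_accuracy_and_kappa([[40, 10], [5, 45]])
```

    Kappa discounts the agreement that would be expected by chance from the class marginals, which is why it is usually reported alongside overall accuracy when comparing classifiers.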

  13. An edit script for taxonomic classifications

    Directory of Open Access Journals (Sweden)

    Valiente Gabriel

    2005-08-01

    Background: The NCBI taxonomy provides one of the most powerful ways to navigate sequence databases but currently users are forced to formulate queries according to a single taxonomic classification. Given that there is not universal agreement on the classification of organisms, providing a single classification places constraints on the questions biologists can ask. However, maintaining multiple classifications is burdensome in the face of a constantly growing NCBI classification. Results: In this paper, we present a solution to the problem of generating modifications of the NCBI taxonomy, based on the computation of an edit script that summarises the differences between two classification trees. Our algorithms find the shortest possible edit script based on the identification of all shared subtrees, and only take time quasi-linear in the size of the trees because classification trees have unique node labels. Conclusion: These algorithms have been recently implemented, and the software is freely available for download from http://darwin.zoology.gla.ac.uk/~rpage/forest/.
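
    To make the idea of an edit script between two label-unique trees concrete, here is a naive sketch. It is not the shortest-script algorithm of the paper: it assumes each tree is given as a hypothetical child-to-parent dictionary and only reports node-level deletions, insertions and moves. Unique labels are what allow the two trees to be aligned by plain dictionary lookups.

```python
def edit_script(old_parent, new_parent):
    """Return a list of edit operations turning one tree into another.

    Both arguments map each node label to its parent label
    (the root maps to None). Labels are assumed unique within a tree.
    """
    ops = []
    # Nodes that exist only in the old tree are deleted.
    for node in sorted(old_parent.keys() - new_parent.keys()):
        ops.append(("delete", node))
    # Nodes that exist only in the new tree are inserted under their parent.
    for node in sorted(new_parent.keys() - old_parent.keys()):
        ops.append(("insert", node, new_parent[node]))
    # Shared nodes whose parent changed are moved.
    for node in sorted(old_parent.keys() & new_parent.keys()):
        if old_parent[node] != new_parent[node]:
            ops.append(("move", node, new_parent[node]))
    return ops


old = {"root": None, "A": "root", "B": "root", "C": "A"}
new = {"root": None, "A": "root", "C": "root", "D": "A"}
script = edit_script(old, new)
# script == [("delete", "B"), ("insert", "D", "A"), ("move", "C", "root")]
```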

  14. A cross-cultural investigation of college student alcohol consumption: a classification tree analysis.

    Science.gov (United States)

    Kitsantas, Panagiota; Kitsantas, Anastasia; Anagnostopoulou, Tanya

    2008-01-01

    In this cross-cultural study, the authors attempted to identify high-risk subgroups for alcohol consumption among college students. American and Greek students (N = 132) answered questions about alcohol consumption, religious beliefs, attitudes toward drinking, advertisement influences, parental monitoring, and drinking consequences. Heavy drinkers in the American group were younger and less religious than were infrequent drinkers. In the Greek group, heavy drinkers tended to deny the negative results of drinking alcohol and use a permissive attitude to justify it, whereas infrequent drinkers were more likely to be monitored by their parents. These results suggest that parental monitoring and an emphasis on informing students about the negative effects of alcohol on their health and social and academic lives may be effective methods of reducing alcohol consumption. Classification tree analysis revealed that student attitudes toward drinking were important in the classification of American and Greek drinkers, indicating that this is a powerful predictor of alcohol consumption regardless of ethnic background.

  15. Tree Colors: Color Schemes for Tree-Structured Data.

    Science.gov (United States)

    Tennekes, Martijn; de Jonge, Edwin

    2014-12-01

    We present a method to map tree structures to colors from the Hue-Chroma-Luminance color model, which is known for its well balanced perceptual properties. The Tree Colors method can be tuned with several parameters, whose effect on the resulting color schemes is discussed in detail. We provide a free and open source implementation with sensible parameter defaults. Categorical data are very common in statistical graphics, and often these categories form a classification tree. We evaluate applying Tree Colors to tree structured data with a survey on a large group of users from a national statistical institute. Our user study suggests that Tree Colors are useful, not only for improving node-link diagrams, but also for unveiling tree structure in non-hierarchical visualizations.
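
    A simplified sketch of mapping a tree to hues follows. It assumes that each node receives the midpoint of a hue interval and that children divide their parent's interval equally; the parameters the abstract mentions (and any chroma or luminance handling) are omitted, and the name `assign_hues` is hypothetical rather than part of the authors' implementation.

```python
def assign_hues(tree, node, lo=0.0, hi=360.0, out=None):
    """Assign a hue (in degrees) to every node reachable from `node`.

    `tree` maps a node label to the list of its children; leaves may be
    absent from the mapping.
    """
    if out is None:
        out = {}
    out[node] = (lo + hi) / 2          # node colour: midpoint of its range
    children = tree.get(node, [])
    if children:
        width = (hi - lo) / len(children)
        for i, child in enumerate(children):
            # Each child gets an equal, non-overlapping slice of the range.
            assign_hues(tree, child, lo + i * width, lo + (i + 1) * width, out)
    return out


tree = {"root": ["a", "b"], "a": ["a1", "a2"]}
hues = assign_hues(tree, "root")
# hues == {"root": 180.0, "a": 90.0, "a1": 45.0, "a2": 135.0, "b": 270.0}
```

    Because siblings share a parent's interval, nodes in the same branch end up with similar hues, which is the property the abstract relies on for revealing tree structure.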

  16. Learning machines and sleeping brains: Automatic sleep stage classification using decision-tree multi-class support vector machines.

    Science.gov (United States)

    Lajnef, Tarek; Chaibi, Sahbi; Ruby, Perrine; Aguera, Pierre-Emmanuel; Eichenlaub, Jean-Baptiste; Samet, Mounir; Kachouri, Abdennaceur; Jerbi, Karim

    2015-07-30

    Sleep staging is a critical step in a range of electrophysiological signal processing pipelines used in clinical routine as well as in sleep research. Although the results currently achievable with automatic sleep staging methods are promising, there is need for improvement, especially given the time-consuming and tedious nature of visual sleep scoring. Here we propose a sleep staging framework that consists of a multi-class support vector machine (SVM) classification based on a decision tree approach. The performance of the method was evaluated using polysomnographic data from 15 subjects (electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) recordings). The decision tree, or dendrogram, was obtained using a hierarchical clustering technique and a wide range of time and frequency-domain features were extracted. Feature selection was carried out using forward sequential selection and classification was evaluated using k-fold cross-validation. The dendrogram-based SVM (DSVM) achieved mean specificity, sensitivity and overall accuracy of 0.92, 0.74 and 0.88 respectively, compared to expert visual scoring. Restricting DSVM classification to data where both experts' scoring was consistent (76.73% of the data) led to a mean specificity, sensitivity and overall accuracy of 0.94, 0.82 and 0.92 respectively. The DSVM framework outperforms classification with more standard multi-class "one-against-all" SVM and linear-discriminant analysis. The promising results of the proposed methodology suggest that it may be a valuable alternative to existing automatic methods and that it could accelerate visual scoring by providing a robust starting hypnogram that can be further fine-tuned by expert inspection.
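
    For reference, the three reported measures can be computed from true and predicted labels as sketched below. The sketch assumes a binary coding (1 for the stage of interest, 0 for everything else); how the authors averaged over sleep stages is not stated in the abstract, and the labels shown are made up for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),   # share of positives that were found
        "specificity": tn / (tn + fp),   # share of negatives that were rejected
        "accuracy": (tp + tn) / len(pairs),
    }


# Illustrative labels only, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```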

  17. Univariate decision tree induction using maximum margin classification

    OpenAIRE

    Yıldız, Olcay Taner

    2012-01-01

    In many pattern recognition applications, first decision trees are used due to their simplicity and easily interpretable nature. In this paper, we propose a new decision tree learning algorithm called univariate margin tree where, for each continuous attribute, the best split is found using convex optimization. Our simulation results on 47 data sets show that the novel margin tree classifier performs at least as well as C4.5 and linear discriminant tree (LDT) with a similar time complexity. F...

  18. Incorporating additional tree and environmental variables in a lodgepole pine stem profile model

    Science.gov (United States)

    John C. Byrne

    1993-01-01

    A new variable-form segmented stem profile model is developed for lodgepole pine (Pinus contorta) trees from the northern Rocky Mountains of the United States. I improved estimates of stem diameter by predicting two of the model coefficients with linear equations using a measure of tree form, defined as a ratio of dbh and total height. Additional improvements were...

  19. Real-time classification of humans versus animals using profiling sensors and hidden Markov tree model

    Science.gov (United States)

    Hossen, Jakir; Jacobs, Eddie L.; Chari, Srikant

    2015-07-01

    Linear pyroelectric array sensors have enabled useful classifications of objects such as humans and animals to be performed with relatively low-cost hardware in border and perimeter security applications. Ongoing research has sought to improve the performance of these sensors through signal processing algorithms. In the research presented here, we introduce the use of hidden Markov tree (HMT) models for object recognition in images generated by linear pyroelectric sensors. HMTs are trained to statistically model the wavelet features of individual objects through an expectation-maximization learning process. Human versus animal classification for a test object is made by evaluating its wavelet features against the trained HMTs using the maximum-likelihood criterion. The classification performance of this approach is compared to two other techniques: a texture, shape, and spectral component features (TSSF) based classifier and a speeded-up robust feature (SURF) classifier. The evaluation indicates that among the three techniques, the wavelet-based HMT model works well, is robust, and has improved classification performance compared to a SURF-based algorithm in equivalent computation time. When compared to the TSSF-based classifier, the HMT model has a slightly degraded performance but almost an order of magnitude improvement in computation time, enabling real-time implementation.

  20. Spatial-temporal changes in trees outside forests

    DEFF Research Database (Denmark)

    Novotný, M.; Skaloš, J.; Plieninger, T.

    2017-01-01

    Trees outside forests act as ecologically valuable elements in the rural landscapes of Europe. This study proposes a new classification system for trees outside forest elements based on the shape and size of the patches and their location in fields. Using this system, the study evaluates the spat...

  1. Computer-assisted detection of colonic polyps with CT colonography using neural networks and binary classification trees

    International Nuclear Information System (INIS)

    Jerebko, Anna K.; Summers, Ronald M.; Malley, James D.; Franaszek, Marek; Johnson, C. Daniel

    2003-01-01

    Detection of colonic polyps in CT colonography is problematic due to complexities of polyp shape and the surface of the normal colon. Published results indicate the feasibility of computer-aided detection of polyps but better classifiers are needed to improve specificity. In this paper we compare the classification results of two approaches: neural networks and recursive binary trees. As our starting point we collect surface geometry information from three-dimensional reconstruction of the colon, followed by a filter based on selected variables such as region density, Gaussian and average curvature and sphericity. The filter returns sites that are candidate polyps, based on earlier work using detection thresholds, to which the neural nets or the binary trees are applied. A data set of 39 polyps from 3 to 25 mm in size was used in our investigation. For both neural nets and binary trees we use tenfold cross-validation to better estimate the true error rates. The backpropagation neural net with one hidden layer trained with the Levenberg-Marquardt algorithm achieved the best results: sensitivity 90% and specificity 95% with 16 false positives per study.
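
    Tenfold cross-validation, as used above, partitions the samples into ten folds and holds each fold out once. The sketch below only generates the index partitions; it assumes no shuffling or stratification (the abstract does not say how folds were formed), and `kfold_indices` is a hypothetical helper name.

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    Sample i is assigned to fold i % k, so fold sizes differ by at most one.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Training set: every index that is not in the held-out fold.
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# 39 samples (the number of polyps in the abstract) split into 10 folds.
splits = list(kfold_indices(39, 10))
```

    An error estimate would then be obtained by training on each `train` list, evaluating on the matching `test` list, and averaging the ten results.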

  2. Which sociodemographic factors are important on smoking behaviour of high school students? The contribution of classification and regression tree methodology in a broad epidemiological survey.

    Science.gov (United States)

    Ozge, C; Toros, F; Bayramkaya, E; Camdeviren, H; Sasmaz, T

    2006-08-01

    The purpose of this study is to evaluate the most important sociodemographic factors on smoking status of high school students using a broad randomised epidemiological survey. Using an in-class, self-administered questionnaire about their sociodemographic variables and smoking behaviour, a representative sample of 3304 students in total from the preparatory, 9th, 10th, and 11th grades of 22 randomly selected schools of Mersin was evaluated, and discriminative factors were determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated combined effects of these factors using classification and regression tree methodology, as a new statistical method. The data showed that 38% of the students reported lifetime smoking and 16.9% of them reported current smoking, with a male predominance and increasing prevalence by age. Second-hand smoking was reported at a 74.3% frequency with father predominance (56.6%). The significantly important factors that affect current smoking in these age groups were increased household size, late birth rank, certain school types, low academic performance, increased second-hand smoking, and stress (especially reported as separation from a close friend or because of violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors with a good classification capacity. It was concluded that smoking, being closely related to sociocultural factors, was a common problem in this young population and generated an important academic and social burden in young people's lives; with increasing data about this behaviour and the use of new statistical methods, effective coping strategies could be developed.

  3. First Experience with Sentinel-2 Data for Crop and Tree Species Classifications in Central Europe

    OpenAIRE

    Markus Immitzer; Francesco Vuolo; Clement Atzberger

    2016-01-01

    The study presents the preliminary results of two classification exercises assessing the capabilities of pre-operational (August 2015) Sentinel-2 (S2) data for mapping crop types and tree species. In the first case study, an S2 image was used to map six summer crop species in Lower Austria as well as winter crops/bare soil. Crop type maps are needed to account for crop-specific water use and for agricultural statistics. Crop type information is also useful to parametrize crop growth models fo...

  4. GENERATION OF 2D LAND COVER MAPS FOR URBAN AREAS USING DECISION TREE CLASSIFICATION

    DEFF Research Database (Denmark)

    Höhle, Joachim

    2014-01-01

    A 2D land cover map can automatically and efficiently be generated from high-resolution multispectral aerial images. First, a digital surface model is produced and each cell of the elevation model is then supplemented with attributes. A decision tree classification is applied to extract map objects...... of stereo-observations of false-colour stereopairs. The stratified statistical assessment of the produced land cover map with six classes and based on 91 points per class reveals a high thematic accuracy for classes ‘building’ (99%, 95% CI: 95%-100%) and ‘road and parking lot’ (90%, 95% CI: 83%-95%). Some...

  5. Predicting the disease of Alzheimer with SNP biomarkers and clinical data using data mining classification approach: decision tree.

    Science.gov (United States)

    Erdoğan, Onur; Aydin Son, Yeşim

    2014-01-01

    Single Nucleotide Polymorphisms (SNPs) are the most common genomic variations where only a single nucleotide differs between individuals. Individual SNPs and SNP profiles associated with diseases can be utilized as biological markers. But there is a need to determine the SNP subsets and patients' clinical data which are informative for the diagnosis. Data mining approaches have the highest potential for extracting the knowledge from genomic datasets and selecting the representative SNPs as well as the most effective and informative clinical features for the clinical diagnosis of the diseases. In this study, we have applied one of the widely used data mining classification methodologies, the "decision tree", for associating the SNP biomarkers and significant clinical data with Alzheimer's disease (AD), which is the most common form of "dementia". Different tree construction parameters have been compared for the optimization, and the most accurate tree for predicting AD is presented.

  6. Classification decision tree in CT imaging: application to the differential diagnosis of solitary pulmonary nodules

    International Nuclear Information System (INIS)

    Ma Hongxia; Guo Yulin; Wang Qiuping; Qiang Yongqian; Liu Min; Guo Xiaojuan; Guo Youmin; Chen Qihang

    2008-01-01

    Objective: To establish a classification and regression tree (CART) for differentiating benign from malignant solitary pulmonary nodules (SPN). Methods: One hundred and sixteen consecutive cases with 116 solitary pulmonary nodules, which were finally pathologically proven to be 54 malignant nodules and 62 benign nodules, were prospectively registered in this research. Twelve clinical presentations and 22 CT findings were collected as predictors. A classification tree was established to distinguish benign SPNs from malignant ones. In the observer test, two groups (one made of junior radiologists and one of senior radiologists) were independently presented with clinical information and CT images without knowing the pathologic and machine-learning results. Performance of observers and CART were compared by receiver operating characteristic analysis. Results: Receiver operating characteristic analysis showed that the areas under the curve of CART, senior radiologists and junior radiologists were 0.910±0.029, 0.827±0.038 and 0.612±0.052, respectively. The difference between areas (DBF) between CART and junior radiologists was 0.297 (P<0.01). DBF between CART and senior radiologists was 0.083 (P<0.05). DBF between senior and junior radiologists was 0.214 (P<0.01). CART showed the best diagnostic efficiency, followed by senior radiologists, and then junior radiologists. Conclusion: Our data mining techniques using CART prove a high accuracy in differentiating benign from malignant pulmonary nodules based on clinical variables and CT findings. It will be a potentially useful tool in further application of artificial intelligence in the imaging diagnosis. (authors)

  7. Using methods from the data mining and machine learning literature for disease classification and prediction: A case study examining classification of heart failure sub-types

    Science.gov (United States)

    Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.

    2014-01-01

    Objective: Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting: We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results: We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion: The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592

  8. Indexing Density Models for Incremental Learning and Anytime Classification on Data Streams

    DEFF Research Database (Denmark)

    Seidl, Thomas; Assent, Ira; Kranen, Philipp

    2009-01-01

    Classification of streaming data faces three basic challenges: it has to deal with huge amounts of data, the varying time between two stream data items must be used as well as possible (anytime classification) and additional training data must be incrementally learned (anytime learning) for applying...... to the individual object to be classified) a hierarchy of mixture densities that represent kernel density estimators at successively coarser levels. Our probability density queries together with novel classification improvement strategies provide the necessary information for very effective classification at any...... point of interruption. Moreover, we propose a novel evaluation method for anytime classification using Poisson streams and demonstrate the anytime learning performance of the Bayes tree....

  9. Categorizing ideas about trees: a tree of trees.

    Science.gov (United States)

    Fisler, Marie; Lecointre, Guillaume

    2013-01-01

    The aim of this study is to explore whether matrices and MP trees used to produce systematic categories of organisms could be useful to produce categories of ideas in history of science. We study the history of the use of trees in systematics to represent the diversity of life from 1766 to 1991. We apply to those ideas a method inspired from coding homologous parts of organisms. We discretize conceptual parts of ideas, writings and drawings about trees contained in 41 main writings; we detect shared parts among authors and code them into a 91-character matrix and use a tree representation to show who shares what with whom. In other words, we propose a hierarchical representation of the shared ideas about trees among authors: this produces a "tree of trees." Then, we categorize schools of tree-representations. Classical schools like "cladists" and "pheneticists" are recovered but others are not: "gradists" are separated into two blocks, one of them being called here "grade theoreticians." We propose new interesting categories like the "buffonian school," the "metaphoricians," and those using "strictly genealogical classifications." We consider that networks are not useful to represent shared ideas at the present step of the study. A cladogram is made for showing who is sharing what with whom, but also heterobathmy and homoplasy of characters. The present cladogram is not modelling processes of transmission of ideas about trees, and here it is mostly used to test for proximity of ideas of the same age and for categorization.

  10. AUTOMATIC TREE-CROWN DETECTION IN CHALLENGING SCENARIOS

    Directory of Open Access Journals (Sweden)

    D. Bulatov

    2016-06-01

    In this paper, a new procedure for individual tree detection and modeling is presented. The input of this procedure consists of a normalized digital surface model (NDSM) and a possibly error-prone classification result. The procedure is modular so that the functionality, the advantages and the disadvantages of every single module will be explained. The most important technical contributions of the paper are: employing watershed transformation combined with classification results, applying hotspot detectors for identifying treetops in groups of trees, and correcting the NDSM by detecting and geometrically reconstructing small anomalies, such as earth walls. Two minor contributions are made up by a detailed literature research on available methods for individual tree detection and estimation of tree-crowns for clearly identified trees in order to reduce arbitrariness by assigning trees to one of the few types in the output model.

  11. Optimizing tree-species classification in hyperspectral images

    CSIR Research Space (South Africa)

    Barnard, E

    2010-11-01

    ... for classification. Scaling of these components so that all features have equal variance is found to be useful, and their best performance (88.9% accurate classification) is achieved with 15 scaled features and a support vector machine as classifier. A graphical...

  12. A new approach to enhance the performance of decision tree for classifying gene expression data.

    Science.gov (United States)

    Hassan, Md; Kotagiri, Ramamohanarao

    2013-12-20

    Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. The decision tree is one of the popular machine learning approaches to address such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance, especially when classifying gene expression datasets. By using a new decision tree algorithm where each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in literature. We experimentally compare the effect of our scheme against other well known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.
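
    The contrast between a single-gene split and a composite linear feature, scored by AUC, can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the weights are fixed by hand rather than selected, the expression values are invented, and AUC is computed with the pairwise (Mann-Whitney) definition.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (positive, negative) pairs in which the positive
    sample has the higher score; ties count as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def composite(sample, weights):
    """Linear combination of several gene values into one derived feature."""
    return sum(w * x for w, x in zip(weights, sample))


# Invented expression values for two genes; labels are the classes.
samples = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]

single_gene_auc = auc([s[1] for s in samples], labels)          # gene 2 alone
composite_auc = auc([composite(s, (1.0, 1.0)) for s in samples], labels)
```

    Here the second gene alone separates the classes imperfectly, while the sum of both genes separates them completely, which is the kind of gain a multi-gene node is meant to capture.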

  13. Assessment and classification of hazardous street trees in University ...

    African Journals Online (AJOL)

    The study was carried out to assess and classify hazardous trees within the University of Ibadan (UI) campus, Oyo State, Nigeria. The study population was 25 municipal tree species comprising 420 individual trees located along the major roads of the study area, which were considered hazardous to the community.

  14. Remote sensing of aquatic vegetation distribution in Taihu Lake using an improved classification tree with modified thresholds.

    Science.gov (United States)

    Zhao, Dehua; Jiang, Hao; Yang, Tangwu; Cai, Ying; Xu, Delin; An, Shuqing

    2012-03-01

    Classification trees (CT) have been used successfully in the past to classify aquatic vegetation from spectral indices (SI) obtained from remotely-sensed images. However, applying CT models developed for certain image dates to other time periods within the same year or among different years can reduce the classification accuracy. In this study, we developed CT models with modified thresholds using extreme SI values (CT(m)) to improve the stability of the models when applying them to different time periods. A total of 903 ground-truth samples were obtained in September of 2009 and 2010 and classified as emergent, floating-leaf, or submerged vegetation or other cover types. Classification trees were developed for 2009 (Model-09) and 2010 (Model-10) using field samples and a combination of two images from winter and summer. Overall accuracies of these models were 92.8% and 94.9%, respectively, which confirmed the ability of CT analysis to map aquatic vegetation in Taihu Lake. However, Model-10 had only 58.9-71.6% classification accuracy and 31.1-58.3% agreement (i.e., pixels classified the same in the two maps) for aquatic vegetation when it was applied to image pairs from both a different time period in 2010 and a similar time period in 2009. We developed a method to estimate the effects of extrinsic (EF) and intrinsic (IF) factors on model uncertainty using MODIS images. Results indicated that 71.1% of the instability in classification between time periods was due to EF, which might include changes in atmospheric conditions, sun-view angle and water quality. The remainder was due to IF, such as phenological and growth status differences between time periods. The modified version of Model-10 (i.e. CT(m)) performed better than traditional CT with different image dates. When applied to 2009 images, the CT(m) version of Model-10 had very similar thresholds and performance to Model-09, with overall accuracies of 92.8% and 90.5% for Model-09 and the CT(m) version of Model

  15. From Google Maps to a fine-grained catalog of street trees

    Science.gov (United States)

    Branson, Steve; Wegner, Jan Dirk; Hall, David; Lang, Nico; Schindler, Konrad; Perona, Pietro

    2018-01-01

    Up-to-date catalogs of the urban tree population are of importance for municipalities to monitor and improve quality of life in cities. Despite much research on automation of tree mapping, mainly relying on dedicated airborne LiDAR or hyperspectral campaigns, tree detection and species recognition is still mostly done manually in practice. We present a fully automated tree detection and species recognition pipeline that can process thousands of trees within a few hours using publicly available aerial and street view images of Google Maps™. These data provide rich information from different viewpoints and at different scales from global tree shapes to bark textures. Our work-flow is built around a supervised classification that automatically learns the most discriminative features from thousands of trees and corresponding, publicly available tree inventory data. In addition, we introduce a change tracker that recognizes changes of individual trees at city-scale, which is essential to keep an urban tree inventory up-to-date. The system takes street-level images of the same tree location at two different times and classifies the type of change (e.g., tree has been removed). Drawing on recent advances in computer vision and machine learning, we apply convolutional neural networks (CNN) for all classification tasks. We propose the following pipeline: download all available panoramas and overhead images of an area of interest; detect trees per image and combine multi-view detections in a probabilistic framework, adding prior knowledge; recognize fine-grained species of detected trees. In a later, separate module, track trees over time, detect significant changes and classify the type of change. We believe this is the first work to exploit publicly available image data for city-scale street tree detection, species recognition and change tracking, exhaustively over several square kilometers and many thousands of trees. Experiments in the city of Pasadena

  16. Using classification tree modelling to investigate drug prescription practices at health facilities in rural Tanzania

    Directory of Open Access Journals (Sweden)

    Kajungu Dan K

    2012-09-01

    Background: Drug prescription practices depend on several factors related to the patient, health worker and health facilities. A better understanding of the factors influencing prescription patterns is essential to develop strategies to mitigate the negative consequences associated with poor practices in both the public and private sectors. Methods: A cross-sectional study was conducted in rural Tanzania among patients attending health facilities, and health workers. Patients, health workers and health facilities-related factors with the potential to influence drug prescription patterns were used to build a model of key predictors. Standard data mining methodology of classification tree analysis was used to define the importance of the different factors on prescription patterns. Results: This analysis included 1,470 patients and 71 health workers practicing in 30 health facilities. Patients were mostly treated in dispensaries. Twenty-two variables were used to construct two classification tree models: one for polypharmacy (prescription of ≥3 drugs on a single clinic visit) and one for co-prescription of artemether-lumefantrine (AL) with antibiotics. The most important predictor of polypharmacy was the diagnosis of several illnesses. Polypharmacy was also associated with little or no supervision of the health workers, administration of AL and private facilities. Co-prescription of AL with antibiotics was more frequent in children under five years of age and the other important predictors were transmission season, mode of diagnosis and the location of the health facility. Conclusion: Standard data mining methodology is an easy-to-implement analytical approach that can be useful for decision-making. Polypharmacy is mainly due to the diagnosis of multiple illnesses.

  17. Spatial and Spectral Hybrid Image Classification for Rice Lodging Assessment through UAV Imagery

    Directory of Open Access Journals (Sweden)

    Ming-Der Yang

    2017-06-01

    Full Text Available Rice lodging identification relies on manual in situ assessment and often leads to a compensation dispute in agricultural disaster assessment. Therefore, this study proposes a comprehensive and efficient classification technique for agricultural lands that entails using unmanned aerial vehicle (UAV) imagery. In addition to spectral information, digital surface model (DSM) and texture information of the images were obtained through image-based modeling and texture analysis. Moreover, single feature probability (SFP) values were computed to evaluate the contribution of spectral and spatial hybrid image information to classification accuracy. The SFP results revealed that texture information was beneficial for the classification of rice and water, DSM information was valuable for lodging and tree classification, and the combination of texture and DSM information was helpful in distinguishing between artificial surface and bare land. Furthermore, a decision tree classification model incorporating SFP values yielded optimal results, with an accuracy of 96.17% and a Kappa value of 0.941, compared with that of a maximum likelihood classification model (90.76%). The rice lodging ratio in paddies at the study site was successfully identified, with three paddies being eligible for disaster relief. The study demonstrated that the proposed spatial and spectral hybrid image classification technology is a promising tool for rice lodging assessment.
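
    The accuracy and Kappa figures quoted above are both derived from a confusion matrix. The following is a minimal sketch of that calculation, assuming the standard definition of Cohen's kappa; the matrix values and function names are hypothetical and are not taken from the study.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's kappa: agreement beyond what the row/column totals give by chance."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement: sum over classes of (row share * column share).
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1 - expected)


# Hypothetical 3-class confusion matrix (rows: reference, columns: predicted).
toy_matrix = [
    [45, 3, 2],
    [4, 40, 6],
    [1, 5, 44],
]
print(round(overall_accuracy(toy_matrix), 3), round(cohen_kappa(toy_matrix), 3))
```

    Kappa is lower than raw accuracy because it discounts the agreement that the class proportions alone would produce.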

  18. Multi-pruning of decision trees for knowledge representation and classification

    KAUST Repository

    Azad, Mohammad

    2016-06-09

    We consider two important questions related to decision trees: first, how to construct a decision tree with a reasonable number of nodes and a reasonable number of misclassifications, and second, how to improve the prediction accuracy of decision trees when they are used as classifiers. We have created a dynamic programming based approach for bi-criteria optimization of decision trees relative to the number of nodes and the number of misclassifications. This approach allows us to construct the set of all Pareto optimal points and to derive, for each such point, decision trees with parameters corresponding to that point. Experiments on datasets from the UCI ML Repository show that, very often, we can find a suitable Pareto optimal point and derive a decision tree with a small number of nodes at the expense of a small increment in the number of misclassifications. Based on the created approach, we have proposed a multi-pruning procedure which constructs decision trees that, as classifiers, often outperform decision trees constructed by CART. © 2015 IEEE.
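
    To illustrate what a Pareto optimal point means for the two criteria above, here is a minimal sketch that filters a list of candidate trees, each summarized as a (nodes, misclassifications) pair. It is a plain dominance filter over hypothetical values, not the authors' dynamic programming procedure, and the function name is an assumption.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (both criteria minimized).

    Each point is a (number_of_nodes, number_of_misclassifications) pair.
    """
    front = []
    best_errors = None
    # Sorted by nodes, then errors: a point survives only if it strictly
    # lowers the best error count seen among trees with fewer or equal nodes.
    for nodes, errors in sorted(set(points)):
        if best_errors is None or errors < best_errors:
            front.append((nodes, errors))
            best_errors = errors
    return front


# Hypothetical candidate trees: (nodes, misclassifications).
candidates = [(3, 40), (5, 25), (5, 30), (7, 25), (9, 12), (11, 12), (15, 10)]
print(pareto_front(candidates))
```

    Picking a point on the front then amounts to choosing how many extra misclassifications are acceptable in exchange for a smaller tree.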

  19. Multi-pruning of decision trees for knowledge representation and classification

    KAUST Repository

    Azad, Mohammad; Chikalov, Igor; Hussain, Shahid; Moshkov, Mikhail

    2016-01-01

    We consider two important questions related to decision trees: first, how to construct a decision tree with a reasonable number of nodes and a reasonable number of misclassifications, and second, how to improve the prediction accuracy of decision trees when they are used as classifiers. We have created a dynamic programming based approach for bi-criteria optimization of decision trees relative to the number of nodes and the number of misclassifications. This approach allows us to construct the set of all Pareto optimal points and to derive, for each such point, decision trees with parameters corresponding to that point. Experiments on datasets from the UCI ML Repository show that, very often, we can find a suitable Pareto optimal point and derive a decision tree with a small number of nodes at the expense of a small increment in the number of misclassifications. Based on the created approach, we have proposed a multi-pruning procedure which constructs decision trees that, as classifiers, often outperform decision trees constructed by CART. © 2015 IEEE.

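  20. KLASIFIKASI KARAKTERISTIK KECELAKAAN LALU LINTAS DI KOTA DENPASAR DENGAN PENDEKATAN CLASSIFICATION AND REGRESSION TREES (CART) [Classification of Traffic Accident Characteristics in Denpasar City Using the Classification and Regression Trees (CART) Approach]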

    Directory of Open Access Journals (Sweden)

    I GEDE AGUS JIWADIANA

    2015-11-01

    Full Text Available The aim of this research is to determine the classification characteristics of traffic accidents in Denpasar city in January-July 2014 by using Classification And Regression Trees (CART), and to identify which explanatory variable acts as the main splitter of the CART. The results showed that the optimum CART generates three terminal nodes. In the first terminal node, 12 people were classified as having heavy traffic accident characteristics, associated with single accidents. In the second terminal node, 68 people were classified as having minor traffic accident characteristics, by type of traffic accident (front-rear, front-front, front-side, pedestrians, side-side) and location of the accident on district and sub-district roads. In the third terminal node, 291 people were classified as having medium traffic accident characteristics, by type of traffic accident (front-rear, front-front, front-side, pedestrians, side-side) and location of the accident on municipality roads. The explanatory variable that acts as the main splitter of the CART is the type of traffic accident, with a maximum homogeneity measure of 0.03252.

  1. Human action analysis with randomized trees

    CERN Document Server

    Yu, Gang; Liu, Zicheng

    2014-01-01

    This book will provide a comprehensive overview of human action analysis with randomized trees. It will cover both supervised and unsupervised random trees. When a sufficient amount of labeled data is available, supervised random trees provide a fast method for space-time interest point matching. When labeled data is minimal, as in the case of example-based action search, unsupervised random trees are used to leverage the unlabelled data. We describe how the randomized trees can be used for action classification, action detection, action search, and action prediction.

  2. Bayesian additive decision trees of biomarker by treatment interactions for predictive biomarker detection and subgroup identification.

    Science.gov (United States)

    Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen

    2017-10-11

    Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. A Bayesian approach is utilized to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fit. A Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings compared with existing methods. We also demonstrate an application of our method in a real clinical trial.

  3. Differential Diagnosis of Erythmato-Squamous Diseases Using Classification and Regression Tree.

    Science.gov (United States)

    Maghooli, Keivan; Langarizadeh, Mostafa; Shahmoradi, Leila; Habibi-Koolaee, Mahdi; Jebraeily, Mohamad; Bouraghi, Hamid

    2016-10-01

    Differential diagnosis of Erythmato-Squamous Diseases (ESD) is a major challenge in the field of dermatology. ESD are placed into six different classes. Data mining is the process of detecting hidden patterns. In the case of ESD, data mining helps us to predict the diseases. Different algorithms have been developed for this purpose. We aimed to use the Classification and Regression Tree (CART) to predict the differential diagnosis of ESD. We used the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology. For this purpose, the dermatology data set was obtained from the UCI machine learning repository. The Clementine 12.0 software from IBM was used for modelling. To evaluate the model, we calculated its accuracy, sensitivity and specificity. The proposed model had an accuracy of 94.84% (. 24.42) in correctly predicting the ESD disease. The results indicated that this classifier could be useful. However, it is strongly recommended that a combination of machine learning methods be considered, as it could be more useful for the prediction of ESD.

  4. Groundwater level prediction of landslide based on classification and regression tree

    Directory of Open Access Journals (Sweden)

    Yannan Zhao

    2016-09-01

    Full Text Available Using groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area, the influential factors of groundwater level were selected based on the response relationship between factors such as rainfall and reservoir level and the change of groundwater level. A classification and regression tree (CART) model was then constructed from this subset of factors and used to predict the groundwater level. In verification, the predicted results for the test sample were consistent with the measured values, with a mean absolute error of 0.28 m and a relative error of 1.15%. For comparison, a support vector machine (SVM) model constructed using the same set of factors gave a mean absolute error of 1.53 m and a relative error of 6.11%. This indicates that the CART model not only has better fitting and generalization ability, but also has strong advantages in the analysis of landslide groundwater dynamic characteristics and the screening of important variables. It is an effective method for prediction of groundwater level in landslides.
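
    The two error measures used to compare the models above can be sketched as follows. This assumes the relative error is the mean of |measured - predicted| / |measured|, which the abstract does not state explicitly; the sample values and function names are hypothetical.

```python
def mean_absolute_error(measured, predicted):
    """Average absolute difference, in the units of the measurement (e.g. m)."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def mean_relative_error(measured, predicted):
    """Average absolute difference as a fraction of each measured value."""
    return sum(abs(m - p) / abs(m) for m, p in zip(measured, predicted)) / len(measured)


# Hypothetical groundwater levels in metres (not data from the study).
measured = [20.0, 25.0, 40.0]
predicted = [21.0, 24.0, 38.0]
print(mean_absolute_error(measured, predicted))
print(100 * mean_relative_error(measured, predicted))  # as a percentage
```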

  5. Proactive data mining with decision trees

    CERN Document Server

    Dahan, Haim; Rokach, Lior; Maimon, Oded

    2014-01-01

    This book explores a proactive and domain-driven method to classification tasks. This novel proactive approach to data mining not only induces a model for predicting or explaining a phenomenon, but also utilizes specific problem/domain knowledge to suggest specific actions to achieve optimal changes in the value of the target attribute. In particular, the authors suggest a specific implementation of the domain-driven proactive approach for classification trees. The book centers on the core idea of moving observations from one branch of the tree to another. It introduces a novel splitting crite

  6. Application of Object Based Classification and High Resolution Satellite Imagery for Savanna Ecosystem Analysis

    Directory of Open Access Journals (Sweden)

    Jane Southworth

    2010-12-01

    Full Text Available Savanna ecosystems are an important component of dryland regions and yet are exceedingly difficult to study using satellite imagery. Savannas are composed of varying amounts of trees, shrubs and grasses, and traditional classification schemes or vegetation indices typically cannot differentiate across class type. This research utilizes object based classification (OBC) for a region in Namibia, using IKONOS imagery, to help differentiate tree canopies, and therefore woodland savanna, from shrub or grasslands. The methodology involved the identification and isolation of tree canopies within the imagery, and the resulting tree polygon layers had an overall accuracy of 84%. In addition, the results were scaled up to a corresponding Landsat image of the same region, and the OBC results compared to corresponding pixel values of NDVI. The results were not compelling, indicating once more the problems of these traditional image analysis techniques for savanna ecosystems. Overall, the use of the OBC holds great promise for this ecosystem and could be utilized more frequently in studies of vegetation structure.

  7. Land Cover and Land Use Classification with TWOPAC: towards Automated Processing for Pixel- and Object-Based Image Classification

    Directory of Open Access Journals (Sweden)

    Stefan Dech

    2012-09-01

    Full Text Available We present a novel and innovative automated processing environment for the derivation of land cover (LC) and land use (LU) information. This processing framework, named TWOPAC (TWinned Object and Pixel based Automated classification Chain), enables the standardized, independent, user-friendly, and comparable derivation of LC and LU information, with minimized manual classification labor. TWOPAC allows classification of multi-spectral and multi-temporal remote sensing imagery from different sensor types. TWOPAC enables not only pixel-based classification, but also allows classification based on object-based characteristics. Classification is based on a Decision Tree (DT) approach for which the well-known C5.0 code has been implemented, which builds decision trees based on the concept of information entropy. TWOPAC enables automatic generation of the decision tree classifier based on a C5.0-retrieved ascii-file, as well as fully automatic validation of the classification output via sample based accuracy assessment. Envisaging the automated generation of standardized land cover products, as well as area-wide classification of large amounts of data in preferably a short processing time, standardized interfaces for process control, Web Processing Services (WPS), as introduced by the Open Geospatial Consortium (OGC), are utilized. TWOPAC’s functionality to process geospatial raster or vector data via web resources (server, network) enables TWOPAC’s usability independent of any commercial client or desktop software and allows for large scale data processing on servers. Furthermore, the components of TWOPAC were built-up using open source code components and are implemented as a plug-in for Quantum GIS software for easy handling of the classification process from the user’s perspective.

  8. Forest Tree Species Distribution Mapping Using Landsat Satellite Imagery and Topographic Variables with the Maximum Entropy Method in Mongolia

    Science.gov (United States)

    Hao Chiang, Shou; Valdez, Miguel; Chen, Chi-Farn

    2016-06-01

    Forest is a very important ecosystem and natural resource for living things. Based on forest inventories, government is able to make decisions to conserve, improve and manage forests in a sustainable way. Field work for forestry investigation is difficult and time consuming, because it needs intensive physical labor and the costs are high, especially when surveying in remote mountainous regions. A reliable forest inventory can give us more accurate and timely information to develop new and efficient approaches to forest management. Remote sensing technology has recently been used for forest investigation at a large scale. To produce an informative forest inventory, forest attributes, including tree species, must be considered. In this study the aim is to classify forest tree species in Erdenebulgan County, Huwsgul province in Mongolia, using the Maximum Entropy method. The study area is covered by a dense forest which is almost 70% of the total territorial extension of Erdenebulgan County and is located in a high mountain region in northern Mongolia. For this study, Landsat satellite imagery and a Digital Elevation Model (DEM) were acquired to perform tree species mapping. The forest tree species inventory map was collected from the Forest Division of the Mongolian Ministry of Nature and Environment as training data and also used as ground truth to perform the accuracy assessment of the tree species classification. Landsat images and DEM were processed for maximum entropy modeling, and this study applied the model in two experiments. The first one uses Landsat surface reflectance for tree species classification; the second experiment incorporates terrain variables in addition to the Landsat surface reflectance to perform the tree species classification. All experimental results were compared with the tree species inventory to assess the classification accuracy.
Results show that the second one which uses Landsat surface reflectance coupled

  9. FOREST TREE SPECIES DISTRIBUTION MAPPING USING LANDSAT SATELLITE IMAGERY AND TOPOGRAPHIC VARIABLES WITH THE MAXIMUM ENTROPY METHOD IN MONGOLIA

    Directory of Open Access Journals (Sweden)

    S. H. Chiang

    2016-06-01

    Full Text Available Forest is a very important ecosystem and natural resource for living things. Based on forest inventories, government is able to make decisions to conserve, improve and manage forests in a sustainable way. Field work for forestry investigation is difficult and time consuming, because it needs intensive physical labor and the costs are high, especially when surveying in remote mountainous regions. A reliable forest inventory can give us more accurate and timely information to develop new and efficient approaches to forest management. Remote sensing technology has recently been used for forest investigation at a large scale. To produce an informative forest inventory, forest attributes, including tree species, must be considered. In this study the aim is to classify forest tree species in Erdenebulgan County, Huwsgul province in Mongolia, using the Maximum Entropy method. The study area is covered by a dense forest which is almost 70% of the total territorial extension of Erdenebulgan County and is located in a high mountain region in northern Mongolia. For this study, Landsat satellite imagery and a Digital Elevation Model (DEM) were acquired to perform tree species mapping. The forest tree species inventory map was collected from the Forest Division of the Mongolian Ministry of Nature and Environment as training data and also used as ground truth to perform the accuracy assessment of the tree species classification. Landsat images and DEM were processed for maximum entropy modeling, and this study applied the model in two experiments. The first one uses Landsat surface reflectance for tree species classification; the second experiment incorporates terrain variables in addition to the Landsat surface reflectance to perform the tree species classification. All experimental results were compared with the tree species inventory to assess the classification accuracy.
Results show that the second one which uses Landsat surface

  10. Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

    Science.gov (United States)

    Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

    2015-08-01

    Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of the Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying a Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from the UCI machine learning databases. Moreover, the results of the PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine with a Radial Basis Function kernel, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest Neighbor). A repeated five-fold cross-validation method was used to assess the performance of the classifiers. Experimental results show that our proposed method not only improves the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods.
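
    The repeated five-fold cross-validation mentioned above can be sketched as an index generator. This is a generic illustration under assumed defaults (number of repeats, seed, and function name are all hypothetical); the abstract does not say how the folds were drawn or whether they were stratified.

```python
import random


def repeated_kfold_indices(n_samples, k=5, repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        # Reshuffle before every repeat so each repeat uses a new partition.
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for fold, test_idx in enumerate(folds):
            train_idx = [i for j, f in enumerate(folds) if j != fold for i in f]
            yield repeat, fold, train_idx, test_idx


# Each sample is held out exactly once per repeat.
for repeat, fold, train_idx, test_idx in repeated_kfold_indices(10, k=5, repeats=1):
    print(repeat, fold, sorted(test_idx))
```

    A classifier would be trained on `train_idx` and scored on `test_idx` in each iteration, with the scores averaged over all folds and repeats.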

  11. Synthesis of phylogeny and taxonomy into a comprehensive tree of life

    Science.gov (United States)

    Hinchliff, Cody E.; Smith, Stephen A.; Allman, James F.; Burleigh, J. Gordon; Chaudhary, Ruchi; Coghill, Lyndon M.; Crandall, Keith A.; Deng, Jiabin; Drew, Bryan T.; Gazis, Romina; Gude, Karl; Hibbett, David S.; Katz, Laura A.; Laughinghouse, H. Dail; McTavish, Emily Jane; Midford, Peter E.; Owen, Christopher L.; Ree, Richard H.; Rees, Jonathan A.; Soltis, Douglas E.; Williams, Tiffani; Cranston, Karen A.

    2015-01-01

    Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips—the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics. PMID:26385966

  12. Phylogenetic classification of bony fishes.

    Science.gov (United States)

    Betancur-R, Ricardo; Wiley, Edward O; Arratia, Gloria; Acero, Arturo; Bailly, Nicolas; Miya, Masaki; Lecointre, Guillaume; Ortí, Guillermo

    2017-07-06

    Fish classifications, like those of most other taxonomic groups, are being transformed drastically as new molecular phylogenies provide support for natural groups that were unanticipated by previous studies. A brief review of the main criteria used by ichthyologists to define their classifications during the last 50 years, however, reveals slow progress towards using an explicit phylogenetic framework. Instead, the trend has been to rely, in varying degrees, on deep-rooted anatomical concepts and authority, often mixing taxa with explicit phylogenetic support with arbitrary groupings. Two leading sources in ichthyology frequently used for fish classifications (JS Nelson's volumes of Fishes of the World and W. Eschmeyer's Catalog of Fishes) fail to adopt a global phylogenetic framework despite much recent progress made towards the resolution of the fish Tree of Life. The first explicit phylogenetic classification of bony fishes was published in 2013, based on a comprehensive molecular phylogeny ( www.deepfin.org ). We here update the first version of that classification by incorporating the most recent phylogenetic results. The updated classification presented here is based on phylogenies inferred using molecular and genomic data for nearly 2000 fishes. A total of 72 orders (and 79 suborders) are recognized in this version, compared with 66 orders in version 1. The phylogeny resolves placement of 410 families, or ~80% of the total of 514 families of bony fishes currently recognized. The ordinal status of 30 percomorph families included in this study, however, remains uncertain (incertae sedis in the series Carangaria, Ovalentaria, or Eupercaria). Comments to support taxonomic decisions and comparisons with conflicting taxonomic groups proposed by others are presented. We also highlight cases where morphological support exists for the groups being classified. This version of the phylogenetic classification of bony fishes is substantially improved, providing resolution

  13. SPORT FOOD ADDITIVE CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    I. P. Prokopenko

    2015-01-01

    Full Text Available Correctly organized nutritive and pharmacological support is an important component of an athlete's preparation for competitions, optimal shape maintenance, fast recovery and rehabilitation after traumas and defatigation. Special products of enhanced biological value (BAS) for athletes' nutrition are used for this purpose. Easy-to-use energy sources are administered into the athlete's organism, along with yielded materials and biologically active substances which regulate and activate exchange reactions that proceed with difficulty during certain physical training. The article presents a classification of sport supplements that can be used before warm-up and training, after training and in competition breaks.

  14. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Directory of Open Access Journals (Sweden)

    Lavner Yizhar

    2009-01-01

    Full Text Available We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation for our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the test phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm on a database of more than 12 hours of speech and more than 22 hours of music showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.
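
    The smoothing step described above, averaging each segment's decision with past decisions, can be sketched as a causal moving average over binary labels. The window length, threshold, label encoding and function name are assumptions made for illustration; the abstract does not give the actual smoothing parameters.

```python
from collections import deque


def smooth_decisions(raw, window=3, threshold=0.5):
    """Smooth binary per-segment decisions with a causal moving average.

    raw: iterable of 0 (speech) / 1 (music) initial decisions, one per segment.
    Only the current and past segments are used, so this works in real time.
    """
    recent = deque(maxlen=window)
    smoothed = []
    for decision in raw:
        recent.append(decision)
        average = sum(recent) / len(recent)
        smoothed.append(1 if average > threshold else 0)
    return smoothed


# Hypothetical initial decisions with isolated flips at positions 2 and 8.
raw = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1]
print(smooth_decisions(raw))
```

    The isolated flips are suppressed, at the cost of a one-segment delay before a genuine speech-to-music transition is reported.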

  15. Spatial and temporal variation of light inside peach trees

    International Nuclear Information System (INIS)

    Genard, M.; Baret, F.

    1994-01-01

    Gap fractions measured with hemispherical photographs were used to describe spatial and temporal variations of diffuse and direct light fractions transmitted to shoots within peach trees. For both cultivars studied, spatial variability of daily diffuse and direct light transmitted to shoots was very high within the tree. Diffuse and daily direct light fractions transmitted to shoots increased with shoot height within the tree and were higher for more erect shoots. Temporal variations of hourly direct light were also large among shoots. Hourly direct light fractions transmitted to shoots were analyzed using recent developments in multivariate exploratory analysis. A gradient was observed between shoots sunlit almost all day and other shoots almost never sunlit. Well sunlit shoots were mostly located at the top of the tree and were more erect. Shoots located in the outer parts of the tree crown were slightly but significantly more sunlit than others for one cultivar. Principal component analysis additionally discriminated shoots according to the time of the day they were sunlit. This classification was related to shoot compass position for one cultivar. Spatial location of the shoot in the tree explained only a small part of light climate variability. Consequences of modeling light climate within the tree are discussed.

  16. An overview of decision tree applied to power systems

    DEFF Research Database (Denmark)

    Liu, Leo; Rather, Zakir Hussain; Chen, Zhe

    2013-01-01

    The corrosive volume of available data in electric power systems motivates the adoption of data mining techniques in the emerging field of power system data analytics. The mainstream data mining algorithm applied to power systems, Decision Tree (DT), also named Classification And Regression...... Tree (CART), has gained increasing interest because of its high performance in terms of computational efficiency, uncertainty manageability, and interpretability. This paper presents an overview of a variety of DT applications to power systems for better interfacing of power systems with data...... analytics. The fundamental knowledge of the CART algorithm is also introduced, followed by examples of both classification trees and regression trees with the help of a case study on security assessment of the Danish power system....

  17. Stock Picking via Nonsymmetrically Pruned Binary Decision Trees

    OpenAIRE

    Anton Andriyashin

    2008-01-01

    Stock picking is the field of financial analysis that is of particular interest for many professional investors and researchers. In this study stock picking is implemented via binary classification trees. Optimal tree size is believed to be the crucial factor in forecasting performance of the trees. While there exists a standard method of tree pruning, which is based on the cost-complexity tradeoff and used in the majority of studies employing binary decision trees, this paper introduces a no...

  18. On Internet Traffic Classification: A Two-Phased Machine Learning Approach

    Directory of Open Access Journals (Sweden)

    Taimur Bakhshi

    2016-01-01

    Full Text Available Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification, which requires additional packet-level information, host behaviour analysis, and specialized hardware, limiting their practical adoption. This paper aims to overcome these challenges by proposing a two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0 based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases, increasing to 96.67% with adaptive boosting. The classifier specificity factor, which accounted for differentiating content specific from supplementary flows, ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.

  19. The Bird Core for Minimum Cost Spanning Tree problems Revisited : Monotonicity and Additivity Aspects

    NARCIS (Netherlands)

    Tijs, S.H.; Moretti, S.; Brânzei, R.; Norde, H.W.

    2005-01-01

    A new way is presented to define for minimum cost spanning tree (mcst-) games the irreducible core, which was introduced by Bird in 1976. The Bird core correspondence turns out to have interesting monotonicity and additivity properties and each stable cost monotonic allocation rule for mcst-problems

  20. Combining evolutionary algorithms with oblique decision trees to detect bent-double galaxies

    Science.gov (United States)

    Cantu-Paz, Erick; Kamath, Chandrika

    2000-10-01

    Decision trees have long been popular in classification as they use simple and easy-to-understand tests at each node. Most variants of decision trees test a single attribute at a node, leading to axis-parallel trees, where the test results in a hyperplane which is parallel to one of the dimensions in the attribute space. These trees can be rather large and inaccurate in cases where the concept to be learned is best approximated by oblique hyperplanes. In such cases, it may be more appropriate to use an oblique decision tree, where the decision at each node is a linear combination of the attributes. Oblique decision trees have not gained wide popularity in part due to the complexity of constructing good oblique splits and the tendency of existing splitting algorithms to get stuck in local minima. Several alternatives have been proposed to handle these problems including randomization in conjunction with deterministic hill-climbing and the use of simulated annealing. In this paper, we use evolutionary algorithms (EAs) to determine the split. EAs are well suited for this problem because of their global search properties, their tolerance to noisy fitness evaluations, and their scalability to large dimensional search spaces. We demonstrate our technique on a synthetic data set, and then we apply it to a practical problem from astronomy, namely, the classification of galaxies with a bent-double morphology. In addition, we describe our experiences with several split evaluation criteria. Our results suggest that, in some cases, the evolutionary approach is faster and more accurate than existing oblique decision tree algorithms. However, for our astronomical data, the accuracy is not significantly different than the axis-parallel trees.
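
    The difference between an axis-parallel and an oblique split can be illustrated with a small sketch. It is not the authors' algorithm: it swaps in a very simple elitist (1+1) evolution strategy, assumes binary 0/1 labels, and uses weighted Gini impurity as the fitness. The function names, the mutation scale `sigma`, and the number of generations are all hypothetical choices made for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def split_impurity(points, labels, weights, threshold):
    """Weighted Gini impurity of the split  w . x <= threshold."""
    left, right = [], []
    for x, y in zip(points, labels):
        value = sum(w * v for w, v in zip(weights, x))
        (left if value <= threshold else right).append(y)
    n = len(labels)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def evolve_oblique_split(points, labels, generations=200, sigma=0.5, seed=0):
    """Elitist (1+1) evolution strategy over the genome [w_1..w_d, threshold].

    Assumption: this is an illustrative stand-in, not the EA from the paper.
    The search starts from the axis-parallel test  x_0 <= 0.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    best = [1.0] + [0.0] * (dim - 1) + [0.0]
    best_fit = split_impurity(points, labels, best[:-1], best[-1])
    for _ in range(generations):
        # Mutate every gene with Gaussian noise; keep the child if not worse.
        child = [g + rng.gauss(0.0, sigma) for g in best]
        fit = split_impurity(points, labels, child[:-1], child[-1])
        if fit <= best_fit:
            best, best_fit = child, fit
    return best, best_fit
```

    Because the child replaces the parent only when it is at least as good, the returned impurity can never exceed that of the initial axis-parallel split; whether it reaches a clean oblique boundary depends on the data and the random seed.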

  1. Decision Tree Technique for Particle Identification

    International Nuclear Information System (INIS)

    Quiller, Ryan

    2003-01-01

    Particle identification based on measurements such as the Cerenkov angle, momentum, and the rate of energy loss per unit distance (-dE/dx) is fundamental to the BaBar detector for particle physics experiments. It is particularly important to separate the charged forms of kaons and pions. Currently, the Neural Net, an algorithm based on mapping input variables to an output variable using hidden variables as intermediaries, is one of the primary tools used for identification. In this study, a decision tree classification technique implemented in the computer program, CART, was investigated and compared to the Neural Net over the range of momenta, 0.25 GeV/c to 5.0 GeV/c. For a given subinterval of momentum, three decision trees were made using different sets of input variables. The sensitivity and specificity were calculated for varying kaon acceptance thresholds. This data was used to plot Receiver Operating Characteristic curves (ROC curves) to compare the performance of the classification methods. Also, input variables used in constructing the decision trees were analyzed. It was found that the Neural Net was a significant contributor to decision trees using dE/dx and the Cerenkov angle as inputs. Furthermore, the Neural Net had poorer performance than the decision tree technique, but tended to improve decision tree performance when used as an input variable. These results suggest that the decision tree technique using Neural Net input may possibly increase accuracy of particle identification in BaBar
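
    The ROC construction described above, sensitivity and specificity evaluated at varying acceptance thresholds, can be sketched as follows. This is a generic illustration rather than the study's code: the scores, labels, and threshold values are hypothetical, and a score at or above the threshold is assumed to mean "accept as kaon" (label 1).

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, sensitivity, specificity) for each threshold.

    Assumes labels are 0/1 and that both classes are present, so neither
    denominator is zero.
    """
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        sensitivity = tp / (tp + fn)  # true positive rate
        specificity = tn / (tn + fp)  # true negative rate
        points.append((t, sensitivity, specificity))
    return points
```

    Plotting sensitivity against 1 - specificity for the returned points gives the ROC curve used to compare classifiers.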

  2. Recruiting Conventional Tree Architecture Models into State-of-the-Art LiDAR Mapping for Investigating Tree Growth Habits in Structure.

    Science.gov (United States)

    Lin, Yi; Jiang, Miao; Pellikka, Petri; Heiskanen, Janne

    2018-01-01

    Mensuration of tree growth habits is of considerable importance for understanding forest ecosystem processes and forest biophysical responses to climate changes. However, the complexity of tree crown morphology that is typically formed after many years of growth tends to render it a non-trivial task, even for the state-of-the-art 3D forest mapping technology, light detection and ranging (LiDAR). Fortunately, botanists have deduced the large structural diversity of tree forms into only a limited number of tree architecture models, which can present a-priori knowledge about tree structure, growth, and other attributes for different species. This study attempted to recruit Hallé architecture models (HAMs) into LiDAR mapping to investigate tree growth habits in structure. First, following the HAM-characterized tree structure organization rules, we ran the kernel procedure of tree species classification based on the LiDAR-collected point clouds using a support vector machine classifier in the leave-one-out-for-cross-validation mode. Then, the HAM corresponding to each of the classified tree species was identified based on expert knowledge, assisted by the comparison of the LiDAR-derived feature parameters. Next, the tree growth habits in structure for each of the tree species were derived from the determined HAM. In the case of four tree species growing in the boreal environment, the tests indicated that the classification accuracy reached 85.0%, and their growth habits could be derived by qualitative and quantitative means. Overall, the strategy of recruiting conventional HAMs into LiDAR mapping for investigating tree growth habits in structure was validated, thereby paving a new way for efficiently reflecting tree growth habits and projecting forest structure dynamics.

  3. The gravity apple tree

    International Nuclear Information System (INIS)

    Aldama, Mariana Espinosa

    2015-01-01

    The gravity apple tree is a genealogical tree of the gravitation theories developed during the past century. The graphic representation is full of information such as guides in heuristic principles, names of main proponents, dates and references for original articles (See under Supplementary Data for the graphic representation). This visual presentation and its particular classification allows a quick synthetic view for a plurality of theories, many of them well validated in the Solar System domain. Its diachronic structure organizes information in a shape of a tree following similarities through a formal concept analysis. It can be used for educational purposes or as a tool for philosophical discussion. (paper)

  4. A renewed perspective on agroforestry concepts and classification.

    Science.gov (United States)

    Torquebiau, E F

    2000-11-01

    Agroforestry, the association of trees with farming practices, is progressively becoming a recognized land-use discipline. However, it is still perceived by some scientists, technicians and farmers as a sort of environmental fashion which does not deserve credit. The peculiar history of agroforestry and the complex relationships between agriculture and forestry explain some misunderstandings about the concepts and classification of agroforestry and reveal that, contrarily to common perception, agroforestry is closer to agriculture than to forestry. Based on field experience from several countries, a structural classification of agroforestry into six simple categories is proposed: crops under tree cover, agroforests, agroforestry in a linear arrangement, animal agroforestry, sequential agroforestry and minor agroforestry techniques. It is argued that this pragmatic classification encompasses all major agroforestry associations and allows simultaneous agroforestry to be clearly differentiated from sequential agroforestry, two categories showing contrasting ecological tree-crop interactions. It can also contribute to a betterment of the image of agroforestry and lead to a simplification of its definition.

  5. An automated approach to the design of decision tree classifiers

    Science.gov (United States)

    Argentiero, P.; Chin, R.; Beaudet, P.

    1982-01-01

    An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
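
    One way to read the independence assumption mentioned above is that the probability of correctly classifying a class is the product of the correct-decision probabilities at each node on its path, and the global probability is the prior-weighted sum over classes. The sketch below encodes that reading only; it is an interpretation, not the authors' published algorithm, and the function name and input layout are hypothetical.

```python
import math


def global_correct_probability(priors, path_probabilities):
    """Prior-weighted probability of correct classification.

    priors: dict mapping class name -> a priori class probability.
    path_probabilities: dict mapping class name -> list of per-node
        probabilities of a correct decision along that class's path.
    Assumption: decisions at different nodes are statistically independent,
    so probabilities along a path multiply.
    """
    return sum(
        priors[c] * math.prod(path_probabilities[c]) for c in priors
    )
```

    For two equally likely classes with path probabilities [0.9, 0.8] and [0.9], this gives 0.5 * 0.72 + 0.5 * 0.9 = 0.81.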

  6. Classification and Analysis of Computer Network Traffic

    DEFF Research Database (Denmark)

    Bujlow, Tomasz

    2014-01-01

    various classification modes (decision trees, rulesets, boosting, softening thresholds) regarding the classification accuracy and the time required to create the classifier. We showed how to use our VBS tool to obtain per-flow, per-application, and per-content statistics of traffic in computer networks...

  7. Forest tree species discrimination in western Himalaya using EO-1 Hyperion

    Science.gov (United States)

    George, Rajee; Padalia, Hitendra; Kushwaha, S. P. S.

    2014-05-01

    The information acquired in the narrow bands of hyperspectral remote sensing data has potential to capture plant species spectral variability, thereby improving forest tree species mapping. This study assessed the utility of spaceborne EO-1 Hyperion data in discrimination and classification of broadleaved evergreen and conifer forest tree species in western Himalaya. The pre-processing of 242 bands of Hyperion data resulted into 160 noise-free and vertical stripe corrected reflectance bands. Of these, 29 bands were selected through step-wise exclusion of bands (Wilk's Lambda). Spectral Angle Mapper (SAM) and Support Vector Machine (SVM) algorithms were applied to the selected bands to assess their effectiveness in classification. SVM was also applied to broadband data (Landsat TM) to compare the variation in classification accuracy. All commonly occurring six gregarious tree species, viz., white oak, brown oak, chir pine, blue pine, cedar and fir in western Himalaya could be effectively discriminated. SVM produced a better species classification (overall accuracy 82.27%, kappa statistic 0.79) than SAM (overall accuracy 74.68%, kappa statistic 0.70). It was noticed that classification accuracy achieved with Hyperion bands was significantly higher than Landsat TM bands (overall accuracy 69.62%, kappa statistic 0.65). Study demonstrated the potential utility of narrow spectral bands of Hyperion data in discriminating tree species in a hilly terrain.
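
    The two figures of merit quoted above, overall accuracy and the kappa statistic, are both computed from a confusion matrix. The following sketch shows the standard Cohen's kappa calculation; the example matrix in the note below is invented for illustration and is not taken from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i that
    were assigned to class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(row[j] for row in confusion) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

    For the hypothetical matrix [[40, 10], [10, 40]], the observed agreement is 0.8, the chance agreement is 0.5, and kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.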

  8. Classification tree analysis of second neoplasms in survivors of childhood cancer

    International Nuclear Information System (INIS)

    Jazbec, Janez; Todorovski, Ljupčo; Jereb, Berta

    2007-01-01

    Reports on childhood cancer survivors estimate that the cumulative probability of developing secondary neoplasms varies from 3.3% to 25% at 25 years from diagnosis, and that the risk of developing another cancer is several times greater than in the general population. In our retrospective study, we used the classification tree multivariate method on a group of 849 first cancer survivors to identify childhood cancer patients with the greatest risk of developing secondary neoplasms. In the observed group of patients, 34 developed a secondary neoplasm after treatment of the primary cancer. Analysis of parameters present at the treatment of the first cancer exposed two groups of patients at special risk for secondary neoplasm. The first are female patients treated for Hodgkin's disease at an age between 10 and 15 years, whose treatment included radiotherapy. The second group at special risk were male patients with acute lymphoblastic leukemia who were treated at an age between 4.6 and 6.6 years. The risk groups identified in our study are similar to the results of studies that used more conventional approaches. The usefulness of our approach in the study of the occurrence of second neoplasms should be confirmed in a larger sample study, but the user-friendly presentation of results makes it attractive for further studies.

  9. The importance of chemosensory clues in Aguaruna tree classification and identification.

    Science.gov (United States)

    Jernigan, Kevin A

    2008-05-03

    The ethnobotanical literature still contains few detailed descriptions of the sensory criteria people use for judging membership in taxonomic categories. Olfactory criteria in particular have been explored very little. This paper will describe the importance of odor for woody plant taxonomy and identification among the Aguaruna Jívaro of the northern Peruvian Amazon, focusing on the Aguaruna category númi (trees excluding palms). Aguaruna informants almost always place trees that they consider to have a similar odor together as kumpají - 'companions,' a metaphor they use to describe trees that they consider to be related. The research took place in several Aguaruna communities in the upper Marañón region of the Peruvian Amazon. Structured interview data focus on informant criteria for membership in various folk taxa of trees. Informants were also asked to explain what members of each group of related companions had in common. This paper focuses on odor and taste criteria that came to light during these structured interviews. Botanical voucher specimens were collected, wherever possible. Of the 182 tree folk genera recorded in this study, 51 (28%) were widely considered to possess a distinctive odor. Thirty nine of those (76%) were said to have odors similar to some other tree, while the other 24% had unique odors. Aguaruna informants very rarely described tree odors in non-botanical terms. Taste was used mostly to describe trees with edible fruits. Trees judged to be related were nearly always in the same botanical family. The results of this study illustrate that odor of bark, sap, flowers, fruit and leaves are important clues that help the Aguaruna to judge the relatedness of trees found in their local environment. In contrast, taste appears to play a more limited role. The results suggest a more general ethnobotanical hypothesis that could be tested in other cultural settings: people tend to consider plants with similar odors to be related, but say that

  10. The importance of chemosensory clues in Aguaruna tree classification and identification

    Directory of Open Access Journals (Sweden)

    Jernigan Kevin A

    2008-05-01

    Background: The ethnobotanical literature still contains few detailed descriptions of the sensory criteria people use for judging membership in taxonomic categories. Olfactory criteria in particular have been explored very little. This paper will describe the importance of odor for woody plant taxonomy and identification among the Aguaruna Jívaro of the northern Peruvian Amazon, focusing on the Aguaruna category númi (trees excluding palms). Aguaruna informants almost always place trees that they consider to have a similar odor together as kumpají – 'companions,' a metaphor they use to describe trees that they consider to be related. Methods: The research took place in several Aguaruna communities in the upper Marañón region of the Peruvian Amazon. Structured interview data focus on informant criteria for membership in various folk taxa of trees. Informants were also asked to explain what members of each group of related companions had in common. This paper focuses on odor and taste criteria that came to light during these structured interviews. Botanical voucher specimens were collected, wherever possible. Results: Of the 182 tree folk genera recorded in this study, 51 (28%) were widely considered to possess a distinctive odor. Thirty nine of those (76%) were said to have odors similar to some other tree, while the other 24% had unique odors. Aguaruna informants very rarely described tree odors in non-botanical terms. Taste was used mostly to describe trees with edible fruits. Trees judged to be related were nearly always in the same botanical family. Conclusion: The results of this study illustrate that odor of bark, sap, flowers, fruit and leaves are important clues that help the Aguaruna to judge the relatedness of trees found in their local environment. In contrast, taste appears to play a more limited role. The results suggest a more general ethnobotanical hypothesis that could be tested in other cultural settings: people tend to

  11. Tree detection in urban regions from aerial imagery and DSM based on local maxima points

    Science.gov (United States)

    Korkmaz, Özgür; Yardımcı Çetin, Yasemin; Yilmaz, Erdal

    2017-05-01

    In this study, we propose an automatic approach for tree detection and classification in registered 3-band aerial images and associated digital surface models (DSM). The tree detection results can be used in 3D city modelling and urban planning. This problem is magnified when trees are in close proximity to each other or other objects such as rooftops in the scenes. This study presents a method for locating individual trees and estimating crown size based on local maxima from the DSM accompanied by color and texture information. For this purpose, a segment-level classifier is trained for 10 classes, and the classification results are improved by analyzing the class probabilities of neighbouring segments. Later, the tree classes under a certain height are eliminated using the Digital Terrain Model (DTM). For the tree classes, local maxima points are obtained and the tree radius is estimated from the vertical and horizontal height profiles passing through these points. The final tree list containing the centers and radii of the trees is obtained by selecting from the list of tree candidates according to the overlapping and selection parameters. Although only a limited number of training sets is used in this study, the tree classification and localization results are competitive.
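
    The local-maxima step can be illustrated with a minimal sketch on a gridded DSM. It is an assumption-laden simplification, not the authors' method: the DSM is taken to be a plain list of rows of heights, a cell counts as a peak only if it is strictly higher than all of its up to eight neighbours, and the `min_height` cutoff is a hypothetical stand-in for the DTM-based height filter.

```python
def local_maxima(dsm, min_height=2.0):
    """Return (row, col, height) for each strict local maximum in the grid.

    dsm: list of equal-length rows of heights.
    min_height: hypothetical cutoff below which cells are ignored.
    """
    rows, cols = len(dsm), len(dsm[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            h = dsm[r][c]
            if h < min_height:
                continue
            # Collect the 8-connected neighbours that lie inside the grid.
            neighbours = [
                dsm[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(h > n for n in neighbours):
                peaks.append((r, c, h))
    return peaks
```

    Each returned peak is a candidate tree centre; estimating the crown radius from height profiles through the peak would be a separate step.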

  12. Recruiting Conventional Tree Architecture Models into State-of-the-Art LiDAR Mapping for Investigating Tree Growth Habits in Structure

    Directory of Open Access Journals (Sweden)

    Yi Lin

    2018-02-01

    Mensuration of tree growth habits is of considerable importance for understanding forest ecosystem processes and forest biophysical responses to climate changes. However, the complexity of tree crown morphology that is typically formed after many years of growth tends to render it a non-trivial task, even for the state-of-the-art 3D forest mapping technology, light detection and ranging (LiDAR). Fortunately, botanists have deduced the large structural diversity of tree forms into only a limited number of tree architecture models, which can present a-priori knowledge about tree structure, growth, and other attributes for different species. This study attempted to recruit Hallé architecture models (HAMs) into LiDAR mapping to investigate tree growth habits in structure. First, following the HAM-characterized tree structure organization rules, we ran the kernel procedure of tree species classification based on the LiDAR-collected point clouds using a support vector machine classifier in the leave-one-out-for-cross-validation mode. Then, the HAM corresponding to each of the classified tree species was identified based on expert knowledge, assisted by the comparison of the LiDAR-derived feature parameters. Next, the tree growth habits in structure for each of the tree species were derived from the determined HAM. In the case of four tree species growing in the boreal environment, the tests indicated that the classification accuracy reached 85.0%, and their growth habits could be derived by qualitative and quantitative means. Overall, the strategy of recruiting conventional HAMs into LiDAR mapping for investigating tree growth habits in structure was validated, thereby paving a new way for efficiently reflecting tree growth habits and projecting forest structure dynamics.

  13. Extensions and applications of ensemble-of-trees methods in machine learning

    Science.gov (United States)

    Bleich, Justin

    Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of

  14. Identifying Domain-General and Domain-Specific Predictors of Low Mathematics Performance: A Classification and Regression Tree Analysis

    Directory of Open Access Journals (Sweden)

    David J. Purpura

    2017-12-01

    Many children struggle to successfully acquire early mathematics skills. Theoretical and empirical evidence has pointed to deficits in domain-specific skills (e.g., non-symbolic mathematics skills) or domain-general skills (e.g., executive functioning and language) as underlying low mathematical performance. In the current study, we assessed a sample of 113 three- to five-year old preschool children on a battery of domain-specific and domain-general factors in the fall and spring of their preschool year to identify Time 1 (fall) factors associated with low performance in mathematics knowledge at Time 2 (spring). We used the exploratory approach of classification and regression tree analyses, a strategy that uses step-wise partitioning to create subgroups from a larger sample using multiple predictors, to identify the factors that were the strongest classifiers of low performance for younger and older preschool children. Results indicated that the most consistent classifier of low mathematics performance at Time 2 was children’s Time 1 mathematical language skills. Further, other distinct classifiers of low performance emerged for younger and older children. These findings suggest that risk classification for low mathematics performance may differ depending on children’s age.

  15. Mapping Urban Tree Canopy Cover Using Fused Airborne LIDAR and Satellite Imagery Data

    Science.gov (United States)

    Parmehr, Ebadat G.; Amati, Marco; Fraser, Clive S.

    2016-06-01

    Urban green spaces, particularly urban trees, play a key role in enhancing the liveability of cities. The availability of accurate and up-to-date maps of tree canopy cover is important for sustainable development of urban green spaces. LiDAR point clouds are widely used for the mapping of buildings and trees, and several LiDAR point cloud classification techniques have been proposed for automatic mapping. However, the effectiveness of point cloud classification techniques for automated tree extraction from LiDAR data can be impacted to the point of failure by the complexity of tree canopy shapes in urban areas. Multispectral imagery, which provides complementary information to LiDAR data, can improve point cloud classification quality. This paper proposes a reliable method for the extraction of tree canopy cover from fused LiDAR point cloud and multispectral satellite imagery data. The proposed method initially associates each LiDAR point with spectral information from the co-registered satellite imagery data. It calculates the normalised difference vegetation index (NDVI) value for each LiDAR point and corrects tree points which have been misclassified as buildings. Then, region growing of tree points, taking the NDVI value into account, is applied. Finally, the LiDAR points classified as tree points are utilised to generate a canopy cover map. The performance of the proposed tree canopy cover mapping method is experimentally evaluated on a data set of airborne LiDAR and WorldView 2 imagery covering a suburb in Melbourne, Australia.
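
    The NDVI-based correction step can be sketched as below. NDVI is the standard (NIR - Red) / (NIR + Red) ratio; everything else is an assumption made for illustration: each LiDAR point is a dict with hypothetical keys `label`, `nir`, and `red`, and the 0.3 threshold is an arbitrary placeholder, since the abstract does not state one.

```python
def ndvi(nir, red):
    """Normalised difference vegetation index; 0.0 when both bands are 0."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def correct_labels(points, threshold=0.3):
    """Relabel 'building' points whose NDVI exceeds the threshold as 'tree'.

    points: list of dicts with keys 'label', 'nir', 'red' (hypothetical
    layout). Returns new dicts; the input list is left unchanged.
    """
    corrected = []
    for p in points:
        label = p["label"]
        if label == "building" and ndvi(p["nir"], p["red"]) > threshold:
            label = "tree"
        corrected.append({**p, "label": label})
    return corrected
```

    Region growing over the corrected tree points, as described in the abstract, would follow as a separate step and is not shown here.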

  16. The Tree of Industrial Life

    DEFF Research Database (Denmark)

    Andersen, Esben Sloth

    2002-01-01

    The purpose of this paper is to bring forth an interaction between evolutionary economics and industrial systematics. The suggested solution is to reconstruct the "family tree" of the industries. Such a tree is based on similarities, but it may also reflect the evolutionary history in industries....... For this purpose the paper shows how matrices of input-output coefficients can be transformed into binary characteristics matrices and to distance matrices, and it also discusses the possible evolutionary meaning of this translation. Then these derived matrices are used as inputs to algorithms for the heuristic...... finding of optimal industrial trees. The results are presented as taxonomic trees that can easily be compared with the hierarchical structure of existing systems of industrial classification....
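
    The transformation chain mentioned above, from input-output coefficients to binary characteristics to distances, can be sketched as follows. The abstract does not say how the binarisation or the distance is defined, so both choices here are assumptions: a coefficient at or above a hypothetical threshold becomes 1, and the distance between two industries is the Hamming distance between their binary rows.

```python
def binarize(coefficients, threshold=0.05):
    """Turn a matrix of input-output coefficients into 0/1 characteristics.

    threshold is a hypothetical cutoff, not a value from the paper.
    """
    return [[1 if a >= threshold else 0 for a in row] for row in coefficients]


def hamming_distance_matrix(binary):
    """Pairwise Hamming distances between the rows of a binary matrix."""
    n = len(binary)
    return [
        [sum(a != b for a, b in zip(binary[i], binary[j])) for j in range(n)]
        for i in range(n)
    ]
```

    The resulting symmetric distance matrix is the kind of input that tree-building heuristics such as hierarchical clustering accept.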

  17. Shopping intention prediction using decision trees

    Directory of Open Access Journals (Sweden)

    Dario Šebalj

    2017-09-01

    Introduction: Price is considered to be a neglected marketing mix element due to the complexity of price management and the sensitivity of customers to price changes. It draws the fastest customer reactions to such changes. Accordingly, the process of making shopping decisions can be very challenging for the customer. Objective: The aim of this paper is to create a model that is able to predict shopping intention and classify respondents into one of two categories, depending on whether they intend to shop or not. Methods: The data sample consists of 305 respondents, who are persons older than 18 years involved in buying groceries for their household. The research was conducted in February 2017. In order to create a model, the decision trees method was used with several of its classification algorithms. Results: All models, except the one that used the RandomTree algorithm, achieved a relatively high classification rate (over 80%). The J48 and RandomForest algorithms gave the highest classification accuracy, 84.75%. Since there is no statistically significant difference between those two algorithms, the authors decided to choose the J48 algorithm and build a decision tree. Conclusions: The value for money and price level in the store were the most significant variables for classification of shopping intention. A future study plans to compare this model with some other data mining techniques, such as neural networks or support vector machines, since these techniques achieved very good accuracy in some previous research in this field.

  18. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    Science.gov (United States)

    2010-01-01

    Background Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for

  19. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    Directory of Open Access Journals (Sweden)

    Dawyndt Peter

    2010-01-01

    Background: Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results: In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the

  20. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

    Science.gov (United States)

    Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

    2010-01-30

    Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial

  1. A machine learning approach to galaxy-LSS classification - I. Imprints on halo merger trees

    Science.gov (United States)

    Hui, Jianan; Aragon, Miguel; Cui, Xinping; Flegal, James M.

    2018-04-01

    The cosmic web plays a major role in the formation and evolution of galaxies and defines, to a large extent, their properties. However, the relation between galaxies and environment is still not well understood. Here, we present a machine learning approach to study imprints of environmental effects on the mass assembly of haloes. We present a galaxy-LSS machine learning classifier based on galaxy properties sensitive to the environment. We then use the classifier to assess the relevance of each property. Correlations between galaxy properties and their cosmic environment can be used to predict galaxy membership to void/wall or filament/cluster with an accuracy of 93 per cent. Our study unveils environmental information encoded in properties of haloes not normally considered directly dependent on the cosmic environment such as merger history and complexity. Understanding the physical mechanism by which the cosmic web is imprinted in a halo can lead to significant improvements in galaxy formation models. This is accomplished by extracting features from galaxy properties and merger trees, computing feature scores for each feature and then applying support vector machine (SVM) to different feature sets. To this end, we have discovered that the shape and depth of the merger tree, formation time, and density of the galaxy are strongly associated with the cosmic environment. We describe a significant improvement in the original classification algorithm by performing LU decomposition of the distance matrix computed by the feature vectors and then using the output of the decomposition as input vectors for SVM.

  2. Unrealistic phylogenetic trees may improve phylogenetic footprinting.

    Science.gov (United States)

    Nettling, Martin; Treutler, Hendrik; Cerquides, Jesus; Grosse, Ivo

    2017-06-01

    The computational investigation of DNA binding motifs from binding sites is one of the classic tasks in bioinformatics and a prerequisite for understanding gene regulation as a whole. Due to the development of sequencing technologies and the increasing number of available genomes, approaches based on phylogenetic footprinting become increasingly attractive. Phylogenetic footprinting requires phylogenetic trees with attached substitution probabilities for quantifying the evolution of binding sites, but these trees and substitution probabilities are typically not known and cannot be estimated easily. Here, we investigate the influence of phylogenetic trees with different substitution probabilities on the classification performance of phylogenetic footprinting using synthetic and real data. For synthetic data we find that the classification performance is highest when the substitution probability used for phylogenetic footprinting is similar to that used for data generation. For real data, however, we typically find that the classification performance of phylogenetic footprinting surprisingly increases with increasing substitution probabilities and is often highest for unrealistically high substitution probabilities close to one. This finding suggests that choosing realistic model assumptions might not always yield optimal predictions in general and that choosing unrealistically high substitution probabilities close to one might actually improve the classification performance of phylogenetic footprinting. The proposed PF is implemented in JAVA and can be downloaded from https://github.com/mgledi/PhyFoo. Contact: martin.nettling@informatik.uni-halle.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  3. Building optimal regression tree by ant colony system-genetic algorithm: Application to modeling of melting points

    Energy Technology Data Exchange (ETDEWEB)

    Hemmateenejad, Bahram, E-mail: hemmatb@sums.ac.ir [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of); Medicinal and Natural Products Chemistry Research Center, Shiraz University of Medical Sciences, Shiraz (Iran, Islamic Republic of); Shamsipur, Mojtaba [Department of Chemistry, Razi University, Kermanshah (Iran, Islamic Republic of); Zare-Shahabadi, Vali [Young Researchers Club, Mahshahr Branch, Islamic Azad University, Mahshahr (Iran, Islamic Republic of); Akhond, Morteza [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of)

    2011-10-17

    Highlights: Ant colony systems help to build optimum classification and regression trees. Using genetic algorithm operators in ant colony systems resulted in more appropriate models. Variable selection in each terminal node of the tree gives promising results. CART-ACS-GA could model the melting point of organic materials with prediction errors lower than previous models. Abstract: Classification and regression trees (CART) possess the advantage of being able to handle large data sets and yield readily interpretable models. A conventional method of building a regression tree is recursive partitioning, which results in a good but not optimal tree. Ant colony system (ACS), a meta-heuristic algorithm derived from the observation of real ants, can be used to overcome this problem. The purpose of this study was to explore the use of CART and its combination with ACS for modeling the melting points of a large variety of chemical compounds. Genetic algorithm (GA) operators (e.g., crossover and mutation operators) were combined with the ACS algorithm to select the best solution model. In addition, at each terminal node of the resulting tree, variable selection was done by the ACS-GA algorithm to build an appropriate partial least squares (PLS) model. To test the ability of the resulting tree, a set of approximately 4173 structures and their melting points were used (3000 compounds as training set and 1173 as validation set). Further, an external test set consisting of 277 drugs was used to validate the prediction ability of the tree. Comparison of the results obtained from both trees showed that the tree constructed by the ACS-GA algorithm performs better than that produced by the recursive partitioning procedure.
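
    As a rough illustration of the recursive partitioning step mentioned in this abstract, the sketch below finds the single greedy split on one feature that minimises the summed squared error of the two child means. It is not the authors' CART-ACS-GA method; the function name and the toy descriptor and melting-point values are hypothetical.

```python
def best_split(xs, ys):
    """Greedy regression-tree step: return (threshold, sse) for the split on
    one feature that minimises the squared error around the two child means."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical toy data: one descriptor value and one melting point per compound.
descriptor = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
melting_point = [50.0, 52.0, 51.0, 120.0, 118.0, 122.0]
threshold, error = best_split(descriptor, melting_point)
```

    Applying this step recursively to each child yields a tree that is locally good at every node, which is why the result need not be globally optimal.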

  4. [Analysis of dietary pattern and diabetes mellitus influencing factors identified by classification tree model in adults of Fujian].

    Science.gov (United States)

    Yu, F L; Ye, Y; Yan, Y S

    2017-05-10

    Objective: To identify dietary patterns and explore the relationship between environmental factors (especially dietary patterns) and diabetes mellitus in the adults of Fujian. Methods: A multi-stage sampling method was used to survey residents aged ≥18 years by questionnaire, physical examination and laboratory detection in 10 disease surveillance points in Fujian. Factor analysis was used to identify the dietary patterns, a logistic regression model was applied to analyze the relationship between dietary patterns and diabetes mellitus, and a classification tree model was adopted to identify the influencing factors for diabetes mellitus. Results: There were four dietary patterns in the population: meat, plant, high-quality protein, and fried food and beverages patterns. The logistic analysis showed that the plant pattern, which has higher factor loadings for fresh fruit-vegetables and cereal-tubers, was a protective factor against diabetes mellitus. The risk of diabetes mellitus in the population at the T2 and T3 levels of factor score was 0.727 (95% CI: 0.561-0.943) times and 0.736 (95% CI: 0.573-0.944) times, respectively, that of those whose factor score was in the lowest quartile. Thirteen influencing factors and eleven groups at high risk for diabetes mellitus were identified by the classification tree model. The influencing factors were dyslipidemia, age, family history of diabetes, hypertension, physical activity, career, sex, sedentary time, abdominal adiposity, BMI, marital status, sleep time and high-quality protein pattern. Conclusion: There is a close association between dietary patterns and diabetes mellitus. It is necessary to promote a healthy and reasonable diet, strengthen the monitoring and control of blood lipids, blood pressure and body weight, and maintain a good lifestyle for the prevention and control of diabetes mellitus.

  5. A Branch-and-Price approach to find optimal decision trees

    NARCIS (Netherlands)

    Firat, M.; Crognier, Guillaume; Gabor, Adriana; Zhang, Y.

    2018-01-01

    In the field of Artificial Intelligence (AI), decision trees have gained certain importance due to their effectiveness in solving classification and regression problems. Recently, the problem of finding optimal decision trees has been formulated in the literature as Mixed Integer Linear Programming (MILP) models. This

  6. I - Multivariate Classification and Machine Learning in HEP

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    Traditional multivariate methods for classification (Stochastic Gradient Boosted Decision Trees and Multi-Layer Perceptrons) are explained in theory and practise using examples from HEP. General aspects of multivariate classification are discussed, in particular different regularisation techniques. Afterwards, data-driven techniques are introduced and compared to MC-based methods.

  7. Traditional Chinese medicine pharmacovigilance in signal detection: decision tree-based data classification.

    Science.gov (United States)

    Wei, Jian-Xiang; Wang, Jing; Zhu, Yun-Xia; Sun, Jun; Xu, Hou-Ming; Li, Ming

    2018-03-09

    Traditional Chinese Medicine (TCM) is a style of traditional medicine informed by modern medicine but built on a foundation of more than 2500 years of Chinese medical practice. According to statistics, TCM accounts for approximately 14% of total adverse drug reaction (ADR) spontaneous reporting data in China. Because the complexity of the components in TCM formulas makes TCM essentially different from Western medicine, it is critical to determine whether ADR reports of TCM should be analyzed independently. Reports in the Chinese spontaneous reporting database between 2010 and 2011 were selected. The dataset was processed and divided into the total sample (all data) and the subsample (including TCM data only). Four different ADR signal detection methods (PRR, ROR, MHRA and IC) currently widely used in China were applied for signal detection on the two samples. By comparison of experimental results, three of them (PRR, MHRA and IC) were chosen to do the experiment. We designed several indicators for performance evaluation, such as R (recall ratio), P (precision ratio), and D (discrepancy ratio), based on the reference database and then constructed a decision tree for data classification based on such indicators. For PRR: R1 - R2 = 0.72%, P1 - P2 = 0.16% and D = 0.92%; for MHRA: R1 - R2 = 0.97%, P1 - P2 = 0.20% and D = 1.18%; for IC: R1 - R2 = 1.44%, P2 - P1 = 4.06% and D = 4.72%. The thresholds of R, P and D are set as 2%, 2% and 3%, respectively. Based on the decision tree, the results are "separation" for PRR, MHRA and IC. In order to improve the efficiency and accuracy of signal detection, we suggest that TCM data should be separated from the total sample when conducting analyses.
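
    For readers unfamiliar with the disproportionality measures named in this abstract, the sketch below computes the textbook proportional reporting ratio (PRR) and reporting odds ratio (ROR) from a 2x2 table of spontaneous reports. The abstract does not state the exact definitions or signal thresholds the authors used, so the formulas are the standard ones and the counts are hypothetical.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio.
    a: reports with the drug and the event, b: drug without the event,
    c: other drugs with the event,        d: other drugs without the event."""
    return (a / (a + b)) / (c / (c + d))

def ror(a, b, c, d):
    """Reporting odds ratio for the same 2x2 table."""
    return (a * d) / (b * c)

# Hypothetical counts for one drug-event pair.
a, b, c, d = 20, 80, 100, 9800
prr_value = prr(a, b, c, d)  # event share among the drug's reports vs. all others
ror_value = ror(a, b, c, d)  # odds of the event for the drug vs. all others
```

    Both measures are well above 1 for these counts, meaning the event is reported disproportionately often for the drug; whether that counts as a signal depends on the criteria chosen.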

  8. Comparison of Single and Multi-Scale Method for Leaf and Wood Points Classification from Terrestrial Laser Scanning Data

    Science.gov (United States)

    Wei, Hongqiang; Zhou, Guiyun; Zhou, Junjie

    2018-04-01

    The classification of leaf and wood points is an essential preprocessing step for extracting inventory measurements and canopy characterization of trees from terrestrial laser scanning (TLS) data. The geometry-based approach is one of the widely used classification methods. In the geometry-based method, it is common practice to extract salient features at one single scale before the features are used for classification. It remains unclear how the scale(s) used affect the classification accuracy and efficiency. To assess the scale effect on the classification accuracy and efficiency, we extracted single-scale and multi-scale salient features from the point clouds of two oak trees of different sizes and conducted the classification on leaf and wood. Our experimental results show that the balanced accuracy of the multi-scale method is higher than the average balanced accuracy of the single-scale method by about 10% for both trees. The average speed-up ratio of single-scale classifiers over the multi-scale classifier for each tree is higher than 30.
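
    The balanced accuracy reported in this abstract is commonly defined as the mean of the per-class recalls, which matters when leaf points greatly outnumber wood points. The following is a minimal sketch under that standard definition; the labels are hypothetical and not taken from the study.

```python
def balanced_accuracy(true_labels, predicted_labels):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for cls in sorted(set(true_labels)):
        indices = [i for i, t in enumerate(true_labels) if t == cls]
        correct = sum(1 for i in indices if predicted_labels[i] == cls)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)

# Hypothetical, imbalanced example: 8 leaf points and 2 wood points.
truth = ["leaf"] * 8 + ["wood"] * 2
pred = ["leaf"] * 8 + ["leaf", "wood"]  # one wood point mislabelled as leaf
score = balanced_accuracy(truth, pred)
plain = sum(t == p for t, p in zip(truth, pred)) / len(truth)
```

    Here plain accuracy is 0.9 while balanced accuracy is 0.75, because half of the minority class is misclassified.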

  9. Distribution of cavity trees in midwestern old-growth and second-growth forests

    Science.gov (United States)

    Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson; David R. Larsen

    2003-01-01

    We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...

  10. Schistosoma mansoni reinfection: Analysis of risk factors by classification and regression tree (CART) modeling.

    Directory of Open Access Journals (Sweden)

    Andréa Gazzinelli

    Full Text Available Praziquantel (PZQ) is an effective chemotherapy for schistosomiasis mansoni and a mainstay for its control and potential elimination. However, it does not protect against reinfection, which can occur rapidly in areas with active transmission. A guide to ranking the risk factors for Schistosoma mansoni reinfection would greatly contribute to prioritizing resources and focusing prevention and control measures to prevent rapid reinfection. The objective of the current study was to explore the relationship among the socioeconomic, demographic, and epidemiological factors that can influence reinfection by S. mansoni one year after successful treatment with PZQ in school-aged children in Northeastern Minas Gerais state, Brazil. Parasitological, socioeconomic, demographic, and water contact information were surveyed in 506 S. mansoni-infected individuals, aged 6 to 15 years, resident in these endemic areas. Eligible individuals were treated with PZQ until they were determined to be negative by the absence of S. mansoni eggs in the feces on two consecutive days of Kato-Katz fecal thick smear. These individuals were surveyed again 12 months from the date of successful treatment with PZQ. Classification and regression tree (CART) modeling was then used to explore the relationship between socioeconomic, demographic, and epidemiological variables and their reinfection status. The most important risk factor identified for S. mansoni reinfection was "heavy" infection at baseline. Additional analyses, excluding heavy infection status, showed that lower socioeconomic status and a lower level of education of the household head were also important risk factors for S. mansoni reinfection. Our results provide an important contribution toward the control and possible elimination of schistosomiasis by identifying three major risk factors that can be used for targeted treatment and monitoring of reinfection. We suggest that control measures that target

  11. Classification of tree species based on longwave hyperspectral data from leaves, a case study for a tropical dry forest

    Science.gov (United States)

    Harrison, D.; Rivard, B.; Sánchez-Azofeifa, A.

    2018-04-01

    Remote sensing of the environment has utilized the visible, near and short-wave infrared (IR) regions of the electromagnetic (EM) spectrum to characterize vegetation health, vigor and distribution. However, relatively little research has focused on the use of the longwave infrared (LWIR, 8.0-12.5 μm) region for studies of vegetation. In this study LWIR leaf reflectance spectra were collected in the wet seasons (May through December) of 2013 and 2014 from twenty-six tree species located in a high species diversity environment, a tropical dry forest in Costa Rica. A continuous wavelet transformation (CWT) was applied to all spectra to minimize noise and broad amplitude variations attributable to non-compositional effects. Species discrimination was then explored with Random Forest classification, and improved accuracy was observed when reflectance spectra were preprocessed with the continuous wavelet transformation. Species were found to share common spectral features that formed the basis for five spectral types that were corroborated with linear discriminant analysis. The source of most of the observed spectral features is attributed to cell wall or cuticle compounds (cellulose, cutin, matrix glycan, silica and oleanolic acid). Spectral types could be advantageous for the analysis of airborne hyperspectral data because cavity effects will lower the spectral contrast, thus increasing the reliance of classification efforts on dominant spectral features. Spectral types specifically derived from leaf level data are expected to support the labeling of spectral classes derived from imagery. The results of this study and those of Ribeiro Da Luz (2006), Ribeiro Da Luz and Crowley (2007, 2010), Ullah et al. (2012) and Rock et al. (2016) have now illustrated success in tree species discrimination across a range of ecosystems using leaf-level spectral observations. With advances in LWIR sensors and concurrent improvements in their signal to noise, applications to large-scale species

  12. Vessel-guided airway tree segmentation

    DEFF Research Database (Denmark)

    Lo, Pechin Chien Pau; Sporring, Jon; Ashraf, Haseem

    2010-01-01

    This paper presents a method for airway tree segmentation that uses a combination of a trained airway appearance model, vessel and airway orientation information, and region growing. We propose a voxel classification approach for the appearance model, which uses a classifier that is trained to di...

  13. Models of Marine Fish Biodiversity: Assessing Predictors from Three Habitat Classification Schemes.

    Science.gov (United States)

    Yates, Katherine L; Mellin, Camille; Caley, M Julian; Radford, Ben T; Meeuwig, Jessica J

    2016-01-01

    Prioritising biodiversity conservation requires knowledge of where biodiversity occurs. Such knowledge, however, is often lacking. New technologies for collecting biological and physical data coupled with advances in modelling techniques could help address these gaps and facilitate improved management outcomes. Here we examined the utility of environmental data, obtained using different methods, for developing models of both uni- and multivariate biodiversity metrics. We tested which biodiversity metrics could be predicted best and evaluated the performance of predictor variables generated from three types of habitat data: acoustic multibeam sonar imagery, predicted habitat classification, and direct observer habitat classification. We used boosted regression trees (BRT) to model metrics of fish species richness, abundance and biomass, and multivariate regression trees (MRT) to model biomass and abundance of fish functional groups. We compared model performance using different sets of predictors and estimated the relative influence of individual predictors. Models of total species richness and total abundance performed best; those developed for endemic species performed worst. Abundance models performed substantially better than corresponding biomass models. In general, BRTs and MRTs developed using predicted habitat classifications performed less well than those using multibeam data. The most influential individual predictor was the abiotic categorical variable from direct observer habitat classification, and models that incorporated predictors from direct observer habitat classification consistently outperformed those that did not. Our results show that while remotely sensed data can offer considerable utility for predictive modelling, the addition of direct observer habitat classification data can substantially improve model performance. Thus it appears that there are aspects of marine habitats that are important for modelling metrics of fish biodiversity that are

  14. An Improved Rotation Forest for Multi-Feature Remote-Sensing Imagery Classification

    Directory of Open Access Journals (Sweden)

    Yingchang Xiu

    2017-11-01

    Full Text Available Multi-feature, especially multi-temporal, remote-sensing data have the potential to improve land cover classification accuracy. However, sometimes it is difficult to utilize all the features efficiently. To enhance classification performance based on multi-feature imagery, an improved rotation forest, combining Principal Component Analysis (PCA) and a boosting naïve Bayesian tree (NBTree), is proposed. First, feature extraction was carried out with PCA. The feature set was randomly split into several disjoint subsets; then, PCA was applied to each subset, and new training data for linear extracted features based on original training data were obtained. These steps were repeated several times. Second, based on the new training data, a boosting naïve Bayesian tree was constructed as the base classifier, which aims to achieve lower prediction error than a decision tree in the original rotation forest. At the classification phase, the improved rotation forest has two-layer voting. It first obtains several predictions through weighted voting in a boosting naïve Bayesian tree; then, the first-layer vote predicts by majority to obtain the final result. To examine the classification performance, the improved rotation forest was applied to multi-feature remote-sensing images, including MODIS Enhanced Vegetation Index (EVI) imagery time series, MODIS Surface Reflectance products and ancillary data in Shandong Province for 2013. The EVI imagery time series was preprocessed using harmonic analysis of time series (HANTS) to reduce the noise effects. The overall accuracy of the final classification result was 89.17%, and the Kappa coefficient was 0.71, which outperforms the original rotation forest and other classifier ensemble results, as well as the NASA land cover product. However, this new algorithm requires more computational time, meaning the efficiency needs to be further improved. Generally, the improved rotation forest has a potential advantage in
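
    The two-layer voting described in this abstract can be sketched as follows: each boosted base classifier combines its members by weighted vote, and the resulting first-layer labels are then combined by simple majority. This is only an illustration of the voting scheme, not the authors' implementation; the function names, weights and land cover labels are hypothetical.

```python
from collections import Counter, defaultdict

def weighted_vote(predictions, weights):
    """First layer: sum the weight behind each label and return the heaviest."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

def two_layer_vote(ensembles):
    """Second layer: majority vote over the first-layer results."""
    first_layer = [weighted_vote(preds, weights) for preds, weights in ensembles]
    return Counter(first_layer).most_common(1)[0][0]

# Hypothetical output of three boosted base classifiers for one pixel:
# (member predictions, member weights).
ensembles = [
    (["crop", "forest", "forest"], [0.7, 0.2, 0.1]),
    (["forest", "forest", "crop"], [0.4, 0.4, 0.2]),
    (["crop", "crop", "water"], [0.3, 0.3, 0.4]),
]
final_label = two_layer_vote(ensembles)
```

    In this toy case the first layer yields crop, forest and crop, so the majority vote assigns the pixel to crop.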

  15. Predicting student satisfaction with courses based on log data from a virtual learning environment – a neural network and classification tree model

    Directory of Open Access Journals (Sweden)

    Ivana Đurđević Babić

    2015-03-01

    Full Text Available Student satisfaction with courses in academic institutions is an important issue and is recognized as a form of support in ensuring effective and quality education, as well as enhancing student course experience. This paper investigates whether there is a connection between student satisfaction with courses and log data on student courses in a virtual learning environment. Furthermore, it explores whether a successful classification model for predicting student satisfaction with courses can be developed based on course log data and compares the results obtained from the implemented methods. The research was conducted at the Faculty of Education in Osijek and included analysis of log data and course satisfaction on a sample of third and fourth year students. Multilayer Perceptron (MLP) with different activation functions and Radial Basis Function (RBF) neural networks as well as classification tree models were developed, trained and tested in order to classify students into one of two categories of course satisfaction. Type I and type II errors, and input variable importance were used for model comparison and classification accuracy. The results indicate that a successful classification model using tested methods can be created. The MLP model provides the highest average classification accuracy and the lowest preference in misclassification of students with a low level of course satisfaction, although a t-test for the difference in proportions showed that the difference in performance between the compared models is not statistically significant. Student involvement in forum discussions is recognized as a valuable predictor of student satisfaction with courses in all observed models.
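
    Type I and type II error rates for a two-class model such as the one compared in this abstract can be computed as below. Which satisfaction category is treated as the "positive" class is an assumption made here for illustration (the abstract does not say), and the labels are hypothetical.

```python
def error_rates(true_labels, predicted_labels, positive):
    """Return (type I rate, type II rate): the false positive rate among true
    negatives and the false negative rate among true positives."""
    fp = fn = positives = negatives = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            positives += 1
            if pred != positive:
                fn += 1
        else:
            negatives += 1
            if pred == positive:
                fp += 1
    return fp / negatives, fn / positives

# Hypothetical labels: 4 students with low and 6 with high course satisfaction.
truth = ["low"] * 4 + ["high"] * 6
pred = ["low", "low", "low", "high"] + ["low", "low", "low", "high", "high", "high"]
type_1, type_2 = error_rates(truth, pred, positive="low")  # assumption: "low" is positive
```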

  16. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.

    Science.gov (United States)

    Wang, Yin; Li, Rudong; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.

  17. Community assessment of tropical tree biomass

    DEFF Research Database (Denmark)

    Theilade, Ida; Rutishauser, Ervan; Poulsen, Michael K.

    2015-01-01

    Background REDD+ programs rely on accurate forest carbon monitoring. Several REDD+ projects have recently shown that local communities can monitor above ground biomass as well as external professionals, but at lower costs. However, the precision and accuracy of carbon monitoring conducted by local...... communities have rarely been assessed in the tropics. The aim of this study was to investigate different sources of error in tree biomass measurements conducted by community monitors and determine the effect on biomass estimates. Furthermore, we explored the potential of local ecological knowledge to assess...... measurement, with special attention given to large and odd-shaped trees. A better understanding of traditional classification systems and concepts is required for local tree identifications and wood density estimates to become useful in monitoring of biomass and tree diversity....

  18. Modeling and Testing Landslide Hazard Using Decision Tree

    Directory of Open Access Journals (Sweden)

    Mutasem Sh. Alkhasawneh

    2014-01-01

    Full Text Available This paper proposes a decision tree model for specifying the importance of 21 factors causing the landslides in a wide area of Penang Island, Malaysia. These factors are vegetation cover, distance from the fault line, slope angle, cross curvature, slope aspect, distance from road, geology, diagonal length, longitude curvature, rugosity, plan curvature, elevation, rain perception, soil texture, surface area, distance from drainage, roughness, land cover, general curvature, tangent curvature, and profile curvature. Decision tree models are used for prediction, classification, and factor importance and are usually represented by an easy-to-interpret tree-like structure. Four models were created using Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). Twenty-one factors were extracted using digital elevation models (DEMs) and then used as input variables for the models. A data set of 137570 samples was selected for each variable in the analysis, where 68786 samples represent landslides and 68786 samples represent no landslides. 10-fold cross-validation was employed for testing the models. The highest accuracy was achieved using Exhaustive CHAID (82.0%) compared to the CHAID (81.9%), CRT (75.6%), and QUEST (74.0%) models. Across the four models, five factors were identified as the most important: slope angle, distance from drainage, surface area, slope aspect, and cross curvature.
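
    The 10-fold cross-validation mentioned in this abstract partitions the samples into ten folds and tests on each fold in turn while training on the other nine. The sketch below shows only the index bookkeeping; the helper name, sample count and seed are hypothetical, and no stratification by class is attempted.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical small data set of 20 samples split into 10 folds.
splits = list(k_fold_indices(n_samples=20, k=10))
```

    A model would be fitted on each train list and scored on the matching test list, and the ten accuracies averaged.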

  19. Minimum Error Entropy Classification

    CERN Document Server

    Marques de Sá, Joaquim P; Santos, Jorge M F; Alexandre, Luís A

    2013-01-01

    This book explains the minimum error entropy (MEE) concept applied to data classification machines. Theoretical results on the inner workings of the MEE concept, in its application to solving a variety of classification problems, are presented in the wider realm of risk functionals. Researchers and practitioners will also find in the book a detailed presentation of practical data classifiers using MEE. These include multi‐layer perceptrons, recurrent neural networks, complex-valued neural networks, modular neural networks, and decision trees. A clustering algorithm using a MEE‐like concept is also presented. Examples, tests, evaluation experiments and comparison with similar machines using classic approaches complement the descriptions.

  20. Tree Biomass Allocation and Its Model Additivity for Casuarina equisetifolia in a Tropical Forest of Hainan Island, China

    Science.gov (United States)

    Xue, Yang; Yang, Zhongyang; Wang, Xiaoyan; Lin, Zhipan; Li, Dunxi; Su, Shaofeng

    2016-01-01

    Casuarina equisetifolia is commonly planted and used in the construction of coastal shelterbelt protection in Hainan Island. Thus, it is critical to accurately estimate the tree biomass of Casuarina equisetifolia L. for forest managers to evaluate the biomass stock in Hainan. The data for this work consisted of 72 trees, which were divided into three age groups: young forest, middle-aged forest, and mature forest. The proportion of biomass from the trunk significantly increased with age (Pbiomass of the branch and leaf decreased, and the biomass of the root did not change. To test whether the crown radius (CR) can improve biomass estimates of C. equisetifolia, we introduced CR into the biomass models. Here, six models were used to estimate the biomass of each component, including the trunk, the branch, the leaf, and the root. In each group, we selected one model among these six models for each component. The results showed that including the CR greatly improved the model performance and reduced the error, especially for the young and mature forests. In addition, to ensure biomass additivity, the selected equation for each component was fitted as a system of equations using seemingly unrelated regression (SUR). The SUR method not only gave efficient and accurate estimates but also achieved the logical additivity. The results in this study provide a robust estimation of tree biomass components and total biomass over three groups of C. equisetifolia. PMID:27002822

  2. Mapping forest tree species over large areas with partially cloudy Landsat imagery

    Science.gov (United States)

    Turlej, K.; Radeloff, V.

    2017-12-01

    Forests provide numerous services to natural systems and humankind, but which services forests provide depends greatly on their tree species composition. That makes it important not only to track changes in forest extent, something that remote sensing excels in, but also to map tree species. The main goal of our work was to map tree species with Landsat imagery, and to identify how to maximize mapping accuracy by including partially cloudy imagery. Our study area covered one Landsat footprint (26/28) in Northern Wisconsin, USA, with temperate and boreal forests. We selected this area because it contains numerous tree species and variable forest composition, providing an ideal study area to test the limits of Landsat data. We quantified how species-level classification accuracy was affected by a) the number of acquisitions, b) the seasonal distribution of observations, and c) the amount of cloud contamination. We classified a single-year stack of Landsat-7 and -8 image data with a decision tree algorithm to generate a map of dominant tree species at the pixel and stand level. We obtained three important results. First, we achieved producer's accuracies in the range 70-80% and user's accuracies in the range 80-90% for the most abundant tree species in our study area. Second, classification accuracy improved with more acquisitions and when observations were available from all seasons, and was best when images with up to 40% cloud cover were included. Finally, classifications for pure stands were 10 to 30 percentage points better than those for mixed stands. We conclude that including partially cloudy Landsat imagery makes it possible to map forest tree species with accuracies that were previously only possible for rare years with many cloud-free observations. Our approach thus provides important information for both forest management and science.

  3. Study and ranking of determinants of Taenia solium infections by classification tree models.

    Science.gov (United States)

    Mwape, Kabemba E; Phiri, Isaac K; Praet, Nicolas; Dorny, Pierre; Muma, John B; Zulu, Gideon; Speybroeck, Niko; Gabriël, Sarah

    2015-01-01

    Taenia solium taeniasis/cysticercosis is an important public health problem occurring mainly in developing countries. This work aimed to study the determinants of human T. solium infections in the Eastern province of Zambia and rank them in order of importance. A household (HH)-level questionnaire was administered to 680 HHs from 53 villages in two rural districts and the taeniasis and cysticercosis status determined. A classification tree model (CART) was used to define the relative importance and interactions between different predictor variables in their effect on taeniasis and cysticercosis. The Katete study area had a significantly higher taeniasis and cysticercosis prevalence than the Petauke area. The CART analysis for Katete showed that the most important determinant for cysticercosis infections was the number of HH inhabitants (6 to 10) and for taeniasis was the number of HH inhabitants > 6. The most important determinant in Petauke for cysticercosis was the age of head of household > 32 years and for taeniasis it was age taeniasis and cysticercosis infections was the number of HH inhabitants (6 to 10) in Katete district and age in Petauke. The results suggest that control measures should target HHs with a high number of inhabitants and older individuals. © The American Society of Tropical Medicine and Hygiene.

  4. Building an asynchronous web-based tool for machine learning classification.

    Science.gov (United States)

    Weber, Griffin; Vinterbo, Staal; Ohno-Machado, Lucila

    2002-01-01

    Various unsupervised and supervised learning methods including support vector machines, classification trees, linear discriminant analysis and nearest neighbor classifiers have been used to classify high-throughput gene expression data. Simpler and more widely accepted statistical tools have not yet been used for this purpose, hence proper comparisons between classification methods have not been conducted. We developed free software that implements logistic regression with stepwise variable selection as a quick and simple method for initial exploration of important genetic markers in disease classification. To implement the algorithm and allow our collaborators in remote locations to evaluate and compare its results against those of other methods, we developed a user-friendly asynchronous web-based application with a minimal amount of programming using free, downloadable software tools. With this program, we show that classification using logistic regression can perform as well as other more sophisticated algorithms, and it has the advantages of being easy to interpret and reproduce. By making the tool freely and easily available, we hope to promote the comparison of classification methods. In addition, we believe our web application can be used as a model for other bioinformatics laboratories that need to develop web-based analysis tools in a short amount of time and on a limited budget.

  5. 78 FR 68983 - Cotton Futures Classification: Optional Classification Procedure

    Science.gov (United States)

    2013-11-18

    ...-AD33 Cotton Futures Classification: Optional Classification Procedure AGENCY: Agricultural Marketing... regulations to allow for the addition of an optional cotton futures classification procedure--identified and... response to requests from the U.S. cotton industry and ICE, AMS will offer a futures classification option...

  6. Beef Quality Identification Using Thresholding Method and Decision Tree Classification Based on Android Smartphone

    Directory of Open Access Journals (Sweden)

    Kusworo Adi

    2017-01-01

    Beef is one of the animal food products that have high nutritional value because it contains carbohydrates, proteins, fats, vitamins, and minerals. Therefore, the quality of beef should be maintained so that consumers get good-quality beef. Determination of beef quality is commonly conducted visually by comparing the actual beef with reference pictures of each beef class. This process has weaknesses, as it is subjective in nature and takes a considerable amount of time. Therefore, an automated system based on image processing that is capable of determining beef quality is required. This research aims to develop an image segmentation method by processing digital images. The system designed consists of image acquisition processes with varied distance, resolution, and angle. Image segmentation is done to separate the images of fat and meat using the Otsu thresholding method. Classification was carried out using the decision tree algorithm, and the best accuracies obtained were 90% for training and 84% for testing. Once developed, the system was embedded into an Android application. Results show that the image processing technique is capable of proper marbling score identification.
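
    The Otsu thresholding step named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `otsu_threshold` and the toy histogram are assumptions, and the abstract does not say which image channel was thresholded.

```python
def otsu_threshold(histogram):
    """Return the gray level t that maximizes between-class variance.

    histogram[i] is the number of pixels with gray level i. Pixels with
    level <= t form one class and pixels with level > t form the other.
    """
    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    count_low = 0    # number of pixels in the low class so far
    sum_low = 0.0    # sum of gray levels in the low class so far

    for level, count in enumerate(histogram):
        count_low += count
        sum_low += level * count
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, so the variance is undefined
        mean_low = sum_low / count_low
        mean_high = (weighted_total - sum_low) / count_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = count_low * count_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


# Hypothetical 8-level histogram with a dark cluster and a bright cluster.
toy_histogram = [5, 5, 0, 0, 0, 0, 5, 5]
print(otsu_threshold(toy_histogram))
```

    In a fat/meat segmentation, pixels above the returned level would be assigned to the brighter class; which class corresponds to fat depends on the imaging setup.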

  7. Video genre classification using multimodal features

    Science.gov (United States)

    Jin, Sung Ho; Bae, Tae Meon; Choo, Jin Ho; Ro, Yong Man

    2003-12-01

    We propose a video genre classification method using multimodal features. The proposed method is applied for the preprocessing of automatic video summarization or the retrieval and classification of broadcasting video contents. Through a statistical analysis of low-level and middle-level audio-visual features in video, the proposed method can achieve good performance in classifying several broadcasting genres such as cartoon, drama, music video, news, and sports. In this paper, we adopt MPEG-7 audio-visual descriptors as multimodal features of video contents and evaluate the performance of the classification by feeding the features into a decision tree-based classifier which is trained by CART. The experimental results show that the proposed method can recognize several broadcasting video genres with a high accuracy and the classification performance with multimodal features is superior to the one with unimodal features in the genre classification.

  8. Simple street tree sampling

    Science.gov (United States)

    David J. Nowak; Jeffrey T. Walton; James Baldwin; Jerry. Bond

    2015-01-01

    Information on street trees is critical for management of this important resource. Sampling of street tree populations provides an efficient means to obtain street tree population information. Long-term repeat measures of street tree samples supply additional information on street tree changes and can be used to report damages from catastrophic events. Analyses of...

  9. Classification of Internet banking customers using data mining algorithms

    Directory of Open Access Journals (Sweden)

    Reza Radfar

    2014-03-01

    Classifying customers using data mining algorithms enables banks to keep existing customers loyal while attracting new ones. Using a decision tree as a data mining technique, we can optimize customer classification provided that an appropriate decision tree is selected. In this article we present a model to classify customers who use an Internet banking service. The model is developed based on the CRISP-DM standard, and we have used real data from Sina bank’s Internet bank. Compared to other decision trees, ours is based on both optimization and accuracy factors, and it recognizes new potential Internet banking customers using a three-level classification: low, medium, and high. This is practical, documentary-based research. Mining customer rules enables managers to make policies based on the discovered patterns in order to have a better perception of what customers really desire.

  10. adabag: An R Package for Classification with Boosting and Bagging

    Directory of Open Access Journals (Sweden)

    Esteban Alfaro

    2013-09-01

    Boosting and bagging are two widely used ensemble methods for classification. Their common goal is to improve the accuracy of a classifier by combining single classifiers which are slightly better than random guessing. Among the family of boosting algorithms, AdaBoost (adaptive boosting) is the best known, although it is suitable only for dichotomous tasks. AdaBoost.M1 and SAMME (stagewise additive modeling using a multi-class exponential loss function) are two easy and natural extensions to the general case of two or more classes. In this paper, the adabag R package is introduced. This version implements AdaBoost.M1, SAMME and bagging algorithms with classification trees as base classifiers. Once the ensembles have been trained, they can be used to predict the class of new samples. The accuracy of these classifiers can be estimated in a separate data set or through cross validation. Moreover, the evolution of the error as the ensemble grows can be analysed and the ensemble can be pruned. In addition, the margin in the class prediction and the probability of each class for the observations can be calculated. Finally, several classic examples in classification literature are shown to illustrate the use of this package.
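
    One boosting round of the SAMME scheme mentioned above can be sketched in a few lines. This is an illustrative sketch of the published SAMME update, written in Python rather than R; it does not reproduce the adabag package's code, and the function names `samme_alpha` and `update_weights` are hypothetical.

```python
import math


def samme_alpha(weighted_error, n_classes):
    """Vote weight of one base classifier under SAMME.

    The log(n_classes - 1) term is what distinguishes SAMME from the
    two-class log-odds weight; with n_classes == 2 it is zero.
    """
    return (math.log((1.0 - weighted_error) / weighted_error)
            + math.log(n_classes - 1.0))


def update_weights(weights, misclassified, alpha):
    """Raise the weights of misclassified samples, then renormalize."""
    raised = [w * math.exp(alpha) if miss else w
              for w, miss in zip(weights, misclassified)]
    total = sum(raised)
    return [w / total for w in raised]


# Toy round: four equally weighted samples, the first one misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
misclassified = [True, False, False, False]
alpha = samme_alpha(weighted_error=0.25, n_classes=2)
print(update_weights(weights, misclassified, alpha))
```

    After the update the misclassified sample carries half of the total weight, so the next tree in the sequence concentrates on it.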

  11. Towards a natural classification and backbone tree for Sordariomycete

    Digital Repository Service at National Institute of Oceanography (India)

    Maharachchikumbura, S.S.N.; Hyde, K.D.; Jones, E.B.G.; McKenzie, E.H.C.; Huang, S.-K.; Abdel-Wahab, M.A.; Daranagama, D.A.; Dayarathne, M.; D'souza, M.J.; Goonasekara, I.D.; Hongsanan, S.; Jayawardena, R.S.; Kirk, P.M.; Konta, S.; Liu, J.-K.; Liu, Z.-Y.; Norphanphoun, C.; Pang, K.-L.; Perera, R.H.; Senanayake, I.C.; Shang, Q.; Shenoy, B.D.; Xiao, Y.; Bahkali, A.H.; Kang, J.; Somrothipol, S.; Suetrong, S.; Wen, T.; Xu, J.

    , lichenized or lichenicolous taxa The class includes freshwater, marine and terrestrial taxa and has a worldwide distribution This paper provides an updated outline of the Sordariomycetes and a backbone tree incorporating asexual and sexual genera in the class...

  12. DECISION TREE CLASSIFIERS FOR STAR/GALAXY SEPARATION

    International Nuclear Information System (INIS)

    Vasconcellos, E. C.; Ruiz, R. S. R.; De Carvalho, R. R.; Capelato, H. V.; Gal, R. R.; LaBarbera, F. L.; Frago Campos Velho, H.; Trevisan, M.

    2011-01-01

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 ≤ r ≤ 21 (85.2%) and r ≥ 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 ≤ r ≤ 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination (∼2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 ≤ r ≤ 21.
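
    The two quality measures used in this abstract can be illustrated with a short sketch. It assumes the commonly used definitions (completeness as the fraction of true galaxies recovered, contamination as the fraction of objects labelled galaxy that are not galaxies); the abstract itself does not spell them out, and the function name and toy labels are hypothetical.

```python
def completeness_and_contamination(true_labels, predicted_labels, target="galaxy"):
    """Return (completeness, contamination) for the target class."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for t, p in pairs if t == target and p == target)
    false_positives = sum(1 for t, p in pairs if t != target and p == target)
    n_true = sum(1 for t in true_labels if t == target)
    n_predicted = true_positives + false_positives

    completeness = true_positives / n_true if n_true else float("nan")
    contamination = false_positives / n_predicted if n_predicted else float("nan")
    return completeness, contamination


# Toy catalogue: four true galaxies and two true stars.
truth = ["galaxy", "galaxy", "galaxy", "galaxy", "star", "star"]
predicted = ["galaxy", "galaxy", "galaxy", "star", "galaxy", "star"]
print(completeness_and_contamination(truth, predicted))
```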

  13. Classification and regression tree (CART) analyses of genomic signatures reveal sets of tetramers that discriminate temperature optima of archaea and bacteria

    Science.gov (United States)

    Dyer, Betsey D.; Kahn, Michael J.; LeBlanc, Mark D.

    2008-01-01

    Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results. PMID:19054742
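
    The genomic signature used as input to CART here is a vector of tetranucleotide frequencies. A minimal sketch of how such a vector could be computed is given below; it counts overlapping 4-mers on one strand only, which is a simplifying assumption, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(sequence):
    """Relative frequency of each of the 256 possible tetramers.

    Overlapping windows are counted; windows containing characters other
    than A, C, G, T (for example N) are skipped.
    """
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - 3):
        tetramer = sequence[start:start + 4]
        if set(tetramer) <= set("ACGT"):
            counts[tetramer] += 1
    total = sum(counts.values())
    return {
        "".join(letters): (counts["".join(letters)] / total if total else 0.0)
        for letters in product("ACGT", repeat=4)
    }


signature = tetranucleotide_frequencies("GAGAGAGA")
print(signature["GAGA"], signature["AGAG"])
```

    Each genome then becomes one row of 256 features, and a classification tree can split on individual tetramers such as GAGA or AGGA.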

  14. Multiscale Vessel-guided Airway Tree Segmentation

    DEFF Research Database (Denmark)

    Lo, Pechin Chien Pau; Sporring, Jon; de Bruijne, Marleen

    2009-01-01

    This paper presents a method for airway tree segmentation that uses a combination of a trained airway appearance model, vessel and airway orientation information, and region growing. The method uses a voxel classification based appearance model, which involves the use of a classifier that is trai...

  15. Two tree-formation methods for fast pattern search using nearest-neighbour and nearest-centroid matching

    NARCIS (Netherlands)

    Schomaker, Lambertus; Mangalagiu, D.; Vuurpijl, Louis; Weinfeld, M.; Schomaker, Lambert; Vuurpijl, Louis

    2000-01-01

    This paper describes tree-based classification of character images, comparing two methods of tree formation and two methods of matching: nearest neighbor and nearest centroid. The first method, Preprocess Using Relative Distances (PURD) is a tree-based reorganization of a flat list of patterns,

  16. Spatial prediction of landslides using a hybrid machine learning approach based on Random Subspace and Classification and Regression Trees

    Science.gov (United States)

    Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu

    2018-02-01

    A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model is a combination of the RSS method, which is known as an efficient ensemble technique, and CART, which is a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of the model, ten important landslide-affecting factors related to geomorphology, geology and geo-environment were considered, namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with other popular landslide models, namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that the RSSCART is a promising method for spatial landslide prediction.
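
    The AUC values quoted above summarize the ROC curve in a single number. As a hedged illustration, the AUC can be computed as the probability that a randomly chosen landslide cell receives a higher score than a randomly chosen non-landslide cell. The function name and toy scores below are assumptions and are unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels are 1 (positive) or 0 (negative). Ties between a positive and
    a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: two landslide cells and two stable cells.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

    The pairwise loop is quadratic in the number of samples, which is fine for a sketch; a rank-based formulation would be preferable for large rasters.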

  17. Comparison of two aerial imaging platforms for identification of Huanglongbing-infected citrus trees

    DEFF Research Database (Denmark)

    Garcia Ruiz, Francisco Jose; Sankaran, Sindhuja; Maja, Joe Mari

    2013-01-01

    and HLB-infected trees. During classification studies, accuracies in the range of 67–85% and false negatives from 7% to 32% were acquired from UAV-based data; while corresponding values were 61–74% and 28–45% with aircraft-based data. Among the tested classification algorithms, support vector machine (SVM......) with kernel resulted in better performance than other methods such as SVM (linear), linear discriminant analysis and quadratic discriminant analysis. Thus, high-resolution aerial sensing has good prospect for the detection of HLB-infected trees....

  18. Decision trees in epidemiological research.

    Science.gov (United States)

    Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone

    2017-01-01

    In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
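
    To make the contrast concrete, the sketch below shows the kind of impurity-driven split search that CART-style trees perform on a single numeric covariate. It assumes the Gini index as the impurity measure, which the abstract does not state, and it does not illustrate CTree, which selects splits through hypothesis tests instead. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split value <= threshold.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the parent node, threshold is None.
    """
    n = len(values)
    distinct = sorted(set(values))
    best = (None, gini(labels))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2.0
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


print(best_split([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

    Applying the same search recursively to each resulting subgroup yields the population partition that the abstract describes.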

  19. Automatic Hierarchical Color Image Classification

    Directory of Open Access Journals (Sweden)

    Jing Huang

    2003-02-01

    Organizing images into semantic categories can be extremely useful for content-based image retrieval and image annotation. Grouping images into semantic classes is a difficult problem, however. Image classification attempts to solve this hard problem by using low-level image features. In this paper, we propose a method for hierarchical classification of images via supervised learning. This scheme relies on using a good low-level feature and subsequently performing feature-space reconfiguration using singular value decomposition to reduce noise and dimensionality. We use the training data to obtain a hierarchical classification tree that can be used to categorize new images. Our experimental results suggest that this scheme not only performs better than standard nearest-neighbor techniques, but also has both storage and computational advantages.

  20. Research on Classification of Chinese Text Data Based on SVM

    Science.gov (United States)

    Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao

    2017-09-01

    Data mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. The Support Vector Machine (SVM) classification method is a good classifier in machine learning research. This paper studies the classification performance of the SVM method on Chinese text data, using the support vector machine to classify Chinese text and to combine academic research with practical application.

  1. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy.

    Science.gov (United States)

    Letunic, Ivica; Bork, Peer

    2011-07-01

    Interactive Tree Of Life (http://itol.embl.de) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. It is freely available and open to everyone. In addition to classical tree viewer functions, iTOL offers many novel ways of annotating trees with various additional data. The current version introduces numerous new features and greatly expands the number of supported data set types. Trees can be interactively manipulated and edited. A free personal account system is available, providing management and sharing of trees in user-defined workspaces and projects. Export to various bitmap and vector graphics formats is supported. A batch access interface is available for programmatic access or inclusion of interactive trees into other web services.

  2. A Classification Regression Tree Analysis to Reduce Balance Impairments and Falls in the Older population: Impact on Resource Utilization and Clinical Decision-Making in USA Rehabilitation Service Delivery

    Directory of Open Access Journals (Sweden)

    Lucinda Pfalzer

    2013-06-01

    Background/Purpose: Over 1/3 of adults over age 65 experience at least one fall each year. This pilot report uses a classification regression tree analysis (CART) to model the outcomes for balance/risk of falls from the Gentiva® Safe Strides® Program (SSP). Methods/Outcomes: SSP is a home-based balance/fall prevention program designed to treat root causes of a patient

  3. Classification for Inconsistent Decision Tables

    KAUST Repository

    Azad, Mohammad

    2016-09-28

    Decision trees have been used widely to discover patterns from consistent data sets. But if the data set is inconsistent, where there are groups of examples with equal values of conditional attributes but different labels, then discovering the essential patterns or knowledge from the data set is challenging. Three approaches (generalized, most common and many-valued decision) have been considered to handle such inconsistency. The decision tree model has been used to compare the classification results among the three approaches. The many-valued decision approach outperforms the other approaches, and the M_ws_entM greedy algorithm gives faster and better prediction accuracy.

  4. Classification for Inconsistent Decision Tables

    KAUST Repository

    Azad, Mohammad; Moshkov, Mikhail

    2016-01-01

    Decision trees have been used widely to discover patterns from consistent data sets. But if the data set is inconsistent, where there are groups of examples with equal values of conditional attributes but different labels, then discovering the essential patterns or knowledge from the data set is challenging. Three approaches (generalized, most common and many-valued decision) have been considered to handle such inconsistency. The decision tree model has been used to compare the classification results among the three approaches. The many-valued decision approach outperforms the other approaches, and the M_ws_entM greedy algorithm gives faster and better prediction accuracy.
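
    The three ways of handling inconsistent rows can be illustrated with a small sketch. It reflects one reading of the approaches named in the abstract (an assumption, since the abstract gives no definitions): rows with identical conditional attributes are merged, and the merged row is labelled with the whole decision set treated as a single label (generalized), with the most frequent decision (most common), or with a set from which any member counts as a correct answer (many-valued). The function name and toy table are hypothetical.

```python
from collections import Counter, defaultdict


def collapse_inconsistent_rows(rows):
    """Merge rows with equal attributes and label each group three ways.

    rows is a list of (attributes, decision) pairs. Returns a dict that
    maps each attribute tuple to its three candidate labelings.
    """
    groups = defaultdict(list)
    for attributes, decision in rows:
        groups[tuple(attributes)].append(decision)

    collapsed = {}
    for attributes, decisions in groups.items():
        counts = Counter(decisions)
        # Ties are broken by the smallest decision to keep the result deterministic.
        most_common = min(counts, key=lambda d: (-counts[d], d))
        collapsed[attributes] = {
            # The whole set acts as one new class label.
            "generalized": frozenset(counts),
            # A single label: the majority decision of the group.
            "most_common": most_common,
            # A set of acceptable labels; predicting any member is correct.
            "many_valued": set(counts),
        }
    return collapsed


# Toy table: the attribute tuple (0, 1) appears with decisions 1, 2 and 2.
table = [((0, 1), 1), ((0, 1), 2), ((0, 1), 2), ((1, 1), 3)]
print(collapse_inconsistent_rows(table)[(0, 1)])
```

    The generalized and many-valued labelings store the same set; they differ in how a learner scores a prediction against it.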

  5. Multiple endmember spectral-angle-mapper (SAM) analysis improves discrimination of Savanna tree species

    CSIR Research Space (South Africa)

    Cho, Moses A

    2009-08-01

    ... of this paper was to evaluate the classification performance of a multiple-endmember spectral angle mapper (SAM) classification approach in discriminating seven common African savanna tree species and to compare the results with the traditional SAM classifier...
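
    The spectral angle at the core of SAM is the angle between a pixel spectrum and a reference spectrum, both treated as vectors. The sketch below computes it and then applies a multiple-endmember rule in which each class has several reference spectra and the pixel is assigned to the class of the closest one. That assignment rule, the function names and the toy spectra are assumptions, since the truncated abstract does not describe the procedure.

```python
import math


def spectral_angle(spectrum, reference):
    """Angle in radians between two spectra of equal length."""
    dot = sum(a * b for a, b in zip(spectrum, reference))
    norm = (math.sqrt(sum(a * a for a in spectrum))
            * math.sqrt(sum(b * b for b in reference)))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


def classify_multiple_endmember(spectrum, endmembers_by_class):
    """Return the class whose nearest endmember has the smallest angle."""
    return min(
        endmembers_by_class,
        key=lambda name: min(spectral_angle(spectrum, ref)
                             for ref in endmembers_by_class[name]),
    )


# Toy two-band library with two endmembers for class A and one for class B.
library = {"A": [[1.0, 0.0], [1.0, 0.2]], "B": [[0.0, 1.0]]}
print(classify_multiple_endmember([1.0, 0.25], library))
```

    Because the angle ignores vector length, a spectrum and a brighter or darker copy of it are treated as identical, which is the usual motivation for SAM under varying illumination.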

  6. The role of non-fig-wasp insects on fig tree biology, with a proposal of the F phase (Fallen figs)

    Science.gov (United States)

    Palmieri, Luciano; Pereira, Rodrigo Augusto Santinelo

    2018-07-01

    The two seminal papers by Galil and Eisikowitch describing the development of Ficus flowers and their sycophilous wasps (i.e., phases A-E) have been adopted in several ecological and evolutionary studies on a wide range of fig tree-insect interactions. Their classification, however, is not inclusive enough to encompass all the diversity of insects associated with fig development, and the impact of this fauna on the fig-fig wasp mutualism is still unexplored. Here we describe the life history of the non-fig-wasp insects and propose an additional phase for the fig-development classification, the F phase (Fallen figs). These figs are not consumed by frugivores while still on the parent tree, fall to the ground and turn into a resource for a diverse range of animals. To support the relevance of the F phase, we summarized a 5-year period of field observations made in different biomes on three continents. Additionally, we compiled data from the literature on non-fig-wasp insects, including only insects associated with inflorescences of wild fig tree species. We report 129 species of non-fig-wasp insects feeding on figs; they colonize the figs in different phases of development and some groups rely on the fallen figs to complete their life cycles. Their range of interaction varies from specialists - that use exclusively fig pulp or fig seeds in their diets - to generalist, opportunist and parasitoid species. The formalization of this additional phase will encourage new studies on fig tree ecology and improve our knowledge of the processes that affect the diversification of insects. It will also help us to understand the implications this fauna may have had on the origin and maintenance of mutualistic interactions.

  7. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification

    Directory of Open Access Journals (Sweden)

    Yin Wang

    2016-01-01

    Background. Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.

  8. In Vivo Pattern Classification of Ingestive Behavior in Ruminants Using FBG Sensors and Machine Learning

    Directory of Open Access Journals (Sweden)

    Vinicius Pegorini

    2015-11-01

    Full Text Available Pattern classification of ingestive behavior in grazing animals has extreme importance in studies related to animal nutrition, growth and health. In this paper, a system to classify chewing patterns of ruminants in in vivo experiments is developed. The proposal is based on data collected by optical fiber Bragg grating sensors (FBG) that are processed by machine learning techniques. The FBG sensors measure the biomechanical strain during jaw movements, and a decision tree is responsible for the classification of the associated chewing pattern. In this study, patterns associated with food intake of dietary supplement, hay and ryegrass were considered. Additionally, two other important events for ingestive behavior were monitored: rumination and idleness. Experimental results show that the proposed approach for pattern classification is capable of differentiating the five patterns involved in the chewing process with an overall accuracy of 94%.

  9. In Vivo Pattern Classification of Ingestive Behavior in Ruminants Using FBG Sensors and Machine Learning.

    Science.gov (United States)

    Pegorini, Vinicius; Karam, Leandro Zen; Pitta, Christiano Santos Rocha; Cardoso, Rafael; da Silva, Jean Carlos Cardozo; Kalinowski, Hypolito José; Ribeiro, Richardson; Bertotti, Fábio Luiz; Assmann, Tangriani Simioni

    2015-11-11

    Pattern classification of ingestive behavior in grazing animals has extreme importance in studies related to animal nutrition, growth and health. In this paper, a system to classify chewing patterns of ruminants in in vivo experiments is developed. The proposal is based on data collected by optical fiber Bragg grating sensors (FBG) that are processed by machine learning techniques. The FBG sensors measure the biomechanical strain during jaw movements, and a decision tree is responsible for the classification of the associated chewing pattern. In this study, patterns associated with food intake of dietary supplement, hay and ryegrass were considered. Additionally, two other important events for ingestive behavior were monitored: rumination and idleness. Experimental results show that the proposed approach for pattern classification is capable of differentiating the five patterns involved in the chewing process with an overall accuracy of 94%.

  10. Classification tree analyses reveal limited potential for early targeted prevention against childhood overweight.

    Science.gov (United States)

    Beyerlein, Andreas; Kusian, Dennis; Ziegler, Anette-Gabriele; Schaffrath-Rosario, Angelika; von Kries, Rüdiger

    2014-02-01

    Whether specific combinations of risk factors in very early life might allow identification of high-risk target groups for overweight prevention programs was examined. Data of n = 8981 children from the German KiGGS study were analyzed. Using a classification tree approach, predictive risk factor combinations were assessed for overweight in 3-6, 7-10, and 11-17-year-old children. In preschool children, the subgroup with the highest overweight risk were migrant children with at least one obese parent, with a prevalence of 36.6 (95% confidence interval or CI: 22.9, 50.4)%, compared to an overall prevalence of 10.0 (8.9, 11.2)%. The prevalence of overweight increased from 18.3 (16.8, 19.8)% to 57.9 (46.6, 69.3)% in 7-10-year-old children, if at least one parent was obese and the child had been born large-for-gestational-age. In 11-17-year-olds, the overweight risk increased from 20.1 (18.9, 21.3)% to 63.0 (46.4, 79.7)% in the highest risk group. However, high prevalence ratios were found only in small subgroups, containing <10% of all overweight cases in the respective age group. Our results indicate only a limited potential for early targeted preventions against overweight in children and adolescents. Copyright © 2013 The Obesity Society.

  11. CLASSIFICATION OF ENTREPRENEURIAL INTENTIONS BY NEURAL NETWORKS, DECISION TREES AND SUPPORT VECTOR MACHINES

    Directory of Open Access Journals (Sweden)

    Marijana Zekić-Sušac

    2010-12-01

    Full Text Available Entrepreneurial intentions of students are important to recognize during the study in order to provide those students with educational background that will support such intentions and lead them to successful entrepreneurship after the study. The paper aims to develop a model that will classify students according to their entrepreneurial intentions by benchmarking three machine learning classifiers: neural networks, decision trees, and support vector machines. A survey was conducted at a Croatian university including a sample of students at the first year of study. Input variables described students’ demographics, importance of business objectives, perception of entrepreneurial career, and entrepreneurial predispositions. Due to a large dimension of input space, a feature selection method was used in the pre-processing stage. For comparison reasons, all tested models were validated on the same out-of-sample dataset, and a cross-validation procedure for testing generalization ability of the models was conducted. The models were compared according to their classification accuracy, as well as according to input variable importance. The results show that although the best neural network model produced the highest average hit rate, the difference in performance is not statistically significant. All three models also extract a similar set of features relevant for classifying students, which can be suggested to be taken into consideration by universities while designing their academic programs.

  12. Statistical analysis of texture in trunk images for biometric identification of tree species.

    Science.gov (United States)

    Bressane, Adriano; Roveda, José A F; Martins, Antônio C G

    2015-04-01

    The identification of tree species is a key step for sustainable management plans of forest resources, as well as for several other applications that are based on such surveys. However, the presently available techniques depend on the presence of tree structures, such as flowers, fruits, and leaves, limiting the identification process to certain periods of the year. Therefore, this article introduces a study on the application of statistical parameters for texture classification of tree trunk images. For that, 540 samples from five Brazilian native deciduous species were acquired and measures of entropy, uniformity, smoothness, asymmetry (third moment), mean, and standard deviation were obtained from the presented textures. Using a decision tree, a biometric species identification system was constructed and resulted in a 0.84 average precision rate for species classification with 0.83 accuracy and 0.79 agreement. Thus, it can be considered that the use of texture presented in trunk images can represent an important advance in tree identification, since the limitations of the current techniques can be overcome.
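
    The six texture measures named in this abstract can be illustrated with the standard histogram-based definitions. The sketch below is not the authors' code: the function name `texture_stats`, the 256-level gray scale, and the normalization of the variance by (levels - 1)^2 in the smoothness term are all assumptions, since the abstract does not state the exact formulas used.

```python
import math
from collections import Counter


def texture_stats(pixels, levels=256):
    """Histogram-based texture descriptors for a flat list of gray levels.

    Assumed (textbook) definitions; the paper's exact formulas are not given.
    """
    n = len(pixels)
    # Relative frequency p(z) of each gray level z that occurs in the sample.
    probs = {z: c / n for z, c in Counter(pixels).items()}

    mean = sum(z * p for z, p in probs.items())
    variance = sum((z - mean) ** 2 * p for z, p in probs.items())
    third_moment = sum((z - mean) ** 3 * p for z, p in probs.items())

    # Smoothness R = 1 - 1 / (1 + variance), with the variance scaled to
    # [0, 1] so that R is 0 for a constant patch (assumed normalization).
    scaled_variance = variance / (levels - 1) ** 2
    smoothness = 1 - 1 / (1 + scaled_variance)

    uniformity = sum(p * p for p in probs.values())
    entropy = -sum(p * math.log2(p) for p in probs.values())

    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "smoothness": smoothness,
        "third_moment": third_moment,
        "uniformity": uniformity,
        "entropy": entropy,
    }


if __name__ == "__main__":
    # A constant patch versus a two-level checkerboard patch.
    print(texture_stats([100] * 16))
    print(texture_stats([0, 255] * 8))
```

    A feature vector of this kind, computed per trunk image, is the sort of input a decision tree classifier could then split on.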

  13. Combining logistic regression with classification and regression tree to predict quality of care in a home health nursing data set.

    Science.gov (United States)

    Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun

    2006-01-01

    In this article, the authors provide an overview of a research method to predict quality of care in a home health nursing data set. The results of this study can be visualized through classification and regression tree (CART) graphs. The analysis was more effective, and the results were more informative, since the home health nursing data set was analyzed with a combination of logistic regression and CART; these two techniques complement each other. The results were also more informative in that more patient characteristics were found to be related to quality of care in home care. The results help home health nurses predict patient outcomes in case management. Improved prediction is needed for interventions to be appropriately targeted for improved patient outcome and quality of care.

  14. Decision trees in epidemiological research

    Directory of Open Access Journals (Sweden)

    Ashwini Venkatasubramaniam

    2017-09-01

    Full Text Available Abstract Background: In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. Main text: We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Conclusions: Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
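
    To make the contrast concrete, a CART-style tree typically picks each split by minimizing an impurity measure such as the Gini index, with no significance test involved. The following is a minimal sketch of that one step for a single numeric covariate; the helper names `gini` and `best_split` are hypothetical, and the CTree procedure (which instead selects splits through hypothesis tests) is deliberately not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one numeric covariate that minimizes the
    size-weighted Gini impurity of the two child nodes.

    Returns (threshold, weighted_impurity); threshold is None when no
    split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        # Only consider cut points between two distinct covariate values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


if __name__ == "__main__":
    # Toy data: the outcome flips between covariate values 3 and 10.
    print(best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))
```

    Because this criterion always finds some best split, CART relies on a separate pruning step to decide tree size, which is the part that a test-based stopping rule is meant to simplify.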

  15. Trees in the city: valuing street trees in Portland, Oregon

    Science.gov (United States)

    G.H. Donovan; D.T. Butry

    2010-01-01

    We use a hedonic price model to simultaneously estimate the effects of street trees on the sales price and the time-on-market (TOM) of houses in Portland, Oregon. On average, street trees add $8,870 to sales price and reduce TOM by 1.7 days. In addition, we found that the benefits of street trees spill over to neighboring houses. Because the provision and maintenance...

  16. DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony.

    Science.gov (United States)

    Wehe, André; Bansal, Mukul S; Burleigh, J Gordon; Eulenstein, Oliver

    2008-07-01

    DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at http://genome.cs.iastate.edu/CBL/DupTree

  17. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    Directory of Open Access Journals (Sweden)

    Shu-Chuan Chen

    Full Text Available The MixtureTree Annotator, written in JAVA, allows the user to automatically color any phylogenetic tree in Newick format generated from any phylogeny reconstruction program and output the Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over any other programs which perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular methods, which lack good built-in visualization tools, for example, MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious, may give results with human errors due to either manually adding colors to each node or with other limitations, for example only using color based on a number, such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process.

  18. AERIAL IMAGES FROM AN UAV SYSTEM: 3D MODELING AND TREE SPECIES CLASSIFICATION IN A PARK AREA

    Directory of Open Access Journals (Sweden)

    R. Gini

    2012-07-01

    Full Text Available The use of aerial imagery acquired by Unmanned Aerial Vehicles (UAVs) is scheduled within the FoGLIE project (Fruition of Goods Landscape in Interactive Environment): it starts from the need to enhance the natural, artistic and cultural heritage, to produce a better usability of it by employing audiovisual movable systems of 3D reconstruction and to improve monitoring procedures, by using new media for integrating the fruition phase with the preservation ones. The pilot project focuses on a test area, Parco Adda Nord, which encloses various goods' types (small buildings, agricultural fields and different tree species and bushes). Multispectral high resolution images were taken by two digital compact cameras: a Pentax Optio A40 for RGB photos and a Sigma DP1 modified to acquire the NIR band. Then, some tests were performed in order to analyze the UAV images' quality with both photogrammetric and photo-interpretation purposes, to validate the vector-sensor system, the image block geometry and to study the feasibility of tree species classification. Many pre-signalized Control Points were surveyed through GPS to allow accuracy analysis. Aerial Triangulations (ATs) were carried out with photogrammetric commercial software, Leica Photogrammetry Suite (LPS) and PhotoModeler, with manual or automatic selection of Tie Points, to pick out pros and cons of each package in managing non conventional aerial imagery as well as the differences in the modeling approach. Further analyses were done on the differences between the EO parameters and the corresponding data coming from the on board UAV navigation system.

  19. Land use classification from Sentinel-2 imagery

    OpenAIRE

    Borràs, J.; Delegido, J.; Pezzola, A.; Pereira, M.; Morassi, G.; Camps-Valls, G.

    2017-01-01

    [EN] Sentinel-2 (S2), a new ESA satellite for Earth observation, has 13 bands which provide high-quality radiometric images with an excellent spatial resolution (10 and 20 m) ideal for classification purposes. In this paper, two objectives have been addressed: to determine the best classification method for S2, and to quantify its improvement with respect to the SPOT operational mission. To do so, four classifiers (LDA, RF, Decision Trees, K-NN) have been selected and applied to tw...

  20. A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

    Directory of Open Access Journals (Sweden)

    Zekić-Sušac Marijana

    2014-09-01

    Full Text Available Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour on the same dataset in order to compare their efficiency in the sense of classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure in order to assess computing sensitivity and specificity of each model. Results: The artificial neural network model based on multilayer perceptron yielded a higher classification rate than the models produced by other methods. The pairwise t-test showed a statistical significance between the artificial neural network and the k-nearest neighbour model, while the difference among other methods was not statistically significant. Conclusions: Tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be assured by testing a few additional methodological refinements in machine learning methods.
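
    The evaluation protocol described here, k-fold cross-validation with sensitivity and specificity computed for each model, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: the folds are contiguous and unshuffled, the predictions are pooled across folds before scoring, and a 1-nearest-neighbour rule (`one_nn`) stands in for the four methods actually compared. All function names are hypothetical.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most 1."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (true positive rate) and specificity (true negative rate)
    for binary labels coded as 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def one_nn(train_X, train_y, test_X):
    """Stand-in classifier: label of the closest training point (Euclidean)."""
    preds = []
    for x in test_X:
        nearest = min(
            range(len(train_X)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(train_X[j], x)),
        )
        preds.append(train_y[nearest])
    return preds


def cross_validate(X, y, fit_predict, k=10):
    """Predict every sample once, from a model that never saw it, then
    score the pooled out-of-fold predictions."""
    y_pred = [None] * len(X)
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        for i, p in zip(test_idx, preds):
            y_pred[i] = p
    return sensitivity_specificity(y, y_pred)


if __name__ == "__main__":
    # Toy one-dimensional data: class 0 for x < 10, class 1 otherwise.
    X = [(float(i),) for i in range(20)]
    y = [0] * 10 + [1] * 10
    print(cross_validate(X, y, one_nn, k=10))
```

    In practice the samples would be shuffled (or stratified by class) before folding, and the same folds would be reused for every method so that paired comparisons such as a pairwise t-test are meaningful.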

  1. Selection of morphological features of pollen grains for chosen tree taxa

    Directory of Open Access Journals (Sweden)

    Agnieszka Kubik-Komar

    2018-05-01

    Full Text Available The basis of aerobiological studies is to monitor airborne pollen concentrations and pollen season timing. This task is performed by appropriately trained staff and is difficult and time consuming. The goal of this research is to select morphological characteristics of grains that are the most discriminative for distinguishing between birch, hazel and alder taxa and are easy to determine automatically from microscope images. This selection is based on the split attributes of the J4.8 classification trees built for different subsets of features. Determining the discriminative features by this method, we provide specific rules for distinguishing between individual taxa, at the same time obtaining a high percentage of correct classification. The most discriminative among the 13 morphological characteristics studied are the following: number of pores, maximum axis, minimum axis, axes difference, maximum oncus width, and number of lateral pores. The classification result of the tree based on this subset is better than the one built on the whole feature set and it is almost 94%. Therefore, selection of attributes before tree building is recommended. The classification results for the features easiest to obtain from the image, i.e. maximum axis, minimum axis, axes difference, and number of lateral pores, are only 2.09 pp lower than those obtained for the complete set, but 3.23 pp lower than the results obtained for the selected most discriminating attributes only.

  2. Land cover and forest formation distributions for St. Kitts, Nevis, St. Eustatius, Grenada and Barbados from decision tree classification of cloud-cleared satellite imagery

    Science.gov (United States)

    Helmer, E.H.; Kennaway, T.A.; Pedreros, D.H.; Clark, M.L.; Marcano-Vega, H.; Tieszen, L.L.; Ruzycki, T.R.; Schill, S.R.; Carrington, C.M.S.

    2008-01-01

    Satellite image-based mapping of tropical forests is vital to conservation planning. Standard methods for automated image classification, however, limit classification detail in complex tropical landscapes. In this study, we test an approach to Landsat image interpretation on four islands of the Lesser Antilles, including Grenada and St. Kitts, Nevis and St. Eustatius, testing a more detailed classification than earlier work in the latter three islands. Secondly, we estimate the extents of land cover and protected forest by formation for five islands and ask how land cover has changed over the second half of the 20th century. The image interpretation approach combines image mosaics and ancillary geographic data, classifying the resulting set of raster data with decision tree software. Cloud-free image mosaics for one or two seasons were created by applying regression tree normalization to scene dates that could fill cloudy areas in a base scene. Such mosaics are also known as cloud-filled, cloud-minimized or cloud-cleared imagery, mosaics, or composites. The approach accurately distinguished several classes that more standard methods would confuse; the seamless mosaics aided reference data collection; and the multiseason imagery allowed us to separate drought deciduous forests and woodlands from semi-deciduous ones. Cultivated land areas declined 60 to 100 percent from about 1945 to 2000 on several islands. Meanwhile, forest cover has increased 50 to 950%. This trend will likely continue where sugar cane cultivation has dominated. Like the island of Puerto Rico, most higher-elevation forest formations are protected in formal or informal reserves. Also similarly, lowland forests, which are drier forest types on these islands, are not well represented in reserves. Former cultivated lands in lowland areas could provide lands for new reserves of drier forest types. 
The land-use history of these islands may provide insight for planners in countries currently considering

  3. Degree of susceptibility of industrial gases of tree and shrub species

    Energy Technology Data Exchange (ETDEWEB)

    Dobrovoljskii, I A

    1952-01-01

    The trees and shrubs of the iron smelting region of Krivoi Rog, in the Ukraine, were surveyed to determine susceptibility to air pollution damage. Most of the observations were made in parks and green belts in industrial areas. A classification of tree and shrub species is presented; they are separated into three classes according to their susceptibility to air pollutant injury.

  4. TESTING OF LAND COVER CLASSIFICATION FROM MULTISPECTRAL AIRBORNE LASER SCANNING DATA

    Directory of Open Access Journals (Sweden)

    K. Bakuła

    2016-06-01

    Full Text Available Multispectral Airborne Laser Scanning provides a new opportunity for airborne data collection. It provides high-density topographic surveying and is also a useful tool for land cover mapping. Use of a minimum of three intensity images from a multiwavelength laser scanner and 3D information included in the digital surface model has the potential for land cover/use classification and a discussion about the application of this type of data in land cover/use mapping has recently begun. In the test study, three laser reflectance intensity images (orthogonalized point cloud) acquired in green, near-infrared and short-wave infrared bands, together with a digital surface model, were used in land cover/use classification where six classes were distinguished: water, sand and gravel, concrete and asphalt, low vegetation, trees and buildings. In the tested methods, different approaches for classification were applied: spectral (based only on laser reflectance intensity images), spectral with elevation data as additional input data, and spectro-textural, using morphological granulometry as a method of texture analysis of both types of data: spectral images and the digital surface model. The method of generating the intensity raster was also tested in the experiment. Reference data were created based on visual interpretation of ALS data and traditional optical aerial and satellite images. The results have shown that multispectral ALS data are unlike typical multispectral optical images, and they have a major potential for land cover/use classification. An overall accuracy of classification over 90% was achieved. The fusion of multi-wavelength laser intensity images and elevation data, with the additional use of textural information derived from granulometric analysis of images, helped to improve the accuracy of classification significantly. 
The method of interpolation for the intensity raster was not very helpful, and using intensity rasters with both first and

  5. Integrating classification trees with local logistic regression in Intensive Care prognosis.

    Science.gov (United States)

    Abu-Hanna, Ameen; de Keizer, Nicolette

    2003-01-01

    Health care effectiveness and efficiency are under constant scrutiny especially when treatment is quite costly as in the Intensive Care (IC). Currently there are various international quality of care programs for the evaluation of IC. At the heart of such quality of care programs lie prognostic models whose prediction of patient mortality can be used as a norm to which actual mortality is compared. The current generation of prognostic models in IC are statistical parametric models based on logistic regression. Given a description of a patient at admission, these models predict the probability of his or her survival. Typically, this patient description relies on an aggregate variable, called a score, that quantifies the severity of illness of the patient. The use of a parametric model and an aggregate score form adequate means to develop models when data is relatively scarce but it introduces the risk of bias. This paper motivates and suggests a method for studying and improving the performance behavior of current state-of-the-art IC prognostic models. Our method is based on machine learning and statistical ideas and relies on exploiting information that underlies a score variable. In particular, this underlying information is used to construct a classification tree whose nodes denote patient sub-populations. For these sub-populations, local models, most notably logistic regression ones, are developed using only the total score variable. We compare the performance of this hybrid model to that of a traditional global logistic regression model. We show that the hybrid model not only provides more insight into the data but also has a better performance. We pay special attention to the precision aspect of model performance and argue why precision is more important than discrimination ability.

  6. Variable coupling between sap-flow and transpiration in pine trees under drought conditions

    Science.gov (United States)

    Preisler, Yakir; Tatarinov, Fyodor; Rohatyn, Shani; Rotenberg, Eyal; Grunzweig, Jose M.; Yakir, Dan

    2016-04-01

    Changes in diurnal patterns in water transport and physiological activities in response to changes in environmental conditions are important adjustments of trees to drought. The rate of sap flow (SF) in trees is expected to be in agreement with the rate of tree-scale transpiration (T) and provides a powerful measure of water transport in the soil-plant-atmosphere system. The aim of this five-year study was to investigate the temporal links between SF and T in Pinus halepensis exposed to extreme seasonal drought in the Yatir forest in Israel. We continuously measured SF (20 trees), the daily variations in stem diameter (ΔDBH, determined with high precision dendrometers; 8 trees), and ecosystem evapotranspiration (ET; eddy covariance), which were complemented with short-term campaigns of leaf-scale measurements of H2O and CO2 gas exchange, water potentials, and hydraulic conductivity. During the rainy season, tree SF was well synchronized with ecosystem ET, reaching maximum rates during midday in all trees. However, during the dry season, the daily SF trends greatly varied among trees, allowing a classification of trees into three classes: 1) trees that retained their SF maximum at midday, 2) trees that advanced their SF peak to early morning, and 3) trees that delayed their SF peak to late afternoon hours. This classification remained valid for the entire study period (2010-2015), and strongly correlated with tree height and DBH, and to a lower degree with crown size and competition index. In the dry season, class 3 trees (large) tended to delay the timing of SF maximum to the afternoon, and to advance their maximum diurnal DBH to early morning, while class 2 trees (smaller) advanced their SF maximum to early morning and had maximum daily DBH during midday and afternoon. Leaf-scale transpiration (T) measurements showed a typical morning peak in all trees, irrespective of classification, and a secondary peak in the afternoon in large trees only. Water potential and

  7. Diversity of shrub tree layer, leaf litter decomposition and N release in a Brazilian Cerrado under N, P and N plus P additions

    International Nuclear Information System (INIS)

    Khan Baiocchi Jacobson, Tamiel; Cunha Bustamante, Mercedes Maria da; Rodrigues Kozovits, Alessandra

    2011-01-01

    This study investigated changes in diversity of shrub-tree layer, leaf decomposition rates, nutrient release and soil NO fluxes of a Brazilian savanna (cerrado sensu stricto) under N, P and N plus P additions. Simultaneous addition of N and P affected density, dominance, richness and diversity patterns more significantly than addition of N or P separately. Leaf litter decomposition rates increased in P and NP plots but did not differ in N plots in comparison to control plots. N addition increased N mass loss, while the combined addition of N and P resulted in an immobilization of N in leaf litter. Soil NO emissions were also higher when N was applied without P. The results indicate that if the availability of P is not increased proportionally to the availability of N, the losses of N are intensified. - Highlights: → Simultaneous addition of N and P affected richness and diversity of the shrub-tree layer of a Brazilian savanna more significantly than addition of N or P separately. → Leaf litter decomposition rates increased in P and NP plots but did not differ in N plots in comparison to control plots. N addition increased N mass loss, while the combined addition of N and P resulted in an immobilization of N in leaf litter. Soil NO emissions were also higher when N was applied without P. → The results indicated that if increases in N deposition in Cerrado ecosystems are not accompanied by P additions, higher N losses through leaching and gas emissions can occur with other ecosystem impacts. - Shrub-tree diversity and functioning of Brazilian savanna are affected by increasing nutrient availability.

  8. Diversity of shrub tree layer, leaf litter decomposition and N release in a Brazilian Cerrado under N, P and N plus P additions

    Energy Technology Data Exchange (ETDEWEB)

    Khan Baiocchi Jacobson, Tamiel, E-mail: tamiel@unb.br [Departamento de Ecologia, Universidade de Brasilia, Brasilia-DF 70919-970 (Brazil); Cunha Bustamante, Mercedes Maria da, E-mail: mercedes@unb.br [Departamento de Ecologia, Universidade de Brasilia, Brasilia-DF 70919-970 (Brazil); Rodrigues Kozovits, Alessandra, E-mail: kozovits@icep.ufop.br [Departamento de Ecologia, Universidade de Brasilia, Brasilia-DF 70919-970 (Brazil)

    2011-10-15

    This study investigated changes in diversity of shrub-tree layer, leaf decomposition rates, nutrient release and soil NO fluxes of a Brazilian savanna (cerrado sensu stricto) under N, P and N plus P additions. Simultaneous addition of N and P affected density, dominance, richness and diversity patterns more significantly than addition of N or P separately. Leaf litter decomposition rates increased in P and NP plots but did not differ in N plots in comparison to control plots. N addition increased N mass loss, while the combined addition of N and P resulted in an immobilization of N in leaf litter. Soil NO emissions were also higher when N was applied without P. The results indicate that if the availability of P is not increased proportionally to the availability of N, the losses of N are intensified. - Highlights: > Simultaneous addition of N and P affected richness and diversity of the shrub-tree layer of a Brazilian savanna more significantly than addition of N or P separately. > Leaf litter decomposition rates increased in P and NP plots but did not differ in N plots in comparison to control plots. N addition increased N mass loss, while the combined addition of N and P resulted in an immobilization of N in leaf litter. Soil NO emissions were also higher when N was applied without P. > The results indicated that if increases in N deposition in Cerrado ecosystems are not accompanied by P additions, higher N losses through leaching and gas emissions can occur with other ecosystem impacts. - Shrub-tree diversity and functioning of Brazilian savanna are affected by increasing nutrient availability.

  9. Construction and application of hierarchical decision tree for classification of ultrasonographic prostate images

    NARCIS (Netherlands)

    Giesen, R. J.; Huynen, A. L.; Aarnink, R. G.; de la Rosette, J. J.; Debruyne, F. M.; Wijkstra, H.

    1996-01-01

    A non-parametric algorithm is described for the construction of a binary decision tree classifier. This tree is used to correlate textural features, computed from ultrasonographic prostate images, with the histopathology of the imaged tissue. The algorithm consists of two parts: growing and pruning.

  10. Binary classification of dyslipidemia from the waist-to-hip ratio and body mass index: a comparison of linear, logistic, and CART models

    Directory of Open Access Journals (Sweden)

    Paccaud Fred

    2004-04-01

    Background: We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. Methods: Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; (iv) classification trees ((iii) and (iv) are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. Results: Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60–80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. Conclusions: There were no striking differences between either the algebraic (i, ii) vs. non-algebraic (iii, iv), or the regression (i, iii) vs. classification (ii, iv) modeling approaches. Anticipated advantages of the CART vs. simple additive linear and logistic models were less than expected in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables.

  11. Atlas of United States Trees, Volume 2: Alaska Trees and Common Shrubs.

    Science.gov (United States)

    Viereck, Leslie A.; Little, Elbert L., Jr.

    This volume is the second in a series of atlases describing the natural distribution or range of native tree species in the United States. The 82 species maps include 32 of trees in Alaska, 6 of shrubs rarely reaching tree size, and 44 more of common shrubs. More than 20 additional maps summarize environmental factors and furnish general…

  12. Minimum triplet covers of binary phylogenetic X-trees.

    Science.gov (United States)

    Huber, K T; Moulton, V; Steel, M

    2017-12-01

    Trees with labelled leaves and with all other vertices of degree three play an important role in systematic biology and other areas of classification. A classical combinatorial result ensures that such trees can be uniquely reconstructed from the distances between the leaves (when the edges are given any strictly positive lengths). Moreover, a linear number of these pairwise distance values suffices to determine both the tree and its edge lengths. A natural set of pairs of leaves is provided by any 'triplet cover' of the tree (based on the fact that each non-leaf vertex is the median vertex of three leaves). In this paper we describe a number of new results concerning triplet covers of minimum size. In particular, we characterize such covers in terms of an associated graph being a 2-tree. Also, we show that minimum triplet covers are 'shellable' and thereby provide a set of pairs for which the inter-leaf distance values will uniquely determine the underlying tree and its associated branch lengths.

  13. AN OBJECT-BASED METHOD FOR CHINESE LANDFORM TYPES CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    H. Ding

    2016-06-01

    Landform classification is a necessary task for various fields of landscape and regional planning, for example landscape evaluation, erosion studies and hazard prediction. This study proposes an improved object-based classification for Chinese landform types using the factor importance analysis of random forest and the gray-level co-occurrence matrix (GLCM). In this research, based on a 1 km DEM of China, the combination of terrain factors extracted from the DEM is selected by correlation analysis and Sheffield's entropy method. A random forest classification tree is applied to evaluate the importance of the terrain factors, which are used as multi-scale segmentation thresholds. The GLCM is then computed to build the knowledge base for classification. The classification result was checked using the 1:4,000,000 Chinese Geomorphological Map as reference. The overall classification accuracy of the proposed method is 5.7% higher than ISODATA unsupervised classification and 15.7% higher than the traditional object-based classification method.

  14. Land Cover Classification from Multispectral Data Using Computational Intelligence Tools: A Comparative Study

    Directory of Open Access Journals (Sweden)

    André Mora

    2017-11-01

    This article discusses how computational intelligence techniques are applied to fuse spectral images into a higher level image of land cover distribution for remote sensing, specifically for satellite image classification. We compare a fuzzy-inference method with two other computational intelligence methods, decision trees and neural networks, using a case study of land cover classification from satellite images. Further, an unsupervised approach based on k-means clustering has also been taken into consideration for comparison. The fuzzy-inference method includes training the classifier with a fuzzy-fusion technique and then performing land cover classification using reinforcement aggregation operators. To assess the robustness of the four methods, a comparative study including three years of land cover maps for the district of Mandimba, Niassa province, Mozambique, was undertaken. Our results show that the fuzzy-fusion method performs similarly to decision trees, achieving reliable classifications; neural networks suffer from overfitting; while k-means clustering constitutes a promising technique to identify land cover types from unknown areas.

  15. Fungal diseases of tree stands under urbanized conditions of Moscow

    Directory of Open Access Journals (Sweden)

    Smirnova Oksana G.

    2013-01-01

    Phytosanitary and ecological assessment of tree stands was conducted at the Forest Experimental Station of Moscow Agricultural Academy and in parks in the northeast of Moscow in 2007-2011. Fomes fomentarius proved to be a very serious pathogen of trees under the conditions of Moscow; Piptoporus betulinus, Phellinus igniarius, and Fomitopsis pinicola also occurred and caused damage to trees. This rather poor phytosanitary situation depends on the alarming ecological situation in Moscow. At the Forest Experimental Station of Moscow Agricultural Academy the number and cover of lichens decreased. In general, all trees in Moscow are in dynamic equilibrium with the urbanized environment. In connection with this, the following classification of tree stands was proposed for the urbanized environment: 1 - healthy trees, 2 - affected trees which can be managed, 3 - dry woods, 3a - very diseased. Many tree stands in the investigated regions of Moscow were found to belong to groups 2 and 3c. All tree stands must be carefully monitored and managed in order to provide well-timed decisions on the support system for preservation of trees as 'lungs of the city' and to avoid unpredictable tree falls, which put people and traffic at risk.

  16. An ecological classification system for the central hardwoods region: The Hoosier National Forest

    Science.gov (United States)

    James E. Van Kley; George R. Parker

    1993-01-01

    In this study, a multifactor ecological classification system using vegetation, soil characteristics, and physiography was developed for the landscape of the Hoosier National Forest in Southern Indiana. Measurements of ground flora, saplings, and canopy trees from selected stands older than 80 years were subjected to TWINSPAN classification and DECORANA ordination...

  17. Biodiversity among Lactobacillus helveticus Strains Isolated from Different Natural Whey Starter Cultures as Revealed by Classification Trees

    Science.gov (United States)

    Gatti, Monica; Trivisano, Carlo; Fabrizi, Enrico; Neviani, Erasmo; Gardini, Fausto

    2004-01-01

    Lactobacillus helveticus is a homofermentative thermophilic lactic acid bacterium used extensively for manufacturing Swiss type and aged Italian cheese. In this study, the phenotypic and genotypic diversity of strains isolated from different natural dairy starter cultures used for Grana Padano, Parmigiano Reggiano, and Provolone cheeses was investigated by a classification tree technique. A data set was used that consists of 119 L. helveticus strains, each of which was studied for its physiological characters, as well as surface protein profiles and hybridization with a species-specific DNA probe. The methodology employed in this work allowed the strains to be grouped into terminal nodes without difficult and subjective interpretation. In particular, good discrimination was obtained between L. helveticus strains isolated, respectively, from Grana Padano and from Provolone natural whey starter cultures. The method used in this work allowed identification of the main characteristics that permit discrimination of biotypes. In order to understand what kind of genes could code for phenotypes of technological relevance, evidence that specific DNA sequences are present only in particular biotypes may be of great interest. PMID:14711641

  18. Predicting smear negative pulmonary tuberculosis with classification trees and logistic regression: a cross-sectional study

    Directory of Open Access Journals (Sweden)

    Kritski Afrânio

    2006-02-01

    Background: Smear negative pulmonary tuberculosis (SNPT) accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods: The study enrolled 551 patients with clinical-radiological suspicion of SNPT, in Rio de Janeiro, Brazil. The original data were divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used for constructing logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operator characteristic curve, sensitivity, and specificity were used to evaluate the models' performance in both generation and validation samples. Results: It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion: The results suggest that those models might be useful as screening tools for estimating the risk of SNPT, optimizing the utilization of more expensive tests, and avoiding costs of unnecessary anti-tuberculosis treatment. Those models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources.
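    The sensitivity and specificity figures quoted in this record follow from the standard confusion-matrix definitions. A minimal sketch of that calculation is given below; the function name and the toy labels are hypothetical and are not taken from the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for labels coded 1 = disease, 0 = no disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.75, specificity=0.67
```

    In a split-sample design such as the one described, the same function would be applied separately to the generation and validation samples.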

  19. On the Accuracy of Language Trees

    Science.gov (United States)

    Pompei, Simone; Loreto, Vittorio; Tria, Francesca

    2011-01-01

    Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics. From this perspective the reconstruction of language trees is an example of inverse problems: starting from present, incomplete and often noisy, information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performances of the distance-based algorithms considered. Further we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it. PMID:21674034

  20. On the accuracy of language trees.

    Directory of Open Access Journals (Sweden)

    Simone Pompei

    Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics. From this perspective the reconstruction of language trees is an example of inverse problems: starting from present, incomplete and often noisy, information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performances of the distance-based algorithms considered. Further we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it.

  1. Multi-phenology WorldView-2 imagery improves remote sensing of savannah tree species

    Science.gov (United States)

    Madonsela, Sabelo; Cho, Moses Azong; Mathieu, Renaud; Mutanga, Onisimo; Ramoelo, Abel; Kaszta, Żaneta; Kerchove, Ruben Van De; Wolff, Eléonore

    2017-06-01

    Biodiversity mapping in African savannah is important for monitoring changes and ensuring sustainable use of ecosystem resources. Biodiversity mapping can benefit from multi-spectral instruments such as WorldView-2 with very high spatial resolution and a spectral configuration encompassing important spectral regions not previously available for vegetation mapping. This study investigated i) the benefits of the eight-band WorldView-2 (WV-2) spectral configuration for discriminating tree species in Southern African savannah and ii) whether multiple images acquired at key points of the typical phenological development of savannahs (peak productivity, transition to senescence) improve tree species classifications. We first assessed the discriminatory power of WV-2 bands using interspecies Spectral Angle Mapper (SAM) via the Band Add-On procedure and tested the spectral capability of WorldView-2 against simulated IKONOS for tree species classification. The results from the interspecies-SAM procedure identified the yellow and red bands as the most statistically significant bands (p = 0.000251 and p = 0.000039 respectively) in the discriminatory power of WV-2 during the transition from wet to dry season (April). Using a Random Forest classifier, the classification scenarios investigated showed that i) the eight bands of the WV-2 sensor achieved higher classification accuracy for the April date (transition from wet to dry season, senescence) compared to the March date (peak productivity season), ii) the WV-2 spectral configuration systematically outperformed the IKONOS sensor spectral configuration and iii) the multi-temporal approach (March and April combined) improved the discrimination of tree species and produced the highest overall accuracy at 80.4%. Consistent with the interspecies-SAM procedure, the yellow (605 nm) band also showed a statistically significant contribution to the improved classification accuracy from WV-2. These results highlight the mapping opportunities
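    For readers unfamiliar with the Spectral Angle Mapper mentioned in this record: SAM treats two spectra as vectors and measures the angle between them, so a smaller angle means more similar spectral shape regardless of overall brightness. The sketch below shows only that basic angle computation; the eight-band reflectance values are hypothetical, and the study's interspecies-SAM and Band Add-On procedures are not reproduced here.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra given as equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


# Hypothetical eight-band mean reflectances for two tree species.
species_a = [0.03, 0.04, 0.07, 0.06, 0.05, 0.20, 0.35, 0.38]
species_b = [0.03, 0.05, 0.08, 0.09, 0.07, 0.18, 0.30, 0.33]
angle = spectral_angle(species_a, species_b)
print(f"spectral angle = {angle:.4f} rad")
```

    Because the angle depends only on direction, scaling one spectrum by a constant (for example, under different illumination) leaves the result essentially unchanged.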

  2. Interpretation of Forest Resources at the Individual Tree Level at Purple Mountain, Nanjing City, China, Using WorldView-2 Imagery by Combining GPS, RS and GIS Technologies

    Directory of Open Access Journals (Sweden)

    Songqiu Deng

    2013-12-01

    This study attempted to measure forest resources at the individual tree level using high-resolution images by combining GPS, RS, and Geographic Information System (GIS) technologies. The images were acquired by the WorldView-2 satellite with a resolution of 0.5 m in the panchromatic band and 2.0 m in the multispectral bands. Field data of 90 plots were used to verify the interpreted accuracy. The tops of trees in three groups, namely ≥10 cm, ≥15 cm, and ≥20 cm DBH (diameter at breast height), were extracted by the individual tree crown (ITC) approach using filters with moving windows of 3 × 3 pixels, 5 × 5 pixels and 7 × 7 pixels, respectively. In the study area, there were 1,203,970 trees of DBH over 10 cm, and the interpreted accuracy was 73.68 ± 15.14% averaged over the 90 plots. The numbers of the trees that were ≥15 cm and ≥20 cm DBH were 727,887 and 548,919, with an average accuracy of 68.74 ± 17.21% and 71.92 ± 18.03%, respectively. The pixel-based classification showed that the classified accuracies of the 16 classes obtained using the eight multispectral bands were higher than those obtained using only the four standard bands. The increments ranged from 0.1% for the water class to 17.0% for Metasequoia glyptostroboides, with an average value of 4.8% for the 16 classes. In addition, to overcome the “mixed pixels” problem, a crown-based supervised classification, which can improve the classified accuracy of both dominant species and smaller classes, was used for generating a thematic map of tree species. The improvements of the crown- to pixel-based classification ranged from −1.6% for the open forest class to 34.3% for Metasequoia glyptostroboides, with an average value of 20.3% for the 10 classes. All tree tops were then annotated with the species attributes from the map, and a tree count of different species indicated that the forest of Purple Mountain is mainly dominated by Quercus acutissima, Liquidambar formosana

  3. On Tree-Based Phylogenetic Networks.

    Science.gov (United States)

    Zhang, Louxin

    2016-07-01

    A large class of phylogenetic networks can be obtained from trees by the addition of horizontal edges between the tree edges. These networks are called tree-based networks. We present a simple necessary and sufficient condition for tree-based networks and prove that a universal tree-based network exists for any number of taxa that contains as its base every phylogenetic tree on the same set of taxa. This answers two problems posed recently by Francis and Steel. A byproduct is a computer program for generating random binary phylogenetic networks under the uniform distribution model.

  4. Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data.

    Directory of Open Access Journals (Sweden)

    Alexei Kassian

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies.

  5. Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data.

    Science.gov (United States)

    Kassian, Alexei

    2015-01-01

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies.
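    The phonetic-similarity route described in this record rests on Levenshtein (edit) distances between word forms. As a rough illustration only, the sketch below computes a plain Levenshtein distance and a length-normalized variant; the consonant-class strings are hypothetical, and the study's actual consonant-class coding and cognate-marking thresholds are not reproduced.

```python
def levenshtein(a, b):
    """Minimum number of single-symbol insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            substitution_cost = 0 if symbol_a == symbol_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Levenshtein distance divided by the longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Hypothetical consonant-class encodings of one wordlist item in two lects.
print(levenshtein("KTN", "STN"))  # 1
print(round(normalized_distance("KTN", "STN"), 3))  # 0.333
```

    A matrix of such pairwise distances, averaged over the wordlist, is the kind of input that distance-based methods such as NJ or UPGMA operate on.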

  6. Data Fusion Research of Triaxial Human Body Motion Gesture based on Decision Tree

    Directory of Open Access Journals (Sweden)

    Feihong Zhou

    2014-05-01

    The development status of human body motion gesture data fusion at home and abroad is analyzed. A triaxial accelerometer is adopted to develop a wearable human body motion gesture monitoring system aimed at healthcare for the elderly. Following a brief introduction to the decision tree algorithm, the WEKA workbench is used to generate a human body motion gesture decision tree. Finally, the classification quality of the decision tree is validated through experiments. The experimental results show that the decision tree algorithm can reach an average prediction accuracy of 97.5% with lower time cost.

  7. DeepSAT: A Deep Learning Approach to Tree-cover Delineation in 1-m NAIP Imagery for the Continental United States

    Science.gov (United States)

    Ganguly, S.; Basu, S.; Nemani, R. R.; Mukhopadhyay, S.; Michaelis, A.; Votava, P.

    2016-12-01

    High resolution tree cover classification maps are needed to increase the accuracy of current land ecosystem and climate model outputs. Limited studies are in place that demonstrate the state-of-the-art in deriving very high resolution (VHR) tree cover products. In addition, most methods rely heavily on commercial software that is difficult to scale given the region of study (e.g. continents to globe). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute and memory intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. In addition, VHR satellite datasets are of the order of terabytes and features extracted from these datasets are of the order of petabytes. In our present study, we have acquired the National Agriculture Imagery Program (NAIP) dataset for the Continental United States at a spatial resolution of 1-m. This data comes as image tiles (a total of a quarter million image scenes with 60 million pixels) and has a total size of 65 terabytes for a single acquisition. Features extracted from the entire dataset would amount to 8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted in the principles of "deep learning" to delineate the percentage of tree cover. Using the NASA Earth Exchange (NEX) initiative, we have developed an end-to-end architecture by integrating a segmentation module based on Statistical Region Merging, a classification algorithm using Deep Belief Network and a structured prediction algorithm using Conditional Random Fields to integrate the results from the segmentation and classification modules to create per-pixel class labels. The training process is scaled up using the power of GPUs and the prediction is scaled to a quarter million NAIP tiles spanning the whole of the Continental United States using the NEX HPC supercomputing cluster. An initial pilot over the

  8. DeepSAT: A Deep Learning Approach to Tree-Cover Delineation in 1-m NAIP Imagery for the Continental United States

    Science.gov (United States)

    Ganguly, Sangram; Basu, Saikat; Nemani, Ramakrishna R.; Mukhopadhyay, Supratik; Michaelis, Andrew; Votava, Petr

    2016-01-01

    High resolution tree cover classification maps are needed to increase the accuracy of current land ecosystem and climate model outputs. Limited studies are in place that demonstrate the state-of-the-art in deriving very high resolution (VHR) tree cover products. In addition, most methods rely heavily on commercial software that is difficult to scale given the region of study (e.g. continents to globe). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute and memory intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. In addition, VHR satellite datasets are of the order of terabytes and features extracted from these datasets are of the order of petabytes. In our present study, we have acquired the National Agriculture Imagery Program (NAIP) dataset for the Continental United States at a spatial resolution of 1-m. This data comes as image tiles (a total of a quarter million image scenes with 60 million pixels) and has a total size of 65 terabytes for a single acquisition. Features extracted from the entire dataset would amount to 8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted in the principles of "deep learning" to delineate the percentage of tree cover. Using the NASA Earth Exchange (NEX) initiative, we have developed an end-to-end architecture by integrating a segmentation module based on Statistical Region Merging, a classification algorithm using Deep Belief Network and a structured prediction algorithm using Conditional Random Fields to integrate the results from the segmentation and classification modules to create per-pixel class labels. The training process is scaled up using the power of GPUs and the prediction is scaled to a quarter million NAIP tiles spanning the whole of the Continental United States using the NEX HPC supercomputing cluster. An initial pilot over the

  9. Monitoring stress-related mass variations in Amazon trees using accelerometers

    Science.gov (United States)

    van Emmerik, T. H. M.; Steele-Dunne, S. C.; Gentine, P.; Hut, R.; Guerin, M. F.; Leus, G.; Oliveira, R. S.; Van De Giesen, N.

    2016-12-01

    Containing half of the world's rainforests, the Amazon plays a key role in the global water and carbon budget. However, the Amazon remains poorly understood and appears to be vulnerable to increasing moisture stress, and future droughts have the potential to considerably change the global water and carbon budget. Field measurements will allow further investigations of the effects of moisture stress and droughts on tree dynamics, and their impact on the water and carbon budget. This study focuses on the diurnal mass variations of seven Amazonian tree species. The mass of trees is influenced by physiological processes within the tree (e.g. transpiration and root water uptake), as well as external loads (e.g. intercepted precipitation). Depending on the physiological traits of an individual tree, moisture stress and drought affect processes such as photosynthesis, assimilation, transpiration, and root water uptake. In turn, these influence the diurnal mass variations of a tree. Our study uses measured three-dimensional displacement and acceleration of trees to detect and quantify their diurnal (bio)mass variations. Nineteen accelerometers and dendrometers were installed on seven different tree species in the Amazon rainforest, covering an area of 250 x 250 m. The selected species span a wide range in wood density (0.5 - 1.1), diameter (15 - 40 cm) and height (25 - 60 m). Acceleration was measured with a frequency of 10 Hz, from August 2015 to June 2016, covering both the wet and dry season. Additional on-site measurements of net radiation, wind speed at three heights, temperature, and precipitation were available every 15 minutes. Dendrometers measured variation in xylem and bark thickness every 5 minutes. The MUltiple SIgnal Classification (MUSIC) algorithm was applied to the acceleration time series to estimate the frequency spectrum of each tree. A correction was necessary to account for the dominant effect of wind. The resulting spectra reveal

  10. Comparison of leaf-on and leaf-off ALS data for mapping riparian tree species

    Science.gov (United States)

    Laslier, Marianne; Ba, Antoine; Hubert-Moy, Laurence; Dufour, Simon

    2017-10-01

    Forest species composition is a fundamental indicator for forest study and management. However, describing forest species composition at large scales and for highly diverse populations remains an issue to which remote sensing can make a significant contribution, in particular Airborne Laser Scanning (ALS) data. Riparian corridors are good examples of highly valuable ecosystems, with high species richness and large surface areas that can be time consuming and expensive to monitor with in situ measurements. Remote sensing could be useful to study them, but few studies have focused on monitoring riparian tree species using ALS data. This study aimed to determine which metrics derived from ALS data are best suited to identify and map riparian tree species. We acquired very high density leaf-on and leaf-off ALS data along the Sélune River (France). In addition, we inventoried eight main riparian deciduous tree species along the study site. After manual segmentation of the inventoried trees, we extracted 68 morphological and structural metrics from both leaf-on and leaf-off ALS point clouds. Some of these metrics were then selected using the Sequential Forward Selection (SFS) algorithm. Support Vector Machine (SVM) classification results showed good accuracy with 7 metrics (0.77). Both leaf-on and leaf-off metrics were kept as important metrics for distinguishing tree species. Results demonstrate the ability of 3D information derived from high density ALS data to identify riparian tree species using external and internal structural metrics. They also highlight the complementarity of leaf-on and leaf-off Lidar data for distinguishing riparian tree species.
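
    The greedy metric-selection step mentioned in this abstract can be sketched as follows. This is a minimal illustration of Sequential Forward Selection in plain Python, not the authors' pipeline: the scoring function (leave-one-out accuracy of a nearest-centroid classifier, standing in for the SVM), the toy data and all function names are assumptions made for the example.

```python
from collections import defaultdict


def nearest_centroid_loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on a feature subset."""
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except the held-out one.
        sums = defaultdict(lambda: [0.0] * len(features))
        counts = defaultdict(int)
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            counts[label] += 1
            for k, f in enumerate(features):
                sums[label][k] += row[f]
        # Assign the held-out sample to the closest centroid.
        best_label, best_dist = None, float("inf")
        for label in sorted(counts):
            dist = sum(
                (X[i][f] - sums[label][k] / counts[label]) ** 2
                for k, f in enumerate(features)
            )
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def sequential_forward_selection(X, y, n_select, score=nearest_centroid_loo_accuracy):
    """Greedily add the feature that most improves the score until n_select are chosen."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: score(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: feature 1 separates the classes, features 0 and 2 are noise.
X = [
    [0.3, 1.0, 5.0], [0.9, 1.2, 4.0], [0.1, 0.9, 5.5], [0.7, 1.1, 4.2],
    [0.2, 3.0, 4.8], [0.8, 3.2, 5.1], [0.4, 2.9, 4.1], [0.6, 3.1, 5.3],
]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
chosen = sequential_forward_selection(X, y, n_select=1)
print(chosen)
```

    Any classifier-based score can be passed through the `score` argument; the greedy loop itself does not depend on the choice.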

  11. Geometry of convex polygons and locally minimal binary trees spanning these polygons

    International Nuclear Information System (INIS)

    Ivanov, A O; Tuzhilin, A A

    1999-01-01

    In previous works the authors have obtained an effective classification of planar locally minimal binary trees with convex boundaries. The main aim of the present paper is to find more subtle restrictions on the possible structure of such trees in terms of the geometry of the given boundary set. Special attention is given to the case of quasiregular boundaries (that is, boundaries that are sufficiently close to regular ones in a certain sense). In particular, a series of quasiregular boundaries that cannot be spanned by a locally minimal binary tree is constructed.

  12. Hyper-parameter tuning of a decision tree induction algorithm

    NARCIS (Netherlands)

    Mantovani, R.G.; Horváth, T.; Cerri, R.; Vanschoren, J.; de Carvalho, A.C.P.L.F.

    2017-01-01

    Supervised classification is the most studied task in Machine Learning. Among the many algorithms used in such task, Decision Tree algorithms are a popular choice, since they are robust and efficient to construct. Moreover, they have the advantage of producing comprehensible models and satisfactory

  13. Application of classification trees for the qualitative differentiation of focal liver lesions suspicious for metastasis in gadolinium-EOB-DTPA-enhanced liver MR imaging

    Energy Technology Data Exchange (ETDEWEB)

    Schelhorn, J. [Sophien und Hufeland Klinikum, Weimar (Germany). Dept. of Radiology and Nuclear Medicine; Benndorf, M.; Dietzel, M.; Burmeister, H.P.; Kaiser, W.A.; Baltzer, P.A.T. [Jena Univ. (Germany). Inst. of Diagnostic and Interventional Radiology

    2012-09-15

    Purpose: To evaluate the diagnostic accuracy of qualitative descriptors alone and in combination for the classification of focal liver lesions (FLLs) suspicious for metastasis in gadolinium-EOB-DTPA-enhanced liver MR imaging. Materials and Methods: Consecutive patients with clinically suspected liver metastases were eligible for this retrospective investigation. 50 patients met the inclusion criteria. All underwent Gd-EOB-DTPA-enhanced liver MRI (T2w, chemical shift T1w, dynamic T1w). Primary liver malignancies or treated lesions were excluded. All investigations were read by two blinded observers (O1, O2). Both independently identified the presence of lesions and evaluated predefined qualitative lesion descriptors (signal intensities, enhancement pattern and morphology). A reference standard was determined under consideration of all clinical and follow-up information. Statistical analysis besides contingency tables (chi square, kappa statistics) included descriptor combinations using classification trees (CHAID methodology) as well as ROC analysis. Results: In 38 patients, 120 FLLs (52 benign, 68 malignant) were present. 115 (48 benign, 67 malignant) were identified by the observers. The enhancement pattern, relative SI upon T2w and late enhanced T1w images contributed significantly to the differentiation of FLLs. The overall classification accuracy was 91.3 % (O1) and 88.7 % (O2), kappa = 0.902. Conclusion: The combination of qualitative lesion descriptors proposed in this work revealed high diagnostic accuracy and interobserver agreement in the differentiation of focal liver lesions suspicious for metastases using Gd-EOB-DTPA-enhanced liver MRI. (orig.)
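
    The interobserver agreement statistic reported in this abstract (kappa) can be illustrated with a short sketch. The function below computes Cohen's kappa for two raters in plain Python; the ratings are hypothetical toy values, not data from the study.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(ratings_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of ten lesions by two observers (m = malignant, b = benign).
observer_1 = ["m"] * 5 + ["b"] * 5
observer_2 = ["m"] * 4 + ["b"] * 6
kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 3))
```

    In this toy case the observers agree on 9 of 10 lesions while chance agreement is 0.5, so kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8.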

  14. A survey of decision tree classifier methodology

    Science.gov (United States)

    Safavian, S. R.; Landgrebe, David

    1991-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-state classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  15. Large variations in diurnal and seasonal patterns of sap flux among Aleppo pine trees in semi-arid forest reflect tree-scale hydraulic adjustments

    Science.gov (United States)

    Preisler, Yakir; Tatarinov, Fyodor; Rohatyn, Shani; Rotenberg, Eyal; Grünzweig, José M.; Klein, Tamir; Yakir, Dan

    2015-04-01

    Adjustments and adaptations of trees to drought vary across different biomes, species and habitats, with important implications for tree mortality and forest dieback associated with global climate change. The aim of this study was to investigate possible links between the patterns of variations in water flux dynamics and drought resistance in Aleppo pine (Pinus halepensis) trees in a semi-arid stand (Yatir forest, Israel). We measured sap flow (SF) and variations in stem diameter, complemented with short-term campaigns of leaf-scale measurements of water vapour and CO2 gas exchange, branch water potential and hydraulic conductivity, as well as eddy flux measurements of evapotranspiration (ET) from a permanent flux tower at the site. SF rates were well synchronized with ET, reaching maximum rates during midday in all trees during the rainy season (Dec-Apr). However, during the dry season (May-Nov), the daily trend in the rates of SF varied greatly among trees, allowing classification into three tree classes: 1) trees with SF maximum rate constantly occurring in mid-day (12:00-13:00); 2) trees showing a shift to an early morning SF peak (04:00-06:00); and 3) trees shifting their daily SF peak to the evening (16:00-18:00). This classification did not change during the four-year study period, between 2010 and 2014. Checking for correlation of tree parameters such as DBH, tree height, crown size, and competition indices with rates of SF indicated that timing of maximum SF in summer was mainly related to tree size (DBH), with large trees tending to have a later SF maximum. Dendrometer measurements indicated that large trees (high DBH) had maximum daily diameter in the morning during summer and winter, while small trees typically had maximum daily diameter during midday and afternoon in winter and summer, respectively. Leaf-scale transpiration (T) measurements showed a typical morning peak in all trees, and another peak in the afternoon in large trees only. Different diurnal

  16. Evaluation of forest cover estimates for Haiti using supervised classification of Landsat data

    Science.gov (United States)

    Churches, Christopher E.; Wampler, Peter J.; Sun, Wanxiao; Smith, Andrew J.

    2014-08-01

    This study uses 2010-2011 Landsat Thematic Mapper (TM) imagery to estimate total forested area in Haiti. The thematic map was generated using radiometric normalization of digital numbers by a modified normalization method utilizing pseudo-invariant polygons (PIPs), followed by supervised classification of the mosaicked image using the Food and Agriculture Organization (FAO) of the United Nations Land Cover Classification System. Classification results were compared to other sources of land-cover data produced for similar years, with an emphasis on the statistics presented by the FAO. Three global land cover datasets (GLC2000, Globcover 2009, and MODIS MCD12Q1) and a national-scale dataset (a land cover analysis by the Haitian National Centre for Geospatial Information (CNIGS)) were reclassified and compared. According to our classification, approximately 32.3% of Haiti's total land area was tree covered in 2010-2011. This result was confirmed using an error-adjusted area estimator, which predicted a tree covered area of 32.4%. Standardization to the FAO's forest cover class definition reduces the amount of tree cover of our supervised classification to 29.4%. This result was greater than the reported FAO value of 4% and the value for the recoded GLC2000 dataset of 7.0%, but is comparable to values for three other recoded datasets: MCD12Q1 (21.1%), Globcover (2009) (26.9%), and CNIGS (19.5%). We propose that at coarse resolutions, the segmented and patchy nature of Haiti's forests resulted in a systematic underestimation of the extent of forest cover. It appears the best explanation for the significant difference between our results, FAO statistics, and compared datasets is the accuracy of the data sources and the resolution of the imagery used for land cover analyses. Analysis of recoded global datasets and results from this study suggest a strong linear relationship (R² = 0.996 for tree cover) between spatial resolution and land cover estimates.
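
    The abstract does not say which error-adjusted area estimator was used. As an illustration only, the sketch below implements one common stratified form, in which each map class's area share is redistributed according to the reference labels found in an accuracy-assessment sample. The function name, the class weights and the sample counts are hypothetical.

```python
def error_adjusted_proportions(map_weights, confusion):
    """Estimate reference-class area proportions from a stratified accuracy sample.

    map_weights[i]  : fraction of the mapped area assigned to map class i
    confusion[i][j] : sample units mapped as class i whose reference label is j
    """
    n_classes = len(confusion[0])
    estimate = [0.0] * n_classes
    for weight, row in zip(map_weights, confusion):
        row_total = sum(row)
        if row_total == 0:
            raise ValueError("every map class needs at least one sample unit")
        # Spread this map class's area over the reference classes seen in its sample.
        for j, count in enumerate(row):
            estimate[j] += weight * count / row_total
    return estimate


# Hypothetical two-class example: index 0 = tree cover, index 1 = other.
map_weights = [0.30, 0.70]          # the map labels 30% of the area as tree cover
confusion = [[45, 5], [4, 96]]      # rows: map class, columns: reference class
adjusted = error_adjusted_proportions(map_weights, confusion)
print([round(p, 3) for p in adjusted])
```

    With these toy numbers the adjusted tree-cover share is 0.3 * 45/50 + 0.7 * 4/100 = 0.298, slightly below the mapped 0.30 because commission errors outweigh omission errors.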

  17. Voice based gender classification using machine learning

    Science.gov (United States)

    Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

    2017-11-01

    Gender identification is one of the major problems in speech analysis today. It involves tracing gender from acoustic data, i.e., pitch, median, frequency, etc. Machine learning gives promising results for classification problems in all research domains. There are several performance metrics for evaluating the algorithms of an area. We present a comparative model for evaluating five different machine learning algorithms on eight different metrics in gender classification from acoustic data. The aim is to identify gender with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM), on the basis of eight different metrics. The main parameter in evaluating any algorithm is its performance. The misclassification rate must be low in classification problems, which means that the accuracy rate must be high. The location and gender of a person have become crucial in economic markets in the form of AdSense. With this comparative model, we try to assess the different ML algorithms and find the best fit for gender classification of acoustic data.

  18. Tree felling 2014

    CERN Multimedia

    2014-01-01

    With a view to creating new landscapes and making its population of trees safer and healthier, this winter CERN will complete the tree-felling campaign started in 2010. Tree felling will take place between 15 and 22 November on the Swiss part of the Meyrin site. This work is being carried out above all for safety reasons. The trees to be cut down are at risk of falling as they are too old and too tall to withstand the wind. In addition, the roots of poplar trees are very powerful and spread widely, potentially damaging underground networks, pavements and roadways. Compensatory tree planting campaigns will take place in the future, subject to the availability of funding, with the aim of creating coherent landscapes while also respecting the functional constraints of the site. These matters are being considered in close collaboration with the Geneva nature and countryside directorate (Direction générale de la nature et du paysage, DGNP). GS-SE Group

  19. Prediction of radiation levels in residences: A methodological comparison of CART [Classification and Regression Tree Analysis] and conventional regression

    International Nuclear Information System (INIS)

    Janssen, I.; Stebbings, J.H.

    1990-01-01

    In environmental epidemiology, trace and toxic substance concentrations frequently have very highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three independent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and ∼200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so that transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regressions seem to lead to results similar but more interpretable by most audiences. 12 refs., 8 figs., 10 tabs
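
    The claim that the tree-structured approach is independent of the scale of the independent variables can be checked with a small sketch. The code below finds the single best split of a regression tree (a stump) by minimizing the within-group squared error, then repeats the search on a log-transformed predictor. It is a plain-Python illustration with hypothetical data, not the CART software used in the study.

```python
import math


def best_split(x, y):
    """Return (sse, left_indices) for the one split that minimizes squared error."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_sse, best_left = float("inf"), None
    for cut in range(1, len(order)):
        if x[order[cut - 1]] == x[order[cut]]:
            continue  # no threshold can separate tied predictor values
        left, right = order[:cut], order[cut:]
        sse = 0.0
        for group in (left, right):
            mean = sum(y[i] for i in group) / len(group)
            sse += sum((y[i] - mean) ** 2 for i in group)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_sse, best_left


# Hypothetical, highly skewed predictor spanning five orders of magnitude.
x = [1, 10, 100, 1000, 10000, 100000]
y = [2.0, 2.1, 1.9, 8.0, 8.2, 7.9]
log_x = [math.log10(v) for v in x]

sse_raw, left_raw = best_split(x, y)
sse_log, left_log = best_split(log_x, y)
print(sorted(left_raw), sorted(left_log))
```

    Because a split depends only on the ordering of the predictor values, any strictly increasing transformation yields the same partition of the cases, which is why no transformation is needed before fitting.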

  20. Ebolavirus Classification Based on Natural Vectors

    Science.gov (United States)

    Zheng, Hui; Yin, Changchuan; Hoang, Tung; He, Rong Lucy; Yang, Jie

    2015-01-01

    According to the WHO, ebolaviruses have resulted in 8818 human deaths in West Africa as of January 2015. To better understand the evolutionary relationship of the ebolaviruses and infer virulence from the relationship, we applied the alignment-free natural vector method to classify the newest ebolaviruses. The dataset includes three new Guinea viruses as well as 99 viruses from Sierra Leone. For the viruses of the family of Filoviridae, both genus label classification and species label classification achieve an accuracy rate of 100%. We represented the relationships among Filoviridae viruses by Unweighted Pair Group Method with Arithmetic Mean (UPGMA) phylogenetic trees and found that the filoviruses can be separated well by three genera. We performed the phylogenetic analysis on the relationship among different species of Ebolavirus by their coding-complete genomes and seven viral protein genes (glycoprotein [GP], nucleoprotein [NP], VP24, VP30, VP35, VP40, and RNA polymerase [L]). The topology of the phylogenetic tree by the viral protein VP24 shows consistency with the variations of virulence of ebolaviruses. The result suggests that VP24 be a pharmaceutical target for treating or preventing ebolaviruses. PMID:25803489
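
    The abstract does not give the formula for the natural vector. The sketch below follows the definition commonly used in the alignment-free literature, which is an assumption here: for each nucleotide, its count, its mean position, and its normalized second central moment of position, giving a 12-dimensional vector per sequence. The example sequence is a toy string, not viral data.

```python
def natural_vector(seq, alphabet="ACGT"):
    """12-dimensional natural vector: counts, mean positions, normalized 2nd moments."""
    n = len(seq)
    counts, means, moments = [], [], []
    for letter in alphabet:
        # 1-based positions at which this nucleotide occurs.
        positions = [i + 1 for i, ch in enumerate(seq) if ch == letter]
        n_k = len(positions)
        mu_k = sum(positions) / n_k if n_k else 0.0
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n) if n_k else 0.0
        counts.append(n_k)
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def euclidean(u, v):
    """Distance between two natural vectors, usable for clustering such as UPGMA."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


vec = natural_vector("ACGTA")
print(vec)
```

    Sequences of different lengths map to vectors of the same dimension, so pairwise distances can be computed without any alignment.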

  1. Rate of tree carbon accumulation increases continuously with tree size.

    Science.gov (United States)

    Stephenson, N L; Das, A J; Condit, R; Russo, S E; Baker, P J; Beckman, N G; Coomes, D A; Lines, E R; Morris, W K; Rüger, N; Alvarez, E; Blundo, C; Bunyavejchewin, S; Chuyong, G; Davies, S J; Duque, A; Ewango, C N; Flores, O; Franklin, J F; Grau, H R; Hao, Z; Harmon, M E; Hubbell, S P; Kenfack, D; Lin, Y; Makana, J-R; Malizia, A; Malizia, L R; Pabst, R J; Pongpattananurak, N; Su, S-H; Sun, I-F; Tan, S; Thomas, D; van Mantgem, P J; Wang, X; Wiser, S K; Zavala, M A

    2014-03-06

    Forests are major components of the global carbon cycle, providing substantial feedback to atmospheric greenhouse gas concentrations. Our ability to understand and predict changes in the forest carbon cycle--particularly net primary productivity and carbon storage--increasingly relies on models that represent biological processes across several scales of biological organization, from tree leaves to forest stands. Yet, despite advances in our understanding of productivity at the scales of leaves and stands, no consensus exists about the nature of productivity at the scale of the individual tree, in part because we lack a broad empirical assessment of whether rates of absolute tree mass growth (and thus carbon accumulation) decrease, remain constant, or increase as trees increase in size and age. Here we present a global analysis of 403 tropical and temperate tree species, showing that for most species mass growth rate increases continuously with tree size. Thus, large, old trees do not act simply as senescent carbon reservoirs but actively fix large amounts of carbon compared to smaller trees; at the extreme, a single big tree can add the same amount of carbon to the forest within a year as is contained in an entire mid-sized tree. The apparent paradoxes of individual tree growth increasing with tree size despite declining leaf-level and stand-level productivity can be explained, respectively, by increases in a tree's total leaf area that outpace declines in productivity per unit of leaf area and, among other factors, age-related reductions in population density. Our results resolve conflicting assumptions about the nature of tree growth, inform efforts to understand and model forest carbon dynamics, and have additional implications for theories of resource allocation and plant senescence.

  2. Statistical Sensitive Data Protection and Inference Prevention with Decision Tree Methods

    National Research Council Canada - National Science Library

    Chang, LiWu

    2003-01-01

    .... We consider inference as correct classification and approach it with decision tree methods. As in our previous work, sensitive data are viewed as classes of those test data and non-sensitive data are the rest attribute values...

  3. Decision Tree Repository and Rule Set Based Mingjiang River Estuarine Wetlands Classifaction

    Science.gov (United States)

    Zhang, W.; Li, X.; Xiao, W.

    2018-05-01

    Increasing urbanization and industrialization have led to wetland losses in the estuarine area of the Mingjiang River over the past three decades. Increasing attention has been given to producing wetland inventories using remote sensing and GIS technology. Because training sites and training samples are inconsistent, traditional pixel-based image classification methods cannot achieve comparable results across different organizations. Meanwhile, the object-oriented image classification technique shows great potential to solve this problem, and Landsat moderate-resolution remote sensing images are widely used to fulfill this requirement. Firstly, standardized atmospheric correction and spectrally high-fidelity texture feature enhancement were conducted before implementing the object-oriented wetland classification method in eCognition. Secondly, we performed the multi-scale segmentation procedure, taking the scale, hue, shape, compactness and smoothness of the image into account to get the appropriate parameters; using the top and down region merge algorithm from the single-pixel level, the optimal texture segmentation scale for different types of features is confirmed. Then, the segmented object is used as the classification unit to calculate spectral information such as the mean value, maximum value, minimum value, brightness value and normalized value. Spatial features of the image object (area, length, tightness and the shape rule) and texture features (mean, variance and entropy) are used as classification features of the training samples. Based on the reference images and the sampling points of the on-the-spot investigation, typical training samples are selected uniformly and randomly for each type of ground object. The ranges of values of the spectral, texture and spatial characteristics of each type of feature in each feature layer are used to create the decision tree repository. Finally, with the help of high resolution reference images, the

  4. Applying Topographic Classification, Based on the Hydrological Process, to Design Habitat Linkages for Climate Change

    Directory of Open Access Journals (Sweden)

    Yongwon Mo

    2017-11-01

    The use of biodiversity surrogates has been discussed in the context of designing habitat linkages to support the migration of species affected by climate change. Topography has been proposed as a useful surrogate in the coarse-filter approach, as the hydrological process caused by topography, such as erosion and accumulation, is the basis of ecological processes. However, studies that have designed topographic linkages as habitat linkages have so far focused mainly on the shape of the topography (morphometric topographic classification), with little emphasis on the hydrological processes (generic topographic classification), to find such topographic linkages. We aimed to understand whether generic classification was valid for designing these linkages. First, we evaluated which topographic classification is more appropriate for describing actual (coniferous and deciduous) and potential (mammals and amphibians) habitat distributions. Second, we analyzed the difference in the linkages between the morphometric and generic topographic classifications. The results showed that the generic classification represented the actual distribution of the trees, but neither the morphometric nor the generic classification could represent the potential animal distributions adequately. Our study demonstrated that the topographic classes, according to the generic classification, were arranged successively according to the flow of water, nutrients, and sediment; therefore, it would be advantageous to secure linkages with a width of 1 km or more. In addition, the edge effect would be smaller than with the morphometric classification. Accordingly, we suggest that topographic characteristics, based on the hydrological process, are required to design topographic linkages for climate change.

  5. Three-dimensional object recognition using similar triangles and decision trees

    Science.gov (United States)

    Spirkovska, Lilly

    1993-01-01

    A system, TRIDEC, that is capable of distinguishing between a set of objects despite changes in the objects' positions in the input field, their size, or their rotational orientation in 3D space is described. TRIDEC combines very simple yet effective features with the classification capabilities of inductive decision tree methods. The feature vector is a list of all similar triangles defined by connecting all combinations of three pixels in a coarse coded 127 x 127 pixel input field. The classification is accomplished by building a decision tree using the information provided from a limited number of translated, scaled, and rotated samples. Simulation results are presented which show that TRIDEC achieves 94 percent recognition accuracy in the 2D invariant object recognition domain and 98 percent recognition accuracy in the 3D invariant object recognition domain after training on only a small sample of transformed views of the objects.
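
    Why similar triangles make a useful invariant feature can be shown with a short sketch. The sorted interior angles of a triangle formed by three pixels do not change when the points are translated, uniformly scaled, or rotated in the image plane. The code below is an illustration of that property only; it is not the TRIDEC feature extractor, and the point coordinates and the `transform` helper are hypothetical.

```python
import math


def triangle_angles(p, q, r):
    """Sorted interior angles (radians) of triangle pqr, a similarity invariant."""
    a = math.dist(q, r)  # side opposite p
    b = math.dist(p, r)  # side opposite q
    c = math.dist(p, q)  # side opposite r
    if min(a, b, c) == 0:
        raise ValueError("the three points must be distinct")

    def angle(opposite, s1, s2):
        # Law of cosines, clamped to guard against rounding just outside [-1, 1].
        cos_val = (s1 * s1 + s2 * s2 - opposite * opposite) / (2 * s1 * s2)
        return math.acos(max(-1.0, min(1.0, cos_val)))

    return sorted([angle(a, b, c), angle(b, a, c), angle(c, a, b)])


def transform(point, scale, theta, shift):
    """Rotate by theta, scale uniformly, then translate a 2D point."""
    x, y = point
    xr = x * math.cos(theta) - y * math.sin(theta)
    yr = x * math.sin(theta) + y * math.cos(theta)
    return (scale * xr + shift[0], scale * yr + shift[1])


original = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
moved = [transform(pt, scale=2.5, theta=0.7, shift=(10.0, -3.0)) for pt in original]

angles_original = triangle_angles(*original)
angles_moved = triangle_angles(*moved)
print([round(v, 4) for v in angles_original])
print([round(v, 4) for v in angles_moved])
```

    Two triangles are similar exactly when their sorted angle triples match, so binning such triples gives a feature list that is unaffected by these transformations.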

  6. City housing atmospheric pollutant impact on emergency visit for asthma: A classification and regression tree approach.

    Science.gov (United States)

    Mazenq, Julie; Dubus, Jean-Christophe; Gaudart, Jean; Charpin, Denis; Viudes, Gilles; Noel, Guilhem

    2017-11-01

    Particulate matter, nitrogen dioxide (NO2) and ozone are recognized as the three pollutants that most significantly affect human health. Asthma is a multifactorial disease. However, the place of residence has rarely been investigated. We compared the impact of air pollution, measured near patients' homes, on emergency department (ED) visits for asthma or trauma (controls) within the Provence-Alpes-Côte-d'Azur region. Variables were selected using classification and regression trees on asthmatic and control population, 3-99 years, visiting ED from January 1 to December 31, 2013. Then in a nested case control study, randomization was based on the day of ED visit and on defined age groups. Pollution, meteorological, pollens and viral data measured that day were linked to the patient's ZIP code. A total of 794,884 visits were reported including 6250 for asthma and 278,192 for trauma. Factors associated with an excess risk of emergency visit for asthma included short-term exposure to NO2, female gender, high viral load and a combination of low temperature and high humidity. Short-term exposures to high NO2 concentrations, as assessed close to the homes of the patients, were significantly associated with asthma-related ED visits in children and adults. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Classification trees for identifying non-use of community-based long-term care services among older adults.

    Science.gov (United States)

    Penkunas, Michael James; Eom, Kirsten Yuna; Chan, Angelique Wei-Ming

    2017-10-01

    Home- and center-based long-term care (LTC) services allow older adults to remain in the community while simultaneously helping caregivers cope with the stresses associated with providing care. Despite these benefits, the uptake of community-based LTC services among older adults remains low. We analyzed data from a longitudinal study in Singapore to identify the characteristics of individuals with referrals to home-based LTC services or day rehabilitation services at the time of hospital discharge. Classification and regression tree analysis was employed to identify combinations of clinical and sociodemographic characteristics of patients and their caregivers for individuals who did not take up their referred services. Patients' level of limitation in activities of daily living (ADL) and caregivers' ethnicity and educational level were the most distinguishing characteristics for identifying older adults who failed to take up their referred home-based services. For day rehabilitation services, patients' level of ADL limitation, home size, age, and possession of a national medical savings account, as well as caregivers' education level and gender, were significant factors influencing service uptake. Identifying subgroups of patients with high rates of non-use can help clinicians target individuals who are in need of community-based LTC services but unlikely to engage in formal treatment. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Is overall similarity classification less effortful than single-dimension classification?

    Science.gov (United States)

    Wills, Andy J; Milton, Fraser; Longmore, Christopher A; Hester, Sarah; Robinson, Jo

    2013-01-01

    It is sometimes argued that the implementation of an overall similarity classification is less effortful than the implementation of a single-dimension classification. In the current article, we argue that the evidence securely in support of this view is limited, and report additional evidence in support of the opposite proposition--overall similarity classification is more effortful than single-dimension classification. Using a match-to-standards procedure, Experiments 1A, 1B and 2 demonstrate that concurrent load reduces the prevalence of overall similarity classification, and that this effect is robust to changes in the concurrent load task employed, the level of time pressure experienced, and the short-term memory requirements of the classification task. Experiment 3 demonstrates that participants who produced overall similarity classifications from the outset have larger working memory capacities than those who produced single-dimension classifications initially, and Experiment 4 demonstrates that instructions to respond meticulously increase the prevalence of overall similarity classification.

  9. Identifying Different Transportation Modes from Trajectory Data Using Tree-Based Ensemble Classifiers

    Directory of Open Access Journals (Sweden)

    Zhibin Xiao

    2017-02-01

    Recognition of transportation modes can be used in different applications including human behavior research, transport management and traffic control. Previous work on transportation mode recognition has often relied on using multiple sensors or matching Geographic Information System (GIS) information, which is not possible in many cases. In this paper, an approach based on ensemble learning is proposed to infer hybrid transportation modes using only Global Positioning System (GPS) data. First, in order to distinguish between different transportation modes, we used a statistical method to generate global features and extract several local features from sub-trajectories after trajectory segmentation, before these features were combined in the classification stage. Second, to obtain a better performance, we used tree-based ensemble models (Random Forest, Gradient Boosting Decision Tree, and XGBoost) instead of traditional methods (K-Nearest Neighbor, Decision Tree, and Support Vector Machines) to classify the different transportation modes. The experiment results on the latter have shown the efficacy of our proposed approach. Among them, the XGBoost model produced the best performance with a classification accuracy of 90.77% obtained on the GEOLIFE dataset, and we used a tree-based ensemble method to ensure accurate feature selection to reduce the model complexity.
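
    The abstract does not list the exact features computed from the GPS data. As a hedged illustration of the kind of global feature that can be derived from raw fixes alone, the sketch below computes great-circle distances with the haversine formula and summarizes point-to-point speeds. The track, the field layout and the feature names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_features(track):
    """Summarize speeds (m/s) along a track of (timestamp_s, lat, lon) fixes."""
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt > 0:  # skip duplicate or out-of-order timestamps
            speeds.append(haversine_m(la0, lo0, la1, lo1) / dt)
    if not speeds:
        raise ValueError("track needs at least two fixes with increasing timestamps")
    return {"mean_speed": sum(speeds) / len(speeds), "max_speed": max(speeds)}


# Hypothetical track: three fixes on the equator, 0.001 degrees apart, 10 s apart.
track = [(0, 0.0, 0.000), (10, 0.0, 0.001), (20, 0.0, 0.002)]
features = speed_features(track)
print({name: round(value, 2) for name, value in features.items()})
```

    Features of this kind, computed per trajectory or per sub-trajectory, are what a tree-based ensemble would then take as input.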

  10. A DIMENSION REDUCTION-BASED METHOD FOR CLASSIFICATION OF HYPERSPECTRAL AND LIDAR DATA

    Directory of Open Access Journals (Sweden)

    B. Abbasi

    2015-12-01

    The existence of various natural objects such as grass, trees, and rivers, along with artificial man-made features such as buildings and roads, makes it difficult to classify ground objects. Consequently, using a single data source or a simple classification approach cannot improve classification results in object identification. Also, using a variety of data from different sensors increases the accuracy of spatial and spectral information. In this paper, we propose a classification algorithm for the joint use of hyperspectral and Lidar (Light Detection and Ranging) data based on dimension reduction. First, some feature extraction techniques are applied to obtain more information from the Lidar and hyperspectral data. Principal Component Analysis (PCA) and Minimum Noise Fraction (MNF) have also been utilized to reduce the dimension of the spectral features. The 30 features containing the most information of the hyperspectral images are considered for both PCA and MNF. In addition, the Normalized Difference Vegetation Index (NDVI) has been measured to highlight the vegetation. Furthermore, the features extracted from the Lidar data are calculated based on the relation between every pixel of the data and the surrounding pixels in local neighbourhood windows. The extracted features are based on the Grey Level Co-occurrence Matrix (GLCM). In the second step, classification is performed on all features obtained by MNF, PCA, NDVI and GLCM, trained with class samples. After this step, two classification maps are obtained by an SVM classifier with MNF+NDVI+GLCM features and PCA+NDVI+GLCM features, respectively. Finally, the classified images are fused together to create the final classification map by a decision fusion strategy based on majority voting.
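
    Two of the building blocks named in this abstract are simple enough to sketch directly: NDVI, computed per pixel as (NIR - Red) / (NIR + Red), and per-pixel majority voting over several classification maps. The code is a plain-Python illustration with hypothetical values. The example fuses three maps so that a majority always exists, whereas the paper fuses two; how it resolves ties is not stated in the abstract.

```python
from collections import Counter


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def majority_vote(label_maps):
    """Fuse per-pixel labels from several classification maps (flattened lists).

    Ties go to the label that appears first among the maps, an assumption
    made for this sketch.
    """
    fused = []
    for labels in zip(*label_maps):
        fused.append(Counter(labels).most_common(1)[0][0])
    return fused


# Hypothetical reflectances: vegetation reflects strongly in the near infrared.
vegetation_index = ndvi(nir=0.5, red=0.1)

# Hypothetical flattened classification maps for three pixels.
map_a = ["tree", "road", "grass"]
map_b = ["tree", "building", "grass"]
map_c = ["grass", "road", "grass"]
fused_map = majority_vote([map_a, map_b, map_c])
print(round(vegetation_index, 3), fused_map)
```

    NDVI lies between -1 and 1, and values close to 1 indicate dense green vegetation, which is why it is used to highlight vegetated pixels.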

  11. AN ADABOOST OPTIMIZED CCFIS BASED CLASSIFICATION MODEL FOR BREAST CANCER DETECTION

    Directory of Open Access Journals (Sweden)

    CHANDRASEKAR RAVI

    2017-06-01

    Classification is a data mining technique used for building a prototype of the data behaviour, with which unseen data can be classified into one of the defined classes. Several researchers have proposed classification techniques, but most of them did not place much emphasis on misclassified instances and storage space. In this paper, a classification model is proposed that takes into account the misclassified instances and storage space. The classification model is efficiently developed using a tree structure to reduce the storage complexity and uses a single scan of the dataset. During the training phase, Class-based Closed Frequent ItemSets (CCFIS) were mined from the training dataset in the form of a tree structure. The classification model has been developed using the CCFIS and a similarity measure based on Longest Common Subsequence (LCS). Further, the Particle Swarm Optimization algorithm is applied to the generated CCFIS, which assigns weights to the itemsets and their associated classes. Most classifiers correctly classify the common instances but misclassify the rare instances. In view of that, the AdaBoost algorithm has been used to boost the weights of the instances misclassified in the previous round so as to include them in the training phase to classify the rare instances. This improves the accuracy of the classification model. During the testing phase, the classification model is used to classify the instances of the test dataset. The Breast Cancer dataset from the UCI repository is used for the experiment. Experimental analysis shows that the accuracy of the proposed classification model outperforms the PSOAdaBoost-Sequence classifier by 7% and is superior to other approaches such as the Naïve Bayes Classifier, Support Vector Machine Classifier, Instance Based Classifier, ID3 Classifier, J48 Classifier, etc.
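
    The re-weighting idea described in this abstract (raising the weights of instances misclassified in the previous round) can be sketched as follows. This is a minimal illustration of the textbook discrete AdaBoost weight update, not the authors' CCFIS-based model; the function name `adaboost_reweight` and the toy inputs are hypothetical.

```python
import math


def adaboost_reweight(weights, misclassified):
    """One round of discrete AdaBoost re-weighting.

    weights:       current instance weights, assumed to sum to 1
    misclassified: list of bools, True where the weak learner was wrong
    Returns (alpha, new_weights), where alpha is the learner's vote weight.
    """
    # Weighted error of the current weak learner.
    err = sum(w for w, m in zip(weights, misclassified) if m)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Increase weights of misclassified instances, decrease the others.
    updated = [
        w * math.exp(alpha if m else -alpha)
        for w, m in zip(weights, misclassified)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]
```

    After normalization the misclassified instances carry half of the total weight, which forces the next weak learner to attend to them.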

  12. Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    Science.gov (United States)

    Kim, Jong Kyu; Kim, Nam Soo

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, a decision tree classifier is adopted with the closed-loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve much better mode selection accuracy than the open-loop mode selection module in the AMR-WB+.

  13. Rate of tree carbon accumulation increases continuously with tree size

    Science.gov (United States)

    Stephenson, N.L.; Das, A.J.; Condit, R.; Russo, S.E.; Baker, P.J.; Beckman, N.G.; Coomes, D.A.; Lines, E.R.; Morris, W.K.; Rüger, N.; Álvarez, E.; Blundo, C.; Bunyavejchewin, S.; Chuyong, G.; Davies, S.J.; Duque, Á.; Ewango, C.N.; Flores, O.; Franklin, J.F.; Grau, H.R.; Hao, Z.; Harmon, M.E.; Hubbell, S.P.; Kenfack, D.; Lin, Y.; Makana, J.-R.; Malizia, A.; Malizia, L.R.; Pabst, R.J.; Pongpattananurak, N.; Su, S.-H.; Sun, I-F.; Tan, S.; Thomas, D.; van Mantgem, P.J.; Wang, X.; Wiser, S.K.; Zavala, M.A.

    2014-01-01

    Forests are major components of the global carbon cycle, providing substantial feedback to atmospheric greenhouse gas concentrations. Our ability to understand and predict changes in the forest carbon cycle—particularly net primary productivity and carbon storage - increasingly relies on models that represent biological processes across several scales of biological organization, from tree leaves to forest stands. Yet, despite advances in our understanding of productivity at the scales of leaves and stands, no consensus exists about the nature of productivity at the scale of the individual tree, in part because we lack a broad empirical assessment of whether rates of absolute tree mass growth (and thus carbon accumulation) decrease, remain constant, or increase as trees increase in size and age. Here we present a global analysis of 403 tropical and temperate tree species, showing that for most species mass growth rate increases continuously with tree size. Thus, large, old trees do not act simply as senescent carbon reservoirs but actively fix large amounts of carbon compared to smaller trees; at the extreme, a single big tree can add the same amount of carbon to the forest within a year as is contained in an entire mid-sized tree. The apparent paradoxes of individual tree growth increasing with tree size despite declining leaf-level and stand-level productivity can be explained, respectively, by increases in a tree’s total leaf area that outpace declines in productivity per unit of leaf area and, among other factors, age-related reductions in population density. Our results resolve conflicting assumptions about the nature of tree growth, inform efforts to understand and model forest carbon dynamics, and have additional implications for theories of resource allocation and plant senescence.

  14. Treelink: data integration, clustering and visualization of phylogenetic trees.

    Science.gov (United States)

    Allende, Christian; Sohn, Erik; Little, Cedric

    2015-12-29

    Phylogenetic trees are central to a wide range of biological studies. In many of these studies, tree nodes need to be associated with a variety of attributes. For example, in studies concerned with viral relationships, tree nodes are associated with epidemiological information, such as location, age and subtype. Gene trees used in comparative genomics are usually linked with taxonomic information, such as functional annotations and events. A wide variety of tree visualization and annotation tools have been developed in the past; however, none of them are intended for an integrative and comparative analysis. Treelink is platform-independent software for linking datasets and sequence files to phylogenetic trees. The application allows an automated integration of datasets to trees for operations such as classifying a tree based on a field or showing the distribution of selected data attributes in branches and leaves. Genomic and proteomic sequences can also be linked to the tree and extracted from internal and external nodes. A novel clustering algorithm to simplify trees and display the most divergent clades was also developed, where validation can be achieved using the data integration and classification function. Integrated geographical information allows ancestral character reconstruction for phylogeographic plotting based on parsimony and likelihood algorithms. Our software can successfully integrate phylogenetic trees with different data sources, and perform operations to differentiate and visualize those differences within a tree. File support includes the most popular formats such as newick and csv. Exporting visualizations as images, cluster outputs and genomic sequences is supported. Treelink is available as a web and desktop application at http://www.treelinkapp.com .

  15. Application of classification trees for the qualitative differentiation of focal liver lesions suspicious for metastasis in gadolinium-EOB-DTPA-enhanced liver MR imaging.

    Science.gov (United States)

    Schelhorn, J; Benndorf, M; Dietzel, M; Burmeister, H P; Kaiser, W A; Baltzer, P A T

    2012-09-01

    To evaluate the diagnostic accuracy of qualitative descriptors alone and in combination for the classification of focal liver lesions (FLLs) suspicious for metastasis in gadolinium-EOB-DTPA-enhanced liver MR imaging. Consecutive patients with clinically suspected liver metastases were eligible for this retrospective investigation. 50 patients met the inclusion criteria. All underwent Gd-EOB-DTPA-enhanced liver MRI (T2w, chemical shift T1w, dynamic T1w). Primary liver malignancies or treated lesions were excluded. All investigations were read by two blinded observers (O1, O2). Both independently identified the presence of lesions and evaluated predefined qualitative lesion descriptors (signal intensities, enhancement pattern and morphology). A reference standard was determined under consideration of all clinical and follow-up information. Statistical analysis besides contingency tables (chi square, kappa statistics) included descriptor combinations using classification trees (CHAID methodology) as well as ROC analysis. In 38 patients, 120 FLLs (52 benign, 68 malignant) were present. 115 (48 benign, 67 malignant) were identified by the observers. The enhancement pattern, relative SI upon T2w and late enhanced T1w images contributed significantly to the differentiation of FLLs. The overall classification accuracy was 91.3 % (O1) and 88.7 % (O2), kappa = 0.902. The combination of qualitative lesion descriptors proposed in this work revealed high diagnostic accuracy and interobserver agreement in the differentiation of focal liver lesions suspicious for metastases using Gd-EOB-DTPA-enhanced liver MRI. © Georg Thieme Verlag KG Stuttgart · New York.

  16. Sampling the quality of hardwood trees

    Science.gov (United States)

    Adrian M. Gilbert

    1959-01-01

    Anyone acquainted with the conversion of hardwood trees into wood products knows that timber has a wide range in quality. Some trees will yield better products than others. So, in addition to rate of growth and size, tree values are affected by the quality of products yielded.

  17. Visualization of Decision Tree State for the Classification of Parkinson's Disease

    NARCIS (Netherlands)

    Valentijn, E

    2016-01-01

    Decision trees have been shown to be effective at classifying subjects with Parkinson’s disease when provided with features (subject scores) derived from FDG-PET data. Such subject scores have strong discriminative power but are not intuitive to understand. We therefore augment each decision node

  18. Transportation Modes Classification Using Sensors on Smartphones

    Directory of Open Access Journals (Sweden)

    Shih-Hau Fang

    2016-08-01

    This paper investigates the classification of transportation and vehicular modes using big data from smartphone sensors. The three types of sensors used in this paper are the accelerometer, magnetometer, and gyroscope. This study proposes improved features and uses three machine learning algorithms (decision trees, K-nearest neighbor, and support vector machine) to classify the user’s transportation and vehicular modes. In the experiments, we discuss and compare the performance from different perspectives, including the accuracy for both modes, the execution time, and the model size. Results show that the proposed features enhance the accuracy; the support vector machine provides the best classification accuracy but requires the longest prediction time. This paper also investigates the vehicle classification mode and compares the results with those of the transportation modes.

  19. A Color-Texture-Structure Descriptor for High-Resolution Satellite Image Classification

    Directory of Open Access Journals (Sweden)

    Huai Yu

    2016-03-01

    Scene classification plays an important role in understanding high-resolution satellite (HRS) remotely sensed imagery. For remotely sensed scenes, both color information and texture information provide the discriminative ability in classification tasks. In recent years, substantial performance gains in HRS image classification have been reported in the literature. One branch of research combines multiple complementary features based on various aspects such as texture, color and structure. Two methods are commonly used to combine these features: early fusion and late fusion. In this paper, we propose combining the two methods under a tree of regions and present a new descriptor, which we call the CTS descriptor, to encode color, texture and structure features using a hierarchical structure, the Color Binary Partition Tree (CBPT). Specifically, we first build the hierarchical representation of HRS imagery using the CBPT. Then we quantize the texture and color features of dense regions. Next, we analyze and extract the co-occurrence patterns of regions based on the hierarchical structure. Finally, we encode local descriptors to obtain the final CTS descriptor and test its discriminative capability using object categorization and scene classification with HRS images. The proposed descriptor contains the spectral, textural and structural information of the HRS imagery and is also robust to changes in illuminant color, scale, orientation and contrast. The experimental results demonstrate that the proposed CTS descriptor achieves competitive classification results compared with state-of-the-art algorithms.

  20. Fragmentation of random trees

    International Nuclear Information System (INIS)

    Kalay, Z; Ben-Naim, E

    2015-01-01

    We study fragmentation of a random recursive tree into a forest by repeated removal of nodes. The initial tree consists of $N$ nodes and it is generated by sequential addition of nodes with each new node attaching to a randomly-selected existing node. As nodes are removed from the tree, one at a time, the tree dissolves into an ensemble of separate trees, namely, a forest. We study statistical properties of trees and nodes in this heterogeneous forest, and find that the fraction of remaining nodes $m$ characterizes the system in the limit $N \to \infty$. We obtain analytically the size density $\phi_s$ of trees of size $s$. The size density has a power-law tail $\phi_s \sim s^{-\alpha}$ with exponent $\alpha = 1 + 1/m$. Therefore, the tail becomes steeper as further nodes are removed, and the fragmentation process is unusual in that the exponent $\alpha$ increases continuously with time. We also extend our analysis to the case where nodes are added as well as removed, and obtain the asymptotic size density for growing trees. (paper)
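
    The process described in this abstract can be explored numerically with a small simulation. The sketch below is only an illustration under stated assumptions: nodes are removed uniformly at random (the abstract does not spell out the removal rule), and the function names `random_recursive_tree` and `fragment_sizes` are hypothetical. It builds a random recursive tree and returns the sizes of the trees in the forest that remains after removal.

```python
import random
from collections import Counter


def random_recursive_tree(n, rng):
    """Return a parent list for a random recursive tree on nodes 0..n-1.

    Node 0 is the root (parent -1); each new node v attaches to a
    uniformly chosen existing node, so parent[v] < v always holds.
    """
    parent = [-1] * n
    for v in range(1, n):
        parent[v] = rng.randrange(v)
    return parent


def fragment_sizes(parent, removed):
    """Sizes of the trees left after deleting the nodes in `removed`.

    A surviving node starts a new tree if it is the root or if its
    parent was removed; otherwise it joins its parent's tree.
    """
    root_of = [None] * len(parent)
    sizes = Counter()
    for v in range(len(parent)):  # parents precede children in index order
        if v in removed:
            continue
        p = parent[v]
        root_of[v] = v if (p == -1 or p in removed) else root_of[p]
        sizes[root_of[v]] += 1
    return sorted(sizes.values(), reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    tree = random_recursive_tree(n, rng)
    gone = set(rng.sample(range(n), n // 2))  # remaining fraction m = 0.5
    histogram = Counter(fragment_sizes(tree, gone))
    for s in sorted(histogram)[:10]:
        print(s, histogram[s])
```

    Plotting the resulting size histogram on log-log axes for several remaining fractions $m$ is one way to compare the simulated tail against the exponent quoted in the abstract.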

  1. Black smokers and the Tree of Life

    Science.gov (United States)

    Linich, Michael

    The molecular biology revolution has turned the classification of life on its head. Is Whittaker's five-kingdom scheme for the classification of living things no longer relevant to life science education? Coupled with this is the discovery that most microscopic life cannot yet be brought into culture. One of the key organisms making this knowledge possible is Methanococcus jannaschii, a microorganism found in black smokers. This workshop presents the development of the Universal Tree of Life in a historical context and then links together major concepts in the New South Wales senior science programs of Earth and Environmental Science and Biology by examining the biological and geological aspects of changes to black smokers over geological time.

  2. Object-Based Point Cloud Analysis of Full-Waveform Airborne Laser Scanning Data for Urban Vegetation Classification

    Directory of Open Access Journals (Sweden)

    Norbert Pfeifer

    2008-08-01

    Airborne laser scanning (ALS) is a remote sensing technique well-suited for 3D vegetation mapping and structure characterization because the emitted laser pulses are able to penetrate small gaps in the vegetation canopy. The backscattered echoes from the foliage, woody vegetation, the terrain, and other objects are detected, leading to a cloud of points. Higher echo densities (> 20 echoes/m²) and additional classification variables from full-waveform (FWF) ALS data, namely echo amplitude, echo width and information on multiple echoes from one shot, offer new possibilities in classifying the ALS point cloud. Currently FWF sensor information is hardly used for classification purposes. This contribution presents an object-based point cloud analysis (OBPA) approach, combining segmentation and classification of the 3D FWF ALS points designed to detect tall vegetation in urban environments. The definition of tall vegetation includes trees and shrubs, but excludes grassland and herbage. In the applied procedure FWF ALS echoes are segmented by a seeded region growing procedure. All echoes sorted descending by their surface roughness are used as seed points. Segments are grown based on echo width homogeneity. Next, segment statistics (mean, standard deviation, and coefficient of variation) are calculated by aggregating echo features such as amplitude and surface roughness. For classification a rule base is derived automatically from a training area using a statistical classification tree. To demonstrate our method we present data of three sites with around 500,000 echoes each. The accuracy of the classified vegetation segments is evaluated for two independent validation sites. In a point-wise error assessment, where the classification is compared with manually classified 3D points, completeness and correctness better than 90% are reached for the validation sites. In comparison to many other algorithms the proposed 3D point classification works on the original

  3. CUDT: A CUDA Based Decision Tree Algorithm

    Directory of Open Access Journals (Sweden)

    Win-Tsung Lo

    2014-01-01

    The decision tree is one of the best-known classification methods in data mining. Much research has focused on improving the performance of decision trees. However, those algorithms were developed for and run on traditional distributed systems. The latency of processing the huge volumes of data generated by ubiquitous sensing nodes cannot be improved without the help of new technology. In order to improve data processing latency in large-scale data mining, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), which is a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We have conducted many experiments to evaluate the system performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5∼55 times faster than Weka-j48 and achieves an 18-fold speedup over SPRINT for large data sets.

  4. Voxel-based plaque classification in coronary intravascular optical coherence tomography images using decision trees

    Science.gov (United States)

    Kolluru, Chaitanya; Prabhu, David; Gharaibeh, Yazan; Wu, Hao; Wilson, David L.

    2018-02-01

    Intravascular Optical Coherence Tomography (IVOCT) is a high contrast, 3D microscopic imaging technique that can be used to assess atherosclerosis and guide stent interventions. Despite its advantages, IVOCT image interpretation is challenging and time consuming with over 500 image frames generated in a single pullback volume. We have developed a method to classify voxel plaque types in IVOCT images using machine learning. To train and test the classifier, we have used our unique database of labeled cadaver vessel IVOCT images accurately registered to gold standard cryoimages. This database currently contains 300 images and is growing. Each voxel is labeled as fibrotic, lipid-rich, calcified or other. Optical attenuation, intensity and texture features were extracted for each voxel and were used to build a decision tree classifier for multi-class classification. Five-fold cross-validation across images gave accuracies of 96% ± 0.01%, 90% ± 0.02% and 90% ± 0.01% for fibrotic, lipid-rich and calcified classes respectively. To rectify performance degradation seen in left out vessel specimens as opposed to left out images, we are adding data and reducing features to limit overfitting. Following spatial noise cleaning, important vascular regions were unambiguous in display. We developed displays that enable physicians to make rapid determination of calcified and lipid regions. This will inform treatment decisions such as the need for devices (e.g., atherectomy or scoring balloon in the case of calcifications) or extended stent lengths to ensure coverage of lipid regions prone to injury at the edge of a stent.
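
    The distinction this abstract draws between leaving out images and leaving out vessel specimens comes down to how samples are grouped into folds. The following is a minimal sketch of grouped k-fold splitting, assuming each voxel carries a group identifier (an image or specimen ID). The function name `group_kfold` and the round-robin assignment of groups to folds are illustrative choices, not the authors' procedure.

```python
def group_kfold(groups, k):
    """Yield (train_idx, test_idx) pairs for grouped k-fold validation.

    groups: one group id per sample (for example an image or specimen id)
    k:      number of folds, at most the number of distinct groups
    All samples sharing a group id land in the same fold, so no group
    contributes to both the training and the test side of a split.
    """
    unique = sorted(set(groups))
    if k < 2 or k > len(unique):
        raise ValueError("k must be between 2 and the number of groups")
    # Round-robin assignment of groups to folds (illustrative choice).
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test
```

    Grouping by specimen rather than by image gives a stricter estimate, because voxels from the same vessel never appear on both sides of a split.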

  5. Identification, classification and differential expression of oleosin genes in tung tree (Vernicia fordii).

    Science.gov (United States)

    Cao, Heping; Zhang, Lin; Tan, Xiaofeng; Long, Hongxu; Shockey, Jay M

    2014-01-01

    Triacylglycerols (TAG) are the major molecules of energy storage in eukaryotes. TAG are packed in subcellular structures called oil bodies or lipid droplets. Oleosins (OLE) are the major proteins in plant oil bodies. Multiple isoforms of OLE are present in plants such as tung tree (Vernicia fordii), whose seeds are rich in novel TAG with a wide range of industrial applications. The objectives of this study were to identify OLE genes, classify OLE proteins and analyze OLE gene expression in tung trees. We identified five tung tree OLE genes coding for small hydrophobic proteins. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that the five tung OLE genes represented the five OLE subfamilies and all contained the "proline knot" motif (PX5SPX3P) shared among 65 OLE from 19 tree species, including the sequenced genomes of Prunus persica (peach), Populus trichocarpa (poplar), Ricinus communis (castor bean), Theobroma cacao (cacao) and Vitis vinifera (grapevine). Tung OLE1, OLE2 and OLE3 belong to the S type and OLE4 and OLE5 belong to the SM type of Arabidopsis OLE. TaqMan and SYBR Green qPCR methods were used to study the differential expression of OLE genes in tung tree tissues. Expression results demonstrated that 1) All five OLE genes were expressed in developing tung seeds, leaves and flowers; 2) OLE mRNA levels were much higher in seeds than leaves or flowers; 3) OLE1, OLE2 and OLE3 genes were expressed in tung seeds at much higher levels than OLE4 and OLE5 genes; 4) OLE mRNA levels rapidly increased during seed development; and 5) OLE gene expression was well-coordinated with tung oil accumulation in the seeds. These results suggest that tung OLE genes 1-3 probably play major roles in tung oil accumulation and/or oil body development. Therefore, they might be preferred targets for tung oil engineering in transgenic plants.

  6. Identification, classification and differential expression of oleosin genes in tung tree (Vernicia fordii.

    Directory of Open Access Journals (Sweden)

    Heping Cao

    Triacylglycerols (TAG) are the major molecules of energy storage in eukaryotes. TAG are packed in subcellular structures called oil bodies or lipid droplets. Oleosins (OLE) are the major proteins in plant oil bodies. Multiple isoforms of OLE are present in plants such as tung tree (Vernicia fordii), whose seeds are rich in novel TAG with a wide range of industrial applications. The objectives of this study were to identify OLE genes, classify OLE proteins and analyze OLE gene expression in tung trees. We identified five tung tree OLE genes coding for small hydrophobic proteins. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that the five tung OLE genes represented the five OLE subfamilies and all contained the "proline knot" motif (PX5SPX3P) shared among 65 OLE from 19 tree species, including the sequenced genomes of Prunus persica (peach), Populus trichocarpa (poplar), Ricinus communis (castor bean), Theobroma cacao (cacao) and Vitis vinifera (grapevine). Tung OLE1, OLE2 and OLE3 belong to the S type and OLE4 and OLE5 belong to the SM type of Arabidopsis OLE. TaqMan and SYBR Green qPCR methods were used to study the differential expression of OLE genes in tung tree tissues. Expression results demonstrated that 1) All five OLE genes were expressed in developing tung seeds, leaves and flowers; 2) OLE mRNA levels were much higher in seeds than leaves or flowers; 3) OLE1, OLE2 and OLE3 genes were expressed in tung seeds at much higher levels than OLE4 and OLE5 genes; 4) OLE mRNA levels rapidly increased during seed development; and 5) OLE gene expression was well-coordinated with tung oil accumulation in the seeds. These results suggest that tung OLE genes 1-3 probably play major roles in tung oil accumulation and/or oil body development. Therefore, they might be preferred targets for tung oil engineering in transgenic plants.

  7. Prevalence and Determinants of Preterm Birth in Tehran, Iran: A Comparison between Logistic Regression and Decision Tree Methods.

    Science.gov (United States)

    Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi

    2017-06-01

    Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p logistic regression model for the classification of risk groups for PTB.
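
    Two of the evaluation indices named in this abstract, sensitivity and specificity, follow directly from the confusion matrix of a binary classifier. The sketch below shows the standard definitions; the function name `sensitivity_specificity` and the 0/1 label encoding (1 for preterm, 0 for term) are assumptions made for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1.

    sensitivity = TP / (TP + FN): share of true positives detected
    specificity = TN / (TN + FP): share of true negatives detected
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 0:
                tn += 1
            else:
                fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)
```

    With a rare outcome such as the 5.5% PTB rate reported here, overall accuracy alone can be misleading, which is one reason to report these indices separately for any model being compared.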

  8. Modelling tree biomasses in Finland

    Energy Technology Data Exchange (ETDEWEB)

    Repola, J.

    2013-06-01

    Biomass equations for above- and below-ground tree components of Scots pine (Pinus sylvestris L), Norway spruce (Picea abies [L.] Karst) and birch (Betula pendula Roth and Betula pubescens Ehrh.) were compiled using empirical material from a total of 102 stands. These stands (44 Scots pine, 34 Norway spruce and 24 birch stands) were located mainly on mineral soil sites representing a large part of Finland. The biomass models were based on data measured from 1648 sample trees, comprising 908 pine, 613 spruce and 127 birch trees. Biomass equations were derived for the total above-ground biomass and for the individual tree components: stem wood, stem bark, living and dead branches, needles, stump, and roots, as dependent variables. Three multivariate models with different numbers of independent variables for above-ground biomass and one for below-ground biomass were constructed. Variables that are normally measured in forest inventories were used as independent variables. The simplest model formulations, multivariate models (1) were mainly based on tree diameter and height as independent variables. In more elaborated multivariate models, (2) and (3), additional commonly measured tree variables such as age, crown length, bark thickness and radial growth rate were added. Tree biomass modelling includes consecutive phases, which cause unreliability in the prediction of biomass. First, biomasses of sample trees should be determined reliably to decrease the statistical errors caused by sub-sampling. In this study, methods to improve the accuracy of stem biomass estimates of the sample trees were developed. In addition, the reliability of the method applied to estimate sample-tree crown biomass was tested, and no systematic error was detected. Second, the whole information content of data should be utilized in order to achieve reliable parameter estimates and applicable and flexible model structure. 
    In the modelling approach, the basic assumption was that the biomasses of

  9. The transposition distance for phylogenetic trees

    OpenAIRE

    Rossello, Francesc; Valiente, Gabriel

    2006-01-01

    The search for similarity and dissimilarity measures on phylogenetic trees has been motivated by the computation of consensus trees, the search by similarity in phylogenetic databases, and the assessment of clustering results in bioinformatics. The transposition distance for fully resolved phylogenetic trees is a recent addition to the extensive collection of available metrics for comparing phylogenetic trees. In this paper, we generalize the transposition distance from fully resolved to arbi...

  10. A Method for Application of Classification Tree Models to Map Aquatic Vegetation Using Remotely Sensed Images from Different Sensors and Dates

    Directory of Open Access Journals (Sweden)

    Ying Cai

    2012-09-01

    In previous attempts to identify aquatic vegetation from remotely-sensed images using classification trees (CT), the images used to apply CT models to different times or locations necessarily originated from the same satellite sensor as that from which the original images used in model development came, greatly limiting the application of CT. We have developed an effective normalization method to improve the robustness of CT models when applied to images originating from different sensors and dates. A total of 965 ground-truth samples of aquatic vegetation types were obtained in 2009 and 2010 in Taihu Lake, China. Using relevant spectral indices (SI) as classifiers, we manually developed a stable CT model structure and then applied a standard CT algorithm to obtain quantitative (optimal) thresholds from 2009 ground-truth data and images from Landsat7-ETM+, HJ-1B-CCD, Landsat5-TM and ALOS-AVNIR-2 sensors. Optimal CT thresholds produced average classification accuracies of 78.1%, 84.7% and 74.0% for emergent vegetation, floating-leaf vegetation and submerged vegetation, respectively. However, the optimal CT thresholds for different sensor images differed from each other, with an average relative variation (RV) of 6.40%. We developed and evaluated three new approaches to normalizing the images. The best-performing method (Method of 0.1% index scaling) normalized the SI images using tailored percentages of extreme pixel values. Using the images normalized by Method of 0.1% index scaling, CT models for a particular sensor in which thresholds were replaced by those from the models developed for images originating from other sensors provided average classification accuracies of 76.0%, 82.8% and 68.9% for emergent vegetation, floating-leaf vegetation and submerged vegetation, respectively. Applying the CT models developed for normalized 2009 images to 2010 images resulted in high classification (78.0%–93.3%) and overall (92.0%–93.1%) accuracies. Our

  11. Signal classification for acoustic neutrino detection

    International Nuclear Information System (INIS)

    Neff, M.; Anton, G.; Enzenhöfer, A.; Graf, K.; Hößl, J.; Katz, U.; Lahmann, R.; Richardt, C.

    2012-01-01

    This article focuses on signal classification for deep-sea acoustic neutrino detection. In the deep sea, the background of transient signals is very diverse. Approaches like matched filtering are not sufficient to distinguish between neutrino-like signals and other transient signals with similar signature, which are forming the acoustic background for neutrino detection in the deep-sea environment. A classification system based on machine learning algorithms is analysed with the goal to find a robust and effective way to perform this task. For a well-trained model, a testing error on the level of 1% is achieved for strong classifiers like Random Forest and Boosting Trees using the extracted features of the signal as input and utilising dense clusters of sensors instead of single sensors.

  12. Learning from examples - Generation and evaluation of decision trees for software resource analysis

    Science.gov (United States)

    Selby, Richard W.; Porter, Adam A.

    1988-01-01

    A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.

  13. Function-centered modeling of engineering systems using the goal tree-success tree technique and functional primitives

    International Nuclear Information System (INIS)

    Modarres, Mohammad; Cheon, Se Woo

    1999-01-01

    Most complex systems are formed through some hierarchical evolution. Therefore, those systems can be best described through hierarchical frameworks. This paper describes some fundamental attributes of complex physical systems and several hierarchies such as functional, behavioral, goal/condition, and event hierarchies, then presents a function-centered approach to system modeling. Based on the function-centered concept, this paper describes the joint goal tree-success tree (GTST) and the master logic diagram (MLD) as a framework for developing models of complex physical systems. A function-based lexicon for classifying the most common elements of engineering systems for use in the GTST-MLD framework has been proposed. The classification is based on the physical conservation laws that govern the engineering systems. Functional descriptions based on conservation laws provide a simple and rich vocabulary for modeling complex engineering systems.

  14. Active learning strategies for the deduplication of electronic patient data using classification trees.

    Science.gov (United States)

    Sariyar, M; Borg, A; Pommerening, K

    2012-10-01

    Supervised record linkage methods often require a clerical review to gain informative training data. Active learning means to actively prompt the user to label data with special characteristics in order to minimise the review costs. We conducted an empirical evaluation to investigate whether a simple active learning strategy using binary comparison patterns is sufficient or if string metrics together with a more sophisticated algorithm are necessary to achieve high accuracies with a small training set. Based on medical registry data with different numbers of attributes, we used active learning to acquire training sets for classification trees, which were then used to classify the remaining data. Active learning for binary patterns means that every distinct comparison pattern represents a stratum from which one item is sampled. Active learning for patterns consisting of the Levenshtein string metric values uses an iterative process where the most informative and representative examples are added to the training set. In this context, we extended the active learning strategy by Sarawagi and Bhamidipaty (2002). On the original data set, active learning based on binary comparison patterns leads to the best results. When dropping four or six attributes, using string metrics leads to better results. In both cases, not more than 200 manually reviewed training examples are necessary. In record linkage applications where only forename, name and birthday are available as attributes, we suggest the sophisticated active learning strategy based on string metrics in order to achieve highly accurate results. We recommend the simple strategy if more attributes are available, as in our study. In both cases, active learning significantly reduces the amount of manual involvement in training data selection compared to usual record linkage settings. Copyright © 2012 Elsevier Inc. All rights reserved.
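
    The simple strategy described above, in which every distinct binary comparison pattern forms a stratum and one item is sampled from it for clerical review, can be sketched as follows. This is a minimal illustration only: the function name `sample_one_per_pattern`, the pair identifiers and the three-attribute patterns are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def sample_one_per_pattern(record_pairs, seed=0):
    """Pick one record pair for manual review from each distinct pattern.

    record_pairs: iterable of (pair_id, pattern), where pattern is a tuple of
    0/1 flags saying whether each attribute agrees (hypothetical layout).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for pair_id, pattern in record_pairs:
        # Every distinct comparison pattern is its own stratum.
        strata[tuple(pattern)].append(pair_id)
    # Draw exactly one pair from each stratum.
    return {pattern: rng.choice(ids) for pattern, ids in sorted(strata.items())}


# Hypothetical comparison patterns over (forename, name, birthday).
pairs = [
    ("p1", (1, 1, 1)),
    ("p2", (1, 0, 1)),
    ("p3", (1, 1, 1)),
    ("p4", (0, 0, 0)),
    ("p5", (1, 0, 1)),
]
to_review = sample_one_per_pattern(pairs)
print(to_review)
```

    With five pairs but only three distinct patterns, three pairs are sent for review; the labelled pairs would then serve as the training set for a classification tree.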

  15. Analysis of Chi-square Automatic Interaction Detection (CHAID) and Classification and Regression Tree (CRT) for Classification of Corn Production

    Science.gov (United States)

    Susanti, Yuliana; Zukhronah, Etik; Pratiwi, Hasih; Respatiwulan; Sri Sulistijowati, H.

    2017-11-01

    To achieve food resilience in Indonesia, food diversification by exploring the potential of local food is required. Corn is one of the alternative staple foods of Javanese society. For that reason, corn production needs to be improved by considering the influencing factors. CHAID and CRT are data mining methods that can be used to classify the influencing variables. The present study seeks to obtain information on the potential availability of corn as a local food in regencies and cities on Java Island. CHAID analysis yields four classifications with an accuracy of 78.8%, while CRT analysis yields seven classifications with an accuracy of 79.6%.

  16. Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods.

    Science.gov (United States)

    Goloboff, Pablo A

    2014-10-01

    Three different types of data sets, for which the uniquely most parsimonious tree can be known exactly but is hard to find with heuristic tree search methods, are studied. Tree searches are complicated more by the shape of the tree landscape (i.e. the distribution of homoplasy on different trees) than by the sheer abundance of homoplasy or character conflict. Data sets of Type 1 are those constructed by Radel et al. (2013). Data sets of Type 2 present a very rugged landscape, with narrow peaks and valleys, but relatively low amounts of homoplasy. For such a tree landscape, subjecting the trees to TBR and saving suboptimal trees produces much better results when the sequence of clipping for the tree branches is randomized instead of fixed. An unexpected finding for data sets of Types 1 and 2 is that starting a search from a random tree instead of a random addition sequence Wagner tree may increase the probability that the search finds the most parsimonious tree; a small artificial example where these probabilities can be calculated exactly is presented. Data sets of Type 3, the most difficult data sets studied here, comprise only congruent characters, and a single island with only one most parsimonious tree. Even if there is a single island, missing entries create a very flat landscape which is difficult to traverse with tree search algorithms because the number of equally parsimonious trees that need to be saved and swapped to effectively move around the plateaus is too large. Minor modifications of the parameters of tree drifting, ratchet, and sectorial searches allow travelling around these plateaus much more efficiently than saving and swapping large numbers of equally parsimonious trees with TBR. For these data sets, two new related criteria for selecting taxon addition sequences in Wagner trees (the "selected" and "informative" addition sequences) produce much better results than the standard random or closest addition sequences. These new methods for Wagner

  17. Iris Image Classification Based on Hierarchical Visual Codebook.

    Science.gov (United States)

    Zhenan Sun; Hui Zhang; Tieniu Tan; Jianyu Wang

    2014-06-01

    Iris recognition as a reliable method for personal identification has been well-studied with the objective to assign the class label of each iris image to a unique subject. In contrast, iris image classification aims to classify an iris image to an application specific category, e.g., iris liveness detection (classification of genuine and fake iris images), race classification (e.g., classification of iris images of Asian and non-Asian subjects), coarse-to-fine iris identification (classification of all iris images in the central database into multiple categories). This paper proposes a general framework for iris image classification based on texture analysis. A novel texture pattern representation method called Hierarchical Visual Codebook (HVC) is proposed to encode the texture primitives of iris images. The proposed HVC method is an integration of two existing Bag-of-Words models, namely Vocabulary Tree (VT), and Locality-constrained Linear Coding (LLC). The HVC adopts a coarse-to-fine visual coding strategy and takes advantage of both VT and LLC for accurate and sparse representation of iris texture. Extensive experimental results demonstrate that the proposed iris image classification method achieves state-of-the-art performance for iris liveness detection, race classification, and coarse-to-fine iris identification. A comprehensive fake iris image database simulating four types of iris spoof attacks is developed as the benchmark for research of iris liveness detection.

  18. An ordinal classification approach for CTG categorization.

    Science.gov (United States)

    Georgoulas, George; Karvelis, Petros; Gavrilis, Dimitris; Stylios, Chrysostomos D; Nikolakopoulos, George

    2017-07-01

    Evaluation of cardiotocogram (CTG) is a standard approach employed during pregnancy and delivery. However, its interpretation requires a high level of expertise to decide whether the recording is Normal, Suspicious or Pathological. Therefore, a number of attempts have been made over the past three decades to develop sophisticated automated systems. These systems are usually (multiclass) classification systems that assign a category to the respective CTG. However, most of these systems do not take into consideration the natural ordering of the categories associated with CTG recordings. In this work, an algorithm that explicitly takes into consideration the ordering of CTG categories, based on a binary decomposition method, is investigated. The results achieved, using the C4.5 decision tree classifier as a base classifier, show that the ordinal classification approach is marginally better than the traditional multiclass classification approach, which utilizes the standard C4.5 algorithm, for several performance criteria.
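
    One common way to exploit class ordering through binary decomposition is to train one binary classifier per threshold, each estimating the probability that the category lies above that threshold, and then to difference those estimates into per-class probabilities. The sketch below assumes this scheme; the abstract does not state which decomposition the authors used, and the function names and probability values are hypothetical. The base classifiers (C4.5 trees in the study) are replaced here by fixed numbers.

```python
CATEGORIES = ["Normal", "Suspicious", "Pathological"]  # ordered, mildest first


def ordinal_class_probabilities(prob_greater):
    """Turn K-1 threshold estimates into K class probabilities.

    prob_greater[k] is a binary classifier's estimate of
    P(category > CATEGORIES[k]) for k = 0 .. K-2.
    """
    k = len(prob_greater) + 1
    probs = []
    for i in range(k):
        upper = 1.0 if i == 0 else prob_greater[i - 1]   # P(category >= i)
        lower = 0.0 if i == k - 1 else prob_greater[i]   # P(category > i)
        # Independent classifiers can disagree, so clip negatives to zero.
        probs.append(max(upper - lower, 0.0))
    return probs


def predict(prob_greater):
    """Return the category with the largest reconstructed probability."""
    probs = ordinal_class_probabilities(prob_greater)
    return CATEGORIES[probs.index(max(probs))]


# Hypothetical outputs: P(> Normal) = 0.75, P(> Suspicious) = 0.25.
print(ordinal_class_probabilities([0.75, 0.25]))
print(predict([0.75, 0.25]))
```

    A plain multiclass tree treats the three labels as unordered; in this scheme each binary problem only asks whether a recording is worse than a given category.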

  19. Phylogenetic trees in bioinformatics

    Energy Technology Data Exchange (ETDEWEB)

    Burr, Tom L [Los Alamos National Laboratory

    2008-01-01

    Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length, and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.

  20. Comparison of pixel and object-based classification for burned area mapping using SPOT-6 images

    Directory of Open Access Journals (Sweden)

    Elif Sertel

    2016-07-01

    Full Text Available On 30 May 2013, a forest fire occurred in Izmir, Turkey causing damage to both forest and fruit trees within the region. In this research, pre- and post-fire SPOT-6 images obtained on 30 April 2013 and 31 May 2013 were used to identify the extent of forest fire within the region. SPOT-6 images of the study region were orthorectified and classified using pixel and object-based classification (OBC) algorithms to accurately delineate the boundaries of burned areas. The present results show that, for OBC, using only normalized difference vegetation index (NDVI) thresholds is not sufficient to map the burn scars; however, creating a new and simple rule set that included mean brightness values of near infrared and red channels in addition to mean NDVI values of segments considerably improved the accuracy of classification. According to the accuracy assessment results, the burned area was mapped with a 0.9322 kappa value in OBC, while a 0.7433 kappa value was observed in pixel-based classification. Lastly, classification results were integrated with the forest management map to determine the affected forest types after the fire, to be used by the National Forest Directorate for their operational activities to effectively manage the fire, response and recovery processes.
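
    The point that an NDVI threshold alone is not enough, while a rule set that also uses mean near-infrared and red brightness does better, can be illustrated with a small sketch. NDVI is the standard (NIR - Red) / (NIR + Red) index; the thresholds, the function name `is_burned` and the segment reflectance values below are hypothetical and are not the rule set from the study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from mean band values."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def is_burned(mean_nir, mean_red, ndvi_max=0.2, nir_max=0.25, red_min=0.05):
    """Hypothetical segment rule: low NDVI and low NIR and non-trivial red.

    All three thresholds are illustrative assumptions.
    """
    return (
        ndvi(mean_nir, mean_red) < ndvi_max
        and mean_nir < nir_max
        and mean_red > red_min
    )


# Hypothetical image segments: (mean NIR, mean red) reflectance.
segments = {
    "healthy forest": (0.45, 0.05),
    "burn scar": (0.15, 0.12),
    "bare soil": (0.35, 0.30),
}
for name, (nir, red) in segments.items():
    print(name, round(ndvi(nir, red), 3), is_burned(nir, red))
```

    In this toy example bare soil has an NDVI as low as the burn scar, so an NDVI-only threshold would confuse the two; the extra brightness condition separates them.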

  1. A Metric on Phylogenetic Tree Shapes.

    Science.gov (United States)

    Colijn, C; Plazzotta, G

    2018-01-01

    The shapes of evolutionary trees are influenced by the nature of the evolutionary process but comparisons of trees from different processes are hindered by the challenge of completely describing tree shape. We present a full characterization of the shapes of rooted branching trees in a form that lends itself to natural tree comparisons. We use this characterization to define a metric, in the sense of a true distance function, on tree shapes. The metric distinguishes trees from random models known to produce different tree shapes. It separates trees derived from tropical versus USA influenza A sequences, which reflect the differing epidemiology of tropical and seasonal flu. We describe several metrics based on the same core characterization, and illustrate how to extend the metric to incorporate trees' branch lengths or other features such as overall imbalance. Our approach allows us to construct addition and multiplication on trees, and to create a convex metric on tree shapes which formally allows computation of average tree shapes. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  2. PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

    Energy Technology Data Exchange (ETDEWEB)

    Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K. [Department of Physics and Astronomy, University College London, Gower Street, London WC1E 6BT (United Kingdom); McEwen, Jason D., E-mail: dr.michelle.lochner@gmail.com [Mullard Space Science Laboratory, University College London, Surrey RH5 6NT (United Kingdom)

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
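
    The AUC metric used above can be computed directly from classifier scores: it equals the fraction of (positive, negative) pairs in which the positive object receives the higher score, with ties counted as one half. The sketch below shows that pairwise definition; the function name `roc_auc` and the labels and scores are hypothetical, not taken from the challenge data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (rank) definition.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four objects.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # perfectly ranked
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # one pair mis-ordered
```

    A value of 1 means every positive is ranked above every negative, and 0.5 corresponds to random ordering.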

  3. PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

    International Nuclear Information System (INIS)

    Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.; McEwen, Jason D.

    2016-01-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.

  4. Munitions Classification Library

    Science.gov (United States)

    2016-04-04

    members of the community to make their own additions to any, or all, of the classification libraries. The next phase entailed data collection over less...... Final Report, August 2014 - August 2015. MUNITIONS CLASSIFICATION LIBRARY. Mr. Craig Murray, Parsons; Dr. Thomas H. Bell, Leidos

  5. Nonbinary tree-based phylogenetic networks

    OpenAIRE

    Jetten, Laura; van Iersel, Leo

    2016-01-01

    Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can for example represent gene transfer events. Such phylogenetic networks are called tree-based. Here, we consider two possible generalizations of this concept to nonbinary networks, which we call tree-based and st...

  6. Floral markers of strawberry tree (Arbutus unedo L.) honey.

    Science.gov (United States)

    Tuberoso, Carlo I G; Bifulco, Ersilia; Caboni, Pierluigi; Cottiglia, Filippo; Cabras, Paolo; Floris, Ignazio

    2010-01-13

    Strawberry tree honey, due to its characteristic bitter taste, is one of the most typical Mediterranean honeys, with Sardinia being one of the largest producers. According to specific chemical studies, homogentisic acid was identified as a possible marker of this honey. This work, based on HPLC-DAD-MS/MS analysis of strawberry tree (Arbutus unedo L.) honeys, previously selected by sensory evaluation and melissopalynological analysis, showed that, in addition to the above-mentioned acid, there were other high levels of substances useful for the botanical classification of this unifloral honey. Two of these compounds were isolated and identified as (+/-)-2-cis,4-trans-abscisic acid (c,t-ABA) and (+/-)-2-trans,4-trans-abscisic acid (t,t-ABA). A third compound, a new natural product named unedone, was characterized as an epoxidic derivative of the above-mentioned acids. Structures of c,t-ABA, t,t-ABA, and unedone were elucidated on the basis of extensive 1D and 2D NMR experiments, as well as HPLC-MS/MS and Q-TOF analysis. In selected honeys the average amounts of c,t-ABA, t,t-ABA, and unedone were 176.2+/-25.4, 162.3+/-21.1, and 32.9+/-7.1 mg/kg, respectively. Analysis of the A. unedo nectar confirmed the floral origin of these compounds found in the honey. Abscisic acids were found in other unifloral honeys but not in such high amount and with a constant ratio of about 1:1. For this reason, besides homogentisic acid, these compounds could be used as complementary markers of strawberry tree honey.

  7. Phylogenetic trees and Euclidean embeddings.

    Science.gov (United States)

    Layer, Mark; Rhodes, John A

    2017-01-01

    It was recently observed by de Vienne et al. (Syst Biol 60(6):826-832, 2011) that a simple square root transformation of distances between taxa on a phylogenetic tree allowed for an embedding of the taxa into Euclidean space. While the justification for this was based on a diffusion model of continuous character evolution along the tree, here we give a direct and elementary explanation for it that provides substantial additional insight. We use this embedding to reinterpret the differences between the NJ and BIONJ tree building algorithms, providing one illustration of how this embedding reflects tree structures in data.
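
    One elementary way to see why square-rooted tree distances are Euclidean is to give every branch its own coordinate axis and place each taxon at the square roots of the branch lengths on its path from the root. The squared Euclidean distance between two taxa then sums exactly the branches on the path between them. The sketch below checks this on a small tree; it is an illustration under these assumptions and not necessarily the argument given in the paper, and the tree, its branch lengths and all function names are hypothetical.

```python
import math

# Hypothetical rooted tree: child -> (parent, branch length).
PARENT = {
    "A": ("u", 1.0),
    "B": ("u", 2.0),
    "u": ("root", 0.5),
    "C": ("root", 3.0),
}


def root_path(node):
    """Branches (named by their child node) from a node up to the root."""
    edges = {}
    while node in PARENT:
        parent, length = PARENT[node]
        edges[node] = length
        node = parent
    return edges


def tree_distance(x, y):
    """Path length between two nodes: branches on exactly one root path."""
    px, py = root_path(x), root_path(y)
    only_x = sum(length for e, length in px.items() if e not in py)
    only_y = sum(length for e, length in py.items() if e not in px)
    return only_x + only_y


def embed(taxon):
    """One axis per branch; coordinate is sqrt(length) if on the root path."""
    path = root_path(taxon)
    return [math.sqrt(path.get(edge, 0.0)) for edge in sorted(PARENT)]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


for x, y in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(x, y, euclidean(embed(x), embed(y)), math.sqrt(tree_distance(x, y)))
```

    For each pair the two printed numbers agree up to rounding, because shared branches cancel and each remaining branch contributes its length once to the squared distance.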

  8. PhyTB: Phylogenetic tree visualisation and sample positioning for M. tuberculosis

    KAUST Repository

    Benavente, Ernest D

    2015-05-13

    Background: Phylogenetic-based classification of M. tuberculosis and other bacterial genomes is a core analysis for studying evolutionary hypotheses, disease outbreaks and transmission events. Whole genome sequencing is providing new insights into the genomic variation underlying intra- and inter-strain diversity, thereby assisting with the classification and molecular barcoding of the bacteria. One roadblock to strain investigation is the lack of user-interactive solutions to interrogate and visualise variation within a phylogenetic tree setting. Results: We have developed a web-based tool called PhyTB (http://pathogenseq.lshtm.ac.uk/phytblive/index.php) to assist phylogenetic tree visualisation and identification of M. tuberculosis clade-informative polymorphism. Variant Call Format files can be uploaded to determine a sample position within the tree. A map view summarises the geographical distribution of alleles and strain-types. The utility of the PhyTB is demonstrated on sequence data from 1,601 M. tuberculosis isolates. Conclusion: PhyTB contextualises M. tuberculosis genomic variation within epidemiological, geographical and phylogenic settings. Further tool utility is possible by incorporating large variants and phenotypic data (e.g. drug-resistance profiles), and an assessment of genotype-phenotype associations. Source code is available to develop similar websites for other organisms (http://sourceforge.net/projects/phylotrack).

  9. Effects of nurse trees, spacing, and tree species on biomass production in mixed forest plantations

    DEFF Research Database (Denmark)

    Nord-Larsen, Thomas; Meilby, Henrik

    2016-01-01

    Growing concern about increasing concentrations of greenhouse gases in the atmosphere, and resulting global climate change, has spurred a growing demand for renewable energy. In this study, we hypothesized that a nurse tree crop may provide additional early yields of biomass for fuel, while...... was in most cases reduced due to competition. However, provided timely thinning of nurse trees, the qualitative development of the trees will allow for long-term timber production....

  10. Decision tree analysis in subarachnoid hemorrhage: prediction of outcome parameters during the course of aneurysmal subarachnoid hemorrhage using decision tree analysis.

    Science.gov (United States)

    Hostettler, Isabel Charlotte; Muroi, Carl; Richter, Johannes Konstantin; Schmid, Josef; Neidert, Marian Christoph; Seule, Martin; Boss, Oliver; Pangalu, Athina; Germans, Menno Robbert; Keller, Emanuela

    2018-01-19

    OBJECTIVE The aim of this study was to create prediction models for outcome parameters by decision tree analysis based on clinical and laboratory data in patients with aneurysmal subarachnoid hemorrhage (aSAH). METHODS The database consisted of clinical and laboratory parameters of 548 patients with aSAH who were admitted to the Neurocritical Care Unit, University Hospital Zurich. To examine the model performance, the cohort was randomly divided into a derivation cohort (60% [n = 329]; training data set) and a validation cohort (40% [n = 219]; test data set). The classification and regression tree prediction algorithm was applied to predict death, functional outcome, and ventriculoperitoneal (VP) shunt dependency. Chi-square automatic interaction detection was applied to predict delayed cerebral infarction on days 1, 3, and 7. RESULTS The overall mortality was 18.4%. The accuracy of the decision tree models was good for survival on day 1 and favorable functional outcome at all time points, with a difference between the training and test data sets of decision trees enables exploration of dependent variables in the context of multiple changing influences over the course of an illness. The decision tree currently generated increases awareness of the early systemic stress response, which is seemingly pertinent for prognostication.

  11. Boosting bonsai trees for handwritten/printed text discrimination

    Science.gov (United States)

    Ricquebourg, Yann; Raymond, Christian; Poirriez, Baptiste; Lemaitre, Aurélie; Coüasnon, Bertrand

    2013-12-01

    Boosting over decision-stumps proved its efficiency in Natural Language Processing essentially with symbolic features, and its good properties (fast, few and not critical parameters, not sensitive to over-fitting) could be of great interest in the numeric world of pixel images. In this article we investigated the use of boosting over small decision trees, in image classification processing, for the discrimination of handwritten/printed text. Then, we conducted experiments to compare it to usual SVM-based classification revealing convincing results with very close performance, but with faster predictions and behaving far less as a black-box. Those promising results tend to make use of this classifier in more complex recognition tasks like multiclass problems.

  12. Classification of the financial sustainability of health insurance beneficiaries through data mining techniques

    Directory of Open Access Journals (Sweden)

    Sílvia Maria Dias Pedro Rebouças

    2016-09-01

    Full Text Available Advances in information technologies have led to the storage of large amounts of data by organizations. An analysis of this data through data mining techniques is important support for decision-making. This article aims to apply techniques for the classification of the beneficiaries of an operator of health insurance in Brazil, according to their financial sustainability, via their sociodemographic characteristics and their healthcare cost history. Beneficiaries with a loss ratio greater than 0.75 are considered unsustainable. The sample consists of 38875 beneficiaries, active between the years 2011 and 2013. The techniques used were logistic regression and classification trees. The performance of the models was compared using accuracy rates and receiver operating characteristic (ROC) curves, by determining the area under the curves (AUC). The results showed that most of the sample is composed of sustainable beneficiaries. The logistic regression model had a 68.43% accuracy rate with AUC of 0.7501, and the classification tree obtained 67.76% accuracy and an AUC of 0.6855. Age and the type of plan were the most important variables related to the profile of the beneficiaries in the classification. The highlights with regard to healthcare costs were annual spending on consultation and on dental insurance.

  13. Dual-tree complex wavelet for medical image watermarking

    International Nuclear Information System (INIS)

    Mavudila, K.R.; Ndaye, B.M.; Masmoudi, L.; Hassanain, N.; Cherkaoui, M.

    2010-01-01

    In order to transmit medical data between hospitals, we insert each patient's information and diagnosis into the image; watermarking consists of inserting a message into the image and trying to recover it with the maximum possible fidelity. This paper presents a blind watermarking scheme in the dual-tree wavelet transform domain (DTT), which increases robustness and preserves image quality. This system is transparent to the user and allows image integrity control. In addition, it provides information on the location of potential alterations and an evaluation of image modifications, which is of major importance in a medico-legal framework. An example using head magnetic resonance and mammography imaging illustrates the overall method. Wavelet techniques can be successfully applied in various image processing methods, namely image denoising, segmentation, classification, watermarking and others. In this paper we discuss the application of the dual-tree complex wavelet transform (DT-CWT), which has significant advantages over the classic discrete wavelet transform (DWT), for certain image processing problems. The DT-CWT is a form of discrete wavelet transform that generates complex coefficients by using a dual tree of wavelet filters to obtain their real and imaginary parts. The main part of the paper is devoted to exploiting the strengths of the DT-CWT, compared to the classical DWT, for blind medical image watermarking. Our schemes use bivariate shrinkage with local variance estimation, are robust to attacks, and preserve visual quality well. Experimental results show that embedded watermarks using CWT give good image quality and are robust in comparison with the classical DWT.

  14. Decision trees and decision committee applied to star/galaxy separation problem

    Science.gov (United States)

    Vasconcellos, Eduardo Charles

    Vasconcellos et al [1] study the efficiency of 13 different decision tree algorithms applied to photometric data in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7) to perform star/galaxy separation. Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. In that work we extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. We find that the Functional Tree algorithm (FT) yields the best results by the mean completeness function (galaxy true positive rate) in two magnitude intervals:14=19 (82.1%). We compare FT classification to the SDSS parametric, 2DPHOT and Ball et al (2006) classifications. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination ( 2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 train six FT classifiers with randomly selected objects from the same 884,126 SDSS-DR7 objects with spectroscopic data that we use before. Both the decision committee and our previous single FT classifier will be applied to the new objects from SDSS data releases eight, nine and ten. Finally we will compare performances of both methods in this new data set. [1] Vasconcellos, E. C.; de Carvalho, R. R.; Gal, R. R.; LaBarbera, F. L.; Capelato, H. V.; Fraga Campos Velho, H.; Trevisan, M.; Ruiz, R. S. R.. Decision Tree Classifiers for Star/Galaxy Separation. The Astronomical Journal, Volume 141, Issue 6, 2011.

  15. EFFECTIVE MULTI-RESOLUTION TRANSFORM IDENTIFICATION FOR CHARACTERIZATION AND CLASSIFICATION OF TEXTURE GROUPS

    Directory of Open Access Journals (Sweden)

    S. Arivazhagan

    2011-11-01

    Texture classification is important in applications of computer image analysis for characterization or classification of images based on local spatial variations of intensity or color. Texture can be defined as consisting of mutually related elements. This paper proposes an experimental approach for identification of a suitable multi-resolution transform for characterization and classification of different texture groups based on statistical and co-occurrence features derived from multi-resolution transformed sub bands. The statistical and co-occurrence feature sets are extracted for various multi-resolution transforms such as Discrete Wavelet Transform (DWT), Stationary Wavelet Transform (SWT), Double Density Wavelet Transform (DDWT) and Dual Tree Complex Wavelet Transform (DTCWT), and then the transform that maximizes the texture classification performance for the particular texture group is identified.

  16. An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

    Science.gov (United States)

    Liang, Ying; Liao, Bo; Zhu, Wen

    2017-01-01

    Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures the copy number of multiple genes in hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic trees based on the FISH platform. The topology analysis of the tumor progression tree shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. The classification experiment shows that tree-based features are better than data-based features in distinguishing tumors. The constructed phylogenetic trees perform well in characterizing the tumor development process, outperforming other similar algorithms.

  17. Nonbinary Tree-Based Phylogenetic Networks.

    Science.gov (United States)

    Jetten, Laura; van Iersel, Leo

    2018-01-01

    Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can, for example, represent gene transfer events. Such phylogenetic networks are called tree-based. Here, we consider two possible generalizations of this concept to nonbinary networks, which we call tree-based and strictly-tree-based nonbinary phylogenetic networks. We give simple graph-theoretic characterizations of tree-based and strictly-tree-based nonbinary phylogenetic networks. Moreover, we show for each of these two classes that it can be decided in polynomial time whether a given network is contained in the class. Our approach also provides a new view on tree-based binary phylogenetic networks. Finally, we discuss two examples of nonbinary phylogenetic networks in biology and show how our results can be applied to them.

  18. Ensemble classification of individual Pinus crowns from multispectral satellite imagery and airborne LiDAR

    Science.gov (United States)

    Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph

    2018-03-01

    Distinguishing tree species is relevant in many contexts of remote sensing assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control and biomass estimation. This study evaluated the performance of applying ensemble techniques with the goal of automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. Ex Mirb within a 1.3 km2 mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on: (1) high-density LiDAR data (160 returns m-2), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and the Haralick texture indices were derived from Worldview-2 images and served as independent spectral variables. Selection of the best predictor subset was done after a comparison of three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) Akaike Information Criterion (AIC) and (3) Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integration of datasets and models improved individual tree species classification (True Skills Statistic, TSS; from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Assemblage of regression models and integration of the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage at improving species classifications needed in present-day forest planning and management.

  19. A Classification Framework Applied to Cancer Gene Expression Profiles

    Directory of Open Access Journals (Sweden)

    Hussein Hijazi

    2013-01-01

    Classification of cancer based on gene expression has provided insight into possible treatment strategies. Thus, developing machine learning methods that can successfully distinguish among cancer subtypes or normal versus cancer samples is important. This work discusses supervised learning techniques that have been employed to classify cancers. Furthermore, a two-step feature selection method based on an attribute estimation method (e.g., ReliefF) and a genetic algorithm was employed to find a set of genes that can best differentiate between cancer subtypes or normal versus cancer samples. The application of different classification methods (e.g., decision tree, k-nearest neighbor, support vector machine (SVM), bagging, and random forest) on 5 cancer datasets shows that no classification method universally outperforms all the others. However, k-nearest neighbor and linear SVM generally improve the classification performance over other classifiers. Finally, incorporating diverse types of genomic data (e.g., protein-protein interaction data and gene expression) increases the prediction accuracy as compared to using gene expression alone.

  20. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Data mining is the computational process of finding patterns in large data sets. Classification, one of the main domains of data mining, involves generalizing known structure to apply to a new dataset and predict its class. Various classification algorithms are used to classify various data sets. They are based on different methods such as probability, decision tree, neural network, nearest neighbor, boolean and fuzzy logic, kernel-based methods, etc. In this paper, we apply three diverse classification algorithms to ten datasets. The datasets have been selected based on their size and/or the number and nature of their attributes. Results are discussed using performance evaluation measures such as precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error, ROC Area, etc. Comparative analysis has been carried out using the performance evaluation measures of accuracy, precision, and F-measure. We specify features and limitations of the classification algorithms for datasets of diverse nature.

  1. Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units

    International Nuclear Information System (INIS)

    Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios, Martha; Baechler, Sébastien

    2015-01-01

    Purpose: According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. Method: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). Results: The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Conclusion: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables
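
    The distance underlying the clustering step in this abstract can be sketched as follows. The Kolmogorov distance between two samples is the largest vertical gap between their empirical cumulative distribution functions. The lithology names and concentration values below are hypothetical, and the k-medoids step itself is not shown; this is only a minimal illustration of how a pairwise distance table could be assembled, not the authors' implementation.

```python
from bisect import bisect_right


def kolmogorov_distance(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so it is enough to evaluate the gap at every observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


def pairwise_distances(groups):
    """Distance between every ordered pair of named samples."""
    names = sorted(groups)
    return {
        (p, q): kolmogorov_distance(groups[p], groups[q])
        for p in names
        for q in names
    }


# Hypothetical indoor radon concentrations per lithological unit.
groups = {
    "gneiss": [120, 150, 300, 400],
    "granite": [110, 160, 280, 420],
    "limestone": [40, 60, 80, 130],
}
distances = pairwise_distances(groups)
print(distances[("gneiss", "granite")], distances[("gneiss", "limestone")])
```

    A table like this could then be passed to any clustering routine that accepts precomputed dissimilarities; units with similar concentration distributions (here the hypothetical gneiss and granite samples) end up close together.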

  2. Using decision trees and their ensembles for analysis of NIR spectroscopic data

    DEFF Research Database (Denmark)

    Kucheryavskiy, Sergey V.

    Advanced machine learning methods, like convolutional neural networks and decision trees, became extremely popular in the last decade. This, first of all, is directly related to the current boom in Big data analysis, where traditional statistical methods are not efficient. According to kaggle.com, the most popular online resource for Big data problems and solutions, methods based on decision trees and their ensembles are most widely used for solving the problems. It can be noted that decision trees and convolutional neural networks are not very popular in Chemometrics. One of the reasons... and interpretation of the models. In this presentation, we are going to discuss the applicability of decision-tree-based methods (including gradient boosting) for solving classification and regression tasks with NIR spectra as predictors. We will cover such aspects as evaluation, optimization and validation...

  3. A Climatic Classification for Citrus Winter Survival in China.

    Science.gov (United States)

    Shou, Bo Huang

    1991-05-01

    The citrus tree is susceptible to frost damage. Winter injury to citrus from freezing weather is the major meteorological problem in the northern part of the citrus growing regions in China. Based on meteorological data collected at 120 stations in southern China and on the extent of citrus freezing injury, five climatic regions for citrus winter survival in China were developed. They were: 1) no citrus tree injury, 2) light injury to mandarins (Citrus reticulata) or moderate injury to oranges (Citrus sinensis), 3) moderate injury to mandarins or heavy injury to oranges, 4) heavy injury to mandarins, and 5) impossible citrus tree growth. This citrus climatic classification was an attempt to provide guidelines for regulation of citrus production, to effectively utilize land and climatic resources, to choose suitable citrus varieties, and to develop methods to prevent injury by freezing.

  4. Los Angeles 1-Million tree canopy cover assessment

    Science.gov (United States)

    Gregory E. McPherson; James R. Simpson; Qingfu Xiao; Wu Chunxia

    2008-01-01

    The Million Trees LA initiative intends to chart a course for sustainable growth through planting and stewardship of trees. The purpose of this study was to measure Los Angeles's existing tree canopy cover (TCC), determine if space exists for 1 million additional trees, and estimate future benefits from the planting. High resolution QuickBird remote sensing data,...

  5. FCMPSO: An Imputation for Missing Data Features in Heart Disease Classification

    Science.gov (United States)

    Salleh, Mohd Najib Mohd; Ashikin Samat, Nurul

    2017-08-01

    The application of data mining and machine learning in directing clinical research toward possible hidden knowledge is becoming greatly influential in medical areas. Heart disease is a killer disease around the world, and early prevention through efficient methods can help to reduce mortality. Medical data may contain many uncertainties, as they are fuzzy and vague in nature. Imprecise feature data, such as absent or missing values, can affect the quality of classification results. Nevertheless, the remaining complete features can still provide information about the incomplete ones. Therefore, an imputation approach based on Fuzzy C-Means and Particle Swarm Optimization (FCMPSO) is developed in the preprocessing stage to help fill in the missing values. The completed dataset is then used to train a Decision Tree classifier. The experiment uses a Heart Disease dataset, and performance is analysed using accuracy, precision, and ROC values. Results show that the performance of the Decision Tree increases after FCMPSO is applied for imputation.

  6. Modelling modulus of elasticity of Pinus pinaster Ait. in northwestern Spain with standing tree acoustic measurements, tree, stand and site variables

    Directory of Open Access Journals (Sweden)

    Esther Merlo

    2014-04-01

    Aim of study: Modelling the structural quality of Pinus pinaster Ait. wood on the basis of measurements made on standing trees is essential because of the importance of the species in the Galician forestry and timber industries and the good mechanical properties of its wood. In this study, we investigated how timber stiffness is affected by tree and stand properties, climatic and edaphic characteristics and competition. Area of study: The study was performed in Galicia, north-western Spain. Material and methods: Ten pure and even-aged P. pinaster stands were selected and tree and stand variables and the stress wave velocity of 410 standing trees were measured. A sub-sample of 73 trees, representing the variability in acoustic velocity, were felled and sawn into structural timber pieces (224), which were subjected to a bending test to determine the modulus of elasticity (MOE). Main results: Linear models including wood properties explained more than 97%, 73% and 60% of the observed MOE variability at site, tree and board level, respectively, with acoustic velocity and wood density as the main regressors. Other linear models, which did not include wood density, explained more than 88%, 69% and 55% of the observed MOE variability at site, tree and board level, respectively, with acoustic velocity as the main regressor. Moreover, a classification tree for estimating the visual grade according to standard UNE 56544:2011 was developed. Research highlights: The results have demonstrated the usefulness of acoustic velocity for predicting MOE in standing trees. The use of the fitted equations together with existing dynamic growth models will enable preliminary assessment of timber stiffness in relation to different silvicultural alternatives used with this species. Keywords: stress wave velocity, modulus of elasticity, site index, competition index, stepwise regression, CART.

  7. Neuropsychological Test Selection for Cognitive Impairment Classification: A Machine Learning Approach

    Science.gov (United States)

    Williams, Jennifer A.; Schmitter-Edgecombe, Maureen; Cook, Diane J.

    2016-01-01

    Introduction: Reducing the amount of testing required to accurately detect cognitive impairment is clinically relevant. The aim of this research was to determine the fewest number of clinical measures required to accurately classify participants as healthy older adult, mild cognitive impairment (MCI) or dementia using a suite of classification techniques. Methods: Two variable selection machine learning models (i.e., naive Bayes, decision tree), a logistic regression, and two participant datasets (i.e., clinical diagnosis, clinical dementia rating; CDR) were explored. Participants classified using clinical diagnosis criteria included 52 individuals with dementia, 97 with MCI, and 161 cognitively healthy older adults. Participants classified using CDR included 154 individuals with CDR = 0, 93 individuals with CDR = 0.5, and 25 individuals with CDR = 1.0+. Twenty-seven demographic, psychological, and neuropsychological variables were available for variable selection. Results: No significant difference was observed between naive Bayes, decision tree, and logistic regression models for classification of both clinical diagnosis and CDR datasets. Participant classification (70.0 – 99.1%), geometric mean (60.9 – 98.1%), sensitivity (44.2 – 100%), and specificity (52.7 – 100%) were generally satisfactory. Unsurprisingly, the MCI/CDR = 0.5 participant group was the most challenging to classify. Through variable selection only 2 – 9 variables were required for classification and varied between datasets in a clinically meaningful way. Conclusions: The current study results reveal that machine learning techniques can accurately classify cognitive impairment and reduce the number of measures required for diagnosis. PMID:26332171

  8. Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides

    Directory of Open Access Journals (Sweden)

    Stanislawski Jerzy

    2013-01-01

    Background: Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, like Alzheimer disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of aminoacids, which transform the structure when exposed. A few hundreds of such peptides have been experimentally found. Experimental testing of all possible aminoacid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the most numerous dataset - ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. Results: We generated a new dataset of hexapeptides, using more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was applied for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained area under ROC curve of 0.96, accuracy 91%, true positive rate ca. 78%, and true negative rate 95%. A few other machine learning methods also achieved a good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning). Conclusions: We showed that the simplified profile generation method does not introduce an error with regard to the original method, while

  9. Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides.

    Science.gov (United States)

    Stanislawski, Jerzy; Kotulska, Malgorzata; Unold, Olgierd

    2013-01-17

    Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, like Alzheimer disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of aminoacids, which transform the structure when exposed. A few hundreds of such peptides have been experimentally found. Experimental testing of all possible aminoacid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the most numerous dataset - ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. We generated a new dataset of hexapeptides, using more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was applied for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained area under ROC curve of 0.96, accuracy 91%, true positive rate ca. 78%, and true negative rate 95%. A few other machine learning methods also achieved a good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning). We showed that the simplified profile generation method does not introduce an error with regard to the original method, while increasing the computational efficiency. Our new dataset

  10. Soil classification basing on the spectral characteristics of topsoil samples

    Science.gov (United States)

    Liu, Huanjun; Zhang, Xiaokang; Zhang, Xinle

    2016-04-01

    Soil taxonomy plays an important role in soil utility and management, but China has only a coarse soil map created from 1980s data. New technology, e.g. spectroscopy, could simplify soil classification. This study tries to classify soils based on the spectral characteristics of topsoil samples. 148 topsoil samples of typical soils, including Black soil, Chernozem, Blown soil and Meadow soil, were collected from the Songnen plain, Northeast China, and the laboratory spectral reflectance in the visible and near infrared region (400-2500 nm) was processed with weighted moving average, resampling technique, and continuum removal. Spectral indices were extracted from the soil spectral characteristics, including the second absorption position of the spectral curve, the first absorption valley's area, and the slope of the spectral curve at 500-600 nm and 1340-1360 nm. K-means clustering and a decision tree were then each used to build a soil classification model. The results indicated that 1) the second absorption positions of Black soil and Chernozem were located at 610 nm and 650 nm respectively; 2) the spectral curve of Meadow soil is similar to that of its adjacent soil, which could be due to soil erosion; 3) the decision tree model showed higher classification accuracy, with accuracies for Black soil, Chernozem, Blown soil and Meadow soil of 100%, 88%, 97% and 50% respectively, and the accuracy for Blown soil could be increased to 100% by adding one more spectral index (the first two valleys' area) to the model, which showed that the model could be used for soil classification and soil mapping in the near future.

  11. Classifying dysmorphic syndromes by using artificial neural network based hierarchical decision tree.

    Science.gov (United States)

    Özdemir, Merve Erkınay; Telatar, Ziya; Eroğul, Osman; Tunca, Yusuf

    2018-05-01

    Dysmorphic syndromes have different facial malformations. These malformations are significant for an early diagnosis of dysmorphic syndromes and contain distinctive information for face recognition. In this study we define certain features of each syndrome by considering facial malformations and automatically classify Fragile X, Hurler, Prader Willi, Down and Wolf Hirschhorn syndromes and healthy groups. Reference points are marked on the face images, and ratios between the points' distances are used as features. We suggest a neural network based hierarchical decision tree structure in order to classify the syndrome types. We also implement k-nearest neighbor (k-NN) and artificial neural network (ANN) classifiers to compare classification accuracy with our hierarchical decision tree. The classification accuracy is 50, 73 and 86.7% with the k-NN, ANN and hierarchical decision tree methods, respectively. The same images are then shown to a clinical expert, who achieves a recognition rate of 46.7%. We develop an efficient system that recognizes different syndrome types automatically and with high accuracy from simple, non-invasive imaging data, independently of the patient's age, sex and race. The promising results indicate that our method can be used for pre-diagnosis of the dysmorphic syndromes by clinical experts.
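
    The feature construction described in this abstract (ratios between distances of marked reference points) can be sketched in a few lines. The landmark names and coordinates below are hypothetical, and the abstract does not say which point pairs or ratios the authors actually used, so this shows only the general idea: ratios of distances do not change when the image is uniformly rescaled.

```python
from itertools import combinations
from math import dist


def distance_ratio_features(landmarks):
    """Ratios between all pairs of inter-landmark distances.

    `landmarks` maps a (hypothetical) landmark name to an (x, y) point.
    Assumes all landmarks are distinct, so no distance is zero.
    """
    names = sorted(landmarks)
    distances = {
        (a, b): dist(landmarks[a], landmarks[b])
        for a, b in combinations(names, 2)
    }
    features = {}
    for (pair_1, d1), (pair_2, d2) in combinations(sorted(distances.items()), 2):
        features[(pair_1, pair_2)] = d1 / d2
    return features


# Hypothetical landmark coordinates in pixels.
face = {"left_eye": (0.0, 0.0), "right_eye": (6.0, 0.0), "chin": (3.0, 4.0)}
features = distance_ratio_features(face)

# The same face at twice the resolution yields the same ratios.
face_2x = {name: (2 * x, 2 * y) for name, (x, y) in face.items()}
features_2x = distance_ratio_features(face_2x)
```

    A feature vector like this could then be fed to any of the classifiers the abstract compares (k-NN, ANN, or a hierarchical decision tree).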

  12. Assessing the Effectiveness of Statistical Classification Techniques in Predicting Future Employment of Participants in the Temporary Assistance for Needy Families Program

    Science.gov (United States)

    Montoya, Isaac D.

    2008-01-01

    Three classification techniques (Chi-square Automatic Interaction Detection [CHAID], Classification and Regression Tree [CART], and discriminant analysis) were tested to determine their accuracy in predicting Temporary Assistance for Needy Families program recipients' future employment. Technique evaluation was based on proportion of correctly…

  13. Mapping trees outside forests using high-resolution aerial imagery: a comparison of pixel- and object based classification approaches

    Science.gov (United States)

    Dacia M. Meneguzzo; Greg C. Liknes; Mark D. Nelson

    2013-01-01

    Discrete trees and small groups of trees in nonforest settings are considered an essential resource around the world and are collectively referred to as trees outside forests (ToF). ToF provide important functions across the landscape, such as protecting soil and water resources, providing wildlife habitat, and improving farmstead energy efficiency and aesthetics....

  14. Towards Automatic Trunk Classification on Young Conifers

    DEFF Research Database (Denmark)

    Petri, Stig; Immerkær, John

    2009-01-01

    In the garden nursery industry providing young Nordmann firs for Christmas tree plantations, there is a rising interest in automatic classification of their products to ensure consistently high quality and reduce the cost of manual labor. This paper describes a fully automatic single-view algorit...... performance of the algorithm by incorporating color information into the data considered by the dynamic programming algorithm....

  15. Toward functional classification of neuronal types.

    Science.gov (United States)

    Sharpee, Tatyana O

    2014-09-17

    How many types of neurons are there in the brain? This basic neuroscience question remains unsettled despite many decades of research. Classification schemes have been proposed based on anatomical, electrophysiological, or molecular properties. However, different schemes do not always agree with each other. This raises the question of whether one can classify neurons based on their function directly. For example, among sensory neurons, can a classification scheme be devised that is based on their role in encoding sensory stimuli? Here, theoretical arguments are outlined for how this can be achieved using information theory by looking at optimal numbers of cell types and paying attention to two key properties: correlations between inputs and noise in neural responses. This theoretical framework could help to map the hierarchical tree relating different neuronal classes within and across species. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. REMOTE SENSING IMAGE CLASSIFICATION APPLIED TO THE FIRST NATIONAL GEOGRAPHICAL INFORMATION CENSUS OF CHINA

    Directory of Open Access Journals (Sweden)

    X. Yu

    2016-06-01

    Image classification still has a long way to go, although it has been studied for almost half a century. Researchers have achieved a great deal in the image classification domain, but there is still a long distance between theory and practice. However, some new methods from the artificial intelligence domain will be absorbed into the image classification domain, with each drawing on the strengths of the other to offset its weaknesses, which will open up new prospects. Networks usually play the role of a high-level language, as is seen in Artificial Intelligence and statistics, because networks are used to build complex models from simple components. In recent years, Bayesian networks, one kind of probabilistic network, have become a powerful data mining technique for handling uncertainty in complex domains. In this paper, we apply Tree Augmented Naive Bayesian Networks (TAN) to texture classification of high-resolution remote sensing images and propose a new method to construct the network topology structure in terms of training accuracy based on the training samples. Since 2013, the Chinese government has been carrying out the first national geographical information census project, which mainly interprets geographical information based on high-resolution remote sensing images. Therefore, this paper tries to apply Bayesian networks to remote sensing image classification, in order to improve image interpretation in the first national geographical information census project. In the experiment, we choose some remote sensing images of Beijing. Experimental results demonstrate that TAN outperforms the Naive Bayesian Classifier (NBC) and the Maximum Likelihood Classification Method (MLC) in overall classification accuracy. In addition, the proposed method can reduce the workload of field workers and improve work efficiency. Although it is time consuming, it will be an attractive and effective method for assisting office operation of image interpretation.

  17. Classification algorithms using adaptive partitioning

    KAUST Repository

    Binev, Peter; Cohen, Albert; Dahmen, Wolfgang; DeVore, Ronald

    2014-01-01

    © 2014 Institute of Mathematical Statistics. Algorithms for binary classification based on adaptive tree partitioning are formulated and analyzed for both their risk performance and their friendliness to numerical implementation. The algorithms can be viewed as generating a set approximation to the Bayes set and thus fall into the general category of set estimators. In contrast with the most studied tree-based algorithms, which utilize piecewise constant approximation on the generated partition [IEEE Trans. Inform. Theory 52 (2006) 1335-1353; Mach. Learn. 66 (2007) 209-242], we consider decorated trees, which allow us to derive higher order methods. Convergence rates for these methods are derived in terms of the parameter α of margin conditions and a rate s of best approximation of the Bayes set by decorated adaptive partitions. They can also be expressed in terms of the Besov smoothness β of the regression function that governs its approximability by piecewise polynomials on adaptive partitions. The execution of the algorithms does not require knowledge of the smoothness or margin conditions. Besov smoothness conditions are weaker than the commonly used Hölder conditions, which govern approximation by nonadaptive partitions, and therefore for a given regression function can result in a higher rate of convergence. This in turn mitigates the compatibility conflict between smoothness and margin parameters.

  18. Classification algorithms using adaptive partitioning

    KAUST Repository

    Binev, Peter

    2014-12-01

    © 2014 Institute of Mathematical Statistics. Algorithms for binary classification based on adaptive tree partitioning are formulated and analyzed for both their risk performance and their friendliness to numerical implementation. The algorithms can be viewed as generating a set approximation to the Bayes set and thus fall into the general category of set estimators. In contrast with the most studied tree-based algorithms, which utilize piecewise constant approximation on the generated partition [IEEE Trans. Inform. Theory 52 (2006) 1335-1353; Mach. Learn. 66 (2007) 209-242], we consider decorated trees, which allow us to derive higher order methods. Convergence rates for these methods are derived in terms of the parameter α of margin conditions and a rate s of best approximation of the Bayes set by decorated adaptive partitions. They can also be expressed in terms of the Besov smoothness β of the regression function that governs its approximability by piecewise polynomials on adaptive partitions. The execution of the algorithms does not require knowledge of the smoothness or margin conditions. Besov smoothness conditions are weaker than the commonly used Hölder conditions, which govern approximation by nonadaptive partitions, and therefore for a given regression function can result in a higher rate of convergence. This in turn mitigates the compatibility conflict between smoothness and margin parameters.

  19. Using laser altimetry-based segmentation to refine automated tree identification in managed forests of the Black Hills, South Dakota

    Science.gov (United States)

    Eric Rowell; Carl Selelstad; Lee Vierling; Lloyd Queen; Wayne Sheppard

    2006-01-01

    The success of a local maximum (LM) tree detection algorithm for detecting individual trees from lidar data depends on stand conditions that are often highly variable. A laser height variance and percent canopy cover (PCC) classification is used to segment the landscape by stand condition prior to stem detection. We test the performance of the LM algorithm using canopy...

  20. IND - THE IND DECISION TREE PACKAGE

    Science.gov (United States)

    Buntine, W.

    1994-01-01

    A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman, Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is designated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets.
The
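
    The recursive partitioning idea described in the IND abstract can be sketched roughly as follows. This is a minimal illustration in Python, not IND's own code: the Gini impurity criterion (the one associated with CART), the function names, the depth limit and the toy data are all assumptions made for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow(rows, labels, depth=0, max_depth=3):
    """Recursively partition the data; leaves store the majority class."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the stored class."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical toy data: two numeric attributes, target classes "a" and "b"
rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [6.0, 1.0], [7.0, 2.0], [8.0, 3.0]]
labels = ["a", "a", "a", "b", "b", "b"]
tree = grow(rows, labels)
```

    The sketch omits everything that distinguishes IND in the abstract (Bayesian and MML pruning and smoothing, missing attribute values, symbolic attributes); it only shows the grow-then-predict loop that such packages build on.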

  1. Multisensor multiresolution data fusion for improvement in classification

    Science.gov (United States)

    Rubeena, V.; Tiwari, K. C.

    2016-04-01

    The rapid advancements in technology have facilitated easy availability of multisensor and multiresolution remote sensing data. Multisensor, multiresolution data contain complementary information, and fusion of such data may yield significant application-dependent information that would otherwise remain hidden. The present work aims at improving classification by fusing features of coarse resolution hyperspectral (1 m) LWIR and fine resolution (20 cm) RGB data. The classification map comprises eight classes: Road, Trees, Red Roof, Grey Roof, Concrete Roof, Vegetation, Bare Soil and Unclassified. The processing methodology for hyperspectral LWIR data comprises dimensionality reduction, resampling of the data by interpolation to register the two images at the same spatial resolution, and extraction of spatial features to improve classification accuracy. In the case of fine resolution RGB data, the vegetation index is computed for classifying the vegetation class and the morphological building index is calculated for buildings. To extract the textural features, occurrence and co-occurrence statistics are considered, and the features are extracted from all three bands of the RGB data. After extracting the features, Support Vector Machines (SVMs) are used for training and classification. To increase the classification accuracy, post-processing steps are applied: spurious noise such as salt and pepper noise is removed, followed by filtering by majority voting within the objects for better object classification.

  2. Tweet-based Target Market Classification Using Ensemble Method

    Directory of Open Access Journals (Sweden)

    Muhammad Adi Khairul Anshary

    2016-09-01

    Target market classification is aimed at focusing marketing activities on the right targets. Classification of target markets can be done through data mining and by utilizing data from social media, e.g. Twitter. The end results of data mining are learning models that can classify new data. Ensemble methods can improve the accuracy of the models and therefore provide better results. In this study, classification of target markets was conducted on a dataset of 3000 tweets in order to extract features. Classification models were constructed to manipulate the training data using two ensemble methods (bagging and boosting). To investigate the effectiveness of the ensemble methods, this study used the CART (classification and regression tree) algorithm for comparison. Three categories of consumer goods (computers, mobile phones and cameras) and three categories of sentiments (positive, negative and neutral) were classified towards three target-market categories. Machine learning was performed using Weka 3.6.9. The results of the test data showed that the bagging method improved the accuracy of CART by 1.9% (to 85.20%). On the other hand, for sentiment classification, the ensemble methods were not successful in increasing the accuracy of CART. The results of this study may be taken into consideration by companies who approach their customers through social media, especially Twitter.
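
    As a rough illustration of the bagging idea mentioned above (base learners trained on bootstrap resamples and combined by majority vote), here is a minimal pure-Python sketch. It is not the Weka 3.6.9 setup used in the study: a one-threshold decision stump stands in for a full CART tree, and the function names, the number of models and the toy data are assumptions.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-feature threshold rule (a stand-in for a full CART tree)."""
    best = None
    for t in sorted(set(xs)):
        for lo, hi in ((0, 1), (1, 0)):
            errors = sum((lo if x <= t else hi) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: lo if x <= t else hi

def fit_bagging(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_bagging(models, x):
    """Combine the base learners by unweighted majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical one-dimensional data: class 1 for the larger values
xs = [0.1, 0.4, 0.5, 0.9, 1.2, 2.0, 2.3, 2.7, 3.1, 3.5]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
models = fit_bagging(xs, ys)
```

    Boosting differs in that the resampling or reweighting of each round depends on the errors of the previous round, and the final vote is weighted; that variant is not shown here.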

  3. Classification and regression tree (CART) model to predict pulmonary tuberculosis in hospitalized patients.

    Science.gov (United States)

    Aguiar, Fabio S; Almeida, Luciana L; Ruffino-Netto, Antonio; Kritski, Afranio Lineu; Mello, Fernanda Cq; Werneck, Guilherme L

    2012-08-07

    Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a predictive model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Cross-sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of the model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. Prospective validation is still necessary, but our model offers an alternative for decision making in whether to isolate patients with clinical suspicion of TB in tertiary health facilities in
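
    The four performance measures reported above all derive from a 2x2 confusion table. The following minimal Python sketch shows the standard definitions; the function name is an assumption and the counts are hypothetical, chosen only for illustration and not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients flagged by the model
        "specificity": tn / (tn + fp),  # non-diseased patients cleared by the model
        "ppv": tp / (tp + fp),          # flagged patients who are truly diseased
        "npv": tn / (tn + fn),          # cleared patients who are truly non-diseased
    }

# Hypothetical counts, NOT the study's data
m = diagnostic_metrics(tp=30, fp=40, fn=20, tn=110)
```

    Because PPV and NPV depend on how common the disease is in the sample, a model can combine a modest PPV with a high NPV when confirmed cases are a minority, which is the general pattern in the figures quoted in the abstract.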

  4. Classification and regression tree (CART) model to predict pulmonary tuberculosis in hospitalized patients

    Directory of Open Access Journals (Sweden)

    Aguiar Fabio S

    2012-08-01

    Background: Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a predictive model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Methods: Cross-sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of the model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. Results: We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. Conclusions: The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. Prospective validation is still necessary, but our model offers an alternative for decision making in whether to isolate patients with

  5. Tests with VHR images for the identification of olive trees and other fruit trees in the European Union

    Science.gov (United States)

    Masson, Josiane; Soille, Pierre; Mueller, Rick

    2004-10-01

    In the context of the Common Agricultural Policy (CAP), the European Commission has a strong interest in counting and individually locating fruit trees. An automatic counting algorithm developed by the JRC (OLICOUNT) was used in the past for olive trees only, on 1 m black and white orthophotos, but with limits in the case of young trees or irregular groves. This study investigates the improvement of fruit tree identification using VHR images on a large set of data in three test sites: one in Crete (Greece), one in the south-east of France with a majority of olive trees and associated fruit trees, and the last one in Florida on citrus trees. OLICOUNT was compared with two other automatic tree counting applications, one using the CRISP software on citrus trees and the other completely automatic, based on regional minima (morphological image analysis). Additional investigation was undertaken to refine the methods. This paper describes the automatic methods and presents the results derived from the tests.

  6. New stopping rules for dendrogram classification in TWINSPAN

    Directory of Open Access Journals (Sweden)

    Omid Esmailzadeh

    2015-12-01

    The aim of this study is to propose a modification of the TWINSPAN algorithm that introduces new stopping rules. Modified TWINSPAN analyzes the heterogeneity of the clusters prior to each division, which prevents forced divisions of homogeneous clusters; it also removes the limitation of classical TWINSPAN that the number of clusters increases as a power of two. For this purpose, ecological groups of Box tree stands in the Farim forests were classified using classical and modified TWINSPAN, based on the percentage cover of plant species in 60 plots of 400 m² each, which were laid out by the relevé method (following the indicator stand concept). Five heterogeneity measures were used: Whittaker's beta diversity and total inertia as diversity indices, and the Sorensen, Jaccard and Orlóci dissimilarity indices as distance indices. Sample plots were also classified by their topographical properties using cluster analysis with the Euclidean distance coefficient and Ward's clustering method. Results showed that the two sets of heterogeneity indices led to different classification dendrograms: Whittaker's beta and total inertia gave similar results, and the three dissimilarity indices behaved similarly to one another. Finally, our results confirmed that modified TWINSPAN did not alter the logic of the TWINSPAN classification, but it increased the flexibility of the TWINSPAN dendrogram by changing the hierarchy of divisions in the final classification of ecological groups of Box tree stands in the Farim forests.
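
    Two of the dissimilarity indices named above have simple set-based definitions, sketched below in Python. This is a simplified presence/absence version written for illustration: the study worked with species cover percentages, and the function names and species lists here are hypothetical.

```python
def jaccard_dissimilarity(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two non-empty species lists."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

def sorensen_dissimilarity(a, b):
    """1 - 2|A ∩ B| / (|A| + |B|) for two non-empty species lists."""
    a, b = set(a), set(b)
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))

# Hypothetical species lists for two plots (placeholder names)
plot_1 = ["sp_a", "sp_b", "sp_c"]
plot_2 = ["sp_a", "sp_c", "sp_d", "sp_e"]

d_jaccard = jaccard_dissimilarity(plot_1, plot_2)    # 2 shared of 5 total
d_sorensen = sorensen_dissimilarity(plot_1, plot_2)  # 2 shared, sizes 3 and 4
```

    A stopping rule of the kind described could then compare the average pairwise dissimilarity within a cluster against a threshold before dividing it; the threshold and the averaging scheme would be further assumptions.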

  7. Evaluation of Decision Trees for Cloud Detection from AVHRR Data

    Science.gov (United States)

    Shiffman, Smadar; Nemani, Ramakrishna

    2005-01-01

    Automated cloud detection and tracking is an important step in assessing changes in radiation budgets associated with global climate change via remote sensing. Data products based on satellite imagery are available to the scientific community for studying trends in the Earth's atmosphere. The data products include pixel-based cloud masks that assign cloud-cover classifications to pixels. Many cloud-mask algorithms have the form of decision trees. The decision trees employ sequential tests that scientists designed based on empirical astrophysics studies and simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In a previous study we compared automatically learned decision trees to cloud masks included in Advanced Very High Resolution Radiometer (AVHRR) data products from the year 2000. In this paper we report the replication of the study for five-year data, and for a gold standard based on surface observations performed by scientists at weather stations in the British Islands. For our sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks (p < 0.001).

  8. Land-cover classification in a moist tropical region of Brazil with Landsat TM imagery.

    Science.gov (United States)

    Li, Guiying; Lu, Dengsheng; Moran, Emilio; Hetrick, Scott

    2011-01-01

    This research aims to improve land-cover classification accuracy in a moist tropical region in Brazil by examining the use of different remote sensing-derived variables and classification algorithms. Different scenarios based on Landsat Thematic Mapper (TM) spectral data and derived vegetation indices and textural images, and different classification algorithms - maximum likelihood classification (MLC), artificial neural network (ANN), classification tree analysis (CTA), and object-based classification (OBC), were explored. The results indicated that a combination of vegetation indices as extra bands into Landsat TM multispectral bands did not improve the overall classification performance, but the combination of textural images was valuable for improving vegetation classification accuracy. In particular, the combination of both vegetation indices and textural images into TM multispectral bands improved overall classification accuracy by 5.6% and kappa coefficient by 6.25%. Comparison of the different classification algorithms indicated that CTA and ANN have poor classification performance in this research, but OBC improved primary forest and pasture classification accuracies. This research indicates that use of textural images or use of OBC are especially valuable for improving the vegetation classes such as upland and liana forest classes having complex stand structures and having relatively large patch sizes.

  9. Hierarchically structured identification and classification method for vibrational monitoring of reactor components

    International Nuclear Information System (INIS)

    Saedtler, E.

    1981-01-01

    The dissertation discusses: 1. Approximative filter algorithms for identification of systems and hierarchical structures. 2. Adaptive statistical pattern recognition and classification. 3. Parameter selection, extraction, and modelling for an automatic control system. 4. Design of a decision tree and an adaptive diagnostic system. (orig./RW) [de]

  10. Monitoring Urban Tree Cover Using Object-Based Image Analysis and Public Domain Remotely Sensed Data

    Directory of Open Access Journals (Sweden)

    Meghan Halabisky

    2011-10-01

    Urban forest ecosystems provide a range of social and ecological services, but due to the heterogeneity of these canopies their spatial extent is difficult to quantify and monitor. Traditional per-pixel classification methods have been used to map urban canopies; however, such techniques are not generally appropriate for assessing these highly variable landscapes. Landsat imagery has historically been used for per-pixel driven land use/land cover (LULC) classifications, but the spatial resolution limits our ability to map small urban features. In such cases, hyperspatial resolution imagery such as aerial or satellite imagery with a resolution of 1 meter or below is preferred. Object-based image analysis (OBIA) allows for use of additional variables such as texture, shape, context, and other cognitive information provided by the image analyst to segment and classify image features, and thus improve classifications. As part of this research we created LULC classifications for a pilot study area in Seattle, WA, USA, using OBIA techniques and freely available public aerial photography. We analyzed the differences in accuracies which can be achieved with OBIA using multispectral and true-color imagery. We also compared our results to a satellite based OBIA LULC and discussed the implications of per-pixel driven vs. OBIA-driven field sampling campaigns. We demonstrated that the OBIA approach can generate good and repeatable LULC classifications suitable for tree cover assessment in urban areas. Another important finding is that spectral content appeared to be more important than spatial detail of hyperspatial data when it comes to an OBIA-driven LULC.

  11. Land use/cover classification in the Brazilian Amazon using satellite images.

    Science.gov (United States)

    Lu, Dengsheng; Batistella, Mateus; Li, Guiying; Moran, Emilio; Hetrick, Scott; Freitas, Corina da Costa; Dutra, Luciano Vieira; Sant'anna, Sidnei João Siqueira

    2012-09-01

    Land use/cover classification is one of the most important applications in remote sensing. However, mapping accurate land use/cover spatial distribution is a challenge, particularly in moist tropical regions, due to the complex biophysical environment and limitations of remote sensing data per se. This paper reviews experiments related to land use/cover classification in the Brazilian Amazon for a decade. Through comprehensive analysis of the classification results, it is concluded that spatial information inherent in remote sensing data plays an essential role in improving land use/cover classification. Incorporation of suitable textural images into multispectral bands and use of segmentation-based methods are valuable ways to improve land use/cover classification, especially for high spatial resolution images. Data fusion of multi-resolution images within optical sensor data is vital for visual interpretation, but may not improve classification performance. In contrast, integration of optical and radar data did improve classification performance when the proper data fusion method was used. Of the classification algorithms available, the maximum likelihood classifier is still an important method for providing reasonably good accuracy, but nonparametric algorithms, such as classification tree analysis, have the potential to provide better results. However, they often require more time to achieve parametric optimization. Proper use of hierarchical-based methods is fundamental for developing accurate land use/cover classification, mainly from historical remotely sensed data.

  12. Effects of elevated carbon dioxide and nitrogen addition on foliar stoichiometry of nitrogen and phosphorus of five tree species in subtropical model forest ecosystems

    International Nuclear Information System (INIS)

    Huang Wenjuan; Zhou Guoyi; Liu Juxiu; Zhang Deqiang; Xu Zhihong; Liu Shizhong

    2012-01-01

    The effects of elevated carbon dioxide (CO₂) and nitrogen (N) addition on foliar N and phosphorus (P) stoichiometry were investigated in five native tree species (four non-N₂ fixers and one N₂ fixer) in open-top chambers in southern China from 2005 to 2009. The high foliar N:P ratios induced by high foliar N and low foliar P indicate that plants may be more limited by P than by N. The changes in foliar N:P ratios were largely determined by P dynamics rather than N under both elevated CO₂ and N addition. Foliar N:P ratios in the non-N₂ fixers showed some negative responses to elevated CO₂, while N addition reduced foliar N:P ratios in the N₂ fixer. The results suggest that N addition would facilitate the N₂ fixer rather than the non-N₂ fixers to regulate the stoichiometric balance under elevated CO₂. Highlights: Five native tree species in southern China were more limited by P than by N. Shifts in foliar N:P ratios were driven by P dynamics under global change. N addition lowered foliar N:P ratios in the N₂ fixer under elevated CO₂. Summary: N addition could facilitate the N₂ fixer rather than the non-N₂ fixers to regulate foliar N and P stoichiometry under elevated CO₂ in subtropical forests.

  13. Automated Diatom Classification (Part A): Handcrafted Feature Approaches

    Directory of Open Access Journals (Sweden)

    Gloria Bueno

    2017-07-01

    This paper deals with automatic taxa identification based on machine learning methods. The aim is therefore to automatically classify diatoms, in terms of pattern recognition terminology. Diatoms are a kind of algae microorganism with high biodiversity at the species level, which are useful for water quality assessment. The most relevant features for diatom description and classification have been selected using an extensive dataset of 80 taxa with a minimum of 100 samples/taxon augmented to 300 samples/taxon. In addition to published morphological, statistical and textural descriptors, a new textural descriptor, Local Binary Patterns (LBP), to characterize the diatom's valves, and a log Gabor implementation not tested before for this purpose are introduced in this paper. Results show an overall accuracy of 98.11% using bagging decision trees and combinations of descriptors. Finally, some phycological features of diatoms that are still difficult to integrate in computer systems are discussed for future work.
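
    For readers unfamiliar with Local Binary Patterns, the basic 3x3 variant can be sketched in a few lines of Python. This is the textbook form of the operator, given here as an assumption: the abstract does not say which LBP variant, radius or neighbour ordering the authors used, and the function names and the sample patch are hypothetical.

```python
def lbp_codes(img):
    """Basic 3x3 LBP codes for the interior pixels of a 2-D list of grey levels."""
    # Neighbour offsets in clockwise order, starting at the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(img), len(img[0])
    codes = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            centre = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre
                if img[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes

def lbp_histogram(img):
    """256-bin histogram of LBP codes, usable as a texture feature vector."""
    hist = [0] * 256
    for row in lbp_codes(img):
        for code in row:
            hist[code] += 1
    return hist

# Hypothetical 3x3 grey-level patch; neighbours 9, 7 and 8 exceed the centre 6
patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 8, 3]]
codes = lbp_codes(patch)  # bits 1, 3 and 5 set: 2 + 8 + 32 = 42
```

    The histogram, rather than the raw codes, is what would typically be passed to a classifier such as the bagged decision trees mentioned in the abstract.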

  14. A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery Using a Probabilistic Learning Framework

    Science.gov (United States)

    Basu, Saikat; Ganguly, Sangram; Michaelis, Andrew; Votava, Petr; Roy, Anshuman; Mukhopadhyay, Supratik; Nemani, Ramakrishna

    2015-01-01

    Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets, which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.

  15. A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery using a Probabilistic Learning Framework

    Science.gov (United States)

    Basu, S.; Ganguly, S.; Michaelis, A.; Votava, P.; Roy, A.; Mukhopadhyay, S.; Nemani, R. R.

    2015-12-01

    Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.

  16. Use of UAV-Borne Spectrometer for Land Cover Classification

    Directory of Open Access Journals (Sweden)

    Sowmya Natesan

    2018-04-01

    Unmanned aerial vehicles (UAVs) are being used for low altitude remote sensing for thematic land classification using visible light and multi-spectral sensors. The objective of this work was to investigate the use of a UAV equipped with a compact spectrometer for land cover classification. The UAV platform used was a DJI Flamewheel F550 hexacopter equipped with GPS and Inertial Measurement Unit (IMU) navigation sensors, and a Raspberry Pi processor and camera module. The spectrometer used was the FLAME-NIR, a near-infrared spectrometer for hyperspectral measurements. RGB images and spectrometer data were captured simultaneously. As spectrometer data do not provide continuous terrain coverage, the locations of their ground elliptical footprints were determined from the bundle adjustment solution of the captured images. For each of the spectrometer ground ellipses, the land cover signature at the footprint location was determined to enable the characterization, identification, and classification of land cover elements. To attain a continuous land cover classification map, spatial interpolation was carried out from the irregularly distributed labeled spectrometer points. The accuracy of the classification was assessed using spatial intersection with the object-based image classification performed using the RGB images. Results show that in homogeneous land cover, like water, the accuracy of classification is 78%, and in mixed classes, like grass, trees and manmade features, the average accuracy is 50%, thus indicating the contribution of hyperspectral measurements of low altitude UAV-borne spectrometers to improving land cover classification.

  17. Comparing the performance of flat and hierarchical Habitat/Land-Cover classification models in a NATURA 2000 site

    Science.gov (United States)

    Gavish, Yoni; O'Connell, Jerome; Marsh, Charles J.; Tarantino, Cristina; Blonda, Palma; Tomaselli, Valeria; Kunin, William E.

    2018-02-01

    The increasing need for high quality Habitat/Land-Cover (H/LC) maps has triggered considerable research into novel machine-learning based classification models. In many cases, H/LC classes follow pre-defined hierarchical classification schemes (e.g., CORINE), in which fine H/LC categories are thematically nested within more general categories. However, none of the existing machine-learning algorithms account for this pre-defined hierarchical structure. Here we introduce a novel Random Forest (RF) based application of hierarchical classification, which fits a separate local classification model at every branching point of the thematic tree, and then integrates all the different local models into a single global prediction. We applied the hierarchical RF approach in a NATURA 2000 site in Italy, using two land-cover (CORINE, FAO-LCCS) and one habitat classification scheme (EUNIS) that differ from one another in the shape of the class hierarchy. For all 3 classification schemes, both the hierarchical model and a flat model alternative provided accurate predictions, with kappa values mostly above 0.9 (despite using only 2.2-3.2% of the study area as training cells). The flat approach slightly outperformed the hierarchical models when the hierarchy was relatively simple, while the hierarchical model worked better under more complex thematic hierarchies. Most misclassifications came from habitat pairs that are thematically distant yet spectrally similar. In 2 out of 3 classification schemes, the additional constraints of the hierarchical model resulted in fewer such serious misclassifications relative to the flat model. The hierarchical model also provided valuable information on variable importance, which can shed light on "black-box" machine learning algorithms like RF. We suggest various ways by which hierarchical classification models can increase the accuracy and interpretability of H/LC classification maps.
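
    One way to picture "a local model at every branching point, integrated into a single global prediction" is the following Python sketch. The integration rule used here (multiplying local class probabilities along each root-to-leaf path) is an assumption made for illustration; the abstract does not state how the authors combine the local models. The class tree, the names and the probability values are likewise hypothetical, with fixed numbers standing in for the outputs of local Random Forests.

```python
# Hypothetical thematic tree: parent -> children (leaves have no entry)
HIERARCHY = {
    "root": ["forest", "non-forest"],
    "forest": ["broadleaf", "conifer"],
    "non-forest": ["grassland", "water"],
}

def leaf_probabilities(local_probs, node="root", p=1.0):
    """Multiply local probabilities down every root-to-leaf path."""
    children = HIERARCHY.get(node)
    if not children:
        return {node: p}
    out = {}
    for child in children:
        out.update(leaf_probabilities(local_probs, child, p * local_probs[node][child]))
    return out

def predict_leaf(local_probs):
    """Global prediction: the leaf class with the highest path probability."""
    probs = leaf_probabilities(local_probs)
    return max(probs, key=probs.get)

# Stand-in outputs of the local classifiers for one pixel (assumed values)
local_probs = {
    "root": {"forest": 0.6, "non-forest": 0.4},
    "forest": {"broadleaf": 0.5, "conifer": 0.5},
    "non-forest": {"grassland": 0.1, "water": 0.9},
}
probs = leaf_probabilities(local_probs)
```

    In this toy case the product rule selects "water" (0.4 x 0.9 = 0.36) even though the top-level model favours "forest", which shows how a global integration can differ from a greedy top-down walk through the hierarchy.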

  18. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    Directory of Open Access Journals (Sweden)

    Santana Isabel

    2011-08-01

    Full Text Available Abstract Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results: Press' Q test showed that all classifiers performed better than chance alone (p Conclusions: When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing.
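
    For orientation, the evaluation measures named above can be computed from a two-class confusion matrix as in the sketch below. The counts are hypothetical and are not taken from the study; Press' Q is written in its commonly cited form Q = (N - nK)^2 / (N(K - 1)), with N the number of cases, n the number correctly classified, and K the number of groups, and it is compared against a chi-square critical value with one degree of freedom.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def press_q(n_total, n_correct, n_groups):
    """Press' Q statistic: (N - n*K)^2 / (N*(K - 1))."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fn=10, fp=5, tn=45)
q = press_q(n_total=100, n_correct=85, n_groups=2)

# 6.635 is the chi-square critical value for 1 degree of freedom at alpha = 0.01.
better_than_chance = q > 6.635
```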

  19. Land Cover Mapping using GEOBIA to Estimate Loss of Salacca zalacca Trees in Landslide Area of Clapar, Madukara District of Banjarnegara

    Science.gov (United States)

    Permata, Anggi; Juniansah, Anwar; Nurcahyati, Eka; Dimas Afrizal, Mousafi; Adnan Shafry Untoro, Muhammad; Arifatha, Na'ima; Ramadhani Yudha Adiwijaya, Raden; Farda, Nur Mohammad

    2016-11-01

    Landslides are unpredictable natural disasters that commonly occur in high-slope areas. Small-format aerial photography is an acquisition method that can obtain high-resolution spatial data faster than other methods and provides products such as orthomosaics and Digital Surface Models (DSM). The study area contained a landslide area in Clapar, Madukara District of Banjarnegara. Aerial photographs of the landslide area offered an advantage in object visibility. Object characteristics such as shape, size, and texture were clearly visible; therefore GEOBIA (Geographic Object-Based Image Analysis) was suitable as a method for classifying land cover in the study area. Unlike the PPA (PerPixel Analyst) method, which uses spectral information as the basis for object detection, GEOBIA can use spatial elements as a classification basis to establish a land cover map with better accuracy. The GEOBIA method used a classification hierarchy to divide post-disaster land cover into three main objects: vegetation, landslide/soil, and building. These three were required to obtain more detailed information that can be used in estimating the loss caused by the landslide and in establishing a land cover map of the landslide area. Loss estimation in the landslide area concerned damage to Salak (Salacca zalacca) plantations. The number of Salak trees swept away by the landslide was estimated under the assumption that every damaged tree had the same age and production class as the trees that were not damaged. The loss was calculated by approximating the number of damaged trees in the landslide area from data on trees in the surrounding area, acquired with the GEOBIA classification method.

  20. Coalescent methods for estimating phylogenetic trees.

    Science.gov (United States)

    Liu, Liang; Yu, Lili; Kubatko, Laura; Pearl, Dennis K; Edwards, Scott V

    2009-10-01

    We review recent models to estimate phylogenetic trees under the multispecies coalescent. Although the distinction between gene trees and species trees has come to the fore of phylogenetics, only recently have methods been developed that explicitly estimate species trees. Of the several factors that can cause gene tree heterogeneity and discordance with the species tree, deep coalescence due to random genetic drift in branches of the species tree has been modeled most thoroughly. Bayesian approaches to estimating species trees utilize two likelihood functions, one of which has been widely used in traditional phylogenetics and involves the model of nucleotide substitution, and the second of which is less familiar to phylogeneticists and involves the probability distribution of gene trees given a species tree. Other recent parametric and nonparametric methods for estimating species trees involve parsimony criteria, summary statistics, supertree and consensus methods. Species tree approaches are an appropriate goal for systematics, appear to work well in some cases where concatenation can be misleading, and suggest that sampling many independent loci will be paramount. Such methods can also be challenging to implement because of the complexity of the models and computational time. In addition, further elaboration of the simplest of coalescent models will be required to incorporate commonly known issues such as deviation from the molecular clock, gene flow and other genetic forces.

  1. Large-scale gene function analysis with the PANTHER classification system.

    Science.gov (United States)

    Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D

    2013-08-01

    The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.

  2. Chronic subdural hematoma: Surgical management and outcome in 986 cases: A classification and regression tree approach

    Science.gov (United States)

    Rovlias, Aristedis; Theodoropoulos, Spyridon; Papoutsakis, Dimitrios

    2015-01-01

    Background: Chronic subdural hematoma (CSDH) is one of the most common clinical entities in daily neurosurgical practice which carries a most favorable prognosis. However, because of the advanced age and medical problems of patients, surgical therapy is frequently associated with various complications. This study evaluated the clinical features, radiological findings, and neurological outcome in a large series of patients with CSDH. Methods: A classification and regression tree (CART) technique was employed in the analysis of data from 986 patients who were operated on at Asclepeion General Hospital of Athens from January 1986 to December 2011. Burr hole evacuation with closed system drainage has been the operative technique of first choice at our institution for 29 consecutive years. A total of 27 prognostic factors were examined to predict the outcome at 3 months postoperatively. Results: Our results indicated that neurological status on admission was the best predictor of outcome. With regard to the other data, age, brain atrophy, thickness and density of hematoma, subdural accumulation of air, and antiplatelet and anticoagulant therapy were found to correlate significantly with prognosis. The overall cross-validated predictive accuracy of the CART model was 85.34%, with a cross-validated relative error of 0.326. Conclusions: Methodologically, the CART technique is quite different from the more commonly used methods, with the primary benefit of illustrating the important prognostic variables as related to outcome. Since the ideal therapy for the treatment of CSDH is still under debate, this technique may prove useful in developing new therapeutic strategies and approaches for patients with CSDH. PMID:26257985

  3. The hydrological vulnerability of western North American boreal tree species based on ground-based observations of tree mortality

    Science.gov (United States)

    Hember, R. A.; Kurz, W. A.; Coops, N. C.

    2017-12-01

    Several studies indicate that climate change has increased rates of tree mortality, adversely affecting timber supply and carbon storage in western North American boreal forests. Statistical models of tree mortality can play a complementary role in detecting and diagnosing forest change. Yet, such models struggle to address real-world complexity, including expectations that hydrological vulnerability arises from both drought stress and excess-water stress, and that these effects vary by species, tree size, and competitive status. Here, we describe models that predict annual probability of tree mortality (Pm) of common boreal tree species based on tree height (H), biomass of larger trees (BLT), soil water content (W), reference evapotranspiration (E), and two-way interactions. We show that interactions among H and hydrological variables are consistently significant. Vulnerability to extreme droughts consistently increases as H approaches maximum observed values of each species, while some species additionally show increasing vulnerability at low H. Some species additionally show increasing vulnerability to low W under high BLT, or increasing drought vulnerability under low BLT. These results suggest that vulnerability of trees to increasingly severe droughts depends on the hydraulic efficiency, competitive status, and microclimate of individual trees. Static simulations of Pm across a 1-km grid (i.e., with time-independent inputs of H, BLT, and species composition) indicate complex spatial patterns in the time trends during 1965-2014 and a mean change in Pm of 42 %. Lastly, we discuss how the size-dependence of hydrological vulnerability, in concert with increasingly severe drought events, may shape future responses of stand-level biomass production to continued warming and increasing carbon dioxide concentration in the region.

  4. Exploratory Use of Decision Tree Analysis in Classification of Outcome in Hypoxic–Ischemic Brain Injury

    Directory of Open Access Journals (Sweden)

    Thanh G. Phan

    2018-03-01

    Full Text Available Background: Prognostication following hypoxic ischemic encephalopathy (brain injury) is important for clinical management. The aim of this exploratory study is to use a decision tree model to find clinical and MRI associates of severe disability and death in this condition. We evaluate clinical model and then the added value of MRI data. Method: The inclusion criteria were as follows: age ≥17 years, cardio-respiratory arrest, and coma on admission (2003–2011). Decision tree analysis was used to find clinical [Glasgow Coma Score (GCS), features about cardiac arrest, therapeutic hypothermia, age, and sex] and MRI (infarct volume) associates of severe disability and death. We used the area under the ROC (auROC) to determine accuracy of model. There were 41 (63.7% males) patients having MRI imaging with the average age 51.5 ± 18.9 years old. The decision trees showed that infarct volume and age were important factors for discrimination between mild to moderate disability and severe disability and death at day 0 and day 2. The auROC for this model was 0.94 (95% CI 0.82–1.00). At day 7, GCS value was the only predictor; the auROC was 0.96 (95% CI 0.86–1.00). Conclusion: Our findings provide proof of concept for further exploration of the role of MR imaging and decision tree analysis in the early prognostication of hypoxic ischemic brain injury.

  5. Exploratory Use of Decision Tree Analysis in Classification of Outcome in Hypoxic-Ischemic Brain Injury.

    Science.gov (United States)

    Phan, Thanh G; Chen, Jian; Singhal, Shaloo; Ma, Henry; Clissold, Benjamin B; Ly, John; Beare, Richard

    2018-01-01

    Prognostication following hypoxic ischemic encephalopathy (brain injury) is important for clinical management. The aim of this exploratory study is to use a decision tree model to find clinical and MRI associates of severe disability and death in this condition. We evaluate clinical model and then the added value of MRI data. The inclusion criteria were as follows: age ≥17 years, cardio-respiratory arrest, and coma on admission (2003-2011). Decision tree analysis was used to find clinical [Glasgow Coma Score (GCS), features about cardiac arrest, therapeutic hypothermia, age, and sex] and MRI (infarct volume) associates of severe disability and death. We used the area under the ROC (auROC) to determine accuracy of model. There were 41 (63.7% males) patients having MRI imaging with the average age 51.5 ± 18.9 years old. The decision trees showed that infarct volume and age were important factors for discrimination between mild to moderate disability and severe disability and death at day 0 and day 2. The auROC for this model was 0.94 (95% CI 0.82-1.00). At day 7, GCS value was the only predictor; the auROC was 0.96 (95% CI 0.86-1.00). Our findings provide proof of concept for further exploration of the role of MR imaging and decision tree analysis in the early prognostication of hypoxic ischemic brain injury.
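
    As a reminder of what the auROC figure measures, the sketch below computes it as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half (the Mann-Whitney formulation). The scores and labels are hypothetical and unrelated to the study data; the function name is an assumption for illustration.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and outcomes for six patients.
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = auroc(example_scores, example_labels)
```

    The pairwise form is quadratic in the number of cases, which is acceptable for small clinical cohorts; a rank-based implementation would be preferable for large data sets.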

  6. Audio stream classification for multimedia database search

    Science.gov (United States)

    Artese, M.; Bianco, S.; Gagliardi, I.; Gasparini, F.

    2013-03-01

    Search and retrieval of huge archives of Multimedia data is a challenging task. A classification step is often used to reduce the number of entries on which to perform the subsequent search. In particular, when new entries of the database are continuously added, a fast classification based on simple threshold evaluation is desirable. In this work we present a CART-based (Classification And Regression Tree [1]) classification framework for audio streams belonging to multimedia databases. The database considered is the Archive of Ethnography and Social History (AESS) [2], which is mainly composed of popular songs and other audio records describing the popular traditions handed down generation by generation, such as traditional fairs and customs. The peculiarities of this database are that it is continuously updated; the audio recordings are acquired in an unconstrained environment; and it is difficult for a non-expert human user to create the ground truth labels. In our experiments, half of all the available audio files have been randomly extracted and used as the training set. The remaining ones have been used as the test set. The classifier has been trained to distinguish among three different classes: speech, music, and song. All the audio files in the dataset have been previously manually labeled by domain experts into the three classes defined above.

  7. Modeling Wood Fibre Length in Black Spruce (Picea mariana (Mill.) BSP) Based on Ecological Land Classification

    Directory of Open Access Journals (Sweden)

    Elisha Townshend

    2015-09-01

    Full Text Available Effective planning to optimize the forest value chain requires accurate and detailed information about the resource; however, estimates of the distribution of fibre properties on the landscape are largely unavailable prior to harvest. Our objective was to fit a model of the tree-level average fibre length related to ecosite classification and other forest inventory variables depicted at the landscape scale. A series of black spruce increment cores were collected at breast height from trees in nine different ecosite groups within the boreal forest of northeastern Ontario, and processed using standard techniques for maceration and fibre length measurement. Regression tree analysis and random forests were used to fit hierarchical classification models and find the most important predictor variables for the response variable area-weighted mean stem-level fibre length. Ecosite group was the best predictor in the regression tree. Longer mean fibre-length was associated with more productive ecosites that supported faster growth. The explanatory power of the model of fitted data was good; however, random forests simulations indicated poor generalizability. These results suggest the potential to develop localized models linking wood fibre length in black spruce to landscape-level attributes, and improve the sustainability of forest management by identifying ideal locations to harvest wood that has desirable fibre characteristics.

  8. Classification of Several Optically Complex Waters in China Using in Situ Remote Sensing Reflectance

    Directory of Open Access Journals (Sweden)

    Qian Shen

    2015-11-01

    Full Text Available Determining the dominant optically active substances in water bodies via classification can improve the accuracy of bio-optical and water quality parameters estimated by remote sensing. This study provides four robust centroid sets from in situ remote sensing reflectance (Rrs(λ)) data presenting typical optical types obtained by plugging different similarity measures into fuzzy c-means (FCM) clustering. Four typical types of waters were studied: (1) highly mixed eutrophic waters, with the proportion of absorption of colored dissolved organic matter (CDOM), phytoplankton, and non-living particulate matter at approximately 20%, 20%, and 60% respectively; (2) CDOM-dominated relatively clear waters, with approximately 45% by proportion of CDOM absorption; (3) nonliving solids-dominated waters, with approximately 88% by proportion of absorption of nonliving particulate matter; and (4) cyanobacteria-composed scum. We also simulated spectra from seven ocean color satellite sensors to assess their classification ability. POLarization and Directionality of the Earth's Reflectances (POLDER), Sentinel-2A, and MEdium Resolution Imaging Spectrometer (MERIS) were found to perform better than the rest. Further, a classification tree for MERIS, in which the characteristics of Rrs(709)/Rrs(681), Rrs(560)/Rrs(709), Rrs(560)/Rrs(620), and Rrs(709)/Rrs(761) are integrated, is also proposed in this paper. The overall accuracy and Kappa coefficient of the proposed classification tree are 76.2% and 0.632, respectively.
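
    The two summary figures reported for the classification tree, overall accuracy and the Kappa coefficient, are both derived from a confusion matrix. The sketch below shows the standard calculation (Cohen's kappa: observed agreement corrected for the agreement expected by chance). The 3 x 3 matrix is hypothetical and does not reproduce the study's results.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of samples of true class i assigned to class j.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: product of marginal proportions, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical confusion matrix for three water types.
CONFUSION = [
    [20, 3, 2],
    [4, 25, 1],
    [1, 2, 22],
]
accuracy, kappa = overall_accuracy_and_kappa(CONFUSION)
```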

  9. Stacked Denoising Autoencoders Applied to Star/Galaxy Classification

    Science.gov (United States)

    Qin, Hao-ran; Lin, Ji-ming; Wang, Jun-yi

    2017-04-01

    In recent years, the deep learning algorithm, with the characteristics of strong adaptability, high accuracy, and structural complexity, has become more and more popular, but it has not yet been used in astronomy. In order to solve the problem that the star/galaxy classification accuracy is high for the bright source set, but low for the faint source set of the Sloan Digital Sky Survey (SDSS) data, we introduced a new deep learning algorithm, namely the SDA (stacked denoising autoencoder) neural network with the dropout fine-tuning technique, which can greatly improve the robustness and anti-noise performance. We randomly selected bright source sets and faint source sets from the SDSS DR12 and DR7 data with spectroscopic measurements, and preprocessed them. Then, we randomly selected training sets and testing sets, without replacement, from the bright source sets and faint source sets. Finally, we used these training sets to train the SDA models of the bright sources and faint sources in the SDSS DR7 and DR12, respectively. We compared the test result of the SDA model on the DR12 testing set with the test results of the Library for Support Vector Machines (LibSVM), J48 decision tree, Logistic Model Tree (LMT), Support Vector Machine (SVM), Logistic Regression, and Decision Stump algorithm, and compared the test result of the SDA model on the DR7 testing set with the test results of six kinds of decision trees. The experiments show that the SDA has a better classification accuracy than other machine learning algorithms for the faint source sets of DR7 and DR12. In particular, when the completeness function is used as the evaluation index, the correctness rate of the SDA is about 15% higher than that of the decision tree algorithms for the faint source set of SDSS-DR7.

  10. Molecular classification of pesticides including persistent organic pollutants, phenylurea and sulphonylurea herbicides.

    Science.gov (United States)

    Torrens, Francisco; Castellano, Gloria

    2014-06-05

    Pesticide residues in wine were analyzed by liquid chromatography-tandem mass spectrometry. Retentions are modelled by structure-property relationships. Bioplastic evolution is an evolutionary perspective conjugating effect of acquired characters and evolutionary indeterminacy-morphological determination-natural selection principles; its application to design co-ordination index barely improves correlations. Fractal dimensions and partition coefficient differentiate pesticides. Classification algorithms are based on information entropy and its production. Pesticides allow a structural classification by nonplanarity, and number of O, S, N and Cl atoms and cycles; different behaviours depend on number of cycles. The novelty of the approach is that the structural parameters are related to retentions. Classification algorithms are based on information entropy. When applying procedures to moderate-sized sets, excessive results appear compatible with data suffering a combinatorial explosion. However, equipartition conjecture selects criterion resulting from classification between hierarchical trees. Information entropy permits classifying compounds agreeing with principal component analyses. Periodic classification shows that pesticides in the same group present similar properties; those also in equal period, maximum resemblance. The advantage of the classification is to predict the retentions for molecules not included in the categorization. Classification extends to phenyl/sulphonylureas and the application will be to predict their retentions.

  11. Components of Antagonism and Mutualism in Ips pini–Fungal Interactions: Relationship to a Life History of Colonizing Highly Stressed and Dead Trees

    Science.gov (United States)

    Brian J. Kopper; Kier D. Klepzig; Kenneth F. Raffa

    2004-01-01

    Efforts to describe the complex relationships between bark beetles and the ophiostomatoid (stain) fungi they transport have largely resulted in a dichotomous classification. These symbioses have been viewed as either mutualistic (i.e., fungi help bark beetles colonize living trees by overcoming tree defenses or by providing nutrients after colonization in return for...

  12. Comparison of models of automatic classification of textural patterns of mineral presents in Colombian coals

    International Nuclear Information System (INIS)

    Lopez Carvajal, Jaime; Branch Bedoya, John Willian

    2005-01-01

    The automatic classification of objects is a very interesting approach under several problem domains. This paper outlines some results obtained under different classification models to categorize textural patterns of minerals using real digital images. The data set used was characterized by a small size and noise presence. The implemented models were the Bayesian classifier, Neural Network (2-5-1), support vector machine, decision tree and 3-nearest neighbors. The results after applying cross-validation show that the Bayesian model (84%) showed better predictive capacity than the others, mainly due to its robustness to noise. The neural network (68%) and the SVM (67%) gave promising results, because they could be improved by increasing the amount of data used, while the decision tree (55%) and K-NN (54%) did not seem to be adequate for this problem, because of their sensitivity to noise.

  13. Million trees Los Angeles canopy cover and benefit assessment

    Science.gov (United States)

    E.G. McPherson; J.R. Simpson; Q. Xiao; C. Wu

    2011-01-01

    The Million Trees LA initiative intends to improve Los Angeles’s environment through planting and stewardship of 1 million trees. The purpose of this study was to measure Los Angeles’s existing tree canopy cover (TCC), determine if space exists for 1 million additional trees, and estimate future benefits from the planting. High-resolution QuickBird remote sensing data...

  14. Geospatial relationships of tree species damage caused by Hurricane Katrina in south Mississippi

    Science.gov (United States)

    Mark W. Garrigues; Zhaofei Fan; David L. Evans; Scott D. Roberts; William H. Cooke III

    2012-01-01

    Hurricane Katrina generated substantial impacts on the forests and biological resources of the affected area in Mississippi. This study seeks to use classification tree analysis (CTA) to determine which variables are significant in predicting hurricane damage (shear or windthrow) in the Southeast Mississippi Institute for Forest Inventory District. Logistic regressions...

  15. Handling Imbalanced Data Sets in Multistage Classification

    Science.gov (United States)

    López, M.

    Multistage classification is a logical approach, based on a divide-and-conquer solution, for dealing with problems with a high number of classes. The classification problem is divided into several sequential steps, each one associated with a single classifier that works with subgroups of the original classes. In each level, the current set of classes is split into smaller subgroups of classes until they (the subgroups) are composed of only one class. The resulting chain of classifiers can be represented as a tree, which (1) simplifies the classification process by using fewer categories in each classifier and (2) makes it possible to combine several algorithms or use different attributes in each stage. Most of the classification algorithms can be biased in the sense of selecting the most populated class in overlapping areas of the input space. This can degrade a multistage classifier's performance if the training set sample frequencies do not reflect the real prevalence in the population. Several techniques such as applying prior probabilities, assigning weights to the classes, or replicating instances have been developed to overcome this handicap. Most of them are designed for two-class (accept-reject) problems. In this article, we evaluate several of these techniques as applied to multistage classification and analyze how they can be useful for astronomy. We compare the results obtained by classifying a data set based on Hipparcos with and without these methods.
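
    One of the corrections mentioned above, applying prior probabilities, can be illustrated with the usual Bayes-rule rescaling: posteriors produced by a classifier trained on one class mix are multiplied by the ratio of real to training priors and renormalised. This is a generic sketch, not necessarily the exact procedure evaluated in the article; the class names, priors, and function name are hypothetical.

```python
def reweight_posteriors(posteriors, train_priors, true_priors):
    """Rescale class posteriors from training-set priors to population priors."""
    adjusted = {
        c: p * true_priors[c] / train_priors[c] for c, p in posteriors.items()
    }
    total = sum(adjusted.values())
    return {c: v / total for c, v in adjusted.items()}


# Hypothetical stage of a multistage classifier: the training set was balanced,
# but class "A" is rare in the real population.
raw = {"A": 0.6, "B": 0.4}
corrected = reweight_posteriors(
    raw,
    train_priors={"A": 0.5, "B": 0.5},
    true_priors={"A": 0.1, "B": 0.9},
)
decision = max(corrected, key=corrected.get)
```

    In a multistage setting the same rescaling would be applied independently at each node of the classifier tree, using the priors of the class subgroups handled at that node.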

  16. Seasonal trends in separability of leaf reflectance spectra for Ailanthus altissima and four other tree species

    Science.gov (United States)

    Burkholder, Aaron

    This project investigated the spectral separability of the invasive species Ailanthus altissima, commonly called tree of heaven, and four other native species. Leaves were collected from Ailanthus and four native tree species from May 13 through August 24, 2008, and spectral reflectance factor measurements were gathered for each tree using an ASD (Boulder, Colorado) FieldSpec Pro full-range spectroradiometer. The original data covered the range from 350-2500 nm, with one reflectance measurement collected per one nm wavelength. To reduce dimensionality, the measurements were resampled to the actual resolution of the spectrometer's sensors, and regions of atmospheric absorption were removed. Continuum removal was performed on the reflectance data, resulting in a second dataset. For both the reflectance and continuum removed datasets, least angle regression (LARS) and random forest classification were used to identify a single set of optimal wavelengths across all sampled dates, a set of optimal wavelengths for each date, and the dates for which Ailanthus is most separable from other species. It was found that classification accuracy varies both with dates and bands used. Contrary to expectations that early spring would provide the best separability, the lowest classification error was observed on July 22 for the reflectance data, and on May 13, July 11 and August 1 for the continuum removed data. This suggests that July and August are also potentially good months for species differentiation. Applying continuum removal in many cases reduced classification error, although not consistently. Band selection seems to be more important for reflectance data in that it results in greater improvement in classification accuracy, and LARS appears to be an effective band selection tool. The optimal spectral bands were selected from across the spectrum, often with bands from the blue (401-431 nm), NIR (1115 nm) and SWIR (1985-1995 nm), suggesting that hyperspectral sensors with

  17. Classification of nervous system withdrawn and approved drugs with ToxPrint features via machine learning strategies.

    Science.gov (United States)

    Onay, Aytun; Onay, Melih; Abul, Osman

    2017-04-01

    Early-phase virtual screening of candidate drug molecules, using data mining and machine learning, plays a key role in the pharmaceutical industry in preventing adverse effects of drugs. Computational classification methods can distinguish approved drugs from withdrawn ones. We focused on 6 data sets, each including at most 110 approved and 110 withdrawn drugs, for all diseases and for nervous system diseases. In this study, we used support vector machines (SVMs) and ensemble methods (EMs) such as boosted and bagged trees to classify drugs into approved and withdrawn categories. We also used the CORINA Symphony program to identify ToxPrint chemotypes, comprising over 700 predefined chemotypes, for the risk and safety assessment of candidate drug molecules. In addition, we studied nervous system withdrawn drugs to determine the key fragments with the ParMol package, which includes the gSpan algorithm. According to our results, the descriptors named the number of total chemotypes and bond CN_amine_aliphatic_generic were the more significant descriptors. The developed Medium Gaussian SVM model reached 78% prediction accuracy on the test set for the drug data set including all diseases. Bagged tree and linear SVM models reached 89% accuracy for psycholeptic and psychoanaleptic drugs. A set of discriminative fragments in nervous system withdrawn drug (NSWD) data sets was obtained. These fragments, responsible for the drugs being removed from the market, were benzene, toluene, N,N-dimethylethylamine, crotylamine, 5-methyl-2,4-heptadiene, octatriene and the carbonyl group. This paper covers the development of computational classification methods to distinguish approved drugs from withdrawn ones. In addition, the results of this study indicate that the identification of discriminative fragments is significant for designing new approved nervous system drugs through interpretation of the structures of the NSWDs. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. An Efficient Ensemble Learning Method for Gene Microarray Classification

    Directory of Open Access Journals (Sweden)

    Alireza Osareh

    2013-01-01

    Full Text Available Gene microarray analysis and classification have proven an effective means of diagnosing diseases and cancers. However, it has also been shown that the basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method is a combination of the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results show that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.

  19. Up in the tree--the overlooked richness of bryophytes and lichens in tree crowns.

    Science.gov (United States)

    Boch, Steffen; Müller, Jörg; Prati, Daniel; Blaser, Stefan; Fischer, Markus

    2013-01-01

    Assessing diversity is among the major tasks in ecology and conservation science. In ecological and conservation studies, epiphytic cryptogams are usually sampled up to accessible heights in forests. Thus, their diversity, especially of canopy specialists, is likely underestimated. If the proportion of those species differs among forest types, plot-based diversity assessments are biased and may result in misleading conservation recommendations. We sampled bryophytes and lichens in 30 forest plots of 20 m × 20 m in three German regions, considering all substrates, and including epiphytic litter fall. First, the sampling of epiphytic species was restricted to the lower 2 m of trees and shrubs. Then, on one representative tree per plot, we additionally recorded epiphytic species in the crown, using tree climbing techniques. Per tree, on average 54% of lichen and 20% of bryophyte species were overlooked if the crown was not included. After sampling all substrates per plot, including the bark of all shrubs and trees, still 38% of the lichen and 4% of the bryophyte species were overlooked if the tree crown of the sampled tree was not included. The number of overlooked lichen species varied strongly among regions. Furthermore, the number of overlooked bryophyte and lichen species per plot was higher in European beech than in coniferous stands and increased with increasing diameter at breast height of the sampled tree. Thus, our results indicate a bias of comparative studies which might have led to misleading conservation recommendations of plot-based diversity assessments.

  20. A global reference database from very high resolution commercial satellite data and methodology for application to Landsat derived 30 m continuous field tree cover data

    Science.gov (United States)

    Pengra, Bruce; Long, Jordan; Dahal, Devendra; Stehman, Stephen V.; Loveland, Thomas R.

    2015-01-01

    The methodology for selection, creation, and application of a global remote sensing validation dataset using high resolution commercial satellite data is presented. High resolution data are obtained for a stratified random sample of 500 primary sampling units (5 km × 5 km sample blocks), where the stratification based on Köppen climate classes is used to distribute the sample globally among biomes. The high resolution data are classified to categorical land cover maps using an analyst mediated classification workflow. Our initial application of these data is to evaluate a global 30 m Landsat-derived, continuous field tree cover product. For this application, the categorical reference classification produced at 2 m resolution is converted to percent tree cover per 30 m pixel (secondary sampling unit) for comparison to Landsat-derived estimates of tree cover. We provide example results (based on a subsample of 25 sample blocks in South America) illustrating basic analyses of agreement that can be produced from these reference data. Commercial high resolution data availability and data quality are shown to provide a viable means of validating continuous field tree cover. When completed, the reference classifications for the full sample of 500 blocks will be released for public use.
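
    The conversion from a 2 m categorical classification to percent tree cover per 30 m pixel can be sketched as a block aggregation, since one 30 m pixel covers 15 × 15 cells of 2 m. This is a minimal illustration, not the authors' workflow: the function name, the nested-list grid, and the class code 1 for "tree" are all assumptions.

```python
def percent_tree_cover(ref, block=15, tree_class=1):
    """Aggregate a fine categorical grid to percent tree cover per coarse pixel.

    ref        : list of rows of class codes (hypothetical 2 m reference map)
    block      : fine cells per coarse pixel side (30 m / 2 m = 15, assumed)
    tree_class : class code that counts as tree (assumed to be 1)
    """
    rows, cols = len(ref), len(ref[0])
    if rows % block or cols % block:
        raise ValueError("grid dimensions must be multiples of the block size")
    out = []
    for r0 in range(0, rows, block):
        out_row = []
        for c0 in range(0, cols, block):
            # Count tree cells inside this block of fine-resolution cells.
            tree = sum(
                1
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
                if ref[r][c] == tree_class
            )
            out_row.append(100.0 * tree / (block * block))
        out.append(out_row)
    return out
```

    The resulting percentages could then be compared pixel by pixel with a continuous tree cover product; how the comparison is summarized is left open here.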

  1. An object-based approach for tree species extraction from digital orthophoto maps

    Science.gov (United States)

    Jamil, Akhtar; Bayram, Bulent

    2018-05-01

    Tree segmentation is an active and ongoing research area in the field of photogrammetry and remote sensing. It is more challenging due to both intra-class and inter-class similarities among various tree species. In this study, we exploited various statistical features for extraction of hazelnut trees from 1 : 5000 scaled digital orthophoto maps. Initially, the non-vegetation areas were eliminated using traditional normalized difference vegetation index (NDVI) followed by application of mean shift segmentation for transforming the pixels into meaningful homogeneous objects. In order to eliminate false positives, morphological opening and closing was employed on candidate objects. A number of heuristics were also derived to eliminate unwanted effects such as shadow and bounding box aspect ratios, before passing them into the classification stage. Finally, a knowledge based decision tree was constructed to distinguish the hazelnut trees from rest of objects which include manmade objects and other type of vegetation. We evaluated the proposed methodology on 10 sample orthophoto maps obtained from Giresun province in Turkey. The manually digitized hazelnut tree boundaries were taken as reference data for accuracy assessment. Both manually digitized and segmented tree borders were converted into binary images and the differences were calculated. According to the obtained results, the proposed methodology obtained an overall accuracy of more than 85 % for all sample images.
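
    The first step described above, removing non-vegetation areas with NDVI, can be sketched as follows. NDVI is the standard ratio (NIR - Red) / (NIR + Red); the threshold of 0.3, the function names, and the availability of a near-infrared band as nested lists are assumptions for illustration, not details taken from the study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    # Guard against division by zero for pixels with no signal in either band.
    return 0.0 if total == 0 else (nir - red) / total


def vegetation_mask(nir_band, red_band, threshold=0.3):
    """Return a boolean grid: True where NDVI exceeds the (assumed) threshold."""
    return [
        [ndvi(n, r) > threshold for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]
```

    Pixels flagged False would be dropped before segmentation; a suitable threshold depends on the sensor and scene and would need to be tuned.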

  2. Galaxy And Mass Assembly: automatic morphological classification of galaxies using statistical learning

    Science.gov (United States)

    Sreejith, Sreevarsha; Pereverzyev, Sergiy, Jr.; Kelvin, Lee S.; Marleau, Francine R.; Haltmeier, Markus; Ebner, Judith; Bland-Hawthorn, Joss; Driver, Simon P.; Graham, Alister W.; Holwerda, Benne W.; Hopkins, Andrew M.; Liske, Jochen; Loveday, Jon; Moffett, Amanda J.; Pimbblet, Kevin A.; Taylor, Edward N.; Wang, Lingyu; Wright, Angus H.

    2018-03-01

    We apply four statistical learning methods to a sample of 7941 galaxies (z test the feasibility of using automated algorithms to classify galaxies. Using 10 features measured for each galaxy (sizes, colours, shape parameters, and stellar mass), we apply the techniques of Support Vector Machines, Classification Trees, Classification Trees with Random Forest (CTRF) and Neural Networks, returning True Prediction Ratios (TPRs) of 75.8 per cent, 69.0 per cent, 76.2 per cent, and 76.0 per cent, respectively. Those occasions whereby all four algorithms agree with each other yet disagree with the visual classification ('unanimous disagreement') serve as a potential indicator of human error in classification, occurring in ~9 per cent of ellipticals, ~9 per cent of little blue spheroids, ~14 per cent of early-type spirals, ~21 per cent of intermediate-type spirals, and ~4 per cent of late-type spirals and irregulars. We observe that the choice of parameters rather than that of algorithms is more crucial in determining classification accuracy. Due to its simplicity in formulation and implementation, we recommend the CTRF algorithm for classifying future galaxy data sets. Adopting the CTRF algorithm, the TPRs of the five galaxy types are: E, 70.1 per cent; LBS, 75.6 per cent; S0-Sa, 63.6 per cent; Sab-Scd, 56.4 per cent; and Sd-Irr, 88.9 per cent. Further, we train a binary classifier using this CTRF algorithm that divides galaxies into spheroid-dominated (E, LBS, and S0-Sa) and disc-dominated (Sab-Scd and Sd-Irr), achieving an overall accuracy of 89.8 per cent. This translates into an accuracy of 84.9 per cent for spheroid-dominated systems and 92.5 per cent for disc-dominated systems.

  3. An Isometric Mapping Based Co-Location Decision Tree Algorithm

    Science.gov (United States)

    Zhou, G.; Wei, J.; Zhou, X.; Zhang, R.; Huang, W.; Sha, H.; Chen, J.

    2018-05-01

    Decision tree (DT) induction has been widely used in different pattern classification tasks. However, most traditional DTs have the disadvantage that they consider only non-spatial attributes (i.e., spectral information) when classifying pixels, which can result in objects being misclassified. Therefore, some researchers have proposed a co-location decision tree (Cl-DT) method, which combines co-location and decision tree to solve the above-mentioned traditional decision tree problems. Cl-DT overcomes the shortcomings of existing DT algorithms, which create a node for each value of a given attribute, and achieves higher accuracy than the existing decision tree approach. However, for non-linearly distributed data instances, the Euclidean distance between instances does not reflect the true positional relationship between them. In order to overcome these shortcomings, this paper proposes an isometric mapping method based on Cl-DT (called Isomap-based Cl-DT), which is a method that combines heterogeneous and Cl-DT together. Because isometric mapping methods use geodesic distances instead of Euclidean distances between non-linearly distributed instances, the true distance between instances can be reflected. The experimental results and several comparative analyses show that: (1) The extraction method of exposed carbonate rocks is of high accuracy. (2) The proposed method has many advantages, because the total number of nodes, the number of leaf nodes and the number of nodes are greatly reduced compared to Cl-DT. Therefore, the Isomap-based Cl-DT algorithm can construct a more accurate and faster decision tree.

  4. AN ISOMETRIC MAPPING BASED CO-LOCATION DECISION TREE ALGORITHM

    Directory of Open Access Journals (Sweden)

    G. Zhou

    2018-05-01

    Full Text Available Decision tree (DT) induction has been widely used in different pattern classification tasks. However, most traditional DTs have the disadvantage that they consider only non-spatial attributes (i.e., spectral information) when classifying pixels, which can result in objects being misclassified. Therefore, some researchers have proposed a co-location decision tree (Cl-DT) method, which combines co-location and decision tree to solve the above-mentioned traditional decision tree problems. Cl-DT overcomes the shortcomings of existing DT algorithms, which create a node for each value of a given attribute, and achieves higher accuracy than the existing decision tree approach. However, for non-linearly distributed data instances, the Euclidean distance between instances does not reflect the true positional relationship between them. In order to overcome these shortcomings, this paper proposes an isometric mapping method based on Cl-DT (called Isomap-based Cl-DT), which is a method that combines heterogeneous and Cl-DT together. Because isometric mapping methods use geodesic distances instead of Euclidean distances between non-linearly distributed instances, the true distance between instances can be reflected. The experimental results and several comparative analyses show that: (1) The extraction method of exposed carbonate rocks is of high accuracy. (2) The proposed method has many advantages, because the total number of nodes, the number of leaf nodes and the number of nodes are greatly reduced compared to Cl-DT. Therefore, the Isomap-based Cl-DT algorithm can construct a more accurate and faster decision tree.

  5. Fuzzy tree automata and syntactic pattern recognition.

    Science.gov (United States)

    Lee, E T

    1982-04-01

    An approach of representing patterns by trees and processing these trees by fuzzy tree automata is described. Fuzzy tree automata are defined and investigated. The results include that the class of fuzzy root-to-frontier recognizable Σ-trees is closed under intersection, union, and complementation. Thus, the class of fuzzy root-to-frontier recognizable Σ-trees forms a Boolean algebra. Fuzzy tree automata are applied to processing fuzzy tree representation of patterns based on syntactic pattern recognition. The grade of acceptance is defined and investigated. Quantitative measures of "approximate isosceles triangle," "approximate elongated isosceles triangle," "approximate rectangle," and "approximate cross" are defined and used in the illustrative examples of this approach. By using these quantitative measures, a house, a house with high roof, and a church are also presented as illustrative examples. In addition, three fuzzy tree automata are constructed which have the capability of processing the fuzzy tree representations of "fuzzy houses," "houses with high roofs," and "fuzzy churches," respectively. The results may have useful applications in pattern recognition, image processing, artificial intelligence, pattern database design and processing, image science, and pictorial information systems.

  6. Classification of lung sounds using higher-order statistics: A divide-and-conquer approach.

    Science.gov (United States)

    Naves, Raphael; Barbosa, Bruno H G; Ferreira, Danton D

    2016-06-01

    Lung sound auscultation is one of the most commonly used methods to evaluate respiratory diseases. However, the effectiveness of this method depends on the physician's training. If the physician does not have the proper training, he/she will be unable to distinguish between normal and abnormal sounds generated by the human body. Thus, the aim of this study was to implement a pattern recognition system to classify lung sounds. We used a dataset composed of five types of lung sounds: normal, coarse crackle, fine crackle, monophonic and polyphonic wheezes. We used higher-order statistics (HOS) to extract features (second-, third- and fourth-order cumulants), Genetic Algorithms (GA) and Fisher's Discriminant Ratio (FDR) to reduce dimensionality, and k-Nearest Neighbors and Naive Bayes classifiers to recognize the lung sound events in a tree-based system. We used a cross-validation procedure to analyze the classifiers' performance and Tukey's Honestly Significant Difference criterion to compare the results. Our results showed that the Genetic Algorithms outperformed Fisher's Discriminant Ratio for feature selection. Moreover, each lung sound class had a different signature pattern in its cumulants, showing that HOS is a promising feature extraction tool for lung sounds. In addition, the proposed divide-and-conquer approach can accurately classify different types of lung sounds. The best tree-based classifier reached a classification accuracy of 98.1% on training data and 94.6% on validation data. The proposed approach achieved good results even using only one feature extraction tool (higher-order statistics). Additionally, the implementation of the proposed classifier in an embedded system is feasible. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  7. Fast method of constructing image correlations to build a free network based on image multivocabulary trees

    Science.gov (United States)

    Zhan, Zongqian; Wang, Xin; Wei, Minglu

    2015-05-01

    In image-based three-dimensional (3-D) reconstruction, one topic of growing importance is how to quickly obtain a 3-D model from a large number of images. The retrieval of the correct and relevant images for the model poses a considerable technological challenge. The "image vocabulary tree" has been proposed as a method to search for similar images. However, significant drawbacks of this approach are its low time efficiency and barely satisfactory classification result. The method proposed is inspired by, and improves upon, some recent methods. Specifically, vocabulary quality is considered and multivocabulary trees are designed to improve the classification result. A marked improvement was, indeed, observed in our evaluation of the proposed method. To improve time efficiency, graphics processing unit (GPU) parallel computation based on the compute unified device architecture (CUDA) is applied in the multivocabulary trees. The results of the experiments showed that the GPU was three to four times more efficient than the enumeration matching and CPU methods when the number of images is large. This paper presents a reliable reference method for the rapid construction of a free network to be used for the computing of 3-D information.

  8. Classification of surface types using SIR-C/X-SAR, Mount Everest Area, Tibet

    Science.gov (United States)

    Albright, Thomas P.; Painter, Thomas H.; Roberts, Dar A.; Shi, Jiancheng; Dozier, Jeff; Fielding, Eric

    1998-01-01

    Imaging radar is a promising tool for mapping snow and ice cover in alpine regions. It combines a high-resolution, day or night, all-weather imaging capability with sensitivity to hydrologic and climatic snow and ice parameters. We use the spaceborne imaging radar-C/X-band synthetic aperture radar (SIR-C/X-SAR) to map snow and glacial ice on the rugged north slope of Mount Everest. From interferometrically derived digital elevation data, we compute the terrain calibration factor and cosine of the local illumination angle. We then process and terrain-correct radar data sets acquired on April 16, 1994. In addition to the spectral data, we include surface slope to improve discrimination among several surface types. These data sets are then used in a decision tree to generate an image classification. This method is successful in identifying and mapping scree/talus, dry snow, dry snow-covered glacier, wet snow-covered glacier, and rock-covered glacier, as corroborated by comparison with existing surface cover maps and other ancillary information. Application of the classification scheme to data acquired on October 7 of the same year yields accurate results for most surface types but underreports the extent of dry snow cover.

  9. A Proposal for Cardiac Arrhythmia Classification using Complexity Measures

    Directory of Open Access Journals (Sweden)

    AROTARITEI, D.

    2017-08-01

    Full Text Available Cardiovascular diseases are one of the major problems of humanity, and therefore one of their components, arrhythmia detection and classification, has drawn increased attention worldwide. The presence of randomness in discrete time series, like those arising in electrophysiology, is firmly connected with computational complexity measures. This connection can be used, for instance, in the analysis of RR-intervals of the electrocardiographic (ECG) signal, coded as a binary string, to detect and classify arrhythmia. Our approach uses three algorithms (Lempel-Ziv, Sample Entropy and T-Code) to compute the information complexity, and a classification tree to detect 13 types of arrhythmia, with encouraging results. To overcome the computational effort required for complexity calculus, a cloud computing solution with executable code deployment is also proposed.
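
    The idea of coding RR-intervals as a binary string and measuring its complexity can be sketched as below. Two things here are assumptions rather than the paper's method: the coding rule (1 if an interval exceeds the series median, else 0) and the use of a simple LZ78-style phrase count as a stand-in for the Lempel-Ziv complexity measure the authors computed. Function names are hypothetical.

```python
from statistics import median


def binarize_rr(rr_ms):
    """Code RR-intervals (ms) as a bit string: '1' above the median, else '0'.

    The median rule is an assumed coding scheme chosen for illustration.
    """
    m = median(rr_ms)
    return "".join("1" if x > m else "0" for x in rr_ms)


def lz78_phrase_count(bits):
    """Count phrases in an LZ78-style parsing of a bit string.

    The string is split left to right into the shortest phrases not seen
    before; a trailing incomplete phrase counts as one more phrase.
    """
    phrases = set()
    current = ""
    for b in bits:
        current += b
        if current not in phrases:
            phrases.add(current)
            current = ""
    return len(phrases) + (1 if current else 0)
```

    A more regular rhythm tends to yield fewer phrases for a given length; such a count could serve as one input feature to a classification tree, alongside other complexity measures.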

  10. Arenal-type pyroclastic flows: A probabilistic event tree risk analysis

    Science.gov (United States)

    Meloy, Anthony F.

    2006-09-01

    A quantitative hazard-specific scenario-modelling risk analysis is performed at Arenal volcano, Costa Rica for the newly recognised Arenal-type pyroclastic flow (ATPF) phenomenon using an event tree framework. These flows are generated by the sudden depressurisation and fragmentation of an active basaltic andesite lava pool as a result of a partial collapse of the crater wall. The deposits of this type of flow include angular blocks and juvenile clasts, which are rarely found in other types of pyroclastic flow. An event tree analysis (ETA) is a useful tool and framework in which to analyse and graphically present the probabilities of the occurrence of many possible events in a complex system. Four event trees are created in the analysis, three of which are extended to investigate the varying individual risk faced by three generic representatives of the surrounding community: a resident, a worker, and a tourist. The raw numerical risk estimates determined by the ETA are converted into a set of linguistic expressions (i.e. VERY HIGH, HIGH, MODERATE etc.) using an established risk classification scale. Three individually tailored semi-quantitative risk maps are then created from a set of risk conversion tables to show how the risk varies for each individual in different areas around the volcano. In some cases, by relocating from the north to the south, the level of risk can be reduced by up to three classes. While the individual risk maps may be broadly applicable, and therefore of interest to the general community, the risk maps and associated probability values generated in the ETA are intended to be used by trained professionals and government agencies to evaluate the risk and effectively manage the long-term development of infrastructure and habitation. With the addition of fresh monitoring data, the combination of both long- and short-term event trees would provide a comprehensive and consistent method of risk analysis (both during and pre-crisis), and as such

  11. Determining the saliency of feature measurements obtained from images of sedimentary organic matter for use in its classification

    Science.gov (United States)

    Weller, Andrew F.; Harris, Anthony J.; Ware, J. Andrew; Jarvis, Paul S.

    2006-11-01

    The classification of sedimentary organic matter (OM) images can be improved by determining the saliency of image analysis (IA) features measured from them. Knowing the saliency of IA feature measurements means that only the most significant discriminating features need be used in the classification process. This is an important consideration for classification techniques such as artificial neural networks (ANNs), where too many features can lead to the 'curse of dimensionality'. The classification scheme adopted in this work is a hybrid of morphologically and texturally descriptive features from previous manual classification schemes. Some of these descriptive features are assigned to IA features, along with several others built into the IA software (Halcon) to ensure that a valid cross-section is available. After an image is captured and segmented, a total of 194 features are measured for each particle. To reduce this number to a more manageable magnitude, the SPSS AnswerTree Exhaustive CHAID (χ² automatic interaction detector) classification tree algorithm is used to establish each measurement's saliency as a classification discriminator. In the case of continuous data as used here, the F-test is used as opposed to the published algorithm. The F-test checks various statistical hypotheses about the variance of groups of IA feature measurements obtained from the particles to be classified. The aim is to reduce the number of features required to perform the classification without reducing its accuracy. In the best-case scenario, 194 inputs are reduced to 8, with a subsequent multi-layer back-propagation ANN recognition rate of 98.65%. This paper demonstrates the ability of the algorithm to reduce noise, help overcome the curse of dimensionality, and facilitate an understanding of the saliency of IA features as discriminators for sedimentary OM classification.
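
    The role of the F-test in ranking continuous features can be illustrated with a standard one-way ANOVA F statistic: the between-class variance of a feature divided by its within-class variance. This is a generic sketch under that assumption, not the AnswerTree implementation; the function name and the list-of-groups input are hypothetical.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one feature.

    groups : list of lists, one list of feature values per class.
    Returns (SSB / (k - 1)) / (SSW / (N - k)).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far class means sit from the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own class mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n_total - k))
```

    A feature whose class means differ strongly relative to the within-class scatter gets a large F value and would rank as more salient; features with small F values are candidates for removal.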

  12. Benefits of tree mixes in carbon plantings

    Science.gov (United States)

    Hulvey, Kristin B.; Hobbs, Richard J.; Standish, Rachel J.; Lindenmayer, David B.; Lach, Lori; Perring, Michael P.

    2013-10-01

    Increasingly governments and the private sector are using planted forests to offset carbon emissions. Few studies, however, examine how tree diversity -- defined here as species richness and/or stand composition -- affects carbon storage in these plantings. Using aboveground tree biomass as a proxy for carbon storage, we used meta-analysis to compare carbon storage in tree mixtures with monoculture plantings. Tree mixes stored at least as much carbon as monocultures consisting of the mixture's most productive species and at times outperformed monoculture plantings. In mixed-species stands, individual species, and in particular nitrogen-fixing trees, increased stand biomass. Further motivations for incorporating tree richness into planted forests include the contribution of diversity to total forest carbon-pool development, carbon-pool stability and the provision of extra ecosystem services. Our findings suggest a two-pronged strategy for designing carbon plantings including: (1) increased tree species richness; and (2) the addition of species that contribute to carbon storage and other target functions.

  13. Neighbourhood-scale urban forest ecosystem classification.

    Science.gov (United States)

    Steenberg, James W N; Millward, Andrew A; Duinker, Peter N; Nowak, David J; Robinson, Pamela J

    2015-11-01

    Urban forests are now recognized as essential components of sustainable cities, but there remains uncertainty concerning how to stratify and classify urban landscapes into units of ecological significance at spatial scales appropriate for management. Ecosystem classification is an approach that entails quantifying the social and ecological processes that shape ecosystem conditions into logical and relatively homogeneous management units, making the potential for ecosystem-based decision support available to urban planners. The purpose of this study is to develop and propose a framework for urban forest ecosystem classification (UFEC). The multifactor framework integrates 12 ecosystem components that characterize the biophysical landscape, built environment, and human population. This framework is then applied at the neighbourhood scale in Toronto, Canada, using hierarchical cluster analysis. The analysis used 27 spatially-explicit variables to quantify the ecosystem components in Toronto. Twelve ecosystem classes were identified in this UFEC application. Across the ecosystem classes, tree canopy cover was positively related to economic wealth, especially income. However, education levels and homeownership were occasionally inconsistent with the expected positive relationship with canopy cover. Open green space and stocking had variable relationships with economic wealth and were more closely related to population density, building intensity, and land use. The UFEC can provide ecosystem-based information for greening initiatives, tree planting, and the maintenance of the existing canopy. Moreover, its use has the potential to inform the prioritization of limited municipal resources according to ecological conditions and to concerns of social equity in the access to nature and distribution of ecosystem service supply. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. A non-parametric, supervised classification of vegetation types on the Kaibab National Forest using decision trees

    Science.gov (United States)

    Suzanne M. Joy; R. M. Reich; Richard T. Reynolds

    2003-01-01

    Traditional land classification techniques for large areas that use Landsat Thematic Mapper (TM) imagery are typically limited to the fixed spatial resolution of the sensors (30 m). However, the study of some ecological processes requires land cover classifications at finer spatial resolutions. We model forest vegetation types on the Kaibab National Forest (KNF) in...

  15. Relationships between diameter and height of trees in natural ...

    African Journals Online (AJOL)

    Relationships between diameter and height of trees in natural tropical forest in Tanzania. Wilson A Mugasha, Ole M Bollandsås, Tron Eid. Abstract. The relationship between tree height (h) and tree diameter at breast height (dbh) is an important element describing forest stands. In addition, h often is a required variable in ...

  16. Using different classification models in wheat grading utilizing visual features

    Science.gov (United States)

    Basati, Zahra; Rasekh, Mansour; Abbaspour-Gilandeh, Yousef

    2018-04-01

    Wheat is one of the most important strategic crops in Iran and in the world. The major component that distinguishes wheat from other grains is the gluten section. In Iran, sunn pest is one of the most important factors affecting the characteristics of wheat gluten and disturbing its balanced state. The existence of bug-damaged grains in wheat reduces the quality and price of the product. In addition, damaged grains reduce the enrichment of wheat and the quality of bread products. In this study, after preprocessing and segmentation of images, 25 features including 9 colour features, 10 morphological features, and 6 textual statistical features were extracted so as to classify healthy and bug-damaged wheat grains of the Azar cultivar at four levels of moisture content (9, 11.5, 14 and 16.5% w.b.) and under two lighting colours (yellow light, and a combination of yellow and white light). Using feature selection methods in the WEKA software and the CfsSubsetEval evaluator, 11 features were chosen as inputs of artificial neural network, decision tree and discriminant analysis classifiers. The results showed that the decision tree with the J48 algorithm had the highest classification accuracy of 90.20%. This was followed by the artificial neural network classifier with the topology of 11-19-2 and the discriminant analysis classifier at 87.46 and 81.81%, respectively.

  17. Improving Classification of Airborne Laser Scanning Echoes in the Forest-Tundra Ecotone Using Geostatistical and Statistical Measures

    Directory of Open Access Journals (Sweden)

    Nadja Stumberg

    2014-05-01

    Full Text Available The vegetation in the forest-tundra ecotone zone is expected to be highly affected by climate change and requires effective monitoring techniques. Airborne laser scanning (ALS) has been proposed as a tool for the detection of small pioneer trees for such vast areas using laser height and intensity data. The main objective of the present study was to assess a possible improvement in the performance of classifying tree and nontree laser echoes from high-density ALS data. The data were collected along a 1000 km long transect stretching from southern to northern Norway. Different geostatistical and statistical measures derived from laser height and intensity values were used to extend and potentially improve simpler models that ignore the spatial context. Generalised linear models (GLM) and support vector machines (SVM) were employed as classification methods. Total accuracies and Cohen's kappa coefficients were calculated and compared to those of simpler models from a previous study. For both classification methods, all models revealed total accuracies similar to the results of the simpler models. Concerning classification performance, however, the comparison of the kappa coefficients indicated a significant improvement for some models both using GLM and SVM, with classification accuracies >94%.
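
    The contrast drawn above between total accuracy and Cohen's kappa can be made concrete with a short sketch. Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the row and column totals. The function name and the example matrices are illustrative assumptions, not data from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: reference, cols: predicted).

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cases on the diagonal.
    p_o = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of matching row and column totals.
    p_e = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / total ** 2
    return (p_o - p_e) / (1 - p_e)
```

    With a balanced matrix such as [[45, 5], [10, 40]], accuracy is 0.85 and kappa is 0.7. With a strongly imbalanced one such as [[90, 5], [4, 1]], accuracy is 0.91 but kappa is only about 0.13, which shows how two models can have similar total accuracies yet differ clearly in kappa.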

  18. Up in the tree--the overlooked richness of bryophytes and lichens in tree crowns.

    Directory of Open Access Journals (Sweden)

    Steffen Boch

    Assessing diversity is among the major tasks in ecology and conservation science. In ecological and conservation studies, epiphytic cryptogams are usually sampled up to accessible heights in forests. Thus, their diversity, especially of canopy specialists, likely is underestimated. If the proportion of those species differs among forest types, plot-based diversity assessments are biased and may result in misleading conservation recommendations. We sampled bryophytes and lichens in 30 forest plots of 20 m × 20 m in three German regions, considering all substrates, and including epiphytic litter fall. First, the sampling of epiphytic species was restricted to the lower 2 m of trees and shrubs. Then, on one representative tree per plot, we additionally recorded epiphytic species in the crown, using tree climbing techniques. Per tree, on average 54% of lichen and 20% of bryophyte species were overlooked if the crown was not included. After sampling all substrates per plot, including the bark of all shrubs and trees, still 38% of the lichen and 4% of the bryophyte species were overlooked if the tree crown of the sampled tree was not included. The number of overlooked lichen species varied strongly among regions. Furthermore, the number of overlooked bryophyte and lichen species per plot was higher in European beech than in coniferous stands and increased with increasing diameter at breast height of the sampled tree. Thus, our results indicate a bias of comparative studies which might have led to misleading conservation recommendations of plot-based diversity assessments.

  19. Classification of Amazonian rosewood essential oil by Raman spectroscopy and PLS-DA with reliability estimation.

    Science.gov (United States)

    Almeida, Mariana R; Fidelis, Carlos H V; Barata, Lauro E S; Poppi, Ronei J

    2013-12-15

    The Amazon tree Aniba rosaeodora Ducke (rosewood) provides an essential oil valuable for the perfume industry, but after decades of predatory extraction it is at risk of extinction. Extracting the essential oil from wood requires cutting down the tree, so the study of oil extracted from the leaves is important as a sustainable alternative. The goal of this study was to test the applicability of Raman spectroscopy and Partial Least Squares Discriminant Analysis (PLS-DA) as a means to classify the essential oil extracted from different parts (wood, leaves and branches) of the Brazilian tree A. rosaeodora. For the development of classification models, the Raman spectra were split into two sets: training and test. The value of the limit that separates the classes was calculated based on the distribution of the training samples. This value was calculated so that the classes are divided with a lower probability of incorrect classification for future estimates. The best model presented sensitivity and specificity of 100%, and predictive accuracy and efficiency of 100%. These results give an overall view of the behavior of the model, but do not give information about individual samples; for this reason, the confidence interval for the classification of each sample was also calculated using the bootstrap resampling technique. The methodology developed has the potential to be an alternative to standard procedures used for oil analysis and it can be employed as a screening method, since it is fast, non-destructive and robust. © 2013 Elsevier B.V. All rights reserved.

  20. Multispectral LiDAR Data for Land Cover Classification of Urban Areas

    Directory of Open Access Journals (Sweden)

    Salem Morsy

    2017-04-01

    Airborne Light Detection And Ranging (LiDAR) systems usually operate at a monochromatic wavelength measuring the range and the strength of the reflected energy (intensity) from objects. Recently, multispectral LiDAR sensors, which acquire data at different wavelengths, have emerged. This allows for recording of a diversity of spectral reflectance from objects. In this context, we aim to investigate the use of multispectral LiDAR data in land cover classification using two different techniques. The first is image-based classification, where intensity and height images are created from LiDAR points and then a maximum likelihood classifier is applied. The second is point-based classification, where ground filtering and Normalized Difference Vegetation Indices (NDVIs) computation are conducted. A dataset of an urban area located in Oshawa, Ontario, Canada, is classified into four classes: buildings, trees, roads and grass. An overall accuracy of up to 89.9% and 92.7% is achieved from image classification and 3D point classification, respectively. A radiometric correction model is also applied to the intensity data in order to remove the attenuation due to the system distortion and terrain height variation. The classification process is then repeated, and the results demonstrate that there are no significant improvements achieved in the overall accuracy.

  1. Multispectral LiDAR Data for Land Cover Classification of Urban Areas.

    Science.gov (United States)

    Morsy, Salem; Shaker, Ahmed; El-Rabbany, Ahmed

    2017-04-26

    Airborne Light Detection And Ranging (LiDAR) systems usually operate at a monochromatic wavelength measuring the range and the strength of the reflected energy (intensity) from objects. Recently, multispectral LiDAR sensors, which acquire data at different wavelengths, have emerged. This allows for recording of a diversity of spectral reflectance from objects. In this context, we aim to investigate the use of multispectral LiDAR data in land cover classification using two different techniques. The first is image-based classification, where intensity and height images are created from LiDAR points and then a maximum likelihood classifier is applied. The second is point-based classification, where ground filtering and Normalized Difference Vegetation Indices (NDVIs) computation are conducted. A dataset of an urban area located in Oshawa, Ontario, Canada, is classified into four classes: buildings, trees, roads and grass. An overall accuracy of up to 89.9% and 92.7% is achieved from image classification and 3D point classification, respectively. A radiometric correction model is also applied to the intensity data in order to remove the attenuation due to the system distortion and terrain height variation. The classification process is then repeated, and the results demonstrate that there are no significant improvements achieved in the overall accuracy.
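    The point-based technique above relies on NDVI computation. As a rough sketch only, the standard normalized difference (NIR - visible) / (NIR + visible) can be applied to per-point intensities from two LiDAR channels. The channel pairing, the function names and the 0.3 threshold below are assumptions for illustration; the abstract does not state which wavelengths or thresholds the authors used.

```python
def ndvi(nir, visible):
    """Normalized difference index of two intensity values; 0.0 if both are zero."""
    total = nir + visible
    if total == 0:
        return 0.0
    return (nir - visible) / total


def label_points(points, threshold=0.3):
    """Label (nir, visible) intensity pairs; the threshold is a hypothetical value."""
    return [
        "vegetation" if ndvi(nir, visible) > threshold else "non-vegetation"
        for nir, visible in points
    ]


# Hypothetical per-point intensities: (NIR channel, visible channel).
sample_points = [(0.8, 0.2), (0.3, 0.3), (0.0, 0.0)]
labels = label_points(sample_points)
print(labels)
```

    In a full workflow the vegetation points would still need to be split into trees and grass, for example by height above the filtered ground surface.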

  2. An Active Learning Framework for Hyperspectral Image Classification Using Hierarchical Segmentation

    Science.gov (United States)

    Zhang, Zhou; Pasolli, Edoardo; Crawford, Melba M.; Tilton, James C.

    2015-01-01

    Augmenting spectral data with spatial information for image classification has recently gained significant attention, as classification accuracy can often be improved by extracting spatial information from neighboring pixels. In this paper, we propose a new framework in which active learning (AL) and hierarchical segmentation (HSeg) are combined for spectral-spatial classification of hyperspectral images. The spatial information is extracted from a best segmentation obtained by pruning the HSeg tree using a new supervised strategy. The best segmentation is updated at each iteration of the AL process, thus taking advantage of informative labeled samples provided by the user. The proposed strategy incorporates spatial information in two ways: 1) concatenating the extracted spatial features and the original spectral features into a stacked vector and 2) extending the training set using a self-learning-based semi-supervised learning (SSL) approach. Finally, the two strategies are combined within an AL framework. The proposed framework is validated with two benchmark hyperspectral datasets. Higher classification accuracies are obtained by the proposed framework with respect to five other state-of-the-art spectral-spatial classification approaches. Moreover, the effectiveness of the proposed pruning strategy is also demonstrated relative to the approaches based on a fixed segmentation.

  3. Phylogenetic tree reconstruction accuracy and model fit when proportions of variable sites change across the tree.

    Science.gov (United States)

    Shavit Grievink, Liat; Penny, David; Hendy, Michael D; Holland, Barbara R

    2010-05-01

    Commonly used phylogenetic models assume a homogeneous process through time in all parts of the tree. However, it is known that these models can be too simplistic as they do not account for nonhomogeneous lineage-specific properties. In particular, it is now widely recognized that as constraints on sequences evolve, the proportion and positions of variable sites can vary between lineages causing heterotachy. The extent to which this model misspecification affects tree reconstruction is still unknown. Here, we evaluate the effect of changes in the proportions and positions of variable sites on model fit and tree estimation. We consider 5 current models of nucleotide sequence evolution in a Bayesian Markov chain Monte Carlo framework as well as maximum parsimony (MP). We show that for a tree with 4 lineages where 2 nonsister taxa undergo a change in the proportion of variable sites tree reconstruction under the best-fitting model, which is chosen using a relative test, often results in the wrong tree. In this case, we found that an absolute test of model fit is a better predictor of tree estimation accuracy. We also found further evidence that MP is not immune to heterotachy. In addition, we show that increased sampling of taxa that have undergone a change in proportion and positions of variable sites is critical for accurate tree reconstruction.

  4. Tree compression with top trees

    DEFF Research Database (Denmark)

    Bille, Philip; Gørtz, Inge Li; Landau, Gad M.

    2013-01-01

    We introduce a new compression scheme for labeled trees based on top trees [3]. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast...

  5. Tree compression with top trees

    DEFF Research Database (Denmark)

    Bille, Philip; Gørtz, Inge Li; Landau, Gad M.

    2015-01-01

    We introduce a new compression scheme for labeled trees based on top trees. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast...

  6. Rapid Erosion Modeling in a Western Kenya Watershed using Visible Near Infrared Reflectance, Classification Tree Analysis and 137Cesium.

    Science.gov (United States)

    deGraffenried, Jeff B; Shepherd, Keith D

    2009-12-15

    Human induced soil erosion has severe economic and environmental impacts throughout the world. It is more severe in the tropics than elsewhere and results in diminished food production and security. Kenya has limited arable land and 30 percent of the country experiences severe to very severe human induced soil degradation. The purpose of this research was to test visible near infrared diffuse reflectance spectroscopy (VNIR) as a tool for rapid assessment and benchmarking of soil condition and erosion severity class. The study was conducted in the Saiwa River watershed in the northern Rift Valley Province of western Kenya, a tropical highland area. Soil 137Cs concentration was measured to validate spectrally derived erosion classes and establish the background levels for different land use types. Results indicate VNIR could be used to accurately evaluate a large and diverse soil data set and predict soil erosion characteristics. Soil condition was spectrally assessed and modeled. Analysis of mean raw spectra indicated significant reflectance differences between soil erosion classes. The largest differences occurred between 1,350 and 1,950 nm with the largest separation occurring at 1,920 nm. Classification and Regression Tree (CART) analysis indicated that the spectral model had practical predictive success (72%) with Receiver Operating Characteristic (ROC) of 0.74. The change in 137Cs concentrations supported the premise that VNIR is an effective tool for rapid screening of soil erosion condition.

  7. Integrating human and machine intelligence in galaxy morphology classification tasks

    Science.gov (United States)

    Beck, Melanie R.; Scarlata, Claudia; Fortson, Lucy F.; Lintott, Chris J.; Simmons, B. D.; Galloway, Melanie A.; Willett, Kyle W.; Dickinson, Hugh; Masters, Karen L.; Marshall, Philip J.; Wright, Darryl

    2018-06-01

    Quantifying galaxy morphology is a challenging yet scientifically rewarding task. As the scale of data continues to increase with upcoming surveys, traditional classification methods will struggle to handle the load. We present a solution through an integration of visual and automated classifications, preserving the best features of both human and machine. We demonstrate the effectiveness of such a system through a re-analysis of visual galaxy morphology classifications collected during the Galaxy Zoo 2 (GZ2) project. We reprocess the top-level question of the GZ2 decision tree with a Bayesian classification aggregation algorithm dubbed SWAP, originally developed for the Space Warps gravitational lens project. Through a simple binary classification scheme, we increase the classification rate nearly 5-fold classifying 226 124 galaxies in 92 d of GZ2 project time while reproducing labels derived from GZ2 classification data with 95.7 per cent accuracy. We next combine this with a Random Forest machine learning algorithm that learns on a suite of non-parametric morphology indicators widely used for automated morphologies. We develop a decision engine that delegates tasks between human and machine and demonstrate that the combined system provides at least a factor of 8 increase in the classification rate, classifying 210 803 galaxies in just 32 d of GZ2 project time with 93.1 per cent accuracy. As the Random Forest algorithm requires a minimal amount of computational cost, this result has important implications for galaxy morphology identification tasks in the era of Euclid and other large-scale surveys.

  8. Radioecological investigations on tree rings of spruce

    International Nuclear Information System (INIS)

    Haas, G.; Mueller, A.

    1995-01-01

    Tree ring analysis contributes essentially to the explanation of physiological and element-specific transport phenomena in trees. After the accident in Chernobyl the behaviour of Cs-134 and Cs-137 in trees is most informative for the prediction of the future development of the distribution of these elements. In this study the uptake and the long term behaviour of Cs-134, Cs-137, Pb-210, Ra-226, Ra-228, K-40, Th-228, Th-230, Th-232, U-234 and U-238 in tree rings of spruce are examined by α- and γ-spectrometry. All samples are dried at 105 °C and ashed at 450 °C in a muffle furnace. The distributions found in the tree rings vary for different radionuclides. A soil profile from the spruce stand provides additional information.

  9. Using tree diversity to compare phylogenetic heuristics.

    Science.gov (United States)

    Sul, Seung-Jin; Matthews, Suzanne; Williams, Tiffani L

    2009-04-29

    Evolutionary trees are family trees that represent the relationships between a group of organisms. Phylogenetic heuristics are used to search stochastically for the best-scoring trees in tree space. Given that better tree scores are believed to be better approximations of the true phylogeny, traditional evaluation techniques have used tree scores to determine the heuristics that find the best scores in the fastest time. We develop new techniques to evaluate phylogenetic heuristics based on both tree scores and topologies to compare Pauprat and Rec-I-DCM3, two popular Maximum Parsimony search algorithms. Our results show that although Pauprat and Rec-I-DCM3 find the trees with the same best scores, topologically these trees are quite different. Furthermore, the Rec-I-DCM3 trees cluster distinctly from the Pauprat trees. In addition to our heatmap visualizations that use parsimony scores and the Robinson-Foulds distance to compare best-scoring trees found by the two heuristics, we also develop entropy-based methods to show the diversity of the trees found. Overall, Pauprat identifies more diverse trees than Rec-I-DCM3, and our work shows that there is value in comparing heuristics beyond the parsimony scores that they find. Pauprat is a slower heuristic than Rec-I-DCM3. However, our work shows that there is tremendous value in using Pauprat to reconstruct trees, especially since it finds identically scoring but topologically distinct trees. Hence, instead of discounting Pauprat, effort should go into improving its implementation. Ultimately, improved performance measures lead to better phylogenetic heuristics and will result in better approximations of the true evolutionary history of the organisms of interest.
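    The Robinson-Foulds distance mentioned above counts the splits (bipartitions of the leaf set) that appear in one tree but not the other. The sketch below is an illustration under stated assumptions: trees are written as nested Python tuples with string leaf labels, and the function names are hypothetical. It is not the tooling used in the study.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree, all_leaves):
    """Collect the nontrivial splits induced by the internal edges of a tree."""
    splits = set()
    reference = min(all_leaves)

    def leaves_below(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        # Normalise each split to the side that excludes the reference leaf.
        side = all_leaves - below if reference in below else below
        # Skip trivial splits that separate zero or one leaf from the rest.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    leaves_below(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits found in exactly one tree."""
    leaves = leaf_set(tree_a)
    if leaves != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(bipartitions(tree_a, leaves) ^ bipartitions(tree_b, leaves))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = (("E", "D"), ("C", ("B", "A")))  # same topology as t1, written differently
print(rf_distance(t1, t2), rf_distance(t1, t3))
```

    Two trees with identical parsimony scores can still have a nonzero distance, which is the kind of topological difference the abstract describes.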

  10. Vessel-guided airway segmentation based on voxel classification

    DEFF Research Database (Denmark)

    Lo, Pechin Chien Pau; Sporring, Jon; Ashraf, Haseem

    2008-01-01

    This paper presents a method for improving airway tree segmentation using vessel orientation information. We use the fact that an airway branch is always accompanied by an artery, with both structures having similar orientations. This work is based on a voxel classification airway segmentation method proposed previously. The probability of a voxel belonging to the airway, from the voxel classification method, is augmented with an orientation similarity measure as a criterion for region growing. The orientation similarity measure of a voxel indicates how similar the orientation of the surroundings of a voxel, estimated based on a tube model, is to that of a neighboring vessel. The proposed method is tested on 20 CT images from different subjects selected randomly from a lung cancer screening study. Length of the airway branches from the results of the proposed method are significantly...

  11. The Iqmulus Urban Showcase: Automatic Tree Classification and Identification in Huge Mobile Mapping Point Clouds

    Science.gov (United States)

    Böhm, J.; Bredif, M.; Gierlinger, T.; Krämer, M.; Lindenberg, R.; Liu, K.; Michel, F.; Sirmacek, B.

    2016-06-01

    Current 3D data capturing as implemented on for example airborne or mobile laser scanning systems is able to efficiently sample the surface of a city by billions of unselective points during one working day. What is still difficult is to extract and visualize meaningful information hidden in these point clouds with the same efficiency. This is where the FP7 IQmulus project enters the scene. IQmulus is an interactive facility for processing and visualizing big spatial data. In this study the potential of IQmulus is demonstrated on a laser mobile mapping point cloud of 1 billion points sampling ~ 10 km of street environment in Toulouse, France. After the data is uploaded to the IQmulus Hadoop Distributed File System, a workflow is defined by the user consisting of retiling the data followed by a PCA driven local dimensionality analysis, which runs efficiently on the IQmulus cloud facility using a Spark implementation. Points scattering in 3 directions are clustered in the tree class, and are separated next into individual trees. Five hours of processing at the 12 node computing cluster results in the automatic identification of 4000+ urban trees. Visualization of the results in the IQmulus fat client helps users to appreciate the results, and developers to identify remaining flaws in the processing workflow.

  12. Application of In-Segment Multiple Sampling in Object-Based Classification

    Directory of Open Access Journals (Sweden)

    Nataša Đurić

    2014-12-01

    When object-based analysis is applied to very high-resolution imagery, pixels within the segments reveal large spectral inhomogeneity; their distribution can be considered complex rather than normal. When normality is violated, the classification methods that rely on the assumption of normally distributed data are not as successful or accurate. It is hard to detect normality violations in small samples. The segmentation process produces segments that vary highly in size; samples can be very big or very small. This paper investigates whether the complexity within the segment can be addressed using multiple random sampling of segment pixels and multiple calculations of similarity measures. In order to analyze the effect sampling has on classification results, statistics and probability value equations of the non-parametric two-sample Kolmogorov-Smirnov test and the parametric Student’s t-test are selected as similarity measures in the classification process. The performance of both classifiers was assessed on a WorldView-2 image for four land cover classes (roads, buildings, grass and trees) and compared to two commonly used object-based classifiers: k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM). Both proposed classifiers showed a slight improvement in the overall classification accuracies and produced more accurate classification maps when compared to the ground truth image.
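    The in-segment multiple sampling idea above can be sketched as follows. This is an illustration under assumptions, not the paper's procedure: pixels are single-band values, the similarity measure is the two-sample Kolmogorov-Smirnov statistic D averaged over repeated random samples, and the segment is assigned to the class with the smallest mean D. The function names, sample size and number of repeats are hypothetical.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def mean_ks_over_samples(segment_pixels, class_pixels, sample_size=30, repeats=20, seed=0):
    """Average the KS statistic over repeated random samples of segment pixels."""
    rng = random.Random(seed)
    size = min(sample_size, len(segment_pixels))
    stats = [
        ks_statistic(rng.sample(segment_pixels, size), class_pixels)
        for _ in range(repeats)
    ]
    return sum(stats) / len(stats)


def classify_segment(segment_pixels, training, **kwargs):
    """Assign the class whose training pixels are closest in mean KS statistic."""
    return min(
        training,
        key=lambda name: mean_ks_over_samples(segment_pixels, training[name], **kwargs),
    )


# Hypothetical single-band pixel values.
training = {"grass": [10, 11, 12, 13, 14] * 4, "road": [100, 101, 102] * 4}
segment = [10 + i % 5 for i in range(40)]
print(classify_segment(segment, training))
```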

  13. An Efficient Method of Vibration Diagnostics For Rotating Machinery Using a Decision Tree

    Directory of Open Access Journals (Sweden)

    Bo Suk Yang

    2000-01-01

    This paper describes an efficient method to automate vibration diagnosis for rotating machinery using a decision tree, which is applicable to vibration diagnosis expert systems. The decision tree is a widely known formalism for expressing classification knowledge and has been used successfully in many diverse areas such as character recognition, medical diagnosis, and expert systems. In order to build a decision tree for vibration diagnosis, we have to define classes and attributes. A set of cases based on past experiences is also needed. This training set is inducted using a result-cause matrix newly developed in the present work instead of using a conventionally implemented cause-result matrix. This method was applied to diagnostics for various cases taken from published work. It is found that the present method predicts causes of the abnormal vibration for test cases with high reliability.

  14. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    Science.gov (United States)

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    Fraudulent financial statements are an increasingly serious problem, so establishing a valid model for forecasting them has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models and compares them. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification accuracy, 85.71%.

  15. Combined Kernel-Based BDT-SMO Classification of Hyperspectral Fused Images

    Directory of Open Access Journals (Sweden)

    Fenghua Huang

    2014-01-01

    To solve the poor generalization and flexibility problems that single kernel SVM classifiers have while classifying combined spectral and spatial features, this paper proposed a solution to improve the classification accuracy and efficiency of hyperspectral fused images: (1) different radial basis kernel functions (RBFs) are employed for spectral and textural features, and a new combined radial basis kernel function (CRBF) is proposed by combining them in a weighted manner; (2) the binary decision tree-based multiclass SMO (BDT-SMO) is used in the classification of hyperspectral fused images; (3) experiments are carried out, where the single radial basis function (SRBF)-based BDT-SMO classifier and the CRBF-based BDT-SMO classifier are used, respectively, to classify the land usages of hyperspectral fused images, and genetic algorithms (GA) are used to optimize the kernel parameters of the classifiers. The results show that, compared with SRBF, CRBF-based BDT-SMO classifiers display greater classification accuracy and efficiency.
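    Item (1) above describes combining separate RBF kernels for spectral and textural features in a weighted manner. One plausible reading, sketched below, is a convex combination of the two kernels; the abstract does not give the exact form, so the formula, the function names and all parameter values here are assumptions. A convex combination of valid kernels is itself a valid kernel, which is why it can be used with an SVM solver.

```python
import math


def rbf(x, y, gamma):
    """Radial basis kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def combined_rbf(x, y, n_spectral, gamma_spectral, gamma_textural, weight):
    """Weighted sum of an RBF on the spectral part and an RBF on the textural part.

    The first n_spectral entries of each vector are treated as spectral features,
    the remainder as textural features; weight lies in [0, 1].
    """
    spectral = rbf(x[:n_spectral], y[:n_spectral], gamma_spectral)
    textural = rbf(x[n_spectral:], y[n_spectral:], gamma_textural)
    return weight * spectral + (1.0 - weight) * textural


# Hypothetical stacked feature vectors: three spectral values, two textural values.
p = [0.2, 0.4, 0.1, 3.0, 5.0]
q = [0.3, 0.1, 0.2, 2.0, 7.0]
k_pq = combined_rbf(p, q, 3, gamma_spectral=1.0, gamma_textural=0.1, weight=0.6)
print(round(k_pq, 4))
```

    In the paper the kernel parameters are tuned with a genetic algorithm; in this sketch the two gamma values and the weight are simply fixed by hand.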

  16. Photometric classification of type Ia supernovae in the SuperNova Legacy Survey with supervised learning

    Energy Technology Data Exchange (ETDEWEB)

    Möller, A. [Research School of Astronomy and Astrophysics, Australian National University, Canberra, ACT 2611 (Australia)]; Ruhlmann-Kleider, V.; Leloup, C.; Neveu, J.; Palanque-Delabrouille, N.; Rich, J. [Irfu, SPP, CEA Saclay, F-91191 Gif sur Yvette Cedex (France)]; Carlberg, R. [Department of Astronomy and Astrophysics, University of Toronto, 50 St. George Street, Toronto, ON M5S 3H8 (Canada)]; Lidman, C. [Australian Astronomical Observatory, North Ryde, NSW 2113 (Australia)]; Pritchet, C. [Department of Physics and Astronomy, University of Victoria, P.O. Box 3055, Victoria, BC V8W 3P6 (Canada)]

    2016-12-01

    In the era of large astronomical surveys, photometric classification of supernovae (SNe) has become an important research field due to limited spectroscopic resources for candidate follow-up and classification. In this work, we present a method to photometrically classify type Ia supernovae based on machine learning with redshifts that are derived from the SN light-curves. This method is implemented on real data from the SNLS deferred pipeline, a purely photometric pipeline that identifies SNe Ia at high redshifts (0.2 < z < 1.1). Our method consists of two stages: feature extraction (obtaining the SN redshift from photometry and estimating light-curve shape parameters) and machine learning classification. We study the performance of different algorithms such as Random Forest and Boosted Decision Trees. We evaluate the performance using SN simulations and real data from the first 3 years of the Supernova Legacy Survey (SNLS), which contains large spectroscopically and photometrically classified type Ia samples. Using the Area Under the Curve (AUC) metric, where perfect classification is given by 1, we find that our best-performing classifier (Extreme Gradient Boosting Decision Tree) has an AUC of 0.98. We show that it is possible to obtain a large photometrically selected type Ia SN sample with an estimated contamination of less than 5%. When applied to data from the first three years of SNLS, we obtain 529 events. We investigate the differences between classifying simulated SNe and real SN survey data. In particular, we find that applying a thorough set of selection cuts to the SN sample is essential for good classification. This work demonstrates for the first time the feasibility of machine learning classification in a high-z SN survey with application to real SN data.

  17. Photometric classification of type Ia supernovae in the SuperNova Legacy Survey with supervised learning

    International Nuclear Information System (INIS)

    Möller, A.; Ruhlmann-Kleider, V.; Leloup, C.; Neveu, J.; Palanque-Delabrouille, N.; Rich, J.; Carlberg, R.; Lidman, C.; Pritchet, C.

    2016-01-01

    In the era of large astronomical surveys, photometric classification of supernovae (SNe) has become an important research field due to limited spectroscopic resources for candidate follow-up and classification. In this work, we present a method to photometrically classify type Ia supernovae based on machine learning with redshifts that are derived from the SN light-curves. This method is implemented on real data from the SNLS deferred pipeline, a purely photometric pipeline that identifies SNe Ia at high redshifts (0.2 < z < 1.1). Our method consists of two stages: feature extraction (obtaining the SN redshift from photometry and estimating light-curve shape parameters) and machine learning classification. We study the performance of different algorithms such as Random Forest and Boosted Decision Trees. We evaluate the performance using SN simulations and real data from the first 3 years of the Supernova Legacy Survey (SNLS), which contains large spectroscopically and photometrically classified type Ia samples. Using the Area Under the Curve (AUC) metric, where perfect classification is given by 1, we find that our best-performing classifier (Extreme Gradient Boosting Decision Tree) has an AUC of 0.98. We show that it is possible to obtain a large photometrically selected type Ia SN sample with an estimated contamination of less than 5%. When applied to data from the first three years of SNLS, we obtain 529 events. We investigate the differences between classifying simulated SNe and real SN survey data. In particular, we find that applying a thorough set of selection cuts to the SN sample is essential for good classification. This work demonstrates for the first time the feasibility of machine learning classification in a high-z SN survey with application to real SN data.

  18. [Mahalanobis distance based hyperspectral characteristic discrimination of leaves of different desert tree species].

    Science.gov (United States)

    Lin, Hai-jun; Zhang, Hui-fang; Gao, Ya-qi; Li, Xia; Yang, Fan; Zhou, Yan-fei

    2014-12-01

    The hyperspectral reflectance of Populus euphratica, Tamarix hispida, Haloxylon ammodendron and Calligonum mongolicum in the lower reaches of Tarim River and Turpan Desert Botanical Garden was measured using the HR-768 field-portable spectroradiometer. Continuum removal, first derivative reflectance and second derivative reflectance were used to transform the original spectral data of the four tree species. The Mahalanobis distance method was used to select the bands with significant differences in the original and transformed spectral data in order to identify the different tree species. Progressive discrimination analysis was used to test the bands selected for identifying the tree species. The results showed that the Mahalanobis distance method was effective for feature band extraction. The bands for identifying different tree species were mostly near-infrared bands. The recognition accuracy of the four methods was 85%, 93.8%, 92.4% and 95.5%, respectively. Spectral transformation could improve the recognition accuracy. The recognition accuracy differed among research objects and among spectral transformation methods. The research provides evidence for desert tree species classification, biodiversity monitoring and area analysis in deserts using large-scale remote sensing.
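    For readers unfamiliar with the measure, the Mahalanobis distance scales the difference between a sample and a class mean by the class covariance, so bands with high natural variability count for less. The sketch below handles only two bands, using the closed-form inverse of a 2 x 2 covariance matrix; the function name and the numbers are hypothetical and are not taken from the study.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-band sample from a class mean.

    cov is a symmetric 2 x 2 covariance matrix [[a, b], [b, c]].
    """
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    det = a * c - b * b
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # (x - mean)^T * inverse(cov) * (x - mean), written out for the 2 x 2 case.
    squared = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(squared)


# Hypothetical reflectance values in two bands for one leaf spectrum and one class.
sample = (0.42, 0.55)
class_mean = (0.40, 0.50)
class_cov = [[0.0004, 0.0001], [0.0001, 0.0009]]
print(round(mahalanobis_2d(sample, class_mean, class_cov), 3))
```

    With an identity covariance matrix the measure reduces to the ordinary Euclidean distance.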

  19. Decision tree analysis as a supplementary tool to enhance histomorphological differentiation when distinguishing human from non-human cranial bone in both burnt and unburnt states: A feasibility study.

    Science.gov (United States)

    Simmons, T; Goodburn, B; Singhrao, S K

    2016-01-01

    This feasibility study was undertaken to describe and record the histological characteristics of burnt and unburnt cranial bone fragments from human and non-human bones. Reference series of fully mineralized, transverse sections of cranial bone, from all variables and specimen states, were prepared by manual cutting and semi-automated grinding and polishing methods. A photomicrograph catalogue reflecting differences in burnt and unburnt bone from human and non-humans was recorded and qualitative analysis was performed using an established classification system based on primary bone characteristics. The histomorphology associated with human and non-human samples was, for the main part, preserved following burning at high temperature. Clearly, fibro-lamellar complex tissue subtypes, such as plexiform or laminar primary bone, were only present in non-human bones. A decision tree analysis based on histological features provided a definitive identification key for distinguishing human from non-human bone, with an accuracy of 100%. The decision tree for samples where burning was unknown was 96% accurate, and multi-step classification to taxon was possible with 100% accuracy. The results of this feasibility study strongly suggest that histology remains a viable alternative technique if fragments of cranial bone require forensic examination in both burnt and unburnt states. The decision tree analysis may provide an additional but vital tool to enhance data interpretation. Further studies are needed to assess variation in histomorphology taking into account other cranial bones, ontogeny, species and burning conditions. © The Author(s) 2015.

  20. Reconstructing Unrooted Phylogenetic Trees from Symbolic Ternary Metrics.

    Science.gov (United States)

    Grünewald, Stefan; Long, Yangjing; Wu, Yaokun

    2018-03-09

    Böcker and Dress (Adv Math 138:105-125, 1998) presented a 1-to-1 correspondence between symbolically dated rooted trees and symbolic ultrametrics. We consider the corresponding problem for unrooted trees. More precisely, given a tree T with leaf set X and a proper vertex coloring of its interior vertices, we can map every triple of three different leaves to the color of its median vertex. We characterize all ternary maps that can be obtained in this way in terms of 4- and 5-point conditions, and we show that the corresponding tree and its coloring can be reconstructed from a ternary map that satisfies those conditions. Further, we give an additional condition that characterizes whether the tree is binary, and we describe an algorithm that reconstructs general trees in a bottom-up fashion.
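
    The map from leaf triples to the color of their median vertex can be sketched as follows. In a tree, the median of three leaves is the unique vertex lying on all three pairwise paths. The adjacency-dictionary representation, the function names and the four-leaf example are assumptions for illustration; this is not the reconstruction algorithm of the paper, only the forward map it starts from.

```python
from collections import deque
from itertools import combinations


def path(tree, start, goal):
    """Vertices on the unique start-goal path of a tree (breadth-first search)."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in tree[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    walk = []
    node = goal
    while node is not None:
        walk.append(node)
        node = parent[node]
    return walk[::-1]


def median(tree, a, b, c):
    """The single vertex shared by the three pairwise paths among a, b and c."""
    common = set(path(tree, a, b)) & set(path(tree, a, c)) & set(path(tree, b, c))
    (m,) = common  # exactly one vertex in a tree
    return m


def ternary_map(tree, color, leaves):
    """Map every triple of distinct leaves to the color of its median vertex."""
    return {
        frozenset(t): color[median(tree, *t)]
        for t in combinations(leaves, 3)
    }


# Hypothetical example: leaves a, b hang off u; leaves c, d hang off v; u and v are adjacent.
tree = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
    "u": ["a", "b", "v"], "v": ["c", "d", "u"],
}
color = {"u": "red", "v": "blue"}  # proper coloring of the interior vertices
delta = ternary_map(tree, color, ["a", "b", "c", "d"])
```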

  1. Generalising tree traversals and tree transformations to DAGs

    DEFF Research Database (Denmark)

    Bahr, Patrick; Axelsson, Emil

    2017-01-01

    We present a recursion scheme based on attribute grammars that can be transparently applied to trees and acyclic graphs. Our recursion scheme allows the programmer to implement a tree traversal or a tree transformation and then apply it to compact graph representations of trees instead...... as the complementing theory with a number of examples....

  2. Remote sensing survey of Chinese tallow tree in the Toledo Bend Reservoir area, Louisiana and Texas

    Science.gov (United States)

    Ramsey, Elijah W.; Rangoonwala, Amina; Bannister, Terri; Suzuoki, Yukihiro

    2013-01-01

    We applied Hyperion sensor satellite data acquired by the National Aeronautics and Space Administration’s Earth Observing-1 (EO-1) satellite in conjunction with reconnaissance surveys to map the occurrences of the invasive Chinese tallow tree (Triadica sebifera) in the Toledo Bend Reservoir study area of northwestern Louisiana and northeastern Texas. The rationale for application of high spectral resolution EO-1 Hyperion data was based on the successful use of Hyperion data in the mapping of Chinese tallow tree in southwestern Louisiana in 2005. In contrast to the single Hyperion image used in the 2005 project, more than 20 EO-1 Hyperion and Advanced Land Imager (ALI) images of the study area were collected in 2009 and 2010 during the fall senescence when Chinese tallow tree leaves turn red. Atmospherically corrected reflectance spectra of Hyperion imagery collected at ground and aerial observation locations provided the input datasets used in the program for spectral discrimination analysis. Discrimination analysis was used to identify spectral indicator sets to best explain variance contained in the input databases. The expectation was that at least one set of Hyperion-based indicator spectra would uniquely identify occurrences of red-leaf Chinese tallow tree; however, no combination of Hyperion-based reflectance datasets produced a unique identifier. The inability to discover a unique spectral indicator resulted primarily from relatively sparse coverage by red-leaf Chinese tallow tree within the study area (percentage of coverage was less than 5 percent per 30- by 30-meter Hyperion pixel). To enhance the performance of the spectral discrimination analysis, leaf and canopy spectra of Chinese tallow tree were added to the input datasets to guide the indicator selection. In addition, input databases were segregated by land class obtained from an ALI-based landcover classification in order to reduce the input variance and to promote spectral discrimination of red

  3. Rooting phylogenetic trees under the coalescent model using site pattern probabilities.

    Science.gov (United States)

    Tian, Yuan; Kubatko, Laura

    2017-12-19

    Phylogenetic tree inference is a fundamental tool to estimate ancestor-descendant relationships among different species. In phylogenetic studies, identification of the root - the most recent common ancestor of all sampled organisms - is essential for complete understanding of the evolutionary relationships. Rooted trees benefit most downstream application of phylogenies such as species classification or study of adaptation. Often, trees can be rooted by using outgroups, which are species that are known to be more distantly related to the sampled organisms than any other species in the phylogeny. However, outgroups are not always available in evolutionary research. In this study, we develop a new method for rooting species tree under the coalescent model, by developing a series of hypothesis tests for rooting quartet phylogenies using site pattern probabilities. The power of this method is examined by simulation studies and by application to an empirical North American rattlesnake data set. The method shows high accuracy across the simulation conditions considered, and performs well for the rattlesnake data. Thus, it provides a computationally efficient way to accurately root species-level phylogenies that incorporates the coalescent process. The method is robust to variation in substitution model, but is sensitive to the assumption of a molecular clock. Our study establishes a computationally practical method for rooting species trees that is more efficient than traditional methods. The method will benefit numerous evolutionary studies that require rooting a phylogenetic tree without having to specify outgroups.

  4. Estimating Leaf Water Potential of Giant Sequoia Trees from Airborne Hyperspectral Imagery

    Science.gov (United States)

    Francis, E. J.; Asner, G. P.

    2015-12-01

    Recent drought-induced forest dieback events have motivated research on the mechanisms of tree survival and mortality during drought. Leaf water potential, a measure of the force exerted by the evaporation of water from the leaf surface, is an indicator of plant water stress and can help predict tree mortality in response to drought. Scientists have traditionally measured water potentials on a tree-by-tree basis, but have not been able to produce maps of tree water potential at the scale of a whole forest, leaving forest managers unaware of forest drought stress patterns and their ecosystem-level consequences. Imaging spectroscopy, a technique for remote measurement of chemical properties, has been used to successfully estimate leaf water potentials in wheat and maize crops and pinyon-pine and juniper trees, but these estimates have never been scaled to the canopy level. We used hyperspectral reflectance data collected by the Carnegie Airborne Observatory (CAO) to map leaf water potentials of giant sequoia trees (Sequoiadendron giganteum) in an 800-hectare grove in Sequoia National Park. During the current severe drought in California, we measured predawn and midday leaf water potentials of 48 giant sequoia trees, using the pressure bomb method on treetop foliage samples collected with tree-climbing techniques. The CAO collected hyperspectral reflectance data at 1-meter resolution from the same grove within 1-2 weeks of the tree-level measurements. A partial least squares regression was used to correlate reflectance data extracted from the 48 focal trees with their water potentials, producing a model that predicts water potential of giant sequoia trees. 
    Results show that giant sequoia trees can be mapped in the imagery with a classification accuracy of 0.94, and we predicted the water potential of the mapped trees to assess 1) similarities and differences between a leaf water potential map and a canopy water content map produced from airborne hyperspectral data, 2

  5. Improving Oil Palm Classification in the Peruvian Amazon by Combining Active and Passive Remote Sensing Data

    Science.gov (United States)

    Gutierrez-Velez, V. H.; DeFries, R. S.

    2011-12-01

    Oil palm expansion has led to clearing of extensive forest areas in the tropics. However, quantitative assessments of the contribution of oil palm expansion to deforestation have been challenging due in large part to the limitations presented by conventional optical data sets for discriminating plantations from forests and other tree cover vegetation. Recently available information from active remote sensors has opened the possibility of using these data sources to overcome these limitations. The purpose of this analysis is to evaluate the accuracy of oil palm classification when using ALOS/PALSAR active satellite data in conjunction with Landsat information, compared to the use of Landsat data only. The analysis takes place in a focused region around the city of Pucallpa in the Ucayali province of the Peruvian Amazon for the year 2010. Oil palm plantations were separated into five categories consisting of four age classes (0-3, 3-5, 5-10 and > 10 yrs) and an additional class accounting for degraded plantations older than 15 yr. Other land covers were water bodies, unvegetated land, short and tall grass, fallow, secondary vegetation, and forest. Classifications were performed using random forests. Training points for calibration and validation consisted of 411 polygons measured in areas representative of the land covers of interest and totaled 6,367 ha. Overall classification accuracy increased from 89.9% using only Landsat data sets to 94.3% using both Landsat and ALOS/PALSAR. Both user's and producer's accuracy increased in all classes when using both data sets except for producer's accuracy in short grass, which decreased by 1%. The largest increase in user's accuracy was obtained in oil palm plantations older than 10 years, from 62 to 80%, while producer's accuracy improved the most in plantations in age class 3-5, from 63 to 80%. Results demonstrate the suitability of data from ALOS/PALSAR and other active remote sensors to improve classification of oil palm

  6. Schistosomiasis risk mapping in the state of Minas Gerais, Brazil, using a decision tree approach, remote sensing data and sociological indicators

    Directory of Open Access Journals (Sweden)

    Flávia T Martins-Bedê

    2010-07-01

    Schistosomiasis mansoni is not just a physical disease, but is related to social and behavioural factors as well. Snails of the Biomphalaria genus are an intermediate host for Schistosoma mansoni and infect humans through water. The objective of this study is to classify the risk of schistosomiasis in the state of Minas Gerais (MG). We focus on socioeconomic and demographic features, basic sanitation features, the presence of accumulated water bodies, dense vegetation in the summer and winter seasons and related terrain characteristics. We draw on the decision tree approach to infection risk modelling and mapping. The model robustness was properly verified. The main variables that were selected by the procedure included the terrain's water accumulation capacity, temperature extremes and the Human Development Index. In addition, the model was used to generate two maps, one that included risk classification for the entirety of MG and another that included classification errors. The resulting map was 62.9% accurate.

  7. Explicit area-based accuracy assessment for mangrove tree crown delineation using Geographic Object-Based Image Analysis (GEOBIA)

    Science.gov (United States)

    Kamal, Muhammad; Johansen, Kasper

    2017-10-01

    Effective mangrove management requires spatially explicit information in the form of a mangrove tree crown map as a basis for ecosystem diversity study and health assessment. Accuracy assessment is an integral part of any mapping activity to measure the effectiveness of the classification approach. In geographic object-based image analysis (GEOBIA) the assessment of the geometric accuracy (shape, symmetry and location) of the image objects created by image segmentation is required. In this study we used an explicit area-based accuracy assessment to measure the degree of similarity between the results of the classification and reference data from different aspects, including overall quality (OQ), user's accuracy (UA), producer's accuracy (PA) and overall accuracy (OA). We developed a rule set to delineate the mangrove tree crowns using a WorldView-2 pan-sharpened image. The reference map was obtained by visual delineation of the mangrove tree crown boundaries from a very high-spatial resolution aerial photograph (7.5 cm pixel size). Ten random points with a 10 m radius circular buffer were created to calculate the area-based accuracy assessment. The resulting circular polygons were used to clip both the classified image objects and the reference map for area comparisons. In this case, the area-based accuracy assessment resulted in 64% and 68% for the OQ and OA, respectively. The overall quality expresses the class-related area accuracy: the area correctly classified as tree crowns was 64% of the total area of tree crowns. On the other hand, the overall accuracy of 68% was calculated as the percentage of all correctly classified classes (tree crowns and canopy gaps) in comparison to the total class area (an entire image). Overall, the area-based accuracy assessment was simple to implement and easy to interpret. It also shows explicitly the omission and commission error variations of object boundary delineation with colour coded polygons.
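
    The four area-based measures named in this abstract can be sketched with commonly used definitions. These formulas are an assumption, since the abstract does not state them: UA and PA divide the correctly classified crown area by the classified and reference crown areas respectively, OQ divides it by the union of both, and OA adds the correctly classified gap area and divides by the total area. The function name and the input areas are hypothetical.

```python
def area_accuracy(crown_correct, crown_commission, crown_omission, gap_correct):
    """Area-based accuracy measures for a two-class map (tree crown vs canopy gap).

    crown_correct:    area classified as crown that is crown in the reference
    crown_commission: area classified as crown that is gap in the reference
    crown_omission:   area that is crown in the reference but classified as gap
    gap_correct:      area classified as gap that is gap in the reference
    All areas share one unit (e.g. square metres).
    """
    classified_crown = crown_correct + crown_commission
    reference_crown = crown_correct + crown_omission
    union_crown = crown_correct + crown_commission + crown_omission
    total = union_crown + gap_correct
    return {
        "UA": crown_correct / classified_crown,   # user's accuracy
        "PA": crown_correct / reference_crown,    # producer's accuracy
        "OQ": crown_correct / union_crown,        # overall quality (assumed definition)
        "OA": (crown_correct + gap_correct) / total,  # overall accuracy
    }


# Hypothetical areas, not taken from the study.
result = area_accuracy(
    crown_correct=60.0, crown_commission=20.0, crown_omission=20.0, gap_correct=100.0
)
```

    Under these definitions OQ can never exceed UA or PA, because its denominator includes both the commission and the omission areas.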

  8. Surface tree languages and parallel derivation trees

    NARCIS (Netherlands)

    Engelfriet, Joost

    1976-01-01

    The surface tree languages obtained by top-down finite state transformation of monadic trees are exactly the frontier-preserving homomorphic images of sets of derivation trees of ETOL systems. The corresponding class of tree transformation languages is therefore equal to the class of ETOL languages.

  9. Preventing KPI Violations in Business Processes based on Decision Tree Learning and Proactive Runtime Adaptation

    Directory of Open Access Journals (Sweden)

    Dimka Karastoyanova

    2012-01-01

    The performance of business processes is measured and monitored in terms of Key Performance Indicators (KPIs). If the monitoring results show that the KPI targets are violated, the underlying reasons have to be identified and the process should be adapted accordingly to address the violations. In this paper we propose an integrated monitoring, prediction and adaptation approach for preventing KPI violations of business process instances. KPIs are monitored continuously while the process is executed. Additionally, based on KPI measurements of historical process instances we use decision tree learning to construct classification models which are then used to predict the KPI value of an instance while it is still running. If a KPI violation is predicted, we identify adaptation requirements and adaptation strategies in order to prevent the violation.

  10. Up in the Tree – The Overlooked Richness of Bryophytes and Lichens in Tree Crowns

    Science.gov (United States)

    Boch, Steffen; Müller, Jörg; Prati, Daniel; Blaser, Stefan; Fischer, Markus

    2013-01-01

    Assessing diversity is among the major tasks in ecology and conservation science. In ecological and conservation studies, epiphytic cryptogams are usually sampled up to accessible heights in forests. Thus, their diversity, especially of canopy specialists, likely is underestimated. If the proportion of those species differs among forest types, plot-based diversity assessments are biased and may result in misleading conservation recommendations. We sampled bryophytes and lichens in 30 forest plots of 20 m × 20 m in three German regions, considering all substrates, and including epiphytic litter fall. First, the sampling of epiphytic species was restricted to the lower 2 m of trees and shrubs. Then, on one representative tree per plot, we additionally recorded epiphytic species in the crown, using tree climbing techniques. Per tree, on average 54% of lichen and 20% of bryophyte species were overlooked if the crown was not included. After sampling all substrates per plot, including the bark of all shrubs and trees, still 38% of the lichen and 4% of the bryophyte species were overlooked if the tree crown of the sampled tree was not included. The number of overlooked lichen species varied strongly among regions. Furthermore, the number of overlooked bryophyte and lichen species per plot was higher in European beech than in coniferous stands and increased with increasing diameter at breast height of the sampled tree. Thus, our results indicate a bias of comparative studies which might have led to misleading conservation recommendations of plot-based diversity assessments. PMID:24358373

  11. Dynamics in small worlds of tree topologies of wireless sensor networks

    DEFF Research Database (Denmark)

    Li, Qiao; Zhang, Baihai; Fan, Zhun

    2012-01-01

    Tree topologies, which construct spatial graphs with large characteristic path lengths and small clustering coefficients, are ubiquitous in deployments of wireless sensor networks. Small worlds are investigated in tree-based networks. Due to link additions, characteristic path lengths reduce...... rapidly and clustering coefficients increase greatly. A tree abstract, Cayley tree, is considered for the study of the navigation algorithm, which runs automatically in the small worlds of tree-based networks. In the further study, epidemics in the small worlds of tree-based wireless sensor networks...

  12. Determination of the ecological connectivity between landscape patches obtained using the knowledge engineer (expert) classification technique

    Science.gov (United States)

    Selim, Serdar; Sonmez, Namik Kemal; Onur, Isin; Coslu, Mesut

    2017-10-01

    Connecting similar landscape patches with ecological corridors supports the habitat quality of these patches, increases urban ecological quality, and constitutes an important living and expansion area for wildlife. Furthermore, habitat connectivity provided by urban green areas supports biodiversity in urban areas. In this study, possible ecological connections between landscape patches were obtained using the Expert classification technique and modeled with the probabilistic connection index. Firstly, the reflectance responses of plants in various bands are used as data in hypotheses. One of the important features of this method is that more than one image can be used at the same time in forming a hypothesis. For this reason, the base images are prepared before the Expert classification is applied. In addition to the main image, the hypothesis conditions were also created for each class with the NDVI image, which is commonly used in vegetation research. The results of a previously conducted supervised classification were also taken into account. We applied this classification method using the raster imagery with user-defined variables. Then, to provide ecological connections for the tree cover obtained from the classification, we used the Probabilistic Connection (PC) index. The probabilistic connection model, which is used in landscape planning and conservation studies to detect and prioritize critical areas for ecological connection, characterizes the possibility of direct connection between habitats. As a result, we obtained over 90% total accuracy in the accuracy assessment analysis. We provided ecological connections with the PC index and created an inter-connected system of green spaces. Thus, we proposed and implemented a green infrastructure system model, a topic that has been on the agenda in recent years.

  13. Automatic classification of blank substrate defects

    Science.gov (United States)

    Boettiger, Tom; Buck, Peter; Paninjath, Sankaranarayanan; Pereira, Mark; Ronald, Rob; Rost, Dan; Samir, Bhamidipati

    2014-10-01

    Mask preparation stages are crucial in mask manufacturing, since this mask is to later act as a template for a considerable number of dies on wafer. Defects on the initial blank substrate, and subsequent cleaned and coated substrates, can have a profound impact on the usability of the finished mask. This emphasizes the need for early and accurate identification of blank substrate defects and the risk they pose to the patterned reticle. While Automatic Defect Classification (ADC) is a well-developed technology for inspection and analysis of defects on patterned wafers and masks in the semiconductor industry, ADC for mask blanks is still in the early stages of adoption and development. Calibre ADC is a powerful analysis tool for fast, accurate, consistent and automatic classification of defects on mask blanks. Accurate, automated classification of mask blanks leads to better usability of blanks by enabling defect avoidance technologies during mask writing. Detailed information on blank defects can help to select appropriate job-decks to be written on the mask by defect avoidance tools [1][4][5]. Smart algorithms separate critical defects from the potentially large number of non-critical defects or false defects detected at various stages during mask blank preparation. Mechanisms used by Calibre ADC to identify and characterize defects include defect location and size, signal polarity (dark, bright) in both transmitted and reflected review images, and distinguishing defect signals from background noise in defect images. The Calibre ADC engine then uses a decision tree to translate this information into a defect classification code. Using this automated process improves classification accuracy, repeatability and speed, while avoiding the subjectivity of human judgment compared to the alternative of manual defect classification by trained personnel [2]. This paper focuses on the results from the evaluation of Automatic Defect Classification (ADC) product at MP Mask

  14. TreePics: visualizing trees with pictures

    Directory of Open Access Journals (Sweden)

    Nicolas Puillandre

    2017-09-01

    While many programs are available to edit phylogenetic trees, associating pictures with branch tips in an efficient and automatic way is not an available option. Here, we present TreePics, a standalone software that uses a web browser to visualize phylogenetic trees in Newick format and that associates pictures (typically, pictures of the voucher specimens) to the tip of each branch. Pictures are visualized as thumbnails and can be enlarged by a mouse rollover. Further, several pictures can be selected and displayed in a separate window for visual comparison. TreePics works either online or in a full standalone version, where it can display trees with several thousands of pictures (depending on the memory available). We argue that TreePics can be particularly useful in a preliminary stage of research, such as to quickly detect conflicts between a DNA-based phylogenetic tree and morphological variation, that may be due to contamination that needs to be removed prior to final analyses, or the presence of species complexes.

  15. A Method to Quantify Plant Availability and Initiating Event Frequency Using a Large Event Tree, Small Fault Tree Model

    International Nuclear Information System (INIS)

    Kee, Ernest J.; Sun, Alice; Rodgers, Shawn; Popova, ElmiraV; Nelson, Paul; Moiseytseva, Vera; Wang, Eric

    2006-01-01

    South Texas Project uses a large fault tree to produce scenarios (minimal cut sets) used in quantification of plant availability and event frequency predictions. On the other hand, the South Texas Project probabilistic risk assessment model uses a large event tree, small fault tree for quantifying core damage and radioactive release frequency predictions. The South Texas Project is converting its availability and event frequency model to use a large event tree, small fault tree model in an effort to streamline application support and to provide additional detail in results. The availability and event frequency model as well as the applications it supports (maintenance and operational risk management, system engineering health assessment, preventive maintenance optimization, and RIAM) are briefly described. A methodology to perform availability modeling in a large event tree, small fault tree framework is described in detail. How the methodology can be used to support South Texas Project maintenance and operations risk management is described in detail. Differences with other fault tree methods and other recently proposed methods are discussed in detail. While the methods described are novel to the South Texas Project Risk Management program and to large event tree, small fault tree models, concepts in the area of application support and availability modeling have wider applicability to the industry. (authors)

  16. The View from the Trees: Nocturnal Bull Ants, Myrmecia midas, Use the Surrounding Panorama While Descending from Trees.

    Science.gov (United States)

    Freas, Cody A; Wystrach, Antione; Narendra, Ajay; Cheng, Ken

    2018-01-01

    Solitary foraging ants commonly use visual cues from their environment for navigation. Foragers are known to store visual scenes from the surrounding panorama for later guidance to known resources and to return successfully back to the nest. Several ant species travel not only on the ground, but also climb trees to locate resources. The navigational information that guides animals back home during their descent, while their body is perpendicular to the ground, is largely unknown. Here, we investigate in a nocturnal ant, Myrmecia midas, whether foragers travelling down a tree use visual information to return home. These ants establish nests at the base of a tree on which they forage and in addition, they also forage on nearby trees. We collected foragers and placed them on the trunk of the nest tree or a foraging tree in multiple compass directions. Regardless of the displacement location, upon release ants immediately moved to the side of the trunk facing the nest during their descent. When ants were released on non-foraging trees near the nest, displaced foragers again travelled around the tree to the side facing the nest. All the displaced foragers reached the correct side of the tree well before reaching the ground. However, when the terrestrial cues around the tree were blocked, foragers were unable to orient correctly, suggesting that the surrounding panorama is critical to successful orientation on the tree. Through analysis of panoramic pictures, we show that views acquired at the base of the foraging tree nest can provide reliable nest-ward orientation up to 1.75 m above the ground. We discuss how animals descending from trees compare their current scene to a memorised scene and report on the similarities in visually guided behaviour while navigating on the ground and descending from trees.

  17. The View from the Trees: Nocturnal Bull Ants, Myrmecia midas, Use the Surrounding Panorama While Descending from Trees

    Directory of Open Access Journals (Sweden)

    Cody A. Freas

    2018-01-01

    Solitary foraging ants commonly use visual cues from their environment for navigation. Foragers are known to store visual scenes from the surrounding panorama for later guidance to known resources and to return successfully back to the nest. Several ant species travel not only on the ground, but also climb trees to locate resources. The navigational information that guides animals back home during their descent, while their body is perpendicular to the ground, is largely unknown. Here, we investigate in a nocturnal ant, Myrmecia midas, whether foragers travelling down a tree use visual information to return home. These ants establish nests at the base of a tree on which they forage and in addition, they also forage on nearby trees. We collected foragers and placed them on the trunk of the nest tree or a foraging tree in multiple compass directions. Regardless of the displacement location, upon release ants immediately moved to the side of the trunk facing the nest during their descent. When ants were released on non-foraging trees near the nest, displaced foragers again travelled around the tree to the side facing the nest. All the displaced foragers reached the correct side of the tree well before reaching the ground. However, when the terrestrial cues around the tree were blocked, foragers were unable to orient correctly, suggesting that the surrounding panorama is critical to successful orientation on the tree. Through analysis of panoramic pictures, we show that views acquired at the base of the foraging tree nest can provide reliable nest-ward orientation up to 1.75 m above the ground. We discuss how animals descending from trees compare their current scene to a memorised scene and report on the similarities in visually guided behaviour while navigating on the ground and descending from trees.

  18. A Pruning Neural Network Model in Credit Classification Analysis

    Directory of Open Access Journals (Sweden)

    Yajiao Tang

    2018-01-01

    Nowadays, credit classification models are widely applied because they can help financial decision-makers to handle credit classification issues. Among them, artificial neural networks (ANNs) have been widely accepted as convincing methods in the credit industry. In this paper, we propose a pruning neural network (PNN) and apply it to solve the credit classification problem using the well-known Australian and Japanese credit datasets. The model is inspired by the synaptic nonlinearity of a dendritic tree in a biological neural model and is trained by an error back-propagation algorithm. The model is capable of realizing a neuronal pruning function by removing the superfluous synapses and useless dendrites, and it forms a tidy dendritic morphology at the end of learning. Furthermore, we successfully use logic circuits (LCs) to simulate the dendritic structures, which allows PNN to be implemented effectively in hardware. The statistical results of our experiments have verified that PNN obtains superior performance in comparison with other classical algorithms in terms of accuracy and computational efficiency.

  19. Overfitting Reduction of Text Classification Based on AdaBELM

    Directory of Open Access Journals (Sweden)

    Xiaoyue Feng

    2017-07-01

    Overfitting is an important problem in machine learning. Several algorithms, such as the extreme learning machine (ELM), suffer from this issue when facing high-dimensional sparse data, e.g., in text classification. One common issue is that the extent of overfitting is not well quantified. In this paper, we propose a quantitative measure of overfitting, referred to as the rate of overfitting (RO), and a novel model, named AdaBELM, to reduce overfitting. With RO, the overfitting problem can be quantitatively measured and identified. The newly proposed model can achieve high performance on multi-class text classification. To evaluate the generalizability of the new model, we designed experiments based on three datasets, i.e., the 20 Newsgroups, Reuters-21578, and BioMed corpora, which represent balanced, unbalanced, and real application data, respectively. Experimental results demonstrate that AdaBELM can reduce overfitting and outperform classical ELM, decision tree, random forests, and AdaBoost on all three text-classification datasets; for example, it can achieve 62.2% higher accuracy than ELM. Therefore, the proposed model has good generalizability.

  20. THE IQMULUS URBAN SHOWCASE: AUTOMATIC TREE CLASSIFICATION AND IDENTIFICATION IN HUGE MOBILE MAPPING POINT CLOUDS

    Directory of Open Access Journals (Sweden)

    J. Böhm

    2016-06-01

    Current 3D data capturing, as implemented on, for example, airborne or mobile laser scanning systems, is able to efficiently sample the surface of a city by billions of unselective points during one working day. What is still difficult is to extract and visualize meaningful information hidden in these point clouds with the same efficiency. This is where the FP7 IQmulus project enters the scene. IQmulus is an interactive facility for processing and visualizing big spatial data. In this study the potential of IQmulus is demonstrated on a laser mobile mapping point cloud of 1 billion points sampling ~10 km of street environment in Toulouse, France. After the data is uploaded to the IQmulus Hadoop Distributed File System, a workflow is defined by the user consisting of retiling the data followed by a PCA-driven local dimensionality analysis, which runs efficiently on the IQmulus cloud facility using a Spark implementation. Points scattering in 3 directions are clustered in the tree class, and are next separated into individual trees. Five hours of processing on the 12-node computing cluster results in the automatic identification of 4000+ urban trees. Visualization of the results in the IQmulus fat client helps users to appreciate the results, and developers to identify remaining flaws in the processing workflow.

  1. A COMPARITIVE STUDY USING GEOMETRIC AND VERTICAL PROFILE FEATURES DERIVED FROM AIRBORNE LIDAR FOR CLASSIFYING TREE GENERA

    Directory of Open Access Journals (Sweden)

    C. Ko

    2012-07-01

    We present a comparative study between two different approaches for tree genera classification using descriptors derived from tree geometry and those derived from the vertical profile analysis of LiDAR point data. The different methods provide two perspectives for processing LiDAR point clouds for tree genera identification. The geometric perspective analyzes individual tree crowns in relation to valuable information related to characteristics of clusters and line segments derived within crowns and overall tree shapes to highlight the spatial distribution of LiDAR points within the crown. Conversely, analyzing vertical profiles retrieves information about the point distributions with respect to height percentiles; this perspective emphasizes the importance that point distributions at specific heights express, accommodating the decreased point density with respect to depth of canopy penetration by LiDAR pulses. The targeted species include white birch, maple, oak, poplar, white pine and jack pine at a study site northeast of Sault Ste. Marie, Ontario, Canada.

  2. Photosynthetic and growth response of sugar maple (Acer saccharum Marsh.) mature trees and seedlings to calcium, magnesium, and nitrogen additions in the Catskill Mountains, NY, USA

    Science.gov (United States)

    Momen, Bahram; Behling, Shawna J; Lawrence, Gregory B.; Sullivan, Joseph H

    2015-01-01

    Decline of sugar maple in North American forests has been attributed to changes in soil calcium (Ca) and nitrogen (N) by acidic precipitation. Although N is an essential and usually a limiting factor in forests, atmospheric N deposition may cause N-saturation leading to loss of soil Ca. Such changes can affect carbon gain and growth of sugar maple trees and seedlings. We applied a 2^2 factorial arrangement of N and dolomitic limestone containing Ca and magnesium (Mg) to 12 forest plots in the Catskill Mountain region of NY, USA. To quantify the short-term effects, we measured photosynthetic-light responses of sugar maple mature trees and seedlings two or three times during two summers. We estimated maximum net photosynthesis (An-max) and its related light intensity (PAR at An-max), apparent quantum efficiency (Aqe), and light compensation point (LCP). To quantify the long-term effects, we measured basal area of living mature trees before and 4 and 8 years after treatment applications. Soil and foliar chemistry variables were also measured. Dolomitic limestone increased Ca, Mg, and pH in the soil Oe horizon. Mg was increased in the B horizon when comparing the plots receiving N with those receiving CaMg. In mature trees, foliar Ca and Mg concentrations were higher in the CaMg and N+CaMg plots than in the reference or N plots; foliar Ca concentration was higher in the N+CaMg plots compared with the CaMg plots, foliar Mg was higher in the CaMg plots than the N+CaMg plots; An-max was maximized due to N+CaMg treatment; Aqe decreased by N addition; and PAR at An-max increased by N or CaMg treatments alone, but the increase was maximized by their combination. No treatment effect was detected on basal areas of living mature trees four or eight years after treatment applications. In seedlings, An-max was increased by N+CaMg addition. The reference plots had an open herbaceous layer, but the plots receiving N had a dense monoculture of common woodfern in the

  3. Photosynthetic and Growth Response of Sugar Maple (Acer saccharum Marsh.) Mature Trees and Seedlings to Calcium, Magnesium, and Nitrogen Additions in the Catskill Mountains, NY, USA.

    Science.gov (United States)

    Momen, Bahram; Behling, Shawna J; Lawrence, Greg B; Sullivan, Joseph H

    2015-01-01

    Decline of sugar maple in North American forests has been attributed to changes in soil calcium (Ca) and nitrogen (N) by acidic precipitation. Although N is an essential and usually a limiting factor in forests, atmospheric N deposition may cause N-saturation leading to loss of soil Ca. Such changes can affect carbon gain and growth of sugar maple trees and seedlings. We applied a 2^2 factorial arrangement of N and dolomitic limestone containing Ca and magnesium (Mg) to 12 forest plots in the Catskill Mountain region of NY, USA. To quantify the short-term effects, we measured photosynthetic-light responses of sugar maple mature trees and seedlings two or three times during two summers. We estimated maximum net photosynthesis (An-max) and its related light intensity (PAR at An-max), apparent quantum efficiency (Aqe), and light compensation point (LCP). To quantify the long-term effects, we measured basal area of living mature trees before and 4 and 8 years after treatment applications. Soil and foliar chemistry variables were also measured. Dolomitic limestone increased Ca, Mg, and pH in the soil Oe horizon. Mg was increased in the B horizon when comparing the plots receiving N with those receiving CaMg. In mature trees, foliar Ca and Mg concentrations were higher in the CaMg and N+CaMg plots than in the reference or N plots; foliar Ca concentration was higher in the N+CaMg plots compared with the CaMg plots, foliar Mg was higher in the CaMg plots than the N+CaMg plots; An-max was maximized due to N+CaMg treatment; Aqe decreased by N addition; and PAR at An-max increased by N or CaMg treatments alone, but the increase was maximized by their combination. No treatment effect was detected on basal areas of living mature trees four or eight years after treatment applications. In seedlings, An-max was increased by N+CaMg addition. The reference plots had an open herbaceous layer, but the plots receiving N had a dense monoculture of common woodfern in the forest floor

  4. Beyond Tree Throw: Wind, Water, Rock and the Mechanics of Tree-Driven Bedrock Physical Weathering

    Science.gov (United States)

    Marshall, J. A.; Anderson, R. S.; Dawson, T. E.; Dietrich, W. E.; Minear, J. T.

    2017-12-01

    Tree throw is often invoked as the dominant process in converting bedrock to soil and thus helping to build the Critical Zone (CZ). In addition, observations of tree roots lifting sidewalk slabs, occupying cracks, and prying slabs of rock from cliff faces have led to a general belief in the power of plant growth forces. These common observations have led to conceptual models with trees at the center of the soil genesis process. This is despite the observation that tree throw is rare in many forested settings, and a dearth of field measurements that quantify the magnitude of growth forces. While few trees blow down, every tree grows roots, inserting many tens of percent of its mass below ground. Yet we lack data quantifying the role of trees in both damaging bedrock and detaching it (and thus producing soil). By combining force measurements at the tree-bedrock interface with precipitation, solar radiation, wind speed, and wind-driven tree sway data, we quantified the magnitude and frequency of tree-driven soil-production mechanisms from two contrasting climatic and lithologic regimes (Boulder and Eel Creek CZ Observatories). Preliminary data suggest that in settings with relatively thin soils, trees can damage and detach rock due to diurnal fluctuations, wind response and rainfall events. Surprisingly, our data suggest that forces from roots and trunks growing against bedrock are insufficient to pry rock apart or damage bedrock, although much more work is needed in this area. The frequency, magnitude and style of wind-driven tree forces at the bedrock interface vary considerably from one species to another. This suggests that tree properties such as mass, elasticity, stiffness and branch structure determine whether trees respond to gusts big or small, move at the same frequency as large wind gusts, or are able to self-dampen near-ground sway response to extended wind forces. Our measurements of precipitation-driven and daily fluctuations in root pressures exerted on

  5. A Hierarchical Object-oriented Urban Land Cover Classification Using WorldView-2 Imagery and Airborne LiDAR data

    Science.gov (United States)

    Wu, M. F.; Sun, Z. C.; Yang, B.; Yu, S. S.

    2016-11-01

    In order to reduce the “salt and pepper” effect in pixel-based urban land cover classification and expand the application of multi-source data fusion in urban remote sensing, WorldView-2 imagery and airborne Light Detection and Ranging (LiDAR) data were used to improve the classification of urban land cover. An object-oriented hierarchical classification approach was proposed in our study. The proposed method consisted of two hierarchies. (1) In the first hierarchy, the LiDAR Normalized Digital Surface Model (nDSM) image was segmented into objects. NDVI, Coastal Blue and nDSM thresholds were set for extracting building objects. (2) In the second hierarchy, after removing building objects, WorldView-2 fused imagery was obtained by Haze-ratio-based (HR) fusion and was segmented. An SVM classifier was applied to generate road/parking lot, vegetation and bare soil objects. (3) Trees and grasslands were split based on an nDSM threshold (2.4 m). The results showed that, compared with pixel-based and non-hierarchical object-oriented approaches, the proposed method provided better urban land cover classification performance: the overall accuracy (OA) and overall kappa (OK) improved up to 92.75% and 0.90. Furthermore, the proposed method reduced “salt and pepper” in pixel-based classification, improved the extraction accuracy of buildings based on LiDAR nDSM image segmentation, and reduced the confusion between trees and grasslands by setting an nDSM threshold.
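
    The final step of the hierarchy, splitting vegetation objects by nDSM height, can be sketched as a simple threshold rule. This is only an illustration: the 2.4 m value comes from the abstract, but the function name, the labels, and the choice to treat objects at or above the threshold as trees are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a height-threshold split for vegetation objects.
NDSM_TREE_THRESHOLD_M = 2.4  # threshold quoted in the abstract


def split_vegetation(ndsm_height_m, threshold_m=NDSM_TREE_THRESHOLD_M):
    """Label a vegetation object as 'tree' or 'grassland' by its nDSM height.

    Assumption: heights at or above the threshold count as trees; the
    abstract does not say which side of the threshold is inclusive.
    """
    return "tree" if ndsm_height_m >= threshold_m else "grassland"


# Example: mean nDSM heights (in metres) of three made-up vegetation objects.
labels = [split_vegetation(h) for h in (0.3, 2.4, 7.9)]
```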

  6. A Multi-Classification Method of Improved SVM-based Information Fusion for Traffic Parameters Forecasting

    Directory of Open Access Journals (Sweden)

    Hongzhuan Zhao

    2016-04-01

    With the enrichment of perception methods, the modern transportation system contains many physical objects whose states are influenced by many information factors, making it a typical Cyber-Physical System (CPS). Thus, traffic information is generally multi-sourced, heterogeneous and hierarchical. Existing research shows that multi-sourced traffic information that is accurately classified during information fusion can achieve better parameter forecasting performance. To solve the problem of accurately classifying traffic information, this paper analyses the characteristics of multi-sourced traffic information and uses a redefined binary tree to overcome the shortcomings of the original Support Vector Machine (SVM) classification in information fusion, proposing a multi-classification method using improved SVM in information fusion for traffic parameters forecasting. An experiment was conducted to examine the performance of the proposed scheme, and the results reveal that the method yields more accurate and practical outcomes.

  7. Neighborhood-preserving mapping between trees

    DEFF Research Database (Denmark)

    Baumbach, Jan; Ibragimov, R.; Guo, Jian-Ying

    2013-01-01

    (v)). Here, for a graph G and a vertex v, we use N^i_G(v) to denote the set of vertices which have distance at most i to v in G. We call this problem Neighborhood-Preserving Mapping (NPM). The main result of this paper is a complete dichotomy of the classical complexity of NPM on trees with respect to different...... values of l,d,k. Additionally, we present two dynamic programming algorithms for the case that one of the input trees is a path....

  8. Exploring precrash maneuvers using classification trees and random forests.

    Science.gov (United States)

    Harb, Rami; Yan, Xuedong; Radwan, Essam; Su, Xiaogang

    2009-01-01

    Taking evasive action in critical traffic situations that precede motor vehicle crashes gives drivers an opportunity to avoid the crash or at least diminish its severity. This study explores the driver, vehicle, and environment characteristics associated with crash avoidance maneuvers (i.e., evasive actions or no evasive actions). Rear-end collisions, head-on collisions, and angle collisions are analyzed separately using decision trees, and the significance of the variables for the binary response variable (evasive actions or no evasive actions) is determined. Moreover, the random forests method is employed to rank the importance of the driver, vehicle, and environment characteristics for crash avoidance maneuvers. According to the results of the exploratory analyses, drivers' visibility obstruction, physical impairment, and distraction are associated with crash avoidance maneuvers in all three types of accidents. Moreover, speed limit is associated with avoidance maneuvers in rear-end collisions, and vehicle type is correlated with avoidance maneuvers in head-on and angle collisions. It is recommended that future research further investigate the explored trends (e.g., physically impaired drivers, visibility obstruction) using driving simulators, which may help in legislative initiatives and in-vehicle technology recommendations.

  9. Using the Dual-Tree Complex Wavelet Transform for Improved Fabric Defect Detection

    Directory of Open Access Journals (Sweden)

    Hermanus Vermaak

    2016-01-01

    The dual-tree complex wavelet transform (DTCWT) solves the problems of shift variance and low directional selectivity in two and higher dimensions found with the commonly used discrete wavelet transform (DWT). It has been proposed for applications such as texture classification and content-based image retrieval. In this paper, the performance of the dual-tree complex wavelet transform for fabric defect detection is evaluated. As experimental samples, the fabric images from TILDA, a textile texture database from the Workgroup on Texture Analysis of the German Research Council (DFG), are used. The mean energies of the real and imaginary parts of the complex wavelet coefficients, taken separately, are identified as effective features for fabric defect detection. It is then shown that the dual-tree complex wavelet transform yields better performance than the undecimated wavelet transform (UDWT), with a detection rate 4.5% to 15.8% higher depending on the fabric type.

  10. Fenologics characteristics of the ‘Siciliano’ lemon tree on two rootstocks influenced by liming and boron addition

    Directory of Open Access Journals (Sweden)

    Hélio Grassi Filho

    2004-09-01

    The study was conducted at UNESP/Botucatu, São Paulo, Brazil, with disturbed samples of an Oxisol in which ‘Siciliano’ lemon tree seedlings (C. limon) grafted on sour orange (C. aurantium) and Rangpur lime (C. limonia) rootstocks were planted. The experiment consisted of three base saturation levels (50, 70 and 90 percent) and three boron doses (0.5, 1.5 and 4.5 mg dm-3) applied at planting, in a 3x3x2 factorial design with four replications. The rootstocks differed in the mineral composition of the ‘Siciliano’ lemon leaves and in root system development, which was greater on sour orange than on Rangpur lime. There was no effect of the interaction between base saturation level and boron dose on any of the evaluated parameters.

  11. IMPROVEMENT EVALUATION ON CERAMIC ROOF EXTRACTION USING WORLDVIEW-2 IMAGERY AND GEOGRAPHIC DATA MINING APPROACH

    Directory of Open Access Journals (Sweden)

    V. S. Brum-Bastos

    2016-06-01

    Advances in geotechnologies and in remote sensing have improved the analysis of urban environments. New sensors are increasingly suited to urban studies because of their enhanced spatial, spectral and radiometric resolutions. Urban environments present high heterogeneity, which cannot be tackled using pixel–based approaches on high resolution images. Geographic Object–Based Image Analysis (GEOBIA) has been consolidated as a methodology for urban land use and cover monitoring; however, classification of high resolution images is still troublesome. This study aims to assess the improvement in ceramic roof classification using WorldView-2 images due to the addition of 4 new bands besides the standard “Blue-Green-Red-Near Infrared” bands. Our methodology combines GEOBIA, the C4.5 classification tree algorithm, Monte Carlo simulation and statistical tests for classification accuracy. Two sample groups were considered: (1) eight multispectral and panchromatic bands, and (2) four multispectral and panchromatic bands, representing previous high-resolution sensors. The C4.5 algorithm generates a decision tree that can be used for classification; smaller decision trees are closer to the semantic networks produced by experts on GEOBIA, while bigger trees are not straightforward to implement manually but are more accurate. The choice of a big or small tree depends on the user’s skill in implementing it. This study aims to determine for what kind of user the addition of the 4 new bands might be beneficial: (1) the common user (smaller trees) or (2) a more skilled user with coding and/or data mining abilities (bigger trees). Overall, the classification was improved by the addition of the four new bands for both types of users.

  12. Drought-associated tree mortality: Global patterns and insights from tree-ring studies in the southwestern U.S.A

    Science.gov (United States)

    Macalady, Alison Kelly

    changes correctly classified the status of ~70% of trees. Climate responses and competitive interactions partly explained growth differences between dying and surviving trees, with muted response to wet/cool conditions and enhanced sensitivity to competition from congeners linked to growth patterns associated with death. Discrimination and validation of models of mortality risk varied widely across sites and drought events, indicating shifting growth-mortality relationships and differences in mortality processes across space and time (Appendix B). Pre-formed defense anatomy is strongly associated with pinon survivorship over a range of sites and stand conditions. Models of mortality risk that account for both growth and resin duct attributes had ≈10^19 times more support than models that contained only growth. The greatest improvement in classification was among trees from the 2000s drought, suggesting an enhanced role for tree defense allocation and/or bark beetle activity during recent warm versus historic cool drought. Accounting for defense characteristics and growth-defense allocation is likely to be important for improving representation of drought-associated mortality (Appendix C). Pinon resin duct chronologies contain climate responses that are coherent and distinct from those of radial growth. Growth responds positively and strongly to previous fall and current winter precipitation, and negatively to late spring and early summer temperature. A relatively equal positive resin duct response to winter precipitation and positive response to mid-to-late summer drought suggests that changes in climate will affect tree defense anatomy in complex ways, with the outcome determined by seasonal changes in precipitation and temperature (Appendix D).

  13. Applying post classification change detection technique to monitor an Egyptian coastal zone (Abu Qir Bay)

    Directory of Open Access Journals (Sweden)

    Mamdouh M. El-Hattab

    2016-06-01

    Land cover change is considered one of the important global phenomena, with perhaps a more significant effect on the environment than any other factor. It is therefore vital that accurate data on land cover changes are made available to facilitate understanding of the link between land cover changes and environmental changes and to allow planners to make effective decisions. In this paper, the post classification approach was used to detect and assess land cover changes in one of the important coastal zones in Egypt, the Abu Qir Bay zone, based on the comparative analysis of independently produced classification images of the same area at different dates. In addition to satellite images, socioeconomic data were used with the aid of the land use model EGSLR to indicate the relation between land cover and land use changes. Results indicated that changes in land cover reflected changes in occupation status in specific zones. For example, in the zone south of Idku Lake, the occupation of settlers changed from unskilled work to fishing as the area of fish farms expanded. Change rates increased dramatically in the period from 2004 to 2013, with remarkable negative changes found especially in fruit and palm trees (i.e. loss of about 66 km2 of land with fruit and palm trees due to industrialization in the coastal area). Also, rapid urbanization was observed along the coastline of the Abu Qir Bay zone due to the political conditions in Egypt (the 25th of January Revolution) within this period, which resulted in the temporary absence of monitoring systems to regulate urbanization.

  14. Border trees of complex networks

    International Nuclear Information System (INIS)

    Villas Boas, Paulino R; Rodrigues, Francisco A; Travieso, Gonzalo; Fontoura Costa, Luciano da

    2008-01-01

    The comprehensive characterization of the structure of complex networks is essential to understand the dynamical processes which guide their evolution. The discovery of the scale-free distribution and the small-world properties of real networks were fundamental to stimulate more realistic models and to understand important dynamical processes related to network growth. However, the properties of the network borders (nodes with degree equal to 1), one of its most fragile parts, remained little investigated and understood. The border nodes may be involved in the evolution of structures such as geographical networks. Here we analyze the border trees of complex networks, which are defined as the subgraphs without cycles connected to the remainder of the network (containing cycles) and terminating into border nodes. In addition to describing an algorithm for identification of such tree subgraphs, we also consider how their topological properties can be quantified in terms of their depth and number of leaves. We investigate the properties of border trees for several theoretical models as well as real-world networks. Among the obtained results, we found that more than half of the nodes of some real-world networks belong to the border trees. A power-law with cut-off was observed for the distribution of the depth and number of leaves of the border trees. An analysis of the local role of the nodes in the border trees was also performed
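
    The abstract does not spell out the identification algorithm, so the following is only a sketch of one standard way to find such acyclic subgraphs: repeatedly prune nodes of degree 1 until none remain. The function name, the adjacency-dictionary input, and the toy graph are assumptions made for illustration; the sketch also assumes the network contains at least one cycle, as in the definition above.

```python
from collections import deque


def border_tree_nodes(adj):
    """Return the nodes removed by repeatedly pruning degree-1 nodes.

    adj maps each node to the set of its neighbours (undirected graph).
    The pruned nodes are the ones hanging off the cyclic part of the graph.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    queue = deque(v for v, d in degree.items() if d == 1)  # border nodes
    removed = set()
    while queue:
        v = queue.popleft()
        if v in removed or degree[v] != 1:
            continue  # stale entry, or last node left of an all-tree component
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] == 1:
                    queue.append(u)  # u has become a leaf of what remains
    return removed


# Toy graph: a triangle 1-2-3 with a small tree (4, 5, 6) attached at node 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4}, 6: {4}}
tree_nodes = border_tree_nodes(graph)
leaves = {v for v in tree_nodes if len(graph[v]) == 1}
```

    Here `tree_nodes` collects the nodes outside the cyclic core and `leaves` the border nodes among them, which is the kind of count the abstract refers to when it quantifies border trees by number of leaves.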

  15. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    Science.gov (United States)

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  16. Post-fire tree establishment patterns at the alpine treeline ecotone: Mount Rainier National Park, Washington, USA

    Science.gov (United States)

    Kirk M. Stueve; Dawna L. Cerney; Regina M. Rochefort; Laurie L. Kurth

    2009-01-01

    We performed classification analysis of 1970 satellite imagery and 2003 aerial photography to delineate establishment. Local site conditions were calculated from a LIDAR-based DEM, ancillary climate data, and 1970 tree locations in a GIS. We used logistic regression on a spatially weighted landscape matrix to rank variables.

  17. ETE: a python Environment for Tree Exploration.

    Science.gov (United States)

    Huerta-Cepas, Jaime; Dopazo, Joaquín; Gabaldón, Toni

    2010-01-13

    Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.
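
    To make the kind of tree handling described here concrete, the sketch below parses a small Newick string and lists its leaves using only the Python standard library. It is not ETE's API: the function names and the tuple representation are assumptions for illustration, and the parser handles only plain labels and branch lengths, not the extended Newick format mentioned in the abstract.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (label, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label = text[start:pos].split(":")[0]  # drop any branch length
        return (label, children)

    return node()


def leaf_names(tree):
    """Collect leaf labels from left to right."""
    label, children = tree
    if not children:
        return [label]
    return [name for child in children for name in leaf_names(child)]


# A made-up three-leaf tree with one named internal node.
tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C);")
names = leaf_names(tree)
```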

  18. ETE: a python Environment for Tree Exploration

    Directory of Open Access Journals (Sweden)

    Gabaldón Toni

    2010-01-01

    Background: Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Results: Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. Conclusions: ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.

  19. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    Directory of Open Access Journals (Sweden)

    Suduan Chen

    2014-01-01

    Full Text Available As fraudulent financial statements become an increasingly serious problem, establishing a valid model for forecasting them has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study uses logistic regression, support vector machine, and decision tree methods to construct classification models and compares them. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that experienced fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification performance, at 85.71%.

  20. Analysis of effects of manhole covers on motorcycle driver maneuvers: a nonparametric classification tree approach.

    Science.gov (United States)

    Chang, Li-Yen

    2014-01-01

    A manhole cover is a removable plate forming the lid over the opening of a manhole to allow traffic to pass over the manhole and to prevent people from falling in. Because most manhole covers are placed in roadway traffic lanes, if these manhole covers are not appropriately installed or maintained, they can represent unexpected hazards on the road, especially for motorcycle drivers. The objective of this study is to identify the effects of manhole cover characteristics as well as driver factors and traffic and roadway conditions on motorcycle driver maneuvers. A video camera was used to record motorcycle drivers' maneuvers when they encountered an inappropriately installed or maintained manhole cover. Information on 3059 drivers' maneuver decisions was recorded. Classification and regression tree (CART) models were applied to explore factors that can significantly affect motorcycle driver maneuvers when passing a manhole cover. Nearly 50 percent of the motorcycle drivers decelerated or changed their driving path to reduce the effects of the manhole cover. The manhole cover characteristics including the level difference between manhole cover and pavement, the pavement condition over the manhole cover, and the size of the manhole cover can significantly affect motorcycle driver maneuvers. Other factors, including traffic conditions, lane width, motorcycle speed, and loading conditions, also have significant effects on motorcycle driver maneuvers. To reduce the effects and potential risks from the manhole covers, highway authorities not only need to make sure that any newly installed manhole covers are as level as possible but also need to regularly maintain all the manhole covers to ensure that they are in good condition. In the long run, the size of manhole covers should be kept as small as possible so that the impact of manhole covers on motorcycle drivers can be effectively reduced. Supplemental materials are available for this article.

  1. The space of ultrametric phylogenetic trees.

    Science.gov (United States)

    Gavryushkin, Alex; Drummond, Alexei J

    2016-08-21

    The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  2. ColorTree: a batch customization tool for phylogenic trees.

    Science.gov (United States)

    Chen, Wei-Hua; Lercher, Martin J

    2009-07-31

    Genome sequencing projects and comparative genomics studies typically aim to trace the evolutionary history of large gene sets, often requiring human inspection of hundreds of phylogenetic trees. If trees are checked for compatibility with an explicit null hypothesis (e.g., the monophyly of certain groups), this daunting task is greatly facilitated by an appropriate coloring scheme. In this note, we introduce ColorTree, a simple yet powerful batch customization tool for phylogenic trees. Based on pattern matching rules, ColorTree applies a set of customizations to an input tree file, e.g., coloring labels or branches. The customized trees are saved to an output file, which can then be viewed and further edited by Dendroscope (a freely available tree viewer). ColorTree runs on any Perl installation as a stand-alone command line tool, and its application can thus be easily automated. This way, hundreds of phylogenic trees can be customized for easy visual inspection in a matter of minutes. ColorTree allows efficient and flexible visual customization of large tree sets through the application of a user-supplied configuration file to multiple tree files.

  3. Analysis of composition-based metagenomic classification.

    Science.gov (United States)

    Higashi, Susan; Barreto, André da Motta Salles; Cantão, Maurício Egidio; de Vasconcelos, Ana Tereza Ribeiro

    2012-01-01

    An essential step of a metagenomic study is the taxonomic classification, that is, the identification of the taxonomic lineage of the organisms in a given sample. The taxonomic classification process involves a series of decisions. Currently, in the context of metagenomics, such decisions are usually based on empirical studies that consider one specific type of classifier. In this study we propose a general framework for analyzing the impact that several decisions can have on the classification problem. Instead of focusing on any specific classifier, we define a generic score function that provides a measure of the difficulty of the classification task. Using this framework, we analyze the impact of the following parameters on the taxonomic classification problem: (i) the length of n-mers used to encode the metagenomic sequences, (ii) the similarity measure used to compare sequences, and (iii) the type of taxonomic classification, which can be conventional or hierarchical, depending on whether the classification process occurs in a single shot or in several steps according to the taxonomic tree. We defined a score function that measures the degree of separability of the taxonomic classes under a given configuration induced by the parameters above. We conducted an extensive computational experiment and found out that reasonable values for the parameters of interest could be (i) intermediate values of n, the length of the n-mers; (ii) any similarity measure, because all of them resulted in similar scores; and (iii) the hierarchical strategy, which performed better in all of the cases. As expected, short n-mers generate lower configuration scores because they give rise to frequency vectors that represent distinct sequences in a similar way. On the other hand, large values for n result in sparse frequency vectors that represent differently metagenomic fragments that are in fact similar, also leading to low configuration scores. Regarding the similarity measure, in
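    The effect of n-mer length described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' framework or score function: the function names, the cosine similarity, and the two example sequences are all assumptions made for illustration. It encodes a sequence as a vector of n-mer relative frequencies and reports how sparse that vector becomes as n grows.

```python
from collections import Counter
from itertools import product
import math


def nmer_frequencies(sequence, n, alphabet="ACGT"):
    """Relative frequency of every length-n word, in lexicographic order."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    words = ["".join(p) for p in product(alphabet, repeat=n)]
    # Words containing symbols outside the alphabet (e.g. N) are ignored.
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def fraction_nonzero(vector):
    """Share of vector entries that are non-zero (a simple sparsity indicator)."""
    return sum(1 for x in vector if x != 0) / len(vector)


if __name__ == "__main__":
    # Two made-up fragments, used only to exercise the functions.
    seq_a = "ACGTACGTGGCATTACGGAT"
    seq_b = "TTGACCGATACGGTCAAGCT"
    for n in (1, 3, 6):
        va = nmer_frequencies(seq_a, n)
        vb = nmer_frequencies(seq_b, n)
        print(n, round(cosine_similarity(va, vb), 3), round(fraction_nonzero(va), 4))
```

    The vector has 4^n entries, so for a fixed fragment length the share of non-zero entries shrinks quickly as n increases, which is the sparsity effect the abstract attributes to large n.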

  4. Design of a hybrid model for cardiac arrhythmia classification based on Daubechies wavelet transform.

    Science.gov (United States)

    Rajagopal, Rekha; Ranganathan, Vidhyapriya

    2018-06-05

    Automation in cardiac arrhythmia classification helps medical professionals make accurate decisions about the patient's health. The aim of this work was to design a hybrid classification model to classify cardiac arrhythmias. The design phase of the classification model comprises the following stages: preprocessing of the cardiac signal by eliminating detail coefficients that contain noise, feature extraction through Daubechies wavelet transform, and arrhythmia classification using a collaborative decision from the K nearest neighbor classifier (KNN) and a support vector machine (SVM). The proposed model is able to classify 5 arrhythmia classes as per the ANSI/AAMI EC57: 1998 classification standard. Level 1 of the proposed model involves classification using the KNN and the classifier is trained with examples from all classes. Level 2 involves classification using an SVM and is trained specifically to classify overlapped classes. The final classification of a test heartbeat pertaining to a particular class is done using the proposed KNN/SVM hybrid model. The experimental results demonstrated that the average sensitivity of the proposed model was 92.56%, the average specificity 99.35%, the average positive predictive value 98.13%, the average F-score 94.5%, and the average accuracy 99.78%. The results obtained using the proposed model were compared with the results of discriminant, tree, and KNN classifiers. The proposed model is able to achieve a high classification accuracy.

  5. Important LiDAR metrics for discriminating forest tree species in Central Europe

    Science.gov (United States)

    Shi, Yifang; Wang, Tiejun; Skidmore, Andrew K.; Heurich, Marco

    2018-03-01

    Numerous airborne LiDAR-derived metrics have been proposed for classifying tree species. Yet an in-depth ecological and biological understanding of the significance of these metrics for tree species mapping remains largely unexplored. In this paper, we evaluated the performance of 37 frequently used LiDAR metrics derived under leaf-on and leaf-off conditions, respectively, for discriminating six different tree species in a natural forest in Germany. We firstly assessed the correlation between these metrics. Then we applied a Random Forest algorithm to classify the tree species and evaluated the importance of the LiDAR metrics. Finally, we identified the most important LiDAR metrics and tested their robustness and transferability. Our results indicated that about 60% of LiDAR metrics were highly correlated to each other (|r| > 0.7). There was no statistically significant difference in tree species mapping accuracy between the use of leaf-on and leaf-off LiDAR metrics. However, combining leaf-on and leaf-off LiDAR metrics significantly increased the overall accuracy from 58.2% (leaf-on) and 62.0% (leaf-off) to 66.5% as well as the kappa coefficient from 0.47 (leaf-on) and 0.51 (leaf-off) to 0.58. Radiometric features, especially intensity related metrics, provided more consistent and significant contributions than geometric features for tree species discrimination. Specifically, the mean intensity of first-or-single returns as well as the mean value of echo width were identified as the most robust LiDAR metrics for tree species discrimination. These results indicate that metrics derived from airborne LiDAR data, especially radiometric metrics, can aid in discriminating tree species in a mixed temperate forest, and represent candidate metrics for tree species classification and monitoring in Central Europe.
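    The correlation screening step mentioned in this abstract (flagging metric pairs with |r| > 0.7) could be sketched as follows. This is a generic Pearson-correlation filter written for illustration, not the authors' code; the metric names and values in the example are invented.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def highly_correlated_pairs(metrics, threshold=0.7):
    """Return (name_a, name_b, r) for every pair of metrics with |r| > threshold."""
    pairs = []
    for (name_a, values_a), (name_b, values_b) in combinations(metrics.items(), 2):
        r = pearson_r(values_a, values_b)
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    return pairs


if __name__ == "__main__":
    # Hypothetical per-tree metric values, for illustration only.
    example = {
        "height_max": [10, 12, 15, 20, 22],
        "height_p95": [9.5, 11, 14.5, 19, 21],
        "mean_intensity": [30, 10, 25, 12, 28],
    }
    for name_a, name_b, r in highly_correlated_pairs(example):
        print(name_a, name_b, round(r, 3))
```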

  6. Identifying the critical success factors in the coverage of low vision services using the classification analysis and regression tree methodology.

    Science.gov (United States)

    Chiang, Peggy Pei-Chia; Xie, Jing; Keeffe, Jill Elizabeth

    2011-04-25

    To identify the critical success factors (CSF) associated with coverage of low vision services. Data were collected from a survey distributed to Vision 2020 contacts, government, and non-government organizations (NGOs) in 195 countries. The Classification and Regression Tree Analysis (CART) was used to identify the critical success factors of low vision service coverage. Independent variables were sourced from the survey: policies, epidemiology, provision of services, equipment and infrastructure, barriers to services, human resources, and monitoring and evaluation. Socioeconomic and demographic independent variables: health expenditure, population statistics, development status, and human resources in general, were sourced from the World Health Organization (WHO), World Bank, and the United Nations (UN). The findings identified that having >50% of children obtaining devices when prescribed (χ(2) = 44; P 3 rehabilitation workers per 10 million of population (χ(2) = 4.50; P = 0.034), higher percentage of population urbanized (χ(2) = 14.54; P = 0.002), a level of private investment (χ(2) = 14.55; P = 0.015), and being fully funded by government (χ(2) = 6.02; P = 0.014), are critical success factors associated with coverage of low vision services. This study identified the most important predictors for countries with better low vision coverage. The CART is a useful and suitable methodology in survey research and is a novel way to simplify a complex global public health issue in eye care.

  7. Can forest dieback and tree death be predicted by prior changes in wood anatomy?

    Science.gov (United States)

    Colangelo, Michele; Julio Camarero, Jesus; De Micco, Veronica; Borghetti, Marco; Gentilesca, Tiziana; Sanchez-Salguero, Raul; Ripullone, Francesco

    2017-04-01

    Climate warming is expected to amplify drought stress resulting in more intense and widespread dieback episodes and increasing mortality rates. Studies on quantitative wood anatomy and dendrochronology have demonstrated their potential to supply useful information on the causes of tree decline, although this approach is basically observational and retrospective. Moreover, the long-term reconstruction of wood anatomical features, strictly linked to the evolution of xylem anatomy plasticity through time, allows hydraulic adjustments of trees to be investigated. In this study, we analyzed wood-anatomical variables in two Italian oak forests where recent episodes of dieback and mortality have been reported. We analyzed in coexisting now-dead and living trees the following wood-anatomical variables: annual tree-ring area, earlywood (EW) and latewood (LW) areas, absolute and relative (%) areas occupied by vessels in the EW and LW, EW and LW vessel areas, EW and LW vessel density and vessel diameter classification. We also calculated the hydraulic diameter (Dh) for all vessels measured within each ring by weighting individual conduit diameters to correspond to the average Hagen-Poiseuille lumen theoretical hydraulic conductivity for a vessel size. Wood-anatomical analyses showed that declining and dead trees were more sensitive to drought stress compared to non-declining trees, indicating different susceptibility to water shortage between trees. Dead trees did not form earlywood vessels with smaller lumen diameter than surviving trees but tended to form wider latewood vessels with a higher percentage of vessel area. We discuss the results and implications, focusing on the variables that proved more sensitive to the phenomena of decline and mortality.

  8. Knowledge discovery with classification rules in a cardiovascular dataset.

    Science.gov (United States)

    Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

    2005-12-01

    In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rule induction called AREX, using evolutionary induction of decision trees and automatic programming, is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology.

  9. A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network.

    Science.gov (United States)

    Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso

    2015-07-01

    In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification task is performed with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches an accuracy of 20.9%. Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. A Two-Step Classification Approach to Distinguishing Similar Objects in Mobile LIDAR Point Clouds

    Science.gov (United States)

    He, H.; Khoshelham, K.; Fraser, C.

    2017-09-01

    Nowadays, lidar is widely used in cultural heritage documentation, urban modeling, and driverless car technology for its fast and accurate 3D scanning ability. However, full exploitation of the potential of point cloud data for efficient and automatic object recognition remains elusive. Recently, feature-based methods have become very popular in object recognition on account of their good performance in capturing object details. Compared with global features describing the whole shape of the object, local features recording the fractional details are more discriminative and are applicable for object classes with considerable similarity. In this paper, we propose a two-step classification approach based on point feature histograms and the bag-of-features method for automatic recognition of similar objects in mobile lidar point clouds. Lamp post, street light and traffic sign are grouped as one category in the first-step classification for their inter similarity compared with tree and vehicle. A finer classification of the lamp post, street light and traffic sign based on the result of the first-step classification is implemented in the second step. The proposed two-step classification approach is shown to yield a considerable improvement over the conventional one-step classification approach.

  11. Applying decision tree for identification of a low risk population for type 2 diabetes. Tehran Lipid and Glucose Study.

    Science.gov (United States)

    Ramezankhani, Azra; Pournik, Omid; Shahrabi, Jamal; Khalili, Davood; Azizi, Fereidoun; Hadaegh, Farzad

    2014-09-01

    The aim of this study was to create a prediction model using a data mining approach to identify individuals at low risk of incident type 2 diabetes, using the Tehran Lipid and Glucose Study (TLGS) database. For a population of 6647 individuals without diabetes, aged ≥20 years and followed for 12 years, a prediction model was developed using classification by the decision tree technique. Seven hundred and twenty-nine (11%) diabetes cases occurred during the follow-up. Predictor variables were selected from demographic characteristics, smoking status, medical and drug history and laboratory measures. We developed the predictive models by decision tree using 60 input variables and one output variable. The overall classification accuracy was 90.5%, with 31.1% sensitivity and 97.9% specificity; for the subjects without diabetes, precision and F-measure were 92% and 0.95, respectively. The identified variables included fasting plasma glucose, body mass index, triglycerides, mean arterial blood pressure, family history of diabetes, educational level and job status. In conclusion, decision tree analysis, using routine demographic, clinical, anthropometric and laboratory measurements, created a simple tool to predict individuals at low risk for type 2 diabetes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
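    The relationship between the accuracy, sensitivity and specificity figures quoted in this abstract can be made concrete with a short Python sketch. The confusion-matrix counts below are hypothetical: they were chosen only so that the resulting rates land close to the reported values on a cohort of 6647 with 729 cases, and the abstract does not say that the metrics were computed on the full cohort.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    The positive class is "developed diabetes"; the negative class is
    "remained without diabetes".
    """
    total = tp + fn + tn + fp
    specificity = tn / (tn + fp)           # recall of the negative class
    negative_precision = tn / (tn + fn)    # precision of the negative class
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": specificity,
        "negative_precision": negative_precision,
        "negative_f_measure": 2 * negative_precision * specificity
        / (negative_precision + specificity),
    }


if __name__ == "__main__":
    # Hypothetical counts (not from the study): 729 cases, 5918 non-cases.
    metrics = classification_metrics(tp=227, fn=502, tn=5794, fp=124)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

    With only about 11% of subjects in the positive class, a model can reach high overall accuracy while missing most cases, which is why the abstract frames the tool as identifying low-risk individuals rather than detecting future diabetes.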

  12. Classification of Learning Styles in Virtual Learning Environment Using J48 Decision Tree

    Science.gov (United States)

    Maaliw, Renato R. III; Ballera, Melvin A.

    2017-01-01

    The usage of data mining has dramatically increased over the past few years and the education sector is leveraging this field in order to analyze and gain intuitive knowledge in terms of the vast accumulated data within its confines. The primary objective of this study is to compare the results of different classification techniques such as Naïve…

  13. Molecular and physiological responses to abiotic stress in forest trees and their relevance to tree improvement.

    Science.gov (United States)

    Harfouche, Antoine; Meilan, Richard; Altman, Arie

    2014-11-01

    Abiotic stresses, such as drought, salinity and cold, are the major environmental stresses that adversely affect tree growth and, thus, forest productivity, and play a major role in determining the geographic distribution of tree species. Tree responses and tolerance to abiotic stress are complex biological processes that are best analyzed at a systems level using genetic, genomic, metabolomic and phenomic approaches. This will expedite the dissection of stress-sensing and signaling networks to further support efficient genetic improvement programs. Enormous genetic diversity for stress tolerance exists within some forest-tree species, and due to advances in sequencing technologies the molecular genetic basis for this diversity has been rapidly unfolding in recent years. In addition, the use of emerging phenotyping technologies extends the suite of traits that can be measured and will provide us with a better understanding of stress tolerance. The elucidation of abiotic stress-tolerance mechanisms will allow for effective pyramiding of multiple tolerances in a single tree through genetic engineering. Here we review recent progress in the dissection of the molecular basis of abiotic stress tolerance in forest trees, with special emphasis on Populus, Pinus, Picea, Eucalyptus and Quercus spp. We also outline practices that will enable the deployment of trees engineered for abiotic stress tolerance to land owners. Finally, recommendations for future work are discussed. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  14. Evaluation and Comparison of QuickBird and ADS40-SH52 Multispectral Imagery for Mapping Iberian Wild Pear Trees (Pyrus bourgaeana, Decne) in a Mediterranean Mixed Forest

    Directory of Open Access Journals (Sweden)

    Salvador Arenas-Castro

    2014-06-01

    Full Text Available The availability of images with very high spatial and spectral resolution from airborne sensors or those aboard satellites is opening new possibilities for the analysis of fine-scale vegetation, such as the identification and classification of individual tree species. To evaluate the potential of these images, a study was carried out to compare the spatial, spectral and temporal resolution between QuickBird and ADS40-SH52 imagery, in order to discriminate and identify, within the mixed Mediterranean forest, individuals of the Iberian wild pear (Pyrus bourgaeana). This is a typical species of the Mediterranean forest, but its biology and ecology are still poorly known. The images were subjected to different correction processes and data were homogenized. Vegetation classes and individual trees were identified on the images, which were classified using two types of supervised classification (Maximum Likelihood and Support Vector Machines) on a pixel-by-pixel basis. The classification values were satisfactory. The classifiers were compared, and Support Vector Machines was the algorithm that provided the best results in terms of overall accuracy. The QuickBird image showed higher overall accuracy (86.16%) when the Support Vector Machines algorithm was applied. In addition, individuals of Iberian wild pear were discriminated with probability of over 55%, when the Maximum Likelihood algorithm was applied. From the perspective of improving the sampling effort, these results are a starting point for facilitating research on the abundance, distribution and spatial structure of P. bourgaeana at different scales, in order to quantify the conservation status of this species.

  15. Effects of phosphorus addition on nitrogen cycle and fluxes of N2O and CH4 in tropical tree plantation soils in Thailand

    Directory of Open Access Journals (Sweden)

    Taiki Mori

    2017-04-01

    Full Text Available An incubation experiment was conducted to test the effects of phosphorus (P) addition on nitrous oxide (N2O) emissions and methane (CH4) uptake, using tropical tree plantation soils in Thailand. Soil samples were taken from five forest stands (Acacia auriculiformis, Acacia mangium, Eucalyptus camaldulensis, Hopea odorata, and Xylia xylocarpa) and incubated at 80% water holding capacity. P addition stimulated N2O emissions only in Xylia xylocarpa soils. Since P addition tended to increase net ammonification rates in Xylia xylocarpa soils, the stimulated N2O emissions were suggested to be due to the stimulation of the nitrogen (N) cycle by P addition and the higher N supply for nitrification and denitrification. In other soils, P addition had no effects on N2O emissions or soil N properties, except that P addition tended to increase the soil microbial biomass N in Acacia auriculiformis soils. No effects of P addition were observed on CH4 uptake in any soil. It is suggested that the effects of P addition on N2O and CH4 fluxes at the study site were not significant, at least under laboratory conditions.

  16. TreePOD: Sensitivity-Aware Selection of Pareto-Optimal Decision Trees.

    Science.gov (United States)

    Muhlbacher, Thomas; Linhardt, Lorenz; Moller, Torsten; Piringer, Harald

    2018-01-01

    Balancing accuracy gains with other objectives such as interpretability is a key challenge when building decision trees. However, this process is difficult to automate because it involves know-how about the domain as well as the purpose of the model. This paper presents TreePOD, a new approach for sensitivity-aware model selection along trade-offs. TreePOD is based on exploring a large set of candidate trees generated by sampling the parameters of tree construction algorithms. Based on this set, visualizations of quantitative and qualitative tree aspects provide a comprehensive overview of possible tree characteristics. Along trade-offs between two objectives, TreePOD provides efficient selection guidance by focusing on Pareto-optimal tree candidates. TreePOD also conveys the sensitivities of tree characteristics on variations of selected parameters by extending the tree generation process with a full-factorial sampling. We demonstrate how TreePOD supports a variety of tasks involved in decision tree selection and describe its integration in a holistic workflow for building and selecting decision trees. For evaluation, we illustrate a case study for predicting critical power grid states, and we report qualitative feedback from domain experts in the energy sector. This feedback suggests that TreePOD enables users with and without statistical background a confident and efficient identification of suitable decision trees.
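    The idea of restricting attention to Pareto-optimal tree candidates can be sketched in a few lines of Python. This is a generic Pareto filter over two objectives, not TreePOD's implementation; the choice of objectives (maximize accuracy, minimize number of leaves) and the candidate values are assumptions made for illustration.

```python
def pareto_front(candidates):
    """Keep candidates not dominated by any other candidate.

    Each candidate is an (accuracy, n_leaves) pair. Accuracy is maximized
    and n_leaves (a stand-in for tree complexity) is minimized. Candidate j
    dominates candidate i if it is at least as good on both objectives and
    strictly better on at least one.
    """
    front = []
    for i, (acc_i, leaves_i) in enumerate(candidates):
        dominated = any(
            acc_j >= acc_i
            and leaves_j <= leaves_i
            and (acc_j > acc_i or leaves_j < leaves_i)
            for j, (acc_j, leaves_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((acc_i, leaves_i))
    return front


if __name__ == "__main__":
    # Hypothetical candidate trees, e.g. from sampling construction parameters.
    trees = [(0.80, 3), (0.85, 7), (0.84, 9), (0.90, 15), (0.90, 20), (0.78, 3)]
    print(pareto_front(trees))
```

    The quadratic scan is adequate for a few thousand candidates; larger candidate sets would usually be sorted by one objective first so the front can be extracted in a single pass.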

  17. Calcium addition at the Hubbard Brook Experimental Forest increases the capacity for stress tolerance and carbon capture in red spruce (Picea rubens) trees during the cold season

    Science.gov (United States)

    Paul G. Schaberg; Rakesh Minocha; Stephanie Long; Joshua M. Halman; Gary J. Hawley; Christopher Eagar

    2011-01-01

    Red spruce (Picea rubens Sarg.) trees are uniquely vulnerable to foliar freezing injury during the cold season (fall and winter), but are also capable of photosynthetic activity if temperatures moderate. To evaluate the influence of calcium (Ca) addition on the physiology of red spruce during the cold season, we measured concentrations of foliar...

  18. Two Trees: Migrating Fault Trees to Decision Trees for Real Time Fault Detection on International Space Station

    Science.gov (United States)

    Lee, Charles; Alena, Richard L.; Robinson, Peter

    2004-01-01

    Starting from an example set of ISS fault trees, we present a method for converting fault trees to decision trees. The method makes the root cause of a fault easier to visualize and makes tree manipulation more programmatic via available decision tree programs. The decision tree visualization for diagnostics is straightforward and easy to understand. For ISS real-time fault diagnostics, the status of the systems can be shown by passing the signals through the trees and seeing where they stop. Another advantage of using decision trees is that they can learn fault patterns from historic data and predict future faults. Learning is not limited to static data sets; it can also be done online: by accumulating real-time data sets, the decision trees can acquire and store fault patterns and recognize them when they recur.
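    One way to picture the link between a fault tree and data that a decision tree learner could consume is to enumerate the basic-event states and label each with the top-event outcome. The Python sketch below does only that; it is not the conversion method of the paper, and the gate structure and event names are hypothetical.

```python
from itertools import product

# Hypothetical fault tree: the top event occurs if power is lost,
# or if both pumps fail. Gates are tuples: (gate_type, child, child, ...).
FAULT_TREE = ("OR", "power_loss", ("AND", "pump_a_fail", "pump_b_fail"))


def evaluate(node, state):
    """Evaluate a fault tree node given a dict of basic-event booleans."""
    if isinstance(node, str):
        return state[node]
    gate, *children = node
    results = [evaluate(child, state) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate type: {gate}")


def basic_events(node):
    """Collect the names of all basic events under a node."""
    if isinstance(node, str):
        return {node}
    events = set()
    for child in node[1:]:
        events |= basic_events(child)
    return events


def truth_table(tree):
    """List (state, top_event) for every combination of basic events."""
    names = sorted(basic_events(tree))
    rows = []
    for values in product([False, True], repeat=len(names)):
        state = dict(zip(names, values))
        rows.append((state, evaluate(tree, state)))
    return rows


if __name__ == "__main__":
    for state, top in truth_table(FAULT_TREE):
        print(state, "->", "FAULT" if top else "ok")
```

    Each row is a labelled example (basic-event states as features, top event as class), which is the kind of table a standard decision tree program takes as input. Exhaustive enumeration grows as 2^k in the number of basic events, so it is only practical for small trees.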

  19. Hanford double shell tank corrosion monitoring instrument trees

    International Nuclear Information System (INIS)

    Nelson, J.L.

    1995-03-01

    High-level nuclear wastes at the Hanford site are stored underground in carbon steel double-shell and single-shell tanks (DSTs and SSTs). Westinghouse Hanford Company is considering installation of a prototype corrosion monitoring instrument tree in at least one DST in the summer of 1995. The instrument tree will have the ability to detect and discriminate between uniform corrosion, stress corrosion cracking (SCC), and pitting. Additional instrument trees will follow in later years. Proof-of-technology testing is currently underway for the use of commercially available electric field pattern (EFP) analysis and electrochemical noise (EN) corrosion monitoring equipment. Creative use and combinations of other existing technologies are also being considered. Successful demonstration of these technologies will be followed by the development of a Hanford-specific instrument tree. The first instrument tree will incorporate one of these technologies. Subsequent trees may include both technologies, as well as a more standard assembly of corrosion coupons. Successful development of these trees will allow their application to single-shell tanks and the transfer of technology to other U.S. Department of Energy (DOE) sites.

  20. Fault tree analysis of a research reactor

    International Nuclear Information System (INIS)

    Hall, J.A.; O'Dacre, D.F.; Chenier, R.J.; Arbique, G.M.

    1986-08-01

    Fault tree analysis techniques have been used to assess the safety system of the ZED-2 Research Reactor at the Chalk River Nuclear Laboratories. This turned out to be a strong test of the techniques involved. The resulting fault tree was large and, because of inter-links in the system structure, the tree was not modularized. In addition, comprehensive documentation was required. After a brief overview of the reactor and the analysis, this paper concentrates on the computer tools that made the job work. Two types of tools were needed: text editing and forms management capability for large volumes of component and system data, and the fault tree codes themselves. The solutions (and failures) are discussed along with the tools we are already developing for the next analysis.