Classification and regression trees
Breiman, Leo; Olshen, Richard A; Stone, Charles J
1984-01-01
The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
Directory of Open Access Journals (Sweden)
Yoonseok Shin
2015-01-01
Full Text Available Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project.
Dynamic travel time estimation using regression trees.
2008-10-01
This report presents a methodology for travel time estimation by using regression trees. The dissemination of travel time information has become crucial for effective traffic management, especially under congested road conditions. In the absence of c...
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Algorithms for Decision Tree Construction
Chikalov, Igor
2011-01-01
The study of algorithms for decision tree construction was initiated in 1960s. The first algorithms are based on the separation heuristic [13, 31] that at each step tries dividing the set of objects as evenly as possible. Later Garey and Graham [28] showed that such algorithm may construct decision trees whose average depth is arbitrarily far from the minimum. Hyafil and Rivest in [35] proved NP-hardness of DT problem that is constructing a tree with the minimum average depth for a diagnostic problem over 2-valued information system and uniform probability distribution. Cox et al. in [22] showed that for a two-class problem over information system, even finding the root node attribute for an optimal tree is an NP-hard problem. © Springer-Verlag Berlin Heidelberg 2011.
Constructing Student Problems in Phylogenetic Tree Construction.
Brewer, Steven D.
Evolution is often equated with natural selection and is taught from a primarily functional perspective while comparative and historical approaches, which are critical for developing an appreciation of the power of evolutionary theory, are often neglected. This report describes a study of expert problem-solving in phylogenetic tree construction.…
Sub-pixel estimation of tree cover and bare surface densities using regression tree analysis
Directory of Open Access Journals (Sweden)
Carlos Augusto Zangrando Toneli
2011-09-01
Full Text Available Sub-pixel analysis is capable of generating continuous fields, which represent the spatial variability of certain thematic classes. The aim of this work was to develop numerical models to represent the variability of tree cover and bare surfaces within the study area. This research was conducted in the riparian buffer within a watershed of the São Francisco River in the North of Minas Gerais, Brazil. IKONOS and Landsat TM imagery were used with the GUIDE algorithm to construct the models. The results were two index images derived with regression trees for the entire study area, one representing tree cover and the other representing bare surface. The use of non-parametric and non-linear regression tree models presented satisfactory results to characterize wetland, deciduous and savanna patterns of forest formation.
Algorithms for Decision Tree Construction
Chikalov, Igor
2011-01-01
The study of algorithms for decision tree construction was initiated in 1960s. The first algorithms are based on the separation heuristic [13, 31] that at each step tries dividing the set of objects as evenly as possible. Later Garey and Graham [28
Short-term load forecasting with increment regression tree
Energy Technology Data Exchange (ETDEWEB)
Yang, Jingfei; Stenzel, Juergen [Darmstadt University of Techonology, Darmstadt 64283 (Germany)
2006-06-15
This paper presents a new regression tree method for short-term load forecasting. Both increment and non-increment tree are built according to the historical data to provide the data space partition and input variable selection. Support vector machine is employed to the samples of regression tree nodes for further fine regression. Results of different tree nodes are integrated through weighted average method to obtain the comprehensive forecasting result. The effectiveness of the proposed method is demonstrated through its application to an actual system. (author)
Computer aided construction of fault tree
International Nuclear Information System (INIS)
Kovacs, Z.
1982-01-01
Computer code CAT for the automatic construction of the fault tree is briefly described. Code CAT makes possible simple modelling of components using decision tables, it accelerates the fault tree construction process, constructs fault trees of different complexity, and is capable of harmonized co-operation with programs PREPandKITT 1,2 for fault tree analysis. The efficiency of program CAT and thus the accuracy and completeness of fault trees constructed significantly depends on the compilation and sophistication of decision tables. Currently, program CAT is used in co-operation with programs PREPandKITT 1,2 in reliability analyses of nuclear power plant systems. (B.S.)
Directory of Open Access Journals (Sweden)
Stefanie M. Herrmann
2013-10-01
Full Text Available Field trees are an integral part of the farmed parkland landscape in West Africa and provide multiple benefits to the local environment and livelihoods. While field trees have received increasing interest in the context of strengthening resilience to climate variability and change, the actual extent of farmed parkland and spatial patterns of tree cover are largely unknown. We used the rule-based predictive modeling tool Cubist® to estimate field tree cover in the west-central agricultural region of Senegal. A collection of rules and associated multiple linear regression models was constructed from (1 a reference dataset of percent tree cover derived from very high spatial resolution data (2 m Orbview as the dependent variable, and (2 ten years of 10-day 250 m Moderate Resolution Imaging Spectrometer (MODIS Normalized Difference Vegetation Index (NDVI composites and derived phenological metrics as independent variables. Correlation coefficients between modeled and reference percent tree cover of 0.88 and 0.77 were achieved for training and validation data respectively, with absolute mean errors of 1.07 and 1.03 percent tree cover. The resulting map shows a west-east gradient from high tree cover in the peri-urban areas of horticulture and arboriculture to low tree cover in the more sparsely populated eastern part of the study area. A comparison of current (2000s tree cover along this gradient with historic cover as seen on Corona images reveals dynamics of change but also areas of remarkable stability of field tree cover since 1968. The proposed modeling approach can help to identify locations of high and low tree cover in dryland environments and guide ground studies and management interventions aimed at promoting the integration of field trees in agricultural systems.
Directory of Open Access Journals (Sweden)
Suduan Chen
2014-01-01
Full Text Available As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.
Constructing phylogenetic trees using interacting pathways.
Wan, Peng; Che, Dongsheng
2013-01-01
Phylogenetic trees are used to represent evolutionary relationships among biological species or organisms. The construction of phylogenetic trees is based on the similarities or differences of their physical or genetic features. Traditional approaches of constructing phylogenetic trees mainly focus on physical features. The recent advancement of high-throughput technologies has led to accumulation of huge amounts of biological data, which in turn changed the way of biological studies in various aspects. In this paper, we report our approach of building phylogenetic trees using the information of interacting pathways. We have applied hierarchical clustering on two domains of organisms-eukaryotes and prokaryotes. Our preliminary results have shown the effectiveness of using the interacting pathways in revealing evolutionary relationships.
International Nuclear Information System (INIS)
Janssen, I.; Stebbings, J.H.
1990-01-01
In environmental epidemiology, trace and toxic substance concentrations frequently have very highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three independent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and ∼200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so that transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regressions seem to lead to results similar but more interpretable by most audiences. 12 refs., 8 figs., 10 tabs
An Algorithm for Fault-Tree Construction
DEFF Research Database (Denmark)
Taylor, J. R.
1982-01-01
An algorithm for performing certain parts of the fault tree construction process is described. Its input is a flow sheet of the plant, a piping and instrumentation diagram, or a wiring diagram of the circuits, to be analysed, together with a standard library of component functional and failure...
Speeding Up Neighbour-Joining Tree Construction
DEFF Research Database (Denmark)
Brodal, Gerth Stølting; Fagerberg, Rolf; Mailund, Thomas
A widely used method for constructing phylogenetic trees is the neighbour-joining method of Saitou and Nei. We develope heuristics for speeding up the neighbour-joining method which generate the same phylogenetic trees as the original method. All heuristics are based on using a quad-tree to guide...... the search for the next pair of nodes to join, but di#er in the information stored in quad-tree nodes, the way the search is performed, and in the way the quad-tree is updated after a join. We empirically evaluate the performance of the heuristics on distance matrices obtained from the Pfam collection...... of alignments, and compare the running time with that of the QuickTree tool, a well-known and widely used implementation of the standard neighbour-joining method. The results show that the presented heuristics can give a significant speed-up over the standard neighbour-joining method, already for medium sized...
Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M
2014-12-01
Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.
An improved spatial contour tree constructed method
Zheng, Yi; Zhang, Ling; Guilbert, Eric; Long, Yi
2018-05-01
Contours are important data to delineate the landform on a map. A contour tree provides an object-oriented description of landforms and can be used to enrich the topological information. The traditional contour tree is used to store topological relationships between contours in a hierarchical structure and allows for the identification of eminences and depressions as sets of nested contours. This research proposes an improved contour tree so-called spatial contour tree that contains not only the topological but also the geometric information. It can be regarded as a terrain skeleton in 3-dimention, and it is established based on the spatial nodes of contours which have the latitude, longitude and elevation information. The spatial contour tree is built by connecting spatial nodes from low to high elevation for a positive landform, and from high to low elevation for a negative landform to form a hierarchical structure. The connection between two spatial nodes can provide the real distance and direction as a Euclidean vector in 3-dimention. In this paper, the construction method is tested in the experiment, and the results are discussed. The proposed hierarchical structure is in 3-demintion and can show the skeleton inside a terrain. The structure, where all nodes have geo-information, can be used to distinguish different landforms and applied for contour generalization with consideration of geographic characteristics.
Sparse suffix tree construction in small space
DEFF Research Database (Denmark)
Bille, Philip; Fischer, Johannes; Gørtz, Inge Li
2013-01-01
the correct tree with high probability. We then give a Las-Vegas algorithm which also uses O(b) space and runs in the same time bounds with high probability when b = O(√n). Furthermore, additional tradeoffs between the space usage and the construction time for the Monte-Carlo algorithm are given......., which may be of independent interest, that allows to efficiently answer b longest common prefix queries on suffixes of T, using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte-Carlo and outputs...
Iron Supplementation and Altitude: Decision Making Using a Regression Tree
Directory of Open Access Journals (Sweden)
Laura A. Garvican-Lewis, Andrew D. Govus, Peter Peeling, Chris R. Abbiss, Christopher J. Gore
2016-03-01
Full Text Available Altitude exposure increases the body’s need for iron (Gassmann and Muckenthaler, 2015, primarily to support accelerated erythropoiesis, yet clear supplementation guidelines do not exist. Athletes are typically recommended to ingest a daily oral iron supplement to facilitate altitude adaptations, and to help maintain iron balance. However, there is some debate as to whether athletes with otherwise healthy iron stores should be supplemented, due in part to concerns of iron overload. Excess iron in vital organs is associated with an increased risk of a number of conditions including cancer, liver disease and heart failure. Therefore clear guidelines are warranted and athletes should be discouraged from ‘self-prescribing” supplementation without medical advice. In the absence of prospective-controlled studies, decision tree analysis can be used to describe a data set, with the resultant regression tree serving as guide for clinical decision making. Here, we present a regression tree in the context of iron supplementation during altitude exposure, to examine the association between pre-altitude ferritin (Ferritin-Pre and the haemoglobin mass (Hbmass response, based on daily iron supplement dose. De-identified ferritin and Hbmass data from 178 athletes engaged in altitude training were extracted from the Australian Institute of Sport (AIS database. Altitude exposure was predominantly achieved via normobaric Live high: Train low (n = 147 at a simulated altitude of 3000 m for 2 to 4 weeks. The remaining athletes engaged in natural altitude training at venues ranging from 1350 to 2800 m for 3-4 weeks. Thus, the “hypoxic dose” ranged from ~890 km.h to ~1400 km.h. Ethical approval was granted by the AIS Human Ethics Committee, and athletes provided written informed consent. An in depth description and traditional analysis of the complete data set is presented elsewhere (Govus et al., 2015. Iron supplementation was prescribed by a sports physician
Constructal tree-shaped flow structures
International Nuclear Information System (INIS)
Bejan, A.; Lorente, S.
2007-01-01
This paper is an introduction to a new trend in the conceptual design of energy systems: the generation of flow configuration based on the 'constructal' principle that the global performance is maximized by balancing and arranging the various flow resistances (the irreversibilities) in a flow system that is free to morph. The paper focuses on distribution and collection, which are flows that connect one point (source, or sink) with an infinity of points (volume, area, curve). The flow configurations that emerge from this principle are tree-shaped, and the systems that employ them are 'vascularized'. The paper traces the most recent progress made on constructal vascularization. The direction is from large-scale applications toward microscales. The large-scale tree-shaped designs of electric power distribution systems and networks for natural gas and water are now invading small-scale designs such as fuel cells, heat exchangers and cooled packages of electronics. These flow configurations have several properties in common: freedom to morph, multiple scales, hierarchy, nonuniform (optimal) distribution of scales through the available volume, compactness and finite complexity
Constructing Dynamic Event Trees from Markov Models
International Nuclear Information System (INIS)
Paolo Bucci; Jason Kirschenbaum; Tunc Aldemir; Curtis Smith; Ted Wood
2006-01-01
In the probabilistic risk assessment (PRA) of process plants, Markov models can be used to model accurately the complex dynamic interactions between plant physical process variables (e.g., temperature, pressure, etc.) and the instrumentation and control system that monitors and manages the process. One limitation of this approach that has prevented its use in nuclear power plant PRAs is the difficulty of integrating the results of a Markov analysis into an existing PRA. In this paper, we explore a new approach to the generation of failure scenarios and their compilation into dynamic event trees from a Markov model of the system. These event trees can be integrated into an existing PRA using software tools such as SAPHIRE. To implement our approach, we first construct a discrete-time Markov chain modeling the system of interest by: (a) partitioning the process variable state space into magnitude intervals (cells), (b) using analytical equations or a system simulator to determine the transition probabilities between the cells through the cell-to-cell mapping technique, and, (c) using given failure/repair data for all the components of interest. The Markov transition matrix thus generated can be thought of as a process model describing the stochastic dynamic behavior of the finite-state system. We can therefore search the state space starting from a set of initial states to explore all possible paths to failure (scenarios) with associated probabilities. We can also construct event trees of arbitrary depth by tracing paths from a chosen initiating event and recording the following events while keeping track of the probabilities associated with each branch in the tree. As an example of our approach, we use the simple level control system often used as benchmark in the literature with one process variable (liquid level in a tank), and three control units: a drain unit and two supply units. Each unit includes a separate level sensor to observe the liquid level in the tank
Undergraduate Students’ Difficulties in Reading and Constructing Phylogenetic Tree
Sa'adah, S.; Tapilouw, F. S.; Hidayat, T.
2017-02-01
Representation is a very important communication tool to communicate scientific concepts. Biologists produce phylogenetic representation to express their understanding of evolutionary relationships. The phylogenetic tree is visual representation depict a hypothesis about the evolutionary relationship and widely used in the biological sciences. Phylogenetic tree currently growing for many disciplines in biology. Consequently, learning about phylogenetic tree become an important part of biological education and an interesting area for biology education research. However, research showed many students often struggle with interpreting the information that phylogenetic trees depict. The purpose of this study was to investigate undergraduate students’ difficulties in reading and constructing a phylogenetic tree. The method of this study is a descriptive method. In this study, we used questionnaires, interviews, multiple choice and open-ended questions, reflective journals and observations. The findings showed students experiencing difficulties, especially in constructing a phylogenetic tree. The students’ responds indicated that main reasons for difficulties in constructing a phylogenetic tree are difficult to placing taxa in a phylogenetic tree based on the data provided so that the phylogenetic tree constructed does not describe the actual evolutionary relationship (incorrect relatedness). Students also have difficulties in determining the sister group, character synapomorphy, autapomorphy from data provided (character table) and comparing among phylogenetic tree. According to them building the phylogenetic tree is more difficult than reading the phylogenetic tree. Finding this studies provide information to undergraduate instructor and students to overcome learning difficulties of reading and constructing phylogenetic tree.
Mitered fractal trees: constructions and properties
Verhoeff, T.; Verhoeff, K.; Bosch, R.; McKenna, D.; Sarhangi, R.
2012-01-01
Tree-like structures, that is, branching structures without cycles, are attractive for artful expression. Especially interesting are fractal trees, where each subtree is a scaled and possibly otherwise transformed version of the entire tree. Such trees can be rendered in 3D by using beams with a
U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Computer aided fault tree construction for electrical systems
International Nuclear Information System (INIS)
Fussell, J.B.
1975-01-01
A technique is presented for automated construction of the Boolean failure logic diagram, called the fault tree, for electrical systems. The method is a technique for synthesizing a fault tree from system-independent component characteristics. Terminology is defined and heuristic examples are given for all phases of the model. The computer constructed fault trees are in conventional format, use conventional symbols, and are deductively constructed from the main failure of interest to the individual component failures. The synthesis technique is generally applicable to automated fault tree construction for other types of systems
Susan L. King
2003-01-01
The performance of two classifiers, logistic regression and neural networks, are compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
On Construction of Quantum Markov Chains on Cayley trees
International Nuclear Information System (INIS)
Accardi, Luigi; Mukhamedov, Farrukh; Souissi, Abdessatar
2016-01-01
The main aim of the present paper is to provide a new construction of quantum Markov chain (QMC) on arbitrary order Cayley tree. In that construction, a QMC is defined as a weak limit of finite volume states with boundary conditions, i.e. QMC depends on the boundary conditions. Note that this construction reminds statistical mechanics models with competing interactions on trees. If one considers one dimensional tree, then the provided construction reduces to well-known one, which was studied by the first author. Our construction will allow to investigate phase transition problem in a quantum setting. (paper)
Regression: The Apple Does Not Fall Far From the Tree.
Vetter, Thomas R; Schober, Patrick
2018-05-15
Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
Software development to assist in fault tree construction
International Nuclear Information System (INIS)
Simic, Z.; Mikulicic, V.
1992-01-01
This paper reviews and classifies fault tree construction methods developed for system safety and reliability. We have outlined two generally different approaches: automatic and interactive fault tree construction. Automatic fault tree approach is no jet enough developed to covering various uses in practice. Interactive approach is intending to be support to the analyst (not vice verse like in automatic approach). The aim is not so high as automatic one but it is accessible. We have favored interactive approach as well because to our opinion the process of fault tree construction is very important for better system understanding. We have described our example of interactive fault tree construction approach. Computer code GIFFT (Graphical Interactive Fault Tree Tool) is in phase of intensive testing and final developing. (author) [hr
Energy Technology Data Exchange (ETDEWEB)
Hemmateenejad, Bahram, E-mail: hemmatb@sums.ac.ir [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of); Medicinal and Natural Products Chemistry Research Center, Shiraz University of Medical Sciences, Shiraz (Iran, Islamic Republic of); Shamsipur, Mojtaba [Department of Chemistry, Razi University, Kermanshah (Iran, Islamic Republic of); Zare-Shahabadi, Vali [Young Researchers Club, Mahshahr Branch, Islamic Azad University, Mahshahr (Iran, Islamic Republic of); Akhond, Morteza [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of)
2011-10-17
Highlights: {yields} Ant colony systems help to build optimum classification and regression trees. {yields} Using of genetic algorithm operators in ant colony systems resulted in more appropriate models. {yields} Variable selection in each terminal node of the tree gives promising results. {yields} CART-ACS-GA could model the melting point of organic materials with prediction errors lower than previous models. - Abstract: The classification and regression trees (CART) possess the advantage of being able to handle large data sets and yield readily interpretable models. A conventional method of building a regression tree is recursive partitioning, which results in a good but not optimal tree. Ant colony system (ACS), which is a meta-heuristic algorithm and derived from the observation of real ants, can be used to overcome this problem. The purpose of this study was to explore the use of CART and its combination with ACS for modeling of melting points of a large variety of chemical compounds. Genetic algorithm (GA) operators (e.g., cross averring and mutation operators) were combined with ACS algorithm to select the best solution model. In addition, at each terminal node of the resulted tree, variable selection was done by ACS-GA algorithm to build an appropriate partial least squares (PLS) model. To test the ability of the resulted tree, a set of approximately 4173 structures and their melting points were used (3000 compounds as training set and 1173 as validation set). Further, an external test set containing of 277 drugs was used to validate the prediction ability of the tree. Comparison of the results obtained from both trees showed that the tree constructed by ACS-GA algorithm performs better than that produced by recursive partitioning procedure.
WDM Multicast Tree Construction Algorithms and Their Comparative Evaluations
Makabe, Tsutomu; Mikoshi, Taiju; Takenaka, Toyofumi
We propose novel tree construction algorithms for multicast communication in photonic networks. Since multicast communications consume many more link resources than unicast communications, effective algorithms for route selection and wavelength assignment are required. We propose a novel tree construction algorithm, called the Weighted Steiner Tree (WST) algorithm and a variation of the WST algorithm, called the Composite Weighted Steiner Tree (CWST) algorithm. Because these algorithms are based on the Steiner Tree algorithm, link resources among source and destination pairs tend to be commonly used and link utilization ratios are improved. Because of this, these algorithms can accept many more multicast requests than other multicast tree construction algorithms based on the Dijkstra algorithm. However, under certain delay constraints, the blocking characteristics of the proposed Weighted Steiner Tree algorithm deteriorate since some light paths between source and destinations use many hops and cannot satisfy the delay constraint. In order to adapt the approach to the delay-sensitive environments, we have devised the Composite Weighted Steiner Tree algorithm comprising the Weighted Steiner Tree algorithm and the Dijkstra algorithm for use in a delay constrained environment such as an IPTV application. In this paper, we also give the results of simulation experiments which demonstrate the superiority of the proposed Composite Weighted Steiner Tree algorithm compared with the Distributed Minimum Hop Tree (DMHT) algorithm, from the viewpoint of the light-tree request blocking.
Logistic Regression in the Identification of Hazards in Construction
Drozd, Wojciech
2017-10-01
The construction site and its elements create circumstances that are conducive to the formation of risks to safety during the execution of works. Analysis indicates the critical importance of these factors in the set of characteristics that describe the causes of accidents in the construction industry. This article attempts to analyse the characteristics related to the construction site, in order to indicate their importance in defining the circumstances of accidents at work. The study includes sites inspected in 2014 - 2016 by the employees of the District Labour Inspectorate in Krakow (Poland). The analysed set of detailed (disaggregated) data includes both quantitative and qualitative characteristics. The substantive task focused on classification modelling in the identification of hazards in construction and identifying those of the analysed characteristics that are important in an accident. In terms of methodology, resource data analysis using statistical classifiers, in the form of logistic regression, was the method used.
Regression Nodes: Extending attack trees with data from social sciences
Bullee, Jan-Willem; Montoya, L.; Pieters, Wolter; Junger, Marianne; Hartel, Pieter H.
In the field of security, attack trees are often used to assess security vulnerabilities probabilistically in relation to multi-step attacks. The nodes are usually connected via AND-gates, where all children must be executed, or via OR-gates, where only one action is necessary for the attack step to
The process and utility of classification and regression tree methodology in nursing research.
Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda
2014-06-01
This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Discussion paper. English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984-2013. Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. © 2013 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.
Regression tree analysis for predicting body weight of Nigerian Muscovy duck (Cairina moschata
Directory of Open Access Journals (Sweden)
Oguntunji Abel Olusegun
2017-01-01
Full Text Available Morphometric parameters and their indices are central to the understanding of the type and function of livestock. The present study was conducted to predict body weight (BWT of adult Nigerian Muscovy ducks from nine (9 morphometric parameters and seven (7 body indices and also to identify the most important predictor of BWT among them using regression tree analysis (RTA. The experimental birds comprised of 1,020 adult male and female Nigerian Muscovy ducks randomly sampled in Rain Forest (203, Guinea Savanna (298 and Derived Savanna (519 agro-ecological zones. Result of RTA revealed that compactness; body girth and massiveness were the most important independent variables in predicting BWT and were used in constructing RT. The combined effect of the three predictors was very high and explained 91.00% of the observed variation of the target variable (BWT. The optimal regression tree suggested that Muscovy ducks with compactness >5.765 would be fleshy and have highest BWT. The result of the present study could be exploited by animal breeders and breeding companies in selection and improvement of BWT of Muscovy ducks.
Scenario evolution: Interaction between event tree construction and numerical analyses
International Nuclear Information System (INIS)
Barr, G.E.; Barnard, R.W.; Dockery, H.A.; Dunn, E.; MacIntyre, A.T.
1990-01-01
Construction of well-posed scenarios for the range of conditions possible at any proposed repository site is a critical first step to assessing total system performance. Event tree construction is the method that is being used to develop potential failure scenarios for the proposed nuclear waste repository at Yucca Mountain. An event tree begins with an initial event or condition. Subsequent events are listed in a sequence, leading eventually to release of radionuclides to the accessible environment. Ensuring the validity of the scenarios requires iteration between problems constructed using scenarios contained in the event tree sequence, experimental results, and numerical analyses. Details not adequately captured within the tree initially may become more apparent as a result of analyses. To illustrate this process, the authors discuss the iterations used to develop numerical analyses for PACE-90 (Performance Assessment Calculational Exercises) using basaltic igneous activity and human-intrusion event trees
Scenario evolution: Interaction between event tree construction and numerical analyses
International Nuclear Information System (INIS)
Barr, G.E.; Barnard, R.W.; Dockery, H.A.; Dunn, E.; MacIntyre, A.T.
1991-01-01
Construction of well-posed scenarios for the range of conditions possible at any proposed repository site is a critical first step to assessing total system performance. Even tree construction is the method that is being used to develop potential failure scenarios for the proposed nuclear waste repository at Yucca Mountain. An event tree begins with an initial event or condition. Subsequent events are listed in a sequence, leading eventually to release of radionuclides to the accessible environment. Ensuring the validity of the scenarios requires iteration between problems constructed using scenarios contained in the event tree sequence, experimental results, and numerical analyses. Details not adequately captured within the tree initially may become more apparent as a result of analyses. To illustrate this process, we discuss the iterations used to develop numerical analyses for PACE-90 using basaltic igneous activity and human-intrusion event trees
Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett
2009-01-01
Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
Directory of Open Access Journals (Sweden)
Kritski Afrânio
2006-02-01
Full Text Available Abstract Background Smear negative pulmonary tuberculosis (SNPT accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods The study enrolled 551 patients with clinical-radiological suspicion of SNPT, in Rio de Janeiro, Brazil. The original data was divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used for constructing logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operator characteristic curve, sensitivity, and specificity were used to evaluate the model's performance in both generation and validation samples. Results It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion The results suggest that those models might be useful as screening tools for estimating the risk of SNPT, optimizing the utilization of more expensive tests, and avoiding costs of unnecessary anti-tuberculosis treatment. Those models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources.
Algorithms and programs for consequence diagram and fault tree construction
International Nuclear Information System (INIS)
Hollo, E.; Taylor, J.R.
1976-12-01
A presentation of algorithms and programs for consequence diagram and sequential fault tree construction that are intended for reliability and disturbance analysis of large systems. The system to be analyzed must be given as a block diagram formed by mini fault trees of individual system components. The programs were written in LISP programming language and run on a PDP8 computer with 8k words of storage. A description is given of the methods used and of the program construction and working. (author)
Efficient Representation for Online Suffix Tree Construction
DEFF Research Database (Denmark)
Larsson, N. Jesper; Fuglsang, Kasper; Karlsson, Kenneth
2014-01-01
of branch lookup operations (known to be a bottleneck in construction time) with some additional techniques to reduce construction cost. We discuss various effects of our approach and compare it to previous techniques. An experimental evaluation shows that we are able to reduce construction time to around...
Algorithmic fault tree construction by component-based system modeling
International Nuclear Information System (INIS)
Majdara, Aref; Wakabayashi, Toshio
2008-01-01
Computer-aided fault tree generation can be easier, faster and less vulnerable to errors than the conventional manual fault tree construction. In this paper, a new approach for algorithmic fault tree generation is presented. The method mainly consists of a component-based system modeling procedure an a trace-back algorithm for fault tree synthesis. Components, as the building blocks of systems, are modeled using function tables and state transition tables. The proposed method can be used for a wide range of systems with various kinds of components, if an inclusive component database is developed. (author)
Comparison of greedy algorithms for α-decision tree construction
Alkhalid, Abdulaziz; Chikalov, Igor; Moshkov, Mikhail
2011-01-01
A comparison among different heuristics that are used by greedy algorithms which constructs approximate decision trees (α-decision trees) is presented. The comparison is conducted using decision tables based on 24 data sets from UCI Machine Learning Repository [2]. Complexity of decision trees is estimated relative to several cost functions: depth, average depth, number of nodes, number of nonterminal nodes, and number of terminal nodes. Costs of trees built by greedy algorithms are compared with minimum costs calculated by an algorithm based on dynamic programming. The results of experiments assign to each cost function a set of potentially good heuristics that minimize it. © 2011 Springer-Verlag.
AFTC Code for Automatic Fault Tree Construction: Users Manual
International Nuclear Information System (INIS)
Gopika Vinod; Saraf, R.K.; Babar, A.K.
1999-04-01
Fault Trees perform a predominant role in reliability and safety analysis of system. Manual construction of fault tree is a very time consuming task and moreover, it won't give a formalized result, since it relies highly on analysts experience and heuristics. This necessitates a computerised fault tree construction, which is still attracting interest of reliability analysts. AFTC software is a user friendly software model for constructing fault trees based on decision tables. Software is equipped with libraries of decision tables for components commonly used in various Nuclear Power Plant (NPP) systems. User is expected to make a nodal diagram of the system, for which fault tree is to be constructed, from the flow sheets available. The text nodal diagram goes as the sole input defining the system flow chart. AFTC software is a rule based expert system which draws the fault tree from the system flow chart and component decision tables. AFTC software gives fault tree in both text and graphic format. Help is provided as how to enter system flow chart and component decision tables. The software is developed in 'C' language. Software is verified with simplified version of the fire water system of an Indian PHWR. Code conversion will be undertaken to create a window based version. (author)
Groundwater level prediction of landslide based on classification and regression tree
Directory of Open Access Journals (Sweden)
Yannan Zhao
2016-09-01
Full Text Available According to groundwater level monitoring data of Shuping landslide in the Three Gorges Reservoir area, based on the response relationship between influential factors such as rainfall and reservoir level and the change of groundwater level, the influential factors of groundwater level were selected. Then the classification and regression tree (CART model was constructed by the subset and used to predict the groundwater level. Through the verification, the predictive results of the test sample were consistent with the actually measured values, and the mean absolute error and relative error is 0.28 m and 1.15% respectively. To compare the support vector machine (SVM model constructed using the same set of factors, the mean absolute error and relative error of predicted results is 1.53 m and 6.11% respectively. It is indicated that CART model has not only better fitting and generalization ability, but also strong advantages in the analysis of landslide groundwater dynamic characteristics and the screening of important variables. It is an effective method for prediction of ground water level in landslides.
Huang, C.; Townshend, J.R.G.
2003-01-01
A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al . (BRT) and a stepwise linear regression (SLR) method, this algorithm represents an improvement over SLR in that it can approximate nonlinear relationships and over BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than or at least as well as BRT at all 10 equal forest proportion interval ranging from 0 to 100%. This method is appealing to estimating subpixel land cover over large areas.
Integrating classification trees with local logistic regression in Intensive Care prognosis.
Abu-Hanna, Ameen; de Keizer, Nicolette
2003-01-01
Health care effectiveness and efficiency are under constant scrutiny especially when treatment is quite costly as in the Intensive Care (IC). Currently there are various international quality of care programs for the evaluation of IC. At the heart of such quality of care programs lie prognostic models whose prediction of patient mortality can be used as a norm to which actual mortality is compared. The current generation of prognostic models in IC are statistical parametric models based on logistic regression. Given a description of a patient at admission, these models predict the probability of his or her survival. Typically, this patient description relies on an aggregate variable, called a score, that quantifies the severity of illness of the patient. The use of a parametric model and an aggregate score form adequate means to develop models when data is relatively scarce but it introduces the risk of bias. This paper motivates and suggests a method for studying and improving the performance behavior of current state-of-the-art IC prognostic models. Our method is based on machine learning and statistical ideas and relies on exploiting information that underlies a score variable. In particular, this underlying information is used to construct a classification tree whose nodes denote patient sub-populations. For these sub-populations, local models, most notably logistic regression ones, are developed using only the total score variable. We compare the performance of this hybrid model to that of a traditional global logistic regression model. We show that the hybrid model not only provides more insight into the data but also has a better performance. We pay special attention to the precision aspect of model performance and argue why precision is more important than discrimination ability.
Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H
2016-01-01
Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
A new algorithm to construct phylogenetic networks from trees.
Wang, J
2014-03-06
Developing appropriate methods for constructing phylogenetic networks from tree sets is an important problem, and much research is currently being undertaken in this area. BIMLR is an algorithm that constructs phylogenetic networks from tree sets. The algorithm can construct a much simpler network than other available methods. Here, we introduce an improved version of the BIMLR algorithm, QuickCass. QuickCass changes the selection strategy of the labels of leaves below the reticulate nodes, i.e., the nodes with an indegree of at least 2 in BIMLR. We show that QuickCass can construct simpler phylogenetic networks than BIMLR. Furthermore, we show that QuickCass is a polynomial-time algorithm when the output network that is constructed by QuickCass is binary.
Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun
2006-01-01
In this article, the authors provide an overview of a research method to predict quality of care in home health nursing data set. The results of this study can be visualized through classification an regression tree (CART) graphs. The analysis was more effective, and the results were more informative since the home health nursing dataset was analyzed with a combination of the logistic regression and CART, these two techniques complete each other. And the results more informative that more patients' characters were related to quality of care in home care. The results contributed to home health nurse predict patient outcome in case management. Improved prediction is needed for interventions to be appropriately targeted for improved patient outcome and quality of care.
Suchetana, Bihu; Rajagopalan, Balaji; Silverstein, JoAnn
2017-11-15
A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits using Discharge Monthly Report (DMR) data from a sample of 106 municipal treatment plants for the period of 2004-2008. Predictor variables used to fit the regression tree are selected using random forests, and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and then applied assuming more stringent discharge limits, under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. With more stringent discharge limits predicted ammonia concentration relative to the discharge limit, increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and is able to accurately predict the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants providing practical guidance for utilities and regulators with an interest in controlling ammonia discharges. The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, as well as how planners and decision makers can set reasonable discharge limits in future. Copyright © 2017 Elsevier B.V. All rights reserved.
Computer-oriented approach to fault-tree construction
International Nuclear Information System (INIS)
Salem, S.L.; Apostolakis, G.E.; Okrent, D.
1976-11-01
A methodology for systematically constructing fault trees for general complex systems is developed and applied, via the Computer Automated Tree (CAT) program, to several systems. A means of representing component behavior by decision tables is presented. The method developed allows the modeling of components with various combinations of electrical, fluid and mechanical inputs and outputs. Each component can have multiple internal failure mechanisms which combine with the states of the inputs to produce the appropriate output states. The generality of this approach allows not only the modeling of hardware, but human actions and interactions as well. A procedure for constructing and editing fault trees, either manually or by computer, is described. The techniques employed result in a complete fault tree, in standard form, suitable for analysis by current computer codes. Methods of describing the system, defining boundary conditions and specifying complex TOP events are developed in order to set up the initial configuration for which the fault tree is to be constructed. The approach used allows rapid modifications of the decision tables and systems to facilitate the analysis and comparison of various refinements and changes in the system configuration and component modeling
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis
Thomas, Emily H.; Galambos, Nora
2004-01-01
To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…
Incomplete meteorological data has been a problem in environmental modeling studies. The objective of this work was to develop a technique to reconstruct missing daily precipitation data in the central part of Chesapeake Bay Watershed using regression trees (RT) and artificial neural networks (ANN)....
Cohen, Ira L.; Liu, Xudong; Hudson, Melissa; Gillis, Jennifer; Cavalari, Rachel N. S.; Romanczyk, Raymond G.; Karmel, Bernard Z.; Gardner, Judith M.
2016-01-01
In order to improve discrimination accuracy between Autism Spectrum Disorder (ASD) and similar neurodevelopmental disorders, a data mining procedure, Classification and Regression Trees (CART), was used on a large multi-site sample of PDD Behavior Inventory (PDDBI) forms on children with and without ASD. Discrimination accuracy exceeded 80%,…
Preface to Berk's "Regression Analysis: A Constructive Critique"
de Leeuw, Jan
2003-01-01
It is pleasure to write a preface for the book ”Regression Analysis” of my fellow series editor Dick Berk. And it is a pleasure in particular because the book is about regression analysis, the most popular and the most fundamental technique in applied statistics. And because it is critical of the way regression analysis is used in the sciences, in particular in the social and behavioral sciences. Although the book can be read as an introduction to regression analysis, it can also be read as a...
The Complexity of Constructing Evolutionary Trees Using Experiments
DEFF Research Database (Denmark)
Brodal, Gerth Stølting; Fagerberg, Rolf; Pedersen, Christian Nørgaard Storm
2001-01-01
We present tight upper and lower bounds for the problem of constructing evolutionary trees in the experiment model. We describe an algorithm which constructs an evolutionary tree of n species in time O(nd logd n) using at most n⌈d/2⌉(log2⌈d/2⌉-1 n+O(1)) experiments for d > 2, and at most n(log n......+O(1)) experiments for d = 2, where d is the degree of the tree. This improves the previous best upper bound by a factor θ(log d). For d = 2 the previously best algorithm with running time O(n log n) had a bound of 4n log n on the number of experiments. By an explicit adversary argument, we show an Ω......(nd logd n) lower bound, matching our upper bounds and improving the previous best lower bound by a factor θ(logd n). Central to our algorithm is the construction and maintenance of separator trees of small height, which may be of independent interest....
Buchner, Florian; Wasem, Jürgen; Schillo, Sonja
2017-01-01
Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group-split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R 2 from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R 2 improvement detected is only marginal. According to the sample level performance measures used, not involving a considerable number of morbidity interactions forms no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd. Copyright © 2015 John Wiley & Sons, Ltd.
Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen. Fitzgerald
2012-01-01
Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed bums and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
Prediction of tissue-specific cis-regulatory modules using Bayesian networks and regression trees
Directory of Open Access Journals (Sweden)
Chen Xiaoyu
2007-12-01
Full Text Available Abstract Background In vertebrates, a large part of gene transcriptional regulation is operated by cis-regulatory modules. These modules are believed to be regulating much of the tissue-specificity of gene expression. Results We develop a Bayesian network approach for identifying cis-regulatory modules likely to regulate tissue-specific expression. The network integrates predicted transcription factor binding site information, transcription factor expression data, and target gene expression data. At its core is a regression tree modeling the effect of combinations of transcription factors bound to a module. A new unsupervised EM-like algorithm is developed to learn the parameters of the network, including the regression tree structure. Conclusion Our approach is shown to accurately identify known human liver and erythroid-specific modules. When applied to the prediction of tissue-specific modules in 10 different tissues, the network predicts a number of important transcription factor combinations whose concerted binding is associated to specific expression.
Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones
International Nuclear Information System (INIS)
Rosli, A D; Baharudin, R; Hashim, H; Khairuzzaman, N A; Mohd Sampian, A F; Abdullah, N E; Kamaru'zzaman, M; Sulaiman, M S
2015-01-01
This paper investigates the capacitance regression modelling performance of latex for various rubber tree clones, namely clone 2002, 2008, 2014 and 3001. Conventionally, the rubber tree clones identification are based on observation towards tree features such as shape of leaf, trunk, branching habit and pattern of seeds texture. The former method requires expert persons and very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to measure different clones from latex samples. Hence, with a hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor via Capacitance Comparison Bridge (known as capacitance sensor) to measure an output voltage of different latex samples. The proposed sensor is initially tested with 30ml of latex sample prior to gradually addition of dilution water. The output voltage and capacitance obtained from the test are recorded and analyzed using Simple Linear Regression (SLR) model. This work outcome infers that latex clone of 2002 has produced the highest and reliable linear regression line with determination coefficient of 91.24%. In addition, the study also found that the capacitive elements in latex samples deteriorate if it is diluted with higher volume of water. (paper)
[Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].
Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao
2016-03-01
Leaf area index (LAI) is the dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively. It can be provide a reference for monitoring the tree growing and yield estimation. The Red Fuji apple trees of full bearing fruit are the researching objects. Ninety apple trees canopies spectral reflectance and LAI values were measured by the ASD Fieldspec3 spectrometer and LAI-2200 in thirty orchards in constant two years in Qixia research area of Shandong Province. The optimal vegetation indices were selected by the method of correlation analysis of the original spectral reflectance and vegetation indices. The models of predicting the LAI were built with the multivariate regression analysis method of support vector machine (SVM) and random forest (RF). The new vegetation indices, GNDVI527, ND-VI676, RVI682, FD-NVI656 and GRVI517 and the previous two main vegetation indices, NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration set decision coefficient C-R2 of 0.920 and validation set decision coefficient V-R2 of 0.889 are higher than the SVM regression model by 0.045 and 0.033 respectively. The root mean square error of calibration set C-RMSE of 0.249, the root mean square error validation set V-RMSE of 0.236 are lower than that of the SVM regression model by 0.054 and 0.058 respectively. Relative analysis of calibrating error C-RPD and relative analysis of validation set V-RPD reached 3.363 and 2.520, 0.598 and 0.262, respectively, which were higher than the SVM regression model. The measured and predicted the scatterplot trend line slope of the calibration set and validation set C-S and V-S are close to 1. The estimation result of RF regression model is better than that of the SVM. RF regression model can be used to estimate the LAI of red Fuji apple trees in full fruit period.
Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris
2016-09-01
Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have
Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model
Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz
2016-01-01
When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and take into account limited numbers of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods of modeling complicated analysis such as classification and tree regression (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924
Ebell, Mark H; Afonso, Anna M; Geocadin, Romergryko G
2013-12-01
To predict the likelihood that an inpatient who experiences cardiopulmonary arrest and undergoes cardiopulmonary resuscitation survives to discharge with good neurologic function or with mild deficits (Cerebral Performance Category score = 1). Classification and Regression Trees were used to develop branching algorithms that optimize the ability of a series of tests to correctly classify patients into two or more groups. Data from 2007 to 2008 (n = 38,092) were used to develop candidate Classification and Regression Trees models to predict the outcome of inpatient cardiopulmonary resuscitation episodes and data from 2009 (n = 14,435) to evaluate the accuracy of the models and judge the degree of over fitting. Both supervised and unsupervised approaches to model development were used. 366 hospitals participating in the Get With the Guidelines-Resuscitation registry. Adult inpatients experiencing an index episode of cardiopulmonary arrest and undergoing cardiopulmonary resuscitation in the hospital. The five candidate models had between 8 and 21 nodes and an area under the receiver operating characteristic curve from 0.718 to 0.766 in the derivation group and from 0.683 to 0.746 in the validation group. One of the supervised models had 14 nodes and classified 27.9% of patients as very unlikely to survive neurologically intact or with mild deficits (Tree models that predict survival to discharge with good neurologic function or with mild deficits following in-hospital cardiopulmonary arrest. Models like this can assist physicians and patients who are considering do-not-resuscitate orders.
Directory of Open Access Journals (Sweden)
Shokouh Taghipour Zahir
2013-01-01
Full Text Available Purpose. We sought to investigate the utility of classification and regression trees (CART classifier to differentiate benign from malignant nodules in patients referred for thyroid surgery. Methods. Clinical and demographic data of 271 patients referred to the Sadoughi Hospital during 2006–2011 were collected. In a two-step approach, a CART classifier was employed to differentiate patients with a high versus low risk of thyroid malignancy. The first step served as the screening procedure and was tailored to produce as few false negatives as possible. The second step identified those with the lowest risk of malignancy, chosen from a high risk population. Sensitivity, specificity, positive and negative predictive values (PPV and NPV of the optimal tree were calculated. Results. In the first step, age, sex, and nodule size contributed to the optimal tree. Ultrasonographic features were employed in the second step with hypoechogenicity and/or microcalcifications yielding the highest discriminatory ability. The combined tree produced a sensitivity and specificity of 80.0% (95% CI: 29.9–98.9 and 94.1% (95% CI: 78.9–99.0, respectively. NPV and PPV were 66.7% (41.1–85.6 and 97.0% (82.5–99.8, respectively. Conclusion. CART classifier reliably identifies patients with a low risk of malignancy who can avoid unnecessary surgery.
Load Balancing Issues with Constructing Phylogenetic Trees using Neighbour-Joining Algorithm
International Nuclear Information System (INIS)
Al Mamun, S M
2012-01-01
Phylogenetic tree construction is one of the most important and interesting problems in bioinformatics. Constructing an efficient phylogenetic tree has always been a research issue. It needs to consider both the correctness and the speed of the tree construction. In this paper, we implemented the neighbour-joining algorithm, using Message Passing Interface (MPI) for constructing the phylogenetic tree. Performance is efficacious, comparing to the best sequential algorithm. From this paper, it would be clear to the researchers that how load balance can make a great effect for constructing phylogenetic trees using neighbour-joining algorithm.
The Roots of Inequality: Estimating Inequality of Opportunity from Regression Trees
DEFF Research Database (Denmark)
Brunori, Paolo; Hufe, Paul; Mahler, Daniel Gerszon
2017-01-01
the risk of arbitrary and ad-hoc model selection. Second, they provide a standardized way of trading off upward and downward biases in inequality of opportunity estimations. Finally, regression trees can be graphically represented; their structure is immediate to read and easy to understand. This will make...... the measurement of inequality of opportunity more easily comprehensible to a large audience. These advantages are illustrated by an empirical application based on the 2011 wave of the European Union Statistics on Income and Living Conditions....
Phylogenetic tree construction using trinucleotide usage profile (TUP).
Chen, Si; Deng, Lih-Yuan; Bowman, Dale; Shiau, Jyh-Jen Horng; Wong, Tit-Yee; Madahian, Behrouz; Lu, Henry Horng-Shing
2016-10-06
It has been a challenging task to build a genome-wide phylogenetic tree for a large group of species containing a large number of genes with long nucleotides sequences. The most popular method, called feature frequency profile (FFP-k), finds the frequency distribution for all words of certain length k over the whole genome sequence using (overlapping) windows of the same length. For a satisfactory result, the recommended word length (k) ranges from 6 to 15 and it may not be a multiple of 3 (codon length). The total number of possible words needed for FFP-k can range from 4 6 =4096 to 4 15 . We propose a simple improvement over the popular FFP method using only a typical word length of 3. A new method, called Trinucleotide Usage Profile (TUP), is proposed based only on the (relative) frequency distribution using non-overlapping windows of length 3. The total number of possible words needed for TUP is 4 3 =64, which is much less than the total count for the recommended optimal "resolution" for FFP. To build a phylogenetic tree, we propose first representing each of the species by a TUP vector and then using an appropriate distance measure between pairs of the TUP vectors for the tree construction. In particular, we propose summarizing a DNA sequence by a matrix of three rows corresponding to three reading frames, recording the frequency distribution of the non-overlapping words of length 3 in each of the reading frame. We also provide a numerical measure for comparing trees constructed with various methods. Compared to the FFP method, our empirical study showed that the proposed TUP method is more capable of building phylogenetic trees with a stronger biological support. We further provide some justifications on this from the information theory viewpoint. Unlike the FFP method, the TUP method takes the advantage that the starting of the first reading frame is (usually) known. Without this information, the FFP method could only rely on the frequency distribution of
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
van Veen, S H C M; van Kleef, R C; van de Ven, W P M M; van Vliet, R C J A
2018-02-01
This study explores the predictive power of interaction terms between the risk adjusters in the Dutch risk equalization (RE) model of 2014. Due to the sophistication of this RE-model and the complexity of the associations in the dataset (N = ~16.7 million), there are theoretically more than a million interaction terms. We used regression tree modelling, which has been applied rarely within the field of RE, to identify interaction terms that statistically significantly explain variation in observed expenses that is not already explained by the risk adjusters in this RE-model. The interaction terms identified were used as additional risk adjusters in the RE-model. We found evidence that interaction terms can improve the prediction of expenses overall and for specific groups in the population. However, the prediction of expenses for some other selective groups may deteriorate. Thus, interactions can reduce financial incentives for risk selection for some groups but may increase them for others. Furthermore, because regression trees are not robust, additional criteria are needed to decide which interaction terms should be used in practice. These criteria could be the right incentive structure for risk selection and efficiency or the opinion of medical experts. Copyright © 2017 John Wiley & Sons, Ltd.
Directory of Open Access Journals (Sweden)
M. Saki
2013-03-01
Full Text Available The relationship between plant species and environmental factors has always been a central issue in plant ecology. With rising power of statistical techniques, geo-statistics and geographic information systems (GIS, the development of predictive habitat distribution models of organisms has rapidly increased in ecology. This study aimed to evaluate the ability of Logistic Regression Tree model to create potential habitat map of Astragalus verus. This species produces Tragacanth and has economic value. A stratified- random sampling was applied to 100 sites (50 presence- 50 absence of given species, and produced environmental and edaphic factors maps by using Kriging and Inverse Distance Weighting methods in the ArcGIS software for the whole study area. Relationships between species occurrence and environmental factors were determined by Logistic Regression Tree model and extended to the whole study area. The results indicated species occurrence has strong correlation with environmental factors such as mean daily temperature and clay, EC and organic carbon content of the soil. Species occurrence showed direct relationship with mean daily temperature and clay and organic carbon, and inverse relationship with EC. Model accuracy was evaluated both by Cohen’s kappa statistics (κ and by area under Receiver Operating Characteristics curve based on independent test data set. Their values (kappa=0.9, Auc of ROC=0.96 indicated the high power of LRT to create potential habitat map on local scales. This model, therefore, can be applied to recognize potential sites for rangeland reclamation projects.
Lazaridis, D.C.; Verbesselt, J.; Robinson, A.P.
2011-01-01
Constructing models can be complicated when the available fitting data are highly correlated and of high dimension. However, the complications depend on whether the goal is prediction instead of estimation. We focus on predicting tree mortality (measured as the number of dead trees) from change
snpTree - a web-server to identify and construct SNP trees from whole genome sequence data
DEFF Research Database (Denmark)
Leekitcharoenphon, Pimlapas; Kaas, Rolf Sommer; Thomsen, Martin Christen Frølund
2012-01-01
identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on reference genome and a tree is constructed...... to differentiate and classify isolates. One of the successfully and broadly used methods is analysis of single nucletide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs including various options and cut-off values. Furthermore, all current methods require bioinformatic...... skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct phylogenetic tree from WGS data. Results Here we introduce snpTree, a server for online-automatic SNPs analysis. This tool is composed of different SNPs analysis suites, perl and python scripts. snpTree can...
Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression
Directory of Open Access Journals (Sweden)
Alfonso L. Palmer
2010-01-01
Full Text Available Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls whose mean age was 15.59 years. Logistic regression and decision trees were carried out in order to model the consumption of cannabis and cocaine. The results show the use of legal substances and committing fraudulence or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraudulence or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraudulence or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.
A computer-oriented approach to fault-tree construction. Topical report No. 1
International Nuclear Information System (INIS)
Chu, B.B.
1976-11-01
Fault Tree Analysis is one of the major tools for the safety and reliability analysis of large systems. A methodology for systematically constructing fault trees for general complex systems is developed and applied, via the computer program CAT, to several systems. First, a means of representing component behavior by decision tables is presented. In order to use these tables, a procedure for constructing and editing fault trees, either manually or by computer, is described. In order to verify the methodology the computer program CAT has been developed and used to construct fault trees for two systems
A Model of Desired Performance in Phylogenetic Tree Construction for Teaching Evolution.
Brewer, Steven D.
This research paper examines phylogenetic tree construction-a form of problem solving in biology-by studying the strategies and heuristics used by experts. One result of the research is the development of a model of desired performance for phylogenetic tree construction. A detailed description of the model and the sample problems which illustrate…
Koestel, John; Bechtold, Michel; Jorda, Helena; Jarvis, Nicholas
2015-04-01
The saturated and near-saturated hydraulic conductivity of soil is of key importance for modelling water and solute fluxes in the vadose zone. Hydraulic conductivity measurements are cumbersome at the Darcy scale and practically impossible at larger scales where water and solute transport models are mostly applied. Hydraulic conductivity must therefore be estimated from proxy variables. Such pedotransfer functions are known to work decently well for e.g. water retention curves but rather poorly for near-saturated and saturated hydraulic conductivities. Recently, Weynants et al. (2009, Revisiting Vereecken pedotransfer functions: Introducing a closed-form hydraulic model. Vadose Zone Journal, 8, 86-95) reported a coefficients of determination of 0.25 (validation with an independent data set) for the saturated hydraulic conductivity from lab-measurements of Belgian soil samples. In our study, we trained boosted regression trees on a global meta-database containing tension-disk infiltrometer data (see Jarvis et al. 2013. Influence of soil, land use and climatic factors on the hydraulic conductivity of soil. Hydrology & Earth System Sciences, 17, 5185-5195) to predict the saturated hydraulic conductivity (Ks) and the conductivity at a tension of 10 cm (K10). We found coefficients of determination of 0.39 and 0.62 under a simple 10-fold cross-validation for Ks and K10. When carrying out the validation folded over the data-sources, i.e. the source publications, we found that the corresponding coefficients of determination reduced to 0.15 and 0.36, respectively. We conclude that the stricter source-wise cross-validation should be applied in future pedotransfer studies to prevent overly optimistic validation results. The boosted regression trees also allowed for an investigation of relevant predictors for estimating the near-saturated hydraulic conductivity. We found that land use and bulk density were most important to predict Ks. We also observed that Ks is large in fine
Damghi, Nada; Khoudri, Ibtissam; Oualili, Latifa; Abidi, Khalid; Madani, Naoufel; Zeggwagh, Amine Ali; Abouqal, Redouane
2008-07-01
Meeting the needs of patients' family members becomes an essential part of responsibilities of intensive care unit physicians. The aim of this study was to evaluate the satisfaction of patients' family members using the Arabic version of the Society of Critical Care Medicine's Family Needs Assessment questionnaire and to assess the predictors of family satisfaction using the classification and regression tree method. The authors conducted a prospective study. This study was conducted at a 12-bed medical intensive care unit in Morocco. Family representatives (n = 194) of consecutive patients with a length of stay >48 hrs were included in the study. Intervention was the Society of Critical Care Medicine's Family Needs Assessment questionnaire. Demographic data for relatives included age, gender, relationship with patients, education level, and intensive care unit commuting time. Clinical data for patients included age, gender, diagnoses, intensive care unit length of stay, Acute Physiology and Chronic Health Evaluation, MacCabe index, Therapeutic Interventioning Scoring System, and mechanical ventilation. The Arabic version of the Society of Critical Care Medicine's Family Needs Assessment questionnaire was administered between the third and fifth days after admission. Of family representatives, 81% declared being satisfied with information provided by physicians, 27% would like more information about the diagnosis, 30% about prognosis, and 45% about treatment. In univariate analysis, family satisfaction (small Society of Critical Care Medicine's Family Needs Assessment questionnaire score) increased with a lower family education level (p = .005), when the information was given by a senior physician (p = .014), and when the Society of Critical Care Medicine's Family Needs Assessment questionnaire was administered by an investigator (p = .002). Multivariate analysis (classification and regression tree) showed that the education level was the predominant factor
Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat
2015-01-01
Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical AbstractDecision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
Mazenq, Julie; Dubus, Jean-Christophe; Gaudart, Jean; Charpin, Denis; Viudes, Gilles; Noel, Guilhem
2017-11-01
Particulate matter, nitrogen dioxide (NO 2 ) and ozone are recognized as the three pollutants that most significantly affect human health. Asthma is a multifactorial disease. However, the place of residence has rarely been investigated. We compared the impact of air pollution, measured near patients' homes, on emergency department (ED) visits for asthma or trauma (controls) within the Provence-Alpes-Côte-d'Azur region. Variables were selected using classification and regression trees on asthmatic and control population, 3-99 years, visiting ED from January 1 to December 31, 2013. Then in a nested case control study, randomization was based on the day of ED visit and on defined age groups. Pollution, meteorological, pollens and viral data measured that day were linked to the patient's ZIP code. A total of 794,884 visits were reported including 6250 for asthma and 278,192 for trauma. Factors associated with an excess risk of emergency visit for asthma included short-term exposure to NO 2 , female gender, high viral load and a combination of low temperature and high humidity. Short-term exposures to high NO 2 concentrations, as assessed close to the homes of the patients, were significantly associated with asthma-related ED visits in children and adults. Copyright © 2017 Elsevier Ltd. All rights reserved.
Differential Diagnosis of Erythmato-Squamous Diseases Using Classification and Regression Tree.
Maghooli, Keivan; Langarizadeh, Mostafa; Shahmoradi, Leila; Habibi-Koolaee, Mahdi; Jebraeily, Mohamad; Bouraghi, Hamid
2016-10-01
Differential diagnosis of Erythmato-Squamous Diseases (ESD) is a major challenge in the field of dermatology. The ESD diseases are placed into six different classes. Data mining is the process for detection of hidden patterns. In the case of ESD, data mining help us to predict the diseases. Different algorithms were developed for this purpose. we aimed to use the Classification and Regression Tree (CART) to predict differential diagnosis of ESD. we used the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology. For this purpose, the dermatology data set from machine learning repository, UCI was obtained. The Clementine 12.0 software from IBM Company was used for modelling. In order to evaluation of the model we calculate the accuracy, sensitivity and specificity of the model. The proposed model had an accuracy of 94.84% (. 24.42) in order to correct prediction of the ESD disease. Results indicated that using of this classifier could be useful. But, it would be strongly recommended that the combination of machine learning methods could be more useful in terms of prediction of ESD.
Directory of Open Access Journals (Sweden)
Josef Smolle
2001-01-01
Full Text Available Objective: To evaluate the feasibility of the CART (Classification and Regression Tree procedure for the recognition of microscopic structures in tissue counter analysis. Methods: Digital microscopic images of H&E stained slides of normal human skin and of primary malignant melanoma were overlayed with regularly distributed square measuring masks (elements and grey value, texture and colour features within each mask were recorded. In the learning set, elements were interactively labeled as representing either connective tissue of the reticular dermis, other tissue components or background. Subsequently, CART models were based on these data sets. Results: Implementation of the CART classification rules into the image analysis program showed that in an independent test set 94.1% of elements classified as connective tissue of the reticular dermis were correctly labeled. Automated measurements of the total amount of tissue and of the amount of connective tissue within a slide showed high reproducibility (r=0.97 and r=0.94, respectively; p < 0.001. Conclusions: CART procedure in tissue counter analysis yields simple and reproducible classification rules for tissue elements.
Directory of Open Access Journals (Sweden)
I GEDE AGUS JIWADIANA
2015-11-01
Full Text Available The aim of this research is to determine the classification characteristics of traffic accidents in Denpasar city in January-July 2014 by using Classification And Regression Trees (CART. Then, for determine the explanatory variables into the main classifier of CART. The result showed that optimum CART generate three terminal node. First terminal node, there are 12 people were classified as heavy traffic accident characteritics with single accident, and second terminal nodes, there are 68 people were classified as minor traffic accident characteristics by type of traffic accident front-rear, front-front, front-side, pedestrians, side-side and location of traffic accident in district road and sub-district road. For third terminal node, there are 291 people were classified as medium traffic accident characteristics by type of traffic accident front-rear, front-front, front-side, pedestrians, side-side and location of traffic accident in municipality road and explanatory variables into the main splitter to make of CART is type of traffic accident with maximum homogeneity measure of 0.03252.
Estimating carbon and showing impacts of drought using satellite data in regression-tree models
Boyte, Stephen; Wylie, Bruce K.; Howard, Danny; Dahal, Devendra; Gilmanov, Tagir G.
2018-01-01
Integrating spatially explicit biogeophysical and remotely sensed data into regression-tree models enables the spatial extrapolation of training data over large geographic spaces, allowing a better understanding of broad-scale ecosystem processes. The current study presents annual gross primary production (GPP) and annual ecosystem respiration (RE) for 2000–2013 in several short-statured vegetation types using carbon flux data from towers that are located strategically across the conterminous United States (CONUS). We calculate carbon fluxes (annual net ecosystem production [NEP]) for each year in our study period, which includes 2012 when drought and higher-than-normal temperatures influence vegetation productivity in large parts of the study area. We present and analyse carbon flux dynamics in the CONUS to better understand how drought affects GPP, RE, and NEP. Model accuracy metrics show strong correlation coefficients (r) (r ≥ 94%) between training and estimated data for both GPP and RE. Overall, average annual GPP, RE, and NEP are relatively constant throughout the study period except during 2012 when almost 60% less carbon is sequestered than normal. These results allow us to conclude that this modelling method effectively estimates carbon dynamics through time and allows the exploration of impacts of meteorological anomalies and vegetation types on carbon dynamics.
Regression trees modeling and forecasting of PM10 air pollution in urban areas
Stoimenova, M.; Voynikova, D.; Ivanov, A.; Gocheva-Ilieva, S.; Iliev, I.
2017-10-01
Fine particulate matter (PM10) air pollution is a serious problem affecting the health of the population in many Bulgarian cities. As an example, the object of this study is the pollution with PM10 of the town of Pleven, Northern Bulgaria. The measured concentrations of this air pollutant for this city consistently exceeded the permissible limits set by European and national legislation. Based on data for the last 6 years (2011-2016), the analysis shows that this applies both to the daily limit of 50 micrograms per cubic meter and the allowable number of daily concentration exceedances to 35 per year. Also, the average annual concentration of PM10 exceeded the prescribed norm of no more than 40 micrograms per cubic meter. The aim of this work is to build high performance mathematical models for effective prediction and forecasting the level of PM10 pollution. The study was conducted with the powerful flexible data mining technique Classification and Regression Trees (CART). The values of PM10 were fitted with respect to meteorological data such as maximum and minimum air temperature, relative humidity, wind speed and direction and others, as well as with time and autoregressive variables. As a result the obtained CART models demonstrate high predictive ability and fit the actual data with up to 80%. The best models were applied for forecasting the level pollution for 3 to 7 days ahead. An interpretation of the modeling results is presented.
Xie, Yang; Schreier, Günter; Chang, David C W; Neubauer, Sandra; Redmond, Stephen J; Lovell, Nigel H
2014-01-01
Healthcare administrators worldwide are striving to both lower the cost of care whilst improving the quality of care given. Therefore, better clinical and administrative decision making is needed to improve these issues. Anticipating outcomes such as number of hospitalization days could contribute to addressing this problem. In this paper, a method was developed, using large-scale health insurance claims data, to predict the number of hospitalization days in a population. We utilized a regression decision tree algorithm, along with insurance claim data from 300,000 individuals over three years, to provide predictions of number of days in hospital in the third year, based on medical admissions and claims data from the first two years. Our method performs well in the general population. For the population aged 65 years and over, the predictive model significantly improves predictions over a baseline method (predicting a constant number of days for each patient), and achieved a specificity of 70.20% and sensitivity of 75.69% in classifying these subjects into two categories of 'no hospitalization' and 'at least one day in hospital'.
A new methodology for the computer-aided construction of fault trees
International Nuclear Information System (INIS)
Salem, S.L.; Apostolakis, G.E.; Okrent, D.
1977-01-01
A methodology for systematically constructing fault trees for general complex systems is developed. A means of modeling component behaviour via decision tables is presented, and a procedure, and a procedure for constructing and editing fault trees, either manually or by computer, is developed. The techniques employed result in a complete fault tree in standard form. In order to demonstrate the methodology, the computer program CAT was developed and is used to construct trees for a nuclear system. By analyzing and comparing these fault trees, several conclusions are reached. First, such an approach can be used to produce fault trees that accurately describe system behaviour. Second, multiple trees can be rapidly produced by defining various TOP events, including system success. Finally, the accuracy and utility of such trees is shown to depend upon the careful development of the decision table models by the analyst, and of the overall system definition itself. Thus the method is seen to be a tool for assisting in the work of fault tree construction rather than a replacement for the careful work of the fault tree analyst. (author)
Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana
2015-05-01
Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of design process and production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data of eighteen residential buildings. The resulting statistical model produced dependent (i.e. amount of waste generated) and independent variables associated with the design and the production system used. The best regression model obtained from the sample data resulted in an adjusted R(2) value of 0.694, which means that it predicts approximately 69% of the factors involved in the generation of waste in similar constructions. Most independent variables showed a low determination coefficient when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable. Copyright © 2015 Elsevier Ltd. All rights reserved.
Greedy algorithm with weights for decision tree construction
Moshkov, Mikhail
2010-12-01
An approximate algorithm for minimization of weighted depth of decision trees is considered. A bound on accuracy of this algorithm is obtained which is unimprovable in general case. Under some natural assumptions on the class NP, the considered algorithm is close (from the point of view of accuracy) to best polynomial approximate algorithms for minimization of weighted depth of decision trees.
Greedy algorithm with weights for decision tree construction
Moshkov, Mikhail
2010-01-01
An approximate algorithm for minimization of weighted depth of decision trees is considered. A bound on accuracy of this algorithm is obtained which is unimprovable in general case. Under some natural assumptions on the class NP, the considered algorithm is close (from the point of view of accuracy) to best polynomial approximate algorithms for minimization of weighted depth of decision trees.
Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois
2013-01-01
This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…
Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; Michael K. Falkowski; Alistair M. S. Smith; Paul E. Gessler; Penelope Morgan
2006-01-01
We compared the utility of discrete-return light detection and ranging (lidar) data and multispectral satellite imagery, and their integration, for modeling and mapping basal area and tree density across two diverse coniferous forest landscapes in north-central Idaho. We applied multiple linear regression models subset from a suite of 26 predictor variables derived...
Thomas, Emily H.; Galambos, Nora
To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…
Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.
Directory of Open Access Journals (Sweden)
Xu Yu
2018-01-01
Full Text Available Cross-domain collaborative filtering (CDCF solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR. We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
2017-06-01
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p logistic regression model for the classification of risk groups for PTB.
Rovlias, Aristedis; Theodoropoulos, Spyridon; Papoutsakis, Dimitrios
2015-01-01
Background: Chronic subdural hematoma (CSDH) is one of the most common clinical entities in daily neurosurgical practice which carries a most favorable prognosis. However, because of the advanced age and medical problems of patients, surgical therapy is frequently associated with various complications. This study evaluated the clinical features, radiological findings, and neurological outcome in a large series of patients with CSDH. Methods: A classification and regression tree (CART) technique was employed in the analysis of data from 986 patients who were operated at Asclepeion General Hospital of Athens from January 1986 to December 2011. Burr holes evacuation with closed system drainage has been the operative technique of first choice at our institution for 29 consecutive years. A total of 27 prognostic factors were examined to predict the outcome at 3-month postoperatively. Results: Our results indicated that neurological status on admission was the best predictor of outcome. With regard to the other data, age, brain atrophy, thickness and density of hematoma, subdural accumulation of air, and antiplatelet and anticoagulant therapy were found to correlate significantly with prognosis. The overall cross-validated predictive accuracy of CART model was 85.34%, with a cross-validated relative error of 0.326. Conclusions: Methodologically, CART technique is quite different from the more commonly used methods, with the primary benefit of illustrating the important prognostic variables as related to outcome. Since, the ideal therapy for the treatment of CSDH is still under debate, this technique may prove useful in developing new therapeutic strategies and approaches for patients with CSDH. PMID:26257985
Kaskhedikar, Apoorva Prakash
According to the U.S. Energy Information Administration, commercial buildings represent about 40% of the United State's energy consumption of which office buildings consume a major portion. Gauging the extent to which an individual building consumes energy in excess of its peers is the first step in initiating energy efficiency improvement. Energy Benchmarking offers initial building energy performance assessment without rigorous evaluation. Energy benchmarking tools based on the Commercial Buildings Energy Consumption Survey (CBECS) database are investigated in this thesis. This study proposes a new benchmarking methodology based on decision trees, where a relationship between the energy use intensities (EUI) and building parameters (continuous and categorical) is developed for different building types. This methodology was applied to medium office and school building types contained in the CBECS database. The Random Forest technique was used to find the most influential parameters that impact building energy use intensities. Subsequently, correlations which were significant were identified between EUIs and CBECS variables. Other than floor area, some of the important variables were number of workers, location, number of PCs and main cooling equipment. The coefficient of variation was used to evaluate the effectiveness of the new model. The customization technique proposed in this thesis was compared with another benchmarking model that is widely used by building owners and designers namely, the ENERGY STAR's Portfolio Manager. This tool relies on the standard Linear Regression methods which is only able to handle continuous variables. The model proposed uses data mining technique and was found to perform slightly better than the Portfolio Manager. The broader impacts of the new benchmarking methodology proposed is that it allows for identifying important categorical variables, and then incorporating them in a local, as against a global, model framework for EUI
Aguiar, Fabio S; Almeida, Luciana L; Ruffino-Netto, Antonio; Kritski, Afranio Lineu; Mello, Fernanda Cq; Werneck, Guilherme L
2012-08-07
Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a predictive model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Cross sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. Prospective validation is still necessary, but our model offer an alternative for decision making in whether to isolate patients with clinical suspicion of TB in tertiary health facilities in
Directory of Open Access Journals (Sweden)
Andréa Gazzinelli
Full Text Available Praziquantel (PZQ is an effective chemotherapy for schistosomiasis mansoni and a mainstay for its control and potential elimination. However, it does not prevent against reinfection, which can occur rapidly in areas with active transmission. A guide to ranking the risk factors for Schistosoma mansoni reinfection would greatly contribute to prioritizing resources and focusing prevention and control measures to prevent rapid reinfection. The objective of the current study was to explore the relationship among the socioeconomic, demographic, and epidemiological factors that can influence reinfection by S. mansoni one year after successful treatment with PZQ in school-aged children in Northeastern Minas Gerais state Brazil. Parasitological, socioeconomic, demographic, and water contact information were surveyed in 506 S. mansoni-infected individuals, aged 6 to 15 years, resident in these endemic areas. Eligible individuals were treated with PZQ until they were determined to be negative by the absence of S. mansoni eggs in the feces on two consecutive days of Kato-Katz fecal thick smear. These individuals were surveyed again 12 months from the date of successful treatment with PZQ. A classification and regression tree modeling (CART was then used to explore the relationship between socioeconomic, demographic, and epidemiological variables and their reinfection status. The most important risk factor identified for S. mansoni reinfection was their "heavy" infection at baseline. Additional analyses, excluding heavy infection status, showed that lower socioeconomic status and a lower level of education of the household head were also most important risk factors for S. mansoni reinfection. Our results provide an important contribution toward the control and possible elimination of schistosomiasis by identifying three major risk factors that can be used for targeted treatment and monitoring of reinfection. We suggest that control measures that target
Comprehensive database of diameter-based biomass regressions for North American tree species
Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey
2004-01-01
A database consisting of 2,640 equations compiled from the literature for predicting the biomass of trees and tree components from diameter measurements of species found in North America. Bibliographic information, geographic locations, diameter limits, diameter and biomass units, equation forms, statistical errors, and coefficients are provided for each equation,...
Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y
2008-02-18
The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structural and pharmacological diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that the use of combinations of a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.
Giesen, R. J.; Huynen, A. L.; Aarnink, R. G.; de la Rosette, J. J.; Debruyne, F. M.; Wijkstra, H.
1996-01-01
A non-parametric algorithm is described for the construction of a binary decision tree classifier. This tree is used to correlate textural features, computed from ultrasonographic prostate images, with the histopathology of the imaged tissue. The algorithm consists of two parts; growing and pruning.
CAT: a computer code for the automated construction of fault trees
International Nuclear Information System (INIS)
Apostolakis, G.E.; Salem, S.L.; Wu, J.S.
1978-03-01
A computer code, CAT (Computer Automated Tree, is presented which applies decision table methods to model the behavior of components for systematic construction of fault trees. The decision tables for some commonly encountered mechanical and electrical components are developed; two nuclear subsystems, a Containment Spray Recirculation System and a Consequence Limiting Control System, are analyzed to demonstrate the applications of CAT code
Automatic fault tree construction with RIKKE - a compendium of examples. Vol. 2
International Nuclear Information System (INIS)
Taylor, J.R.
1982-02-01
This second volume describes the construction of fault trees for systems with loops, including control and safety loops. It also gives a short summary of the event coding scheme used in the FTLIB component model library. (author)
Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh
2017-08-01
The aim of this study was to have a comparative investigation and evaluation of the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments and to establish a method of minimizing uncertainty in the projections of differing techniques. The location of this study on a global scale is in Middle Eastern Countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm ( Phoenix dactylifera L.). The Global Climate Model (GCM), the CSIRO-Mk3.0 (CS) using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation for Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provide higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appears to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations
Directory of Open Access Journals (Sweden)
Aguiar Fabio S
2012-08-01
Full Text Available Abstract Background Tuberculosis (TB remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a predictive model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Methods Cross sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART model was generated and validated. The area under the ROC curve (AUC, sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. Results We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. Conclusions The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. Prospective validation is still necessary, but our model offer an alternative for decision making in whether to isolate patients with
Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek
2016-04-01
The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. The protection of conservation interests at the same as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail which were subjected to changes in type or level of use or subjected to extreme weather events; (4) monitoring of dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of
Multiple Additive Regression Trees a Methodology for Predictive Data Mining for Fraud Detection
National Research Council Canada - National Science Library
da
2002-01-01
...) is using new and innovative techniques for fraud detection. Their primary techniques for fraud detection are the data mining tools of classification trees and neural networks as well as methods for pooling the results of multiple model fits...
A. Snow; Abdel E. Ghaly
2007-01-01
A surface flow constructed wetland was used to treat stormwater runoff from surrounding watersheds which are comprised primarily of commercial properties and two former landfills. The uptake of manganese by red maple, white birch and red spruce trees growing under flooded soil conditions in the constructed wetland was compared to that of the same trees growing under well drained soil conditions in a nearby reference site. The seasonal variability of manganese and its distribution in different...
Chen, Wei; Li, Hui; Hou, Enke; Wang, Shengquan; Wang, Guirong; Panahi, Mahdi; Li, Tao; Peng, Tao; Guo, Chen; Niu, Chao; Xiao, Lele; Wang, Jiale; Xie, Xiaoshen; Ahmad, Baharin Bin
2018-09-01
The aim of the current study was to produce groundwater spring potential maps using novel ensemble weights-of-evidence (WoE) with logistic regression (LR) and functional tree (FT) models. First, a total of 66 springs were identified by field surveys, out of which 70% of the spring locations were used for training the models and 30% of the spring locations were employed for the validation process. Second, a total of 14 affecting factors including aspect, altitude, slope, plan curvature, profile curvature, stream power index (SPI), topographic wetness index (TWI), sediment transport index (STI), lithology, normalized difference vegetation index (NDVI), land use, soil, distance to roads, and distance to streams was used to analyze the spatial relationship between these affecting factors and spring occurrences. Multicollinearity analysis and feature selection of the correlation attribute evaluation (CAE) method were employed to optimize the affecting factors. Subsequently, the novel ensembles of the WoE, LR, and FT models were constructed using the training dataset. Finally, the receiver operating characteristic (ROC) curves, standard error, confidence interval (CI) at 95%, and significance level P were employed to validate and compare the performance of three models. Overall, all three models performed well for groundwater spring potential evaluation. The prediction capability of the FT model, with the highest AUC values, the smallest standard errors, the narrowest CIs, and the smallest P values for the training and validation datasets, is better compared to those of other models. The groundwater spring potential maps can be adopted for the management of water resources and land use by planners and engineers. Copyright © 2018 Elsevier B.V. All rights reserved.
Liu, Yang; Lü, Yi-he; Zheng, Hai-feng; Chen, Li-ding
2010-05-01
Based on the 10-day SPOT VEGETATION NDVI data and the daily meteorological data from 1998 to 2007 in Yan' an City, the main meteorological variables affecting the annual and interannual variations of NDVI were determined by using regression tree. It was found that the effects of test meteorological variables on the variability of NDVI differed with seasons and time lags. Temperature and precipitation were the most important meteorological variables affecting the annual variation of NDVI, and the average highest temperature was the most important meteorological variable affecting the inter-annual variation of NDVI. Regression tree was very powerful in determining the key meteorological variables affecting NDVI variation, but could not build quantitative relations between NDVI and meteorological variables, which limited its further and wider application.
Oil and gas pipeline construction cost analysis and developing regression models for cost estimation
Thaduri, Ravi Kiran
In this study, cost data for 180 pipelines and 136 compressor stations have been analyzed. On the basis of the distribution analysis, regression models have been developed. Material, Labor, ROW and miscellaneous costs make up the total cost of a pipeline construction. The pipelines are analyzed based on different pipeline lengths, diameter, location, pipeline volume and year of completion. In a pipeline construction, labor costs dominate the total costs with a share of about 40%. Multiple non-linear regression models are developed to estimate the component costs of pipelines for various cross-sectional areas, lengths and locations. The Compressor stations are analyzed based on the capacity, year of completion and location. Unlike the pipeline costs, material costs dominate the total costs in the construction of compressor station, with an average share of about 50.6%. Land costs have very little influence on the total costs. Similar regression models are developed to estimate the component costs of compressor station for various capacities and locations.
Construction of risk prediction model of type 2 diabetes mellitus based on logistic regression
Directory of Open Access Journals (Sweden)
Li Jian
2017-01-01
Full Text Available Objective: to construct multi factor prediction model for the individual risk of T2DM, and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: using logistic regression techniques to screen the risk factors for T2DM and construct the risk prediction model of T2DM. Results: Male’s risk prediction model logistic regression equation: logit(P=BMI × 0.735+ vegetables × (−0.671 + age × 0.838+ diastolic pressure × 0.296+ physical activity× (−2.287 + sleep ×(−0.009 +smoking ×0.214; Female’s risk prediction model logistic regression equation: logit(P=BMI ×1.979+ vegetables× (−0.292 + age × 1.355+ diastolic pressure× 0.522+ physical activity × (−2.287 + sleep × (−0.010.The area under the ROC curve of male was 0.83, the sensitivity was 0.72, the specificity was 0.86, the area under the ROC curve of female was 0.84, the sensitivity was 0.75, the specificity was 0.90. Conclusion: This study model data is from a compared study of nested case, the risk prediction model has been established by using the more mature logistic regression techniques, and the model is higher predictive sensitivity, specificity and stability.
Early cost estimating for road construction projects using multiple regression techniques
Directory of Open Access Journals (Sweden)
Ibrahim Mahamid
2011-12-01
Full Text Available The objective of this study is to develop early cost estimating models for road construction projects using multiple regression techniques, based on 131 sets of data collected in the West Bank in Palestine. As the cost estimates are required at early stages of a project, considerations were given to the fact that the input data for the required regression model could be easily extracted from sketches or scope definition of the project. 11 regression models are developed to estimate the total cost of road construction project in US dollar; 5 of them include bid quantities as input variables and 6 include road length and road width. The coefficient of determination r2 for the developed models is ranging from 0.92 to 0.98 which indicate that the predicted values from a forecast models fit with the real-life data. The values of the mean absolute percentage error (MAPE of the developed regression models are ranging from 13% to 31%, the results compare favorably with past researches which have shown that the estimate accuracy in the early stages of a project is between ±25% and ±50%.
International Nuclear Information System (INIS)
Jo, Y. G.
2008-01-01
Complementary events in the event trees for a PRA model should be treated properly in order to evaluate plant risk correctly. In this study, the characteristics of the following three different cut-set generation methods were investigated first in order to find the best practical way for treating complementary events: 1) exact method which treats complementary events logically, 2) no-delete term method which does not treat complementary events at all, and 3) delete term method which treats complementary events by deleting nonsense cut-sets which are generated as a result of ignoring complementary events. Then, practical methods for treating complementary events in constructing linked fault trees for level 1 and level 2 PRA in EPRI R and R workstation software environment, where CAFTA is the fault tree editor and FORTE is the cut-set engine, were suggested and demonstrated. The suggested methods deal with the following selected four typical cases: Case 1: an event tree event (E) is represented by a fault tree gate whose inputs consist of only fault tree gates, Case 2: E is represented by a single basic event, Case 3: E is represented by an OR fault tree gate which has a single basic event and a fault tree gate as inputs, and Case 4: E is represented by an AND fault tree gate which has a single basic event and a fault tree gate as inputs. In the suggested methods, first the high level logic structures of event tree events are examined and restructured, if needed. Then, the delete term method, the exact method, and the combination of the two methods are applied to Case 1, Case 2, and Cases 3 and 4, respectively. Also, it is recommended to treat complementary events, using the suggested methods, before level 1 and level 2 PRA fault trees are coupled. It should be noted that the selected four typical cases may not cover all different cases encountered in level 1 and level 2 PRA modeling. However, a process similar to the one suggested in this study may be used to find
SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data.
Lee, Tae-Ho; Guo, Hui; Wang, Xiyin; Kim, Changsoo; Paterson, Andrew H
2014-02-26
Phylogenetic trees are widely used for genetic and evolutionary studies in various organisms. Advanced sequencing technology has dramatically enriched data available for constructing phylogenetic trees based on single nucleotide polymorphisms (SNPs). However, massive SNP data makes it difficult to perform reliable analysis, and there has been no ready-to-use pipeline to generate phylogenetic trees from these data. We developed a new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets. The pipeline may enable users to construct a phylogenetic tree from three representative SNP data file formats. In addition, in order to increase reliability of a tree, the pipeline has steps such as removing low quality data and considering linkage disequilibrium. A maximum likelihood method for the inference of phylogeny is also adopted in generation of a tree in our pipeline. Using SNPhylo, users can easily produce a reliable phylogenetic tree from a large SNP data file. Thus, this pipeline can help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than manipulations necessary to accomplish the analysis.
Soltanzadeh, Ahmad; Mohammadfam, Iraj; Moghimbeigi, Abbas; Ghiasvand, Reza
2016-03-01
Construction industry involves the highest risk of occupational accidents and bodily injuries, which range from mild to very severe. The aim of this cross-sectional study was to identify the factors associated with accident severity rate (ASR) in the largest Iranian construction companies based on data about 500 occupational accidents recorded from 2009 to 2013. We also gathered data on safety and health risk management and training systems. Data were analysed using Pearson's chi-squared coefficient and multiple regression analysis. Median ASR (and the interquartile range) was 107.50 (57.24- 381.25). Fourteen of the 24 studied factors stood out as most affecting construction accident severity (p<0.05). These findings can be applied in the design and implementation of a comprehensive safety and health risk management system to reduce ASR.
Linear tree codes and the problem of explicit constructions
Czech Academy of Sciences Publication Activity Database
Pudlák, Pavel
2016-01-01
Roč. 490, February 1 (2016), s. 124-144 ISSN 0024-3795 R&D Projects: GA ČR GBP202/12/G061 Institutional support: RVO:67985840 Keywords : tree code * error correcting code * triangular totally nonsingular matrix Subject RIV: BA - General Mathematics Impact factor: 0.973, year: 2016 http://www.sciencedirect.com/science/article/pii/S002437951500645X
Greenfield, Patricia Marks; Schneider, Leslie
1977-01-01
This study examined the construction of a mobile with plastic construction straws in order to study the development of tree representations in a domain other than language. Subjects were 70 children between the ages of 3 and 11. (Author/JMB)
Constructing an optimal decision tree for FAST corner point detection
Alkhalid, Abdulaziz; Chikalov, Igor; Moshkov, Mikhail
2011-01-01
In this paper, we consider a problem that is originated in computer vision: determining an optimal testing strategy for the corner point detection problem that is a part of FAST algorithm [11,12]. The problem can be formulated as building a decision tree with the minimum average depth for a decision table with all discrete attributes. We experimentally compare performance of an exact algorithm based on dynamic programming and several greedy algorithms that differ in the attribute selection criterion. © 2011 Springer-Verlag.
Directory of Open Access Journals (Sweden)
Soichirou Satoh
Full Text Available Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, development of a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.
DEFF Research Database (Denmark)
Bou Kheir, Rania; Shomar, B.; Greve, Mogens Humlekrog
2014-01-01
Soil heavy metal pollution has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study used Geographic Information Systems (GIS) and regression-tree modeling (196 trees) to precisely quantify...... the relationships between four toxic heavy metals (Ni, Cr, Cd and As) and sixteen environmental parameters (e.g., parent material, slope gradient, proximity to roads, etc.) in the soils of northern Lebanon (as a case study of Mediterranean landscapes), and to detect the most important parameters that can be used...... between 68% and 100%), surroundings of waste areas (48 – 92%), proximity to roads (45 – 82%) and parent materials (57 – 73%) considerably influenced all investigated heavy metals, which is not the case of hydromorphological and soil properties. For instance, hydraulic conductivity (18 – 41%) and pH (23...
Fast Construction of Near Parsimonious Hybridization Networks for Multiple Phylogenetic Trees.
Mirzaei, Sajad; Wu, Yufeng
2016-01-01
Hybridization networks represent plausible evolutionary histories of species that are affected by reticulate evolutionary processes. An established computational problem on hybridization networks is constructing the most parsimonious hybridization network such that each of the given phylogenetic trees (called gene trees) is "displayed" in the network. There have been several previous approaches, including an exact method and several heuristics, for this NP-hard problem. However, the exact method is only applicable to a limited range of data, and heuristic methods can be less accurate and also slow sometimes. In this paper, we develop a new algorithm for constructing near parsimonious networks for multiple binary gene trees. This method is more efficient for large numbers of gene trees than previous heuristics. This new method also produces more parsimonious results on many simulated datasets as well as a real biological dataset than a previous method. We also show that our method produces topologically more accurate networks for many datasets.
International Nuclear Information System (INIS)
Wilson, J.R.; Burdick, G.R.
1977-06-01
This is a user's manual for AUTOET I and II. AUTOET I is a computer code for automatic event tree construction. It is designed to incorporate and display subsystem interdependencies and common or key component dependencies in the event tree format. The code is written in FORTRAN IV for the CDC Cyber 76 using the Integrated Graphics System (IGS). AUTOET II incorporates consequence and risk calculations, in addition to some other refinements. 5 figures
Construction of α-decision trees for tables with many-valued decisions
Moshkov, Mikhail; Zielosko, Beata
2011-01-01
The paper is devoted to the study of greedy algorithm for construction of approximate decision trees (α-decision trees). This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. We consider bound on the number of algorithm steps, and bound on the algorithm accuracy relative to the depth of decision trees. © 2011 Springer-Verlag.
Directory of Open Access Journals (Sweden)
Brian A. Johnson
2018-01-01
Full Text Available The advent of very high resolution (VHR satellite imagery and the development of Geographic Object-Based Image Analysis (GEOBIA have led to many new opportunities for fine-scale land cover mapping, especially in urban areas. Image segmentation is an important step in the GEOBIA framework, so great time/effort is often spent to ensure that computer-generated image segments closely match real-world objects of interest. In the remote sensing community, segmentation is frequently performed using the multiresolution segmentation (MRS algorithm, which is tuned through three user-defined parameters (the scale, shape/color, and compactness/smoothness parameters. The scale parameter (SP is the most important parameter and governs the average size of generated image segments. Existing automatic methods to determine suitable SPs for segmentation are scene-specific and often computationally intensive, so an approach to estimating appropriate SPs that is generalizable (i.e., not scene-specific could speed up the GEOBIA workflow considerably. In this study, we attempted to identify generalizable SPs for five common urban land cover types (buildings, vegetation, roads, bare soil, and water through meta-analysis and nonlinear regression tree (RT modeling. First, we performed a literature search of recent studies that employed GEOBIA for urban land cover mapping and extracted the MRS parameters used, the image properties (i.e., spatial and radiometric resolutions, and the land cover classes mapped. Using this data extracted from the literature, we constructed RT models for each land cover class to predict suitable SP values based on the: image spatial resolution, image radiometric resolution, shape/color parameter, and compactness/smoothness parameter. Based on a visual and quantitative analysis of results, we found that for all land cover classes except water, relatively accurate SPs could be identified using our RT modeling results. The main advantage of our
Park, Seonyoung; Im, Jungho; Park, Sumin; Rhee, Jinyoung
2017-04-01
Soil moisture is one of the most important keys for understanding regional and global climate systems. Soil moisture is directly related to agricultural processes as well as hydrological processes because soil moisture highly influences vegetation growth and determines water supply in the agroecosystem. Accurate monitoring of the spatiotemporal pattern of soil moisture is important. Soil moisture has been generally provided through in situ measurements at stations. Although field survey from in situ measurements provides accurate soil moisture with high temporal resolution, it requires high cost and does not provide the spatial distribution of soil moisture over large areas. Microwave satellite (e.g., advanced Microwave Scanning Radiometer on the Earth Observing System (AMSR2), the Advanced Scatterometer (ASCAT), and Soil Moisture Active Passive (SMAP)) -based approaches and numerical models such as Global Land Data Assimilation System (GLDAS) and Modern- Era Retrospective Analysis for Research and Applications (MERRA) provide spatial-temporalspatiotemporally continuous soil moisture products at global scale. However, since those global soil moisture products have coarse spatial resolution ( 25-40 km), their applications for agriculture and water resources at local and regional scales are very limited. Thus, soil moisture downscaling is needed to overcome the limitation of the spatial resolution of soil moisture products. In this study, GLDAS soil moisture data were downscaled up to 1 km spatial resolution through the integration of AMSR2 and ASCAT soil moisture data, Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM), and Moderate Resolution Imaging Spectroradiometer (MODIS) data—Land Surface Temperature, Normalized Difference Vegetation Index, and Land cover—using modified regression trees over East Asia from 2013 to 2015. Modified regression trees were implemented using Cubist, a commercial software tool based on machine learning. An
International Nuclear Information System (INIS)
Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios, Martha; Baechler, Sébastien
2015-01-01
Purpose: According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. Method: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). Results: The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Conclusion: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables
Astuti, Yuniar Andi
2011-01-01
This study examines techniques Support Vector Regression and Decision Tree C4.5 has been used in studies in various fields, in order to know the advantages and disadvantages of both techniques that appear in Data Mining. From the ten studies that use both techniques, the results of the analysis showed that the accuracy of the SVR technique for 59,64% and C4.5 for 76,97% So in this study obtained a statement that C4.5 is better than SVR 097038020
Treatment of complementary events in constructing the linked Level 1 and Level 2 fault trees
International Nuclear Information System (INIS)
Jo, Young G.; Ahn, Kwang-Il
2009-01-01
Complementary events in the event trees for a PRA model should be treated properly in order to evaluate plant risk correctly. In this paper, the characteristics of the following three different cutset generation methods were investigated first in order to find the best practical way for treating complementary events: (1) exact method which treats complementary events logically, (2) no-delete term method which does not treat complementary events at all, and (3) delete term method which treats complementary events by deleting nonsense cutsets which are generated as a result of ignoring complementary events. Then, practical methods for treating complementary events in constructing linked fault trees for Level 1 and Level 2 PRA were suggested and demonstrated. The suggested methods deal with the following selected four typical cases: (1) Case 1-an event tree event (E) is represented by a fault tree gate whose inputs consist of only fault tree gates, (2) Case 2-E is represented by a single basic event, (3) Case 3-E is represented by an OR fault tree gate which has a single basic event and a fault tree gate as inputs, and (4) Case 4-E is represented by an AND fault tree gate which has a single basic event and a fault tree gate as inputs. In the suggested methods, first the high level logic structures of event tree events are examined and restructured, if needed. Then, the delete term method, the exact method, and the combination of the two methods are applied to through Case 1 to Case 4, respectively. As a result, it is recommended to treat complementary events, using the suggested methods, before Level 1 and Level 2 PRA fault trees are coupled
Liu, Zhenqiu; Sun, Fengzhu; Braun, Jonathan; McGovern, Dermot P B; Piantadosi, Steven
2015-04-01
Identifying disease associated taxa and constructing networks for bacteria interactions are two important tasks usually studied separately. In reality, differentiation of disease associated taxa and correlation among taxa may affect each other. One genus can be differentiated because it is highly correlated with another highly differentiated one. In addition, network structures may vary under different clinical conditions. Permutation tests are commonly used to detect differences between networks in distinct phenotypes, and they are time-consuming. In this manuscript, we propose a multilevel regularized regression method to simultaneously identify taxa and construct networks. We also extend the framework to allow construction of a common network and differentiated network together. An efficient algorithm with dual formulation is developed to deal with the large-scale n ≪ m problem with a large number of taxa (m) and a small number of samples (n) efficiently. The proposed method is regularized with a general Lp (p ∈ [0, 2]) penalty and models the effects of taxa abundance differentiation and correlation jointly. We demonstrate that it can identify both true and biologically significant genera and network structures. Software MLRR in MATLAB is available at http://biostatistics.csmc.edu/mlrr/. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard
2010-01-01
The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
L.R. Iverson; A.M. Prasad; A. Liaw
2004-01-01
More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To thal end, we evaluated three statistical models: Regression Tree Analybib (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...
Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu
2018-02-01
A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model is a combination of the RSS method which is known as an efficient ensemble technique and the CART which is a state of the art classifier. The Luc Yen district of Yen Bai province, a prominent landslide prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of model, ten important landslide affecting factors related with geomorphology, geology and geo-environment were considered namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with other popular landslide models namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that performance of the RSSCART is a promising method for spatial landslide prediction.
Directory of Open Access Journals (Sweden)
MILAD TAZIK
2017-11-01
Full Text Available Identifying cases in which road crashes result in fatality or injury of drivers may help improve their safety. In this study, datasets of crashes happened in TehranQom freeway, Iran, were examined by three models (multiple logistic regression, Bayesian logistic and classification tree to analyse the contribution of several variables to fatal accidents. For multiple logistic regression and Bayesian logistic models, the odds ratio was calculated for each variable. The model which best suited the identification of accident severity was determined based on AIC and DIC criteria. Based on the results of these two models, rollover crashes (OR = 14.58, %95 CI: 6.8-28.6, not using of seat belt (OR = 5.79, %95 CI: 3.1-9.9, exceeding speed limits (OR = 4.02, %95 CI: 1.8-7.9 and being female (OR = 2.91, %95 CI: 1.1-6.1 were the most important factors in fatalities of drivers. In addition, the results of the classification tree model have verified the findings of the other models.
How to avoid the generation of logic loops in the construction of fault trees
International Nuclear Information System (INIS)
Demichela, M.; Piccinini, N.; Ciarambino, I.; Contini, S.
2004-01-01
Generation of an infinite series of identical sub-trees may occur during the construction of a Fault Tree (FT) when one item of equipment in a plant is considered several times in the same sub-tree in the course of the tree extraction from a HazOp (Hazard Operability analysis) analysis. Generation of loops in the construction of an FT can be avoided by means of an ad hoc logical analysis in which certain simple rules of syntax are taken into account. A radical solution, however, can be obtained if identification of unwanted events in a process plant is not undertaken with conventional procedures, such as HazOp (Operability Analysis with guide words, failure mode and effect analysis (FMEA) etc.), but with a more modern and structured version, such as Recursive Operability Analysis (ROA), which is both systematic and complete, and allows direct extraction of logic trees, (FT, event trees, etc.) for subsequent quantification. This feature means that, by contrast with conventional operability analysis, the congruence of the ROA itself can be checked. The ROA method is illustrated in this paper with the aid of some simple examples
ERA: Efficient serial and parallel suffix tree construction for very long strings
Mansour, Essam
2011-09-01
The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree are too large to fit into the main memory, most existing construction algorithms become very inefficient. This paper presents a disk-based suffix tree construction method, called Elastic Range (ERa), which works efficiently with very long strings that are much larger than the available memory. ERa partitions the tree construction process horizontally and vertically and minimizes I/Os by dynamically adjusting the horizontal partitions independently for each vertical partition, based on the evolving shape of the tree and the available memory. Where appropriate, ERa also groups vertical partitions together to amortize the I/O cost. We developed a serial version; a parallel version for shared-memory and shared-disk multi-core systems; and a parallel version for shared-nothing architectures. ERa indexes the entire human genome in 19 minutes on an ordinary desktop computer. For comparison, the fastest existing method needs 15 minutes using 1024 CPUs on an IBM BlueGene supercomputer.
A. Snow; Abdel E. Ghaly; R. Cote; A. M. Snow
2008-01-01
A surface flow wetland was constructed in the Burnside Industrial Park, Dartmouth, Nova Scotia, to treat stormwater runoff from the surrounding watersheds which are comprised primarily of commercial properties and two former landfills. The objectives of this study were: (a) to compare the uptake of iron by red maple, white birch and red spruce trees growing under flooded soil conditions in the constructed wetland and well drained soil conditions in a nearby reference site, (b) to evaluate the...
BIMLR: a method for constructing rooted phylogenetic networks from rooted phylogenetic trees.
Wang, Juan; Guo, Maozu; Xing, Linlin; Che, Kai; Liu, Xiaoyan; Wang, Chunyu
2013-09-15
Rooted phylogenetic trees constructed from different datasets (e.g. from different genes) are often conflicting with one another, i.e. they cannot be integrated into a single phylogenetic tree. Phylogenetic networks have become an important tool in molecular evolution, and rooted phylogenetic networks are able to represent conflicting rooted phylogenetic trees. Hence, the development of appropriate methods to compute rooted phylogenetic networks from rooted phylogenetic trees has attracted considerable research interest of late. The CASS algorithm proposed by van Iersel et al. is able to construct much simpler networks than other available methods, but it is extremely slow, and the networks it constructs are dependent on the order of the input data. Here, we introduce an improved CASS algorithm, BIMLR. We show that BIMLR is faster than CASS and less dependent on the input data order. Moreover, BIMLR is able to construct much simpler networks than almost all other methods. BIMLR is available at http://nclab.hit.edu.cn/wangjuan/BIMLR/. © 2013 Elsevier B.V. All rights reserved.
Decision table development and application to the construction of fault trees
International Nuclear Information System (INIS)
Salem, S.L.; Wu, J.S.; Apostolakis, G.
1979-01-01
A systematic methodology for the construction of fault trees based on the use of decision tables has been developed. These tables are used to describe each possible output state of a component as a set of combinations of states of inputs and internal operational or T states. Two methods for modeling component behavior via decision tables have been developed, one inductive and one deductive. These methods are useful for creating decision tables that realistically model the operational and failure modes of electrical, mechanical, and hydraulic components as well as human interactions inhibit conditions and common-cause events. A computer code CAT (Computer Automated Tree) has been developed to automatically produce fault trees from decision tables. A simple electrical system was chosen to illustrate the basic features of the decision table approach and to provide an example of an actual fault tree produced by this code. This example demonstrates the potential utility of such an automated approach to fault tree construction once a basic set of general decision tables has been developed
An Efficient Distributed Algorithm for Constructing Spanning Trees in Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Rosana Lachowski
2015-01-01
Full Text Available Monitoring and data collection are the two main functions in wireless sensor networks (WSNs. Collected data are generally transmitted via multihop communication to a special node, called the sink. While in a typical WSN, nodes have a sink node as the final destination for the data traffic, in an ad hoc network, nodes need to communicate with each other. For this reason, routing protocols for ad hoc networks are inefficient for WSNs. Trees, on the other hand, are classic routing structures explicitly or implicitly used in WSNs. In this work, we implement and evaluate distributed algorithms for constructing routing trees in WSNs described in the literature. After identifying the drawbacks and advantages of these algorithms, we propose a new algorithm for constructing spanning trees in WSNs. The performance of the proposed algorithm and the quality of the constructed tree were evaluated in different network scenarios. The results showed that the proposed algorithm is a more efficient solution. Furthermore, the algorithm provides multiple routes to the sensor nodes to be used as mechanisms for fault tolerance and load balancing.
Bersinger, T; Bareille, G; Pigot, T; Bru, N; Le Hécho, I
2018-06-01
A good knowledge of the dynamic of pollutant concentration and flux in a combined sewer network is necessary when considering solutions to limit the pollutants discharged by combined sewer overflow (CSO) into receiving water during wet weather. Identification of the parameters that influence pollutant concentration and flux is important. Nevertheless, few studies have obtained satisfactory results for the identification of these parameters using statistical tools. Thus, this work uses a large database of rain events (116 over one year) obtained via continuous measurement of rainfall, discharge flow and chemical oxygen demand (COD) estimated using online turbidity for the identification of these parameters. We carried out a statistical study of the parameters influencing the maximum COD concentration, the discharge flow and the discharge COD flux. In this study a new test was used that has never been used in this field: the conditional regression tree test. We have demonstrated that the antecedent dry weather period, the rain event average intensity and the flow before the event are the three main factors influencing the maximum COD concentration during a rainfall event. Regarding the discharge flow, it is mainly influenced by the overall rainfall height but not by the maximum rainfall intensity. Finally, COD discharge flux is influenced by the discharge volume and the maximum COD concentration. Regression trees seem much more appropriate than common tests like PCA and PLS for this type of study as they take into account the thresholds and cumulative effects of various parameters as a function of the target variable. These results could help to improve sewer and CSO management in order to decrease the discharge of pollutants into receiving waters. Copyright © 2017 Elsevier B.V. All rights reserved.
Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks.
Oh, S June; Joung, Je-Gun; Chang, Jeong-Ho; Zhang, Byoung-Tak
2006-06-06
To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides information about the genetic relationships between different organisms. In contrast, comparative analysis of metabolic pathways in different organisms can yield insights into their functional relationships under different physiological conditions. However, evaluating the similarities or differences between metabolic networks is a computationally challenging problem, and systematic methods of doing this are desirable. Here we introduce a graph-kernel method for computing the similarity between metabolic networks in polynomial time, and use it to profile metabolic pathways and to construct phylogenetic trees. To compare the structures of metabolic networks in organisms, we adopted the exponential graph kernel, which is a kernel-based approach with a labeled graph that includes a label matrix and an adjacency matrix. To construct the phylogenetic trees, we used an unweighted pair-group method with arithmetic mean, i.e., a hierarchical clustering algorithm. We applied the kernel-based network profiling method in a comparative analysis of nine carbohydrate metabolic networks from 81 biological species encompassing Archaea, Eukaryota, and Eubacteria. The resulting phylogenetic hierarchies generally support the tripartite scheme of three domains rather than the two domains of prokaryotes and eukaryotes. By combining the kernel machines with metabolic information, the method infers the context of biosphere development that covers physiological events required for adaptation by genetic reconstruction. The results show that one may obtain a global view of the tree of life by comparing the metabolic pathway structures using meta-level information rather than sequence
Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks
Directory of Open Access Journals (Sweden)
Chang Jeong-Ho
2006-06-01
Full Text Available Abstract Background To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides information about the genetic relationships between different organisms. In contrast, comparative analysis of metabolic pathways in different organisms can yield insights into their functional relationships under different physiological conditions. However, evaluating the similarities or differences between metabolic networks is a computationally challenging problem, and systematic methods of doing this are desirable. Here we introduce a graph-kernel method for computing the similarity between metabolic networks in polynomial time, and use it to profile metabolic pathways and to construct phylogenetic trees. Results To compare the structures of metabolic networks in organisms, we adopted the exponential graph kernel, which is a kernel-based approach with a labeled graph that includes a label matrix and an adjacency matrix. To construct the phylogenetic trees, we used an unweighted pair-group method with arithmetic mean, i.e., a hierarchical clustering algorithm. We applied the kernel-based network profiling method in a comparative analysis of nine carbohydrate metabolic networks from 81 biological species encompassing Archaea, Eukaryota, and Eubacteria. The resulting phylogenetic hierarchies generally support the tripartite scheme of three domains rather than the two domains of prokaryotes and eukaryotes. Conclusion By combining the kernel machines with metabolic information, the method infers the context of biosphere development that covers physiological events required for adaptation by genetic reconstruction. The results show that one may obtain a global view of the tree of life by comparing the metabolic pathway
Azad, Mohammad; Moshkov, Mikhail
2014-01-01
A greedy algorithm has been presented in this paper to construct decision trees for three different approaches (many-valued decision, most common decision, and generalized decision) in order to handle the inconsistency of multiple decisions in a decision table. In this algorithm, a greedy heuristic ‘misclassification error’ is used which performs faster, and for some cost function, results are better than ‘number of boundary subtables’ heuristic in literature. Therefore, it can be used in the case of larger data sets and does not require huge amount of memory. Experimental results of depth, average depth and number of nodes of decision trees constructed by this algorithm are compared in the framework of each of the three approaches.
Constructal tree-shaped two-phase flow for cooling a surface
Energy Technology Data Exchange (ETDEWEB)
Zamfirescu, C.; Bejan, A. [Duke University, Durham, NC (United States). Dept. of Mechanical Engineering and Materials Science
2003-07-01
This paper documents the strong relation that exists between the changing architecture of a complex flow system and the maximization of global performance under constraints. The system is a surface with uniform heating per unit area, which is cooled by a network with evaporating two-phase flow. Illustrations are based on the design of the cooling network for a skating rink. The flow structure is optimized as a sequence of building blocks, which starts with the smallest (elemental volume of fixed size), and continues with assemblies of stepwise larger sizes (first construct, second construct, etc.). The optimized flow network is tree shaped. Three features of the elemental volume are optimized: the cross-sectional shape, the elemental tube diameter, and the shape of the elemental area viewed from above. The tree that emerges at larger scales is optimized for minimal amount of header material and fixed pressure drop. The optimal number of constituents in each new (larger) construct decreases as the size and complexity of the construct increase. Constructs of various levels of complexity compete: the paper shows how to select the optimal flow structure subject to fixed size (cooled surface), pressure drop and amount of header material. (author)
Estimating Required Contingency Funds for Construction Projects using Multiple Linear Regression
National Research Council Canada - National Science Library
Cook, Jason J
2006-01-01
Cost overruns are a critical problem for construction projects. The common practice for dealing with cost overruns is the assignment of an arbitrary flat percentage of the construction budget as a contingency fund...
Smith, R.; Kasprzyk, J. R.; Balaji, R.
2017-12-01
In light of deeply uncertain factors like future climate change and population shifts, responsible resource management will require new types of information and strategies. For water utilities, this entails potential expansion and efficient management of water supply infrastructure systems for changes in overall supply; changes in frequency and severity of climate extremes such as droughts and floods; and variable demands, all while accounting for conflicting long and short term performance objectives. Multiobjective Evolutionary Algorithms (MOEAs) are emerging decision support tools that have been used by researchers and, more recently, water utilities to efficiently generate and evaluate thousands of planning portfolios. The tradeoffs between conflicting objectives are explored in an automated way to produce (often large) suites of portfolios that strike different balances of performance. Once generated, the sets of optimized portfolios are used to support relatively subjective assertions of priorities and human reasoning, leading to adoption of a plan. These large tradeoff sets contain information about complex relationships between decisions and between groups of decisions and performance that, until now, has not been quantitatively described. We present a novel use of Multivariate Regression Trees (MRTs) to analyze tradeoff sets to reveal these relationships and critical decisions. Additionally, when MRTs are applied to tradeoff sets developed for different realizations of an uncertain future, they can identify decisions that are robust across a wide range of conditions and produce fundamental insights about the system being optimized.
Directory of Open Access Journals (Sweden)
David J. Purpura
2017-12-01
Full Text Available Many children struggle to successfully acquire early mathematics skills. Theoretical and empirical evidence has pointed to deficits in domain-specific skills (e.g., non-symbolic mathematics skills or domain-general skills (e.g., executive functioning and language as underlying low mathematical performance. In the current study, we assessed a sample of 113 three- to five-year old preschool children on a battery of domain-specific and domain-general factors in the fall and spring of their preschool year to identify Time 1 (fall factors associated with low performance in mathematics knowledge at Time 2 (spring. We used the exploratory approach of classification and regression tree analyses, a strategy that uses step-wise partitioning to create subgroups from a larger sample using multiple predictors, to identify the factors that were the strongest classifiers of low performance for younger and older preschool children. Results indicated that the most consistent classifier of low mathematics performance at Time 2 was children’s Time 1 mathematical language skills. Further, other distinct classifiers of low performance emerged for younger and older children. These findings suggest that risk classification for low mathematics performance may differ depending on children’s age.
Chiang, Peggy Pei-Chia; Xie, Jing; Keeffe, Jill Elizabeth
2011-04-25
To identify the critical success factors (CSF) associated with coverage of low vision services. Data were collected from a survey distributed to Vision 2020 contacts, government, and non-government organizations (NGOs) in 195 countries. The Classification and Regression Tree Analysis (CART) was used to identify the critical success factors of low vision service coverage. Independent variables were sourced from the survey: policies, epidemiology, provision of services, equipment and infrastructure, barriers to services, human resources, and monitoring and evaluation. Socioeconomic and demographic independent variables: health expenditure, population statistics, development status, and human resources in general, were sourced from the World Health Organization (WHO), World Bank, and the United Nations (UN). The findings identified that having >50% of children obtaining devices when prescribed (χ(2) = 44; P 3 rehabilitation workers per 10 million of population (χ(2) = 4.50; P = 0.034), higher percentage of population urbanized (χ(2) = 14.54; P = 0.002), a level of private investment (χ(2) = 14.55; P = 0.015), and being fully funded by government (χ(2) = 6.02; P = 0.014), are critical success factors associated with coverage of low vision services. This study identified the most important predictors for countries with better low vision coverage. The CART is a useful and suitable methodology in survey research and is a novel way to simplify a complex global public health issue in eye care.
Heddam, Salim; Kisi, Ozgur
2018-04-01
In the present study, three types of artificial intelligence techniques, least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5T) are applied for modeling daily dissolved oxygen (DO) concentration using several water quality variables as inputs. The DO concentration and water quality variables data from three stations operated by the United States Geological Survey (USGS) were used for developing the three models. The water quality data selected consisted of daily measured of water temperature (TE, °C), pH (std. unit), specific conductance (SC, μS/cm) and discharge (DI cfs), are used as inputs to the LSSVM, MARS and M5T models. The three models were applied for each station separately and compared to each other. According to the results obtained, it was found that: (i) the DO concentration could be successfully estimated using the three models and (ii) the best model among all others differs from one station to another.
Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen
2010-07-01
We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.
Directory of Open Access Journals (Sweden)
Yingxin Gu
2016-11-01
Full Text Available Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD between the predicted and actual NDVI (scaled NDVI, value from 0–200 and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4, which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
Energy Technology Data Exchange (ETDEWEB)
Lopes, J.A Pecas; Vasconcelos, Maria Helena O.P. de [Instituto de Engenharia de Sistemas e Computadores (INESC), Porto (Portugal). E-mail: jpl@riff.fe.up.pt; hvasconcelos@inescn.pt
1999-07-01
This paper describes in a synthetic manner the technology adopted to define structures used in the fast evaluation of dynamic safety of isolated network with high level of eolic production contribution. This methodology uses hybrid regression trees, which allows the quantification the endurance connected to the dynamic behavior of these networks by emulating the frequency minimum deviation that will be experienced by the system when submitted toa pre-defined perturbation. Also, new procedures for data automatic generation are presented, which will be used for construction and measurements of the evaluation structures performance. The paper describes the Terceira island - Acores archipelago network study case.
Al-Khaja, Nawal
2007-01-01
This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.
Homer, Collin G.; Aldridge, Cameron L.; Meyer, Debra K.; Schell, Spencer J.
2012-01-01
agebrush ecosystems in North America have experienced extensive degradation since European settlement. Further degradation continues from exotic invasive plants, altered fire frequency, intensive grazing practices, oil and gas development, and climate change – adding urgency to the need for ecosystem-wide understanding. Remote sensing is often identified as a key information source to facilitate ecosystem-wide characterization, monitoring, and analysis; however, approaches that characterize sagebrush with sufficient and accurate local detail across large enough areas to support this paradigm are unavailable. We describe the development of a new remote sensing sagebrush characterization approach for the state of Wyoming, U.S.A. This approach integrates 2.4 m QuickBird, 30 m Landsat TM, and 56 m AWiFS imagery into the characterization of four primary continuous field components including percent bare ground, percent herbaceous cover, percent litter, and percent shrub, and four secondary components including percent sagebrush (Artemisia spp.), percent big sagebrush (Artemisia tridentata), percent Wyoming sagebrush (Artemisia tridentata Wyomingensis), and shrub height using a regression tree. According to an independent accuracy assessment, primary component root mean square error (RMSE) values ranged from 4.90 to 10.16 for 2.4 m QuickBird, 6.01 to 15.54 for 30 m Landsat, and 6.97 to 16.14 for 56 m AWiFS. Shrub and herbaceous components outperformed the current data standard called LANDFIRE, with a shrub RMSE value of 6.04 versus 12.64 and a herbaceous component RMSE value of 12.89 versus 14.63. This approach offers new advancements in sagebrush characterization from remote sensing and provides a foundation to quantitatively monitor these components into the future.
International Nuclear Information System (INIS)
Althuwaynee, Omar F; Pradhan, Biswajeet; Ahmad, Noordin
2014-01-01
This article uses methodology based on chi-squared automatic interaction detection (CHAID), as a multivariate method that has an automatic classification capacity to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict rainfall-induced susceptibility map in Kuala Lumpur city and surrounding areas using geographic information system (GIS). The main objective of this article is to use CHi-squared automatic interaction detection (CHAID) method to perform the best classification fit for each conditioning factor, then, combining it with logistic regression (LR). LR model was used to find the corresponding coefficients of best fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in previous study using nearest neighbor index (NNI), which were then used to identify the clustered landslide locations range. Clustered locations were used as model training data with 14 landslide conditioning factors such as; topographic derived parameters, lithology, NDVI, land use and land cover maps. Pearson chi-squared value was used to find the best classification fit between the dependent variable and conditioning factors. Finally the relationship between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. An area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations respectively. This study proved the efficiency and reliability of decision tree (DT) model in landslide susceptibility mapping. Also it provided a valuable scientific basis for spatial decision making in planning and urban management studies
Althuwaynee, Omar F.; Pradhan, Biswajeet; Ahmad, Noordin
2014-06-01
This article uses methodology based on chi-squared automatic interaction detection (CHAID), as a multivariate method that has an automatic classification capacity to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict rainfall-induced susceptibility map in Kuala Lumpur city and surrounding areas using geographic information system (GIS). The main objective of this article is to use CHi-squared automatic interaction detection (CHAID) method to perform the best classification fit for each conditioning factor, then, combining it with logistic regression (LR). LR model was used to find the corresponding coefficients of best fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in previous study using nearest neighbor index (NNI), which were then used to identify the clustered landslide locations range. Clustered locations were used as model training data with 14 landslide conditioning factors such as; topographic derived parameters, lithology, NDVI, land use and land cover maps. Pearson chi-squared value was used to find the best classification fit between the dependent variable and conditioning factors. Finally the relationship between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. An area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations respectively. This study proved the efficiency and reliability of decision tree (DT) model in landslide susceptibility mapping. Also it provided a valuable scientific basis for spatial decision making in planning and urban management studies.
Constructing Binomial Trees Via Random Maps for Analysis of Financial Assets
Directory of Open Access Journals (Sweden)
Antonio Airton Carneiro de Freitas
2010-04-01
Full Text Available Random maps can be constructed from a priori knowledge of the financial assets. It is also addressed the reverse problem, i.e. from a function of an empirical stationary probability density function we set up a random map that naturally leads to an implied binomial tree, allowing the adjustment of models, including the ability to incorporate jumps. An applica- tion related to the options market is presented. It is emphasized that the quality of the model to incorporate a priori knowledge of the financial asset may be affected, for example, by the skewed vision of the analyst.
NetRaVE: constructing dependency networks using sparse linear regression
DEFF Research Database (Denmark)
Phatak, A.; Kiiveri, H.; Clemmensen, Line Katrine Harder
2010-01-01
NetRaVE is a small suite of R functions for generating dependency networks using sparse regression methods. Such networks provide an alternative to interpreting 'top n lists' of genes arising out of an analysis of microarray data, and they provide a means of organizing and visualizing the resulting...
Zhan, Zongqian; Wang, Xin; Wei, Minglu
2015-05-01
In image-based three-dimensional (3-D) reconstruction, one topic of growing importance is how to quickly obtain a 3-D model from a large number of images. The retrieval of the correct and relevant images for the model poses a considerable technological challenge. The "image vocabulary tree" has been proposed as a method to search for similar images. However, a significant drawback of this approach is identified in its low time efficiency and barely satisfactory classification result. The method proposed is inspired by, and improves upon, some recent methods. Specifically, vocabulary quality is considered and multivocabulary trees are designed to improve the classification result. A marked improvement was, indeed, observed in our evaluation of the proposed method. To improve time efficiency, graphics processing unit (GPU) computer unified device architecture parallel computation is applied in the multivocabulary trees. The results of the experiments showed that the GPU was three to four times more efficient than the enumeration matching and CPU methods when the number of images is large. This paper presents a reliable reference method for the rapid construction of a free network to be used for the computing of 3-D information.
Directory of Open Access Journals (Sweden)
Artur Wnorowski
2017-06-01
Full Text Available Tree saps are nourishing biological media commonly used for beverage and syrup production. Although the nutritional aspect of tree saps is widely acknowledged, the exact relationship between the sap composition, origin, and effect on the metabolic rate of human cells is still elusive. Thus, we collected saps from seven different tree species and conducted composition-activity analysis. Saps from trees of Betulaceae, but not from Salicaceae, Sapindaceae, nor Juglandaceae families, were increasing the metabolic rate of HepG2 cells, as measured using tetrazolium-based assay. Content of glucose, fructose, sucrose, chlorides, nitrates, sulphates, fumarates, malates, and succinates in sap samples varied across different tree species. Grade correspondence analysis clustered trees based on the saps’ chemical footprint indicating its usability in chemotaxonomy. Multiple regression modeling showed that glucose and fumarate present in saps from silver birch (Betula pendula Roth., black alder (Alnus glutinosa Gaertn., and European hornbeam (Carpinus betulus L. are positively affecting the metabolic activity of HepG2 cells.
Kadiyala, Akhil; Kaur, Devinder; Kumar, Ashok
2013-02-01
The present study developed a novel approach to modeling indoor air quality (IAQ) of a public transportation bus by the development of hybrid genetic-algorithm-based neural networks (also known as evolutionary neural networks) with input variables optimized from using the regression trees, referred as the GART approach. This study validated the applicability of the GART modeling approach in solving complex nonlinear systems by accurately predicting the monitored contaminants of carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), sulfur dioxide (SO2), 0.3-0.4 microm sized particle numbers, 0.4-0.5 microm sized particle numbers, particulate matter (PM) concentrations less than 1.0 microm (PM10), and PM concentrations less than 2.5 microm (PM2.5) inside a public transportation bus operating on 20% grade biodiesel in Toledo, OH. First, the important variables affecting each monitored in-bus contaminant were determined using regression trees. Second, the analysis of variance was used as a complimentary sensitivity analysis to the regression tree results to determine a subset of statistically significant variables affecting each monitored in-bus contaminant. Finally, the identified subsets of statistically significant variables were used as inputs to develop three artificial neural network (ANN) models. The models developed were regression tree-based back-propagation network (BPN-RT), regression tree-based radial basis function network (RBFN-RT), and GART models. Performance measures were used to validate the predictive capacity of the developed IAQ models. The results from this approach were compared with the results obtained from using a theoretical approach and a generalized practicable approach to modeling IAQ that included the consideration of additional independent variables when developing the aforementioned ANN models. The hybrid GART models were able to capture majority of the variance in the monitored in-bus contaminants. The genetic
LifePrint: a novel k-tuple distance method for construction of phylogenetic trees
Directory of Open Access Journals (Sweden)
Fabián Reyes-Prieto
2011-01-01
Full Text Available Fabián Reyes-Prieto1, Adda J García-Chéquer1, Hueman Jaimes-Díaz1, Janet Casique-Almazán1, Juana M Espinosa-Lara1, Rosaura Palma-Orozco2, Alfonso Méndez-Tenorio1, Rogelio Maldonado-Rodríguez1, Kenneth L Beattie31Laboratory of Biotechnology and Genomic Bioinformatics, Department of Biochemistry, National School of Biological Sciences, 2Superior School of Computer Sciences, National Polytechnic Institute, Mexico City, Mexico; 3Amerigenics Inc, Crossville, Tennessee, USAPurpose: Here we describe LifePrint, a sequence alignment-independent k-tuple distance method to estimate relatedness between complete genomes.Methods: We designed a representative sample of all possible DNA tuples of length 9 (9-tuples. The final sample comprises 1878 tuples (called the LifePrint set of 9-tuples; LPS9 that are distinct from each other by at least two internal and noncontiguous nucleotide differences. For validation of our k-tuple distance method, we analyzed several real and simulated viroid genomes. Using different distance metrics, we scrutinized diverse viroid genomes to estimate the k-tuple distances between these genomic sequences. Then we used the estimated genomic k-tuple distances to construct phylogenetic trees using the neighbor-joining algorithm. A comparison of the accuracy of LPS9 and the previously reported 5-tuple method was made using symmetric differences between the trees estimated from each method and a simulated “true” phylogenetic tree.Results: The identified optimal search scheme for LPS9 allows only up to two nucleotide differences between each 9-tuple and the scrutinized genome. Similarity search results of simulated viroid genomes indicate that, in most cases, LPS9 is able to detect single-base substitutions between genomes efficiently. Analysis of simulated genomic variants with a high proportion of base substitutions indicates that LPS9 is able to discern relationships between genomic variants with up to 40% of nucleotide
Künne, A.; Fink, M.; Kipka, H.; Krause, P.; Flügel, W.-A.
2012-06-01
In this paper, a method is presented to estimate excess nitrogen on large scales considering single field processes. The approach was implemented by using the physically based model J2000-S to simulate the nitrogen balance as well as the hydrological dynamics within meso-scale test catchments. The model input data, the parameterization, the results and a detailed system understanding were used to generate the regression tree models with GUIDE (Loh, 2002). For each landscape type in the federal state of Thuringia a regression tree was calibrated and validated using the model data and results of excess nitrogen from the test catchments. Hydrological parameters such as precipitation and evapotranspiration were also used to predict excess nitrogen by the regression tree model. Hence they had to be calculated and regionalized as well for the state of Thuringia. Here the model J2000g was used to simulate the water balance on the macro scale. With the regression trees the excess nitrogen was regionalized for each landscape type of Thuringia. The approach allows calculating the potential nitrogen input into the streams of the drainage area. The results show that the applied methodology was able to transfer the detailed model results of the meso-scale catchments to the entire state of Thuringia by low computing time without losing the detailed knowledge from the nitrogen transport modeling. This was validated with modeling results from Fink (2004) in a catchment lying in the regionalization area. The regionalized and modeled excess nitrogen correspond with 94%. The study was conducted within the framework of a project in collaboration with the Thuringian Environmental Ministry, whose overall aim was to assess the effect of agro-environmental measures regarding load reduction in the water bodies of Thuringia to fulfill the requirements of the European Water Framework Directive (Bäse et al., 2007; Fink, 2006; Fink et al., 2007).
Wang, Fen; Ye, Bin
2016-09-01
Cyst echinococcosis caused by the matacestodal larvae of Echinococcus granulosus (Eg), is a chronic, worldwide, and severe zoonotic parasitosis. The treatment of cyst echinococcosis is still difficult since surgery cannot fit the needs of all patients, and drugs can lead to serious adverse events as well as resistance. The screen of target proteins interacted with new anti-hydatidosis drugs is urgently needed to meet the prevailing challenges. Here, we analyzed the sequences and structure properties, and constructed a phylogenetic tree by bioinformatics methods. The MIP family signature and Protein kinase C phosphorylation sites were predicted in all nine EgAQPs. α-helix and random coil were the main secondary structures of EgAQPs. The numbers of transmembrane regions were three to six, which indicated that EgAQPs contained multiple hydrophobic regions. A neighbor-joining tree indicated that EgAQPs were divided into two branches, seven EgAQPs formed a clade with AQP1 from human, a "strict" aquaporins, other two EgAQPs formed a clade with AQP9 from human, an aquaglyceroporins. Unfortunately, homology modeling of EgAQPs was aborted. These results provide a foundation for understanding and researches of the biological function of E. granulosus.
Directory of Open Access Journals (Sweden)
H.H. Mohamad
2013-09-01
This research aims to develop a mathematical model for assessing the expected net profit of any construction company. To achieve the research objective, four steps were performed. First, the main factors affecting firms’ net profit were identified. Second, pertinent data regarding the net profit factors were collected. Third, two different net profit models were developed using the Multiple Regression (MR and the Neural Network (NN techniques. The validity of the proposed models was also investigated. Finally, the results of both MR and NN models were compared to investigate the predictive capabilities of the two models.
Decision-table development for use with the CAT code for the automated fault-tree construction
International Nuclear Information System (INIS)
Wu, J.S.; Salem, S.L.; Apostolakis, G.E.
1977-01-01
A library of decision tables to be used in connection with the CAT computer code for the automated construction of fault trees is presented. A decision table is constructed for each component type describing the output of the component in terms of its inputs and its internal states. In addition, a modification of the CAT code that couples it with a fault tree analysis code is presented. This report represents one aspect of a study entitled, 'A General Evaluation Approach to Risk-Benefit for Large Technological Systems, and Its Application to Nuclear Power.'
Constructing the tree-level Yang-Mills S-matrix using complex factorization
Schuster, Philip C.; Toro, Natalia
2009-06-01
A remarkable connection between BCFW recursion relations and constraints on the S-matrix was made by Benincasa and Cachazo in 0705.4305, who noted that mutual consistency of different BCFW constructions of four-particle amplitudes generates non-trivial (but familiar) constraints on three-particle coupling constants — these include gauge invariance, the equivalence principle, and the lack of non-trivial couplings for spins > 2. These constraints can also be derived with weaker assumptions, by demanding the existence of four-point amplitudes that factorize properly in all unitarity limits with complex momenta. From this starting point, we show that the BCFW prescription can be interpreted as an algorithm for fully constructing a tree-level S-matrix, and that complex factorization of general BCFW amplitudes follows from the factorization of four-particle amplitudes. The allowed set of BCFW deformations is identified, formulated entirely as a statement on the three-particle sector, and using only complex factorization as a guide. Consequently, our analysis based on the physical consistency of the S-matrix is entirely independent of field theory. We analyze the case of pure Yang-Mills, and outline a proof for gravity. For Yang-Mills, we also show that the well-known scaling behavior of BCFW-deformed amplitudes at large z is a simple consequence of factorization. For gravity, factorization in certain channels requires asymptotic behavior ~ 1/z2.
Constructing the tree-level Yang-Mills S-matrix using complex factorization
International Nuclear Information System (INIS)
Schuster, Philip C.; Toro, Natalia
2009-01-01
A remarkable connection between BCFW recursion relations and constraints on the S-matrix was made by Benincasa and Cachazo in 0705.4305, who noted that mutual consistency of different BCFW constructions of four-particle amplitudes generates non-trivial (but familiar) constraints on three-particle coupling constants - these include gauge invariance, the equivalence principle, and the lack of non-trivial couplings for spins > 2. These constraints can also be derived with weaker assumptions, by demanding the existence of four-point amplitudes that factorize properly in all unitarity limits with complex momenta. From this starting point, we show that the BCFW prescription can be interpreted as an algorithm for fully constructing a tree-level S-matrix, and that complex factorization of general BCFW amplitudes follows from the factorization of four-particle amplitudes. The allowed set of BCFW deformations is identified, formulated entirely as a statement on the three-particle sector, and using only complex factorization as a guide. Consequently, our analysis based on the physical consistency of the S-matrix is entirely independent of field theory. We analyze the case of pure Yang-Mills, and outline a proof for gravity. For Yang-Mills, we also show that the well-known scaling behavior of BCFW-deformed amplitudes at large z is a simple consequence of factorization. For gravity, factorization in certain channels requires asymptotic behavior ∼ 1/z 2 .
ERA: Efficient serial and parallel suffix tree construction for very long strings
Mansour, Essam; Allam, Amin; Skiadopoulos, Spiros G.; Kalnis, Panos
2011-01-01
The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree
Dyer, Betsey D.; Kahn, Michael J.; LeBlanc, Mark D.
2008-01-01
Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results. PMID:19054742
Strobl, Carolin; Malley, James; Tutz, Gerhard
2009-01-01
Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
Azad, Mohammad
2013-11-25
In the paper, we study a greedy algorithm for construction of decision trees. This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. Experimental results for data sets from UCI Machine Learning Repository and randomly generated tables are presented. We make a comparative study of the depth and average depth of the constructed decision trees for proposed approach and approach based on generalized decision. The obtained results show that the proposed approach can be useful from the point of view of knowledge representation and algorithm construction.
Azad, Mohammad; Chikalov, Igor; Moshkov, Mikhail; Zielosko, Beata
2013-01-01
In the paper, we study a greedy algorithm for construction of decision trees. This algorithm is applicable to decision tables with many-valued decisions where each row is labeled with a set of decisions. For a given row, we should find a decision from the set attached to this row. Experimental results for data sets from UCI Machine Learning Repository and randomly generated tables are presented. We make a comparative study of the depth and average depth of the constructed decision trees for proposed approach and approach based on generalized decision. The obtained results show that the proposed approach can be useful from the point of view of knowledge representation and algorithm construction.
Patch-based image segmentation of satellite imagery using minimum spanning tree construction
Energy Technology Data Exchange (ETDEWEB)
Skurikhin, Alexei N [Los Alamos National Laboratory
2010-01-01
We present a method for hierarchical image segmentation and feature extraction. This method builds upon the combination of the detection of image spectral discontinuities using Canny edge detection and the image Laplacian, followed by the construction of a hierarchy of segmented images of successively reduced levels of details. These images are represented as sets of polygonized pixel patches (polygons) attributed with spectral and structural characteristics. This hierarchy forms the basis for object-oriented image analysis. To build fine level-of-detail representation of the original image, seed partitions (polygons) are built upon a triangular mesh composed of irregular sized triangles, whose spatial arrangement is adapted to the image content. This is achieved by building the triangular mesh on the top of the detected spectral discontinuities that form a network of constraints for the Delaunay triangulation. A polygonized image is represented as a spatial network in the form of a graph with vertices which correspond to the polygonal partitions and graph edges reflecting pairwise partitions relations. Image graph partitioning is based on the iterative graph oontraction using Boruvka's Minimum Spanning Tree algorithm. An important characteristic of the approach is that the agglomeration of partitions is constrained by the detected spectral discontinuities; thus the shapes of agglomerated partitions are more likely to correspond to the outlines of real-world objects.
Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H
2018-01-01
Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter for the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines. The phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model used for selecting against AF content. and the heritabilities of plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P linear regression models based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had higher R2 (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h2 ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.
Construction of a Species-Level Tree of Life for the Insects and Utility in Taxonomic Profiling.
Chesters, Douglas
2017-05-01
Although comprehensive phylogenies have proven an invaluable tool in ecology and evolution, their construction is made increasingly challenging both by the scale and structure of publically available sequences. The distinct partition between gene-rich (genomic) and species-rich (DNA barcode) data is a feature of data that has been largely overlooked, yet presents a key obstacle to scaling supermatrix analysis. I present a phyloinformatics framework for draft construction of a species-level phylogeny of insects (Class Insecta). Matrix-building requires separately optimized pipelines for nuclear transcriptomic, mitochondrial genomic, and species-rich markers, whereas tree-building requires hierarchical inference in order to capture species-breadth while retaining deep-level resolution. The phylogeny of insects contains 49,358 species, 13,865 genera, 760 families. Deep-level splits largely reflected previous findings for sections of the tree that are data rich or unambiguous, such as inter-ordinal Endopterygota and Dictyoptera, the recently evolved and relatively homogeneous Lepidoptera, Hymenoptera, Brachycera (Diptera), and Cucujiformia (Coleoptera). However, analysis of bias, matrix construction and gene-tree variation suggests confidence in some relationships (such as in Polyneoptera) is less than has been indicated by the matrix bootstrap method. To assess the utility of the insect tree as a tool in query profiling several tree-based taxonomic assignment methods are compared. Using test data sets with existing taxonomic annotations, a tendency is observed for greater accuracy of species-level assignments where using a fixed comprehensive tree of life in contrast to methods generating smaller de novo reference trees. Described herein is a solution to the discrepancy in the way data are fit into supermatrices. The resulting tree facilitates wider studies of insect diversification and application of advanced descriptions of diversity in community studies, among
Directory of Open Access Journals (Sweden)
Regis Wendpouire Oubida
2015-03-01
Full Text Available Local adaptation to climate in temperate forest trees involves the integration of multiple physiological, morphological, and phenological traits. Latitudinal clines are frequently observed for these traits, but environmental constraints also track longitude and altitude. We combined extensive phenotyping of 12 candidate adaptive traits, multivariate regression trees, quantitative genetics, and a genome-wide panel of SNP markers to better understand the interplay among geography, climate, and adaptation to abiotic factors in Populus trichocarpa. Heritabilities were low to moderate (0.13 to 0.32 and population differentiation for many traits exceeded the 99th percentile of the genome-wide distribution of FST, suggesting local adaptation. When climate variables were taken as predictors and the 12 traits as response variables in a multivariate regression tree analysis, evapotranspiration (Eref explained the most variation, with subsequent splits related to mean temperature of the warmest month, frost-free period (FFP, and mean annual precipitation (MAP. These grouping matched relatively well the splits using geographic variables as predictors: the northernmost groups (short FFP and low Eref had the lowest growth, and lowest cold injury index; the southern British Columbia group (low Eref and intermediate temperatures had average growth and cold injury index; the group from the coast of California and Oregon (high Eref and FFP had the highest growth performance and the highest cold injury index; and the southernmost, high-altitude group (with high Eref and low FFP performed poorly, had high cold injury index, and lower water use efficiency. Taken together, these results suggest variation in both temperature and water availability across the range shape multivariate adaptive traits in poplar.
Constructing multi-labelled decision trees for junction design using the predicted probabilities
Bezembinder, Erwin M.; Wismans, Luc J. J.; Van Berkum, Eric C.
2017-01-01
In this paper, we evaluate the use of traditional decision tree algorithms CRT, CHAID and QUEST to determine a decision tree which can be used to predict a set of (Pareto optimal) junction design alternatives (e.g. signal or roundabout) for a given traffic demand pattern and available space. This is
Ozge, C; Toros, F; Bayramkaya, E; Camdeviren, H; Sasmaz, T
2006-08-01
The purpose of this study is to evaluate the most important sociodemographic factors on smoking status of high school students using a broad randomised epidemiological survey. Using in-class, self administered questionnaire about their sociodemographic variables and smoking behaviour, a representative sample of total 3304 students of preparatory, 9th, 10th, and 11th grades, from 22 randomly selected schools of Mersin, were evaluated and discriminative factors have been determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated combined effects of these factors using classification and regression tree methodology, as a new statistical method. The data showed that 38% of the students reported lifetime smoking and 16.9% of them reported current smoking with a male predominancy and increasing prevalence by age. Second hand smoking was reported at a 74.3% frequency with father predominance (56.6%). The significantly important factors that affect current smoking in these age groups were increased by household size, late birth rank, certain school types, low academic performance, increased second hand smoking, and stress (especially reported as separation from a close friend or because of violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors with a good classification capacity. It was concluded that, as closely related with sociocultural factors, smoking was a common problem in this young population, generating important academic and social burden in youth life and with increasing data about this behaviour and using new statistical methods, effective coping strategies could be composed.
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams both stretching for few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on the 1st October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of debris flows and debris avalanches types involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; besides, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-folds tests so to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT-models reached a higher prediction performance with respect to BLR-models, for RP based modelling, whilst for the SP-based models the difference in predictive skills between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR demonstrated to produce more robust
Serag, Ahmed; Aljabar, Paul; Ball, Gareth; Counsell, Serena J; Boardman, James P; Rutherford, Mary A; Edwards, A David; Hajnal, Joseph V; Rueckert, Daniel
2012-02-01
Medical imaging has shown that, during early development, the brain undergoes more changes in size, shape and appearance than at any other time in life. A better understanding of brain development requires a spatio-temporal atlas that characterizes the dynamic changes during this period. In this paper we present an approach for constructing a 4D atlas of the developing brain, between 28 and 44 weeks post-menstrual age at time of scan, using T1 and T2 weighted MR images from 204 premature neonates. The method used for the creation of the average 4D atlas utilizes non-rigid registration between all pairs of images to eliminate bias in the atlas toward any of the original images. In addition, kernel regression is used to produce age-dependent anatomical templates. A novelty in our approach is the use of a time-varying kernel width, to overcome the variations in the distribution of subjects at different ages. This leads to an atlas that retains a consistent level of detail at every time-point. Comparisons between the resulting atlas and atlases constructed using affine and non-rigid registration are presented. The resulting 4D atlas has greater anatomic definition than currently available 4D atlases created using various affine and non-rigid registration approaches, an important factor in improving registrations between the atlas and individual subjects. Also, the resulting 4D atlas can serve as a good representative of the population of interest as it reflects both global and local changes. The atlas is publicly available at www.brain-development.org. Copyright © 2011 Elsevier Inc. All rights reserved.
Yield curve event tree construction for multi stage stochastic programming models
DEFF Research Database (Denmark)
Rasmussen, Kourosh Marjani; Poulsen, Rolf
Dynamic stochastic programming (DSP) provides an intuitive framework for modelling of financial portfolio choice problems where market frictions are present and dynamic re--balancing has a significant effect on initial decisions. The application of these models in practice, however, is limited....... Indeed defining a universal and tractable framework for fully ``appropriate'' event trees is in our opinion an impossible task. A problem specific approach to designing such event trees is the way ahead. In this paper we propose a number of desirable properties which should be present in an event tree...
DEFF Research Database (Denmark)
Greve, Mogens Humlekrog; Bou Kheir, Rania; Greve, Mette Balslev
2012-01-01
Soil texture is an important soil characteristic that drives crop production and field management, and is the basis for environmental monitoring (including soil quality and sustainability, hydrological and ecological processes, and climate change simulations). The combination of coarse sand, fine...... sand, silt, and clay in soil determines its textural classification. This study used Geographic Information Systems (GIS) and regression-tree modeling to precisely quantify the relationships between the soil texture fractions and different environmental parameters on a national scale, and to detect...... precipitation, seasonal precipitation to statistically explain soil texture fractions field/laboratory measurements (45,224 sampling sites) in the area of interest (Denmark). The developed strongest relationships were associated with clay and silt, variance being equal to 60%, followed by coarse sand (54...
Yamashita, Takashi; Kart, Cary S; Noe, Douglas A
2012-12-01
Type 2 diabetes is known to contribute to health disparities in the U.S. and failure to adhere to recommended self-care behaviors is a contributing factor. Intervention programs face difficulties as a result of patient diversity and limited resources. With data from the 2005 Behavioral Risk Factor Surveillance System, this study employs a logistic regression tree algorithm to identify characteristics of sub-populations with type 2 diabetes according to their reported frequency of adherence to four recommended diabetes self-care behaviors including blood glucose monitoring, foot examination, eye examination and HbA1c testing. Using Andersen's health behavior model, need factors appear to dominate the definition of which sub-groups were at greatest risk for low as well as high adherence. Findings demonstrate the utility of easily interpreted tree diagrams to design specific culturally appropriate intervention programs targeting sub-populations of diabetes patients who need to improve their self-care behaviors. Limitations and contributions of the study are discussed.
Kabeshova, A; Annweiler, C; Fantino, B; Philip, T; Gromov, V A; Launay, C P; Beauchet, O
2014-06-01
Regression tree (RT) analyses are particularly adapted to explore the risk of recurrent falling according to various combinations of fall risk factors compared to logistic regression models. The aims of this study were (1) to determine which combinations of fall risk factors were associated with the occurrence of recurrent falls in older community-dwellers, and (2) to compare the efficacy of RT and multiple logistic regression model for the identification of recurrent falls. A total of 1,760 community-dwelling volunteers (mean age ± standard deviation, 71.0 ± 5.1 years; 49.4 % female) were recruited prospectively in this cross-sectional study. Age, gender, polypharmacy, use of psychoactive drugs, fear of falling (FOF), cognitive disorders and sad mood were recorded. In addition, the history of falls within the past year was recorded using a standardized questionnaire. Among 1,760 participants, 19.7 % (n = 346) were recurrent fallers. The RT identified 14 nodes groups and 8 end nodes with FOF as the first major split. Among participants with FOF, those who had sad mood and polypharmacy formed the end node with the greatest OR for recurrent falls (OR = 6.06 with p falls (OR = 0.25 with p factors for recurrent falls, the combination most associated with recurrent falls involving FOF, sad mood and polypharmacy. The FOF emerged as the risk factor strongly associated with recurrent falls. In addition, RT and multiple logistic regression were not sensitive enough to identify the majority of recurrent fallers but appeared efficient in detecting individuals not at risk of recurrent falls.
Directory of Open Access Journals (Sweden)
H. Prashantha Kumar
2011-09-01
Full Text Available Low density parity check (LDPC codes are capacity-approaching codes, which means that practical constructions exist that allow the noise threshold to be set very close to the theoretical Shannon limit for a memory less channel. LDPC codes are finding increasing use in applications like LTE-Networks, digital television, high density data storage systems, deep space communication systems etc. Several algebraic and combinatorial methods are available for constructing LDPC codes. In this paper we discuss a novel low complexity algebraic method for constructing regular LDPC like codes derived from full rank codes. We demonstrate that by employing these codes over AWGN channels, coding gains in excess of 2dB over un-coded systems can be realized when soft iterative decoding using a parity check tree is employed.
Directory of Open Access Journals (Sweden)
Francesco Cerasoli
Full Text Available Boosted Regression Trees (BRT is one of the modelling techniques most recently applied to biodiversity conservation and it can be implemented with presence-only data through the generation of artificial absences (pseudo-absences. In this paper, three pseudo-absences generation techniques are compared, namely the generation of pseudo-absences within target-group background (TGB, testing both the weighted (WTGB and unweighted (UTGB scheme, and the generation at random (RDM, evaluating their performance and applicability in distribution modelling and species conservation. The choice of the target group fell on amphibians, because of their rapid decline worldwide and the frequent lack of guidelines for conservation strategies and regional-scale planning, which instead could be provided through an appropriate implementation of SDMs. Bufo bufo, Salamandrina perspicillata and Triturus carnifex were considered as target species, in order to perform our analysis with species having different ecological and distributional characteristics. The study area is the "Gran Sasso-Monti della Laga" National Park, which hosts 15 Natura 2000 sites and represents one of the most important biodiversity hotspots in Europe. Our results show that the model calibration ameliorates when using the target-group based pseudo-absences compared to the random ones, especially when applying the WTGB. Contrarily, model discrimination did not significantly vary in a consistent way among the three approaches with respect to the tree target species. Both WTGB and RDM clearly isolate the highly contributing variables, supplying many relevant indications for species conservation actions. Moreover, the assessment of pairwise variable interactions and their three-dimensional visualization further increase the amount of useful information for protected areas' managers. Finally, we suggest the use of RDM as an admissible alternative when it is not possible to individuate a suitable set of
Directory of Open Access Journals (Sweden)
José A. Delgado
2012-01-01
Full Text Available Forest structural parameters such as quadratic mean diameter, basal area, and number of trees per unit area are important for the assessment of wood volume and biomass and represent key forest inventory attributes. Forest inventory information is required to support sustainable management, carbon accounting, and policy development activities. Digital image processing of remotely sensed imagery is increasingly utilized to assist traditional, more manual, methods in the estimation of forest structural attributes over extensive areas, also enabling evaluation of change over time. Empirical attribute estimation with remotely sensed data is frequently employed, yet with known limitations, especially over complex environments such as Mediterranean forests. In this study, the capacity of high spatial resolution (HSR imagery and related techniques to model structural parameters at the stand level (n = 490 in Mediterranean pines in Central Spain is tested using data from the commercial satellite QuickBird-2. Spectral and spatial information derived from multispectral and panchromatic imagery (2.4 m and 0.68 m sided pixels, respectively served to model structural parameters. Classification and Regression Tree Analysis (CART was selected for the modeling of attributes. Accurate models were produced of quadratic mean diameter (QMD (R2 = 0.8; RMSE = 0.13 m with an average error of 17% while basal area (BA models produced an average error of 22% (RMSE = 5.79 m2/ha. When the measured number of trees per unit area (N was categorized, as per frequent forest management practices, CART models correctly classified 70% of the stands, with all other stands classified in an adjacent class. The accuracy of the attributes estimated here is expected to be better when canopy cover is more open and attribute values are at the lower end of the range present, as related in the pattern of the residuals found in this study. Our findings indicate that attributes derived from
Fault tree construction of hybrid system requirements using qualitative formal method
International Nuclear Information System (INIS)
Lee, Jang-Soo; Cha, Sung-Deok
2005-01-01
When specifying requirements for software controlling hybrid systems and conducting safety analysis, engineers experience that requirements are often known only in qualitative terms and that existing fault tree analysis techniques provide little guidance on formulating and evaluating potential failure modes. In this paper, we propose Causal Requirements Safety Analysis (CRSA) as a technique to qualitatively evaluate causal relationship between software faults and physical hazards. This technique, extending qualitative formal method process and utilizing information captured in the state trajectory, provides specific guidelines on how to identify failure modes and relationship among them. Using a simplified electrical power system as an example, we describe step-by-step procedures of conducting CRSA. Our experience of applying CRSA to perform fault tree analysis on requirements for the Wolsong nuclear power plant shutdown system indicates that CRSA is an effective technique in assisting safety engineers
Self-constructed tree-shape high thermal conductivity nanosilver networks in epoxy.
Pashayi, Kamyar; Fard, Hafez Raeisi; Lai, Fengyuan; Iruvanti, Sushumna; Plawsky, Joel; Borca-Tasciuc, Theodorian
2014-04-21
We report the formation of high aspect ratio nanoscale tree-shape silver networks in epoxy, at low temperatures (thermal conductivity (κ) of the nanocomposite compared to the polymer matrix. The networks form through a three-step process comprising of self-assembly by diffusion limited aggregation of polyvinylpyrrolidone (PVP) coated nanoparticles, removal of PVP coating from the surface, and sintering of silver nanoparticles in high aspect ratio networked structures. Controlling self-assembly and sintering by carefully designed multistep temperature and time processing leads to κ of our silver nanocomposites that are up to 300% of the present state of the art polymer nanocomposites at similar volume fractions. Our investigation of the κ enhancements enabled by tree-shaped network nanocomposites provides a basis for the development of new polymer nanocomposites for thermal transport and storage applications.
Hutton, Eileen K; Simioni, Julia C; Thabane, Lehana
2017-08-01
Among women with a fetus with a non-cephalic presentation, external cephalic version (ECV) has been shown to reduce the rate of breech presentation at birth and cesarean birth. Compared with ECV at term, beginning ECV prior to 37 weeks' gestation decreases the number of infants in a non-cephalic presentation at birth. The purpose of this secondary analysis was to investigate factors associated with a successful ECV procedure and to present this in a clinically useful format. Data were collected as part of the Early ECV Pilot and Early ECV2 Trials, which randomized 1776 women with a fetus in breech presentation to either early ECV (34-36 weeks' gestation) or delayed ECV (at or after 37 weeks). The outcome of interest was successful ECV, defined as the fetus being in a cephalic presentation immediately following the procedure, as well as at the time of birth. The importance of several factors in predicting successful ECV was investigated using two statistical methods: logistic regression and classification and regression tree (CART) analyses. Among nulliparas, non-engagement of the presenting part and an easily palpable fetal head were independently associated with success. Among multiparas, non-engagement of the presenting part, gestation less than 37 weeks and an easily palpable fetal head were found to be independent predictors of success. These findings were consistent with results of the CART analyses. Regardless of parity, descent of the presenting part was the most discriminating factor in predicting successful ECV and cephalic presentation at birth. © 2017 Nordic Federation of Societies of Obstetrics and Gynecology.
Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.
2016-06-01
Diameter-at-Breast-Height Estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR Technology has a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of point cloud unique in different forest classes. Extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna for a natural growth forest. Coordinates, height, and canopy cover were measured and types of species were identified to compare to LiDAR derivatives. Multiple linear regression was used to get LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20m, 10m, and 5m grid resolutions. To know the best combination of parameters in DBH Estimation, all possible combinations of parameters were generated and automated using python scripts and additional regression related libraries such as Numpy, Scipy, and Scikit learn were used. The combination that yields the highest r-squared or coefficient of determination and lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The equation is at its best using 11 parameters at 10mgrid size and at of 0.604 r-squared, 154.04 AIC and 175.08 BIC. Combination of parameters may differ among forest classes for further studies. Additional statistical tests can be supplemented to help determine the correlation among parameters such as Kaiser- Meyer-Olkin (KMO) Coefficient and the Barlett's Test for Spherecity (BTS).
Sayegh, Arwa; Tate, James E.; Ropkins, Karl
2016-02-01
Oxides of Nitrogen (NOx) is a major component of photochemical smog and its constituents are considered principal traffic-related pollutants affecting human health. This study investigates the influence of background concentrations of NOx, traffic density, and prevailing meteorological conditions on roadside concentrations of NOx at UK urban, open motorway, and motorway tunnel sites using the statistical approach Boosted Regression Trees (BRT). BRT models have been fitted using hourly concentration, traffic, and meteorological data for each site. The models predict, rank, and visualise the relationship between model variables and roadside NOx concentrations. A strong relationship between roadside NOx and monitored local background concentrations is demonstrated. Relationships between roadside NOx and other model variables have been shown to be strongly influenced by the quality and resolution of background concentrations of NOx, i.e. if it were based on monitored data or modelled prediction. The paper proposes a direct method of using site-specific fundamental diagrams for splitting traffic data into four traffic states: free-flow, busy-flow, congested, and severely congested. Using BRT models, the density of traffic (vehicles per kilometre) was observed to have a proportional influence on the concentrations of roadside NOx, with different fitted regression line slopes for the different traffic states. When other influences are conditioned out, the relationship between roadside concentrations and ambient air temperature suggests NOx concentrations reach a minimum at around 22 °C with high concentrations at low ambient air temperatures which could be associated to restricted atmospheric dispersion and/or to changes in road traffic exhaust emission characteristics at low ambient air temperatures. This paper uses BRT models to study how different critical factors, and their relative importance, influence the variation of roadside NOx concentrations. The paper
Yu, Huibin; Song, Yonghui; Liu, Ruixia; Pan, Hongwei; Xiang, Liancheng; Qian, Feng
2014-10-01
The stabilization of latent tracers of dissolved organic matter (DOM) of wastewater was analyzed by three-dimensional excitation-emission matrix (EEM) fluorescence spectroscopy coupled with self-organizing map and classification and regression tree analysis (CART) in wastewater treatment performance. DOM of water samples collected from primary sedimentation, anaerobic, anoxic, oxic and secondary sedimentation tanks in a large-scale wastewater treatment plant contained four fluorescence components: tryptophan-like (C1), tyrosine-like (C2), microbial humic-like (C3) and fulvic-like (C4) materials extracted by self-organizing map. These components showed good positive linear correlations with dissolved organic carbon of DOM. C1 and C2 were representative components in the wastewater, and they were removed to a higher extent than those of C3 and C4 in the treatment process. C2 was a latent parameter determined by CART to differentiate water samples of oxic and secondary sedimentation tanks from the successive treatment units, indirectly proving that most of tyrosine-like material was degraded by anaerobic microorganisms. C1 was an accurate parameter to comprehensively separate the samples of the five treatment units from each other, indirectly indicating that tryptophan-like material was decomposed by anaerobic and aerobic bacteria. EEM fluorescence spectroscopy in combination with self-organizing map and CART analysis can be a nondestructive effective method for characterizing structural component of DOM fractions and monitoring organic matter removal in wastewater treatment process. Copyright © 2014 Elsevier Ltd. All rights reserved.
Directory of Open Access Journals (Sweden)
César Lavorenti
1990-01-01
Full Text Available O presente trabalho foi realizado com o objetivo de determinar a existência e as magnitudes de correlações e regressões lineares simples em plântulas jovens de seringueira (Hevea spp., para melhor condução de seleção nos futuros trabalhos de melhoramento. Foram utilizadas médias de produção de borracha seca por plântulas por corte, através do teste Hamaker-Morris-Mann (P; circunferência do caule (CC; espessura de casca (EC; número de anéis (NA; diâmetro dos vasos (DV; densidade dos vasos laticíleros (D e distância média entre anéis de vasos consecutivos (DMEAVC em um viveiro de cruzamento com três anos e meio de idade. Os resultados mostraram, entre outros fatores, que as correlações lineares simples de P com CC, EC, NA, D, DV e DMEAVC foram, respectivamente, r =t 0,61, 0,34, 0,28, 0,29, 0,43 e -0,13. As correlações de CC com EC, NA, D, DV e DMEAVC foram: 0,65, 0,22, 0,37, 0,33 e 0,096 respectivamente. Estudos de regressão linear simples de P com CC, EC, NA, DV, D e DMEAVC sugerem que CC foi o caráter independente mais significativo, contribuindo com 36% da variação em P. Em relação ao vigor, a regressão de CC com os respectivos caracteres sugere que EC foi o único caráter que contribuiu significativamente para a variação de CC com 42%. As altas correlações observadas da produção com circunferência do caule e com espessura de casca evidenciam a possibilidade de obter genótipos jovens de boa capacidade produtiva e grande vigor, através de seleção precoce dessas variáveis.This study was undertaken aiming to determine the existence of linear correlations, based on simple regression studies for a better improvement of young rubber tree (Hevea spp. breeding and selection. The characters studied were: yield of dry rubber per tapping by Hamaker-Morris-Mann test tapping (P, mean gurth (CC, bark thickness (EC, number of latex vessel rings (NA, diameter of latex vesseis (DV, density of latex vesseis per 5mm
A constructive approach to minimal free resolutions of path ideals of trees
Directory of Open Access Journals (Sweden)
Rachelle R. Bouchat
2017-01-01
Full Text Available For a rooted tree $\\Gamma$, we consider path ideals of $\\Gamma$, which are ideals that are generated by all directed paths of a fixed length in $\\Gamma$. In this paper, we provide a combinatorial description of the minimal free resolution of these path ideals. In particular, we provide a class of subforests of $\\Gamma$ that are in one-to-one correspondence with the multi-graded Betti numbers of the path ideal as well as providing a method for determining the projective dimension and the Castelnuovo-Mumford regularity of a given path ideal.
Xu, Duo; Jaber, Yousef; Pavlidis, Pavlos; Gokcumen, Omer
2017-09-26
Constructing alignments and phylogenies for a given locus from large genome sequencing studies with relevant outgroups allow novel evolutionary and anthropological insights. However, no user-friendly tool has been developed to integrate thousands of recently available and anthropologically relevant genome sequences to construct complete sequence alignments and phylogenies. Here, we provide VCFtoTree, a user friendly tool with a graphical user interface that directly accesses online databases to download, parse and analyze genome variation data for regions of interest. Our pipeline combines popular sequence datasets and tree building algorithms with custom data parsing to generate accurate alignments and phylogenies using all the individuals from the 1000 Genomes Project, Neanderthal and Denisovan genomes, as well as reference genomes of Chimpanzee and Rhesus Macaque. It can also be applied to other phased human genomes, as well as genomes from other species. The output of our pipeline includes an alignment in FASTA format and a tree file in newick format. VCFtoTree fulfills the increasing demand for constructing alignments and phylogenies for a given loci from thousands of available genomes. Our software provides a user friendly interface for a wider audience without prerequisite knowledge in programming. VCFtoTree can be accessed from https://github.com/duoduoo/VCFtoTree_3.0.0 .
Directory of Open Access Journals (Sweden)
Trefz Florian M
2012-12-01
Full Text Available Abstract Background The aim of the present prospective study was to investigate whether a decision tree based on basic clinical signs could be used to determine the treatment of metabolic acidosis in calves successfully without expensive laboratory equipment. A total of 121 calves with a diagnosis of neonatal diarrhea admitted to a veterinary teaching hospital were included in the study. The dosages of sodium bicarbonate administered followed simple guidelines based on the results of a previous retrospective analysis. Calves that were neither dehydrated nor assumed to be acidemic received an oral electrolyte solution. In cases in which intravenous correction of acidosis and/or dehydration was deemed necessary, the provided amount of sodium bicarbonate ranged from 250 to 750 mmol (depending on alterations in posture and infusion volumes from 1 to 6.25 liters (depending on the degree of dehydration. Individual body weights of calves were disregarded. During the 24 hour study period the investigator was blinded to all laboratory findings. Results After being lifted, many calves were able to stand despite base excess levels below −20 mmol/l. Especially in those calves, metabolic acidosis was undercorrected with the provided amount of 500 mmol sodium bicarbonate, which was intended for calves standing insecurely. In 13 calves metabolic acidosis was not treated successfully as defined by an expected treatment failure or a measured base excess value below −5 mmol/l. By contrast, 24 hours after the initiation of therapy, a metabolic alkalosis was present in 55 calves (base excess levels above +5 mmol/l. However, the clinical status was not affected significantly by the metabolic alkalosis. Conclusions Assuming re-evaluation of the calf after 24 hours, the tested decision tree can be recommended for the use in field practice with minor modifications. Calves that stand insecurely and are not able to correct their position if pushed
Sequence embedding for fast construction of guide trees for multiple sequence alignment
LENUS (Irish Health Repository)
Blackshields, Gordon
2010-05-14
Abstract Background The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N 2 for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. Conclusions We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http:\\/\\/www.clustal.org\\/mbed.tgz.
Scoring-and-unfolding trimmed tree assembler: concepts, constructs and comparisons.
Narzisi, Giuseppe; Mishra, Bud
2011-01-15
Mired by its connection to a well-known -complete combinatorial optimization problem-namely, the Shortest Common Superstring Problem (SCSP)-historically, the whole-genome sequence assembly (WGSA) problem has been assumed to be amenable only to greedy and heuristic methods. By placing efficiency as their first priority, these methods opted to rely only on local searches, and are thus inherently approximate, ambiguous or error prone, especially, for genomes with complex structures. Furthermore, since choice of the best heuristics depended critically on the properties of (e.g. errors in) the input data and the available long range information, these approaches hindered designing an error free WGSA pipeline. We dispense with the idea of limiting the solutions to just the approximated ones, and instead favor an approach that could potentially lead to an exhaustive (exponential-time) search of all possible layouts. Its computational complexity thus must be tamed through a constrained search (Branch-and-Bound) and quick identification and pruning of implausible overlays. For his purpose, such a method necessarily relies on a set of score functions (oracles) that can combine different structural properties (e.g. transitivity, coverage, physical maps, etc.). We give a detailed description of this novel assembly framework, referred to as Scoring-and-Unfolding Trimmed Tree Assembler (SUTTA), and present experimental results on several bacterial genomes using next-generation sequencing technology data. We also report experimental evidence that the assembly quality strongly depends on the choice of the minimum overlap parameter k. SUTTA's binaries are freely available to non-profit institutions for research and educational purposes at http://www.bioinformatics.nyu.edu.
Xiao, Hong; Lin, Xiao-ling; Dai, Xiang-yu; Gao, Li-dong; Chen, Bi-yun; Zhang, Xi-xing; Zhu, Pei-juan; Tian, Huai-yu
2012-05-01
To analyze the periodicity of pandemic influenza A (H1N1) in Changsha in year 2009 and its correlation with sensitive climatic factors. The information of 5439 cases of influenza A (H1N1) and synchronous meteorological data during the period between May 22th and December 31st in year 2009 (223 days in total) in Changsha city were collected. The classification and regression tree (CART) was employed to screen the sensitive climatic factors on influenza A (H1N1); meanwhile, cross wavelet transform and wavelet coherence analysis were applied to assess and compare the periodicity of the pandemic disease and its association with the time-lag phase features of the sensitive climatic factors. The results of CART indicated that the daily minimum temperature and daily absolute humidity were the sensitive climatic factors for the popularity of influenza A (H1N1) in Changsha. The peak of the incidence of influenza A (H1N1) was in the period between October and December (Median (M) = 44.00 cases per day), simultaneously the daily minimum temperature (M = 13°C) and daily absolute humidity (M = 6.69 g/m(3)) were relatively low. The results of wavelet analysis demonstrated that a period of 16 days was found in the epidemic threshold in Changsha, while the daily minimum temperature and daily absolute humidity were the relatively sensitive climatic factors. The number of daily reported patients was statistically relevant to the daily minimum temperature and daily absolute humidity. The frequency domain was mostly in the period of (16 ± 2) days. In the initial stage of the disease (from August 9th and September 8th), a 6-day lag was found between the incidence and the daily minimum temperature. In the peak period of the disease, the daily minimum temperature and daily absolute humidity were negatively relevant to the incidence of the disease. In the pandemic period, the incidence of influenza A (H1N1) showed periodic features; and the sensitive climatic factors did have a "driving
Directory of Open Access Journals (Sweden)
Sadegh Maleki
2013-11-01
Full Text Available The goal of this study was to present regression models for predicting resistance of joints made with screw and plywood members. Joint members were out of hardwood plywood that were 19 mm in thickness. Two types of screws including coarse and fine thread drywall screw with 3.5, 4 and 5mm in diameter and sheet metal screw with 4 and 5mm were used. Results have shown that withdrawal resistance of screw was increased by increasing of screws, diameter and penetrating depth. Joints fabricated with coarse thread drywall screws were higher than those of fine thread drywall screws. Finally, average joint withdrawal resistance of screwed could be predicted by means of the expressions Wc=2.127×D1.072×P0.520 for coarse thread drywall screws and Wf=1.377×D1.156×P0.581 for fine thread drywall screws by taking account the diameter and penetrating depth. The difference of the observed and predicted data showed that developed models have a good correlation with actual experimental measurements.
DEFF Research Database (Denmark)
Petersen, Mette Bisgaard; Tolver, Anders; Husted, Louise
2016-01-01
-off value of 7 mmol/L had a sensitivity of 0.66 and a specificity of 0.92 in predicting survival. In independent test data, the sensitivity was 0.69 and the specificity was 0.76. At the observed survival rate (38%), the optimal decision tree identified horses as non-survivors when the Lac at admission...... admitted with acute colitis (trees, as well as random...
Directory of Open Access Journals (Sweden)
Lucinda Pfalzer
2013-06-01
Full Text Available Background/Purpose: Over 1/3 of adults over age 65 experiences at least one fall each year. This pilot report uses a classification regression tree analysis (CART to model the outcomes for balance/risk of falls from the Gentiva® Safe Strides® Program (SSP. Methods/Outcomes: SSP is a home-based balance/fall prevention program designed to treat root causes of a patient
Czech Academy of Sciences Publication Activity Database
Hampl, V.; Pavlíček, Adam; Flegr, J.
2001-01-01
Roč. 51, - (2001), s. 731-735 ISSN 1466-5026 R&D Projects: GA MŠk VS96142 Grant - others:GA UK(XC) 107/1998 Keywords : FreeTree software * fingerprinting * Trichomonas Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 2.004, year: 2001
Spady, Richard; Stouli, Sami
2012-01-01
We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while avoiding the need for repairing the intersecting conditional quantile surfaces that quantile regression often produces in practice. Our approach introduces a mathematical programming characterization of conditional distribution f...
Darmayasa, J. B.; Wahyudin; Mulyana, T.
2018-01-01
Ethnomathematics may be the connecting bridge between culture and technology and arts. Therefore, the exploration of mathematics values that intersects with cultural anthropology should be significantly conducted. One case containing such issue is the construction of Traditional House of Saka Roras in Bali. Thus, this research aimed to explore the mathematic concept adopted in the construction of such traditional Bale (house) located in Songan Village, Kintamani, Bali. Specifically, this research also aimed to investigate the selection of linear regression coefficient for the saka (pillar) in the Bale. This research applied Embedded Mix-Method Design. Meanwhile, the data collection was conducted by interview, observation and measurement of pillars of 32 Bale Saka Roras. The result of this research revealed that the connection between the width and height of pillars was stated in the formula Y = 26,3 + 18,2X, where X acted as stimulus variable. The coefficient value amounted to 18.2 showed that most preceding architects in Songan Village were more likely to use 19 as the coefficient towards the pillar width than the other coefficients such as 17, 20 and 21 as mentioned in book/palm-leaf manuscript entitled Kosala-Kosali. The last but not least, the researchers also figured out that the pillar width depended on the length of the house-owner candidate’s index finger.
Shearman, Jeremy R; Sangsrakru, Duangjai; Jomchai, Nukoon; Ruang-Areerate, Panthita; Sonthirod, Chutima; Naktang, Chaiwat; Theerawattanasuk, Kanikar; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke
2015-01-01
Hevea brasiliensis, or rubber tree, is an important crop species that accounts for the majority of natural latex production. The rubber tree nuclear genome consists of 18 chromosomes and is roughly 2.15 Gb. The current rubber tree reference genome assembly consists of 1,150,326 scaffolds ranging from 200 to 531,465 bp and totalling 1.1 Gb. Only 143 scaffolds, totalling 7.6 Mb, have been placed into linkage groups. We have performed RNA-seq on 6 varieties of rubber tree to identify SNPs and InDels and used this information to perform target sequence enrichment and high throughput sequencing to genotype a set of SNPs in 149 rubber tree offspring from a cross between RRIM 600 and RRII 105 rubber tree varieties. We used this information to generate a linkage map allowing for the anchoring of 24,424 contigs from 3,009 scaffolds, totalling 115 Mb or 10.4% of the published sequence, into 18 linkage groups. Each linkage group contains between 319 and 1367 SNPs, or 60 to 194 non-redundant marker positions, and ranges from 156 to 336 cM in length. This linkage map includes 20,143 of the 69,300 predicted genes from rubber tree and will be useful for mapping studies and improving the reference genome assembly.
Leone, Robert Matthew
A search for vector-like quarks (VLQs) decaying to a Z boson using multi-stage machine learning was compared to a search using a standard square cuts search strategy. VLQs are predicted by several new theories beyond the Standard Model. The searches used 20.3 inverse femtobarns of proton-proton collisions at a center-of-mass energy of 8 TeV collected with the ATLAS detector in 2012 at the CERN Large Hadron Collider. CLs upper limits on production cross sections of vector-like top and bottom quarks were computed for VLQs produced singly or in pairs, Tsingle, Bsingle, Tpair, and Bpair. The two stage machine learning classification search strategy did not provide any improvement over the standard square cuts strategy, but for Tpair, Bpair, and Tsingle, a third stage of machine learning regression was able to lower the upper limits of high signal masses by as much as 50%. Additionally, new test statistics were developed for use in the Neyman construction of confidence regions in order to address deficiencies in c...
Lindsay K. Campbell
2014-01-01
In 2005-2006, bureaucrats at the New York City Department of Parks and Recreation (DPR) began to marshal quantitative evidence to argue for investment in tree planting as part of Mayor Bloomberg's long-term sustainability plan, PlaNYC 2030, launched in 2007. Concurrently, Bette Midlerthe celebrity founder of the non-profit New York Restoration Project (...
Smits, M.; Janssen, J.; Vet, de H.C.W.; Zwaan, L.; Timmermans, D.R.M.; Groenewegen, P.P.; Wagner, C.
2009-01-01
BACKGROUND: Root cause analysis is a method to examine causes of unintended events. PRISMA (Prevention and Recovery Information System for Monitoring and Analysis: is a root cause analysis tool. With PRISMA, events are described in causal trees and root causes are subsequently classified with the
Smits, M.; Janssen, J.; Vet, R. de; Zwaan, L.; Groenewegen, P.P.; Timmermans, D.
2009-01-01
Background. Root cause analysis is a method to examine causes of unintended events. PRISMA (Prevention and Recovery Information System for Monitoring and Analysis) is a root cause analysis tool. With PRISMA, events are described in causal trees and root causes are subsequently classified with the
Smits, M.; Janssen, J.; Vet, R. de; Zwaan, L.; Timmermans, D.; Groenewegen, P.; Wagner, C.
2009-01-01
Background: Root cause analysis is a method to examine causes of unintended events. PRISMA (Prevention and Recovery Information System for Monitoring and Analysis) is a root cause analysis tool. With PRISMA, events are described in causal trees and root causes are subsequently classified with the
2013-01-01
Background Measures used for medical student selection should predict future performance during training. A problem for any selection study is that predictor-outcome correlations are known only in those who have been selected, whereas selectors need to know how measures would predict in the entire pool of applicants. That problem of interpretation can be solved by calculating construct-level predictive validity, an estimate of true predictor-outcome correlation across the range of applicant abilities. Methods Construct-level predictive validities were calculated in six cohort studies of medical student selection and training (student entry, 1972 to 2009) for a range of predictors, including A-levels, General Certificates of Secondary Education (GCSEs)/O-levels, and aptitude tests (AH5 and UK Clinical Aptitude Test (UKCAT)). Outcomes included undergraduate basic medical science and finals assessments, as well as postgraduate measures of Membership of the Royal Colleges of Physicians of the United Kingdom (MRCP(UK)) performance and entry in the Specialist Register. Construct-level predictive validity was calculated with the method of Hunter, Schmidt and Le (2006), adapted to correct for right-censorship of examination results due to grade inflation. Results Meta-regression analyzed 57 separate predictor-outcome correlations (POCs) and construct-level predictive validities (CLPVs). Mean CLPVs are substantially higher (.450) than mean POCs (.171). Mean CLPVs for first-year examinations, were high for A-levels (.809; CI: .501 to .935), and lower for GCSEs/O-levels (.332; CI: .024 to .583) and UKCAT (mean = .245; CI: .207 to .276). A-levels had higher CLPVs for all undergraduate and postgraduate assessments than did GCSEs/O-levels and intellectual aptitude tests. CLPVs of educational attainment measures decline somewhat during training, but continue to predict postgraduate performance. Intellectual aptitude tests have lower CLPVs than A-levels or GCSEs
International Nuclear Information System (INIS)
Haasl, D.F.; Roberts, N.H.; Vesely, W.E.; Goldberg, F.F.
1981-01-01
This handbook describes a methodology for reliability analysis of complex systems such as those which comprise the engineered safety features of nuclear power generating stations. After an initial overview of the available system analysis approaches, the handbook focuses on a description of the deductive method known as fault tree analysis. The following aspects of fault tree analysis are covered: basic concepts for fault tree analysis; basic elements of a fault tree; fault tree construction; probability, statistics, and Boolean algebra for the fault tree analyst; qualitative and quantitative fault tree evaluation techniques; and computer codes for fault tree evaluation. Also discussed are several example problems illustrating the basic concepts of fault tree construction and evaluation
Directory of Open Access Journals (Sweden)
Drzewiecki Wojciech
2016-12-01
Full Text Available In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques.
Zhang, Hongyang; Welch, William J.; Zamar, Ruben H.
2017-01-01
Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensi...
Directory of Open Access Journals (Sweden)
Robin Roj
2014-07-01
Full Text Available This paper presents three different search engines for the detection of CAD-parts in large databases. The analysis of the contained information is performed by the export of the data that is stored in the structure trees of the CAD-models. A preparation program generates one XML-file for every model, which in addition to including the data of the structure tree, also owns certain physical properties of each part. The first search engine is specializes in the discovery of standard parts, like screws or washers. The second program uses certain user input as search parameters, and therefore has the ability to perform personalized queries. The third one compares one given reference part with all parts in the database, and locates files that are identical, or similar to, the reference part. All approaches run automatically, and have the analysis of the structure tree in common. Files constructed with CATIA V5, and search engines written with Python have been used for the implementation. The paper also includes a short comparison of the advantages and disadvantages of each program, as well as a performance test.
DEFF Research Database (Denmark)
Finbow, Arthur; Frendrup, Allan; Vestergaard, Preben D.
cardinality then G is a total well dominated graph. In this paper we study composition and decomposition of total well dominated trees. By a reversible process we prove that any total well dominated tree can both be reduced to and constructed from a family of three small trees....
Alternative Methods of Regression
Birkes, David
2011-01-01
Of related interest. Nonlinear Regression Analysis and its Applications Douglas M. Bates and Donald G. Watts ".an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models.highly recommend[ed].for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data s
The Deinococcus-Thermus phylum and the effect of rRNA composition on phylogenetic tree construction
Weisburg, W. G.; Giovannoni, S. J.; Woese, C. R.
1989-01-01
Through comparative analysis of 16S ribosomal RNA sequences, it can be shown that two seemingly dissimilar types of eubacteria Deinococcus and the ubiquitous hot spring organism Thermus are distantly but specifically related to one another. This confirms an earlier report based upon 16S rRNA oligonucleotide cataloging studies (Hensel et al., 1986). Their two lineages form a distinctive grouping within the eubacteria that deserved the taxonomic status of a phylum. The (partial) sequence of T. aquaticus rRNA appears relatively close to those of other thermophilic eubacteria. e.g. Thermotoga maritima and Thermomicrobium roseum. However, this closeness does not reflect a true evolutionary closeness; rather it is due to a "thermophilic convergence", the result of unusually high G+C composition in the rRNAs of thermophilic bacteria. Unless such compositional biases are taken into account, the branching order and root of phylogenetic trees can be incorrectly inferred.
Directory of Open Access Journals (Sweden)
Javier Trujillano
2008-02-01
Full Text Available Objetivo: : Realizar una aproximación a la metodología de árboles de decisión tipo CART (Classification and Regression Trees desarrollando un modelo para calcular la probabilidad de muerte hospitalaria en infarto agudo de miocardio (IAM. Método: Se utiliza el conjunto mínimo básico de datos al alta hospitalaria (CMBD de Andalucía, Cataluña, Madrid y País Vasco de los años 2001 y 2002, que incluye los casos con IAM como diagnóstico principal. Los 33.203 pacientes se dividen aleatoriamente (70 y 30 % en grupo de desarrollo (GD = 23.277 y grupo de validación (GV = 9.926. Como CART se utiliza un modelo inductivo basado en el algoritmo de Breiman, con análisis de sensibilidad mediante el índice de Gini y sistema de validación cruzada. Se compara con un modelo de regresión logística (RL y una red neuronal artificial (RNA (multilayer perceptron. Los modelos desarrollados se contrastan en el GV y sus propiedades se comparan con el área bajo la curva ROC (ABC (intervalo de confianza del 95%. Resultados: En el GD el CART con ABC = 0,85 (0,86-0,88, RL 0,87 (0,86-0,88 y RNA 0,85 (0,85-0,86. En el GV el CART con ABC = 0,85 (0,85-0,88, RL 0,86 (0,85-0,88 y RNA 0,84 (0,83-0,86. Conclusiones: Los 3 modelos obtienen resultados similares en su capacidad de discriminación. El modelo CART ofrece como ventaja su simplicidad de uso y de interpretación, ya que las reglas de decisión que generan pueden aplicarse sin necesidad de procesos matemáticos.Objective: To provide an overview of decision trees based on CART (Classification and Regression Trees methodology. As an example, we developed a CART model intended to estimate the probability of intrahospital death from acute myocardial infarction (AMI. Method: We employed the minimum data set (MDS of Andalusia, Catalonia, Madrid and the Basque Country (2001-2002, which included 33,203 patients with a diagnosis of AMI. The 33,203 patients were randomly divided (70% and 30% into the development (DS
A bijection between phylogenetic trees and plane oriented recursive trees
Prodinger, Helmut
2017-01-01
Phylogenetic trees are binary nonplanar trees with labelled leaves, and plane oriented recursive trees are planar trees with an increasing labelling. Both families are enumerated by double factorials. A bijection is constructed, using the respective representations a 2-partitions and trapezoidal words.
Matson, Johnny L.; Kozlowski, Alison M.
2010-01-01
Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…
Directory of Open Access Journals (Sweden)
Abraham Pouliakis
2015-01-01
Full Text Available Objective. Nowadays numerous ancillary techniques detecting HPV DNA and mRNA compete with cytology; however no perfect test exists; in this study we evaluated classification and regression trees (CARTs for the production of triage rules and estimate the risk for cervical intraepithelial neoplasia (CIN in cases with ASCUS+ in cytology. Study Design. We used 1625 cases. In contrast to other approaches we used missing data to increase the data volume, obtain more accurate results, and simulate real conditions in the everyday practice of gynecologic clinics and laboratories. The proposed CART was based on the cytological result, HPV DNA typing, HPV mRNA detection based on NASBA and flow cytometry, p16 immunocytochemical expression, and finally age and parous status. Results. Algorithms useful for the triage of women were produced; gynecologists could apply these in conjunction with available examination results and conclude to an estimation of the risk for a woman to harbor CIN expressed as a probability. Conclusions. The most important test was the cytological examination; however the CART handled cases with inadequate cytological outcome and increased the diagnostic accuracy by exploiting the results of ancillary techniques even if there were inadequate missing data. The CART performance was better than any other single test involved in this study.
Pouliakis, Abraham; Karakitsou, Efrossyni; Chrelias, Charalampos; Pappas, Asimakis; Panayiotides, Ioannis; Valasoulis, George; Kyrgiou, Maria; Paraskevaidis, Evangelos; Karakitsos, Petros
2015-01-01
Nowadays numerous ancillary techniques detecting HPV DNA and mRNA compete with cytology; however no perfect test exists; in this study we evaluated classification and regression trees (CARTs) for the production of triage rules and estimate the risk for cervical intraepithelial neoplasia (CIN) in cases with ASCUS+ in cytology. We used 1625 cases. In contrast to other approaches we used missing data to increase the data volume, obtain more accurate results, and simulate real conditions in the everyday practice of gynecologic clinics and laboratories. The proposed CART was based on the cytological result, HPV DNA typing, HPV mRNA detection based on NASBA and flow cytometry, p16 immunocytochemical expression, and finally age and parous status. Algorithms useful for the triage of women were produced; gynecologists could apply these in conjunction with available examination results and conclude to an estimation of the risk for a woman to harbor CIN expressed as a probability. The most important test was the cytological examination; however the CART handled cases with inadequate cytological outcome and increased the diagnostic accuracy by exploiting the results of ancillary techniques even if there were inadequate missing data. The CART performance was better than any other single test involved in this study.
Directory of Open Access Journals (Sweden)
Santana Isabel
2011-08-01
Full Text Available Abstract Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI, but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p Conclusions When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing.
Olive, David J
2017-01-01
This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
Directory of Open Access Journals (Sweden)
Ziping WANG
2011-09-01
Full Text Available Background and objective It has been proven that gefitinib produces only 10%-20% tumor regression in heavily pretreated, unselected non-small cell lung cancer (NSCLC patients as the second- and third-line setting. Asian, female, nonsmokers and adenocarcinoma are favorable factors; however, it is difficult to find a patient satisfying all the above clinical characteristics. The aim of this study is to identify novel predicting factors, and to explore the interactions between clinical variables and their impact on the survival of Chinese patients with advanced NSCLC who were heavily treated with gefitinib in the second- or third-line setting. Methods The clinical and follow-up data of 127 advanced NSCLC patients referred to the Cancer Hospital & Institute, Chinese Academy of Medical Sciences from March 2005 to March 2010 were analyzed. Multivariate analysis of progression-free survival (PFS was performed using recursive partitioning, which is referred to as the classification and regression tree (CART analysis. Results The median PFS of 127 eligible consecutive advanced NSCLC patients was 8.0 months (95%CI: 5.8-10.2. CART was performed with an initial split on first-line chemotherapy outcomes and a second split on patients’ age. Three terminal subgroups were formed. The median PFS of the three subsets ranged from 1.0 month (95%CI: 0.8-1.2 for those with progressive disease outcome after the first-line chemotherapy subgroup, 10 months (95%CI: 7.0-13.0 in patients with a partial response or stable disease in first-line chemotherapy and age <70, and 22.0 months for patients obtaining a partial response or stable disease in first-line chemotherapy at age 70-81 (95%CI: 3.8-40.1. Conclusion Partial response, stable disease in first-line chemotherapy and age ≥ 70 are closely correlated with long-term survival treated by gefitinib as a second- or third-line setting in advanced NSCLC. CART can be used to identify previously unappreciated patient
Favre, Charles
2004-01-01
This volume is devoted to a beautiful object, called the valuative tree and designed as a powerful tool for the study of singularities in two complex dimensions. Its intricate yet manageable structure can be analyzed by both algebraic and geometric means. Many types of singularities, including those of curves, ideals, and plurisubharmonic functions, can be encoded in terms of positive measures on the valuative tree. The construction of these measures uses a natural tree Laplace operator of independent interest.
International Nuclear Information System (INIS)
Bass, L.; Wynholds, H.W.; Porterfield, W.R.
1975-01-01
Described is an operational system that enables the user, through an intelligent graphics terminal, to construct, modify, analyze, and store fault trees. With this system, complex engineering designs can be analyzed. This paper discusses the system and its capabilities. Included is a brief discussion of fault tree analysis, which represents an aspect of reliability and safety modeling
Tani, Yuji; Ogasawara, Katsuhiko
2012-01-01
This study aimed to contribute to the management of a healthcare organization by providing management information using time-series analysis of business data accumulated in the hospital information system, which has not been utilized thus far. In this study, we examined the performance of the prediction method using the auto-regressive integrated moving-average (ARIMA) model, using the business data obtained at the Radiology Department. We made the model using the data used for analysis, which was the number of radiological examinations in the past 9 years, and we predicted the number of radiological examinations in the last 1 year. Then, we compared the actual value with the forecast value. We were able to establish that the performance prediction method was simple and cost-effective by using free software. In addition, we were able to build the simple model by pre-processing the removal of trend components using the data. The difference between predicted values and actual values was 10%; however, it was more important to understand the chronological change rather than the individual time-series values. Furthermore, our method was highly versatile and adaptable compared to the general time-series data. Therefore, different healthcare organizations can use our method for the analysis and forecasting of their business data.
Productivity of the supply system based on whole-tree bundling
Energy Technology Data Exchange (ETDEWEB)
Laitila, J. (Finnish Forest Research Inst., Joensuu (Finland)), Email: juha.laitila@metla.fi; Jylhae, P. (Finnish Forest Research Inst., Kannus (Finland)), Email: paula.jylha@metla.fi; Kaerhae, K. (Metsaeteho Oy, Helsinki (Finland)), Email: kalle.karha@metsateho.fi
2009-07-01
In the present study, time consumption models for bundle harvesting and forwarding were created by applying regression analyses. The time studies related to on-road transportation were created by applying regression analyses. The time studies related to on-road transportation were focused on comparing the terminal times spent on handling of whole-tree bundles and conventional 5-m pulpwood. The number of whole-tree bundles per truck load and the weights of the payloads were also recorded. The forwarding productivity of whole-tree bundles was about double compared to conventional pulpwood and whole-trees. In on-road transportation, the mean loading and unloading time of whole-tree bundles per truck load was 46 % higher compared to that of conventional 5-m pulpwood. The second prototype of the bundle harvester is under construction, and the time studies are to be continued after accomplishing the machine in the autumn 2009. (orig.)
International Nuclear Information System (INIS)
Balasubramanian, K.
1982-01-01
A method is developed for obtaining the spectra of trees of NMR and chemical interests. The characteristic polynomials of branched trees can be obtained in terms of the characteristic polynomials of unbranched trees and branches by pruning the tree at the joints. The unbranched trees can also be broken down further until a tree containing just two vertices is obtained. The effectively reduces the order of the secular determinant of the tree used at the beginning to determinants of orders atmost equal to the number of vertices in the branch containing the largest number of vertices. An illustrative example of a NMR graph is given for which the 22 x 22 secular determinant is reduced to determinants of orders atmost 4 x 4 in just the second step of the algorithm. The tree pruning algorithm can be applied even to trees with no symmetry elements and such a factoring can be achieved. Methods developed here can be elegantly used to find if two trees are cospectral and to construct cospectral trees
Combining Alphas via Bounded Regression
Directory of Open Access Journals (Sweden)
Zura Kakushadze
2015-11-01
Full Text Available We give an explicit algorithm and source code for combining alpha streams via bounded regression. In practical applications, typically, there is insufficient history to compute a sample covariance matrix (SCM for a large number of alphas. To compute alpha allocation weights, one then resorts to (weighted regression over SCM principal components. Regression often produces alpha weights with insufficient diversification and/or skewed distribution against, e.g., turnover. This can be rectified by imposing bounds on alpha weights within the regression procedure. Bounded regression can also be applied to stock and other asset portfolio construction. We discuss illustrative examples.
Rectilinear Full Steiner Tree Generation
DEFF Research Database (Denmark)
Zachariasen, Martin
1999-01-01
The fastest exact algorithm (in practice) for the rectilinear Steiner tree problem in the plane uses a two-phase scheme: First, a small but sufficient set of full Steiner trees (FSTs) is generated and then a Steiner minimum tree is constructed from this set by using simple backtrack search, dynamic...
Standards for Standardized Logistic Regression Coefficients
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Introduction to fault tree analysis
International Nuclear Information System (INIS)
Barlow, R.E.; Lambert, H.E.
1975-01-01
An elementary, engineering oriented introduction to fault tree analysis is presented. The basic concepts, techniques and applications of fault tree analysis, FTA, are described. The two major steps of FTA are identified as (1) the construction of the fault tree and (2) its evaluation. The evaluation of the fault tree can be qualitative or quantitative depending upon the scope, extensiveness and use of the analysis. The advantages, limitations and usefulness of FTA are discussed
Energy Technology Data Exchange (ETDEWEB)
Briand, B
2008-04-15
The objective of this thesis is the development of a method allowing the identification of factors leading to various radioactive contamination levels of the plants. The methodology suggested is based on the use of a radioecological transfer model of the radionuclides through the environment (A.S.T.R.A.L. computer code) and a classification-tree method. Particularly, to avoid the instability problems of classification trees and to preserve the tree structure, a node level stabilizing technique is used. Empirical comparisons are carried out between classification trees built by this method (called R.E.N. method) and those obtained by the C.A.R.T. method. A similarity measure is defined to compare the structure of two classification trees. This measure is used to study the stabilizing performance of the R.E.N. method. The methodology suggested is applied to a simplified contamination scenario. By the results obtained, we can identify the main variables responsible of the various radioactive contamination levels of four leafy-vegetables (lettuce, cabbage, spinach and leek). Some extracted rules from these classification trees can be usable in a post-accidental context. (author)
DEFF Research Database (Denmark)
Jaeger, Manfred
2006-01-01
We introduce type extension trees as a formal representation language for complex combinatorial features of relational data. Based on a very simple syntax this language provides a unified framework for expressing features as diverse as embedded subgraphs on the one hand, and marginal counts...... of attribute values on the other. We show by various examples how many existing relational data mining techniques can be expressed as the problem of constructing a type extension tree and a discriminant function....
DIF Trees: Using Classification Trees to Detect Differential Item Functioning
Vaughn, Brandon K.; Wang, Qiu
2010-01-01
A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…
Differentiating regressed melanoma from regressed lichenoid keratosis.
Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A
2017-04-01
Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…
ON REGRESSION REPRESENTATIONS OF STOCHASTIC-PROCESSES
RUSCHENDORF, L; DEVALK, [No Value
We construct a.s. nonlinear regression representations of general stochastic processes (X(n))n is-an-element-of N. As a consequence we obtain in particular special regression representations of Markov chains and of certain m-dependent sequences. For m-dependent sequences we obtain a constructive
Indian Academy of Sciences (India)
Flowering Trees. Boswellia serrata Roxb. ex Colebr. (Indian Frankincense tree) of Burseraceae is a large-sized deciduous tree that is native to India. Bark is thin, greenish-ash-coloured that exfoliates into smooth papery flakes. Stem exudes pinkish resin ... Fruit is a three-valved capsule. A green gum-resin exudes from the ...
Indian Academy of Sciences (India)
IAS Admin
Flowering Trees. Ailanthus excelsa Roxb. (INDIAN TREE OF. HEAVEN) of Simaroubaceae is a lofty tree with large pinnately compound alternate leaves, which are ... inflorescences, unisexual and greenish-yellow. Fruits are winged, wings many-nerved. Wood is used in making match sticks. 1. Male flower; 2. Female flower.
Indian Academy of Sciences (India)
Flowering Trees. Gyrocarpus americanus Jacq. (Helicopter Tree) of Hernandiaceae is a moderate size deciduous tree that grows to about 12 m in height with a smooth, shining, greenish-white bark. The leaves are ovate, rarely irregularly ... flowers which are unpleasant smelling. Fruit is a woody nut with two long thin wings.
Indian Academy of Sciences (India)
More Details Fulltext PDF. Volume 8 Issue 8 August 2003 pp 112-112 Flowering Trees. Zizyphus jujuba Lam. of Rhamnaceae · More Details Fulltext PDF. Volume 8 Issue 9 September 2003 pp 97-97 Flowering Trees. Moringa oleifera · More Details Fulltext PDF. Volume 8 Issue 10 October 2003 pp 100-100 Flowering Trees.
Directory of Open Access Journals (Sweden)
V. I. Polyakov
2014-10-01
Full Text Available In 2001 six permanent sample plots (PSP were established in forest stands differing in degrees of damage by pollution from the Norilsk industrial region. In 2004 the second forest inventory was carried out at these PSP for evaluation of pollutant impacts on stand condition changes. During both inventory procedures the vigor state of every tree was visually categorized according to 6-points scale of «Forest health regulations in Russian Federation». The changeover of tree into fall was also taken into account. Two types of Markov’s models simulating thinning process in tree stands within different ecological conditions has been developed: 1 based on assessment for probability of tree survival during three years; 2 in terms of evaluation of matrix for probability on change of vigor state category in the same period. The reconstruction of tree mortality from 1979 after industrial complex «Nadezda» setting into operation was realized on the basis of probability estimation of dead standing trees conservation during three years observed. The forecast of situation was carried out up to 2030. Using logistic regression the probability of tree survival was established depending on four factors: degree of tree damage by pollutants, tree species, stand location in relief and tree age. The acquired results make it possible to single out an impact of pollutants to tree stands’ resistance from other factors. There was revealed the percent of tree fall, resulted by pollution. The evaluation scale of SO2 gas resistance of tree species was constructed: birch, spruce, larch. Larch showed the highest percent of fall because of pollution.
Comparison of Greedy Algorithms for Decision Tree Optimization
Alkhalid, Abdulaziz; Chikalov, Igor; Moshkov, Mikhail
2013-01-01
This chapter is devoted to the study of 16 types of greedy algorithms for decision tree construction. The dynamic programming approach is used for construction of optimal decision trees. Optimization is performed relative to minimal values
Tree compression with top trees
DEFF Research Database (Denmark)
Bille, Philip; Gørtz, Inge Li; Landau, Gad M.
2013-01-01
We introduce a new compression scheme for labeled trees based on top trees [3]. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast...
Tree compression with top trees
DEFF Research Database (Denmark)
Bille, Philip; Gørtz, Inge Li; Landau, Gad M.
2015-01-01
We introduce a new compression scheme for labeled trees based on top trees. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast...
Encoding phylogenetic trees in terms of weighted quartets.
Grünewald, Stefan; Huber, Katharina T; Moulton, Vincent; Semple, Charles
2008-04-01
One of the main problems in phylogenetics is to develop systematic methods for constructing evolutionary or phylogenetic trees. For a set of species X, an edge-weighted phylogenetic X-tree or phylogenetic tree is a (graph theoretical) tree with leaf set X and no degree 2 vertices, together with a map assigning a non-negative length to each edge of the tree. Within phylogenetics, several methods have been proposed for constructing such trees that work by trying to piece together quartet trees on X, i.e. phylogenetic trees each having four leaves in X. Hence, it is of interest to characterise when a collection of quartet trees corresponds to a (unique) phylogenetic tree. Recently, Dress and Erdös provided such a characterisation for binary phylogenetic trees, that is, phylogenetic trees all of whose internal vertices have degree 3. Here we provide a new characterisation for arbitrary phylogenetic trees.
Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen
2017-10-11
Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. Bayesian approach is utilized to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fitting. Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings comparing to existing methods. We also demonstrate an application of our method in a real clinical trial.
QuickJoin—Fast Neighbour-Joining Tree Reconstruction
DEFF Research Database (Denmark)
Mailund; Pedersen, Christian N. Storm
2004-01-01
We have built a tool for fast construction of very large phylogenetic trees. The tool uses heuristics for speeding up the neighbour-joining algorithm—while still constructing the same tree as the original neighbour-joining algorithm—making it possible to construct trees for ~8000 species in less...
Recursive Trees for Practical ORAM
Directory of Open Access Journals (Sweden)
Moataz Tarik
2015-06-01
Full Text Available We present a new, general data structure that reduces the communication cost of recent tree-based ORAMs. Contrary to ORAM trees with constant height and path lengths, our new construction r-ORAM allows for trees with varying shorter path length. Accessing an element in the ORAM tree results in different communication costs depending on the location of the element. The main idea behind r-ORAM is a recursive ORAM tree structure, where nodes in the tree are roots of other trees. While this approach results in a worst-case access cost (tree height at most as any recent tree-based ORAM, we show that the average cost saving is around 35% for recent binary tree ORAMs. Besides reducing communication cost, r-ORAM also reduces storage overhead on the server by 4% to 20% depending on the ORAM’s client memory type. To prove r-ORAM’s soundness, we conduct a detailed overflow analysis. r-ORAM’s recursive approach is general in that it can be applied to all recent tree ORAMs, both constant and poly-log client memory ORAMs. Finally, we implement and benchmark r-ORAM in a practical setting to back up our theoretical claims.
Mathematical foundations of event trees
International Nuclear Information System (INIS)
Papazoglou, Ioannis A.
1998-01-01
A mathematical foundation from first principles of event trees is presented. The main objective of this formulation is to offer a formal basis for developing automated computer assisted construction techniques for event trees. The mathematical theory of event trees is based on the correspondence between the paths of the tree and the elements of the outcome space of a joint event. The concept of a basic cylinder set is introduced to describe joint event outcomes conditional on specific outcomes of basic events or unconditional on the outcome of basic events. The concept of outcome space partition is used to describe the minimum amount of information intended to be preserved by the event tree representation. These concepts form the basis for an algorithm for systematic search for and generation of the most compact (reduced) form of an event tree consistent with the minimum amount of information the tree should preserve. This mathematical foundation allows for the development of techniques for automated generation of event trees corresponding to joint events which are formally described through other types of graphical models. Such a technique has been developed for complex systems described by functional blocks and it is reported elsewhere. On the quantification issue of event trees, a formal definition of a probability space corresponding to the event tree outcomes is provided. Finally, a short discussion is offered on the relationship of the presented mathematical theory with the more general use of event trees in reliability analysis of dynamic systems
FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix
Price, Morgan N.; Dehal, Paramvir S.; Arkin, Adam P.
2009-01-01
Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor in...
López-Ochoa, L A; Apolinar-Hernández, M M; Peña-Ramírez, Y J
2015-02-20
The forest tree Spanish cedar (Cedrela odorata L.) is well-known for its high-value timber; however, this species is attacked by the shoot borer (Hypsipyla grandella) during its early years of development, resulting in branched stems and making the plants useless for high-quality wood production. The generation of resistant varieties expressing entomotoxic proteins may be an alternative to pesticide treatments. The use of plastid transformation rather than nuclear transformation should be used because it reduces the risk of transgene dissemination by pollen. Chloroplast transformation vectors require an expression cassette flanked by homologous plastid sequences to drive plastome recombination. Thus, C. odorata plastome sequences are a prerequisite. The rrn16-rrn23 plastome region was selected, cloned, and characterized. When the sequence identity among the rrn16-rrn23 regions from C. odorata and Nicotiana tabacum was compared, 3 inDels of 240, 104, and 39 bp were found that might severely affect transformation efficiency. Using this region, a new transformation vector was developed using pUC19 as a backbone by inserting the rrn16-trnI and trnA-rrn23 sequences from C. odorata and adding 2 independent expression cassettes into the trnI-trnA intergenic region, conferring spectinomycin resistance, the ability to express the gfp reporter gene, and a site that can be used to express any other gene of interest.
DEFF Research Database (Denmark)
Johansen, Søren
2008-01-01
The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...
... Blog Vision Awards Common Allergens Tree Nut Allergy Tree Nut Allergy Learn about tree nut allergy, how ... a Tree Nut Label card . Allergic Reactions to Tree Nuts Tree nuts can cause a severe and ...
Regression filter for signal resolution
International Nuclear Information System (INIS)
Matthes, W.
1975-01-01
The problem considered is that of resolving a measured pulse height spectrum of a material mixture, e.g. gamma ray spectrum, Raman spectrum, into a weighed sum of the spectra of the individual constituents. The model on which the analytical formulation is based is described. The problem reduces to that of a multiple linear regression. A stepwise linear regression procedure was constructed. The efficiency of this method was then tested by transforming the procedure in a computer programme which was used to unfold test spectra obtained by mixing some spectra, from a library of arbitrary chosen spectra, and adding a noise component. (U.K.)
The stopping rules for winsorized tree
Ch'ng, Chee Keong; Mahat, Nor Idayu
2017-11-01
Winsorized tree is a modified tree-based classifier that is able to investigate and to handle all outliers in all nodes along the process of constructing the tree. It overcomes the tedious process of constructing a classical tree where the splitting of branches and pruning go concurrently so that the constructed tree would not grow bushy. This mechanism is controlled by the proposed algorithm. In winsorized tree, data are screened for identifying outlier. If outlier is detected, the value is neutralized using winsorize approach. Both outlier identification and value neutralization are executed recursively in every node until predetermined stopping criterion is met. The aim of this paper is to search for significant stopping criterion to stop the tree from further splitting before overfitting. The result obtained from the conducted experiment on pima indian dataset proved that the node could produce the final successor nodes (leaves) when it has achieved the range of 70% in information gain.
Indian Academy of Sciences (India)
medium-sized handsome tree with a straight bole that branches at the top. Leaves are once pinnate, with two to three pairs of leaflets. Young parts of the tree are velvety. Inflorescence is a branched raceme borne at the branch ends. Flowers are large, white, attractive, and fragrant. Corolla is funnel-shaped. Fruit is an ...
Indian Academy of Sciences (India)
Cassia siamia Lamk. (Siamese tree senna) of Caesalpiniaceae is a small or medium size handsome tree. Leaves are alternate, pinnately compound and glandular, upto 18 cm long with 8–12 pairs of leaflets. Inflorescence is axillary or terminal and branched. Flowering lasts for a long period from March to February. Fruit is ...
Indian Academy of Sciences (India)
Flowering Trees. Cerbera manghasL. (SEA MANGO) of Apocynaceae is a medium-sized evergreen coastal tree with milky latex. The bark is grey-brown, thick and ... Fruit is large. (5–10 cm long), oval containing two flattened seeds and resembles a mango, hence the name Mangas or. Manghas. Leaves and fruits contain ...
Indian Academy of Sciences (India)
user
Flowering Trees. Gliricidia sepium(Jacq.) Kunta ex Walp. (Quickstick) of Fabaceae is a small deciduous tree with. Pinnately compound leaves. Flower are prroduced in large number in early summer on terminal racemes. They are attractive, pinkish-white and typically like bean flowers. Fruit is a few-seeded flat pod.
Indian Academy of Sciences (India)
Flowering Trees. Acrocarpus fraxinifolius Wight & Arn. (PINK CEDAR, AUSTRALIAN ASH) of. Caesalpiniaceae is a lofty unarmed deciduous native tree that attains a height of 30–60m with buttresses. Bark is thin and light grey. Leaves are compound and bright red when young. Flowers in dense, erect, axillary racemes.
Tolman, Marvin
2005-01-01
Students love outdoor activities and will love them even more when they build confidence in their tree identification and measurement skills. Through these activities, students will learn to identify the major characteristics of trees and discover how the pace--a nonstandard measuring unit--can be used to estimate not only distances but also the…
DEFF Research Database (Denmark)
Halkjær From, Andreas; Schlichtkrull, Anders; Villadsen, Jørgen
2018-01-01
We formally prove in Isabelle/HOL two properties of an algorithm for laying out trees visually. The first property states that removing layout annotations recovers the original tree. The second property states that nodes are placed at least a unit of distance apart. We have yet to formalize three...
Indian Academy of Sciences (India)
Srimath
Grevillea robusta A. Cunn. ex R. Br. (Sil- ver Oak) of Proteaceae is a daintily lacy ornamental tree while young and growing into a mighty tree (45 m). Young shoots are silvery grey and the leaves are fern- like. Flowers are golden-yellow in one- sided racemes (10 cm). Fruit is a boat- shaped, woody follicle.
Development of tree hollows in pedunculate oak (Quercus robur)
Ranius, Thomas; Niklasson, Mats; Berg, Niclas
2009-01-01
Many invertebrates, birds and mammals are dependent on hollow trees. For landscape planning that aims at persistence of species inhabiting hollow trees it is crucial to understand the development of such trees. In this study we constructed an individual-based simulation model to predict diameter distribution and formation of hollows in oak tree populations. Based on tree-ring data from individual trees, we estimated the ages when hollow formation commences for pedunculate oak (Quercus robur) ...
Piedrahita, Ricardo A.
The Denver Aerosol Sources and Health study (DASH) was a long-term study of the relationship between the variability in fine particulate mass and chemical constituents (PM2.5, particulate matter less than 2.5mum) and adverse health effects such as cardio-respiratory illnesses and mortality. Daily filter samples were chemically analyzed for multiple species. We present findings based on 2.8 years of DASH data, from 2003 to 2005. Multilinear Engine 2 (ME-2), a receptor-based source apportionment model was applied to the data to estimate source contributions to PM2.5 mass concentrations. This study relied on two different ME-2 models: (1) a 2-way model that closely reflects PMF-2; and (2) an enhanced model with meteorological data that used additional temporal and meteorological factors. The Coarse Rural Urban Sources and Health study (CRUSH) is a long-term study of the relationship between the variability in coarse particulate mass (PMcoarse, particulate matter between 2.5 and 10mum) and adverse health effects such as cardio-respiratory illnesses, pre-term births, and mortality. Hourly mass concentrations of PMcoarse and fine particulate matter (PM2.5) are measured using tapered element oscillating microbalances (TEOMs) with Filter Dynamics Measurement Systems (FDMS), at two rural and two urban sites. We present findings based on nine months of mass concentration data, including temporal trends, and non-parametric regressions (NPR) results, which were used to characterize the wind speed and wind direction relationships that might point to sources. As part of CRUSH, 1-year coarse and fine mode particulate matter filter sampling network, will allow us to characterize the chemical composition of the particulate matter collected and perform spatial comparisons. This work describes the construction and validation testing of four dichotomous filter samplers for this purpose. The use of dichotomous splitters with an approximate 2.5mum cut point, coupled with a 10mum cut
Regression analysis by example
Chatterjee, Samprit
2012-01-01
Praise for the Fourth Edition: ""This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable."" -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
Energy Technology Data Exchange (ETDEWEB)
Fiorini, G.L., E-mail: gian-luigi.fiorini@orange.fr; Ammirabile, L., E-mail: luca.ammirabile@ec.europa.eu [European Commission - Joint Research Centre Institute for Energy and Transport (Belgium); Ranguelova, V., E-mail: vesselina.ranguelova@ec.europa.eu [European Commission - Joint Research Centre Headquarters, Brussels (Belgium)
2014-10-15
The design of the safety architecture of innovative as well as the assessment of existing nuclear systems needs to integrate the constraints related to the safety principles, requirements and objectives. Among these constraints, the compliance of the installation’s architecture with the principles of Defence in Depth (DiD), and its different levels, is certainly one of the most structuring. Defence in depth is the key to achieve safety robustness, thereby helping to ensure that nuclear systems do not exhibit any particularly dominant risk vulnerability. To help designers of innovative systems to correctly implement the defence-in-depth, or to assess how well the latter has been applied for existing reactor systems, the Objection-Provision Tree (OPT) methodology, which is part of the Integrated Safety Assessment Methodology (ISAM) promoted by the Generation IV Risk and Safety Working Group (GIF/RSWG), is suggested as a useful tool to complement the required traditional deterministic and probabilistic safety assessments. The document recalls the content of the OPT method and gives some details for its implementation, including for the systematic identification of the initiating events to be considered in designing the system. This step is essential especially for new systems for which there is no sufficient operational to support their design. The interactions with other tools (e.g. Failure Mode and Effect Analyses (FMEA) or ISAM Tools) are also commented. (author)
Baños, Hector; Bushek, Nathaniel; Davidson, Ruth; Gross, Elizabeth; Harris, Pamela E.; Krone, Robert; Long, Colby; Stewart, Allen; Walker, Robert
2016-01-01
We introduce the package PhylogeneticTrees for Macaulay2 which allows users to compute phylogenetic invariants for group-based tree models. We provide some background information on phylogenetic algebraic geometry and show how the package PhylogeneticTrees can be used to calculate a generating set for a phylogenetic ideal as well as a lower bound for its dimension. Finally, we show how methods within the package can be used to compute a generating set for the join of any two ideals.
TreePOD: Sensitivity-Aware Selection of Pareto-Optimal Decision Trees.
Muhlbacher, Thomas; Linhardt, Lorenz; Moller, Torsten; Piringer, Harald
2018-01-01
Balancing accuracy gains with other objectives such as interpretability is a key challenge when building decision trees. However, this process is difficult to automate because it involves know-how about the domain as well as the purpose of the model. This paper presents TreePOD, a new approach for sensitivity-aware model selection along trade-offs. TreePOD is based on exploring a large set of candidate trees generated by sampling the parameters of tree construction algorithms. Based on this set, visualizations of quantitative and qualitative tree aspects provide a comprehensive overview of possible tree characteristics. Along trade-offs between two objectives, TreePOD provides efficient selection guidance by focusing on Pareto-optimal tree candidates. TreePOD also conveys the sensitivities of tree characteristics on variations of selected parameters by extending the tree generation process with a full-factorial sampling. We demonstrate how TreePOD supports a variety of tasks involved in decision tree selection and describe its integration in a holistic workflow for building and selecting decision trees. For evaluation, we illustrate a case study for predicting critical power grid states, and we report qualitative feedback from domain experts in the energy sector. This feedback suggests that TreePOD enables users with and without statistical background a confident and efficient identification of suitable decision trees.
Ormeño, M. I.; Faúndez-Abans, M.; Cavada, G.
2003-08-01
A importância deste trabalho deve-se à seleção de objetos ainda não tratados particularmente como uma família e ao emprego de procedimento estatístico robusto que não precisa de pressupostos ou condições de contorno. Contribui, assim, ao melhor entendimento do cenário das Galáxias Aneladas do diagrama de Hubble via classificação e estudo de subclasses. Selecionaram-se 100 galáxias possuidoras de dois anéis do Catalog of Southern Ringed Galaxies compilado por Ronald Buta, de modo a construir uma amostra completa em termos de conhecimento dos semi-eixos dos anéis interno e externo projetados no plano do céu. Visando uma possível classificação destas galáxias aneladas normais em famílias de acordo com as características geométricas dos anéis, empregou-se primeiramente a Análise de Aglomerados (ferramenta de classificação: medições de semelhança em um espaço bidimensional) para explorar a possível existência de famílias. As variáveis analisadas foram: os diâmetros interiores menores d(I) e maiores D(I), os diâmetros exteriores menores d(E) e maiores D(E), e os ângulos de inclinação dos semi-eixos maiores interiores q(I) e exteriores q(E) dos anéis. Como metodologia de discriminação, empregou-se a construção de Árvores de Classificação. As árvores de classificação constituem um método de discriminação alternativo aos modelos clássicos, tais como a Análise Discriminante e a Regressão Logística, onde uma base de dados é dividida em partições (subgrupos) da árvore por ação de um predictor (variável específica). Os pacotes estatísticos utilizados para o processamento da informação foram: SAS versão 8.0 (Statistical Analisys System) e CART versão 3.6.3. Esta análise estatística sugere a existência de três possíveis famílias de galáxias bianeladas, com base apenas na geometria dos anéis. Como forma exploratória inicial deste resultado, a construção de um diagrama BT (magnitude total) versus o
Decision trees in epidemiological research
Directory of Open Access Journals (Sweden)
Ashwini Venkatasubramaniam
2017-09-01
Full Text Available Abstract Background In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. Main text We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART technique and the newer Conditional Inference tree (CTree technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Conclusions Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
Decision trees in epidemiological research.
Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone
2017-01-01
In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
DEFF Research Database (Denmark)
Fitzenberger, Bernd; Wilke, Ralf Andreas
2015-01-01
if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based......Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights...... by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even...
DEFF Research Database (Denmark)
Appelt, Ane L; Rønde, Heidi S
2013-01-01
The photo shows a close-up of a Lichtenberg figure – popularly called an “electron tree” – produced in a cylinder of polymethyl methacrylate (PMMA). Electron trees are created by irradiating a suitable insulating material, in this case PMMA, with an intense high energy electron beam. Upon discharge......, during dielectric breakdown in the material, the electrons generate branching chains of fractures on leaving the PMMA, producing the tree pattern seen. To be able to create electron trees with a clinical linear accelerator, one needs to access the primary electron beam used for photon treatments. We...... appropriated a linac that was being decommissioned in our department and dismantled the head to circumvent the target and ion chambers. This is one of 24 electron trees produced before we had to stop the fun and allow the rest of the accelerator to be disassembled....
Indian Academy of Sciences (India)
Srimath
shaped corolla. Fruit is large, ellipsoidal, green with a hard and smooth shell containing numerous flattened seeds, which are embedded in fleshy pulp. Calabash tree is commonly grown in the tropical gardens of the world as a botanical oddity.
Directory of Open Access Journals (Sweden)
Hanna Louise Steyne
2013-09-01
Full Text Available Victorian London saw dramatic physical changes along the river Thames. Large enclosed Docks and Thames Embankments were constructed as the city struggled to cope with its ballooning population and prospering shipping industry. Whilst the Thames Embankments have been hailed as engineering triumphs, the fate of those whose livelihood relied on access to the river in central London (such as wharf workers, barge, ferry and lighter men, and others is unknown. In order to investigate the impact of the Embankment, a methodology has been developed which enables characterisation of a large swathe of urban riverside throughout the mid- to late 19th century, whilst also ensuring that the stories of individuals and communities are not lost. The approach combines and adapts established methodologies, such as Historic Landscape/Seascape Characterisation and Maritime Cultural Landscapes, to understand the nature and changes in the urban riverside landscape. This methodology forms the background for detailed research on smaller sites, such as a single street, housing block, or industrial site, in order to create ‘Ethnographies of Place’. These small-scale ‘Ethnographies’ have the potential to tell stories about how the social and economic circumstances of individuals and communities changed as a result of the landscape changes associated with the Embankment construction. This paper presents the initial work to establish the methodology and preliminary conclusions based on key sources.
Understanding logistic regression analysis
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...
Introduction to regression graphics
Cook, R Dennis
2009-01-01
Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava
Huang, D; Wu, W; Zhou, Y; Hu, Z; Lu, L
2004-05-01
Construction of single chromosomal DNA libraries by means of chromosome microdissection and microcloning will be useful for genomic research, especially for those species that have not been extensively studied genetically. Application of the technology of microdissection and microcloning to woody fruit plants has not been reported hitherto, largely due to the generally small sizes of metaphase chromosomes and the difficulty of chromosome preparation. The present study was performed to establish a method for single chromosome microdissection and microcloning in woody fruit species using pomelo as a model. The standard karyotype of a pomelo cultivar ( Citrus grandis cv. Guanxi) was established based on 20 prometaphase photomicrographs. According to the standard karyotype, chromosome 1 was identified and isolated with fine glass microneedles controlled by a micromanipulator. DNA fragments ranging from 0.3 kb to 2 kb were acquired from the isolated single chromosome 1 via two rounds of PCR mediated by Sau3A linker adaptors and then cloned into T-easy vectors to generate a DNA library of chromosome 1. Approximately 30,000 recombinant clones were obtained. Evaluation based on 108 randomly selected clones showed that the sizes of the cloned inserts varied from 0.5 kb to 1.5 kb with an average of 860 bp. Our research suggests that microdissection and microcloning of single small chromosomes in woody plants is feasible.
Rieger, Isaak; Kowarik, Ingo; Cherubini, Paolo; Cierjacks, Arne
2017-01-01
Aboveground carbon (C) sequestration in trees is important in global C dynamics, but reliable techniques for its modeling in highly productive and heterogeneous ecosystems are limited. We applied an extended dendrochronological approach to disentangle the functioning of drivers from the atmosphere (temperature, precipitation), the lithosphere (sedimentation rate), the hydrosphere (groundwater table, river water level fluctuation), the biosphere (tree characteristics), and the anthroposphere (dike construction). Carbon sequestration in aboveground biomass of riparian Quercus robur L. and Fraxinus excelsior L. was modeled (1) over time using boosted regression tree analysis (BRT) on cross-datable trees characterized by equal annual growth ring patterns and (2) across space using a subsequent classification and regression tree analysis (CART) on cross-datable and not cross-datable trees. While C sequestration of cross-datable Q. robur responded to precipitation and temperature, cross-datable F. excelsior also responded to a low Danube river water level. However, CART revealed that C sequestration over time is governed by tree height and parameters that vary over space (magnitude of fluctuation in the groundwater table, vertical distance to mean river water level, and longitudinal distance to upstream end of the study area). Thus, a uniform response to climatic drivers of aboveground C sequestration in Q. robur was only detectable in trees of an intermediate height class and in taller trees (>21.8m) on sites where the groundwater table fluctuated little (≤0.9m). The detection of climatic drivers and the river water level in F. excelsior depended on sites at lower altitudes above the mean river water level (≤2.7m) and along a less dynamic downstream section of the study area. Our approach indicates unexploited opportunities of understanding the interplay of different environmental drivers in aboveground C sequestration. Results may support species-specific and
On algorithm for building of optimal α-decision trees
Alkhalid, Abdulaziz; Chikalov, Igor; Moshkov, Mikhail
2010-01-01
The paper describes an algorithm that constructs approximate decision trees (α-decision trees), which are optimal relatively to one of the following complexity measures: depth, total path length or number of nodes. The algorithm uses dynamic
Quantile Regression With Measurement Error
Wei, Ying
2009-08-27
Regression quantiles can be substantially biased when the covariates are measured with error. In this paper we propose a new method that produces consistent linear quantile estimation in the presence of covariate measurement error. The method corrects the measurement error induced bias by constructing joint estimating equations that simultaneously hold for all the quantile levels. An iterative EM-type estimation algorithm to obtain the solutions to such joint estimation equations is provided. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a longitudinal study with an unusual measurement error structure. © 2009 American Statistical Association.
A Metric on Phylogenetic Tree Shapes.
Colijn, C; Plazzotta, G
2018-01-01
The shapes of evolutionary trees are influenced by the nature of the evolutionary process but comparisons of trees from different processes are hindered by the challenge of completely describing tree shape. We present a full characterization of the shapes of rooted branching trees in a form that lends itself to natural tree comparisons. We use this characterization to define a metric, in the sense of a true distance function, on tree shapes. The metric distinguishes trees from random models known to produce different tree shapes. It separates trees derived from tropical versus USA influenza A sequences, which reflect the differing epidemiology of tropical and seasonal flu. We describe several metrics based on the same core characterization, and illustrate how to extend the metric to incorporate trees' branch lengths or other features such as overall imbalance. Our approach allows us to construct addition and multiplication on trees, and to create a convex metric on tree shapes which formally allows computation of average tree shapes. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Drzewiecki, Wojciech
2016-12-01
In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.
Directory of Open Access Journals (Sweden)
Matthias Schmid
Full Text Available Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1. Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In machine learning field, decision tree learner is powerful and easy to interpret. It employs recursive binary partitioning algorithm that splits the sample in partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. While growing a single tree is subject to small changes in the training data, random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and restricted set of input variables to be selected. Finally, I introduce R functions to perform model based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
Understanding logistic regression analysis.
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
Weisberg, Sanford
2013-01-01
Praise for the Third Edition ""...this is an excellent book which could easily be used as a course text...""-International Statistical Institute The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus
Hosmer, David W; Sturdivant, Rodney X
2013-01-01
A new edition of the definitive guide to logistic regression modeling for health science and other applications This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-
Indian Academy of Sciences (India)
deciduous tree with irregularly-shaped trunk, greyish-white scaly bark and milky latex. Leaves in opposite pairs are simple, oblong and whitish beneath. Flowers that occur in branched inflorescence are white, 2–. 3cm across and fragrant. Calyx is glandular inside. Petals bear numerous linear white scales, the corollary.
Indian Academy of Sciences (India)
Berrya cordifolia (Willd.) Burret (Syn. B. ammonilla Roxb.) – Trincomali Wood of Tiliaceae is a tall evergreen tree with straight trunk, smooth brownish-grey bark and simple broad leaves. Inflorescence is much branched with white flowers. Stamens are many with golden yellow anthers. Fruit is a capsule with six spreading ...
Indian Academy of Sciences (India)
Canthium parviflorum Lam. of Rubiaceae is a large shrub that often grows into a small tree with conspicuous spines. Leaves are simple, in pairs at each node and are shiny. Inflorescence is an axillary few-flowered cymose fascicle. Flowers are small (less than 1 cm across), 4-merous and greenish-white. Fruit is ellipsoid ...
Indian Academy of Sciences (India)
sriranga
Hook.f. ex Brandis (Yellow. Cadamba) of Rubiaceae is a large and handsome deciduous tree. Leaves are simple, large, orbicular, and drawn abruptly at the apex. Flowers are small, yellowish and aggregate into small spherical heads. The corolla is funnel-shaped with five stamens inserted at its mouth. Fruit is a capsule.
Indian Academy of Sciences (India)
Celtis tetrandra Roxb. of Ulmaceae is a moderately large handsome deciduous tree with green branchlets and grayish-brown bark. Leaves are simple with three to four secondary veins running parallel to the mid vein. Flowers are solitary, male, female and bisexual and inconspicuous. Fruit is berry-like, small and globose ...
Indian Academy of Sciences (India)
IAS Admin
Aglaia elaeagnoidea (A.Juss.) Benth. of Meliaceae is a small-sized evergreen tree of both moist and dry deciduous forests. The leaves are alternate and pinnately compound, terminating in a single leaflet. Leaflets are more or less elliptic with entire margin. Flowers are small on branched inflorescence. Fruit is a globose ...
Indian Academy of Sciences (India)
user
Flowers are borne on stiff bunches terminally on short shoots. They are 2-3 cm across, white, sweet-scented with light-brown hairy sepals and many stamens. Loquat fruits are round or pear-shaped, 3-5 cm long and are edible. A native of China, Loquat tree is grown in parks as an ornamental and also for its fruits.
Indian Academy of Sciences (India)
mid-sized slow-growing evergreen tree with spreading branches that form a dense crown. The bark is smooth, thick, dark and flakes off in large shreds. Leaves are thick, oblong, leathery and bright red when young. The female flowers are drooping and are larger than male flowers. Fruit is large, red in color and velvety.
Indian Academy of Sciences (India)
Andira inermis (wright) DC. , Dog Almond of Fabaceae is a handsome lofty evergreen tree. Leaves are alternate and pinnately compound with 4–7 pairs of leaflets. Flowers are fragrant and are borne on compact branched inflorescences. Fruit is ellipsoidal one-seeded drupe that is peculiar to members of this family.
Indian Academy of Sciences (India)
narrow towards base. Flowers are large and attrac- tive, but emit unpleasant foetid smell. They appear in small numbers on erect terminal clusters and open at night. Stamens are numerous, pink or white. Style is slender and long, terminating in a small stigma. Fruit is green, ovoid and indistinctly lobed. Flowering Trees.
Indian Academy of Sciences (India)
Muntingia calabura L. (Singapore cherry) of. Elaeocarpaceae is a medium size handsome ever- green tree. Leaves are simple and alternate with sticky hairs. Flowers are bisexual, bear numerous stamens, white in colour and arise in the leaf axils. Fruit is a berry, edible with several small seeds embedded in a fleshy pulp ...
Indian Academy of Sciences (India)
. Stamens are fused into a purple staminal tube that is toothed. Fruit is about 0.5 in. across, nearly globose, generally 5-seeded, green but yellow when ripe, quite smooth at first but wrinkled in drying, remaining long on the tree ajier ripening.
Mark J. Ambrose
2012-01-01
Tree mortality is a natural process in all forest ecosystems. However, extremely high mortality also can be an indicator of forest health issues. On a regional scale, high mortality levels may indicate widespread insect or disease problems. High mortality may also occur if a large proportion of the forest in a particular region is made up of older, senescent stands....
Indian Academy of Sciences (India)
Guaiacum officinale L. (LIGNUM-VITAE) of Zygophyllaceae is a dense-crowned, squat, knobbly, rough and twisted medium-sized ev- ergreen tree with mottled bark. The wood is very hard and resinous. Leaves are compound. The leaflets are smooth, leathery, ovate-ellipti- cal and appear in two pairs. Flowers (about 1.5.
Understanding poisson regression.
Hayat, Matthew J; Higgins, Melinda
2014-04-01
Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.
Directory of Open Access Journals (Sweden)
Paulo Eduardo Telles dos Santos
2010-06-01
Full Text Available By the assessment of ten technological traits of eucalypt wood for sawn timber and energy purposes,
it was developed a multivariate statistical procedure in order to determine the sequence of logs to be sampled, in such a way to represent all statistical variation contained within the tree and, accordingly, to establish the appropriate sampling intensity. In the present work, it was used a total of 40 logs from four trees of Eucalyptus grandis provenance Concórdia-SC aged 18 years. By using principal components regression analysis and stepwise selection techniques, it was showed that only two logs, corresponding to the first (0.05 m to 2.60 m and fourth (8.85 m to 11.40 m positions into the tree, contained 99.2 % of the total variation detected originally. In the case of adopting a single log, the recommendation was over the fourth log, which represented 97.5 % of the total
amount of the original variation. For the referred population, the statistical procedure contributed substantially to reduce the high time-consuming and financial costs that are normally associated to studies oriented to this goal, without affecting the original statistical information exhibited by the whole group of logs that would be usually sampled.A partir da avaliação de dez características tecnológicas de madeira de eucalipto para fins de serraria e energia, desenvolveu-se procedimento estatístico multivariado para se determinar a seqüência de toras a ser amostrada, de forma a representar acumuladamente toda a variação estatística presente na árvore e, com isso, estabelecer a intensidade adequada de amostragem. Neste estudo, foram utilizadas 40 toras oriundas de quatro árvores de Eucalyptus grandis aos 18 anos de idade procedentes de Concórdia, SC. Com o uso de técnicas de regressão multivariada de componentes principais e seleção por etapas, chegou-se à conclusão que amostrandose apenas duas toras, correspondentes à primeira (0,05 m a 2
Su, Xiaogang; Peña, Annette T; Liu, Lei; Levine, Richard A
2018-04-29
Assessing heterogeneous treatment effects is a growing interest in advancing precision medicine. Individualized treatment effects (ITEs) play a critical role in such an endeavor. Concerning experimental data collected from randomized trials, we put forward a method, termed random forests of interaction trees (RFIT), for estimating ITE on the basis of interaction trees. To this end, we propose a smooth sigmoid surrogate method, as an alternative to greedy search, to speed up tree construction. The RFIT outperforms the "separate regression" approach in estimating ITE. Furthermore, standard errors for the estimated ITE via RFIT are obtained with the infinitesimal jackknife method. We assess and illustrate the use of RFIT via both simulation and the analysis of data from an acupuncture headache trial. Copyright © 2018 John Wiley & Sons, Ltd.
Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance
2013-01-01
Background Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation. Results We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees. Conclusions Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets. Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets, in which many processes such as deep coalescence, recombination, gene duplication and losses as
Modelling tree biomasses in Finland
Energy Technology Data Exchange (ETDEWEB)
Repola, J.
2013-06-01
Biomass equations for above- and below-ground tree components of Scots pine (Pinus sylvestris L), Norway spruce (Picea abies [L.] Karst) and birch (Betula pendula Roth and Betula pubescens Ehrh.) were compiled using empirical material from a total of 102 stands. These stands (44 Scots pine, 34 Norway spruce and 24 birch stands) were located mainly on mineral soil sites representing a large part of Finland. The biomass models were based on data measured from 1648 sample trees, comprising 908 pine, 613 spruce and 127 birch trees. Biomass equations were derived for the total above-ground biomass and for the individual tree components: stem wood, stem bark, living and dead branches, needles, stump, and roots, as dependent variables. Three multivariate models with different numbers of independent variables for above-ground biomass and one for below-ground biomass were constructed. Variables that are normally measured in forest inventories were used as independent variables. The simplest model formulations, multivariate models (1) were mainly based on tree diameter and height as independent variables. In more elaborated multivariate models, (2) and (3), additional commonly measured tree variables such as age, crown length, bark thickness and radial growth rate were added. Tree biomass modelling includes consecutive phases, which cause unreliability in the prediction of biomass. First, biomasses of sample trees should be determined reliably to decrease the statistical errors caused by sub-sampling. In this study, methods to improve the accuracy of stem biomass estimates of the sample trees were developed. In addition, the reliability of the method applied to estimate sample-tree crown biomass was tested, and no systematic error was detected. Second, the whole information content of data should be utilized in order to achieve reliable parameter estimates and applicable and flexible model structure. In the modelling approach, the basic assumption was that the biomasses of
Directory of Open Access Journals (Sweden)
Mok Tik
2014-06-01
Full Text Available This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as, polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (dependent vector variable is expressed as a function of a number of hypothesized phenomena realized also as vector variables (independent vector variables and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of independent vector variables (explanatory variables also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to rep- resent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.
Tools for valuing tree and park services
E.G. McPherson
2010-01-01
Arborists and urban foresters plan, design, construct, and manage trees and parks in cities throughout the world. These civic improvements create walkable, cool environments, save energy, reduce stormwater runoff, sequester carbon dioxide, and absorb air pollutants. The presence of trees and green spaces in cities is associated with increases in property values,...
Forecasting exchange rates: a robust regression approach
Preminger, Arie; Franck, Raphael
2005-01-01
The least squares estimation method as well as other ordinary estimation method for regression models can be severely affected by a small number of outliers, thus providing poor out-of-sample forecasts. This paper suggests a robust regression approach, based on the S-estimation method, to construct forecasting models that are less sensitive to data contamination by outliers. A robust linear autoregressive (RAR) and a robust neural network (RNN) models are estimated to study the predictabil...
Multicollinearity and Regression Analysis
Daoud, Jamal I.
2017-12-01
In regression analysis it is obvious to have a correlation between the response and predictor(s), but having correlation among predictors is something undesired. The number of predictors included in the regression model depends on many factors among which, historical data, experience, etc. At the end selection of most important predictors is something objective due to the researcher. Multicollinearity is a phenomena when two or more predictors are correlated, if this happens, the standard error of the coefficients will increase [8]. Increased standard errors means that the coefficients for some or all independent variables may be found to be significantly different from In other words, by overinflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. In this paper we focus on the multicollinearity, reasons and consequences on the reliability of the regression model.
Zong, Shengwei; Wu, Zhengfang; Xu, Jiawei; Li, Ming; Gao, Xiaofeng; He, Hongshi; Du, Haibo; Wang, Lei
2014-01-01
Tree line ecotone in the Changbai Mountains has undergone large changes in the past decades. Tree locations show variations on the four sides of the mountains, especially on the northern and western sides, which has not been fully explained. Previous studies attributed such variations to the variations in temperature. However, in this study, we hypothesized that topographic controls were responsible for causing the variations in the tree locations in tree line ecotone of the Changbai Mountains. To test the hypothesis, we used IKONOS images and WorldView-1 image to identify the tree locations and developed a logistic regression model using topographical variables to identify the dominant controls of the tree locations. The results showed that aspect, wetness, and slope were dominant controls for tree locations on western side of the mountains, whereas altitude, SPI, and aspect were the dominant factors on northern side. The upmost altitude a tree can currently reach was 2140 m asl on the northern side and 2060 m asl on western side. The model predicted results showed that habitats above the current tree line on the both sides were available for trees. Tree recruitments under the current tree line may take advantage of the available habitats at higher elevations based on the current tree location. Our research confirmed the controlling effects of topography on the tree locations in the tree line ecotone of Changbai Mountains and suggested that it was essential to assess the tree response to topography in the research of tree line ecotone.
Extensions and applications of ensemble-of-trees methods in machine learning
Bleich, Justin
Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of
DEFF Research Database (Denmark)
Bache, Stefan Holst
A new and alternative quantile regression estimator is developed and it is shown that the estimator is root n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is......, however, a different and therefore new estimator. It allows for both linear- and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice but whether it has theoretical justification is still an open question....
DEFF Research Database (Denmark)
Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas
2017-01-01
In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface...... for predicting the covariate specific absolute risks, their confidence intervals, and their confidence bands based on right censored time to event data. We provide explicit formulas for our implementation of the estimator of the (stratified) baseline hazard function in the presence of tied event times. As a by...... functionals. The software presented here is implemented in the riskRegression package....
Fast Tree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix
N. Price, Morgan
2009-01-01
Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor i...
A Distributed Spanning Tree Algorithm
DEFF Research Database (Denmark)
Johansen, Karl Erik; Jørgensen, Ulla Lundin; Nielsen, Sven Hauge
We present a distributed algorithm for constructing a spanning tree for connected undirected graphs. Nodes correspond to processors and edges correspond to two-way channels. Each processor has initially a distinct identity and all processors perform the same algorithm. Computation as well...
A distributed spanning tree algorithm
DEFF Research Database (Denmark)
Johansen, Karl Erik; Jørgensen, Ulla Lundin; Nielsen, Svend Hauge
1988-01-01
We present a distributed algorithm for constructing a spanning tree for connected undirected graphs. Nodes correspond to processors and edges correspond to two way channels. Each processor has initially a distinct identity and all processors perform the same algorithm. Computation as well as comm...
Surface tree languages and parallel derivation trees
Engelfriet, Joost
1976-01-01
The surface tree languages obtained by top-down finite state transformation of monadic trees are exactly the frontier-preserving homomorphic images of sets of derivation trees of ETOL systems. The corresponding class of tree transformation languages is therefore equal to the class of ETOL languages.
Multiple linear regression analysis
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Bayesian logistic regression analysis
Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.
2012-01-01
In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuissance parameters, the Jacobian transformation is an
Seber, George A F
2012-01-01
Concise, mathematically clear, and comprehensive treatment of the subject.* Expanded coverage of diagnostics and methods of model fitting.* Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.* More than 200 problems throughout the book plus outline solutions for the exercises.* This revision has been extensively class-tested.
Ritz, Christian; Parmigiani, Giovanni
2009-01-01
R is a rapidly evolving lingua franca of graphical display and statistical analysis of experiments from the applied sciences. This book provides a coherent treatment of nonlinear regression with R by means of examples from a diversity of applied sciences such as biology, chemistry, engineering, medicine and toxicology.
Bayesian ARTMAP for regression.
Sasu, L M; Andonie, R
2013-10-01
Bayesian ARTMAP (BA) is a recently introduced neural architecture which uses a combination of Fuzzy ARTMAP competitive learning and Bayesian learning. Training is generally performed online, in a single-epoch. During training, BA creates input data clusters as Gaussian categories, and also infers the conditional probabilities between input patterns and categories, and between categories and classes. During prediction, BA uses Bayesian posterior probability estimation. So far, BA was used only for classification. The goal of this paper is to analyze the efficiency of BA for regression problems. Our contributions are: (i) we generalize the BA algorithm using the clustering functionality of both ART modules, and name it BA for Regression (BAR); (ii) we prove that BAR is a universal approximator with the best approximation property. In other words, BAR approximates arbitrarily well any continuous function (universal approximation) and, for every given continuous function, there is one in the set of BAR approximators situated at minimum distance (best approximation); (iii) we experimentally compare the online trained BAR with several neural models, on the following standard regression benchmarks: CPU Computer Hardware, Boston Housing, Wisconsin Breast Cancer, and Communities and Crime. Our results show that BAR is an appropriate tool for regression tasks, both for theoretical and practical reasons. Copyright © 2013 Elsevier Ltd. All rights reserved.
Bounded Gaussian process regression
DEFF Research Database (Denmark)
Jensen, Bjørn Sand; Nielsen, Jens Brehm; Larsen, Jan
2013-01-01
We extend the Gaussian process (GP) framework for bounded regression by introducing two bounded likelihood functions that model the noise on the dependent variable explicitly. This is fundamentally different from the implicit noise assumption in the previously suggested warped GP framework. We...... with the proposed explicit noise-model extension....
and Multinomial Logistic Regression
African Journals Online (AJOL)
This work presented the results of an experimental comparison of two models: Multinomial Logistic Regression (MLR) and Artificial Neural Network (ANN) for classifying students based on their academic performance. The predictive accuracy for each model was measured by their average Classification Correct Rate (CCR).
Mechanisms of neuroblastoma regression
Brodeur, Garrett M.; Bagatell, Rochelle
2014-01-01
Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179
Regression of environmental noise in LIGO data
International Nuclear Information System (INIS)
Tiwari, V; Klimenko, S; Mitselmakher, G; Necula, V; Drago, M; Prodi, G; Frolov, V; Yakushin, I; Re, V; Salemi, F; Vedovato, G
2015-01-01
We address the problem of noise regression in the output of gravitational-wave (GW) interferometers, using data from the physical environmental monitors (PEM). The objective of the regression analysis is to predict environmental noise in the GW channel from the PEM measurements. One of the most promising regression methods is based on the construction of Wiener–Kolmogorov (WK) filters. Using this method, the seismic noise cancellation from the LIGO GW channel has already been performed. In the presented approach the WK method has been extended, incorporating banks of Wiener filters in the time–frequency domain, multi-channel analysis and regulation schemes, which greatly enhance the versatility of the regression analysis. Also we present the first results on regression of the bi-coherent noise in the LIGO data. (paper)
Log and tree sawing times for hardwood mills
Everette D. Rast
1974-01-01
Data on 6,850 logs and 1,181 trees were analyzed to predict sawing times. For both logs and trees, regression equations were derived that express (in minutes) sawing time per log or tree and per Mbf. For trees, merchantable height is expressed in number of logs as well as in feet. One of the major uses for the tables of average sawing times is as a bench mark against...
Khina, Anatoly
2016-08-15
We consider the problem of stabilizing an unstable plant driven by bounded noise over a digital noisy communication link, a scenario at the heart of networked control. To stabilize such a plant, one needs real-time encoding and decoding with an error probability profile that decays exponentially with the decoding delay. The works of Schulman and Sahai over the past two decades have developed the notions of tree codes and anytime capacity, and provided the theoretical framework for studying such problems. Nonetheless, there has been little practical progress in this area due to the absence of explicit constructions of tree codes with efficient encoding and decoding algorithms. Recently, linear time-invariant tree codes were proposed to achieve the desired result under maximum-likelihood decoding. In this work, we take one more step towards practicality, by showing that these codes can be efficiently decoded using sequential decoding algorithms, up to some loss in performance (and with some practical complexity caveats). We supplement our theoretical results with numerical simulations that demonstrate the effectiveness of the decoder in a control system setting.
E.G. McPherson; F. Ferrini
2010-01-01
We know that âtrees are good,â and most people believe this to be true. But if this is so, why are so many trees neglected, and so many tree wells empty? An individualâs attitude toward trees may result from their firsthand encounters with specific trees. Understanding how attitudes about trees are shaped, particularly aversion to trees, is critical to the business of...
Ridge Regression Signal Processing
Kuhl, Mark R.
1990-01-01
The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
Subset selection in regression
Miller, Alan
2002-01-01
Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author has thoroughly updated each chapter, incorporated new material on recent developments, and included more examples and references. New in the Second Edition:A separate chapter on Bayesian methodsComplete revision of the chapter on estimationA major example from the field of near infrared spectroscopyMore emphasis on cross-validationGreater focus on bootstrappingStochastic algorithms for finding good subsets from large numbers of predictors when an exhaustive search is not feasible Software available on the Internet for implementing many of the algorithms presentedMore examplesSubset Selection in Regression, Second Edition remains dedicated to the techniques for fitting...
Better Autologistic Regression
Directory of Open Access Journals (Sweden)
Mark A. Wolters
2017-11-01
Full Text Available Autologistic regression is an important probability model for dichotomous random variables observed along with covariate information. It has been used in various fields for analyzing binary data possessing spatial or network structure. The model can be viewed as an extension of the autologistic model (also known as the Ising model, quadratic exponential binary distribution, or Boltzmann machine to include covariates. It can also be viewed as an extension of logistic regression to handle responses that are not independent. Not all authors use exactly the same form of the autologistic regression model. Variations of the model differ in two respects. First, the variable coding—the two numbers used to represent the two possible states of the variables—might differ. Common coding choices are (zero, one and (minus one, plus one. Second, the model might appear in either of two algebraic forms: a standard form, or a recently proposed centered form. Little attention has been paid to the effect of these differences, and the literature shows ambiguity about their importance. It is shown here that changes to either coding or centering in fact produce distinct, non-nested probability models. Theoretical results, numerical studies, and analysis of an ecological data set all show that the differences among the models can be large and practically significant. Understanding the nature of the differences and making appropriate modeling choices can lead to significantly improved autologistic regression analyses. The results strongly suggest that the standard model with plus/minus coding, which we call the symmetric autologistic model, is the most natural choice among the autologistic variants.
Regression in organizational leadership.
Kernberg, O F
1979-02-01
The choice of good leaders is a major task for all organizations. Inforamtion regarding the prospective administrator's personality should complement questions regarding his previous experience, his general conceptual skills, his technical knowledge, and the specific skills in the area for which he is being selected. The growing psychoanalytic knowledge about the crucial importance of internal, in contrast to external, object relations, and about the mutual relationships of regression in individuals and in groups, constitutes an important practical tool for the selection of leaders.
Automated Generation of Attack Trees
DEFF Research Database (Denmark)
Vigo, Roberto; Nielson, Flemming; Nielson, Hanne Riis
2014-01-01
Attack trees are widely used to represent threat scenarios in a succinct and intuitive manner, suitable for conveying security information to non-experts. The manual construction of such objects relies on the creativity and experience of specialists, and therefore it is error-prone and impractica......Attack trees are widely used to represent threat scenarios in a succinct and intuitive manner, suitable for conveying security information to non-experts. The manual construction of such objects relies on the creativity and experience of specialists, and therefore it is error......-prone and impracticable for large systems. Nonetheless, the automated generation of attack trees has only been explored in connection to computer networks and levering rich models, whose analysis typically leads to an exponential blow-up of the state space. We propose a static analysis approach where attack trees...... are automatically inferred from a process algebraic specification in a syntax-directed fashion, encompassing a great many application domains and avoiding incurring systematically an exponential explosion. Moreover, we show how the standard propositional denotation of an attack tree can be used to phrase...
Toward a theory of statistical tree-shape analysis
DEFF Research Database (Denmark)
Feragen, Aasa; Lo, Pechin Chien Pau; de Bruijne, Marleen
2013-01-01
In order to develop statistical methods for shapes with a tree-structure, we construct a shape space framework for tree-shapes and study metrics on the shape space. This shape space has singularities, which correspond to topological transitions in the represented trees. We study two closely relat...
MHV Vertices And Tree Amplitudes In Gauge Theory
International Nuclear Information System (INIS)
Cachazo, Freddy; Svrcek, Peter; Witten, Edward
2004-01-01
As an alternative to the usual Feynman graphs, tree amplitudes in Yang-Mills theory can be constructed from tree graphs in which the vertices are tree level MHV scattering amplitudes, continued off shell in a particular fashion. The formalism leads to new and relatively simple formulas for many amplitudes, and can be heuristically derived from twistor space. (author)
Hilbe, Joseph M
2009-01-01
This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect-great clarity.The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … .-Annette J. Dobson, Biometric...
Measurement Error in Education and Growth Regressions
Portela, Miguel; Alessie, Rob; Teulings, Coen
2010-01-01
The use of the perpetual inventory method for the construction of education data per country leads to systematic measurement error. This paper analyzes its effect on growth regressions. We suggest a methodology for correcting this error. The standard attenuation bias suggests that using these
Measurement Error in Education and Growth Regressions
Portela, M.; Teulings, C.N.; Alessie, R.
The perpetual inventory method used for the construction of education data per country leads to systematic measurement error. This paper analyses the effect of this measurement error on GDP regressions. There is a systematic difference in the education level between census data and observations
Measurement error in education and growth regressions
Portela, Miguel; Teulings, Coen; Alessie, R.
2004-01-01
The perpetual inventory method used for the construction of education data per country leads to systematic measurement error. This paper analyses the effect of this measurement error on GDP regressions. There is a systematic difference in the education level between census data and observations
DEFF Research Database (Denmark)
Bahr, Patrick
2012-01-01
Tree automata are traditionally used to study properties of tree languages and tree transformations. In this paper, we consider tree automata as the basis for modular and extensible recursion schemes. We show, using well-known techniques, how to derive from standard tree automata highly modular...
David J. Nowak; Jeffrey T. Walton; James Baldwin; Jerry. Bond
2015-01-01
Information on street trees is critical for management of this important resource. Sampling of street tree populations provides an efficient means to obtain street tree population information. Long-term repeat measures of street tree samples supply additional information on street tree changes and can be used to report damages from catastrophic events. Analyses of...
Steganalysis using logistic regression
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.
SEPARATION PHENOMENA LOGISTIC REGRESSION
Directory of Open Access Journals (Sweden)
Ikaro Daniel de Carvalho Barreto
2014-03-01
Full Text Available This paper proposes an application of concepts about the maximum likelihood estimation of the binomial logistic regression model to the separation phenomena. It generates bias in the estimation and provides different interpretations of the estimates on the different statistical tests (Wald, Likelihood Ratio and Score and provides different estimates on the different iterative methods (Newton-Raphson and Fisher Score. It also presents an example that demonstrates the direct implications for the validation of the model and validation of variables, the implications for estimates of odds ratios and confidence intervals, generated from the Wald statistics. Furthermore, we present, briefly, the Firth correction to circumvent the phenomena of separation.
DEFF Research Database (Denmark)
Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas
2017-01-01
In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface......-product we obtain fast access to the baseline hazards (compared to survival::basehaz()) and predictions of survival probabilities, their confidence intervals and confidence bands. Confidence intervals and confidence bands are based on point-wise asymptotic expansions of the corresponding statistical...
Adaptive metric kernel regression
DEFF Research Database (Denmark)
Goutte, Cyril; Larsen, Jan
2000-01-01
Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate...... regression by minimising a cross-validation estimate of the generalisation error. This allows to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms...
Adaptive Metric Kernel Regression
DEFF Research Database (Denmark)
Goutte, Cyril; Larsen, Jan
1998-01-01
Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression...... by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...
Fuzzy tree automata and syntactic pattern recognition.
Lee, E T
1982-04-01
An approach of representing patterns by trees and processing these trees by fuzzy tree automata is described. Fuzzy tree automata are defined and investigated. The results include that the class of fuzzy root-to-frontier recognizable ¿-trees is closed under intersection, union, and complementation. Thus, the class of fuzzy root-to-frontier recognizable ¿-trees forms a Boolean algebra. Fuzzy tree automata are applied to processing fuzzy tree representation of patterns based on syntactic pattern recognition. The grade of acceptance is defined and investigated. Quantitative measures of ``approximate isosceles triangle,'' ``approximate elongated isosceles triangle,'' ``approximate rectangle,'' and ``approximate cross'' are defined and used in the illustrative examples of this approach. By using these quantitative measures, a house, a house with high roof, and a church are also presented as illustrative examples. In addition, three fuzzy tree automata are constructed which have the capability of processing the fuzzy tree representations of ``fuzzy houses,'' ``houses with high roofs,'' and ``fuzzy churches,'' respectively. The results may have useful applications in pattern recognition, image processing, artificial intelligence, pattern database design and processing, image science, and pictorial information systems.
O'Reilly, Joseph E; Donoghue, Philip C J
2018-03-01
Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarise of the posterior distribution. Conversely, majority-rule consensus (MRC) trees represent a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data.
Tree dimension in verification of constrained Horn clauses
DEFF Research Database (Denmark)
Kafle, Bishoksan; Gallagher, John Patrick; Ganty, Pierre
2018-01-01
In this paper, we show how the notion of tree dimension can be used in the verification of constrained Horn clauses (CHCs). The dimension of a tree is a numerical measure of its branching complexity and the concept here applies to Horn clause derivation trees. Derivation trees of dimension zero c...... algorithms using these constructions to decompose a CHC verification problem. One variation of this decomposition considers derivations of successively increasing dimension. The paper includes descriptions of implementations and experimental results....
Allegheny County / City of Pittsburgh / Western PA Regional Data Center — Trees cared for and managed by the City of Pittsburgh Department of Public Works Forestry Division. Tree Benefits are calculated using the National Tree Benefit...
Prediction of strontium bromide laser efficiency using cluster and decision tree analysis
Directory of Open Access Journals (Sweden)
Iliev Iliycho
2018-01-01
Full Text Available Subject of investigation is a new high-powered strontium bromide (SrBr2 vapor laser emitting in multiline region of wavelengths. The laser is an alternative to the atom strontium lasers and electron free lasers, especially at the line 6.45 μm which line is used in surgery for medical processing of biological tissues and bones with minimal damage. In this paper the experimental data from measurements of operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract the more important relationships and dependences from the available data which influence the increase of the overall laser efficiency. There are constructed and analyzed a set of cluster models. It is shown by using different cluster methods that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others and laser efficiency are combined in 2 clusters. By the built regression tree models using Classification and Regression Trees (CART technique there are obtained dependences to predict the values of efficiency, and especially the maximum efficiency with over 95% accuracy.
Integrated fault tree development environment
International Nuclear Information System (INIS)
Dixon, B.W.
1986-01-01
Probabilistic Risk Assessment (PRA) techniques are utilized in the nuclear industry to perform safety analyses of complex defense-in-depth systems. A major effort in PRA development is fault tree construction. The Integrated Fault Tree Environment (IFTREE) is an interactive, graphics-based tool for fault tree design. IFTREE provides integrated building, editing, and analysis features on a personal workstation. The design philosophy of IFTREE is presented, and the interface is described. IFTREE utilizes a unique rule-based solution algorithm founded in artificial intelligence (AI) techniques. The impact of the AI approach on the program design is stressed. IFTREE has been developed to handle the design and maintenance of full-size living PRAs and is currently in use
DEFF Research Database (Denmark)
Hansen, Henrik; Tarp, Finn
2001-01-01
This paper examines the relationship between foreign aid and growth in real GDP per capita as it emerges from simple augmentations of popular cross country growth specifications. It is shown that aid in all likelihood increases the growth rate, and this result is not conditional on ‘good’ policy....... investment. We conclude by stressing the need for more theoretical work before this kind of cross-country regressions are used for policy purposes.......This paper examines the relationship between foreign aid and growth in real GDP per capita as it emerges from simple augmentations of popular cross country growth specifications. It is shown that aid in all likelihood increases the growth rate, and this result is not conditional on ‘good’ policy...
Efficient FPT Algorithms for (Strict) Compatibility of Unrooted Phylogenetic Trees.
Baste, Julien; Paul, Christophe; Sau, Ignasi; Scornavacca, Celine
2017-04-01
In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species X; these relationships are often depicted via a phylogenetic tree-a tree having its leaves labeled bijectively by elements of X and without degree-2 nodes-called the "species tree." One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g., DNA sequences originating from some species in X), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping-but not identical-sets of labels, is called "supertree." In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in the graph community. Both problems are known to be fixed parameter tractable in the number of input trees k, by using their expressibility in monadic second-order logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on k of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time [Formula: see text], where n is the total size of the input.
Fast Tree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix
Energy Technology Data Exchange (ETDEWEB)
N. Price, Morgan; S. Dehal, Paramvir; P. Arkin, Adam
2009-07-31
Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O( NLa + N sqrt(N) ) memory and O( N sqrt(N) log(N) L a ) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
Analyzing and synthesizing phylogenies using tree alignment graphs.
Directory of Open Access Journals (Sweden)
Stephen A Smith
Full Text Available Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG. The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertrees approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees, we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to
Analyzing and synthesizing phylogenies using tree alignment graphs.
Smith, Stephen A; Brown, Joseph W; Hinchliff, Cody E
2013-01-01
Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertrees approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe.
Modified Regression Correlation Coefficient for Poisson Regression Model
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).
MeshTree: A Delay optimised Overlay Multicast Tree Building Protocol
Tan, Su-Wei; Waters, A. Gill; Crawford, John
2005-01-01
We study decentralised low delay degree-constrained overlay multicast tree construction for single source real-time applications. This optimisation problem is NP-hard even if computed centrally. We identify two problems in traditional distributed solutions, namely the greedy problem and delay-cost trade-off. By offering solutions to these problems, we propose a new self-organising distributed tree building protocol called MeshTree. The main idea is to embed the delivery tree in a degree-bound...
A method to study response of large trees to different amounts of available soil water
D.H. Marx; Shi-Jean S. Sung; J.S. Cunningham; M.D. Thompson; L.M. White
1995-01-01
A method was developed to manipulate available soil water on large trees by intercepting thrufall with gutters placed under tree canopies and irrigating the intercepted thrufall onto other trees. With this design, trees were exposed for 2 years to either 25% less thrufall, normal thrufall, or 25% additional thrufall.Undercanopy construction in these plots moderately...
Luo, Chongliang; Liu, Jin; Dey, Dipak K; Chen, Kun
2016-07-01
In many fields, multi-view datasets, measuring multiple distinct but interrelated sets of characteristics on the same set of subjects, together with data on certain outcomes or phenotypes, are routinely collected. The objective in such a problem is often two-fold: both to explore the association structures of multiple sets of measurements and to develop a parsimonious model for predicting the future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing between the association strength of the canonical variates and their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously to enable the examination of their joint effects on the outcomes, and is able to handle multivariate and non-Gaussian outcomes. An efficient algorithm based on variable splitting and Lagrangian multipliers is proposed. Simulation studies show the superior performance of the proposed approach. We demonstrate the effectiveness of the proposed approach in an [Formula: see text] intercross mice study and an alcohol dependence study. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Categorizing ideas about trees: a tree of trees.
Fisler, Marie; Lecointre, Guillaume
2013-01-01
The aim of this study is to explore whether matrices and MP trees used to produce systematic categories of organisms could be useful to produce categories of ideas in history of science. We study the history of the use of trees in systematics to represent the diversity of life from 1766 to 1991. We apply to those ideas a method inspired from coding homologous parts of organisms. We discretize conceptual parts of ideas, writings and drawings about trees contained in 41 main writings; we detect shared parts among authors and code them into a 91-characters matrix and use a tree representation to show who shares what with whom. In other words, we propose a hierarchical representation of the shared ideas about trees among authors: this produces a "tree of trees." Then, we categorize schools of tree-representations. Classical schools like "cladists" and "pheneticists" are recovered but others are not: "gradists" are separated into two blocks, one of them being called here "grade theoreticians." We propose new interesting categories like the "buffonian school," the "metaphoricians," and those using "strictly genealogical classifications." We consider that networks are not useful to represent shared ideas at the present step of the study. A cladogram is made for showing who is sharing what with whom, but also heterobathmy and homoplasy of characters. The present cladogram is not modelling processes of transmission of ideas about trees, and here it is mostly used to test for proximity of ideas of the same age and for categorization.
The space of ultrametric phylogenetic trees.
Gavryushkin, Alex; Drummond, Alexei J
2016-08-21
The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. Particularly, that the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
E. Gregory McPherson; Paula J. Peper
2012-01-01
This paper describes three long-term tree growth studies conducted to evaluate tree performance because repeated measurements of the same trees produce critical data for growth model calibration and validation. Several empirical and process-based approaches to modeling tree growth are reviewed. Modeling is more advanced in the fields of forestry and...
Kevin T. Smith
2009-01-01
Landscape trees have real value and contribute to making livable communities. Making the most of that value requires providing trees with the proper care and attention. As potentially large and long-lived organisms, trees benefit from commitment to regular care that respects the natural tree system. This system captures, transforms, and uses energy to survive, grow,...
Regularized Label Relaxation Linear Regression.
Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung; Fang, Bingwu
2018-04-01
Linear regression (LR) and some of its variants have been widely used for classification problems. Most of these methods assume that during the learning phase, the training samples can be exactly transformed into a strict binary label matrix, which has too little freedom to fit the labels adequately. To address this problem, in this paper, we propose a novel regularized label relaxation LR method, which has the following notable characteristics. First, the proposed method relaxes the strict binary label matrix into a slack variable matrix by introducing a nonnegative label relaxation matrix into LR, which provides more freedom to fit the labels and simultaneously enlarges the margins between different classes as much as possible. Second, the proposed method constructs the class compactness graph based on manifold learning and uses it as the regularization item to avoid the problem of overfitting. The class compactness graph is used to ensure that the samples sharing the same labels can be kept close after they are transformed. Two different algorithms, which are, respectively, based on -norm and -norm loss functions are devised. These two algorithms have compact closed-form solutions in each iteration so that they are easily implemented. Extensive experiments show that these two algorithms outperform the state-of-the-art algorithms in terms of the classification accuracy and running time.
Confidence bands for inverse regression models
International Nuclear Information System (INIS)
Birke, Melanie; Bissantz, Nicolai; Holzmann, Hajo
2010-01-01
We construct uniform confidence bands for the regression function in inverse, homoscedastic regression models with convolution-type operators. Here, the convolution is between two non-periodic functions on the whole real line rather than between two periodic functions on a compact interval, since the former situation arguably arises more often in applications. First, following Bickel and Rosenblatt (1973 Ann. Stat. 1 1071–95) we construct asymptotic confidence bands which are based on strong approximations and on a limit theorem for the supremum of a stationary Gaussian process. Further, we propose bootstrap confidence bands based on the residual bootstrap and prove consistency of the bootstrap procedure. A simulation study shows that the bootstrap confidence bands perform reasonably well for moderate sample sizes. Finally, we apply our method to data from a gel electrophoresis experiment with genetically engineered neuronal receptor subunits incubated with rat brain extract
Validating MEDIQUAL Constructs
Lee, Sang-Gun; Min, Jae H.
In this paper, we validate MEDIQUAL constructs through the different media users in help desk service. In previous research, only two end-users' constructs were used: assurance and responsiveness. In this paper, we extend MEDIQUAL constructs to include reliability, empathy, assurance, tangibles, and responsiveness, which are based on the SERVQUAL theory. The results suggest that: 1) five MEDIQUAL constructs are validated through the factor analysis. That is, importance of the constructs have relatively high correlations between measures of the same construct using different methods and low correlations between measures of the constructs that are expected to differ; and 2) five MEDIQUAL constructs are statistically significant on media users' satisfaction in help desk service by regression analysis.
Polynomial regression analysis and significance test of the regression function
International Nuclear Information System (INIS)
Gao Zhengming; Zhao Juan; He Shengping
2012-01-01
In order to analyze the decay heating power of a certain radioactive isotope per kilogram with polynomial regression method, the paper firstly demonstrated the broad usage of polynomial function and deduced its parameters with ordinary least squares estimate. Then significance test method of polynomial regression function is derived considering the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and significance test of the polynomial function are done to the decay heating power of the iso tope per kilogram in accord with the authors' real work. (authors)
Live phylogeny with polytomies: Finding the most compact parsimonious trees.
Papamichail, D; Huang, A; Kennedy, E; Ott, J-L; Miller, A; Papamichail, G
2017-08-01
Construction of phylogenetic trees has traditionally focused on binary trees where all species appear on leaves, a problem for which numerous efficient solutions have been developed. Certain application domains though, such as viral evolution and transmission, paleontology, linguistics, and phylogenetic stemmatics, often require phylogeny inference that involves placing input species on ancestral tree nodes (live phylogeny), and polytomies. These requirements, despite their prevalence, lead to computationally harder algorithmic solutions and have been sparsely examined in the literature to date. In this article we prove some unique properties of most parsimonious live phylogenetic trees with polytomies, and their mapping to traditional binary phylogenetic trees. We show that our problem reduces to finding the most compact parsimonious tree for n species, and describe a novel efficient algorithm to find such trees without resorting to exhaustive enumeration of all possible tree topologies. Copyright © 2017 Elsevier Ltd. All rights reserved.
Algorithms for optimal dyadic decision trees
Energy Technology Data Exchange (ETDEWEB)
Hush, Don [Los Alamos National Laboratory; Porter, Reid [Los Alamos National Laboratory
2009-01-01
A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant trees sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.
Recursive Algorithm For Linear Regression
Varanasi, S. V.
1988-01-01
Order of model determined easily. Linear-regression algorithhm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations, facilitates search for minimum order of linear-regression model fitting set of data satisfactory.
Cafts: computer aided fault tree analysis
International Nuclear Information System (INIS)
Poucet, A.
1985-01-01
The fault tree technique has become a standard tool for the analysis of safety and reliability of complex system. In spite of the costs, which may be high for a complete and detailed analysis of a complex plant, the fault tree technique is popular and its benefits are fully recognized. Due to this applications of these codes have mostly been restricted to simple academic examples and rarely concern complex, real world systems. In this paper an interactive approach to fault tree construction is presented. The aim is not to replace the analyst, but to offer him an intelligent tool which can assist him in modeling complex systems. Using the CAFTS-method, the analyst interactively constructs a fault tree in two phases: (1) In a first phase he generates an overall failure logic structure of the system; the macrofault tree. In this phase, CAFTS features an expert system approach to assist the analyst. It makes use of a knowledge base containing generic rules on the behavior of subsystems and components; (2) In a second phase the macrofault tree is further refined and transformed in a fully detailed and quantified fault tree. In this phase a library of plant-specific component failure models is used
Reconciliation of Gene and Species Trees
Directory of Open Access Journals (Sweden)
L. Y. Rusin
2014-01-01
Full Text Available The first part of the paper briefly overviews the problem of gene and species trees reconciliation with the focus on defining and algorithmic construction of the evolutionary scenario. Basic ideas are discussed for the aspects of mapping definitions, costs of the mapping and evolutionary scenario, imposing time scales on a scenario, incorporating horizontal gene transfers, binarization and reconciliation of polytomous trees, and construction of species trees and scenarios. The review does not intend to cover the vast diversity of literature published on these subjects. Instead, the authors strived to overview the problem of the evolutionary scenario as a central concept in many areas of evolutionary research. The second part provides detailed mathematical proofs for the solutions of two problems: (i inferring a gene evolution along a species tree accounting for various types of evolutionary events and (ii trees reconciliation into a single species tree when only gene duplications and losses are allowed. All proposed algorithms have a cubic time complexity and are mathematically proved to find exact solutions. Solving algorithms for problem (ii can be naturally extended to incorporate horizontal transfers, other evolutionary events, and time scales on the species tree.
Regularized multivariate regression models with skew-t error distributions
Chen, Lianfu; Pourahmadi, Mohsen; Maadooliat, Mehdi
2014-01-01
We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both
On algorithm for building of optimal α-decision trees
Alkhalid, Abdulaziz
2010-01-01
The paper describes an algorithm that constructs approximate decision trees (α-decision trees), which are optimal relatively to one of the following complexity measures: depth, total path length or number of nodes. The algorithm uses dynamic programming and extends methods described in [4] to constructing approximate decision trees. Adjustable approximation rate allows controlling algorithm complexity. The algorithm is applied to build optimal α-decision trees for two data sets from UCI Machine Learning Repository [1]. © 2010 Springer-Verlag Berlin Heidelberg.
Louis R Iverson; Anantha M. Prasad; Mark W. Schwartz; Mark W. Schwartz
2005-01-01
We predict current distribution and abundance for tree species present in eastern North America, and subsequently estimate potential suitable habitat for those species under a changed climate with 2 x CO2. We used a series of statistical models (i.e., Regression Tree Analysis (RTA), Multivariate Adaptive Regression Splines (MARS), Bagging Trees (...
On the book thickness of $k$-trees
Dujmović, Vida; Wood, David R.
2009-01-01
Graphs and Algorithms; International audience; Every k-tree has book thickness at most k + 1, and this bound is best possible for all k \\textgreater= 3. Vandenbussche et al. [SIAM J. Discrete Math., 2009] proved that every k-tree that has a smooth degree-3 tree decomposition with width k has book thickness at most k. We prove this result is best possible for k \\textgreater= 4, by constructing a k-tree with book thickness k + 1 that has a smooth degree-4 tree decomposition with width k. This s...
Mary Torsello; Toni McLellan
The goals of hazard tree management programs are to maximize public safety and maintain a healthy sustainable tree resource. Although hazard tree management frequently targets removal of trees or parts of trees that attract wildlife, it can take into account a diversity of tree values. With just a little extra planning, hazard tree management can be highly beneficial...
Transferability of decision trees for land cover classification in a ...
African Journals Online (AJOL)
This paper attempts to derive classification rules from training data of four Landsat-8 scenes by using the classification and regression tree (CART) implementation of the decision tree algorithm. The transferability of the ruleset was evaluated by classifying two adjacent scenes. The classification of the four mosaicked scenes ...
Developing Models to Forcast Sales of Natural Christmas Trees
Lawrence D. Garrett; Thomas H. Pendleton
1977-01-01
A study of practices for marketing Christmas trees in Winston-Salem, North Carolina, and Denver, Colorado, revealed that such factors as retail lot competition, tree price, consumer traffic, and consumer income were very important in determining a particular retailer's sales. Analyses of 4 years of market data were used in developing regression models for...
Modeling Caribbean tree stem diameters from tree height and crown width measurements
Thomas Brandeis; KaDonna Randolph; Mike Strub
2009-01-01
Regression models to predict diameter at breast height (DBH) as a function of tree height and maximum crown radius were developed for Caribbean forests based on data collected by the U.S. Forest Service in the Commonwealth of Puerto Rico and Territory of the U.S. Virgin Islands. The model predicting DBH from tree height fit reasonably well (R2 = 0.7110), with...
Regression in autistic spectrum disorders.
Stefanatos, Gerry A
2008-12-01
A significant proportion of children diagnosed with Autistic Spectrum Disorder experience a developmental regression characterized by a loss of previously-acquired skills. This may involve a loss of speech or social responsitivity, but often entails both. This paper critically reviews the phenomena of regression in autistic spectrum disorders, highlighting the characteristics of regression, age of onset, temporal course, and long-term outcome. Important considerations for diagnosis are discussed and multiple etiological factors currently hypothesized to underlie the phenomenon are reviewed. It is argued that regressive autistic spectrum disorders can be conceptualized on a spectrum with other regressive disorders that may share common pathophysiological features. The implications of this viewpoint are discussed.
DEFF Research Database (Denmark)
Baumbach, Jan; Guo, Jiong; Ibragimov, Rashid
2015-01-01
We study the tree edit distance problem with edge deletions and edge insertions as edit operations. We reformulate a special case of this problem as Covering Tree with Stars (CTS): given a tree T and a set of stars, can we connect the stars in by adding edges between them such that the resulting...... tree is isomorphic to T? We prove that in the general setting, CST is NP-complete, which implies that the tree edit distance considered here is also NP-hard, even when both input trees having diameters bounded by 10. We also show that, when the number of distinct stars is bounded by a constant k, CTS...
Linear regression in astronomy. I
Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh
1990-01-01
Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
Du, Hua Qiang; Sun, Xiao Yan; Han, Ning; Mao, Fang Jie
2017-10-01
By synergistically using the object-based image analysis (OBIA) and the classification and regression tree (CART) methods, the distribution information, the indexes (including diameter at breast, tree height, and crown closure), and the aboveground carbon storage (AGC) of moso bamboo forest in Shanchuan Town, Anji County, Zhejiang Province were investigated. The results showed that the moso bamboo forest could be accurately delineated by integrating the multi-scale ima ge segmentation in OBIA technique and CART, which connected the image objects at various scales, with a pretty good producer's accuracy of 89.1%. The investigation of indexes estimated by regression tree model that was constructed based on the features extracted from the image objects reached normal or better accuracy, in which the crown closure model archived the best estimating accuracy of 67.9%. The estimating accuracy of diameter at breast and tree height was relatively low, which was consistent with conclusion that estimating diameter at breast and tree height using optical remote sensing could not achieve satisfactory results. Estimation of AGC reached relatively high accuracy, and accuracy of the region of high value achieved above 80%.
2011-03-01
To minimize the severity of run-off-road collisions of vehicles with trees, departments of transportation (DOTs) : commonly establish clear zones for trees and other fixed objects. Caltrans clear zone on freeways is 30 feet : minimum (40 feet pref...
Buntine, Wray
1994-01-01
IND computer program introduces Bayesian and Markov/maximum-likelihood (MML) methods and more-sophisticated methods of searching in growing trees. Produces more-accurate class-probability estimates important in applications like diagnosis. Provides range of features and styles with convenience for casual user, fine-tuning for advanced user or for those interested in research. Consists of four basic kinds of routines: data-manipulation, tree-generation, tree-testing, and tree-display. Written in C language.
Linear regression in astronomy. II
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Time-adaptive quantile regression
DEFF Research Database (Denmark)
Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik
2008-01-01
and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power......An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method...... production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered....
Pitcher, Brandon; Alaqla, Ali; Noujeim, Marcel; Wealleans, James A; Kotsakis, Georgios; Chrepa, Vanessa
2017-03-01
Cone-beam computed tomographic (CBCT) analysis allows for 3-dimensional assessment of periradicular lesions and may facilitate preoperative periapical cyst screening. The purpose of this study was to develop and assess the predictive validity of a cyst screening method based on CBCT volumetric analysis alone or combined with designated radiologic criteria. Three independent examiners evaluated 118 presurgical CBCT scans from cases that underwent apicoectomies and had an accompanying gold standard histopathological diagnosis of either a cyst or granuloma. Lesion volume, density, and specific radiologic characteristics were assessed using specialized software. Logistic regression models with histopathological diagnosis as the dependent variable were constructed for cyst prediction, and receiver operating characteristic curves were used to assess the predictive validity of the models. A conditional inference binary decision tree based on a recursive partitioning algorithm was constructed to facilitate preoperative screening. Interobserver agreement was excellent for volume and density, but it varied from poor to good for the radiologic criteria. Volume and root displacement were strong predictors for cyst screening in all analyses. The binary decision tree classifier determined that if the volume of the lesion was >247 mm 3 , there was 80% probability of a cyst. If volume was cyst probability was 60% (78% accuracy). The good accuracy and high specificity of the decision tree classifier renders it a useful preoperative cyst screening tool that can aid in clinical decision making but not a substitute for definitive histopathological diagnosis after biopsy. Confirmatory studies are required to validate the present findings. Published by Elsevier Inc.
Minnesota's Forest Trees. Revised.
Miles, William R.; Fuller, Bruce L.
This bulletin describes 46 of the more common trees found in Minnesota's forests and windbreaks. The bulletin contains two tree keys, a summer key and a winter key, to help the reader identify these trees. Besides the two keys, the bulletin includes an introduction, instructions for key use, illustrations of leaf characteristics and twig…
DEFF Research Database (Denmark)
Brodal, Gerth Stølting; Sioutas, Spyros; Pantazos, Kostas
2015-01-01
We present a new overlay, called the Deterministic Decentralized tree (D2-tree). The D2-tree compares favorably to other overlays for the following reasons: (a) it provides matching and better complexities, which are deterministic for the supported operations; (b) the management of nodes (peers...
DEFF Research Database (Denmark)
Baumbach, Jan; Guo, Jian-Ying; Ibragimov, Rashid
2013-01-01
We study the tree edit distance problem with edge deletions and edge insertions as edit operations. We reformulate a special case of this problem as Covering Tree with Stars (CTS): given a tree T and a set of stars, can we connect the stars in by adding edges between them such that the resulting ...
Sweeney, Debra; Rounds, Judy
2011-01-01
Trees are great inspiration for artists. Many art teachers find themselves inspired and maybe somewhat obsessed with the natural beauty and elegance of the lofty tree, and how it changes through the seasons. One such tree that grows in several regions and always looks magnificent, regardless of the time of year, is the birch. In this article, the…
Retro-regression--another important multivariate regression improvement.
Randić, M
2001-01-01
We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.
Cheng, Ching-Wu; Leu, Sou-Sen; Cheng, Ying-Mei; Wu, Tsung-Chih; Lin, Chen-Chung
2012-09-01
Construction accident research involves the systematic sorting, classification, and encoding of comprehensive databases of injuries and fatalities. The present study explores the causes and distribution of occupational accidents in the Taiwan construction industry by analyzing such a database using the data mining method known as classification and regression tree (CART). Utilizing a database of 1542 accident cases during the period 2000-2009, the study seeks to establish potential cause-and-effect relationships regarding serious occupational accidents in the industry. The results of this study show that the occurrence rules for falls and collapses in both public and private project construction industries serve as key factors to predict the occurrence of occupational injuries. The results of the study provide a framework for improving the safety practices and training programs that are essential to protecting construction workers from occasional or unexpected accidents. Copyright © 2011 Elsevier Ltd. All rights reserved.
Quantile regression theory and applications
Davino, Cristina; Vistocco, Domenico
2013-01-01
A guide to the implementation and interpretation of Quantile Regression models This book explores the theory and numerous applications of quantile regression, offering empirical data analysis as well as the software tools to implement the methods. The main focus of this book is to provide the reader with a comprehensivedescription of the main issues concerning quantile regression; these include basic modeling, geometrical interpretation, estimation and inference for quantile regression, as well as issues on validity of the model, diagnostic tools. Each methodological aspect is explored and
TreePics: visualizing trees with pictures
Directory of Open Access Journals (Sweden)
Nicolas Puillandre
2017-09-01
Full Text Available While many programs are available to edit phylogenetic trees, associating pictures with branch tips in an efficient and automatic way is not an available option. Here, we present TreePics, a standalone software that uses a web browser to visualize phylogenetic trees in Newick format and that associates pictures (typically, pictures of the voucher specimens to the tip of each branch. Pictures are visualized as thumbnails and can be enlarged by a mouse rollover. Further, several pictures can be selected and displayed in a separate window for visual comparison. TreePics works either online or in a full standalone version, where it can display trees with several thousands of pictures (depending on the memory available. We argue that TreePics can be particularly useful in a preliminary stage of research, such as to quickly detect conflicts between a DNA-based phylogenetic tree and morphological variation, that may be due to contamination that needs to be removed prior to final analyses, or the presence of species complexes.
Distribution of cavity trees in midwestern old-growth and second-growth forests
Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson; David R. Larsen
2003-01-01
We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...
Smith, Paul F; Ganesh, Siva; Liu, Ping
2013-10-30
Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R(2) values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R(2) values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.
Tree rings and radiocarbon calibration
International Nuclear Information System (INIS)
Barbetti, M.
1999-01-01
Only a few kinds of trees in Australia and Southeast Asia are known to have growth rings that are both distinct and annual. Those that do are therefore extremely important to climatic and isotope studies. In western Tasmania, extensive work with Huon pine (Lagarostrobos franklinii) has shown that many living trees are more than 1,000 years old, and that their ring widths are sensitive to temperature, rainfall and cloud cover (Buckley et al. 1997). At the Stanley River there is a forest of living (and recently felled) trees which we have sampled and measured. There are also thousands of subfossil Huon pine logs, buried at depths less than 5 metres in an area of floodplain extending over a distance of more than a kilometre with a width of tens of metres. Some of these logs have been buried for 50,000 years or more, but most of them belong to the period between 15,000 years and the present. In previous expeditions in the 1980s and 1990s, we excavated and sampled about 350 logs (Barbetti et al. 1995; Nanson et al. 1995). By measuring the ring-width patterns, and matching them between logs and living trees, we have constructed a tree-ring dated chronology from 571 BC to AD 1992. We have also built a 4254-ring floating chronology (placed by radiocarbon at ca. 3580 to 7830 years ago), and an earlier 1268-ring chronology (ca. 7,580 to 8,850 years ago). There are many individuals, or pairs of logs which match and together span several centuries, at 9,000 years ago and beyond
Accurate phylogenetic tree reconstruction from quartets: a heuristic approach.
Reaz, Rezwana; Bayzid, Md Shamsuzzoha; Rahman, M Sohel
2014-01-01
Supertree methods construct trees on a set of taxa (species) combining many smaller trees on the overlapping subsets of the entire set of taxa. A 'quartet' is an unrooted tree over 4 taxa, hence the quartet-based supertree methods combine many 4-taxon unrooted trees into a single and coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have been receiving considerable attentions in the recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses), without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets.
Interspecific variation in tree seedlings establishment in canopy gaps in relation to tree density
Energy Technology Data Exchange (ETDEWEB)
Reader, R.J.; Bonser, S.P.; Duralia, T.E.; Bricker, B.D. [Guelph Univ., ON (Canada). Dept. of Botany
1995-10-01
We tested whether interspecific variation in tree seedling establishment in canopy gaps was significantly related to interspecific variation in tree density, for seven deciduous forest tree species (Quercus alba, Hamamelis virginiana, Acer rubrum, Sassafras albidum, Quercus rubra, Prunus serotina, Ostrya virginiana). For each species, seedling establishment was calculated as the difference in seedling density before experimental gap creation versus three years after gap creation. In each of the six experimentally-created gap types (33% or 66% removal of tree basal area from 0.01ha, 0.05ha or 0.20ha patches), differences in seedling establishment among species were significantly related to differences in their density in the tree canopy. A regression model with log{sub e} tree density as the independent variable accounted for between 93% and 98% of interspecific variation in seedling establishment. Our results provide empirical support for models of tree dynamics in gaps that assume seedling establishment depends on canopy tree density. 17 refs, 1 fig, 3 tabs
Bypassing BDD Construction for Reliability Analysis
DEFF Research Database (Denmark)
Williams, Poul Frederick; Nikolskaia, Macha; Rauzy, Antoine
2000-01-01
In this note, we propose a Boolean Expression Diagram (BED)-based algorithm to compute the minimal p-cuts of boolean reliability models such as fault trees. BEDs make it possible to bypass the Binary Decision Diagram (BDD) construction, which is the main cost of fault tree assessment....
Steiner trees for fixed orientation metrics
DEFF Research Database (Denmark)
Brazil, Marcus; Zachariasen, Martin
2009-01-01
We consider the problem of constructing Steiner minimum trees for a metric defined by a polygonal unit circle (corresponding to s = 2 weighted legal orientations in the plane). A linear-time algorithm to enumerate all angle configurations for degree three Steiner points is given. We provide...... a simple proof that the angle configuration for a Steiner point extends to all Steiner points in a full Steiner minimum tree, such that at most six orientations suffice for edges in a full Steiner minimum tree. We show that the concept of canonical forms originally introduced for the uniform orientation...... metric generalises to the fixed orientation metric. Finally, we give an O(s n) time algorithm to compute a Steiner minimum tree for a given full Steiner topology with n terminal leaves....
Panel Smooth Transition Regression Models
DEFF Research Database (Denmark)
González, Andrés; Terasvirta, Timo; Dijk, Dick van
We introduce the panel smooth transition regression model. This new model is intended for characterizing heterogeneous panels, allowing the regression coefficients to vary both across individuals and over time. Specifically, heterogeneity is allowed for by assuming that these coefficients are bou...
Testing discontinuities in nonparametric regression
Dai, Wenlin
2017-01-19
In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100
Testing discontinuities in nonparametric regression
Dai, Wenlin; Zhou, Yuejin; Tong, Tiejun
2017-01-01
In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100
Logistic Regression: Concept and Application
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Decision Tree Technique for Particle Identification
International Nuclear Information System (INIS)
Quiller, Ryan
2003-01-01
Particle identification based on measurements such as the Cerenkov angle, momentum, and the rate of energy loss per unit distance (-dE/dx) is fundamental to the BaBar detector for particle physics experiments. It is particularly important to separate the charged forms of kaons and pions. Currently, the Neural Net, an algorithm based on mapping input variables to an output variable using hidden variables as intermediaries, is one of the primary tools used for identification. In this study, a decision tree classification technique implemented in the computer program, CART, was investigated and compared to the Neural Net over the range of momenta, 0.25 GeV/c to 5.0 GeV/c. For a given subinterval of momentum, three decision trees were made using different sets of input variables. The sensitivity and specificity were calculated for varying kaon acceptance thresholds. This data was used to plot Receiver Operating Characteristic curves (ROC curves) to compare the performance of the classification methods. Also, input variables used in constructing the decision trees were analyzed. It was found that the Neural Net was a significant contributor to decision trees using dE/dx and the Cerenkov angle as inputs. Furthermore, the Neural Net had poorer performance than the decision tree technique, but tended to improve decision tree performance when used as an input variable. These results suggest that the decision tree technique using Neural Net input may possibly increase accuracy of particle identification in BaBar
Refining discordant gene trees.
Górecki, Pawel; Eulenstein, Oliver
2014-01-01
Evolutionary studies are complicated by discordance between gene trees and the species tree in which they evolved. Dealing with discordant trees often relies on comparison costs between gene and species trees, including the well-established Robinson-Foulds, gene duplication, and deep coalescence costs. While these costs have provided credible results for binary rooted gene trees, corresponding cost definitions for non-binary unrooted gene trees, which are frequently occurring in practice, are challenged by biological realism. We propose a natural extension of the well-established costs for comparing unrooted and non-binary gene trees with rooted binary species trees using a binary refinement model. For the duplication cost we describe an efficient algorithm that is based on a linear time reduction and also computes an optimal rooted binary refinement of the given gene tree. Finally, we show that similar reductions lead to solutions for computing the deep coalescence and the Robinson-Foulds costs. Our binary refinement of Robinson-Foulds, gene duplication, and deep coalescence costs for unrooted and non-binary gene trees together with the linear time reductions provided here for computing these costs significantly extends the range of trees that can be incorporated into approaches dealing with discordance.
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
International Nuclear Information System (INIS)
Leng Ling; Zhang Tianyi; Kleinman, Lawrence; Zhu Wei
2007-01-01
Regression analysis, especially the ordinary least squares method which assumes that errors are confined to the dependent variable, has seen a fair share of its applications in aerosol science. The ordinary least squares approach, however, could be problematic due to the fact that atmospheric data often does not lend itself to calling one variable independent and the other dependent. Errors often exist for both measurements. In this work, we examine two regression approaches available to accommodate this situation. They are orthogonal regression and geometric mean regression. Comparisons are made theoretically as well as numerically through an aerosol study examining whether the ratio of organic aerosol to CO would change with age
Stress Wave Propagation in Larch Plantation Trees-Numerical Simulation
Fenglu Liu; Fang Jiang; Xiping Wang; Houjiang Zhang; Wenhua Yu
2015-01-01
In this paper, we attempted to simulate stress wave propagation in virtual tree trunks and construct two dimensional (2D) wave-front maps in the longitudinal-radial section of the trunk. A tree trunk was modeled as an orthotropic cylinder in which wood properties along the fiber and in each of the two perpendicular directions were different. We used the COMSOL...
Stochastic development regression using method of moments
DEFF Research Database (Denmark)
Kühnel, Line; Sommer, Stefan Horst
2017-01-01
This paper considers the estimation problem arising when inferring parameters in the stochastic development regression model for manifold valued non-linear data. Stochastic development regression captures the relation between manifold-valued response and Euclidean covariate variables using...... the stochastic development construction. It is thereby able to incorporate several covariate variables and random effects. The model is intrinsically defined using the connection of the manifold, and the use of stochastic development avoids linearizing the geometry. We propose to infer parameters using...... the Method of Moments procedure that matches known constraints on moments of the observations conditional on the latent variables. The performance of the model is investigated in a simulation example using data on finite dimensional landmark manifolds....
Tumor regression patterns in retinoblastoma
International Nuclear Information System (INIS)
Zafar, S.N.; Siddique, S.N.; Zaheer, N.
2016-01-01
To observe the types of tumor regression after treatment, and identify the common pattern of regression in our patients. Study Design: Descriptive study. Place and Duration of Study: Department of Pediatric Ophthalmology and Strabismus, Al-Shifa Trust Eye Hospital, Rawalpindi, Pakistan, from October 2011 to October 2014. Methodology: Children with unilateral and bilateral retinoblastoma were included in the study. Patients were referred to Pakistan Institute of Medical Sciences, Islamabad, for chemotherapy. After every cycle of chemotherapy, dilated funds examination under anesthesia was performed to record response of the treatment. Regression patterns were recorded on RetCam II. Results: Seventy-four tumors were included in the study. Out of 74 tumors, 3 were ICRB group A tumors, 43 were ICRB group B tumors, 14 tumors belonged to ICRB group C, and remaining 14 were ICRB group D tumors. Type IV regression was seen in 39.1% (n=29) tumors, type II in 29.7% (n=22), type III in 25.6% (n=19), and type I in 5.4% (n=4). All group A tumors (100%) showed type IV regression. Seventeen (39.5%) group B tumors showed type IV regression. In group C, 5 tumors (35.7%) showed type II regression and 5 tumors (35.7%) showed type IV regression. In group D, 6 tumors (42.9%) regressed to type II non-calcified remnants. Conclusion: The response and success of the focal and systemic treatment, as judged by the appearance of different patterns of tumor regression, varies with the ICRB grouping of the tumor. (author)
On Tree Pattern Matching by Pushdown Automata
Directory of Open Access Journals (Sweden)
T. Flouri
2009-01-01
Full Text Available Tree pattern matching is an important operation in Computer Science on which a number of tasks such as mechanical theorem proving, term-rewriting, symbolic computation and non-procedural programming languages are based on. Work has begun on a systematic approach to the construction of tree pattern matchers by deterministic pushdown automata which read subject trees in prefix notation. The method is analogous to the construction of string pattern matchers: for given patterns, a non-deterministic pushdown automaton is created and then it is determinised. In this first paper, we present the proposed non-deterministic pushdown automaton which will serve as a basis for the determinisation process, and prove its correctness.
Shi, Huilan; Jia, Junya; Li, Dong; Wei, Li; Shang, Wenya; Zheng, Zhenfeng
2018-02-09
Precise renal histopathological diagnosis will guide therapy strategy in patients with lupus nephritis. Blood oxygen level dependent (BOLD) magnetic resonance imaging (MRI) has been applicable noninvasive technique in renal disease. This current study was performed to explore whether BOLD MRI could contribute to diagnose renal pathological pattern. Adult patients with lupus nephritis renal pathological diagnosis were recruited for this study. Renal biopsy tissues were assessed based on the lupus nephritis ISN/RPS 2003 classification. The Blood oxygen level dependent magnetic resonance imaging (BOLD-MRI) was used to obtain functional magnetic resonance parameter, R2* values. Several functions of R2* values were calculated and used to construct algorithmic models for renal pathological patterns. In addition, the algorithmic models were compared as to their diagnostic capability. Both Histopathology and BOLD MRI were used to examine a total of twelve patients. Renal pathological patterns included five classes III (including 3 as class III + V) and seven classes IV (including 4 as class IV + V). Three algorithmic models, including decision tree, line discriminant, and logistic regression, were constructed to distinguish the renal pathological pattern of class III and class IV. The sensitivity of the decision tree model was better than that of the line discriminant model (71.87% vs 59.48%, P decision tree model was equivalent to that of the line discriminant model (63.87% vs 63.73%, P = 0.939) and higher than that of the logistic regression model (63.87% vs 38.0%, P decision tree model was greater than that of the line discriminant model (0.765 vs 0.629, P Decision tree models constructed using functions of R2* values may facilitate the prediction of renal pathological patterns.
Regression to Causality : Regression-style presentation influences causal attribution
DEFF Research Database (Denmark)
Bordacconi, Mats Joe; Larsen, Martin Vinæs
2014-01-01
of equivalent results presented as either regression models or as a test of two sample means. Our experiment shows that the subjects who were presented with results as estimates from a regression model were more inclined to interpret these results causally. Our experiment implies that scholars using regression...... models – one of the primary vehicles for analyzing statistical results in political science – encourage causal interpretation. Specifically, we demonstrate that presenting observational results in a regression model, rather than as a simple comparison of means, makes causal interpretation of the results...... more likely. Our experiment drew on a sample of 235 university students from three different social science degree programs (political science, sociology and economics), all of whom had received substantial training in statistics. The subjects were asked to compare and evaluate the validity...
Decision tree approach for classification of remotely sensed satellite
Indian Academy of Sciences (India)
DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, open source ...
Topological variation in single-gene phylogenetic trees
Castresana, Jose
2007-01-01
A recent large-scale phylogenomic study has shown the great degree of topological variation that can be found among eukaryotic phylogenetic trees constructed from single genes, highlighting the problems that can be associated with gene sampling in phylogenetic studies.
Regression analysis with categorized regression calibrated exposure: some interesting findings
Directory of Open Access Journals (Sweden)
Hjartåker Anette
2006-07-01
Full Text Available Abstract Background Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC. Results In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis. Conclusion Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a
Poorter, L.
2008-01-01
Background and Aims: The volume of tree stems is made up of three components: solid wood, gas and water. These components have important consequences for the construction costs, strength and stability of trees. Here, the importance of stem components for sapling growth and survival in the field was
Investigating Tree Thinking & Ancestry with Cladograms
Davenport, K. D.; Milks, Kirstin Jane; Van Tassell, Rebecca
2015-01-01
Interpreting cladograms is a key skill for biological literacy. In this lesson, students interpret cladograms based on familial relationships and language relationships to build their understanding of tree thinking and to construct a definition of "common ancestor." These skills can then be applied to a true biological cladogram.
A new coding algorithm for trees
Balakirsky, V.B.
2002-01-01
We construct a one-to-one mapping between binary vectors of length $n$ and preorder codewords of regular, ordered, oriented, rooted, binary trees having $N \\approx n + 2$ log $n$ nodes. The mappings in both directions can be organized in such a way that complexities of all transformations are
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
Logic regression and its extensions.
Schwender, Holger; Ruczinski, Ingo
2010-01-01
Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors, when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus, reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, and thus, logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.
Belitsky, A. V.
2018-04-01
The form factor program for the regularized space-time S-matrix in planar maximally supersymmetric gauge theory, known as the pentagon operator product expansion, is formulated in terms of flux-tube excitations propagating on a dual two-dimensional world-sheet, whose dynamics is known exactly as a function of 't Hooft coupling. Both MHV and non-MHV amplitudes are described in a uniform, systematic fashion within this framework, with the difference between the two encoded in coupling-dependent helicity form factors expressed via Zhukowski variables. The nontrivial SU(4) tensor structure of flux-tube transitions is coupling independent and is known for any number of charged excitations from solutions of a system of Watson and Mirror equations. This description allows one to resum the infinite series of form factors and recover the space-time S-matrix exactly in kinematical variables at a given order of perturbation series. Recently, this was done for the hexagon. Presently, we successfully perform resummation for the seven-leg tree NMHV amplitude. To this end, we construct the flux-tube integrands of the fifteen independent Grassmann component of the heptagon with an infinite number of small fermion-antifermion pairs accounted for in NMHV two-channel conformal blocks.
these systems can improve water quality, engineers and scientists construct systems that replicate the functions of natural wetlands. Constructed wetlands are treatment systems that use natural processes
Abstract Expression Grammar Symbolic Regression
Korns, Michael F.
This chapter examines the use of Abstract Expression Grammars to perform the entire Symbolic Regression process without the use of Genetic Programming per se. The techniques explored produce a symbolic regression engine which has absolutely no bloat, which allows total user control of the search space and output formulas, which is faster, and more accurate than the engines produced in our previous papers using Genetic Programming. The genome is an all vector structure with four chromosomes plus additional epigenetic and constraint vectors, allowing total user control of the search space and the final output formulas. A combination of specialized compiler techniques, genetic algorithms, particle swarm, aged layered populations, plus discrete and continuous differential evolution are used to produce an improved symbolic regression sytem. Nine base test cases, from the literature, are used to test the improvement in speed and accuracy. The improved results indicate that these techniques move us a big step closer toward future industrial strength symbolic regression systems.
Quantile Regression With Measurement Error
Wei, Ying; Carroll, Raymond J.
2009-01-01
. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a
IND - THE IND DECISION TREE PACKAGE
Buntine, W.
1994-01-01
A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is delegated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets. The
From Rasch scores to regression
DEFF Research Database (Denmark)
Christensen, Karl Bang
2006-01-01
Rasch models provide a framework for measurement and modelling latent variables. Having measured a latent variable in a population a comparison of groups will often be of interest. For this purpose the use of observed raw scores will often be inadequate because these lack interval scale propertie....... This paper compares two approaches to group comparison: linear regression models using estimated person locations as outcome variables and latent regression models based on the distribution of the score....
Testing Heteroscedasticity in Robust Regression
Czech Academy of Sciences Publication Activity Database
Kalina, Jan
2011-01-01
Roč. 1, č. 4 (2011), s. 25-28 ISSN 2045-3345 Grant - others:GA ČR(CZ) GA402/09/0557 Institutional research plan: CEZ:AV0Z10300504 Keywords : robust regression * heteroscedasticity * regression quantiles * diagnostics Subject RIV: BB - Applied Statistics , Operational Research http://www.researchjournals.co.uk/documents/Vol4/06%20Kalina.pdf
Regression methods for medical research
Tai, Bee Choo
2013-01-01
Regression Methods for Medical Research provides medical researchers with the skills they need to critically read and interpret research using more advanced statistical methods. The statistical requirements of interpreting and publishing in medical journals, together with rapid changes in science and technology, increasingly demands an understanding of more complex and sophisticated analytic procedures.The text explains the application of statistical models to a wide variety of practical medical investigative studies and clinical trials. Regression methods are used to appropriately answer the
Forecasting with Dynamic Regression Models
Pankratz, Alan
2012-01-01
One of the most widely used tools in statistical forecasting, single equation regression models is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the auto correlation patterns of regression disturbance. It also includes six case studies.
Spanning k-ended trees of 3-regular connected graphs
Directory of Open Access Journals (Sweden)
Hamed Ghasemian Zoeram
2017-10-01
Full Text Available A vertex of degree one is called an end-vertex and the set of end-vertices of G is denoted by End(G. For a positive integer k, a tree T be called k-ended tree if $|End(T| \\leq k$. In this paper, we obtain sufficient conditions for spanning k-trees of 3-regular connected graphs. We give a construction sequence of graphs satisfying the condition. At the end, we present a conjecture about spanning k-ended trees of 3-regular connected graphs.
Coded Splitting Tree Protocols
DEFF Research Database (Denmark)
Sørensen, Jesper Hemming; Stefanovic, Cedomir; Popovski, Petar
2013-01-01
This paper presents a novel approach to multiple access control called coded splitting tree protocol. The approach builds on the known tree splitting protocols, code structure and successive interference cancellation (SIC). Several instances of the tree splitting protocol are initiated, each...... instance is terminated prematurely and subsequently iterated. The combined set of leaves from all the tree instances can then be viewed as a graph code, which is decodable using belief propagation. The main design problem is determining the order of splitting, which enables successful decoding as early...
Morocco - Fruit Tree Productivity
Millennium Challenge Corporation — Date Tree Irrigation Project: The specific objectives of this evaluation are threefold: - Performance evaluation of project activities, like the mid-term evaluation,...
Wood density variation and tree ring distinctness in Gmelina arborea trees by x-ray densitometry
Directory of Open Access Journals (Sweden)
Roger Moya
2009-03-01
Full Text Available Due to its relationship with other properties, wood density is the main wood quality parameter. Modern, accuratemethods such as X-ray densitometry - are applied to determine the spatial distribution of density in wood sections and to evaluatewood quality. The objectives of this study were to determinate the influence of growing conditions on wood density variation andtree ring demarcation of gmelina trees from fast growing plantations in Costa Rica. The wood density was determined by X-raydensitometry method. Wood samples were cut from gmelina trees and were exposed to low X-rays. The radiographic films weredeveloped and scanned using a 256 gray scale with 1000 dpi resolution and the wood density was determined by CRAD and CERDsoftware. The results showed tree-ring boundaries were distinctly delimited in trees growing in site with rainfall lower than 2510 mm/year. It was demonstrated that tree age, climatic conditions and management of plantation affects wood density and its variability. Thespecific effect of variables on wood density was quantified by for multiple regression method. It was determined that tree yearexplained 25.8% of the total variation of density and 19.9% were caused by climatic condition where the tree growing. Wood densitywas less affected by the intensity of forest management with 5.9% of total variation.
A review of tree root conflicts with sidewalks, curbs, and roads
T.B. Randrup; E.G. McPherson; L.R. Costello
2003-01-01
Literature relevant to tree root and urban infrastructure conflicts is reviewed. Although tree roots can conflict with many infrastructure elements, sidewalk and curb conflicts are the focus of this review. Construction protocols, urban soils, root growth, and causal factors (soil conditions, limited planting space, tree size, variation in root architecture, management...
The balance of planting and mortality in a street tree population
Lara A. Roman; John J. Battles; Joe R. McBride
2013-01-01
Street trees have aesthetic, environmental, human health, and economic benefits in urban ecosystems. Street tree populations are constructed by cycles of planting, growth, death, removal and replacement. The goals of this study were to understand how tree mortality and planting rates affect net population growth, evaluate the shape of the mortality curve, and assess...
Use of fault and decision tree analyses to protect against industrial sabotage
International Nuclear Information System (INIS)
Fullwood, R.R.; Erdmann, R.C.
1975-01-01
Fault tree and decision tree analyses provide systematic bases for evaluation of safety systems and procedures. Heuristically, this paper shows applications of these methods for industrial sabotage analysis at a reprocessing plant. Fault trees constructed by ''leak path'' analysis for completeness through path inventory. The escape fault tree is readily developed by this method and using the reciprocal character of the trees, the attack fault tree is constructed. After construction, the events on the fault tree are corrected for their nonreciprocal character. The fault trees are algebraically solved and the protection that is afforded is ranked by the number of barriers that must be penetrated. No attempt is made to assess the barrier penetration probabilities or penetration time duration. Event trees are useful for dynamic plant protection analysis through their time-sequencing character. To illustrate their usefulness, a simple attack scenario is devised and event-tree analyzed. Two saboteur success paths and 21 failure paths are found. This example clearly shows the event tree usefulness for concisely presenting the time sequencing of key decision points. However, event trees have the disadvantage of being scenario dependent, therefore requiring a separate event tree for each scenario
Bayesian Inference of a Multivariate Regression Model
Directory of Open Access Journals (Sweden)
Marick S. Sinay
2014-01-01
Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.
Pellicer, Eugenio; Teixeira, José C; Moura, Helder P; Catalá, Joaquín
2014-01-01
The management of construction projects is a wide ranging and challenging discipline in an increasingly international industry, facing continual challenges and demands for improvements in safety, in quality and cost control, and in the avoidance of contractual disputes. Construction Management grew out of a Leonardo da Vinci project to develop a series of Common Learning Outcomes for European Managers in Construction. Financed by the European Union, the project aimed to develop a library of basic materials for developing construction management skills for use in a pan-European context. Focused exclusively on the management of the construction phase of a building project from the contractor's point of view, Construction Management covers the complete range of topics of which mastery is required by the construction management professional for the effective delivery of new construction projects. With the continued internationalisation of the construction industry, Construction Management will be required rea...
Tree-Based Unrooted Phylogenetic Networks.
Francis, A; Huber, K T; Moulton, V
2018-02-01
Phylogenetic networks are a generalization of phylogenetic trees that are used to represent non-tree-like evolutionary histories that arise in organisms such as plants and bacteria, or uncertainty in evolutionary histories. An unrooted phylogenetic network on a non-empty, finite set X of taxa, or network, is a connected, simple graph in which every vertex has degree 1 or 3 and whose leaf set is X. It is called a phylogenetic tree if the underlying graph is a tree. In this paper we consider properties of tree-based networks, that is, networks that can be constructed by adding edges into a phylogenetic tree. We show that although they have some properties in common with their rooted analogues which have recently drawn much attention in the literature, they have some striking differences in terms of both their structural and computational properties. We expect that our results could eventually have applications to, for example, detecting horizontal gene transfer or hybridization which are important factors in the evolution of many organisms.
Logistic regression for dichotomized counts.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
2016-12-01
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
Kevin T. Smith
2009-01-01
Trees and tree care can capture the best of people's motivations and intentions. Trees are living memorials that help communities heal at sites of national tragedy, such as Oklahoma City and the World Trade Center. We mark the places of important historical events by the trees that grew nearby even if the original tree, such as the Charter Oak in Connecticut or...
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Decision tree methods: applications for classification and prediction.
Song, Yan-Yan; Lu, Ying
2015-04-25
Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. Using the training dataset to build a decision tree model and a validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms used to develop decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
Producing The New Regressive Left
DEFF Research Database (Denmark)
Crone, Christine
members, this thesis investigates a growing political trend and ideological discourse in the Arab world that I have called The New Regressive Left. On the premise that a media outlet can function as a forum for ideology production, the thesis argues that an analysis of this material can help to trace...... the contexture of The New Regressive Left. If the first part of the thesis lays out the theoretical approach and draws the contextual framework, through an exploration of the surrounding Arab media-and ideoscapes, the second part is an analytical investigation of the discourse that permeates the programmes aired...... becomes clear from the analytical chapters is the emergence of the new cross-ideological alliance of The New Regressive Left. This emerging coalition between Shia Muslims, religious minorities, parts of the Arab Left, secular cultural producers, and the remnants of the political,strategic resistance...
International Nuclear Information System (INIS)
Mukhamedov, Farrukh; Saburov, Mansoor
2010-06-01
In the present paper we study forward Quantum Markov Chains (QMC) defined on a Cayley tree. Using the tree structure of graphs, we give a construction of quantum Markov chains on a Cayley tree. By means of such constructions we prove the existence of a phase transition for the XY-model on a Cayley tree of order three in QMC scheme. By the phase transition we mean the existence of two distinct QMC for the given family of interaction operators {K }. (author)
A Matlab program for stepwise regression
Directory of Open Access Journals (Sweden)
Yanhong Qi
2016-03-01
Full Text Available The stepwise linear regression is a multi-variable regression for identifying statistically significant variables in the linear regression equation. In present study, we presented the Matlab program of stepwise regression.
Correlation and simple linear regression.
Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G
2003-06-01
In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
Nonparametric Mixture of Regression Models.
Huang, Mian; Li, Runze; Wang, Shaoli
2013-07-01
Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We study the identifiability issue of the proposed models, and develop an estimation procedure by employing kernel regression. We further systematically study the sampling properties of the proposed estimators, and establish their asymptotic normality. A modified EM algorithm is proposed to carry out the estimation procedure. We show that our algorithm preserves the ascent property of the EM algorithm in an asymptotic sense. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure. An empirical analysis of the US house price index data is illustrated for the proposed methodology.
International Nuclear Information System (INIS)
Kalay, Z; Ben-Naim, E
2015-01-01
We study fragmentation of a random recursive tree into a forest by repeated removal of nodes. The initial tree consists of N nodes and it is generated by sequential addition of nodes with each new node attaching to a randomly-selected existing node. As nodes are removed from the tree, one at a time, the tree dissolves into an ensemble of separate trees, namely, a forest. We study statistical properties of trees and nodes in this heterogeneous forest, and find that the fraction of remaining nodes m characterizes the system in the limit N→∞. We obtain analytically the size density ϕ s of trees of size s. The size density has power-law tail ϕ s ∼s −α with exponent α=1+(1/m). Therefore, the tail becomes steeper as further nodes are removed, and the fragmentation process is unusual in that exponent α increases continuously with time. We also extend our analysis to the case where nodes are added as well as removed, and obtain the asymptotic size density for growing trees. (paper)
J.R. Simpson; E.G. McPherson
2011-01-01
Urban trees can produce a number of benefits, among them improved air quality. Biogenic volatile organic compounds (BVOCs) emitted by some species are ozone precursors. Modifying future tree planting to favor lower-emitting species can reduce these emissions and aid air management districts in meeting federally mandated emissions reductions for these compounds. Changes...
L. Linsen; B.J. Karis; E.G. McPherson; B. Hamann
2005-01-01
In computer graphics, models describing the fractal branching structure of trees typically exploit the modularity of tree structures. The models are based on local production rules, which are applied iteratively and simultaneously to create a complex branching system. The objective is to generate three-dimensional scenes of often many realistic- looking and non-...
Indian Academy of Sciences (India)
Adansonia digitata L. ( The Baobab Tree) of Bombacaceae is a tree with swollen trunk that attains a dia. of 10m. Leaves are digitately compound with leaflets up to 18cm. long. Flowers are large, solitary, waxy white, and open at dusk. They open in 30 seconds and are bat pollinated. Stamens are many. Fruit is about 30 cm ...
Tree biology and dendrochemistry
Kevin T. Smith; Walter C. Shortle
1996-01-01
Dendrochemistry, the interpretation of elemental analysis of dated tree rings, can provide a temporal record of environmental change. Using the dendrochemical record requires an understanding of tree biology. In this review, we pose four questions concerning assumptions that underlie recent dendrochemical research: 1) Does the chemical composition of the wood directly...
Harvey A. Holt
1989-01-01
Controlling individual unwanted trees in forest stands is a readily accepted method for improving the value of future harvests. The practice is especially important in mixed hardwood forests where species differ considerably in value and within species individual trees differ in quality. Individual stem control is a mechanical or chemical weeding operation that...
Dettenmaier, Megan; Kuhns, Michael; Unger, Bethany; McAvoy, Darren
2017-01-01
This fact sheet describes the complex relationship between forests and climate change based on current research. It explains ways that trees can mitigate some of the risks associated with climate change. It details the impacts that forests are having on the changing climate and discuss specific ways that trees can be used to reduce or counter carbon emissions directly and indirectly.
Structural Equation Model Trees
Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman
2013-01-01
In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree…
Matching Subsequences in Trees
DEFF Research Database (Denmark)
Bille, Philip; Gørtz, Inge Li
2009-01-01
Given two rooted, labeled trees P and T the tree path subsequence problem is to determine which paths in P are subsequences of which paths in T. Here a path begins at the root and ends at a leaf. In this paper we propose this problem as a useful query primitive for XML data, and provide new...
Comparison of Greedy Algorithms for Decision Tree Optimization
Alkhalid, Abdulaziz
2013-01-01
This chapter is devoted to the study of 16 types of greedy algorithms for decision tree construction. The dynamic programming approach is used for construction of optimal decision trees. Optimization is performed relative to minimal values of average depth, depth, number of nodes, number of terminal nodes, and number of nonterminal nodes of decision trees. We compare average depth, depth, number of nodes, number of terminal nodes and number of nonterminal nodes of constructed trees with minimum values of the considered parameters obtained based on a dynamic programming approach. We report experiments performed on data sets from UCI ML Repository and randomly generated binary decision tables. As a result, for depth, average depth, and number of nodes we propose a number of good heuristics. © Springer-Verlag Berlin Heidelberg 2013.
Dynamics in small worlds of tree topologies of wireless sensor networks
DEFF Research Database (Denmark)
Li, Qiao; Zhang, Baihai; Fan, Zhun
2012-01-01
Tree topologies, which construct spatial graphs with large characteristic path lengths and small clustering coefficients, are ubiquitous in deployments of wireless sensor networks. Small worlds are investigated in tree-based networks. Due to link additions, characteristic path lengths reduce...... rapidly and clustering coefficients increase greatly. A tree abstract, Cayley tree, is considered for the study of the navigation algorithm, which runs automatically in the small worlds of tree-based networks. In the further study, epidemics in the small worlds of tree-based wireless sensor networks...
Undergraduate Students’ Initial Ability in Understanding Phylogenetic Tree
Sa'adah, S.; Hidayat, T.; Sudargo, Fransisca
2017-04-01
The Phylogenetic tree is a visual representation depicts a hypothesis about the evolutionary relationship among taxa. Evolutionary experts use this representation to evaluate the evidence for evolution. The phylogenetic tree is currently growing for many disciplines in biology. Consequently, learning about the phylogenetic tree has become an important part of biological education and an interesting area of biology education research. Skill to understanding and reasoning of the phylogenetic tree, (called tree thinking) is an important skill for biology students. However, research showed many students have difficulty in interpreting, constructing, and comparing among the phylogenetic tree, as well as experiencing a misconception in the understanding of the phylogenetic tree. Students are often not taught how to reason about evolutionary relationship depicted in the diagram. Students are also not provided with information about the underlying theory and process of phylogenetic. This study aims to investigate the initial ability of undergraduate students in understanding and reasoning of the phylogenetic tree. The research method is the descriptive method. Students are given multiple choice questions and an essay that representative by tree thinking elements. Each correct answer made percentages. Each student is also given questionnaires. The results showed that the undergraduate students’ initial ability in understanding and reasoning phylogenetic tree is low. Many students are not able to answer questions about the phylogenetic tree. Only 19 % undergraduate student who answered correctly on indicator evaluate the evolutionary relationship among taxa, 25% undergraduate student who answered correctly on indicator applying concepts of the clade, 17% undergraduate student who answered correctly on indicator determines the character evolution, and only a few undergraduate student who can construct the phylogenetic tree.
Environmental tritium in trees
International Nuclear Information System (INIS)
Brown, R.M.
1979-01-01
The distribution of environmental tritium in the free water and organically bound hydrogen of trees growing in the vicinity of the Chalk River Nuclear Laboratories (CRNL) has been studied. The regional dispersal of HTO in the atmosphere has been observed by surveying the tritium content of leaf moisture. Measurement of the distribution of organically bound tritium in the wood of tree ring sequences has given information on past concentrations of HTO taken up by trees growing in the CRNL Liquid Waste Disposal Area. For samples at background environmental levels, cellulose separation and analysis was done. The pattern of bomb tritium in precipitation of 1955-68 was observed to be preserved in the organically bound tritium of a tree ring sequence. Reactor tritium was discernible in a tree growing at a distance of 10 km from CRNL. These techniques provide convenient means of monitoring dispersal of HTO from nuclear facilities. (author)
Jones, M; Aptaker, P S; Cox, J; Gardiner, B A; McDonald, P J
2012-05-01
This paper presents the design of the 'Tree Hugger', an open access, transportable, 1.1 MHz (1)H nuclear magnetic resonance imaging system for the in situ analysis of living trees in the forest. A unique construction employing NdFeB blocks embedded in a reinforced carbon fibre frame is used to achieve access up to 210 mm and to allow the magnet to be transported. The magnet weighs 55 kg. The feasibility of imaging living trees in situ using the 'Tree Hugger' is demonstrated. Correlations are drawn between NMR/MRI measurements and other indicators such as relative humidity, soil moisture and net solar radiation. Copyright © 2012 Elsevier Inc. All rights reserved.
Generalising tree traversals and tree transformations to DAGs
DEFF Research Database (Denmark)
Bahr, Patrick; Axelsson, Emil
2017-01-01
We present a recursion scheme based on attribute grammars that can be transparently applied to trees and acyclic graphs. Our recursion scheme allows the programmer to implement a tree traversal or a tree transformation and then apply it to compact graph representations of trees instead. The resul......We present a recursion scheme based on attribute grammars that can be transparently applied to trees and acyclic graphs. Our recursion scheme allows the programmer to implement a tree traversal or a tree transformation and then apply it to compact graph representations of trees instead...... as the complementing theory with a number of examples....
Measuring performance in health care: case-mix adjustment by boosted decision trees.
Neumann, Anke; Holstein, Josiane; Le Gall, Jean-Roger; Lepage, Eric
2004-10-01
The purpose of this paper is to investigate the suitability of boosted decision trees for the case-mix adjustment involved in comparing the performance of various health care entities. First, we present logistic regression, decision trees, and boosted decision trees in a unified framework. Second, we study in detail their application for two common performance indicators, the mortality rate in intensive care and the rate of potentially avoidable hospital readmissions. For both examples the technique of boosting decision trees outperformed standard prognostic models, in particular linear logistic regression models, with regard to predictive power. On the other hand, boosting decision trees was computationally demanding and the resulting models were rather complex and needed additional tools for interpretation. Boosting decision trees represents a powerful tool for case-mix adjustment in health care performance measurement. Depending on the specific priorities set in each context, the gain in predictive power might compensate for the inconvenience in the use of boosted decision trees.
Energy Technology Data Exchange (ETDEWEB)
El Hentour, Kim; Millet, Ingrid; Pages-Bouic, Emmanuelle; Curros-Doyon, Fernanda; Taourel, Patrice [Lapeyronie Hospital, Department of Medical Imaging, Montpellier (France); Molinari, Nicolas [UMR 5149 IMAG, CHU, Department of Medical Information and Statistics, Montpellier (France)
2018-02-15
To construct a decision tree based on CT findings to differentiate acute pelvic inflammatory disease (PID) from acute appendicitis (AA) in women with lower abdominal pain and inflammatory syndrome. This retrospective study was approved by our institutional review board and informed consent was waived. Contrast-enhanced CT studies of 109 women with acute PID and 218 age-matched women with AA were retrospectively and independently reviewed by two radiologists to identify CT findings predictive of PID or AA. Surgical and laboratory data were used for the PID and AA reference standard. Appropriate tests were performed to compare PID and AA and a CT decision tree using the classification and regression tree (CART) algorithm was generated. The median patient age was 28 years (interquartile range, 22-39 years). According to the decision tree, an appendiceal diameter ≥ 7 mm was the most discriminating criterion for differentiating acute PID and AA, followed by a left tubal diameter ≥ 10 mm, with a global accuracy of 98.2 % (95 % CI: 96-99.4). Appendiceal diameter and left tubal thickening are the most discriminating CT criteria for differentiating acute PID from AA. (orig.)
International Nuclear Information System (INIS)
El Hentour, Kim; Millet, Ingrid; Pages-Bouic, Emmanuelle; Curros-Doyon, Fernanda; Taourel, Patrice; Molinari, Nicolas
2018-01-01
To construct a decision tree based on CT findings to differentiate acute pelvic inflammatory disease (PID) from acute appendicitis (AA) in women with lower abdominal pain and inflammatory syndrome. This retrospective study was approved by our institutional review board and informed consent was waived. Contrast-enhanced CT studies of 109 women with acute PID and 218 age-matched women with AA were retrospectively and independently reviewed by two radiologists to identify CT findings predictive of PID or AA. Surgical and laboratory data were used for the PID and AA reference standard. Appropriate tests were performed to compare PID and AA and a CT decision tree using the classification and regression tree (CART) algorithm was generated. The median patient age was 28 years (interquartile range, 22-39 years). According to the decision tree, an appendiceal diameter ≥ 7 mm was the most discriminating criterion for differentiating acute PID and AA, followed by a left tubal diameter ≥ 10 mm, with a global accuracy of 98.2 % (95 % CI: 96-99.4). Appendiceal diameter and left tubal thickening are the most discriminating CT criteria for differentiating acute PID from AA. (orig.)
Amirabadizadeh, Alireza; Nezami, Hossein; Vaughn, Michael G; Nakhaee, Samaneh; Mehrpour, Omid
2018-05-12
Substance abuse exacts considerable social and health care burdens throughout the world. The aim of this study was to create a prediction model to better identify risk factors for drug use. A prospective cross-sectional study was conducted in South Khorasan Province, Iran. Of the total of 678 eligible subjects, 70% (n: 474) were randomly selected to provide a training set for constructing decision tree and multiple logistic regression (MLR) models. The remaining 30% (n: 204) were employed in a holdout sample to test the performance of the decision tree and MLR models. Predictive performance of different models was analyzed by the receiver operating characteristic (ROC) curve using the testing set. Independent variables were selected from demographic characteristics and history of drug use. For the decision tree model, the sensitivity and specificity for identifying people at risk for drug abuse were 66% and 75%, respectively, while the MLR model was somewhat less effective at 60% and 73%. Key independent variables in the analyses included first substance experience, age at first drug use, age, place of residence, history of cigarette use, and occupational and marital status. While study findings are exploratory and lack generalizability they do suggest that the decision tree model holds promise as an effective classification approach for identifying risk factors for drug use. Convergent with prior research in Western contexts is that age of drug use initiation was a critical factor predicting a substance use disorder.
Cactus: An Introduction to Regression
Hyde, Hartley
2008-01-01
When the author first used "VisiCalc," the author thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, the author relates he learned to use multiple linear regression software and suddenly it all clicked into…
Regression Models for Repairable Systems
Czech Academy of Sciences Publication Activity Database
Novák, Petr
2015-01-01
Roč. 17, č. 4 (2015), s. 963-972 ISSN 1387-5841 Institutional support: RVO:67985556 Keywords : Reliability analysis * Repair models * Regression Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.782, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/novak-0450902.pdf
Survival analysis II: Cox regression
Stel, Vianda S.; Dekker, Friedo W.; Tripepi, Giovanni; Zoccali, Carmine; Jager, Kitty J.
2011-01-01
In contrast to the Kaplan-Meier method, Cox proportional hazards regression can provide an effect estimate by quantifying the difference in survival between patient groups and can adjust for confounding effects of other variables. The purpose of this article is to explain the basic concepts of the
Kernel regression with functional response
Ferraty, Frédéric; Laksaci, Ali; Tadj, Amel; Vieu, Philippe
2011-01-01
We consider kernel regression estimate when both the response variable and the explanatory one are functional. The rates of uniform almost complete convergence are stated as function of the small ball probability of the predictor and as function of the entropy of the set on which uniformity is obtained.