WorldWideScience

Sample records for interactive phylogenetic trees

  1. Constructing phylogenetic trees using interacting pathways.

    Science.gov (United States)

    Wan, Peng; Che, Dongsheng

    2013-01-01

    Phylogenetic trees are used to represent evolutionary relationships among biological species or organisms. The construction of phylogenetic trees is based on the similarities or differences of their physical or genetic features. Traditional approaches to constructing phylogenetic trees mainly focus on physical features. Recent advances in high-throughput technologies have led to the accumulation of huge amounts of biological data, which in turn has changed biological research in various respects. In this paper, we report our approach of building phylogenetic trees using information on interacting pathways. We have applied hierarchical clustering to two domains of organisms: eukaryotes and prokaryotes. Our preliminary results show the effectiveness of using interacting pathways to reveal evolutionary relationships.
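
    As a rough illustration of clustering organisms by shared pathways, the sketch below assumes each organism is described by a hypothetical set of interacting-pathway IDs, measures dissimilarity with the Jaccard distance, and merges clusters by average linkage (UPGMA). The abstract does not specify the distance measure or the linkage rule, so both are assumptions; the output is a Newick topology without branch lengths.

```python
from itertools import combinations

# Hypothetical profiles: organism -> set of interacting-pathway IDs.
profiles = {
    "human": {"p1", "p2", "p3", "p4"},
    "mouse": {"p1", "p2", "p3"},
    "yeast": {"p1", "p5"},
    "ecoli": {"p5", "p6"},
}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; defined as 0 when both sets are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def upgma(profiles):
    """Average-linkage (UPGMA) clustering; returns a Newick topology string."""
    clusters = {name: (name, 1) for name in profiles}  # label -> (newick, size)
    dist = {frozenset(pair): jaccard_distance(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}
    while len(clusters) > 1:
        x, y = sorted(min(dist, key=dist.get))  # closest pair of clusters
        (nx, sx), (ny, sy) = clusters.pop(x), clusters.pop(y)
        merged = f"{x}+{y}"
        # Size-weighted average of the distances to the two merged clusters.
        new_dist = {frozenset((merged, z)): (sx * dist[frozenset((x, z))]
                                             + sy * dist[frozenset((y, z))]) / (sx + sy)
                    for z in clusters}
        dist = {k: v for k, v in dist.items() if x not in k and y not in k}
        dist.update(new_dist)
        clusters[merged] = (f"({nx},{ny})", sx + sy)
    (newick, _), = clusters.values()
    return newick + ";"


print(upgma(profiles))  # ((ecoli,yeast),(human,mouse));
```

    Average linkage is used here only because it is the simplest hierarchical method that yields a rooted topology; single or complete linkage could be swapped in by changing the merge rule.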

  2. SILVA tree viewer: interactive web browsing of the SILVA phylogenetic guide trees.

    Science.gov (United States)

    Beccati, Alan; Gerken, Jan; Quast, Christian; Yilmaz, Pelin; Glöckner, Frank Oliver

    2017-09-30

    Phylogenetic trees are an important tool to study the evolutionary relationships among organisms. The huge number of available taxa makes their interactive visualization difficult, which in turn hampers users' ability to provide feedback for further improving the taxonomic framework. The SILVA Tree Viewer is a web application designed for visualizing large phylogenetic trees without requiring the download of any software tool or data files. The SILVA Tree Viewer is based on Web Geographic Information Systems (Web-GIS) technology with a PostgreSQL backend. It enables zoom and pan functionalities similar to Google Maps. The SILVA Tree Viewer enables access to two phylogenetic (guide) trees provided by the SILVA database: the SSU Ref NR99, inferred from high-quality, full-length small subunit sequences clustered at 99% sequence identity, and the LSU Ref, inferred from high-quality, full-length large subunit sequences. The Tree Viewer provides tree navigation, search and browse tools, as well as an interactive feedback system to collect any kind of request, ranging from taxonomy to data curation and improvements to the tool itself.

  3. SILVA tree viewer: interactive web browsing of the SILVA phylogenetic guide trees

    OpenAIRE

    Beccati, Alan; Gerken, Jan; Quast, Christian; Yilmaz, Pelin; Glöckner, Frank Oliver

    2017-01-01

    Background: Phylogenetic trees are an important tool to study the evolutionary relationships among organisms. The huge number of available taxa makes their interactive visualization difficult, which in turn hampers users' ability to provide feedback for further improving the taxonomic framework. Results: The SILVA Tree Viewer is a web application designed for visualizing large phylogenetic trees without requiring the download of any software tool or data files. The SILVA T...

  4. Tree phylogenetic diversity promotes host-parasitoid interactions.

    Science.gov (United States)

    Staab, Michael; Bruelheide, Helge; Durka, Walter; Michalski, Stefan; Purschke, Oliver; Zhu, Chao-Dong; Klein, Alexandra-Maria

    2016-07-13

    Evidence from grassland experiments suggests that a plant community's phylogenetic diversity (PD) is a strong predictor of ecosystem processes, even stronger than species richness per se. This has, however, never been extended to species-rich forests and host-parasitoid interactions. We used cavity-nesting Hymenoptera and their parasitoids collected in a subtropical forest as a model system to test whether hosts, parasitoids, and their interactions are influenced by tree PD and a comprehensive set of environmental variables, including tree species richness. Parasitism rate and parasitoid abundance were positively correlated with tree PD. All variables describing parasitoids decreased with elevation and, except for parasitism rate, depended on host abundance. Quantitative descriptors of host-parasitoid networks were independent of the environment. Our study indicates that host-parasitoid interactions in species-rich forests are related to the PD of the tree community, which influences parasitism rates through parasitoid abundance. We show that effects of tree community PD are much stronger than effects of tree species richness, can cascade to higher trophic levels, and promote trophic interactions. Because phylogenetic information is usually lost non-randomly during habitat modification, even species-rich habitats may not be able to continuously provide the ecosystem process of parasitism if the evolutionarily most distinct plant lineages vanish. © 2016 The Author(s).
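
    The abstract does not state which PD index was used. Faith's PD, the total branch length connecting a set of species to the root, is assumed below as a common choice, on a hypothetical tree. It shows how two communities with the same species richness can differ in PD.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch above node).
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 2.0),
    "D": ("root", 4.0),
    "n1": ("n2", 1.0),
    "n2": ("root", 1.0),
}


def faith_pd(tree, species):
    """Sum of branch lengths on the union of root-to-tip paths of `species`."""
    counted = set()
    total = 0.0
    for node in species:
        # Walk towards the root, stopping at the root or at an already-counted branch.
        while node in tree and node not in counted:
            counted.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# Two communities with equal species richness (2) but different PD.
print(faith_pd(tree, {"A", "B"}), faith_pd(tree, {"C", "D"}))  # 4.0 7.0
```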

  5. Predicting rates of interspecific interaction from phylogenetic trees.

    Science.gov (United States)

    Nuismer, Scott L; Harmon, Luke J

    2015-01-01

    Integrating phylogenetic information can potentially improve our ability to explain species' traits, patterns of community assembly, the network structure of communities, and ecosystem function. In this study, we use mathematical models to explore the ecological and evolutionary factors that modulate the explanatory power of phylogenetic information for communities of species that interact within a single trophic level. We find that phylogenetic relationships among species can influence trait evolution and rates of interaction among species, but only under particular models of species interaction. For example, when interactions within communities are mediated by a mechanism of phenotype matching, phylogenetic trees make specific predictions about trait evolution and rates of interaction. In contrast, if interactions within a community depend on a mechanism of phenotype differences, phylogenetic information has little, if any, predictive power for trait evolution and interaction rate. Together, these results make clear and testable predictions for when and how evolutionary history is expected to influence contemporary rates of species interaction. © 2014 John Wiley & Sons Ltd/CNRS.
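
    The two interaction mechanisms can be made concrete with interaction-probability functions. The Gaussian (matching) and logistic (difference) forms below, the sensitivity parameter `alpha`, and the trait values are all assumptions chosen for illustration, not the paper's exact model.

```python
import math


def matching_prob(zi, zj, alpha=1.0):
    """Phenotype matching: interaction is most likely when trait values are equal."""
    return math.exp(-alpha * (zi - zj) ** 2)


def difference_prob(zi, zj, alpha=1.0):
    """Phenotype difference: interaction becomes more likely as zi exceeds zj."""
    return 1.0 / (1.0 + math.exp(-alpha * (zi - zj)))


# Hypothetical trait values; sp1 and sp2 are assumed to be close relatives.
traits = {"sp1": 0.0, "sp2": 0.1, "sp3": 2.0}
for a, b in [("sp1", "sp2"), ("sp1", "sp3")]:
    za, zb = traits[a], traits[b]
    print(a, b, round(matching_prob(za, zb), 3), round(difference_prob(za, zb), 3))
```

    Under matching, similar traits (and hence, if traits are phylogenetically conserved, close relatives) give a high interaction probability. Under the difference form, p(i, j) + p(j, i) = 1, so the probability depends on the sign and size of the trait difference rather than on similarity alone.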

  6. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy.

    Science.gov (United States)

    Letunic, Ivica; Bork, Peer

    2011-07-01

    Interactive Tree Of Life (http://itol.embl.de) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. It is freely available and open to everyone. In addition to classical tree viewer functions, iTOL offers many novel ways of annotating trees with various additional data. The current version introduces numerous new features and greatly expands the number of supported data set types. Trees can be interactively manipulated and edited. A free personal account system is available, providing management and sharing of trees in user-defined workspaces and projects. Export to various bitmap and vector graphics formats is supported. A batch access interface is available for programmatic access or for including interactive trees in other web services.

  7. jsPhyloSVG: a javascript library for visualizing interactive and vector-based phylogenetic trees on the web.

    Science.gov (United States)

    Smits, Samuel A; Ouverney, Cleber C

    2010-08-18

    Many software packages have been developed to address the need for generating phylogenetic trees intended for print. With the increased use of the web to disseminate scientific literature, there is a need for phylogenetic trees to be viewable across many types of devices and to feature some of the interactive elements that are integral to the browsing experience. We propose a novel approach for publishing interactive phylogenetic trees. We present a JavaScript library, jsPhyloSVG, which facilitates constructing interactive phylogenetic trees from raw Newick or phyloXML formats directly within the browser in Scalable Vector Graphics (SVG) format. It is designed to work across all major browsers and renders an alternative format for browsers that do not support SVG. The library provides tools for building rectangular and circular phylograms with integrated charting. Interactive features may be integrated and made to respond to events such as clicks on any element of the tree, including labels. jsPhyloSVG is an open-source solution for rendering dynamic phylogenetic trees. It is capable of generating complex and interactive phylogenetic trees across all major browsers without the need for plugins. It is novel in interpreting tree-inference formats directly, exposing the underlying markup to data-mining services. The library source code, extensive documentation and live examples are freely accessible at www.jsphylosvg.com.
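
    jsPhyloSVG reads Newick directly in the browser. The Python sketch below is not the library's code; it is a minimal recursive-descent Newick parser, assuming unquoted labels, optional `:length` suffixes and no `[comments]`, to show the structure such a parser recovers.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts with 'name', 'length', 'children'."""
    s = text.strip()
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def parse_node():
        nonlocal pos
        node = {"name": "", "length": None, "children": []}
        if peek() == "(":
            pos += 1  # consume '('
            node["children"].append(parse_node())
            while peek() == ",":
                pos += 1
                node["children"].append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos  # optional label (leaf name or internal-node name)
        while peek() not in ("", ",", "(", ")", ":", ";"):
            pos += 1
        node["name"] = s[start:pos].strip()
        if peek() == ":":  # optional branch length
            pos += 1
            start = pos
            while peek() not in ("", ",", "(", ")", ";"):
                pos += 1
            node["length"] = float(s[start:pos])
        return node

    root = parse_node()
    if peek() != ";":
        raise ValueError("Newick string must end with ';'")
    return root


def leaf_names(node):
    """Leaf labels in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4);")
print(leaf_names(tree))  # ['A', 'B', 'C']
```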

  8. Phylogenetic trees

    OpenAIRE

    Baños, Hector; Bushek, Nathaniel; Davidson, Ruth; Gross, Elizabeth; Harris, Pamela E.; Krone, Robert; Long, Colby; Stewart, Allen; Walker, Robert

    2016-01-01

    We introduce the package PhylogeneticTrees for Macaulay2 which allows users to compute phylogenetic invariants for group-based tree models. We provide some background information on phylogenetic algebraic geometry and show how the package PhylogeneticTrees can be used to calculate a generating set for a phylogenetic ideal as well as a lower bound for its dimension. Finally, we show how methods within the package can be used to compute a generating set for the join of any two ideals.

  9. treespace: Statistical exploration of landscapes of phylogenetic trees.

    Science.gov (United States)

    Jombart, Thibaut; Kendall, Michelle; Almagro-Garcia, Jacob; Colijn, Caroline

    2017-11-01

    The increasing availability of large genomic data sets as well as the advent of Bayesian phylogenetics facilitates the investigation of phylogenetic incongruence, which can result in the impossibility of representing phylogenetic relationships using a single tree. While sometimes considered as a nuisance, phylogenetic incongruence can also reflect meaningful biological processes as well as relevant statistical uncertainty, both of which can yield valuable insights in evolutionary studies. We introduce a new tool for investigating phylogenetic incongruence through the exploration of phylogenetic tree landscapes. Our approach, implemented in the R package treespace, combines tree metrics and multivariate analysis to provide low-dimensional representations of the topological variability in a set of trees, which can be used for identifying clusters of similar trees and group-specific consensus phylogenies. treespace also provides a user-friendly web interface for interactive data analysis and is integrated alongside existing standards for phylogenetics. It fills a gap in the current phylogenetics toolbox in R and will facilitate the investigation of phylogenetic results. © 2017 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
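
    The tree-metric-plus-multivariate-analysis idea can be sketched in a few lines. This is not treespace's code: it assumes rooted trees written as nested tuples, uses a simple rooted Robinson-Foulds distance as the tree metric, and projects the distance matrix with classical (Torgerson) MDS.

```python
import numpy as np


def clusters(tree):
    """Non-trivial clusters (leaf sets of internal non-root nodes) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))  # the root cluster is shared by every tree
    return found


def rf_distance(t1, t2):
    """Rooted Robinson-Foulds distance: size of the symmetric difference of clusters."""
    return len(clusters(t1) ^ clusters(t2))


def classical_mds(d, k=2):
    """Embed a distance matrix in k dimensions by classical MDS."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    b = -0.5 * j @ (d ** 2) @ j
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]  # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))


# Three hypothetical rooted trees on taxa A-D.
trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("C", "D"), "A"), "B"),
]
dmat = [[rf_distance(a, b) for b in trees] for a in trees]
coords = classical_mds(dmat)
print(dmat)  # [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
```

    Clusters of nearby points in `coords` correspond to groups of topologically similar trees; any other tree metric can be dropped into `dmat` unchanged.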

  10. Nonbinary Tree-Based Phylogenetic Networks.

    Science.gov (United States)

    Jetten, Laura; van Iersel, Leo

    2018-01-01

    Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can, for example, represent gene transfer events. Such phylogenetic networks are called tree-based. Here, we consider two possible generalizations of this concept to nonbinary networks, which we call tree-based and strictly-tree-based nonbinary phylogenetic networks. We give simple graph-theoretic characterizations of tree-based and strictly-tree-based nonbinary phylogenetic networks. Moreover, we show for each of these two classes that it can be decided in polynomial time whether a given network is contained in the class. Our approach also provides a new view on tree-based binary phylogenetic networks. Finally, we discuss two examples of nonbinary phylogenetic networks in biology and show how our results can be applied to them.

  11. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees.

    Science.gov (United States)

    Letunic, Ivica; Bork, Peer

    2016-07-08

    Interactive Tree Of Life (http://itol.embl.de) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. It is freely available and open to everyone. The current version was completely redesigned and rewritten, using current web technologies for fast and streamlined processing. Numerous new features were introduced and several new data types are now supported. Trees with up to 100,000 leaves can now be efficiently displayed. Full interactive control over the precise positioning of various annotation features and an unlimited number of datasets allow the easy creation of complex tree visualizations. iTOL 3 is the first tool that supports direct visualization of the recently proposed phylogenetic placements format. Finally, iTOL's account system has been redesigned to simplify the management of trees in user-defined workspaces and projects; it is heavily used and already handles more than 500,000 trees from more than 10,000 individual users. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. On Tree-Based Phylogenetic Networks.

    Science.gov (United States)

    Zhang, Louxin

    2016-07-01

    A large class of phylogenetic networks can be obtained from trees by the addition of horizontal edges between the tree edges. These networks are called tree-based networks. We present a simple necessary and sufficient condition for tree-based networks and prove that, for any number of taxa, there exists a universal tree-based network that contains as its base every phylogenetic tree on the same set of taxa. This answers two problems posed recently by Francis and Steel. A byproduct is a computer program for generating random binary phylogenetic networks under the uniform distribution model.

  13. The transposition distance for phylogenetic trees

    OpenAIRE

    Rossello, Francesc; Valiente, Gabriel

    2006-01-01

    The search for similarity and dissimilarity measures on phylogenetic trees has been motivated by the computation of consensus trees, the search by similarity in phylogenetic databases, and the assessment of clustering results in bioinformatics. The transposition distance for fully resolved phylogenetic trees is a recent addition to the extensive collection of available metrics for comparing phylogenetic trees. In this paper, we generalize the transposition distance from fully resolved to arbi...

  14. Encoding phylogenetic trees in terms of weighted quartets.

    Science.gov (United States)

    Grünewald, Stefan; Huber, Katharina T; Moulton, Vincent; Semple, Charles

    2008-04-01

    One of the main problems in phylogenetics is to develop systematic methods for constructing evolutionary or phylogenetic trees. For a set of species X, an edge-weighted phylogenetic X-tree or phylogenetic tree is a (graph theoretical) tree with leaf set X and no degree 2 vertices, together with a map assigning a non-negative length to each edge of the tree. Within phylogenetics, several methods have been proposed for constructing such trees that work by trying to piece together quartet trees on X, i.e. phylogenetic trees each having four leaves in X. Hence, it is of interest to characterise when a collection of quartet trees corresponds to a (unique) phylogenetic tree. Recently, Dress and Erdös provided such a characterisation for binary phylogenetic trees, that is, phylogenetic trees all of whose internal vertices have degree 3. Here we provide a new characterisation for arbitrary phylogenetic trees.
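
    A standard way to read quartet trees off an edge-weighted tree is the four-point condition on path lengths: among the three pairings of four leaves, the pairing with the smallest distance sum is the induced quartet, and half the gap to the (equal) larger sums is the length of the path separating the two pairs, which can serve as the quartet weight. The 5-leaf distances below are hypothetical, and this sketch is not the paper's characterisation.

```python
from itertools import combinations

LEAVES = ["A", "B", "C", "D", "E"]
# Hypothetical path lengths for the tree  A,B - u -(2)- v - E,  v -(3)- w - C,D
# with every pendant edge of length 1.
PAIRS = {("A", "B"): 2, ("A", "C"): 7, ("A", "D"): 7, ("A", "E"): 4,
         ("B", "C"): 7, ("B", "D"): 7, ("B", "E"): 4,
         ("C", "D"): 2, ("C", "E"): 5, ("D", "E"): 5}
dist = {x: {x: 0} for x in LEAVES}
for (x, y), value in PAIRS.items():
    dist[x][y] = dist[y][x] = value


def induced_quartets(dist, leaves):
    """Map each resolved quartet ((a, b), (c, d)) to its weight via the four-point condition."""
    quartets = {}
    for a, b, c, d in combinations(leaves, 4):
        pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        sums = [dist[p][q] + dist[r][s] for (p, q), (r, s) in pairings]
        best = min(range(3), key=lambda i: sums[i])
        weight = (max(sums) - sums[best]) / 2
        if weight > 0:  # zero weight: the four leaves meet at a single vertex
            quartets[pairings[best]] = weight
    return quartets


for split, weight in induced_quartets(dist, LEAVES).items():
    print(split, weight)
```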

  15. Tree-Based Unrooted Phylogenetic Networks.

    Science.gov (United States)

    Francis, A; Huber, K T; Moulton, V

    2018-02-01

    Phylogenetic networks are a generalization of phylogenetic trees that are used to represent non-tree-like evolutionary histories that arise in organisms such as plants and bacteria, or uncertainty in evolutionary histories. An unrooted phylogenetic network on a non-empty, finite set X of taxa, or network, is a connected, simple graph in which every vertex has degree 1 or 3 and whose leaf set is X. It is called a phylogenetic tree if the underlying graph is a tree. In this paper we consider properties of tree-based networks, that is, networks that can be constructed by adding edges into a phylogenetic tree. We show that although they have some properties in common with their rooted analogues which have recently drawn much attention in the literature, they have some striking differences in terms of both their structural and computational properties. We expect that our results could eventually have applications to, for example, detecting horizontal gene transfer or hybridization which are important factors in the evolution of many organisms.
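
    The definition above (connected, simple, every vertex of degree 1 or 3, leaf set X) translates directly into a checker. The sketch below uses a plain edge list with hypothetical vertex names and reports whether the graph is a phylogenetic tree, a non-tree network, or neither.

```python
from collections import deque


def classify(edges, taxa):
    """Return 'tree' or 'network' if `edges` form an unrooted phylogenetic
    network on `taxa`; otherwise return None."""
    if not edges or len({frozenset(e) for e in edges}) != len(edges):
        return None  # empty graph or parallel edges
    adj = {}
    for u, v in edges:
        if u == v:
            return None  # loops are not allowed in a simple graph
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if any(len(nbrs) not in (1, 3) for nbrs in adj.values()):
        return None
    if {x for x, nbrs in adj.items() if len(nbrs) == 1} != set(taxa):
        return None  # the leaves must be exactly the taxa
    # Breadth-first search to check connectivity.
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    if len(seen) != len(adj):
        return None
    # A connected graph is a tree exactly when it has |V| - 1 edges.
    return "tree" if len(edges) == len(adj) - 1 else "network"


quartet = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]
cycle = [("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u4", "u1"),
         ("u1", "a"), ("u2", "b"), ("u3", "c"), ("u4", "d")]
print(classify(quartet, "abcd"), classify(cycle, "abcd"))  # tree network
```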

  16. Nonbinary tree-based phylogenetic networks

    OpenAIRE

    Jetten, Laura; van Iersel, Leo

    2016-01-01

    Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can for example represent gene transfer events. Such phylogenetic networks are called tree-based. Here, we consider two possible generalizations of this concept to nonbinary networks, which we call tree-based and st...

  17. Nodal distances for rooted phylogenetic trees.

    Science.gov (United States)

    Cardona, Gabriel; Llabrés, Mercè; Rosselló, Francesc; Valiente, Gabriel

    2010-08-01

    Dissimilarity measures for (possibly weighted) phylogenetic trees based on comparing their vectors of path lengths between pairs of taxa have been present in the systematics literature since the early seventies. For rooted phylogenetic trees, however, these vectors can only separate non-weighted binary trees, and therefore these dissimilarity measures are metrics only on this class of rooted phylogenetic trees. In this paper we overcome this problem by splitting, in a suitable way, each path length between two taxa into two lengths. We prove that the resulting split path-length matrices single out arbitrary rooted phylogenetic trees with nested taxa and arcs weighted in the set of positive real numbers. This allows the definition of metrics on this general class of rooted phylogenetic trees by comparing these matrices through metrics in the spaces $M_n(\mathbb{R})$ of real-valued $n \times n$ matrices. We conclude this paper by establishing some basic facts about the metrics for non-weighted phylogenetic trees defined in this way using $L^p$ metrics on $M_n(\mathbb{R})$, with $p \in \mathbb{R}_{>0}$.
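
    A natural way to split the path between taxa i and j is at their lowest common ancestor, recording the distance from that ancestor down to i and, separately, down to j. The sketch below assumes that split, stores rooted trees as hypothetical parent maps with arc lengths, and compares the resulting matrices with an $L^p$ distance.

```python
def ancestors_with_depth(tree, node):
    """Map node and each of its ancestors to its distance above `node`."""
    out, total = {node: 0.0}, 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        out[parent] = total
        node = parent
    return out


def split_path_lengths(tree, taxa):
    """L[i][j] = distance from LCA(taxa[i], taxa[j]) down to taxa[i]."""
    anc = {x: ancestors_with_depth(tree, x) for x in taxa}
    n = len(taxa)
    mat = [[0.0] * n for _ in range(n)]
    for i, x in enumerate(taxa):
        for j, y in enumerate(taxa):
            if i != j:
                # The LCA is the common ancestor closest to x (also works for nested taxa).
                mat[i][j] = min(d for a, d in anc[x].items() if a in anc[y])
    return mat


def lp_distance(m1, m2, p=1):
    """Entry-wise L^p distance between two equally sized matrices."""
    total = sum(abs(a - b) ** p for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    return total ** (1 / p)


# Two non-weighted (unit-length) rooted trees: ((A,B),C) and (A,(B,C)).
t1 = {"A": ("u", 1.0), "B": ("u", 1.0), "u": ("r", 1.0), "C": ("r", 1.0)}
t2 = {"B": ("v", 1.0), "C": ("v", 1.0), "v": ("r", 1.0), "A": ("r", 1.0)}
m1 = split_path_lengths(t1, ["A", "B", "C"])
m2 = split_path_lengths(t2, ["A", "B", "C"])
print(lp_distance(m1, m2))  # 4.0
```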

  18. Unrealistic phylogenetic trees may improve phylogenetic footprinting.

    Science.gov (United States)

    Nettling, Martin; Treutler, Hendrik; Cerquides, Jesus; Grosse, Ivo

    2017-06-01

    The computational investigation of DNA binding motifs from binding sites is one of the classic tasks in bioinformatics and a prerequisite for understanding gene regulation as a whole. Due to the development of sequencing technologies and the increasing number of available genomes, approaches based on phylogenetic footprinting are becoming increasingly attractive. Phylogenetic footprinting requires phylogenetic trees with attached substitution probabilities for quantifying the evolution of binding sites, but these trees and substitution probabilities are typically not known and cannot be estimated easily. Here, we investigate the influence of phylogenetic trees with different substitution probabilities on the classification performance of phylogenetic footprinting using synthetic and real data. For synthetic data, we find that the classification performance is highest when the substitution probability used for phylogenetic footprinting is similar to that used for data generation. For real data, however, we typically find that the classification performance of phylogenetic footprinting surprisingly increases with increasing substitution probabilities and is often highest for unrealistically high substitution probabilities close to one. This finding suggests that choosing realistic model assumptions might not always yield optimal predictions and that choosing unrealistically high substitution probabilities close to one might actually improve the classification performance of phylogenetic footprinting. The proposed phylogenetic footprinting (PF) approach is implemented in Java and can be downloaded from https://github.com/mgledi/PhyFoo. Contact: martin.nettling@informatik.uni-halle.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  19. Phylogenetic congruence between subtropical trees and their associated fungi.

    Science.gov (United States)

    Liu, Xubing; Liang, Minxia; Etienne, Rampal S; Gilbert, Gregory S; Yu, Shixiao

    2016-12-01

    Recent studies have detected phylogenetic signals in pathogen-host networks for both soil-borne and leaf-infecting fungi, suggesting that pathogenic fungi may track or coevolve with their preferred hosts. However, phylogenetically concordant relationships between multiple hosts and multiple fungi have rarely been investigated. Using next-generation high-throughput DNA sequencing techniques, we analyzed fungal taxa associated with diseased leaves, rotten seeds, and infected seedlings of subtropical trees. We compared the topologies of the phylogenetic trees of the soil and foliar fungi, based on the internal transcribed spacer (ITS) region, with the phylogeny of host tree species based on the matK, rbcL, atpB, and 5.8S genes. We identified 37 foliar and 103 soil pathogenic fungi belonging to the Ascomycota and Basidiomycota phyla and detected significantly nonrandom host-fungus combinations, which clustered on both the fungus phylogeny and the host phylogeny. The explicit evidence of congruent phylogenies between tree hosts and their potential fungal pathogens suggests either diffuse coevolution within the plant-fungal interaction networks or that the distribution of fungal species tracked spatially associated hosts with phylogenetically conserved traits and habitat preferences. Phylogenetic conservatism in plant-fungal interactions within a local community promotes host and parasite specificity, which is integral to the important role of fungi in promoting species coexistence and maintaining the biodiversity of forest communities.

  20. Taxonomic colouring of phylogenetic trees of protein sequences

    Directory of Open Access Journals (Sweden)

    Andrade-Navarro, Miguel A

    2006-02-01

    Background: Phylogenetic analyses of protein families are used to define the evolutionary relationships between homologous proteins. The interpretation of protein-sequence phylogenetic trees requires examining the taxonomic properties of the species associated with those sequences. However, there is no online tool to facilitate this interpretation, for example, by automatically attaching taxonomic information to the nodes of a tree, or by interactively colouring the branches of a tree according to any combination of taxonomic divisions. This is especially problematic if the tree contains on the order of hundreds of sequences, which, given the accelerating growth of protein sequence databases, is becoming a common situation. Results: We have developed PhyloView, a web-based tool for colouring phylogenetic trees according to arbitrary taxonomic properties of the species represented in a protein-sequence phylogenetic tree. Provided that the tree contains SwissProt, SpTrembl, or GenBank protein identifiers, the tool retrieves the taxonomic information from the corresponding database. A colour picker displays a summary of the findings and allows the user to associate colours with the leaves of the tree according to any number of taxonomic partitions. The colours are then propagated to the branches of the tree. Conclusion: PhyloView can be used at http://www.ogic.ca/projects/phyloview/. A tutorial, the software with documentation, and GPL-licensed source code can be accessed at the same web address.
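
    The abstract does not say how leaf colours are propagated to branches. One plausible rule, assumed here, colours a branch only when every leaf beneath it carries the same colour and leaves mixed subtrees uncoloured. The tree (nested tuples) and the colour assignment are hypothetical, and this is not PhyloView's code.

```python
def propagate_colours(tree, leaf_colour):
    """Colour each subtree (keyed by its leaf set) if all of its leaves share one colour."""
    branch_colour = {}

    def walk(node):
        if isinstance(node, str):  # leaf
            leaves, colour = frozenset([node]), leaf_colour.get(node)
        else:
            below = [walk(child) for child in node]
            leaves = frozenset().union(*(leaf_set for leaf_set, _ in below))
            colours = {colour for _, colour in below}
            colour = colours.pop() if len(colours) == 1 else None  # None = mixed
        branch_colour[leaves] = colour
        return leaves, colour

    walk(tree)
    return branch_colour


# Hypothetical protein tree; leaves coloured by domain (Eukaryota green, Bacteria orange).
tree = ((("HUMAN", "MOUSE"), "DROME"), "ECOLI")
colours = {"HUMAN": "green", "MOUSE": "green", "DROME": "green", "ECOLI": "orange"}
for leaves, colour in propagate_colours(tree, colours).items():
    print(sorted(leaves), colour)
```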

  1. Incompletely resolved phylogenetic trees inflate estimates of phylogenetic conservatism.

    Science.gov (United States)

    Davies, T Jonathan; Kraft, Nathan J B; Salamin, Nicolas; Wolkovich, Elizabeth M

    2012-02-01

    The tendency for more closely related species to share similar traits and ecological strategies can be explained by their longer shared evolutionary histories and represents phylogenetic conservatism. How strongly species traits co-vary with phylogeny can significantly impact how we analyze cross-species data and can influence our interpretation of assembly rules in the rapidly expanding field of community phylogenetics. Phylogenetic conservatism is typically quantified by analyzing the distribution of species values on the phylogenetic tree that connects them. Many phylogenetic approaches, however, assume a completely sampled phylogeny: while we have good estimates of deeper phylogenetic relationships for many species-rich groups, such as birds and flowering plants, we often lack information on more recent interspecific relationships (i.e., within a genus). A common solution has been to represent these relationships as polytomies on trees using taxonomy as a guide. Here we show that such trees can dramatically inflate estimates of phylogenetic conservatism quantified using S. P. Blomberg et al.'s K statistic. Using simulations, we show that even randomly generated traits can appear to be phylogenetically conserved on poorly resolved trees. We provide a simple rarefaction-based solution that can reliably retrieve unbiased estimates of K, and we illustrate our method using data on first flowering times from Thoreau's woods (Concord, Massachusetts, USA).
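
    Blomberg et al.'s K compares the observed ratio of raw to phylogenetically weighted trait variance with its expectation under Brownian motion. The numpy sketch below assumes the trait vector and the phylogenetic covariance matrix C (shared root-to-ancestor path lengths) are given; the tree is hypothetical, and the paper's rarefaction correction is not implemented.

```python
import numpy as np


def blomberg_k(x, C):
    """Blomberg et al.'s K for trait values x and phylogenetic covariance matrix C."""
    x = np.asarray(x, dtype=float)
    C = np.asarray(C, dtype=float)
    n = len(x)
    C_inv = np.linalg.inv(C)
    ones = np.ones(n)
    # Phylogenetically weighted (GLS) estimate of the root state.
    root = (ones @ C_inv @ x) / C_inv.sum()
    r = x - root
    # Observed MSE0 / MSE; the common 1/(n - 1) factors cancel.
    observed = (r @ r) / (r @ C_inv @ r)
    # Expected value of that ratio under Brownian motion on the tree.
    expected = (np.trace(C) - n / C_inv.sum()) / (n - 1)
    return observed / expected


# Hypothetical ultrametric tree ((A,B),(C,D)) with root-to-tip depth 1
# and both cherries joining at depth 0.5.
C = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])
print(round(blomberg_k([1.0, 1.1, 5.0, 5.2], C), 3))
```

    Values above 1 indicate more similarity among relatives than Brownian motion predicts; here the sister pairs have near-identical traits, so K comes out well above 1.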

  2. Transforming phylogenetic networks: Moving beyond tree space

    OpenAIRE

    Huber, Katharina T.; Moulton, Vincent; Wu, Taoyang

    2016-01-01

    Phylogenetic networks are a generalization of phylogenetic trees that are used to represent reticulate evolution. Unrooted phylogenetic networks form a special class of such networks, which naturally generalize unrooted phylogenetic trees. In this paper we define two operations on unrooted phylogenetic networks, one of which is a generalization of the well-known nearest-neighbor interchange (NNI) operation on phylogenetic trees. We show that any unrooted phylogenetic network can be transforme...

  3. Visualizing phylogenetic tree landscapes.

    Science.gov (United States)

    Wilgenbusch, James C; Huang, Wen; Gallivan, Kyle A

    2017-02-02

    Genomic-scale sequence alignments are increasingly used to infer phylogenies in order to better understand the processes and patterns of evolution. Different partitions within these new alignments (e.g., genes, codon positions, and structural features) often favor hundreds if not thousands of competing phylogenies. Summarizing and comparing phylogenies obtained from multi-source data sets using current consensus tree methods discards valuable information and can disguise potential methodological problems. Efficient and accurate dimensionality reduction methods that display the relationships among these competing phylogenies at once in two or three dimensions will help practitioners diagnose the limits of current evolutionary models and potential problems with phylogenetic reconstruction methods when analyzing large multi-source data sets. We introduce several dimensionality reduction methods to visualize, in two and three dimensions, the relationships among competing phylogenies obtained from gene partitions found in three mid- to large-size mitochondrial genome alignments. We test the performance of these dimensionality reduction methods by applying several goodness-of-fit measures. The intrinsic dimensionality of each data set is also estimated to determine whether projections in two and three dimensions can be expected to reveal meaningful relationships among trees from different data partitions. Several new approaches to aid in the comparison of different phylogenetic landscapes are presented. Curvilinear Components Analysis (CCA) and a stochastic gradient descent (SGD) optimization method give the best representation of the original tree-to-tree distance matrix for each of the three mitochondrial genome alignments and greatly outperform the method currently used to visualize tree landscapes. The CCA + SGD method converged at least as fast as previously applied methods for visualizing tree landscapes. We demonstrate for all three mtDNA alignments that 3D
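
    One simple goodness-of-fit measure for such projections compares the original tree-to-tree distances with the distances between the projected points. The normalised stress below is an assumed example of this kind of measure, not necessarily one used in the paper: 0 means the embedding reproduces the distances exactly, and larger values mean more distortion.

```python
import numpy as np


def normalised_stress(original, coords):
    """sqrt(sum (delta_ij - d_ij)^2 / sum delta_ij^2) over pairs i < j, where
    delta are the original distances and d the distances in the embedding."""
    original = np.asarray(original, dtype=float)
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    embedded = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(original), k=1)  # each unordered pair once
    num = ((original[iu] - embedded[iu]) ** 2).sum()
    return float(np.sqrt(num / (original[iu] ** 2).sum()))


# Hypothetical 3-tree distance matrix and a 2-D embedding that reproduces it.
dists = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
points = [[0, 0], [3, 0], [0, 4]]
print(normalised_stress(dists, points))
```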

  4. Locating a tree in a phylogenetic network

    NARCIS (Netherlands)

    Iersel, van L.J.J.; Semple, C.; Steel, M.A.

    2010-01-01

    Phylogenetic trees and networks are leaf-labelled graphs that are used to describe evolutionary histories of species. The Tree Containment problem asks whether a given phylogenetic tree is embedded in a given phylogenetic network. Given a phylogenetic network and a cluster of species, the Cluster

  5. Locating a tree in a phylogenetic network

    OpenAIRE

    van Iersel, Leo; Semple, Charles; Steel, Mike

    2010-01-01

    Phylogenetic trees and networks are leaf-labelled graphs that are used to describe evolutionary histories of species. The Tree Containment problem asks whether a given phylogenetic tree is embedded in a given phylogenetic network. Given a phylogenetic network and a cluster of species, the Cluster Containment problem asks whether the given cluster is a cluster of some phylogenetic tree embedded in the network. Both problems are known to be NP-complete in general. In this article, we consider t...

  6. Transforming phylogenetic networks: Moving beyond tree space.

    Science.gov (United States)

    Huber, Katharina T; Moulton, Vincent; Wu, Taoyang

    2016-09-07

    Phylogenetic networks are a generalization of phylogenetic trees that are used to represent reticulate evolution. Unrooted phylogenetic networks form a special class of such networks, which naturally generalize unrooted phylogenetic trees. In this paper we define two operations on unrooted phylogenetic networks, one of which is a generalization of the well-known nearest-neighbor interchange (NNI) operation on phylogenetic trees. We show that any unrooted phylogenetic network can be transformed into any other such network using only these operations. This generalizes the well-known fact that any phylogenetic tree can be transformed into any other such tree using only NNI operations. It also allows us to define a generalization of tree space and to define some new metrics on unrooted phylogenetic networks. To prove our main results, we employ some fascinating new connections between phylogenetic networks and cubic graphs that we have recently discovered. Our results should be useful in developing new strategies to search for optimal phylogenetic networks, a topic that has recently generated some interest in the literature, as well as for providing new ways to compare networks. Copyright © 2016 Elsevier Ltd. All rights reserved.
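
    The NNI operation that the paper generalizes can be sketched for ordinary unrooted binary trees; the network version is not attempted here. Trees are stored as adjacency sets with hypothetical node names. For an internal edge (u, v), swapping one subtree hanging off u with either subtree hanging off v gives the two NNI neighbours.

```python
import copy


def nni_neighbours(adj, u, v):
    """Return the two trees obtained by NNI across the internal edge (u, v)."""
    a, b = sorted(adj[u] - {v})  # subtrees hanging off u
    c, d = sorted(adj[v] - {u})  # subtrees hanging off v
    result = []
    for w in (c, d):
        new = copy.deepcopy(adj)  # leave the input tree untouched
        # Detach b from u and w from v, then reattach them the other way round.
        new[u].remove(b)
        new[b].remove(u)
        new[v].remove(w)
        new[w].remove(v)
        new[u].add(w)
        new[w].add(u)
        new[v].add(b)
        new[b].add(v)
        result.append(new)
    return result


# The quartet tree AB|CD with internal edge (u, v).
tree = {"A": {"u"}, "B": {"u"}, "C": {"v"}, "D": {"v"},
        "u": {"A", "B", "v"}, "v": {"C", "D", "u"}}
for neighbour in nni_neighbours(tree, "u", "v"):
    print(sorted(neighbour["u"] - {"v"}), sorted(neighbour["v"] - {"u"}))
```

    The two results are the quartets AC|BD and AD|BC, i.e. the other two resolutions of the four subtrees around the edge.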

  7. The space of ultrametric phylogenetic trees.

    Science.gov (United States)

    Gavryushkin, Alex; Drummond, Alexei J

    2016-08-21

    The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space, formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the squared distance to the trees in the sample might differ between parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
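
    The two parameterisations can be related with a cumulative sum: for a fixed ranked topology, absolute divergence times are running totals of the inter-coalescent intervals. The sketch below, with hypothetical trees and a plain Euclidean distance restricted to a single ranked topology (a simplification of the paper's metric spaces), shows that the same pair of trees is a different distance apart in each coordinate system.

```python
import math
from itertools import accumulate


def intervals_to_times(intervals):
    """Inter-coalescent interval lengths (tips to root) -> absolute divergence times."""
    return list(accumulate(intervals))


def times_to_intervals(times):
    """Absolute divergence times (sorted ascending) -> inter-coalescent interval lengths."""
    return [t - prev for prev, t in zip([0.0] + list(times[:-1]), times)]


# Two hypothetical 4-taxon time-trees with the same ranked topology.
tree1, tree2 = [1.0, 1.0, 1.0], [2.0, 1.0, 1.0]
print(math.dist(tree1, tree2))  # distance in interval coordinates
print(math.dist(intervals_to_times(tree1), intervals_to_times(tree2)))  # in time coordinates
```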

  8. Undergraduate Students’ Difficulties in Reading and Constructing Phylogenetic Tree

    Science.gov (United States)

    Sa'adah, S.; Tapilouw, F. S.; Hidayat, T.

    2017-02-01

    Representation is a very important tool for communicating scientific concepts. Biologists produce phylogenetic representations to express their understanding of evolutionary relationships. A phylogenetic tree is a visual representation that depicts a hypothesis about evolutionary relationships, and it is widely used in the biological sciences. Phylogenetic trees are increasingly used in many disciplines of biology. Consequently, learning about phylogenetic trees has become an important part of biological education and an interesting area for biology education research. However, research has shown that many students struggle to interpret the information that phylogenetic trees depict. The purpose of this study was to investigate undergraduate students’ difficulties in reading and constructing phylogenetic trees. The study used a descriptive method, drawing on questionnaires, interviews, multiple-choice and open-ended questions, reflective journals and observations. The findings showed that students experienced difficulties, especially in constructing phylogenetic trees. The students’ responses indicated that the main difficulty in constructing a phylogenetic tree was placing taxa in the tree based on the data provided. As a result, the constructed tree did not describe the actual evolutionary relationships (incorrect relatedness). Students also had difficulty determining the sister group and the synapomorphic and autapomorphic characters from the data provided (a character table), and in comparing phylogenetic trees. According to the students, building a phylogenetic tree is more difficult than reading one. These findings can help undergraduate instructors and students overcome difficulties in learning to read and construct phylogenetic trees.

  9. Undergraduate Students’ Initial Ability in Understanding Phylogenetic Tree

    Science.gov (United States)

    Sa'adah, S.; Hidayat, T.; Sudargo, Fransisca

    2017-04-01

    A phylogenetic tree is a visual representation that depicts a hypothesis about the evolutionary relationships among taxa. Evolutionary experts use this representation to evaluate the evidence for evolution. Phylogenetic trees are increasingly used in many disciplines of biology. Consequently, learning about phylogenetic trees has become an important part of biological education and an interesting area of biology education research. The ability to understand and reason about phylogenetic trees (called tree thinking) is an important skill for biology students. However, research has shown that many students have difficulty interpreting, constructing and comparing phylogenetic trees, and that they hold misconceptions about them. Students are often not taught how to reason about the evolutionary relationships depicted in these diagrams. Nor are they given information about the underlying theory and processes of phylogenetics. This study aims to investigate the initial ability of undergraduate students to understand and reason about phylogenetic trees. The research used a descriptive method. Students were given multiple-choice and essay questions representing tree-thinking elements, and the percentage of correct answers was calculated. Each student also completed a questionnaire. The results showed that the undergraduate students’ initial ability to understand and reason about phylogenetic trees is low. Many students were not able to answer questions about phylogenetic trees. Only 19% of the students answered correctly on the indicator "evaluating evolutionary relationships among taxa". Only 25% answered correctly on "applying the concept of a clade", and only 17% on "determining character evolution". Only a few students could construct a phylogenetic tree.

  10. Phylogenetic search through partial tree mixing

    Science.gov (United States)

    2012-01-01

    Background: Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time necessary to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a reasonable amount of time through several innovative search techniques. Results: When compared to popular phylogenetic search algorithms, better trees are found much more quickly for large data sets. These algorithms are incorporated in the PSODA application available at http://dna.cs.byu.edu/psoda. Conclusions: The use of Partial Tree Mixing in a partition-based tree space allows the algorithm to quickly converge on near-optimal tree regions. These regions can then be searched in a methodical way to determine the overall optimal phylogenetic solution. PMID:23320449

  11. Folding and unfolding phylogenetic trees and networks.

    Science.gov (United States)

    Huber, Katharina T; Moulton, Vincent; Steel, Mike; Wu, Taoyang

    2016-12-01

    Phylogenetic networks are rooted, labelled directed acyclic graphs which are commonly used to represent reticulate evolution. There is a close relationship between phylogenetic networks and multi-labelled trees (MUL-trees). Indeed, any phylogenetic network N can be "unfolded" to obtain a MUL-tree U(N) and, conversely, a MUL-tree T can in certain circumstances be "folded" to obtain a phylogenetic network F(T) that exhibits T. In this paper, we study properties of the operations U and F in more detail. In particular, we introduce the class of stable networks, phylogenetic networks N for which F(U(N)) is isomorphic to N, characterise such networks, and show that they are related to the well-known class of tree-sibling networks. We also explore how the concept of displaying a tree in a network N can be related to displaying the tree in the MUL-tree U(N). To do this, we develop a phylogenetic analogue of graph fibrations. This allows us to view U(N) as the analogue of the universal cover of a digraph, and to establish a close connection between displaying trees in U(N) and reconciling phylogenetic trees with networks.
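
The unfolding U(N) can be computed by following every directed path from the root. Below is a minimal recursive sketch. It assumes the network is a dictionary from each vertex to its children, and it suppresses unary vertices (our choice here). It says nothing about the folding map F, and its output can grow exponentially with the number of reticulations.

```python
from typing import Dict, List, Tuple, Union

Network = Dict[str, List[str]]  # rooted DAG: node -> children
MulTree = Union[str, Tuple["MulTree", ...]]


def unfold(net: Network, node: str) -> MulTree:
    """Copy every vertex once per directed path reaching it from `node`.

    Leaves keep their labels, so a leaf below a reticulation appears several
    times. Vertices left with a single child are suppressed.
    """
    children = net.get(node, [])
    if not children:
        return node
    subtrees = tuple(unfold(net, child) for child in children)
    return subtrees[0] if len(subtrees) == 1 else subtrees


def leaf_labels(t: MulTree) -> List[str]:
    if isinstance(t, str):
        return [t]
    return [label for sub in t for label in leaf_labels(sub)]


# Hypothetical network: h is a reticulation with parents a and b.
net = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "y"], "h": ["z"]}
mul = unfold(net, "r")
```

The leaf z sits below the reticulation h, so it appears twice in `mul`. That repetition is what makes `mul` a MUL-tree.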

  12. IcyTree: rapid browser-based visualization for phylogenetic trees and networks.

    Science.gov (United States)

    Vaughan, Timothy G

    2017-08-01

    IcyTree is an easy-to-use application which can be used to visualize a wide variety of phylogenetic trees and networks. While numerous phylogenetic tree viewers exist already, IcyTree distinguishes itself by being a purely online tool, having a responsive user interface, supporting phylogenetic networks (ancestral recombination graphs in particular), and efficiently drawing trees that include information such as ancestral locations or trait values. IcyTree also provides intuitive panning and zooming utilities that make exploring large phylogenetic trees of many thousands of taxa feasible. IcyTree is a web application and can be accessed directly at http://tgvaughan.github.com/icytree . Currently supported web browsers include Mozilla Firefox and Google Chrome. IcyTree is written entirely in client-side JavaScript (no plugin required) and, once loaded, does not require network access to run. IcyTree is free software, and the source code is made available at http://github.com/tgvaughan/icytree under version 3 of the GNU General Public License. Contact: tgvaughan@gmail.com. © The Author(s) 2017. Published by Oxford University Press.

  13. Ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses.

    Science.gov (United States)

    Fouquier, Jennifer; Rideout, Jai Ram; Bolyen, Evan; Chase, John; Shiffer, Arron; McDonald, Daniel; Knight, Rob; Caporaso, J Gregory; Kelley, Scott T

    2016-02-24

    Fungi play critical roles in many ecosystems, cause serious diseases in plants and animals, pose significant threats to human health, and cause structural integrity problems in built environments. While most fungal diversity remains unknown, the development of PCR primers for the internal transcribed spacer (ITS) combined with next-generation sequencing has substantially improved our ability to profile fungal microbial diversity. The high sequence variability in the ITS region facilitates more accurate species identification. However, it also makes multiple sequence alignment and phylogenetic analysis unreliable across evolutionarily distant fungi, because the sequences are hard to align accurately. To address this issue, we created ghost-tree, a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach starts with a "foundation" phylogeny based on one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families). Then, "extension" phylogenies are built for more closely related organisms (e.g., fungal species or strains) using a second, more rapidly evolving genetic marker. These smaller phylogenies are then grafted onto the foundation tree by mapping taxonomic names, so that each corresponding foundation-tree tip branches into its new "extension tree" child. We applied ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. Our analysis of simulated and real fungal ITS data sets found that phylogenetic distances between fungal communities computed using ghost-tree phylogenies explained significantly more variance than non-phylogenetic distances. The phylogenetic metrics also improved our ability to distinguish small differences (effect sizes) between microbial communities, though results were similar to non-phylogenetic
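
The grafting step can be sketched with trees written as nested tuples. This is a minimal illustration, not ghost-tree's implementation. It assumes foundation tips are labelled with the taxonomic names used for matching (here hypothetical genus names). It also ignores branch lengths and leaves unmatched tips in place.

```python
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def graft(foundation: Tree, extensions: Dict[str, Tree]) -> Tree:
    """Replace each foundation tip that has an extension tree with that tree;
    tips without a matching name are kept unchanged."""
    if isinstance(foundation, str):
        return extensions.get(foundation, foundation)
    return tuple(graft(child, extensions) for child in foundation)


# Hypothetical labels: foundation tips are genera (e.g. from an 18S tree),
# extension trees resolve species within a genus (e.g. from ITS trees).
foundation = (("GenusA", "GenusB"), "GenusC")
extensions = {
    "GenusA": ("GenusA_sp1", ("GenusA_sp2", "GenusA_sp3")),
    "GenusC": ("GenusC_sp1", "GenusC_sp2"),
}
hybrid = graft(foundation, extensions)
```

In this example GenusB has no extension tree, so it stays a single tip in `hybrid`.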

  14. Phylogenetic trees in bioinformatics

    Energy Technology Data Exchange (ETDEWEB)

    Burr, Tom L [Los Alamos National Laboratory

    2008-01-01

    Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length; and the huge number of possible trees for a given sample of even a modest number of OTUs. That number implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on the aspect of bioinformatics that includes the study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.
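
The "huge number of possible trees" has a well-known closed form. On n labelled leaves there are (2n - 5)!! unrooted and (2n - 3)!! rooted binary trees. A short sketch that evaluates these counts (the function names are our own):

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ..., taken as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_binary(n: int) -> int:
    """Number of unrooted binary trees on n >= 3 labelled leaves: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial(2 * n - 5)


def n_rooted_binary(n: int) -> int:
    """Number of rooted binary trees on n >= 2 labelled leaves: (2n-3)!!."""
    if n < 2:
        raise ValueError("need at least 2 leaves")
    return double_factorial(2 * n - 3)
```

For n = 10 this gives 2,027,025 unrooted and 34,459,425 rooted trees. Exhaustive search therefore becomes impractical after only a handful of taxa.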

  15. TreeScaper: Visualizing and Extracting Phylogenetic Signal from Sets of Trees.

    Science.gov (United States)

    Huang, Wen; Zhou, Guifang; Marchand, Melissa; Ash, Jeremy R; Morris, David; Van Dooren, Paul; Brown, Jeremy M; Gallivan, Kyle A; Wilgenbusch, Jim C

    2016-12-01

    Modern phylogenomic analyses often result in large collections of phylogenetic trees representing uncertainty in individual gene trees, variation across genes, or both. Extracting phylogenetic signal from these tree sets can be challenging, as they are difficult to visualize, explore, and quantify. To overcome some of these challenges, we have developed TreeScaper, an application for tree set visualization as well as the identification of distinct phylogenetic signals. GUI and command-line versions of TreeScaper and a manual with tutorials can be downloaded from https://github.com/whuang08/TreeScaper/releases TreeScaper is distributed under the GNU General Public License. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
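
The abstract does not specify which tree distances TreeScaper computes. As an assumed example of quantifying variation within a tree set, here is a sketch of the widely used Robinson-Foulds (RF) distance. It counts bipartitions found in one tree but not the other, and it takes trees written as nested tuples on a common leaf set.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def leaves(t: Tree) -> FrozenSet[str]:
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(child) for child in t))


def splits(t: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of the unrooted tree underlying `t`, each
    stored as the side that excludes a fixed reference leaf."""
    all_leaves = leaves(t)
    ref = min(all_leaves)
    out: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        if 2 <= len(side) <= len(all_leaves) - 2:
            out.add(side)
        return clade

    walk(t)
    return out


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Robinson-Foulds distance: bipartitions present in exactly one tree."""
    return len(splits(t1) ^ splits(t2))


def rf_matrix(trees: List[Tree]) -> List[List[int]]:
    n = len(trees)
    m = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = rf_distance(trees[i], trees[j])
    return m


tree_set = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("B", "A"), ("D", "C"))]
dists = rf_matrix(tree_set)
```

The first and third trees differ only in child order, so their distance is 0. Some authors halve the symmetric-difference count, so check the convention before comparing numbers across tools.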

  16. BIMLR: a method for constructing rooted phylogenetic networks from rooted phylogenetic trees.

    Science.gov (United States)

    Wang, Juan; Guo, Maozu; Xing, Linlin; Che, Kai; Liu, Xiaoyan; Wang, Chunyu

    2013-09-15

    Rooted phylogenetic trees constructed from different datasets (e.g. from different genes) are often conflicting with one another, i.e. they cannot be integrated into a single phylogenetic tree. Phylogenetic networks have become an important tool in molecular evolution, and rooted phylogenetic networks are able to represent conflicting rooted phylogenetic trees. Hence, the development of appropriate methods to compute rooted phylogenetic networks from rooted phylogenetic trees has attracted considerable research interest of late. The CASS algorithm proposed by van Iersel et al. is able to construct much simpler networks than other available methods, but it is extremely slow, and the networks it constructs are dependent on the order of the input data. Here, we introduce an improved CASS algorithm, BIMLR. We show that BIMLR is faster than CASS and less dependent on the input data order. Moreover, BIMLR is able to construct much simpler networks than almost all other methods. BIMLR is available at http://nclab.hit.edu.cn/wangjuan/BIMLR/. © 2013 Elsevier B.V. All rights reserved.
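
The conflict that motivates networks here can be made concrete with clusters, the leaf sets below the vertices of a rooted tree. A collection of clusters can be displayed by a single rooted tree exactly when every pair is compatible, meaning disjoint or nested. Below is a minimal sketch. It assumes all trees share one leaf set; it only detects conflict and does not implement CASS or BIMLR.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]
Cluster = FrozenSet[str]


def clusters(t: Tree) -> Set[Cluster]:
    """Leaf sets below the non-root internal vertices of a rooted tree."""
    out: Set[Cluster] = set()

    def walk(node: Tree) -> Cluster:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        out.add(clade)
        return clade

    out.discard(walk(t))  # the root cluster is compatible with everything
    return out


def compatible(a: Cluster, b: Cluster) -> bool:
    """Clusters are compatible if they are disjoint or nested."""
    return not (a & b) or a <= b or b <= a


def conflicts(*trees: Tree) -> List[Tuple[Cluster, Cluster]]:
    """Incompatible pairs among the clusters of all input trees."""
    pool = set().union(*(clusters(t) for t in trees))
    return [(a, b) for a, b in combinations(pool, 2) if not compatible(a, b)]


t1 = ((("a", "b"), "c"), "d")
t2 = ((("b", "c"), "a"), "d")
clashes = conflicts(t1, t2)
```

The clusters {a, b} and {b, c} overlap without nesting, so no single rooted tree displays both t1 and t2.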

  17. Competitive interactions between forest trees are driven by species' trait hierarchy, not phylogenetic or functional similarity: implications for forest community assembly.

    Science.gov (United States)

    Kunstler, Georges; Lavergne, Sébastien; Courbaud, Benoît; Thuiller, Wilfried; Vieilledent, Ghislain; Zimmermann, Niklaus E; Kattge, Jens; Coomes, David A

    2012-08-01

    The relative importance of competition vs. environmental filtering in the assembly of communities is commonly inferred from their functional and phylogenetic structure. The reasoning is that similar species compete most strongly for resources and are therefore less likely to coexist locally. This approach ignores the possibility that competitive effects can be determined by relative positions of species on a hierarchy of competitive ability. Using growth data, we estimated 275 interaction coefficients between tree species in the French mountains. We show that interaction strengths are mainly driven by trait hierarchy and not by functional or phylogenetic similarity. On the basis of this result, we propose that functional and phylogenetic convergence in local tree communities might be due to competition sorting species with different competitive abilities, and not only to environmental filtering as commonly assumed. We then show a functional and phylogenetic convergence of forest structure with increasing plot age, which supports this view. © 2012 Blackwell Publishing Ltd/CNRS.

  18. Estimating phylogenetic trees from genome-scale data.

    Science.gov (United States)

    Liu, Liang; Xi, Zhenxiang; Wu, Shaoyuan; Davis, Charles C; Edwards, Scott V

    2015-12-01

    The heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. Phylogenetic methods known as "species tree" methods have been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Here we review theory and empirical examples that help clarify conflicts between species tree and concatenation methods, and misconceptions in the literature about the performance of species tree methods. Considering concatenation as a special case of the multispecies coalescent model helps explain differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences and long-branch attraction. We show that approaches, such as binning, designed to augment the signal in species tree analyses can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods incorporating biological realism are a key to phylogenetic analysis of whole-genome data. © 2015 New York Academy of Sciences.

  19. Phylo.io: Interactive Viewing and Comparison of Large Phylogenetic Trees on the Web.

    Science.gov (United States)

    Robinson, Oscar; Dylus, David; Dessimoz, Christophe

    2016-08-01

    Phylogenetic trees are pervasively used to depict evolutionary relationships. Increasingly, researchers need to visualize large trees and compare multiple large trees inferred for the same set of taxa (reflecting uncertainty in the tree inference or genuine discordance among the loci analyzed). Existing tree visualization tools are however not well suited to these tasks. In particular, side-by-side comparison of trees can prove challenging beyond a few dozen taxa. Here, we introduce Phylo.io, a web application to visualize and compare phylogenetic trees side-by-side. Its distinctive features are: highlighting of similarities and differences between two trees, automatic identification of the best matching rooting and leaf order, scalability to large trees, high usability, multiplatform support via standard HTML5 implementation, and possibility to store and share visualizations. The tool can be freely accessed at http://phylo.io and can easily be embedded in other web servers. The code for the associated JavaScript library is available at https://github.com/DessimozLab/phylo-io under an MIT open source license. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  20. Effects of Phylogenetic Tree Style on Student Comprehension

    Science.gov (United States)

    Dees, Jonathan Andrew

    Phylogenetic trees are powerful tools of evolutionary biology that have become prominent across the life sciences. Consequently, learning to interpret and reason from phylogenetic trees is now an essential component of biology education. However, students often struggle to understand these diagrams, even after explicit instruction. One factor that has been observed to affect student understanding of phylogenetic trees is style (i.e., diagonal or bracket). The goal of this dissertation research was to systematically explore effects of style on student interpretations and construction of phylogenetic trees in the context of an introductory biology course. Before instruction, students were significantly more accurate with bracket phylogenetic trees for a variety of interpretation and construction tasks. Explicit instruction that balanced the use of diagonal and bracket phylogenetic trees mitigated some, but not all, style effects. After instruction, students were significantly more accurate for interpretation tasks involving taxa relatedness and construction exercises when using the bracket style. Based on this dissertation research and prior studies on style effects, I advocate for introductory biology instructors to use only the bracket style. Future research should examine causes of style effects and variables other than style to inform the development of research-based instruction that best supports student understanding of phylogenetic trees.

  1. Coalescent methods for estimating phylogenetic trees.

    Science.gov (United States)

    Liu, Liang; Yu, Lili; Kubatko, Laura; Pearl, Dennis K; Edwards, Scott V

    2009-10-01

    We review recent models to estimate phylogenetic trees under the multispecies coalescent. Although the distinction between gene trees and species trees has come to the fore of phylogenetics, only recently have methods been developed that explicitly estimate species trees. Of the several factors that can cause gene tree heterogeneity and discordance with the species tree, deep coalescence due to random genetic drift in branches of the species tree has been modeled most thoroughly. Bayesian approaches to estimating species trees utilize two likelihood functions. The first has been widely used in traditional phylogenetics and involves the model of nucleotide substitution. The second is less familiar to phylogeneticists and involves the probability distribution of gene trees given a species tree. Other recent parametric and nonparametric methods for estimating species trees involve parsimony criteria, summary statistics, supertree and consensus methods. Species tree approaches are an appropriate goal for systematics, appear to work well in some cases where concatenation can be misleading, and suggest that sampling many independent loci will be paramount. Such methods can also be challenging to implement because of the complexity of the models and computational time. In addition, further elaboration of the simplest coalescent models will be required to incorporate commonly known issues such as deviation from the molecular clock, gene flow and other genetic forces.
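
The "probability distribution of gene trees given a species tree" has a simple closed form in the smallest case: three species, with one sampled lineage each. Under the multispecies coalescent, the gene tree matches the species tree ((A,B),C) with probability 1 - (2/3)e^(-t). Each of the other two topologies has probability (1/3)e^(-t), where t is the internal branch length in coalescent units. A short sketch (the function name is our own):

```python
import math
from typing import Dict


def gene_tree_probs(t: float) -> Dict[str, float]:
    """Rooted gene-tree topology probabilities for one lineage sampled from
    each of A, B, C, given species tree ((A,B),C) whose internal branch is
    t coalescent units long (multispecies coalescent, no gene flow)."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0  # probability of each mismatching topology
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }
```

At t = 0 all three topologies are equally likely, which is the extreme case of the deep-coalescence discordance described above. As t grows, the matching topology dominates.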

  2. Visual exploration of parameter influence on phylogenetic trees.

    Science.gov (United States)

    Hess, Martin; Bremm, Sebastian; Weissgraeber, Stephanie; Hamacher, Kay; Goesele, Michael; Wiemeyer, Josef; von Landesberger, Tatiana

    2014-01-01

    Evolutionary relationships between organisms are frequently derived as phylogenetic trees inferred from multiple sequence alignments (MSAs). The MSA parameter space is exponentially large, so tens of thousands of potential trees can emerge for each dataset. A proposed visual-analytics approach can reveal the parameters' impact on the trees. Given input trees created with different parameter settings, it hierarchically clusters the trees according to their structural similarity. The most important clusters of similar trees are shown together with their parameters. This view offers interactive parameter exploration and automatic identification of relevant parameters. Biologists applied this approach to real data of 16S ribosomal RNA and protein sequences of ion channels. It revealed which parameters affected the tree structures. This led to a more reliable selection of the best trees.
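
The abstract does not say which tree distance or linkage rule the approach uses. Assuming a precomputed matrix of pairwise tree distances (for example Robinson-Foulds), a minimal average-linkage agglomerative clustering can be sketched as follows. The function name and the example matrix are hypothetical.

```python
from typing import FrozenSet, List


def average_linkage(dist: List[List[float]], k: int) -> List[FrozenSet[int]]:
    """Merge items bottom-up, always joining the two clusters with the
    smallest mean pairwise distance, until k clusters remain."""
    if not 1 <= k <= len(dist):
        raise ValueError("k must be between 1 and the number of items")
    clusters = [frozenset([i]) for i in range(len(dist))]

    def avg(a: FrozenSet[int], b: FrozenSet[int]) -> float:
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = clusters[x] | clusters[y]
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)] + [merged]
    return clusters


# Hypothetical distances between four trees built with different parameters.
tree_dists = [[0, 0, 4, 4], [0, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]]
groups = average_linkage(tree_dists, 2)
```

Each resulting group would then be shown together with the parameter settings that produced its trees.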

  3. Nonbinary Tree-Based Phylogenetic Networks

    NARCIS (Netherlands)

    Jetten, L.; van Iersel, L.J.J.

    2018-01-01

    Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can for example

  4. A bijection between phylogenetic trees and plane oriented recursive trees

    OpenAIRE

    Prodinger, Helmut

    2017-01-01

    Phylogenetic trees are binary nonplanar trees with labelled leaves, and plane oriented recursive trees are planar trees with an increasing labelling. Both families are enumerated by double factorials. A bijection is constructed, using the respective representations as 2-partitions and trapezoidal words.
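
The phylogenetic-tree side of this enumeration can be checked by brute force. Every rooted binary tree on n labelled leaves arises exactly once by attaching leaf n to one of the 2n - 3 edges of a tree on n - 1 leaves, where the edges include a new one above the root. The sketch below builds all such trees as unordered (nonplanar) nested frozensets and compares the count with (2n - 3)!!. It does not implement the bijection itself.

```python
from typing import FrozenSet, List, Set, Union

Tree = Union[str, FrozenSet["Tree"]]  # leaf label, or unordered pair of subtrees


def insert_leaf(t: Tree, x: str) -> Set[Tree]:
    """All trees obtained by attaching leaf x to one edge of t, including a
    new edge above the root."""
    result: Set[Tree] = {frozenset([t, x])}
    if not isinstance(t, str):
        for child in t:
            (other,) = t - {child}
            for new_child in insert_leaf(child, x):
                result.add(frozenset([new_child, other]))
    return result


def all_rooted_binary_trees(labels: List[str]) -> Set[Tree]:
    trees: Set[Tree] = {labels[0]}
    for x in labels[1:]:
        trees = {new for t in trees for new in insert_leaf(t, x)}
    return trees


def double_factorial(k: int) -> int:
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


counts = {n: len(all_rooted_binary_trees([f"L{i}" for i in range(n)])) for n in range(2, 7)}
```

The resulting counts for n = 2 to 6 are 1, 3, 15, 105 and 945, matching (2n - 3)!!.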

  5. Visualising very large phylogenetic trees in three dimensional hyperbolic space

    Directory of Open Access Journals (Sweden)

    Liberles David A

    2004-04-01

    Background: Common existing phylogenetic tree visualisation tools are not able to display readable trees with more than a few thousand nodes. These existing methodologies are based on two-dimensional space. Results: We introduce the idea of visualising phylogenetic trees in three-dimensional hyperbolic space with the Walrus graph visualisation tool. We have also developed a conversion tool that converts standard phylogenetic tree formats to Walrus' format. With Walrus, it becomes possible to visualise and navigate phylogenetic trees with more than 100,000 nodes. Conclusion: Walrus enables desktop visualisation of very large phylogenetic trees in three-dimensional hyperbolic space. This application is potentially useful for visualisation of the tree of life and for functional genomics derivatives, like The Adaptive Evolution Database (TAED).

  6. Fourier transform inequalities for phylogenetic trees.

    Science.gov (United States)

    Matsen, Frederick A

    2009-01-01

    Phylogenetic invariants are not the only constraints on site-pattern frequency vectors for phylogenetic trees. A mutation matrix, by its definition, is the exponential of a matrix with non-negative off-diagonal entries; this positivity requirement implies non-trivial constraints on the site-pattern frequency vectors. We call these additional constraints "edge-parameter inequalities". In this paper, we first motivate the edge-parameter inequalities by considering a pathological site-pattern frequency vector corresponding to a quartet tree with a negative internal edge. This site-pattern frequency vector nevertheless satisfies all of the constraints described up to now in the literature. We next describe two complete sets of edge-parameter inequalities for the group-based models; these constraints are square-free monomial inequalities in the Fourier transformed coordinates. These inequalities, along with the phylogenetic invariants, form a complete description of the set of site-pattern frequency vectors corresponding to bona fide trees. In mathematical language, this paper explicitly presents two finite lists of inequalities in Fourier coordinates of the form "monomial ≤ 1", each list characterizing the phylogenetically relevant semialgebraic subsets of the phylogenetic varieties.
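
The "negative internal edge" pathology can be reproduced in the simplest group-based model. We use the two-state Cavender-Farris-Neyman (CFN) model on the quartet 12|34. Each edge has a parameter θ_e = 1 - 2p_e, where p_e is the probability of a change on edge e. Each Fourier coordinate q_A is then the product of θ_e over the edges that split the leaf set A into odd-sized parts. For the internal edge, θ_5² = q_13 q_24 / (q_12 q_34). The sketch below is our own illustration with hypothetical parameter values, not one of the paper's inequality lists. It sets θ_5 = 1.2 > 1, which no non-negative branch length can produce. It then checks that the resulting site-pattern frequencies still form a valid probability distribution.

```python
from itertools import combinations, product
from math import prod, sqrt

LEAVES = (1, 2, 3, 4)
# Quartet 12|34: each edge is described by the leaves on one side of it.
EDGES = {"e1": {1}, "e2": {2}, "e3": {3}, "e4": {4}, "e5": {1, 2}}
EVEN_SUBSETS = [A for size in (0, 2, 4) for A in combinations(LEAVES, size)]


def char(A, x):
    """Group character (-1)^(sum of x_i for i in A) on Z_2^4."""
    return (-1) ** sum(x[i - 1] for i in A)


def fourier_coords(theta):
    """q_A = product of theta_e over edges splitting A into odd-sized parts."""
    return {A: prod(theta[e] for e, side in EDGES.items() if len(side & set(A)) % 2)
            for A in EVEN_SUBSETS}


def pattern_probs(q):
    """Inverse transform: probability of each 0/1 site pattern (odd q_A are 0)."""
    return {x: sum(qA * char(A, x) for A, qA in q.items()) / 16
            for x in product((0, 1), repeat=4)}


def fourier_from_probs(probs):
    """Forward transform of site-pattern frequencies."""
    return {A: sum(p * char(A, x) for x, p in probs.items()) for A in EVEN_SUBSETS}


def internal_theta(q):
    return sqrt(q[(1, 3)] * q[(2, 4)] / (q[(1, 2)] * q[(3, 4)]))


theta = {"e1": 0.5, "e2": 0.5, "e3": 0.5, "e4": 0.5, "e5": 1.2}  # hypothetical
probs = pattern_probs(fourier_coords(theta))
theta5_hat = internal_theta(fourier_from_probs(probs))
```

All sixteen frequencies are positive and sum to one, yet the θ_5 recovered from them is 1.2. The extra requirement θ_5 ≤ 1 is what rules this vector out. Here that requirement amounts to q_13 q_24 ≤ q_12 q_34, which is the kind of edge-parameter inequality the paper adds to the phylogenetic invariants.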

  7. PhyloExplorer: a web server to validate, explore and query phylogenetic trees

    Directory of Open Access Journals (Sweden)

    Auberval Nicolas

    2009-05-01

    Background: Many important problems in evolutionary biology require molecular phylogenies to be reconstructed. Phylogenetic trees must then be manipulated for subsequent inclusion in publications or analyses such as supertree inference and tree comparisons. However, no tool is currently available to facilitate the management of tree collections. Such management includes, for instance: standardisation of taxon names among trees with respect to a reference taxonomy; selection of relevant subsets of trees or sub-trees according to a taxonomic query; or simply computation of descriptive statistics on the collection. Moreover, although several databases of phylogenetic trees exist, there is currently no easy way to find trees that are both relevant and complementary to a given collection of trees. Results: We propose a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database. Conclusion: PhyloExplorer is a simple and interactive website implemented through underlying Python libraries and MySQL databases. It is available at: http://www.ncbi.orthomam.univ-montp2.fr/phyloexplorer/ and the source code can be downloaded from: http://code.google.com/p/taxomanie/.
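
The abstract does not describe PhyloExplorer's name-correction procedure. As an assumed stand-in, approximate string matching against a reference list of names shows the idea. The example uses Python's standard `difflib.get_close_matches` and a hypothetical three-species reference list; the function name and the 0.8 cutoff are also our own choices.

```python
from difflib import get_close_matches
from typing import Dict, Iterable, Optional


def correct_names(names: Iterable[str], reference: Iterable[str],
                  cutoff: float = 0.8) -> Dict[str, Optional[str]]:
    """Map each name to itself if it is in the reference list, otherwise to
    the most similar reference name above `cutoff`, otherwise to None."""
    ref = list(reference)
    ref_set = set(ref)
    out: Dict[str, Optional[str]] = {}
    for name in names:
        if name in ref_set:
            out[name] = name
        else:
            match = get_close_matches(name, ref, n=1, cutoff=cutoff)
            out[name] = match[0] if match else None
    return out


reference = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
fixed = correct_names(["Homo sapiens", "Pan troglodites", "Canis lupus"], reference)
```

Names mapped to None have no close match in the reference list and would need manual review.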

  8. Mapping Phylogenetic Trees to Reveal Distinct Patterns of Evolution.

    Science.gov (United States)

    Kendall, Michelle; Colijn, Caroline

    2016-10-01

    Evolutionary relationships are frequently described by phylogenetic trees, but a central barrier in many fields is the difficulty of interpreting data containing conflicting phylogenetic signals. We present a metric-based method for comparing trees which extracts distinct alternative evolutionary relationships embedded in data. We demonstrate detection and resolution of phylogenetic uncertainty in a recent study of anole lizards, leading to alternate hypotheses about their evolutionary relationships. We use our approach to compare trees derived from different genes of Ebolavirus and find that the VP30 gene has a distinct phylogenetic signature composed of three alternatives that differ in the deep branching structure. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
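
The Kendall-Colijn metric maps each rooted tree to a vector indexed by pairs of tips. Below is a minimal sketch of its purely topological version (weighting parameter λ = 0). It assumes rooted trees on the same tips, written as nested tuples. The branch-length component is not shown, and neither are the projection and clustering steps used to extract alternative signals.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def mrca_depths(t: Tree) -> Dict[Tuple[str, str], int]:
    """Number of edges from the root to the MRCA of every pair of tips."""
    out: Dict[Tuple[str, str], int] = {}

    def walk(node: Tree, depth: int) -> List[str]:
        if isinstance(node, str):
            return [node]
        groups = [walk(child, depth + 1) for child in node]
        # Tips in different child subtrees have this vertex as their MRCA.
        for g1, g2 in combinations(groups, 2):
            for a in g1:
                for b in g2:
                    out[tuple(sorted((a, b)))] = depth
        return [tip for g in groups for tip in g]

    walk(t, 0)
    return out


def kc_vector(t: Tree, tips: Sequence[str]) -> List[int]:
    """Topological (lambda = 0) vector: MRCA depths for all tip pairs in a
    fixed order, then a 1 for each pendant edge."""
    m = mrca_depths(t)
    return [m[pair] for pair in combinations(sorted(tips), 2)] + [1] * len(tips)


def kc_distance(t1: Tree, t2: Tree, tips: Sequence[str]) -> float:
    v1, v2 = kc_vector(t1, tips), kc_vector(t2, tips)
    return sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


tips = ["A", "B", "C", "D"]
t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
d = kc_distance(t1, t2, tips)
```

With λ = 0 the pendant entries are identical for all trees and cancel, so the distance depends only on the MRCA depths. Here it is √2.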

  9. An efficient and extensible approach for compressing phylogenetic trees.

    Science.gov (United States)

    Matthews, Suzanne J; Williams, Tiffani L

    2011-10-18

    Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm by handling trees with weighted branches. Furthermore, using the compressed TreeZip file as input, we have designed an extensible decompressor that can extract subcollections of trees, compute majority and strict consensus trees, and merge tree collections using set operations such as union, intersection, and set difference. On unweighted phylogenetic trees, TreeZip is able to compress Newick files by more than 98%. On weighted phylogenetic trees, TreeZip is able to compress a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, allowing space savings in excess of 99% (unweighted) and 92% (weighted). Unlike TreeZip, 7zip is not immune to branch rotations, and performs worse as the level of variability in the Newick string representation increases. Finally, the compressed TreeZip text (TRZ) file contains all the semantic information in a collection of trees. We can therefore easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree, in a matter of seconds. We also show the ease with which set operations can be performed on TRZ files. These operations run faster than on Newick or 7zip-compressed Newick files, without loss of space savings. TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allows it to be operated upon directly and quickly, without a need to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community.
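
The point about branch rotations can be illustrated directly. Rotating children changes the Newick string but not the tree's clusters (its rooted bipartitions). Below is a minimal sketch with a deliberately simple, topology-only Newick parser. It does not reproduce TreeZip's encoding.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def parse_newick(s: str) -> Tree:
    """Parse a topology-only Newick string such as '((A,B),C);'.
    Branch lengths, internal-node labels and quoted labels are not supported."""
    s = s.strip().rstrip(";").strip()
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if pos < len(s) and s[pos] == "(":
            pos += 1
            children = [node()]
            while pos < len(s) and s[pos] == ",":
                pos += 1
                children.append(node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        if start == pos:
            raise ValueError(f"missing label at position {pos}")
        return s[start:pos]

    tree = node()
    if pos != len(s):
        raise ValueError("unexpected trailing characters")
    return tree


def clusters(t: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below every internal vertex; independent of child order."""
    out: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        out.add(clade)
        return clade

    walk(t)
    return out


a = "((A,B),(C,D));"
b = "((D,C),(B,A));"  # the same rooted tree with every child list rotated
same_tree = clusters(parse_newick(a)) == clusters(parse_newick(b))
```

A string-level compressor sees `a` and `b` as different texts. A representation built from clusters or bipartitions stores them identically.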

  10. An efficient and extensible approach for compressing phylogenetic trees

    KAUST Repository

    Matthews, Suzanne J; Williams, Tiffani L

    2011-01-01

    Background: Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend

  11. TREE2FASTA: a flexible Perl script for batch extraction of FASTA sequences from exploratory phylogenetic trees.

    Science.gov (United States)

    Sauvage, Thomas; Plouviez, Sophie; Schmidt, William E; Fredericq, Suzanne

    2018-03-05

    The body of DNA sequence data lacking taxonomically informative sequence headers is rapidly growing in user and public databases (e.g. sequences lacking identification and contaminants). In the context of systematics studies, sorting such sequence data for taxonomic curation and/or molecular diversity characterization (e.g. crypticism) often requires the building of exploratory phylogenetic trees with reference taxa. The subsequent step of segregating DNA sequences of interest based on observed topological relationships can represent a challenging task, especially for large datasets. We have written TREE2FASTA, a Perl script that enables and expedites the sorting of FASTA-formatted sequence data from exploratory phylogenetic trees. TREE2FASTA takes advantage of the interactive, rapid point-and-click color selection and/or annotations of tree leaves in the popular Java tree-viewer FigTree to segregate groups of FASTA sequences of interest to separate files. TREE2FASTA allows for both simple and nested segregation designs to facilitate the simultaneous preparation of multiple data sets that may overlap in sequence content.
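
Once leaf groups have been chosen, the segregation step takes only a few lines. This Python sketch is not TREE2FASTA's Perl code. It assumes the group memberships (e.g. taken from colours or annotations set in FigTree) are already available as lists of sequence names. It also returns FASTA text per group instead of writing files. The function and group names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header without '>': sequence}."""
    seqs: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = []
        elif name is None:
            raise ValueError("sequence data before the first header")
        else:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def segregate(seqs: Dict[str, str], groups: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """FASTA text for each group; a sequence may appear in several groups."""
    return {group: "".join(f">{leaf}\n{seqs[leaf]}\n" for leaf in leaves)
            for group, leaves in groups.items()}


fasta = ">s1\nACGT\n>s2\nAC\nGG\n>s3\nTTTT\n"
groups = {"clade1": ["s1", "s2"], "all": ["s1", "s2", "s3"]}  # nested selection
per_group = segregate(read_fasta(fasta), groups)
```

Writing each group to its own file then takes one `pathlib.Path(...).write_text(...)` call per entry. Listing a sequence in several groups reproduces the overlapping or nested designs described above.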

  12. Relating phylogenetic trees to transmission trees of infectious disease outbreaks.

    Science.gov (United States)

    Ypma, Rolf J F; van Ballegooijen, W Marijn; Wallinga, Jacco

    2013-11-01

    Transmission events are the fundamental building blocks of the dynamics of any infectious disease. Much about the epidemiology of a disease can be learned when these individual transmission events are known or can be estimated. Such estimations are difficult and generally feasible only when detailed epidemiological data are available. The genealogy estimated from genetic sequences of sampled pathogens is another rich source of information on transmission history. Optimal inference of transmission events calls for the combination of genetic data and epidemiological data into one joint analysis. A key difficulty is that the transmission tree, which describes the transmission events between infected hosts, differs from the phylogenetic tree, which describes the ancestral relationships between pathogens sampled from these hosts. The trees differ both in timing of the internal nodes and in topology. These differences become more pronounced when a higher fraction of infected hosts is sampled. We show how the phylogenetic tree of sampled pathogens is related to the transmission tree of an outbreak of an infectious disease, by the within-host dynamics of pathogens. We provide a statistical framework to infer key epidemiological and mutational parameters by simultaneously estimating the phylogenetic tree and the transmission tree. We test the approach using simulations and illustrate its use on an outbreak of foot-and-mouth disease. The approach unifies existing methods in the emerging field of phylodynamics with transmission tree reconstruction methods that are used in infectious disease epidemiology.

  13. Efficient FPT Algorithms for (Strict) Compatibility of Unrooted Phylogenetic Trees.

    Science.gov (United States)

    Baste, Julien; Paul, Christophe; Sau, Ignasi; Scornavacca, Celine

    2017-04-01

    In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species X; these relationships are often depicted via a phylogenetic tree (a tree whose leaves are labeled bijectively by elements of X and which has no degree-2 nodes), called the "species tree." One common approach for reconstructing a species tree consists of first constructing several phylogenetic trees from primary data (e.g., DNA sequences originating from some species in X), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The obtained tree is our estimate of the species tree and, when the input trees are defined on overlapping (but not identical) sets of labels, is called a "supertree." In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in the graph community. Both problems are known to be fixed-parameter tractable in the number of input trees k, by using their expressibility in monadic second-order logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on k of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time [Formula: see text], where n is the total size of the input.
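
    For trees on the same taxon set, compatibility reduces to a classical pairwise test on splits: two splits A|B and C|D fit in one unrooted tree if and only if at least one of A∩C, A∩D, B∩C, B∩D is empty, and a set of splits fits in one tree if and only if every pair is compatible. The sketch below implements only this classical test. The supertree setting studied in the paper (overlapping but not identical label sets) is harder and is what the dynamic programming algorithms address. Representing splits as Python frozensets of taxon names is an assumption of this sketch.

```python
from itertools import combinations


def compatible(s1: frozenset, s2: frozenset, taxa: frozenset) -> bool:
    """Splits s1|(taxa-s1) and s2|(taxa-s2) are compatible iff at least
    one of the four pairwise intersections of their sides is empty."""
    a, b = s1, taxa - s1
    c, d = s2, taxa - s2
    return not (a & c) or not (a & d) or not (b & c) or not (b & d)


def pairwise_compatible(splits, taxa: frozenset) -> bool:
    """True iff every pair of splits is compatible, i.e. (splits equivalence
    theorem) all of them can be displayed by a single unrooted tree."""
    return all(compatible(x, y, taxa) for x, y in combinations(splits, 2))
```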

  14. Maximum parsimony, substitution model, and probability phylogenetic trees.

    Science.gov (United States)

    Weng, J F; Thomas, D A; Mareels, I

    2011-01-01

    The problem of inferring phylogenies (phylogenetic trees) is one of the main problems in computational biology. There are three main methods for inferring phylogenies: Maximum Parsimony (MP), Distance Matrix (DM) and Maximum Likelihood (ML), of which MP is the most studied and most popular. In the MP method, the optimization criterion is the number of nucleotide substitutions, computed from the differences among the investigated nucleotide sequences. However, the MP method is often criticized because it counts only the substitutions observable at the current time, omitting all the unobservable substitutions that actually occurred in the evolutionary history. To take the unobservable substitutions into account, substitution models have been established and are now widely used in the DM and ML methods, but these substitution models cannot be used within the classical MP method. Recently, the authors proposed a probability representation model for phylogenetic trees; the trees reconstructed in this model are called probability phylogenetic trees. One advantage of the probability representation model is that it can incorporate a substitution model to infer phylogenetic trees based on the MP principle. In this paper, we explain how to use a substitution model in the reconstruction of probability phylogenetic trees and show the advantage of this approach with examples.
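
    The classical MP criterion (the number of substitutions implied by the observed sequences) is usually computed with Fitch's algorithm. A minimal sketch, assuming a rooted binary tree written as nested pairs of leaf names and an alignment of equal-length sequences. It counts only observable changes, which is exactly the limitation the probability representation model is meant to address; it is not the authors' method.

```python
def fitch(tree, states):
    """Fitch's small-parsimony pass for one character on a rooted binary tree.

    tree: a leaf name (str) or a pair (left, right); states: leaf -> state.
    Returns (candidate state set at this node, minimum substitutions below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1  # disjoint sets: one extra substitution


def parsimony_score(tree, alignment: dict[str, str]) -> int:
    """Sum the Fitch cost over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {leaf: seq[i] for leaf, seq in alignment.items()})[1]
        for i in range(length)
    )
```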

  15. An efficient and extensible approach for compressing phylogenetic trees

    KAUST Repository

    Matthews, Suzanne J

    2011-01-01

    Background: Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm by handling trees with weighted branches. Furthermore, by using the compressed TreeZip file as input, we have designed an extensible decompressor that can extract subcollections of trees, compute majority and strict consensus trees, and merge tree collections using set operations such as union, intersection, and set difference. Results: On unweighted phylogenetic trees, TreeZip is able to compress Newick files by more than 98%. On weighted phylogenetic trees, TreeZip is able to compress a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, allowing space savings in excess of 99% (unweighted) and 92% (weighted). Unlike TreeZip, 7zip is not immune to branch rotations, and performs worse as the level of variability in the Newick string representation increases. Finally, since the TreeZip compressed text (TRZ) file contains all the semantic information in a collection of trees, we can easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree in a matter of seconds. We also show the ease with which set operations can be performed on TRZ files, at speeds faster than those achieved on Newick or 7zip-compressed Newick files, and without loss of space savings. Conclusions: TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allows it to be operated upon directly and quickly, without a need to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community. © 2011 Matthews and Williams; licensee BioMed Central Ltd.
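
    The majority and strict consensus computations can be sketched on split sets, assuming each tree has already been reduced to its nontrivial splits, each stored as the side that excludes a fixed reference taxon (here A) so that equal splits compare equal. This works on in-memory Python sets rather than on TRZ files, and the function names are hypothetical. Majority-rule splits always form a tree: any two splits that each occur in more than half of the trees must co-occur in at least one tree, so they are compatible.

```python
from collections import Counter


def consensus_splits(trees: list[set[frozenset[str]]],
                     threshold: float = 0.5) -> set[frozenset[str]]:
    """Splits present in more than `threshold` of the trees.

    threshold=0.5 gives the majority-rule consensus splits.
    """
    counts = Counter(split for tree in trees for split in tree)
    return {s for s, c in counts.items() if c > threshold * len(trees)}


def strict_splits(trees: list[set[frozenset[str]]]) -> set[frozenset[str]]:
    """Splits present in every tree (strict consensus)."""
    return set.intersection(*trees) if trees else set()
```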

  16. Using tree diversity to compare phylogenetic heuristics.

    Science.gov (United States)

    Sul, Seung-Jin; Matthews, Suzanne; Williams, Tiffani L

    2009-04-29

    Evolutionary trees are family trees that represent the relationships between a group of organisms. Phylogenetic heuristics are used to search stochastically for the best-scoring trees in tree space. Given that trees with better scores are believed to be better approximations of the true phylogeny, traditional evaluation techniques have used tree scores to determine which heuristics find the best scores in the fastest time. We develop new techniques to evaluate phylogenetic heuristics based on both tree scores and topologies, and use them to compare Pauprat and Rec-I-DCM3, two popular Maximum Parsimony search algorithms. Our results show that although Pauprat and Rec-I-DCM3 find trees with the same best scores, these trees are topologically quite different. Furthermore, the Rec-I-DCM3 trees cluster distinctly from the Pauprat trees. In addition to heatmap visualizations that use parsimony scores and the Robinson-Foulds distance to compare the best-scoring trees found by the two heuristics, we also develop entropy-based methods to show the diversity of the trees found. Overall, Pauprat identifies more diverse trees than Rec-I-DCM3. More broadly, our work shows that there is value in comparing heuristics beyond the parsimony scores that they find. Pauprat is a slower heuristic than Rec-I-DCM3. However, our work shows that there is tremendous value in using Pauprat to reconstruct trees, especially since it finds identically scoring but topologically distinct trees. Hence, instead of discounting Pauprat, effort should go into improving its implementation. Ultimately, improved performance measures lead to better phylogenetic heuristics and will result in better approximations of the true evolutionary history of the organisms of interest.
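
    The Robinson-Foulds distance used in these comparisons counts the splits found in one tree but not the other. A minimal sketch, assuming trees are given as sets of nontrivial splits normalized to exclude a fixed reference taxon. Some software reports half of this count, and the normalization by 2(n - 3) assumes unrooted binary trees on n taxa. The entropy-based diversity measures are not reproduced here.

```python
def robinson_foulds(splits1: set[frozenset[str]], splits2: set[frozenset[str]]) -> int:
    """Number of nontrivial splits found in exactly one of the two trees."""
    return len(splits1 ^ splits2)


def normalized_rf(splits1: set[frozenset[str]], splits2: set[frozenset[str]],
                  n_taxa: int) -> float:
    """RF divided by its maximum, 2(n - 3), for unrooted binary trees (n >= 4)."""
    return robinson_foulds(splits1, splits2) / (2 * (n_taxa - 3))
```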

  17. Interpreting the universal phylogenetic tree

    Science.gov (United States)

    Woese, C. R.

    2000-01-01

    The universal phylogenetic tree not only spans all extant life, but its root and earliest branchings represent stages in the evolutionary process before modern cell types had come into being. The evolution of the cell is an interplay between vertically derived and horizontally acquired variation. Primitive cellular entities were necessarily simpler and more modular in design than are modern cells. Consequently, horizontal gene transfer early on was pervasive, dominating the evolutionary dynamic. The root of the universal phylogenetic tree represents the first stage in cellular evolution when the evolving cell became sufficiently integrated and stable to the erosive effects of horizontal gene transfer that true organismal lineages could exist.

  18. Cophenetic metrics for phylogenetic trees, after Sokal and Rohlf.

    Science.gov (United States)

    Cardona, Gabriel; Mir, Arnau; Rosselló, Francesc; Rotger, Lucía; Sánchez, David

    2013-01-16

    Phylogenetic tree comparison metrics are an important tool in the study of evolution, and hence the definition of such metrics is an interesting problem in phylogenetics. In a paper in Taxon fifty years ago, Sokal and Rohlf proposed to measure quantitatively the difference between a pair of phylogenetic trees by first encoding them by means of their half-matrices of cophenetic values, and then comparing these matrices. This idea has been used several times since then to define dissimilarity measures between phylogenetic trees but, to our knowledge, no proper metric on weighted phylogenetic trees with nested taxa based on this idea has been formally defined and studied yet. In fact, the cophenetic values of pairs of different taxa alone are not enough to single out phylogenetic trees with weighted arcs or nested taxa. For every (rooted) phylogenetic tree $T$, let its cophenetic vector $\varphi(T)$ consist of all cophenetic values between pairs of taxa in $T$ and all depths of taxa in $T$. It turns out that these cophenetic vectors single out weighted phylogenetic trees with nested taxa. We then define a family of cophenetic metrics $d_{\varphi,p}$ by comparing these cophenetic vectors by means of $L^p$ norms, and we study, either analytically or numerically, some of their basic properties: neighbors, diameter, distribution, and their rank correlation with each other and with other metrics. The cophenetic metrics can be safely used on weighted phylogenetic trees with nested taxa and no restriction on degrees, and they can be computed in $O(n^2)$ time, where $n$ stands for the number of taxa. The metrics $d_{\varphi,1}$ and $d_{\varphi,2}$ have positively skewed distributions, and they show a low rank correlation with the Robinson-Foulds metric and the nodal metrics, and a very high correlation with each other and with the splitted nodal metrics. The diameter of $d_{\varphi,p}$, for $p \geq 1$, is in $O(n^{(p+2)/p})$, and thus for low $p$ they are more discriminative, having a wider range of values.
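
    A sketch of the cophenetic vector and the metric $d_{\varphi,p}$, assuming a rooted tree stored as a hypothetical parent map {child: (parent, branch length)}. The cophenetic value of two taxa is the depth of their lowest common ancestor, and the entry for a taxon paired with itself is its depth. Internal nodes may themselves be taxa, so nested taxa are allowed. This naive version walks up the tree for every pair, so it runs in O(n^2 h) time for a tree of height h rather than within the O(n^2) bound stated above.

```python
from itertools import combinations_with_replacement


def depth(node, parent):
    """Distance from the root; parent maps child -> (parent, branch length)."""
    d = 0.0
    while node in parent:
        node, length = parent[node]
        d += length
    return d


def lca(x, y, parent):
    """Lowest common ancestor: walk up from x until reaching an ancestor of y."""
    ancestors_y = {y}
    while y in parent:
        y = parent[y][0]
        ancestors_y.add(y)
    while x not in ancestors_y:
        x = parent[x][0]
    return x


def cophenetic_vector(parent, taxa):
    """Depths of all taxa plus LCA depths of all pairs, in a fixed sorted order
    so that vectors of trees on the same taxa are comparable entry by entry."""
    return [depth(lca(i, j, parent), parent)
            for i, j in combinations_with_replacement(sorted(taxa), 2)]


def cophenetic_distance(parent1, parent2, taxa, p=1):
    """L^p distance between the two cophenetic vectors."""
    v1 = cophenetic_vector(parent1, taxa)
    v2 = cophenetic_vector(parent2, taxa)
    return sum(abs(a - b) ** p for a, b in zip(v1, v2)) ** (1 / p)
```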

  19. On the Shapley Value of Unrooted Phylogenetic Trees.

    Science.gov (United States)

    Wicke, Kristina; Fischer, Mareike

    2018-01-17

    The Shapley value, a solution concept from cooperative game theory, has recently been considered for both unrooted and rooted phylogenetic trees. Here, we focus on the Shapley value of unrooted trees and first revisit the so-called split counts of a phylogenetic tree and the Shapley transformation matrix that allows for the calculation of the Shapley value from the edge lengths of a tree. We show that non-isomorphic trees may have permutation-equivalent Shapley transformation matrices and permutation-equivalent null spaces. This implies that estimating the split counts associated with a tree or the Shapley values of its leaves does not suffice to reconstruct the correct tree topology. We then turn to the use of the Shapley value as a prioritization criterion in biodiversity conservation and compare it to a greedy solution concept. Here, we show that for certain phylogenetic trees, the Shapley value may fail as a prioritization criterion, meaning that the diversity spanned by the top k species (ranked by their Shapley values) cannot approximate the total diversity of all n species.
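
    A sketch of the Shapley value under the usual unrooted phylogenetic diversity game, assuming v(S) is the total branch length of the smallest subtree connecting the leaves in S (zero for fewer than two leaves). Leaf i gains an edge's length exactly when all leaves preceding it in a random ordering lie on the other side of that edge and at least one does; this happens with probability 1/a - 1/n = b/(n·a), where a is the size of the side containing i and b the size of the other side. These per-edge coefficients act as a transformation from edge lengths to Shapley values; whether they match the paper's Shapley transformation matrix entry for entry is an assumption. Edges are given as hypothetical (one side, length) pairs.

```python
from fractions import Fraction


def shapley_values(taxa, edges):
    """Shapley value of each leaf in the unrooted phylogenetic diversity game.

    edges: list of (side, length); `side` is the set of leaves on one side of
    the edge and its complement is the other side. A leaf on a side of size a
    receives length * b / (n * a) from that edge, where b = n - a.
    """
    taxa = frozenset(taxa)
    n = len(taxa)
    values = {t: Fraction(0) for t in taxa}
    for side, length in edges:
        side = frozenset(side)
        other = taxa - side
        for part, rest in ((side, other), (other, side)):
            share = Fraction(length) * len(rest) / (n * len(part))
            for leaf in part:
                values[leaf] += share
    return values
```

    Because the two sides of every edge receive b/n + a/n = 1 of its length in total, the Shapley values always sum to the total tree length.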

  20. New weighting methods for phylogenetic tree reconstruction using multiple loci.

    Science.gov (United States)

    Misawa, Kazuharu; Tajima, Fumio

    2012-08-01

    Efficient determination of evolutionary distances is important for the correct reconstruction of phylogenetic trees. The performance of the pooled distance required for reconstructing a phylogenetic tree can be improved by applying large weights to appropriate distances and small weights to inappropriate ones. We developed two weighting methods, the modified Tajima-Takezaki method and the modified least-squares method, for reconstructing phylogenetic trees from multiple loci. In computer simulations, we found that both of the new methods were more efficient in reconstructing correct topologies than the no-weight method. We then reconstructed hominoid phylogenetic trees from mitochondrial DNA using our new methods, and found that the levels of bootstrap support were significantly increased by both the modified Tajima-Takezaki method and the modified least-squares method.
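
    The idea of weighting per-locus distances can be illustrated with generic inverse-variance weighting. This is a swapped-in textbook scheme, not the modified Tajima-Takezaki or modified least-squares methods, whose formulas are not given in the abstract. Each locus contributes one distance estimate for a pair of taxa together with an assumed sampling variance, and loci with noisier estimates receive smaller weights.

```python
def pooled_distance(distances: list[float], variances: list[float]) -> float:
    """Combine one pairwise distance per locus into a pooled estimate,
    weighting each locus by the inverse of its (assumed) sampling variance."""
    if len(distances) != len(variances):
        raise ValueError("need one variance per locus")
    weights = [1.0 / v for v in variances]
    return sum(w * d for w, d in zip(weights, distances)) / sum(weights)
```

    With equal variances this reduces to the plain (no-weight) average of the per-locus distances.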

  1. Enumerating all maximal frequent subtrees in collections of phylogenetic trees.

    Science.gov (United States)

    Deepak, Akshay; Fernández-Baca, David

    2014-01-01

    A common problem in phylogenetic analysis is to identify frequent patterns in a collection of phylogenetic trees. The goal is, roughly, to find a subset of the species (taxa) on which all or some significant subset of the trees agree. One popular method to do so is through maximum agreement subtrees (MASTs). MASTs are also used, among other things, as a metric for comparing phylogenetic trees, to compute congruence indices, and to identify horizontal gene transfer events. We give algorithms and experimental results for two approaches to identify common patterns in a collection of phylogenetic trees, one based on agreement subtrees, called maximal agreement subtrees, the other on frequent subtrees, called maximal frequent subtrees. These approaches can return subtrees on larger sets of taxa than MASTs, and can reveal new common phylogenetic relationships not present in either MASTs or the majority rule tree (a popular consensus method). Our current implementation is available on the web at https://code.google.com/p/mfst-miner/. Our computational results confirm that maximal agreement subtrees and all maximal frequent subtrees can reveal a more complete phylogenetic picture of the common patterns in collections of phylogenetic trees than maximum agreement subtrees; they are also often more resolved than the majority rule tree. Further, our experiments show that enumerating maximal frequent subtrees is considerably more practical than enumerating ordinary (not necessarily maximal) frequent subtrees.
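
    A brute-force sketch of the maximum agreement subtree (MAST) baseline that the maximal agreement and maximal frequent subtree approaches extend, assuming unrooted trees given as lists of nontrivial splits (each split as one side, a frozenset of taxa). A taxon subset is an agreement set when all trees restricted to it have the same splits. The search is exponential and only suitable for tiny examples; it is not the mfst-miner implementation, and the function names are hypothetical.

```python
from itertools import combinations


def restrict(splits, taxa: frozenset, keep) -> set[frozenset]:
    """Nontrivial splits of a tree restricted to the taxon subset `keep`,
    each stored as the side that does not contain min(keep)."""
    keep = frozenset(keep)
    ref = min(keep)
    out = set()
    for side in splits:
        a, b = side & keep, (taxa - side) & keep
        if len(a) >= 2 and len(b) >= 2:
            out.add(b if ref in a else a)
    return out


def brute_force_mast(trees, taxa) -> set:
    """Largest taxon subset on which all unrooted trees agree (exponential time)."""
    taxa = frozenset(taxa)
    for k in range(len(taxa), 2, -1):
        for keep in combinations(sorted(taxa), k):
            restricted = [restrict(t, taxa, keep) for t in trees]
            if all(r == restricted[0] for r in restricted):
                return set(keep)
    return set(taxa)  # reached only with fewer than three taxa
```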

  2. Load Balancing Issues with Constructing Phylogenetic Trees using Neighbour-Joining Algorithm

    International Nuclear Information System (INIS)

    Al Mamun, S M

    2012-01-01

    Phylogenetic tree construction is one of the most important and interesting problems in bioinformatics. Constructing phylogenetic trees efficiently has always been a research issue, since both the correctness and the speed of tree construction must be considered. In this paper, we implemented the neighbour-joining algorithm using the Message Passing Interface (MPI) to construct phylogenetic trees. Its performance compares favourably with that of the best sequential algorithm. This paper shows researchers how strongly load balancing affects the construction of phylogenetic trees using the neighbour-joining algorithm.
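
    A sequential sketch of the neighbour-joining algorithm, assuming a symmetric distance matrix given as a list of lists in the same order as the taxon names (at least three taxa). At each step, the O(n^2) scan of the criterion Q(a, b) = (n - 2)d(a, b) - r(a) - r(b), with r the row sums, dominates the cost; splitting those rows across MPI processes is one natural way to parallelize it, and how evenly that work is divided is one place where load balancing matters. The MPI distribution is not shown, and this is not the paper's implementation.

```python
def neighbor_joining(names, d):
    """Neighbour-joining on a symmetric distance matrix `d` (list of lists in the
    same order as `names`, at least 3 taxa). Returns an unrooted Newick string."""
    dist = {(a, b): d[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dist[a, k] for k in nodes) for a in nodes}  # row sums
        best = None
        for x in range(n):  # O(n^2) scan for the pair minimizing Q
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * dist[a, b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        la = dist[a, b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a, b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"  # new node is named by its Newick subtree
        dist[u, u] = 0.0
        for k in nodes:
            if k not in (a, b):
                dist[u, k] = dist[k, u] = (dist[a, k] + dist[b, k] - dist[a, b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    a, b, c = nodes  # resolve the last three nodes as an unrooted star
    la = (dist[a, b] + dist[a, c] - dist[b, c]) / 2
    lb = (dist[a, b] + dist[b, c] - dist[a, c]) / 2
    lc = (dist[a, c] + dist[b, c] - dist[a, b]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"
```

    For example, with names A, B, C, D and distances d(A,B)=3, d(A,C)=8, d(A,D)=9, d(B,C)=9, d(B,D)=10, d(C,D)=9 (an additive tree metric), the sketch returns (C:4,D:5,(A:1,B:2):3); and so recovers the original branch lengths.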

  3. YBYRÁ facilitates comparison of large phylogenetic trees.

    Science.gov (United States)

    Machado, Denis Jacob

    2015-07-01

    The number and size of tree topologies that are being compared by phylogenetic systematists is increasing due to technological advancements in high-throughput DNA sequencing. However, we still lack tools to facilitate comparison among phylogenetic trees with a large number of terminals. The "YBYRÁ" project integrates software solutions for data analysis in phylogenetics. It comprises tools for (1) topological distance calculation based on the number of shared splits or clades, (2) sensitivity analysis and automatic generation of sensitivity plots and (3) clade diagnoses based on different categories of synapomorphies. YBYRÁ also provides (4) an original framework to facilitate the search for potential rogue taxa based on how much they affect average matching split distances (using MSdist). YBYRÁ facilitates comparison of large phylogenetic trees and outperforms competing software in terms of usability and time efficiency, especially for large data sets. The programs that comprise this toolkit are written in Python, hence they do not require installation and have minimal dependencies. The entire project is available under an open-source licence at http://www.ib.usp.br/grant/anfibios/researchSoftware.html .

  4. Enumerating all maximal frequent subtrees in collections of phylogenetic trees

    Science.gov (United States)

    2014-01-01

    Background: A common problem in phylogenetic analysis is to identify frequent patterns in a collection of phylogenetic trees. The goal is, roughly, to find a subset of the species (taxa) on which all or some significant subset of the trees agree. One popular method to do so is through maximum agreement subtrees (MASTs). MASTs are also used, among other things, as a metric for comparing phylogenetic trees, to compute congruence indices, and to identify horizontal gene transfer events. Results: We give algorithms and experimental results for two approaches to identify common patterns in a collection of phylogenetic trees, one based on agreement subtrees, called maximal agreement subtrees, the other on frequent subtrees, called maximal frequent subtrees. These approaches can return subtrees on larger sets of taxa than MASTs, and can reveal new common phylogenetic relationships not present in either MASTs or the majority rule tree (a popular consensus method). Our current implementation is available on the web at https://code.google.com/p/mfst-miner/. Conclusions: Our computational results confirm that maximal agreement subtrees and all maximal frequent subtrees can reveal a more complete phylogenetic picture of the common patterns in collections of phylogenetic trees than maximum agreement subtrees; they are also often more resolved than the majority rule tree. Further, our experiments show that enumerating maximal frequent subtrees is considerably more practical than enumerating ordinary (not necessarily maximal) frequent subtrees. PMID:25061474

  5. Phylogenetic trees and Euclidean embeddings.

    Science.gov (United States)

    Layer, Mark; Rhodes, John A

    2017-01-01

    It was recently observed by de Vienne et al. (Syst Biol 60(6):826-832, 2011) that a simple square root transformation of distances between taxa on a phylogenetic tree allowed for an embedding of the taxa into Euclidean space. While the justification for this was based on a diffusion model of continuous character evolution along the tree, here we give a direct and elementary explanation for it that provides substantial additional insight. We use this embedding to reinterpret the differences between the NJ and BIONJ tree building algorithms, providing one illustration of how this embedding reflects tree structures in data.
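
    One elementary construction makes the square-root embedding concrete (we do not claim it is the argument used in the paper). Root the tree anywhere and give each leaf one coordinate per edge: the square root of the edge length for edges on the path from the root to that leaf, and zero otherwise. Coordinates of edges shared by two root paths cancel in the difference, so the squared Euclidean distance between two leaves is the sum of the branch lengths on the path joining them, that is, their tree distance d(i, j); the Euclidean distance is therefore sqrt(d(i, j)). The parent-map representation {child: (parent, length)} is an assumption of this sketch.

```python
import math


def embed(parent, leaves):
    """Map each leaf to a sparse vector indexed by edges (named by their child
    node): coordinate sqrt(length) for every edge on the root-to-leaf path."""
    coords = {}
    for leaf in leaves:
        vec, node = {}, leaf
        while node in parent:
            up, length = parent[node]
            vec[node] = math.sqrt(length)
            node = up
        coords[leaf] = vec
    return coords


def euclidean(u, v):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))
```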

  6. Reconstruction of certain phylogenetic networks from their tree-average distances.

    Science.gov (United States)

    Willson, Stephen J

    2013-10-01

    Trees are commonly utilized to describe the evolutionary history of a collection of biological species, in which case the trees are called phylogenetic trees. Often these are reconstructed from data by making use of distances between extant species corresponding to the leaves of the tree. Because of increased recognition of the possibility of hybridization events, more attention is being given to the use of phylogenetic networks that are not necessarily trees. This paper describes the reconstruction of certain such networks from the tree-average distances between the leaves. For a certain class of phylogenetic networks, a polynomial-time method is presented to reconstruct the network from the tree-average distances. The method is proved to work if there is a single reticulation cycle.

  7. Orthology prediction at scalable resolution by phylogenetic tree analysis

    Directory of Open Access Journals (Sweden)

    Huynen Martijn A

    2007-03-01

    Background: Orthology is one of the cornerstones of gene function prediction. Dividing the phylogenetic relations between genes into orthologs and paralogs is, however, an oversimplification. Already in two-species gene phylogenies, the complicated, non-transitive nature of phylogenetic relations results in inparalogs and outparalogs. For situations with more than two species, we lack semantics to specifically describe the phylogenetic relations, let alone to exploit them. Published procedures to extract orthologous groups from phylogenetic trees do not allow identification of orthology at various levels of resolution, nor do they document the relations between the orthologous groups. Results: We introduce "levels of orthology" to describe the multi-level nature of gene relations. This is implemented in a program, LOFT (Levels of Orthology From Trees), that assigns hierarchical orthology numbers to genes based on a phylogenetic tree. To decide upon speciation and gene duplication events in a tree, LOFT can be instructed either to perform classical species-tree reconciliation or to use the species overlap between partitions in the tree. The hierarchical orthology numbers assigned by LOFT effectively summarize the phylogenetic relations between genes. The resulting high-resolution orthologous groups are depicted in colour, facilitating visual inspection of (large) trees. A benchmark for orthology prediction, which takes into account the varying levels of orthology between genes, shows that the phylogeny-based high-resolution orthology assignments made by LOFT are reliable. Conclusion: The "levels of orthology" concept offers high-resolution, reliable orthology, while preserving the relations between orthologous groups. Both a Windows version and a preliminary Java version of LOFT are available from the LOFT website http://www.cmbi.ru.nl/LOFT.
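
    The species-overlap rule can be sketched in a few lines: an internal node of a rooted gene tree is called a duplication if the species sets of its child subtrees overlap, and a speciation otherwise. This is a minimal sketch, not LOFT: it assumes gene names of the hypothetical form species_gene (for example hs_1) and does not compute LOFT's hierarchical orthology numbers.

```python
def classify_nodes(tree, species_of=lambda gene: gene.split("_")[0]):
    """Label each internal node of a rooted gene tree as a duplication if the
    species sets of its child subtrees overlap, otherwise as a speciation.

    tree: a gene name (str) or a tuple of subtrees. Returns a post-order list
    of (sorted leaf names under the node, event).
    """
    events = []

    def walk(node):
        if isinstance(node, str):
            return {species_of(node)}, [node]
        results = [walk(child) for child in node]
        species_sets = [s for s, _ in results]
        leaves = [leaf for _, ls in results for leaf in ls]
        overlap = any(species_sets[i] & species_sets[j]
                      for i in range(len(species_sets))
                      for j in range(i + 1, len(species_sets)))
        events.append((tuple(sorted(leaves)), "duplication" if overlap else "speciation"))
        return set().union(*species_sets), leaves

    walk(tree)
    return events
```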

  8. Estimating phylogenetic relationships despite discordant gene trees across loci: the species tree of a diverse species group of feather mites (Acari: Proctophyllodidae).

    Science.gov (United States)

    Knowles, Lacey L; Klimov, Pavel B

    2011-11-01

    With the increased availability of multilocus sequence data, the lack of concordance of gene trees estimated for independent loci has focused attention on both the biological processes producing the discord and the methodologies used to estimate phylogenetic relationships. What has emerged is a suite of new analytical tools for phylogenetic inference: species tree approaches. In contrast to traditional phylogenetic methods, which are stymied by the idiosyncrasies of gene trees, approaches for estimating species trees explicitly take into account the cause of discord among loci and, in the process, provide a direct estimate of phylogenetic history (i.e., the history of species divergence, not divergence of specific loci). We illustrate the utility of species tree estimates with an analysis of a diverse group of feather mites, the pinnatus species group (genus Proctophyllodes). Discord among four sequenced nuclear loci is consistent with theoretical expectations, given the short time separating speciation events (as evident from short internodes relative to terminal branch lengths in the trees). Nevertheless, many of the relationships are well resolved in a Bayesian estimate of the species tree; the analysis also highlights ambiguous aspects of the phylogeny that require additional loci. The broad utility of species tree approaches is discussed, specifically their application to groups with high speciation rates, a pattern of diversification that is particularly prevalent in host/parasite systems, where species interactions can drive rapid diversification.
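
    The expected discord for short internodes can be quantified with the standard three-taxon result of the multispecies coalescent: with an internal species-tree branch of length t in coalescent units (generations divided by 2N for a diploid population of effective size N), a gene tree matches the species tree with probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t). The sketch assumes neutral loci, no gene flow and one sampled allele per species.

```python
import math


def p_concordant(t: float) -> float:
    """Probability that a three-species gene tree matches the species tree,
    given internode length t in coalescent units."""
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def p_each_discordant(t: float) -> float:
    """Probability of each of the two discordant three-taxon gene-tree topologies."""
    return math.exp(-t) / 3.0
```

    At t = 0 all three topologies are equally likely (1/3 each), which is why very short internodes are expected to produce widespread discord among loci.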

  9. Topological variation in single-gene phylogenetic trees

    OpenAIRE

    Castresana, Jose

    2007-01-01

    A recent large-scale phylogenomic study has shown the great degree of topological variation that can be found among eukaryotic phylogenetic trees constructed from single genes, highlighting the problems that can be associated with gene sampling in phylogenetic studies.

  10. Reconstruction of phylogenetic trees of prokaryotes using maximal common intervals.

    Science.gov (United States)

    Heydari, Mahdi; Marashi, Sayed-Amir; Tusserkani, Ruzbeh; Sadeghi, Mehdi

    2014-10-01

    One of the fundamental problems in bioinformatics is phylogenetic tree reconstruction, which can be used for classifying living organisms into different taxonomic clades. The classical approach to this problem is based on a marker such as 16S ribosomal RNA. Since evolutionary events like genomic rearrangements are not included in reconstructions of phylogenetic trees based on single genes, much effort has been made to find other characteristics for phylogenetic reconstruction in recent years. With the increasing availability of completely sequenced genomes, gene order can be considered as a new solution for this problem. In the present work, we applied maximal common intervals (MCIs) in two or more genomes to infer their distance and to reconstruct their evolutionary relationship. Additionally, measures based on uncommon segments (UCSs), i.e., those genomic segments which are not detected as part of any of the MCIs, are also used for phylogenetic tree reconstruction. We applied these two types of measures for reconstructing the phylogenetic tree of 63 prokaryotes with known COG (clusters of orthologous groups) families. Similarity between the MCI-based (resp. UCS-based) reconstructed phylogenetic trees and the phylogenetic tree obtained from the NCBI taxonomy browser is as high as 93.1% (resp. 94.9%). We show that in the case of this diverse dataset of prokaryotes, tree reconstruction based on MCI and UCS outperforms most of the currently available methods based on gene orders, including breakpoint distance and DCJ. We additionally tested our new measures on a dataset of 13 closely related bacteria from the genus Prochlorococcus. In this case, distances like rearrangement distance, breakpoint distance and DCJ proved to be useful, while our new measures are still appropriate for phylogenetic reconstruction. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
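
    Breakpoint distance, one of the gene-order measures compared above, can be sketched for the simplest case: linear, unsigned gene orders with no duplicated genes, counting adjacencies of the first order that are absent from the second. Real implementations often handle signed genes, circular chromosomes or framing with extra end markers, so their values may differ; the MCI- and UCS-based measures themselves are not reproduced here.

```python
def adjacencies(order):
    """Unordered adjacencies {g_k, g_(k+1)} of a linear, unsigned gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def breakpoint_distance(order1, order2):
    """Number of adjacencies of order1 that are not adjacencies of order2."""
    return len(adjacencies(order1) - adjacencies(order2))
```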

  11. SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data.

    Science.gov (United States)

    Lee, Tae-Ho; Guo, Hui; Wang, Xiyin; Kim, Changsoo; Paterson, Andrew H

    2014-02-26

    Phylogenetic trees are widely used for genetic and evolutionary studies in various organisms. Advanced sequencing technology has dramatically enriched the data available for constructing phylogenetic trees based on single nucleotide polymorphisms (SNPs). However, massive SNP data make it difficult to perform reliable analysis, and there has been no ready-to-use pipeline to generate phylogenetic trees from these data. We developed a new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets. The pipeline can construct a phylogenetic tree from any of three representative SNP data file formats. In addition, to increase the reliability of the tree, the pipeline includes steps such as removing low-quality data and accounting for linkage disequilibrium. A maximum likelihood method for the inference of phylogeny is also adopted to generate the tree. Using SNPhylo, users can easily produce a reliable phylogenetic tree from a large SNP data file. Thus, this pipeline can help a researcher focus more on interpreting the results of analysis of voluminous data sets, rather than on the manipulations necessary to accomplish the analysis.
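
    The quality-filtering and linkage-disequilibrium steps can be sketched as follows, assuming genotypes coded 0/1/2 (copies of one allele) with None for missing calls. The thresholds min_maf, max_missing and max_r2 are hypothetical defaults, and the greedy rule that compares each SNP only with the most recently kept one is a simplification; SNPhylo's actual procedure and defaults may differ.

```python
def minor_allele_freq(genos):
    """Minor allele frequency over called genotypes (0/1/2; None = missing)."""
    called = [g for g in genos if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def missing_rate(genos):
    return sum(g is None for g in genos) / len(genos)


def r_squared(x, y):
    """Squared Pearson correlation of dosages over individuals called at both SNPs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def filter_snps(snps, min_maf=0.1, max_missing=0.1, max_r2=0.1):
    """snps: ordered list of (name, genotypes). Drops low-quality SNPs, then
    greedily drops any SNP in high LD with the most recently kept one."""
    kept = []
    for name, genos in snps:
        if missing_rate(genos) > max_missing or minor_allele_freq(genos) < min_maf:
            continue
        if kept and r_squared(kept[-1][1], genos) > max_r2:
            continue
        kept.append((name, genos))
    return [name for name, _ in kept]
```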

  12. New substitution models for rooting phylogenetic trees.

    Science.gov (United States)

    Williams, Tom A; Heaps, Sarah E; Cherlin, Svetlana; Nye, Tom M W; Boys, Richard J; Embley, T Martin

    2015-09-26

    The root of a phylogenetic tree is fundamental to its biological interpretation, but standard substitution models do not provide any information on its position. Here, we describe two recently developed models that relax the usual assumptions of stationarity and reversibility, thereby facilitating root inference without the need for an outgroup. We compare the performance of these models on a classic test case for phylogenetic methods, before considering two highly topical questions in evolutionary biology: the deep structure of the tree of life and the root of the archaeal radiation. We show that all three alignments contain meaningful rooting information that can be harnessed by these new models, thus complementing and extending previous work based on outgroup rooting. In particular, our analyses exclude the root of the tree of life from the eukaryotes or Archaea, placing it on the bacterial stem or within the Bacteria. They also exclude the root of the archaeal radiation from several major clades, consistent with analyses using other rooting methods. Overall, our results demonstrate the utility of non-reversible and non-stationary models for rooting phylogenetic trees, and identify areas where further progress can be made. © 2015 The Authors.

  13. A Universal Phylogenetic Tree.

    Science.gov (United States)

    Offner, Susan

    2001-01-01

    Presents a universal phylogenetic tree suitable for use in high school and college-level biology classrooms. Illustrates the antiquity of life and that all life is related, even though its history dates back 3.5 billion years. Reflects important evolutionary relationships and provides an exciting way to learn about the history of life. (SAH)

  14. Genome-wide comparative analysis of phylogenetic trees: the prokaryotic forest of life.

    Science.gov (United States)

    Puigbò, Pere; Wolf, Yuri I; Koonin, Eugene V

    2012-01-01

    Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article, we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the Boot-Split Distance (BSD) method is introduced as an extension of the previously developed Split Distance method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting tree-like and net-like evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the application of these methods to analyze the FOL and the results obtained with these methods. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a "species tree."

  15. Species divergence and phylogenetic variation of ecophysiological traits in lianas and trees.

    Science.gov (United States)

    Rios, Rodrigo S; Salgado-Luarte, Cristian; Gianoli, Ernesto

    2014-01-01

    The climbing habit is an evolutionary key innovation in plants because it is associated with enhanced clade diversification. We tested whether patterns of species divergence and variation of three ecophysiological traits that are fundamental for plant adaptation to light environments (maximum photosynthetic rate [A(max)], dark respiration rate [R(d)], and specific leaf area [SLA]) are consistent with this key innovation. Using data reported from four tropical forests and three temperate forests, we compared phylogenetic distance among species as well as the evolutionary rate, phylogenetic distance and phylogenetic signal of those traits in lianas and trees. Estimates of evolutionary rates showed that R(d) evolved faster in lianas, while SLA evolved faster in trees. The mean phylogenetic distance was 1.2 times greater among liana species than among tree species. Likewise, estimates of phylogenetic distance indicated that lianas were less related than expected by chance alone (phylogenetic evenness across 63 species), and trees were more related than expected by chance (phylogenetic clustering across 71 species). Lianas showed evenness for R(d), while trees showed phylogenetic clustering for this trait. In contrast, for SLA, lianas exhibited phylogenetic clustering and trees showed phylogenetic evenness. Lianas and trees showed patterns of ecophysiological trait variation among species that were independent of phylogenetic relatedness. We found support for the expected pattern of greater species divergence in lianas, but did not find consistent patterns regarding ecophysiological trait evolution and divergence. R(d) followed the species-level pattern, i.e., greater divergence/evolution in lianas compared to trees, while the opposite occurred for SLA and no pattern was detected for A(max). R(d) may have driven lianas' divergence across forest environments, and might contribute to diversification in climber clades.

  16. Frugivores bias seed-adult tree associations through nonrandom seed dispersal: a phylogenetic approach.

    Science.gov (United States)

    Razafindratsima, Onja H; Dunham, Amy E

    2016-08-01

    Frugivores are the main seed dispersers in many ecosystems, such that behaviorally driven, nonrandom patterns of seed dispersal are common, but these patterns are poorly understood. Characterizing these patterns may be essential for understanding spatial organization of fruiting trees and drivers of seed-dispersal limitation in biodiverse forests. To address this, we studied the resulting spatial associations between dispersed seeds and adult tree neighbors in a diverse rainforest in Madagascar, using a temporal and phylogenetic approach. Data show that by using fruiting trees as seed-dispersal foci, frugivores bias seed dispersal under conspecific adults and under heterospecific trees that share dispersers and fruiting time with the dispersed species. Frugivore-mediated seed dispersal also resulted in nonrandom phylogenetic associations of dispersed seeds with their nearest adult neighbors, in nine out of the 16 months of our study. However, these nonrandom phylogenetic associations fluctuated unpredictably over time, ranging from clustered to overdispersed. The spatial and phylogenetic template of seed dispersal did not translate to similar patterns of association in adult tree neighborhoods, suggesting the importance of post-dispersal processes in structuring plant communities. Results suggest that frugivore-mediated seed dispersal is important for structuring early stages of plant-plant associations, setting the template for post-dispersal processes that influence ultimate patterns of plant recruitment. Importantly, if biased patterns of dispersal are common in other systems, frugivores may promote tree coexistence in biodiverse forests by limiting the frequency and diversity of heterospecific interactions of seeds they disperse. © 2016 by the Ecological Society of America.

  17. Autumn Algorithm-Computation of Hybridization Networks for Realistic Phylogenetic Trees.

    Science.gov (United States)

    Huson, Daniel H; Linz, Simone

    2018-01-01

    A minimum hybridization network is a rooted phylogenetic network that displays two given rooted phylogenetic trees using a minimum number of reticulations. Previous mathematical work on their calculation has usually assumed the input trees to be bifurcating, correctly rooted, or that they both contain the same taxa. These assumptions do not hold in biological studies and "realistic" trees have multifurcations, are difficult to root, and rarely contain the same taxa. We present a new algorithm for computing minimum hybridization networks for a given pair of "realistic" rooted phylogenetic trees. We also describe how the algorithm might be used to improve the rooting of the input trees. We introduce the concept of "autumn trees", a nice framework for the formulation of algorithms based on the mathematics of "maximum acyclic agreement forests". While the main computational problem is hard, the run-time depends mainly on how different the given input trees are. In biological studies, where the trees are reasonably similar, our parallel implementation performs well in practice. The algorithm is available in our open source program Dendroscope 3, providing a platform for biologists to explore rooted phylogenetic networks. We demonstrate the utility of the algorithm using several previously studied data sets.

  18. Community Phylogenetics: Assessing Tree Reconstruction Methods and the Utility of DNA Barcodes

    Science.gov (United States)

    Boyle, Elizabeth E.; Adamowicz, Sarah J.

    2015-01-01

    Studies examining phylogenetic community structure have become increasingly prevalent, yet little attention has been given to the influence of the input phylogeny on metrics that describe phylogenetic patterns of co-occurrence. Here, we examine the influence of branch length, tree reconstruction method, and amount of sequence data on measures of phylogenetic community structure, as well as the phylogenetic signal (Pagel’s λ) in morphological traits, using Trichoptera larval communities from Churchill, Manitoba, Canada. We find that model-based tree reconstruction methods and the use of a backbone family-level phylogeny improve estimations of phylogenetic community structure. In addition, trees built using the barcode region of cytochrome c oxidase subunit I (COI) alone accurately predict metrics of phylogenetic community structure obtained from a multi-gene phylogeny. Input tree did not alter overall conclusions drawn for phylogenetic signal, as significant phylogenetic structure was detected in two body size traits across input trees. As the discipline of community phylogenetics continues to expand, it is important to investigate the best approaches to accurately estimate patterns. Our results suggest that emerging large datasets of DNA barcode sequences provide a vast resource for studying the structure of biological communities. PMID:26110886

  19. Phylogenetic Structure of Tree Species across Different Life Stages from Seedlings to Canopy Trees in a Subtropical Evergreen Broad-Leaved Forest.

    Science.gov (United States)

    Jin, Yi; Qian, Hong; Yu, Mingjian

    2015-01-01

    Investigating patterns of phylogenetic structure across different life stages of tree species, and the influence of forest gaps on the phylogenetic structure of forest regeneration, is crucial to understanding forest community assembly. Here, we examine the phylogenetic structure of tree species across life stages from seedlings to canopy trees, as well as forest gap influence on the phylogenetic structure of forest regeneration in a subtropical forest in China. We investigate changes in phylogenetic relatedness (measured as NRI) of tree species from seedlings, saplings, treelets to canopy trees; we compare the phylogenetic turnover (measured as βNRI) between canopy trees and seedlings in forest understory with that between canopy trees and seedlings in forest gaps. We found that phylogenetic relatedness generally increases from seedlings through saplings and treelets up to canopy trees, and that phylogenetic relatedness does not differ between seedlings in forest understory and those in forest gaps, but phylogenetic turnover between canopy trees and seedlings in forest understory is lower than that between canopy trees and seedlings in forest gaps. We conclude that tree species tend to be more closely related from seedling to canopy layers, and that forest gaps alter the seedling phylogenetic turnover of the studied forest. It is likely that the increasing trend of phylogenetic clustering as tree stem size increases observed in this subtropical forest is primarily driven by abiotic filtering processes, which select a set of closely related evergreen broad-leaved tree species whose regeneration has adapted to the closed canopy environments of the subtropical forest developed under the regional monsoon climate.
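
    NRI is conventionally defined from the mean pairwise phylogenetic distance (MPD): NRI = -(MPD_obs - mean(MPD_null)) / sd(MPD_null), so positive values indicate clustering. The sketch below assumes a symmetric distance table stored as a dict of dicts and uses one common null model (random draws of the same number of taxa from a species pool); the study's exact null model and the βNRI turnover variant are not shown.

```python
import random
import statistics


def mpd(dist, taxa):
    """Mean pairwise phylogenetic distance among taxa; dist is a symmetric dict of dicts."""
    pairs = [(a, b) for i, a in enumerate(taxa) for b in taxa[i + 1:]]
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def nri(dist, community, pool, n_null=999, seed=0):
    """Net relatedness index; positive values indicate phylogenetic clustering.

    Null model (an assumption): communities of the same size drawn at random from pool.
    """
    rng = random.Random(seed)
    observed = mpd(dist, community)
    null = [mpd(dist, rng.sample(pool, len(community))) for _ in range(n_null)]
    return -(observed - statistics.mean(null)) / statistics.stdev(null)
```

    With two tight pairs of taxa, a community drawn from one pair yields a positive NRI (clustering) and a community spanning both pairs yields a negative one (evenness).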

  20. TreeCluster: Massively scalable transmission clustering using phylogenetic trees

    OpenAIRE

    Moshiri, Alexander

    2018-01-01

    Background: The ability to infer transmission clusters from molecular data is critical to designing and evaluating viral control strategies. Viral sequencing datasets are growing rapidly, but standard methods of transmission cluster inference do not scale well beyond thousands of sequences. Results: I present TreeCluster, a cross-platform tool that performs transmission cluster inference on a given phylogenetic tree orders of magnitude faster than existing inference methods and supports multi...
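
    One way to cut a tree into transmission clusters is to take the maximal clades whose largest leaf-to-leaf distance stays under a threshold. The following Python sketch does this in a single bottom-up pass, assuming a rooted tree stored as a hypothetical `{node: [(child, branch_length), ...]}` dict; it illustrates the idea of threshold-based clade clustering and is not TreeCluster's own code or command-line interface.

```python
def max_clade_clusters(children, root, threshold):
    """Cut a rooted tree into maximal clades whose leaf-to-leaf diameter <= threshold.

    children: {node: [(child, branch_length), ...]}; leaves have no entry.
    """
    # Pre-order list; reversing it visits every child before its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(child for child, _ in children.get(node, []))

    height, diameter = {}, {}
    for node in reversed(order):
        kids = children.get(node, [])
        if not kids:
            height[node] = diameter[node] = 0.0
            continue
        # Longest path down through each child, longest first.
        reach = sorted((height[c] + length for c, length in kids), reverse=True)
        height[node] = reach[0]
        through_node = reach[0] + reach[1] if len(reach) > 1 else 0.0
        diameter[node] = max([through_node] + [diameter[c] for c, _ in kids])

    def leaves_under(node):
        out, todo = [], [node]
        while todo:
            n = todo.pop()
            kids = children.get(n, [])
            todo.extend(c for c, _ in kids)
            if not kids:
                out.append(n)
        return sorted(out)

    # Top-down: keep the first (largest) clade that satisfies the threshold.
    clusters, todo = [], [root]
    while todo:
        node = todo.pop()
        if diameter[node] <= threshold:
            clusters.append(leaves_under(node))
        else:
            todo.extend(c for c, _ in children[node])
    return sorted(clusters)
```

    Each node is visited a constant number of times per child, which is the property that lets this style of method scale to very large trees.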

  1. PhyTB: Phylogenetic tree visualisation and sample positioning for M. tuberculosis

    KAUST Repository

    Benavente, Ernest D

    2015-05-13

    Background: Phylogenetic-based classification of M. tuberculosis and other bacterial genomes is a core analysis for studying evolutionary hypotheses, disease outbreaks and transmission events. Whole genome sequencing is providing new insights into the genomic variation underlying intra- and inter-strain diversity, thereby assisting with the classification and molecular barcoding of the bacteria. One roadblock to strain investigation is the lack of user-interactive solutions to interrogate and visualise variation within a phylogenetic tree setting. Results: We have developed a web-based tool called PhyTB (http://pathogenseq.lshtm.ac.uk/phytblive/index.php) to assist phylogenetic tree visualisation and identification of M. tuberculosis clade-informative polymorphism. Variant Call Format files can be uploaded to determine a sample's position within the tree. A map view summarises the geographical distribution of alleles and strain-types. The utility of PhyTB is demonstrated on sequence data from 1,601 M. tuberculosis isolates. Conclusion: PhyTB contextualises M. tuberculosis genomic variation within epidemiological, geographical and phylogenetic settings. Further tool utility is possible by incorporating large variants and phenotypic data (e.g. drug-resistance profiles), and an assessment of genotype-phenotype associations. Source code is available to develop similar websites for other organisms (http://sourceforge.net/projects/phylotrack).
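
    Positioning a sample by clade-informative polymorphisms can be sketched as: find the deepest clade whose marker variants, and those of all its ancestor clades, the sample carries. This Python sketch assumes variants have already been read from a VCF into a set of `(position, alt_allele)` pairs and that clades use dotted names such as "4.1.2"; the marker table in the example is made up for illustration and is not PhyTB's data or code.

```python
def place_sample(sample_variants, clade_markers):
    """Deepest clade whose markers, and those of all its ancestor clades, the sample carries.

    sample_variants: set of (position, alt_allele) pairs.
    clade_markers: {"4.1.2": {(pos, alt), ...}, ...}; dotted names encode ancestry,
    so "4.1" is treated as a descendant of "4".
    """
    best = None
    for clade in clade_markers:
        parts = clade.split(".")
        lineage = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
        if all(clade_markers.get(c, set()) <= sample_variants for c in lineage):
            # Prefer the most deeply nested clade that matches.
            if best is None or len(parts) > best.count(".") + 1:
                best = clade
    return best


# Toy marker table (hypothetical positions, not real M. tuberculosis markers).
markers = {"1": {(100, "A")}, "1.1": {(200, "T")}, "1.2": {(300, "G")}, "2": {(400, "C")}}
print(place_sample({(100, "A"), (200, "T"), (999, "G")}, markers))  # 1.1
```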

  2. Student Interpretations of Phylogenetic Trees in an Introductory Biology Course

    Science.gov (United States)

    Dees, Jonathan; Momsen, Jennifer L.; Niemi, Jarad; Montplaisir, Lisa

    2014-01-01

    Phylogenetic trees are widely used visual representations in the biological sciences and the most important visual representations in evolutionary biology. Therefore, phylogenetic trees have also become an important component of biology education. We sought to characterize reasoning used by introductory biology students in interpreting taxa…

  3. One tree to link them all: a phylogenetic dataset for the European tetrapoda.

    Science.gov (United States)

    Roquet, Cristina; Lavergne, Sébastien; Thuiller, Wilfried

    2014-08-08

    Owing to the ever-increasing availability of phylogenetically informative data, the last decade has seen an upsurge of ecological studies incorporating information on evolutionary relationships among species. However, the detailed species-level phylogenies necessary for comprehensive large-scale eco-phylogenetic analyses are still lacking for many large groups and regions. Here, we provide a dataset of 100 dated phylogenetic trees for all European tetrapods based on a mixture of supermatrix and supertree approaches. Phylogenetic inference was performed separately for each of the main Tetrapoda groups of Europe except mammals (i.e. amphibians, birds, squamates and turtles) by means of maximum likelihood (ML) analyses of supermatrices, applying a tree constraint at the family (amphibians and squamates) or order (birds and turtles) level based on consensus knowledge. For each group, we inferred 100 ML trees to provide a phylogenetic dataset that accounts for phylogenetic uncertainty, and assessed node support with bootstrap analyses. Each tree was dated using penalized likelihood and fossil calibration. The trees obtained were well supported by existing knowledge and previous phylogenetic studies. For mammals, we modified the most complete supertree dataset available in the literature to include a recent update of the Carnivora clade. As a final step, we merged the phylogenetic trees of all groups to obtain a set of 100 phylogenetic trees for all European Tetrapoda species for which data were available (91%). We provide this phylogenetic dataset (100 chronograms) for comparative analyses and for macro-ecological or community ecology studies aiming to incorporate phylogenetic information while accounting for phylogenetic uncertainty.

  4. Phylogenetic framework for coevolutionary studies: a compass for exploring jungles of tangled trees.

    Science.gov (United States)

    Martínez-Aquino, Andrés

    2016-08-01

    Phylogenetics is used to detect past evolutionary events, from how species originated to how their ecological interactions with other species arose, which can mirror cophylogenetic patterns. Cophylogenetic reconstructions uncover past ecological relationships between taxa through inferred coevolutionary events on trees, for example, codivergence, duplication, host-switching, and loss. These events can be detected by cophylogenetic analyses based on nodes and the length and branching pattern of the phylogenetic trees of symbiotic associations, for example, host-parasite. In the past 2 decades, algorithms have been developed for cophylogenetic analyses and implemented in various software packages, for example, statistical congruence index and event-based methods. Based on the combination of these approaches, it is possible to integrate temporal information into cophylogenetic inference, such as estimates of lineage divergence times between 2 taxa, for example, hosts and parasites. Additionally, advances in phylogenetic biogeography applying methods based on parametric process models and combined Bayesian approaches can be useful for interpreting coevolutionary histories in a scenario of biogeographical area connectivity through time. This article briefly reviews the basics of parasitology and provides an overview of software packages in cophylogenetic methods. Thus, the objective here is to present a phylogenetic framework for coevolutionary studies, with special emphasis on groups of parasitic organisms. Researchers wishing to undertake phylogeny-based coevolutionary studies can use this review as a "compass" when "walking" through jungles of tangled phylogenetic trees.

  5. Accurate phylogenetic tree reconstruction from quartets: a heuristic approach.

    Science.gov (United States)

    Reaz, Rezwana; Bayzid, Md Shamsuzzoha; Rahman, M Sohel

    2014-01-01

    Supertree methods construct trees on a set of taxa (species) by combining many smaller trees on overlapping subsets of the entire set of taxa. A 'quartet' is an unrooted tree over 4 taxa; hence quartet-based supertree methods combine many 4-taxon unrooted trees into a single, coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have been receiving considerable attention in recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses), without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets.
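
    Quartets are the input to such methods, so it helps to see how a single quartet topology can be inferred. A standard approach for additive (tree-like) distances is the four-point condition: of the three possible splits of four taxa, pick the one whose within-pair distance sum is smallest. The Python sketch below shows only this step, assuming a symmetric dict-of-dicts distance table; it is not the authors' heuristic for assembling quartets into a full tree.

```python
def quartet_topology(d, w, x, y, z):
    """Return the quartet split, e.g. (("A", "B"), ("C", "D")), with the smallest four-point sum."""
    options = [
        (d[w][x] + d[y][z], ((w, x), (y, z))),
        (d[w][y] + d[x][z], ((w, y), (x, z))),
        (d[w][z] + d[x][y], ((w, z), (x, y))),
    ]
    return min(options)[1]
```

    On the tree ((A,B),(C,D)) with every branch of length 1, d(A,B) + d(C,D) = 4 while the other two sums are 6, so the split AB|CD is recovered regardless of the order in which the four taxa are passed.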

  6. Treelink: data integration, clustering and visualization of phylogenetic trees.

    Science.gov (United States)

    Allende, Christian; Sohn, Erik; Little, Cedric

    2015-12-29

    Phylogenetic trees are central to a wide range of biological studies. In many of these studies, tree nodes need to be associated with a variety of attributes. For example, in studies concerned with viral relationships, tree nodes are associated with epidemiological information, such as location, age and subtype. Gene trees used in comparative genomics are usually linked with taxonomic information, such as functional annotations and events. A wide variety of tree visualization and annotation tools have been developed in the past; however, none of them is intended for an integrative and comparative analysis. Treelink is platform-independent software for linking datasets and sequence files to phylogenetic trees. The application allows automated integration of datasets with trees for operations such as classifying a tree based on a field or showing the distribution of selected data attributes in branches and leaves. Genomic and proteomic sequences can also be linked to the tree and extracted from internal and external nodes. A novel clustering algorithm to simplify trees and display the most divergent clades was also developed, where validation can be achieved using the data integration and classification function. Integrated geographical information allows ancestral character reconstruction for phylogeographic plotting based on parsimony and likelihood algorithms. Our software can successfully integrate phylogenetic trees with different data sources, and perform operations to differentiate and visualize those differences within a tree. File support includes the most popular formats, such as Newick and CSV. Exporting visualizations as images, cluster outputs and genomic sequences is supported. Treelink is available as a web and desktop application at http://www.treelinkapp.com.
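
    Parsimony-based ancestral reconstruction of a discrete character, such as sampling location, can be illustrated with Fitch's small-parsimony algorithm. This Python sketch assumes a rooted binary tree given as nested 2-tuples and a dict of tip states; it returns the candidate state set at the root and the minimum number of state changes. It shows the textbook algorithm, not Treelink's implementation.

```python
def fitch(tree, states):
    """Fitch small parsimony on a rooted binary tree of nested 2-tuples.

    Returns (candidate state set at the root, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra change needed here.
        return common, left_cost + right_cost
    # Otherwise one change is forced somewhere below this node.
    return left_set | right_set, left_cost + right_cost + 1


locations = {"A": "Peru", "B": "Peru", "C": "Chile", "D": "Chile"}
print(fitch((("A", "B"), ("C", "D")), locations))  # one change; root is Peru or Chile
```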

  7. Inferring phylogenetic trees from the knowledge of rare evolutionary events.

    Science.gov (United States)

    Hellmuth, Marc; Hernandez-Rosales, Maribel; Long, Yangjing; Stadler, Peter F

    2018-06-01

    Rare events have played an increasing role in molecular phylogenetics as potentially homoplasy-poor characters. In this contribution we analyze the phylogenetic information content from a combinatorial point of view by considering the binary relation on the set of taxa defined by the existence of a single event separating two taxa. We show that the graph-representation of this relation must be a tree. Moreover, we characterize completely the relationship between the tree of such relations and the underlying phylogenetic tree. With directed operations such as tandem-duplication-random-loss events in mind we demonstrate how non-symmetric information constrains the position of the root in the partially reconstructed phylogeny.

  8. Whole Genome Phylogenetic Tree Reconstruction using Colored de Bruijn Graphs

    OpenAIRE

    Lyman, Cole

    2017-01-01

    We present kleuren, a novel assembly-free method to reconstruct phylogenetic trees using the Colored de Bruijn Graph. kleuren works by constructing the Colored de Bruijn Graph and then traversing it, finding bubble structures in the graph that provide phylogenetic signal. The bubbles are then aligned and concatenated to form a supermatrix, from which a phylogenetic tree is inferred. We introduce the algorithm that kleuren uses to accomplish this task, and show its performance on reconstructin...
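
    The data structure at the heart of this approach can be sketched briefly. In a colored de Bruijn graph, nodes are k-mers, edges join k-mers that overlap by k-1 characters, and each node carries the set of samples ("colors") containing it. The Python sketch below builds node colors and looks up successors; the function names are hypothetical, and the bubble detection, alignment and supermatrix steps that kleuren performs are not shown.

```python
from collections import defaultdict


def colored_kmers(samples, k):
    """Map each k-mer to the set of sample names ("colours") whose sequence contains it."""
    colours = defaultdict(set)
    for name, seq in samples.items():
        for i in range(len(seq) - k + 1):
            colours[seq[i:i + k]].add(name)
    return dict(colours)


def successors(colours, kmer):
    """Out-neighbours of a node: k-mers that overlap it by k-1 characters."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in colours]
```

    For samples `ACGTA` and `ACGTT` with k = 3, the shared node `CGT` has two successors, `GTA` (only in the first sample) and `GTT` (only in the second). Branch points of this kind, where paths split by color and later rejoin, are where bubbles open.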

  9. Minimum variance rooting of phylogenetic trees and implications for species tree reconstruction.

    Science.gov (United States)

    Mai, Uyen; Sayyari, Erfan; Mirarab, Siavash

    2017-01-01

    Phylogenetic trees inferred using commonly-used models of sequence evolution are unrooted, but the root position matters both for interpretation and downstream applications. This issue has been long recognized; however, whether the potential for discordance between the species tree and gene trees impacts methods of rooting a phylogenetic tree has not been extensively studied. In this paper, we introduce a new method of rooting a tree based on its branch length distribution; our method, which minimizes the variance of root to tip distances, is inspired by the traditional midpoint rerooting and is justified when deviations from the strict molecular clock are random. Like midpoint rerooting, the method can be implemented in a linear time algorithm. In extensive simulations that consider discordance between gene trees and the species tree, we show that the new method is more accurate than midpoint rerooting, but its relative accuracy compared to using outgroups to root gene trees depends on the size of the dataset and levels of deviations from the strict clock. We show high levels of error for all methods of rooting estimated gene trees due to factors that include effects of gene tree discordance, deviations from the clock, and gene tree estimation error. Our simulations, however, did not reveal significant differences between two equivalent methods for species tree estimation that use rooted and unrooted input, namely, STAR and NJst. Nevertheless, our results point to limitations of existing scalable rooting methods.
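
    The core of minimum-variance rooting fits in a short derivation. On an edge (u, v) of length L, placing the root at distance x from u makes every root-to-tip distance of the form c + s·x, where s = +1 for tips on u's side and s = -1 on v's side. The variance Var(c) + 2x·Cov(c, s) + x²·Var(s) is a convex quadratic, minimized at x = -Cov(c, s)/Var(s) and clamped to [0, L]. The Python sketch below applies this to every edge by brute force in O(n²). It is not the linear-time algorithm described in the paper, and the edge-list input format is an assumption.

```python
from collections import defaultdict


def min_variance_root(edges):
    """Brute-force minimum-variance rooting of an unrooted tree.

    edges: list of (u, v, length). Returns (variance, u, v, x): the root sits on
    edge (u, v) at distance x from u. O(n^2), not a linear-time method.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    def tip_distances(start, blocked):
        # Distances from start to the leaves on its side of the edge (start, blocked).
        out, stack = {}, [(start, blocked, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            if len(adj[node]) == 1:
                out[node] = d
            for nbr, w in adj[node]:
                if nbr != parent:
                    stack.append((nbr, node, d + w))
        return out

    best = None
    for u, v, w in edges:
        a = tip_distances(u, v)
        b = tip_distances(v, u)
        # Root-to-tip distance is c + s*x, with s = +1 on u's side and -1 on v's side.
        c = list(a.values()) + [d + w for d in b.values()]
        s = [1.0] * len(a) + [-1.0] * len(b)
        n = len(c)
        mean_c, mean_s = sum(c) / n, sum(s) / n
        cov = sum(ci * si for ci, si in zip(c, s)) / n - mean_c * mean_s
        var_s = 1.0 - mean_s ** 2  # s*s == 1 for every tip
        # Variance is convex in x, so clamping its minimiser to the edge is optimal.
        x = min(max(-cov / var_s, 0.0), w)
        y = [ci + si * x for ci, si in zip(c, s)]
        mean_y = sum(y) / n
        var = sum((yi - mean_y) ** 2 for yi in y) / n
        if best is None or var < best[0]:
            best = (var, u, v, x)
    return best
```

    For a tree with tips A and B at distance 1 from an internal node u, and tip C at distance 3 from u, the root lands on edge (u, C) at distance 1 from u, where all root-to-tip distances equal 2.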

  10. Phylogenetic affinity of tree shrews to Glires is attributed to fast evolution rate.

    Science.gov (United States)

    Lin, Jiannan; Chen, Guangfeng; Gu, Liang; Shen, Yuefeng; Zheng, Meizhu; Zheng, Weisheng; Hu, Xinjie; Zhang, Xiaobai; Qiu, Yu; Liu, Xiaoqing; Jiang, Cizhong

    2014-02-01

    Previous phylogenetic analyses have led to incongruent evolutionary relationships between tree shrews and other suborders of Euarchontoglires. What caused the incongruence remains elusive. In this study, we identified 6845 orthologous genes between seventeen placental mammals. Tree shrews and Primates were monophyletic in the phylogenetic trees derived from the first and/or second codon positions, whereas tree shrews and Glires formed a monophyly in the trees derived from the third or all codon positions. The same topology was obtained in the phylogeny inference using the slowly and fast evolving genes, respectively. This incongruence was likely attributable to the fast substitution rate in tree shrews and Glires. Notably, sequence GC content alone was not informative for resolving the controversial phylogenetic relationships between tree shrews, Glires, and Primates. Finally, estimation of the confidence of the tree selection strongly supported the phylogenetic affiliation of tree shrews to Primates as a monophyly. Copyright © 2013 Elsevier Inc. All rights reserved.
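
    Partitioning by codon position, as in the comparison above, comes down to taking every third base of an in-frame coding sequence. This Python sketch assumes an unaligned, in-frame coding sequence and drops any trailing partial codon; the same slicing applies column-wise to a codon-aligned alignment. The GC helper shows how position-specific GC content (for example, at third positions) would be measured.

```python
def codon_positions(cds):
    """Split an in-frame coding sequence into its 1st, 2nd and 3rd codon positions."""
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    return cds[0:usable:3], cds[1:usable:3], cds[2:usable:3]


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


first, second, third = codon_positions("ATGGCCTAA")
print(first, second, third)  # AGT TCA GCA
print(gc_content(third))     # GC content at third positions
```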

  11. Phylogenetic impoverishment of Amazonian tree communities in an experimentally fragmented forest landscape.

    Science.gov (United States)

    Santos, Bráulio A; Tabarelli, Marcelo; Melo, Felipe P L; Camargo, José L C; Andrade, Ana; Laurance, Susan G; Laurance, William F

    2014-01-01

    Amazonian rainforests sustain some of the richest tree communities on Earth, but their ecological and evolutionary responses to human threats remain poorly known. We used one of the largest experimental datasets currently available on tree dynamics in fragmented tropical forests and a recent phylogeny of angiosperms to test whether tree communities have lost phylogenetic diversity since their isolation about two decades previously. Our findings revealed an overall trend toward phylogenetic impoverishment across the experimentally fragmented landscape, irrespective of whether tree communities were in 1-ha, 10-ha, or 100-ha forest fragments, near forest edges, or in continuous forest. The magnitude of the phylogenetic diversity loss was low ([…] phylogenetic diversity, we observed a significant decrease of 50% in phylogenetic dispersion since forest isolation, irrespective of plot location. Analyses based on tree genera that have significantly increased (28 genera) or declined (31 genera) in abundance and basal area in the landscape revealed that increasing genera are more phylogenetically related than decreasing ones. Also, the loss of phylogenetic diversity was greater in tree communities where increasing genera proliferated and decreasing genera reduced their importance values, suggesting that this taxonomic replacement is partially underlying the phylogenetic impoverishment at the landscape scale. This finding has clear implications for the current debate about the role human-modified landscapes play in sustaining biodiversity persistence and key ecosystem services, such as carbon storage. Although the generalization of our findings to other fragmented tropical forests is uncertain, phylogenetic impoverishment could negatively affect ecosystem productivity and stability and have broader impacts on coevolved organisms.
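
    A common way to quantify the phylogenetic diversity of a community is Faith's PD: the total branch length of the part of the tree connecting the community's taxa to the root. The abstract does not state which metric the authors used, so treat this Python sketch as an illustration of one standard measure. It assumes a rooted tree stored as a hypothetical `{node: (parent, branch_length)}` dict in which the root has no entry.

```python
def faith_pd(parent, community):
    """Faith's phylogenetic diversity of a set of taxa on a rooted tree.

    parent: {node: (parent_node, branch_length)}; the root has no entry.
    Sums each branch once over the union of paths from the taxa to the root.
    """
    seen, total = set(), 0.0
    for taxon in community:
        node = taxon
        # Walk rootward until reaching the root or a branch already counted.
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total
```

    Comparing PD before and after a local extinction (or against randomized communities of the same size) is one way such losses are assessed.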

  12. Two results on expected values of imbalance indices of phylogenetic trees

    OpenAIRE

    Mir, Arnau; Rossello, Francesc

    2012-01-01

    We compute an explicit formula for the expected value of the Colless index of a phylogenetic tree generated under the Yule model, and an explicit formula for the expected value of the Sackin index of a phylogenetic tree generated under the uniform model.
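
    Both indices, and their expectations under the Yule model, can be checked numerically. The Colless index sums |left leaves - right leaves| over internal nodes, and the Sackin index sums leaf depths. Under the Yule model, the number of leaves in the root's left subtree is uniform on 1..n-1, which gives exact recursions for the expected values. The Python sketch below uses those recursions; it does not reproduce the paper's closed-form expressions, and it does not cover the uniform model used there for the Sackin result.

```python
from fractions import Fraction
from functools import lru_cache


def colless_sackin(tree):
    """(leaves, Colless, Sackin) for a rooted binary tree given as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return 1, 0, 0
    (nl, cl, sl), (nr, cr, sr) = colless_sackin(tree[0]), colless_sackin(tree[1])
    n = nl + nr
    # Colless adds |left - right| at this node; Sackin adds 1 to the depth of every leaf below it.
    return n, cl + cr + abs(nl - nr), sl + sr + n


@lru_cache(maxsize=None)
def expected_colless_yule(n):
    """Exact E[Colless] under Yule, via the uniform root split on 1..n-1."""
    if n < 2:
        return Fraction(0)
    total = sum(abs(n - 2 * k) + expected_colless_yule(k) + expected_colless_yule(n - k) for k in range(1, n))
    return Fraction(total) / (n - 1)


@lru_cache(maxsize=None)
def expected_sackin_yule(n):
    """Exact E[Sackin] under Yule, via the same root-split recursion."""
    if n < 2:
        return Fraction(0)
    total = sum(expected_sackin_yule(k) + expected_sackin_yule(n - k) for k in range(1, n))
    return n + Fraction(total) / (n - 1)


print(colless_sackin(((("A", "B"), "C"), "D")))  # (4, 3, 9)
print(expected_colless_yule(4))                  # 2
```

    As a sanity check, the Sackin recursion agrees with the known Yule expectation 2n(1/2 + 1/3 + ... + 1/n).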

  13. treeman: an R package for efficient and intuitive manipulation of phylogenetic trees.

    Science.gov (United States)

    Bennett, Dominic J; Sutton, Mark D; Turvey, Samuel T

    2017-01-07

    Phylogenetic trees are hierarchical structures used for representing the inter-relationships between biological entities. They are the most common tool for representing evolution and are essential to a range of fields across the life sciences. The manipulation of phylogenetic trees, in terms of adding or removing tips, is often performed by researchers not just for reasons of management but also for performing simulations in order to understand the processes of evolution. Despite this, the most common programming language among biologists, R, has few class structures well suited to these tasks. We present an R package that contains a new class, called TreeMan, for representing the phylogenetic tree. This class has a list structure allowing phylogenetic trees to be manipulated more efficiently. Computational running times are reduced because of the ready ability to vectorise and parallelise methods. Development is also improved due to fewer lines of code being required for performing manipulation processes. We present three use cases (pinning missing taxa to a supertree, simulating evolution with a tree-growth model and detecting significant phylogenetic turnover) that demonstrate the new package's speed and simplicity.

  14. TreeFam: a curated database of phylogenetic trees of animal gene families

    DEFF Research Database (Denmark)

    Li, Heng; Coghlan, Avril; Ruan, Jue

    2006-01-01

    TreeFam is a database of phylogenetic trees of gene families found in animals. It aims to develop a curated resource that presents the accurate evolutionary history of all animal gene families, as well as reliable ortholog and paralog assignments. Curated families are being added progressively, based on seed alignments and trees in a similar fashion to Pfam. Release 1.1 of TreeFam contains curated trees for 690 families and automatically generated trees for another 11 646 families. These represent over 128 000 genes from nine fully sequenced animal genomes and over 45 000 other animal proteins...

  15. Species trees for the tree swallows (Genus Tachycineta): an alternative phylogenetic hypothesis to the mitochondrial gene tree.

    Science.gov (United States)

    Dor, Roi; Carling, Matthew D; Lovette, Irby J; Sheldon, Frederick H; Winkler, David W

    2012-10-01

    The New World swallow genus Tachycineta comprises nine species that collectively have a wide geographic distribution and remarkable variation both within and among species in ecologically important traits. Existing phylogenetic hypotheses for Tachycineta are based on mitochondrial DNA sequences; thus, they provide estimates of a single gene tree. In this study we sequenced multiple individuals from each species at 16 nuclear intron loci. We used concatenated-gene approaches (Bayesian and maximum likelihood) as well as coalescent-based species tree inference to reconstruct phylogenetic relationships of the genus. We examined the concordance and conflict between the nuclear and mitochondrial trees and between concatenated and coalescent-based inferences. Our results provide an alternative phylogenetic hypothesis to the existing mitochondrial DNA estimate of phylogeny. This new hypothesis provides a more accurate framework in which to explore trait evolution and examine the evolution of the mitochondrial genome in this group. Copyright © 2012 Elsevier Inc. All rights reserved.

  16. Ecological interactions are evolutionarily conserved across the entire tree of life.

    Science.gov (United States)

    Gómez, José M; Verdú, Miguel; Perfectti, Francisco

    2010-06-17

    Ecological interactions are crucial to understanding both the ecology and the evolution of organisms. Because the phenotypic traits regulating species interactions are largely a legacy of their ancestors, it is widely assumed that ecological interactions are phylogenetically conserved, with closely related species interacting with similar partners. However, the existing empirical evidence is inadequate to appropriately evaluate the hypothesis of phylogenetic conservatism in ecological interactions, because it is both ecologically and taxonomically biased. In fact, most studies on the evolution of ecological interactions have focused on specialized organisms, such as some parasites or insect herbivores, belonging to a limited subset of the overall tree of life. Here we study the evolution of host use in a large and diverse group of interactions comprising both specialist and generalist acellular, unicellular and multicellular organisms. We show that, as previously found for specialized interactions, generalized interactions can be evolutionarily conserved. Significant phylogenetic conservatism of interaction patterns was equally likely to occur in symbiotic and non-symbiotic interactions, as well as in mutualistic and antagonistic interactions. Host-use differentiation among species was higher in phylogenetically conserved clades, irrespective of their generalization degree and taxonomic position within the tree of life. Our findings strongly suggest a shared pattern in the organization of biological systems through evolutionary time, mediated by marked conservatism of ecological interactions among taxa.

  17. Edge-related loss of tree phylogenetic diversity in the severely fragmented Brazilian Atlantic forest.

    Science.gov (United States)

    Santos, Bráulio A; Arroyo-Rodríguez, Víctor; Moreno, Claudia E; Tabarelli, Marcelo

    2010-09-08

    Deforestation and forest fragmentation are known major causes of nonrandom extinction, but there is no information about their impact on the phylogenetic diversity of the remaining species assemblages. Using a large vegetation dataset from an old hyper-fragmented landscape in the Brazilian Atlantic rainforest, we assess whether the local extirpation of tree species and functional impoverishment of tree assemblages reduce the phylogenetic diversity of the remaining tree assemblages. We detected a significant loss of tree phylogenetic diversity in forest edges, but not in core areas of small (<80 ha) forest fragments. This was attributed to a reduction of 11% in the average phylogenetic distance between any two randomly chosen individuals from forest edges; an increase of 17% in the average phylogenetic distance to closest non-conspecific relative for each individual in forest edges; and to the potential manifestation of late edge effects in the core areas of small forest remnants. We found no evidence supporting fragmentation-induced phylogenetic clustering or evenness. This could be explained by the low phylogenetic conservatism of key life-history traits corresponding to vulnerable species. Edge effects must be reduced to effectively protect tree phylogenetic diversity in the severely fragmented Brazilian Atlantic forest.

  18. GenNon-h: Generating multiple sequence alignments on nonhomogeneous phylogenetic trees

    Directory of Open Access Journals (Sweden)

    Kedzierska Anna M

    2012-08-01

    Background: A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. However, no methods exist for simulating DNA MSAs directly from the transition matrices. Moreover, existing software is restricted to time-reversible models and is not optimized to generate nonhomogeneous data (i.e., data with distinct substitution rates on different lineages). Results: We present the first package designed to generate MSAs evolving under discrete-time Markov processes on phylogenetic trees, directly from probability substitution matrices. Based on the input model and a phylogenetic tree in Newick format (with branch lengths measured as the expected number of substitutions per site), the algorithm produces DNA alignments of the desired length. GenNon-h is publicly available for download. Conclusion: The software presented here is an efficient tool to generate DNA MSAs on a given phylogenetic tree. GenNon-h provides the user with nonstationary or nonhomogeneous phylogenetic data that are well suited for testing complex biological hypotheses and for exploring the limits of reconstruction algorithms and their robustness to such models.
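    The central idea here, simulating an alignment directly from a probability substitution matrix attached to each branch, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GenNon-h's code or input format: the tree is a hypothetical nested dictionary rather than Newick, the matrices are toy values, and sites evolve independently. Giving each edge its own matrix is what makes the process nonhomogeneous.

```python
import random

BASES = "ACGT"


def make_matrix(p_stay):
    """Toy 4x4 row-stochastic matrix: keep the base with prob p_stay, else change uniformly."""
    off = (1.0 - p_stay) / 3.0
    return [[p_stay if i == j else off for j in range(4)] for i in range(4)]


def sample_index(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against rounding in the cumulative sum


def evolve(node, states, rng, out):
    """Push the parent's states down every child edge using that edge's own matrix."""
    for child, matrix in node.get("children", []):
        child_states = [sample_index(matrix[s], rng) for s in states]
        if "name" in child:  # only leaves carry a name in this sketch
            out[child["name"]] = "".join(BASES[s] for s in child_states)
        evolve(child, child_states, rng, out)
    return out


def simulate_msa(tree, root_dist, length, seed=0):
    """Simulate `length` independent sites from the root down to the named leaves."""
    rng = random.Random(seed)
    root_states = [sample_index(root_dist, rng) for _ in range(length)]
    return evolve(tree, root_states, rng, {})


# Hypothetical tree (A,(B,C)) with a different matrix on every edge (nonhomogeneous).
tree = {"children": [
    ({"name": "A"}, make_matrix(0.95)),
    ({"children": [
        ({"name": "B"}, make_matrix(0.90)),
        ({"name": "C"}, make_matrix(0.70)),
    ]}, make_matrix(0.99)),
]}

msa = simulate_msa(tree, [0.25, 0.25, 0.25, 0.25], length=50, seed=1)
for name, seq in sorted(msa.items()):
    print(name, seq)
```

    Replacing a matrix with the identity (`make_matrix(1.0)`) copies the parent's sequence unchanged, which is a quick sanity check on the sampler.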

  19. Universal artifacts affect the branching of phylogenetic trees, not universal scaling laws.

    Science.gov (United States)

    Altaba, Cristian R

    2009-01-01

    The superficial resemblance of phylogenetic trees to other branching structures allows searching for macroevolutionary patterns. However, such trees are just statistical inferences of particular historical events. Recent meta-analyses report finding regularities in the branching pattern of phylogenetic trees. But is this supported by evidence, or are such regularities just methodological artifacts? If so, is there any signal in a phylogeny? In order to evaluate the impact of polytomies and imbalance on tree shape, the distribution of all binary and polytomic trees of up to 7 taxa was assessed in tree-shape space. The relationship between the proportion of outgroups and the amount of imbalance introduced with them was assessed by applying four different tree-building methods to 100 combinations from a set of 10 ingroup and 9 outgroup species and by performing covariance analyses. The relevance of this analysis was explored using 61 published phylogenies based on nucleic acid sequences and involving various taxa, taxonomic levels, and tree-building methods. All methods of phylogenetic inference are quite sensitive to the artifacts introduced by outgroups. However, published phylogenies appear to be subject to a rather effective, albeit intuitive, control against such artifacts. The data and methods used to build phylogenetic trees are varied, so any meta-analysis is subject to pitfalls due to their uneven intrinsic merits, which translate into artifacts in tree shape. The binary branching pattern is an imposition of methods and seldom reflects true relationships in intraspecific analyses, yielding artifactual polytomies in short trees. Above the species level, the departure of real trees from simplistic random models is caused by at least two natural factors (uneven speciation and extinction rates) and by artifacts such as the choice of taxa included in the analysis and the imbalance introduced by outgroups and basal paraphyletic taxa. This artifactual imbalance accounts...
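    The abstract does not name the imbalance statistic it uses. One widely used measure is the Colless index: the sum, over internal nodes, of the absolute difference between the numbers of leaves below the two children. The sketch below assumes fully binary trees written as nested tuples (so it does not handle the polytomies discussed above) and shows how a single outgroup attached to a perfectly balanced ingroup already produces imbalance.

```python
def leaf_count(tree):
    """Number of leaves below a node; leaves are strings, internal nodes are 2-tuples."""
    if isinstance(tree, str):
        return 1
    left, right = tree
    return leaf_count(left) + leaf_count(right)


def colless(tree):
    """Colless index: sum over internal nodes of |leaves(left) - leaves(right)|."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return abs(leaf_count(left) - leaf_count(right)) + colless(left) + colless(right)


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
with_outgroup = ((("A", "B"), ("C", "D")), "O")  # balanced ingroup plus one outgroup

print(colless(balanced), colless(caterpillar), colless(with_outgroup))  # prints: 0 3 3
```

    The balanced ingroup scores 0 on its own, yet adding the outgroup at the root gives the same score as the maximally unbalanced four-taxon caterpillar.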

  20. The K tree score: quantification of differences in the relative branch length and topology of phylogenetic trees.

    Science.gov (United States)

    Soria-Carrasco, Víctor; Talavera, Gerard; Igea, Javier; Castresana, Jose

    2007-11-01

    We introduce a new phylogenetic comparison method that measures overall differences in the relative branch length and topology of two phylogenetic trees. To do this, the algorithm first scales one of the trees to have a global divergence as similar as possible to the other tree. Then, the branch length distance, which takes differences in topology and branch lengths into account, is applied to the two trees. We thus obtain the minimum branch length distance or K tree score. Two trees with very different relative branch lengths get a high K score whereas two trees that follow a similar among-lineage rate variation get a low score, regardless of the overall rates in both trees. There are several applications of the K tree score, two of which are explained here in more detail. First, this score allows the evaluation of the performance of phylogenetic algorithms, not only with respect to their topological accuracy, but also with respect to the reproduction of a given branch length variation. In a second example, we show how the K score allows the selection of orthologous genes by choosing those that better follow the overall shape of a given reference tree. http://molevol.ibmb.csic.es/Ktreedist.html
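    The two steps described above (scale one tree, then take the branch length distance) have a closed form if the branch length distance is taken as the square root of the summed squared differences of split lengths, with splits absent from one tree counted as length 0. Minimising the sum of (b_i - K c_i)^2 over K gives K = sum(b_i c_i) / sum(c_i^2). A minimal sketch, assuming trees are supplied as dictionaries from splits to branch lengths (splits encoded by the caller as frozensets of the taxa on one side); it scales the comparison tree and is not the authors' program.

```python
import math


def bld(ref, comp, k=1.0):
    """Branch length distance between ref and comp after scaling comp by k.

    Trees are dicts mapping a split (frozenset of the taxa on one side) to a branch
    length; a split missing from one tree counts as a branch of length 0."""
    splits = set(ref) | set(comp)
    return math.sqrt(sum((ref.get(s, 0.0) - k * comp.get(s, 0.0)) ** 2 for s in splits))


def k_score(ref, comp):
    """Return (K, score): the least-squares scale factor and the minimum BLD."""
    splits = set(ref) | set(comp)
    num = sum(ref.get(s, 0.0) * comp.get(s, 0.0) for s in splits)
    den = sum(comp.get(s, 0.0) ** 2 for s in splits)
    k = num / den
    return k, bld(ref, comp, k)


def split(taxa):
    """Hypothetical helper: encode one side of a split as a frozenset of taxon names."""
    return frozenset(taxa)


# Hypothetical 4-taxon trees; the internal split is written as the side without D.
ref = {split("A"): 0.1, split("B"): 0.2, split("C"): 0.1, split("D"): 0.3, split("AB"): 0.05}
faster = {s: 2 * v for s, v in ref.items()}  # same shape, twice the overall rate
other = {split("A"): 0.1, split("B"): 0.2, split("C"): 0.1, split("D"): 0.3, split("AC"): 0.05}

print(k_score(ref, faster))  # K close to 0.5, score close to 0
print(k_score(ref, other))   # different topology: positive score
```

    The first comparison illustrates the property stated in the abstract: a tree that differs only in overall rate gets a score near zero once it has been rescaled.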

  1. FPGA Hardware Acceleration of a Phylogenetic Tree Reconstruction with Maximum Parsimony Algorithm

    OpenAIRE

    Block, Henry; Maruyama, Tsutomu

    2017-01-01

    In this paper, we present an FPGA hardware implementation for a phylogenetic tree reconstruction with a maximum parsimony algorithm. We base our approach on a particular stochastic local search algorithm that uses the Progressive Neighborhood and the Indirect Calculation of Tree Lengths method. This method is widely used for the acceleration of the phylogenetic tree reconstruction algorithm in software. In our implementation, we define a tree structure and accelerate the search by parallel an...
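    The quantity being minimised in a maximum parsimony search is the tree length, the minimum number of character changes a candidate tree requires. For a fixed rooted binary tree this is conventionally computed with Fitch's algorithm, sketched below with trees as nested tuples (an assumption). This is not the Indirect Calculation of Tree Lengths method or the FPGA design from the paper; per the abstract, those exist to accelerate the search that evaluates this score many times.

```python
def fitch_site(tree, column):
    """Fitch's algorithm for one site: return (candidate state set, number of changes).

    Leaves are names (strings); internal nodes are 2-tuples; column maps name -> state."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], column)
    right_set, right_cost = fitch_site(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change needed


def tree_length(tree, alignment):
    """Parsimony tree length: Fitch changes summed over all alignment columns."""
    names = list(alignment)
    length = len(alignment[names[0]])
    return sum(fitch_site(tree, {n: alignment[n][i] for n in names})[1]
               for i in range(length))


alignment = {"A": "AC", "B": "AC", "C": "GT", "D": "GT"}
print(tree_length((("A", "B"), ("C", "D")), alignment))  # 2
print(tree_length((("A", "C"), ("B", "D")), alignment))  # 4
```

    A parsimony search keeps the topology with the smaller length, here ((A,B),(C,D)).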

  2. Recursive algorithms for phylogenetic tree counting.

    Science.gov (United States)

    Gavryushkina, Alexandra; Welch, David; Drummond, Alexei J

    2013-10-28

    In Bayesian phylogenetic inference we are interested in distributions over a space of trees. The number of trees in a tree space is an important characteristic of the space and is useful for specifying prior distributions. When all samples come from the same time point and no prior information is available on divergence times, the tree-counting problem is easy. However, when fossil evidence is used in the inference to constrain the tree, or data are sampled serially, new tree spaces arise and counting the number of trees is more difficult. We describe an algorithm, polynomial in the number of sampled individuals, for counting the resolutions of a constraint tree, assuming that the number of constraints is fixed. We generalise this algorithm to counting resolutions of a fully ranked constraint tree. We describe a quadratic algorithm for counting the number of possible fully ranked trees on n sampled individuals. We introduce a new type of tree, called a fully ranked tree with sampled ancestors, and describe a cubic-time algorithm for counting the number of such trees on n sampled individuals. These algorithms should be employed for Bayesian Markov chain Monte Carlo inference when fossil data are included or data are serially sampled.
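    For the easy case mentioned above (all samples from one time point, no divergence-time constraints), both counts follow from simple products. Reading a fully ranked tree backwards in time, the coalescence that reduces k lineages to k - 1 can join any of C(k, 2) pairs, giving n!(n-1)!/2^(n-1) ranked trees, compared with (2n-3)!! unranked rooted topologies. This sketch covers only that case; it does not reproduce the paper's algorithms for serial samples or constraint trees.

```python
from math import comb


def ranked_trees(n):
    """Fully ranked rooted binary trees on n leaves sampled at the same time.

    Each coalescence taking k lineages to k - 1 can join any of comb(k, 2) pairs,
    and every sequence of choices gives a distinct ranked tree."""
    count = 1
    for k in range(2, n + 1):
        count *= comb(k, 2)
    return count


def rooted_topologies(n):
    """Rooted binary leaf-labelled topologies without ranking: (2n - 3)!!."""
    count = 1
    for odd in range(1, 2 * n - 2, 2):
        count *= odd
    return count


for n in range(2, 8):
    print(n, rooted_topologies(n), ranked_trees(n))
```

    From four leaves onward the ranked space is larger, because the same topology can have several orderings of its internal nodes (for example, the balanced four-leaf tree has two).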

  3. Model checking software for phylogenetic trees using distribution and database methods

    Directory of Open Access Journals (Sweden)

    Requeno José Ignacio

    2013-12-01

    Model checking, a generic and formal paradigm from computer science based on temporal logics, has been proposed for studying biological properties that emerge from the labeling of the states defined over the phylogenetic tree. This strategy allows us to use generic software tools already available in industry. However, the performance of traditional model checking degrades when scaling to large phylogenies. To address this, two strategies are presented here. The first consists of partitioning the phylogenetic tree into a set of subgraphs, each representing a subproblem to be verified, so as to speed up computation and distribute memory consumption. The second strategy is based on decoupling the information associated with each state of the phylogenetic tree (mainly the DNA sequence) and exporting it to an external tool for managing large information systems. The integration of these approaches outperforms monolithic model checking and allows us to verify properties on a real phylogenetic tree.
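    To make the idea concrete, the tree can be read as a transition system whose transitions run from parent to child, with each state labelled by a DNA sequence. The sketch below evaluates two CTL operators, EF (some path eventually reaches a state satisfying p) and AG (p holds in every reachable state), by direct recursion. It assumes leaves loop on themselves, a common convention for making the transition relation total, and it is a hand-rolled evaluator rather than the model checker or partitioned setup used in the paper. Because each node's answer depends only on its own subtree, the recursion hints at why splitting the tree into subproblems is a natural fit.

```python
def holds_EF(node, prop):
    """EF prop: prop holds at this node or at some node reachable below it."""
    return prop(node) or any(holds_EF(child, prop) for child in node.get("children", []))


def holds_AG(node, prop):
    """AG prop: prop holds at this node and at every node reachable below it."""
    return prop(node) and all(holds_AG(child, prop) for child in node.get("children", []))


def base_at(pos, base):
    """Atomic proposition: the state's sequence has `base` at position `pos`."""
    return lambda node: node["seq"][pos] == base


# Hypothetical tree: each state is labelled with a DNA sequence.
tree = {"seq": "ACGT", "children": [
    {"seq": "ACGA"},
    {"seq": "ACGT", "children": [{"seq": "CCGT"}, {"seq": "ACTT"}]},
]}

print(holds_EF(tree, base_at(3, "A")))  # True: the first child has A at position 3
print(holds_AG(tree, base_at(0, "A")))  # False: one leaf starts with C
print(holds_AG(tree, base_at(1, "C")))  # True: position 1 is C everywhere
```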

  4. DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony.

    Science.gov (United States)

    Wehe, André; Bansal, Mukul S; Burleigh, J Gordon; Eulenstein, Oliver

    2008-07-01

    DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at http://genome.cs.iastate.edu/CBL/DupTree
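    Gene tree parsimony scores a candidate species tree by the number of duplications needed to reconcile each gene tree with it. The standard LCA mapping that yields this count for one gene tree is sketched below: each gene node maps to the smallest species-tree cluster containing its species, and a node is a duplication when a child maps to the same cluster. Trees as nested tuples and the toy gene and species names are hypothetical. DupTree's contribution is the fast search over species trees, which this sketch does not attempt; it only compares two fixed candidates.

```python
def leaf_set(tree):
    """Leaves below a node; leaves are strings, internal nodes are tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clusters(tree):
    """Every cluster (leaf set of some node) in the tree."""
    found = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clusters(child))
    return found


def duplications(gene_tree, species_tree, species_of):
    """Count gene duplications implied by the LCA mapping of gene_tree into species_tree."""
    species_clusters = clusters(species_tree)

    def lca(species):
        # Smallest species-tree cluster containing all the given species.
        return min((c for c in species_clusters if species <= c), key=len)

    def walk(node):
        if isinstance(node, str):
            return lca(frozenset([species_of[node]])), 0
        results = [walk(child) for child in node]
        here = lca(frozenset().union(*(m for m, _ in results)))
        dups = sum(d for _, d in results)
        if any(m == here for m, _ in results):  # a child maps to the same species node
            dups += 1
        return here, dups

    return walk(gene_tree)[1]


species_of = {"h1": "human", "h2": "human", "m1": "mouse", "m2": "mouse", "f1": "frog"}
gene_trees = [(("h1", "m1"), "f1"), (("h1", "m1"), ("h2", "m2"))]
for candidate in [(("human", "mouse"), "frog"), (("human", "frog"), "mouse")]:
    print(candidate, sum(duplications(g, candidate, species_of) for g in gene_trees))
```

    The first candidate needs one duplication in total and the second needs two, so gene tree parsimony would prefer the first.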

  5. A new algorithm to construct phylogenetic networks from trees.

    Science.gov (United States)

    Wang, J

    2014-03-06

    Developing appropriate methods for constructing phylogenetic networks from tree sets is an important problem, and much research is currently being undertaken in this area. BIMLR is an algorithm that constructs phylogenetic networks from tree sets. The algorithm can construct a much simpler network than other available methods. Here, we introduce an improved version of the BIMLR algorithm, QuickCass. QuickCass changes the selection strategy of the labels of leaves below the reticulate nodes, i.e., the nodes with an indegree of at least 2 in BIMLR. We show that QuickCass can construct simpler phylogenetic networks than BIMLR. Furthermore, we show that QuickCass is a polynomial-time algorithm when the output network that is constructed by QuickCass is binary.

  6. Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms.

    Science.gov (United States)

    Speiser, Daniel I; Pankey, M Sabrina; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H

    2014-11-19

    Tools for high-throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e., the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because trees must be recalculated for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes onto our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository (http://bitbucket.org/osiris_phylogenetics/pia/), and we demonstrate PIA on a publicly accessible web server (http://galaxy-dev.cnsi.ucsb.edu/pia/). Our new...

  7. New approaches to phylogenetic tree search and their application to large numbers of protein alignments.

    Science.gov (United States)

    Whelan, Simon

    2007-10-01

    Phylogenetic tree estimation plays a critical role in a wide variety of molecular studies, including molecular systematics, phylogenetics, and comparative genomics. Finding the optimal tree relating a set of sequences using score-based (optimality criterion) methods, such as maximum likelihood and maximum parsimony, may require all possible trees to be considered, which is not feasible even for modest numbers of sequences. In practice, trees are estimated using heuristics that represent a trade-off between topological accuracy and speed. I present a series of novel algorithms suitable for score-based phylogenetic tree reconstruction that demonstrably improve the accuracy of tree estimates while maintaining high computational speeds. The heuristics function by allowing the efficient exploration of large numbers of trees through novel hill-climbing and resampling strategies. These heuristics, and other computational approximations, are implemented for maximum likelihood estimation of trees in the program Leaphy, and its performance is compared to other popular phylogenetic programs. Trees are estimated from 4059 different protein alignments using a selection of phylogenetic programs and the likelihoods of the tree estimates are compared. Trees estimated using Leaphy are found to have equal to or better likelihoods than trees estimated using other phylogenetic programs in 4004 (98.6%) families and provide a unique best tree that no other program found in 1102 (27.1%) families. The improvement is particularly marked for larger families (80 to 100 sequences), where Leaphy finds a unique best tree in 81.7% of families.
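    The statement that considering all possible trees is infeasible even for modest numbers of sequences follows from how fast the number of candidate topologies grows. A quick check using the standard count (2n - 5)!! of unrooted binary trees on n labelled sequences illustrates the motivation only; it says nothing about how Leaphy's heuristics work.

```python
def unrooted_binary_trees(n):
    """Number of unrooted binary trees on n labelled leaves: (2n - 5)!! for n >= 3."""
    count = 1
    for odd in range(3, 2 * n - 4, 2):
        count *= odd
    return count


for n in (4, 5, 10, 20, 50):
    print(n, unrooted_binary_trees(n))
```

    Ten sequences already admit 2,027,025 unrooted topologies and twenty admit more than 10^20, far beyond exhaustive scoring by maximum likelihood.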

  8. Edge-related loss of tree phylogenetic diversity in the severely fragmented Brazilian Atlantic forest.

    Directory of Open Access Journals (Sweden)

    Bráulio A Santos

    Deforestation and forest fragmentation are known major causes of nonrandom extinction, but there is no information about their impact on the phylogenetic diversity of the remaining species assemblages. Using a large vegetation dataset from an old hyper-fragmented landscape in the Brazilian Atlantic rainforest, we assess whether the local extirpation of tree species and functional impoverishment of tree assemblages reduce the phylogenetic diversity of the remaining tree assemblages. We detected a significant loss of tree phylogenetic diversity in forest edges, but not in core areas of small (<80 ha) forest fragments. This was attributed to a reduction of 11% in the average phylogenetic distance between any two randomly chosen individuals from forest edges; an increase of 17% in the average phylogenetic distance to closest non-conspecific relative for each individual in forest edges; and to the potential manifestation of late edge effects in the core areas of small forest remnants. We found no evidence supporting fragmentation-induced phylogenetic clustering or evenness. This could be explained by the low phylogenetic conservatism of key life-history traits corresponding to vulnerable species. Edge effects must be reduced to effectively protect tree phylogenetic diversity in the severely fragmented Brazilian Atlantic forest.
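    The two edge-effect statistics reported here can be read as individual-weighted versions of the familiar mean pairwise distance (MPD) and mean nearest-taxon distance (MNTD). The sketch below computes both for a hypothetical assemblage from a species-by-species distance table. The abstract does not say how conspecific pairs or the distances themselves were handled, so counting conspecific pairs as distance 0 is an assumption of this sketch.

```python
from itertools import permutations


def mean_pairwise_distance(dist, abundance):
    """Expected phylogenetic distance between two distinct individuals drawn at random.

    dist[s][t] is the distance between species s and t; abundance[s] counts individuals.
    Pairs of conspecific individuals contribute 0 (an assumption of this sketch)."""
    n = sum(abundance.values())
    total = sum(abundance[s] * abundance[t] * dist[s][t]
                for s, t in permutations(abundance, 2))
    return total / (n * (n - 1))


def mean_nearest_taxon_distance(dist, abundance):
    """Average, over individuals, of the distance to the closest non-conspecific relative."""
    n = sum(abundance.values())
    total = sum(count * min(dist[s][t] for t in abundance if t != s)
                for s, count in abundance.items())
    return total / n


# Hypothetical assemblage: species A and B are sisters, C is distant.
dist = {"A": {"B": 2.0, "C": 6.0}, "B": {"A": 2.0, "C": 6.0}, "C": {"A": 6.0, "B": 6.0}}
abundance = {"A": 1, "B": 1, "C": 2}
print(mean_pairwise_distance(dist, abundance))       # 52 / 12, about 4.33
print(mean_nearest_taxon_distance(dist, abundance))  # 4.0
```

    Under this reading, the reported edge effects correspond to the first quantity falling by about 11% and the second rising by about 17%.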

  9. An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

    Science.gov (United States)

    Liang, Ying; Liao, Bo; Zhu, Wen

    2017-01-01

    Tumorigenesis is a mutation accumulation process that is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution from genetic variation data. Copy number variation (CNV) is a major genetic marker of the genome that involves many genes, disease loci, and functional elements. Fluorescence in situ hybridization (FISH) accurately measures the copy numbers of multiple genes in hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic trees from FISH data. Topology analysis of the tumor progression tree shows that the pathway of tumor subcell expansion varies greatly across different stages of tumor formation. A classification experiment further shows that tree-based features distinguish tumors better than data-based features. The constructed phylogenetic trees characterize the tumor development process well, and BDEP outperforms other similar algorithms.

  10. Dimensional Reduction for the General Markov Model on Phylogenetic Trees.

    Science.gov (United States)

    Sumner, Jeremy G

    2017-03-01

    We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.

  11. TreeQ-VISTA: An Interactive Tree Visualization Tool with Functional Annotation Query Capabilities

    Energy Technology Data Exchange (ETDEWEB)

    Gu, Shengyin; Anderson, Iain; Kunin, Victor; Cipriano, Michael; Minovitsky, Simon; Weber, Gunther; Amenta, Nina; Hamann, Bernd; Dubchak, Inna

    2007-05-07

    Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a relational database with a user-friendly interface that provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools is provided to facilitate trait exploration, comparison and analysis. Availability: The program, a detailed tutorial and examples are available online at http://genome-test.lbl.gov/vista/TreeQVista.
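    The query-then-project workflow described above can be sketched with Python's built-in sqlite3 module: run an SQL query that returns (taxon, value) pairs, then map each tree leaf to a colour group, with unmatched leaves left grey. The table, columns, taxa, values and colours are all hypothetical; the abstract does not describe TreeQ-Vista's schema or interface.

```python
import sqlite3

PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]  # arbitrary colour choices
NO_MATCH = "#bbbbbb"


def trait_groups(conn, leaves, sql, params=()):
    """Run a (taxon, value) query and map every tree leaf to its value, or None."""
    value_of = dict(conn.execute(sql, params).fetchall())
    return {leaf: value_of.get(leaf) for leaf in leaves}


def colour_leaves(groups):
    """Give each distinct query value its own colour; leaves without a match are grey."""
    values = sorted({v for v in groups.values() if v is not None})
    colour_of = {v: PALETTE[i % len(PALETTE)] for i, v in enumerate(values)}
    return {leaf: colour_of.get(v, NO_MATCH) for leaf, v in groups.items()}


# Hypothetical trait table; taxa and values are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traits (taxon TEXT PRIMARY KEY, habitat TEXT, gc REAL)")
conn.executemany("INSERT INTO traits VALUES (?, ?, ?)",
                 [("sp_A", "host", 52.0), ("sp_B", "soil", 41.0), ("sp_C", "marine", 47.5)])

leaves = ["sp_A", "sp_B", "sp_C", "sp_D"]  # leaf names as they appear in the tree
groups = trait_groups(conn, leaves, "SELECT taxon, habitat FROM traits WHERE gc > ?", (45,))
print(colour_leaves(groups))
```

    Keeping the query as plain SQL with bound parameters mirrors the abstract's point that users with SQL knowledge can query directly, while a form-based interface could generate the same statement for users without it.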

  12. A Model of Desired Performance in Phylogenetic Tree Construction for Teaching Evolution.

    Science.gov (United States)

    Brewer, Steven D.

    This research paper examines phylogenetic tree construction, a form of problem solving in biology, by studying the strategies and heuristics used by experts. One result of the research is the development of a model of desired performance for phylogenetic tree construction. A detailed description of the model and the sample problems which illustrate…

  13. EvolView, an online tool for visualizing, annotating and managing phylogenetic trees.

    Science.gov (United States)

    Zhang, Huangkai; Gao, Shenghan; Lercher, Martin J; Hu, Songnian; Chen, Wei-Hua

    2012-07-01

    EvolView is a web application for visualizing, annotating and managing phylogenetic trees. First, EvolView is a phylogenetic tree viewer and customization tool; it visualizes trees in various formats, customizes them through built-in functions that can link information from external datasets, and exports the customized results to publication-ready figures. Second, EvolView is a tree and dataset management tool: users can easily organize related trees into distinct projects, add new datasets to trees and edit and manage existing trees and datasets. To make EvolView easy to use, it is equipped with an intuitive user interface. With a free account, users can save data and manipulations on the EvolView server. EvolView is freely available at: http://www.evolgenius.info/evolview.html.

  14. On the number of vertices of each rank in phylogenetic trees and their generalizations

    OpenAIRE

    Bóna, Miklós

    2015-01-01

    We find surprisingly simple formulas for the limiting probability that the rank of a randomly selected vertex in a randomly selected phylogenetic tree or generalized phylogenetic tree is a given integer.

  15. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    Science.gov (United States)

    Kelly, Steven; Maini, Philip K

    2013-01-01

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.
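    The abstract does not spell out how BLAST results become a tree. As a rough, hypothetical illustration of the general idea (pairwise similarity scores, obtained without a multiple alignment, turned into distances and then clustered), the sketch below normalises each pairwise score by the smaller self-score and builds a tree with UPGMA. Both the normalisation and the choice of UPGMA are assumptions made for brevity, not DendroBLAST's method, and the sequence-change model the abstract mentions is not included.

```python
def scores_to_distances(scores):
    """Hypothetical normalisation: d = 1 - s(a, b) / min(s(a, a), s(b, b))."""
    return {(a, b): 1.0 - scores[a][b] / min(scores[a][a], scores[b][b])
            for a in scores for b in scores if a != b}


def upgma(dist):
    """Average-linkage (UPGMA) clustering; returns a Newick string without lengths.

    dist maps every ordered pair of distinct labels to a distance. Cluster labels
    are their own Newick strings, so merging just wraps two labels in parentheses."""
    size = {a: 1 for a, _ in dist}
    d = dict(dist)
    while len(size) > 1:
        a, b = min(((x, y) for x in size for y in size if x < y), key=lambda p: d[p])
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):
                # Size-weighted average distance from c to the members of a and b.
                avg = (d[a, c] * size[a] + d[b, c] * size[b]) / (size[a] + size[b])
                d[merged, c] = d[c, merged] = avg
        size[merged] = size.pop(a) + size.pop(b)
    (root,) = size
    return root + ";"


# Hypothetical BLAST-like bit scores (self-scores on the diagonal).
scores = {
    "A": {"A": 100, "B": 90, "C": 20, "D": 25},
    "B": {"A": 90, "B": 100, "C": 22, "D": 20},
    "C": {"A": 20, "B": 22, "C": 100, "D": 85},
    "D": {"A": 25, "B": 20, "C": 85, "D": 100},
}
print(upgma(scores_to_distances(scores)))  # ((A,B),(C,D));
```

    UPGMA assumes a clock-like tree; a neighbour-joining step would be the usual alternative when rates vary across lineages.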

  16. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    Directory of Open Access Journals (Sweden)

    Steven Kelly

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.

  17. Inference of Transmission Network Structure from HIV Phylogenetic Trees.

    Science.gov (United States)

    Giardina, Federica; Romero-Severson, Ethan Obie; Albert, Jan; Britton, Tom; Leitner, Thomas

    2017-01-01

    Phylogenetic inference is an attractive means to reconstruct transmission histories and epidemics. However, there is not a perfect correspondence between transmission history and virus phylogeny. Both node height and topological differences may occur, depending on the interaction between within-host evolutionary dynamics and between-host transmission patterns. To investigate these interactions, we added a within-host evolutionary model to epidemiological simulations and examined whether the resulting phylogeny could recover different types of contact networks. To further improve realism, we also introduced patient-specific differences in infectivity across disease stages, and at the epidemic level we considered incomplete sampling and the age of the epidemic. Second, we implemented an inference method based on approximate Bayesian computation (ABC) to discriminate among three well-studied network models and jointly estimate both network parameters and key epidemiological quantities such as the infection rate. Our ABC framework used both topological and distance-based tree statistics for comparison between simulated and observed trees. Overall, our simulations showed that a virus time-scaled phylogeny (genealogy) may be substantially different from the between-host transmission tree. This has important implications for the interpretation of what a phylogeny reveals about the underlying epidemic contact network. In particular, we found that while the within-host evolutionary process obscures the transmission tree, the diversification process and infectivity dynamics also add discriminatory power to differentiate between different types of contact networks. We also found that the ability to differentiate contact networks depends on how far an epidemic has progressed, with distance-based tree statistics having more power early in an epidemic. Finally, we applied our ABC inference to two different outbreaks from the Swedish HIV-1 epidemic.

  18. Do Branch Lengths Help to Locate a Tree in a Phylogenetic Network?

    Science.gov (United States)

    Gambette, Philippe; van Iersel, Leo; Kelk, Steven; Pardi, Fabio; Scornavacca, Celine

    2016-09-01

    Phylogenetic networks are increasingly used in evolutionary biology to represent the history of species that have undergone reticulate events such as horizontal gene transfer, hybrid speciation and recombination. One of the most fundamental questions that arise in this context is whether the evolution of a gene with one copy in all species can be explained by a given network. In mathematical terms, this is often formulated as follows: is a given phylogenetic tree contained in a given phylogenetic network? Recently, this tree containment problem has been widely investigated from a computational perspective, but most studies have focused only on the topology of the phylogenies, ignoring a piece of information that, for phylogenetic trees, is routinely inferred by evolutionary analyses: branch lengths. These measure the amount of change (e.g., nucleotide substitutions) that has occurred along each branch of the phylogeny. Here, we study a number of versions of the tree containment problem that explicitly account for branch lengths. We show that, although length information has the potential to locate a tree more precisely within a network, the problem is computationally hard in its most general form. On a positive note, for a number of special cases of biological relevance, we provide algorithms that solve this problem efficiently. This includes the case of networks of limited complexity, for which it is possible to recover, among the trees contained by the network with the same topology as the input tree, the closest one in terms of branch lengths.

  19. Evaluation of properties over phylogenetic trees using stochastic logics.

    Science.gov (United States)

    Requeno, José Ignacio; Colom, José Manuel

    2016-06-14

    Model checking has recently been introduced as an integrated framework for extracting information from phylogenetic trees, using temporal logics as a query language; temporal logics extend modal logics by imposing restrictions on a Boolean formula along a path of events. The phylogenetic tree is treated as a transition system that models evolution as a sequence of genomic mutations (here, mutation means any of the ways in which DNA can change), and such logics are suitable for traversing it strictly and exhaustively. Given a biological property to be inspected over the phylogeny, the verifier returns true if the specification is satisfied, or otherwise a counterexample that falsifies it. However, this approach has only been applied to qualitative aspects of the phylogeny. In this paper, we address the limitations of the previous framework by including and handling quantitative information such as explicit time or probability. To this end, we apply current probabilistic continuous-time extensions of model checking to phylogenetics. We reinterpret a catalog of qualitative properties numerically, and we also present new properties that could not be analyzed before. For instance, we obtain the likelihood of a tree topology according to a mutation model. As a case study, we analyze several phylogenies in order to obtain the maximum likelihood with the model checking tool PRISM. In addition, we have adapted the software to optimize the computation of maximum likelihoods. We show that probabilistic model checking is a competitive framework for describing and analyzing quantitative properties over phylogenetic trees. This formalism adds soundness and readability to the definition of models and specifications. Moreover, existing model checking tools hide the underlying technology, sparing biologists the extension, upgrade, debugging and maintenance of a software tool. A set of benchmarks justifies the feasibility of our...
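    For comparison with the PRISM-based approach, the conventional way to obtain the likelihood of a tree topology under a mutation model is Felsenstein's pruning algorithm. The sketch below assumes the Jukes-Cantor (JC69) model, uniform base frequencies at the root, and a hand-built tree with branch lengths in expected substitutions per site; it is not the authors' PRISM encoding and does not optimise branch lengths.

```python
import math

BASES = "ACGT"


def jc69(a, b, t):
    """Jukes-Cantor probability of base a becoming b over t expected substitutions/site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def partials(node, site):
    """Likelihood of the data below node, conditional on each possible base at node."""
    if "name" in node:  # leaf: the observed base has probability 1
        return [1.0 if b == site[node["name"]] else 0.0 for b in BASES]
    vec = [1.0, 1.0, 1.0, 1.0]
    for child, t in node["children"]:
        below = partials(child, site)
        for i, a in enumerate(BASES):
            vec[i] *= sum(jc69(a, b, t) * below[j] for j, b in enumerate(BASES))
    return vec


def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods with uniform base frequencies at the root."""
    names = list(alignment)
    total = 0.0
    for k in range(len(alignment[names[0]])):
        site = {n: alignment[n][k] for n in names}
        total += math.log(sum(0.25 * p for p in partials(tree, site)))
    return total


# Hypothetical rooted tree ((A:0.1,B:0.2):0.05,C:0.3).
tree = {"children": [
    ({"children": [({"name": "A"}, 0.1), ({"name": "B"}, 0.2)]}, 0.05),
    ({"name": "C"}, 0.3),
]}
print(log_likelihood(tree, {"A": "ACGT", "B": "ACGA", "C": "ATGA"}))
```

    Comparing this value across alternative topologies (after optimising branch lengths) is what a maximum likelihood analysis does; the model checking formulation expresses the same computation as a probabilistic property.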

  20. Computing all hybridization networks for multiple binary phylogenetic input trees.

    Science.gov (United States)

    Albrecht, Benjamin

    2015-07-30

    Computing phylogenetic trees for the same set of species from different orthologous genes can lead to incongruent trees. One possible explanation for this behavior is interspecific hybridization, in which genes of different species are recombined. An important approach to analyzing such events is the computation of hybridization networks. This work presents the first algorithm that computes the hybridization number, as well as a set of representative hybridization networks, for multiple binary phylogenetic input trees on the same set of taxa. To improve its practical runtime, we show how this algorithm can be parallelized. Moreover, we demonstrate the efficiency of the software Hybroscale, which contains an implementation of our algorithm, by comparing it to PIRNv2.0, so far the best available software for computing the exact hybridization number for multiple binary phylogenetic trees on the same set of taxa. Hybroscale was developed specifically for investigating hybridization networks, including their computation and visualization; it is freely available and runs on all three major operating systems. Our simulation study indicates that our approach is on average 100 times faster than PIRNv2.0. Finally, we show how Hybroscale improves the interpretation of the reported hybridization networks by adding certain features to its graphical representation.

  1. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for constructing genome trees from whole genome sequences have been proposed. However, no reasonable method for using complete genome sequences to construct phylogenetic trees has yet been established. We have developed a method for constructing phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes (cyanobacteria, Chlorobi, proteobacteria, Chloroflexi and Firmicutes) together with nonphotosynthetic organisms, including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  2. Constructing Student Problems in Phylogenetic Tree Construction.

    Science.gov (United States)

    Brewer, Steven D.

    Evolution is often equated with natural selection and is taught from a primarily functional perspective, while comparative and historical approaches, which are critical for developing an appreciation of the power of evolutionary theory, are often neglected. This report describes a study of expert problem-solving in phylogenetic tree construction.…

  3. Efficient Computation of Popular Phylogenetic Tree Measures

    DEFF Research Database (Denmark)

    Tsirogiannis, Constantinos; Sandel, Brody Steven; Cheliotis, Dimitris

    2012-01-01

    Given a phylogenetic tree $\mathcal{T}$ of $n$ nodes and a sample $R$ of its tips (leaf nodes), a very common problem in ecological and evolutionary research is to evaluate a distance measure for the elements in $R$. Two of the most common measures of this kind are the Mean Pairwise Distance (…
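    As a rough illustration of the Mean Pairwise Distance mentioned above, the following Python sketch computes it naively: it finds each pair's path length through their lowest common ancestor and averages over all unordered pairs of distinct tips in R. The parent-pointer tree encoding and the function names are hypothetical choices for this example. This quadratic approach is the kind of baseline that efficient algorithms aim to improve on; it is not the authors' method.

```python
from itertools import combinations

# Hypothetical encoding: each non-root node maps to (parent, branch_length).
def path_to_root(tree, node):
    """List (ancestor, distance_from_node) pairs from node up to the root."""
    path, dist = [(node, 0.0)], 0.0
    while node in tree:
        parent, length = tree[node]
        dist += length
        node = parent
        path.append((node, dist))
    return path

def patristic(tree, a, b):
    """Path length between tips a and b via their lowest common ancestor."""
    up_from_a = dict(path_to_root(tree, a))
    for ancestor, dist_b in path_to_root(tree, b):
        if ancestor in up_from_a:  # first shared ancestor on b's path is the LCA
            return up_from_a[ancestor] + dist_b
    raise ValueError("tips are not in the same tree")

def mean_pairwise_distance(tree, sample):
    """Average patristic distance over all unordered pairs of distinct tips in sample."""
    pairs = list(combinations(sample, 2))
    if not pairs:
        raise ValueError("sample must contain at least two tips")
    return sum(patristic(tree, a, b) for a, b in pairs) / len(pairs)

# Example: the rooted tree ((A:1,B:2):1,C:3)
tree = {"A": ("X", 1.0), "B": ("X", 2.0), "X": ("root", 1.0), "C": ("root", 3.0)}
print(mean_pairwise_distance(tree, ["A", "B", "C"]))  # (3 + 5 + 6) / 3
```

    Because this sketch recomputes every root path for every pair, its cost grows with the square of the sample size times the tree depth.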

  4. Automatic selection of reference taxa for protein-protein interaction prediction with phylogenetic profiling

    DEFF Research Database (Denmark)

    Simonsen, Martin; Maetschke, S.R.; Ragan, M.A.

    2012-01-01

    Motivation: Phylogenetic profiling methods can achieve good accuracy in predicting protein–protein interactions, especially in prokaryotes. Recent studies have shown that the choice of reference taxa (RT) is critical for accurate prediction, but with more than 2500 fully sequenced taxa publicly … We present three novel methods for automating the selection of RT, using machine learning based on known protein–protein interaction networks. One of these methods in particular, Tree-Based Search, yields greatly improved prediction accuracies. We further show that different methods for constituting … phylogenetic profiles often require very different RT sets to support high prediction accuracy.

  5. Rooting phylogenetic trees under the coalescent model using site pattern probabilities.

    Science.gov (United States)

    Tian, Yuan; Kubatko, Laura

    2017-12-19

    Phylogenetic tree inference is a fundamental tool for estimating ancestor-descendant relationships among different species. In phylogenetic studies, identification of the root (the most recent common ancestor of all sampled organisms) is essential for a complete understanding of evolutionary relationships. Rooted trees benefit most downstream applications of phylogenies, such as species classification or the study of adaptation. Often, trees can be rooted by using outgroups, which are species known to be more distantly related to the sampled organisms than any other species in the phylogeny. However, outgroups are not always available in evolutionary research. In this study, we develop a new method for rooting species trees under the coalescent model, based on a series of hypothesis tests for rooting quartet phylogenies using site pattern probabilities. The power of this method is examined by simulation studies and by application to an empirical North American rattlesnake data set. The method shows high accuracy across the simulation conditions considered and performs well on the rattlesnake data. Thus, it provides a computationally efficient way to accurately root species-level phylogenies while incorporating the coalescent process. The method is robust to variation in the substitution model but is sensitive to the assumption of a molecular clock. Our study establishes a computationally practical method for rooting species trees that is more efficient than traditional methods. The method will benefit numerous evolutionary studies that require rooting a phylogenetic tree without having to specify outgroups.

  6. Identifiability of tree-child phylogenetic networks under a probabilistic recombination-mutation model of evolution.

    Science.gov (United States)

    Francis, Andrew; Moulton, Vincent

    2018-06-07

    Phylogenetic networks are an extension of phylogenetic trees that are used to represent evolutionary histories in which reticulation events (such as recombination and hybridization) have occurred. A central question for such networks is that of identifiability, which essentially asks under what circumstances we can reliably identify the phylogenetic network that gave rise to the observed data. Recently, identifiability results have appeared for networks relative to a model of sequence evolution that generalizes the standard Markov models used for phylogenetic trees. However, these results are quite limited in terms of the complexity of the networks that are considered. In this paper, by introducing an alternative probabilistic model for evolution along a network that is based on some ground-breaking work by Thatte for pedigrees, we are able to obtain an identifiability result for a much larger class of phylogenetic networks (essentially the class of so-called tree-child networks). To prove our main theorem, we derive some new results for identifying tree-child networks combinatorially, and then adapt some techniques developed by Thatte for pedigrees to show that our combinatorial results imply identifiability in the probabilistic setting. We hope that the introduction of our new model for networks could lead to new approaches to reliably construct phylogenetic networks. Copyright © 2018 Elsevier Ltd. All rights reserved.

  7. Tree-average distances on certain phylogenetic networks have their weights uniquely determined.

    Science.gov (United States)

    Willson, Stephen J

    2012-01-01

    A phylogenetic network N has vertices corresponding to species and arcs corresponding to direct genetic inheritance from the species at the tail to the species at the head. Measurements of DNA are often made on species in the leaf set, and one seeks to infer properties of the network, possibly including the graph itself. In the case of phylogenetic trees, distances between extant species are frequently used to infer the phylogenetic trees by methods such as neighbor-joining. This paper proposes a tree-average distance for networks more general than trees. The notion requires a weight on each arc measuring the genetic change along the arc. For each displayed tree the distance between two leaves is the sum of the weights along the path joining them. At a hybrid vertex, each character is inherited from one of its parents. We will assume that for each hybrid there is a probability that the inheritance of a character is from a specified parent. Assume that the inheritance events at different hybrids are independent. Then for each displayed tree there will be a probability that the inheritance of a given character follows the tree; this probability may be interpreted as the probability of the tree. The tree-average distance between the leaves is defined to be the expected value of their distance in the displayed trees. For a class of rooted networks that includes rooted trees, it is shown that the weights and the probabilities at each hybrid vertex can be calculated given the network and the tree-average distances between the leaves. Hence these weights and probabilities are uniquely determined. The hypotheses on the networks include that hybrid vertices have indegree exactly 2 and that vertices that are not leaves have a tree-child.
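    The tree-average distance defined above can be computed in the forward direction by brute force, as in the Python sketch below. It enumerates every choice of parent at the hybrid vertices, weights each displayed tree by the product of the chosen inheritance probabilities (using the stated independence assumption), and averages the leaf-to-leaf path lengths. The dictionary encoding of the network and the function names are hypothetical. The sketch is exponential in the number of hybrids and does not attempt the paper's actual result, which is recovering the arc weights and probabilities from the distances.

```python
from itertools import product

def leaf_distance(tree, a, b):
    """Path length between a and b in a tree given as node -> (parent, weight)."""
    up, d, node = {a: 0.0}, 0.0, a
    while node in tree:  # record every ancestor of a with its distance from a
        parent, weight = tree[node]
        d += weight
        node = parent
        up[node] = d
    d, node = 0.0, b
    while node not in up:  # climb from b until we hit an ancestor of a (the LCA)
        parent, weight = tree[node]
        d += weight
        node = parent
    return up[node] + d

def tree_average_distance(parents, a, b):
    """Expected distance between leaves a and b over all displayed trees.

    parents maps each non-root vertex to a list of (parent, arc_weight, prob):
    one entry with prob 1.0 for tree vertices, two entries for hybrid vertices.
    """
    vertices = list(parents)
    expected = 0.0
    for choice in product(*(range(len(parents[v])) for v in vertices)):
        prob, tree = 1.0, {}
        for v, i in zip(vertices, choice):
            parent, weight, p = parents[v][i]
            prob *= p  # hybrid choices are assumed independent
            tree[v] = (parent, weight)
        expected += prob * leaf_distance(tree, a, b)
    return expected

# Example: root R with children X and Y; hybrid H inherits from X (p = 0.3) or Y (p = 0.7)
parents = {
    "X": [("R", 1.0, 1.0)], "Y": [("R", 1.0, 1.0)],
    "H": [("X", 2.0, 0.3), ("Y", 1.0, 0.7)],
    "A": [("X", 1.0, 1.0)], "B": [("Y", 1.0, 1.0)], "C": [("H", 1.0, 1.0)],
}
print(tree_average_distance(parents, "A", "C"))  # 0.3 * 4 + 0.7 * 5, about 4.7
```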

  8. Potential pitfalls of modelling ribosomal RNA data in phylogenetic tree reconstruction: evidence from case studies in the Metazoa.

    Science.gov (United States)

    Letsch, Harald O; Kjer, Karl M

    2011-05-27

    Failure to account for covariation patterns in helical regions of ribosomal RNA (rRNA) genes has the potential to misdirect the estimation of the phylogenetic signal of the data. Furthermore, the extremes of length variation among taxa, combined with regional substitution rate variation, can mislead the alignment of rRNA sequences and thus distort subsequent tree reconstructions. However, recent developments in phylogenetic methodology now allow a comprehensive integration of secondary structures in alignment and tree reconstruction analyses based on rRNA sequences, which has been shown to correct some of these problems. Here, we explore the potential of RNA substitution models and the interactions of specific model setups with the inherent pattern of covariation in rRNA stems and substitution rate variation among loop regions. We found that RNA substitution models have an explicit impact on tree reconstruction analyses. The application of specific RNA models in tree reconstructions is hampered by the interaction between appropriate modelling of covarying sites in stem regions and excessive homoplasy in some loop regions. RNA models often failed to recover reasonable trees when single-stranded regions were excessively homoplastic, because these regions contribute a greater proportion of the data when covarying sites are essentially downweighted. In this context, the RNA6A model outperformed all other models, including the more parametrized RNA7 and RNA16 models. Our results depict a trade-off between increased accuracy in estimating interdependencies in helical regions and the risk of magnifying positions lacking phylogenetic signal. We therefore conclude that caution is warranted when applying rRNA covariation models, and suggest that loop regions be independently screened for phylogenetic signal and eliminated when they are indistinguishable from random noise. In addition to covariation and homoplasy, other factors, like non-stationarity of substitution rates

  9. Comparing Phylogenetic Trees by Matching Nodes Using the Transfer Distance Between Partitions.

    Science.gov (United States)

    Bogdanowicz, Damian; Giaro, Krzysztof

    2017-05-01

    The ability to quantify the dissimilarity of different phylogenetic trees describing the relationships among the same group of taxa is required in various types of phylogenetic studies. For example, such metrics are used to assess the quality of phylogeny construction methods, to define optimization criteria in supertree-building algorithms, or to find horizontal gene transfer (HGT) events. Among the metrics described so far in the literature, the most commonly used seems to be the Robinson-Foulds distance. In this article, we define a new metric for rooted trees, the Matching Pair (MP) distance. The MP metric uses the concept of a minimum-weight perfect matching in a complete bipartite graph constructed from partitions of all pairs of leaves of the compared phylogenetic trees. We analyze the properties of the MP metric and present computational experiments showing its potential applicability to tasks related to finding HGT events.
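    The MP metric itself needs a minimum-weight perfect matching, but the Robinson-Foulds distance it is compared against is simple enough to sketch. For rooted trees, one common formulation, assumed here, counts the clusters (leaf sets below internal nodes) present in one tree but not the other, without the optional halving some authors apply. The nested-tuple tree encoding is a hypothetical choice for illustration, and this is not the MP distance from the paper.

```python
def clusters(tree):
    """Set of leaf sets below each internal node of a nested-tuple rooted tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf label
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def robinson_foulds(t1, t2):
    """Number of clusters found in exactly one of the two rooted trees."""
    return len(clusters(t1) ^ clusters(t2))

t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
print(robinson_foulds(t1, t2))  # {A,B} is only in t1, {A,C} is only in t2
```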

  10. Taxon ordering in phylogenetic trees by means of evolutionary algorithms

    Directory of Open Access Journals (Sweden)

    Cerutti Francesco

    2011-07-01

    Background: In a typical "left-to-right" phylogenetic tree, the vertical order of taxa is meaningless, as only the branch path between them reflects their degree of similarity. To make unresolved trees more informative, we propose an innovative Evolutionary Algorithm (EA) method to search for the best graphical representation of unresolved trees, in order to give a biological meaning to the vertical order of taxa. Methods: Starting from a West Nile virus phylogenetic tree, we evolved it with a (1 + 1)-EA by randomly rotating the internal nodes and selecting the tree with better fitness in every generation. The fitness is a sum of genetic distances between the considered taxon and the r (radius) next taxa. After setting the radius to the best-performing value, we evolved the trees with (λ + μ)-EAs to study the influence of population size on the algorithm. Results: The (1 + 1)-EA consistently outperformed a random search, and better results were obtained by setting the radius to 8. The (λ + μ)-EAs performed as well as the (1 + 1)-EA, except with the larger population (1000 + 1000). Conclusions: The trees after evolution showed an improvement both in fitness (which is based on a genetic distance matrix, so taxa placed close together are actually genetically close) and in biological interpretation. Samples collected in the same state or year moved close to each other, making the tree easier to interpret. Biological relationships between samples are also easier to observe.
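    The (1 + 1)-EA described in the Methods can be sketched in Python as follows. The tree is a nested tuple, a mutation reverses the children of one randomly chosen internal node (which changes the drawing but not the topology), and the fitness sums the distances from each taxon to the next r taxa in the vertical order. We assume here that lower fitness is better, that only the following r taxa are counted, and that ties are accepted; the encoding, names, and acceptance rule are illustrative assumptions, not the authors' implementation.

```python
import random

def leaf_order(tree):
    """Vertical (top-to-bottom) order of the tips in a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaf_order(child)]

def fitness(order, dist, radius):
    """Sum of distances from each taxon to the next `radius` taxa (assumed: lower is better)."""
    total = 0.0
    for i, a in enumerate(order):
        for b in order[i + 1:i + 1 + radius]:
            total += dist[a][b]
    return total

def rotate_random_node(tree, rng):
    """Copy of the tree with the children of one random internal node reversed."""
    internal = []

    def collect(node, path):  # path = tuple of child indices from the root
        if not isinstance(node, str):
            internal.append(path)
            for k, child in enumerate(node):
                collect(child, path + (k,))

    collect(tree, ())
    target = rng.choice(internal)

    def rebuild(node, path):
        if isinstance(node, str):
            return node
        children = [rebuild(child, path + (k,)) for k, child in enumerate(node)]
        return tuple(reversed(children)) if path == target else tuple(children)

    return rebuild(tree, ())

def one_plus_one_ea(tree, dist, radius, generations, seed=0):
    """(1 + 1)-EA: keep the rotated child when its fitness is no worse than the parent's."""
    rng = random.Random(seed)
    best, best_fit = tree, fitness(leaf_order(tree), dist, radius)
    for _ in range(generations):
        child = rotate_random_node(best, rng)
        child_fit = fitness(leaf_order(child), dist, radius)
        if child_fit <= best_fit:
            best, best_fit = child, child_fit
    return best, best_fit

# Example: tips placed on a line, so genetic distance = |position difference|
pos = {"A": 0, "B": 1, "C": 5, "D": 6}
dist = {a: {b: abs(pos[a] - pos[b]) for b in pos} for a in pos}
best, best_fit = one_plus_one_ea((("A", "C"), ("B", "D")), dist, radius=2, generations=100)
print(leaf_order(best), best_fit)
```

    Because rotations never change which taxa are siblings, the EA explores only the drawings of a single topology, which is the point of the method.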

  11. Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis.

    Science.gov (United States)

    Stoltzfus, Arlin; O'Meara, Brian; Whitacre, Jamie; Mounce, Ross; Gillespie, Emily L; Kumar, Sudhir; Rosauer, Dan F; Vos, Rutger A

    2012-10-22

    Recently, various evolution-related journals adopted policies to encourage or require archiving of phylogenetic trees and associated data. Such attention to practices that promote sharing of data reflects rapidly improving information technology, and rapidly expanding potential to use this technology to aggregate and link data from previously published research. Nevertheless, little is known about current practices, or best practices, for publishing trees and associated data so as to promote re-use. Here we summarize results of an ongoing analysis of current practices for archiving phylogenetic trees and associated data, current practices of re-use, and current barriers to re-use. We find that the technical infrastructure is available to support rudimentary archiving, but the frequency of archiving is low. Currently, most phylogenetic knowledge is not easily re-used due to a lack of archiving, lack of awareness of best practices, and lack of community-wide standards for formatting data, naming entities, and annotating data. Most attempts at data re-use seem to end in disappointment. Nevertheless, we find many positive examples of data re-use, particularly those that involve customized species trees generated by grafting to, and pruning from, a much larger tree. The technologies and practices that facilitate data re-use can catalyze synthetic and integrative research. However, success will require engagement from various stakeholders including individual scientists who produce or consume shareable data, publishers, policy-makers, technology developers and resource-providers. The critical challenges for facilitating re-use of phylogenetic trees and associated data, we suggest, include: a broader commitment to public archiving; more extensive use of globally meaningful identifiers; development of user-friendly technology for annotating, submitting, searching, and retrieving data and their metadata; and development of a minimum reporting standard (MIAPA) indicating

  12. A phylogenetic perspective on the individual species-area relationship in temperate and tropical tree communities.

    Science.gov (United States)

    Yang, Jie; Swenson, Nathan G; Cao, Min; Chuyong, George B; Ewango, Corneille E N; Howe, Robert; Kenfack, David; Thomas, Duncan; Wolf, Amy; Lin, Luxiang

    2013-01-01

    Ecologists have historically used species-area relationships (SARs) as a tool to understand the spatial distribution of species. Recent work has extended SARs to individual-level distributions to generate individual species-area relationships (ISARs). The ISAR approach quantifies whether individuals of a species tend to have more or less species richness surrounding them than expected by chance. By identifying richness 'accumulators' and 'repellers', respectively, the ISAR approach has been used to infer the relative importance of abiotic and biotic interactions and neutrality. A clear limitation of the SAR and ISAR approaches is that all species are treated as evolutionarily independent, whereas a large body of work has now shown that local tree neighborhoods exhibit non-random phylogenetic structure given their species richness. Here, we use nine tropical and temperate forest dynamics plots to ask: (i) do ISARs change predictably across latitude? (ii) is the phylogenetic diversity in the neighborhood of species accumulators and repellers higher or lower than expected given the observed species richness? and (iii) are species accumulators and repellers distributed non-randomly on the community phylogenetic tree? The results indicate no clear trend in ISARs from the temperate zone to the tropics, and that the phylogenetic diversity surrounding the individuals of species is generally non-random only at very local scales. Interestingly, the distribution of species accumulators and repellers was non-random on the community phylogenies, suggesting the presence of phylogenetic signal in the ISAR across latitude.

  13. Phylogenetic tree construction using trinucleotide usage profile (TUP).

    Science.gov (United States)

    Chen, Si; Deng, Lih-Yuan; Bowman, Dale; Shiau, Jyh-Jen Horng; Wong, Tit-Yee; Madahian, Behrouz; Lu, Henry Horng-Shing

    2016-10-06

    Building a genome-wide phylogenetic tree for a large group of species containing many genes with long nucleotide sequences has been a challenging task. The most popular method, called feature frequency profile (FFP-k), finds the frequency distribution of all words of a certain length k over the whole genome sequence using (overlapping) windows of the same length. For a satisfactory result, the recommended word length k ranges from 6 to 15, and it need not be a multiple of 3 (the codon length). The total number of possible words needed for FFP-k can therefore range from $4^6 = 4096$ to $4^{15}$. We propose a simple improvement over the popular FFP method that uses only the typical word length of 3. The new method, called Trinucleotide Usage Profile (TUP), is based only on the (relative) frequency distribution obtained with non-overlapping windows of length 3. The total number of possible words needed for TUP is $4^3 = 64$, far fewer than for the recommended optimal "resolution" of FFP. To build a phylogenetic tree, we propose first representing each species by a TUP vector and then using an appropriate distance measure between pairs of TUP vectors for tree construction. In particular, we propose summarizing a DNA sequence by a matrix of three rows corresponding to the three reading frames, each recording the frequency distribution of the non-overlapping words of length 3 in that reading frame. We also provide a numerical measure for comparing trees constructed with various methods. Compared with the FFP method, our empirical study showed that the proposed TUP method is more capable of building phylogenetic trees with stronger biological support. We further provide some justification for this from an information-theoretic viewpoint. Unlike the FFP method, the TUP method takes advantage of the fact that the start of the first reading frame is (usually) known. Without this information, the FFP method could only rely on the frequency distribution of
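    The three-row TUP summary described above can be sketched in Python as a 3 × 64 matrix of relative frequencies of non-overlapping trinucleotides, one row per reading frame. Skipping words that contain ambiguity codes such as N, and using the Euclidean distance between flattened matrices, are assumptions made for this example; the abstract leaves the choice of distance measure open, and the resulting distance matrix would then be passed to a standard distance-based tree-building method.

```python
from itertools import product

# All 4**3 = 64 trinucleotides in a fixed (lexicographic) order.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {codon: i for i, codon in enumerate(CODONS)}

def tup_matrix(seq):
    """3 x 64 relative frequencies of non-overlapping trinucleotides, one row per frame."""
    seq = seq.upper()
    rows = []
    for frame in range(3):
        counts = [0] * len(CODONS)
        # Non-overlapping windows of length 3 starting at offset `frame`.
        for start in range(frame, len(seq) - 2, 3):
            word = seq[start:start + 3]
            if word in INDEX:  # assumption: skip words containing N or other ambiguity codes
                counts[INDEX[word]] += 1
        total = sum(counts)
        rows.append([c / total if total else 0.0 for c in counts])
    return rows

def tup_distance(m1, m2):
    """Euclidean distance between two flattened TUP matrices (an assumed metric)."""
    return sum((x - y) ** 2
               for row1, row2 in zip(m1, m2)
               for x, y in zip(row1, row2)) ** 0.5

m = tup_matrix("ATGATGAAA")
print(m[0][INDEX["ATG"]], m[1][INDEX["TGA"]])  # frame 0: ATG, ATG, AAA; frame 1: TGA, TGA
```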

  14. PhySortR: a fast, flexible tool for sorting phylogenetic trees in R.

    Science.gov (United States)

    Stephens, Timothy G; Bhattacharya, Debashish; Ragan, Mark A; Chan, Cheong Xin

    2016-01-01

    A frequent bottleneck in interpreting phylogenomic output is the need to screen what are often thousands of trees for features of interest, particularly robust clades of specific taxa, as evidence of monophyletic relationships and/or reticulated evolution. Here we present PhySortR, a fast, flexible R package for classifying phylogenetic trees. Unlike existing utilities, PhySortR allows the identification of both exclusive and non-exclusive clades uniting the target taxa based on tip labels (i.e., leaves) on a tree, with customisable options to assess clades within the context of the whole tree. Using simulated and empirical datasets, we demonstrate the potential and scalability of PhySortR in the analysis of thousands of phylogenetic trees without a priori assumptions about tree rooting, and in yielding readily interpretable trees that unambiguously satisfy the query. PhySortR is a command-line tool that is freely available and easily automatable.

  15. Disentangling environmental and spatial effects on phylogenetic structure of angiosperm tree communities in China.

    Science.gov (United States)

    Qian, Hong; Chen, Shengbin; Zhang, Jin-Long

    2017-07-17

    Niche-based and neutrality-based theories are two major classes of theories explaining the assembly mechanisms of local communities. Both theories have been frequently used to explain species diversity and composition in local communities, but their relative importance remains unclear. Here, we analyzed 57 assemblages of angiosperm trees in 0.1-ha forest plots across China to examine the effects of environmental heterogeneity (relevant to niche-based processes) and spatial contingency (relevant to neutrality-based processes) on the phylogenetic structure of angiosperm tree assemblages distributed across a wide range of environments and space. Phylogenetic structure was quantified with six phylogenetic metrics (i.e., phylogenetic diversity, mean pairwise distance, mean nearest taxon distance, and the standardized effect sizes of these three metrics), which emphasize different depths of evolutionary history and account for different degrees of species richness effects. Our results showed that the variation in phylogenetic metrics explained independently by environmental variables was on average much greater than that explained independently by spatial structure, and the vast majority of the variation in phylogenetic metrics was explained by spatially structured environmental variables. We conclude that niche-based processes have played a more important role than neutrality-based processes in driving the phylogenetic structure of angiosperm tree species in forest communities in China.

  16. Trinets encode tree-child and level-2 phylogenetic networks

    NARCIS (Netherlands)

    L.J.J. van Iersel (Leo); V. Moulton

    2012-01-01

    Phylogenetic networks generalize evolutionary trees, and are commonly used to represent evolutionary histories of species that undergo reticulate evolutionary processes such as hybridization, recombination and lateral gene transfer. Recently, there has been great interest in trying to

  17. Soil phosphorus heterogeneity promotes tree species diversity and phylogenetic clustering in a tropical seasonal rainforest.

    Science.gov (United States)

    Xu, Wumei; Ci, Xiuqin; Song, Caiyun; He, Tianhua; Zhang, Wenfu; Li, Qiaoming; Li, Jie

    2016-12-01

    Niche theory predicts that environmental heterogeneity and species diversity are positively correlated in tropical forests, whereas neutral theory suggests that stochastic processes are more important in determining species diversity. This study sought to investigate the effects of soil nutrient (nitrogen and phosphorus) heterogeneity on tree species diversity in the Xishuangbanna tropical seasonal rainforest in southwestern China. Thirty-nine plots of 400 m² (20 × 20 m) were randomly located in the Xishuangbanna tropical seasonal rainforest. Within each plot, soil nutrient (nitrogen and phosphorus) availability and heterogeneity, tree species diversity, and community phylogenetic structure were measured. Soil phosphorus heterogeneity and tree species diversity in each plot were positively correlated, while phosphorus availability and tree species diversity were not. Trees in plots with low soil phosphorus heterogeneity were phylogenetically overdispersed, while the phylogenetic structure of trees within plots became clustered as heterogeneity increased. Neither nitrogen availability nor its heterogeneity was correlated with tree species diversity or the phylogenetic structure of trees within the plots. Interspecific competition in forest plots with low soil phosphorus heterogeneity could lead to an overdispersed community. However, as heterogeneity increases, more closely related species may be able to coexist, leading to a clustered community. Our results indicate that soil phosphorus heterogeneity significantly affects tree diversity in the Xishuangbanna tropical seasonal rainforest, suggesting that deterministic processes are dominant in this tropical forest assembly.

  18. Phylogenetic tree based on complete genomes using fractal and correlation analyses without sequence alignment

    Directory of Open Access Journals (Sweden)

    Zu-Guo Yu

    2006-06-01

    The complete genomes of living organisms have provided much information on their phylogenetic relationships. Similarly, the complete genomes of chloroplasts have helped resolve the evolution of this organelle in photosynthetic eukaryotes. In this review, we describe two algorithms, based on the theories of fractals and dynamic language, for constructing phylogenetic trees from complete genomes. These algorithms were developed by our research group over the past few years. Our distance-based phylogenetic tree of 109 prokaryotes and eukaryotes agrees with the biologists' "tree of life" based on 16S-like rRNA genes in a majority of basic branchings and most lower taxa. Our phylogenetic analysis also shows that the chloroplast genomes are separated into two major clades corresponding to chlorophytes s.l. and rhodophytes s.l. The interrelationships among the chloroplasts are largely in agreement with the current understanding of chloroplast evolution.

  19. Calibrated birth-death phylogenetic time-tree priors for bayesian inference.

    Science.gov (United States)

    Heled, Joseph; Drummond, Alexei J

    2015-05-01

    Here we introduce a general class of multiple calibration birth-death tree priors for use in Bayesian phylogenetic inference. All tree priors in this class separate ancestral node heights into a set of "calibrated nodes" and "uncalibrated nodes" such that the marginal distribution of the calibrated nodes is user-specified whereas the density ratio of the birth-death prior is retained for trees with equal values for the calibrated nodes. We describe two formulations, one in which the calibration information informs the prior on ranked tree topologies, through the (conditional) prior, and the other which factorizes the prior on divergence times and ranked topologies, thus allowing uniform, or any arbitrary prior distribution on ranked topologies. Although the first of these formulations has some attractive properties, the algorithm we present for computing its prior density is computationally intensive. However, the second formulation is always faster and computationally efficient for up to six calibrations. We demonstrate the utility of the new class of multiple-calibration tree priors using both small simulations and a real-world analysis and compare the results to existing schemes. The two new calibrated tree priors described in this article offer greater flexibility and control of prior specification in calibrated time-tree inference and divergence time dating, and will remove the need for indirect approaches to the assessment of the combined effect of calibration densities and tree priors in Bayesian phylogenetic inference. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  20. Estimating the Effective Sample Size of Tree Topologies from Bayesian Phylogenetic Analyses

    Science.gov (United States)

    Lanfear, Robert; Hua, Xia; Warren, Dan L.

    2016-01-01

    Bayesian phylogenetic analyses estimate posterior distributions of phylogenetic tree topologies and other parameters using Markov chain Monte Carlo (MCMC) methods. Before making inferences from these distributions, it is important to assess their adequacy. To this end, the effective sample size (ESS) estimates how many truly independent samples of a given parameter the output of the MCMC represents. The ESS of a parameter is frequently much lower than the number of samples taken from the MCMC because sequential samples from the chain can be non-independent due to autocorrelation. Typically, phylogeneticists use a rule of thumb that the ESS of all parameters should be greater than 200. However, we have no method to calculate an ESS of tree topology samples, despite the fact that the tree topology is often the parameter of primary interest and is almost always central to the estimation of other parameters. That is, we lack a method to determine whether we have adequately sampled one of the most important parameters in our analyses. In this study, we address this problem by developing methods to estimate the ESS for tree topologies. We combine these methods with two new diagnostic plots for assessing posterior samples of tree topologies, and compare their performance on simulated and empirical data sets. Combined, the methods we present provide new ways to assess the mixing and convergence of phylogenetic tree topologies in Bayesian MCMC analyses. PMID:27435794
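    For a single numeric parameter, the ESS described above is commonly estimated as N / (1 + 2 Σ_k ρ_k), where ρ_k is the lag-k autocorrelation of the MCMC trace. The Python sketch below truncates the sum at the first non-positive autocorrelation, which is only one of several conventions. It covers the standard scalar case only. Tree topologies are not numbers, so the topology ESS methods in the paper need a different treatment (for example, one based on distances between sampled trees), which this sketch does not reproduce.

```python
def effective_sample_size(trace):
    """ESS = N / (1 + 2 * sum of lag-k autocorrelations), truncated at the first rho_k <= 0."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        raise ValueError("ESS is undefined for a constant trace")
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:  # simple truncation rule; other conventions exist
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

print(effective_sample_size([0, 1] * 5))         # alternating trace: no positive autocorrelation
print(effective_sample_size([0] * 5 + [1] * 5))  # sticky trace: ESS well below 10
```

    The sticky trace illustrates the point made in the abstract: ten samples from a strongly autocorrelated chain carry the information of only about three independent draws.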

  1. On the use of cartographic projections in visualizing phylogenetic tree space

    Directory of Open Access Journals (Sweden)

    Clement Mark

    2010-06-01

    Abstract Phylogenetic analysis is becoming an increasingly important tool for biological research. Applications include epidemiological studies, drug development, and evolutionary analysis. Phylogenetic search is a known NP-Hard problem. The size of the data sets which can be analyzed is limited by the exponential growth in the number of trees that must be considered as the problem size increases. A better understanding of the problem space could lead to better methods, which in turn could lead to the feasible analysis of more data sets. We present a definition of phylogenetic tree space and a visualization of this space that shows significant exploitable structure. This structure can be used to develop search methods capable of handling much larger data sets.
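
    The growth of tree space mentioned above can be made concrete with the standard count of unrooted, fully resolved, leaf-labelled trees on n taxa, (2n-5)!!. The sketch below simply evaluates that product. It shows why exhaustive search becomes infeasible beyond a handful of taxa: 4 taxa give 3 trees, but 10 taxa already give 2,027,025. It is not part of the paper's method.

```python
def count_unrooted_binary_trees(n_taxa: int) -> int:
    """Number of unrooted binary leaf-labelled trees: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are required")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):  # 3 * 5 * ... * (2n - 5)
        count *= odd
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_binary_trees(n))
```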

  2. Phylogenetic Trees and Networks Reduce to Phylogenies on Binary States: Does It Furnish an Explanation to the Robustness of Phylogenetic Trees against Lateral Transfers.

    Science.gov (United States)

    Thuillard, Marc; Fraix-Burnet, Didier

    2015-01-01

    This article presents an innovative approach to phylogenies based on the reduction of multistate characters to binary-state characters. We show that this binary-character reduction can be applied to both character- and distance-based phylogenies and provides a unifying framework to explain simply and intuitively the similarities and differences between distance- and character-based phylogenies. Building on these results, this article gives a possible explanation of why phylogenetic trees obtained from a distance matrix or a set of characters are often quite reasonable despite lateral transfers of genetic material between taxa. In the presence of lateral transfers, outer planar networks furnish a better description of evolution than phylogenetic trees. We present a polynomial-time reconstruction algorithm for perfect outer planar networks with a fixed number of states, characters, and lateral transfers.

  3. Pylogeny: an open-source Python framework for phylogenetic tree reconstruction and search space heuristics

    Directory of Open Access Journals (Sweden)

    Alexander Safatli

    2015-06-01

    Summary. Pylogeny is a cross-platform library for the Python programming language that provides an object-oriented application programming interface for phylogenetic heuristic searches. Its primary function is to permit both heuristic search and analysis of the phylogenetic tree search space, as well as to enable the design of novel algorithms to search this space. To this end, the framework supports the structural manipulation of phylogenetic trees, in particular using rearrangement operators such as NNI, SPR, and TBR, the scoring of trees using parsimony and likelihood methods, the construction of a tree search space graph, and the programmatic execution of a few existing heuristic programs. The library supports a range of common phylogenetic file formats and can be used for both nucleotide and protein data. Furthermore, it is also capable of supporting GPU likelihood calculation on nucleotide character data through the BEAGLE library. Availability. Existing development and source code is available for contribution and for download by the public from GitHub (http://github.com/AlexSafatli/Pylogeny). A stable release of this framework is available for download through PyPI (Python Package Index) at http://pypi.python.org/pypi/pylogeny.
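
    Pylogeny scores trees with parsimony, among other criteria. As a library-independent illustration, and not Pylogeny's API, the following sketch implements Fitch's small-parsimony algorithm for rooted binary trees written as nested tuples. The tree encoding and the function names are assumptions made for this example.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site_score(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes for one character (Fitch's algorithm)."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:  # non-empty intersection: no extra change needed
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum Fitch scores over all columns of an alignment of equal-length sequences."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site_score(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(length)
    )


alignment = {"A": "AAG", "B": "AGG", "C": "GAT", "D": "GGT"}
print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 4
print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 5
```

    A search heuristic would keep whichever rearranged topology (for example, an NNI neighbour) has the lower score.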

  4. Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks.

    Science.gov (United States)

    Oh, S June; Joung, Je-Gun; Chang, Jeong-Ho; Zhang, Byoung-Tak

    2006-06-06

    To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides information about the genetic relationships between different organisms. In contrast, comparative analysis of metabolic pathways in different organisms can yield insights into their functional relationships under different physiological conditions. However, evaluating the similarities or differences between metabolic networks is a computationally challenging problem, and systematic methods of doing this are desirable. Here we introduce a graph-kernel method for computing the similarity between metabolic networks in polynomial time, and use it to profile metabolic pathways and to construct phylogenetic trees. To compare the structures of metabolic networks in organisms, we adopted the exponential graph kernel, which is a kernel-based approach with a labeled graph that includes a label matrix and an adjacency matrix. To construct the phylogenetic trees, we used an unweighted pair-group method with arithmetic mean, i.e., a hierarchical clustering algorithm. We applied the kernel-based network profiling method in a comparative analysis of nine carbohydrate metabolic networks from 81 biological species encompassing Archaea, Eukaryota, and Eubacteria. The resulting phylogenetic hierarchies generally support the tripartite scheme of three domains rather than the two domains of prokaryotes and eukaryotes. By combining the kernel machines with metabolic information, the method infers the context of biosphere development that covers physiological events required for adaptation by genetic reconstruction. The results show that one may obtain a global view of the tree of life by comparing the metabolic pathway structures using meta-level information rather than sequence
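
    The abstract names UPGMA as the clustering step. Below is a minimal sketch of UPGMA over a precomputed distance matrix. The abstract does not say how kernel similarities are turned into distances; one minus a normalized similarity is one hypothetical choice. Branch lengths are omitted so that the output is a bare Newick topology.

```python
from typing import Dict, List, Tuple


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Cluster taxa by UPGMA and return the topology as a Newick string."""
    sizes: Dict[str, int] = {name: 1 for name in names}  # cluster label -> leaf count
    d: Dict[Tuple[str, str], float] = {}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i != j:
                d[(a, b)] = dist[i][j]
    while len(sizes) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in sizes for y in sizes if x < y),
            key=lambda pair: d[pair],
        )
        merged = f"({a},{b})"
        n_a, n_b = sizes.pop(a), sizes.pop(b)
        for c in sizes:
            # Size-weighted average of the two merged clusters' distances.
            avg = (n_a * d[(a, c)] + n_b * d[(b, c)]) / (n_a + n_b)
            d[(merged, c)] = d[(c, merged)] = avg
        sizes[merged] = n_a + n_b
    return next(iter(sizes)) + ";"


names = ["A", "B", "C", "D"]
dist = [[0, 2, 10, 10], [2, 0, 10, 10], [10, 10, 0, 4], [10, 10, 4, 0]]
print(upgma(names, dist))  # ((A,B),(C,D));
```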

  5. Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks

    Directory of Open Access Journals (Sweden)

    Chang Jeong-Ho

    2006-06-01

    Abstract Background To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides information about the genetic relationships between different organisms. In contrast, comparative analysis of metabolic pathways in different organisms can yield insights into their functional relationships under different physiological conditions. However, evaluating the similarities or differences between metabolic networks is a computationally challenging problem, and systematic methods of doing this are desirable. Here we introduce a graph-kernel method for computing the similarity between metabolic networks in polynomial time, and use it to profile metabolic pathways and to construct phylogenetic trees. Results To compare the structures of metabolic networks in organisms, we adopted the exponential graph kernel, which is a kernel-based approach with a labeled graph that includes a label matrix and an adjacency matrix. To construct the phylogenetic trees, we used an unweighted pair-group method with arithmetic mean, i.e., a hierarchical clustering algorithm. We applied the kernel-based network profiling method in a comparative analysis of nine carbohydrate metabolic networks from 81 biological species encompassing Archaea, Eukaryota, and Eubacteria. The resulting phylogenetic hierarchies generally support the tripartite scheme of three domains rather than the two domains of prokaryotes and eukaryotes. Conclusion By combining the kernel machines with metabolic information, the method infers the context of biosphere development that covers physiological events required for adaptation by genetic reconstruction. The results show that one may obtain a global view of the tree of life by comparing the metabolic pathway

  6. MrEnt: an editor for publication-quality phylogenetic tree illustrations.

    Science.gov (United States)

    Zuccon, Alessandro; Zuccon, Dario

    2014-09-01

    We developed MrEnt, a Windows-based, user-friendly program that allows the production of complex, high-resolution, publication-quality phylogenetic trees in a few steps, directly from the analysis output. The program recognizes the standard Nexus tree format and the annotated tree files produced by BEAST and MrBayes. MrEnt combines, in a single program, a large suite of tree manipulation functions (e.g. handling of multiple trees, tree rotation, character mapping, node collapsing, compression of large clades, handling of time scale and error bars for chronograms) with drawing tools typical of standard graphic editors, including handling of graphic elements and images. The tree illustration can be printed or exported in several standard formats suitable for journal publication, PowerPoint presentation or Web publication. © 2014 John Wiley & Sons Ltd.

  7. MulRF: a software package for phylogenetic analysis using multi-copy gene trees.

    Science.gov (United States)

    Chaudhary, Ruchi; Fernández-Baca, David; Burleigh, John Gordon

    2015-02-01

    MulRF is a platform-independent software package for phylogenetic analysis using multi-copy gene trees. It seeks the species tree that minimizes the Robinson-Foulds (RF) distance to the input trees using a generalization of the RF distance to multi-labeled trees. The underlying generic tree distance measure and fast running time make MulRF useful for inferring phylogenies from large collections of gene trees, in which multiple evolutionary processes as well as phylogenetic error may contribute to gene tree discord. MulRF implements several features for customizing the species tree search and assessing the results, and it provides a user-friendly graphical user interface (GUI) with tree visualization. The species tree search is implemented in C++ and the GUI in Java Swing. MulRF's executable as well as sample datasets and manual are available at http://genome.cs.iastate.edu/CBL/MulRF/, and the source code is available at https://github.com/ruchiherself/MulRFRepo. ruchic@ufl.edu Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
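
    MulRF generalizes the Robinson-Foulds distance to multi-labelled trees. For orientation only, the sketch below computes the ordinary RF distance between two rooted, singly labelled trees as the size of the symmetric difference of their clusters. It does not handle the multi-copy case that MulRF is designed for. The unrooted variant, which compares bipartitions, is not shown, and the nested-tuple encoding is an assumption.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree is a leaf name or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def rooted_clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below each internal node, excluding the root's full leaf set."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    found.discard(leaves(tree))  # trees on the same taxa always share the root cluster
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Rooted Robinson-Foulds distance: number of clusters present in exactly one tree."""
    return len(rooted_clusters(t1) ^ rooted_clusters(t2))


print(rf_distance((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))))  # 4
```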

  8. BuddySuite: Command-Line Toolkits for Manipulating Sequences, Alignments, and Phylogenetic Trees.

    Science.gov (United States)

    Bond, Stephen R; Keat, Karl E; Barreira, Sofia N; Baxevanis, Andreas D

    2017-06-01

    The ability to manipulate sequence, alignment, and phylogenetic tree files has become an increasingly important skill in the life sciences, whether to generate summary information or to prepare data for further downstream analysis. The command line can be an extremely powerful environment for interacting with these resources, but only if the user has the appropriate general-purpose tools on hand. BuddySuite is a collection of four independent yet interrelated command-line toolkits that facilitate each step in the workflow of sequence discovery, curation, alignment, and phylogenetic reconstruction. Most common sequence, alignment, and tree file formats are automatically detected and parsed, and over 100 tools have been implemented for manipulating these data. The project has been engineered to easily accommodate the addition of new tools, is written in the popular programming language Python, and is hosted on the Python Package Index and GitHub to maximize accessibility. Documentation for each BuddySuite tool, including usage examples, is available at http://tiny.cc/buddysuite_wiki. All software is open source and freely available through http://research.nhgri.nih.gov/software/BuddySuite. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.

  9. Tree imbalance causes a bias in phylogenetic estimation of evolutionary timescales using heterochronous sequences.

    Science.gov (United States)

    Duchêne, David; Duchêne, Sebastian; Ho, Simon Y W

    2015-07-01

    Phylogenetic estimation of evolutionary timescales has become routine in biology, forming the basis of a wide range of evolutionary and ecological studies. However, there are various sources of bias that can affect these estimates. We investigated whether tree imbalance, a property that is commonly observed in phylogenetic trees, can lead to reduced accuracy or precision of phylogenetic timescale estimates. We analysed simulated data sets with calibrations at internal nodes and at the tips, taking into consideration different calibration schemes and levels of tree imbalance. We also investigated the effect of tree imbalance on two empirical data sets: mitogenomes from primates and serial samples of the African swine fever virus. In analyses calibrated using dated, heterochronous tips, we found that tree imbalance had a detrimental impact on precision and produced a bias in which the overall timescale was underestimated. A pronounced effect was observed in analyses with shallow calibrations. The greatest decreases in accuracy usually occurred in the age estimates for medium and deep nodes of the tree. In contrast, analyses calibrated at internal nodes did not display a reduction in estimation accuracy or precision due to tree imbalance. Our results suggest that molecular-clock analyses can be improved by increasing taxon sampling, with the specific aims of including deeper calibrations, breaking up long branches and reducing tree imbalance. © 2014 John Wiley & Sons Ltd.
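
    The abstract does not say which imbalance statistic was used. One widely used measure is the Colless index: the sum, over internal nodes, of the absolute difference in leaf counts between the two child subtrees. The sketch below computes it for rooted binary trees written as nested tuples, which is an assumed encoding. The index is 0 for a perfectly balanced tree and reaches its maximum, (n-1)(n-2)/2, for a caterpillar.

```python
from typing import Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def colless_index(tree: Tree) -> int:
    """Sum of |leaves(left) - leaves(right)| over all internal nodes."""

    def visit(node: Tree) -> Tuple[int, int]:
        # Returns (number of leaves, Colless sum) for this subtree.
        if isinstance(node, str):
            return 1, 0
        n_left, c_left = visit(node[0])
        n_right, c_right = visit(node[1])
        return n_left + n_right, c_left + c_right + abs(n_left - n_right)

    return visit(tree)[1]


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
print(colless_index(balanced), colless_index(caterpillar))  # 0 3
```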

  10. Phylogenetic Trees and Networks Reduce to Phylogenies on Binary States: Does It Furnish an Explanation to the Robustness of Phylogenetic Trees against Lateral Transfers

    Science.gov (United States)

    Thuillard, Marc; Fraix-Burnet, Didier

    2015-01-01

    This article presents an innovative approach to phylogenies based on the reduction of multistate characters to binary-state characters. We show that this binary-character reduction can be applied to both character- and distance-based phylogenies and provides a unifying framework to explain simply and intuitively the similarities and differences between distance- and character-based phylogenies. Building on these results, this article gives a possible explanation of why phylogenetic trees obtained from a distance matrix or a set of characters are often quite reasonable despite lateral transfers of genetic material between taxa. In the presence of lateral transfers, outer planar networks furnish a better description of evolution than phylogenetic trees. We present a polynomial-time reconstruction algorithm for perfect outer planar networks with a fixed number of states, characters, and lateral transfers. PMID:26508826

  11. Statistically Consistent k-mer Methods for Phylogenetic Tree Reconstruction.

    Science.gov (United States)

    Allman, Elizabeth S; Rhodes, John A; Sullivant, Seth

    2017-02-01

    Frequencies of k-mers in sequences are sometimes used as a basis for inferring phylogenetic trees without first obtaining a multiple sequence alignment. We show that a standard approach of using the squared Euclidean distance between k-mer vectors to approximate a tree metric can be statistically inconsistent. To remedy this, we derive model-based distance corrections for orthologous sequences without gaps, which lead to consistent tree inference. The identifiability of model parameters from k-mer frequencies is also studied. Finally, we report simulations showing that the corrected distance outperforms many other k-mer methods, even when sequences are generated with an insertion and deletion process. These results have implications for multiple sequence alignment as well since k-mer methods are usually the first step in constructing a guide tree for such algorithms.
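
    The uncorrected distance that the abstract identifies as potentially inconsistent can be sketched as follows. Each sequence becomes a vector of k-mer frequencies, and two sequences are compared by the squared Euclidean distance between their vectors. The model-based corrections derived in the paper are not reproduced here. Restricting k-mers to the ACGT alphabet is an assumption.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_frequencies(seq: str, k: int, alphabet: str = "ACGT") -> List[float]:
    """Relative frequencies of all len(alphabet)**k k-mers, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i : i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)  # k-mers with other symbols are ignored
    if total == 0:
        raise ValueError("sequence has no k-mers over the alphabet")
    return [counts[kmer] / total for kmer in kmers]


def squared_euclidean(u: List[float], v: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmer_distance(s1: str, s2: str, k: int) -> float:
    """Uncorrected squared Euclidean distance between k-mer frequency vectors."""
    return squared_euclidean(kmer_frequencies(s1, k), kmer_frequencies(s2, k))


print(kmer_distance("AAAA", "CCCC", 2))  # 2.0
```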

  12. Native fauna on exotic trees: phylogenetic conservatism and geographic contingency in two lineages of phytophages on two lineages of trees.

    Science.gov (United States)

    Gossner, Martin M; Chao, Anne; Bailey, Richard I; Prinzing, Andreas

    2009-05-01

    The relative roles of evolutionary history and geographical and ecological contingency for community assembly remain unknown. Plant species, for instance, share more phytophages with closer relatives (phylogenetic conservatism), but for exotic plants introduced to another continent, this may be overlaid by geographically contingent evolution or immigration from locally abundant plant species (mass effects). We assessed within local forests to what extent exotic trees (Douglas-fir, red oak) recruit phytophages (Coleoptera, Heteroptera) from more closely or more distantly related native plants. We found that exotics shared more phytophages with natives from the same major plant lineage (angiosperms vs. gymnosperms) than with natives from the other lineage. This was particularly true for Heteroptera, and it emphasizes the role of host specialization in phylogenetic conservatism of host use. However, for Coleoptera on Douglas-fir, mass effects were important: immigration from beech increased with increasing beech abundance. Within a plant phylum, phylogenetic proximity of exotics and natives increased phytophage similarity, primarily in younger Coleoptera clades on angiosperms, emphasizing a role of past codiversification of hosts and phytophages. Overall, phylogenetic conservatism can shape the assembly of local phytophage communities on exotic trees. Whether it outweighs geographic contingency and mass effects depends on the interplay of phylogenetic scale, local abundance of native tree species, and the biology and evolutionary history of the phytophage taxon.

  13. Equality of Shapley value and fair proportion index in phylogenetic trees.

    Science.gov (United States)

    Fuchs, Michael; Jin, Emma Yu

    2015-11-01

    The Shapley value and the fair proportion index of phylogenetic trees have been introduced recently for the purpose of making conservation decisions in genetics. Moreover, also very recently, Hartmann (J Math Biol 67:1163-1170, 2013) has presented data which shows that there is a strong correlation between a slightly modified version of the Shapley value (which we call the modified Shapley value) and the fair proportion index. He gave an explanation of this correlation by showing that the contribution of both indices to an edge of the tree becomes identical as the number of taxa tends to infinity. In this note, we show that the Shapley value and the fair proportion index are in fact the same. Moreover, we also consider the modified Shapley value and show that its covariance with the fair proportion index in random phylogenetic trees under the Yule-Harding model and uniform model is indeed close to one.
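
    The fair proportion index has a simple definition. Each edge's length is shared equally among the leaves below it, and a taxon's score is the sum of its shares. The sketch below computes the index on a rooted tree whose nodes are encoded as (label or children, branch length) pairs, an encoding assumed for this example. The scores always sum to the total tree length. The code does not compute the Shapley value directly, although the note above shows that the two coincide.

```python
from typing import Dict, List, Tuple, Union

# A node is (leaf_name, branch_length) or (list_of_children, branch_length).
Node = Tuple[Union[str, List["Node"]], float]


def leaf_names(node: Node) -> List[str]:
    label, _ = node
    if isinstance(label, str):
        return [label]
    return [leaf for child in label for leaf in leaf_names(child)]


def fair_proportion(root: Node) -> Dict[str, float]:
    """Fair proportion index of every leaf."""
    scores: Dict[str, float] = {}

    def visit(node: Node) -> None:
        label, length = node
        below = leaf_names(node)
        for leaf in below:  # split this edge equally among its descendant leaves
            scores[leaf] = scores.get(leaf, 0.0) + length / len(below)
        if not isinstance(label, str):
            for child in label:
                visit(child)

    visit(root)
    return scores


tree = ([([("A", 1.0), ("B", 1.0)], 2.0), ("C", 3.0)], 0.0)
print(fair_proportion(tree))  # {'A': 2.0, 'B': 2.0, 'C': 3.0}
```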

  14. SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

    Science.gov (United States)

    Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen

    2010-07-01

    We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.

  15. Climate-driven extinctions shape the phylogenetic structure of temperate tree floras.

    Science.gov (United States)

    Eiserhardt, Wolf L; Borchsenius, Finn; Plum, Christoffer M; Ordonez, Alejandro; Svenning, Jens-Christian

    2015-03-01

    When taxa go extinct, unique evolutionary history is lost. If extinction is selective, and the intrinsic vulnerabilities of taxa show phylogenetic signal, more evolutionary history may be lost than expected under random extinction. Under what conditions this occurs is insufficiently known. We show that late Cenozoic climate change induced phylogenetically selective regional extinction of northern temperate trees because of phylogenetic signal in cold tolerance, leading to significantly and substantially larger than random losses of phylogenetic diversity (PD). The surviving floras in regions that experienced stronger extinction are phylogenetically more clustered, indicating that non-random losses of PD are of increasing concern with increasing extinction severity. Using simulations, we show that a simple threshold model of survival given a physiological trait with phylogenetic signal reproduces our findings. Our results send a strong warning that we may expect future assemblages to be phylogenetically and possibly functionally depauperate if anthropogenic climate change affects taxa similarly. © 2015 John Wiley & Sons Ltd/CNRS.
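
    Phylogenetic diversity (PD), whose loss is measured above, is the total branch length spanned by a set of taxa. Below is a minimal sketch for rooted trees. It counts every edge that has at least one chosen taxon below it. This rooted convention, which includes the path to the root, is an assumption; unrooted PD omits that path. The (label or children, branch length) encoding is hypothetical.

```python
from typing import List, Set, Tuple, Union

# A node is (leaf_name, branch_length) or (list_of_children, branch_length).
Node = Tuple[Union[str, List["Node"]], float]


def phylogenetic_diversity(root: Node, taxa: Set[str]) -> float:
    """Sum of lengths of edges with at least one taxon from `taxa` below them."""

    def visit(node: Node) -> Tuple[float, bool]:
        label, length = node
        if isinstance(label, str):
            hit = label in taxa
            return (length if hit else 0.0), hit
        total, hit = 0.0, False
        for child in label:
            child_total, child_hit = visit(child)
            total += child_total
            hit = hit or child_hit
        return total + (length if hit else 0.0), hit

    return visit(root)[0]


tree = ([([("A", 1.0), ("B", 1.0)], 2.0), ("C", 3.0)], 0.0)
print(phylogenetic_diversity(tree, {"A", "C"}))  # 6.0
```

    Comparing the PD of the taxa that survive an extinction event with the PD expected for a random subset of the same size is one way to quantify the non-random losses described above.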

  16. Phylogenetic diversity and biodiversity indices on phylogenetic networks.

    Science.gov (United States)

    Wicke, Kristina; Fischer, Mareike

    2018-04-01

    In biodiversity conservation it is often necessary to prioritize the species to conserve. Existing approaches to prioritization, e.g. the Fair Proportion Index and the Shapley Value, are based on phylogenetic trees and rank species according to their contribution to overall phylogenetic diversity. However, in many cases evolution is not treelike and thus, phylogenetic networks have been developed as a generalization of phylogenetic trees, allowing for the representation of non-treelike evolutionary events, such as hybridization. Here, we extend the concepts of phylogenetic diversity and phylogenetic diversity indices from phylogenetic trees to phylogenetic networks. On the one hand, we consider the treelike content of a phylogenetic network, e.g. the (multi)set of phylogenetic trees displayed by a network and the so-called lowest stable ancestor tree associated with it. On the other hand, we derive the phylogenetic diversity of subsets of taxa and biodiversity indices directly from the internal structure of the network. We consider both approaches that are independent of so-called inheritance probabilities as well as approaches that explicitly incorporate these probabilities. Furthermore, we introduce our software package NetDiversity, which is implemented in Perl and allows for the calculation of all generalized measures of phylogenetic diversity and generalized phylogenetic diversity indices established in this note that are independent of inheritance probabilities. We apply our methods to a phylogenetic network representing the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), a group of species characterized by widespread hybridization. Copyright © 2018 Elsevier Inc. All rights reserved.

  17. False discovery rate control incorporating phylogenetic tree increases detection power in microbiome-wide multiple testing.

    Science.gov (United States)

    Xiao, Jian; Cao, Hongyuan; Chen, Jun

    2017-09-15

    Next generation sequencing technologies have enabled the study of the human microbiome through direct sequencing of microbial DNA, resulting in an enormous amount of microbiome sequencing data. One unique characteristic of microbiome data is the phylogenetic tree that relates all the bacterial species. Closely related bacterial species have a tendency to exhibit a similar relationship with the environment or disease. Thus, incorporating the phylogenetic tree information can potentially improve the detection power for microbiome-wide association studies, where hundreds or thousands of tests are conducted simultaneously to identify bacterial species associated with a phenotype of interest. Despite much progress in multiple testing procedures such as false discovery rate (FDR) control, methods that take into account the phylogenetic tree are largely limited. We propose a new FDR control procedure that incorporates the prior structure information and apply it to microbiome data. The proposed procedure is based on a hierarchical model, where a structure-based prior distribution is designed to utilize the phylogenetic tree. By borrowing information from neighboring bacterial species, we are able to improve the statistical power of detecting associated bacterial species while controlling the FDR at desired levels. When the phylogenetic tree is mis-specified or non-informative, our procedure achieves a similar power as traditional procedures that do not take into account the tree structure. We demonstrate the performance of our method through extensive simulations and real microbiome datasets. We identified far more alcohol-drinking associated bacterial species than traditional methods. R package StructFDR is available from CRAN. chen.jun2@mayo.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
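
    The abstract contrasts its method with traditional, structure-agnostic procedures. One standard example of these is the Benjamini-Hochberg step-up rule, sketched below. The tree-informed hierarchical prior of StructFDR is not reproduced here, and this is not the package's interface.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return a rejection flag per hypothesis, controlling the FDR at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff_rank = rank  # largest rank meeting its threshold (step-up)
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff_rank
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))  # [True, False, False, False]
```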

  18. Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction.

    Science.gov (United States)

    Sayyari, Erfan; Mirarab, Siavash

    2016-11-11

    Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed. We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging from quadratic to quartic in the number of leaves. We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.

  19. Patterns and effects of GC3 heterogeneity and parsimony informative sites on the phylogenetic tree of genes.

    Science.gov (United States)

    Ma, Shuai; Wu, Qi; Hu, Yibo; Wei, Fuwen

    2018-05-20

    The explosive growth in genomic data has provided novel insights into the conflicting signals hidden in phylogenetic trees. Although some studies have explored the effects of the GC content and parsimony informative sites (PIS) on the phylogenetic tree, the effect of the heterogeneity of the GC content at the first/second/third codon position on parsimony informative sites (GC1/2/3 PIS) among different species and the effect of PIS on phylogenetic tree construction remain largely unexplored. Here, we used two different mammal genomic datasets to explore the patterns of GC1/2/3 PIS heterogeneity and the effect of PIS on the phylogenetic tree of genes: (i) all GC1/2/3 PIS have obvious heterogeneity between different mammals, and the levels of heterogeneity are GC3 PIS > GC2 PIS > GC1 PIS; (ii) the number of PIS is positively correlated with the metrics of "good" gene tree topologies, and excluding the third codon position (C3) decreases the quality of gene trees by removing too many PIS. These results provide novel insights into the heterogeneity pattern of GC1/2/3 PIS in mammals and the relationship between GC3/PIS and gene trees. Additionally, it is necessary to carefully consider whether to exclude C3 to improve the quality of gene trees, especially in the super-tree method. Copyright © 2018 Elsevier B.V. All rights reserved.
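
    The two quantities analysed above can each be sketched in a few lines. The first is GC content at each codon position of an in-frame coding sequence. The second is a test of whether an alignment column is parsimony informative, meaning that at least two character states each occur in at least two sequences. This sketch makes two simplifying assumptions: gaps and ambiguity codes are ignored, and GC is computed over all codons rather than only over informative sites.

```python
from collections import Counter
from typing import Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """GC fraction at codon positions 1, 2 and 3 (GC1, GC2, GC3)."""
    cds = cds.upper()
    n_codons = len(cds) // 3  # a trailing partial codon is ignored
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    fractions = []
    for pos in range(3):
        gc = sum(cds[3 * c + pos] in "GC" for c in range(n_codons))
        fractions.append(gc / n_codons)
    return fractions[0], fractions[1], fractions[2]


def is_parsimony_informative(column: str) -> bool:
    """True if at least two nucleotide states each occur in two or more sequences."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


print(gc_by_codon_position("GCAGCT"))  # (1.0, 1.0, 0.0)
print(is_parsimony_informative("AAGG"), is_parsimony_informative("AAAG"))  # True False
```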

  20. Polynomial algorithms for the Maximal Pairing Problem: efficient phylogenetic targeting on arbitrary trees

    Directory of Open Access Journals (Sweden)

    Stadler Peter F

    2010-06-01

    Abstract Background The Maximal Pairing Problem (MPP) is the prototype of a class of combinatorial optimization problems that are of considerable interest in bioinformatics: Given an arbitrary phylogenetic tree $T$ and weights $\omega_{xy}$ for the paths between any two pairs of leaves $(x, y)$, what is the collection of edge-disjoint paths between pairs of leaves that maximizes the total weight? Special cases of the MPP for binary trees and equal weights have been described previously; algorithms to solve the general MPP are still missing, however. Results We describe a relatively simple dynamic programming algorithm for the special case of binary trees. We then show that the general case of multifurcating trees can be treated by interleaving solutions to certain auxiliary Maximum Weighted Matching problems with an extension of this dynamic programming approach, resulting in an overall polynomial-time solution of complexity $O(n^4 \log n)$ w.r.t. the number $n$ of leaves. The source code of a C implementation can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/Targeting. For binary trees, we furthermore discuss several constrained variants of the MPP as well as a partition function approach to the probabilistic version of the MPP. Conclusions The algorithms introduced here make it possible to solve the MPP also for large trees with high-degree vertices. This has practical relevance in the field of comparative phylogenetics and, for example, in the context of phylogenetic targeting, i.e., data collection with resource limitations.

  1. A scalable method for identifying frequent subtrees in sets of large phylogenetic trees.

    Science.gov (United States)

    Ramu, Avinash; Kahveci, Tamer; Burleigh, J Gordon

    2012-10-03

    We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees. We develop a heuristic solution that aims to find MFASTs in sets of many, large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees which serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that we find a frequent agreement subtree that is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle data sets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods. Although this heuristic is not guaranteed to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical data sets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.

  2. Auto-validating von Neumann rejection sampling from small phylogenetic tree spaces

    Directory of Open Access Journals (Sweden)

    York Thomas

    2009-01-01

    Background: In phylogenetic inference one is interested in obtaining samples from the posterior distribution over the tree space on the basis of some observed DNA sequence data. One of the simplest sampling methods is the rejection sampler due to von Neumann. Here we introduce an auto-validating version of the rejection sampler, via interval analysis, to rigorously draw samples from posterior distributions over small phylogenetic tree spaces. Results: The posterior samples from the auto-validating sampler are used to rigorously (i) estimate posterior probabilities for different rooted topologies based on mitochondrial DNA from human, chimpanzee and gorilla, (ii) conduct a non-parametric test of rate variation between protein-coding and tRNA-coding sites from three primates, and (iii) obtain a posterior estimate of the human-Neanderthal divergence time. Conclusion: This solves the open problem of rigorously drawing independent and identically distributed samples from the posterior distribution over rooted and unrooted small tree spaces (3 or 4 taxa) based on any multiply-aligned sequence data.
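    The basic von Neumann rejection step can be sketched in Python for a discrete space of three rooted topologies. This is the plain floating-point sampler, not the auto-validating version. The rigorous variant would replace the point evaluations of the target density and its upper bound with interval enclosures, which the sketch does not do. The proposal distribution is uniform over topologies. The posterior weights below are invented for illustration, and the function name is hypothetical.

```python
import random


def rejection_sample(weights, n, rng):
    """Draw n i.i.d. samples from the distribution proportional to `weights`.

    weights: dict mapping each topology label to an unnormalized density f(x).
    The proposal g is uniform over the labels. With envelope M * g(x) = max f,
    a proposal x is accepted with probability f(x) / (M * g(x)) = f(x) / max f.
    """
    labels = list(weights)
    bound = max(weights.values())
    samples = []
    while len(samples) < n:
        x = rng.choice(labels)                  # propose from g
        if rng.random() < weights[x] / bound:   # accept or reject
            samples.append(x)
    return samples


# Hypothetical unnormalized posterior over the rooted topologies of
# human (H), chimpanzee (C) and gorilla (G); the numbers are made up.
post = {"((H,C),G)": 5.0, "((H,G),C)": 2.0, "((C,G),H)": 1.0}
draws = rejection_sample(post, 20000, random.Random(1))
for topology in post:
    print(topology, draws.count(topology) / len(draws))  # close to 5/8, 2/8, 1/8
```

    Accepted draws are independent and exactly distributed according to the normalized weights. The price is the expected number of rejected proposals, which grows as the envelope becomes looser.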

  3. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree

    Directory of Open Access Journals (Sweden)

    Kodner Robin B

    2010-10-01

    Background: Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. Results: This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence. Conclusions: Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service.
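    The two uncertainty summaries mentioned above can be sketched in Python. The sketch assumes per-edge placement log-likelihoods are already available. It normalizes them into per-edge weights, which is the maximum-likelihood analogue of the edge posterior probabilities; the Bayesian calculation in pplacer also involves priors and is not reproduced. It then takes the expected distance between two placement locations drawn independently from those weights. The edge names, the distance table and the function names are hypothetical.

```python
import math


def likelihood_weight_ratios(log_likelihoods):
    """Normalize per-edge log-likelihoods into weights summing to 1 (log-sum-exp for stability)."""
    top = max(log_likelihoods.values())
    scaled = {edge: math.exp(v - top) for edge, v in log_likelihoods.items()}
    total = sum(scaled.values())
    return {edge: s / total for edge, s in scaled.items()}


def edpl(ratios, distance):
    """Expected distance between two placement locations drawn independently from `ratios`."""
    return sum(p * q * distance(a, b)
               for a, p in ratios.items()
               for b, q in ratios.items())


# Hypothetical query: equally likely placements on two edges 0.2 apart.
gaps = {("e1", "e2"): 0.2, ("e2", "e1"): 0.2}
ratios = likelihood_weight_ratios({"e1": -10.0, "e2": -10.0})
print(ratios, edpl(ratios, lambda a, b: 0.0 if a == b else gaps[(a, b)]))
```

    A query whose weight concentrates on one edge, or on edges that lie close together in the tree, has an expected distance near zero, even if its per-edge weights alone look uncertain.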

  4. Exploring the relationship between sequence similarity and accurate phylogenetic trees.

    Science.gov (United States)

    Cantarel, Brandi L; Morrison, Hilary G; Pearson, William

    2006-11-01

    We have characterized the relationship between accurate phylogenetic reconstruction and sequence similarity, testing whether high levels of sequence similarity can consistently produce accurate evolutionary trees. We generated protein families with known phylogenies using a modified version of the PAML/EVOLVER program that produces insertions and deletions as well as substitutions. Protein families were evolved over a range of 100-400 point accepted mutations; at these distances 63% of the families shared significant sequence similarity. Protein families were evolved using balanced and unbalanced trees, with ancient or recent radiations. In families sharing statistically significant similarity, about 60% of multiple sequence alignments were 95% identical to true alignments. To compare recovered topologies with true topologies, we used a score that reflects the fraction of clades that were correctly clustered. As expected, the accuracy of the phylogenies was greatest in the least divergent families. About 88% of phylogenies clustered over 80% of clades in families that shared significant sequence similarity, using Bayesian, parsimony, distance, and maximum likelihood methods. However, for protein families with short ancient branches (ancient radiation), only 30% of the most divergent (but statistically significant) families produced accurate phylogenies, and only about 70% of the second most highly conserved families, with median expectation values better than 10^-60, produced accurate trees. These values represent upper bounds on expected tree accuracy for sequences with a simple divergence history; proteins from 700 Giardia families, with a similar range of sequence similarities but considerably more gaps, produced much less accurate trees. For our simulated insertions and deletions, correct multiple sequence alignments did not perform much better than those produced by T-COFFEE, and including sequences with expressed sequence tag-like sequencing errors did not
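    A clade-recovery score of the kind described above can be sketched in Python as the share of the true tree's clades that also appear in the inferred tree. The exact score used in the study may differ; this definition is an assumption. Trees are rooted and written as nested tuples with string leaves, and the function names are hypothetical. The root clade is excluded because every tree on the same taxa shares it.

```python
def clades(tree):
    """Leaf sets of all internal nodes below the root of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)  # shared by every tree on the same taxa, so uninformative
    return found


def fraction_correct(true_tree, inferred_tree):
    """Share of the true tree's clades that are recovered by the inferred tree."""
    true_clades = clades(true_tree)
    if not true_clades:
        raise ValueError("the true tree has no non-root clades")
    return len(true_clades & clades(inferred_tree)) / len(true_clades)


true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))
print(fraction_correct(true_tree, inferred))  # {A,B,C} and {D,E} recovered, {A,B} missed
```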

  5. Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies.

    Science.gov (United States)

    Sayyari, Erfan; Mirarab, Siavash

    2018-02-28

    Phylogenetic species trees typically represent the speciation history as a bifurcating tree. Speciation events that simultaneously create more than two descendants, thereby creating polytomies in the phylogeny, are possible. Moreover, the inability to resolve relationships is often shown as a (soft) polytomy. Both types of polytomies have been traditionally studied in the context of gene tree reconstruction from sequence data. However, polytomies in the species tree cannot be detected or ruled out without considering gene tree discordance. In this paper, we describe a statistical test based on properties of the multi-species coalescent model to test the null hypothesis that a branch in an estimated species tree should be replaced by a polytomy. On both simulated and biological datasets, we show that the null hypothesis is rejected for all but the shortest branches, and in most cases, it is retained for true polytomies. The test, available as part of the Accurate Species TRee ALgorithm (ASTRAL) package, can help systematists decide whether their datasets are sufficient to resolve specific relationships of interest.
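    The intuition behind the test can be shown with a simplified Python sketch. Under the multi-species coalescent, a true polytomy makes the three quartet topologies around a branch equally likely, so their observed counts can be compared with equal expected frequencies. The sketch uses Pearson's chi-square goodness-of-fit statistic, whose survival function with 2 degrees of freedom is exactly exp(-x/2). How ASTRAL gathers the quartet counts around a branch is not reproduced here. The function name and the example counts are hypothetical.

```python
import math


def polytomy_pvalue(n1, n2, n3):
    """P-value for H0: the three quartet resolutions around a branch are equally frequent.

    n1, n2, n3: numbers of gene-tree quartets supporting each resolution.
    Pearson's chi-square statistic has 2 degrees of freedom here, and the
    chi-square survival function with 2 df is exactly exp(-x / 2).
    """
    n = n1 + n2 + n3
    expected = n / 3.0
    stat = sum((k - expected) ** 2 / expected for k in (n1, n2, n3))
    return math.exp(-stat / 2.0)


print(polytomy_pvalue(110, 95, 95))  # weak skew: H0 of a polytomy is retained
print(polytomy_pvalue(200, 50, 50))  # strong skew: H0 is rejected
```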

  6. Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies

    Science.gov (United States)

    Sayyari, Erfan

    2018-01-01

    Phylogenetic species trees typically represent the speciation history as a bifurcating tree. Speciation events that simultaneously create more than two descendants, thereby creating polytomies in the phylogeny, are possible. Moreover, the inability to resolve relationships is often shown as a (soft) polytomy. Both types of polytomies have been traditionally studied in the context of gene tree reconstruction from sequence data. However, polytomies in the species tree cannot be detected or ruled out without considering gene tree discordance. In this paper, we describe a statistical test based on properties of the multi-species coalescent model to test the null hypothesis that a branch in an estimated species tree should be replaced by a polytomy. On both simulated and biological datasets, we show that the null hypothesis is rejected for all but the shortest branches, and in most cases, it is retained for true polytomies. The test, available as part of the Accurate Species TRee ALgorithm (ASTRAL) package, can help systematists decide whether their datasets are sufficient to resolve specific relationships of interest. PMID:29495636

  7. constNJ: an algorithm to reconstruct sets of phylogenetic trees satisfying pairwise topological constraints.

    Science.gov (United States)

    Matsen, Frederick A

    2010-06-01

    This article introduces constNJ (constrained neighbor-joining), an algorithm for phylogenetic reconstruction of sets of trees with constrained pairwise rooted subtree-prune-regraft (rSPR) distance. We are motivated by the problem of constructing sets of trees that must fit into a recombination, hybridization, or similar network. Rather than first finding a set of trees that are optimal according to a phylogenetic criterion (e.g., likelihood or parsimony) and then attempting to fit them into a network, constNJ estimates the trees while enforcing specified rSPR distance constraints. The primary input for constNJ is a collection of distance matrices derived from sequence blocks which are assumed to have evolved in a tree-like manner, such as blocks of an alignment which do not contain any recombination breakpoints. The other input is a set of rSPR constraint inequalities for any set of pairs of trees. constNJ is consistent and a strict generalization of the neighbor-joining algorithm; it uses the new notion of maximum agreement partitions (MAPs) to assure that the resulting trees satisfy the given rSPR distance constraints.
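    Because constNJ is described as a strict generalization of neighbor-joining, a Python sketch of plain, unconstrained neighbor-joining may help. It builds topology only and does not enforce rSPR constraints or use maximum agreement partitions. The dict-of-dicts distance format and the function name are assumptions. At each step it joins the pair minimizing the Q-criterion, Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i. It then computes distances from the new node u as d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2.

```python
from itertools import combinations


def neighbor_joining(dist):
    """Plain neighbor-joining topology from a symmetric distance matrix.

    dist: dict taxon -> dict taxon -> distance (all pairs, both directions).
    Returns the unrooted tree as a nested tuple whose top level holds the
    last three nodes; branch lengths are omitted to keep the sketch short.
    """
    d = {a: dict(row) for a, row in dist.items()}
    nodes = list(d)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for a, b in combinations(nodes, 2):
            q = (n - 2) * d[a][b] - r[a] - r[b]   # NJ Q-criterion
            if best is None or q < best[0]:       # ties keep the first pair found
                best = (q, a, b)
        _, a, b = best
        u = (a, b)                                # new internal node
        d[u] = {}
        for k in nodes:
            if k != a and k != b:
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2.0
        nodes = [k for k in nodes if k != a and k != b] + [u]
    return tuple(nodes)


names = "abcde"
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
D = {x: {y: float(matrix[i][j]) for j, y in enumerate(names)}
     for i, x in enumerate(names)}
print(neighbor_joining(D))  # ('d', 'e', ('c', ('a', 'b')))
```

    On this small textbook-style matrix, the first join is a with b. Its unrooted result has the splits {a,b}|{c,d,e} and {a,b,c}|{d,e}.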

  8. AST: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees.

    Science.gov (United States)

    Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying

    2014-01-01

    A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we applied it to four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution achieved by AST is almost as good as that of manual inference by domain experts, making the tool generally useful for phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php.
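    As a rough illustration of "maximizing taxonomic diversity", here is a Python sketch of a greedy maximum-coverage selection. It is a generic heuristic assumed for illustration, not AST's published procedure. Each sequence carries a lineage tuple, and at every step the sketch picks the sequence that adds the most not-yet-covered (rank, taxon) pairs. The record format, the lineages and the function name are hypothetical.

```python
def diverse_subset(records, k):
    """Greedy maximum-coverage selection of up to k sequences.

    records: list of (sequence_id, lineage), where lineage is a tuple of taxon
             names ordered from highest to lowest rank.
    Each step picks the record adding the most uncovered (rank, taxon) pairs;
    pairing names with their rank keeps equal names at different ranks apart.
    Ties go to the earliest record in the list.
    """
    covered, chosen = set(), []
    remaining = list(records)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda rec: len(set(enumerate(rec[1])) - covered))
        chosen.append(best[0])
        covered |= set(enumerate(best[1]))
        remaining.remove(best)
    return chosen


records = [("s1", ("Bacteria", "Proteobacteria", "Escherichia")),
           ("s2", ("Bacteria", "Proteobacteria", "Salmonella")),
           ("s3", ("Bacteria", "Firmicutes", "Bacillus")),
           ("s4", ("Eukaryota", "Streptophyta", "Arabidopsis"))]
print(diverse_subset(records, 2))  # ['s1', 's4']
```

    Greedy coverage is a common, simple baseline for this kind of subsampling. It favors sequences from lineages that are not yet represented, which counteracts the bias towards extensively studied species noted above.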

  9. Phylogenetic constraints do not explain the rarity of nitrogen-fixing trees in late-successional temperate forests.

    Science.gov (United States)

    Menge, Duncan N L; DeNoyer, Jeanne L; Lichstein, Jeremy W

    2010-08-06

    Symbiotic nitrogen (N)-fixing trees are rare in late-successional temperate forests, even though these forests are often N limited. Two hypotheses could explain this paradox. The 'phylogenetic constraints hypothesis' states that no late-successional tree taxa in temperate forests belong to clades that are predisposed to N fixation. Conversely, the 'selective constraints hypothesis' states that such taxa are present, but N-fixing symbioses would lower their fitness. Here we test the phylogenetic constraints hypothesis. Using U.S. forest inventory data, we derived successional indices related to shade tolerance and stand age for N-fixing trees, non-fixing trees in the 'potentially N-fixing clade' (smallest angiosperm clade that includes all N fixers), and non-fixing trees outside this clade. We then used phylogenetically independent contrasts (PICs) to test for associations between these successional indices and N fixation. Four results stand out from our analysis of U.S. trees. First, N fixers are less shade-tolerant than non-fixers both inside and outside of the potentially N-fixing clade. Second, N fixers tend to occur in younger stands in a given geographical region than non-fixers both inside and outside of the potentially N-fixing clade. Third, the potentially N-fixing clade contains numerous late-successional non-fixers. Fourth, although the N fixation trait is evolutionarily conserved, the successional traits are relatively labile. These results suggest that selective constraints, not phylogenetic constraints, explain the rarity of late-successional N-fixing trees in temperate forests. Because N-fixing trees could overcome N limitation to net primary production if they were abundant, this study helps to understand the maintenance of N limitation in temperate forests, and therefore the capacity of this biome to sequester carbon.
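    Phylogenetically independent contrasts follow Felsenstein's well-known recursion, which a Python sketch can show for one trait on a rooted binary tree with positive branch lengths. At each internal node, the recursion does three things. It records the standardized contrast (x1 - x2) / sqrt(v1 + v2). It estimates the node value as the inverse-variance-weighted mean of its children. It lengthens the node's own branch by v1 v2 / (v1 + v2). The tree encoding, the trait values and the function name are assumptions. The sign of each contrast depends on child order.

```python
import math


def contrasts(node, traits, out):
    """Felsenstein's independent contrasts on a rooted binary tree.

    node: (name, branch_length) for a tip, or ((left, right), branch_length)
          for an internal node; branch lengths must be positive.
    traits: dict mapping tip name to trait value.
    out: list receiving one standardized contrast per internal node.
    Returns (estimated node value, branch length adjusted for estimation error).
    """
    label, length = node
    if isinstance(label, str):
        return traits[label], length
    x1, v1 = contrasts(label[0], traits, out)
    x2, v2 = contrasts(label[1], traits, out)
    out.append((x1 - x2) / math.sqrt(v1 + v2))           # standardized contrast
    value = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)  # weighted mean of children
    return value, length + v1 * v2 / (v1 + v2)           # lengthened parent branch


ab = ((("A", 1.0), ("B", 1.0)), 0.5)
tree = ((ab, ("C", 1.0)), 0.0)
result = []
contrasts(tree, {"A": 2.0, "B": 4.0, "C": 6.0}, result)
print(result)  # one contrast per internal node: [-sqrt(2), -3/sqrt(2)]
```

    To test for an association between two traits, one would compute contrasts for both traits over the same tree and correlate them through the origin.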

  10. Evolview v2: an online visualization and management tool for customized and annotated phylogenetic trees.

    Science.gov (United States)

    He, Zilong; Zhang, Huangkai; Gao, Shenghan; Lercher, Martin J; Chen, Wei-Hua; Hu, Songnian

    2016-07-08

    Evolview is an online visualization and management tool for customized and annotated phylogenetic trees. It allows users to visualize phylogenetic trees in various formats, customize the trees through built-in functions and user-supplied datasets and export the customization results to publication-ready figures. Its 'dataset system' contains not only the data to be visualized on the tree, but also 'modifiers' that control various aspects of the graphical annotation. Evolview is a single-page application (like Gmail); its carefully designed interface allows users to upload, visualize, manipulate and manage trees and datasets all in a single webpage. Developments since the last public release include a modern dataset editor with keyword highlighting functionality, seven newly added types of annotation datasets, collaboration support that allows users to share their trees and datasets and various improvements of the web interface and performance. In addition, we included eleven new 'Demo' trees to demonstrate the basic functionalities of Evolview, and five new 'Showcase' trees inspired by publications to showcase the power of Evolview in producing publication-ready figures. Evolview is freely available at: http://www.evolgenius.info/evolview/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

    Science.gov (United States)

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-09-02

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal
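    Treeness/RCV can be sketched in Python. Treeness is the share of total tree length on internal branches. RCV is computed here as the sum, over symbols and taxa, of the absolute deviation of each taxon's symbol count from the mean count, divided by (number of taxa × alignment length). This is the commonly cited definition, but the paper's exact implementation (for example, its handling of gaps) is not given, and this sketch counts every symbol, including gaps. Function names and example data are hypothetical.

```python
def treeness(internal_lengths, terminal_lengths):
    """Proportion of total tree length that lies on internal branches."""
    internal = sum(internal_lengths)
    return internal / (internal + sum(terminal_lengths))


def rcv(alignment):
    """Relative composition variability of an alignment.

    alignment: dict mapping taxon name to an aligned sequence (equal lengths).
    """
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    symbols = sorted({ch for seq in alignment.values() for ch in seq})
    total = 0.0
    for ch in symbols:
        counts = [alignment[t].count(ch) for t in taxa]
        mean = sum(counts) / len(taxa)
        total += sum(abs(c - mean) for c in counts)
    return total / (len(taxa) * n_sites)


aln = {"t1": "AAAA", "t2": "AACC"}
print(treeness([1.0], [1.0, 2.0]) / rcv(aln))  # treeness 0.25, RCV 0.5
```

    Higher treeness/RCV values indicate more signal on internal branches relative to compositional bias, which is why the ratio is used as a predictor of phylogenetic signal.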

  12. Why abundant tropical tree species are phylogenetically old.

    Science.gov (United States)

    Wang, Shaopeng; Chen, Anping; Fang, Jingyun; Pacala, Stephen W

    2013-10-01

    Neutral models of species diversity predict patterns of abundance for communities in which all individuals are ecologically equivalent. These models were originally developed for Panamanian trees and successfully reproduce observed distributions of abundance. Neutral models also make macroevolutionary predictions that have rarely been evaluated or tested. Here we show that neutral models predict a humped or flat relationship between species age and population size. In contrast, ages and abundances of tree species in the Panamanian Canal watershed are found to be positively correlated, which falsifies the models. Speciation rates vary among phylogenetic lineages and are partially heritable from mother to daughter species. Variable speciation rates in an otherwise neutral model lead to a demographic advantage for species with low speciation rate. This demographic advantage results in a positive correlation between species age and abundance, as found in the Panamanian tropical forest community.

  13. Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees.

    Science.gov (United States)

    Nye, Tom M W; Tang, Xiaoxian; Weyenberg, Grady; Yoshida, Ruriko

    2017-12-01

    Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample's structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the [Formula: see text]th principal component in Euclidean space: the locus of the weighted Fréchet mean of [Formula: see text] vertex trees when the weights vary over the [Formula: see text]-simplex. We establish some basic properties of these objects, in particular showing that they have dimension [Formula: see text], and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.

  14. 16S rRNA gene sequence and phylogenetic tree of lactobacillus ...

    African Journals Online (AJOL)

    ... processed by denaturing gradient gel electrophoresis (DGGE). Phylogenetic tree was constructed with the sequences of the V2-V3 region of 16S rRNA gene. Results show two distinct divisions among the Lactobacillus species. The study presents a new understanding of the nature of the Lactobacillus vaginal microbiota ...

  15. Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees.

    Science.gov (United States)

    Keller, Alexander; Förster, Frank; Müller, Tobias; Dandekar, Thomas; Schultz, Jörg; Wolf, Matthias

    2010-01-15

    In several studies, secondary structures of ribosomal genes have been used to improve the quality of phylogenetic reconstructions. An extensive evaluation of the benefits of secondary structure, however, is lacking. This is the first study to counter this deficiency. We inspected the accuracy and robustness of phylogenetics with individual secondary structures by simulation experiments for artificial tree topologies with up to 18 taxa and for divergence levels in the range of typical phylogenetic studies. We chose the internal transcribed spacer 2 of the ribosomal cistron as an exemplary marker region. Simulation integrated the coevolution process of sequences with secondary structures. Additionally, the phylogenetic power of marker size duplication was investigated and compared with sequence and sequence-structure reconstruction methods. The results clearly show that accuracy and robustness of Neighbor Joining trees are largely improved by structural information in contrast to sequence-only data, whereas a doubled marker size only accounts for robustness. Individual secondary structures of ribosomal RNA sequences provide a valuable gain of information content that is useful for phylogenetics. Thus, the usage of ITS2 sequence together with secondary structure for taxonomic inferences is recommended. Other reconstruction methods such as maximum likelihood, Bayesian inference or maximum parsimony may equally profit from secondary structure inclusion. This article was reviewed by Shamil Sunyaev, Andrea Tanzer (nominated by Frank Eisenhaber) and Eugene V. Koonin. For the full reviews, please go to the Reviewers' comments section.
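    One simple way to let secondary structure influence a distance-based (Neighbor Joining) analysis is to pair each nucleotide with its dot-bracket state. This gives a 12-symbol alphabet of 4 nucleotides × 3 states ('.', '(' and ')'), and distances are then computed on the combined symbols. The following Python sketch assumes that encoding and a plain p-distance; it is an illustration, not the method used in the study. Function names and example strings are hypothetical.

```python
def pair_table(structure):
    """Map each paired position of a dot-bracket string to its partner position."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def encode(sequence, structure):
    """Pair each nucleotide with its structural state, giving 12 possible symbols."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    pair_table(structure)  # validate bracket balance
    return list(zip(sequence, structure))


def p_distance(a, b):
    """Proportion of aligned positions whose symbols differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


seq = "GGAUCC"
print(p_distance(seq, seq))                                             # sequence alone: 0.0
print(p_distance(encode(seq, "((..))"), encode(seq, "......")))         # with structure: 4/6
```

    Identical sequences folded differently become distinguishable, which is the kind of extra information the study attributes to structure-aware reconstruction.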

  16. Reversible polymorphism-aware phylogenetic models and their application to tree inference.

    Science.gov (United States)

    Schrempf, Dominik; Minh, Bui Quang; De Maio, Nicola; von Haeseler, Arndt; Kosiol, Carolin

    2016-10-21

    We present a reversible Polymorphism-Aware Phylogenetic Model (revPoMo) for species tree estimation from genome-wide data. revPoMo enables the reconstruction of large-scale species trees for many within-species samples. It expands the alphabet of DNA substitution models to include polymorphic states, thereby naturally accounting for incomplete lineage sorting. We implemented revPoMo in the maximum likelihood software IQ-TREE. A simulation study and an application to great apes data show that the runtimes of our approach and standard substitution models are comparable, but that revPoMo has much better accuracy in estimating trees, divergence times and mutation rates. The advantage of revPoMo is that an increase of sample size per species improves estimations but does not increase runtime. Therefore, revPoMo is a valuable tool with several applications, from speciation dating to species tree reconstruction. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
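    The expanded alphabet can be made concrete with a short Python enumeration. It follows the general PoMo construction with a virtual population size N: 4 fixed (monomorphic) states, plus N - 1 polymorphic states for each of the 6 unordered nucleotide pairs, for 4 + 6(N - 1) states in total. The value of N used in IQ-TREE by default is not stated here, and the function name is hypothetical.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def pomo_states(n):
    """Enumerate PoMo-style states for a virtual population of n individuals (n >= 2).

    Each state is a dict mapping nucleotide to allele count, with counts summing to n.
    """
    states = [{b: n} for b in NUCLEOTIDES]              # fixed (monomorphic) states
    for a, b in combinations(NUCLEOTIDES, 2):           # the 6 nucleotide pairs
        for i in range(1, n):                           # i copies of a, n - i of b
            states.append({a: i, b: n - i})
    return states


print(len(pomo_states(10)))  # 4 + 6 * 9 = 58 states
```

    The state space grows linearly in N, so sampling more individuals per species does not enlarge the alphabet as long as N is held fixed.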

  17. The Reliability and Stability of an Inferred Phylogenetic Tree from Empirical Data.

    Science.gov (United States)

    Katsura, Yukako; Stanley, Craig E; Kumar, Sudhir; Nei, Masatoshi

    2017-03-01

    The reliability of a phylogenetic tree obtained from empirical data is usually measured by the bootstrap probability (Pb) of interior branches of the tree. If the bootstrap probability is high for most branches, the tree is considered to be reliable. If some interior branches show relatively low bootstrap probabilities, we cannot be sure that the inferred tree is really reliable. Here, we propose another quantity for measuring the reliability of the tree, called the stability of a subtree. This quantity is the probability (Ps) of obtaining a given subtree of the inferred tree. We then show that, for the tree to be reliable, both Pb and Ps must be high. We also show that Ps is given by the bootstrap probability of the subtree with the closest outgroup sequence, and we present RESTA, a computer program for computing Pb and Ps values. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
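    The bootstrap probability of a clade is the fraction of bootstrap replicate trees that contain it, which a minimal Python sketch can compute. For simplicity it treats trees as rooted nested tuples; real bootstrap trees are usually unrooted and compared by splits. Per the abstract, Ps would apply the same kind of counting to the subtree together with its closest outgroup, which this sketch does not attempt. The replicate trees and function names are hypothetical.

```python
def clusters(tree):
    """Leaf sets of every node of a rooted tree given as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def bootstrap_support(clade, replicates):
    """Fraction of replicate trees that contain `clade` as a cluster."""
    clade = frozenset(clade)
    return sum(clade in clusters(t) for t in replicates) / len(replicates)


replicates = [(("A", "B"), ("C", "D")),
              ((("A", "B"), "C"), "D"),
              (("A", "C"), ("B", "D"))]
print(bootstrap_support({"A", "B"}, replicates))  # present in 2 of 3 replicates
```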

  18. Inferring 'weak spots' in phylogenetic trees: application to mosasauroid nomenclature.

    Science.gov (United States)

    Madzia, Daniel; Cau, Andrea

    2017-01-01

    Mosasauroid squamates were the apex predators of Late Cretaceous marine and occasionally also freshwater ecosystems. Proper understanding of the origin of their ecological adaptations or paleobiogeographic dispersals requires adequate knowledge of their phylogeny. The studies assessing the position of mosasauroids on the squamate evolutionary tree and their origins have long given conflicting results. The phylogenetic relationships within Mosasauroidea, however, have changed only little throughout the last decades. Considering the substantial improvements in phylogenetic methodology in recent years, which have resulted, among others, in numerous alterations in the phylogenetic hypotheses of other fossil amniotes, we test the robustness of our understanding of mosasauroid beginnings and their evolutionary history. We re-examined a data set that results from modifications assembled over the last 20 years and performed multiple parsimony analyses and a Bayesian tip-dating analysis. Following the inferred topologies and the 'weak spots' in the phylogeny of mosasauroids, we revise the nomenclature of the 'traditionally' recognized mosasauroid clades, to acknowledge the overall weakness among branches and the alternative topologies suggested previously, and discuss several factors that might have an impact on the differing phylogenetic hypotheses and their statistical support.

  19. Is invasion success of Australian trees mediated by their native biogeography, phylogenetic history, or both?

    Science.gov (United States)

    Miller, Joseph T; Hui, Cang; Thornhill, Andrew; Gallien, Laure; Le Roux, Johannes J; Richardson, David M

    2016-12-30

    For a plant species to become invasive, it has to progress along the introduction-naturalization-invasion (INI) continuum, which reflects the joint direction of niche breadth. Identification of traits that correlate with and drive species invasiveness along the continuum is a major focus of invasion biology. If invasiveness is underlain by heritable traits, and if such traits are phylogenetically conserved, then we would expect non-native species with different introduction status (i.e. position along the INI continuum) to show phylogenetic signal. This study uses two clades that contain a large number of invasive tree species from the genera Acacia and Eucalyptus to test whether geographic distribution and a novel phylogenetic conservation method can predict which species have been introduced, became naturalized, and became invasive. Our results suggest that no phylogenetic signal underlies the introduction status in either group of trees, except for introduced acacias. The more invasive acacia clade contains invasive species that have smoother geographic distributions and are more marginal in the phylogenetic network. The less invasive eucalyptus group contains invasive species that are more clustered geographically, more centrally located in the phylogenetic network, and have phylogenetic distances between invasive and non-invasive species that are trending toward the mean pairwise distance. This suggests that highly invasive groups may be identified because they have invasive species with smoother and faster-expanding native distributions and are located more towards the edges of phylogenetic networks than less invasive groups. Published by Oxford University Press on behalf of the Annals of Botany Company.
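    The mean pairwise distance (MPD) referred to above is a standard community-phylogenetics summary: the average phylogenetic distance over all unordered pairs in a set of species. A minimal Python sketch follows. It does not reproduce the study's novel phylogenetic conservation method. The distance table and function name are hypothetical.

```python
from itertools import combinations


def mean_pairwise_distance(species, dist):
    """Average phylogenetic distance over all unordered pairs of `species`.

    dist: dict mapping frozenset({x, y}) to the distance between x and y.
    """
    pairs = list(combinations(species, 2))
    if not pairs:
        raise ValueError("need at least two species")
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


# Hypothetical patristic distances among three species.
dist = {frozenset(("A", "B")): 2.0,
        frozenset(("A", "C")): 4.0,
        frozenset(("B", "C")): 6.0}
print(mean_pairwise_distance(["A", "B", "C"], dist))  # (2 + 4 + 6) / 3
```

    Comparing the MPD of a subset (for example, invasive species) with the MPD of random subsets of the same size is one common way to ask whether that subset is phylogenetically clustered or overdispersed.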

  20. Bioinformatics analysis and construction of phylogenetic tree of aquaporins from Echinococcus granulosus.

    Science.gov (United States)

    Wang, Fen; Ye, Bin

    2016-09-01

    Cystic echinococcosis, caused by the metacestodal larvae of Echinococcus granulosus (Eg), is a chronic, worldwide, and severe zoonotic parasitosis. The treatment of cystic echinococcosis is still difficult, since surgery cannot meet the needs of all patients, and drugs can lead to serious adverse events as well as resistance. Screening for target proteins that interact with new anti-hydatidosis drugs is urgently needed to meet the prevailing challenges. Here, we analyzed the sequences and structural properties of E. granulosus aquaporins (EgAQPs) and constructed a phylogenetic tree using bioinformatics methods. The MIP family signature and protein kinase C phosphorylation sites were predicted in all nine EgAQPs. α-helix and random coil were the main secondary structures of EgAQPs. The number of transmembrane regions ranged from three to six, indicating that EgAQPs contain multiple hydrophobic regions. A neighbor-joining tree indicated that EgAQPs were divided into two branches: seven EgAQPs formed a clade with human AQP1, a "strict" aquaporin, and the other two EgAQPs formed a clade with human AQP9, an aquaglyceroporin. Unfortunately, homology modeling of EgAQPs was aborted. These results provide a foundation for understanding and researching the biological function of E. granulosus.

  1. Phylogenetic tree reconstruction accuracy and model fit when proportions of variable sites change across the tree.

    Science.gov (United States)

    Shavit Grievink, Liat; Penny, David; Hendy, Michael D; Holland, Barbara R

    2010-05-01

    Commonly used phylogenetic models assume a homogeneous process through time in all parts of the tree. However, it is known that these models can be too simplistic, as they do not account for nonhomogeneous lineage-specific properties. In particular, it is now widely recognized that as constraints on sequences evolve, the proportion and positions of variable sites can vary between lineages, causing heterotachy. The extent to which this model misspecification affects tree reconstruction is still unknown. Here, we evaluate the effect of changes in the proportions and positions of variable sites on model fit and tree estimation. We consider 5 current models of nucleotide sequence evolution in a Bayesian Markov chain Monte Carlo framework as well as maximum parsimony (MP). We show that for a tree with 4 lineages where 2 nonsister taxa undergo a change in the proportion of variable sites, tree reconstruction under the best-fitting model, which is chosen using a relative test, often results in the wrong tree. In this case, we found that an absolute test of model fit is a better predictor of tree estimation accuracy. We also found further evidence that MP is not immune to heterotachy. In addition, we show that increased sampling of taxa that have undergone a change in proportion and positions of variable sites is critical for accurate tree reconstruction.

  2. Analyzing Phylogenetic Trees with Timed and Probabilistic Model Checking: The Lactose Persistence Case Study.

    Science.gov (United States)

    Requeno, José Ignacio; Colom, José Manuel

    2014-12-01

    Model checking is a generic verification technique that allows the phylogeneticist to focus on models and specifications instead of on implementation issues. Phylogenetic trees are considered as transition systems over which we interrogate phylogenetic questions written as formulas of temporal logic. Nonetheless, standard logics become insufficient for certain practices of phylogenetic analysis since they do not allow the inclusion of explicit time and probabilities. The aim of this paper is to extend the application of model checking techniques beyond qualitative phylogenetic properties and adapt the existing logical extensions and tools to the field of phylogeny. The introduction of time and probabilities in phylogenetic specifications is motivated by the study of a real example: the analysis of the ratio of lactose intolerance in some populations and the date of appearance of this phenotype.
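    To make "trees as transition systems" concrete, here is a minimal sketch under simplifying assumptions: transitions point from ancestor to descendant, each leaf is a terminal state with an implicit self-loop, and atomic propositions label nodes (here a hypothetical `lp` for a lactase-persistence phenotype). The untimed CTL operators EF, AF, and AG then reduce to recursive checks over subtrees. The timed and probabilistic extensions that the paper is actually about are not covered, and all names are illustrative.

```python
from typing import Dict, List, Set

Tree = Dict[str, List[str]]   # node -> children; leaves may be omitted or map to []
Labels = Dict[str, Set[str]]  # node -> atomic propositions that hold there


def holds(labels: Labels, node: str, p: str) -> bool:
    return p in labels.get(node, set())


def ef(tree: Tree, labels: Labels, node: str, p: str) -> bool:
    """EF p: along some path from node, p eventually holds."""
    return holds(labels, node, p) or any(ef(tree, labels, c, p) for c in tree.get(node, []))


def af(tree: Tree, labels: Labels, node: str, p: str) -> bool:
    """AF p: along every path from node, p eventually holds (leaves loop on themselves)."""
    if holds(labels, node, p):
        return True
    children = tree.get(node, [])
    return bool(children) and all(af(tree, labels, c, p) for c in children)


def ag(tree: Tree, labels: Labels, node: str, p: str) -> bool:
    """AG p: p holds at node and at every descendant."""
    return holds(labels, node, p) and all(ag(tree, labels, c, p) for c in tree.get(node, []))


# Hypothetical tree: does some / every lineage from an ancestor reach the phenotype?
tree = {"root": ["anc1", "popC"], "anc1": ["popA", "popB"]}
labels = {"popA": {"lp"}, "popB": {"lp"}}
print(ef(tree, labels, "root", "lp"), af(tree, labels, "root", "lp"))
```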

  3. Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees

    Directory of Open Access Journals (Sweden)

    Dandekar Thomas

    2010-01-01

    Background: In several studies, secondary structures of ribosomal genes have been used to improve the quality of phylogenetic reconstructions. An extensive evaluation of the benefits of secondary structure, however, is lacking. Results: This is the first study to address this deficiency. We inspected the accuracy and robustness of phylogenetics with individual secondary structures by simulation experiments for artificial tree topologies with up to 18 taxa and for divergence levels in the range of typical phylogenetic studies. We chose the internal transcribed spacer 2 (ITS2) of the ribosomal cistron as an exemplary marker region. The simulation integrated the coevolution of sequences with their secondary structures. Additionally, the phylogenetic power of marker size duplication was investigated and compared with sequence and sequence-structure reconstruction methods. The results clearly show that the accuracy and robustness of Neighbor Joining trees are greatly improved by structural information compared with sequence-only data, whereas doubling the marker size improves only robustness. Conclusions: Individual secondary structures of ribosomal RNA sequences provide a valuable gain in information content that is useful for phylogenetics. Thus, using ITS2 sequences together with their secondary structures for taxonomic inference is recommended. Other reconstruction methods, such as maximum likelihood, Bayesian inference, or maximum parsimony, may equally profit from the inclusion of secondary structure. Reviewers: This article was reviewed in open peer review by Shamil Sunyaev and Andrea Tanzer (nominated by Frank Eisenhaber and Eugene V. Koonin).
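    Structure-aware reconstruction needs each site's pairing state. A minimal sketch, assuming the secondary structure is supplied in dot-bracket notation (a common convention, not stated in the abstract): a stack recovers the base pairs, and `encode` combines each nucleotide with its bracket symbol into a 12-letter sequence-structure alphabet (4 bases times 3 structural states). That alphabet is an assumed encoding for illustration; the function names are hypothetical.

```python
from typing import Dict, List


def base_pairs(structure: str) -> Dict[int, int]:
    """Map each paired position (0-based) in a dot-bracket string to its partner."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def encode(seq: str, structure: str) -> List[str]:
    """Pair every nucleotide with its structural state, e.g. 'G(' or 'A.'."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    base_pairs(structure)  # validates bracket balance
    return [s + b for s, b in zip(seq.upper(), structure)]


print(encode("ggaacc", "((..))"))
```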

  4. SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.

    Science.gov (United States)

    Liu, Kevin; Warnow, Tandy J; Holder, Mark T; Nelesen, Serita M; Yu, Jiaye; Stamatakis, Alexandros P; Linder, C Randal

    2012-01-01

    Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted, but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller, more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. More generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees, and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of
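    The abstract does not spell out SATé-II's divide-and-conquer step. As an assumption for illustration only, the sketch below shows recursive centroid-edge decomposition of an unrooted guide tree: cut the edge that splits the current taxa most evenly, and recurse on each side until every subset holds at most `max_size` taxa. Subsets formed this way follow the tree, so they tend to contain closely related sequences. All names are hypothetical, and nodes are assumed to be strings.

```python
from typing import Dict, List, Set, Tuple

Adj = Dict[str, Set[str]]  # unrooted tree as an adjacency map


def build_tree(edges: List[Tuple[str, str]]) -> Adj:
    adj: Adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def _side(adj: Adj, start: str, blocked: str) -> Set[str]:
    """Nodes reachable from start once the edge (start, blocked) is cut."""
    seen, stack = {start, blocked}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    seen.discard(blocked)
    return seen


def centroid_decompose(adj: Adj, taxa: Set[str], max_size: int) -> List[List[str]]:
    """Cut the most balanced (centroid) edge recursively until parts have <= max_size taxa."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    here = taxa & set(adj)
    if len(here) <= max_size:
        return [sorted(here)] if here else []
    best_balance, best_side = None, set()
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each edge once
                side = _side(adj, u, v)
                k = len(side & here)
                balance = max(k, len(here) - k)
                if best_balance is None or balance < best_balance:
                    best_balance, best_side = balance, side
    parts: List[List[str]] = []
    for nodes in (best_side, set(adj) - best_side):
        sub = {x: adj[x] & nodes for x in nodes}  # induced subtree on one side
        parts += centroid_decompose(sub, taxa, max_size)
    return parts


edges = [("i1", "a"), ("i1", "b"), ("i1", "i2"), ("i2", "c"), ("i2", "i3"),
         ("i3", "d"), ("i3", "i4"), ("i4", "e"), ("i4", "f")]
print(centroid_decompose(build_tree(edges), set("abcdef"), 3))
```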

  5. The algebra of the general Markov model on phylogenetic trees and networks.

    Science.gov (United States)

    Sumner, J G; Holland, B R; Jarvis, P D

    2012-04-01

    It is known that the Kimura 3ST model of sequence evolution on phylogenetic trees can be extended quite naturally to arbitrary split systems. However, this extension relies heavily on mathematical peculiarities of the associated Hadamard transformation, and providing an analogous augmentation of the general Markov model has thus far been elusive. In this paper, we rectify this shortcoming by showing how to extend the general Markov model on trees to include incompatible edges, and even further to more general network models. This is achieved by exploring the algebra of the generators of the continuous-time Markov chain together with the “splitting” operator that generates the branching process on phylogenetic trees. For simplicity, we proceed by discussing the two-state case and then show that our results are easily extended to more states with little complication. Intriguingly, upon restriction of the two-state general Markov model to the parameter space of the binary symmetric model, our extension is indistinguishable from the Hadamard approach only on trees; as soon as any incompatible splits are introduced, the two approaches give rise to differing probability distributions with disparate structure. Through exploration of a simple example, we argue that our extension to more general networks has desirable properties that the previous approaches do not share. In particular, our construction allows for convergent evolution of previously divergent lineages, a property that is of significant interest for biological applications.
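    For the two-state case discussed above, the generator of the continuous-time Markov chain is Q = [[-alpha, alpha], [beta, -beta]], and its matrix exponential has a closed form. The sketch below evaluates P(t) = exp(Qt) directly; setting alpha = beta gives the binary symmetric model, whose off-diagonal entry is (1 - exp(-2 alpha t)) / 2. It illustrates only the generator, not the paper's splitting operator or network construction; the function name is hypothetical.

```python
import math
from typing import List


def two_state_transition(alpha: float, beta: float, t: float) -> List[List[float]]:
    """P(t) = exp(Qt) for the two-state generator Q = [[-alpha, alpha], [beta, -beta]]."""
    s = alpha + beta
    if s == 0:
        return [[1.0, 0.0], [0.0, 1.0]]  # Q = 0: no change ever occurs
    e = math.exp(-s * t)
    return [
        [(beta + alpha * e) / s, alpha * (1 - e) / s],
        [beta * (1 - e) / s, (alpha + beta * e) / s],
    ]


# Binary symmetric special case (alpha == beta).
print(two_state_transition(0.4, 0.4, 2.0))
```

    As t grows, each row tends to the stationary distribution (beta, alpha) / (alpha + beta).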

  6. Fast Construction of Near Parsimonious Hybridization Networks for Multiple Phylogenetic Trees.

    Science.gov (United States)

    Mirzaei, Sajad; Wu, Yufeng

    2016-01-01

    Hybridization networks represent plausible evolutionary histories of species that are affected by reticulate evolutionary processes. An established computational problem on hybridization networks is constructing the most parsimonious hybridization network such that each of the given phylogenetic trees (called gene trees) is "displayed" in the network. There have been several previous approaches to this NP-hard problem, including an exact method and several heuristics. However, the exact method is applicable only to a limited range of data, and heuristic methods can be less accurate and are sometimes slow. In this paper, we develop a new algorithm for constructing near-parsimonious networks for multiple binary gene trees. This method is more efficient for large numbers of gene trees than previous heuristics. It also produces more parsimonious results than a previous method on many simulated datasets as well as on a real biological dataset. We also show that our method produces topologically more accurate networks for many datasets.

  7. PHYLOGEOrec: A QGIS plugin for spatial phylogeographic reconstruction from phylogenetic tree and geographical information data

    Science.gov (United States)

    Nashrulloh, Maulana Malik; Kurniawan, Nia; Rahardi, Brian

    2017-11-01

    The increasing availability of genetic sequence data associated with explicit geographic and environmental (including biotic and abiotic components) information offers new opportunities to study the processes that shape biodiversity and its patterns. Phylogeographic reconstruction, which integrates phylogenetic and biogeographic knowledge, provides richer and deeper visualization of, and information on, diversification events than ever before. Geographical information systems such as QGIS provide an environment for spatial modeling, analysis, and dissemination in which phylogenetic models can be explicitly linked with their associated spatial data and subsequently integrated with other related georeferenced datasets describing the biotic and abiotic environment. We introduce PHYLOGEOrec, a QGIS plugin based on QGIS2threejs for building spatial phylogeographic reconstructions from phylogenetic tree and geographical information data. Using PHYLOGEOrec, researchers can integrate existing phylogeny and geographical information data, resulting in three-dimensional geographic visualizations of phylogenetic trees in the Keyhole Markup Language (KML) format. These files can be overlaid on a map in QGIS and viewed spatially through the QGIS2threejs engine for further analysis. KML files can also be viewed in established geobrowsers with KML support (e.g., Google Earth).
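    PHYLOGEOrec writes its output as KML. The sketch below is not the plugin's code; it only illustrates how a tree whose nodes carry geographic positions could be serialised as KML, with one LineString Placemark per branch. The `parent` and `coords` inputs, and the use of altitude to lift ancestral nodes above the map (for example, in proportion to node age), are hypothetical choices. KML expects coordinates in longitude,latitude,altitude order.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

KML_NS = "http://www.opengis.net/kml/2.2"


def _k(tag: str) -> str:
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def tree_to_kml(parent: Dict[str, str],
                coords: Dict[str, Tuple[float, float, float]]) -> str:
    """One Placemark per branch (parent -> child); coords are (lon, lat, alt)."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(_k("kml"))
    doc = ET.SubElement(kml, _k("Document"))
    for child, par in parent.items():
        pm = ET.SubElement(doc, _k("Placemark"))
        ET.SubElement(pm, _k("name")).text = f"{par}->{child}"
        line = ET.SubElement(pm, _k("LineString"))
        # Interpret altitude in metres above sea level rather than clamping to ground.
        ET.SubElement(line, _k("altitudeMode")).text = "absolute"
        pts = (coords[par], coords[child])
        ET.SubElement(line, _k("coordinates")).text = " ".join(
            f"{lon},{lat},{alt}" for lon, lat, alt in pts)
    return ET.tostring(kml, encoding="unicode")


# Hypothetical two-branch tree: the ancestor is raised 5 km above its descendants.
parent = {"A": "root", "B": "root"}
coords = {"root": (110.0, -7.0, 5000.0), "A": (112.5, -7.5, 0.0), "B": (106.8, -6.2, 0.0)}
print(tree_to_kml(parent, coords))
```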

  8. Pareto-optimal phylogenetic tree reconciliation.

    Science.gov (United States)

    Libeskind-Hadas, Ran; Wu, Yi-Chieh; Bansal, Mukul S; Kellis, Manolis

    2014-06-15

    Phylogenetic tree reconciliation is a widely used method for reconstructing the evolutionary histories of gene families and species, hosts and parasites, and other dependent pairs of entities. Reconciliation is typically performed using maximum parsimony, in which each evolutionary event type is assigned a cost and the objective is to find a reconciliation of minimum total cost. It is generally understood that reconciliations are sensitive to event costs, but little is understood about the relationship between event costs and solutions. Moreover, choosing appropriate event costs is a notoriously difficult problem. We address this problem by giving an efficient algorithm for computing Pareto-optimal sets of reconciliations, thus providing the first systematic method for understanding the relationship between event costs and reconciliations. This, in turn, results in new techniques for computing event support values and, for cophylogenetic analyses, performing robust statistical tests. We provide new software tools and demonstrate their use on a number of datasets from evolutionary genomic and cophylogenetic studies. Our Python tools are freely available at www.cs.hmc.edu/~hadas/xscape. © The Author 2014. Published by Oxford University Press.
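    The notion of Pareto optimality used above can be illustrated on event-count vectors: a reconciliation is Pareto-optimal if no other reconciliation has counts that are no worse for every event type and strictly better for at least one. The paper computes these sets efficiently without enumerating reconciliations; this sketch merely filters an explicit, hypothetical list of (duplications, transfers, losses) vectors.

```python
from typing import List, Tuple

Cost = Tuple[int, ...]  # e.g. (duplications, transfers, losses)


def dominates(a: Cost, b: Cost) -> bool:
    """True if a is no worse than b in every coordinate and differs somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_front(costs: List[Cost]) -> List[Cost]:
    """Keep only the non-dominated event-count vectors (duplicates collapsed)."""
    unique = sorted(set(costs))
    return [c for c in unique if not any(dominates(o, c) for o in unique)]


candidates = [(1, 2, 3), (2, 1, 3), (2, 2, 3), (1, 2, 3), (3, 3, 0)]
print(pareto_front(candidates))
```

    Every vector on the front is optimal for some choice of positive event costs only if it is also on the lower convex hull; points on the front but off the hull are Pareto-optimal yet never minimise a weighted sum, which is one reason the full front carries more information than any single cost setting.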

  9. Phylogenetic analysis at deep timescales: unreliable gene trees, bypassed hidden support, and the coalescence/concatalescence conundrum.

    Science.gov (United States)

    Gatesy, John; Springer, Mark S

    2014-11-01

    Large datasets are required to solve difficult phylogenetic problems that are deep in the Tree of Life. Currently, two divergent systematic methods are commonly applied to such datasets: the traditional supermatrix approach (= concatenation) and "shortcut" coalescence (= coalescence methods wherein gene trees and the species tree are not co-estimated). When applied to ancient clades, these contrasting frameworks often produce congruent results, but in recent phylogenetic analyses of Placentalia (placental mammals), this is not the case. A recent series of papers has alternately disputed and defended the utility of shortcut coalescence methods at deep phylogenetic scales. Here, we examine this exchange in the context of published phylogenomic data from Mammalia; in particular, we explore two critical issues: the delimitation of data partitions ("genes") in coalescence analysis, and hidden support that emerges with the combination of such partitions in phylogenetic studies. Hidden support (increased support for a clade in combined analysis of all data partitions relative to the support evident in separate analyses of the various data partitions) is a hallmark of the supermatrix approach and a primary rationale for concatenating all characters into a single matrix. In the most extreme cases of hidden support, relationships that are contradicted by all gene trees are supported when all of the genes are analyzed together. A valid fear is that shortcut coalescence methods might bypass or distort character support that is hidden in individual loci because small gene fragments are analyzed in isolation. Given the extensive systematic database for Mammalia, the assumptions and applicability of shortcut coalescence methods can be assessed with rigor to complement a small but growing body of simulation work that has directly compared these methods to concatenation. We document several remarkable cases of hidden support in both supermatrix and coalescence paradigms and argue

  10. Phylogenetic Trees From Sequences

    Science.gov (United States)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data. We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.
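    Bootstrapping and consensus, as summarised above, can be sketched in a few lines. The first function builds one nonparametric bootstrap pseudo-replicate by resampling alignment columns with replacement; the other two turn the splits found in replicate trees into support values and a majority-rule set. Tree inference on each replicate is not shown, and splits are assumed to be canonicalised by the caller (for example, always as the side not containing a fixed reference taxon). All names are illustrative.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Split = FrozenSet[str]


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all sequences must have the same aligned length")
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[c] for c in cols) for row in alignment]


def split_support(replicate_splits: List[Set[Split]]) -> Dict[Split, float]:
    """Fraction of replicate trees that contain each split."""
    counts = Counter(s for splits in replicate_splits for s in splits)
    n = len(replicate_splits)
    return {s: c / n for s, c in counts.items()}


def majority_rule(replicate_splits: List[Set[Split]]) -> Set[Split]:
    """Splits present in more than half of the replicates (majority-rule consensus)."""
    return {s for s, v in split_support(replicate_splits).items() if v > 0.5}


aln = ["ACGT", "ACGA", "TCGA"]
print(bootstrap_alignment(aln, random.Random(1)))
```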

  11. On Nakhleh's metric for reduced phylogenetic networks

    OpenAIRE

    Cardona, Gabriel; Llabrés, Mercè; Rosselló, Francesc; Valiente Feruglio, Gabriel Alejandro

    2009-01-01

    We prove that Nakhleh’s metric for reduced phylogenetic networks is also a metric on the classes of tree-child phylogenetic networks, semibinary tree-sibling time consistent phylogenetic networks, and multilabeled phylogenetic trees. We also prove that it separates distinguishable phylogenetic networks. In this way, it becomes the strongest dissimilarity measure for phylogenetic networks available so far. Furthermore, we propose a generalization of that metric that separates arbitrary phyl...

  12. Using ESTs for phylogenomics: Can one accurately infer a phylogenetic tree from a gappy alignment?

    Directory of Open Access Journals (Sweden)

    Hartmann Stefanie

    2008-03-01

    Background: While full genome sequences are still only available for a handful of taxa, large collections of partial gene sequences are available for many more. The alignment of partial gene sequences results in a multiple sequence alignment containing large gaps that are arranged in a staggered pattern. The consequences of this pattern of missing data on the accuracy of phylogenetic analysis are not well understood. We conducted a simulation study to determine the accuracy of phylogenetic trees obtained from gappy alignments using three commonly used phylogenetic reconstruction methods (Neighbor Joining, Maximum Parsimony, and Maximum Likelihood) and studied ways to improve the accuracy of trees obtained from such datasets. Results: We found that the pattern of gappiness in multiple sequence alignments derived from partial gene sequences substantially compromised phylogenetic accuracy even in the absence of alignment error. The decline in accuracy was beyond what would be expected based on the amount of missing data. The decline was particularly dramatic for Neighbor Joining and Maximum Parsimony, where the majority of gappy alignments contained 25% to 40% incorrect quartets. To improve the accuracy of the trees obtained from a gappy multiple sequence alignment, we examined two approaches. In the first approach, alignment masking, potentially problematic columns and input sequences are excluded from the dataset. Even in the absence of alignment error, masking improved phylogenetic accuracy up to 100-fold. However, masking retained, on average, only 83% of the input sequences. In the second approach, alignment subdivision, the missing data is statistically modelled in order to retain as many sequences as possible in the phylogenetic analysis. Subdivision resulted in more modest improvements to alignment accuracy, but succeeded in including almost all of the input sequences. Conclusion: These results demonstrate that partial gene

  13. Using ESTs for phylogenomics: can one accurately infer a phylogenetic tree from a gappy alignment?

    Science.gov (United States)

    Hartmann, Stefanie; Vision, Todd J

    2008-03-26

    While full genome sequences are still only available for a handful of taxa, large collections of partial gene sequences are available for many more. The alignment of partial gene sequences results in a multiple sequence alignment containing large gaps that are arranged in a staggered pattern. The consequences of this pattern of missing data on the accuracy of phylogenetic analysis are not well understood. We conducted a simulation study to determine the accuracy of phylogenetic trees obtained from gappy alignments using three commonly used phylogenetic reconstruction methods (Neighbor Joining, Maximum Parsimony, and Maximum Likelihood) and studied ways to improve the accuracy of trees obtained from such datasets. We found that the pattern of gappiness in multiple sequence alignments derived from partial gene sequences substantially compromised phylogenetic accuracy even in the absence of alignment error. The decline in accuracy was beyond what would be expected based on the amount of missing data. The decline was particularly dramatic for Neighbor Joining and Maximum Parsimony, where the majority of gappy alignments contained 25% to 40% incorrect quartets. To improve the accuracy of the trees obtained from a gappy multiple sequence alignment, we examined two approaches. In the first approach, alignment masking, potentially problematic columns and input sequences are excluded from the dataset. Even in the absence of alignment error, masking improved phylogenetic accuracy up to 100-fold. However, masking retained, on average, only 83% of the input sequences. In the second approach, alignment subdivision, the missing data is statistically modelled in order to retain as many sequences as possible in the phylogenetic analysis. Subdivision resulted in more modest improvements to alignment accuracy, but succeeded in including almost all of the input sequences. These results demonstrate that partial gene sequences and gappy multiple sequence alignments can pose a
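    Alignment masking, the first approach above, removes problematic columns and then sequences. A minimal sketch under assumed rules (the thresholds and their values are hypothetical, not those of the study): drop columns whose gap fraction exceeds `max_col_gap`, then drop sequences whose remaining non-gap fraction falls below `min_seq_coverage`. The sketch returns the indices of retained sequences so the loss of taxa, the trade-off noted in the abstract, stays visible.

```python
from typing import List, Tuple


def mask_alignment(alignment: List[str], max_col_gap: float = 0.5,
                   min_seq_coverage: float = 0.5,
                   gap: str = "-") -> Tuple[List[int], List[str]]:
    """Mask gappy columns, then gappy sequences; return (kept row indices, masked rows)."""
    n = len(alignment)
    length = len(alignment[0])
    # Step 1: keep columns in which at most max_col_gap of sequences have a gap.
    keep_cols = [j for j in range(length)
                 if sum(row[j] == gap for row in alignment) / n <= max_col_gap]
    masked = ["".join(row[j] for j in keep_cols) for row in alignment]
    # Step 2: keep sequences that still cover enough of the retained columns.
    keep_rows = [i for i, row in enumerate(masked)
                 if row and (len(row) - row.count(gap)) / len(row) >= min_seq_coverage]
    return keep_rows, [masked[i] for i in keep_rows]


aln = ["ACGT-", "AC-T-", "---TA", "ACGT-"]
print(mask_alignment(aln))
```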

  14. Novel information theory-based measures for quantifying incongruence among phylogenetic trees.

    Science.gov (United States)

    Salichos, Leonidas; Stamatakis, Alexandros; Rokas, Antonis

    2014-05-01

    Phylogenies inferred from different data matrices often conflict with each other, necessitating the development of measures that quantify this incongruence. Here, we introduce novel measures that use information theory to quantify the degree of conflict or incongruence among all nontrivial bipartitions present in a set of trees. The first measure, internode certainty (IC), calculates the degree of certainty for a given internode by considering the frequency of the bipartition defined by the internode (internal branch) in a given set of trees jointly with that of the most prevalent conflicting bipartition in the same tree set. The second measure, IC All (ICA), calculates the degree of certainty for a given internode by considering the frequency of the bipartition defined by the internode in a given set of trees in conjunction with that of all conflicting bipartitions in the same underlying tree set. Finally, the tree certainty (TC) and TC All (TCA) measures are the sums of IC and ICA values, respectively, across all internodes of a phylogeny. IC, ICA, TC, and TCA can be calculated from different types of data that contain nontrivial bipartitions, ranging from bootstrap replicate trees to gene trees or individual characters. Given a set of phylogenetic trees, the IC and ICA values of a given internode reflect its specific degree of incongruence, and the TC and TCA values describe the global degree of incongruence between trees in the set. All four measures are implemented and freely available in version 8.0.0 and subsequent versions of the widely used program RAxML.
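    As we understand the IC definition above (an assumption to check against the paper and RAxML), the two frequencies are normalised to proportions p1 = f1 / (f1 + f2) and p2 = f2 / (f1 + f2), and IC = 1 + p1 log2 p1 + p2 log2 p2: one minus the Shannon entropy of the two-way choice. IC is therefore 1 when no conflicting bipartition is seen and 0 when both are equally frequent. This sketch returns the unsigned value; published implementations may attach a sign when the conflicting bipartition is the more frequent one. ICA and TCA are not covered, and the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def internode_certainty(f_bip: float, f_conflict: float) -> float:
    """IC from the frequency of a bipartition and of its most frequent conflicting one."""
    total = f_bip + f_conflict
    if total <= 0:
        raise ValueError("frequencies must not both be zero")
    ic = 1.0
    for f in (f_bip, f_conflict):
        p = f / total
        if p > 0:  # use the convention 0 * log2(0) = 0
            ic += p * math.log2(p)
    return ic


def tree_certainty(pairs: Iterable[Tuple[float, float]]) -> float:
    """TC: sum of IC over all internodes, given (f_bip, f_conflict) per internode."""
    return sum(internode_certainty(a, b) for a, b in pairs)


print(internode_certainty(75, 25))
```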

  15. Not seeing the forest for the trees: size of the minimum spanning trees (MSTs) forest and branch significance in MST-based phylogenetic analysis.

    Science.gov (United States)

    Teixeira, Andreia Sofia; Monteiro, Pedro T; Carriço, João A; Ramirez, Mário; Francisco, Alexandre P

    2015-01-01

    Trees, including minimum spanning trees (MSTs), are commonly used in phylogenetic studies. However, the research community may not always recognize that a presented tree is just one hypothesis, chosen from among many possible alternatives. In this scenario, it is important to quantify our confidence in both the trees and the branches/edges included in such trees. In this paper, we address this problem for MSTs by introducing a new edge betweenness metric for undirected and weighted graphs. This spanning edge betweenness metric is defined as the fraction of equivalent MSTs in which a given edge is present. The metric provides a per-edge statistic similar to that of the bootstrap approach frequently used in phylogenetics to support the grouping of taxa. We provide methods for the exact computation of this metric based on the well-known Kirchhoff's matrix tree theorem. Moreover, we implement and make available a module for the PHYLOViZ software and evaluate the proposed metric with respect to both effectiveness and computational performance. Analysis of trees generated using multilocus sequence typing (MLST) data and the goeBURST algorithm revealed that the space of possible MSTs in real data sets is extremely large. Selecting the edge to be represented using the bootstrap could lead to unreliable results, since alternative edges are present in the same fraction of equivalent MSTs. The choice of which MST to present results from criteria implemented in the algorithm, and these criteria must be based on biologically plausible models.
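    Kirchhoff's matrix tree theorem states that the number of spanning trees of a connected graph equals any cofactor of its Laplacian. In the special case where all edge weights are equal, every spanning tree is an MST, so the fraction containing edge e is (tau(G) - tau(G without e)) / tau(G). The sketch below covers only that equal-weight case with a simple graph on vertices 0..n-1; the paper's method for weighted graphs with many equivalent MSTs is more involved. Exact rational arithmetic avoids floating-point determinant error, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

Edge = Tuple[int, int]


def count_spanning_trees(n: int, edges: List[Edge]) -> int:
    """Kirchhoff: determinant of the Laplacian with row/column 0 deleted."""
    if n == 1:
        return 1
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    m = [row[1:] for row in lap[1:]]
    size = n - 1
    det = Fraction(1)
    # Gaussian elimination with exact fractions.
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


def spanning_edge_betweenness(n: int, edges: List[Edge]) -> List[Fraction]:
    """Fraction of spanning trees (all MSTs when weights are equal) containing each edge."""
    total = count_spanning_trees(n, edges)
    if total == 0:
        raise ValueError("graph is disconnected")
    return [Fraction(total - count_spanning_trees(n, edges[:k] + edges[k + 1:]), total)
            for k in range(len(edges))]


# A triangle with a pendant vertex: the bridge (0, 3) lies in every spanning tree.
print(spanning_edge_betweenness(4, [(0, 1), (1, 2), (0, 2), (0, 3)]))
```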

  16. LifePrint: a novel k-tuple distance method for construction of phylogenetic trees

    Directory of Open Access Journals (Sweden)

    Fabián Reyes-Prieto

    2011-01-01

    Fabián Reyes-Prieto (1), Adda J García-Chéquer (1), Hueman Jaimes-Díaz (1), Janet Casique-Almazán (1), Juana M Espinosa-Lara (1), Rosaura Palma-Orozco (2), Alfonso Méndez-Tenorio (1), Rogelio Maldonado-Rodríguez (1), Kenneth L Beattie (3). (1) Laboratory of Biotechnology and Genomic Bioinformatics, Department of Biochemistry, National School of Biological Sciences; (2) Superior School of Computer Sciences, National Polytechnic Institute, Mexico City, Mexico; (3) Amerigenics Inc, Crossville, Tennessee, USA. Purpose: Here we describe LifePrint, a sequence alignment-independent k-tuple distance method to estimate relatedness between complete genomes. Methods: We designed a representative sample of all possible DNA tuples of length 9 (9-tuples). The final sample comprises 1878 tuples (called the LifePrint set of 9-tuples; LPS9) that are distinct from each other by at least two internal and noncontiguous nucleotide differences. For validation of our k-tuple distance method, we analyzed several real and simulated viroid genomes. Using different distance metrics, we scrutinized diverse viroid genomes to estimate the k-tuple distances between these genomic sequences. Then we used the estimated genomic k-tuple distances to construct phylogenetic trees using the neighbor-joining algorithm. A comparison of the accuracy of LPS9 and the previously reported 5-tuple method was made using symmetric differences between the trees estimated from each method and a simulated “true” phylogenetic tree. Results: The identified optimal search scheme for LPS9 allows only up to two nucleotide differences between each 9-tuple and the scrutinized genome. Similarity search results of simulated viroid genomes indicate that, in most cases, LPS9 is able to detect single-base substitutions between genomes efficiently. Analysis of simulated genomic variants with a high proportion of base substitutions indicates that LPS9 is able to discern relationships between genomic variants with up to 40% of nucleotide
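    The general idea of an alignment-free k-tuple distance can be sketched as follows. This is a generic illustration, not LifePrint: it uses every overlapping k-mer of each sequence (rather than the curated LPS9 set searched with up to two mismatches) and scores dissimilarity as one minus the Jaccard similarity of the two k-mer sets. It also ignores the circularity of viroid genomes. The resulting pairwise distances could then be passed to a neighbor-joining routine. Function names and the choice of Jaccard are assumptions.

```python
from typing import Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All overlapping k-mers of a (linear) sequence; empty if len(seq) < k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a: str, b: str, k: int = 9) -> float:
    """1 - Jaccard similarity of the two k-mer sets (alignment-free)."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


print(kmer_distance("ACGT", "ACGA", k=2))
```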

  17. Developing a statistically powerful measure for quartet tree inference using phylogenetic identities and Markov invariants.

    Science.gov (United States)

    Sumner, Jeremy G; Taylor, Amelia; Holland, Barbara R; Jarvis, Peter D

    2017-12-01

    Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both these approaches give rise to polynomial functions of sequence site patterns that, in expectation value, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well-understood transformation properties (in the case of Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, and to what extent they can be used as practical tools for inference of phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two different polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our new proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest as, in this case only, the Markov invariants can be expressed as linear combinations of the phylogenetic invariants. A wider implication of this is that, for

  18. Characterizing the phylogenetic tree community structure of a protected tropical rain forest area in Cameroon.

    Science.gov (United States)

    Manel, Stéphanie; Couvreur, Thomas L P; Munoz, François; Couteron, Pierre; Hardy, Olivier J; Sonké, Bonaventure

    2014-01-01

    Tropical rain forests, the richest terrestrial ecosystems in biodiversity on Earth, are highly threatened by global changes. This paper aims to infer the mechanisms governing tree species assemblages by characterizing the phylogenetic structure of a tropical rain forest in a protected area of the Congo Basin, the Dja Faunal Reserve (Cameroon). We re-analyzed a dataset of 11538 individuals belonging to 372 taxa found along nine transects spanning five habitat types. We generated a dated phylogenetic tree including all sampled taxa to partition the phylogenetic diversity of the nine transects into alpha and beta components at the level of the transects and of the habitat types. The variation in phylogenetic composition among transects did not deviate from a random pattern at the scale of the Dja Faunal Reserve, probably due to a common history and weak environmental variation across the park. This lack of phylogenetic structure, combined with an isolation-by-distance pattern of taxonomic diversity, suggests that neutral dispersal limitation is a major driver of community assembly in the Dja. To assess sensitivity to variation in habitat types, we restricted the analyses of transects to the terra firme primary forest and found results consistent with those of the whole dataset at the level of the transects. In addition to previous analyses, we detected a weak but significant phylogenetic turnover among habitat types, suggesting that species sort in varying environments, even though this sorting does not dominate the overall phylogenetic structure. Finer analyses of clades indicated a signal of clustering for species from the Annonaceae family, while species from the Apocynaceae family indicated overdispersion. These results can contribute to the conservation of the park by improving our understanding of the processes dictating community assembly in these hyperdiverse but threatened regions of the world.

  19. Characterizing the Phylogenetic Tree Community Structure of a Protected Tropical Rain Forest Area in Cameroon

    Science.gov (United States)

    Munoz, François; Couteron, Pierre; Hardy, Olivier J.; Sonké, Bonaventure

    2014-01-01

    Tropical rain forests, the richest terrestrial ecosystems in biodiversity on Earth, are highly threatened by global changes. This paper aims to infer the mechanisms governing tree species assemblages by characterizing the phylogenetic structure of a tropical rain forest in a protected area of the Congo Basin, the Dja Faunal Reserve (Cameroon). We re-analyzed a dataset of 11538 individuals belonging to 372 taxa found along nine transects spanning five habitat types. We generated a dated phylogenetic tree including all sampled taxa to partition the phylogenetic diversity of the nine transects into alpha and beta components at the level of the transects and of the habitat types. The variation in phylogenetic composition among transects did not deviate from a random pattern at the scale of the Dja Faunal Reserve, probably due to a common history and weak environmental variation across the park. This lack of phylogenetic structure, combined with an isolation-by-distance pattern of taxonomic diversity, suggests that neutral dispersal limitation is a major driver of community assembly in the Dja. To assess sensitivity to variation in habitat types, we restricted the analyses of transects to the terra firme primary forest and found results consistent with those of the whole dataset at the level of the transects. In addition to previous analyses, we detected a weak but significant phylogenetic turnover among habitat types, suggesting that species sort in varying environments, even though this sorting does not dominate the overall phylogenetic structure. Finer analyses of clades indicated a signal of clustering for species from the Annonaceae family, while species from the Apocynaceae family indicated overdispersion. These results can contribute to the conservation of the park by improving our understanding of the processes dictating community assembly in these hyperdiverse but threatened regions of the world. PMID:24936786
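    The abstract partitions phylogenetic diversity into alpha and beta components but does not name the index used. As an illustration only, the sketch below computes Faith's phylogenetic diversity (PD), a widely used alpha-level measure: the total branch length of the subtree connecting a set of taxa to the root of a dated tree. This is not necessarily the metric the authors used; the `parent` map, node names, and branch lengths are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Each non-root node maps to (parent node, length of the branch above it).
ParentMap = Dict[str, Tuple[str, float]]


def faith_pd(parent: ParentMap, taxa: Iterable[str]) -> float:
    """Sum of branch lengths on the union of root-to-taxon paths (root included)."""
    used = set()
    for node in taxa:
        # Walk towards the root until reaching the root or an already-counted node.
        while node in parent and node not in used:
            used.add(node)
            node = parent[node][0]
    return sum(parent[n][1] for n in used)


# Hypothetical tree: root -> X (2), root -> C (5); X -> A (1), X -> B (1).
tree = {"X": ("root", 2.0), "C": ("root", 5.0), "A": ("X", 1.0), "B": ("X", 1.0)}
print(faith_pd(tree, ["A", "B"]), faith_pd(tree, ["A", "C"]))
```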

  20. Phylogeny and evolutionary histories of Pyrus L. revealed by phylogenetic trees and networks based on data from multiple DNA sequences.

    Science.gov (United States)

    Zheng, Xiaoyan; Cai, Danying; Potter, Daniel; Postman, Joseph; Liu, Jing; Teng, Yuanwen

    2014-11-01

    Reconstructing the phylogeny of Pyrus has been difficult due to the wide distribution of the genus and a lack of informative data. In this study, we collected 110 accessions representing 25 Pyrus species and constructed both phylogenetic trees and phylogenetic networks based on multiple DNA sequence datasets. Phylogenetic trees based on both cpDNA and nuclear LFY2int2-N (LN) data were poorly resolved; in particular, only five primary species were monophyletic in the LN tree. A phylogenetic network of LN suggested that reticulation caused by hybridization is one of the major evolutionary processes for Pyrus species. Polytomies in the gene trees and the star-like structure of cpDNA networks suggested that rapid radiation is another major evolutionary process, especially for the occidental species. Pyrus calleryana and P. regelii were the earliest-diverging Pyrus species. Two North African species, P. cordata, P. spinosa and P. betulaefolia were descendants of primitive-stock Pyrus species and still share some common molecular characters. Southwestern China, where a large number of P. pashia populations are found, is probably the most important diversification center of Pyrus. More accessions and nuclear genes are needed to further understand the evolutionary histories of Pyrus. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased.

    Science.gov (United States)

    Xi, Zhenxiang; Liu, Liang; Davis, Charles C

    2015-11-01

    The development and application of coalescent methods are undergoing rapid changes. One little-explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML tends to infer one particular bifurcating topology even when the relationship is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. (2012) using ultra-conserved elements (UCEs) and flanking DNA. Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014). Copyright © 2015 Elsevier Inc. All rights reserved.
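
    The multilocus bootstrap resamples at two levels: genes are drawn with replacement, and then sites are drawn with replacement within each drawn gene; gene trees and a species tree are re-estimated from every replicate. The Python sketch below shows only the resampling step, assuming each gene is an alignment stored as a dict of equal-length sequences keyed by taxon name. The function names are hypothetical, and tree inference is not shown.

```python
import random

def resample_columns(alignment, rng):
    """Site bootstrap for one gene: draw alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}

def multilocus_bootstrap(alignments, rng):
    """One replicate: resample genes with replacement, then sites within each drawn gene."""
    genes = [rng.choice(alignments) for _ in range(len(alignments))]
    return [resample_columns(aln, rng) for aln in genes]
```

    Passing an explicit `random.Random` instance keeps replicates reproducible; each replicate would then be handed to the gene tree and species tree estimation steps.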

  2. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    Science.gov (United States)

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The explosive growth of next-generation sequencing data has exposed a shortage of efficient approaches for aligning ultra-large sets of biological sequences of different types. Distributed and parallel computing is a crucial technique for accelerating ultra-large sequence analyses (e.g., files larger than 1 GB). Based on HAlign and the Spark distributed computing system, we implement HAlign-II, a highly cost-efficient and time-efficient tool for ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large DNA and protein data sets, with files larger than 1 GB, showed that HAlign-II saves time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign.

  3. SICLE: a high-throughput tool for extracting evolutionary relationships from phylogenetic trees.

    Science.gov (United States)

    DeBlasio, Dan F; Wisecaver, Jennifer H

    2016-01-01

    We present the phylogeny analysis software SICLE (Sister Clade Extractor), an easy-to-use, high-throughput tool to describe the nearest neighbors to a node of interest in a phylogenetic tree as well as the support value for the relationship. The application is a command line utility that can be embedded into a phylogenetic analysis pipeline or can be used as a subroutine within another C++ program. As a test case, we applied this new tool to the published phylome of Salinibacter ruber, a species of halophilic Bacteroidetes, identifying 13 unique sister relationships to S. ruber across the 4,589 gene phylogenies. S. ruber grouped with bacteria, most often other Bacteroidetes, in the majority of phylogenies, but 91 phylogenies showed a branch-supported sister association between S. ruber and Archaea, an evolutionarily intriguing relationship indicative of horizontal gene transfer. This test case demonstrates how SICLE makes it possible to summarize the phylogenetic information produced by automated phylogenetic pipelines to rapidly identify and quantify the possible evolutionary relationships that merit further investigation. SICLE is available for free for noncommercial use at http://eebweb.arizona.edu/sicle/.
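
    SICLE itself is a C++ command-line tool that also reports support values. The Python sketch below only illustrates the core idea of extracting the sister group of a node of interest, assuming a rooted tree written as nested tuples with leaf names as strings. The helper names are hypothetical, and support values are ignored.

```python
def leaves(node):
    """Leaf names below `node` (a leaf is a str, an internal node is a tuple of children)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))

def sister_clade(tree, target):
    """Leaf set of the sister group of leaf `target`, or None if `target` is not in the tree."""
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if child == target:
            # The sister group is everything else hanging from the same parent node.
            siblings = tree[:i] + tree[i + 1:]
            return set().union(*(leaves(s) for s in siblings))
        found = sister_clade(child, target)
        if found is not None:
            return found
    return None
```

    For example, in the tree `((("A", "B"), "C"), ("D", "E"))` the sister clade of `"C"` is `{"A", "B"}`. Running this over many gene trees and tallying the results is the kind of summary the abstract describes.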

  4. SICLE: a high-throughput tool for extracting evolutionary relationships from phylogenetic trees

    Directory of Open Access Journals (Sweden)

    Dan F. DeBlasio

    2016-08-01

    We present the phylogeny analysis software SICLE (Sister Clade Extractor), an easy-to-use, high-throughput tool to describe the nearest neighbors to a node of interest in a phylogenetic tree as well as the support value for the relationship. The application is a command line utility that can be embedded into a phylogenetic analysis pipeline or can be used as a subroutine within another C++ program. As a test case, we applied this new tool to the published phylome of Salinibacter ruber, a species of halophilic Bacteroidetes, identifying 13 unique sister relationships to S. ruber across the 4,589 gene phylogenies. S. ruber grouped with bacteria, most often other Bacteroidetes, in the majority of phylogenies, but 91 phylogenies showed a branch-supported sister association between S. ruber and Archaea, an evolutionarily intriguing relationship indicative of horizontal gene transfer. This test case demonstrates how SICLE makes it possible to summarize the phylogenetic information produced by automated phylogenetic pipelines to rapidly identify and quantify the possible evolutionary relationships that merit further investigation. SICLE is available for free for noncommercial use at http://eebweb.arizona.edu/sicle/.

  5. Local-scale Partitioning of Functional and Phylogenetic Beta Diversity in a Tropical Tree Assemblage.

    Science.gov (United States)

    Yang, Jie; Swenson, Nathan G; Zhang, Guocheng; Ci, Xiuqin; Cao, Min; Sha, Liqing; Li, Jie; Ferry Slik, J W; Lin, Luxiang

    2015-08-03

    The relative degree to which stochastic and deterministic processes underpin community assembly is a central problem in ecology. Quantifying local-scale phylogenetic and functional beta diversity may shed new light on this problem. We used species distribution, soil, trait and phylogenetic data to quantify whether environmental distance, geographic distance or their combination are the strongest predictors of phylogenetic and functional beta diversity on local scales in a 20-ha tropical seasonal rainforest dynamics plot in southwest China. The patterns of phylogenetic and functional beta diversity were generally consistent. The phylogenetic and functional dissimilarity between subplots (10 × 10 m, 20 × 20 m, 50 × 50 m and 100 × 100 m) was often higher than that expected by chance. The turnover of lineages and species functions within habitats was generally slower than that across habitats. Partitioning the variation in phylogenetic and functional beta diversity showed that environmental distance was generally a better predictor of beta diversity than geographic distance, thereby lending relatively more support to deterministic environmental filtering than to stochastic processes. Overall, our results highlight that deterministic processes play a stronger role than stochastic processes in structuring community composition in this diverse assemblage of tropical trees.

  6. Characterization of a branch of the phylogenetic tree

    International Nuclear Information System (INIS)

    Samuel, Stuart A.; Weng, Gezhi

    2003-04-01

    We use a combination of analytic models and computer simulations to gain insight into the dynamics of evolution. Our results suggest that certain interesting phenomena should eventually emerge from the fossil record. For example, there should be a 'tortoise and hare effect': those genera with the smallest species death rate are likely to survive much longer than genera with large species birth and death rates. A complete characterization of the behavior of a branch of the phylogenetic tree corresponding to a genus and accurate mathematical representations of the various stages are obtained. We apply our results to address certain controversial issues that have arisen in paleontology, such as the importance of punctuated equilibrium and whether unique Cambrian phyla have survived to the present.

  7. Building Phylogenetic Trees from DNA Sequence Data: Investigating Polar Bear and Giant Panda Ancestry.

    Science.gov (United States)

    Maier, Caroline Alexandra

    2001-01-01

    Presents an activity in which students seek answers to questions about evolutionary relationships by using genetic databases and bioinformatics software. Students build genetic distance matrices and phylogenetic trees based on molecular sequence data using web-based resources. Provides a flowchart of steps involved in accessing, retrieving, and…
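
    The activity relies on web-based tools to build genetic distance matrices. As an illustration of what such a tool computes, the Python sketch below builds a matrix from already aligned sequences using the Jukes-Cantor (JC69) correction; the choice of model is an assumption, since the activity description does not name one, and the function names are hypothetical.

```python
import math

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ (gaps get no special treatment)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jukes_cantor(p):
    """JC69-corrected distance d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.75 * math.log(1 - 4 * p / 3)

def distance_matrix(seqs):
    """Taxon names (sorted) and the symmetric matrix of JC69 distances between them."""
    names = sorted(seqs)
    matrix = [[jukes_cantor(p_distance(seqs[a], seqs[b])) for b in names] for a in names]
    return names, matrix
```

    A matrix like this is the input to distance-based tree building such as neighbor joining. The correction matters because raw p-distances underestimate divergence once multiple substitutions hit the same site.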

  8. The shape and temporal dynamics of phylogenetic trees arising from geographic speciation.

    Science.gov (United States)

    Pigot, Alex L; Phillimore, Albert B; Owens, Ian P F; Orme, C David L

    2010-12-01

    Phylogenetic trees often depart from the expectations of stochastic models, exhibiting imbalance in diversification among lineages and slowdowns in the rate of lineage accumulation through time. Such departures have led to a widespread perception that ecological differences among species or adaptation and subsequent niche filling are required to explain patterns of diversification. However, a key element missing from models of diversification is the geographical context of speciation and extinction. In this study, we develop a spatially explicit model of geographic range evolution and cladogenesis, where speciation arises via vicariance or peripatry, and explore the effects of these processes on patterns of diversification. We compare the results with those observed in 41 reconstructed avian trees. Our model shows that nonconstant rates of speciation and extinction are emergent properties of the apportioning of geographic ranges that accompanies speciation. The dynamics of diversification exhibit wide variation, depending on the mode of speciation, tendency for range expansion, and rate of range evolution. By varying these parameters, the model is able to capture many, but not all, of the features exhibited by birth-death trees and extant bird clades. Under scenarios with relatively stable geographic ranges, strong slowdowns in diversification rates are produced, with faster rates of range dynamics leading to constant or accelerating rates of apparent diversification. A peripatric model of speciation with stable ranges also generates highly unbalanced trees typical of bird phylogenies but fails to produce realistic range size distributions among the extant species. Results most similar to those of a birth-death process are reached under a peripatric speciation scenario with highly volatile range dynamics. Taken together, our results demonstrate that considering the geographical context of speciation and extinction provides a more conservative null model of

  9. The Efficacy of Consensus Tree Methods for Summarizing Phylogenetic Relationships from a Posterior Sample of Trees Estimated from Morphological Data.

    Science.gov (United States)

    O'Reilly, Joseph E; Donoghue, Philip C J

    2018-03-01

    Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarize the posterior distribution. Conversely, majority-rule consensus (MRC) trees contain a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data.
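
    A majority-rule consensus keeps every clade that appears in more than half of the sampled trees. Any two such clades must co-occur in at least one sampled tree, so they are compatible and together define a (possibly unresolved) tree. The Python sketch below assumes each posterior tree has already been reduced to its set of clades, written as frozensets of taxon names; the function name is hypothetical, and summaries of branch lengths or ages are not shown.

```python
from collections import Counter

def majority_rule_clades(tree_sample, threshold=0.5):
    """Map each clade found in more than `threshold` of the sampled trees to its frequency.

    Each element of `tree_sample` is an iterable of clades (frozensets of taxon names).
    """
    counts = Counter()
    for clades in tree_sample:
        counts.update(set(clades))  # count a clade at most once per tree
    n = len(tree_sample)
    return {clade: k / n for clade, k in counts.items() if k / n > threshold}
```

    The returned frequencies are the clade posterior probabilities usually printed on an MRC tree. Clades at exactly 50% are excluded, which is what guarantees mutual compatibility.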

  10. A method for investigating relative timing information on phylogenetic trees.

    Science.gov (United States)

    Ford, Daniel; Matsen, Frederick A; Stadler, Tanja

    2009-04-01

    In this paper, we present a new way to describe the timing of branching events in phylogenetic trees. Our description is in terms of the relative timing of diversification events between sister clades; as such it is complementary to existing methods using lineages-through-time plots which consider diversification in aggregate. The method can be applied to look for evidence of diversification happening in lineage-specific "bursts", or the opposite, where diversification between 2 clades happens in an unusually regular fashion. In order to be able to distinguish interesting events from stochasticity, we discuss 2 classes of neutral models on trees with relative timing information and develop a statistical framework for testing these models. These model classes include both the coalescent with ancestral population size variation and global rate speciation-extinction models. We end the paper with 2 example applications: first, we show that the evolution of the hepatitis C virus deviates from the coalescent with arbitrary population size. Second, we analyze a large tree of ants, demonstrating that a period of elevated diversification rates does not appear to have occurred in a bursting manner.

  11. TreePlus: interactive exploration of networks with enhanced tree layouts.

    Science.gov (United States)

    Lee, Bongshin; Parr, Cynthia S; Plaisant, Catherine; Bederson, Benjamin B; Veksler, Vladislav D; Gray, Wayne D; Kotfila, Christopher

    2006-01-01

    Despite extensive research, it is still difficult to produce effective interactive layouts for large graphs. Dense layout and occlusion make food webs, ontologies, and social networks difficult to understand and interact with. We propose a new interactive Visual Analytics component called TreePlus that is based on a tree-style layout. TreePlus reveals the missing graph structure with visualization and interaction while maintaining good readability. To support exploration of the local structure of the graph and gathering of information from the extensive reading of labels, we use a guiding metaphor of "Plant a seed and watch it grow." It allows users to start with a node and expand the graph as needed, which complements the classic overview techniques that can be effective at (but often limited to) revealing clusters. We describe our design goals, describe the interface, and report on a controlled user study with 28 participants comparing TreePlus with a traditional graph interface for six tasks. In general, the advantage of TreePlus over the traditional interface increased as the density of the displayed data increased. Participants also reported higher levels of confidence in their answers with TreePlus and most of them preferred TreePlus.
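
    The "plant a seed and watch it grow" idea can be illustrated by growing a spanning tree breadth-first from a chosen node, attaching each newly reached node under the node that first reached it and stopping at a depth limit that can be raised to expand further. This Python sketch is an assumption about one way to derive such a tree-style view from an adjacency list; it is not TreePlus's actual layout algorithm, and it does not show the non-tree links that such a layout hides.

```python
from collections import deque

def grow_tree(graph, seed, max_depth):
    """Breadth-first spanning tree from `seed`, limited to `max_depth` levels.

    `graph` maps each node to an iterable of neighbours; the result maps each
    reached node to its parent in the tree (the seed maps to None).
    """
    parent = {seed: None}
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the current depth limit
        for nbr in graph.get(node, ()):
            if nbr not in parent:  # first visit fixes the node's place in the tree
                parent[nbr] = node
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return parent
```

    Calling `grow_tree` again with a larger `max_depth` mimics expanding the view on demand, which complements overview techniques that show the whole graph at once.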

  12. Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

    Science.gov (United States)

    Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat

    2016-11-28

    At present, coding sequences (CDS) are being discovered and ever larger CDS are frequently revealed. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference using public-access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), together with our own locally deployed web services. The workflow input is a set of CDS in FASTA format. The workflow supports 1,000 to 20,000 bootstrap replicates. The workflow infers trees with the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package, based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two ways, using Soaplab2 and Apache Axis2. These SOAP and Java Web Service (JWS) services provide WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This new integrated automatic workflow will be beneficial to bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.

  13. Bears in a forest of gene trees: phylogenetic inference is complicated by incomplete lineage sorting and gene flow.

    Science.gov (United States)

    Kutschera, Verena E; Bidon, Tobias; Hailer, Frank; Rodi, Julia L; Fain, Steven R; Janke, Axel

    2014-08-01

    Ursine bears are a mammalian subfamily that comprises six morphologically and ecologically distinct extant species. Previous phylogenetic analyses of concatenated nuclear genes could not resolve all relationships among bears, and appeared to conflict with the mitochondrial phylogeny. Evolutionary processes such as incomplete lineage sorting and introgression can cause gene tree discordance and complicate phylogenetic inferences, but are not accounted for in phylogenetic analyses of concatenated data. We generated a high-resolution data set of autosomal introns from several individuals per species and of Y-chromosomal markers. Incorporating intraspecific variability in coalescence-based phylogenetic and gene flow estimation approaches, we traced the genealogical history of individual alleles. Considerable heterogeneity among nuclear loci and discordance between nuclear and mitochondrial phylogenies were found. A species tree with divergence time estimates indicated that ursine bears diversified within less than 2 My. Consistent with a complex branching order within a clade of Asian bear species, we identified unidirectional gene flow from Asian black into sloth bears. Moreover, gene flow detected from brown into American black bears can explain the conflicting placement of the American black bear in mitochondrial and nuclear phylogenies. These results highlight that both incomplete lineage sorting and introgression are prominent evolutionary forces even on time scales up to several million years. Complex evolutionary patterns are not adequately captured by strictly bifurcating models, and can only be fully understood when analyzing multiple independently inherited loci in a coalescence framework. Phylogenetic incongruence among gene trees hence needs to be recognized as a biologically meaningful signal. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  14. Sorting through the chaff, nDNA gene trees for phylogenetic inference and hybrid identification of annual sunflowers (Helianthus sect. Helianthus).

    Science.gov (United States)

    Moody, Michael L; Rieseberg, Loren H

    2012-07-01

    The annual sunflowers (Helianthus sect. Helianthus) present a formidable challenge for phylogenetic inference because of ancient hybrid speciation, recent introgression, and suspected issues with deep coalescence. Here we analyze sequence data from 11 nuclear DNA (nDNA) genes for multiple genotypes of species within the section to (1) reconstruct the phylogeny of this group, (2) explore the utility of nDNA gene trees for detecting hybrid speciation and introgression, and (3) test an empirical method of hybrid identification based on the phylogenetic congruence of nDNA gene trees from tightly linked genes. We uncovered considerable topological heterogeneity among gene trees, with or without three previously identified hybrid species included in the analyses, as well as a general lack of reciprocal monophyly of species. Nonetheless, partitioned Bayesian analyses provided strong support for the reciprocal monophyly of all species except H. annuus (0.89 PP), the most widespread and abundant annual sunflower. Previous hypotheses of relationships among taxa were generally strongly supported (1.0 PP), except among taxa typically associated with H. annuus, apparently due to the paraphyly of the latter in all gene trees. While the individual nDNA gene trees provided a useful means for detecting recent hybridization, identification of ancient hybridization was problematic for all ancient hybrid species, even when linkage was considered. We discuss biological factors that affect the efficacy of phylogenetic methods for hybrid identification.
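
    Reciprocal monophyly, as used above, means that the sampled individuals of each species form a clade that excludes all other individuals. The Python sketch below checks this on a single rooted gene tree written as nested tuples of individual labels; the labels and function names are hypothetical, and the study's partitioned Bayesian analyses are not reproduced. Unrooted trees would first need a root.

```python
def collect_clades(node, out):
    """Add the leaf set of `node` and of every node below it to `out`; return the leaf set of `node`."""
    if isinstance(node, str):  # a leaf
        leaf_set = frozenset([node])
    else:  # an internal node: union of its children's leaf sets
        leaf_set = frozenset().union(*(collect_clades(child, out) for child in node))
    out.add(leaf_set)
    return leaf_set

def is_monophyletic(tree, taxa):
    """True if `taxa` is exactly the leaf set of some node in the rooted tree."""
    found = set()
    collect_clades(tree, found)
    return frozenset(taxa) in found

def reciprocally_monophyletic(tree, taxa_a, taxa_b):
    """True if both groups are monophyletic in the same rooted gene tree."""
    return is_monophyletic(tree, taxa_a) and is_monophyletic(tree, taxa_b)
```

    In the tree `(("a1", "a2"), ("p1", ("p2", "b1")))`, the group `{"p1", "p2"}` is paraphyletic because the smallest clade containing both also contains `"b1"`, so the check returns False.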

  15. Phylogenetic diversity and relationships among species of genus ...

    African Journals Online (AJOL)

    Fifty-six Nicotiana species were used to construct phylogenetic trees and to assess the genetic relationships between them. Genetic distances estimated from RAPD analysis were used to construct phylogenetic trees using the Phylogenetic Inference Package (PHYLIP). Since phylogenetic relationships estimated for closely ...

  16. Phylogeny and evolutionary histories of Pyrus L. revealed by phylogenetic trees and networks based on data from multiple DNA sequences

    Science.gov (United States)

    Reconstructing the phylogeny of Pyrus has been difficult due to the wide distribution of the genus and lack of informative data. In this study, we collected 110 accessions representing 25 Pyrus species and constructed both phylogenetic trees and phylogenetic networks based on multiple DNA sequence d...

  17. Phylogenetic Structure of Foliar Spectral Traits in Tropical Forest Canopies

    Directory of Open Access Journals (Sweden)

    Kelly M. McManus

    2016-02-01

    The Spectranomics approach to tropical forest remote sensing has established a link between foliar reflectance spectra and the phylogenetic composition of tropical canopy tree communities vis-à-vis the taxonomic organization of biochemical trait variation. However, a direct relationship between phylogenetic affiliation and foliar reflectance spectra of species has not been established. We sought to develop this relationship by quantifying the extent to which underlying patterns of phylogenetic structure drive interspecific variation among foliar reflectance spectra within three Neotropical canopy tree communities with varying levels of soil fertility. We interpreted the resulting spectral patterns of phylogenetic signal in the context of foliar biochemical traits that may contribute to the spectral-phylogenetic link. We utilized a multi-model ensemble to elucidate trait-spectral relationships, and quantified phylogenetic signal for spectral wavelengths and traits using Pagel’s lambda statistic. Foliar reflectance spectra showed evidence of phylogenetic influence primarily within the visible and shortwave infrared spectral regions. These regions were also selected by the multi-model ensemble as those most important to the quantitative prediction of several foliar biochemical traits. Patterns of phylogenetic organization of spectra and traits varied across sites and with soil fertility, indicative of the complex interactions between the environmental and phylogenetic controls underlying patterns of biodiversity.
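
    Under a Brownian-motion model, the trait covariance between two species is proportional to the branch length they share from the root. Pagel's lambda multiplies every off-diagonal covariance by λ while leaving the variances unchanged, so λ = 1 keeps the Brownian-motion expectation and λ = 0 removes phylogenetic structure (a star phylogeny). The Python sketch below shows only this transformation; estimating λ, which requires maximizing a likelihood over λ, is not shown, and the toy matrix is hypothetical.

```python
def lambda_transform(V, lam):
    """Pagel's lambda: scale off-diagonal covariances by `lam`, keep the diagonal (variances).

    `V` is a square phylogenetic covariance matrix given as a list of lists.
    """
    if not 0.0 <= lam <= 1.0:  # this sketch restricts lambda to [0, 1]
        raise ValueError("lambda must lie in [0, 1] in this sketch")
    n = len(V)
    return [[V[i][j] if i == j else lam * V[i][j] for j in range(n)] for i in range(n)]
```

    For the ultrametric tree ((A:1, B:1):1, C:2), the covariance matrix is [[2, 1, 0], [1, 2, 0], [0, 0, 2]]: A and B share one unit of branch length from the root, and C shares none. With λ = 0.5, the A-B covariance drops to 0.5, weakening the expected similarity between close relatives.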

  18. Phylogenetic assemblage structure of North American trees is more strongly shaped by glacial-interglacial climate variability in gymnosperms than in angiosperms.

    Science.gov (United States)

    Ma, Ziyu; Sandel, Brody; Svenning, Jens-Christian

    2016-05-01

    How fast does biodiversity respond to climate change? The relationship of past and current climate with phylogenetic assemblage structure helps us to understand this question. Studies of angiosperm tree diversity in North America have already suggested effects of current water-energy balance and tropical niche conservatism. However, the role of glacial-interglacial climate variability remains to be determined, and little is known about any of these relationships for gymnosperms. Moreover, phylogenetic endemism, the concentration of unique lineages in restricted ranges, may also be related to glacial-interglacial climate variability and needs more attention. We used a refined phylogeny of both angiosperms and gymnosperms to map phylogenetic diversity, clustering and endemism of North American trees in 100-km grid cells, and used climate change velocity since the Last Glacial Maximum together with postglacial accessibility to recolonization to quantify glacial-interglacial climate variability. We found: (1) Current climate is the dominant factor explaining the overall patterns, with more clustered angiosperm assemblages toward lower temperature, consistent with tropical niche conservatism. (2) Long-term climate stability is associated with higher angiosperm endemism, while higher postglacial accessibility is linked to more phylogenetic clustering and endemism in gymnosperms. (3) Factors linked to glacial-interglacial climate change have stronger effects on gymnosperms than on angiosperms. These results suggest that paleoclimate legacies supplement current climate in shaping phylogenetic patterns in North American trees, especially for gymnosperms.
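
    The abstract does not state which phylogenetic diversity metric was mapped. A widely used one is Faith's PD, the total branch length of the part of the tree that connects a set of taxa, here taken to include the path to the root. The Python sketch below assumes a rooted tree stored as a mapping from each non-root node to its parent and branch length; it illustrates the metric and is not presented as the study's method.

```python
def faith_pd(parent, assemblage):
    """Faith's PD: total length of the branches linking the taxa in `assemblage` to the root.

    `parent` maps each non-root node to (parent_node, branch_length).
    """
    edges = {}
    for taxon in assemblage:
        node = taxon
        while node in parent:  # climb until reaching the root, which has no entry
            up, length = parent[node]
            edges[node] = length  # key each branch by its child node so shared branches count once
            node = up
    return sum(edges.values())
```

    For the tree ((A:1, B:1):1, C:2), the assemblage {A, B} has PD 3 (two tip branches plus their shared stem), while {A, C} has PD 4 because the two taxa share no branch. In a grid-cell analysis, the function would be applied to the species list of each cell.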

  19. FluReF, an automated flu virus reassortment finder based on phylogenetic trees.

    Science.gov (United States)

    Yurovsky, Alisa; Moret, Bernard M E

    2011-01-01

    Reassortments are events in the evolution of the genome of influenza (flu), whereby segments of the genome are exchanged between different strains. As reassortments have been implicated in major human pandemics of the last century, their identification has become a health priority. While such identification can be done "by hand" on a small dataset, researchers and health authorities are building up enormous databases of genomic sequences for every flu strain, so it is imperative to develop automated identification methods. However, current methods are limited to pairwise segment comparisons. We present FluReF, a fully automated flu virus reassortment finder. FluReF is inspired by the visual approach to reassortment identification and uses the reconstructed phylogenetic trees of the individual segments and of the full genome. We also present a simple flu evolution simulator, based on the current source-sink hypothesis for flu cycles. On synthetic datasets produced by our simulator, FluReF, tuned for a 0% false positive rate, yielded false negative rates of less than 10%. FluReF corroborated two new reassortments identified by visual analysis of 75 human H3N2 New York flu strains from 2005-2008 and gave partial verification of reassortments found using another bioinformatics method. FluReF finds reassortments by a bottom-up search of the full-genome and segment-based phylogenetic trees for candidate clades (groups of one or more sampled viruses that are separated from the other variants of the same season). Candidate clades in each tree are tested to guarantee confidence values, using the lengths of key edges as well as other tree parameters; clades with reassortments must have validated incongruencies among segment trees. FluReF demonstrates robust prediction for geographically and temporally expanded datasets and is not limited to finding reassortments with previously collected sequences. The complete source code is available from http://lcbb.epfl.ch/software.html.
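
    A basic building block for comparing segment trees is testing whether a clade in one tree is incompatible with some clade in another tree, meaning that the two clades overlap without one containing the other. This Python sketch assumes the trees have already been reduced to sets of clades (frozensets of virus names) and flags such conflicts; the function names are hypothetical, and FluReF's additional validation of candidate clades using edge lengths and other tree parameters is not reproduced.

```python
def compatible(x, y):
    """Two clades are compatible if they are disjoint or one contains the other."""
    return x.isdisjoint(y) or x <= y or y <= x

def conflicting_clades(clades_a, clades_b):
    """Clades of tree A that are incompatible with at least one clade of tree B."""
    return {x for x in clades_a if any(not compatible(x, y) for y in clades_b)}
```

    For example, a full-genome tree grouping v1 with v2 conflicts with a segment tree grouping v2 with v3, which is the kind of incongruence that would make the clade a reassortment candidate.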

  20. Applying species-tree analyses to deep phylogenetic histories: challenges and potential suggested from a survey of empirical phylogenetic studies.

    Science.gov (United States)

    Lanier, Hayley C; Knowles, L Lacey

    2015-02-01

    Coalescent-based methods for species-tree estimation are becoming a dominant approach for reconstructing species histories from multi-locus data, with most of the studies examining these methodologies focused on recently diverged species. However, deeper phylogenies, such as the datasets that comprise many Tree of Life (ToL) studies, also exhibit gene-tree discordance. This discord may also arise from the stochastic sorting of gene lineages during the speciation process (i.e., reflecting the random coalescence of gene lineages in ancestral populations). It remains unknown whether guidelines regarding methodologies and numbers of loci established by simulation studies at shallow tree depths translate into accurate species relationships for deeper phylogenetic histories. We address this knowledge gap and specifically identify the challenges and limitations of species-tree methods that account for coalescent variance for deeper phylogenies. Using simulated data with characteristics informed by empirical studies, we evaluate both the accuracy of estimated species trees and the characteristics associated with recalcitrant nodes, with a specific focus on whether coalescent variance is generally responsible for the lack of resolution. By determining the proportion of coalescent genealogies that support a particular node, we demonstrate that (1) species-tree methods account for coalescent variance at deep nodes and (2) mutational variance, not gene-tree discord arising from the coalescent, posed the primary challenge for accurate reconstruction across the tree. For example, many nodes were accurately resolved despite predicted discord from the random coalescence of gene lineages, and nodes with poor support were distributed across a range of depths (i.e., they were not restricted to particular recent divergences). Given their broad taxonomic scope and large sampling of taxa, deep-level phylogenies pose several potential methodological complications including

  1. Trends over time in tree and seedling phylogenetic diversity indicate regional differences in forest biodiversity change.

    Science.gov (United States)

    Potter, Kevin M; Woodall, Christopher W

    2012-03-01

    Changing climate conditions may impact the short-term ability of forest tree species to regenerate in many locations. In the longer term, tree species may be unable to persist in some locations while they become established in new places. Over both time frames, forest tree biodiversity may change in unexpected ways. Using repeated inventory measurements five years apart from more than 7000 forested plots in the eastern United States, we tested three hypotheses: phylogenetic diversity is substantially different from species richness as a measure of biodiversity; forest communities have undergone recent changes in phylogenetic diversity that differ by size class, region, and seed dispersal strategy; and these patterns are consistent with expected early effects of climate change. Specifically, the magnitude of diversity change across broad regions should be greater among seedlings than in trees, should be associated with latitude and elevation, and should be greater among species with high dispersal capacity. Our analyses demonstrated that phylogenetic diversity and species richness are decoupled at small and medium scales and are imperfectly associated at large scales. This suggests that it is appropriate to apply indicators of biodiversity change based on phylogenetic diversity, which account for evolutionary relationships among species and may better represent community functional diversity. Our results also detected broadscale patterns of forest biodiversity change that are consistent with expected early effects of climate change. First, the statistically significant increase over time in seedling diversity in the South suggests that conditions there have become more favorable for the reproduction and dispersal of a wider variety of species, whereas the significant decrease in northern seedling diversity indicates that northern conditions have become less favorable. Second, we found weak correlations between seedling diversity change and latitude in both zones
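The abstract does not say which phylogenetic diversity index was used. As an illustration of why phylogenetic diversity and species richness can be decoupled, here is a sketch of Faith's PD (the total branch length connecting a set of species to the root), computed on a small hypothetical tree.

```python
def faith_pd(parent, branch_length, species):
    """Faith's phylogenetic diversity: total length of the branches linking
    `species` to the root. `parent` maps each node to its parent and
    `branch_length` maps each node to the length of the branch above it."""
    used = set()
    for node in species:
        # Walk toward the root; stop early once we reach an already-counted branch.
        while node in parent and node not in used:
            used.add(node)
            node = parent[node]
    return sum(branch_length[n] for n in used)

# Hypothetical tree: root -> X (2.0), root -> C (5.0), X -> A (1.0), X -> B (3.0)
parent = {"A": "X", "B": "X", "X": "root", "C": "root"}
branch_length = {"A": 1.0, "B": 3.0, "X": 2.0, "C": 5.0}

# Same species richness (2), different phylogenetic diversity.
print(faith_pd(parent, branch_length, {"A", "B"}))  # 6.0
print(faith_pd(parent, branch_length, {"A", "C"}))  # 8.0
```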

  2. TreePics: visualizing trees with pictures

    Directory of Open Access Journals (Sweden)

    Nicolas Puillandre

    2017-09-01

    While many programs are available to edit phylogenetic trees, none offers an efficient and automatic way to associate pictures with branch tips. Here, we present TreePics, a standalone software that uses a web browser to visualize phylogenetic trees in Newick format and that associates pictures (typically, pictures of the voucher specimens) with the tip of each branch. Pictures are visualized as thumbnails and can be enlarged by a mouse rollover. Further, several pictures can be selected and displayed in a separate window for visual comparison. TreePics works either online or in a full standalone version, where it can display trees with several thousand pictures (depending on the memory available). We argue that TreePics can be particularly useful in a preliminary stage of research, such as to quickly detect conflicts between a DNA-based phylogenetic tree and morphological variation, which may be due to contamination that needs to be removed prior to final analyses, or to the presence of species complexes.
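How TreePics itself reads trees is not described here. As a rough sketch of the first step any such tool needs, the code below extracts tip labels from a simple Newick string and pairs each with a picture file. The pictures/<tip>.jpg naming convention and the function names are hypothetical, and quoted labels or Newick comments are not handled.

```python
import re

def tip_labels(newick):
    """Tip labels of a simple Newick string. A tip label directly follows
    '(' or ','; labels after ')' belong to internal nodes and are skipped."""
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)

def picture_paths(newick, folder="pictures", extension=".jpg"):
    """Hypothetical convention: one picture per tip, named after the tip label."""
    return {tip: f"{folder}/{tip}{extension}" for tip in tip_labels(newick)}

print(picture_paths("((V001:0.1,V002:0.2)95:0.3,V003:0.4);"))
# {'V001': 'pictures/V001.jpg', 'V002': 'pictures/V002.jpg', 'V003': 'pictures/V003.jpg'}
```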

  3. Molecular Phylogenetics: Concepts for a Newcomer.

    Science.gov (United States)

    Ajawatanawong, Pravech

    Molecular phylogenetics is the study of evolutionary relationships among organisms using molecular sequence data. The aim of this review is to introduce the important terminology and general concepts of tree reconstruction to biologists who lack a strong background in the field of molecular evolution. Some modern phylogenetic programs are easy to use because of their user-friendly interfaces, but understanding the phylogenetic algorithms and substitution models, which are based on advanced statistics, is still important for the analysis and interpretation without a guide. Briefly, there are five general steps in carrying out a phylogenetic analysis: (1) sequence data preparation, (2) sequence alignment, (3) choosing a phylogenetic reconstruction method, (4) identification of the best tree, and (5) evaluating the tree. Concepts in this review enable biologists to grasp the basic ideas behind phylogenetic analysis and also help provide a sound basis for discussions with expert phylogeneticists.
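As a concrete illustration of steps (2) and (3) with the simplest distance-based choice, the sketch below computes the uncorrected p-distance between two aligned sequences and its Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3). JC69 is only one of many substitution models, and the helper names are illustrative.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)

def jc69_distance(seq1, seq2):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated; JC69 distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)

print(p_distance("ACGTACGT", "ACGTACGA"))     # 0.125
print(jc69_distance("ACGTACGT", "ACGTACGA"))  # about 0.137
```

The correction always exceeds p because it accounts for multiple substitutions at the same site.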

  4. Maximum likelihood phylogenetic reconstruction from high-resolution whole-genome data and a tree of 68 eukaryotes.

    Science.gov (United States)

    Lin, Yu; Hu, Fei; Tang, Jijun; Moret, Bernard M E

    2013-01-01

    The rapid accumulation of whole-genome data has renewed interest in the study of the evolution of genomic architecture under events such as rearrangements, duplications, and losses. Comparative genomics, evolutionary biology, and cancer research all require tools to elucidate the mechanisms, history, and consequences of those evolutionary events, while phylogenetics could use whole-genome data to enhance its picture of the Tree of Life. Current approaches in the area of phylogenetic analysis are limited to very small collections of closely related genomes using low-resolution data (typically a few hundred syntenic blocks); moreover, these approaches typically do not include duplication and loss events. We describe a maximum likelihood (ML) approach for phylogenetic analysis that takes into account genome rearrangements as well as duplications, insertions, and losses. Our approach can handle high-resolution genomes (with 40,000 or more markers) and can use in the same analysis genomes with very different numbers of markers. Because our approach uses a standard ML reconstruction program (RAxML), it scales up to large trees. We present the results of extensive testing on both simulated and real data showing that our approach returns very accurate results very quickly. In particular, we analyze a dataset of 68 high-resolution eukaryotic genomes, with 3,000 to 42,000 genes each, from the eGOB database; the analysis, including bootstrapping, takes just 3 hours on a desktop system and returns a tree in agreement with all well-supported branches, while also suggesting resolutions for some disputed placements.
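The abstract does not describe how gene-order data are passed to RAxML. One common encoding, assumed here rather than taken from the paper, represents each genome as a binary string recording the presence or absence of every gene adjacency observed in any genome. The sketch below handles only linear chromosomes with signed gene identifiers and ignores duplicated genes.

```python
def extremities(gene):
    """(left, right) extremities of a signed gene as read along the chromosome:
    +g reads tail then head, -g reads head then tail."""
    tail, head = (abs(gene), "t"), (abs(gene), "h")
    return (tail, head) if gene > 0 else (head, tail)

def adjacencies(chromosome):
    """Adjacencies of one linear chromosome such as [1, -2, 3]."""
    return {
        frozenset((extremities(a)[1], extremities(b)[0]))
        for a, b in zip(chromosome, chromosome[1:])
    }

def binary_characters(genomes):
    """Encode each genome (a list of chromosomes) as a 0/1 string over all observed adjacencies."""
    per_genome = {
        name: set().union(*(adjacencies(c) for c in chromosomes))
        for name, chromosomes in genomes.items()
    }
    columns = sorted(set().union(*per_genome.values()), key=sorted)
    rows = {
        name: "".join("1" if col in adj else "0" for col in columns)
        for name, adj in per_genome.items()
    }
    return rows, columns

rows, columns = binary_characters({"G1": [[1, 2, 3]], "G2": [[1, -2, 3]]})
print(rows)  # {'G1': '0110', 'G2': '1001'}
```

Strings like these can be given to any ML program that accepts binary characters; gene-content columns (presence or absence of each gene) could be appended the same way.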

  5. A program for verification of phylogenetic network models.

    Science.gov (United States)

    Gunawan, Andreas D M; Lu, Bingxin; Zhang, Louxin

    2016-09-01

    Genetic material is transferred in a non-reproductive manner across species more frequently than commonly thought, particularly in bacteria. On one hand, extant genomes are thus more properly considered as a fusion product of both reproductive and non-reproductive genetic transfers. This has motivated researchers to adopt phylogenetic networks to study genome evolution. On the other hand, a gene's evolution is usually tree-like and has been studied for over half a century. Accordingly, the relationships between phylogenetic trees and networks are the basis for the reconstruction and verification of phylogenetic networks. One important problem in verifying a network model is determining whether or not certain existing phylogenetic trees are displayed in a phylogenetic network. This problem is formally called the tree containment problem. It is NP-complete even for binary phylogenetic networks. We design an exponential-time but efficient method for determining whether or not a phylogenetic tree is displayed in an arbitrary phylogenetic network. It is developed on the basis of the so-called reticulation-visible property of phylogenetic networks. A C program is available for download at http://www.math.nus.edu.sg/∼matzlx/tcp_package. Contact: matzlx@nus.edu.sg. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
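To make the tree containment problem concrete, here is a brute-force sketch, not the authors' reticulation-visible algorithm: it tries every way of keeping one incoming edge per reticulation, so its cost is exponential in the number of reticulations. It relies on the fact that a rooted phylogenetic tree is determined by its set of clusters. Networks and trees are hypothetical node-to-children dictionaries.

```python
from itertools import product

def leaves(children):
    """Nodes with no outgoing edges (the labelled taxa)."""
    nodes = set(children) | {v for kids in children.values() for v in kids}
    return {u for u in nodes if not children.get(u)}

def clusters(children, root, taxa):
    """Non-empty sets of taxa below each node reachable from root."""
    found = set()
    def below(u):
        if u in taxa:
            cluster = frozenset([u])
        else:
            cluster = frozenset().union(*(below(v) for v in children.get(u, [])))
        if cluster:
            found.add(cluster)
        return cluster
    below(root)
    return found

def displays(network, net_root, tree, tree_root):
    """True if `tree` is displayed by `network` (both node -> children dicts)."""
    parents = {}
    for u, kids in network.items():
        for v in kids:
            parents.setdefault(v, []).append(u)
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]
    taxa = leaves(network)
    target = clusters(tree, tree_root, leaves(tree))
    # Keep exactly one incoming edge per reticulation, in every possible way.
    for choice in product(*(parents[r] for r in reticulations)):
        kept = dict(zip(reticulations, choice))
        sub = {u: [v for v in kids if kept.get(v, u) == u] for u, kids in network.items()}
        # Empty clusters (dead ends) are skipped and unary nodes repeat a cluster,
        # so equal cluster sets mean the pruned, suppressed subtree equals `tree`.
        if clusters(sub, net_root, taxa) == target:
            return True
    return False

# Hypothetical network with one reticulation h (parents x and y).
network = {"r": ["x", "y"], "x": ["A", "h"], "y": ["h", "C"], "h": ["B"]}
print(displays(network, "r", {"t": ["p", "C"], "p": ["A", "B"]}, "t"))  # True: ((A,B),C)
print(displays(network, "r", {"t": ["q", "B"], "q": ["A", "C"]}, "t"))  # False: ((A,C),B)
```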

  6. Building a Phylogenetic Tree of the Human and Ape Superfamily Using DNA-DNA Hybridization Data

    Science.gov (United States)

    Maier, Caroline Alexander

    2004-01-01

    The study describes the process of DNA-DNA hybridization and the history of its use by Sibley and Ahlquist, in simple, straightforward, and interesting language that students can easily understand, and has students use it to create their own phylogenetic tree of the hominoid superfamily. They calibrate the DNA clock and use it to estimate the divergence dates of the various…

  7. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation.

    Science.gov (United States)

    Hoang, Diep Thi; Vinh, Le Sy; Flouri, Tomáš; Stamatakis, Alexandros; von Haeseler, Arndt; Minh, Bui Quang

    2018-02-02

    The nonparametric bootstrap is widely used to measure the branch support of phylogenetic trees. However, bootstrapping is computationally expensive and remains a bottleneck in phylogenetic analyses. Recently, an ultrafast bootstrap approximation (UFBoot) approach was proposed for maximum likelihood analyses. However, such an approach is still missing for maximum parsimony. To close this gap we present MPBoot, an adaptation and extension of UFBoot to compute branch supports under the maximum parsimony principle. MPBoot works for both uniform and non-uniform cost matrices. Our analyses of biological DNA and protein data showed that under uniform cost matrices, MPBoot runs on average 4.7 (DNA) to 7 times (protein data) (range: 1.2-20.7) faster than the standard parsimony bootstrap implemented in PAUP*, but 1.6 (DNA) to 4.1 times (protein data) slower than the standard bootstrap with a fast search routine in TNT (fast-TNT). However, for non-uniform cost matrices MPBoot is 5 (DNA) to 13 times (protein data) (range: 0.3-63.9) faster than fast-TNT. We note that MPBoot achieves better scores more frequently than PAUP* and fast-TNT. However, this effect is less pronounced if an intensive but slower search in TNT is invoked. Moreover, experiments on large-scale simulated data show that while both PAUP* and TNT bootstrap estimates are too conservative, MPBoot bootstrap estimates appear less biased. MPBoot provides an efficient alternative to the standard maximum parsimony bootstrap procedure. It shows favorable performance in terms of run time, the capability of finding a maximum parsimony tree, and high bootstrap accuracy on simulated as well as empirical data sets. MPBoot is easy to use, open-source and available at http://www.cibiv.at/software/mpboot .
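For reference, the standard nonparametric bootstrap that MPBoot approximates can be sketched as follows, assuming uniform costs so that Fitch's algorithm gives the parsimony score. This is not UFBoot or MPBoot: to keep it short, the tree search on each replicate is replaced by choosing the best of a fixed list of candidate trees, and the alignment is a hypothetical toy.

```python
import random

def fitch(tree, column):
    """Fitch's algorithm on a rooted binary tree (nested 2-tuples of taxon names)
    for one alignment column; returns (state set, number of changes)."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    (s1, c1), (s2, c2) = fitch(tree[0], column), fitch(tree[1], column)
    common = s1 & s2
    return (common, c1 + c2) if common else (s1 | s2, c1 + c2 + 1)

def parsimony_score(tree, alignment, sites):
    return sum(fitch(tree, {t: seq[i] for t, seq in alignment.items()})[1] for i in sites)

def bootstrap_support(candidates, alignment, replicates=100, seed=1):
    """Fraction of replicates in which each candidate is the most parsimonious
    (ties go to the earlier candidate). A real analysis runs a tree search per replicate."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    wins = [0] * len(candidates)
    for _ in range(replicates):
        sites = [rng.randrange(length) for _ in range(length)]  # resample columns with replacement
        scores = [parsimony_score(t, alignment, sites) for t in candidates]
        wins[min(range(len(candidates)), key=scores.__getitem__)] += 1
    return [w / replicates for w in wins]

aln = {"A": "AAG", "B": "AAT", "C": "GGT", "D": "GGT"}
t1, t2 = (("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))
print(parsimony_score(t1, aln, range(3)), parsimony_score(t2, aln, range(3)))  # 3 5
print(bootstrap_support([t1, t2], aln))  # [1.0, 0.0]
```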

  8. Deduction of probable events of lateral gene transfer through comparison of phylogenetic trees by recursive consolidation and rearrangement

    Directory of Open Access Journals (Sweden)

    Charlebois Robert L

    2005-04-01

    Background: When organismal phylogenies based on sequences of single marker genes are poorly resolved, a logical approach is to add more markers, on the assumption that weak but congruent phylogenetic signal will be reinforced in such multigene trees. Such approaches are valid only when the several markers indeed have identical phylogenies, an issue which many multigene methods (such as the use of concatenated gene sequences or the assembly of supertrees) do not directly address. Indeed, even when the true history is a mixture of vertical descent for some genes and lateral gene transfer (LGT) for others, such methods produce unique topologies. Results: We have developed software that aims to extract evidence for vertical and lateral inheritance from a set of gene trees compared against an arbitrary reference tree. This evidence is then displayed as a synthesis showing support over the tree for vertical inheritance, overlaid with explicit LGT events inferred to have occurred over the history of the tree. As with splits-tree methods, one can thus identify nodes at which conflict occurs. Additionally, one can make reasonable inferences about vertical and lateral signal, assigning putative donors and recipients. Conclusion: A tool such as ours can serve to explore the reticulated dimensionality of molecular evolution by dissecting vertical and lateral inheritance at high resolution. By this, we mean that individual nodes can be examined not only for congruence, but also for coherence in light of LGT. We assert that our tools will facilitate the comparison of phylogenetic trees and the interpretation of conflicting data.
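The authors' recursive consolidation and rearrangement procedure is not reproduced here. A much simpler sketch of the "support over the tree for vertical inheritance" idea counts, for each cluster of the reference tree, the fraction of gene trees that contain it; nodes with low support are where conflict, and possibly LGT, would be investigated. It assumes rooted gene trees over the same taxa as the reference, written as nested tuples.

```python
def clusters(tree):
    """Clusters (taxa below each internal node) of a rooted tree as nested tuples."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below
    walk(tree)
    return found

def vertical_support(reference, gene_trees):
    """For each reference-tree cluster, the fraction of gene trees containing it."""
    gene_clusters = [clusters(g) for g in gene_trees]
    return {c: sum(c in gc for gc in gene_clusters) / len(gene_trees)
            for c in clusters(reference)}

reference = (("A", "B"), ("C", "D"))
genes = [reference, (("A", "C"), ("B", "D"))]  # hypothetical: one congruent, one conflicting
for cluster, support in sorted(vertical_support(reference, genes).items(), key=lambda x: sorted(x[0])):
    print(sorted(cluster), support)
```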

  9. Untangling hybrid phylogenetic signals: horizontal gene transfer and artifacts of phylogenetic reconstruction.

    Science.gov (United States)

    Beiko, Robert G; Ragan, Mark A

    2009-01-01

    Phylogenomic methods can be used to investigate the tangled evolutionary relationships among genomes. Building 'all the trees of all the genes' can potentially identify common pathways of horizontal gene transfer (HGT) among taxa at varying levels of phylogenetic depth. Phylogenetic affinities can be aggregated and merged with the information about genetic linkage and biochemical function to examine hypotheses of adaptive evolution via HGT. Additionally, the use of many genetic data sets increases the power of statistical tests for phylogenetic artifacts. However, large-scale phylogenetic analyses pose several challenges, including the necessary abandonment of manual validation techniques, the need to translate inferred phylogenetic discordance into inferred HGT events, and the challenges involved in aggregating results from search-based inference methods. In this chapter we describe a tree search procedure to recover the most parsimonious pathways of HGT, and examine some of the assumptions that are made by this method.
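In such searches a single transfer is commonly modelled as a subtree prune-and-regraft (SPR) move: the recipient lineage is cut from the reference tree and reattached beside its donor. The sketch below applies one rooted SPR move to a tree written as nested tuples; it does not search for the most parsimonious set of moves, and the function names are illustrative.

```python
def prune(tree, clade):
    """Remove subtree `clade`; a parent left with one child is suppressed."""
    if tree == clade:
        return None
    if isinstance(tree, str):
        return tree
    kept = [k for k in (prune(child, clade) for child in tree) if k is not None]
    return kept[0] if len(kept) == 1 else tuple(kept)

def regraft(tree, clade, target):
    """Attach `clade` as the sibling of subtree `target` (on the edge above it)."""
    if tree == target:
        return (target, clade)
    if isinstance(tree, str):
        return tree
    return tuple(regraft(child, clade, target) for child in tree)

def spr(tree, recipient, donor):
    """One rooted SPR move: prune the recipient lineage and regraft it beside the donor."""
    return regraft(prune(tree, recipient), recipient, donor)

print(spr((("A", "B"), ("C", "D")), "B", "C"))  # ('A', (('C', 'B'), 'D'))
```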

  10. Rearrangement moves on rooted phylogenetic networks.

    Science.gov (United States)

    Gambette, Philippe; van Iersel, Leo; Jones, Mark; Lafond, Manuel; Pardi, Fabio; Scornavacca, Celine

    2017-08-01

    Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of the possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic networks, that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose "horizontal" moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and "vertical" moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves (named rNNI and rSPR) reduce to the best-known moves on rooted phylogenetic trees, nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results (separating the contributions of horizontal and vertical moves), we prove that rNNI moves are local versions of rSPR moves, and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events clearly identified. Moreover, our rearrangement moves are robust to the fact that networks with higher complexity usually allow a better fit with the data. Our goal is to provide a solid basis for
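Restricted to trees, rNNI reduces to rooted nearest-neighbor interchange. The sketch below enumerates the rooted NNI neighbors of a rooted binary tree written as nested 2-tuples; each non-root internal edge yields two neighbors, so a tree with n leaves has 2(n - 2) of them. It does not implement the network moves from the paper.

```python
def canonical(tree):
    """Order children so that (X, Y) and (Y, X) compare equal."""
    if isinstance(tree, str):
        return tree
    a, b = canonical(tree[0]), canonical(tree[1])
    return (a, b) if repr(a) <= repr(b) else (b, a)

def rooted_nni_neighbours(tree):
    """All trees one rooted NNI move away. For the edge above an internal
    node v = (a, b) whose sibling is s, the two moves swap s with b or with a."""
    if isinstance(tree, str):
        return []
    left, right = tree
    result = []
    for child, sibling in ((left, right), (right, left)):
        if not isinstance(child, str):
            a, b = child
            result.append(((a, sibling), b))  # swap sibling with b
            result.append(((b, sibling), a))  # swap sibling with a
    # Moves on edges deeper in each subtree.
    result += [(n, right) for n in rooted_nni_neighbours(left)]
    result += [(left, n) for n in rooted_nni_neighbours(right)]
    return result

caterpillar = (((("A", "B"), "C"), "D"), "E")
print(len({canonical(t) for t in rooted_nni_neighbours(caterpillar)}))  # 6 = 2 * (5 - 2)
```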

  11. Rearrangement moves on rooted phylogenetic networks.

    Directory of Open Access Journals (Sweden)

    Philippe Gambette

    2017-08-01


  12. Age-Dependent and Lineage-Dependent Speciation and Extinction in the Imbalance of Phylogenetic Trees.

    Science.gov (United States)

    Holman, Eric W

    2017-11-01

    It is known that phylogenetic trees are more imbalanced than expected from a birth-death model with constant rates of speciation and extinction, and also that imbalance can be better fit by allowing the rate of speciation to decrease as the age of the parent species increases. If imbalance is measured in more detail, at nodes within trees as a function of the number of species descended from the nodes, age-dependent models predict levels of imbalance comparable to real trees for small numbers of descendent species, but predicted imbalance approaches an asymptote not found in real trees as the number of descendent species becomes large. Age-dependence must therefore be complemented by another process such as inheritance of different rates along different lineages, which is known to predict insufficient imbalance at nodes with few descendent species, but can predict increasing imbalance with increasing numbers of descendent species. [Crump-Mode-Jagers process; diversification; macroevolution; taxon sampling; tree of life.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved.
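The abstract does not define its node-level imbalance statistic. As an illustration only, the sketch below records, for every internal node of a rooted binary tree, the number of descendant species and the Colless-style difference |left - right| between its two subtrees, which is the kind of per-node data one would bin by clade size. Summing the differences gives the Colless index.

```python
def size(tree):
    """Number of tips in a rooted binary tree written as nested 2-tuples."""
    return 1 if isinstance(tree, str) else size(tree[0]) + size(tree[1])

def node_imbalance(tree):
    """(descendant species, |left - right|) for every internal node, root first."""
    if isinstance(tree, str):
        return []
    left, right = size(tree[0]), size(tree[1])
    return ([(left + right, abs(left - right))]
            + node_imbalance(tree[0]) + node_imbalance(tree[1]))

def colless(tree):
    return sum(diff for _, diff in node_imbalance(tree))

caterpillar = (((("A", "B"), "C"), "D"), "E")
print(node_imbalance(caterpillar))  # [(5, 3), (4, 2), (3, 1), (2, 0)]
print(colless(caterpillar))         # 6
```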

  13. Genetic distances and phylogenetic trees of different Awassi sheep populations based on DNA sequencing.

    Science.gov (United States)

    Al-Atiyat, R M; Aljumaah, R S

    2014-08-27

    This study aimed to estimate evolutionary distances and to reconstruct phylogenetic trees among different Awassi sheep populations. Thirty-two sheep from three different geographical areas of Jordan and the Kingdom of Saudi Arabia (KSA) were randomly sampled. DNA was extracted from the tissue samples and sequenced using the T7 promoter universal primer. Different phylogenetic trees were reconstructed from 0.64-kb DNA sequences using the MEGA software with the best-fitting general time reversible (GTR) distance model. Three methods of distance estimation were then used, and the maximum composite likelihood test was considered for reconstructing maximum likelihood, neighbor-joining, and UPGMA trees. The maximum likelihood tree indicated three major clusters separated by cytosine (C) and thymine (T). The greatest distance was found between the South and North sheep populations. The KSA sheep, used as an outgroup, showed a shorter evolutionary distance to the North sheep population than to the others. The neighbor-joining and UPGMA trees showed quite reliable clusters of evolutionary differentiation of Jordan sheep populations from the Saudi population. The overall results are consistent with the geographical origins and ecological types of the sheep populations studied. In summary, the resulting phylogenetic trees may add to the limited information on the genetic relatedness and phylogeny of Awassi sheep in neighboring Arab countries.
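MEGA's implementation is not described in the abstract; the sketch below is a plain neighbor-joining implementation using the Saitou-Nei selection criterion. The input is a hypothetical additive distance matrix, for which neighbor-joining should reproduce every pairwise distance as a path length. The internal node names (node0, node1, ...) are assumed not to clash with taxon names.

```python
from itertools import combinations

def neighbor_joining(dist):
    """Neighbor-joining on a symmetric dict-of-dicts distance matrix (>= 3 taxa).
    Returns the unrooted tree as a list of (node, node, branch_length) edges."""
    d = {i: dict(row) for i, row in dist.items()}  # work on a copy
    edges, new_id = [], 0
    while len(d) > 3:
        n = len(d)
        r = {i: sum(d[i].values()) for i in d}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(d, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"node{new_id}"
        new_id += 1
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, li), (u, j, d[i][j] - li)]
        d[u] = {}
        for k in d:
            if k not in (i, j, u):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        for k in (i, j):
            del d[k]
        for row in d.values():
            row.pop(i, None)
            row.pop(j, None)
    # Connect the last three nodes through one central node.
    a, b, c = d
    centre = f"node{new_id}"
    edges += [(centre, a, (d[a][b] + d[a][c] - d[b][c]) / 2),
              (centre, b, (d[a][b] + d[b][c] - d[a][c]) / 2),
              (centre, c, (d[a][c] + d[b][c] - d[a][b]) / 2)]
    return edges

def path_length(edges, x, y):
    """Sum of branch lengths on the path between x and y in the tree."""
    adjacent = {}
    for a, b, w in edges:
        adjacent.setdefault(a, []).append((b, w))
        adjacent.setdefault(b, []).append((a, w))
    stack = [(x, None, 0.0)]
    while stack:
        node, previous, total = stack.pop()
        if node == y:
            return total
        stack.extend((nb, node, total + w) for nb, w in adjacent[node] if nb != previous)
    raise ValueError("no path")

D = {"A": {"B": 3, "C": 5, "D": 6}, "B": {"A": 3, "C": 6, "D": 7},
     "C": {"A": 5, "B": 6, "D": 3}, "D": {"A": 6, "B": 7, "C": 3}}
for edge in neighbor_joining(D):
    print(edge)
```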

  14. Molecular phylogenetic trees - On the validity of the Goodman-Moore augmentation algorithm

    Science.gov (United States)

    Holmquist, R.

    1979-01-01

    A response is made to the reply of Nei and Tateno (1979) to the letter of Holmquist (1978) supporting the validity of the augmentation algorithm of Moore (1977) in reconstructions of nucleotide substitutions by means of the maximum parsimony principle. It is argued that the overestimation of the augmented numbers of nucleotide substitutions (augmented distances) found by Tateno and Nei (1978) is due to an unrepresentative data sample and that it is only necessary that evolution be stochastically uniform in different regions of the phylogenetic network for the augmentation method to be useful. The importance of the average value of the true distance over all links is explained, and the relative variances of the true and augmented distances are calculated to be almost identical. The effects of topological changes in the phylogenetic tree on the augmented distance and the question of the correctness of ancestral sequences inferred by the method of parsimony are also clarified.

  15. A Metric on Phylogenetic Tree Shapes.

    Science.gov (United States)

    Colijn, C; Plazzotta, G

    2018-01-01

    The shapes of evolutionary trees are influenced by the nature of the evolutionary process but comparisons of trees from different processes are hindered by the challenge of completely describing tree shape. We present a full characterization of the shapes of rooted branching trees in a form that lends itself to natural tree comparisons. We use this characterization to define a metric, in the sense of a true distance function, on tree shapes. The metric distinguishes trees from random models known to produce different tree shapes. It separates trees derived from tropical versus USA influenza A sequences, which reflect the differing epidemiology of tropical and seasonal flu. We describe several metrics based on the same core characterization, and illustrate how to extend the metric to incorporate trees' branch lengths or other features such as overall imbalance. Our approach allows us to construct addition and multiplication on trees, and to create a convex metric on tree shapes which formally allows computation of average tree shapes. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
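The sketch below labels rooted binary tree shapes using the recursion we understand the paper to use (a leaf gets 1; a node whose subtrees have labels k >= j gets k(k-1)/2 + j + 1); treat the exact formula as an assumption to check against the paper. The distance, an L1 difference between counts of subtree labels, is one simple metric built on these labels and not necessarily one the authors define. It is a true metric on shapes because distinct shapes have distinct root labels.

```python
from collections import Counter

def shape_label(tree):
    """Integer label of the shape of a rooted binary tree (nested 2-tuples; tip names ignored)."""
    if not isinstance(tree, tuple):
        return 1
    k, j = sorted((shape_label(tree[0]), shape_label(tree[1])), reverse=True)
    return k * (k - 1) // 2 + j + 1

def label_counts(tree, counts=None):
    """Count the shape labels of every subtree, including leaves."""
    counts = Counter() if counts is None else counts
    counts[shape_label(tree)] += 1
    if isinstance(tree, tuple):
        label_counts(tree[0], counts)
        label_counts(tree[1], counts)
    return counts

def shape_distance(t1, t2):
    """L1 distance between subtree-label count vectors."""
    c1, c2 = label_counts(t1), label_counts(t2)
    return sum(abs(c1[x] - c2[x]) for x in set(c1) | set(c2))

balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(shape_label(balanced), shape_label(caterpillar))  # 4 5
print(shape_distance(balanced, caterpillar))            # 4
```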

  16. SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

    Science.gov (United States)

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the unsuitability of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models compatible with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These trees were congruent with the corresponding whole-genome super-matrix trees in terms of tree topology when compared with other known approaches, including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. The tool was tested on different taxa from the subspecies to the intergeneric level. Discrimination between closely related taxonomic units may be improved by providing the program with alignments of marker protein sequences, e.g., GyrA.
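A minimal sketch of alignment-free comparison by tetranucleotide frequencies: count overlapping 4-mers, normalize them to frequencies, and take the Euclidean distance between genomes. SWPhylo's OUP statistics may normalize differently (for example, relative to base composition), and the function names are illustrative.

```python
import math
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides

def tetra_frequencies(seq):
    """Relative frequencies of the 256 tetranucleotides over overlapping windows."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in KMERS]

def tetra_distance(seq1, seq2):
    """Euclidean distance between tetranucleotide frequency vectors."""
    f1, f2 = tetra_frequencies(seq1), tetra_frequencies(seq2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

print(tetra_distance("ACGTACGTTTGACA", "ACGTACGATTGACC"))
```

Pairwise distances like these could then feed a distance-based tree method.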

  17. Reconstructing phylogenetic networks using maximum parsimony.

    Science.gov (United States)

    Nakhleh, Luay; Jin, Guohua; Zhao, Fengmei; Mellor-Crummey, John

    2005-01-01

    Phylogenies, the evolutionary histories of groups of organisms, are one of the most widely used tools throughout the life sciences, as well as objects of research within systematics, evolutionary biology, epidemiology, etc. Almost every tool devised to date to reconstruct phylogenies produces trees; yet it is widely understood and accepted that trees oversimplify the evolutionary histories of many groups of organisms, most prominently bacteria (because of horizontal gene transfer) and plants (because of hybrid speciation). Various methods and criteria have been introduced for phylogenetic tree reconstruction. Parsimony is one of the most widely used and studied criteria, and various accurate and efficient heuristics for reconstructing trees based on parsimony have been devised. Jotun Hein suggested a straightforward extension of the parsimony criterion to phylogenetic networks. In this paper we formalize this concept, and provide the first experimental study of the quality of parsimony as a criterion for constructing and evaluating phylogenetic networks. Our results show that, when extended to phylogenetic networks, the parsimony criterion produces promising results. In a great majority of the cases in our experiments, the parsimony criterion accurately predicts the numbers and placements of non-tree events.
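One way to read Hein's extension, assumed here since the abstract gives no formula, scores a network site by site as the cheapest Fitch score among the trees the network displays, summed over sites. The sketch takes the displayed trees as given (enumerating them is a separate step) and uses a hypothetical alignment in which each tree alone is suboptimal.

```python
def fitch(tree, column):
    """Fitch's algorithm on a rooted binary tree (nested 2-tuples) for one column;
    returns (state set, number of changes)."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    (s1, c1), (s2, c2) = fitch(tree[0], column), fitch(tree[1], column)
    common = s1 & s2
    return (common, c1 + c2) if common else (s1 | s2, c1 + c2 + 1)

def network_parsimony(displayed_trees, alignment):
    """Per site, the cheapest Fitch score over the displayed trees; summed over sites."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += min(fitch(t, column)[1] for t in displayed_trees)
    return total

aln = {"A": "AA", "B": "AG", "C": "GA", "D": "GG"}
t1, t2 = (("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))
print(network_parsimony([t1, t2], aln), network_parsimony([t1], aln))  # 2 3
```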

  18. Consequences of recombination on traditional phylogenetic analysis

    DEFF Research Database (Denmark)

    Schierup, M H; Hein, J

    2000-01-01

    We investigate the shape of a phylogenetic tree reconstructed from sequences evolving under the coalescent with recombination. The motivation is that evolutionary inferences are often made from phylogenetic trees reconstructed from population data even though recombination may well occur (mtDNA or viral sequences) or does occur (nuclear sequences). We investigate the size and direction of biases when a single tree is reconstructed ignoring recombination. Standard software (PHYLIP) was used to construct the best phylogenetic tree from sequences simulated under the coalescent with recombination. With recombination present, the length of terminal branches and the total branch length are larger, and the time to the most recent common ancestor smaller, than for a tree reconstructed from sequences evolving with no recombination. The effects are pronounced even for small levels of recombination that may...

  19. Sampling strategies for improving tree accuracy and phylogenetic analyses: a case study in ciliate protists, with notes on the genus Paramecium.

    Science.gov (United States)

    Yi, Zhenzhen; Strüder-Kypke, Michaela; Hu, Xiaozhong; Lin, Xiaofeng; Song, Weibo

    2014-02-01

    In order to assess how dataset selection for multi-gene analyses affects the accuracy of inferred phylogenetic trees in ciliates, we chose five genes and the genus Paramecium, one of the most widely used model protist genera, and compared tree topologies of the single- and multi-gene analyses. Our empirical study shows that: (1) Using multiple genes improves phylogenetic accuracy, even when their one-gene topologies are in conflict with each other. (2) The impact of missing data on phylogenetic accuracy is ambiguous: resolution power and topological similarity, but not number of represented taxa, are the most important criteria of a dataset for inclusion in concatenated analyses. (3) As an example, we tested the three classification models of the genus Paramecium with a multi-gene based approach, and only the monophyly of the subgenus Paramecium is supported. Copyright © 2013 Elsevier Inc. All rights reserved.
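Concatenated analyses need a supermatrix in which taxa missing from a gene are padded with missing-data symbols. The sketch below builds one from hypothetical per-gene alignments and reports the overall fraction of missing cells, one crude way to quantify the missing data discussed above.

```python
def concatenate(genes, missing="?"):
    """Build a supermatrix from per-gene alignments (dicts of taxon -> aligned sequence).
    Taxa absent from a gene are padded with `missing`. Returns (matrix, missing fraction)."""
    taxa = sorted({t for aln in genes.values() for t in aln})
    matrix = {t: "" for t in taxa}
    for aln in genes.values():
        width = len(next(iter(aln.values())))
        for t in taxa:
            matrix[t] += aln.get(t, missing * width)
    cells = sum(len(s) for s in matrix.values())
    absent = sum(s.count(missing) for s in matrix.values())
    return matrix, absent / cells

genes = {"g1": {"A": "AC", "B": "AG"}, "g2": {"A": "TTT", "C": "TTA"}}
matrix, fraction = concatenate(genes)
print(matrix)              # {'A': 'ACTTT', 'B': 'AG???', 'C': '??TTA'}
print(round(fraction, 3))  # 0.333
```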

  20. A parametric method for assessing diversification-rate variation in phylogenetic trees.

    Science.gov (United States)

    Shah, Premal; Fitzpatrick, Benjamin M; Fordyce, James A

    2013-02-01

    Phylogenetic hypotheses are frequently used to examine variation in rates of diversification across the history of a group. Patterns of diversification-rate variation can be used to infer underlying ecological and evolutionary processes responsible for patterns of cladogenesis. Most existing methods examine rate variation through time. Methods for examining differences in diversification among groups are more limited. Here, we present a new method, parametric rate comparison (PRC), that explicitly compares diversification rates among lineages in a tree using a variety of standard statistical distributions. PRC can identify subclades of the tree where diversification rates are at variance with the remainder of the tree. A randomization test can be used to evaluate how often such variance would appear by chance alone. The method also allows for comparison of diversification rate among a priori defined groups. Further, the application of the PRC method is not restricted to monophyletic groups. We examined the performance of PRC using simulated data, which showed that PRC has acceptable false-positive rates and statistical power to detect rate variation. We apply the PRC method to the well-studied radiation of North American Plethodon salamanders, and support the inference that the large-bodied Plethodon glutinosus clade has a higher historical rate of diversification compared to other Plethodon salamanders. © 2012 The Author(s). Evolution © 2012 The Society for the Study of Evolution.

  1. Application of unweighted pair group methods with arithmetic average (UPGMA) for identification of kinship types and spreading of ebola virus through establishment of phylogenetic tree

    Science.gov (United States)

    Andriani, Tri; Irawan, Mohammad Isa

    2017-08-01

    Ebola Virus Disease (EVD) is a disease caused by a virus of the genus Ebolavirus (EBOV), family Filoviridae. Ebola virus is classified into five types, namely Zaire ebolavirus (ZEBOV), Sudan ebolavirus (SEBOV), Bundibugyo ebolavirus (BEBOV), Tai Forest ebolavirus, also known as Cote d'Ivoire ebolavirus (CIEBOV), and Reston ebolavirus (REBOV). The kinship of Ebola virus types can be identified using phylogenetic trees. In this study, the phylogenetic tree was constructed with the UPGMA method after multiple alignment using a progressive method. The resulting tree shows that Tai Forest ebolavirus is closely related to Bundibugyo ebolavirus, although the areas where their epidemics spread lie far apart. The genetic distance between Bundibugyo ebolavirus and Tai Forest ebolavirus is 0.3725. The similarity of Tai Forest ebolavirus to Bundibugyo ebolavirus is therefore not influenced by the proximity of the areas where the Ebola epidemics spread.

  2. Distance-Based Phylogenetic Methods Around a Polytomy.

    Science.gov (United States)

    Davidson, Ruth; Sullivant, Seth

    2014-01-01

    Distance-based phylogenetic algorithms attempt to solve the NP-hard least-squares phylogeny problem by mapping an arbitrary dissimilarity map representing biological data to a tree metric. The set of all dissimilarity maps is a Euclidean space properly containing the space of all tree metrics as a polyhedral fan. Outputs of distance-based tree reconstruction algorithms such as UPGMA and neighbor-joining are points in the maximal cones in the fan. Tree metrics with polytomies lie at the intersections of maximal cones. A phylogenetic algorithm divides the space of all dissimilarity maps into regions based upon which combinatorial tree is reconstructed by the algorithm. Comparison of phylogenetic methods can be done by comparing the geometry of these regions. We use polyhedral geometry to compare the local nature of the subdivisions induced by least-squares phylogeny, UPGMA, and neighbor-joining when the true tree has a single polytomy with exactly four neighbors. Our results suggest that in some circumstances, UPGMA and neighbor-joining poorly match least-squares phylogeny.
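    For four taxa, the geometry described above has a concrete form. A metric on four taxa is a tree metric exactly when, of the three sums d(i,j)+d(k,l), d(i,k)+d(j,l) and d(i,l)+d(j,k), the two largest are equal (the four-point condition); the split with the strictly smallest sum is then the resolved quartet, and when all three sums tie the tree has a polytomy, the case that sits where maximal cones meet. The sketch below only checks that condition on a 4x4 matrix. It is not the least-squares, UPGMA or neighbor-joining procedure analyzed in the paper, and the function names are hypothetical.

```python
def pair_sums(D):
    """The three four-point sums for taxa 0..3, keyed by the split they support."""
    return {
        ((0, 1), (2, 3)): D[0][1] + D[2][3],
        ((0, 2), (1, 3)): D[0][2] + D[1][3],
        ((0, 3), (1, 2)): D[0][3] + D[1][2],
    }


def is_tree_metric_on_quartet(D, tol=1e-9):
    """Four-point condition: the two largest sums must be equal."""
    s = sorted(pair_sums(D).values())
    return s[2] - s[1] <= tol


def quartet_split(D, tol=1e-9):
    """Split with the strictly smallest sum, or None if there is no unique
    minimum (for a tree metric that means all three tie: a star quartet)."""
    ranked = sorted(pair_sums(D).items(), key=lambda kv: kv[1])
    (best, smallest), (_, second) = ranked[0], ranked[1]
    return None if second - smallest <= tol else best
```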

  3. Molecular Phylogenetic: Organism Taxonomy Method Based on Evolution History

    Directory of Open Access Journals (Sweden)

    N.L.P Indi Dharmayanti

    2011-03-01

    Phylogenetics is the taxonomic classification of organisms based on their evolutionary history, namely their phylogeny, and is the part of systematics whose objective is to determine the phylogeny of organisms from their characteristics. Phylogenetic analysis of amino acid and protein sequences is an important area of sequence analysis. Phylogenetic analysis can be used to follow rapid change in a species, such as a virus. A phylogenetic tree is a two-dimensional graph that shows the relationships among organisms, or more particularly among their gene sequences. The separate sequences are referred to as taxa (singular: taxon), defined as phylogenetically distinct units on the tree. The tree consists of outer branches, or leaves, that represent taxa, and of nodes and branches that represent the relationships among taxa. When the nucleotide sequences of two different organisms are similar, they are inferred to have descended from a common ancestor. Three methods are used in phylogenetics, namely (1) maximum parsimony, (2) distance, and (3) maximum likelihood. These methods are generally applied to construct the evolutionary tree, or the best tree, for determining sequence variation within a group. Each method is usually used for different analyses and data.

  4. SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

    Science.gov (United States)

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the inappropriateness of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models suited to alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed fast and reliable inference of phylogenetic trees. These trees were congruent in topology with the corresponding whole-genome super-matrix trees when compared with other known approaches, including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. The applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Discrimination between closely related taxonomic units may be strengthened by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID:29511354
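    The core of the alignment-free idea, comparing genomes through their tetranucleotide usage, can be sketched in a few lines. The version below counts overlapping 4-mers, converts the counts to relative frequencies over the 256 possible words, and compares two profiles with a Euclidean distance. SWPhylo's actual OUP statistics and distance measure may differ (for example, by normalizing against expected frequencies). The functions here are hypothetical stand-ins, and a full pipeline would still need a tree-building step on the resulting distance matrix.

```python
from itertools import product
from math import sqrt

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides


def tetranucleotide_profile(seq):
    """Relative frequencies of overlapping tetranucleotides, in KMERS order."""
    counts = dict.fromkeys(KMERS, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # windows containing N or other ambiguity codes are skipped
            counts[word] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in KMERS]


def profile_distance(p, q):
    """Euclidean distance between two frequency profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```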

  5. On Determining if Tree-based Networks Contain Fixed Trees.

    Science.gov (United States)

    Anaya, Maria; Anipchenko-Ulaj, Olga; Ashfaq, Aisha; Chiu, Joyce; Kaiser, Mahedi; Ohsawa, Max Shoji; Owen, Megan; Pavlechko, Ella; St John, Katherine; Suleria, Shivam; Thompson, Keith; Yap, Corrine

    2016-05-01

    We address an open question of Francis and Steel about phylogenetic networks and trees. They give a polynomial-time algorithm to decide if a phylogenetic network, N, is tree-based and pose the problem: given a fixed tree T and network N, is N based on T? We show that it is NP-hard to decide, by reduction from 3-Dimensional Matching (3DM), and further that the problem is fixed-parameter tractable.

  6. Persistence of Neighborhood Demographic Influences over Long Phylogenetic Distances May Help Drive Post-Speciation Adaptation in Tropical Forests.

    Science.gov (United States)

    Wills, Christopher; Harms, Kyle E; Wiegand, Thorsten; Punchi-Manage, Ruwan; Gilbert, Gregory S; Erickson, David; Kress, W John; Hubbell, Stephen P; Gunatilleke, C V Savitri; Gunatilleke, I A U Nimal

    2016-01-01

    Studies of forest dynamics plots (FDPs) have revealed a variety of negative density-dependent (NDD) demographic interactions, especially among conspecific trees. These interactions can affect growth rate, recruitment and mortality, and they play a central role in the maintenance of species diversity in these complex ecosystems. Here we use an equal area annulus (EAA) point-pattern method to comprehensively analyze data from two tropical FDPs, Barro Colorado Island in Panama and Sinharaja in Sri Lanka. We show that these NDD interactions also influence the continued evolutionary diversification of even distantly related tree species in these FDPs. We examine the details of a wide range of these interactions between individual trees and the trees that surround them. All these interactions, and their cumulative effects, are strongest among conspecific focal and surrounding tree species in both FDPs. They diminish in magnitude with increasing phylogenetic distance between heterospecific focal and surrounding trees, but do not disappear or change the pattern of their dependence on size, density, frequency or physical distance even among the most distantly related trees. The phylogenetic persistence of all these effects provides evidence that interactions between tree species that share an ecosystem may continue to promote adaptive divergence even after the species' gene pools have become separated. Adaptive divergence among taxa would operate in stark contrast to an alternative possibility that has previously been suggested, that distantly related species with dispersal-limited distributions and confronted with unpredictable neighbors will tend to converge on common strategies of resource use. In addition, we have also uncovered a positive density-dependent effect: growth rates of large trees are boosted in the presence of a smaller basal area of surrounding trees. 
We also show that many of the NDD interactions switch sign rapidly as focal trees grow in size, and that

  7. Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

    Science.gov (United States)

    Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

    2014-02-01

    Sequence subgrouping for a given sequence set can enable various informative tasks, such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm that automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To this end, an automatic sequence subgrouping method, named 'Subgrouping Automata' (SA), was constructed. First, a tree analysis module analyzes the structure of the tree and calculates all possible subgroups at each node. A sequence similarity analysis module calculates the average sequence similarity for all subgroups at each node. A representative sequence generation module finds a representative sequence for each subgroup using profile analysis and self-scoring. Once average sequence similarities are calculated for all nodes, 'Subgrouping Automata' searches for the node showing the statistically largest increase in sequence similarity using Student's t-value. The node with the maximum t-value, which gives the most significant difference in average sequence similarity between two adjacent nodes, is taken as the optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.

  8. Point estimates in phylogenetic reconstructions

    OpenAIRE

    Benner, Philipp; Bacak, Miroslav; Bourguignon, Pierre-Yves

    2013-01-01

    Motivation: The construction of statistics for summarizing posterior samples returned by a Bayesian phylogenetic study has so far been hindered by the poor geometric insights available into the space of phylogenetic trees, and ad hoc methods such as the derivation of a consensus tree make up for the ill-definition of the usual concepts of posterior mean, while bootstrap methods mitigate the absence of a sound concept of variance. Yielding satisfactory results with sufficiently concentrated pos...

  9. phangorn: phylogenetic analysis in R.

    Science.gov (United States)

    Schliep, Klaus Peter

    2011-02-15

    phangorn is a package for phylogenetic reconstruction and analysis in the R language. Previously, it was only possible to estimate phylogenetic trees with distance methods in R. phangorn now offers the possibility of reconstructing phylogenies with distance-based methods, maximum parsimony or maximum likelihood (ML), and of performing Hadamard conjugation. Extending the general ML framework, this package provides the possibility of estimating mixture and partition models. Furthermore, phangorn offers several functions for comparing trees, phylogenetic models or splits, simulating character data and performing congruence analyses. phangorn can be obtained through the CRAN homepage http://cran.r-project.org/web/packages/phangorn/index.html. phangorn is licensed under GPL 2.

  10. Inferring Phylogenetic Networks Using PhyloNet.

    Science.gov (United States)

    Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay

    2018-07-01

    PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

  11. On the quirks of maximum parsimony and likelihood on phylogenetic networks.

    Science.gov (United States)

    Bryant, Christopher; Fischer, Mareike; Linz, Simone; Semple, Charles

    2017-03-21

    Maximum parsimony is one of the most frequently-discussed tree reconstruction methods in phylogenetic estimation. However, in recent years it has become more and more apparent that phylogenetic trees are often not sufficient to describe evolution accurately. For instance, processes like hybridization or lateral gene transfer that are commonplace in many groups of organisms and result in mosaic patterns of relationships cannot be represented by a single phylogenetic tree. This is why phylogenetic networks, which can display such events, are becoming of more and more interest in phylogenetic research. It is therefore necessary to extend concepts like maximum parsimony from phylogenetic trees to networks. Several suggestions for possible extensions can be found in recent literature, for instance the softwired and the hardwired parsimony concepts. In this paper, we analyze the so-called big parsimony problem under these two concepts, i.e. we investigate maximum parsimonious networks and analyze their properties. In particular, we show that finding a softwired maximum parsimony network is possible in polynomial time. We also show that the set of maximum parsimony networks for the hardwired definition always contains at least one phylogenetic tree. Lastly, we investigate some parallels of parsimony to different likelihood concepts on phylogenetic networks. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Regularity dimension of sequences and its application to phylogenetic tree reconstruction

    International Nuclear Information System (INIS)

    Pham, Tuan D.

    2012-01-01

    The concept of dimension is a central development of chaos theory for studying nonlinear dynamical systems. Different types of dimensions have been derived to interpret different geometrical or physical observations. Approximate entropy and its modified methods have been introduced for studying regularity and complexity of time-series data in physiology and biology. Here, the concept of power laws and entropy measure are adopted to develop the regularity dimension of sequences to model a mathematical relationship between the frequency with which information about signal regularity changes in various scales. The proposed regularity dimension is applied to reconstruct phylogenetic trees using mitochondrial DNA (mtDNA) sequences for the family Hominidae, which can be validated according to the hypothesized evolutionary relationships between organisms.
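    Approximate entropy, one of the regularity measures the abstract builds on, can be computed directly. The sketch follows the standard formulation ApEn(m, r) = Φ_m(r) - Φ_{m+1}(r), where Φ_k averages, over all length-k templates, the logarithm of the fraction of templates lying within Chebyshev distance r (self-matches included). It is not the paper's regularity dimension, and applying it to DNA would require a numeric encoding of nucleotides, which is left as an assumption here. The tolerance r is used as an absolute value; it is often chosen instead as a fraction of the series' standard deviation.

```python
from math import log


def approximate_entropy(x, m=2, r=0.2):
    """ApEn(m, r) = phi(m) - phi(m + 1) for a numeric series x (Pincus' definition)."""
    n = len(x)

    def phi(k):
        templates = [x[i:i + k] for i in range(n - k + 1)]
        total = 0.0
        for t in templates:
            # number of templates within Chebyshev distance r of t (self-match included)
            matches = sum(
                1 for u in templates
                if max(abs(a - b) for a, b in zip(t, u)) <= r
            )
            total += log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)
```

    A perfectly regular series gives a value at or near zero; less predictable series give larger values.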

  13. Phylogenetic comparative methods on phylogenetic networks with reticulations.

    Science.gov (United States)

    Bastide, Paul; Solís-Lemus, Claudia; Kriebel, Ricardo; Sparks, K William; Ané, Cécile

    2018-04-25

    The goal of Phylogenetic Comparative Methods (PCMs) is to study the distribution of quantitative traits among related species. The observed traits are often seen as the result of a Brownian Motion (BM) along the branches of a phylogenetic tree. Reticulation events such as hybridization, gene flow or horizontal gene transfer can substantially affect a species' traits, but are not modeled by a tree. Phylogenetic networks have been designed to represent reticulate evolution. As they become available for downstream analyses, new models of trait evolution are needed, applicable to networks. One natural extension of the BM is to use a weighted average model for the trait of a hybrid, at a reticulation point. We develop here an efficient recursive algorithm to compute the phylogenetic variance matrix of a trait on a network, in only one preorder traversal of the network. We then extend the standard PCM tools to this new framework, including phylogenetic regression with covariates (or phylogenetic ANOVA), ancestral trait reconstruction, and Pagel's λ test of phylogenetic signal. The trait of a hybrid is sometimes outside of the range of its two parents, for instance because of hybrid vigor or hybrid depression. These two phenomena are rather commonly observed in present-day hybrids. Transgressive evolution can be modeled as a shift in the trait value following a reticulation point. We develop a general framework to handle such shifts, and take advantage of the phylogenetic regression view of the problem to design statistical tests for ancestral transgressive evolution in the evolutionary history of a group of species. We study the power of these tests in several scenarios, and show that recent events have indeed the strongest impact on the trait distribution of present-day taxa. We apply those methods to a dataset of Xiphophorus fishes, to confirm and complete previous analyses in this group.
All the methods developed here are available in the Julia package PhyloNetworks.
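    The one-traversal idea can be sketched under stated assumptions. Under Brownian motion with a weighted-average model at reticulations, a node c whose parents p_k are joined to it by edges of length t_k with inheritance weights γ_k (summing to 1; a tree node has a single parent with γ = 1) satisfies Cov(c, j) = Σ_k γ_k Cov(p_k, j) for every node j visited earlier, and Var(c) = Σ_k Σ_l γ_k γ_l Cov(p_k, p_l) + Σ_k γ_k² t_k. Filling the matrix node by node, in an order where every node follows all of its parents, therefore needs only one pass. This Python sketch uses a hypothetical input format and a root variance of 0 (a fixed root state); it is not the PhyloNetworks (Julia) API and does not model the transgressive shifts described above.

```python
def network_bm_covariance(order, parents):
    """Covariance of a Brownian-motion trait at every node of a network.

    order:   node names such that every node appears after all of its parents.
    parents: node -> list of (parent, edge_length, inheritance_weight);
             the root is absent or maps to []. A node's weights sum to 1.
    Returns a dict of dicts with V[a][b] = Cov(X_a, X_b); the root variance is 0.
    """
    V = {}
    for c in order:
        ps = parents.get(c, [])
        V[c] = {}
        for j in list(V):
            if j == c:
                continue
            # earlier nodes are not descendants of c, so covariance flows only via the parents
            V[c][j] = sum(g * V[p][j] for p, _, g in ps)
            V[j][c] = V[c][j]
        # variance: weighted mix of parental (co)variances plus each edge's own noise
        V[c][c] = (
            sum(gk * gl * V[pk][pl] for pk, _, gk in ps for pl, _, gl in ps)
            + sum(g * g * t for _, t, g in ps)
        )
    return V
```

    For a hybrid h with two parents u and w at unit depth from the root (uncorrelated), unit edges and weights 0.5, this gives Var(h) = 1.0 and Cov(h, u) = 0.5.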

  14. Tree-grass interactions in savannas

    CSIR Research Space (South Africa)

    Scholes, RJ

    1997-01-01

    Savannas occur where trees and grasses interact to create a biome that is neither grassland nor forest. Woody and gramineous plants interact by many mechanisms, some negative (competition) and some positive (facilitation). The strength and sign...

  15. Bayesian models for comparative analysis integrating phylogenetic uncertainty

    Directory of Open Access Journals (Sweden)

    Villemereuil Pierre de

    2012-06-01

    Background Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. Methods We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. Results We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Conclusions Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible

  16. Bayesian models for comparative analysis integrating phylogenetic uncertainty

    Science.gov (United States)

    2012-01-01

    Background Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. Methods We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. Results We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Conclusions Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible general purpose tool for

  17. Diversity Dynamics in Nymphalidae Butterflies: Effect of Phylogenetic Uncertainty on Diversification Rate Shift Estimates

    Science.gov (United States)

    Peña, Carlos; Espeland, Marianne

    2015-01-01

    The species-rich butterfly family Nymphalidae has been used to study evolutionary interactions between plants and insects. Theories of insect-hostplant dynamics predict accelerated diversification due to key innovations. In evolutionary biology, analysis of maximum credibility trees in the software MEDUSA (modelling evolutionary diversity using stepwise AIC) is a popular method for estimation of shifts in diversification rates. We investigated whether phylogenetic uncertainty can produce different results by extending the method across a random sample of trees from the posterior distribution of a Bayesian run. Using the MultiMEDUSA approach, we found that phylogenetic uncertainty greatly affects diversification rate estimates. Different trees produced diversification rates ranging from high values to almost zero for the same clade, and both significant rate increase and decrease in some clades. Only four out of 18 significant shifts found on the maximum clade credibility tree were consistent across most of the sampled trees. Among these, we found accelerated diversification for Ithomiini butterflies. We used the binary speciation and extinction model (BiSSE) and found that a hostplant shift to Solanaceae is correlated with increased net diversification rates in Ithomiini, congruent with the diffuse cospeciation hypothesis. Our results show that taking phylogenetic uncertainty into account when estimating net diversification rate shifts is of great importance, as very different results can be obtained when using the maximum clade credibility tree and other trees from the posterior distribution. PMID:25830910

  18. Diversity dynamics in Nymphalidae butterflies: effect of phylogenetic uncertainty on diversification rate shift estimates.

    Directory of Open Access Journals (Sweden)

    Carlos Peña

    The species-rich butterfly family Nymphalidae has been used to study evolutionary interactions between plants and insects. Theories of insect-hostplant dynamics predict accelerated diversification due to key innovations. In evolutionary biology, analysis of maximum credibility trees in the software MEDUSA (modelling evolutionary diversity using stepwise AIC) is a popular method for estimation of shifts in diversification rates. We investigated whether phylogenetic uncertainty can produce different results by extending the method across a random sample of trees from the posterior distribution of a Bayesian run. Using the MultiMEDUSA approach, we found that phylogenetic uncertainty greatly affects diversification rate estimates. Different trees produced diversification rates ranging from high values to almost zero for the same clade, and both significant rate increase and decrease in some clades. Only four out of 18 significant shifts found on the maximum clade credibility tree were consistent across most of the sampled trees. Among these, we found accelerated diversification for Ithomiini butterflies. We used the binary speciation and extinction model (BiSSE) and found that a hostplant shift to Solanaceae is correlated with increased net diversification rates in Ithomiini, congruent with the diffuse cospeciation hypothesis. Our results show that taking phylogenetic uncertainty into account when estimating net diversification rate shifts is of great importance, as very different results can be obtained when using the maximum clade credibility tree and other trees from the posterior distribution.

  19. T-BAS: Tree-Based Alignment Selector toolkit for phylogenetic-based placement, alignment downloads and metadata visualization: an example with the Pezizomycotina tree of life.

    Science.gov (United States)

    Carbone, Ignazio; White, James B; Miadlikowska, Jolanta; Arnold, A Elizabeth; Miller, Mark A; Kauff, Frank; U'Ren, Jana M; May, Georgiana; Lutzoni, François

    2017-04-15

    High-quality phylogenetic placement of sequence data has the potential to greatly accelerate studies of the diversity, systematics, ecology and functional biology of diverse groups. We developed the Tree-Based Alignment Selector (T-BAS) toolkit to allow evolutionary placement and visualization of diverse DNA sequences representing unknown taxa within a robust phylogenetic context, and to permit the downloading of highly curated, single- and multi-locus alignments for specific clades. In its initial form, T-BAS v1.0 uses a core phylogeny of 979 taxa (including 23 outgroup taxa, as well as 61 orders, 175 families and 496 genera) representing all 13 classes of the largest subphylum of Fungi, Pezizomycotina (Ascomycota), based on sequence alignments for six loci (nr5.8S, nrLSU, nrSSU, mtSSU, RPB1, RPB2). T-BAS v1.0 has three main uses: (i) Users may download alignments and voucher tables for members of the Pezizomycotina directly from the reference tree, facilitating systematics studies of focal clades. (ii) Users may upload sequence files with reads representing unknown taxa and place these on the phylogeny using either BLAST or phylogeny-based approaches, and then use the displayed tree to select reference taxa to include when downloading alignments. The placement of unknowns can be performed for large numbers of Sanger sequences obtained from fungal cultures and for alignable, short reads of environmental amplicons. (iii) User-customizable metadata can be visualized on the tree. T-BAS Version 1.0 is available online at http://tbas.hpc.ncsu.edu . Registration is required to access the CIPRES Science Gateway and NSF XSEDE's large computational resources. icarbon@ncsu.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  20. On the Quirks of Maximum Parsimony and Likelihood on Phylogenetic Networks

    OpenAIRE

    Bryant, Christopher; Fischer, Mareike; Linz, Simone; Semple, Charles

    2015-01-01

    Maximum parsimony is one of the most frequently-discussed tree reconstruction methods in phylogenetic estimation. However, in recent years it has become more and more apparent that phylogenetic trees are often not sufficient to describe evolution accurately. For instance, processes like hybridization or lateral gene transfer that are commonplace in many groups of organisms and result in mosaic patterns of relationships cannot be represented by a single phylogenetic tree. This is why phylogene...

  1. Ultrafast Approximation for Phylogenetic Bootstrap

    NARCIS (Netherlands)

    Bui Quang Minh; Nguyen, Thi; von Haeseler, Arndt

    Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and

  2. Polytomy identification in microbial phylogenetic reconstruction

    Directory of Open Access Journals (Sweden)

    Lin Guan

    2011-12-01

    Background A phylogenetic tree, showing ancestral relations among organisms, is commonly represented as a rooted tree with sets of bifurcating branches (dichotomies) for simplicity, although polytomies (multifurcating branches) may reflect more accurate evolutionary relationships. To represent the true evolutionary relationships, it is important to systematically identify the polytomies from a bifurcating tree and generate a taxonomy-compatible multifurcating tree. For this purpose we propose a novel approach, "PolyPhy", which classifies a set of bifurcating branches of a phylogenetic tree into a set of branches with dichotomies and polytomies by considering genome distances among genomes and tree topological properties. Results PolyPhy employs a machine learning technique, BLR (Bayesian logistic regression classifier), to identify possible bifurcating subtrees as polytomies from the trees produced by ComPhy. In addition to considering genome-scale distances between all pairs of species, PolyPhy also takes into account different properties of tree topology between dichotomy and polytomy, such as long-branch retraction and short-branch contraction, and quantifies these properties into comparable rates among different sub-branches. We extract three tree topological features, 'LR' (Leaf rate), 'IntraR' (Intra-subset branch rate) and 'InterR' (Inter-subset branch rate), all of which are calculated from bifurcating tree branch sets for classification. We have achieved an F-measure (balanced measure between precision and recall) of 81%, with an area under the ROC curve (AUC) of about 0.9. Conclusions PolyPhy is a fast and robust method to identify polytomies from phylogenetic trees based on genome-wide inference of evolutionary relationships among genomes. The software package and test data can be downloaded from http://digbio.missouri.edu/ComPhy/phyloTreeBiNonBi-1.0.zip.
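    For reference, the balanced F-measure quoted above is, in its usual form, the harmonic mean of precision P and recall R. Treating polytomies as the positive class (an assumption about how the counts were defined), with TP, FP and FN the numbers of true positives, false positives and false negatives:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F = \frac{2PR}{P + R}
```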

  3. Effect of site-specific heterogeneous evolution on phylogenetic reconstruction: a simple evaluation.

    Science.gov (United States)

    Cheng, Qiqun; Su, Zhixi; Zhong, Yang; Gu, Xun

    2009-07-15

    Recent studies have shown that heterogeneous evolution may mislead phylogenetic analysis, an effect that has long been neglected. We evaluate the effect of heterogeneous evolution on phylogenetic analysis, using 18 fish mitogenomic coding sequences as an example. Using the software DIVERGE, we identify 198 amino acid sites that have experienced heterogeneous evolution. After these sites are removed, the remaining sites are shown to be virtually homogeneous in evolutionary rate. There are some differences between the phylogenetic trees built with heterogeneous sites ("before tree") and without them ("after tree"). Our study demonstrates that, for phylogenetic reconstruction, an effective approach is to identify and remove sites with heterogeneous evolution, and suggests that researchers can use the software DIVERGE to remove the influence of heterogeneous evolution before reconstructing phylogenetic trees.

  4. EM for phylogenetic topology reconstruction on nonhomogeneous data.

    Science.gov (United States)

    Ibáñez-Marcelo, Esther; Casanellas, Marta

    2014-06-17

    The reconstruction of the phylogenetic tree topology of four taxa is still one of the main challenges in phylogenetics. Its difficulties lie in using evolutionary models that are not too restrictive and in correctly dealing with the long-branch attraction problem. The correct reconstruction of 4-taxon trees is crucial for making quartet-based methods work and being able to recover large phylogenies. We adapt the well-known expectation-maximization algorithm to evolutionary Markov models on phylogenetic 4-taxon trees. We then use this algorithm to estimate the substitution parameters, compute the corresponding likelihood, and infer the most likely quartet. In this paper we consider an expectation-maximization method for maximizing the likelihood of (time-nonhomogeneous) evolutionary Markov models on trees. We study its success at reconstructing 4-taxon topologies and its performance as an input method in quartet-based phylogenetic reconstruction methods such as QFIT and QuartetSuite. Our results show that the method proposed here outperforms neighbor-joining and the usual (time-homogeneous continuous-time) maximum likelihood methods on 4-leaved trees with among-lineage instantaneous rate heterogeneity, and performs similarly to the usual continuous-time maximum likelihood when the data satisfy the assumptions of both methods. The method presented in this paper is well suited for reconstructing the topology of any number of taxa via quartet-based methods and is highly accurate, especially for largely divergent trees and time-nonhomogeneous data.

  5. Maximum Parsimony on Phylogenetic networks

    Science.gov (United States)

    2012-01-01

    Background Phylogenetic networks are generalizations of phylogenetic trees that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned to the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past. Results In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network, and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as the Sankoff and Fitch algorithms, extend naturally to networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied to any cost matrix, including ones with unequal costs for substituting between different characters along different edges of the network. We analyzed this on experimental data with 10 or fewer leaves and at most 2 reticulations and found that, for almost all networks, the bounds returned by the heuristics matched the exhaustively determined optimum parsimony scores. Conclusion The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks. 
The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are
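For reference, the tree version of the Fitch algorithm mentioned above can be sketched in a few lines. This minimal sketch covers a single character on a rooted binary tree with unit substitution costs. It does not handle reticulate vertices (the network extension described in the abstract), and it does not implement Sankoff's algorithm for unequal cost matrices. The tree and leaf states are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree as nested pairs; leaves are taxon names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Bottom-up Fitch pass for one character with unit substitution costs.

    Returns the candidate state set at `tree` and the minimum number of
    state changes needed within that subtree.
    """
    if isinstance(tree, str):
        return frozenset({states[tree]}), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children can agree on a state: no extra change is needed here.
        return shared, left_cost + right_cost
    # Disjoint sets force one change on one of the two edges to the children.
    return left_set | right_set, left_cost + right_cost + 1


tree = (("A", "B"), ("C", "D"))
print(fitch(tree, {"A": "a", "B": "c", "C": "g", "D": "t"})[1])  # prints 3
```

The parsimony score of a whole alignment on a tree is the sum of these per-character scores.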

  6. Phylogenetic Position of Barbus lacerta Heckel, 1843

    Directory of Open Access Journals (Sweden)

    Mustafa Korkmaz

    2015-11-01

    As a result, five clades emerged from the phylogenetic reconstruction, and in the phylogenetic tree Barbus lacerta was determined to be the sister group of the Barbus macedonicus, Barbus oligolepis and Barbus plebejus complex.

  7. Interactive wood combustion for botanical tree models

    KAUST Repository

    Pirk, Sören

    2017-11-22

    We present a novel method for the combustion of botanical tree models. Tree models are represented as connected particles for the branching structure and a polygonal surface mesh for the combustion. Each particle stores biological and physical attributes that drive the kinetic behavior of a plant and the exothermic reaction of the combustion. Coupled with realistic physics for rods, the particles enable dynamic branch motions. We model material properties, such as moisture and charring behavior, and associate them with individual particles. The combustion is efficiently processed in the surface domain of the tree model on a polygonal mesh. A user can dynamically interact with the model by initiating fires and by inducing stress on branches. The flames realistically propagate through the tree model by consuming the available resources. Our method runs at interactive rates and supports multiple tree instances in parallel. We demonstrate the effectiveness of our approach through numerous examples and evaluate its plausibility against the combustion of real wood samples.

  8. Investigating how students communicate tree-thinking

    Science.gov (United States)

    Boyce, Carrie Jo

    Learning is often an active endeavor that requires students to work at building conceptual understandings of complex topics. Personal experiences, ideas, and communication all play large roles in developing knowledge and understanding of complex topics. Sometimes these experiences can promote the formation of scientifically inaccurate or incomplete ideas. Representations are tools used to help individuals understand complex topics. In biology, one way that educators help people understand the evolutionary histories of organisms is by using representations called phylogenetic trees. In order to understand phylogenetic trees, individuals need to understand the conventions associated with phylogenies. My dissertation, supported by the Tree-Thinking Representational Competence and Word Association frameworks, is a mixed-methods study investigating the changes in students' tree-reading, representational competence and mental association of phylogenetic terminology after participation in varied instruction. Participants included 128 introductory biology majors from a mid-sized southern research university. Participants were enrolled in either Introductory Biology I, where they were not taught phylogenetics, or Introductory Biology II, where they were explicitly taught phylogenetics. I collected data using a pre- and post-assessment consisting of a word association task and tree-thinking diagnostic (n=128). Additionally, I recruited a subset of students from both courses (n=37) to complete a computer simulation designed to teach students about phylogenetic trees. I then conducted semi-structured interviews consisting of a word association exercise with card sort task, a retrospective pre-assessment discussion, a post-assessment discussion, and interview questions. I found that students who received explicit lecture instruction had a significantly higher increase in scores on a tree-thinking diagnostic than students who did not receive lecture instruction. 
Students who received both

  9. Phylogenetic networks do not need to be complex: using fewer reticulations to represent conflicting clusters

    NARCIS (Netherlands)

    Iersel, van L.J.J.; Kelk, S.M.; Rupp, R.; Huson, D.H.

    2010-01-01

    Phylogenetic trees are widely used to display estimates of how groups of species have evolved. Each phylogenetic tree can be seen as a collection of clusters, subgroups of the species that evolved from a common ancestor. When phylogenetic trees are obtained for several datasets (e.g. for different

  10. A program to compute the soft Robinson-Foulds distance between phylogenetic networks.

    Science.gov (United States)

    Lu, Bingxin; Zhang, Louxin; Leong, Hon Wai

    2017-03-14

    Over the past two decades, phylogenetic networks have been studied to model reticulate evolutionary events. The relationships among phylogenetic networks, phylogenetic trees and clusters serve as the basis for reconstruction and comparison of phylogenetic networks. To understand these relationships, two problems are raised: the tree containment problem, which asks whether a phylogenetic tree is displayed in a phylogenetic network, and the cluster containment problem, which asks whether a cluster is represented at a node in a phylogenetic network. Both problems are NP-complete. A fast exponential-time algorithm for the cluster containment problem on arbitrary networks is developed and implemented in C. The resulting program is further extended into a computer program for fast computation of the Soft Robinson-Foulds distance between phylogenetic networks. These two programs facilitate the reconstruction and validation of phylogenetic network models in evolutionary and comparative genomics. Our simulation tests indicated that they are fast enough for use in practice. Additionally, our simulation data indicate that the distribution of the Soft Robinson-Foulds distance between phylogenetic networks is unlikely to be normal.
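On networks, cluster containment is NP-complete. On an ordinary rooted tree, however, it reduces to a set-membership test, and comparing two trees' cluster sets gives the cluster-based Robinson-Foulds distance that the network version builds on. The sketch below covers only that tree case. It is not the authors' C program and does not compute the soft Robinson-Foulds distance. Trees are written as hypothetical nested tuples.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree as nested tuples (any arity); leaves are taxon names.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the cluster (set of descendant leaves) of every internal node."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset({node})
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Cluster-based Robinson-Foulds distance: clusters found in exactly one tree."""
    return len(clusters(t1) ^ clusters(t2))


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
print(frozenset({"A", "B"}) in clusters(t1))  # cluster containment in a tree: True
print(rf_distance(t1, t2))  # prints 4
```

The root cluster (all taxa) appears in both trees and therefore never contributes to the distance.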

  11. DendroPy: a Python library for phylogenetic computing.

    Science.gov (United States)

    Sukumaran, Jeet; Holder, Mark T

    2010-06-15

    DendroPy is a cross-platform library for the Python programming language that provides for object-oriented reading, writing, simulation and manipulation of phylogenetic data, with an emphasis on phylogenetic tree operations. DendroPy uses a splits-hash mapping to perform rapid calculations of tree distances, similarities and shape under various metrics. It contains rich simulation routines to generate trees under a number of different phylogenetic and coalescent models. DendroPy's data simulation and manipulation facilities, in conjunction with its support of a broad range of phylogenetic data formats (NEXUS, Newick, PHYLIP, FASTA, NeXML, etc.), allow it to serve a useful role in various phyloinformatics and phylogeographic pipelines. The stable release of the library is available for download and automated installation through the Python Package Index site (http://pypi.python.org/pypi/DendroPy), while the active development source code repository is available to the public from GitHub (http://github.com/jeetsukumaran/DendroPy).
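The "splits-hash" idea can be illustrated without DendroPy. Each split (bipartition) of an unrooted tree is encoded as an integer bitmask over a fixed taxon order, so comparing trees becomes fast set arithmetic on integers. The plain-Python sketch below does not reproduce DendroPy's implementation or API. For real work, DendroPy's `dendropy.calculate.treecompare` module (e.g. `symmetric_difference`, which, as we understand its documentation, expects both trees to share a `TaxonNamespace`) is the appropriate tool. The example trees are hypothetical.

```python
from typing import Dict, Sequence, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def split_masks(tree: Tree, index: Dict[str, int]) -> Set[int]:
    """Encode the non-trivial splits of a tree as integer bitmasks.

    Bit i stands for the taxon with index[taxon] == i. Each split is stored in
    canonical form (the side NOT containing taxon 0), so the result does not
    depend on where the input tree happens to be rooted.
    """
    n = len(index)
    full = (1 << n) - 1
    splits: Set[int] = set()

    def walk(node: Tree) -> int:
        if isinstance(node, str):
            return 1 << index[node]
        mask = 0
        for child in node:
            mask |= walk(child)
        canon = mask ^ full if mask & 1 else mask
        size = bin(canon).count("1")
        if 1 < size < n - 1:  # skip trivial splits and the empty/full set
            splits.add(canon)
        return mask

    walk(tree)
    return splits


def unrooted_rf(t1: Tree, t2: Tree, taxa: Sequence[str]) -> int:
    """Unweighted Robinson-Foulds distance via symmetric difference of split sets."""
    index = {name: i for i, name in enumerate(taxa)}
    return len(split_masks(t1, index) ^ split_masks(t2, index))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t1_rerooted = ("A", ("B", ("C", ("D", "E"))))
print(unrooted_rf(t1, t2, "ABCDE"))  # prints 2
print(unrooted_rf(t1, t1_rerooted, "ABCDE"))  # prints 0
```

Canonicalizing each split to the side without taxon 0 is what makes two different rootings of the same unrooted tree hash to identical split sets.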

  12. Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance

    Science.gov (United States)

    2013-01-01

    Background Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation. Results We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees. Conclusions Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets. Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets, in which many processes such as deep coalescence, recombination, gene duplication and losses as

  13. The Fair Proportion is a Shapley Value on phylogenetic networks too

    OpenAIRE

    Coronado, Tomás M.; Riera, Gabriel; Rosselló, Francesc

    2018-01-01

    The Fair Proportion of a species in a phylogenetic tree is a very simple measure that has been used to assess its value relative to the overall phylogenetic diversity represented by the tree. It has recently been proved by Fuchs and Jin to be equal to the Shapley Value of the coalitional game that sends each subset of species to its rooted Phylogenetic Diversity in the tree. We prove in this paper that this result extends to the natural translations of the Fair Proportion and the rooted Phyl...
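In the usual definition of the Fair Proportion index on a rooted tree, each edge's length is split equally among the leaves below it, and a species' score is the sum of its shares along the path from the root. The minimal sketch below computes the index for trees only, not for networks, and does not compute Shapley values. The example tree, (A:1,(B:1,C:3):2); in Newick notation, is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    length: float = 0.0  # length of the edge leading into this node
    name: str = ""  # taxon label; only meaningful for leaves
    children: List["Node"] = field(default_factory=list)


def leaf_names(node: Node) -> List[str]:
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


def fair_proportion(root: Node) -> Dict[str, float]:
    """Fair Proportion index: each edge's length is shared equally by the
    leaves below it; a leaf's score sums its shares along the root path."""
    scores: Dict[str, float] = {}

    def walk(node: Node, inherited: float) -> None:
        share = inherited + node.length / len(leaf_names(node))
        if not node.children:
            scores[node.name] = share
        for child in node.children:
            walk(child, share)

    walk(root, 0.0)
    return scores


# Hypothetical tree (A:1,(B:1,C:3):2); the root itself has no incoming edge.
tree = Node(children=[
    Node(length=1.0, name="A"),
    Node(length=2.0, children=[Node(length=1.0, name="B"), Node(length=3.0, name="C")]),
])
fp = fair_proportion(tree)
print(fp)  # {'A': 1.0, 'B': 2.0, 'C': 4.0}
```

Since every edge length is fully distributed among its descendant leaves, the scores always sum to the total branch length, i.e. the rooted Phylogenetic Diversity of all species (7 here).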

  14. Wood nitrogen concentrations in tropical trees: phylogenetic patterns and ecological correlates.

    Science.gov (United States)

    Martin, Adam R; Erickson, David L; Kress, W John; Thomas, Sean C

    2014-11-01

    In tropical and temperate trees, wood chemical traits are hypothesized to covary with species' life-history strategy along a 'wood economics spectrum' (WES), but evidence supporting these expected patterns remains scarce. Due to its role in nutrient storage, we hypothesize that wood nitrogen (N) concentration will covary along the WES, being higher in slow-growing species with high wood density (WD), and lower in fast-growing species with low WD. In order to test this hypothesis we quantified wood N concentrations in 59 Panamanian hardwood species, and used this dataset to examine ecological correlates and phylogenetic patterns of wood N. Wood N varied > 14-fold among species between 0.04 and 0.59%; closely related species were more similar in wood N than expected by chance. Wood N was positively correlated with WD, and negatively correlated with log-transformed relative growth rates, although these relationships were relatively weak. We found evidence for co-evolution between wood N and both WD and log-transformed mortality rates. Our study provides evidence that wood N covaries with tree life-history parameters, and that these patterns consistently co-evolve in tropical hardwoods. These results provide some support for the hypothesized WES, and suggest that wood is an increasingly important N pool through tropical forest succession. © 2014 The Authors. New Phytologist © 2014 New Phytologist Trust.

  15. A roadmap for global synthesis of the plant tree of life.

    Science.gov (United States)

    Eiserhardt, Wolf L; Antonelli, Alexandre; Bennett, Dominic J; Botigué, Laura R; Burleigh, J Gordon; Dodsworth, Steven; Enquist, Brian J; Forest, Félix; Kim, Jan T; Kozlov, Alexey M; Leitch, Ilia J; Maitner, Brian S; Mirarab, Siavash; Piel, William H; Pérez-Escobar, Oscar A; Pokorny, Lisa; Rahbek, Carsten; Sandel, Brody; Smith, Stephen A; Stamatakis, Alexandros; Vos, Rutger A; Warnow, Tandy; Baker, William J

    2018-03-01

    Providing science and society with an integrated, up-to-date, high quality, open, reproducible and sustainable plant tree of life would be a huge service that is now coming within reach. However, synthesizing the growing body of DNA sequence data in the public domain and disseminating the trees to a diverse audience are often not straightforward due to numerous informatics barriers. While big synthetic plant phylogenies are being built, they remain static and become quickly outdated as new data are published and tree-building methods improve. Moreover, the body of existing phylogenetic evidence is hard to navigate and access for non-experts. We propose that our community of botanists, tree builders, and informaticians should converge on a modular framework for data integration and phylogenetic analysis, allowing easy collaboration, updating, data sourcing and flexible analyses. With support from major institutions, this pipeline should be re-run at regular intervals, storing trees and their metadata long-term. Providing the trees to a diverse global audience through user-friendly front ends and application development interfaces should also be a priority. Interactive interfaces could be used to solicit user feedback and thus improve data quality and to coordinate the generation of new data. We conclude by outlining a number of steps that we suggest the scientific community should take to achieve global phylogenetic synthesis. © 2018 Botanical Society of America.

  16. Ant-Based Phylogenetic Reconstruction (ABPR): A new distance algorithm for phylogenetic estimation based on ant colony optimization

    Directory of Open Access Journals (Sweden)

    Karla Vittori

    2008-12-01

    We propose a new distance algorithm for phylogenetic estimation based on Ant Colony Optimization (ACO), named Ant-Based Phylogenetic Reconstruction (ABPR). ABPR joins two taxa iteratively based on the evolutionary distance among sequences, while also accounting for the quality of the phylogenetic tree built, as measured by the total length of the tree. Like optimization algorithms for phylogenetic estimation, the algorithm allows exploration of a larger set of nearly optimal solutions. We applied the algorithm to four empirical data sets of mitochondrial DNA ranging from 12 to 186 sequences and from 898 to 16,608 base pairs, and covering taxonomic levels from populations to orders. We show that ABPR performs better than the commonly used Neighbor-Joining algorithm, except when sequences are too closely related (e.g., population-level sequences). The phylogenetic relationships recovered at and above species level by ABPR agree with conventional views. However, like other algorithms of phylogenetic estimation, the proposed algorithm failed to recover expected relationships when distances are too similar or when rates of evolution are very variable, leading to the problem of long-branch attraction. ABPR, as well as other ACO-based algorithms, is emerging as a fast and accurate alternative method of phylogenetic estimation for large data sets.
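ABPR itself is not reproduced here. For comparison, the sketch below shows one agglomeration step of its Neighbor-Joining baseline, which also joins two taxa at a time from a distance matrix. The pair minimizing the Q-criterion is joined, branch lengths to the new node are estimated, and the matrix is reduced. The five-taxon distance matrix is hypothetical textbook-style data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Dist = Dict[FrozenSet[str], float]  # symmetric distances keyed by taxon pairs


def nj_join(taxa: List[str], dist: Dist) -> Tuple[str, str, float, float, List[str], Dist]:
    """One neighbor-joining step.

    Joins the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), with
    r(x) = sum_k d(x, k). Returns (i, j, length i-u, length j-u, new taxa,
    new distances), where u is the new internal node replacing i and j.
    """
    n = len(taxa)
    if n < 3:
        raise ValueError("need at least three taxa")

    def d(a: str, b: str) -> float:
        return dist[frozenset((a, b))]

    r = {x: sum(d(x, k) for k in taxa if k != x) for x in taxa}
    i, j = min(combinations(taxa, 2), key=lambda p: (n - 2) * d(*p) - r[p[0]] - r[p[1]])
    li = d(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
    lj = d(i, j) - li
    u = f"({i},{j})"
    rest = [k for k in taxa if k not in (i, j)]
    new_dist = {key: v for key, v in dist.items() if i not in key and j not in key}
    for k in rest:
        new_dist[frozenset((u, k))] = (d(i, k) + d(j, k) - d(i, j)) / 2
    return i, j, li, lj, rest + [u], new_dist


pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8, ("b", "c"): 10,
         ("b", "d"): 10, ("b", "e"): 9, ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(p): float(v) for p, v in pairs.items()}
i, j, li, lj, taxa, dist = nj_join(list("abcde"), dist)
print(i, j, li, lj)  # prints a b 2.0 3.0
```

Repeating the step until three nodes remain yields the full NJ tree. NJ joins greedily on the Q-criterion alone, whereas the abstract describes ABPR as also weighing the total length of the tree being built.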

  17. A reconstruction problem for a class of phylogenetic networks with lateral gene transfers.

    Science.gov (United States)

    Cardona, Gabriel; Pons, Joan Carles; Rosselló, Francesc

    2015-01-01

    Lateral, or Horizontal, Gene Transfers are a type of asymmetric evolutionary events where genetic material is transferred from one species to another. In this paper we consider LGT networks, a general model of phylogenetic networks with lateral gene transfers which consist, roughly, of a principal rooted tree with its leaves labelled on a set of taxa, and a set of extra secondary arcs between nodes in this tree representing lateral gene transfers. An LGT network gives rise in a natural way to a principal phylogenetic subtree and a set of secondary phylogenetic subtrees, which, roughly, represent, respectively, the main line of evolution of most genes and the secondary lines of evolution through lateral gene transfers. We introduce a set of simple conditions on an LGT network that guarantee that its principal and secondary phylogenetic subtrees are pairwise different and that these subtrees determine, up to isomorphism, the LGT network. We then give an algorithm that, given a set of pairwise different phylogenetic trees [Formula: see text] on the same set of taxa, outputs, when it exists, the LGT network that satisfies these conditions and such that its principal phylogenetic tree is [Formula: see text] and its secondary phylogenetic trees are [Formula: see text].

  18. Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum.

    Science.gov (United States)

    Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu

    2016-12-01

    The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing a dichotomy with two different clusters of SODs inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2]; Ramachandran plots of G. raimondii and G. arboreum SODs; the protein sequences used to generate the 3D structures of the proteins and the template accessions via the SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information." [3]; and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].

  19. Phylogenetic classification of bony fishes.

    Science.gov (United States)

    Betancur-R, Ricardo; Wiley, Edward O; Arratia, Gloria; Acero, Arturo; Bailly, Nicolas; Miya, Masaki; Lecointre, Guillaume; Ortí, Guillermo

    2017-07-06

    Fish classifications, like those of most other taxonomic groups, are being transformed drastically as new molecular phylogenies provide support for natural groups that were unanticipated by previous studies. A brief review of the main criteria used by ichthyologists to define their classifications during the last 50 years, however, reveals slow progress towards using an explicit phylogenetic framework. Instead, the trend has been to rely, in varying degrees, on deep-rooted anatomical concepts and authority, often mixing taxa with explicit phylogenetic support with arbitrary groupings. Two leading sources in ichthyology frequently used for fish classifications (JS Nelson's volumes of Fishes of the World and W. Eschmeyer's Catalog of Fishes) fail to adopt a global phylogenetic framework despite much recent progress made towards the resolution of the fish Tree of Life. The first explicit phylogenetic classification of bony fishes was published in 2013, based on a comprehensive molecular phylogeny ( www.deepfin.org ). We here update the first version of that classification by incorporating the most recent phylogenetic results. The updated classification presented here is based on phylogenies inferred using molecular and genomic data for nearly 2000 fishes. A total of 72 orders (and 79 suborders) are recognized in this version, compared with 66 orders in version 1. The phylogeny resolves placement of 410 families, or ~80% of the total of 514 families of bony fishes currently recognized. The ordinal status of 30 percomorph families included in this study, however, remains uncertain (incertae sedis in the series Carangaria, Ovalentaria, or Eupercaria). Comments to support taxonomic decisions and comparisons with conflicting taxonomic groups proposed by others are presented. We also highlight cases where morphological support exists for the groups being classified. This version of the phylogenetic classification of bony fishes is substantially improved, providing resolution

  20. Phylogenetic rooting using minimal ancestor deviation.

    Science.gov (United States)

    Tria, Fernando Domingues Kümmel; Landan, Giddy; Dagan, Tal

    2017-06-19

    Ancestor-descendant relations play a cardinal role in evolutionary theory. Those relations are determined by rooting phylogenetic trees. Existing rooting methods are hampered by evolutionary rate heterogeneity or the unavailability of auxiliary phylogenetic information. Here we present a rooting approach, the minimal ancestor deviation (MAD) method, which accommodates heterotachy by using all pairwise topological and metric information in unrooted trees. We demonstrate the performance of the method, in comparison to existing rooting methods, by the analysis of phylogenies from eukaryotes and prokaryotes. MAD correctly recovers the known root of eukaryotes and uncovers evidence for the origin of cyanobacteria in the ocean. MAD is more robust and consistent than existing methods, provides measures of the root inference quality and is applicable to any tree with branch lengths.

  1. Synthesis of phylogeny and taxonomy into a comprehensive tree of life

    Science.gov (United States)

    Hinchliff, Cody E.; Smith, Stephen A.; Allman, James F.; Burleigh, J. Gordon; Chaudhary, Ruchi; Coghill, Lyndon M.; Crandall, Keith A.; Deng, Jiabin; Drew, Bryan T.; Gazis, Romina; Gude, Karl; Hibbett, David S.; Katz, Laura A.; Laughinghouse, H. Dail; McTavish, Emily Jane; Midford, Peter E.; Owen, Christopher L.; Ree, Richard H.; Rees, Jonathan A.; Soltis, Douglas E.; Williams, Tiffani; Cranston, Karen A.

    2015-01-01

    Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips—the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics. PMID:26385966

  2. A Practical Algorithm for Reconstructing Level-1 Phylogenetic Networks

    NARCIS (Netherlands)

    K.T. Huber; L.J.J. van Iersel (Leo); S.M. Kelk (Steven); R. Suchecki

    2010-01-01

    Recently, much attention has been devoted to the construction of phylogenetic networks which generalize phylogenetic trees in order to accommodate complex evolutionary processes. Here we present an efficient, practical algorithm for reconstructing level-1 phylogenetic networks - a type of

  3. A practical algorithm for reconstructing level-1 phylogenetic networks

    NARCIS (Netherlands)

    Huber, K.T.; Iersel, van L.J.J.; Kelk, S.M.; Suchecki, R.

    2011-01-01

    Recently, much attention has been devoted to the construction of phylogenetic networks which generalize phylogenetic trees in order to accommodate complex evolutionary processes. Here, we present an efficient, practical algorithm for reconstructing level-1 phylogenetic networks-a type of network

  4. Robustness of ancestral sequence reconstruction to phylogenetic uncertainty.

    Science.gov (United States)

    Hanson-Smith, Victor; Kolaczkowski, Bryan; Thornton, Joseph W

    2010-09-01

    Ancestral sequence reconstruction (ASR) is widely used to formulate and test hypotheses about the sequences, functions, and structures of ancient genes. Ancestral sequences are usually inferred from an alignment of extant sequences using a maximum likelihood (ML) phylogenetic algorithm, which calculates the most likely ancestral sequence assuming a probabilistic model of sequence evolution and a specific phylogeny--typically the tree with the ML. The true phylogeny is seldom known with certainty, however. ML methods ignore this uncertainty, whereas Bayesian methods incorporate it by integrating the likelihood of each ancestral state over a distribution of possible trees. It is not known whether Bayesian approaches to phylogenetic uncertainty improve the accuracy of inferred ancestral sequences. Here, we use simulation-based experiments under both simplified and empirically derived conditions to compare the accuracy of ASR carried out using ML and Bayesian approaches. We show that incorporating phylogenetic uncertainty by integrating over topologies very rarely changes the inferred ancestral state and does not improve the accuracy of the reconstructed ancestral sequence. Ancestral state reconstructions are robust to uncertainty about the underlying tree because the conditions that produce phylogenetic uncertainty also make the ancestral state identical across plausible trees; conversely, the conditions under which different phylogenies yield different inferred ancestral states produce little or no ambiguity about the true phylogeny. Our results suggest that ML can produce accurate ASRs, even in the face of phylogenetic uncertainty. Using Bayesian integration to incorporate this uncertainty is neither necessary nor beneficial.
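The ML approach described above conditions on a single tree and computes ancestral-state probabilities under a substitution model. The minimal sketch below computes the marginal posterior of the state at the root for one nucleotide site, using Felsenstein's pruning algorithm. It assumes the Jukes-Cantor (JC69) model, a fixed three-leaf tree with hypothetical branch lengths and leaf states, and reconstructs only the root. The Bayesian integration over topologies compared in the study is not shown.

```python
import math
from typing import Dict, List, Tuple, Union

BASES = "ACGT"
# node = (taxon_name, branch_length) for a leaf, or ([child nodes], branch_length);
# branch_length is the length of the edge above the node (ignored at the root).
Node = Tuple[Union[str, List["Node"]], float]


def jc69(x: str, y: str, t: float) -> float:
    """Jukes-Cantor probability of ending in state y after branch length t, starting at x."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def conditional(node: Node, site: Dict[str, str]) -> Dict[str, float]:
    """Felsenstein pruning: P(leaf data below node | node is in state x)."""
    label, _ = node
    if isinstance(label, str):
        return {x: 1.0 if x == site[label] else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child in label:
        below = conditional(child, site)
        t = child[1]
        for x in BASES:
            result[x] *= sum(jc69(x, y, t) * below[y] for y in BASES)
    return result


def root_posterior(tree: Node, site: Dict[str, str]) -> Dict[str, float]:
    """Marginal posterior of each root state, using JC69's uniform base frequencies."""
    cond = conditional(tree, site)
    joint = {x: 0.25 * cond[x] for x in BASES}
    likelihood = sum(joint.values())  # likelihood of this site on this tree
    return {x: p / likelihood for x, p in joint.items()}


tree = ([("A", 0.1), ("B", 0.1), ("C", 0.2)], 0.0)
post = root_posterior(tree, {"A": "G", "B": "G", "C": "T"})
print(max(post, key=post.get))  # prints G
```

Repeating this over all alignment sites, and picking the most probable state at each, gives a marginal ML ancestral sequence for the root.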

  5. Species trees from consensus single nucleotide polymorphism (SNP) data: Testing phylogenetic approaches with simulated and empirical data.

    Science.gov (United States)

    Schmidt-Lebuhn, Alexander N; Aitken, Nicola C; Chuah, Aaron

    2017-11-01

    Datasets of hundreds or thousands of SNPs (Single Nucleotide Polymorphisms) from multiple individuals per species are increasingly used to study population structure, species delimitation and shallow phylogenetics. The principal software tool to infer species or population trees from SNP data is currently the BEAST template SNAPP which uses a Bayesian coalescent analysis. However, it is computationally extremely demanding and tolerates only small amounts of missing data. We used simulated and empirical SNPs from plants (Australian Craspedia, Asteraceae, and Pelargonium, Geraniaceae) to compare species trees produced (1) by SNAPP, (2) using SVD quartets, and (3) using Bayesian and parsimony analysis with several different approaches to summarising data from multiple samples into one set of traits per species. Our aims were to explore the impact of tree topology and missing data on the results, and to test which data summarising and analyses approaches would best approximate the results obtained from SNAPP for empirical data. SVD quartets retrieved the correct topology from simulated data, as did SNAPP except in the case of a very unbalanced phylogeny. Both methods failed to retrieve the correct topology when large amounts of data were missing. Bayesian analysis of species level summary data scoring the two alleles of each SNP as independent characters and parsimony analysis of data scoring each SNP as one character produced trees with branch length distributions closest to the true trees on which SNPs were simulated. For empirical data, Bayesian inference and Dollo parsimony analysis of data scored allele-wise produced phylogenies most congruent with the results of SNAPP. In the case of study groups divergent enough for missing data to be phylogenetically informative (because of additional mutations preventing amplification of genomic fragments or bioinformatic establishment of homology), scoring of SNP data as a presence/absence matrix irrespective of allele

  6. TreeRipper web application: towards a fully automated optical tree recognition software

    Directory of Open Access Journals (Sweden)

    Hughes Joseph

    2011-05-01

    Background: Relationships between species, genes and genomes have been printed as trees for over a century. Whilst this may have been the best format for exchanging and sharing phylogenetic hypotheses during the 20th century, the worldwide web now provides faster and automated ways of transferring and sharing phylogenetic knowledge. However, novel software is needed to defrost these published phylogenies for the 21st century. Results: TreeRipper is a simple website for the fully automated recognition of multifurcating phylogenetic trees (http://linnaeus.zoology.gla.ac.uk/~jhughes/treeripper/). The program accepts a range of input image formats (PNG, JPG/JPEG or GIF). The underlying command-line C++ program follows a number of cleaning steps to detect lines, remove node labels, patch up broken lines and corners, and detect line edges. The edge contour is then determined to detect the branch lengths, tip label positions and the topology of the tree. Optical Character Recognition (OCR) is used to convert the tip labels into text with the freely available tesseract-ocr software. Of the images meeting the prerequisites for TreeRipper, 32% were successfully recognised; the largest tree had 115 leaves. Conclusions: Although the diversity of ways in which phylogenies have been illustrated makes the design of fully automated tree recognition software difficult, TreeRipper is a step towards automating the digitization of past phylogenies. We also provide a dataset of 100 tree images and associated tree files for training and/or benchmarking future software. TreeRipper is an open source project licensed under the GNU General Public Licence v3.

  7. PALM: a paralleled and integrated framework for phylogenetic inference with automatic likelihood model selectors.

    Directory of Open Access Journals (Sweden)

    Shu-Hwa Chen

    BACKGROUND: Selecting an appropriate substitution model and deriving a tree topology for a given sequence set are essential in phylogenetic analysis. However, such time-consuming, computationally intensive tasks rely on knowledge of substitution model theories and related expertise to run through all possible combinations of several separate programs. To ensure a thorough and efficient analysis and avoid tedious manipulation of various programs, this work presents an intuitive framework, phylogenetic reconstruction with automatic likelihood model selectors (PALM), with up-to-date algorithms and a best-fit model selection mechanism for seamless phylogenetic analysis. METHODOLOGY: As an integrated framework of ClustalW, PhyML, MODELTEST, ProtTest, and several in-house programs, PALM evaluates the fit of 56 substitution models for nucleotide sequences and 112 substitution models for protein sequences, scoring them under various criteria. The input for PALM can be either sequences in FASTA format or a sequence alignment file in PHYLIP format. To accelerate the computation of maximum likelihood and bootstrapping, this work integrates MPICH2/PhyML, PalmMonitor and Palm job controller across several machines with multiple processors and adopts the task parallelism approach. Moreover, an intuitive and interactive web component, PalmTree, is developed for displaying and operating on the output tree, with options for tree rooting, branch swapping, viewing branch length values and bootstrap scores, and removing nodes to restart the analysis iteratively. SIGNIFICANCE: The workflow of PALM is straightforward and coherent. Via a succinct, user-friendly interface, researchers unfamiliar with phylogenetic analysis can easily use this server to submit sequences, retrieve the output, and re-submit a job based on a previous result if some sequences are to be deleted or added for phylogenetic reconstruction. 
PALM results in an inference of
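
The abstract says PALM scores candidate substitution models "under various criteria" via MODELTEST and ProtTest but does not list them. As an assumption, the sketch below uses the three criteria those tools commonly report (AIC, AICc, BIC) and ranks models from their maximised log-likelihoods; the models, log-likelihoods and parameter counts in the example are hypothetical numbers, not PALM output.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Fit:
    name: str       # substitution model label (hypothetical examples below)
    log_lik: float  # maximised log-likelihood reported by the ML program
    k: int          # number of free parameters counted for this model


def aic(f: Fit) -> float:
    return 2 * f.k - 2 * f.log_lik


def aicc(f: Fit, n: int) -> float:
    # Second-order (small-sample) correction; n is the sample size,
    # commonly taken as the number of alignment sites.
    return aic(f) + 2 * f.k * (f.k + 1) / (n - f.k - 1)


def bic(f: Fit, n: int) -> float:
    return f.k * math.log(n) - 2 * f.log_lik


def rank_models(fits: List[Fit], n: int, criterion: str = "AIC") -> List[str]:
    """Model names ordered best-first (lowest score) under the chosen criterion."""
    scorers = {
        "AIC": aic,
        "AICc": lambda f: aicc(f, n),
        "BIC": lambda f: bic(f, n),
    }
    return [f.name for f in sorted(fits, key=scorers[criterion])]
```

With `Fit("JC", -5000.0, 0)`, `Fit("HKY", -4900.0, 4)` and `Fit("GTR", -4893.0, 8)` on 1000 sites, AIC prefers GTR while BIC, which penalises extra parameters more heavily, prefers HKY. This kind of disagreement is one reason tools report several criteria.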

  8. Phylogenetic classification and the universal tree.

    Science.gov (United States)

    Doolittle, W F

    1999-06-25

    From comparative analyses of the nucleotide sequences of genes encoding ribosomal RNAs and several proteins, molecular phylogeneticists have constructed a "universal tree of life," taking it as the basis for a "natural" hierarchical classification of all living things. Although confidence in some of the tree's early branches has recently been shaken, new approaches could still resolve many methodological uncertainties. More challenging is evidence that most archaeal and bacterial genomes (and the inferred ancestral eukaryotic nuclear genome) contain genes from multiple sources. If "chimerism" or "lateral gene transfer" cannot be dismissed as trivial in extent or limited to special categories of genes, then no hierarchical universal classification can be taken as natural. Molecular phylogeneticists will have failed to find the "true tree," not because their methods are inadequate or because they have chosen the wrong genes, but because the history of life cannot properly be represented as a tree. However, taxonomies based on molecular sequences will remain indispensable, and understanding of the evolutionary process will ultimately be enriched, not impoverished.

  9. Physiology and Genetics of Tree-Phytophage Interactions

    Science.gov (United States)

    Frances Lieutier; William J. Mattson; Michael R. Wagner

    1999-01-01

    Interactions between trees and phytophagous organisms represent an important fundamental process in the evolution of forest ecosystems. Through evolutionary time, the special traits of trees have led herbivore populations to differentiate and evolve in order to cope with the variability in the natural resistance mechanisms of their hosts. Conversely, damage by...

  10. BLAST-EXPLORER helps you building datasets for phylogenetic analysis

    Directory of Open Access Journals (Sweden)

    Claverie Jean-Michel

    2010-01-01

    Background: The right sampling of homologous sequences for phylogenetic or molecular evolution analyses is a crucial step, the quality of which can have a significant impact on the final interpretation of the study. There is no single way to construct datasets suitable for phylogenetic analysis, because this task intimately depends on the scientific question we want to address. Moreover, database mining software such as BLAST, which is routinely used for searching homologous sequences, is not specifically optimized for this task. Results: To fill this gap, we designed BLAST-Explorer, an original and friendly web-based application that combines a BLAST search with a suite of tools that allows interactive, phylogenetic-oriented exploration of the BLAST results and flexible selection of homologous sequences among the BLAST hits. Once the BLAST hits have been selected using BLAST-Explorer, the corresponding sequences can be imported locally for external analysis or passed to the phylogenetic tree reconstruction pipelines available on the Phylogeny.fr platform. Conclusions: BLAST-Explorer provides a simple, intuitive and interactive graphical representation of the BLAST results and allows selection and retrieval of the BLAST hit sequences based on a wide range of criteria. Although BLAST-Explorer primarily aims at helping the construction of sequence datasets for further phylogenetic study, it can also be used as a standard BLAST server with enriched output. BLAST-Explorer is available at http://www.phylogeny.fr
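
BLAST-Explorer selects hits interactively, and the abstract does not list its criteria. As a rough, non-interactive analogue, the sketch below filters standard BLAST+ tabular output (`-outfmt 6`, default 12 columns) by E-value, percent identity and query coverage, keeping the best-scoring HSP per subject. The thresholds are arbitrary assumptions, and this is not BLAST-Explorer's code.

```python
from typing import Dict, Iterable, List

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_outfmt6(lines: Iterable[str]) -> List[Dict]:
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit: Dict = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        for key in FIELDS[2:]:
            hit[key] = float(hit[key]) if key in FLOAT_FIELDS else int(hit[key])
        hits.append(hit)
    return hits


def select_hits(hits: List[Dict], query_len: int, max_evalue: float = 1e-5,
                min_pident: float = 30.0, min_qcov: float = 0.5) -> List[str]:
    """Subject IDs passing all thresholds, best bit score first.

    Only the best-scoring HSP of each subject is kept.
    """
    best: Dict[str, Dict] = {}
    for h in hits:
        qcov = (h["qend"] - h["qstart"] + 1) / query_len
        if h["evalue"] > max_evalue or h["pident"] < min_pident or qcov < min_qcov:
            continue
        prev = best.get(h["sseqid"])
        if prev is None or h["bitscore"] > prev["bitscore"]:
            best[h["sseqid"]] = h
    return sorted(best, key=lambda s: best[s]["bitscore"], reverse=True)
```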

  11. Nucleotide diversity and phylogenetic relationships among ...

    Indian Academy of Sciences (India)

    Navya

    2 attached at the base of the tree as the diverging lineage of an Iridaceae relative. The present study revealed that the psbA-trnH region is useful for addressing questions of phylogenetic relationships among Gladiolus cultivars, as this intergenic spacer is more variable and has more phylogenetically informative sites than the ...

  12. Live phylogeny with polytomies: Finding the most compact parsimonious trees.

    Science.gov (United States)

    Papamichail, D; Huang, A; Kennedy, E; Ott, J-L; Miller, A; Papamichail, G

    2017-08-01

    Construction of phylogenetic trees has traditionally focused on binary trees where all species appear on leaves, a problem for which numerous efficient solutions have been developed. Certain application domains though, such as viral evolution and transmission, paleontology, linguistics, and phylogenetic stemmatics, often require phylogeny inference that involves placing input species on ancestral tree nodes (live phylogeny), and polytomies. These requirements, despite their prevalence, lead to computationally harder algorithmic solutions and have been sparsely examined in the literature to date. In this article we prove some unique properties of most parsimonious live phylogenetic trees with polytomies, and their mapping to traditional binary phylogenetic trees. We show that our problem reduces to finding the most compact parsimonious tree for n species, and describe a novel efficient algorithm to find such trees without resorting to exhaustive enumeration of all possible tree topologies. Copyright © 2017 Elsevier Ltd. All rights reserved.
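
The paper's contribution is finding the most compact parsimonious tree topology, which is not reproduced here. To show what "parsimonious" means once input species may sit on internal nodes and nodes may be polytomies, the sketch below scores one character on a fixed tree with Sankoff's small-parsimony dynamic programme under unit costs. Fixing an internal node to an observed state is our assumed way of modelling a sampled ancestor.

```python
from typing import Dict, List

INF = float("inf")


def small_parsimony(children: Dict[str, List[str]], root: str,
                    observed: Dict[str, str], alphabet: List[str]) -> int:
    """Minimum number of state changes for one character (Sankoff, unit costs).

    Nodes may have any number of children (polytomies). Any node listed in
    `observed`, leaf or internal, is fixed to its observed state; internal
    observed nodes play the role of sampled ancestors in a live phylogeny.
    """
    def cost(node: str) -> Dict[str, float]:
        if node in observed:
            table = {s: (0.0 if s == observed[node] else INF) for s in alphabet}
        else:
            table = {s: 0.0 for s in alphabet}
        for child in children.get(node, []):
            sub = cost(child)
            for s in alphabet:
                # cheapest child state, paying 1 if it differs from s
                table[s] += min(sub[t] + (0 if t == s else 1) for t in alphabet)
        return table

    return int(min(cost(root).values()))
```

With a root polytomy over leaves A, C, C and an internal node whose two children are both C, the score is 1. If the root is itself an observed species in state A, the score rises to 3. The tree score over many characters is the sum of the per-character scores.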

  13. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    Phylogenetic analysis suggests that our sequences cluster with sequences reported from Japan. This is the first phylogenetic analysis of the HCV core gene from the Pakistani population. Our sequences and the sequences from Japan are grouped into the same cluster in the phylogenetic tree. Sequence comparison and ...

  14. Phylogenetic Analysis Using Protein Mass Spectrometry.

    Science.gov (United States)

    Ma, Shiyong; Downard, Kevin M; Wong, Jason W H

    2017-01-01

    Through advances in molecular biology, comparative analysis of DNA sequences is currently the cornerstone in the study of molecular evolution and phylogenetics. Nevertheless, protein mass spectrometry offers some unique opportunities to enable phylogenetic analyses in organisms where DNA may be difficult or costly to obtain. To date, the methods of phylogenetic analysis using protein mass spectrometry can be classified into three categories: (1) de novo protein sequencing followed by classical phylogenetic reconstruction, (2) direct phylogenetic reconstruction using proteolytic peptide mass maps, and (3) mapping of mass spectral data onto classical phylogenetic trees. In this chapter, we provide a brief description of the three methods and the protocol for each method along with relevant tools and algorithms.

  15. Phylogenetic mixtures and linear invariants for equal input models.

    Science.gov (United States)

    Casanellas, Marta; Steel, Mike

    2017-04-01

    The reconstruction of phylogenetic trees from molecular sequence data relies on modelling site substitutions by a Markov process, or a mixture of such processes. In general, allowing mixed processes can result in different tree topologies becoming indistinguishable from the data, even for infinitely long sequences. However, when the underlying Markov process supports linear phylogenetic invariants, then provided these are sufficiently informative, the identifiability of the tree topology can be restored. In this paper, we investigate a class of processes that support linear invariants once the stationary distribution is fixed, the 'equal input model'. This model generalizes the 'Felsenstein 1981' model (and thereby the Jukes-Cantor model) from four states to an arbitrary number of states (finite or infinite), and it can also be described by a 'random cluster' process. We describe the structure and dimension of the vector spaces of phylogenetic mixtures and of linear invariants for any fixed phylogenetic tree (and for all trees, the so-called 'model invariants'), on any number n of leaves. We also provide a precise description of the space of mixtures and linear invariants for the special case of [Formula: see text] leaves. By combining techniques from discrete random processes and (multi-)linear algebra, our results build on a classic result that was first established by James Lake (Mol Biol Evol 4:167-191, 1987).
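
For reference, here is the equal input model on $k$ states in one standard parameterisation. The notation is ours and may differ from the paper's. With stationary distribution $\pi$ and overall rate $\mu$, the rate into state $j$ does not depend on the current state $i$. Because $\sum_j \pi_j = 1$, the matrix $A = \mathbf{1}\pi^{\top}$ is idempotent ($A^2 = A$), which gives a closed-form transition matrix. Taking $k = 4$ and $\pi_j = 1/4$ recovers Jukes-Cantor.

```latex
Q = \mu\left(\mathbf{1}\pi^{\top} - I\right),
\qquad Q_{ij} = \mu\,\pi_j \ (i \neq j),
\qquad Q_{ii} = -\mu\,(1 - \pi_i),

P(t) = e^{Qt} = e^{-\mu t}\, e^{\mu t A}
     = e^{-\mu t}\left(I + (e^{\mu t} - 1)\,A\right)
     = e^{-\mu t} I + \left(1 - e^{-\mu t}\right)\mathbf{1}\pi^{\top},

P_{ij}(t) = e^{-\mu t}\,\delta_{ij} + \left(1 - e^{-\mu t}\right)\pi_j .
```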

  16. The prevalence of terraced treescapes in analyses of phylogenetic data sets.

    Science.gov (United States)

    Dobrin, Barbara H; Zwickl, Derrick J; Sanderson, Michael J

    2018-04-04

    The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature and investigated the terraces to which the trees would belong, under a common set of inference assumptions. We examined terrace size as a function of the sampling properties of the data sets, including taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling "sufficiency". We evaluated each data set in relation to the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree, and explored the impact of the terraces found in replicate trees in bootstrap methods. Terraces were identified in nearly all data sets with taxon coverage densities tree. Terraces found during bootstrap resampling reduced overall support. If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates with data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced. Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), unlinked models have been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.
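
Taxon coverage density, as defined in the abstract, is the proportion of taxon-by-gene cells that contain any data. The sketch below computes it from a mapping of each taxon to its sampled genes. It also lists the genes sampled for every taxon, as an extra descriptive helper of our own. The taxa and genes are hypothetical.

```python
from typing import Dict, Set


def taxon_coverage_density(coverage: Dict[str, Set[str]], genes: Set[str]) -> float:
    """Proportion of taxon-by-gene cells that contain any data.

    `coverage` maps each taxon to the set of genes sampled for it.
    """
    filled = sum(len(sampled & genes) for sampled in coverage.values())
    return filled / (len(coverage) * len(genes))


def complete_genes(coverage: Dict[str, Set[str]], genes: Set[str]) -> Set[str]:
    """Genes that were sampled for every taxon in the matrix."""
    return {g for g in genes if all(g in sampled for sampled in coverage.values())}
```

For four taxa and three genes with 9 of the 12 cells filled, the density is 0.75.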

  17. Orthology prediction at scalable resolution by phylogenetic tree analysis

    NARCIS (Netherlands)

    Heijden, R.T.J.M. van der; Snel, B.; Noort, V. van; Huynen, M.A.

    2007-01-01

    BACKGROUND: Orthology is one of the cornerstones of gene function prediction. Dividing the phylogenetic relations between genes into either orthologs or paralogs is however an oversimplification. Already in two-species gene-phylogenies, the complicated, non-transitive nature of phylogenetic

  18. Phylogenetic congruence between subtropical trees and their associated fungi

    NARCIS (Netherlands)

    Liu, Xubing; Liang, Minxia; Etienne, Rampal S.; Gilbert, Gregory S; Yu, Shixiao

    2016-01-01

    Recent studies have detected phylogenetic signals in pathogen-host networks for both soil-borne and leaf-infecting fungi, suggesting that pathogenic fungi may track or coevolve with their preferred hosts. However, a phylogenetically concordant relationship between multiple hosts and multiple fungi

  19. Selection of organisms for the co-evolution-based study of protein interactions.

    Science.gov (United States)

    Herman, Dorota; Ochoa, David; Juan, David; Lopez, Daniel; Valencia, Alfonso; Pazos, Florencio

    2011-09-12

    The prediction and study of protein interactions and functional relationships based on the similarity of phylogenetic trees, exemplified by the mirrortree and related methodologies, is widely used. Although a dependence between the performance of these methods and the set of organisms used to build the trees was suspected, no one had assessed it exhaustively, and, in general, previous works used as many organisms as possible. In this work we assess the effect of using different sets of organisms (chosen according to various phylogenetic criteria) on the performance of this methodology in detecting protein interactions of different natures. We show that the performance of three mirrortree-related methodologies depends on the set of organisms used for building the trees, and that it is not always related to the number of organisms in a simple way. Certain subsets of organisms seem to be more suitable for predicting certain types of interactions. This relationship between the type of interaction and the optimal set of organisms for detecting it makes sense in light of the phylogenetic distribution of the organisms and the nature of the interactions. To obtain optimal performance when predicting protein interactions, it is recommended to use different sets of organisms depending on the available computational resources and data, as well as the type of interactions of interest.
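
The core mirrortree idea is to compare two protein families through their inter-organism distance matrices: similar trees give correlated distances. The sketch below computes the Pearson correlation over the organism pairs. The `organisms` argument is the choice the paper investigates. The distance values are hypothetical and would normally come from the two families' alignments.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Dist = Dict[Tuple[str, str], float]  # distance for an unordered organism pair


def mirrortree_r(d1: Dist, d2: Dist, organisms: List[str]) -> float:
    """Pearson correlation between two distance matrices over the same organisms."""
    def get(d: Dist, a: str, b: str) -> float:
        return d[(a, b)] if (a, b) in d else d[(b, a)]

    pairs = list(combinations(organisms, 2))
    x = [get(d1, a, b) for a, b in pairs]
    y = [get(d2, a, b) for a, b in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)
```

Passing a different `organisms` subset changes which matrix entries enter the correlation, and therefore the prediction.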

  20. Maximum Gene-Support Tree

    Directory of Open Access Journals (Sweden)

    Yunfeng Shan

    2008-01-01

    Genomes and genes diversify during evolution; however, it is unclear to what extent genes still retain the relationship among species. Model species for molecular phylogenetic studies include yeasts and viruses, whose genomes have been sequenced, as well as plants, for which fossil-supported true phylogenetic trees are available. In this study, we generated single-gene trees of seven yeast species as well as single-gene trees of nine baculovirus species using all the orthologous genes among the species compared. Homologous genes among seven known plants were used for validation of the finding. Four algorithms were used: maximum parsimony (MP), minimum evolution (ME), maximum likelihood (ML), and neighbor-joining (NJ). Trees were reconstructed before and after weighting the DNA and protein sequence lengths among genes. Rarely can a single gene generate the “true tree” with all four algorithms. However, the most frequent gene tree, termed the “maximum gene-support tree” (MGS tree, or WMGS tree for the weighted one), was consistently found to be the “true tree” among the species in yeasts, baculoviruses, and plants. The results provide insights into the overall degree of divergence of orthologous genes of the genomes analyzed and suggest the following: (1) the true tree relationship among the species studied is still maintained by the largest group of orthologous genes; (2) there are usually more orthologous genes with higher similarities between genetically closer species than between genetically more distant ones; and (3) the maximum gene-support tree reflects the phylogenetic relationship among the species compared.
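
The (unweighted) MGS tree is the topology that occurs most often among the single-gene trees. Counting topologies requires a representation that ignores where each gene tree happens to be rooted. The sketch below therefore reduces each tree to its set of non-trivial splits (bipartitions), assuming gene trees arrive as nested tuples of taxon names. That input format is our assumption, not the paper's.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple, Union

Tree = Union[str, tuple]  # a taxon name or a tuple of subtrees


def leaves(t: Tree) -> FrozenSet[str]:
    return frozenset([t]) if isinstance(t, str) else frozenset().union(*(leaves(c) for c in t))


def splits(t: Tree) -> FrozenSet[FrozenSet[str]]:
    """Non-trivial bipartitions of the unrooted tree, each stored as the side
    that does not contain the alphabetically first taxon (rooting-independent)."""
    taxa = leaves(t)
    ref = min(taxa)
    out = set()

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = clade if ref not in clade else taxa - clade
        if 1 < len(side) < len(taxa) - 1:
            out.add(frozenset(side))
        for child in node:
            walk(child)

    walk(t)
    return frozenset(out)


def max_gene_support(gene_trees: List[Tree]) -> Tuple[FrozenSet[FrozenSet[str]], int]:
    """Most frequent unrooted topology and the number of gene trees supporting it."""
    topology, n = Counter(splits(t) for t in gene_trees).most_common(1)[0]
    return topology, n
```

The gene trees `(("A","B"),("C","D"))` and `(("A","B"),"C","D")` are rooted differently but share the split {C, D}, so they count as the same topology.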

  1. Integrating Taxonomic, Functional and Phylogenetic Beta Diversities: Interactive Effects with the Biome and Land Use across Taxa.

    Science.gov (United States)

    Corbelli, Julian Martin; Zurita, Gustavo Andres; Filloy, Julieta; Galvis, Juan Pablo; Vespa, Natalia Isabel; Bellocq, Isabel

    2015-01-01

    The spatial distribution of species, functional traits and phylogenetic relationships at both the regional and local scales provides complementary approaches to study patterns of biodiversity and helps to untangle the mechanisms driving community assembly. Few studies have simultaneously considered the taxonomic (TBD), functional (FBD) and phylogenetic (PBD) facets of beta diversity. Here we analyze the associations between TBD, FBD, and PBD with the biome (representing different regional species pools) and land use, and investigate whether TBD, FBD and PBD were correlated. In the study design we considered two widely used indicator taxa (birds and ants) from two contrasting biomes (subtropical forest and grassland) and land uses (tree plantations and cropfields) in the southern Neotropics. Non-metric multidimensional scaling showed that taxonomic, functional and phylogenetic distances were associated with biome and land use: study sites fell into four groups in the two-dimensional ordination space (cropfields in forest and grassland, and tree plantations in forest and grassland), a pattern that was consistent across beta diversity facets and taxa. Mantel and PERMANOVA tests showed that TBD, FBD and PBD were positively correlated for both bird and ant assemblages; in general, partial correlations were also significant. Some of the functional traits considered here were conserved along the phylogeny. Our results will contribute to the development of sound land use planning and beta diversity conservation.
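
The correlations between TBD, FBD and PBD were tested with Mantel tests, which compare two site-by-site distance matrices. Significance is assessed by permuting the rows and columns of one matrix together. The sketch below is a generic one-sided permutation Mantel test on Pearson's r. It is not the authors' exact setup, and partial Mantel tests are not covered.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]  # square, symmetric site-by-site distance matrix


def _upper(m: Matrix, order: List[int]) -> List[float]:
    """Upper-triangle entries of m with rows and columns reordered by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mantel(a: Matrix, b: Matrix, permutations: int = 999,
           seed: int = 1) -> Tuple[float, float]:
    """Observed r and one-sided permutation p-value for a positive association."""
    identity = list(range(len(a)))
    x = _upper(a, identity)
    r_obs = _pearson(x, _upper(b, identity))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute sites of b (rows and columns together)
        if _pearson(x, _upper(b, perm)) >= r_obs:
            extreme += 1
    return r_obs, (extreme + 1) / (permutations + 1)
```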

  2. Integrating Taxonomic, Functional and Phylogenetic Beta Diversities: Interactive Effects with the Biome and Land Use across Taxa

    Science.gov (United States)

    Corbelli, Julian Martin; Zurita, Gustavo Andres; Filloy, Julieta; Galvis, Juan Pablo; Vespa, Natalia Isabel; Bellocq, Isabel

    2015-01-01

    The spatial distribution of species, functional traits and phylogenetic relationships at both the regional and local scales provides complementary approaches to study patterns of biodiversity and helps to untangle the mechanisms driving community assembly. Few studies have simultaneously considered the taxonomic (TBD), functional (FBD) and phylogenetic (PBD) facets of beta diversity. Here we analyze the associations between TBD, FBD, and PBD with the biome (representing different regional species pools) and land use, and investigate whether TBD, FBD and PBD were correlated. In the study design we considered two widely used indicator taxa (birds and ants) from two contrasting biomes (subtropical forest and grassland) and land uses (tree plantations and cropfields) in the southern Neotropics. Non-metric multidimensional scaling showed that taxonomic, functional and phylogenetic distances were associated with biome and land use: study sites fell into four groups in the two-dimensional ordination space (cropfields in forest and grassland, and tree plantations in forest and grassland), a pattern that was consistent across beta diversity facets and taxa. Mantel and PERMANOVA tests showed that TBD, FBD and PBD were positively correlated for both bird and ant assemblages; in general, partial correlations were also significant. Some of the functional traits considered here were conserved along the phylogeny. Our results will contribute to the development of sound land use planning and beta diversity conservation. PMID:25978319

  3. Integrating Taxonomic, Functional and Phylogenetic Beta Diversities: Interactive Effects with the Biome and Land Use across Taxa.

    Directory of Open Access Journals (Sweden)

    Julian Martin Corbelli

    The spatial distribution of species, functional traits and phylogenetic relationships at both the regional and local scales provides complementary approaches to study patterns of biodiversity and helps to untangle the mechanisms driving community assembly. Few studies have simultaneously considered the taxonomic (TBD), functional (FBD) and phylogenetic (PBD) facets of beta diversity. Here we analyze the associations between TBD, FBD, and PBD with the biome (representing different regional species pools) and land use, and investigate whether TBD, FBD and PBD were correlated. In the study design we considered two widely used indicator taxa (birds and ants) from two contrasting biomes (subtropical forest and grassland) and land uses (tree plantations and cropfields) in the southern Neotropics. Non-metric multidimensional scaling showed that taxonomic, functional and phylogenetic distances were associated with biome and land use: study sites fell into four groups in the two-dimensional ordination space (cropfields in forest and grassland, and tree plantations in forest and grassland), a pattern that was consistent across beta diversity facets and taxa. Mantel and PERMANOVA tests showed that TBD, FBD and PBD were positively correlated for both bird and ant assemblages; in general, partial correlations were also significant. Some of the functional traits considered here were conserved along the phylogeny. Our results will contribute to the development of sound land use planning and beta diversity conservation.

  4. Reconstructing Unrooted Phylogenetic Trees from Symbolic Ternary Metrics.

    Science.gov (United States)

    Grünewald, Stefan; Long, Yangjing; Wu, Yaokun

    2018-03-09

    Böcker and Dress (Adv Math 138:105-125, 1998) presented a 1-to-1 correspondence between symbolically dated rooted trees and symbolic ultrametrics. We consider the corresponding problem for unrooted trees. More precisely, given a tree T with leaf set X and a proper vertex coloring of its interior vertices, we can map every triple of three different leaves to the color of its median vertex. We characterize all ternary maps that can be obtained in this way in terms of 4- and 5-point conditions, and we show that the corresponding tree and its coloring can be reconstructed from a ternary map that satisfies those conditions. Further, we give an additional condition that characterizes whether the tree is binary, and we describe an algorithm that reconstructs general trees in a bottom-up fashion.
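
The forward direction of the construction is easy to state in code. Given a tree, a colouring of its interior vertices and its leaves, each triple of leaves is mapped to the colour of its median vertex, the unique vertex lying on all three pairwise paths. The sketch below assumes an adjacency-list tree and hypothetical colours. It does not implement the paper's 4- and 5-point characterisation or its reconstruction algorithm.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # undirected tree as symmetric adjacency lists


def path(tree: Graph, a: str, b: str) -> Set[str]:
    """Vertices on the unique a-b path (BFS from a, then walk back from b)."""
    parent = {a: None}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for v in tree[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    nodes, u = set(), b
    while u is not None:
        nodes.add(u)
        u = parent[u]
    return nodes


def median(tree: Graph, x: str, y: str, z: str) -> str:
    # In a tree the three pairwise paths meet in exactly one vertex.
    (m,) = path(tree, x, y) & path(tree, x, z) & path(tree, y, z)
    return m


def ternary_map(tree: Graph, color: Dict[str, str],
                leaves: List[str]) -> Dict[Tuple[str, str, str], str]:
    """Colour of the median vertex for every triple of distinct leaves."""
    return {t: color[median(tree, *t)] for t in combinations(sorted(leaves), 3)}
```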

  5. TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction.

    Science.gov (United States)

    Chang, Jia-Ming; Di Tommaso, Paolo; Notredame, Cedric

    2014-06-01

    Multiple sequence alignment (MSA) is a key modeling procedure when analyzing biological sequences. Homology and evolutionary modeling are the most common applications of MSAs. Both are known to be sensitive to the underlying MSA accuracy. In this work, we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from BAliBASE and PREFAB structure-based reference alignments. We also show how this measure can be used to improve phylogenetic tree reconstruction using both an established simulated data set and a novel empirical yeast data set. For this purpose, we describe a novel lossless alternative to site filtering that involves overweighting the trustworthy columns. Our approach relies on the T-Coffee framework; it uses libraries of pairwise alignments to evaluate any third party MSA. Pairwise projections can be produced using fast or slow methods, thus allowing a trade-off between speed and accuracy. We compared TCS with Heads-or-Tails, GUIDANCE, Gblocks, and trimAl and found it to lead to significantly better estimates of structural accuracy and more accurate phylogenetic trees. The software is available from www.tcoffee.org/Projects/tcs. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
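
The abstract describes a "lossless alternative to site filtering" that overweights trustworthy columns instead of deleting unreliable ones. The exact weighting is not given here. As a hypothetical scheme, assuming integer per-column reliability scores such as T-Coffee's 0-9 scale, the sketch below repeats each column `1 + score` times. No column is lost, and for likelihood methods a column repeated k times contributes its site log-likelihood k times.

```python
from typing import Dict, List


def overweight_columns(alignment: Dict[str, str], scores: List[int]) -> Dict[str, str]:
    """Repeat column i (1 + scores[i]) times in every sequence (hypothetical scheme).

    Unlike filtering, low-scoring columns are kept once rather than removed.
    """
    length = len(scores)
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("every sequence must have exactly one score per column")
    return {name: "".join(seq[i] * (1 + scores[i]) for i in range(length))
            for name, seq in alignment.items()}
```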

  6. A Consistent Phylogenetic Backbone for the Fungi

    Science.gov (United States)

    Ebersberger, Ingo; de Matos Simoes, Ricardo; Kupczok, Anne; Gube, Matthias; Kothe, Erika; Voigt, Kerstin; von Haeseler, Arndt

    2012-01-01

    The kingdom of fungi provides model organisms for biotechnology, cell biology, genetics, and life sciences in general. Only when their phylogenetic relationships are stably resolved, can individual results from fungal research be integrated into a holistic picture of biology. However, and despite recent progress, many deep relationships within the fungi remain unclear. Here, we present the first phylogenomic study of an entire eukaryotic kingdom that uses a consistency criterion to strengthen phylogenetic conclusions. We reason that branches (splits) recovered with independent data and different tree reconstruction methods are likely to reflect true evolutionary relationships. Two complementary phylogenomic data sets based on 99 fungal genomes and 109 fungal expressed sequence tag (EST) sets analyzed with four different tree reconstruction methods shed light from different angles on the fungal tree of life. Eleven additional data sets address specifically the phylogenetic position of Blastocladiomycota, Ustilaginomycotina, and Dothideomycetes, respectively. The combined evidence from the resulting trees supports the deep-level stability of the fungal groups toward a comprehensive natural system of the fungi. In addition, our analysis reveals methodologically interesting aspects. Enrichment for EST encoded data—a common practice in phylogenomic analyses—introduces a strong bias toward slowly evolving and functionally correlated genes. Consequently, the generalization of phylogenomic data sets as collections of randomly selected genes cannot be taken for granted. A thorough characterization of the data to assess possible influences on the tree reconstruction should therefore become a standard in phylogenomic analyses. PMID:22114356
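
The consistency criterion keeps branches (splits) that are recovered with independent data and different tree reconstruction methods. The sketch below is a minimal illustration. It assumes each tree has already been reduced to a set of canonically oriented splits and is keyed by a hypothetical (data set, method) label. It returns the splits found in every tree, plus a support tally for the rest. The paper's actual procedure may weigh data sets and methods differently.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set, Tuple

Split = FrozenSet[str]           # one side of a bipartition, canonically oriented
Key = Tuple[str, str]            # (data set, reconstruction method), hypothetical labels


def consistent_splits(trees: Dict[Key, Set[Split]]) -> Set[Split]:
    """Splits recovered in every tree, i.e. under every data set/method combination."""
    split_sets = list(trees.values())
    if not split_sets:
        return set()
    return set(split_sets[0]).intersection(*split_sets[1:])


def split_support(trees: Dict[Key, Set[Split]]) -> Counter:
    """How many data set/method combinations recover each split."""
    return Counter(s for splits in trees.values() for s in splits)
```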

  7. Functional & phylogenetic diversity of copepod communities

    Science.gov (United States)

    Benedetti, F.; Ayata, S. D.; Blanco-Bercial, L.; Cornils, A.; Guilhaumon, F.

    2016-02-01

    The diversity of natural communities is classically estimated through species identification (taxonomic diversity) but can also be estimated from the ecological functions performed by the species (functional diversity), or from the phylogenetic relationships among them (phylogenetic diversity). Estimating functional diversity requires the definition of specific functional traits, i.e., phenotypic characteristics that impact fitness and are relevant to ecosystem functioning. Estimating phylogenetic diversity requires the description of phylogenetic relationships, for instance by using molecular tools. In the present study, we focused on the functional and phylogenetic diversity of copepod surface communities in the Mediterranean Sea. First, we implemented a specific trait database for the most commonly-sampled and abundant copepod species of the Mediterranean Sea. Our database includes 191 species, described by seven traits encompassing diverse ecological functions: minimal and maximal body length, trophic group, feeding type, spawning strategy, diel vertical migration and vertical habitat. Clustering analysis in the functional trait space revealed that Mediterranean copepods can be gathered into groups that have different ecological roles. Second, we reconstructed a phylogenetic tree using the available sequences of 18S rRNA. Our tree included 154 of the analyzed Mediterranean copepod species. We used these two datasets to describe the functional and phylogenetic diversity of copepod surface communities in the Mediterranean Sea. The replacement component (turn-over) and the species richness difference component (nestedness) of the beta diversity indices were identified. Finally, by comparing various and complementary aspects of plankton diversity (taxonomic, functional, and phylogenetic diversity) we were able to gain a better understanding of the relationships among the zooplankton community, biodiversity, ecosystem function, and environmental forcing.
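
The abstract does not say which partition of beta diversity was used for the replacement (turnover) and richness-difference (nestedness) components. One common choice, the Jaccard-based decomposition associated with Podani and Legendre, is sketched below for two sites. With $a$ shared species and $b$, $c$ species unique to each site, Jaccard dissimilarity $(b+c)/(a+b+c)$ splits exactly into replacement $2\min(b,c)/(a+b+c)$ and richness difference $|b-c|/(a+b+c)$. The species lists are hypothetical.

```python
from typing import Set, Tuple


def beta_partition(site1: Set[str], site2: Set[str]) -> Tuple[float, float, float]:
    """(Jaccard dissimilarity, replacement, richness difference) for two sites."""
    a = len(site1 & site2)   # species shared by both sites
    b = len(site1 - site2)   # species only at site 1
    c = len(site2 - site1)   # species only at site 2
    total = a + b + c
    if total == 0:
        return 0.0, 0.0, 0.0  # two empty sites: no dissimilarity
    replacement = 2 * min(b, c) / total
    richness_diff = abs(b - c) / total
    return replacement + richness_diff, replacement, richness_diff
```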

  8. A format for phylogenetic placements.

    Directory of Open Access Journals (Sweden)

    Frederick A Matsen

    We have developed a unified format for phylogenetic placements, that is, mappings of environmental sequence data (e.g., short reads) into a phylogenetic tree. We are motivated to do so by the growing number of tools for computing and post-processing phylogenetic placements, and the lack of an established standard for storing them. The format is lightweight, versatile, extensible, and is based on the JSON format, which can be parsed by most modern programming languages. Our format is already implemented in several tools for computing and post-processing parsimony- and likelihood-based phylogenetic placements and has worked well in practice. We believe that establishing a standard format for analyzing read placements at this early stage will lead to a more efficient development of powerful and portable post-analysis tools for the growing applications of phylogenetic placement.
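
Because the format is plain JSON, a few lines of standard-library code can read it. The sketch below assumes the layout as we understand the published "jplace" specification: a top-level `fields` list naming the columns of each placement row, and `placements` entries holding rows under `p` and query names under `n`. For each query it returns the edge with the highest `like_weight_ratio`. The field names should be checked against the specification, and the example document is invented.

```python
import json
from typing import Dict


def best_placements(jplace_text: str) -> Dict[str, int]:
    """Map each query name to the edge number of its highest-weight placement."""
    doc = json.loads(jplace_text)
    fields = doc["fields"]  # column names for every row in "p"
    best: Dict[str, int] = {}
    for pquery in doc["placements"]:
        rows = [dict(zip(fields, row)) for row in pquery["p"]]
        top = max(rows, key=lambda r: r["like_weight_ratio"])
        for name in pquery.get("n", []):  # other name encodings are ignored here
            best[name] = top["edge_num"]
    return best
```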

  9. Global patterns of amphibian phylogenetic diversity

    DEFF Research Database (Denmark)

    Fritz, Susanne; Rahbek, Carsten

    2012-01-01

    Aim: Phylogenetic diversity can provide insight into how evolutionary processes may have shaped contemporary patterns of species richness. Here, we aim to test for the influence of phylogenetic history on global patterns of amphibian species richness, and to identify areas where macroevolutionary...... processes such as diversification and dispersal have left strong signatures on contemporary species richness. Location: Global; equal-area grid cells of approximately 10,000 km². Methods: We generated an amphibian global supertree (6111 species) and repeated analyses with the largest available molecular...... phylogeny (2792 species). We combined each tree with global species distributions to map four indices of phylogenetic diversity. To investigate congruence between global spatial patterns of amphibian species richness and phylogenetic diversity, we selected Faith’s phylogenetic diversity (PD) index...
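
Faith's PD for a set of species is the total branch length of the part of the tree that connects them. The minimal sketch below assumes a rooted tree stored as a child-to-parent map with branch lengths, and the common convention that the path up to the root is included; the toy tree and species names are hypothetical.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch to the parent).
TREE = {
    "sp1": ("n1", 1.0),
    "sp2": ("n1", 2.0),
    "sp3": ("n2", 1.5),
    "n1": ("root", 0.5),
    "n2": ("root", 3.0),
}


def faith_pd(tree, taxa):
    """Sum of branch lengths on the union of the tip-to-root paths of `taxa`."""
    counted = set()
    total = 0.0
    for node in taxa:
        # climb towards the root, stopping at the first branch already counted
        # (everything above it has been counted too)
        while node in tree and node not in counted:
            counted.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


print(faith_pd(TREE, ["sp1", "sp2"]))  # 3.5
print(faith_pd(TREE, ["sp1", "sp3"]))  # 6.0
```

The two pairs have the same species richness but different PD, because sp1 and sp3 sit in different clades and share only the root.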

  10. Rooting the tree of life: the phylogenetic jury is still out.

    Science.gov (United States)

    Gouy, Richard; Baurain, Denis; Philippe, Hervé

    2015-09-26

    This article aims to shed light on difficulties in rooting the tree of life (ToL) and to explore the (sociological) reasons underlying the limited interest in accurately addressing this fundamental issue. First, we briefly review the difficulties plaguing phylogenetic inference and the ways to improve the modelling of the substitution process, which is highly heterogeneous, both across sites and over time. We further observe that enriched taxon samplings, better gene samplings and clever data removal strategies have led to numerous revisions of the ToL, and that these improved shallow phylogenies nearly always relocate simple organisms higher in the ToL provided that long-branch attraction artefacts are kept at bay. Then, we note that, despite the flood of genomic data available since 2000, there has been a surprisingly low interest in inferring the root of the ToL. Furthermore, the rare studies dealing with this question were almost always based on methods dating from the 1990s that have been shown to be inaccurate for much more shallow issues! This leads us to argue that the current consensus about a bacterial root for the ToL can be traced back to the prejudice of Aristotle's Great Chain of Being, in which simple organisms are ancestors of more complex life forms. Finally, we demonstrate that even the best models cannot yet handle the complexity of the evolutionary process encountered both at shallow depth, when the outgroup is too distant, and at the level of the inter-domain relationships. Altogether, we conclude that the commonly accepted bacterial root is still unproven and that the root of the ToL should be revisited using phylogenomic supermatrices to ensure that new evidence for eukaryogenesis, such as the recently described Lokiarchaeota, is interpreted in a sound phylogenetic framework. © 2015 The Author(s).

  11. Minimum triplet covers of binary phylogenetic X-trees.

    Science.gov (United States)

    Huber, K T; Moulton, V; Steel, M

    2017-12-01

    Trees with labelled leaves and with all other vertices of degree three play an important role in systematic biology and other areas of classification. A classical combinatorial result ensures that such trees can be uniquely reconstructed from the distances between the leaves (when the edges are given any strictly positive lengths). Moreover, a linear number of these pairwise distance values suffices to determine both the tree and its edge lengths. A natural set of pairs of leaves is provided by any 'triplet cover' of the tree (based on the fact that each non-leaf vertex is the median vertex of three leaves). In this paper we describe a number of new results concerning triplet covers of minimum size. In particular, we characterize such covers in terms of an associated graph being a 2-tree. Also, we show that minimum triplet covers are 'shellable' and thereby provide a set of pairs for which the inter-leaf distance values will uniquely determine the underlying tree and its associated branch lengths.
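
The median-vertex fact behind triplet covers can be checked on a small example. For leaves a, b, c with median m, adding the path identities d(a,b) = d(a,m) + d(m,b), and so on, gives d(a,m) = (d(a,b) + d(a,c) − d(b,c))/2, so edge positions are recoverable from leaf-to-leaf distances alone. The sketch below uses a hypothetical four-leaf binary tree with made-up edge lengths, finds each median as the single vertex shared by the three pairwise paths, and asserts the identity.

```python
from itertools import combinations

# Hypothetical unrooted binary tree: leaves a, b, c, d; internal vertices u, v.
EDGES = {("a", "u"): 1.0, ("b", "u"): 2.0, ("u", "v"): 1.5,
         ("c", "v"): 0.5, ("d", "v"): 3.0}


def adjacency(edges):
    """Undirected adjacency lists carrying edge lengths."""
    adj = {}
    for (x, y), w in edges.items():
        adj.setdefault(x, []).append((y, w))
        adj.setdefault(y, []).append((x, w))
    return adj


def walk(adj, src, dst):
    """Vertices and total length of the unique src-dst path in a tree."""
    stack = [(src, [src], 0.0)]
    while stack:
        node, trail, length = stack.pop()
        if node == dst:
            return trail, length
        for nxt, w in adj[node]:
            if nxt not in trail:
                stack.append((nxt, trail + [nxt], length + w))
    raise ValueError("vertices are not connected")


def median(adj, a, b, c):
    """The single vertex shared by the a-b, a-c and b-c paths."""
    common = (set(walk(adj, a, b)[0]) & set(walk(adj, a, c)[0])
              & set(walk(adj, b, c)[0]))
    (m,) = common  # in a tree the three paths meet in exactly one vertex
    return m


def tree_distance(adj, x, y):
    return walk(adj, x, y)[1]


adj = adjacency(EDGES)
for a, b, c in combinations("abcd", 3):
    m = median(adj, a, b, c)
    # distance from a to the median, recovered from leaf-to-leaf distances only
    recovered = (tree_distance(adj, a, b) + tree_distance(adj, a, c)
                 - tree_distance(adj, b, c)) / 2
    assert abs(tree_distance(adj, a, m) - recovered) < 1e-12
    print(a, b, c, "->", m)
```

Both internal vertices, u and v, turn up as medians of some triple, which is the property a triplet cover exploits.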

  12. Reconstructible phylogenetic networks: do not distinguish the indistinguishable.

    Science.gov (United States)

    Pardi, Fabio; Scornavacca, Celine

    2015-04-01

    Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are "indistinguishable". This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.
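
To make "the trees a network displays" concrete: when each reticulation keeps exactly one of its incoming edges, the result is a displayed tree. The sketch below enumerates these choices for a hypothetical three-leaf network with one reticulation `h` and compares the resulting trees by their sets of leaf clusters, which conveniently ignores leftover degree-2 vertices and empty branches. It illustrates only the definition, not the canonical-network construction proposed in the paper.

```python
from itertools import product

# Hypothetical network (node -> children); h is a reticulation with parents u and v.
NETWORK = {
    "r": ["u", "v"],
    "u": ["x", "h"],
    "v": ["h", "z"],
    "h": ["y"],
}


def leaves_of(network):
    nodes = set(network) | {c for kids in network.values() for c in kids}
    return {n for n in nodes if not network.get(n)}


def reticulations(network):
    """Map each node with more than one parent to its list of parents."""
    parents = {}
    for p, kids in network.items():
        for c in kids:
            parents.setdefault(c, []).append(p)
    return {c: ps for c, ps in parents.items() if len(ps) > 1}


def clusters(network, leaves):
    """Topology as the set of non-empty leaf sets found below each vertex."""
    memo = {}

    def below(node):
        if node not in memo:
            if node in leaves:
                memo[node] = frozenset([node])
            else:
                memo[node] = frozenset().union(
                    *(below(c) for c in network.get(node, [])))
        return memo[node]

    return frozenset(s for s in map(below, set(network) | leaves) if s)


def displayed_trees(network):
    leaves = leaves_of(network)
    rets = reticulations(network)
    trees = set()
    # one kept parent per reticulation; every combination gives a displayed tree
    for choice in product(*rets.values()):
        keep = dict(zip(rets, choice))
        sub = {p: [c for c in kids if keep.get(c, p) == p]
               for p, kids in network.items()}
        trees.add(clusters(sub, leaves))
    return trees


for tree in displayed_trees(NETWORK):
    print(sorted(sorted(c) for c in tree if len(c) > 1))
```

This network displays ((x,y),z) and (x,(y,z)); any other network displaying exactly these two trees would be indistinguishable from it for methods that score only displayed trees.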

  13. Reconstructible phylogenetic networks: do not distinguish the indistinguishable.

    Directory of Open Access Journals (Sweden)

    Fabio Pardi

    2015-04-01


  14. Phylogenetic relationships of the lancelets of the genus ...

    African Journals Online (AJOL)

    phylogenetic relationships of the Branchiostoma lancelets from South (Xiamen) and North (Qingdao and Rizhao) China, and phylogenetic trees constructed also included the existing data from Japanese waters. The genetic distances of the lancelets between South and North China averaged 0.19, 0.21, and 0.17 based on ...

  15. Efficient parsimony-based methods for phylogenetic network reconstruction.

    Science.gov (United States)

    Jin, Guohua; Nakhleh, Luay; Snir, Sagi; Tuller, Tamir

    2007-01-15

    Phylogenies (the evolutionary histories of groups of organisms) play a major role in representing relationships among biological entities. Although many biological processes can be effectively modeled as tree-like relationships, others, such as hybrid speciation and horizontal gene transfer (HGT), result in networks, rather than trees, of relationships. Hybrid speciation is a significant evolutionary mechanism in plants, fish and other groups of species. HGT plays a major role in bacterial genome diversification and is a significant mechanism by which bacteria develop resistance to antibiotics. Maximum parsimony is one of the most commonly used criteria for phylogenetic tree inference. Roughly speaking, inference based on this criterion seeks the tree that minimizes the amount of evolution. In 1990, Jotun Hein proposed using this criterion for inferring the evolution of sequences subject to recombination. Preliminary results on small synthetic datasets by Nakhleh et al. (2005) demonstrated the criterion's application to phylogenetic network reconstruction in general and HGT detection in particular. However, the naive algorithms used by the authors are inapplicable to large datasets due to their demanding computational requirements. Further, no rigorous theoretical analysis of computing the criterion was given, nor was it tested on biological data. In the present work we prove that the problem of scoring the parsimony of a phylogenetic network is NP-hard and provide an improved fixed-parameter tractable algorithm for it. Further, we devise efficient heuristics for parsimony-based reconstruction of phylogenetic networks. We test our methods on both synthetic and biological data (rbcL gene in bacteria) and obtain very promising results.
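
On a single tree, the parsimony score ("the amount of evolution") is computed with the classic Fitch small-parsimony algorithm, which network scoring builds on. The minimal sketch below handles rooted binary trees written as nested tuples and sums the per-site cost over alignment columns; the taxa and sequences are hypothetical, and it covers only the tree case, not the network algorithms of the paper.

```python
def fitch_site(tree, states):
    """Return (candidate state set, minimum number of changes) for one site."""
    if isinstance(tree, str):                # a leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch_site(left, states)
    rset, rcost = fitch_site(right, states)
    common = lset & rset
    if common:                               # children can agree: no extra change
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1    # disagreement costs one change


def parsimony_score(tree, seqs):
    """Sum of Fitch costs over all alignment columns."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, {t: s[i] for t, s in seqs.items()})[1]
               for i in range(length))


# Hypothetical aligned sequences for four taxa.
SEQS = {"a": "ACGT", "b": "ACGA", "c": "GCTA", "d": "GCTT"}
print(parsimony_score((("a", "b"), ("c", "d")), SEQS))  # 4
print(parsimony_score((("a", "c"), ("b", "d")), SEQS))  # 6
```

Under the parsimony criterion the first topology is preferred, since it explains the same data with fewer changes.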

  16. Phylogenetic relationships of Chaetomium isolates based on the ...

    African Journals Online (AJOL)

    Biotech Unit

    2013-02-27

    Phylogenetic analysis of Chaetomium species. The evolutionary history was inferred using the maximum parsimony method. The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the taxa analyzed (Felsenstein, 1985). The MP tree was obtained using ...
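
The bootstrap procedure cited here (Felsenstein, 1985) resamples alignment columns with replacement, rebuilds a tree from each pseudo-replicate, and reports how often each clade recurs. The sketch below shows only the resampling step and the majority-rule clade summary. Tree building is left out, so the replicate clade sets in the usage lines are hypothetical stand-ins for trees that would be inferred by maximum parsimony.

```python
import random
from collections import Counter


def bootstrap_alignment(seqs, rng):
    """One pseudo-replicate: alignment columns sampled with replacement."""
    length = len(next(iter(seqs.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    # every taxon gets the same resampled columns, so sites stay aligned
    return {name: "".join(s[i] for i in cols) for name, s in seqs.items()}


def majority_rule(replicate_clades, threshold=0.5):
    """Clades present in more than `threshold` of the replicates, with support."""
    counts = Counter(c for clades in replicate_clades for c in set(clades))
    n = len(replicate_clades)
    return {clade: k / n for clade, k in counts.items() if k / n > threshold}


rng = random.Random(0)
ALIGNMENT = {"t1": "ACGTAC", "t2": "ACGTTT"}
rep = bootstrap_alignment(ALIGNMENT, rng)

# Hypothetical clade sets standing in for trees inferred from four replicates.
AB, CD, AC = frozenset("ab"), frozenset("cd"), frozenset("ac")
support = majority_rule([{AB, CD}, {AB, CD}, {AB}, {AC}])
print(support[AB])  # 0.75; CD (2 of 4) and AC (1 of 4) fall below the cut-off
```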

  17. Comparative phylogenetic analysis of intergenic spacers and small ...

    African Journals Online (AJOL)

    The phylogenetic analysis of test isolates included assessment of variation in sequences and length of IGS and SSU-rRNA genes with reference to 16 different microsporidian sequences. The results showed that IGS sequences have more variation than SSU-rRNA gene sequences. Analysis of phylogenetic trees reveals that ...

  18. Mode of morphological differentiation in the Latitarsi-ground beetles (Coleoptera, Carabidae) of the world inferred from a phylogenetic tree of mitochondrial ND5 gene sequences.

    Science.gov (United States)

    Su, Zhi-Hui; Imura, Yûki; Zhou, Hong-Zhang; Okamoto, Munehiro; Osawa, Syozo

    2003-02-01

    The Latitarsi is one large division of the subtribe Carabina (subfamily Carabinae, family Carabidae) and has been considered a discrete morphological group consisting of 17 genera. The phylogenetic relationships and evolutionary pattern of the Latitarsi ground beetles have been investigated by analyzing mitochondrial NADH dehydrogenase subunit 5 (ND5) gene sequences. The phylogenetic tree suggests that the Latitarsi members do not form a single cluster, i.e., they are not monophyletic, and that at least 16 lineages belonging to the so-called Latitarsi emerged at about the same time as the Carabina radiation, together with the members of other divisions. This suggests that each of these lineages (A, B, C, H, L, N, O, P, Q, R, S, T, U, V, W and X in Fig. 2a) may be treated as a phylogenetically distinct division equivalent to the other divisions. Groups with bootstrap values of more than 80 percent were considered single lineages (divisions), with two exceptions, V and X. The independence of each lineage was inferred from traditional morphology as well as from single clustering on trees constructed by independent methods, unchanged topology upon replacement of outgroups, etc. Generally speaking, the members of a single lineage are geographically linked. Many phylogenetic lineages are composed of a single or only a few species without conspicuous morphological differentiation. In contrast to such "silent morphological evolution", remarkable morphological differentiation occasionally took place in several lineages.

  19. Combining Phylogenetic and Occurrence Information for Risk Assessment of Pest and Pathogen Interactions with Host Plants

    Directory of Open Access Journals (Sweden)

    Ángel L. Robles-Fernández

    2017-08-01

    Phytosanitary agencies conduct plant biosecurity activities, including early detection of potential introduction pathways, to improve control and eradication of pest and pathogen incursions. For such actions, analytical tools based on solid scientific knowledge regarding plant-pest or pathogen relationships for pest risk assessment are needed. Recent evidence indicating that closely related species share a higher chance of becoming infected or attacked by pests has allowed the identification of taxa with different degrees of vulnerability. Here, we use information readily available online about pest-host interactions and their geographic distributions, in combination with host phylogenetic reconstructions, to estimate a pest-host interaction (in some cases, infection) index in geographic space as a more comprehensive, spatially explicit tool for risk assessment. We demonstrate this protocol using phylogenetic relationships for 20 beetle species and 235 host plant genera: first, we estimate the probability of a host sharing pests, and second, we project the index in geographic space. Overall, the predictions allow identification of the pest-host interaction type (e.g., generalist or specialist), which is largely determined by both host range and phylogenetic constraints. Furthermore, the results can be valuable in terms of identifying hotspots where pests and vulnerable hosts interact. This knowledge is useful for anticipating biological invasions or the spread of disease. We suggest that our understanding of biotic interactions will improve after combining information from multiple dimensions of biodiversity at multiple scales (e.g., phylogenetic signal and host-vector-pathogen geographic distribution).

  20. Effects of phylogenetic reconstruction method on the robustness of species delimitation using single-locus data.

    Science.gov (United States)

    Tang, Cuong Q; Humphreys, Aelys M; Fontaneto, Diego; Barraclough, Timothy G; Paradis, Emmanuel

    2014-10-01

    Coalescent-based species delimitation methods combine population genetic and phylogenetic theory to provide an objective means for delineating evolutionarily significant units of diversity. The generalised mixed Yule coalescent (GMYC) and the Poisson tree process (PTP) are methods that use ultrametric (GMYC or PTP) or non-ultrametric (PTP) gene trees as input, intended for use mostly with single-locus data such as DNA barcodes. Here, we assess how robust the GMYC and PTP are to different phylogenetic reconstruction and branch smoothing methods. We reconstruct over 400 ultrametric trees using up to 30 different combinations of phylogenetic and smoothing methods and perform over 2000 separate species delimitation analyses across 16 empirical data sets. We then assess how variable diversity estimates are, in terms of richness and identity, with respect to species delimitation, phylogenetic and smoothing methods. The PTP method generally generates diversity estimates that are more robust to different phylogenetic methods. The GMYC is more sensitive, but provides consistent estimates for BEAST trees. The lower consistency of GMYC estimates is likely a result of differences among gene trees introduced by the smoothing step. Unresolved nodes (real anomalies or methodological artefacts) affect both GMYC and PTP estimates, but have a greater effect on GMYC estimates. Branch smoothing is a difficult step and perhaps an underappreciated source of bias that may be widespread among studies of diversity and diversification. Nevertheless, careful choice of phylogenetic method does produce equivalent PTP and GMYC diversity estimates. We recommend simultaneous use of the PTP model with any model-based gene tree (e.g. RAxML) and GMYC approaches with BEAST trees for obtaining species hypotheses.
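
GMYC needs ultrametric input, meaning a rooted tree in which every tip lies at the same distance from the root. A minimal check is sketched below, assuming a tree stored as a parent-to-children map with branch lengths and a relative tolerance of our choosing; both example trees are hypothetical.

```python
import math


def root_to_tip(children, root):
    """Distance from the root to every tip of a rooted tree."""
    out = {}
    stack = [(root, 0.0)]
    while stack:
        node, depth = stack.pop()
        kids = children.get(node, [])
        if not kids:
            out[node] = depth
        for child, length in kids:
            stack.append((child, depth + length))
    return out


def is_ultrametric(children, root, rel_tol=1e-9):
    """True if all root-to-tip distances agree within `rel_tol`."""
    depths = list(root_to_tip(children, root).values())
    return all(math.isclose(d, depths[0], rel_tol=rel_tol) for d in depths)


# Hypothetical trees: node -> list of (child, branch length).
ULTRA = {"r": [("n", 1.0), ("C", 3.0)], "n": [("A", 2.0), ("B", 2.0)]}
NON_ULTRA = {"r": [("n", 1.0), ("C", 3.0)], "n": [("A", 2.0), ("B", 0.5)]}
print(is_ultrametric(ULTRA, "r"))      # True
print(is_ultrametric(NON_ULTRA, "r"))  # False
```

A tree straight from a likelihood search (e.g. RAxML) typically fails a check like this until a smoothing step is applied, which is the step the abstract identifies as a likely source of variation in GMYC estimates.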

  1. On the distribution of interspecies correlation for Markov models of character evolution on Yule trees.

    Science.gov (United States)

    Mulder, Willem H; Crawford, Forrest W

    2015-01-07

    Efforts to reconstruct phylogenetic trees and understand evolutionary processes depend fundamentally on stochastic models of speciation and mutation. The simplest continuous-time model for speciation in phylogenetic trees is the Yule process, in which new species are "born" from existing lineages at a constant rate. Recent work has illuminated some of the structural properties of Yule trees, but it remains mostly unknown how these properties affect sequence and trait patterns observed at the tips of the phylogenetic tree. Understanding the interplay between speciation and mutation under simple models of evolution is essential for deriving valid phylogenetic inference methods and gives insight into the optimal design of phylogenetic studies. In this work, we derive the probability distribution of interspecies covariance under Brownian motion and Ornstein-Uhlenbeck models of phenotypic change on a Yule tree. We compute the probability distribution of the number of mutations shared between two randomly chosen taxa in a Yule tree under discrete Markov mutation models. Our results suggest summary measures of phylogenetic information content, illuminate the correlation between site patterns in sequences or traits of related organisms, and provide heuristics for experimental design and reconstruction of phylogenetic trees. Copyright © 2014 Elsevier Ltd. All rights reserved.
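
The Yule process is easy to simulate, and under Brownian motion with rate σ² the covariance between two tips equals σ² times the time from the origin to their most recent common ancestor. The Monte Carlo sketch below assumes a single starting lineage, a constant birth rate, and stopping as soon as n lineages exist. It approximates the covariance distribution empirically rather than deriving it analytically as the paper does, and the tip count, seed and number of replicates are arbitrary choices.

```python
import random


def simulate_yule(n_tips, birth_rate=1.0, rng=None):
    """Grow a pure-birth tree until it has n_tips lineages.

    Returns (paths, node_time): paths[i] lists the split-node ids from the
    root to tip i, and node_time maps each split id to its time.
    """
    rng = rng or random.Random()
    t = 0.0
    node_time = {}
    lineages = [()]            # one lineage with no splits yet
    next_id = 0
    while len(lineages) < n_tips:
        k = len(lineages)
        t += rng.expovariate(k * birth_rate)   # waiting time to the next split
        parent_path = lineages.pop(rng.randrange(k))
        node_time[next_id] = t
        child = parent_path + (next_id,)
        lineages.extend([child, child])        # two daughters share the split
        next_id += 1
    return lineages, node_time


def bm_covariance(path_i, path_j, node_time, sigma2=1.0):
    """sigma2 times the time of the last split shared by two distinct tips."""
    shared = 0.0
    for a, b in zip(path_i, path_j):
        if a != b:
            break
        shared = node_time[a]
    return sigma2 * shared


rng = random.Random(7)
samples = []
for _ in range(2000):
    paths, times = simulate_yule(10, rng=rng)
    i, j = rng.sample(range(len(paths)), 2)   # two random, distinct tips
    samples.append(bm_covariance(paths[i], paths[j], times))
print(sum(samples) / len(samples))            # Monte Carlo mean covariance
```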

  2. PhyDesign: an online application for profiling phylogenetic informativeness

    Directory of Open Access Journals (Sweden)

    Townsend Jeffrey P

    2011-05-01

    Background: The rapid increase in the number of sequenced genomes for species across the tree of life is revealing a diverse suite of orthologous genes that could potentially be employed to inform molecular phylogenetic studies that encompass broader taxonomic sampling. Optimal usage of this diversity of loci requires user-friendly tools to facilitate widespread cost-effective locus prioritization for phylogenetic sampling. The phylogenetic informativeness of Townsend (2007) provides a unique empirical metric for guiding marker selection. However, no software or automated methodology to evaluate sequence alignments and estimate the phylogenetic informativeness metric has been available. Results: Here, we present PhyDesign, a platform-independent online application that implements the Townsend (2007) phylogenetic informativeness analysis, providing a quantitative prediction of the utility of loci to solve specific phylogenetic questions. An easy-to-use interface facilitates uploading of alignments and ultrametric trees to calculate and depict profiles of informativeness over specified time ranges, and provides rankings of locus prioritization for epochs of interest. Conclusions: By providing these profiles, PhyDesign facilitates locus prioritization, increasing the efficiency of sequencing for phylogenetic purposes compared to traditional studies with more laborious and low-capacity screening methods, as well as increasing the accuracy of phylogenetic studies. Together with a manual and sample files, the application is freely accessible at http://phydesign.townsend.yale.edu.
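
An informativeness profile can be sketched in a few lines if one assumes the per-site expression usually attributed to Townsend (2007): a site evolving at rate λ has informativeness 16λ²T·e^(−4λT) at time T before the present. Under that expression each site peaks at T = 1/(4λ), and a locus profile is the sum over its sites. The sketch takes that formula as an assumption and uses made-up per-site rates. It does not reproduce PhyDesign's rate estimation or its interface.

```python
import math


def site_informativeness(rate, t):
    """Assumed per-site informativeness: 16 * rate^2 * t * exp(-4 * rate * t)."""
    return 16.0 * rate ** 2 * t * math.exp(-4.0 * rate * t)


def locus_profile(rates, times):
    """Net informativeness of a locus at each time: the sum over its sites."""
    return [sum(site_informativeness(r, t) for r in rates) for t in times]


RATES = [0.1, 0.5, 2.0]                  # hypothetical per-site rates
TIMES = [0.1 * k for k in range(1, 31)]  # time points before the present
profile = locus_profile(RATES, TIMES)
peak_time = TIMES[profile.index(max(profile))]
print(f"profile peaks near T = {peak_time:.1f}")
```

Comparing where such profiles peak across candidate loci is the kind of ranking for an epoch of interest that the abstract describes.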

  3. Conus pennaceus : a phylogenetic analysis of the Mozambican ...

    African Journals Online (AJOL)

    The genus Conus has over 500 species and is the most species-rich taxon of marine invertebrates. Based on mitochondrial DNA, this study focuses on the phylogenetics of Conus, particularly the pennaceus complex collected along the Mozambican coast. Phylogenetic trees based on both the 16S and the 12S ribosomal ...

  4. Tetrapods on the EDGE: Overcoming data limitations to identify phylogenetic conservation priorities

    Science.gov (United States)

    Gray, Claudia L.; Wearn, Oliver R.; Owen, Nisha R.

    2018-01-01

    The scale of the ongoing biodiversity crisis requires both effective conservation prioritisation and urgent action. As extinction is non-random across the tree of life, it is important to prioritise threatened species which represent large amounts of evolutionary history. The EDGE metric prioritises species based on their Evolutionary Distinctiveness (ED), which measures the relative contribution of a species to the total evolutionary history of their taxonomic group, and Global Endangerment (GE), or extinction risk. EDGE prioritisations rely on adequate phylogenetic and extinction risk data to generate meaningful priorities for conservation. However, comprehensive phylogenetic trees of large taxonomic groups are extremely rare and, even when available, become quickly out-of-date due to the rapid rate of species descriptions and taxonomic revisions. Thus, it is important that conservationists can use the available data to incorporate evolutionary history into conservation prioritisation. We compared published and new methods to estimate missing ED scores for species absent from a phylogenetic tree whilst simultaneously correcting the ED scores of their close taxonomic relatives. We found that following artificial removal of species from a phylogenetic tree, the new method provided the closest estimates of their “true” ED score, differing from the true ED score by an average of less than 1%, compared to the 31% and 38% difference of the previous methods. The previous methods also substantially under- and over-estimated scores as more species were artificially removed from a phylogenetic tree. We therefore used the new method to estimate ED scores for all tetrapods. From these scores we updated EDGE prioritisation rankings for all tetrapod species with IUCN Red List assessments, including the first EDGE prioritisation for reptiles. Further, we identified criteria to identify robust priority species in an effort to further inform conservation action whilst
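
The abstract does not spell out the formulas, but ED is commonly computed as a "fair proportion" score, in which each branch length is shared equally among the species descending from it, and EDGE as ln(1 + ED) + GE × ln(2), with GE running from 0 (Least Concern) to 4 (Critically Endangered). The minimal sketch below applies those standard definitions to a hypothetical four-species tree. It does not implement the study's method for imputing ED for species missing from the tree.

```python
import math

# Hypothetical rooted tree: node -> list of (child, branch length).
TREE = {
    "root": [("n1", 2.0), ("D", 5.0)],
    "n1": [("n2", 1.0), ("C", 3.0)],
    "n2": [("A", 2.0), ("B", 2.0)],
}
GE_SCORE = {"LC": 0, "NT": 1, "VU": 2, "EN": 3, "CR": 4}


def tips_below(tree, node):
    kids = tree.get(node, [])
    if not kids:
        return [node]
    return [tip for child, _ in kids for tip in tips_below(tree, child)]


def fair_proportion_ed(tree, root):
    """ED: every branch length split equally among the tips descending from it."""
    ed = {tip: 0.0 for tip in tips_below(tree, root)}
    for kids in tree.values():
        for child, length in kids:
            below = tips_below(tree, child)
            for tip in below:
                ed[tip] += length / len(below)
    return ed


def edge_score(ed_value, status):
    """EDGE = ln(1 + ED) + GE * ln(2)."""
    return math.log(1.0 + ed_value) + GE_SCORE[status] * math.log(2.0)


ed = fair_proportion_ed(TREE, "root")
print(ed)
print(edge_score(ed["D"], "CR"))
```

Because each branch is split among its descendants, the ED scores add up to the total branch length of the tree (15 here), so a species on a long, isolated branch such as D keeps most of its own history.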

  5. A support vector machine based test for incongruence between sets of trees in tree space

    Science.gov (United States)

    2012-01-01

    Background: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviates from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. Results: Motivated by this problem, we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also include tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy. Conclusions: The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The
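
GeneOut's test pairs a vector-space embedding of trees with a permutation test. The sketch below keeps the permutation logic but, to stay within the standard library, swaps the SVM separation for a simpler statistic: the squared distance between the two group means. Trees are assumed to be embedded as 0/1 vectors over a fixed list of bipartitions. The embedding, the statistic and the toy inputs are our assumptions, not GeneOut's.

```python
import random


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def separation(group_a, group_b):
    """Squared distance between group means (a stand-in for the SVM separation)."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    return sum((x - y) ** 2 for x, y in zip(ma, mb))


def permutation_pvalue(group_a, group_b, n_perm=999, rng=None):
    """Share of random relabellings separating the groups at least as well."""
    rng = rng or random.Random()
    observed = separation(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if separation(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one correction keeps p above zero


# Toy gene trees as presence/absence of three hypothetical bipartitions.
SET_A = [[1, 0, 1]] * 8
SET_B = [[0, 1, 1]] * 8
print(permutation_pvalue(SET_A, SET_B, rng=random.Random(42)))
```

Any statistic that grows with the difference between the two tree sets can be plugged into the same permutation loop; the SVM separation used by GeneOut is one such choice.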

  6. Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

    Science.gov (United States)

    Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

    2017-10-25

    Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.

  7. Plant DNA barcodes and assessment of phylogenetic community structure of a tropical mixed dipterocarp forest in Brunei Darussalam (Borneo)

    Science.gov (United States)

    Abu Salim, Kamariah; Chase, Mark W.; Dexter, Kyle G.; Pennington, R. Toby; Tan, Sylvester; Kaye, Maria Ellen; Samuel, Rosabelle

    2017-01-01

    DNA barcoding is a fast and reliable tool to assess and monitor biodiversity and, via community phylogenetics, to investigate ecological and evolutionary processes that may be responsible for the community structure of forests. In this study, DNA barcodes for the two widely used plastid coding regions rbcL and matK are used to contribute to identification of morphologically undetermined individuals, as well as to investigate phylogenetic structure of tree communities in 70 subplots (10 × 10 m) of a 25-ha forest-dynamics plot in Brunei (Borneo, Southeast Asia). The combined matrix (rbcL + matK) comprised 555 haplotypes (from ≥154 genera, 68 families and 25 orders sensu APG, Angiosperm Phylogeny Group, 2016), making a substantial contribution to tree barcode sequences from Southeast Asia. Barcode sequences were used to reconstruct phylogenetic relationships using maximum likelihood, both with and without constraining the topology of taxonomic orders to match that proposed by the Angiosperm Phylogeny Group. A third phylogenetic tree was reconstructed using the program Phylomatic to investigate the influence of phylogenetic resolution on results. Detection of non-random patterns of community assembly was determined by net relatedness index (NRI) and nearest taxon index (NTI). In most cases, community assembly was either random or phylogenetically clustered, which likely indicates the importance of habitat filtering based on phylogenetically correlated traits in determining community structure. Different phylogenetic trees gave similar overall results, but the Phylomatic tree produced greater variation across plots for NRI and NTI values, presumably due to noise introduced by using an unresolved phylogenetic tree. Our results suggest that using a DNA barcode tree has benefits over the traditionally used Phylomatic approach by increasing precision and accuracy and allowing the incorporation of taxonomically unidentified individuals into analyses.
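
NRI is usually defined from the mean pairwise phylogenetic distance (MPD) among the species in a plot, standardised against plots of the same size drawn at random from the species pool: NRI = −(MPD_obs − mean(MPD_null)) / sd(MPD_null), so positive values indicate phylogenetic clustering. The sketch assumes that definition, a simple random-draw null model and a hypothetical six-species distance matrix with two clades. NTI is computed analogously from the mean nearest-taxon distance.

```python
import random
import statistics
from itertools import combinations


def mpd(species, dist):
    """Mean pairwise distance among the species in one plot."""
    pairs = list(combinations(sorted(species), 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def nri(plot, pool, dist, n_null=999, rng=None):
    """Net relatedness index against random draws of the same size from the pool."""
    rng = rng or random.Random()
    observed = mpd(plot, dist)
    null = [mpd(rng.sample(pool, len(plot)), dist) for _ in range(n_null)]
    return -(observed - statistics.mean(null)) / statistics.stdev(null)


# Hypothetical pool: clade {a, b, c} and clade {d, e, f}.
POOL = list("abcdef")
CLADE = {s: (0 if s in "abc" else 1) for s in POOL}
DIST = {frozenset(p): (2.0 if CLADE[p[0]] == CLADE[p[1]] else 10.0)
        for p in combinations(POOL, 2)}

print(nri(["a", "b", "c"], POOL, DIST, rng=random.Random(1)))  # clustered plot
print(nri(["a", "b", "d"], POOL, DIST, rng=random.Random(2)))  # mixed plot
```

The plot drawn from a single clade gets a clearly positive NRI, while the plot mixing both clades falls slightly below zero.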

  8. Plant DNA barcodes and assessment of phylogenetic community structure of a tropical mixed dipterocarp forest in Brunei Darussalam (Borneo)

    Directory of Open Access Journals (Sweden)

    Jacqueline Heckenhauer


  9. snpTree - a web-server to identify and construct SNP trees from whole genome sequence data

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Kaas, Rolf Sommer; Thomsen, Martin Christen Frølund

    2012-01-01

    identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on reference genome and a tree is constructed...... to differentiate and classify isolates. One of the most successful and broadly used methods is analysis of single nucleotide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs including various options and cut-off values. Furthermore, all current methods require bioinformatic...... skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct phylogenetic trees from WGS data. Results Here we introduce snpTree, a server for online-automatic SNP analysis. This tool is composed of different SNP analysis suites, perl and python scripts. snpTree can...
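
    snpTree calls SNPs with BWA or Nucmer before building a tree; the sketch below skips calling entirely and only illustrates the step described as concatenating SNPs by reference position, followed by a pairwise SNP-count distance that a distance method such as neighbour-joining could take as input. It assumes SNP calls are already available as {isolate: {position: base}}, treats uncalled positions as matching the reference, and uses hypothetical isolates; it is not snpTree's code.

```python
from itertools import combinations


def concatenate_snps(reference, calls):
    """Build one SNP string per isolate over all variable reference positions.

    reference: reference genome sequence; calls: {isolate: {position: base}}
    with 0-based positions. Assumption: positions where an isolate has no
    call match the reference base.
    """
    positions = sorted({pos for snps in calls.values() for pos in snps})
    return {
        isolate: "".join(snps.get(pos, reference[pos]) for pos in positions)
        for isolate, snps in calls.items()
    }


def snp_distances(alignment):
    """Pairwise number of differing SNP sites, stored in both directions."""
    dists = {}
    for a, b in combinations(sorted(alignment), 2):
        d = sum(x != y for x, y in zip(alignment[a], alignment[b]))
        dists[(a, b)] = dists[(b, a)] = d
    return dists


# Hypothetical reference and SNP calls for three isolates.
reference = "ACGTACGTAC"
calls = {
    "iso1": {2: "T"},
    "iso2": {2: "T", 7: "A"},
    "iso3": {5: "G"},
}
aln = concatenate_snps(reference, calls)
dists = snp_distances(aln)
```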

  10. Phylogenetics of neotropical Platymiscium (Leguminosae

    DEFF Research Database (Denmark)

    Saslis-Lagoudakis, C. Haris; Chase, Mark W; Robinson, Daniel N

    2008-01-01

    Platymiscium is a neotropical legume genus of forest trees in the Pterocarpus clade of the pantropical "dalbergioid" clade. It comprises 19 species (29 taxa), distributed from Mexico to southern Brazil. This study presents a molecular phylogenetic analysis of Platymiscium and allies inferred from...

  11. Phylogenetic prediction of Alternaria leaf blight resistance in wild and cultivated species of carrots (Daucus, Apiaceae)

    Science.gov (United States)

    Plant scientists make inferences and predictions from phylogenetic trees to solve scientific problems. Crop losses due to disease damage are an important problem that many plant breeders would like to solve, so the ability to predict traits like disease resistance from phylogenetic trees derived from...

  12. Multispecies coalescent analysis of the early diversification of neotropical primates: phylogenetic inference under strong gene trees/species tree conflict.

    Science.gov (United States)

    Schrago, Carlos G; Menezes, Albert N; Furtado, Carolina; Bonvicino, Cibele R; Seuanez, Hector N

    2014-11-05

    Neotropical primates (NP) are presently distributed in the New World from Mexico to northern Argentina, comprising three large families, Cebidae, Atelidae, and Pitheciidae, as a result of their diversification following their separation from Old World anthropoids near the Eocene/Oligocene boundary, some 40 Ma. The evolution of NP has been intensively investigated in the last decade by studies focusing on their phylogeny and timescale. However, despite major efforts, the phylogenetic relationship between these three major clades and the age of their last common ancestor are still controversial because these inferences were based on limited numbers of loci and dating analyses that did not consider the evolutionary variation associated with the distribution of gene trees within the proposed phylogenies. We show, by multispecies coalescent analyses of selected genome segments spanning 92,496,904 bp, that the early diversification of extant NP was marked by a 2-fold increase of their effective population size and that Atelids and Cebids are more closely related to each other than either is to Pitheciids. The molecular phylogeny of NP has been difficult to resolve because of population-level phenomena during the early evolution of the lineage. Accounting for the evolutionary variation associated with the distribution of gene trees within proposed phylogenies is crucial for distinguishing the mean genetic divergence between species (the mean coalescent time between loci) from speciation time. This approach, based on extensive genomic data provided by next-generation DNA sequencing, provides more accurate reconstructions of phylogenies and timescales for all organisms. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
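
    The link between effective population size and gene-tree/species-tree conflict can be made concrete with the standard three-taxon result of the multispecies coalescent: for species tree ((A,B),C), a gene tree matches the species tree with probability 1 - (2/3)e^(-τ), where τ is the internal branch length in coalescent units (generations divided by 2N for a diploid population). The sketch below uses hypothetical numbers and is not the authors' analysis; it only shows that doubling N over an internal branch of fixed duration makes discordant gene trees more frequent.

```python
import math


def three_taxon_gene_tree_probs(t_generations, ne):
    """Rooted gene-tree topology probabilities for species tree ((A,B),C)
    under the multispecies coalescent, one gene copy sampled per species.

    t_generations: duration of the internal branch between the two speciation
    events; ne: diploid effective population size along that branch.
    Returns (P[((A,B),C)], P[((A,C),B)], P[((B,C),A)]).
    """
    tau = t_generations / (2 * ne)   # internal branch in coalescent units
    discordant = math.exp(-tau) / 3  # probability of each minority topology
    return 1 - 2 * discordant, discordant, discordant


# Hypothetical numbers: same branch duration, effective size doubled.
small = three_taxon_gene_tree_probs(100_000, 50_000)
large = three_taxon_gene_tree_probs(100_000, 100_000)
```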

  13. On defining a unique phylogenetic tree with homoplastic characters.

    Science.gov (United States)

    Goloboff, Pablo A; Wilkinson, Mark

    2018-05-01

    This paper discusses whether a matrix containing all the character-state combinations that have a fixed number of steps (or extra steps) on a given tree T produces the same tree T when analyzed with maximum parsimony or maximum likelihood. Exhaustive enumeration of cases up to 20 taxa for binary characters, and up to 12 taxa for 4-state characters, shows that the same tree is recovered (as the unique most likely or most parsimonious tree) as long as the number of extra steps is within 1/4 of the number of taxa. This dependence on 1/4 of the number of taxa is explained with a general argument, in terms of the spread of the character changes on the tree used to select character-state distributions. The present finding allows the creation of matrices that have as much homoplasy as possible while the most parsimonious or most likely tree remains predictable, and examining these matrices with hill-climbing search algorithms provides additional evidence on the (lack of a) necessary relationship between homoplasy and the ability of search methods to find optimal trees. Copyright © 2018 Elsevier Inc. All rights reserved.
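
    The "steps" and "extra steps" in this abstract are parsimony counts. As a minimal sketch, Fitch's algorithm below counts the steps an unordered character requires on a rooted binary tree, and extra steps are those beyond the minimum possible on any tree (number of observed states minus one). The tree and characters are hypothetical, and the paper's exhaustive enumeration of character-state combinations is not reproduced here.

```python
def fitch(tree, states):
    """Fitch parsimony for one unordered character.

    tree: a taxon name (leaf) or a 2-tuple (left, right) of subtrees.
    states: {taxon: state}. Returns (candidate state set, number of steps).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # Disjoint child sets cost one additional state change below this node.
    return left_set | right_set, left_steps + right_steps + 1


def extra_steps(tree, states):
    """Steps beyond the minimum any tree needs (observed states minus one)."""
    _, steps = fitch(tree, states)
    return steps - (len(set(states.values())) - 1)


# Hypothetical five-taxon tree and two binary characters.
tree = ((("A", "B"), "C"), ("D", "E"))
clean = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}
homoplastic = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 0}
```

    Parsimony length does not depend on where the tree is rooted, so the rooted encoding is only a convenience for the recursion.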

  14. A taxonomic and phylogenetic re-appraisal of the genus Curvularia

    Science.gov (United States)

    Species of Curvularia are important plant and human pathogens worldwide. In this study, the genus Curvularia is re-assessed based on molecular phylogenetic analysis and morphological observations of available isolates and specimens. A multi-gene phylogenetic tree inferred from ITS, TEF and GPDH gene...

  15. A phylogenetic blueprint for a modern whale.

    Science.gov (United States)

    Gatesy, John; Geisler, Jonathan H; Chang, Joseph; Buell, Carl; Berta, Annalisa; Meredith, Robert W; Springer, Mark S; McGowen, Michael R

    2013-02-01

    The emergence of Cetacea in the Paleogene represents one of the most profound macroevolutionary transitions within Mammalia. The move from a terrestrial habitat to a committed aquatic lifestyle engendered wholesale changes in anatomy, physiology, and behavior. The results of this remarkable transformation are extant whales that include the largest, biggest brained, fastest swimming, loudest, deepest diving mammals, some of which can detect prey with a sophisticated echolocation system (Odontoceti - toothed whales), and others that batch feed using racks of baleen (Mysticeti - baleen whales). A broad-scale reconstruction of the evolutionary remodeling that culminated in extant cetaceans has not yet been based on integration of genomic and paleontological information. Here, we first place Cetacea relative to extant mammalian diversity, and assess the distribution of support among molecular datasets for relationships within Artiodactyla (even-toed ungulates, including Cetacea). We then merge trees derived from three large concatenations of molecular and fossil data to yield a composite hypothesis that encompasses many critical events in the evolutionary history of Cetacea. By combining diverse evidence, we infer a phylogenetic blueprint that outlines the stepwise evolutionary development of modern whales. This hypothesis represents a starting point for more detailed, comprehensive phylogenetic reconstructions in the future, and also highlights the synergistic interaction between modern (genomic) and traditional (morphological+paleontological) approaches that ultimately must be exploited to provide a rich understanding of evolutionary history across the entire tree of Life. Copyright © 2012 Elsevier Inc. All rights reserved.

  16. STRIDE: Species Tree Root Inference from Gene Duplication Events.

    Science.gov (United States)

    Emms, David M; Kelly, Steven

    2017-12-01

    The correct interpretation of any phylogenetic tree is dependent on that tree being correctly rooted. We present STRIDE, a fast, effective, and outgroup-free method for identification of gene duplication events and species tree root inference in large-scale molecular phylogenetic analyses. STRIDE identifies sets of well-supported in-group gene duplication events from a set of unrooted gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of its root. We show that STRIDE correctly identifies the root of the species tree in multiple large-scale molecular phylogenetic data sets spanning a wide range of timescales and taxonomic groups. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for data sets where information is limited. Furthermore, application of STRIDE to outgroup-free inference of the origin of the eukaryotic tree resulted in a root probability distribution that provides additional support for leading hypotheses for the origin of the eukaryotes. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  17. Selective logging in tropical forests decreases the robustness of liana-tree interaction networks to the loss of host tree species.

    Science.gov (United States)

    Magrach, Ainhoa; Senior, Rebecca A; Rogers, Andrew; Nurdin, Deddy; Benedick, Suzan; Laurance, William F; Santamaria, Luis; Edwards, David P

    2016-03-16

    Selective logging is one of the major drivers of tropical forest degradation, causing important shifts in species composition. Whether such changes modify interactions between species and the networks in which they are embedded remains a fundamental question for assessing the 'health' and ecosystem functionality of logged forests. We focus on interactions between lianas and their tree hosts within primary and selectively logged forests in the biodiversity hotspot of Malaysian Borneo. We found that lianas were more abundant, had higher species richness, and had different species compositions in logged than in primary forests. Logged forests showed heavier liana loads disproportionately affecting slow-growing tree species, which could exacerbate the loss of timber value and carbon storage already associated with logging. Moreover, simulation scenarios of host tree local species loss indicated that logging might decrease the robustness of liana-tree interaction networks if heavily infested trees (i.e. the most connected ones) were more likely to disappear. This effect is partially mitigated in the short term by the colonization of host trees by a greater diversity of liana species within logged forests, yet this might not compensate for the loss of preferred tree hosts in the long term. As a consequence, species interaction networks may show a lagged response to disturbance, which may trigger sudden collapses in species richness and ecosystem function in response to additional disturbances, representing a new type of 'extinction debt'. © 2016 The Author(s).

  18. Selective logging in tropical forests decreases the robustness of liana–tree interaction networks to the loss of host tree species

    Science.gov (United States)

    Magrach, Ainhoa; Senior, Rebecca A.; Rogers, Andrew; Nurdin, Deddy; Benedick, Suzan; Laurance, William F.; Santamaria, Luis; Edwards, David P.

    2016-01-01

    Selective logging is one of the major drivers of tropical forest degradation, causing important shifts in species composition. Whether such changes modify interactions between species and the networks in which they are embedded remains a fundamental question for assessing the ‘health’ and ecosystem functionality of logged forests. We focus on interactions between lianas and their tree hosts within primary and selectively logged forests in the biodiversity hotspot of Malaysian Borneo. We found that lianas were more abundant, had higher species richness, and had different species compositions in logged than in primary forests. Logged forests showed heavier liana loads disproportionately affecting slow-growing tree species, which could exacerbate the loss of timber value and carbon storage already associated with logging. Moreover, simulation scenarios of host tree local species loss indicated that logging might decrease the robustness of liana–tree interaction networks if heavily infested trees (i.e. the most connected ones) were more likely to disappear. This effect is partially mitigated in the short term by the colonization of host trees by a greater diversity of liana species within logged forests, yet this might not compensate for the loss of preferred tree hosts in the long term. As a consequence, species interaction networks may show a lagged response to disturbance, which may trigger sudden collapses in species richness and ecosystem function in response to additional disturbances, representing a new type of ‘extinction debt’. PMID:26936241

  19. Phylogenetic reconstruction methods: an overview.

    Science.gov (United States)

    De Bruyn, Alexandre; Martin, Darren P; Lefeuvre, Pierre

    2014-01-01

    Initially designed to infer evolutionary relationships based on morphological and physiological characters, phylogenetic reconstruction methods have greatly benefited from recent developments in molecular biology and sequencing technologies, with a number of powerful methods having been developed specifically to infer phylogenies from macromolecular data. This chapter, while presenting an overview of basic concepts and methods used in phylogenetic reconstruction, is primarily intended as a simplified step-by-step guide to the construction of phylogenetic trees from nucleotide sequences using fairly up-to-date maximum likelihood methods implemented in freely available computer programs. While the analysis of chloroplast sequences from various Vanilla species is used as an illustrative example, the techniques covered here are relevant to the comparative analysis of homologous sequence datasets sampled from any group of organisms.

  20. Bayesian selection of misspecified models is overconfident and may cause spurious posterior probabilities for phylogenetic trees.

    Science.gov (United States)

    Yang, Ziheng; Zhu, Tianqi

    2018-02-20

    The Bayesian method is noted to produce spuriously high posterior probabilities for phylogenetic trees in analysis of large datasets, but the precise reasons for this overconfidence are unknown. In general, the performance of Bayesian selection of misspecified models is poorly understood, even though this is of great scientific interest since models are never true in real data analysis. Here we characterize the asymptotic behavior of Bayesian model selection and show that when the competing models are equally wrong, Bayesian model selection exhibits surprising and polarized behaviors in large datasets, supporting one model with full force while rejecting the others. If one model is slightly less wrong than the other, the less wrong model will eventually win when the amount of data increases, but the method may become overconfident before it becomes reliable. We suggest that this extreme behavior may be a major factor for the spuriously high posterior probabilities for evolutionary trees. The philosophical implications of our results for the application of Bayesian model selection to evaluate opposing scientific hypotheses are yet to be explored, as are the behaviors of non-Bayesian methods in similar situations.
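
    The polarized behavior described here can be reproduced in a toy setting that is not taken from the paper: data drawn from N(0, 1) are compared under two fully specified models, N(-1, 1) and N(+1, 1), which are equally wrong. Because neither model has free parameters, the marginal likelihood reduces to the likelihood, so the posterior under equal prior odds is a logistic function of the log Bayes factor. The models, priors and sample sizes are assumptions chosen purely for illustration.

```python
import math
import random


def posterior_m1(data):
    """P(M1 | data) for M1: N(-1, 1) versus M2: N(+1, 1), equal prior odds.

    Both models are fully specified (an illustrative assumption), so each
    observation x contributes log f1(x) - log f2(x) = -2x to the log Bayes factor.
    """
    log_bf = -2.0 * sum(data)
    # Logistic transform of the log Bayes factor, written to avoid overflow.
    if log_bf >= 0:
        return 1.0 / (1.0 + math.exp(-log_bf))
    e = math.exp(log_bf)
    return e / (1.0 + e)


def share_undecided(n, reps=200, seed=0):
    """Fraction of datasets of size n, simulated from the true model N(0, 1),
    whose posterior for M1 stays between 0.01 and 0.99."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        p = posterior_m1([rng.gauss(0.0, 1.0) for _ in range(n)])
        count += 0.01 < p < 0.99
    return count / reps
```

    With 10 observations, roughly half the simulated datasets leave the posterior between 0.01 and 0.99; with 5000 observations, nearly all of them push it to one extreme or the other, even though neither model is closer to the truth.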

  1. Speeding Up Neighbour-Joining Tree Construction

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Fagerberg, Rolf; Mailund, Thomas

    A widely used method for constructing phylogenetic trees is the neighbour-joining method of Saitou and Nei. We develop heuristics for speeding up the neighbour-joining method that generate the same phylogenetic trees as the original method. All heuristics are based on using a quad-tree to guide...... the search for the next pair of nodes to join, but differ in the information stored in quad-tree nodes, the way the search is performed, and the way the quad-tree is updated after a join. We empirically evaluate the performance of the heuristics on distance matrices obtained from the Pfam collection...... of alignments, and compare the running time with that of the QuickTree tool, a well-known and widely used implementation of the standard neighbour-joining method. The results show that the presented heuristics can give a significant speed-up over the standard neighbour-joining method, already for medium sized...
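
    For reference, the standard neighbour-joining algorithm that these heuristics reproduce can be sketched as follows. This is the plain O(n^3) search over all pairs, minimising Q(i, j) = (n - 2)d(i, j) - r_i - r_j, and not the quad-tree heuristics of the paper, which speed up exactly this pair search. Ties are broken by the first pair found, the function returns a Newick string, and the five-taxon matrix is a small additive textbook-style example.

```python
def neighbour_joining(names, matrix):
    """Standard Saitou-Nei neighbour joining (illustrative O(n^3) version).

    names: taxon labels (at least three); matrix: symmetric list-of-lists of
    distances. Returns an unrooted tree as a Newick string with branch lengths.
    """
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = list(names)
    while len(active) > 3:
        n = len(active)
        r = {a: sum(d[a][b] for b in active) for a in active}
        pairs = [(a, b) for k, a in enumerate(active) for b in active[k + 1:]]
        # Join the pair minimising the Q criterion (first pair wins ties).
        i, j = min(pairs, key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"  # the new node is labelled by its subtree
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    # Resolve the last three nodes as a trifurcation.
    a, b, c = active
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
newick = neighbour_joining(names, matrix)
```

    Because the example matrix is additive, the recovered branch lengths reproduce every input distance exactly (for instance a to e: 2 + 3 + 2 + 1 = 8).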

  2. The individual and interactive effects of tree-tree establishment competition and fire on savanna structure and dynamics

    OpenAIRE

    Calabrese, Justin; Vázquez, Federico; López, Cristóbal; San Miguel, Maxi; Grimm, Volker

    2010-01-01

    The mechanisms regulating savanna tree populations are still not well understood. Recent empirical work suggests that both tree-tree competition and fire are key factors in semi-arid to mesic savannas. However, the potential for competition to structure savannas, particularly in interaction with fire, has received little theoretical attention. We develop a minimalistic and analytically tractable stochastic cellular automaton to study the individual and combined effects of competition and fire...

  3. Hal: an automated pipeline for phylogenetic analyses of genomic data.

    Science.gov (United States)

    Robbertse, Barbara; Yoder, Ryan J; Boyd, Alex; Reeves, John; Spatafora, Joseph W

    2011-02-07

    The rapid increase in genomic and genome-scale data is resulting in unprecedented levels of discrete sequence data available for phylogenetic analyses. Major analytical impasses exist, however, prior to analyzing these data with existing phylogenetic software. Obstacles include the management of large data sets without standardized naming conventions, identification and filtering of orthologous clusters of proteins or genes, and the assembly of alignments of orthologous sequence data into individual and concatenated super alignments. Here we report the production of an automated pipeline, Hal, that produces multiple alignments and trees from genomic data. These alignments can be produced by a choice of four alignment programs and analyzed by a variety of phylogenetic programs. In short, the Hal pipeline connects the programs BLASTP, MCL, user-specified alignment programs, GBlocks, ProtTest and user-specified phylogenetic programs to produce species trees. The script is available at sourceforge (http://sourceforge.net/projects/bio-hal/). The results from an example analysis of Kingdom Fungi are briefly discussed.

  4. A phylogenetic study of SPBP and RAI1: evolutionary conservation of chromatin binding modules.

    Directory of Open Access Journals (Sweden)

    Sagar Darvekar

    Our genome is assembled into an array of highly dynamic nucleosome structures allowing spatial and temporal access to DNA. The nucleosomes are subject to a wide array of post-translational modifications, altering the DNA-histone interaction and serving as docking sites for proteins exhibiting effector or "reader" modules. The nuclear proteins SPBP and RAI1 are composed of several putative "reader" modules which may have the ability to recognise a set of histone modification marks. Here we have performed a phylogenetic study of their putative reader modules, the C-terminal ePHD/ADD-like domain, a novel nucleosome binding region and an AT-hook motif. Interaction studies in vitro and in yeast cells suggested that despite the extraordinarily long loop region in their ePHD/ADD-like chromatin binding domains, the C-terminal region of both proteins seems to adopt a cross-braced topology of zinc finger interactions similar to other structurally determined ePHD/ADD structures. Both their ePHD/ADD-like domain and their novel nucleosome binding domain are highly conserved in vertebrate evolution, and construction of a phylogenetic tree revealed two well-supported clusters representing SPBP and RAI1, respectively. Their genome and domain organisation suggest that SPBP and RAI1 arose from a gene duplication event. The phylogenetic tree suggests that this duplication happened early in vertebrate evolution, since only one gene was identified in insects and lancelet. Finally, experimental data confirm that the conserved novel nucleosome binding region of RAI1 has the ability to bind the nucleosome core and histones. However, an adjacent conserved AT-hook motif as identified in SPBP is not present in RAI1, and deletion of the novel nucleosome binding region of RAI1 did not significantly affect its nuclear localisation.

  5. On Unrooted and Root-Uncertain Variants of Several Well-Known Phylogenetic Network Problems

    NARCIS (Netherlands)

    van Iersel, L.J.J.; Kelk, Steven; Stougie, Leen; Boes, Olivier

    2017-01-01

    The hybridization number problem requires us to embed a set of binary rooted phylogenetic trees into a binary rooted phylogenetic network such that the number of nodes with indegree two is minimized. However, from a biological point of view accurately inferring the root location in a phylogenetic

  6. Phylogenetic analysis of local-scale tree soil associations in a lowland moist tropical forest.

    Directory of Open Access Journals (Sweden)

    Laura A Schreeg

    BACKGROUND: Local plant-soil associations are commonly studied at the species-level, while associations at the level of nodes within a phylogeny have been less well explored. Understanding associations within a phylogenetic context, however, can improve our ability to make predictions across systems and can advance our understanding of the role of evolutionary history in structuring communities. METHODOLOGY/PRINCIPAL FINDINGS: Here we quantified evolutionary signal in plant-soil associations using a DNA sequence-based community phylogeny and several soil variables (e.g., extractable phosphorus, aluminum and manganese, pH, and slope as a proxy for soil water). We used published plant distributional data from the 50-ha plot on Barro Colorado Island (BCI, Republic of Panamá). Our results suggest some groups of closely related species do share similar soil associations. Most notably, the node shared by Myrtaceae and Vochysiaceae was associated with high levels of aluminum, a potentially toxic element. The node shared by Apocynaceae was associated with high extractable phosphorus, a nutrient that could be limiting on a taxon specific level. The node shared by the large group of Laurales and Magnoliales was associated with both low extractable phosphorus and with steeper slope. Despite significant node-specific associations, this study detected little to no phylogeny-wide signal. We consider the majority of the 'traits' (i.e., soil variables) evaluated to fall within the category of ecological traits. We suggest that, given this category of traits, phylogeny-wide signal might not be expected while node-specific signals can still indicate phylogenetic structure with respect to the variable of interest. CONCLUSIONS: Within the BCI forest dynamics plot, distributions of some plant taxa are associated with local-scale differences in soil variables when evaluated at individual nodes within the phylogenetic tree, but they are not detectable by phylogeny

  7. Mirroring co-evolving trees in the light of their topologies.

    Science.gov (United States)

    Hajirasouliha, Iman; Schönhuth, Alexander; de Juan, David; Valencia, Alfonso; Sahinalp, S Cenk

    2012-05-01

    Determining the interaction partners among protein/domain families poses hard computational problems, in particular in the presence of paralogous proteins. Available approaches aim to identify interaction partners among protein/domain families through maximizing the similarity between trimmed versions of their phylogenetic trees. Since maximization of any natural similarity score is computationally difficult, many approaches employ heuristics to evaluate the distance matrices corresponding to the tree topologies in question. In this article, we devise an efficient deterministic algorithm which directly maximizes the similarity between two leaf labeled trees with edge lengths, obtaining a score-optimal alignment of the two trees in question. Our algorithm is significantly faster than those methods based on distance matrix comparison: 1 min on a single processor versus 730 h on a supercomputer. Furthermore, we outperform the current state-of-the-art exhaustive search approach in terms of precision, while incurring acceptable losses in recall. A C implementation of the method demonstrated in this article is available at http://compbio.cs.sfu.ca/mirrort.htm
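
    The distance-matrix comparison that the authors' algorithm is contrasted with can be illustrated in the spirit of the classic mirrortree approach: the similarity of two families' trees is scored as the Pearson correlation between their inter-species distance matrices. This sketch assumes the one-to-one pairing of proteins between the two families is already known, which is precisely what becomes hard with paralogs; it is not the tree-alignment algorithm of the paper, and the matrices are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def mirrortree_score(dist_a, dist_b):
    """Pearson correlation between two distance matrices whose rows and
    columns refer to the same species in the same order (assumption: the
    pairing of proteins between the two families is known)."""
    x, y = upper_triangle(dist_a), upper_triangle(dist_b)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy)


# Hypothetical distance matrices for two co-evolving families over four species.
family_a = [[0, 2, 6, 7], [2, 0, 6, 7], [6, 6, 0, 3], [7, 7, 3, 0]]
family_b = [[0, 4, 12, 14], [4, 0, 12, 14], [12, 12, 0, 6], [14, 14, 6, 0]]
score = mirrortree_score(family_a, family_b)  # proportional matrices
```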

  8. Efficient Detection of Repeating Sites to Accelerate Phylogenetic Likelihood Calculations.

    Science.gov (United States)

    Kobert, K; Stamatakis, A; Flouri, T

    2017-03-01

    The phylogenetic likelihood function (PLF) is the major computational bottleneck in several applications of evolutionary biology such as phylogenetic inference, species delimitation, model selection, and divergence-time estimation. Given the alignment, a tree and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted for improving run-time and, using appropriate data structures, reducing memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory savings attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 12-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the PLF currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation. [Algorithms; maximum likelihood; phylogenetic likelihood function; phylogenetics]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
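
    The idea of a repeating site can be shown at the level of a single node: two alignment columns are repeats for a node if they are identical on all taxa below it, even when the full columns differ elsewhere. The sketch below only detects such repeats for one given set of taxa; it assumes a single partition (all sites share the model and branch lengths) and does not reproduce the paper's data structures for propagating repeat information up the tree. The alignment is hypothetical.

```python
def repeat_classes(alignment, taxa):
    """Map every alignment column to the first column with the same
    characters in `taxa`.

    alignment: {taxon: aligned sequence}; taxa: the leaves below one node.
    Columns mapped to an earlier index are repeats: assuming all sites share
    the same model and branch lengths, their conditional likelihood vectors
    at that node equal the earlier column's and can be reused.
    """
    first_seen = {}
    classes = []
    for site in range(len(alignment[taxa[0]])):
        pattern = tuple(alignment[t][site] for t in taxa)
        classes.append(first_seen.setdefault(pattern, site))
    return classes


alignment = {"A": "ACAGT", "B": "ACAGA", "C": "GTCTT"}
cherry = repeat_classes(alignment, ["A", "B"])      # node above A and B
whole = repeat_classes(alignment, ["A", "B", "C"])  # node above all taxa
```

    Here column 2 repeats column 0 at the node joining A and B, although the two full columns differ in taxon C, so only four of the five vectors at that node need computing.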

  9. Integrative taxonomy of ciliates: Assessment of molecular phylogenetic content and morphological homology testing.

    Science.gov (United States)

    Vďačný, Peter

    2017-10-01

    The very diverse and comparatively complex morphology of ciliates has given rise to numerous taxonomic concepts. However, the information content of the utilized molecular markers has seldom been explored prior to phylogenetic analyses and taxonomic decisions. Likewise, robust testing of morphological homology statements and the apomorphic nature of diagnostic characters of ciliate taxa is rarely carried out. Four phylogenetic techniques that may help address these issues are reviewed. (1) Split spectrum analysis serves to determine the exact number and quality of nucleotide positions supporting individual nodes in phylogenetic trees and to discern long-branch artifacts that cause spurious phylogenies. (2) Network analysis can depict all possible evolutionary trajectories inferable from the dataset and locate and measure the conflict between them. (3) A priori likelihood mapping tests the suitability of data for reconstruction of a well resolved tree, visualizes the tree-likeness of quartets, and assesses the support of an internal branch of a given tree topology. (4) Reconstruction of ancestral morphologies can be applied for analyzing homology and apomorphy statements without circular reasoning. Since these phylogenetic tools are rarely used, their principles and interpretation are introduced and exemplified using various groups of ciliates. Finally, environmental sequencing data are discussed in this light. Copyright © 2017 The Author. Published by Elsevier GmbH. All rights reserved.

  10. Reconciling taxonomy and phylogenetic inference: formalism and algorithms for describing discord and inferring taxonomic roots

    Directory of Open Access Journals (Sweden)

    Matsen Frederick A

    2012-05-01

    BACKGROUND: Although taxonomy is often used informally to evaluate the results of phylogenetic inference and the root of phylogenetic trees, algorithmic methods to do so are lacking. RESULTS: In this paper we formalize these procedures and develop algorithms to solve the relevant problems. In particular, we introduce a new algorithm that solves a "subcoloring" problem to express the difference between a taxonomy and a phylogeny at a given rank. This algorithm improves upon the current best algorithm in terms of asymptotic complexity for the parameter regime of interest; we also describe a branch-and-bound algorithm that saves orders of magnitude in computation on real data sets. We also develop a formalism and an algorithm for rooting phylogenetic trees according to a taxonomy. CONCLUSIONS: The algorithms in this paper, and the associated freely available software, will help biologists better use and understand taxonomically labeled phylogenetic trees.

  11. The phylogenetic likelihood library.

    Science.gov (United States)

    Flouri, T; Izquierdo-Carrasco, F; Darriba, D; Aberer, A J; Nguyen, L-T; Minh, B Q; Von Haeseler, A; Stamatakis, A

    2015-03-01

    We introduce the Phylogenetic Likelihood Library (PLL), a highly optimized application programming interface for developing likelihood-based phylogenetic inference and postanalysis software. The PLL implements appropriate data structures and functions that allow users to quickly implement common, error-prone, and labor-intensive tasks, such as likelihood calculations, model parameter as well as branch length optimization, and tree space exploration. The highly optimized and parallelized implementation of the phylogenetic likelihood function and a thorough documentation provide a framework for rapid development of scalable parallel phylogenetic software. By example of two likelihood-based phylogenetic codes we show that the PLL improves the sequential performance of current software by a factor of 2-10 while requiring only 1 month of programming time for integration. We show that, when numerical scaling for preventing floating point underflow is enabled, the double precision likelihood calculations in the PLL are up to 1.9 times faster than those in BEAGLE. On an empirical DNA dataset with 2000 taxa the AVX version of PLL is 4 times faster than BEAGLE (scaling enabled and required). The PLL is available at http://www.libpll.org under the GNU General Public License (GPL). © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
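
    PLL itself is a C library; as an illustration of what its phylogenetic likelihood function computes, the Python sketch below applies Felsenstein's pruning algorithm to one alignment site under the JC69 model, with per-node rescaling of the kind used to prevent floating-point underflow. The tree encoding and function names are hypothetical and are not PLL's API; branch lengths are assumed to be positive.

```python
import math

BASES = "ACGT"


def jc69(t):
    """JC69 probabilities (same base, one specific different base) after
    branch length t, in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def partials(node, column):
    """Conditional likelihood vector of a node plus its accumulated log scaler.

    node: a taxon name, or a tuple (left, t_left, right, t_right) (hypothetical
    encoding). column: {taxon: base} for one alignment site.
    """
    if isinstance(node, str):
        return [1.0 if b == column[node] else 0.0 for b in BASES], 0.0
    left, t_left, right, t_right = node
    children = []
    log_scale = 0.0
    for child, t in ((left, t_left), (right, t_right)):
        vec, s = partials(child, column)
        children.append((vec, jc69(t)))
        log_scale += s
    out = []
    for x in range(4):
        prod = 1.0
        for vec, (same, diff) in children:
            prod *= sum((same if x == y else diff) * vec[y] for y in range(4))
        out.append(prod)
    # Rescale so the largest entry is 1 and carry the factor in log space,
    # so products over many nodes do not underflow double precision.
    m = max(out)
    return [v / m for v in out], log_scale + math.log(m)


def site_log_likelihood(tree, column):
    vec, log_scale = partials(tree, column)
    # JC69 assumes equal base frequencies of 0.25 at the root.
    return math.log(sum(0.25 * v for v in vec)) + log_scale


tree = ("A", 0.1, "B", 0.2)
lnl = site_log_likelihood(tree, {"A": "A", "B": "C"})
```

    For two taxa the result equals the log of 0.25 times the JC69 probability of an A-to-C change over the summed branch length 0.3, which makes a convenient sanity check.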

  12. QuickJoin—Fast Neighbour-Joining Tree Reconstruction

    DEFF Research Database (Denmark)

    Mailund; Pedersen, Christian N. Storm

    2004-01-01

    We have built a tool for fast construction of very large phylogenetic trees. The tool uses heuristics for speeding up the neighbour-joining algorithm—while still constructing the same tree as the original neighbour-joining algorithm—making it possible to construct trees for ~8000 species in less...

  13. [Phylogenetic analysis of closely related Leuconostoc citreum species based on partial housekeeping genes].

    Science.gov (United States)

    Lv, Qiang; Chen, Ming; Xu, Haiyan; Song, Yuqin; Sun, Zhihong; Dan, Tong; Sun, Tiansong

    2013-07-04

    Using the 16S rRNA, dnaA, murC and pyrG gene sequences, we identified the phylogenetic relationships among closely related Leuconostoc citreum species. The dnaA, murC and pyrG gene sequences of seven Leu. citreum strains originally isolated from sourdough were amplified by PCR and determined to assess their suitability as phylogenetic markers. We then estimated the genetic distances and constructed phylogenetic trees for the 16S rRNA gene and the three housekeeping genes, combined with published corresponding sequences. Comparing the phylogenetic trees, the topologies of the three housekeeping-gene trees were consistent with that of the 16S rRNA gene tree. The sequence homology among closely related Leu. citreum species differed between genes, ranging from 75.5% to 97.2% (dnaA), 50.2% to 99.7% (murC), 65.0% to 99.8% (pyrG) and 98.5% to 100% (16S rRNA). The phylogenetic relationships inferred from the three housekeeping genes were highly consistent with those from the 16S rRNA gene, while the genetic distances for the housekeeping genes were much greater than for the 16S rRNA gene. Consequently, the dnaA, murC and pyrG genes are suitable for the classification and identification of closely related Leu. citreum species.
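
    The abstract does not state which distance model was used. As one common choice, the sketch below computes the uncorrected p-distance (percent identity is 100(1 - p)) and the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3), assuming pre-aligned sequences with '-' for gaps; the sequences are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns with a gap ('-') in either sequence."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(seq1, seq2):
    """Jukes-Cantor corrected distance d = -3/4 ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


s1, s2 = "ACGTACGTAC", "ACGTACGTTC"  # hypothetical aligned fragments
identity = 100 * (1 - p_distance(s1, s2))
d = jc69_distance(s1, s2)
```

    The correction matters most for fast-evolving markers: the more divergent the sequences, the further the corrected distance rises above the raw proportion of differences.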

  14. An attempt to reconstruct phylogenetic relationships within Caribbean nummulitids: simulating relationships and tracing character evolution

    Science.gov (United States)

    Eder, Wolfgang; Ives Torres-Silva, Ana; Hohenegger, Johann

    2017-04-01

    Phylogenetic analyses and trees based on molecular data are broadly applied to infer genetic and biogeographic relationships in recent larger foraminifera. Molecular phylogenetics is used intensively within recent nummulitids; for fossil representatives, however, such trees are of only minor informational value. Hence, in paleontological studies a phylogenetic approach through morphometric analysis is of much higher value. To tackle phylogenetic relationships within the nummulitid family, many more morphological characters must be measured than are commonly used in biometric studies, which mostly examine parameters describing embryonic size (e.g., proloculus diameter, deuteroloculus diameter) and/or the marginal spiral (e.g., spiral diagrams, spiral indices). For this purpose, 11 growth-independent and/or growth-invariant characters were used to describe the morphological variability of equatorial thin sections of seven Caribbean nummulitid taxa (Nummulites striatoreticulatus, N. macgillavry, Palaeonummulites willcoxi, P. floridensis, P. soldadensis, P. trinitatensis and P. ocalanus) and one outgroup taxon (Ranikothalia bermudezi). Using these characters, phylogenetic trees were calculated with a restricted maximum likelihood algorithm (REML), and the results were cross-checked by ordination and cluster analysis. The squared-change parsimony method was run to reconstruct ancestral states and to simulate the evolution of the chosen characters along the calculated phylogenetic tree, and independent-contrast analysis was used to estimate confidence intervals. Based on these simulations, phylogenetic tendencies of certain characters proposed for nummulitids (e.g., Cope's rule or nepionic acceleration) can be tested to determine whether they hold for the whole family or only for certain clades. At least within the Caribbean nummulitids, phylogenetic trends along some growth-independent characters of the embryo (e.g., first

  15. A revised root for the human Y chromosomal phylogenetic tree: the origin of patrilineal diversity in Africa.

    Science.gov (United States)

    Cruciani, Fulvio; Trombetta, Beniamino; Massaia, Andrea; Destro-Bisol, Giovanni; Sellitto, Daniele; Scozzari, Rosaria

    2011-06-10

    To shed light on the structure of the basal backbone of the human Y chromosome phylogeny, we sequenced about 200 kb of the male-specific region of the human Y chromosome (MSY) from each of seven Y chromosomes belonging to clades A1, A2, A3, and BT. We detected 146 biallelic variant sites through this analysis. We used these variants to construct a patrilineal tree, without taking into account any previously reported information regarding the phylogenetic relationships among the seven Y chromosomes here analyzed. There are several key changes at the basal nodes as compared with the most recent reference Y chromosome tree. A different position of the root was determined, with important implications for the origin of human Y chromosome diversity. An estimate of 142 KY was obtained for the coalescence time of the revised MSY tree, which is earlier than that obtained in previous studies and easier to reconcile with plausible scenarios of modern human origin. The number of deep branchings leading to African-specific clades has doubled, further strengthening the MSY-based evidence for a modern human origin in the African continent. An analysis of 2204 African DNA samples showed that the deepest clades of the revised MSY phylogeny are currently found in central and northwest Africa, opening new perspectives on early human presence in the continent. Copyright © 2011 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  16. Variation across mitochondrial gene trees provides evidence for systematic error: How much gene tree variation is biological?

    Science.gov (United States)

    Richards, Emilie J; Brown, Jeremy M; Barley, Anthony J; Chong, Rebecca A; Thomson, Robert C

    2018-02-19

    The use of large genomic datasets in phylogenetics has highlighted extensive topological variation across genes. Much of this discordance is assumed to result from biological processes. However, variation among gene trees can also be a consequence of systematic error driven by poor model fit, and the relative importance of biological versus methodological factors in explaining gene tree variation is a major unresolved question. Using mitochondrial genomes to control for biological causes of gene tree variation, we estimate the extent of gene tree discordance driven by systematic error and employ posterior prediction to highlight the role of model fit in producing this discordance. We find that the amount of discordance among mitochondrial gene trees is similar to the amount of discordance found in other studies that assume only biological causes of variation. This similarity suggests that the role of systematic error in generating gene tree variation is underappreciated and critical evaluation of fit between assumed models and the data used for inference is important for the resolution of unresolved phylogenetic questions.
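
    The abstract quantifies discordance among gene trees without naming a metric here. One common way to count topological disagreement between two gene trees is the Robinson-Foulds distance, the number of bipartitions found in only one of the trees. A minimal sketch, assuming unrooted trees on the same leaf set written as nested tuples; the study itself may have used a different measure.

```python
def leaf_set(tree):
    """All leaf names of a tree written as nested tuples of strings."""
    return {tree} if isinstance(tree, str) else set().union(*(leaf_set(c) for c in tree))

def splits(tree):
    """Non-trivial bipartitions of an unrooted tree, each stored as the side without a reference leaf."""
    all_leaves = frozenset(leaf_set(tree))
    ref = min(all_leaves)
    result = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(walk(c) for c in node))
        # Normalise so the same bipartition is stored the same way whatever the rooting.
        side = all_leaves - cluster if ref in cluster else cluster
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
        return cluster

    walk(tree)
    return result

def robinson_foulds(t1, t2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))
```

    Because splits are normalised against a reference leaf, two different rootings of the same unrooted topology score a distance of 0.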

  17. Study on competitive interaction models in Cayley tree

    International Nuclear Information System (INIS)

    Moreira, J.G.M.A.

    1987-12-01

    We propose two kinds of models on the Cayley tree to simulate Ising models with axial anisotropy in the cubic lattice. The interaction in the direction of the anisotropy is simulated by the interaction along the branches of the tree. In the first model, the interaction in the planes perpendicular to the anisotropy direction is simulated by interactions between spins in neighbouring branches of the same generation arising from the same site of the previous generation. In the second model, the in-plane interaction is simulated by mean-field interactions among all spins at sites of the same generation arising from the same site of the previous generations. We study these models in the limit of infinite coordination number. First, we analyse a situation with antiferromagnetic interactions along the branches between first neighbours only, and we find the analogue of a metamagnetic Ising model. Next, we introduce competing interactions between first and second neighbours along the branches to simulate the ANNNI model. We obtain a difference equation relating the magnetization of one generation to the magnetizations of the two previous generations, which permits a detailed study of the modulated phase region. We note that, at fixed temperature, the wave number of the modulation changes with the competition parameter to form a devil's staircase whose fractal dimension increases with temperature. We discuss the existence of strange attractors, related to a possible chaotic phase. Finally, we show the results obtained when interactions along the branches with three neighbours are considered. (author)

  18. Phylogenetic Paleoecology: Tree-Thinking and Ecology in Deep Time.

    Science.gov (United States)

    Lamsdell, James C; Congreve, Curtis R; Hopkins, Melanie J; Krug, Andrew Z; Patzkowsky, Mark E

    2017-06-01

    The new and emerging field of phylogenetic paleoecology leverages the evolutionary relationships among species to explain temporal and spatial changes in species diversity, abundance, and distribution in deep time. This field is poised for rapid progress as knowledge of the evolutionary relationships among fossil species continues to expand. In particular, this approach will lend new insights to many of the longstanding questions in evolutionary biology, such as: the relationships among character change, ecology, and evolutionary rates; the processes that determine the evolutionary relationships among species within communities and along environmental gradients; and the phylogenetic signal underlying ecological selectivity in background and mass extinctions and in major evolutionary radiations. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Bacterial phylogenetic reconstruction from whole genomes is robust to recombination but demographic inference is not.

    Science.gov (United States)

    Hedge, Jessica; Wilson, Daniel J

    2014-11-25

    Phylogenetic inference in bacterial genomics is fundamental to understanding problems such as population history, antimicrobial resistance, and transmission dynamics. The field has been plagued by an apparent state of contradiction since the distorting effects of recombination on phylogeny were discovered more than a decade ago. Researchers persist with detailed phylogenetic analyses while simultaneously acknowledging that recombination seriously misleads inference of population dynamics and selection. Here we resolve this paradox by showing that phylogenetic tree topologies based on whole genomes robustly reconstruct the clonal frame topology but that branch lengths are badly skewed. Surprisingly, removing recombining sites can exacerbate branch length distortion caused by recombination. Phylogenetic tree reconstruction is a popular approach for understanding the relatedness of bacteria in a population from differences in their genome sequences. However, bacteria frequently exchange regions of their genomes by a process called homologous recombination, which violates a fundamental assumption of phylogenetic methods. Since many researchers continue to use phylogenetics for recombining bacteria, it is important to understand how recombination affects the conclusions drawn from these analyses. We find that whole-genome sequences afford great accuracy in reconstructing evolutionary relationships despite concerns surrounding the presence of recombination, but the branch lengths of the phylogenetic tree are indeed badly distorted. Surprisingly, methods to reduce the impact of recombination on branch lengths can exacerbate the problem. Copyright © 2014 Hedge and Wilson.

  20. Comparison of sequence-based and structure-based phylogenetic ...

    Indian Academy of Sciences (India)

    Prakash

    ... phylogenetic tree construction methods, has been considered as an equivalent of ... Further detailed analysis described is restricted to the first two groups only. ... Aspartate-ammonia ligase. Plant virus ... enzymatic activities?; Trends ...

  1. ColorTree: a batch customization tool for phylogenic trees.

    Science.gov (United States)

    Chen, Wei-Hua; Lercher, Martin J

    2009-07-31

    Genome sequencing projects and comparative genomics studies typically aim to trace the evolutionary history of large gene sets, often requiring human inspection of hundreds of phylogenetic trees. If trees are checked for compatibility with an explicit null hypothesis (e.g., the monophyly of certain groups), this daunting task is greatly facilitated by an appropriate coloring scheme. In this note, we introduce ColorTree, a simple yet powerful batch customization tool for phylogenic trees. Based on pattern matching rules, ColorTree applies a set of customizations to an input tree file, e.g., coloring labels or branches. The customized trees are saved to an output file, which can then be viewed and further edited by Dendroscope (a freely available tree viewer). ColorTree runs on any Perl installation as a stand-alone command line tool, and its application can thus be easily automated. This way, hundreds of phylogenic trees can be customized for easy visual inspection in a matter of minutes. ColorTree allows efficient and flexible visual customization of large tree sets through the application of a user-supplied configuration file to multiple tree files.
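
    ColorTree applies pattern-matching rules from a configuration file to tree files. The sketch below illustrates the idea only: it is not ColorTree (a Perl tool whose output is viewed in Dendroscope), and the rule format of (regular expression, colour) pairs and the function names are hypothetical stand-ins. It assumes unquoted Newick labels.

```python
import re

def leaf_labels(newick):
    """Leaf labels: names that directly follow '(' or ',' in an unquoted Newick string."""
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)

def color_leaves(newick, rules, default=None):
    """Map each leaf label to the colour of the first (pattern, colour) rule it matches."""
    colors = {}
    for label in leaf_labels(newick):
        colors[label] = next(
            (color for pattern, color in rules if re.search(pattern, label)), default
        )
    return colors
```

    Applied in a loop over many tree files with one shared rule list, this is the batch behaviour the note describes: every tree receives the same colouring, so violations of a hypothesis such as monophyly stand out on inspection.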

  2. Fast Structural Search in Phylogenetic Databases

    Directory of Open Access Journals (Sweden)

    William H. Piel

    2005-01-01

    As the size of phylogenetic databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. We propose structural search techniques that, given a query or pattern tree P and a database of phylogenies D, find trees in D that are sufficiently close to P. The “closeness” is a measure of the topological relationships in P that are found to be the same or similar in a tree from D. We develop a filtering technique that accelerates searches and present algorithms for rooted and unrooted trees, where the trees can be weighted or unweighted. Experimental results comparing the similarity measure with existing tree metrics and evaluating the efficiency of the search techniques demonstrate that the proposed approach is promising.
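
    The abstract defines closeness as the share of topological relationships in P that reappear in a database tree. One way to make that concrete for rooted trees, assumed here rather than taken from the paper, is to count the rooted triplets (ab|c) of P that a candidate tree also displays after restricting both trees to their shared taxa. This brute-force version enumerates all triples and omits the paper's filtering technique.

```python
from itertools import combinations

def clusters(tree):
    """Leaf sets of every subtree of a rooted tree written as nested tuples."""
    out = []

    def walk(node):
        s = frozenset([node]) if isinstance(node, str) else frozenset().union(*(walk(c) for c in node))
        out.append(s)
        return s

    walk(tree)
    return out

def triplets(tree, taxa):
    """Resolved rooted triplets ab|c on the given taxa, as (frozenset({a, b}), c)."""
    cl = clusters(tree)
    result = set()
    for trio in combinations(sorted(taxa), 3):
        for c in trio:
            a, b = [x for x in trio if x != c]
            # ab|c holds when some cluster contains a and b but not c.
            if any(a in s and b in s and c not in s for s in cl):
                result.add((frozenset((a, b)), c))
    return result

def triplet_similarity(pattern, tree):
    """Fraction of the pattern's triplets (on shared taxa) that the database tree also displays."""
    shared = set().union(*clusters(pattern)) & set().union(*clusters(tree))
    p = triplets(pattern, shared)
    return len(p & triplets(tree, shared)) / len(p) if p else 0.0
```

    A score of 1.0 means every resolved relationship in the query is present in the database tree; a search would rank trees in D by this value and keep those above a threshold.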

  3. The Deinococcus-Thermus phylum and the effect of rRNA composition on phylogenetic tree construction

    Science.gov (United States)

    Weisburg, W. G.; Giovannoni, S. J.; Woese, C. R.

    1989-01-01

    Through comparative analysis of 16S ribosomal RNA sequences, it can be shown that two seemingly dissimilar types of eubacteria, Deinococcus and the ubiquitous hot spring organism Thermus, are distantly but specifically related to one another. This confirms an earlier report based upon 16S rRNA oligonucleotide cataloging studies (Hensel et al., 1986). Their two lineages form a distinctive grouping within the eubacteria that deserves the taxonomic status of a phylum. The (partial) sequence of T. aquaticus rRNA appears relatively close to those of other thermophilic eubacteria, e.g., Thermotoga maritima and Thermomicrobium roseum. However, this closeness does not reflect a true evolutionary closeness; rather, it is due to a "thermophilic convergence", the result of the unusually high G+C composition of the rRNAs of thermophilic bacteria. Unless such compositional biases are taken into account, the branching order and root of phylogenetic trees can be incorrectly inferred.
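
    Checking base composition is a simple first screen for the compositional bias described above. A minimal sketch that computes G+C content per sequence, assuming DNA or RNA strings and ignoring ambiguity codes and gaps; comparing these values across taxa can show whether thermophiles stand out before a tree is built.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous A/C/G/T/U positions (gaps and ambiguity codes ignored)."""
    counted = [b for b in seq.upper() if b in "ACGTU"]
    if not counted:
        raise ValueError("no unambiguous bases")
    return sum(b in "GC" for b in counted) / len(counted)
```

    A markedly higher value in some taxa is a warning sign that similarity may partly reflect shared composition rather than shared ancestry.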

  4. Open Reading Frame Phylogenetic Analysis on the Cloud

    Directory of Open Access Journals (Sweden)

    Che-Lun Hung

    2013-01-01

    Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus.
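
    The service works on open reading frames rather than complete sequences. A minimal, single-machine sketch of ORF extraction follows, assuming standard start (ATG) and stop (TAA, TAG, TGA) codons, the forward strand only, and a hypothetical minimum length; the Hadoop distribution and any norovirus-specific settings from the paper are not shown.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=30):
    """Forward-strand ORFs as 0-based half-open (start, end) pairs covering ATG..stop.

    Each ORF starts at the first ATG after the previous stop in its frame;
    nested ATGs are not reported separately, and the reverse strand is ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

    Each extracted ORF could then be aligned across isolates and used for tree building; in a distributed setting, sequences would be the natural unit of work to spread across nodes.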

  5. Supermatrix and species tree methods resolve phylogenetic relationships within the big cats, Panthera (Carnivora: Felidae).

    Science.gov (United States)

    Davis, Brian W; Li, Gang; Murphy, William J

    2010-07-01

    The pantherine lineage of cats diverged from the remainder of modern Felidae less than 11 million years ago and consists of the five big cats of the genus Panthera, the lion, tiger, jaguar, leopard, and snow leopard, as well as the closely related clouded leopard. A significant problem exists with respect to the precise phylogeny of these highly threatened great cats. Despite multiple publications on the subject, no two molecular studies have reconstructed Panthera with the same topology. These evolutionary relationships remain unresolved partially due to the recent and rapid radiation of pantherines in the Pliocene, individual speciation events occurring within less than 1 million years, and probable introgression between lineages following their divergence. We provide an alternative, highly supported interpretation of the evolutionary history of the pantherine lineage using novel and published DNA sequence data from the autosomes, both sex chromosomes and the mitochondrial genome. New sequences were generated for 39 single-copy regions of the felid Y chromosome, as well as four mitochondrial and four autosomal gene segments, totaling 28.7 kb. Phylogenetic analysis of these new data, combined with all published data in GenBank, highlighted the prevalence of phylogenetic disparities stemming either from the amplification of a mitochondrial to nuclear translocation event (numt), or errors in species identification. Our 47.6 kb combined dataset was analyzed as a supermatrix and with respect to individual partitions using maximum likelihood and Bayesian phylogenetic inference, in conjunction with Bayesian Estimation of Species Trees (BEST) which accounts for heterogeneous gene histories. Our results yield a robust consensus topology supporting the monophyly of lion and leopard, with jaguar sister to these species, as well as a sister species relationship of tiger and snow leopard. These results highlight new avenues for the study of speciation genomics and

  6. Influence of matrix type on tree community assemblages along tropical dry forest edges.

    Science.gov (United States)

    Benítez-Malvido, Julieta; Gallardo-Vásquez, Julio César; Alvarez-Añorve, Mariana Y; Avila-Cabadilla, Luis Daniel

    2014-05-01

    • Anthropogenic habitat edges have strong negative consequences for the functioning of tropical ecosystems. However, edge effects on tropical dry forest (TDF) tree communities have been barely documented.
    • In Chamela, Mexico, we investigated the phylogenetic composition and structure of tree assemblages (≥5 cm dbh) along edges abutting different matrices: (1) disturbed vegetation with cattle, (2) pastures with cattle, and (3) pastures without cattle. Additionally, we sampled preserved forest interiors.
    • All edge types exhibited tree density, basal area and diversity similar to those of interior forests, but differed in species composition. A nonmetric multidimensional scaling ordination showed that the presence of cattle influenced species composition more strongly than the vegetation structure of the matrix; tree assemblages abutting matrices with cattle had lower scores in the ordination. The phylogenetic composition of tree assemblages followed the same pattern. The principal plant families and genera were associated according to disturbance regimes as follows: pastures and disturbed vegetation (1) with cattle and (2) without cattle, and (3) pastures without cattle and interior forests. All habitats showed random phylogenetic structures, suggesting that tree communities are assembled mainly by stochastic processes. Long-lived species persisting after edge creation could have important implications for the phylogenetic structure of tree assemblages.
    • Edge creation exerts a stronger influence on TDF vegetation pathways than previously documented, leading to new ecological communities. Phylogenetic analysis may, however, be needed to detect such changes. © 2014 Botanical Society of America, Inc.

  7. STBase: one million species trees for comparative biology.

    Science.gov (United States)

    McMahon, Michelle M; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J

    2015-01-01

    Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed

  9. Morphological characterization and phylogenetic distance among ...

    African Journals Online (AJOL)

    The genetic diversity was calculated with Nei and Li's index, and the phylogenetic tree (dendrogram) was generated with a neighbor-joining program. The dendrogram indicates the diversity of the genotypes, which are grouped into three distinctive large groups. The largest group includes species from the Mediolobivia and ...
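
    Nei and Li's index is usually applied to presence/absence band data, where the similarity of two genotypes is S = 2N_xy / (N_x + N_y), with N_xy the number of shared bands and N_x, N_y the band counts of each genotype. A minimal sketch, assuming 0/1 band vectors; the snippet does not state the marker type, and the function names are hypothetical. The resulting 1 - S matrix could then be passed to a neighbour-joining routine to draw the dendrogram.

```python
def nei_li_similarity(x, y):
    """Nei-Li (Dice) similarity for two equal-length 0/1 band-presence profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must score the same bands")
    shared = sum(1 for a, b in zip(x, y) if a and b)
    total = sum(x) + sum(y)
    # Two profiles with no bands at all are treated as identical (a convention).
    return 2.0 * shared / total if total else 1.0

def nei_li_distance_matrix(profiles):
    """Pairwise distances 1 - S for a dict mapping genotype name to band profile."""
    names = list(profiles)
    return names, [[1.0 - nei_li_similarity(profiles[a], profiles[b]) for b in names] for a in names]
```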

  10. BioNames: linking taxonomy, texts, and trees

    Directory of Open Access Journals (Sweden)

    Roderic D.M. Page

    2013-10-01

    BioNames is a web database of taxonomic names for animals, linked to the primary literature and, wherever possible, to phylogenetic trees. It aims to provide a taxonomic “dashboard” where at a glance we can see a summary of the taxonomic and phylogenetic information we have for a given taxon and hence provide a quick answer to the basic question “what is this taxon?” BioNames combines classifications from the Global Biodiversity Information Facility (GBIF) and GenBank, images from the Encyclopedia of Life (EOL), animal names from the Index of Organism Names (ION), and bibliographic data from multiple sources including the Biodiversity Heritage Library (BHL) and CrossRef. The user interface includes display of full text articles, interactive timelines of taxonomic publications, and zoomable phylogenies. It is available at http://bionames.org.

  11. Incorporating information on predicted solvent accessibility to the co-evolution-based study of protein interactions.

    Science.gov (United States)

    Ochoa, David; García-Gutiérrez, Ponciano; Juan, David; Valencia, Alfonso; Pazos, Florencio

    2013-01-27

    A widespread family of methods for studying and predicting protein interactions using sequence information is based on co-evolution, quantified as similarity of phylogenetic trees. Part of the co-evolution observed between interacting proteins could be due to co-adaptation caused by inter-protein contacts. In this case, the co-evolution is expected to be more evident when evaluated on the surface of the proteins or the internal layers close to it. In this work we study the effect of incorporating information on predicted solvent accessibility to three methods for predicting protein interactions based on similarity of phylogenetic trees. We evaluate the performance of these methods in predicting different types of protein associations when trees based on positions with different characteristics of predicted accessibility are used as input. We found that predicted accessibility improves the results of two recent versions of the mirrortree methodology in predicting direct binary physical interactions, while it neither improves these methods, nor the original mirrortree method, in predicting other types of interactions. That improvement comes at no cost in terms of applicability since accessibility can be predicted for any sequence. We also found that predictions of protein-protein interactions are improved when multiple sequence alignments with a richer representation of sequences (including paralogs) are incorporated in the accessibility prediction.
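
    The mirrortree approach scores co-evolution as the similarity of two protein families' trees, in practice the correlation between their inter-species distance matrices. A minimal sketch of that final step, assuming both matrices list the same species in the same order; filtering alignment positions by predicted solvent accessibility, as studied in the paper, would happen upstream when the distances are computed. The function names are hypothetical.

```python
import math

def upper_triangle(m):
    """Off-diagonal entries above the diagonal of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def mirrortree_score(dist_a, dist_b):
    """Pearson correlation between two distance matrices over the same species order.

    Raises ZeroDivisionError if either matrix has constant off-diagonal entries.
    """
    x, y = upper_triangle(dist_a), upper_triangle(dist_b)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

    High positive scores suggest that the two families' trees "mirror" each other, which the method takes as evidence of possible interaction.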

  12. Finiteness results for Abelian tree models

    NARCIS (Netherlands)

    Draisma, J.; Eggermont, R.H.

    2015-01-01

    Equivariant tree models are statistical models used in the reconstruction of phylogenetic trees from genetic data. Here equivariant refers to a symmetry group imposed on the root distribution and on the transition matrices in the model. We prove that if that symmetry group is Abelian, then the

  13. Finiteness results for Abelian tree models

    NARCIS (Netherlands)

    Draisma, J.; Eggermont, R.H.

    2012-01-01

    Equivariant tree models are statistical models used in the reconstruction of phylogenetic trees from genetic data. Here equivariant refers to a symmetry group imposed on the root distribution and on the transition matrices in the model. We prove that if that symmetry group is Abelian, then the

  14. Multiple alignment analysis on phylogenetic tree of the spread of SARS epidemic using distance method

    Science.gov (United States)

    Amiroch, S.; Pradana, M. S.; Irawan, M. I.; Mukhlash, I.

    2017-09-01

    Multiple alignment (MA) is a particularly important tool for studying viral genomes and determining the evolutionary process of a specific virus. Applying MA to the spread of the severe acute respiratory syndrome (SARS) epidemic is of particular interest because the epidemic, a few years ago, spread so quickly that it drew medical attention in many countries. Although much software exists for processing multiple sequences, the use of pairwise alignment to build an MA deserves careful consideration. Previous research built the MA with the Super Pairwise Alignment algorithm; this study instead uses the Needleman-Wunsch dynamic programming algorithm, simulated in MATLAB. The MA analysis yields stable and unstable regions, the latter indicating positions where mutations occur; a network topology in the form of a phylogenetic tree of the SARS epidemic built with the distance method; and a network of mutation regions.
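
    The study aligns sequences with the Needleman-Wunsch dynamic-programming algorithm. A minimal Python sketch of global alignment with a linear gap penalty follows; the scoring values (match +1, mismatch -1, gap -1) are assumptions, not the paper's parameters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

    Building a multiple alignment from such pairwise alignments (for example, progressively against a reference genome) is one option; the paper's exact merging procedure is not described in the abstract.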

  15. Applying a multiobjective metaheuristic inspired by honey bees to phylogenetic inference.

    Science.gov (United States)

    Santander-Jiménez, Sergio; Vega-Rodríguez, Miguel A

    2013-10-01

    The development of increasingly popular multiobjective metaheuristics has allowed bioinformaticians to deal with optimization problems in computational biology where multiple objective functions must be taken into account. One of the most relevant research topics that can benefit from these techniques is phylogenetic inference. Over the years, different researchers have proposed their own views on the reconstruction of ancestral evolutionary relationships among species. As a result, biologists often report different phylogenetic trees from the same dataset when considering distinct optimality principles. In this work, we detail a multiobjective swarm intelligence approach based on the novel Artificial Bee Colony algorithm for inferring phylogenies. The aim of this paper is to propose a complementary view of phylogenetics according to the maximum parsimony and maximum likelihood criteria, in order to generate a set of phylogenetic trees that represent a compromise between these principles. Experimental results on a variety of nucleotide data sets and statistical studies highlight the relevance of the proposal with regard to other multiobjective algorithms and state-of-the-art biological methods. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
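
    A multiobjective search returns a set of trees that trade parsimony against likelihood rather than a single best tree. The sketch below shows only the Pareto-dominance filter that such methods rely on, not the Artificial Bee Colony search itself; tree names and scores are made up, and each tree is summarised as (parsimony score, log-likelihood), where lower parsimony and higher log-likelihood are better.

```python
def dominates(a, b):
    """True if tree a is no worse than b on both criteria and strictly better on at least one.

    Each tree is (parsimony_score, log_likelihood): lower parsimony, higher likelihood is better.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep the candidate trees (name -> scores) that no other candidate dominates."""
    return {
        name: s
        for name, s in candidates.items()
        if not any(dominates(o, s) for other, o in candidates.items() if other != name)
    }
```

    The surviving trees form the compromise set the abstract describes: moving from one to another improves one criterion only at the cost of the other.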

  16. Do orthologous gene phylogenies really support tree-thinking?

    Directory of Open Access Journals (Sweden)

    Leigh J

    2005-05-01

    Background: Since Darwin's Origin of Species, reconstructing the Tree of Life has been a goal of evolutionists, and tree-thinking has become a major concept of evolutionary biology. Practically, building the Tree of Life has proven to be tedious. Too few morphological characters are useful for conducting conclusive phylogenetic analyses at the highest taxonomic level. Consequently, molecular sequences (genes, proteins, and genomes) likely constitute the only useful characters for constructing a phylogeny of all life. For this reason, tree-makers expect a lot from gene comparisons. The simultaneous study of the largest possible number of molecular markers is sometimes considered one of the best solutions for reconstructing the genealogy of organisms. This conclusion is a direct consequence of tree-thinking: if gene inheritance conforms to a tree-like model of evolution, sampling more of these molecules will provide enough phylogenetic signal to build the Tree of Life. The selection of congruent markers is thus a fundamental step in the simultaneous analysis of many genes. Results: Heat map analyses were used to investigate the congruence of orthologues in four datasets (archaeal, bacterial, eukaryotic and alpha-proteobacterial). We conclude that we simply cannot determine whether a large portion of the genes have a common history. In addition, none of these datasets can be considered free of lateral gene transfer. Conclusion: Our phylogenetic analyses do not support tree-thinking. These results have important conceptual and practical implications. We argue that representations other than a tree should be investigated in this case because a non-critical concatenation of markers could be highly misleading.

  17. A nuclear phylogenetic analysis: SNPs, indels and SSRs deliver new insights into the relationships in the 'true citrus fruit trees' group (Citrinae, Rutaceae) and the origin of cultivated species.

    Science.gov (United States)

    Garcia-Lor, Andres; Curk, Franck; Snoussi-Trifa, Hager; Morillon, Raphael; Ancillo, Gema; Luro, François; Navarro, Luis; Ollitrault, Patrick

    2013-01-01

    Despite differences in morphology, the genera representing 'true citrus fruit trees' are sexually compatible, and their phylogenetic relationships remain unclear. Most of the important commercial 'species' of Citrus are believed to be of interspecific origin. By studying polymorphisms of 27 nuclear genes, the average molecular differentiation between species was estimated and some phylogenetic relationships between 'true citrus fruit trees' were clarified. Sanger sequencing of PCR-amplified fragments from 18 genes involved in metabolite biosynthesis pathways and nine putative genes for salt tolerance was performed for 45 genotypes of Citrus and relatives of Citrus to mine single nucleotide polymorphisms (SNPs) and indel polymorphisms. Fifty nuclear simple sequence repeats (SSRs) were also analysed. A total of 16 238 kb of DNA was sequenced for each genotype, and 1097 single nucleotide polymorphisms (SNPs) and 50 indels were identified. These polymorphisms were more valuable than SSRs for inter-taxon differentiation. Nuclear phylogenetic analysis revealed that Citrus reticulata and Fortunella form a cluster that is differentiated from the clade that includes three other basic taxa of cultivated citrus (C. maxima, C. medica and C. micrantha). These results confirm the taxonomic subdivision between the subgenera Metacitrus and Archicitrus. A few genes displayed positive selection patterns within or between species, but most of them displayed neutral patterns. The phylogenetic inheritance patterns of the analysed genes were inferred for commercial Citrus spp. Numerous molecular polymorphisms (SNPs and indels), which are potentially useful for the analysis of interspecific genetic structures, have been identified. The nuclear phylogenetic network for Citrus and its sexually compatible relatives was consistent with the geographical origins of these genera. The positive selection observed for a few genes will help further works to analyse the molecular basis of the

  18. Improved phylogenetic analyses corroborate a plausible position of Martialis heureka in the ant tree of life.

    Directory of Open Access Journals (Sweden)

    Patrick Kück

    Martialinae are pale, eyeless and probably hypogaeic predatory ants. Morphological character sets suggest a close relationship to the ant subfamily Leptanillinae. Recent analyses based on molecular sequence data suggest that Martialinae are the sister group to all extant ants. However, when molecular studies and different reconstruction methods are compared, the position of Martialinae remains ambiguous. While this sister group relationship was well supported by Bayesian partitioned analyses, Maximum Likelihood approaches could not unequivocally resolve the position of Martialinae. By re-analysing a previously published molecular data set, we show that the Maximum Likelihood approach is highly appropriate for resolving deep ant relationships, especially between Leptanillinae, Martialinae and the remaining ant subfamilies. Based on improved alignments, alignment masking, and tree reconstructions with a sufficient number of bootstrap replicates, our results strongly reject a placement of Martialinae at the first split within the ant tree of life. Instead, we suggest that Leptanillinae are a sister group to all other extant ant subfamilies, whereas Martialinae branch off as a second lineage. This assumption is backed by approximately unbiased (AU) tests, additional Bayesian analyses and split networks. Our results demonstrate clear effects of improved alignment approaches, alignment masking and data partitioning. We hope that our study illustrates the importance of thorough, comprehensible phylogenetic analyses using the example of ant relationships.

  19. Incorporating phylogenetic information for the definition of floristic districts in hyperdiverse Amazon forests: Implications for conservation.

    Science.gov (United States)

    Guevara Andino, Juan Ernesto; Pitman, Nigel C A; Ter Steege, Hans; Mogollón, Hugo; Ceron, Carlos; Palacios, Walter; Oleas, Nora; Fine, Paul V A

    2017-11-01

    Using complementary metrics to evaluate phylogenetic diversity can facilitate the delimitation of floristic units and conservation priority areas. In this study, we describe the spatial patterns of phylogenetic alpha and beta diversity, phylogenetic endemism, and evolutionary distinctiveness of the hyperdiverse Ecuadorian Amazon forests and define priority areas for conservation. We established a network of 62 one-hectare plots in terra firme forests of the Ecuadorian Amazon. In these plots, we tagged, collected, and identified every adult tree with dbh ≥10 cm. These data were combined with a regional community phylogenetic tree to calculate different phylogenetic diversity (PD) metrics in order to create spatial models. We used Loess regression to estimate the spatial variation of taxonomic and phylogenetic beta diversity as well as phylogenetic endemism and evolutionary distinctiveness. We found evidence for the definition of three floristic districts in the Ecuadorian Amazon, supported by both taxonomic and phylogenetic diversity data. Areas with high levels of phylogenetic endemism and evolutionary distinctiveness in the Ecuadorian Amazon forests are unprotected. Furthermore, these areas are severely threatened by proposed plans for large-scale oil and mining extraction and should be prioritized in conservation planning for this region.

  20. Finiteness results for Abelian tree models

    NARCIS (Netherlands)

    Draisma, J.; Eggermont, R.H.

    2015-01-01

    Equivariant tree models are statistical models used in the reconstruction of phylogenetic trees from genetic data. Here "equivariant" refers to a symmetry group imposed on the root distribution and on the transition matrices in the model. We prove that if that symmetry group is Abelian, then the

  1. Evaluation of atpB nucleotide sequences for phylogenetic studies of ferns and other pteridophytes.

    Science.gov (United States)

    Wolf, P

    1997-10-01

    Inferring basal relationships among vascular plants poses a major challenge to plant systematists. The divergence events that describe these relationships occurred long ago and considerable homoplasy has since accrued for both molecular and morphological characters. A potential solution is to examine phylogenetic analyses from multiple data sets. Here I present a new source of phylogenetic data for ferns and other pteridophytes. I sequenced the chloroplast gene atpB from 23 pteridophyte taxa and used maximum parsimony to infer relationships. A 588-bp region of the gene appeared to contain a statistically significant amount of phylogenetic signal and the resulting trees were largely congruent with similar analyses of nucleotide sequences from rbcL. However, a combined analysis of atpB plus rbcL produced a better resolved tree than did either data set alone. In the shortest trees, leptosporangiate ferns formed a monophyletic group. Also, I detected a well-supported clade of Psilotaceae (Psilotum and Tmesipteris) plus Ophioglossaceae (Ophioglossum and Botrychium). The demonstrated utility of atpB suggests that sequences from this gene should play a role in phylogenetic analyses that incorporate data from chloroplast genes, nuclear genes, morphology, and fossil data.
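    The study scored trees with maximum parsimony. As a hedged illustration of how a parsimony score is computed for one fixed tree, the sketch below uses Fitch's small-parsimony algorithm for unordered, equally weighted characters. The toy alignment and taxon labels are hypothetical, and a real analysis must also search over tree topologies, which is not shown; under parsimony, the topology with the smaller total length is preferred.

```python
from typing import Dict, Set, Tuple, Union

# Rooted binary tree: a leaf is a taxon name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch pass for one alignment column: (candidate state set, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Disjoint child state sets require one extra substitution on a child branch.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony length of a fixed tree: Fitch costs summed over all columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical four-taxon alignment and two competing topologies.
alignment = {"A": "ACGT", "B": "ACGA", "C": "GCTA", "D": "GCTT"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
print(tree_length(tree_ab_cd, alignment), tree_length(tree_ac_bd, alignment))  # 4 6
```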

  2. The Role of the Phylogenetic Diversity Measure, PD, in Bio-informatics: Getting the Definition Right

    Directory of Open Access Journals (Sweden)

    Daniel P. Faith

    2006-01-01

    A recent paper in this journal (Faith and Baker, 2006) described bio-informatics challenges in the application of the PD (phylogenetic diversity) measure of Faith (1992a), and highlighted the use of the root of the phylogenetic tree, as implied by the original definition of PD. A response paper (Crozier et al. 2006) stated that (1) the Faith (1992a) PD definition did not include the use of the root of the tree, and (2) Moritz and Faith (1998) changed the PD definition to include the root. Both characterizations are refuted here. Examples from Faith (1992a,b) document the link from the definition to the use of the root of the overall tree, and a survey of papers over the past 15 years by Faith and colleagues demonstrates that the stated PD definition has remained the same as that in the original 1992 study. PD’s estimation of biodiversity at the level of “feature diversity” is seen to have provided the original rationale for the measure’s consideration of the root of the phylogenetic tree.
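    The dispute above turns on whether PD includes the root. Under the root-inclusive reading that the abstract attributes to Faith (1992a), the PD of a set of taxa is the total length of the branches connecting those taxa to the root of the tree, each branch counted once. A minimal sketch, assuming the tree is stored in a hypothetical child-to-(parent, branch length) map:

```python
from typing import Dict, Iterable, Tuple

# Hypothetical encoding: child -> (parent, length of the branch above the child).
# The root is the only node without an entry.
ParentMap = Dict[str, Tuple[str, float]]


def phylogenetic_diversity(parents: ParentMap, taxa: Iterable[str]) -> float:
    """Root-inclusive PD: sum of branch lengths on all taxon-to-root paths, each branch once."""
    counted = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # Walk toward the root; stop early at a branch already counted,
        # because every branch above it has been counted as well.
        while node in parents and node not in counted:
            counted.add(node)
            parent, length = parents[node]
            total += length
            node = parent
    return total


tree = {
    "X": ("root", 1.0), "D": ("root", 4.0),
    "A": ("X", 2.0), "Y": ("X", 0.5),
    "B": ("Y", 1.0), "C": ("Y", 1.5),
}
print(phylogenetic_diversity(tree, ["A", "B"]))  # 4.5
print(phylogenetic_diversity(tree, ["A", "D"]))  # 7.0
```

    For the pair {A, B}, the unrooted alternative (the minimum subtree spanning only A and B) would be 2.0 + 0.5 + 1.0 = 3.5, so the two readings differ by the length of the branch above X.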

  3. Tree mortality from drought, insects, and their interactions in a changing climate

    Science.gov (United States)

    Anderegg, William R.L.; Hicke, Jeffrey A.; Fisher, Rosie A.; Allen, Craig D.; Aukema, Juliann E.; Bentz, Barbara; Hood, Sharon; Lichstein, Jeremy W.; Macalady, Alison K.; McDowell, Nate G.; Pan, Yude; Raffa, Kenneth; Sala, Anna; Shaw, John D.; Stephenson, Nathan L.; Tague, Christina L.; Zeppel, Melanie

    2015-01-01

    Climate change is expected to drive increased tree mortality through drought, heat stress, and insect attacks, with manifold impacts on forest ecosystems. Yet, climate-induced tree mortality and biotic disturbance agents are largely absent from process-based ecosystem models. Using data sets from the western USA and associated studies, we present a framework for determining the relative contribution of drought stress, insect attack, and their interactions, which is critical for modeling mortality in future climates. We outline a simple approach that identifies the mechanisms associated with two guilds of insects – bark beetles and defoliators – which are responsible for substantial tree mortality. We then discuss cross-biome patterns of insect-driven tree mortality and draw upon available evidence contrasting the prevalence of insect outbreaks in temperate and tropical regions. We conclude with an overview of tools and promising avenues to address major challenges. Ultimately, a multitrophic approach that captures tree physiology, insect populations, and tree–insect interactions will better inform projections of forest ecosystem responses to climate change.

  4. Temperature, precipitation and biotic interactions as determinants of tree seedling recruitment across the tree line ecotone.

    Science.gov (United States)

    Tingstad, Lise; Olsen, Siri Lie; Klanderud, Kari; Vandvik, Vigdis; Ohlson, Mikael

    2015-10-01

    Seedling recruitment is a critical life history stage for trees, and successful recruitment is tightly linked to both abiotic factors and biotic interactions. In order to better understand how tree species' distributions may change in response to anticipated climate change, more knowledge of the effects of complex climate and biotic interactions is needed. We conducted a seed-sowing experiment to investigate how temperature, precipitation and biotic interactions impact recruitment of Scots pine (Pinus sylvestris) and Norway spruce (Picea abies) seedlings in southern Norway. Seeds were sown into intact vegetation and experimentally created gaps. To study the combined effects of temperature and precipitation, the experiment was replicated across 12 sites, spanning a natural climate gradient from boreal to alpine and from sub-continental to oceanic. Seedling emergence and survival were assessed 12 and 16 months after sowing, respectively, and above-ground biomass and height were determined at the end of the experiment. Interestingly, very few seedlings were detected in the boreal sites, and the highest number of seedlings emerged and established in the alpine sites, indicating that low temperature did not limit seedling recruitment. Site precipitation had an overall positive effect on seedling recruitment, especially at intermediate precipitation levels. Seedling emergence, establishment and biomass were higher in gap plots compared to intact vegetation at all temperature levels. These results suggest that biotic interactions in the form of competition may be more important than temperature as a limiting factor for tree seedling recruitment in the sub- and low-alpine zone of southern Norway.

  5. Detection of Horizontal Gene Transfers from Phylogenetic Comparisons

    Science.gov (United States)

    Pylro, Victor Satler; Vespoli, Luciano de Souza; Duarte, Gabriela Frois; Yotoko, Karla Suemy Clemente

    2012-01-01

    Reconstructing bacterial phylogenies has become one of the most important challenges for microbial ecology. This field started in the mid-1970s with the aim of using the small subunit ribosomal RNA (16S) sequence as a tool to infer bacterial phylogenies. Phylogenetic hypotheses based on other sequences usually give conflicting topologies that reveal different evolutionary histories, which in some cases may be the result of horizontal gene transfer events. Currently, one of the major goals of molecular biology is to understand the role that horizontal gene transfer plays in species adaptation and evolution. In this work, we compared the phylogenetic tree based on 16S with the tree based on dszC, a gene involved in the cleavage of carbon-sulfur bonds. Bacteria of several genera perform this survival task when living in environments lacking free mineral sulfur. The biochemical pathway of the desulphurization process has been extensively studied because of its economic importance, since this step is expensive and indispensable in fuel production. Our results clearly show that horizontal gene transfer events could be detected using common phylogenetic methods with gene sequences obtained from public sequence databases. PMID:22675653
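    The comparison above contrasts a 16S tree with a dszC tree. One common way to quantify how strongly two topologies conflict is the Robinson-Foulds distance; the abstract does not say which measure the authors used, so this is an assumption. The sketch counts clades present in only one of two rooted trees. Strain names and topologies are hypothetical, and a non-zero distance on its own does not distinguish horizontal transfer from other causes of incongruence.

```python
from typing import FrozenSet, Set, Tuple, Union

# Rooted tree as nested tuples; leaves are taxon names.
Tree = Union[str, Tuple["Tree", ...]]


def collect_clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Add the leaf set of every internal node to `out`; return this subtree's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def rooted_rf_distance(t1: Tree, t2: Tree) -> int:
    """Number of clades found in exactly one of the two rooted trees."""
    clades1: Set[FrozenSet[str]] = set()
    clades2: Set[FrozenSet[str]] = set()
    if collect_clades(t1, clades1) != collect_clades(t2, clades2):
        raise ValueError("trees must have the same taxa")
    return len(clades1 ^ clades2)


# Hypothetical topologies for the same five strains.
tree_16s = ((("s1", "s2"), "s3"), ("s4", "s5"))
tree_dszc = ((("s1", "s4"), "s3"), ("s2", "s5"))
print(rooted_rf_distance(tree_16s, tree_16s))   # 0
print(rooted_rf_distance(tree_16s, tree_dszc))  # 6
```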

  6. Path integral formulation and Feynman rules for phylogenetic branching models

    Energy Technology Data Exchange (ETDEWEB)

    Jarvis, P D; Bashford, J D; Sumner, J G [School of Mathematics and Physics, University of Tasmania, GPO Box 252C, 7001 Hobart, TAS (Australia)]

    2005-11-04

    A dynamical picture of phylogenetic evolution is given in terms of Markov models on a state space, comprising joint probability distributions for character types of taxonomic classes. Phylogenetic branching is a process which augments the number of taxa under consideration, and hence the rank of the underlying joint probability state tensor. We point out the combinatorial necessity for a second-quantized, or Fock space, setting, incorporating discrete counting labels for taxa and character types, to allow for a description in the number basis. Rate operators describing both time evolution without branching, and also phylogenetic branching events, are identified. A detailed development of these ideas is given, using standard transcriptions from the microscopic formulation of non-equilibrium reaction-diffusion or birth-death processes. These give the relations between stochastic rate matrices, the matrix elements of the corresponding evolution operators representing them, and the integral kernels needed to implement these as path integrals. The 'free' theory (without branching) is solved, and the correct trilinear 'interaction' terms (representing branching events) are presented. The full model is developed in perturbation theory via the derivation of explicit Feynman rules which establish that the probabilities (pattern frequencies of leaf colourations) arising as matrix elements of the time evolution operator are identical with those computed via the standard analysis. Simple examples (phylogenetic trees with two or three leaves) are discussed in detail. Further implications for the work are briefly considered, including the role of time reparametrization covariance.

  7. Path integral formulation and Feynman rules for phylogenetic branching models

    International Nuclear Information System (INIS)

    Jarvis, P D; Bashford, J D; Sumner, J G

    2005-01-01

    A dynamical picture of phylogenetic evolution is given in terms of Markov models on a state space, comprising joint probability distributions for character types of taxonomic classes. Phylogenetic branching is a process which augments the number of taxa under consideration, and hence the rank of the underlying joint probability state tensor. We point out the combinatorial necessity for a second-quantized, or Fock space, setting, incorporating discrete counting labels for taxa and character types, to allow for a description in the number basis. Rate operators describing both time evolution without branching, and also phylogenetic branching events, are identified. A detailed development of these ideas is given, using standard transcriptions from the microscopic formulation of non-equilibrium reaction-diffusion or birth-death processes. These give the relations between stochastic rate matrices, the matrix elements of the corresponding evolution operators representing them, and the integral kernels needed to implement these as path integrals. The 'free' theory (without branching) is solved, and the correct trilinear 'interaction' terms (representing branching events) are presented. The full model is developed in perturbation theory via the derivation of explicit Feynman rules which establish that the probabilities (pattern frequencies of leaf colourations) arising as matrix elements of the time evolution operator are identical with those computed via the standard analysis. Simple examples (phylogenetic trees with two or three leaves) are discussed in detail. Further implications for the work are briefly considered, including the role of time reparametrization covariance.

  8. Inferring epidemic contact structure from phylogenetic trees.

    Directory of Open Access Journals (Sweden)

    Gabriel E Leventhal

    Contact structure is believed to have a large impact on epidemic spreading, and consequently using networks to model such contact structure continues to gain interest in epidemiology. However, detailed knowledge of the exact contact structure underlying real epidemics is limited. Here we address the question of whether the structure of the contact network leaves a detectable genetic fingerprint in the pathogen population. To this end we compare phylogenies generated by disease outbreaks in simulated populations with different types of contact networks. We find that the shape of these phylogenies strongly depends on contact structure. In particular, measures of tree imbalance allow us to quantify to what extent the contact structure underlying an epidemic deviates from a null model contact network and illustrate this in the case of random mixing. Using a phylogeny from the Swiss HIV epidemic, we show that this epidemic has a significantly more unbalanced tree than would be expected from random mixing.
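    The abstract does not name the imbalance measure used. As one hedged example, the Colless index sums, over every internal node of a rooted binary tree, the absolute difference between the numbers of leaves in its two subtrees; larger values mean a more unbalanced tree. Testing against a random-mixing null model, as described above, would additionally require simulating trees under that model, which this sketch does not do. The example trees are hypothetical.

```python
from typing import Tuple, Union

# Rooted binary tree: a leaf is a label, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def colless(tree: Tree) -> Tuple[int, int]:
    """Return (leaf count, Colless index): sum over internal nodes of |left leaves - right leaves|."""
    if isinstance(tree, str):
        return 1, 0
    n_left, c_left = colless(tree[0])
    n_right, c_right = colless(tree[1])
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(colless(balanced)[1], colless(caterpillar)[1])  # 0 3
```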

  9. Multilocus inference of species trees and DNA barcoding.

    Science.gov (United States)

    Mallo, Diego; Posada, David

    2016-09-05

    The unprecedented amount of data resulting from next-generation sequencing has opened a new era in phylogenetic estimation. Although large datasets should, in theory, increase phylogenetic resolution, massive, multilocus datasets have uncovered a great deal of phylogenetic incongruence among different genomic regions, due both to stochastic error and to the action of different evolutionary processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer. This incongruence violates one of the fundamental assumptions of the DNA barcoding approach, which assumes that gene history and species history are identical. In this review, we explain some of the most important challenges we will have to face to reconstruct the history of species, and the advantages and disadvantages of different strategies for the phylogenetic analysis of multilocus data. In particular, we describe the evolutionary events that can generate species tree-gene tree discordance, compare the most popular methods for species tree reconstruction, highlight the challenges we need to face when using them and discuss their potential utility in barcoding. Current barcoding methods sacrifice a great amount of statistical power by only considering one locus, and a transition to multilocus barcodes would not only improve current barcoding methods, but also facilitate an eventual transition to species-tree-based barcoding strategies, which could better accommodate scenarios where the barcode gap is too small or nonexistent. This article is part of the themed issue 'From DNA barcodes to biomes'. © 2016 The Authors.

  10. A matter of phylogenetic scale: Distinguishing incomplete lineage sorting from lateral gene transfer as the cause of gene tree discord in recent versus deep diversification histories.

    Science.gov (United States)

    Knowles, L Lacey; Huang, Huateng; Sukumaran, Jeet; Smith, Stephen A

    2018-03-01

    Discordant gene trees are commonly encountered when sequences from thousands of loci are applied to estimate phylogenetic relationships. Several processes contribute to this discord. Yet, we have no methods that jointly model different sources of conflict when estimating phylogenies. An alternative to analyzing entire genomes or all the sequenced loci is to identify a subset of loci for phylogenetic analysis. If we can identify data partitions that are most likely to reflect descent from a common ancestor (i.e., discordant loci that indeed reflect incomplete lineage sorting [ILS], as opposed to some other process, such as lateral gene transfer [LGT]), we can analyze this subset using powerful coalescent-based species-tree approaches. Test data sets were simulated where discord among loci could arise from ILS and LGT. Data sets were analyzed using the newly developed program CLASSIPHY (Huang et al., ) to assess whether our ability to distinguish the cause of discord among loci varied when ILS and LGT occurred in the recent versus deep past and whether the accuracy of these inferences was affected by the mutational process. We show that the accuracy of probabilistic classification of individual loci by the cause of discord differed when ILS and LGT events occurred more recently compared with the distant past and that the signal-to-noise ratio arising from the mutational process contributes to difficulties in inferring LGT data partitions. We discuss our findings in terms of the promise and limitations of identifying subsets of loci for species-tree inference that will not violate the underlying coalescent model (i.e., data partitions in which ILS, and not LGT, contributes to discord). We also discuss the empirical implications of our work given the many recalcitrant nodes in the tree of life (e.g., origins of angiosperms, amniotes, or Neoaves), and recent arguments for concatenating loci. © 2018 Botanical Society of America.

  11. Molecular phylogenetics of finches and sparrows: consequences of character state removal in cytochrome b sequences.

    Science.gov (United States)

    Groth, J G

    1998-12-01

    The complete mitochondrial cytochrome b genes of 53 genera of oscine passerine birds representing the major groups of finches and some allies were compared. Phylogenetic trees resulting from three levels of character partition removal (no data removed, transitions at third positions of codons removed, and all transitions removed [transversion parsimony]) were generally concordant, and all supported several basic statements regarding relationships of finches and finch-like birds, including: (1) larks (Alaudidae) show no close relationship to any finch group; (2) Peucedramus (olive warbler) is phylogenetically far removed from true wood warblers; (3) a clade consisting of fringillids, passerids, motacillids, and emberizids is supported, and this clade is characterized by evolution of a vestigial 10th wing primary; and (4) Hawaiian honeycreepers are derived from within the cardueline finches. Excluding transition substitutions at third positions of codons resulted in phylogenetic trees similar to, but with greater bootstrap nodal support than, trees derived using either all data (equally weighted) or transversion parsimony. Relative to the shortest trees obtained using all data, the topologies obtained after elimination of third-position transitions showed only slight increases in realized treelength and homoplasy. These increases were negligible compared to increases in overall nodal support; therefore, this partition removal scheme may enhance recovery of deep phylogenetic signal in protein-coding DNA datasets. Copyright 1998 Academic Press.
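    The partition-removal schemes above can be approximated by recoding nucleotides as purines (R) or pyrimidines (Y), which discards transition information while keeping transversions. The abstract does not describe how the removal was implemented, so RY-coding here is an assumed stand-in: recoding only third codon positions mimics removing third-position transitions, and recoding every position is equivalent to transversion parsimony. The sequences are hypothetical.

```python
# Purines (A, G) map to R and pyrimidines (C, T) map to Y, so transitions
# (A<->G, C<->T) become invisible while transversions are still recorded.
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def ry_code(seq: str, codon_positions=(3,)) -> str:
    """RY-recode the listed 1-based codon positions; other positions are unchanged."""
    return "".join(
        RY.get(base, base) if (i % 3) + 1 in codon_positions else base
        for i, base in enumerate(seq.upper())
    )


def differences(a: str, b: str) -> int:
    """Count mismatched sites between two aligned sequences of equal length."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical aligned coding sequences (three codons each).
seq1, seq2 = "ATGGCCTTA", "ATAGCTTTC"
print(differences(seq1, seq2))                    # 3
print(differences(ry_code(seq1), ry_code(seq2)))  # 1: only the third-position transversion remains
print(differences(ry_code(seq1, (1, 2, 3)), ry_code(seq2, (1, 2, 3))))  # 1: transversions only
```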

  12. Phylogenetic diversity analysis of Trichoderma species based on ...

    African Journals Online (AJOL)

    vi-4177/CSAU be assigned as the type strains of a species of genus Trichoderma based on phylogenetic tree analysis together with the 18S rRNA gene sequence search in Ribosomal Database Project, small subunit rRNA and large subunit ...

  13. A Distance Measure for Genome Phylogenetic Analysis

    Science.gov (United States)

    Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

    Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often infeasible because of their heavy computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of distance between any two genomes. Some measures, such as the evolutionary edit distance of gene order and gene content, are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information-theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group, whose genomes are known to contain many horizontally transferred genes.
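    The paper's distance is built on the expert-model compressor, which is not reproduced here. As a swapped-in, general-purpose stand-in, the sketch below computes the normalized compression distance (NCD) with Python's zlib, illustrating the same underlying idea: genomes that share more information compress better together. The toy genomes are randomly generated and hypothetical, general-purpose compressors are much weaker on DNA than specialized ones, and a matrix of such distances would still have to be passed to a distance-based tree method such as neighbor joining (not shown).

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (highest compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx = compressed_size(x.encode())
    cy = compressed_size(y.encode())
    cxy = compressed_size((x + y).encode())
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy "genomes": b is a lightly mutated copy of a; c is unrelated.
rng = random.Random(0)
genome_a = "".join(rng.choice("ACGT") for _ in range(3000))
genome_b = "".join(
    rng.choice("ACGT") if i % 100 == 0 else base for i, base in enumerate(genome_a)
)
genome_c = "".join(rng.choice("ACGT") for _ in range(3000))
print(round(ncd(genome_a, genome_b), 3), round(ncd(genome_a, genome_c), 3))
```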

  14. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics

    Directory of Open Access Journals (Sweden)

    von Haeseler Arndt

    2004-06-01

    Background Most analysis programs for inferring molecular phylogenies are difficult to use, in particular for researchers with little programming experience. Results TREEFINDER is an easy-to-use, integrative, platform-independent analysis environment for molecular phylogenetics. In this paper the main features of TREEFINDER (version of April 2004) are described. TREEFINDER is written in ANSI C and Java and implements powerful statistical approaches for inferring gene trees and for related analyses. In addition, it provides a user-friendly graphical interface and a phylogenetic programming language. Conclusions TREEFINDER is a versatile framework for analyzing phylogenetic data across different platforms that is suited to both exploratory and advanced studies.

  15. Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets

    Science.gov (United States)

    Zhou, Xiaofan; Shen, Xing-Xing; Hittinger, Chris Todd

    2018-01-01

    Abstract The sizes of the data matrices assembled to resolve branches of the tree of life have increased dramatically, motivating the development of programs for fast, yet accurate, inference. For example, several different fast programs have been developed in the very popular maximum likelihood framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these programs are widely used, a systematic evaluation and comparison of their performance using empirical genome-scale data matrices has so far been lacking. To address this question, we evaluated these four programs on 19 empirical phylogenomic data sets with hundreds to thousands of genes and up to 200 taxa with respect to likelihood maximization, tree topology, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategies (ten searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimations. For concatenation-based species tree inference, IQ-TREE consistently achieved the best-observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete concatenation-based analyses, whereas FastTree was the fastest but generated lower likelihood values and more dissimilar tree topologies in both types of analyses. Finally, data matrix properties, such as the number of taxa and the strength of phylogenetic signal, sometimes substantially influenced the programs’ relative performance. Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses. PMID:29177474

  16. Resolving ambiguity in the phylogenetic relationship of genotypes A, B, and C of hepatitis B virus

    Science.gov (United States)

    2013-01-01

    Background Hepatitis B virus (HBV) is an important infectious agent that causes widespread concern because billions of people are infected by at least 8 different HBV genotypes worldwide. However, reconstructing the phylogenetic relationships between HBV genotypes is difficult. Specifically, the phylogenetic relationships among genotypes A, B, and C are not clear from previous studies because of the confounding effects of genotype recombination. To clarify these evolutionary relationships, a rigorous approach is required that can effectively explore genetic sequences with recombination. Result In the present study, the phylogenetic relationships of the HBV genotypes were reconstructed using a consensus of the phylogenetic trees of HBV genome segments. The reliability of the reconstructed phylogeny was extensively evaluated by its agreement with the local phylogenies of the genome segments. The reconstructed phylogenetic tree revealed that HBV genotypes B and C are more closely related to each other than genotype A is to either B or C. Evaluations showed that the consensus method was capable of reconstructing reliable phylogenetic relationships in the presence of recombinants. Conclusion The consensus method implemented in this study provides an alternative approach for reconstructing reliable phylogenetic relationships for viruses with possible genetic recombination. Our approach revealed the phylogenetic relationships of genotypes A, B, and C of HBV. PMID:23758960
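    The consensus step can be illustrated with a majority-rule consensus, in which a clade is kept if it appears in more than half of the segment trees. The abstract does not specify the consensus rule the authors used, so majority rule is an assumption here, and the isolate labels and trees are hypothetical. Because clades occurring in more than half of the trees are always mutually compatible, the retained clades can be assembled into a single consensus tree.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Rooted tree as nested tuples; leaves are isolate labels.
Tree = Union[str, Tuple["Tree", ...]]


def collect_clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Add the leaf set of every internal node to `out`; return this subtree's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def majority_rule_clades(trees: List[Tree]) -> Dict[FrozenSet[str], float]:
    """Non-trivial clades found in more than half of the trees, with their support."""
    counts: Counter = Counter()
    all_taxa: FrozenSet[str] = frozenset()
    for tree in trees:
        found: Set[FrozenSet[str]] = set()
        all_taxa = collect_clades(tree, found)
        counts.update(found)
    n = len(trees)
    # The root clade (all taxa) is present in every tree and carries no information.
    return {clade: k / n for clade, k in counts.items()
            if k > n / 2 and clade != all_taxa}


# Hypothetical trees inferred from three genome segments of four isolates.
segment_trees = [
    ((("P", "Q"), "R"), "S"),
    ((("P", "Q"), "S"), "R"),
    ((("P", "R"), "Q"), "S"),
]
for clade, support in majority_rule_clades(segment_trees).items():
    print(sorted(clade), round(support, 2))
```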

  17. Analysis of Acorus calamus chloroplast genome and its phylogenetic implications.

    Science.gov (United States)

    Goremykin, Vadim V; Holland, Barbara; Hirsch-Ernst, Karen I; Hellwig, Frank H

    2005-09-01

    Determining the phylogenetic relationships among the major lines of angiosperms is a long-standing problem, yet the uncertainty as to the phylogenetic affinity of these lines persists. While a number of studies have suggested that the ANITA (Amborella-Nymphaeales-Illiciales-Trimeniales-Aristolochiales) grade is basal within angiosperms, studies of complete chloroplast genome sequences also suggested an alternative tree, wherein the line leading to the grasses branches first among the angiosperms. To improve taxon sampling in the existing chloroplast genome data, we sequenced the chloroplast genome of the monocot Acorus calamus. We generated a concatenated alignment (89,436 positions for 15 taxa), encompassing almost all sequences usable for phylogeny reconstruction within spermatophytes. The data still contain support for both the ANITA-basal and grasses-basal hypotheses. Using simulations we can show that were the ANITA-basal hypothesis true, parsimony (and distance-based methods with many models) would be expected to fail to recover it. The self-evident explanation for this failure appears to be a long-branch attraction (LBA) between the clade of grasses and the out-group. However, this LBA cannot explain the discrepancies observed between tree topology recovered using the maximum likelihood (ML) method and the topologies recovered using the parsimony and distance-based methods when grasses are deleted. Furthermore, the fact that neither maximum parsimony nor distance methods consistently recover the ML tree, when according to the simulations they would be expected to, when the out-group (Pinus) is deleted, suggests that either the generating tree is not correct or the best symmetric model is misspecified (or both). We demonstrate that the tree recovered under ML is extremely sensitive to model specification and that the best symmetric model is misspecified. Hence, we remain agnostic regarding phylogenetic relationships among basal angiosperm lineages.

  18. Application of agglomerative clustering for analyzing phylogenetically on bacterium of saliva

    Science.gov (United States)

    Bustamam, A.; Fitria, I.; Umam, K.

    2017-07-01

    Analyzing populations of Streptococcus bacteria is important because these species can cause dental caries, periodontal disease, halitosis (bad breath) and other problems. This paper discusses the phylogenetic relationships among Streptococcus bacteria in saliva using a phylogenetic tree built by agglomerative clustering. Starting from Streptococcus DNA sequences obtained from GenBank, features were extracted from the sequences. The extracted features form a matrix, which was normalized using min-max normalization; genetic distances were then calculated using the Manhattan distance. The agglomerative clustering techniques used were single linkage, complete linkage and average linkage. The agglomerative algorithm starts with as many groups as there are individual species; the most similar groups are merged repeatedly, with similarity decreasing at each step, until a single group remains. The result of the grouping is a phylogenetic tree whose branches join at given distance levels: the smaller the distance, the greater the similarity between species. The implementation uses R, an open-source program.
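    A minimal Python sketch of the clustering steps described above (min-max normalization, Manhattan distance, then single, complete or average linkage). The study itself used R, and its feature-extraction step is not specified, so the sketch assumes an already extracted numeric feature matrix; all function names are illustrative only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Rescale each feature column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def manhattan(u: Vector, v: Vector) -> float:
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


LINKAGES: Dict[str, Callable[[List[float]], float]] = {
    "single": min,                            # closest pair of members
    "complete": max,                          # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),  # mean over all member pairs
}


def agglomerative(points: List[Vector], linkage: str = "average") -> List[Tuple[List[int], float]]:
    """Merge the two closest clusters until one remains; return the merge history."""
    link = LINKAGES[linkage]
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ds = [manhattan(points[i], points[j]) for i in clusters[a] for j in clusters[b]]
                d = link(ds)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

    The merge history records which groups join and at what distance; read from first to last, it gives the tree from the leaves up to the root.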

  19. Applying phylogenetic analysis to viral livestock diseases: moving beyond molecular typing.

    Science.gov (United States)

    Olvera, Alex; Busquets, Núria; Cortey, Marti; de Deus, Nilsa; Ganges, Llilianne; Núñez, José Ignacio; Peralta, Bibiana; Toskano, Jennifer; Dolz, Roser

    2010-05-01

    Changes in livestock production systems in recent years have altered the presentation of many diseases resulting in the need for more sophisticated control measures. At the same time, new molecular assays have been developed to support the diagnosis of animal viral disease. Nucleotide sequences generated by these diagnostic techniques can be used in phylogenetic analysis to infer phenotypes by sequence homology and to perform molecular epidemiology studies. In this review, some key elements of phylogenetic analysis are highlighted, such as the selection of the appropriate neutral phylogenetic marker, the proper phylogenetic method and different techniques to test the reliability of the resulting tree. Examples are given of current and future applications of phylogenetic reconstructions in viral livestock diseases. Copyright 2009 Elsevier Ltd. All rights reserved.

  20. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    2010-07-19

    ... and antisense primers, a single band of 573 base pairs ... Amino acid sequence alignment of Cluster I and Cluster II of the phylogenetic tree. First ten sequences ... sequence weighting, position-specific gap penalties and weight.

  1. Forensic application of phylogenetic analyses - Exploration of suspected HIV-1 transmission case.

    Science.gov (United States)

    Siljic, Marina; Salemovic, Dubravka; Cirkovic, Valentina; Pesic-Pavlovic, Ivana; Ranin, Jovan; Todorovic, Marija; Nikolic, Slobodan; Jevtovic, Djordje; Stanojevic, Maja

    2017-03-01

    Transmission of human immunodeficiency virus (HIV) between individuals may have important legal implications and may therefore require forensic investigation based on phylogenetic analysis. In criminal trials, results of phylogenetic analyses have been used as evidence of responsibility for HIV transmission. In Serbia, as in many countries worldwide, exposure to and deliberate transmission of HIV are criminalized. We present the results of applying state-of-the-art phylogenetic analyses, based on pol and env genetic sequences, to explore suspected HIV transmission among three subjects, a man and two women, with the a priori assumption that transmission occurred from one woman to the man. Phylogenetic methods included neighbor-joining (NJ), maximum likelihood (ML) and Bayesian methods of phylogenetic tree reconstruction and hypothesis testing, which have been shown to be the most sensitive for reconstructing epidemiological links, mostly among sexually infected individuals. An end-point limiting-dilution PCR (EPLD-PCR) assay, generating a minimum of 10 sequences per genetic region per subject, was performed to assess HIV quasispecies distribution and to explore the direction of HIV transmission among the three subjects. Phylogenetic analysis revealed that the viral sequences from the three subjects were more genetically related to each other than to other strains circulating in the same area with a similar epidemiological profile, forming a strongly supported transmission chain, which could favour the a priori hypothesis of one of the women infecting the man. However, in the EPLD-based phylogenetic trees for both the pol and env genetic regions, viral sequences of one subject (the man) were paraphyletic to those of the two other subjects (the women), implying a direction of transmission opposite to the a priori assumption. The dated tree in our analysis confirmed the clustering pattern of the query sequences. Still, in the context of unsampled sequences and

  2. Comparison of Boolean analysis and standard phylogenetic methods using artificially evolved and natural mt-tRNA sequences from great apes.

    Science.gov (United States)

    Ari, Eszter; Ittzés, Péter; Podani, János; Thi, Quynh Chi Le; Jakó, Eena

    2012-04-01

    Boolean analysis (or BOOL-AN; Jakó et al., 2009. BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction. Mol. Phylogenet. Evol. 52, 887-97.), a recently developed method for sequence comparison uses the Iterative Canonical Form of Boolean functions. It considers sequence information in a way entirely different from standard phylogenetic methods (i.e. Maximum Parsimony, Maximum-Likelihood, Neighbor-Joining, and Bayesian analysis). The performance and reliability of Boolean analysis were tested and compared with the standard phylogenetic methods, using artificially evolved - simulated - nucleotide sequences and the 22 mitochondrial tRNA genes of the great apes. At the outset, we assumed that the phylogeny of Hominidae is generally well established, and the guide tree of artificial sequence evolution can also be used as a benchmark. These offer a possibility to compare and test the performance of different phylogenetic methods. Trees were reconstructed by each method from 2500 simulated sequences and 22 mitochondrial tRNA sequences. We also introduced a special re-sampling method for Boolean analysis on permuted sequence sites, the P-BOOL-AN procedure. Considering the reliability values (branch support values of consensus trees and Robinson-Foulds distances) we used for simulated sequence trees produced by different phylogenetic methods, BOOL-AN appeared as the most reliable method. Although the mitochondrial tRNA sequences of great apes are relatively short (59-75 bases long) and the ratio of their constant characters is about 75%, BOOL-AN, P-BOOL-AN and the Bayesian approach produced the same tree-topology as the established phylogeny, while the outcomes of Maximum Parsimony, Maximum-Likelihood and Neighbor-Joining methods were equivocal. We conclude that Boolean analysis is a promising alternative to existing methods of sequence comparison for phylogenetic reconstruction and congruence analysis. Copyright © 2012 Elsevier Inc. All
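    Robinson-Foulds distances were one of the reliability measures used. A minimal sketch, assuming rooted trees written as nested Python tuples of leaf names, counts the clusters (clades) present in exactly one of two trees. Many studies instead use the unrooted, split-based variant, which this sketch does not implement.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below each internal node, excluding the root (the whole leaf set)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Number of clusters found in exactly one of the two rooted trees."""
    return len(clusters(t1) ^ clusters(t2))
```

    For example, (((A,B),C),(D,E)) and (((A,C),B),(D,E)) differ only in the clusters {A,B} and {A,C}, so their distance is 2.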

  3. Phylogenetic stratigraphy in the Guerrero Negro hypersaline microbial mat.

    Science.gov (United States)

    Harris, J Kirk; Caporaso, J Gregory; Walker, Jeffrey J; Spear, John R; Gold, Nicholas J; Robertson, Charles E; Hugenholtz, Philip; Goodrich, Julia; McDonald, Daniel; Knights, Dan; Marshall, Paul; Tufo, Henry; Knight, Rob; Pace, Norman R

    2013-01-01

    The microbial mats of Guerrero Negro (GN), Baja California Sur, Mexico, were historically considered a simple environment, dominated by cyanobacteria and sulfate-reducing bacteria. Culture-independent rRNA community profiling instead revealed these microbial mats as among the most phylogenetically diverse environments known. A preliminary molecular survey of the GN mat based on only ∼1500 small subunit rRNA gene sequences discovered several new phylum-level groups in the bacterial phylogenetic domain and many previously undetected lower-level taxa. We determined an additional ∼119,000 nearly full-length sequences and 28,000 454 reads longer than 200 nucleotides from a 10-layer depth profile of the GN mat. With this unprecedented coverage of long sequences from one environment, we confirm the mat is phylogenetically stratified, presumably corresponding to light and geochemical gradients throughout the depth of the mat. Previous shotgun metagenomic data from the same depth profile show the same stratified pattern and suggest that metagenome properties may be predictable from rRNA gene sequences. We verify previously identified novel lineages and identify new phylogenetic diversity at lower taxonomic levels; for example, thousands of operational taxonomic units at the family-genus levels differ considerably from known sequences. The new sequences populate parts of the bacterial phylogenetic tree that previously were poorly described, but indicate that any comprehensive survey of GN diversity has only begun. Finally, we show that taxonomic conclusions are generally congruent between Sanger and 454 sequencing technologies, with the taxonomic resolution achieved dependent on the abundance of reference sequences in the relevant region of the rRNA tree of life.

  4. Analysis of HIV subtypes and the phylogenetic tree in HIV-positive samples from Saudi Arabia

    International Nuclear Information System (INIS)

    Al-Zahrani, Alhusain J.

    2008-01-01

    The objective was to assess the prevalence of HIV-1 genetic subtypes in Saudi Arabia in samples that are serologically positive for HIV-1 and to compare the HIV-1 genetic subtypes prevalent in Saudi Arabia with those prevalent in other countries. Thirty-nine HIV-1-positive samples were analyzed for HIV-1 subtypes using molecular techniques. This retrospective study was conducted in Dammam, Kingdom of Saudi Arabia, and at Abbott Laboratories (United States of America) from 2004 to 2007. All samples were seropositive for HIV-1 group M. Of the 39 seropositive samples, only 12 were polymerase chain reaction positive. Subtype C was the most common virus strain, occurring in 58% of these samples; subtype B occurred in 17%; subtypes A, D and G were found in 8% each. The phylogenetic tree was also constructed for the isolates. Detection of HIV subtypes is important for epidemiological purposes and may help in tracing the source of HIV infections in the Kingdom of Saudi Arabia. (author)

  5. Mitochondrial DNA sequence-based phylogenetic relationship ...

    Indian Academy of Sciences (India)

    cophaga ranges from 0.037–0.106 and 0.049–0.207 for COI and ND5 genes, respectively (tables 2 and 3). Analysis of genetic distance on the basis of sequence difference for both the mitochondrial genes shows very little genetic difference. The discrepancy in the phylogenetic trees based on individual genes may be due ...

  6. Phylogenetic relationships within and among Brassica species from ...

    African Journals Online (AJOL)

    2008-05-02

    Inappropriate tree reconstruction methods would pose a problem only in the basal relationships rather than in terminal taxa; the paraphyly observed in this study applied mostly to terminal taxa. This study recovered sufficient phylogenetic characters to separate accessions of the same species, making.

  7. ETE: a python Environment for Tree Exploration.

    Science.gov (United States)

    Huerta-Cepas, Jaime; Dopazo, Joaquín; Gabaldón, Toni

    2010-01-13

    Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits linking trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.

  8. ETE: a python Environment for Tree Exploration

    Directory of Open Access Journals (Sweden)

    Gabaldón Toni

    2010-01-01

    Background: Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Results: Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits linking trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. Conclusions: ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.

  9. Phylogenetic and chemical diversity of MAR4 streptomycete lineage

    Directory of Open Access Journals (Sweden)

    Marisa Paulino

    2014-06-01

    To date, phylogenetic characterization of 6 representative isolates, based on partial sequences of the gene encoding 16S rRNA, confirms that these strains belong to the species Streptomyces aculeolatus. Figure 2. Neighbour-joining phylogenetic tree created from 6 partial 16S rRNA gene sequences from Streptomyces aculeolatus strains cultured from the Madeira Archipelago, based on 1000 bootstrap replicates. BLAST matches (deposited in GenBank) are included with species and strain name followed by accession number. Verrucosispora maris and Micromonospora aurantiaca were used as outgroups.
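    The neighbour-joining method named in the figure caption can be sketched in a few lines. This is a minimal, unoptimized version of the standard Q-criterion formulation, assuming a symmetric distance matrix with a zero diagonal; it is not the software the authors used, and it omits the bootstrap resampling step. New internal nodes are labelled by the tuple of the two nodes they join, which is a choice made only for this sketch.

```python
from typing import Dict, Hashable, List, Tuple


def neighbor_joining(names: List[str], dist: List[List[float]]):
    """Neighbour-joining on a symmetric distance matrix.

    Returns (tree, joins): tree is a nested tuple; joins lists each merged
    pair with the branch lengths from the pair to their new parent node
    (the final entry holds the length of the last remaining edge).
    """
    nodes: List[Hashable] = list(names)
    d: Dict[Tuple[Hashable, Hashable], float] = {
        (a, b): float(dist[i][j]) for i, a in enumerate(names) for j, b in enumerate(names)
    }
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(d[x, y] for y in nodes) for x in nodes}  # row sums
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                x, y = nodes[i], nodes[j]
                q = (n - 2) * d[x, y] - r[x] - r[y]  # Q-criterion
                if best is None or q < best[0]:
                    best = (q, x, y)
        _, x, y = best
        lx = d[x, y] / 2 + (r[x] - r[y]) / (2 * (n - 2))  # branch x -> new node
        ly = d[x, y] - lx                                  # branch y -> new node
        u = (x, y)  # new internal node, labelled by its two children
        joins.append(((x, y), (lx, ly)))
        rest = [z for z in nodes if z != x and z != y]
        for z in rest:
            d[u, z] = d[z, u] = (d[x, z] + d[y, z] - d[x, y]) / 2
        d[u, u] = 0.0
        nodes = rest + [u]
    x, y = nodes
    joins.append(((x, y), (d[x, y],)))
    return (x, y), joins
```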

  10. The copy-number tree mixture deconvolution problem and applications to multi-sample bulk sequencing tumor data

    NARCIS (Netherlands)

    S. Zaccaria (Simone); M. El-Kebir (Mohammed); G.W. Klau (Gunnar); B.J. Raphael (Benjamin)

    2017-01-01

    Cancer is an evolutionary process driven by somatic mutation. This process can be represented as a phylogenetic tree. Constructing such a phylogenetic tree from genome sequencing data is a challenging task due to the mutational complexity of cancer and the fact that nearly all cancer

  11. Molecular identification and phylogenetic study of Demodex caprae.

    Science.gov (United States)

    Zhao, Ya-E; Cheng, Juan; Hu, Li; Ma, Jun-Xian

    2014-10-01

    The DNA barcode has been widely used in species identification and phylogenetic analysis since 2003, but it had not been reported for Demodex. In this study, to obtain an appropriate DNA barcode for Demodex, molecular identification of Demodex caprae based on mitochondrial cox1 was conducted. First, individual adults and eggs of D. caprae were obtained for genomic DNA (gDNA) extraction; second, the mitochondrial cox1 fragment was amplified, cloned, and sequenced; third, cox1 fragments of D. caprae were aligned with those of other Demodex retrieved from GenBank; finally, the intra- and interspecific divergences were computed and phylogenetic trees were reconstructed to analyze phylogenetic relationships in Demodex. Results obtained from seven 429-bp fragments of D. caprae showed that sequence identities were above 99.1% among three adults and four eggs. The intraspecific divergences in D. caprae, Demodex folliculorum, Demodex brevis, and Demodex canis were 0.0-0.9, 0.5-0.9, 0.0-0.2, and 0.0-0.5%, respectively, while the interspecific divergences between D. caprae and D. folliculorum, D. canis, and D. brevis were 20.3-20.9, 21.8-23.0, and 25.0-25.3%, respectively. The interspecific divergences were 10 times higher than the intraspecific ones, indicating a considerable barcoding gap. Furthermore, the phylogenetic trees showed that the four Demodex species clustered separately, representing independent species, and that Demodex folliculorum clustered successively with canine Demodex, D. caprae, and D. brevis. In conclusion, the selected 429-bp mitochondrial cox1 gene is an appropriate DNA barcode for molecular classification, identification, and phylogenetic analysis of Demodex. D. caprae is an independent species and D. folliculorum is closer to D. canis than to D. caprae or D. brevis.
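    A minimal sketch of the divergence comparison, assuming aligned sequences, uncorrected p-distances expressed in percent, and pairwise deletion of gap sites (the abstract does not state which distance model was used). A barcoding gap exists when the smallest interspecific divergence exceeds the largest intraspecific one. The species labels passed in are hypothetical, and each species needs at least two sequences for an intraspecific value.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple


def p_distance(s1: str, s2: str) -> float:
    """Uncorrected pairwise divergence in percent, skipping sites with a gap in either sequence."""
    sites = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    diffs = sum(a != b for a, b in sites)
    return 100.0 * diffs / len(sites)


def barcoding_gap(groups: Dict[str, List[str]]) -> Tuple[float, float]:
    """Return (largest intraspecific, smallest interspecific) divergence."""
    intra = [p_distance(a, b) for seqs in groups.values() for a, b in combinations(seqs, 2)]
    inter = [
        p_distance(a, b)
        for g1, g2 in combinations(groups, 2)
        for a, b in product(groups[g1], groups[g2])
    ]
    return max(intra), min(inter)
```

    A gap is present when the second value returned is larger than the first; the ratio between them is one simple way to express how wide it is.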

  12. On the information content of discrete phylogenetic characters.

    Science.gov (United States)

    Bordewich, Magnus; Deutschmann, Ina Maria; Fischer, Mareike; Kasbohm, Elisa; Semple, Charles; Steel, Mike

    2017-12-16

    Phylogenetic inference aims to reconstruct the evolutionary relationships of different species based on genetic (or other) data. Discrete characters are a particular type of data, which contain information on how the species should be grouped together. However, it has long been known that some characters contain more information than others. For instance, a character that assigns the same state to each species groups all of them together and so provides no insight into the relationships of the species considered. At the other extreme, a character that assigns a different state to each species also conveys no phylogenetic signal. In this manuscript, we study a natural combinatorial measure of the information content of an individual character and analyse properties of characters that provide the maximum phylogenetic information, particularly, the number of states such a character uses and how the different states have to be distributed among the species or taxa of the phylogenetic tree.
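    One way to make the idea concrete, assuming the measure is based on the proportion of binary trees on which a character is homoplasy-free (convex), is the brute-force sketch below. It enumerates every unrooted binary tree on the taxa by stepwise addition, counts the trees on which the leaves of each state span vertex-disjoint subtrees, and reports log2(all trees / convex trees) in bits. It is practical only for a handful of taxa and is not claimed to be the exact measure defined in the paper.

```python
import math
from itertools import combinations
from typing import Dict, Iterator, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def unrooted_binary_trees(n: int) -> Iterator[List[Edge]]:
    """Yield every unrooted binary tree on leaves 0..n-1 (n >= 3) as an edge list.

    Leaves are added one at a time by subdividing each existing edge;
    internal nodes are numbered from n upwards.
    """
    if n < 3:
        raise ValueError("need at least three taxa")

    def grow(edges: List[Edge], leaf: int, internal: int) -> Iterator[List[Edge]]:
        if leaf == n:
            yield edges
            return
        for i, (u, v) in enumerate(edges):
            rest = edges[:i] + edges[i + 1:]
            new = rest + [(u, internal), (internal, v), (leaf, internal)]
            yield from grow(new, leaf + 1, internal + 1)

    yield from grow([(0, n), (1, n), (2, n)], 3, n + 1)


def path(adj: Dict[int, List[int]], a: int, b: int) -> List[int]:
    """Vertices on the unique path from a to b (breadth-first search)."""
    parent = {a: None}
    queue = [a]
    for node in queue:  # the list grows while it is being iterated
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    out, node = [], b
    while node is not None:
        out.append(node)
        node = parent[node]
    return out


def is_convex(edges: List[Edge], character: Sequence[str]) -> bool:
    """True if the leaves of each state span pairwise vertex-disjoint subtrees."""
    adj: Dict[int, List[int]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    blocks: Dict[str, List[int]] = {}
    for leaf, state in enumerate(character):
        blocks.setdefault(state, []).append(leaf)
    spans: List[Set[int]] = []
    for leaves in blocks.values():
        span = set(leaves)
        for a, b in combinations(leaves, 2):
            span.update(path(adj, a, b))
        spans.append(span)
    return all(s.isdisjoint(t) for s, t in combinations(spans, 2))


def information_content(character: Sequence[str]) -> float:
    """log2(number of binary trees / number on which the character is homoplasy-free)."""
    trees = list(unrooted_binary_trees(len(character)))
    convex = sum(is_convex(t, character) for t in trees)
    return math.log2(len(trees) / convex)
```

    Both extremes from the abstract score 0 bits: a constant character and an all-distinct character are homoplasy-free on every tree. By contrast, the four-taxon character AABB is homoplasy-free on only one of the three unrooted trees, giving log2(3), about 1.58 bits.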

  13. Integration of vessel traits, wood density, and height in angiosperm shrubs and trees.

    Science.gov (United States)

    Martínez-Cabrera, Hugo I; Schenk, H Jochen; Cevallos-Ferriz, Sergio R S; Jones, Cynthia S

    2011-05-01

    Trees and shrubs tend to occupy different niches within and across ecosystems; therefore, traits related to their resource use and life history are expected to differ. Here we analyzed how growth form is related to variation in integration among vessel traits, wood density, and height. We also considered the ecological and evolutionary consequences of such differences. In a sample of 200 woody plant species (65 shrubs and 135 trees) from Argentina, Mexico, and the United States, standardized major axis (SMA) regression, correlation analyses, and ANOVA were used to determine whether relationships among traits differed between growth forms. The influence of phylogenetic relationships was examined with a phylogenetic ANOVA and phylogenetically independent contrasts (PICs). A principal component analysis was conducted to determine whether trees and shrubs occupy different portions of multivariate trait space. Wood density did not differ between shrubs and trees, but there were significant differences in vessel diameter, vessel density, theoretical conductivity, and, as expected, height. In addition, relationships between vessel traits and wood density differed between growth forms. Trees showed coordination among vessel traits, wood density, and height, but in shrubs, wood density and vessel traits were independent. These results held when phylogenetic relationships were considered. In the multivariate analyses, these differences translated into significantly different positions in multivariate trait space occupied by shrubs and trees. Differences in trait integration between growth forms suggest that evolution of growth form in some lineages might be associated with the degree of trait interrelation.

  14. Random forests of interaction trees for estimating individualized treatment effects in randomized trials.

    Science.gov (United States)

    Su, Xiaogang; Peña, Annette T; Liu, Lei; Levine, Richard A

    2018-04-29

    Assessing heterogeneous treatment effects is of growing interest in advancing precision medicine. Individualized treatment effects (ITEs) play a critical role in such an endeavor. For experimental data collected from randomized trials, we put forward a method, termed random forests of interaction trees (RFIT), for estimating ITEs on the basis of interaction trees. To this end, we propose a smooth sigmoid surrogate method, as an alternative to greedy search, to speed up tree construction. RFIT outperforms the "separate regression" approach in estimating ITEs. Furthermore, standard errors for the ITEs estimated via RFIT are obtained with the infinitesimal jackknife method. We assess and illustrate the use of RFIT via both simulation and the analysis of data from an acupuncture headache trial. Copyright © 2018 John Wiley & Sons, Ltd.

  15. A new support measure to quantify the impact of local optima in phylogenetic analyses.

    KAUST Repository

    Brammer, Grant

    2011-09-29

    Phylogenetic analyses are often incorrectly assumed to have stabilized to a single optimum. However, a set of trees from a phylogenetic analysis may contain multiple distinct local optima, with each optimum providing different levels of support for each clade. For situations with multiple local optima, we propose p-support, a clade support measure that shows the impact the optima have on a final consensus tree. Our p-support measure is implemented in our PeakMapper software package. We study our approach on two published, large-scale biological tree collections. PeakMapper shows that each data set contains multiple local optima. p-support shows that both data sets contain clades in the majority consensus tree that are supported by only a subset of the local optima. Clades with low p-support are the most likely to benefit from further investigation. These tools provide researchers with new information regarding phylogenetic analyses beyond what is provided by other support measures alone.
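    The p-support formula itself is not given in the abstract, so the sketch below only illustrates the ingredients it builds on: majority-rule clade frequencies across a pooled tree collection, and which local optima (groups of trees with hypothetical labels) contain each consensus clade in a majority of their own trees. Trees are assumed rooted and written as nested Python tuples; the function names are illustrative.

```python
from typing import Dict, FrozenSet, List, Set, Union

Tree = Union[str, tuple]
Clade = FrozenSet[str]


def clusters(tree: Tree) -> Set[Clade]:
    """Leaf sets below each internal node of a rooted tree, excluding the root."""
    found: Set[Clade] = set()

    def walk(node: Tree) -> Clade:
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))
    return found


def clade_frequencies(trees: List[Tree]) -> Dict[Clade, float]:
    """Fraction of trees in which each clade occurs."""
    counts: Dict[Clade, int] = {}
    for tree in trees:
        for clade in clusters(tree):
            counts[clade] = counts.get(clade, 0) + 1
    return {clade: k / len(trees) for clade, k in counts.items()}


def majority_rule_clades(trees: List[Tree], threshold: float = 0.5) -> Set[Clade]:
    """Clades present in more than `threshold` of the trees."""
    return {c for c, f in clade_frequencies(trees).items() if f > threshold}


def supporting_optima(optima: Dict[str, List[Tree]], threshold: float = 0.5) -> Dict[Clade, List[str]]:
    """For each pooled majority-rule clade, the optima holding it in a majority of their trees."""
    pooled = [t for trees in optima.values() for t in trees]
    per_optimum = {name: clade_frequencies(trees) for name, trees in optima.items()}
    return {
        clade: sorted(name for name, freqs in per_optimum.items() if freqs.get(clade, 0.0) > threshold)
        for clade in majority_rule_clades(pooled, threshold)
    }
```

    A consensus clade backed by only some of the optima is the kind of clade the abstract flags as worth further investigation.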

  16. Interacting walkers on the Cayley tree, and polymer statistics

    International Nuclear Information System (INIS)

    Priezzhev, V.B.

    1986-01-01

    We obtain the generating function for an ensemble of random walkers on the Cayley tree of coordination number z. The pair interaction between walkers is taken into account; it forbids two walkers from occupying the same lattice point after an equal number of steps. Interacting polymer statistics result from this model if one associates time (or the number of steps) with an additional space coordinate. The limiting free energy appears in a form that corresponds to the phase transition of "3/2 order."

  17. A new fast method for inferring multiple consensus trees using k-medoids.

    Science.gov (United States)

    Tahiri, Nadia; Willems, Matthieu; Makarenkov, Vladimir

    2018-04-05

    Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Caliński-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. The main advantage of our method is that it is much faster than the existing tree clustering approaches, while
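    A minimal sketch of k-medoids on a precomputed tree-to-tree distance matrix (for instance, Robinson-Foulds distances between gene trees; that choice is an assumption). It uses an alternating (Voronoi-iteration) update and the standard silhouette width, not the tree-adapted Silhouette and Caliński-Harabasz variants the authors propose. Initial medoids are supplied by the caller, and at least two clusters are assumed.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def assign(dist: Matrix, medoids: List[int]) -> List[int]:
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]]) for i in range(len(dist))]


def k_medoids(dist: Matrix, init: List[int], max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternate between assigning items and re-picking each cluster's medoid."""
    medoids = list(init)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep an empty cluster's old medoid
                new_medoids.append(old)
                continue
            # Medoid = member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


def silhouette(dist: Matrix, labels: List[int]) -> float:
    """Mean silhouette width; items in singleton clusters score 0."""
    scores = []
    for i, own in enumerate(labels):
        same = [dist[i][j] for j, lab in enumerate(labels) if lab == own and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist[i][j] for j, lab in enumerate(labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != own
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```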

  18. Evaluating the relationship between evolutionary divergence and phylogenetic accuracy in AFLP data sets.

    Science.gov (United States)

    García-Pereira, María Jesús; Caballero, Armando; Quesada, Humberto

    2010-05-01

    Using in silico amplified fragment length polymorphism (AFLP) fingerprints, we explore the relationship between sequence similarity and phylogeny accuracy to test when, in terms of genetic divergence, the quality of AFLP data becomes too low to be informative for a reliable phylogenetic reconstruction. We generated DNA sequences with known phylogenies using balanced and unbalanced trees with recent, uniform and ancient radiations, and average branch lengths (from the most internal node to the tip) ranging from 0.02 to 0.4 substitutions per site. The resulting sequences were used to emulate the AFLP procedure. Trees were estimated by maximum parsimony (MP), neighbor-joining (NJ), and minimum evolution (ME) methods from both DNA sequences and virtual AFLP fingerprints. The estimated trees were compared with the reference trees using a score that measures overall differences in both topology and relative branch length. As expected, the accuracy of AFLP-based phylogenies decreased dramatically in the more divergent data sets. Above a divergence of approximately 0.05, AFLP-based phylogenies were largely inaccurate irrespective of the distinct topology, radiation model, or phylogenetic method used. This value represents an upper bound of expected tree accuracy for data sets with a simple divergence history; AFLP data sets with a similar divergence but with unbalanced topologies and short ancestral branches produced much less accurate trees. The lack of homology of AFLP bands quickly increases with divergence and reaches its maximum value (100%) at a divergence of only 0.4. Low guanine-cytosine (GC) contents increase the number of nonhomologous bands in AFLP data sets and lead to less reliable trees. However, the effect of the lack of band homology on tree accuracy is surprisingly small relative to the negative impact due to the low information content of AFLP characters. Tree-building methods based on genetic distance displayed similar trends and outperformed parsimony

  19. Phylogenetic relationships between Sarcocystis species from reindeer and other Sarcocystidae deduced from ssu rRNA gene sequences

    DEFF Research Database (Denmark)

    Dahlgren, S.S.; Oliveira, Rodrigo Gouveia; Gjerde, B.

    2008-01-01

    any effect on previously inferred phylogenetic relationships within the Sarcocystidae. The complete small subunit (ssu) rRNA gene sequences of all six Sarcocystis species from reindeer were used in the phylogenetic analyses along with ssu rRNA gene sequences of 85 other members of the Coccidea. Trees...... the six species in phylogenetic analyses of the Sarcocystidae, and also to investigate the phylogenetic relationships between the species from reindeer and those from other hosts. The study also aimed at revealing whether the inclusion of six Sarcocystis species from the same intermediate host would have....... tarandivulpes, formed a sister group to other Sarcocystis species with a canine definitive host. The position of S. hardangeri on the tree suggested that it uses another type of definitive host than the other Sarcocystis species in this clade. Considering the geographical distribution and infection intensity...

  20. Volatile-mediated interactions between phylogenetically different soil bacteria

    Directory of Open Access Journals (Sweden)

    Paolina Garbeva

    2014-06-01

    There is increasing evidence that organic volatiles play an important role in interactions between micro-organisms in the porous soil matrix. Here we report that volatile compounds emitted by different soil bacteria can affect the growth, antibiotic production and gene expression of the soil bacterium Pseudomonas fluorescens Pf0-1. We applied a novel cultivation approach that mimics the natural nutritional heterogeneity in soil, in which P. fluorescens grown on nutrient-limited agar was exposed to volatiles produced by 4 phylogenetically different bacterial isolates (Collimonas pratensis, Serratia plymuthica, Paenibacillus sp. and Pedobacter sp.) growing in sand containing artificial root exudates. Contrary to our expectation, the produced volatiles stimulated rather than inhibited the growth of P. fluorescens. A genome-wide, microarray-based analysis revealed that volatiles of all 4 bacterial strains affected gene expression of P. fluorescens, but with a different pattern of gene expression for each strain. Based on the annotation of the differentially expressed genes, bacterial volatiles appear to induce a chemotactic motility response in P. fluorescens, but also an oxidative stress response. A more detailed study revealed that volatiles produced by C. pratensis triggered antimicrobial secondary metabolite production in P. fluorescens. Our results indicate that bacterial volatiles can have an important role in communication, trophic and antagonistic interactions within the soil bacterial community.

  1. The performance of phylogenetic algorithms in estimating haplotype genealogies with migration.

    Science.gov (United States)

    Salzburger, Walter; Ewing, Greg B; Von Haeseler, Arndt

    2011-05-01

    Genealogies estimated from haplotypic genetic data play a prominent role in various biological disciplines in general and in phylogenetics, population genetics and phylogeography in particular. Several software packages have specifically been developed for the purpose of reconstructing genealogies from closely related, and hence, highly similar haplotype sequence data. Here, we use simulated data sets to test the performance of traditional phylogenetic algorithms (neighbour-joining, maximum parsimony and maximum likelihood) in estimating genealogies from nonrecombining haplotypic genetic data. We demonstrate that these methods are suitable for constructing genealogies from sets of closely related DNA sequences with or without migration. As genealogies based on phylogenetic reconstructions are fully resolved, but not necessarily bifurcating, and without reticulations, these approaches outperform widespread 'network' constructing methods. In our simulations of coalescent scenarios involving panmictic, symmetric and asymmetric migration, we found that phylogenetic reconstruction methods performed well, while the statistical parsimony approach as implemented in TCS performed poorly. Overall, parsimony as implemented in the PHYLIP package performed slightly better than other methods. We further point out that we are not making the case that widespread 'network' constructing methods are bad, but that traditional phylogenetic tree finding methods are applicable to haplotypic data and exhibit reasonable performance with respect to accuracy and robustness. We also discuss some of the problems of converting a tree to a haplotype genealogy, in particular that it is nonunique. © 2011 Blackwell Publishing Ltd.
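    As an illustration of the first of these methods, the following is a minimal Python sketch of the standard neighbour-joining algorithm of Saitou and Nei; it is not the PHYLIP implementation the study evaluated. The function name, the input layout (a list of taxon labels plus a dictionary keyed by taxon pairs) and the generated internal-node labels (`node0`, `node1`, ...) are assumptions made for the example.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Return an unrooted NJ tree as a list of (parent, child, branch_length) edges.

    labels: taxon names; dist: {(a, b): distance} covering every pair of taxa.
    """
    d = {frozenset(k): v for k, v in dist.items()}
    nodes = list(labels)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r_i: summed distance from i to every other active node
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        # Branch lengths from the new internal node u to i and to j
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"node{next_id}"
        next_id += 1
        edges += [(u, i, li), (u, j, lj)]
        # Distance from u to every remaining node
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (d[frozenset((i, k))]
                                              + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges
```

    Called on the classic five-taxon additive matrix d(a,b)=5, d(a,c)=9, d(a,d)=9, d(a,e)=8, d(b,c)=10, d(b,d)=10, d(b,e)=9, d(c,d)=8, d(c,e)=7, d(d,e)=3, the sketch returns leaf branch lengths a=2, b=3, c=4, d=2 and e=1. Ties in the Q-criterion are broken by input order.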

  2. Inferring phylogenetic networks by the maximum parsimony criterion: a case study.

    Science.gov (United States)

    Jin, Guohua; Nakhleh, Luay; Snir, Sagi; Tuller, Tamir

    2007-01-01

    Horizontal gene transfer (HGT) may result in genes whose evolutionary histories disagree with each other, as well as with the species tree. In this case, reconciling the species and gene trees results in a network of relationships, known as the "phylogenetic network" of the set of species. A phylogenetic network that incorporates HGT consists of an underlying species tree that captures vertical inheritance and a set of edges which model the "horizontal" transfer of genetic material. In a series of papers, Nakhleh and colleagues have recently formulated a maximum parsimony (MP) criterion for phylogenetic networks, provided an array of computationally efficient algorithms and heuristics for computing it, and demonstrated its plausibility on simulated data. In this article, we study the performance and robustness of this criterion on biological data. Our findings indicate that MP is very promising when its application is extended to the domain of phylogenetic network reconstruction and HGT detection. In all cases we investigated, the MP criterion detected the correct number of HGT events required to map the evolutionary history of a gene data set onto the species phylogeny. Furthermore, our results indicate that the criterion is robust with respect to both incomplete taxon sampling and the use of different site substitution matrices. Finally, our results show that the MP criterion is very promising in detecting HGT in chimeric genes, whose evolutionary histories are a mix of vertical and horizontal evolution. Besides the performance analysis of MP, our findings offer new insights into the evolution of 4 biological data sets and new possible explanations of HGT scenarios in their evolutionary history.
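    A minimal sketch of how a parsimony score can be computed for a network, assuming the site-wise definition in which each alignment column is charged the cost of the cheapest tree displayed by the network; the criterion in the cited papers may include details not captured here. Per-tree costs come from Fitch's small-parsimony algorithm on rooted binary trees written as nested tuples. The displayed trees are passed in directly, because enumerating them from a network is omitted, and all function names are hypothetical.

```python
def fitch(tree, states):
    """Fitch small parsimony: return (candidate state set, number of changes).

    tree: nested 2-tuples with taxon-name strings at the leaves.
    states: {taxon: character state} for a single alignment column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    common = ls & rs
    if common:                       # children agree: no extra change needed
        return common, lc + rc
    return ls | rs, lc + rc + 1      # disagreement costs one change


def network_parsimony(displayed_trees, alignment):
    """Sum over columns of the cheapest Fitch cost among the displayed trees."""
    length = len(next(iter(alignment.values())))
    total = 0
    for site in range(length):
        states = {taxon: seq[site] for taxon, seq in alignment.items()}
        total += min(fitch(t, states)[1] for t in displayed_trees)
    return total
```

    For the hypothetical alignment A=`AA`, B=`AG`, C=`GA`, D=`GG`, each of the trees ((A,B),(C,D)) and ((A,C),(B,D)) costs 3 steps on its own, whereas a network displaying both costs 2, since each column can follow the tree it fits. That drop is the sense in which a reticulate edge can "explain" conflicting characters.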

  3. Some limitations of public sequence data for phylogenetic inference (in plants).

    Science.gov (United States)

    Hinchliff, Cody E; Smith, Stephen Andrew

    2014-01-01

    The GenBank database contains essentially all of the nucleotide sequence data generated for published molecular systematic studies, but for the majority of taxa these data remain sparse. GenBank has value for phylogenetic methods that leverage data-mining and rapidly improving computational methods, but the limits imposed by the sparse structure of the data are not well understood. Here we present a tree representing 13,093 land plant genera (an estimated 80% of extant plant diversity) to illustrate the potential of public sequence data for broad phylogenetic inference in plants, and we explore the limits to inference imposed by the structure of these data using theoretical foundations from phylogenetic data decisiveness. We find that despite very high levels of missing data (over 96%), the present data retain the potential to inform over 86.3% of all possible phylogenetic relationships. Most of these relationships, however, are informed by small amounts of data--approximately half are informed by fewer than four loci, and more than 99% are informed by fewer than fifteen. We also apply an information theoretic measure of branch support to assess the strength of phylogenetic signal in the data, revealing many poorly supported branches concentrated near the tips of the tree, where data are sparse and the limiting effects of this sparseness are stronger. We argue that limits to phylogenetic inference and signal imposed by low data coverage may pose significant challenges for comprehensive phylogenetic inference at the species level. Computational requirements provide additional limits for large reconstructions, but these may be overcome by methodological advances, whereas insufficient data coverage can only be remedied by additional sampling effort. We conclude that public databases have exceptional value for modern systematics and evolutionary biology, and that a continued emphasis on expanding taxonomic and genomic coverage will play a critical role in developing
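    The decisiveness framework asks, roughly, whether the pattern of missing data even allows a given relationship to be informed. Below is a minimal sketch of one simple version of that question, assuming a relationship among four taxa counts as potentially informed when at least one locus is sampled for all four; this is a simplification and not necessarily the exact measure behind the 86.3% figure. The function name and the toy coverage data are hypothetical.

```python
from itertools import combinations


def quartet_coverage(coverage):
    """Fraction of 4-taxon subsets sampled together by at least one locus.

    coverage: {taxon name: set of loci that have data for that taxon}.
    """
    taxa = sorted(coverage)
    quartets = list(combinations(taxa, 4))
    # A quartet is potentially informed if some locus covers all four taxa
    informed = sum(
        1 for q in quartets
        if set(coverage[q[0]]).intersection(*(coverage[t] for t in q[1:]))
    )
    return informed / len(quartets)
```

    Enumerating all quartets grows as O(n^4), so for thousands of genera one would sample quartets at random rather than list them; that is a design choice of this sketch, not a description of the study.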

  4. Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data.

    Directory of Open Access Journals (Sweden)

    Alexei Kassian

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists published as part of the Global Lexicostatistical Database project. The lexical data were subsequently analyzed with the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), and Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different methods: the traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Owing chiefly to the high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists, the Lezgian database is an ideal testing ground for appraising phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The consensus tree agrees with the traditional expert classification as well as with some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method suggested the least plausible tree of all. For the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) produced trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) yielded less likely topologies.
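    The phonetic-similarity matrix relies on Levenshtein (edit) distance. A minimal Python sketch of that distance follows; it compares plain strings, and the mapping of word forms to consonant classes that the study applies first is not reproduced here (any such encoding would be an assumption).

```python
def levenshtein(a, b):
    """Minimum number of single-symbol insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute if different
        prev = curr
    return prev[-1]
```

    For example, `levenshtein("kitten", "sitting")` returns 3. Keeping only two rows of the dynamic-programming table keeps memory linear in the length of the second string.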

  5. Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data.

    Science.gov (United States)

    Kassian, Alexei

    2015-01-01

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists published as part of the Global Lexicostatistical Database project. The lexical data were subsequently analyzed with the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), and Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different methods: the traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Owing chiefly to the high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists, the Lezgian database is an ideal testing ground for appraising phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The consensus tree agrees with the traditional expert classification as well as with some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method suggested the least plausible tree of all. For the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) produced trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) yielded less likely topologies.
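    Of the distance-based methods listed, UPGMA is the simplest to state: repeatedly merge the closest pair of clusters, place their common ancestor at half their distance, and replace them with a cluster whose distances to the others are size-weighted averages. A minimal Python sketch, assuming a distance dictionary keyed by taxon pairs (for example, lexical distances between lects); it returns a nested-tuple tree and its root height, and it is not the software used in the study.

```python
from itertools import combinations


def upgma(labels, dist):
    """UPGMA clustering; returns (nested-tuple tree, root height).

    labels: taxon names; dist: {(a, b): distance} covering every pair.
    """
    d = {frozenset(k): v for k, v in dist.items()}
    size = {t: 1 for t in labels}
    height = {t: 0.0 for t in labels}
    clusters = list(labels)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        new = (a, b)
        height[new] = d[frozenset((a, b))] / 2   # ultrametric: ancestor at half the distance
        size[new] = size[a] + size[b]
        for c in clusters:
            if c not in (a, b):
                # Size-weighted average of the distances from a and b to c
                d[frozenset((new, c))] = (size[a] * d[frozenset((a, c))]
                                          + size[b] * d[frozenset((b, c))]) / size[new]
        clusters = [c for c in clusters if c not in (a, b)] + [new]
    root = clusters[0]
    return root, height[root]
```

    For three taxa with d(A,B)=2 and d(A,C)=d(B,C)=6, the sketch returns the tree `("C", ("A", "B"))` with root height 3.0. Because UPGMA assumes a constant rate of change, it yields ultrametric trees, unlike neighbor joining.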

  6. Complete mitochondrial genome from South American catfish Pseudoplatystoma reticulatum (Eigenmann & Eigenmann) and its impact in Siluriformes phylogenetic tree.

    Science.gov (United States)

    Villela, Luciana Cristine Vasques; Alves, Anderson Luis; Varela, Eduardo Sousa; Yamagishi, Michel Eduardo Beleza; Giachetto, Poliana Fernanda; da Silva, Naiara Milagres Augusto; Ponzetto, Josi Margarete; Paiva, Samuel Rezende; Caetano, Alexandre Rodrigues

    2017-02-01

    The cachara (Pseudoplatystoma reticulatum) is a Neotropical freshwater catfish from family Pimelodidae (Siluriformes) native to Brazil. The species is of relative economic importance for local aquaculture production and basic biological information is under development to help boost efforts to domesticate and raise the species in commercial systems. The complete cachara mitochondrial genome was obtained by assembling Illumina RNA-seq data from pooled samples. The full mitogenome was found to be 16,576 bp in length, showing the same basic structure, order, and genetic organization observed in other Pimelodidae, with 13 protein-coding genes, 2 rRNA genes, 22 tRNAs, and a control region. Observed base composition was 24.63% T, 28.47% C, 31.45% A, and 15.44% G. With the exception of NAD6 and eight tRNAs, all of the observed mitochondrial genes were found to be coded on the H strand. A total of 107 SNPs were identified in P. reticulatum mtDNA, 67 of which were located in coding regions. Of these SNPs, 10 result in amino acid changes. Analysis of the obtained sequence with 94 publicly available full Siluriformes mitogenomes resulted in a phylogenetic tree that generally agreed with available phylogenetic proposals for the order. The first report of the complete Pseudoplatystoma reticulatum mitochondrial genome sequence revealed general gene organization, structure, content, and order similar to most vertebrates. Specific sequence and content features were observed and may have functional attributes which are now available for further investigation.
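    Base composition figures like these are simple counts over the assembled sequence. A minimal sketch, assuming a plain nucleotide string in which ambiguity codes and gaps are ignored; the abstract does not state the exact counting convention (for example, which strand is reported), and the function name is hypothetical. The reported values sum to 99.99%, which is consistent with rounding each to two decimals.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of T, C, A and G in a nucleotide sequence, rounded to two decimals.

    Symbols other than T, C, A and G (ambiguity codes, gaps) are ignored.
    """
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "TCAG")
    if total == 0:
        raise ValueError("sequence contains no T, C, A or G")
    return {b: round(100 * counts[b] / total, 2) for b in "TCAG"}
```

    For example, `base_composition("ttcagA")` returns `{"T": 33.33, "C": 16.67, "A": 33.33, "G": 16.67}`.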

  7. Computing the Skewness of the Phylogenetic Mean Pairwise Distance in Linear Time

    DEFF Research Database (Denmark)

    Tsirogiannis, Constantinos; Sandel, Brody Steven

    2014-01-01

    The phylogenetic Mean Pairwise Distance (MPD) is one of the most popular measures for computing the phylogenetic distance among a given group of species. More specifically, for a phylogenetic tree T and for a set of species R represented by a subset of the leaf nodes of T, the MPD of R is equal to the average cost of all possible simple paths in T that connect pairs of nodes in R. Among other phylogenetic measures, the MPD is used as a tool for deciding if the species of a given group R are closely related. To do this, it is important to compute not only the value of the MPD for this group but also...
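    The definition above translates directly into code: MPD(R) is the sum of path lengths d_T(u, v) over all unordered pairs {u, v} of species in R, divided by the number of such pairs, |R|(|R| - 1)/2. The sketch below computes that value naively, with one traversal per species in R; it does not implement the paper's linear-time algorithm for the skewness of the MPD. The adjacency-dictionary tree format and the function names are assumptions.

```python
from itertools import combinations


def path_lengths_from(tree, source):
    """Path length from source to every node of a tree.

    tree: undirected adjacency dict {node: {neighbor: edge_length}}.
    In a tree the path between two nodes is unique, so a plain DFS suffices.
    """
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in tree[u].items():
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def mpd(tree, species):
    """Mean of d_T(u, v) over all unordered pairs of the given leaves."""
    species = list(species)
    dist_from = {s: path_lengths_from(tree, s) for s in species}
    pairs = list(combinations(species, 2))
    return sum(dist_from[a][b] for a, b in pairs) / len(pairs)
```

    For a tree whose root r is joined to leaves x (length 1) and y (length 2) and to an internal node n (length 1), which in turn carries leaves z (length 3) and w (length 1), the path lengths among R = {x, y, z} are 3, 5 and 6, so MPD(R) = 14/3.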

  8. Linking and Cutting Spanning Trees

    Directory of Open Access Journals (Sweden)

    Luís M. S. Russo

    2018-04-01

    We consider the problem of uniformly generating a spanning tree for an undirected connected graph. This process is useful for computing statistics, notably for phylogenetic trees. We describe a Markov chain for producing these trees. For cycle graphs, we prove that this approach significantly outperforms existing algorithms. For general graphs, experimental results show that the chain converges quickly. This yields an efficient algorithm due to the use of proper fast data structures. To obtain the mixing time of the chain we describe a coupling, which we analyze for cycle graphs and simulate for other graphs.
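    A chain of this link-and-cut kind can be sketched as follows, assuming this move set: pick a graph edge uniformly at random; if it is already in the tree, stay put; otherwise add it (link), which closes exactly one cycle, and delete a uniformly random edge of that cycle (cut). Because a move and its reverse pass through the same cycle, the transition probabilities are symmetric, so the uniform distribution over spanning trees is stationary, and the lazy step keeps the chain aperiodic. The move set, the names and the naive breadth-first path search are assumptions, not the paper's algorithm, which relies on fast data structures the abstract does not specify.

```python
import random
from collections import deque


def tree_path(tree_edges, u, v):
    """Edges (as frozensets) on the unique u-v path of a spanning tree, found by BFS."""
    adj = {}
    for e in tree_edges:
        a, b = tuple(e)
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, []):
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = []
    while v != u:                      # walk back from v to u
        path.append(frozenset((v, parent[v])))
        v = parent[v]
    return path


def link_cut_step(graph_edges, tree_edges, rng):
    """One move: link a random graph edge, then cut a random edge of the cycle it closes.

    graph_edges: list of frozenset edges; tree_edges: frozenset of those edges.
    """
    e = rng.choice(graph_edges)
    if e in tree_edges:
        return tree_edges              # lazy step
    u, v = tuple(e)
    cycle = tree_path(tree_edges, u, v) + [e]
    f = rng.choice(cycle)
    return (tree_edges | {e}) - {f}    # if f == e the tree is unchanged
```

    Starting from any spanning tree, for example a path through the cycle graph C5, repeated calls to `link_cut_step` move among its five spanning trees, and over many steps each is visited about equally often.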

  9. The phylogenetic position of Amoebophrya sp. infecting Gymnodinium sanguineum.

    Science.gov (United States)

    Gunderson, J H; Goss, S H; Coats, D W

    1999-01-01

    The small-subunit rRNA sequence of a species of Amoebophrya infecting Gymnodinium sanguineum in Chesapeake Bay was obtained and compared to the small subunit rRNA sequences of other protists. Phylogenetic trees constructed with the new sequence place Amoebophrya between the remaining dinoflagellates and other protists.

  10. Regression Trees Identify Relevant Interactions: Can This Improve the Predictive Performance of Risk Adjustment?

    Science.gov (United States)

    Buchner, Florian; Wasem, Jürgen; Schillo, Sonja

    2017-01-01

    Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R² from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R² improvement detected is only marginal. According to the sample-level performance measures used, leaving out a considerable number of morbidity interactions causes no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd.
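    For reference, adjusted R² penalizes R² for the number of predictors p relative to the number of observations n: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). A minimal sketch follows; the predictor count in the usage note is hypothetical, since the abstract does not report it.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors, not counting the intercept."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

    With n near 2.9 million, the penalty is tiny: `adjusted_r2(0.2543, 2_900_000, 100)` differs from 0.2543 by only about 0.00003, far less than the reported gain from 25.43% to 25.81%.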

  11. Short Tree, Long Tree, Right Tree, Wrong Tree: New Acquisition Bias Corrections for Inferring SNP Phylogenies.

    Science.gov (United States)

    Leaché, Adam D; Banbury, Barbara L; Felsenstein, Joseph; de Oca, Adrián Nieto-Montes; Stamatakis, Alexandros

    2015-11-01

    Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practices for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis: the analysis of full sequences (i.e., SNPs together with invariant sites), and the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach. Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the
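    The two corrections differ only in how they account for the invariant sites missing from a SNP-only alignment. Writing ln L_i for the log-likelihood of variable site i and P(c) for the probability of a site that is constant for base c under the current tree and model, the conditional approach divides each site likelihood by the probability of being variable, 1 - Σ_c P(c), while the reconstituted approach multiplies P(c) back in once for each unsampled invariant site of base c. Below is a minimal sketch of that arithmetic only. It assumes the site log-likelihoods and constant-pattern probabilities come from some likelihood engine and that invariant sites are counted per base; the function names are hypothetical, and this is not the authors' implementation.

```python
import math


def conditional_loglik(site_logliks, constant_probs):
    """Condition on every sampled site being variable.

    Subtracts ln(1 - sum_c P(c)) once per site, i.e. divides each site
    likelihood by the probability that a site is variable.
    """
    p_variable = 1.0 - sum(constant_probs.values())
    return sum(site_logliks) - len(site_logliks) * math.log(p_variable)


def reconstituted_loglik(site_logliks, constant_probs, invariant_counts):
    """Add back a user-specified number of unsampled invariant sites per base."""
    return sum(site_logliks) + sum(
        invariant_counts.get(base, 0) * math.log(p)
        for base, p in constant_probs.items()
    )
```

    With two variable sites of likelihood 0.01 and 0.02 and hypothetical constant-pattern probabilities summing to 0.6, the conditional log-likelihood equals ln(0.0002 / 0.4²) = ln(0.00125).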

  12. Multi-locus analyses of an Antarctic fish species flock (Teleostei, Notothenioidei, Trematominae): Phylogenetic approach and test of the early-radiation event

    International Nuclear Information System (INIS)

    Janko, K.; Musilova, Z.; Marshall, C.; Van Houdt, J.; Couloux, A.; Cruaud, C.; Lecointre, G.

    2011-01-01

    Clades that have undergone episodes of rapid cladogenesis are challenging from a phylogenetic point of view. They are generally characterised by short or missing internal branches in phylogenetic trees and by conflicting topologies among individual gene trees. This may be the case for the subfamily Trematominae, a group of marine teleosts of coastal Antarctic waters, which is considered to have passed through a period of rapid diversification. Despite much phylogenetic attention, the relationships among Trematominae species remain unclear. In contrast to previous studies that were mostly based on concatenated datasets of mitochondrial and/or single nuclear loci, we applied various single-locus and multi-locus phylogenetic approaches to sequences from 11 loci (eight nuclear) and we also used several methods to assess the hypothesis of a radiation event in Trematominae evolution. Diversification rate analyses support the hypothesis of a period of rapid diversification during Trematominae history and only a few nodes in the hypothetical species tree were consistently resolved with various phylogenetic methods. We detected significant discrepancies among trees from individual genes of these species, most probably resulting from incomplete lineage sorting, suggesting that concatenation of loci is not the most appropriate way to investigate Trematominae species interrelationships. These data also provide information about the possible effects of historic climate changes on the diversification rate of this group of fish. (authors)
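    One common way to quantify discordance among gene trees is the Robinson-Foulds distance, the number of clades found in one tree but not the other. The abstract does not say that this particular measure was used. Below is a minimal sketch for rooted trees written as nested tuples of taxon names; the function names are hypothetical.

```python
def clades(tree):
    """Non-trivial clades (frozensets of leaf names) of a rooted nested-tuple tree.

    Single leaves and the root clade (shared by every tree) are excluded.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))
    return found


def robinson_foulds(t1, t2):
    """Rooted Robinson-Foulds distance: size of the symmetric difference of clade sets."""
    return len(clades(t1) ^ clades(t2))
```

    For example, the gene trees (((A,B),C),D) and (((A,C),B),D) are at distance 2, because {A,B} appears only in the first and {A,C} only in the second.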

  13. Analyzing and synthesizing phylogenies using tree alignment graphs.

    Directory of Open Access Journals (Sweden)

    Stephen A Smith

    Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertree approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to

  14. Analyzing and synthesizing phylogenies using tree alignment graphs.

    Science.gov (United States)

    Smith, Stephen A; Brown, Joseph W; Hinchliff, Cody E

    2013-01-01

    Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertree approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe.
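    The core idea of merging several rooted trees into one graph can be sketched in a heavily simplified form, assuming every source tree has the same taxon set; the TAG construction described above additionally handles partially overlapping taxa, which this sketch does not attempt. Nodes are clades (frozensets of taxon names), each edge links a child clade to its parent clade, and an edge's weight counts the source trees that contain it. A clade with more than one parent then marks conflict among the sources. Function names are hypothetical.

```python
from collections import Counter


def clade_edges(tree):
    """(child_clade, parent_clade) pairs for a rooted nested-tuple tree."""
    edges = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        children = [walk(child) for child in node]
        parent = frozenset().union(*children)
        edges.extend((child, parent) for child in children)
        return parent

    walk(tree)
    return edges


def build_clade_graph(trees):
    """Merge trees into one graph keyed by clade; weight = number of supporting trees."""
    support = Counter()
    for t in trees:
        support.update(set(clade_edges(t)))   # count each edge at most once per tree
    return support
```

    For the source trees ((A,B),C), ((A,B),C) and ((A,C),B), the edge {A,B} → {A,B,C} has support 2, and the leaf A acquires two parents, {A,B} and {A,C}, exposing the conflict.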

  15. Interacting effects of pollination, water and nutrients on fruit tree performance.

    Science.gov (United States)

    Klein, A-M; Hendrix, S D; Clough, Y; Scofield, A; Kremen, C

    2015-01-01

    Pollination is critical to fruit production, but the interactions of pollination with plant resources on a plant's reproductive and vegetative features are largely overlooked. We examined the influences of pollination, irrigation and fertilisation on the performance of almond, Prunus dulcis, in northern California. We used a full-factorial design to test for the effects of pollination limitation on fruit production and foliage variables of whole trees experiencing four resource treatments: (i) normal water and nutrients, (ii) reduced water, (iii) no nutrients, and (iv) reduced water and no nutrients. In each of these combinations, we applied three pollination treatments: hand-cross pollination, open-pollination and pollinator exclusion. Pollination strongly affected yield even under reduced water and no nutrient applications. Hand-cross pollination resulted in over 50% fruit set with small kernels, while open-pollinated flowers showed over 30% fruit set with moderate-sized kernels. Pollinator-excluded flowers had a maximum fruit set of 5%, with big and heavy kernels. Reduced water interacted with the open- and hand-cross pollination treatments, reducing yield more than in the pollinator exclusion treatment. The number of kernels negatively influenced the number of leaves, and reduced water and no nutrient applications interacted with the pollination treatments. Overall, our results indicate that the influences of pollination on fruit tree yield interact with the plant availability of nutrients and water and that excess pollination can reduce fruit quality and the production of leaves for photosynthesis. Such information is critical to understand how pollination influences fruit tree performance. © 2014 German Botanical Society and The Royal Botanical Society of the Netherlands.

  16. A molecular phylogeny of scaly tree ferns (Cyatheaceae).

    Science.gov (United States)

    Korall, Petra; Conant, David S; Metzgar, Jordan S; Schneider, Harald; Pryer, Kathleen M

    2007-05-01

    Tree ferns recently were identified as the closest sister group to the hyperdiverse clade of ferns, the polypods. Although most of the 600 species of tree ferns are arborescent, the group encompasses a wide range of morphological variability, from diminutive members to the giant scaly tree ferns, Cyatheaceae. This well-known family comprises most of the tree fern diversity (∼500 species) and is widespread in tropical, subtropical, and south temperate regions of the world. Here we investigate the phylogenetic relationships of scaly tree ferns based on DNA sequence data from five plastid regions (rbcL, rbcL-accD IGS, rbcL-atpB IGS, trnG-trnR, and trnL-trnF). A basal dichotomy resolves Sphaeropteris as sister to all other taxa and scale features support these two clades: Sphaeropteris has conform scales, whereas all other taxa have marginate scales. The marginate-scaled clade consists of a basal trichotomy, with the three groups here termed (1) Cyathea (including Cnemidaria, Hymenophyllopsis, Trichipteris), (2) Alsophila sensu stricto, and (3) Gymnosphaera (previously recognized as a section within Alsophila) + A. capensis. Scaly tree ferns display a wide range of indusial structures, and although indusium shape is homoplastic it does contain useful phylogenetic information that supports some of the larger clades recognised.

  17. A well-resolved phylogeny of the trees of Puerto Rico based on DNA barcode sequence data.

    Science.gov (United States)

    Muscarella, Robert; Uriarte, María; Erickson, David L; Swenson, Nathan G; Zimmerman, Jess K; Kress, W John

    2014-01-01

    The use of phylogenetic information in community ecology and conservation has grown in recent years. Two key issues for community phylogenetics studies, however, are (i) low terminal phylogenetic resolution and (ii) arbitrarily defined species pools. We used three DNA barcodes (plastid DNA regions rbcL, matK, and trnH-psbA) to infer a phylogeny for 527 native and naturalized trees of Puerto Rico, representing the vast majority of the entire tree flora of the island (89%). We used a maximum likelihood (ML) approach with and without a constraint tree that enforced monophyly of recognized plant orders. Based on 50% consensus trees, the ML analyses improved phylogenetic resolution relative to a comparable phylogeny generated with Phylomatic (proportion of internal nodes resolved: constrained ML = 74%, unconstrained ML = 68%, Phylomatic = 52%). We quantified the phylogenetic composition of 15 protected forests in Puerto Rico using the constrained ML and Phylomatic phylogenies. We found some evidence that tree communities in areas of high water stress were relatively phylogenetically clustered. Reducing the scale at which the species pool was defined (from island to soil types) changed some of our results depending on which phylogeny (ML vs. Phylomatic) was used. Overall, the increased terminal resolution provided by the ML phylogeny revealed additional patterns that were not observed with a less-resolved phylogeny. With the DNA barcode phylogeny presented here (based on an island-wide species pool), we show that a more fully resolved phylogeny increases power to detect nonrandom patterns of community composition in several Puerto Rican tree communities. Especially if combined with additional information on species functional traits and geographic distributions, this phylogeny will (i) facilitate stronger inferences about the role of historical processes in governing the assembly and composition of Puerto Rican forests, (ii) provide insight into Caribbean
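    The 50% consensus and the "proportion of internal nodes resolved" statistic can be sketched together. The sketch assumes rooted trees written as nested tuples, keeps clades present in more than half of the input trees, and measures resolution as the number of retained non-root clades divided by the n - 2 such clades of a fully bifurcating rooted tree on n taxa. That denominator is an assumption; the study may normalize differently, and the function names are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Non-trivial, non-root clades (frozensets of leaf names) of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))
    return found


def majority_rule_clades(trees, threshold=0.5):
    """Clades present in more than `threshold` of the trees (mutually compatible at 0.5)."""
    counts = Counter()
    for t in trees:
        counts.update(clades(t))
    return {c for c, k in counts.items() if k / len(trees) > threshold}


def resolution(clade_set, n_taxa):
    """Retained clades as a fraction of the n_taxa - 2 of a fully resolved rooted tree."""
    return len(clade_set) / (n_taxa - 2)
```

    For the three trees (((A,B),C),D), (((A,B),D),C) and ((A,B),(C,D)), only {A,B} occurs in more than half of them, so the consensus keeps one of the two possible non-root clades and its resolution is 0.5.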

  18. Variance to mean ratio, R(t), for poisson processes on phylogenetic trees.

    Science.gov (United States)

    Goldman, N

    1994-09-01

    The ratio of expected variance to mean, R(t), of numbers of DNA base substitutions for contemporary sequences related by a "star" phylogeny is widely seen as a measure of the adherence of the sequences' evolution to a Poisson process with a molecular clock, as predicted by the "neutral theory" of molecular evolution under certain conditions. A number of estimators of R(t) have been proposed, all predicted to have mean 1 and distributions based on the χ² distribution. Various genes have previously been analyzed and found to have values of R(t) far in excess of 1, calling into question important aspects of the neutral theory. In this paper, I use Monte Carlo simulation to show that the previously suggested means and distributions of estimators of R(t) are highly inaccurate. The analysis is applied to star phylogenies and to general phylogenetic trees, and well-known gene sequences are reanalyzed. For star phylogenies the results show that Kimura's estimators ("The Neutral Theory of Molecular Evolution," Cambridge Univ. Press, Cambridge, 1983) are unsatisfactory for statistical testing of R(t), but confirm the accuracy of Bulmer's correction factor (Genetics 123: 615-619, 1989). For all three nonstar phylogenies studied, attained values of all three estimators of R(t), although larger than 1, are within their true confidence limits under simple Poisson process models. This shows that lineage effects can be responsible for high estimates of R(t), restoring some limited confidence in the molecular clock and showing that the distinction between lineage and molecular clock effects is vital.(ABSTRACT TRUNCATED AT 250 WORDS)
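    The Monte Carlo idea can be illustrated under the simplest possible assumptions: a star phylogeny whose lineages each accumulate a Poisson number of substitutions with the same mean, observed directly, so that R is the sample variance of the per-lineage counts divided by their mean. Real analyses must estimate those counts from sequence comparisons, which is where estimator behaviour becomes tricky. The sketch below does not reproduce Kimura's or Bulmer's estimators, and its names and parameters are hypothetical.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Poisson draw: multiply uniforms until the product falls below exp(-lam) (Knuth's method)."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def r_ratio(counts):
    """Sample variance (n - 1 denominator) of per-lineage counts divided by their mean."""
    return statistics.variance(counts) / statistics.mean(counts)


def simulate_star(n_lineages, expected_subs, replicates, seed=0):
    """R for repeated star phylogenies whose lineages follow the same Poisson clock."""
    rng = random.Random(seed)
    return [r_ratio([poisson(rng, expected_subs) for _ in range(n_lineages)])
            for _ in range(replicates)]
```

    In this idealized setting the simulated ratios average close to 1, and (n - 1)R is approximately χ² with n - 1 degrees of freedom. Comparing empirical quantiles from such simulations with an assumed reference distribution is the kind of check a Monte Carlo study of estimators relies on.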

  19. Widespread Discordance of Gene Trees with Species Tree in Drosophila: Evidence for Incomplete Lineage Sorting

    Energy Technology Data Exchange (ETDEWEB)

    Pollard, Daniel A.; Iyer, Venky N.; Moses, Alan M.; Eisen, Michael B.

    2006-08-28

    The phylogenetic relationship of the now fully sequenced species Drosophila erecta and D. yakuba with respect to the D. melanogaster species complex has been a subject of controversy. All three possible groupings of the species have been reported in the past, though recent multi-gene studies suggest that D. erecta and D. yakuba are sister species. Using the whole genomes of each of these species as well as the four other fully sequenced species in the subgenus Sophophora, we set out to investigate the placement of D. erecta and D. yakuba in the D. melanogaster species group and to understand the cause of the past incongruence. Though we find that the phylogeny grouping D. erecta and D. yakuba together is the best supported, we also find widespread incongruence in nucleotide and amino acid substitutions, insertions and deletions, and gene trees. The time inferred to span the two key speciation events is short enough that under the coalescent model, the incongruence could be the result of incomplete lineage sorting. Consistent with the lineage-sorting hypothesis, substitutions supporting the same tree were spatially clustered. Support for the different trees was found to be linked to recombination such that adjacent genes support the same tree most often in regions of low recombination and substitutions supporting the same tree are most enriched roughly on the same scale as linkage disequilibrium, also consistent with lineage sorting. The incongruence was found to be statistically significant and robust to model and species choice. No systematic biases were found. We conclude that phylogenetic incongruence in the D. melanogaster species complex is the result, at least in part, of incomplete lineage sorting. Incomplete lineage sorting will likely cause phylogenetic incongruence in many comparative genomics datasets. Methods to infer the correct species tree, the history of every base in the genome, and comparative methods that control for and/or utilize this information will be

  20. PhyTB: Phylogenetic tree visualisation and sample positioning for M. tuberculosis

    KAUST Repository

    Benavente, Ernest D; Coll, Francesc; Furnham, Nick; McNerney, Ruth; Glynn, Judith R; Campino, Susana; Pain, Arnab; Mohareb, Fady R; Clark, Taane G

    2015-01-01

    Phylogenetic-based classification of M. tuberculosis and other bacterial genomes is a core analysis for studying evolutionary hypotheses, disease outbreaks and transmission events. Whole genome sequencing is providing new insights

  1. Phylogenetic reconstruction of endophytic fungal isolates using internal transcribed spacer 2 (ITS2) region.

    Science.gov (United States)

    GokulRaj, Kathamuthu; Sundaresan, Natesan; Ganeshan, Enthai Jagan; Rajapriya, Pandi; Muthumary, Johnpaul; Sridhar, Jayavel; Pandi, Mohan

    2014-01-01

    Endophytic fungi inhabit plants, living asymptomatically for most of their life cycle and mainly conferring protection and ecological advantages on the host plant. In the present study, 48 endophytic fungi were isolated from the leaves of three medicinal plants and characterized by ITS2 sequence and secondary structure analysis. ITS2 secondary structures were elucidated with the minimum free energy method (MFOLD version 3.1), and a consensus structure for each genus was generated with 4SALE. ProfDistS was used to generate an ITS2 sequence-structure-based phylogenetic tree. The isolates belonged to the Ascomycetes, representing 5 orders and 6 genera. Colletotrichum/Glomerella spp., Diaporthe/Phomopsis spp., and Alternaria spp. predominated, while Cochliobolus sp., Cladosporium sp., and Emericella sp. were represented by singletons. The constructed phylogenetic tree has well-resolved monophyletic groups with >50% bootstrap support. Secondary-structure-based fungal systematics improves not only the stability but also the precision of phylogenetic inference. The ITS2-based phylogenetic analysis was performed on our 48 isolates together with sequences of known ex-types taken from GenBank, which confirms the efficiency of the proposed method. Further, we propose ITS2 as an excellent marker for reconstructing phylogenetic relationships at different taxonomic levels owing to its short length.

  2. 16S rRNA phylogenetic analysis of actinomycetes isolated from ...

    African Journals Online (AJOL)

    Subsequently, a phylogenetic tree was constructed using suitable bioinformatics tools to assess similarity, which showed 97% similarity between strains. Moreover, all the selected strains of actinomycetes were subjected to study the protein and plasmid DNA expression profiles which showed prominent bands with ...

  3. Least Squares Methods for Equidistant Tree Reconstruction

    OpenAIRE

    Fahey, Conor; Hosten, Serkan; Krieger, Nathan; Timpe, Leslie

    2008-01-01

    UPGMA is a heuristic method identifying the least squares equidistant phylogenetic tree given empirical distance data among $n$ taxa. We study this classic algorithm using the geometry of the space of all equidistant trees with $n$ leaves, also known as the Bergman complex of the graphical matroid for the complete graph $K_n$. We show that UPGMA performs an orthogonal projection of the data onto a maximal cell of the Bergman complex. We also show that the equidistant tree with the least (Eucl...
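    For readers unfamiliar with the classic algorithm being analysed, here is a minimal UPGMA sketch that repeatedly merges the closest pair of clusters, places the new node at half their distance (so every leaf is equidistant from it), and updates distances by a size-weighted average. The dict-of-frozenset distance encoding and the "A+B" labels for merged clusters are hypothetical conveniences; the geometric (Bergman complex) analysis from the abstract is not reproduced here.

```python
"""Minimal UPGMA sketch producing a Newick string for an equidistant (ultrametric) tree.

Assumptions: distances come as a dict keyed by frozenset pairs of taxon labels;
ties are broken by dict iteration order; merged-cluster labels are hypothetical.
"""
from itertools import combinations


def upgma(taxa, dist):
    # Each cluster label maps to (newick subtree, number of leaves, node height).
    clusters = {t: (t, 1, 0.0) for t in taxa}
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(taxa, 2)}
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        a, b = sorted(pair)
        (na, sa, ha), (nb, sb, hb) = clusters.pop(a), clusters.pop(b)
        height = d.pop(pair) / 2.0  # ultrametric: both sides end at the same depth
        newick = f"({na}:{height - ha:g},{nb}:{height - hb:g})"
        label = f"{a}+{b}"
        for c in clusters:
            # Size-weighted average distance from the merged cluster to cluster c.
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((label, c))] = (sa * dac + sb * dbc) / (sa + sb)
        clusters[label] = (newick, sa + sb, height)
    (newick, _, _), = clusters.values()
    return newick + ";"


if __name__ == "__main__":
    pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
             ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
    print(upgma(["A", "B", "C", "D"], {frozenset(k): v for k, v in pairs.items()}))
    # (((A:1,B:1):2,C:3):2,D:5);
```

    The toy matrix is already ultrametric, so UPGMA recovers it exactly; for non-ultrametric data it returns an equidistant approximation, which is the setting the abstract studies.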

  4. [Genome-wide identification, phylogenetic analysis and expression profiling of the WOX family genes in Solanum lycopersicum].

    Science.gov (United States)

    Li, Xiao-xu; Liu, Cheng; Li, Wei; Zhang, Zeng-lin; Gao, Xiao-ming; Zhou, Hui; Guo, Yong-feng

    2016-05-01

    Members of the plant-specific WOX transcription factor family have been reported to play important roles in cell-to-cell communication as well as other physiological and developmental processes. In this study, ten members of the WOX transcription factor family were identified in Solanum lycopersicum with HMMER. Neighbor-joining, maximum-likelihood, and Bayesian-inference trees constructed from the homeodomain protein sequences showed similar topologies. Phylogenetic study revealed that the 25 WOX family members from Arabidopsis and tomato fall into three clades and nine subfamilies. The patterns of exon-intron structure and the organization of conserved domains in Arabidopsis and tomato were consistent with the phylogenetic results. Transcriptome analysis showed that the expression patterns of SlWOXs differed among tissue types. Gene Ontology (GO) analysis suggested that, as transcription factors, the SlWOX family members could be involved in a number of biological processes including cell-to-cell communication and tissue development. Our results are useful for future studies on WOX family members in tomato and other plant species.

  5. Phylogenetic inference with weighted codon evolutionary distances.

    Science.gov (United States)

    Criscuolo, Alexis; Michel, Christian J

    2009-04-01

    We develop a new approach to estimate a matrix of pairwise evolutionary distances from a codon-based alignment based on a codon evolutionary model. The method first computes a standard distance matrix for each of the three codon positions. Then these three distance matrices are weighted according to an estimate of the global evolutionary rate of each codon position and averaged into a unique distance matrix. Using a large set of both real and simulated codon-based alignments of nucleotide sequences, we show that this approach leads to distance matrices that are significantly more treelike than those obtained from standard nucleotide evolutionary distances. We also propose an alternative weighting to eliminate part of the noise often associated with some codon positions, particularly the third position, which is known to evolve at a fast rate. Simulation results show that fast distance-based tree reconstruction algorithms applied to distance matrices based on this codon position weighting can produce phylogenetic trees that are at least as accurate as, if not more accurate than, those inferred by maximum likelihood. Finally, a well-known multigene dataset composed of eight yeast species and 106 codon-based alignments is reanalyzed and shows that our codon evolutionary distances allow building a phylogenetic tree which is similar to those obtained by non-distance-based methods (e.g., maximum parsimony and maximum likelihood) and also significantly improved compared to standard nucleotide evolutionary distance estimates.
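    The three-step idea (per-position distances, per-position weights, weighted average) can be sketched for a single pair of sequences as below. This is not the authors' estimator: the per-position distance is assumed to be a Jukes-Cantor corrected p-distance, gaps and N are simply skipped, and the caller supplies the three weights (for example, down-weighting the fast-evolving third position).

```python
"""Sketch of a codon-position-weighted pairwise distance.

Assumptions (not the authors' exact method): per-position distances are
Jukes-Cantor corrected p-distances, and the three weights are caller-supplied.
"""
import math


def p_distance_by_position(seq1, seq2):
    """Proportion of differing sites at codon positions 1, 2 and 3."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    diffs, totals = [0, 0, 0], [0, 0, 0]
    for i, (x, y) in enumerate(zip(seq1.upper(), seq2.upper())):
        if x in "-N" or y in "-N":
            continue  # skip gaps and ambiguous bases
        pos = i % 3
        totals[pos] += 1
        diffs[pos] += x != y
    return [d / t if t else 0.0 for d, t in zip(diffs, totals)]


def jukes_cantor(p):
    """JC69 correction; infinite once p reaches 0.75 (saturation)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def weighted_codon_distance(seq1, seq2, weights):
    """Weighted average of the three per-position JC distances (weights need not sum to 1)."""
    per_pos = [jukes_cantor(p) for p in p_distance_by_position(seq1, seq2)]
    return sum(w * d for w, d in zip(weights, per_pos)) / sum(weights)


if __name__ == "__main__":
    a, b = "ATGAAACCC", "ATGAAGCCT"  # toy codon-aligned pair: both differences at position 3
    print(p_distance_by_position(a, b))
    print(weighted_codon_distance(a, b, (1.0, 1.0, 0.2)))  # hypothetical third-position down-weighting
```

    Applying this function to every pair of taxa would yield the single averaged distance matrix that a fast method such as neighbor joining could then use.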

  6. Phylogenetic convolutional neural networks in metagenomics.

    Science.gov (United States)

    Fioravanti, Diego; Giarratano, Ylenia; Maggio, Valerio; Agostinelli, Claudio; Chierici, Marco; Jurman, Giuseppe; Furlanello, Cesare

    2018-03-08

    Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case of pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space. Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and a baseline fully connected neural network, i.e., the Multi-Layer Perceptron. Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. In practice, the algorithm is implemented as a custom Keras layer that passes to the following convolutional layer not only the data but also the ranked list of neighbours of each sample, thus mimicking the case of image data transparently to the user.
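    The patristic distance that Ph-CNN uses as its proximity measure is the sum of branch lengths on the path between two tips. A small sketch follows; the parent-map tree encoding is a hypothetical choice for illustration and is not taken from the Ph-CNN code, and the subsequent sparsified MDS embedding is not shown.

```python
"""Patristic distances on a rooted tree given as a parent map with branch lengths.

Assumption: the tree is encoded as {node: (parent, branch_length)} with the root
mapped to (None, 0.0); this encoding is illustrative, not Ph-CNN's.
"""


def path_to_root(node, parent):
    """Return {ancestor: cumulative branch length from node}, including node itself."""
    dist, out = 0.0, {}
    while node is not None:
        out[node] = dist
        par, length = parent[node]
        dist += length
        node = par
    return out


def patristic(a, b, parent):
    """Sum of branch lengths on the path between a and b, via their MRCA."""
    up_a = path_to_root(a, parent)
    # Walk up from b until reaching an ancestor of a: that node is the MRCA.
    node, dist_b = b, 0.0
    while node not in up_a:
        par, length = parent[node]
        dist_b += length
        node = par
    return up_a[node] + dist_b


def patristic_matrix(tips, parent):
    """Full tip-by-tip patristic distance matrix (list of lists)."""
    return [[patristic(x, y, parent) for y in tips] for x in tips]


if __name__ == "__main__":
    # Toy tree: ((A:2,B:3)X:1,C:4)R;
    tree = {"R": (None, 0.0), "X": ("R", 1.0), "A": ("X", 2.0),
            "B": ("X", 3.0), "C": ("R", 4.0)}
    for row in patristic_matrix(["A", "B", "C"], tree):
        print(row)
```

    A matrix like this is what an MDS step would take as input to place the features (taxa) in a Euclidean space where "neighbouring" features can be defined.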

  7. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions.

    Science.gov (United States)

    Sundaram, Mekala; Willoughby, Janna R; Lichti, Nathanael I; Steele, Michael A; Swihart, Robert K

    2015-01-01

    The evolution of specific seed traits in scatter-hoarded tree species has often been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or cached each seed. We estimated 11 physical and chemical seed traits for each species, and the phylogenetic relationships between the 23 hardwood trees. Variance partitioning revealed that considerable variation in foraging investment was attributable to seed traits alone (27-73%), and to combined effects of seed traits and phylogeny of hardwood trees (5-55%). A phylogenetic PCA (pPCA) on seed traits and tree phylogeny resulted in 2 "global" axes of traits that were phylogenetically autocorrelated at the family and genus level and a third "local" axis in which traits were not phylogenetically autocorrelated. Collectively, these axes explained 30-76% of the variation in squirrel foraging investments. The first global pPCA axis, which produced large scores for seed species with thin shells, low lipid and high carbohydrate content, was negatively related to time to consume and cache seeds and travel distance to cache. The second global pPCA axis, which produced large scores for seeds with high protein, low tannin and low dormancy levels, was an important predictor of consumption time only. The local pPCA axis primarily reflected kernel mass. Although it explained only 12% of the variation in trait space and was not autocorrelated among phylogenetic clades, the local axis was related to all four squirrel foraging investments. Squirrel foraging behaviors are influenced by a combination of phylogenetically conserved and more evolutionarily labile seed traits that is consistent with a weak

  8. Bayesian additive decision trees of biomarker by treatment interactions for predictive biomarker detection and subgroup identification.

    Science.gov (United States)

    Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen

    2017-10-11

    Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. A Bayesian approach is used to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fit. A Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings compared with existing methods. We also demonstrate an application of our method in a real clinical trial.

  9. Tree height-diameter allometry across the United States.

    Science.gov (United States)

    Hulshof, Catherine M; Swenson, Nathan G; Weiser, Michael D

    2015-03-01

    The relationship between tree height and diameter is fundamental in determining community and ecosystem structure as well as estimates of biomass and carbon storage. Yet our understanding of how tree allometry relates to climate and whole organismal function is limited. We used the Forest Inventory and Analysis National Program database to determine height-diameter allometries of 2,976,937 individuals of 293 tree species across the United States. The shape of the allometric relationship was determined by comparing linear and nonlinear functional forms. Mixed-effects models were used to test for allometric differences due to climate and floristic (between angiosperms and gymnosperms) and functional groups (leaf habit and shade tolerance). Tree allometry significantly differed across the United States largely because of climate. Temperature, and to some extent precipitation, in part explained tree allometric variation. The magnitude of allometric variation due to climate, however, had a phylogenetic signal. Specifically, angiosperm allometry was more sensitive to differences in temperature than gymnosperm allometry. Most notably, angiosperm height was more negatively influenced by increasing temperature variability, whereas gymnosperm height was negatively influenced by decreasing precipitation and increasing altitude. There was little evidence to suggest that shade tolerance influenced tree allometry except for very shade-intolerant trees, which were taller for any given diameter. Tree allometry is plastic rather than fixed, and scaling parameters vary around predicted central tendencies. This allometric variation provides insight into life-history strategies, phylogenetic history, and environmental limitations at biogeographical scales.
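    One common nonlinear form for height-diameter allometry is the power law H = a·D^b, which becomes linear after taking logarithms. The sketch below fits it by ordinary least squares on the log-log scale; it is an assumption-laden illustration (the abstract does not say which functional forms were compared, and the back-transformation bias of log-scale fits is ignored here).

```python
"""Fit a power-law height-diameter allometry H = a * D**b by OLS on the log-log scale.

Assumption: the power law is only one candidate form; log-scale retransformation
bias is ignored in this sketch.
"""
import math


def fit_power_law(diameters, heights):
    """Return (a, b) minimising squared error of log(H) = log(a) + b*log(D)."""
    xs = [math.log(d) for d in diameters]
    ys = [math.log(h) for h in heights]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # scaling exponent
    a = math.exp(my - b * mx)  # normalisation constant
    return a, b


if __name__ == "__main__":
    # Synthetic trees that follow H = 2 * D**0.5 exactly.
    a, b = fit_power_law([1, 4, 9, 16], [2.0, 4.0, 6.0, 8.0])
    print(f"a = {a:.3f}, b = {b:.3f}")
```

    In a mixed-effects setting like the study's, a and b would additionally vary with climate and species group rather than being single global constants.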

  10. Tree height–diameter allometry across the United States

    Science.gov (United States)

    Hulshof, Catherine M; Swenson, Nathan G; Weiser, Michael D

    2015-01-01

    The relationship between tree height and diameter is fundamental in determining community and ecosystem structure as well as estimates of biomass and carbon storage. Yet our understanding of how tree allometry relates to climate and whole organismal function is limited. We used the Forest Inventory and Analysis National Program database to determine height–diameter allometries of 2,976,937 individuals of 293 tree species across the United States. The shape of the allometric relationship was determined by comparing linear and nonlinear functional forms. Mixed-effects models were used to test for allometric differences due to climate and floristic (between angiosperms and gymnosperms) and functional groups (leaf habit and shade tolerance). Tree allometry significantly differed across the United States largely because of climate. Temperature, and to some extent precipitation, in part explained tree allometric variation. The magnitude of allometric variation due to climate, however, had a phylogenetic signal. Specifically, angiosperm allometry was more sensitive to differences in temperature than gymnosperm allometry. Most notably, angiosperm height was more negatively influenced by increasing temperature variability, whereas gymnosperm height was negatively influenced by decreasing precipitation and increasing altitude. There was little evidence to suggest that shade tolerance influenced tree allometry except for very shade-intolerant trees, which were taller for any given diameter. Tree allometry is plastic rather than fixed, and scaling parameters vary around predicted central tendencies. This allometric variation provides insight into life-history strategies, phylogenetic history, and environmental limitations at biogeographical scales. PMID:25859325

  11. Evaluation of phylogenetic reconstruction methods using bacterial whole genomes: a simulation based study [version 1; referees: 1 approved, 2 approved with reservations]

    Directory of Open Access Journals (Sweden)

    John A. Lees

    2018-03-01

    Background: Phylogenetic reconstruction is a necessary first step in many analyses that use whole genome sequence data from bacterial populations. There are many available methods to infer phylogenies, with various advantages and disadvantages, but few unbiased comparisons of the range of approaches have been made. Methods: We simulated data from a defined “true tree” using a realistic evolutionary model. We built phylogenies from these data using a range of methods and compared the reconstructed trees to the true tree using two measures, noting the computational time needed for different phylogenetic reconstructions. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree. Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic distance based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other. Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology. We have publicly released our simulated data and code to enable further comparisons.

  12. Exploring the determinants of phylogenetic diversity and assemblage structure in conifers across temporal, spatial, and taxonomic scales

    DEFF Research Database (Denmark)

    Eiserhardt, Wolf L.; Borchsenius, Finn; Sandel, Brody Steven

    -environmental models are important elements in this framework. Here, we integrate both types of data in order to explore the determinants of forest tree diversity using the conifers as a model group. Conifers are an old, diverse (ca. 650 spp. in 6 families) and widespread group of woody plants of high ecological ... and economic importance. They are better studied than most other globally distributed groups of forest trees, allowing integrative studies with high phylogenetic and spatial resolution. We analyse phylogenetic diversity, assemblage structure, and diversification rates for regional conifer assemblages...

  13. Patterns of forest phylogenetic community structure across the United States and their possible forest health implications

    Science.gov (United States)

    Kevin M. Potter; Frank H. Koch

    2014-01-01

    The analysis of phylogenetic relationships among co-occurring tree species offers insights into the ecological organization of forest communities from an evolutionary perspective and, when employed regionally across thousands of plots, can assist in forest health assessment. Phylogenetic clustering of species, when species are more closely related than expected by...

  14. Pattern of phylogenetic diversification of the Cychrini ground beetles in the world as deduced mainly from sequence comparisons of the mitochondrial genes.

    Science.gov (United States)

    Su, Zhi-Hui; Imura, Yûki; Okamoto, Munehiro; Osawa, Syozo

    2004-02-04

    The phylogenetic position of the tribe Cychrini within the subfamily Carabinae (family Carabidae) was estimated by comparing the nucleotide sequences of the mitochondrial NADH dehydrogenase subunit 5 (ND5) gene and the nuclear 28S ribosomal DNA (rDNA). The phylogenetic trees suggest that the Cychrini is most probably the oldest lineage within the Carabinae. Phylogenetic trees were constructed by comparing the mitochondrial cytochrome C oxidase subunit I (COI) gene sequences from 33 species of the Cychrini from various localities covering the whole distribution ranges of the representative species within all the known genera in the world. The trees suggest that the Cychrini members radiated into a number of phylogenetic lineages within a short period, starting about 44 million years ago (MYA). Most of the phylogenetic lineages or sublineages are geographically linked, each consisting of a single or only a few species with little morphological differentiation in spite of their long evolutionary histories (silent or near-silent evolution [see Adv. Biophys. 36 (1999) 65; J. Mol. Evol. 53 (2001) 517]). This suggests that geographic isolation per se did not bring about conspicuous morphological differentiation. The phylogenetic lineages of the Cychrini correspond well to the taxonomically defined genera and subgenera.

  15. Evaluating the Intraspecific Interactions of Indian Rosewood (Dalbergia sissoo Roxb.) Trees in Indian Rosewood Reserve of Khuzestan Province

    Directory of Open Access Journals (Sweden)

    Y. Erfanifard

    2016-05-01

    Positive and negative (facilitative and competitive) interactions of plants are important issues in autecology and can be evaluated by spatial pattern analysis in plant ecosystems. This study investigates the intraspecific interactions of Indian rosewood (Dalbergia sissoo Roxb.) trees in the Indian Rosewood Reserve of Khuzestan province. Three 150 m × 200 m plots were selected and the spatial locations of all Indian rosewoods (239 trees) were recorded. Structurally different summary statistics (nearest-neighbour distribution function D(r), K2-index K2(r), pair correlation function g(r), and O-ring O(r)) were also implemented to analyze the spatial pattern of the trees. The distribution of Indian rosewood trees significantly followed an inhomogeneous Poisson process (α=0.05). The results of D(r) and K2(r) showed that the maximum distance to the nearest tree was 12 m and that density decreased up to this scale. The results of g(r) and O(r) also revealed significant aggregation of Indian rosewood trees at scales of 1.5 to 4 m (α=0.05). In general, it was concluded that Indian rosewood trees had positive intraspecific interactions in the Indian Rosewood Reserve of Khuzestan province and that their aggregation reflected facilitative effects on one another.
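    Of the summary statistics listed, the nearest-neighbour distribution function D(r) is the simplest: the fraction of trees whose nearest neighbour lies within distance r. The naive estimator below is a sketch only; it applies no edge correction (published analyses normally do) and uses brute-force O(n²) distance search, which is fine for a few hundred mapped trees.

```python
"""Naive empirical nearest-neighbour distance distribution D(r) for mapped trees.

Assumption: no edge correction is applied, unlike typical point-pattern software.
"""
import math


def nearest_neighbour_distances(points):
    """Distance from each (x, y) point to its closest other point (brute force)."""
    out = []
    for i, (xi, yi) in enumerate(points):
        out.append(min(math.hypot(xi - xj, yi - yj)
                       for j, (xj, yj) in enumerate(points) if j != i))
    return out


def d_function(points, r):
    """Fraction of trees whose nearest neighbour lies within distance r."""
    nn = nearest_neighbour_distances(points)
    return sum(d <= r for d in nn) / len(nn)


if __name__ == "__main__":
    stems = [(0, 0), (1, 0), (10, 0)]  # toy stem coordinates in metres
    for r in (0.5, 1.0, 9.0):
        print(r, d_function(stems, r))
```

    Comparing the observed D(r) curve with envelopes simulated under a (possibly inhomogeneous) Poisson process is what turns this descriptive curve into a test for aggregation.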

  16. Effects of species' similarity and dominance on the functional and phylogenetic structure of a plant meta-community.

    Science.gov (United States)

    Chalmandrier, L; Münkemüller, T; Lavergne, S; Thuiller, W

    2015-01-01

    Different assembly processes drive the spatial structure of meta-communities (beta-diversity). Recently, functional and phylogenetic diversities have been suggested as indicators of these assembly processes. Assuming that diversity is a good proxy for niche overlap, high beta-diversity along environmental gradients should be the result of environmental filtering while low beta-diversity should stem from competitive interactions. So far, studies trying to disentangle the relative importance of these assembly processes have provided mixed results. One reason for this may be that these studies often rely on a single measure of diversity and thus implicitly make a choice on how they account for species relative abundances and how species similarities are captured by functional traits or phylogeny. Here, we tested the effect of gradually scaling the importance of dominance (the weight given to dominant vs. rare species) and species similarity (the weight given to small vs. large similarities) on resulting beta-diversity patterns of an alpine plant meta-community. To this end, we combined recent extensions of the Hill numbers framework with Pagel's phylogenetic tree transformation approach. We included functional (based on the leaf-height-seed spectrum) and phylogenetic facets of beta-diversity in our analysis and explicitly accounted for effects of environmental and spatial covariates. We found that functional beta-diversity was high when the same weight was given to dominant vs. rare species and to large vs. small species' similarities. In contrast, phylogenetic beta-diversity was low when greater weight was given to dominant species and small species' similarities. Those results suggested that different environments along the gradients filtered different species according to their functional traits, while the same competitive lineages dominated communities across the gradients. Our results highlight that functional vs. phylogenetic facets, presence-absence vs
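    The "weight given to dominant vs. rare species" is controlled by the order q of a Hill number, qD = (Σ pᵢ^q)^(1/(1−q)), with q = 1 defined as the limit exp(Shannon entropy). The sketch below computes only these basic, abundance-based Hill numbers; the similarity-sensitive extensions and Pagel's tree transformation used in the study are not reproduced.

```python
"""Basic (taxonomic) Hill numbers of order q from species abundances.

Assumption: no functional or phylogenetic similarity is included; this is the
plain abundance-based form, not the extended framework cited in the abstract.
"""
import math


def hill_number(abundances, q):
    """Effective number of species; q=0 richness, q=1 exp(Shannon), q=2 inverse Simpson."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if math.isclose(q, 1.0):
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


if __name__ == "__main__":
    skewed = [9, 1]  # one dominant and one rare species
    for q in (0, 1, 2):
        print(q, round(hill_number(skewed, q), 4))
```

    As q increases, rare species contribute less, so the effective number of species for the skewed community falls from 2 towards 1.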

  17. Effects of logging and recruitment on community phylogenetic structure in 32 permanent forest plots of Kampong Thom, Cambodia.

    Science.gov (United States)

    Toyama, Hironori; Kajisa, Tsuyoshi; Tagane, Shuichiro; Mase, Keiko; Chhang, Phourin; Samreth, Vanna; Ma, Vuthy; Sokh, Heng; Ichihashi, Ryuji; Onoda, Yusuke; Mizoue, Nobuya; Yahara, Tetsukazu

    2015-02-19

    Ecological communities, including tropical rainforests, are rapidly changing under various disturbances caused by increasing human activities. Recently in Cambodia, illegal logging and clear-felling for agriculture have been increasing. Here, we study the effects of logging, mortality and recruitment of plot trees on phylogenetic community structure in 32 plots in Kampong Thom, Cambodia. Each plot was 0.25 ha; 28 plots were established in primary evergreen forests and four were established in secondary dry deciduous forests. Measurements were made in 1998, 2000, 2004 and 2010, and logging, recruitment and mortality of each tree were recorded. We estimated phylogeny using rbcL and matK gene sequences and quantified phylogenetic α and β diversity. Within communities, logging decreased phylogenetic diversity, and increased overall phylogenetic clustering and terminal phylogenetic evenness. Between communities, logging increased phylogenetic similarity between evergreen and deciduous plots. On the other hand, recruitment had opposite effects both within and between communities. The observed patterns can be explained by environmental homogenization under logging. Logging is biased toward particular species and toward trees with larger diameter at breast height, and forest patrol has been effective in decreasing logging. © 2015 The Author(s) Published by the Royal Society. All rights reserved.

  18. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Science.gov (United States)

    Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

    2009-01-01

    The phylogenetic analysis of nucleotide sequences, and increasingly that of amino acid sequences, is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time-consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. After phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
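    The renaming and duplicate-removal steps described above can be sketched locally as follows. This is not REFGEN's code: it assumes headers of the form ">ACCESSION description", a caller-supplied (hypothetical) accession-to-species lookup, and a conservative character set for labels; renaming labels back inside tree files is not shown.

```python
"""Rename FASTA headers to 'Species_accession' labels and drop duplicate accessions.

Assumptions: headers look like '>ACCESSION description'; species_by_acc is a
hypothetical caller-supplied lookup. Not REFGEN/TREENAMER's implementation.
"""
import re


def rename_fasta(text, species_by_acc):
    """Return (renamed FASTA text, list of duplicate accessions that were dropped)."""
    out, seen, dropped = [], set(), []
    keep = False
    for line in text.splitlines():
        if line.startswith(">"):
            acc = line[1:].split()[0]
            keep = acc not in seen
            if not keep:
                dropped.append(acc)  # duplicate accession: reported, sequence skipped
                continue
            seen.add(acc)
            species = species_by_acc.get(acc, "unknown")
            # Replace characters that many tree programs reject with underscores.
            label = re.sub(r"[^A-Za-z0-9_.]", "_", f"{species}_{acc}")
            out.append(">" + label)
        elif keep:
            out.append(line)
    return "\n".join(out) + "\n", dropped


if __name__ == "__main__":
    fasta = ">AB123 some gene\nACGT\n>CD456 other\nGGCC\n>AB123 dup\nTTTT\n"
    renamed, dups = rename_fasta(fasta, {"AB123": "Homo sapiens", "CD456": "Mus musculus"})
    print(renamed, dups)
```

    Keeping the accession in every label is what lets the same mapping be reversed after tree building, as the abstract describes for tree files.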

  19. A Maximum Parsimony Model to Reconstruct Phylogenetic Network in Honey Bee Evolution

    OpenAIRE

    Usha Chouhan; K. R. Pardasani

    2007-01-01

    Phylogenies, the evolutionary histories of groups of species, are one of the most widely used tools throughout the life sciences, as well as objects of research within systematic and evolutionary biology. Every phylogenetic reconstruction produces trees. These trees represent the evolutionary histories of many groups of organisms, bacteria due to horizontal gene transfer and plants due to process of hybridization. The process of gene transfer in bacteria and hyb...

  20. Molecular phylogenetics of mastodon and Tyrannosaurus rex.

    Science.gov (United States)

    Organ, Chris L; Schweitzer, Mary H; Zheng, Wenxia; Freimark, Lisa M; Cantley, Lewis C; Asara, John M

    2008-04-25

    We report a molecular phylogeny for a nonavian dinosaur, extending our knowledge of trait evolution within nonavian dinosaurs into the macromolecular level of biological organization. Fragments of collagen alpha1(I) and alpha2(I) proteins extracted from fossil bones of Tyrannosaurus rex and Mammut americanum (mastodon) were analyzed with a variety of phylogenetic methods. Despite missing sequence data, the mastodon groups with elephant and the T. rex groups with birds, consistent with predictions based on genetic and morphological data for mastodon and on morphological data for T. rex. Our findings suggest that molecular data from long-extinct organisms may have the potential for resolving relationships at critical areas in the vertebrate evolutionary tree that have, so far, been phylogenetically intractable.

  1. Fast Computations for Measures of Phylogenetic Beta Diversity.

    Directory of Open Access Journals (Sweden)

    Constantinos Tsirogiannis

    For many applications in ecology, it is important to examine the phylogenetic relations between two communities of species. More formally, let T be a phylogenetic tree and let A and B be two samples of its tips, representing the examined communities. We want to compute a value that expresses the phylogenetic diversity between A and B in T. There exist several measures that can do this; these are the so-called phylogenetic beta diversity (β-diversity) measures. Two popular measures of this kind are the Community Distance (CD) and the Common Branch Length (CBL). In most applications, it is not sufficient to compute the value of a beta diversity measure for two communities A and B; we also want to know whether this value is relatively large or small compared to all possible pairs of communities in T that have the same size. To decide this, the ideal approach is to compute a standardised index that involves the mean and the standard deviation of this measure among all pairs of species samples that have the same number of elements as A and B. However, no method exists for computing this index exactly and efficiently for CD and CBL. We present analytical expressions for computing the expectation and the standard deviation of CD and CBL. Based on these expressions, we describe efficient algorithms for computing the standardised indices of the two measures. Using standard algorithmic analysis, we provide guarantees on the theoretical efficiency of our algorithms. We implemented our algorithms and measured their efficiency in practice. Our implementations compute the standardised indices of CD and CBL in less than twenty seconds for a hundred pairs of samples on trees with 7 × 10^4 tips. Our implementations are available through the R package PhyloMeasures.
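    The abstract defines CD only in words. As a minimal sketch, assuming CD is the mean tip-to-tip path length over all pairs with one tip in A and one in B, the Python below computes CD on a hypothetical four-tip tree and standardises it by brute-force enumeration of every pair of samples with the same sizes as A and B (assumed to be drawn independently, so they may overlap). This enumeration grows combinatorially and is exactly what the paper's analytical expressions avoid; it is not the PhyloMeasures implementation, and CBL is not shown. The names TREE, TIPS and the helper functions are illustrative only.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical toy tree: child -> (parent, branch length); "root" has no parent.
TREE = {
    "x": ("root", 1.0), "y": ("root", 2.0),
    "a": ("x", 1.0), "b": ("x", 1.0),
    "c": ("y", 0.5), "d": ("y", 0.5),
}
TIPS = ["a", "b", "c", "d"]


def ancestors(node, tree):
    """Map node and each of its ancestors to the path length from node."""
    dist, acc = {node: 0.0}, 0.0
    while node in tree:
        node, length = tree[node]
        acc += length
        dist[node] = acc
    return dist


def tip_distance(u, v, tree):
    """Path length between two tips; the minimum over shared ancestors is at their LCA."""
    du, dv = ancestors(u, tree), ancestors(v, tree)
    return min(du[n] + dv[n] for n in du if n in dv)


def community_distance(A, B, tree):
    """CD: mean path length over all pairs (u, v) with u in A and v in B."""
    return mean(tip_distance(u, v, tree) for u in A for v in B)


def standardised_cd(A, B, tips, tree):
    """(CD - mean) / sd over every pair of samples of sizes |A| and |B|.

    Brute force, assuming independent samples; only feasible for tiny trees.
    """
    null = [community_distance(A2, B2, tree)
            for A2 in combinations(tips, len(A))
            for B2 in combinations(tips, len(B))]
    return (community_distance(A, B, tree) - mean(null)) / pstdev(null)


print(community_distance(["a", "b"], ["c", "d"], TREE))  # 4.5
print(standardised_cd(["a"], ["c"], TIPS, TREE))
```

    A positive standardised value means the two communities are phylogenetically further apart than same-sized random samples from the tree.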

  2. Bias in phylogenetic reconstruction of vertebrate rhodopsin sequences.

    Science.gov (United States)

    Chang, B S; Campbell, D L

    2000-08-01

    Two spurious nodes were found in phylogenetic analyses of vertebrate rhodopsin sequences in comparison with well-established vertebrate relationships. These spurious reconstructions were well supported in bootstrap analyses and occurred independently of the method of phylogenetic analysis used (parsimony, distance, or likelihood). Use of this data set of vertebrate rhodopsin sequences allowed us to exploit established vertebrate relationships, as well as the considerable amount known about the molecular evolution of this gene, in order to identify important factors contributing to the spurious reconstructions. Simulation studies using parametric bootstrapping indicate that it is unlikely that the spurious nodes in the parsimony analyses are due to long branches or other topological effects. Rather, they appear to be due to base compositional bias at third positions, codon bias, and convergent evolution at nucleotide positions encoding the hydrophobic residues isoleucine, leucine, and valine. LogDet distance methods, as well as maximum-likelihood methods that allow for nonstationary changes in base composition, reduce but do not entirely eliminate support for the spurious resolutions. Inclusion of five additional rhodopsin sequences in the phylogenetic analyses largely corrected one of the spurious reconstructions while leaving the other unaffected. The additional sequences were not only closer to the corrected node but also had intermediate levels of base composition and codon bias compared with neighboring sequences on the tree. This study shows that the spurious reconstructions can be corrected either by excluding third positions together with positions encoding the amino acids Ile, Val, and Leu (which may not be ideal, as these sites can contain useful phylogenetic signal for other parts of the tree), or by adding sequences that reduce problems associated with convergent evolution.
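    The abstract attributes the spurious nodes partly to base compositional bias at third codon positions and mentions excluding third positions as one remedy. A minimal Python sketch of both steps follows, assuming ungapped, in-frame coding sequences. GC content at third positions (GC3) is used here as one common summary of that bias; the abstract does not say which statistic the authors used. Excluding the Ile, Val and Leu codons as well would additionally need a codon-to-amino-acid table, omitted here.

```python
def third_positions(seq):
    """Bases at codon position 3 of an in-frame coding sequence (partial last codon ignored)."""
    return [seq[i + 2] for i in range(0, len(seq) - 2, 3)]


def gc3(seq):
    """Fraction of G or C among third codon positions."""
    thirds = third_positions(seq.upper())
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def drop_third_positions(seq):
    """Keep only codon positions 1 and 2, e.g. before rebuilding a tree."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


print(gc3("ATGCTGAAA"))                   # 0.6666666666666666
print(drop_third_positions("ATGCTGAAA"))  # ATCTAA
```

    Comparing GC3 across taxa gives a quick check for the compositional heterogeneity that the authors link to the spurious groupings.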

  3. Phylogenetic relationships among populations of Pristurus rupestris Blanford, 1874 (Sauria: Sphaerodactylidae) in southern Iran

    OpenAIRE

    YOUSOFI, SUGOL; POUYANI, ESKANDAR RASTEGAR; HOJATI, VIDA

    2015-01-01

    We examined intraspecific relationships of the subspecies Pristurus rupestris iranicus from the northern Persian Gulf area (Hormozgan, Bushehr, and Sistan and Baluchestan provinces). Phylogenetic relationships among these samples were estimated based on the mitochondrial cytochrome b gene. We used three methods of phylogenetic tree reconstruction (maximum likelihood, maximum parsimony, and Bayesian inference). The sampled populations were divided into 5 clades but exhibit little genetic diver...

  4. Inferring Phylogenetic Networks from Gene Order Data

    Directory of Open Access Journals (Sweden)

    Alexey Anatolievich Morozov

    2013-01-01

    Existing algorithms allow us to infer phylogenetic networks from sequences (DNA, protein, or binary), sets of trees, and distance matrices, but there are no methods to build them using gene order data as an input. Here we describe several methods to build split networks from gene order data, perform simulation studies, and use our methods for analyzing and interpreting different real gene order datasets. All proposed methods are based on intermediate data, which can be generated from the genome structures under study and used as an input for network construction algorithms. Three intermediates are used: a set of jackknife trees, a distance matrix, and a binary encoding. According to simulations and case studies, the best intermediates are jackknife trees and the distance matrix (when used with the Neighbor-Net algorithm). Binary encoding can also be useful, but only when the methods mentioned above cannot be used.
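    The abstract names a binary encoding and a distance matrix as intermediates but does not define them. As an illustrative assumption (not necessarily the author's encoding), the Python sketch below encodes each unsigned linear gene order by the presence or absence of every gene adjacency, and computes a simple breakpoint distance that could fill a distance matrix. The genome names and gene orders are hypothetical.

```python
def adjacencies(order):
    """Unsigned adjacencies of a linear gene order: [1, 2, 3] -> {{1, 2}, {2, 3}}."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def binary_encoding(genomes):
    """One 0/1 column per adjacency seen in any genome (presence/absence)."""
    columns = sorted({a for g in genomes.values() for a in adjacencies(g)}, key=sorted)
    matrix = {name: [int(a in adjacencies(g)) for a in columns]
              for name, g in genomes.items()}
    return columns, matrix


def breakpoint_distance(g1, g2):
    """Adjacencies of g1 missing from g2 (symmetric for linear orders of the same genes)."""
    return len(adjacencies(g1) - adjacencies(g2))


genomes = {"G1": [1, 2, 3, 4], "G2": [1, 3, 2, 4]}  # hypothetical gene orders
columns, matrix = binary_encoding(genomes)
print(matrix)  # {'G1': [1, 0, 1, 0, 1], 'G2': [0, 1, 1, 1, 0]}
print(breakpoint_distance(genomes["G1"], genomes["G2"]))  # 2
```

    Either output could then be passed to a split-network method such as Neighbor-Net in separate software.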

  5. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    Directory of Open Access Journals (Sweden)

    Shu-Chuan Chen

    The MixtureTree Annotator, written in Java, allows the user to automatically color any phylogenetic tree in Newick format generated by any phylogeny reconstruction program and to output the result as a Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator offers a unique advantage over other programs that perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. To visualize the resulting output file, a modified version of FigTree is used. Certain popular programs that lack good built-in visualization tools, for example MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious, may give results with human errors because colors must be added manually to each node, or have other limitations, for example coloring only by a number, such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy to use, while still allowing the user full control over the coloring and annotating process.
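    The core idea of coloring a Newick tree by sequence name and writing a Nexus file can be sketched in Python as below. This is not the MixtureTree Annotator's code: the prefix-based grouping rule, the palette, and the tree are hypothetical, and the `[&!color=#rrggbb]` comment follows the FigTree-style annotation convention as an assumption that should be checked against the FigTree version in use. The regular expression only handles unquoted leaf labels.

```python
import re

# Unquoted leaf labels follow "(" or "," and end at ":", ",", ")" or ";".
LEAF = re.compile(r"(?<=[(,])([^(),:;]+)")


def colour_nexus(newick, palette):
    """Write a NEXUS file whose taxa carry colours chosen by sequence-name prefix.

    palette maps a name prefix (a hypothetical grouping rule) to a hex colour;
    taxa matching no prefix are left uncoloured.
    """
    names = LEAF.findall(newick)
    lines = ["#NEXUS", "begin taxa;", f"\tdimensions ntax={len(names)};", "\ttaxlabels"]
    for name in names:
        colour = next((c for prefix, c in palette.items() if name.startswith(prefix)), None)
        lines.append(f"\t\t{name}" + (f"[&!color={colour}]" if colour else ""))
    lines += ["\t;", "end;", "", "begin trees;", f"\ttree tree_1 = {newick}", "end;"]
    return "\n".join(lines)


print(colour_nexus("((HK_01:0.1,HK_02:0.2):0.05,US_01:0.3);",
                   {"HK_": "#ff0000", "US_": "#0000ff"}))
```

    Deriving colours from names rather than assigning them node by node is what removes the manual step the abstract identifies as a source of human error.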

  6. Evolutionary patterns of range size, abundance and species richness in Amazonian angiosperm trees

    Directory of Open Access Journals (Sweden)

    Kyle Dexter

    2016-09-01

    Amazonian tree species vary enormously in their total abundance and range size, while Amazonian tree genera vary greatly in species richness. The drivers of this variation are not well understood. Here, we construct a phylogenetic hypothesis that represents half of Amazonian tree genera in order to contribute to explaining the variation. We find several clear, broad-scale patterns. Firstly, there is significant phylogenetic signal for all three characteristics; closely related genera tend to have similar numbers of species and similar mean range size and abundance. Additionally, the species richness of genera shows a significant, negative relationship with the mean range size and abundance of their constituent species. Our results suggest that phylogenetically correlated intrinsic factors, namely traits of the genera themselves, shape among-lineage variation in range size, abundance and species richness. We postulate that tree stature may be one particularly relevant trait. However, other traits may also be relevant, and our study reinforces the need for ambitious compilations of trait data for Amazonian trees. In the meantime, our study shows how large-scale phylogenies can help to elucidate, and contribute to explaining, macroecological and macroevolutionary patterns in hyperdiverse, yet poorly understood regions like the Amazon Basin.

  7. Molecular phylogenetics and historical biogeography of Rhinolophus bats.

    Science.gov (United States)

    Stoffberg, Samantha; Jacobs, David S; Mackie, Iain J; Matthee, Conrad A

    2010-01-01

    The phylogenetic relationships within the horseshoe bats (genus Rhinolophus) are poorly resolved, particularly at deeper levels within the tree. We present a better-resolved phylogenetic hypothesis for 30 rhinolophid species based on parsimony and Bayesian analyses of the mitochondrial cytochrome b gene and three nuclear introns (TG, THY and PRKC1). Strong support was found for the existence of two geographic clades within the monophyletic Rhinolophidae: an African group and an Oriental assemblage. The relaxed Bayesian clock method indicated that the two rhinolophid clades diverged approximately 35 million years ago and results from Dispersal Vicariance (DIVA) analysis suggest that the horseshoe bats arose in Asia and subsequently dispersed into Europe and Africa.

  8. Visualizing Biological Data in Museums: Visitor Learning with an Interactive Tree of Life Exhibit

    Science.gov (United States)

    Horn, Michael S.; Phillips, Brenda C.; Evans, Evelyn Margaret; Block, Florian; Diamond, Judy; Shen, Chia

    2016-01-01

    In this study, we investigate museum visitor learning and engagement at an interactive visualization of an evolutionary tree of life consisting of over 70,000 species. The study was conducted at two natural history museums where visitors collaboratively explored the tree of life using direct touch gestures on a multi-touch tabletop display. In the…

  9. Phylogenetic structure in tropical hummingbird communities

    DEFF Research Database (Denmark)

    Graham, Catherine H; Parra, Juan L; Rahbek, Carsten

    2009-01-01

    How biotic interactions, current and historical environment, and biogeographic barriers determine community structure is a fundamental question in ecology and evolution, especially in diverse tropical regions. To evaluate patterns of local and regional diversity, we quantified the phylogenetic ... composition of 189 hummingbird communities in Ecuador. We assessed how species and phylogenetic composition changed along environmental gradients and across biogeographic barriers. We show that humid, low-elevation communities are phylogenetically overdispersed (coexistence of distant relatives), a pattern ... that is consistent with the idea that competition influences the local composition of hummingbirds. At higher elevations communities are phylogenetically clustered (coexistence of close relatives), consistent with the expectation of environmental filtering, which may result from the challenge of sustaining...
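    The abstract does not name the metric used to call a community overdispersed or clustered. One widely used option is the net relatedness index (NRI), a standardised mean pairwise phylogenetic distance (MPD). The Python sketch below computes it on hypothetical distances, with exhaustive enumeration of same-sized draws from a tiny species pool as the null model; real analyses usually randomise over a much larger pool instead.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical pairwise phylogenetic distances in a four-species regional pool.
D = {
    "a": {"b": 2.0, "c": 4.5, "d": 4.5},
    "b": {"a": 2.0, "c": 4.5, "d": 4.5},
    "c": {"a": 4.5, "b": 4.5, "d": 1.0},
    "d": {"a": 4.5, "b": 4.5, "c": 1.0},
}


def mpd(community, dist):
    """Mean pairwise phylogenetic distance among species in a community (>= 2 species)."""
    return mean(dist[u][v] for u, v in combinations(community, 2))


def nri(community, pool, dist):
    """Net relatedness index against every same-sized draw from the pool.

    Positive values suggest clustering (close relatives co-occur);
    negative values suggest overdispersion (distant relatives co-occur).
    """
    null = [mpd(sample, dist) for sample in combinations(pool, len(community))]
    return -(mpd(community, dist) - mean(null)) / pstdev(null)


print(nri(["c", "d"], list(D), D) > 0)  # True: clustered
print(nri(["a", "c"], list(D), D) < 0)  # True: overdispersed
```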

  10. Assessment of phylogenetic sensitivity for reconstructing HIV-1 epidemiological relationships.

    Science.gov (United States)

    Beloukas, Apostolos; Magiorkinis, Emmanouil; Magiorkinis, Gkikas; Zavitsanou, Asimina; Karamitros, Timokratis; Hatzakis, Angelos; Paraskevis, Dimitrios

    2012-06-01

    Phylogenetic analysis has been extensively used as a tool for the reconstruction of epidemiological relations for research or for forensic purposes. Our objective was to assess the sensitivity of different phylogenetic methods and various phylogenetic programs for reconstructing epidemiological links among HIV-1-infected patients, that is, the probability of revealing a true transmission relationship. Multiple datasets (90) were prepared consisting of HIV-1 sequences in protease (PR) and partial reverse transcriptase (RT) sampled from patients with a documented epidemiological relationship (target population) and from unrelated individuals (control population) belonging to the same HIV-1 subtype as the target population. The datasets varied in the number, geographic origin and transmission risk groups of the sequences in the control population. Phylogenetic trees were inferred by neighbor-joining (NJ), maximum likelihood heuristics (hML) and Bayesian methods. All clusters of sequences belonging to the target population were correctly reconstructed by NJ and Bayesian methods, receiving high bootstrap and posterior probability (PP) support, respectively. On the other hand, TreePuzzle failed to reconstruct or provide significant support for several clusters; high puzzling step support was associated with the inclusion of control sequences from the same geographic area as the target population. In contrast, all clusters were correctly reconstructed by hML as implemented in PhyML 3.0, receiving high bootstrap support. We report that under the conditions of our study, hML using PhyML, NJ and Bayesian methods were the most sensitive for the reconstruction of epidemiological links, mostly from sexually infected individuals. Copyright © 2012 Elsevier B.V. All rights reserved.
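    Neighbor-joining (NJ) is one of the methods compared above. As a reminder of how it works, here is a compact textbook Saitou and Nei sketch in Python, not the implementation used in the study. It repeatedly joins the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the row sum of distances, and is applied to the classic five-taxon additive distance matrix used in many textbook explanations. With additive distances NJ recovers the generating tree exactly.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining on a symmetric distance matrix (dict of dicts, >= 3 taxa).

    Returns an unrooted tree as a Newick string and its total branch length.
    """
    dm = {a: {b: float(dist[a][b]) for b in names if b != a} for a in names}
    label = {n: n for n in names}  # Newick text of each active cluster
    total, next_id = 0.0, 0
    while len(dm) > 3:
        n = len(dm)
        r = {i: sum(dm[i].values()) for i in dm}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(dm, 2),
                   key=lambda p: (n - 2) * dm[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = 0.5 * dm[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = dm[i][j] - li
        u, next_id = f"_n{next_id}", next_id + 1
        label[u] = f"({label[i]}:{li:g},{label[j]}:{lj:g})"
        total += li + lj
        # Distance from the new node u to every remaining cluster k.
        dm[u] = {}
        for k in list(dm):
            if k not in (i, j, u):
                dm[u][k] = dm[k][u] = 0.5 * (dm[i][k] + dm[j][k] - dm[i][j])
        for x in (i, j):
            del dm[x]
            for k in dm:
                dm[k].pop(x, None)
    # Resolve the last three clusters as a trifurcation (three-point formula).
    a, b, c = dm
    la = 0.5 * (dm[a][b] + dm[a][c] - dm[b][c])
    lb, lc = dm[a][b] - la, dm[a][c] - la
    total += la + lb + lc
    return f"({label[a]}:{la:g},{label[b]}:{lb:g},{label[c]}:{lc:g});", total


M = {
    "a": {"b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3},
}
tree, length = neighbor_joining(list(M), M)
print(tree)    # (d:2,e:1,(c:4,(a:2,b:3):3):2);
print(length)  # 17.0
```

    Ties in Q are broken by the first pair encountered, so tie-heavy inputs can yield a different but equally valid join order.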

  11. Toward a phylogenetic chronology of ancient Gaulish, Celtic, and Indo-European.

    Science.gov (United States)

    Forster, Peter; Toth, Alfred

    2003-07-22

    Indo-European is the largest and best-documented language family in the world, yet the reconstruction of the Indo-European tree, first proposed in 1863, has remained controversial. Complications may include ascertainment bias when choosing the linguistic data, and disregard for the wave model of 1872 when attempting to reconstruct the tree. Essentially analogous problems were solved in evolutionary genetics by DNA sequencing and phylogenetic network methods, respectively. We now adapt these tools to linguistics, and analyze Indo-European language data, focusing on Celtic and in particular on the ancient Celtic language of Gaul (modern France), by using bilingual Gaulish-Latin inscriptions. Our phylogenetic network reveals an early split of Celtic within Indo-European. Interestingly, the next branching event separates Gaulish (Continental Celtic) from the British (Insular Celtic) languages, with Insular Celtic subsequently splitting into Brythonic (Welsh, Breton) and Goidelic (Irish and Scottish Gaelic). Taken together, the network thus suggests that the Celtic language arrived in the British Isles as a single wave (and then differentiated locally), rather than in the traditional two-wave scenario ("P-Celtic" to Britain and "Q-Celtic" to Ireland). The phylogenetic network furthermore permits the estimation of time in analogy to genetics, and we obtain tentative dates for Indo-European at 8100 BC +/- 1,900 years, and for the arrival of Celtic in Britain at 3200 BC +/- 1,500 years. The phylogenetic method is easily executed by hand and promises to be an informative approach for many problems in historical linguistics.

  12. Quartet-based methods to reconstruct phylogenetic networks.

    Science.gov (United States)

    Yang, Jialiang; Grünewald, Stefan; Xu, Yifei; Wan, Xiu-Feng

    2014-02-20

    Phylogenetic networks are employed to visualize evolutionary relationships among a group of nucleotide sequences, genes or species when reticulate events like hybridization, recombination, reassortment and horizontal gene transfer are believed to be involved. In comparison to traditional distance-based methods, quartet-based methods consider more information in the reconstruction process and thus have the potential to be more accurate. We introduce QuartetSuite, which includes a set of new quartet-based methods, namely QuartetS, QuartetA, and QuartetM, to reconstruct phylogenetic networks from nucleotide sequences. We tested their performance and compared them with other popular methods on two simulated nucleotide sequence data sets: one generated from a tree topology and the other from a complicated evolutionary history containing three reticulate events. We further validated these methods on two real data sets: a bacterial data set consisting of seven concatenated genes of 36 bacterial species and an influenza data set related to recently emerging H7N9 low pathogenic avian influenza viruses in China. QuartetS, QuartetA, and QuartetM have the potential to accurately reconstruct evolutionary scenarios ranging from simple branching trees to complicated networks containing many reticulate events. These methods could provide insights into complicated biological evolutionary processes such as bacterial taxonomy and the reassortment of influenza viruses.
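    The abstract does not describe how QuartetS, QuartetA and QuartetM resolve individual quartets. For orientation only, the Python sketch below uses the classic four-point criterion: for four taxa i, j, k, l, choose the split ij|kl that minimises d(i, j) + d(k, l). A quartet-based method then combines such (possibly conflicting) four-taxon splits into a tree or network. The distance matrix is hypothetical.

```python
from itertools import combinations

# Hypothetical additive distances between five taxa.
M = {
    "a": {"b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3},
}


def quartet_split(quartet, dist):
    """Four-point criterion: return the split ij|kl minimising d(i,j) + d(k,l)."""
    i, j, k, l = quartet
    sums = {
        ((i, j), (k, l)): dist[i][j] + dist[k][l],
        ((i, k), (j, l)): dist[i][k] + dist[j][l],
        ((i, l), (j, k)): dist[i][l] + dist[j][k],
    }
    return min(sums, key=sums.get)


def all_quartet_splits(taxa, dist):
    """Resolve every four-taxon subset of taxa."""
    return {q: quartet_split(q, dist) for q in combinations(taxa, 4)}


for q, split in all_quartet_splits(list(M), M).items():
    print(q, "->", split)
```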

  13. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    Science.gov (United States)

    2010-01-01

    Background: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results: In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for

  14. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    Directory of Open Access Journals (Sweden)

    Dawyndt Peter

    2010-01-01

    Background: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results: In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure makes it easy to evaluate and visualize the

  15. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

    Science.gov (United States)

    Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

    2010-01-30

    Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial
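    The records above infer 16S rRNA trees with NJ and UPGMA. UPGMA is simple enough to sketch in a few lines: it repeatedly merges the closest pair of clusters, places the new node at half their distance, and updates distances as size-weighted averages. The Python below is a generic textbook version, not the authors' pipeline, run on hypothetical ultrametric distances.

```python
from itertools import combinations


def upgma(names, dist):
    """UPGMA (average-linkage) clustering; returns a rooted, ultrametric Newick string."""
    dm = {a: {b: float(dist[a][b]) for b in names if b != a} for a in names}
    size = {n: 1 for n in names}
    height = {n: 0.0 for n in names}
    label = {n: n for n in names}
    next_id = 0
    while len(dm) > 1:
        i, j = min(combinations(dm, 2), key=lambda p: dm[p[0]][p[1]])
        h = dm[i][j] / 2  # the new node sits at half the joining distance
        u, next_id = f"_c{next_id}", next_id + 1
        label[u] = f"({label[i]}:{h - height[i]:g},{label[j]}:{h - height[j]:g})"
        height[u], size[u] = h, size[i] + size[j]
        dm[u] = {}
        for k in list(dm):
            if k not in (i, j, u):
                # Size-weighted average of the two merged clusters' distances.
                dm[u][k] = dm[k][u] = (size[i] * dm[i][k] + size[j] * dm[j][k]) / size[u]
        for x in (i, j):
            del dm[x]
            for k in dm:
                dm[k].pop(x, None)
    (root,) = dm
    return label[root] + ";"


# Hypothetical ultrametric distances between four species.
M = {
    "A": {"B": 2, "C": 6, "D": 10},
    "B": {"A": 2, "C": 6, "D": 10},
    "C": {"A": 6, "B": 6, "D": 10},
    "D": {"A": 10, "B": 10, "C": 10},
}
print(upgma(list(M), M))  # (D:5,(C:3,(A:1,B:1):2):2);
```

    Unlike NJ, UPGMA assumes a constant rate (ultrametric distances), which is one reason the two methods can give different trees from the same 16S rRNA data.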

  16. Phylogenetic Patterns of Extinction Risk in the Eastern Arc Ecosystems, an African Biodiversity Hotspot

    OpenAIRE

    Yessoufou, Kowiyou; Daru, Barnabas H.; Davies, T. Jonathan

    2012-01-01

    There is an urgent need to drastically reduce the rate at which biodiversity is declining worldwide. Phylogenetic methods are increasingly being recognised as providing a useful framework for predicting future losses and guiding pre-emptive conservation actions. In this study, we used a reconstructed phylogenetic tree of angiosperm species of the Eastern Arc Mountains, an important African biodiversity hotspot, and described the distribution of extinction risk across taxonomic ...

  17. Comments on the gonotyl of Proctocaecum macroclemidis (Tkach and Snyder, 2003) n. comb. (Digenea: Acanthostomidae: Acanthostominae), with a key to the genera of acanthostominae and new phylogenetic tree for Proctocaecum Baugh, 1957.

    Science.gov (United States)

    Brooks, Daniel R

    2004-06-01

    The species recently described as Acanthostomum macroclemidis possesses the gonotyl in the form of a solid muscular pad uniquely diagnostic for species of Proctocaecum and is accordingly transferred to that genus. An artificial key to the 5 acanthostomine genera, as well as an updated phylogenetic hypothesis for the 10 known species of Proctocaecum, based on 11 characters and including 2 species described since the last phylogenetic analysis, are presented. The single most parsimonious phylogenetic tree with a consistency index of 87.5% suggests that Proctocaecum originated in Africa and spread to North America and South America before the breakup of Pangaea. As a result, the 2 North American and 1 South American species are most closely related to different African members of the genus. African and Indo-Pacific species inhabit crocodylids; hence, the occurrence of North American species in alligatorids and chelonians and a South American species in alligatorids are the result of host switches.
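    The consistency index (CI) reported above is the minimum conceivable number of character changes divided by the number of changes the tree actually requires, so a CI of 87.5% means the tree needs 1/0.875, or about 1.14, times the minimum number of steps. The Python sketch below computes CI with Fitch parsimony on a hypothetical binary tree and character matrix. It is a generic illustration; authors differ on details such as excluding uninformative characters, and the abstract does not say how its CI was computed.

```python
def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree of nested 2-tuples.

    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):  # leaf
        return {states[tree]}, 0
    (left, cl), (right, cr) = fitch(tree[0], states), fitch(tree[1], states)
    common = left & right
    if common:
        return common, cl + cr
    return left | right, cl + cr + 1  # disjoint sets cost one extra change


def consistency_index(tree, matrix):
    """CI = minimum conceivable steps / steps required on this tree."""
    n_chars = len(next(iter(matrix.values())))
    min_steps = tree_steps = 0
    for c in range(n_chars):
        column = {leaf: row[c] for leaf, row in matrix.items()}
        min_steps += len(set(column.values())) - 1
        tree_steps += fitch(tree, column)[1]
    return min_steps / tree_steps if tree_steps else 1.0


# Hypothetical four-taxon tree and two binary characters.
TREE = (("A", "B"), ("C", "D"))
MATRIX = {"A": "00", "B": "01", "C": "11", "D": "10"}
print(consistency_index(TREE, MATRIX))  # 0.6666666666666666
```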

  18. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Directory of Open Access Journals (Sweden)

    Guy Leonard

    2009-01-01

    The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends.
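    The two steps described above, relabelling FASTA records from species name and/or accession number and later renaming the leaves of the resulting tree, can be sketched in Python. This is not REFGEN or TREENAMER code. It assumes a header layout of "ACCESSION Genus species [description]", replaces punctuation in labels with underscores, drops repeated accessions while reporting them, and renames only unquoted Newick leaf labels. The accessions and sequences in the example are hypothetical.

```python
import re


def parse_fasta(text):
    """Return a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def make_label(header, mode="both"):
    """Build a label; assumed header layout: 'ACCESSION Genus species [description]'."""
    parts = header.split()
    accession = parts[0]
    species = "_".join(parts[1:3]) if len(parts) >= 3 else "unknown"
    label = {"accession": accession, "species": species,
             "both": f"{species}_{accession}"}[mode]
    return re.sub(r"\W", "_", label)  # avoid spaces and punctuation in tree programs


def relabel_fasta(text, mode="both"):
    """Rename every record; drop repeated accessions and report them."""
    seen, removed, out = set(), [], []
    for header, seq in parse_fasta(text):
        accession = header.split()[0]
        if accession in seen:
            removed.append(accession)
            continue
        seen.add(accession)
        out.append(f">{make_label(header, mode)}\n{seq}")
    return "\n".join(out) + "\n", removed


def rename_tree(newick, mapping):
    """Replace unquoted Newick leaf labels using mapping; unknown labels are kept."""
    return re.sub(r"(?<=[(,])([^(),:;]+)",
                  lambda m: mapping.get(m.group(1), m.group(1)), newick)


fasta = (">ACC001.1 Homo sapiens opsin\nATGC\nATGC\n"
         ">ACC002.1 Mus musculus opsin\nATGA\n"
         ">ACC001.1 Homo sapiens opsin\nATGC\n")
renamed, dropped = relabel_fasta(fasta)
print(renamed)
print(dropped)  # ['ACC001.1']
print(rename_tree("(ACC001_1:0.1,ACC002_1:0.2);", {"ACC001_1": "Homo_sapiens"}))
```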

  19. CONSEL: for assessing the confidence of phylogenetic tree selection.

    Science.gov (United States)

    Shimodaira, H; Hasegawa, M

    2001-12-01

    CONSEL is a program to assess the confidence of the tree selection by giving the p-values for the trees. The main thrust of the program is to calculate the p-value of the Approximately Unbiased (AU) test using the multi-scale bootstrap technique. This p-value is less biased than other conventional p-values such as the Bootstrap Probability (BP), the Kishino-Hasegawa (KH) test, the Shimodaira-Hasegawa (SH) test, and the Weighted Shimodaira-Hasegawa (WSH) test. CONSEL calculates all these p-values from the output of phylogeny program packages such as Molphy, PAML, and PAUP*. Furthermore, CONSEL is applicable to a wide class of problems where the BPs are available. The programs are written in C. The source code for Unix and the executable binary for DOS are available at http://www.ism.ac.jp/~shimo/ (contact: shimo@ism.ac.jp).
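    The AU test needs multiscale bootstrap fitting, which is beyond a short example, but the simplest quantity in the list above, the bootstrap probability (BP), can be sketched. The Python below uses RELL resampling: alignment sites are resampled with replacement and per-site log-likelihoods are re-summed rather than re-optimising each tree. This is a generic illustration with hypothetical input values, not CONSEL's code or input format.

```python
import random


def rell_bootstrap_probabilities(site_lnl, n_boot=1000, seed=1):
    """Bootstrap probability (BP) of each candidate tree via RELL resampling.

    site_lnl[t][s] is the log-likelihood of alignment site s under tree t.
    Each replicate resamples sites with replacement and counts which tree
    has the highest summed log-likelihood (ties go to the lower index).
    """
    rng = random.Random(seed)
    n_trees, n_sites = len(site_lnl), len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(n_boot):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(row[s] for s in sample) for row in site_lnl]
        wins[max(range(n_trees), key=totals.__getitem__)] += 1
    return [w / n_boot for w in wins]


site_lnl = [
    [-2.1, -3.0, -1.7, -2.4, -2.2, -3.1],  # tree 1 (hypothetical values)
    [-2.0, -3.2, -1.9, -2.3, -2.5, -3.0],  # tree 2
]
print(rell_bootstrap_probabilities(site_lnl))
```

    Because every replicate reuses the same per-site values, BP from RELL is cheap, but as the abstract notes, it is a more biased confidence measure than the AU p-value.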

  20. Let's jump in: A phylogenetic study of the great basin springfishes and poolfishes, Crenichthys and Empetrichthys (Cyprinodontiformes: Goodeidae).

    Directory of Open Access Journals (Sweden)

    D Cooper Campbell

    North America's Great Basin has long been of interest to biologists because of its high level of organismal endemicity throughout its endorheic watersheds. One example of such a group is the subfamily Empetrichthyinae. In this paper, we analyzed the relationships of the Empetrichthyinae and assessed the validity of the subspecies designations given by Williams and Wilde within the group using concatenated phylogenetic tree estimation and species tree estimation. Samples from 19 populations were included, covering the entire distribution of the three extant species of Empetrichthyinae: Crenichthys nevadae, Crenichthys baileyi and Empetrichthys latos. Three nuclear introns (S8 intron 4, S7 intron 1, and P0 intron 1) and one mitochondrial gene (Cytb) were sequenced for phylogenetic analysis. Using these sequences, we generated two separate hypotheses of the evolutionary relationships of Empetrichthyinae, one based on the mitochondrial data and one based on the nuclear data, using Bayesian phylogenetics. Haplotype networks were also generated to examine the relationships of the populations within Empetrichthyinae. After comparing the two phylogenetic hypotheses, species trees were generated using *BEAST with the nuclear data to further test the validity of the subspecies within Empetrichthyinae. The mitochondrial analyses supported four lineages within C. baileyi and two within C. nevadae. The concatenated nuclear tree was more conserved, supporting one clade and an unresolved polytomy in both species. The species tree analysis supported the presence of two species within both C. baileyi and C. nevadae. Based on the results of these analyses, the subspecies designations of Williams and Wilde are not valid; rather, a conservative approach suggests there are two species within C. nevadae and two species within C. baileyi. No structure was found for E. latos or the populations of Empetrichthyinae. This study represents one of many demonstrating the invalidity of

  1. Tanglegrams: A Reduction Tool for Mathematical Phylogenetics.

    Science.gov (United States)

    Matsen, Frederick A; Billey, Sara C; Kas, Arnold; Konvalinka, Matjaz

    2018-01-01

    Many discrete mathematics problems in phylogenetics are defined in terms of the relative labeling of pairs of leaf-labeled trees. These relative labelings are naturally formalized as tanglegrams, which have previously been an object of study in coevolutionary analysis. Although there has been considerable work on planar drawings of tanglegrams, they have not been fully explored as combinatorial objects until recently. In this paper, we describe how many discrete mathematical questions on trees "factor" through a problem on tanglegrams, and how understanding that factoring can simplify analysis. Depending on the problem, it may be useful to consider an unordered version of tanglegrams and/or their unrooted counterparts. For all of these definitions, we show how the isomorphism types of tanglegrams can be understood in terms of double cosets of the symmetric group, and we investigate their automorphisms. Understanding tanglegrams better will isolate the distinct problems on leaf-labeled pairs of trees and reveal natural symmetries of spaces associated with such problems.

  2. Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent.

    Science.gov (United States)

    Allman, Elizabeth S; Degnan, James H; Rhodes, John A

    2011-06-01

    Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals, each with many genes, splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.

  3. Detection and phylogenetic analysis of bacteriophage WO in spiders (Araneae).

    Science.gov (United States)

    Yan, Qian; Qiao, Huping; Gao, Jin; Yun, Yueli; Liu, Fengxiang; Peng, Yu

    2015-11-01

    Phage WO is a bacteriophage found in Wolbachia. Herein, we present the first phylogenetic study of WOs that infect spiders (Araneae). Seven species of spiders (Araneus alternidens, Nephila clavata, Hylyphantes graminicola, Prosoponoides sinensis, Pholcus crypticolens, Coleosoma octomaculatum, and Nurscia albofasciata) from six families were found to be infected by Wolbachia and WO, and their sequences were then analyzed comprehensively. Interestingly, WO could only be detected in Wolbachia-infected spiders. The relative infection rates of those seven species of spiders were 75, 100, 88.9, 100, 62.5, 72.7, and 100%, respectively. Our results indicated that both Wolbachia and WO were found in three different body parts of N. clavata, and that WO could be passed to the next generation of H. graminicola by vertical transmission. There were three different WO sequences in A. alternidens and two in C. octomaculatum. Only one WO sequence was found in each of the other five spider species. The WO sequences obtained ranged from 239 to 311 bp in length. A phylogenetic tree was generated using maximum likelihood (ML) based on the orf7 gene sequences. According to the phylogenetic tree, WOs in N. clavata and H. graminicola clustered in the same group. WOs from A. alternidens (WAlt1) and C. octomaculatum (WOct2) were closely related to another clade, whereas WO in P. sinensis formed a separate cluster.

  4. Teaching Tree-Thinking to Undergraduate Biology Students.

    Science.gov (United States)

    Meisel, Richard P

    2010-07-27

    Evolution is the unifying principle of all biology, and understanding how evolutionary relationships are represented is critical for a complete understanding of evolution. Phylogenetic trees are the most conventional tool for displaying evolutionary relationships, and "tree-thinking" has been coined as a term to describe the ability to conceptualize evolutionary relationships. Students often lack tree-thinking skills, and developing those skills should be a priority of biology curricula. Many common student misconceptions have been described, and a successful instructor needs a suite of tools for correcting those misconceptions. I review the literature on teaching tree-thinking to undergraduate students and suggest how this material can be presented within an inquiry-based framework.

  5. Moose-tree interactions: rebrowsing is common across tree species

    OpenAIRE

    Mathisen, Karen Marie; Milner, Jos M.; Skarpe, Christina

    2017-01-01

    Background Plant strategies to resist herbivory include tolerance and avoidance. Tolerance strategies, such as rapid regrowth which increases the palatability of new shoots, can lead to positive feedback loops between plants and herbivores. An example of such a positive feedback occurs when moose (Alces alces) browse trees in boreal forests. We described the degree of change in tree morphology that accumulated over time in response to repeated browsing by moose, using an index of accumulated ...

  6. Phylogenetic reassessment of Specklinia and its allied genera in the pleurothallidinae (Orchidaceae)

    NARCIS (Netherlands)

    Karremans, Adam P.; Albertazzi, Federico J.; Bakker, Freek T.; Bogarín, Diego; Eurlings, M.C.M.; Pridgeon, Alec; Pupulin, Franco; Gravendeel, Barbara

    2016-01-01

    The phylogenetic relationships within Specklinia (Pleurothallidinae; Orchidaceae) and related genera are re-evaluated using Bayesian analyses of nrITS and chloroplast matK sequence data of a wide sampling of species. Specklinia is found paraphyletic in the DNA based trees, with species

  7. Phylogenomic Resolution of the Phylogeny of Laurasiatherian Mammals: Exploring Phylogenetic Signals within Coding and Noncoding Sequences.

    Science.gov (United States)

    Chen, Meng-Yun; Liang, Dan; Zhang, Peng

    2017-08-01

    The interordinal relationships of Laurasiatherian mammals are currently one of the most controversial questions in mammalian phylogenetics. Previous studies mainly relied on coding sequences (CDS) and seldom used noncoding sequences. Here, by data mining public genome data, we compiled an intron data set of 3,638 genes (all introns from one protein-coding gene are treated as a single gene) (19,055,073 bp) and a CDS data set of 10,259 genes (20,994,285 bp), covering all major lineages of Laurasiatheria (except Pholidota). We found that the intron data contained stronger and more congruent phylogenetic signals than the CDS data. In agreement with this observation, concatenation and species-tree analyses of the intron data set yielded well-resolved and identical phylogenies, whereas the CDS data set produced weakly supported and incongruent results. Further analyses showed that the phylogeny inferred from the intron data is highly robust to data subsampling and changes in outgroup, but the CDS data produced unstable results under the same conditions. Interestingly, gene tree statistical results showed that the most frequently observed gene tree topologies for the CDS and intron data are identical, suggesting that the major phylogenetic signal within the CDS data is actually congruent with that within the intron data. Our final result of Laurasiatheria phylogeny is (Eulipotyphla,((Chiroptera, Perissodactyla),(Carnivora, Cetartiodactyla))), favoring a close relationship between Chiroptera and Perissodactyla. Our study 1) provides a well-supported phylogenetic framework for Laurasiatheria, representing a step towards ending the long-standing "hard" polytomy and 2) argues that introns within genome data are a promising data resource for resolving rapid radiation events across the tree of life. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  8. Conservation Action Based on Threatened Species Capture Taxonomic and Phylogenetic Richness in Breeding and Wintering Populations of Central Asian Birds

    Science.gov (United States)

    Schweizer, Manuel; Ayé, Raffael; Kashkarov, Roman; Roth, Tobias

    2014-01-01

    Although phylogenetic diversity has been suggested to be relevant from a conservation point of view, its role is still limited in applied nature conservation. Recently, the practice of investing conservation resources based on threatened species was identified as a reason for the slow integration of phylogenetic diversity in nature conservation planning. One of the main arguments is based on the observation that threatened species are not evenly distributed over the phylogenetic tree. However, this argument seems to dismiss the fact that conservation action is a spatially explicit process, and even if threatened species are not evenly distributed over the phylogenetic tree, the occurrence of threatened species could still indicate areas with above-average phylogenetic diversity and consequently could protect phylogenetic diversity. Here we aim to study the selection of important bird areas in Central Asia, which were nominated largely based on the presence of threatened bird species. We show that although threatened species occurring in Central Asia do not capture phylogenetically more distinct species than expected by chance, the current spatially explicit conservation approach of selecting important bird areas covers above-average taxonomic and phylogenetic diversity of breeding and wintering birds. We conclude that the spatially explicit processes of conservation actions need to be considered in the current discussion of whether new prioritization methods are needed to complement conservation action based on threatened species. PMID:25337861

  9. Comprehensive Phylogenetic Analysis of Bovine Non-aureus Staphylococci Species Based on Whole-Genome Sequencing

    Science.gov (United States)

    Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen

    2016-01-01

    Non-aureus staphylococci (NAS), a heterogeneous group of a large number of species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined based on single-gene trees. Herein, we analyzed the phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of 16S rRNA, hsp60, rpoB, sodA, and tuf genes and sequences from these and numerous other single genes/proteins. All phylogenies were created with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS-trees clearly separated bovine NAS species into five monophyletic coherent clades. Furthermore, there were consistent interspecies relationships within clades in all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. There were large variations in determining clades and interspecies relationships in single gene/protein trees under different tree construction methods, highlighting the limitations of using single genes for determining bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by method or model of evolutionary reconstruction. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335

  10. The interaction between freezing tolerance and phenology in temperate deciduous trees

    Directory of Open Access Journals (Sweden)

    Yann Vitasse

    2014-10-01

    Temperate climates are defined by a distinct temperature seasonality with large and often unpredictable weather variability during any of the four seasons. To thrive in such climates, trees have to withstand a cold winter and the stochastic occurrence of freeze events at any time of the year. The physiological mechanisms trees adopt to escape, avoid and tolerate freezing temperatures include cold acclimation in autumn, a dormancy period during winter (leafless in deciduous trees), and the maintenance of a certain freezing tolerance during dehardening in early spring. The transition from one phase to the next is mediated by complex interactions between temperature and photoperiod. This review aims to provide an overview of the interplay between leaf phenology and species-specific freezing resistance. First, we address the long-term evolutionary responses that enabled temperate trees to tolerate certain low-temperature extremes. We provide evidence that short-term acclimation of freezing resistance plays a crucial role in both dormant and active buds, including re-acclimation to cold conditions following warm spells. This ability declines to almost zero during leaf emergence. Second, we show that the risk of freeze injury in native temperate trees is low and confined to spring, and we underline that this risk might be altered by climate warming depending on species-specific phenological responses to environmental cues.

  11. An Assessment of Phylogenetic Tools for Analyzing the Interplay Between Interspecific Interactions and Phenotypic Evolution.

    Science.gov (United States)

    Drury, J P; Grether, G F; Garland, T; Morlon, H

    2018-05-01

    Much ecological and evolutionary theory predicts that interspecific interactions often drive phenotypic diversification and that species phenotypes in turn influence species interactions. Several phylogenetic comparative methods have been developed to assess the importance of such processes in nature; however, the statistical properties of these methods have gone largely untested. Focusing mainly on scenarios of competition between closely related species, we assess the performance of available comparative approaches for analyzing the interplay between interspecific interactions and species phenotypes. We find that many currently used statistical methods often fail to detect the impact of interspecific interactions on trait evolution, that sister-taxa analyses are particularly unreliable in general, and that recently developed process-based models have more satisfactory statistical properties. Methods for detecting predictors of species interactions are generally more reliable than methods for detecting character displacement. In weighing the strengths and weaknesses of different approaches, we hope to provide a clear guide for empiricists testing hypotheses about the reciprocal effect of interspecific interactions and species phenotypes and to inspire further development of process-based models.

  12. Building Very Large Neighbour-Joining Trees

    DEFF Research Database (Denmark)

    Simonsen, Martin; Mailund, Thomas; Pedersen, Christian Nørgaard Storm

    2010-01-01

    , and the NJ method in general, becomes a problem when inferring phylogenies with 10000+ taxa. In this paper we present two extensions of RapidNJ which reduce memory requirements and enable RapidNJ to infer very large phylogenetic trees efficiently. We also present an improved search heuristic for Rapid...
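    The record above is truncated, but it concerns the neighbour-joining (NJ) method that RapidNJ accelerates. For orientation only, here is a minimal sketch of the canonical O(n^3) NJ algorithm; it is not RapidNJ and includes none of the paper's memory reductions or search heuristics. The function name `neighbour_joining` and the dict-of-dicts representation are illustrative assumptions.

```python
from itertools import combinations


def neighbour_joining(dist, labels):
    """Canonical neighbour-joining, O(n^3). A sketch, not RapidNJ.

    dist: symmetric distance matrix (list of lists); labels: unique taxon names.
    Returns the unrooted tree as a Newick string.
    """
    assert len(labels) >= 3, "need at least three taxa"
    # Distances keyed by node name so that new internal nodes are easy to add.
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"  # the partial Newick string doubles as u's name
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes remain: attach them to one central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    return f"({a}:{la:g},{b}:{d[a][b] - la:g},{c}:{d[a][c] - la:g});"
```

    Using partial Newick strings as node keys keeps the sketch short. A practical implementation would use integer indices and a faster search for the minimum Q value, which is the step that RapidNJ's search heuristic targets.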

  13. Rooted triple consensus and anomalous gene trees

    Directory of Open Access Journals (Sweden)

    Schmidt Heiko A

    2008-04-01

    Background: Anomalous gene trees (AGTs) are gene trees with a topology different from the species tree that are more likely to be observed than congruent gene trees. In this paper we propose a rooted triple approach to finding the correct species tree in the presence of AGTs. Results: Based on simulated data, we show that our method outperforms the extended majority-rule consensus strategy, while still resolving the species tree. Applying both methods to a metazoan data set of 216 genes, we tested whether AGTs substantially interfere with the reconstruction of the metazoan phylogeny. Conclusion: Evidence of AGTs was not found in this data set, suggesting that erroneously reconstructed gene trees are the most significant challenge in the reconstruction of phylogenetic relationships among species with current data. The new method does, however, rule out the erroneous reconstruction of deep or poorly resolved splits in the presence of lineage sorting.
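    The abstract does not spell out the algorithm, so the following is only a minimal sketch of the underlying idea, assuming rooted gene trees given as nested Python tuples: decompose each gene tree into its rooted triples (ab|c, meaning a and b share a more recent common ancestor than either shares with c) and keep, for every three-taxon set, the triple seen most often. Triples are a natural unit here because, for three taxa, the most probable rooted gene tree under the multispecies coalescent always matches the species tree. The function names are hypothetical, and assembling the retained triples into a species tree is not shown.

```python
from collections import Counter
from itertools import combinations


def clusters(tree, acc):
    """Collect the leaf set of every internal node of a rooted tree (nested tuples)."""
    if isinstance(tree, str):
        return frozenset([tree])
    s = frozenset().union(*(clusters(child, acc) for child in tree))
    acc.append(s)
    return s


def rooted_triples(tree):
    """Return every rooted triple (a, b, c), meaning ab|c, displayed by the tree."""
    acc = []
    taxa = clusters(tree, acc)
    triples = set()
    for trio in combinations(sorted(taxa), 3):
        for c in trio:
            a, b = (x for x in trio if x != c)
            # ab|c holds if some cluster contains a and b but not c.
            if any(a in s and b in s and c not in s for s in acc):
                triples.add((a, b, c))
    return triples


def majority_triples(gene_trees):
    """For each three-taxon set, keep the rooted triple found in the most gene trees."""
    counts = Counter(t for g in gene_trees for t in rooted_triples(g))
    best = {}
    for triple, n in counts.items():
        key = tuple(sorted(triple))
        if key not in best or n > counts[best[key]]:
            best[key] = triple  # ties keep the first triple encountered
    return best
```

    For example, two gene trees ((A,B),C) and one gene tree ((A,C),B) yield AB|C as the retained triple for the set {A, B, C}.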

  14. Linear programming model to construct phylogenetic network for 16S rRNA sequences of photosynthetic organisms and influenza viruses.

    Science.gov (United States)

    Mathur, Rinku; Adlakha, Neeru

    2014-06-01

    Phylogenetic trees give information about the vertical relationships between ancestors and descendants, whereas phylogenetic networks are used to visualize the horizontal relationships among different organisms. Predicting reticulate events requires constructing phylogenetic networks. Here, a Linear Programming (LP) model has been developed for the construction of phylogenetic networks. The model is validated using data sets of chloroplast 16S rRNA sequences of photosynthetic organisms and of Influenza A/H5N1 viruses. The results obtained are in agreement with those of earlier researchers.

  15. Interactive system design using the complementarity of axiomatic design and fault tree analysis

    International Nuclear Information System (INIS)

    Heo, Gyun Young; Do, Sung Hee; Lee, Tae Sik

    2007-01-01

    To efficiently design safety-critical systems such as nuclear power plants, which require high reliability, methodologies allowing for rigorous interactions between the synthesis and analysis processes have been proposed. This paper attempts to develop a reliability-centered design framework through an interactive process between Axiomatic Design (AD) and Fault Tree Analysis (FTA). Integrating AD and FTA into a single framework appears to be a viable solution, as they complement each other with their unique advantages. AD provides a systematic synthesis tool, while FTA is commonly used as a safety analysis tool. These methodologies build a design process that is less subjective, and they enable designers to develop insights that lead to solutions with improved reliability. Due to the nature of the two methodologies, the information involved in each process is complementary: a success tree versus a fault tree. Thus, at each step a system is synthesized using AD, and its reliability is then quantified using the FT derived from the AD synthesis process. The converted FT provides an opportunity to examine the completeness of the outcome of the synthesis process. This study presents an example of the design of a Containment Heat Removal System (CHRS). A case study illustrates the process of designing the CHRS with an interactive design framework focusing on the conversion of the AD process to FTA.

  16. Using Pipe Cleaners to Bring the Tree of Life to Life

    Science.gov (United States)

    Halverson, Kristy L.

    2010-01-01

    Phylogenetic trees, such as the "Tree of Life," are commonly found in biology textbooks and are often used in teaching. Because students often struggle to understand these diagrams, I developed a simple, inexpensive classroom model. Made of pipe cleaners, it is easily manipulated to rotate branches, compare topologies, map complete lineages,…

  17. DLRS: gene tree evolution in light of a species tree.

    Science.gov (United States)

    Sjöstrand, Joel; Sennblad, Bengt; Arvestad, Lars; Lagergren, Jens

    2012-11-15

    PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events and a relaxed molecular clock, and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters. PrIME-DLRS is available for Java SE 6+ under the New BSD License, and JAR files and source code can be downloaded from http://code.google.com/p/jprime/. There is also a slightly older C++ version available as a binary package for Ubuntu, with download instructions at http://prime.sbc.su.se. The C++ source code is available upon request. Contact: joel.sjostrand@scilifelab.se or jens.lagergren@scilifelab.se. PrIME-DLRS is based on a sound probabilistic model (Åkerborg et al., 2009) and has been thoroughly validated on synthetic and biological datasets (Supplementary Material online).

  18. Shapes of tree representations of spin-glass landscapes

    International Nuclear Information System (INIS)

    Hordijk, Wim; Fontanari, Jose F; Stadler, Peter F

    2003-01-01

    Much of the information about the multi-valley structure of disordered spin systems can be condensed into a simple tree structure, a barrier tree, the leaves and internal nodes of which represent, respectively, the local minima and the lowest-energy saddles connecting those minima. Here we apply several statistics used in the study of phylogenetic trees to barrier trees that result from the energy landscapes of p-spin models. These statistics give information about the shape of these barrier trees, in particular about balance and symmetry. We then ask whether they can be used to classify different types of landscapes, compare them with results obtained from random trees, and investigate the structure of subtrees of the barrier trees. We conclude that at least one of the statistics used is capable of distinguishing different types of landscapes, that the barrier trees from p-spin energy landscapes are quite different from random trees, and that subtrees of barrier trees do not reflect the overall tree structure, but their structure is correlated with their 'depth' in the tree.
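    The abstract does not name the shape statistics used, so the following is an illustration only: a sketch of the Colless index, a standard imbalance statistic for rooted binary trees, compared against trees drawn from the Yule model, which is one common choice of random-tree baseline. Whether the paper used this statistic or this null model is an assumption, and the nested 2-tuple representation and function names are hypothetical; a barrier tree would first have to be converted into that form.

```python
import random


def colless(tree):
    """Return (number of leaves, Colless index) for a rooted binary tree.

    Internal nodes are 2-tuples; anything else is treated as a leaf.
    The Colless index sums |n_left - n_right| over all internal nodes.
    """
    if not isinstance(tree, tuple):
        return 1, 0
    left, right = tree
    nl, cl = colless(left)
    nr, cr = colless(right)
    return nl + nr, cl + cr + abs(nl - nr)


def yule_tree(n, rng):
    """Random tree shape with n leaves under the Yule model.

    Under Yule, the root splits n leaves into k and n - k with k uniform on 1..n-1.
    """
    if n == 1:
        return "leaf"
    k = rng.randint(1, n - 1)
    return (yule_tree(k, rng), yule_tree(n - k, rng))


def mean_yule_colless(n, samples=1000, seed=0):
    """Monte Carlo estimate of the mean Colless index of Yule trees with n leaves."""
    rng = random.Random(seed)
    return sum(colless(yule_tree(n, rng))[1] for _ in range(samples)) / samples
```

    A fully balanced tree scores 0, while a caterpillar with n leaves reaches the maximum (n - 1)(n - 2)/2; comparing an observed tree's score with `mean_yule_colless(n)` indicates whether it is more or less balanced than the random baseline.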

  19. A new support measure to quantify the impact of local optima in phylogenetic analyses.

    KAUST Repository

    Brammer, Grant; Sul, Seung-Jin; Williams, Tiffani L

    2011-01-01

    Phylogenetic analyses are often incorrectly assumed to have stabilized to a single optimum. However, a set of trees from a phylogenetic analysis may contain multiple distinct local optima with each optimum providing different levels of support

  20. Random tree growth by vertex splitting

    International Nuclear Information System (INIS)

    David, F; Dukes, W M B; Jonsson, T; Stefánsson, S Ö

    2009-01-01

    We study a model of growing planar tree graphs where in each time step we separate the tree into two components by splitting a vertex and then connect the two pieces by inserting a new link between the daughter vertices. This model generalizes the preferential attachment model and Ford's α-model for phylogenetic trees. We develop a mean field theory for the vertex degree distribution, prove that the mean field theory is exact in some special cases and check that it agrees with numerical simulations in general. We calculate various correlation functions and show that the intrinsic Hausdorff dimension can vary from 1 to ∞, depending on the parameters of the model.