WorldWideScience

Sample records for phylogenetic motif models

  1. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

    Science.gov (United States)

    Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

    2016-08-09

    Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve more slowly than their surrounding non-functional sequences. Its application, however, faces several difficulties in optimizing the selection of orthologous data and reducing false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif prediction in prokaryotic genomes (MP³). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method, and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out promoters with limited contribution to produce a high-quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary prediction tools that mine as many motif candidates as possible while eliminating the effect of random noise. We applied the framework to the Escherichia coli K-12 genome and evaluated prediction performance through comparison with seven existing programs. This evaluation was carried out systematically at the nucleotide and binding-site levels, and the results showed that MP³ consistently outperformed other popular motif finding tools. We have integrated MP³ into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP³ is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance
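
    The abstract does not spell out the voting rule used for promoter scoring. As a minimal sketch, assuming each motif-finding tool reports the set of promoters in which it found a candidate motif, promoters can be scored by the number of tools that agree and pruned below a vote threshold. The tool names, promoter IDs, and the `min_votes` cutoff are hypothetical; this is not MP³'s actual scoring function.

```python
from collections import Counter


def vote_promoters(predictions, min_votes):
    """Count, for each promoter, how many tools reported a motif candidate in it.

    predictions: dict mapping a (hypothetical) tool name to an iterable of
    promoter IDs where that tool reported at least one candidate.
    Returns (votes, kept): a Counter of votes per promoter and the set of
    promoters that reached `min_votes`.
    """
    votes = Counter()
    for promoters in predictions.values():
        # set() ensures each tool votes at most once per promoter
        votes.update(set(promoters))
    kept = {p for p, v in votes.items() if v >= min_votes}
    return votes, kept


if __name__ == "__main__":
    preds = {
        "tool_a": ["p1", "p2"],
        "tool_b": ["p1", "p3"],
        "tool_c": ["p1", "p2", "p2"],
    }
    votes, kept = vote_promoters(preds, min_votes=2)
    print(sorted(kept))  # ['p1', 'p2']
```

    Requiring agreement from several complementary tools trades some recall for robustness against tool-specific noise, which matches the motivation the abstract gives for voting.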

  2. Molecular Detection, Phylogenetic Analysis, and Identification of Transcription Motifs in Feline Leukemia Virus from Naturally Infected Cats in Malaysia

    Directory of Open Access Journals (Sweden)

    Faruku Bande

    2014-01-01

    A nested PCR assay was used to determine the viral RNA and proviral DNA status of naturally infected cats. Selected samples that were FeLV-positive by PCR were subjected to sequencing, phylogenetic analysis, and motif searches. Of the 39 samples that were positive for FeLV p27 antigen, 87.2% (34/39) were confirmed positive by nested PCR. FeLV proviral DNA was detected in 38 (97.3%) of p27-antigen-negative samples. Malaysian FeLV isolates were found to be highly similar, with a homology of 91% to 100%. Phylogenetic analysis revealed that Malaysian FeLV isolates divided into two clusters, with a majority (86.2%) sharing similarity with FeLV-K01803 and fewer isolates (13.8%) with the FeLV-GM1 strain. Different enhancer motifs, including NF-GMa, Krox-20/WT1I-del2, BAF1, AP-2, TBP, TFIIF-beta, TRF, and TFIID, were found to occur singly, in duplicate, in triplicate, or in sets of 5 at different positions within the U3-LTR-gag region. The present results confirm the occurrence of FeLV viral RNA and proviral DNA in naturally infected cats. Malaysian FeLV isolates are highly similar, and a majority of them are closely related to a UK isolate. This study provides the first molecular-based information on FeLV in Malaysia. Additionally, different enhancer motifs likely associated with FeLV-related pathogenesis have been identified.

  3. Unrealistic phylogenetic trees may improve phylogenetic footprinting.

    Science.gov (United States)

    Nettling, Martin; Treutler, Hendrik; Cerquides, Jesus; Grosse, Ivo

    2017-06-01

    The computational investigation of DNA binding motifs from binding sites is one of the classic tasks in bioinformatics and a prerequisite for understanding gene regulation as a whole. Due to the development of sequencing technologies and the increasing number of available genomes, approaches based on phylogenetic footprinting are becoming increasingly attractive. Phylogenetic footprinting requires phylogenetic trees with attached substitution probabilities for quantifying the evolution of binding sites, but these trees and substitution probabilities are typically not known and cannot be estimated easily. Here, we investigate the influence of phylogenetic trees with different substitution probabilities on the classification performance of phylogenetic footprinting using synthetic and real data. For synthetic data, we find that classification performance is highest when the substitution probability used for phylogenetic footprinting is similar to that used for data generation. For real data, however, we typically find that the classification performance of phylogenetic footprinting surprisingly increases with increasing substitution probability and is often highest for unrealistically high substitution probabilities close to one. This finding suggests that choosing realistic model assumptions might not always yield optimal predictions and that choosing unrealistically high substitution probabilities close to one might actually improve the classification performance of phylogenetic footprinting. The proposed PF is implemented in JAVA and can be downloaded from https://github.com/mgledi/PhyFoo. Contact: martin.nettling@informatik.uni-halle.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
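
    To make the role of the substitution probability concrete, here is a minimal sketch of the likelihood of one aligned motif column on a star tree (one branch from the ancestral site to each species). It assumes the Felsenstein 1981 (F81) substitution model, written with `s` as the probability that a substitution event occurs on a branch, in which case the nucleotide is redrawn from a stationary distribution. Setting that stationary distribution to the motif column itself is a common choice in phylogenetic motif models and is assumed in the usage example. Function and parameter names are hypothetical; this is not the PhyFoo implementation.

```python
def branch_prob(a, b, s, stationary):
    """F81-style P(child = b | parent = a) on one branch with substitution probability s."""
    return (1.0 - s) * (a == b) + s * stationary[b]


def column_likelihood(column, root_dist, stationary, s):
    """Likelihood of an aligned column (one nucleotide per species) on a star tree.

    Sums over the unobserved ancestral nucleotide `a`, weighted by the motif
    column distribution `root_dist` (Felsenstein pruning on a depth-one tree).
    """
    total = 0.0
    for a, p_root in root_dist.items():
        prod = p_root
        for b in column:
            prod *= branch_prob(a, b, s, stationary)
        total += prod
    return total


if __name__ == "__main__":
    motif_col = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
    # Assumption: stationary distribution equals the motif column
    for s in (0.0, 0.5, 1.0):
        print(s, column_likelihood("AAA", motif_col, motif_col, s))
```

    Under this assumption, s = 0 forces the orthologous sites to be identical copies of the ancestral base, while s = 1 treats them as independent draws from the motif column, which is one way to read the paper's observation that substitution probabilities close to one can work well on real data.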

  4. Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions.

    Science.gov (United States)

    Chemes, Lucía Beatriz; de Prat-Gay, Gonzalo; Sánchez, Ignacio Enrique

    2015-06-01

    Pathogen linear motif mimics are highly evolvable elements that facilitate rewiring of host protein interaction networks. Host linear motifs and pathogen mimics differ in sequence, leading to thermodynamic and structural differences in the resulting protein-protein interactions. Moreover, the functional output of a mimic depends on the motif and domain repertoire of the pathogen protein. Regulatory evolution mediated by linear motifs can be understood by measuring evolutionary rates, quantifying positive and negative selection and performing phylogenetic reconstructions of linear motif natural history. Convergent evolution of linear motif mimics is widespread among unrelated proteins from viral, prokaryotic and eukaryotic pathogens and can also take place within individual protein phylogenies. Statistics, biochemistry and laboratory models of infection link pathogen linear motifs to phenotypic traits such as tropism, virulence and oncogenicity. In vitro evolution experiments and analysis of natural sequences suggest that changes in linear motif composition underlie pathogen adaptation to a changing environment. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Protein clustering and RNA phylogenetic reconstruction of the influenza A [corrected] virus NS1 protein allow an update in classification and identification of motif conservation.

    Science.gov (United States)

    Sevilla-Reyes, Edgar E; Chavaro-Pérez, David A; Piten-Isidro, Elvira; Gutiérrez-González, Luis H; Santos-Mendoza, Teresa

    2013-01-01

    The non-structural protein 1 (NS1) of influenza A virus (IAV), coded by its third most diverse gene, interacts with multiple molecules within infected cells. NS1 is involved in host immune response regulation and is a potential contributor to the virus host range. Early phylogenetic analyses using 50 sequences led to the classification of NS1 gene variants into groups (alleles) A and B. We reanalyzed NS1 diversity using 14,716 complete NS IAV sequences, downloaded from public databases, without host bias. Removal of sequence redundancy and further structured clustering at 96.8% amino acid similarity produced 415 clusters that enhanced our capability to detect distinct subgroups and lineages, which were assigned a numerical nomenclature. Maximum likelihood phylogenetic reconstruction using RNA sequences indicated the previously identified deep branching separating group A from group B, with five distinct subgroups within A as well as two and five lineages within the A4 and A5 subgroups, respectively. Our classification model proposes that sequence patterns in thirteen amino acid positions are sufficient to fit >99.9% of all currently available NS1 sequences into the A subgroups/lineages or the B group. This classification reduces host and virus bias through the prioritization of NS1 RNA phylogenetics over host or virus phenetics. We found significant sequence conservation within the subgroups and lineages with characteristic patterns of functional motifs, such as the differential binding of CPSF30 and crk/crkL or the availability of a C-terminal PDZ-binding motif. To understand selection pressures and evolution acting on NS1, it is necessary to organize the available data. This updated classification may help to clarify and organize the study of NS1 interactions and pathogenic differences and allow the drawing of further functional inferences on sequences in each group, subgroup and lineage rather than on a strain-by-strain basis.
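
    The abstract does not name the tool used for clustering at 96.8% similarity. As a rough sketch of identity-threshold clustering, assuming the sequences are already aligned to equal length (so identity is a simple position-by-position match fraction, used here as a stand-in for "similarity"), a greedy centroid scheme in the spirit of CD-HIT assigns each sequence to the first centroid it matches at or above the threshold. The function names are hypothetical.

```python
def identity(a, b):
    """Fraction of aligned positions with identical residues; gap pairs count as mismatches."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs, threshold=0.968):
    """Greedy centroid clustering; longer (fewer-gap) sequences become centroids first."""
    centroids, clusters = [], []
    # Sort by ungapped length, longest first; Python's sort is stable for ties.
    for s in sorted(seqs, key=lambda q: len(q.replace("-", "")), reverse=True):
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            # No existing centroid is close enough: start a new cluster.
            centroids.append(s)
            clusters.append([s])
    return clusters


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "CCCC"]
    print(greedy_cluster(toy, threshold=0.75))  # [['AAAA', 'AAAT'], ['CCCC']]
```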

  6. Phyloproteomic Analysis of 11780 Six-Residue-Long Motifs Occurrences

    Directory of Open Access Journals (Sweden)

    O. V. Galzitskaya

    2015-01-01

    How is it possible to find good traits for phylogenetic reconstructions? Here, we present a new phyloproteomic criterion: the occurrence of simple motifs, which can be imprints of evolutionary history. We studied the occurrences of 11780 six-residue-long motifs consisting of two randomly located amino acids in 97 eukaryotic and 25 bacterial proteomes. For all eukaryotic proteomes, with the exception of the Amoebozoa, Stramenopiles, and Diplomonadida kingdoms, the number of proteins containing motifs from the first group (one of the two amino acids occurs once, at the terminal position) made up about 20%; for motifs from the second group (one of the two amino acids occurs once within the pattern) and the third group (the two amino acids occur randomly), the figures were 30% and 50%, respectively. For bacterial proteomes, the corresponding proportions were 10%, 27%, and 63%. Matrices of correlation coefficients between the numbers of proteins in which a motif from the set of 11780 motifs appears at least once were calculated for 9 kingdoms and 5 phyla of bacteria. Among the correlation coefficients for eukaryotic proteomes, the correlation between the animal and fungi kingdoms (0.62) is higher than that between fungi and plants (0.54). Our study supports the view that animals and fungi are sibling kingdoms. Comparing the frequencies of six-residue-long motifs in different proteomes yields phylogenetic relationships based on the similarity of these frequencies: the Diplomonadida are closer to Bacteria than to Eukaryota, and Stramenopiles and Amoebozoa are closer to each other than to other kingdoms of Eukaryota.
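
    The core measurement here is, for each motif, the number of proteins in a proteome that contain it at least once, followed by correlation of those counts between proteomes. A minimal sketch of both steps is below; the abstract does not say which correlation coefficient was used, so Pearson's r is an assumption, and the motifs and sequences in the example are made up.

```python
from math import sqrt


def protein_counts(proteome, motifs):
    """For each motif, the number of proteins in which it appears at least once."""
    return [sum(1 for seq in proteome if motif in seq) for motif in motifs]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical six-residue motifs built from two amino acids (A and L)
    motifs = ["AAAAAL", "LAAAAA", "AALAAA"]
    proteome_1 = ["MAAAAALK", "LAAAAAG", "MKT"]
    proteome_2 = ["AAAAALAAAAA", "GGLAAAAA", "AALAAAR"]
    c1 = protein_counts(proteome_1, motifs)
    c2 = protein_counts(proteome_2, motifs)
    print(c1, c2, pearson(c1, c2))
```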

  7. BayesMD: flexible biological modeling for motif discovery

    DEFF Research Database (Denmark)

    Tang, Man-Hung Eric; Krogh, Anders; Winther, Ole

    2008-01-01

    We present BayesMD, a Bayesian Motif Discovery model with several new features. Three different types of biological a priori knowledge are built into the framework in a modular fashion. A mixture of Dirichlets is used as prior over nucleotide probabilities in binding sites. It is trained on trans...

  8. Improved Maximum Parsimony Models for Phylogenetic Networks.

    Science.gov (United States)

    Van Iersel, Leo; Jones, Mark; Scornavacca, Celine

    2018-05-01

    Phylogenetic networks are well suited to representing evolutionary histories that involve reticulate evolution. Several methods aiming to reconstruct explicit phylogenetic networks have been developed in the last two decades. In this article, we propose a new definition of maximum parsimony for phylogenetic networks that makes it possible to model biological scenarios that cannot be modeled by the definitions currently present in the literature (namely, "hardwired" and "softwired" parsimony). Building on this new definition, we provide several algorithmic results that lay the foundations for new parsimony-based methods for phylogenetic network reconstruction.

  9. Bayesian models for comparative analysis integrating phylogenetic uncertainty

    Directory of Open Access Journals (Sweden)

    de Villemereuil, Pierre

    2012-06-01

    Background Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. Methods We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. Results We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Conclusions Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible
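
    Using a posterior sample of trees as an empirical prior amounts to averaging over trees instead of conditioning on a single consensus tree. As a minimal sketch, assuming you already have the log-likelihood of the data under each sampled tree (computing it is model-specific and not shown) and that every tree gets equal prior weight, the tree-averaged likelihood is a numerically stable log-mean-exp. In the authors' models the tree is treated as a random quantity during MCMC in OpenBUGS/JAGS; these hypothetical functions only illustrate the marginalization itself.

```python
import math


def log_mean_exp(log_values):
    """log(mean(exp(v))) computed stably by factoring out the maximum."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values) / len(log_values))


def tree_averaged_loglik(per_tree_logliks):
    """Log-likelihood marginalized over an empirical set of equally weighted trees."""
    return log_mean_exp(per_tree_logliks)


if __name__ == "__main__":
    # Hypothetical per-tree log-likelihoods for one data set
    logliks = [-1050.2, -1048.7, -1049.9, -1052.3]
    print(tree_averaged_loglik(logliks))
```

    Working on the log scale matters because per-tree likelihoods of real data sets underflow double precision when exponentiated directly.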

  10. Bayesian models for comparative analysis integrating phylogenetic uncertainty

    Science.gov (United States)

    2012-01-01

    Background Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. Methods We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. Results We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Conclusions Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible general purpose tool for

  11. I-Ad-binding peptides derived from unrelated protein antigens share a common structural motif

    DEFF Research Database (Denmark)

    Sette, A; Buus, S; Colon, S

    1988-01-01

    on the I-Ad binding of the immunogenic peptide OVA 323-339. The results obtained demonstrated the very permissive nature of Ag-Ia interaction. We also showed that unrelated peptides that are good I-Ad binders share a common structural motif and speculated that recognition of such motifs could represent...... that I-Ad molecules recognize a large library of Ag by virtue of common structural motifs present in peptides derived from phylogenetically unrelated proteins....

  12. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.

    Science.gov (United States)

    Wang, Yin; Li, Rudong; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.

  13. The valine and lysine residues in the conserved FxVTxK motif are important for the function of phylogenetically distant plant cellulose synthases

    Energy Technology Data Exchange (ETDEWEB)

    Slabaugh, Erin; Scavuzzo-Duggan, Tess; Chaves, Arielle; Wilson, Liza; Wilson, Carmen; Davis, Jonathan K.; Cosgrove, Daniel J.; Anderson, Charles T.; Roberts, Alison W.; Haigler, Candace H.

    2015-12-08

    Cellulose synthases (CESAs) synthesize the β-1,4-glucan chains that coalesce to form cellulose microfibrils in plant cell walls. In addition to a large cytosolic (catalytic) domain, CESAs have eight predicted transmembrane helices (TMHs). However, analogous to the structure of BcsA, a bacterial CESA, predicted TMH5 in CESA may instead be an interfacial helix. This would place the conserved FxVTxK motif in the plant cell cytosol where it could function as a substrate-gating loop as occurs in BcsA. To define the functional importance of the CESA region containing FxVTxK, we tested five parallel mutations in Arabidopsis thaliana CESA1 and Physcomitrella patens CESA5 in complementation assays of the relevant cesa mutants. In both organisms, the substitution of the valine or lysine residues in FxVTxK severely affected CESA function. In Arabidopsis roots, both changes were correlated with lower cellulose anisotropy, as revealed by Pontamine Fast Scarlet. Analysis of hypocotyl inner cell wall layers by atomic force microscopy showed that two altered versions of Atcesa1 could rescue cell wall phenotypes observed in the mutant background line. Overall, the data show that the FxVTxK motif is functionally important in two phylogenetically distant plant CESAs. The results show that Physcomitrella provides an efficient model for assessing the effects of engineered CESA mutations affecting primary cell wall synthesis and that diverse testing systems can lead to nuanced insights into CESA structure–function relationships. Although CESA membrane topology needs to be experimentally determined, the results support the possibility that the FxVTxK region functions similarly in CESA and BcsA.

  14. Maximum parsimony, substitution model, and probability phylogenetic trees.

    Science.gov (United States)

    Weng, J F; Thomas, D A; Mareels, I

    2011-01-01

    The problem of inferring phylogenies (phylogenetic trees) is one of the main problems in computational biology. There are three main methods for inferring phylogenies: Maximum Parsimony (MP), Distance Matrix (DM), and Maximum Likelihood (ML), of which MP is the most well-studied and popular. In the MP method, the optimization criterion is the number of nucleotide substitutions, computed from the differences among the investigated nucleotide sequences. However, the MP method is often criticized because it counts only the substitutions observable at the present time, omitting all unobservable substitutions that actually occurred during evolutionary history. To account for the unobservable substitutions, substitution models have been developed and are now widely used in the DM and ML methods, but these substitution models cannot be used within the classical MP method. Recently the authors proposed a probability representation model for phylogenetic trees, and the trees reconstructed in this model are called probability phylogenetic trees. One of the advantages of the probability representation model is that it can include a substitution model to infer phylogenetic trees based on the MP principle. In this paper we explain how to use a substitution model in the reconstruction of probability phylogenetic trees and show the advantage of this approach with examples.
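
    The classical MP criterion, the minimum number of substitutions needed to explain the observed sequences on a given tree, is computed site by site with Fitch's algorithm on a rooted binary tree. The sketch below is that textbook algorithm, not the authors' probability representation model; it also shows why MP misses unobservable changes, since it reports only the minimum count. Trees are encoded as nested tuples, an encoding chosen here for brevity, and the function names are illustrative.

```python
def fitch(tree):
    """Fitch small-parsimony for one site.

    tree: a leaf state (a one-character str such as "A") or a
    (left, right) tuple of subtrees. Returns (candidate_states, min_changes).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    shared = left_states & right_states
    if shared:
        # Children can agree on a state: no extra substitution needed here.
        return shared, left_cost + right_cost
    # Disjoint candidate sets: one substitution must occur below this node.
    return left_states | right_states, left_cost + right_cost + 1


def site_tree(tree, i):
    """Replace each leaf sequence by its character at position i."""
    if isinstance(tree, str):
        return tree[i]
    return tuple(site_tree(child, i) for child in tree)


def parsimony_score(tree, length):
    """Total minimum number of substitutions over `length` aligned sites."""
    return sum(fitch(site_tree(tree, i))[1] for i in range(length))


if __name__ == "__main__":
    tree = (("AC", "AC"), ("GC", "GT"))
    print(parsimony_score(tree, 2))  # 2
```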

  15. BayesMotif: de novo protein sorting motif discovery from impure datasets.

    Science.gov (United States)

    Hu, Jianjun; Zhang, Fan

    2010-01-18

    PWM (position weight matrix) motif model.

  16. Different relationships between temporal phylogenetic turnover and phylogenetic similarity and in two forests were detected by a new null model.

    Science.gov (United States)

    Huang, Jian-Xiong; Zhang, Jian; Shen, Yong; Lian, Ju-yu; Cao, Hong-lin; Ye, Wan-hui; Wu, Lin-fang; Bin, Yue

    2014-01-01

    Ecologists have been monitoring community dynamics with the purpose of understanding the rates and causes of community change. However, community dynamics have rarely been monitored from a phylogenetic perspective. We attempted to understand temporal phylogenetic turnover in a 50 ha tropical forest (Barro Colorado Island, BCI) and a 20 ha subtropical forest (Dinghushan in southern China, DHS). To obtain temporal phylogenetic turnover under random conditions, two null models were used. The first, which is widely used in community phylogenetic analyses, shuffled species names. The second simulated demographic processes with careful consideration of the variation in dispersal ability among species and the variation in mortality both among species and among size classes. With the two models, we tested the relationships between temporal phylogenetic turnover and phylogenetic similarity at different spatial scales in the two forests. Results obtained with the second null model were more consistent with previous findings, suggesting that the second null model is more appropriate for our purposes. With the second null model, a significantly positive relationship was detected between phylogenetic turnover and phylogenetic similarity in BCI at a 10 m×10 m scale, potentially indicating phylogenetic density dependence. This relationship in DHS was significantly negative at three of five spatial scales, which could indicate abiotic filtering processes in community assembly. Using variation partitioning, we found that phylogenetic similarity contributed to variation in temporal phylogenetic turnover in the DHS plot but not in the BCI plot. The mechanisms of community assembly in BCI and DHS therefore differ from a phylogenetic perspective. Only the second null model detected this difference, indicating the importance of choosing a proper null model.
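
    The first null model shuffles species names. A minimal sketch of that idea, assuming a symmetric table of pairwise phylogenetic distances between species, randomly permutes the names on the tips and recomputes a community metric. Mean pairwise distance (MPD) is used here only as a stand-in statistic; the paper's temporal turnover metric and its second, demography-based null model are not reproduced, and all names and values are hypothetical.

```python
import random
from itertools import combinations


def mpd(community, dist):
    """Mean pairwise phylogenetic distance among the species in `community` (needs >= 2)."""
    pairs = list(combinations(sorted(community), 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def name_shuffle_null(community, dist, n_iter=999, seed=0):
    """Null distribution of MPD obtained by randomly permuting species names on the tips."""
    rng = random.Random(seed)
    species = sorted(dist)
    null = []
    for _ in range(n_iter):
        shuffled = species[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(species, shuffled))
        null.append(mpd({relabel[s] for s in community}, dist))
    return null


if __name__ == "__main__":
    dist = {
        "a": {"a": 0, "b": 1, "c": 3},
        "b": {"a": 1, "b": 0, "c": 3},
        "c": {"a": 3, "b": 3, "c": 0},
    }
    observed = mpd({"a", "b"}, dist)
    null = name_shuffle_null({"a", "b"}, dist, n_iter=999, seed=1)
    # One-sided p-value: how often the null is at least as clustered as observed
    p = (1 + sum(v <= observed for v in null)) / (len(null) + 1)
    print(observed, p)
```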

  17. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification

    Directory of Open Access Journals (Sweden)

    Yin Wang

    2016-01-01

    Background. Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.

  18. A Comparison Study for DNA Motif Modeling on Protein Binding Microarray

    KAUST Repository

    Wong, Ka-Chun; Li, Yue; Peng, Chengbin; Wong, Hau-San

    2015-01-01

    Transcription Factor Binding Sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, Protein Binding Microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8 to 10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build motif models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement using di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.
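
    As a minimal sketch of the difference between mono- and di-nucleotide motif models, the code below scores a k-mer as a sum of per-position log-odds weights and, in the di-nucleotide case, adds a weight for each adjacent nucleotide pair. The conversion uses log2(p / 0.25) against a uniform background, an assumption for illustration; the weight values, function names, and the additive form of the pair term are hypothetical and are not taken from the paper's methods.

```python
import math

BASES = "ACGT"


def log_odds(prob_columns, background=0.25):
    """Convert per-position base probabilities into log2-odds weights."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in prob_columns]


def mono_score(kmer, weights):
    """Mono-nucleotide model: independent sum over positions."""
    return sum(weights[i][base] for i, base in enumerate(kmer))


def di_score(kmer, weights, pair_weights):
    """Di-nucleotide model: mono score plus a term for each adjacent pair (i, i+1).

    Pairs missing from pair_weights[i] contribute 0.
    """
    pair_term = sum(pair_weights[i].get(kmer[i:i + 2], 0.0) for i in range(len(kmer) - 1))
    return mono_score(kmer, weights) + pair_term


if __name__ == "__main__":
    probs = [
        {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
        {"A": 0.125, "C": 0.125, "G": 0.25, "T": 0.5},
    ]
    w = log_odds(probs)
    pairs = [{"AT": 0.5}]  # hypothetical dependency between positions 0 and 1
    print(mono_score("AT", w), di_score("AT", w, pairs))  # 2.0 2.5
```

    The pair term lets the model reward or penalize specific neighbouring combinations, which a mono-nucleotide model cannot express because it treats positions independently.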

  19. A Comparison Study for DNA Motif Modeling on Protein Binding Microarray

    KAUST Repository

    Wong, Ka-Chun

    2015-06-11

    Transcription Factor Binding Sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, Protein Binding Microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8 to 10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build motif models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement using di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.

  20. Insights into the molecular evolution of the PDZ/LIM family and identification of a novel conserved protein motif.

    Directory of Open Access Journals (Sweden)

    Aartjan J W Te Velthuis

    The PDZ and LIM domain-containing protein family is encoded by a diverse group of genes whose phylogeny has not yet been analyzed. In mammals, ten genes are found that encode both a PDZ domain and one or several LIM domains. These genes are: ALP, RIL, Elfin (CLP36), Mystique, Enigma (LMP-1), Enigma homologue (ENH), ZASP (Cypher, Oracle), LMO7, and the two LIM domain kinases (LIMK1 and LIMK2). As conventional alignment and phylogenetic procedures on full-length sequences fell short of elucidating the evolutionary history of these genes, we started to analyze the PDZ and LIM domain sequences themselves. Using information from most sequenced eukaryotic lineages, our phylogenetic analysis is based on full-length cDNA-, EST-derived, and genomic PDZ and LIM domain sequences of over 25 species, ranging from yeast to humans. Plant and protozoan homologs were not found. Our phylogenetic analysis identifies a number of domain duplication and rearrangement events, and shows a single convergent event during evolution of the PDZ/LIM family. Further, we describe the separation of the ALP and Enigma subfamilies in lower vertebrates and identify a novel consensus motif, which we call the 'ALP-like motif' (AM). This motif is highly conserved among ALP subfamily proteins of diverse organisms. Here we used a combinatorial approach to define the relation of the PDZ and LIM domain-encoding genes and to reconstruct their phylogeny. This analysis allowed us to classify the PDZ/LIM family and to suggest a meaningful model for the molecular evolution of the diverse gene architectures found in this multi-domain family.

  21. In Silico Phylogenetic Analysis and Molecular Modelling Study of 2-Haloalkanoic Acid Dehalogenase Enzymes from Bacterial and Fungal Origin

    Directory of Open Access Journals (Sweden)

    Raghunath Satpathy

    2016-01-01

    Full Text Available 2-Haloalkanoic acid dehalogenase enzymes, which are widely distributed in fungi and bacteria, have a broad range of applications, from bioremediation to the chemical synthesis of useful compounds. In the present study, a total of 81 full-length protein sequences of 2-haloalkanoic acid dehalogenase from bacteria and fungi were retrieved from the NCBI database. Sequence analyses, including multiple sequence alignment (MSA), conserved motif identification, computation of amino acid composition and phylogenetic tree construction, were performed on these primary sequences. The MSA showed that the sequences share conserved lysine (K) and aspartate (D) residues. The phylogenetic tree also indicated a subcluster comprising both fungal and bacterial species. Because no experimental 3D structure of a fungal 2-haloalkanoic acid dehalogenase is available in the PDB, a molecular modelling study was performed for both the fungal and the bacterial enzymes in the subcluster. Further structural analysis revealed a common evolutionary topology shared by the fungal and bacterial enzymes. Analysis of the buried amino acids showed highly conserved Leu and Ser in the core, despite variation in their amino acid percentage. Additionally, a surface-exposed tryptophan was conserved in all of the selected models.
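    Of the sequence analyses listed above, computing amino acid composition is the simplest to sketch. The function below reports the percentage of each of the 20 standard residues and ignores any other characters; the input is a hypothetical fragment, not one of the 81 dehalogenase sequences.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Percentage of each standard amino acid in a protein sequence."""
    seq = seq.upper()
    counts = Counter(aa for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical fragment, for illustration only.
comp = composition("MKLSDLKKDA")
print(comp["K"], comp["D"])  # 30.0 20.0
```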

  2. New substitution models for rooting phylogenetic trees.

    Science.gov (United States)

    Williams, Tom A; Heaps, Sarah E; Cherlin, Svetlana; Nye, Tom M W; Boys, Richard J; Embley, T Martin

    2015-09-26

    The root of a phylogenetic tree is fundamental to its biological interpretation, but standard substitution models do not provide any information on its position. Here, we describe two recently developed models that relax the usual assumptions of stationarity and reversibility, thereby facilitating root inference without the need for an outgroup. We compare the performance of these models on a classic test case for phylogenetic methods, before considering two highly topical questions in evolutionary biology: the deep structure of the tree of life and the root of the archaeal radiation. We show that all three alignments contain meaningful rooting information that can be harnessed by these new models, thus complementing and extending previous work based on outgroup rooting. In particular, our analyses exclude the root of the tree of life from the eukaryotes or Archaea, placing it on the bacterial stem or within the Bacteria. They also exclude the root of the archaeal radiation from several major clades, consistent with analyses using other rooting methods. Overall, our results demonstrate the utility of non-reversible and non-stationary models for rooting phylogenetic trees, and identify areas where further progress can be made. © 2015 The Authors.
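    The two assumptions that these models relax can be stated for a single rate matrix Q with base distribution π: stationarity means πQ = 0, and reversibility means detailed balance, π_i Q_ij = π_j Q_ji. The sketch below checks both conditions in this simplest single-matrix sense; it does not implement the non-stationary, non-reversible models themselves, which let these properties fail across the tree. The cyclic 3-state matrix is a hypothetical example chosen because it is stationary under the uniform distribution but not reversible.

```python
def is_stationary(pi, Q, tol=1e-12):
    """Check pi Q = 0, i.e. pi is the equilibrium distribution of rate matrix Q."""
    n = len(pi)
    return all(abs(sum(pi[i] * Q[i][j] for i in range(n))) < tol for j in range(n))


def is_reversible(pi, Q, tol=1e-12):
    """Check detailed balance: pi_i * Q_ij == pi_j * Q_ji for all i, j."""
    n = len(pi)
    return all(abs(pi[i] * Q[i][j] - pi[j] * Q[j][i]) < tol
               for i in range(n) for j in range(n))


# Jukes-Cantor-style rates: stationary and reversible under the uniform distribution.
jc = [[-1.0 if i == j else 1.0 / 3 for j in range(4)] for i in range(4)]
uniform4 = [0.25] * 4

# Cyclic rates A -> B -> C -> A: stationary but not reversible.
cyclic = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 0.0, -1.0]]
uniform3 = [1.0 / 3] * 3

print(is_stationary(uniform4, jc), is_reversible(uniform4, jc))          # True True
print(is_stationary(uniform3, cyclic), is_reversible(uniform3, cyclic))  # True False
```

    Under a reversible model, the likelihood does not change when the root is moved along the tree, which is why such models carry no rooting information on their own.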

  3. Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum.

    Science.gov (United States)

    Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu

    2016-12-01

    The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing a dichotomy into two clusters of SODs, inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2]; Ramachandran plots of G. raimondii and G. arboreum SODs; the protein sequences used to generate the 3D structures and the template accessions via the SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information" [3]; and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].

  4. A program for verification of phylogenetic network models.

    Science.gov (United States)

    Gunawan, Andreas D M; Lu, Bingxin; Zhang, Louxin

    2016-09-01

    Genetic material is transferred in a non-reproductive manner across species more frequently than commonly thought, particularly in bacteria. On one hand, extant genomes are thus more properly considered as a fusion product of both reproductive and non-reproductive genetic transfers. This has motivated researchers to adopt phylogenetic networks to study genome evolution. On the other hand, a gene's evolution is usually tree-like and has been studied for over half a century. Accordingly, the relationships between phylogenetic trees and networks are the basis for the reconstruction and verification of phylogenetic networks. One important problem in verifying a network model is determining whether or not certain existing phylogenetic trees are displayed in a phylogenetic network. This problem is formally called the tree containment problem, and it is NP-complete even for binary phylogenetic networks. We design an exponential-time but efficient method for determining whether or not a phylogenetic tree is displayed in an arbitrary phylogenetic network. It is developed on the basis of the so-called reticulation-visible property of phylogenetic networks. A C program is available for download at http://www.math.nus.edu.sg/∼matzlx/tcp_package. Contact: matzlx@nus.edu.sg. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
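    To make the tree containment problem concrete, the sketch below uses the naive brute-force definition rather than the authors' reticulation-visibility method. It tries every way of keeping exactly one incoming edge per reticulation node. Instead of explicitly deleting dead ends and suppressing unary nodes, it compares the non-empty leaf clusters of the resulting subgraph with those of the target tree, because two rooted phylogenetic trees on the same taxa are identical exactly when their cluster sets coincide. The graph encoding (dicts of child lists) and the node names are assumptions for illustration. The running time grows exponentially with the number of reticulations, so this is only usable for tiny networks.

```python
from itertools import product


def clusters(children, root, taxa):
    """Non-empty sets of taxa below each node reachable from root."""
    memo = {}

    def below(v):
        if v not in memo:
            if v in taxa:
                memo[v] = frozenset([v])
            else:
                memo[v] = frozenset().union(*[below(c) for c in children.get(v, [])])
        return memo[v]

    below(root)
    return {c for c in memo.values() if c}  # empty sets come from dead ends


def displays(net_children, net_root, tree_children, tree_root, taxa):
    """Brute force: try every choice of one parent per reticulation node."""
    parents = {}
    for u, kids in net_children.items():
        for c in kids:
            parents.setdefault(c, []).append(u)
    retics = [v for v, ps in parents.items() if len(ps) > 1]
    target = clusters(tree_children, tree_root, taxa)
    for choice in product(*(parents[r] for r in retics)):
        kept = dict(zip(retics, choice))
        # Keep edge u->c unless c is a reticulation whose chosen parent is not u.
        sub = {u: [c for c in kids if kept.get(c, u) == u]
               for u, kids in net_children.items()}
        if clusters(sub, net_root, taxa) == target:
            return True
    return False


# Hypothetical network with one reticulation h (parents a and b).
net = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "z"], "h": ["y"]}
taxa = {"x", "y", "z"}
t1 = {"R": ["p", "z"], "p": ["x", "y"]}  # ((x,y),z)
t2 = {"R": ["x", "p"], "p": ["y", "z"]}  # (x,(y,z))
t3 = {"R": ["p", "y"], "p": ["x", "z"]}  # ((x,z),y)
print([displays(net, "r", t, "R", taxa) for t in (t1, t2, t3)])  # [True, True, False]
```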

  5. CompariMotif: quick and easy comparisons of sequence motifs.

    Science.gov (United States)

    Edwards, Richard J; Davey, Norman E; Shields, Denis C

    2008-05-15

    CompariMotif is a novel tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs. CompariMotif can be run online at http://bioware.ucd.ie/ and is freely available for academic use as a set of open source Python modules under a GNU General Public License from http://bioinformatics.ucd.ie/shields/software/comparimotif/
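    The kinds of relationships described above (exact matches, degenerate variants, overlaps) can be illustrated for the simplest case of two equal-length, ungapped regular expression motifs. This is a simplified sketch and not CompariMotif's own algorithm. Each position is treated as a set of allowed residues, with '.' meaning any of the 20 amino acids. The information score of log2(20 / number of shared residues) per position is an assumption chosen for illustration.

```python
import math

AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def parse(motif):
    """Turn a simple regex motif (letters, '.', '[..]') into a list of residue sets."""
    positions, i = [], 0
    while i < len(motif):
        if motif[i] == "[":
            j = motif.index("]", i)
            positions.append(frozenset(motif[i + 1:j]))
            i = j + 1
        else:
            positions.append(AA if motif[i] == "." else frozenset(motif[i]))
            i += 1
    return positions


def compare(m1, m2):
    """Classify two equal-length motifs and score the information they share."""
    p1, p2 = parse(m1), parse(m2)
    if len(p1) != len(p2):
        return "unaligned", 0.0
    shared = [a & b for a, b in zip(p1, p2)]
    if not all(shared):  # some position has no residue in common
        return "no match", 0.0
    info = sum(math.log2(len(AA) / len(s)) for s in shared)
    if p1 == p2:
        relation = "exact"
    elif all(a <= b for a, b in zip(p1, p2)):
        relation = "m1 is a variant within m2"
    elif all(b <= a for a, b in zip(p1, p2)):
        relation = "m2 is a variant within m1"
    else:
        relation = "overlap"
    return relation, info


print(compare("RK.S", "R[KR].[ST]")[0])  # m1 is a variant within m2
```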

  6. Phylogenetic analysis of ferlin genes reveals ancient eukaryotic origins

    Directory of Open Access Journals (Sweden)

    Lek Monkol

    2010-07-01

    Full Text Available Abstract Background The ferlin gene family possesses a rare and identifying feature consisting of multiple tandem C2 domains and a C-terminal transmembrane domain. Much currently remains unknown about the fundamental function of this gene family; however, mutations in its two most well-characterised members, dysferlin and otoferlin, have been implicated in human disease. The availability of genome sequences from a wide range of species makes it possible to explore the evolution of the ferlin family, providing contextual insight into characteristic features that define the ferlin gene family in its present form in humans. Results Ferlin genes were detected in all species of representative phyla, with two ferlin subgroups partitioned within the ferlin phylogenetic tree based on the presence or absence of a DysF domain. Invertebrates generally possessed two ferlin genes (one with DysF and one without), with six ferlin genes in most vertebrates (three DysF, three non-DysF). Expansion of the ferlin gene family is evident between the divergence of lamprey (jawless vertebrates) and shark (cartilaginous fish). Common to almost all ferlins is an N-terminal C2-FerI-C2 sandwich, a FerB motif, and two C-terminal C2 domains (C2E and C2F) adjacent to the transmembrane domain. Preservation of these structural elements throughout eukaryotic evolution suggests a fundamental role of these motifs for ferlin function. In contrast, DysF, C2DE, and FerA are optional, giving rise to subtle differences in the domain topologies of ferlin genes. Despite conservation of multiple C2 domains in all ferlins, the C-terminal C2 domains (C2E and C2F) displayed higher sequence conservation and greater conservation of putative calcium-binding residues across paralogs and orthologs. Interestingly, the two most studied non-mammalian ferlins (Fer-1 and Misfire) in the model organisms C. elegans and D. melanogaster appear as outgroups in the phylogenetic analysis, with results suggesting

  7. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    Science.gov (United States)

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules but can also occur at nonhomologous sites in diverse RNAs, and that often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545
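    The step "compare all-against-all, then group" can be sketched generically. This is a simplified stand-in, not the Motif Atlas pipeline. It assumes a precomputed geometric discrepancy for each pair of loops (the kind of value an FR3D-style comparison might provide), joins loops by single linkage below an arbitrary threshold, and ignores the structural annotations and bulged-base handling mentioned above. The loop names and discrepancy values are hypothetical.

```python
def group_loops(loops, discrepancy, threshold):
    """Single-linkage grouping: loops end up together if a chain of pairs is below threshold.

    `discrepancy` maps (loop_a, loop_b) to a geometric distance;
    pairs missing from the mapping are treated as dissimilar.
    """
    parent = {x: x for x in loops}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, a in enumerate(loops):
        for b in loops[i + 1:]:
            d = discrepancy.get((a, b), discrepancy.get((b, a)))
            if d is not None and d < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for x in loops:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical loops and pairwise discrepancies.
loops = ["IL_1", "IL_2", "IL_3", "IL_4"]
disc = {("IL_1", "IL_2"): 0.3, ("IL_2", "IL_3"): 0.4,
        ("IL_1", "IL_3"): 0.6, ("IL_3", "IL_4"): 1.5}
print(group_loops(loops, disc, 0.5))  # [['IL_1', 'IL_2', 'IL_3'], ['IL_4']]
```

    IL_1 and IL_3 land in the same group even though their own discrepancy exceeds the threshold. This chaining effect is a known weakness of single linkage and one reason a production pipeline would use a stricter grouping criterion.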

  8. Phylogenetic mixtures and linear invariants for equal input models.

    Science.gov (United States)

    Casanellas, Marta; Steel, Mike

    2017-04-01

    The reconstruction of phylogenetic trees from molecular sequence data relies on modelling site substitutions by a Markov process, or a mixture of such processes. In general, allowing mixed processes can result in different tree topologies becoming indistinguishable from the data, even for infinitely long sequences. However, when the underlying Markov process supports linear phylogenetic invariants, then provided these are sufficiently informative, the identifiability of the tree topology can be restored. In this paper, we investigate a class of processes that support linear invariants once the stationary distribution is fixed, the 'equal input model'. This model generalizes the 'Felsenstein 1981' model (and thereby the Jukes-Cantor model) from four states to an arbitrary number of states (finite or infinite), and it can also be described by a 'random cluster' process. We describe the structure and dimension of the vector spaces of phylogenetic mixtures and of linear invariants for any fixed phylogenetic tree (and for all trees, the so-called 'model invariants'), on any number n of leaves. We also provide a precise description of the space of mixtures and linear invariants for the special case of [Formula: see text] leaves. By combining techniques from discrete random processes and (multi-)linear algebra, our results build on a classic result that was first established by James Lake (Mol Biol Evol 4:167-191, 1987).
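    For the equal input model, the off-diagonal rate from any state i to state j equals β·π_j. The resulting transition matrix has the closed form P_ij(t) = δ_ij·e^(−βt) + (1 − e^(−βt))·π_j. This is the 'random cluster' reading: with probability e^(−βt) no substitution event occurs, and otherwise the new state is drawn afresh from π. The sketch below evaluates this formula for any finite number of states. The rate scale β is an assumed parametrization, and with four states and uniform π the result takes the Jukes-Cantor form.

```python
import math


def equal_input_P(pi, t, beta=1.0):
    """Transition matrix of the equal input model:
    P_ij(t) = delta_ij * exp(-beta t) + (1 - exp(-beta t)) * pi_j."""
    keep = math.exp(-beta * t)  # probability that no substitution event occurred
    n = len(pi)
    return [[(keep if i == j else 0.0) + (1.0 - keep) * pi[j] for j in range(n)]
            for i in range(n)]


# Four states, uniform pi: Jukes-Cantor form P_ii = 1/4 + 3/4 * exp(-t).
P = equal_input_P([0.25] * 4, t=0.5)
print(abs(P[0][0] - (0.25 + 0.75 * math.exp(-0.5))) < 1e-12)  # True

# Five states with a non-uniform (hypothetical) pi: each row still sums to 1.
P5 = equal_input_P([0.1, 0.2, 0.3, 0.2, 0.2], t=1.0)
print([round(sum(row), 12) for row in P5])
```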

  9. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

    Science.gov (United States)

    Catania, Francesco; Lynch, Michael

    2010-05-04

    In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties in generating reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remain a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly than motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal gene expression in Paramecium; and 3) reveal a largely unexplored association, presumably not restricted to Paramecium, between the presence/absence of a DNA motif and the evolutionary fate of its host genes.
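    The screening step, which checks each 3' UTR for a list of candidate 8-mers, reduces to exact substring search. The sketch covers only that step and does not cover the cross-species conservation analysis. The gene names, UTR sequences and query 8-mer are hypothetical.

```python
def kmer_presence(utrs, kmers):
    """For each query k-mer, list the genes whose 3' UTR contains it (exact match)."""
    hits = {k: [] for k in kmers}
    for gene, seq in utrs.items():
        seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
        for k in kmers:
            if k in seq:
                hits[k].append(gene)
    return hits


# Hypothetical UTR sequences and query 8-mer, for illustration only.
utrs = {"geneA": "AAATGTATATTT", "geneB": "CCCCGGGGAAAA", "geneC": "tgtatattcc"}
print(kmer_presence(utrs, ["TGTATATT"]))  # {'TGTATATT': ['geneA', 'geneC']}
```

    Given such hit lists for many genes, one could then ask whether motif-containing genes are enriched in a functional class, such as ribosomal genes.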

  10. Structural modelling and phylogenetic analyses of PgeIF4A2 (Eukaryotic translation initiation factor) from Pennisetum glaucum reveal signature motifs with a role in stress tolerance and development.

    Science.gov (United States)

    Agarwal, Aakrati; Mudgil, Yashwanti; Pandey, Saurabh; Fartyal, Dhirendra; Reddy, Malireddy K

    2016-01-01

    Eukaryotic translation initiation factor 4A (eIF4A) is an indispensable component of the translation machinery and also plays a role in developmental processes and stress alleviation in plants and animals. Different eIF4A isoforms, namely eIF4A1, eIF4A2 and eIF4A3, are present in the cytosol of the cell, and their expression is tightly regulated in cap-dependent translation. We built a structural model of the PgeIF4A2 protein with Modeller 9.12, using the crystal structure of Homo sapiens eIF4A3 (PDB ID: 2J0S) as the template. The resulting PgeIF4A2 model was assessed with PROCHECK, ProSA, Verify3D and RMSD analysis, which showed that the model structure is reliable; it shares 77% amino acid sequence identity with the template. The analysis revealed two conserved signatures, the ATP-dependent RNA helicase DEAD-box conserved site (VLDEADEML) and the RNA helicase DEAD-box type Q-motif, in the sheet-turn-helix and α-helical regions, respectively. All these conserved motifs are responsible for responses during developmental stages and for stress tolerance in plants.
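    Two of the quantities mentioned above, percent sequence identity to the template and the location of a signature such as VLDEADEML, can be computed as follows. Percent identity has several definitions in use. This sketch assumes identical residues are counted over alignment columns where neither sequence has a gap, which may differ from what Modeller reports. The sequences are hypothetical.

```python
import re


def percent_identity(aln_a, aln_b):
    """Identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs) if pairs else 0.0


def find_signature(seq, pattern="VLDEADEML"):
    """0-based start positions of a signature (a plain string or regex) in a sequence."""
    return [m.start() for m in re.finditer(pattern, seq)]


print(percent_identity("AC-GT", "ACTGA"))       # 75.0
print(find_signature("GGVLDEADEMLKK"))         # [2]
print(find_signature("GGVLDEADEMLKK", "DEAD"))  # [4]
```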

  11. Dimensional Reduction for the General Markov Model on Phylogenetic Trees.

    Science.gov (United States)

    Sumner, Jeremy G

    2017-03-01

    We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.
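    The random variables in question are site pattern counts: the number of alignment columns showing each combination of states across the taxa. The sketch below tallies them and compares the number of possible DNA patterns, 4^n, with a quadratic quantity, n^2, to show the scale of the reduction. The specific linear combinations from the paper are not reproduced here, and the alignment is hypothetical.

```python
from collections import Counter


def site_pattern_counts(alignment):
    """Count each column pattern across aligned sequences (one string per taxon)."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    return Counter("".join(seq[i] for seq in alignment) for i in range(n_sites))


aln = ["AACGT", "AACGT", "AAGGT"]  # hypothetical, one row per taxon
counts = site_pattern_counts(aln)
n_taxa = len(aln)
print(counts["AAA"], len(counts))  # 2 4
print(4 ** n_taxa, n_taxa ** 2)    # 64 9
```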

  12. MotifNet: a web-server for network motif analysis.

    Science.gov (United States)

    Smoly, Ilan Y; Lerman, Eugene; Ziv-Ukelson, Michal; Yeger-Lotem, Esti

    2017-06-15

    Network motifs are small topological patterns that recur in a network significantly more often than expected by chance. Their identification has emerged as a powerful approach for uncovering the design principles underlying complex networks. However, available tools for network motif analysis typically require download and execution of computationally intensive software on a local computer. We present MotifNet, the first open-access web-server for network motif analysis. MotifNet allows researchers to analyze integrated networks, where nodes and edges may be labeled, and to search for motifs of up to eight nodes. The output motifs are presented graphically and the user can interactively filter them by their significance, number of instances, node and edge labels, and node identities, and view their instances. MotifNet also allows the user to distinguish between motifs that are centered on specific nodes and motifs that recur in distinct parts of the network. MotifNet is freely available at http://netbio.bgu.ac.il/motifnet . The website was implemented using ReactJs and supports all major browsers. The server interface was implemented in Python with data stored in a MySQL database. Contact: estiyl@bgu.ac.il or michaluz@cs.bgu.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
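    "More often than expected by chance" is commonly made concrete by comparing a motif's count in the real network with its counts in randomized networks. The sketch below does this for one example motif, the three-node feed-forward loop, using degree-preserving edge swaps as the null model. It is a generic illustration and not MotifNet's implementation. The choice of motif, the number of swaps and the number of random networks are assumptions.

```python
import random


def count_ffl(edges):
    """Count feed-forward loops x->y, y->z, x->z with distinct x, y, z."""
    out = {}
    for a, b in set(edges):
        out.setdefault(a, set()).add(b)
    count = 0
    for x, targets in out.items():
        for y in targets:
            if y == x:
                continue
            for z in out.get(y, ()):
                if z not in (x, y) and z in targets:
                    count += 1
    return count


def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization: swap (a,b),(c,d) -> (a,d),(c,b) when valid."""
    E = list(dict.fromkeys(edges))  # drop duplicates, keep order
    present = set(E)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(E)), 2)
        (a, b), (c, d) = E[i], E[j]
        # Skip swaps that would create self-loops or duplicate edges.
        if len({a, b, c, d}) < 4 or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        E[i], E[j] = (a, d), (c, b)
    return E


def ffl_zscore(edges, n_random=100, seed=0):
    """Observed FFL count, mean count in randomized networks, and a z-score."""
    rng = random.Random(seed)
    observed = count_ffl(edges)
    null = [count_ffl(rewire(edges, 10 * len(edges), rng)) for _ in range(n_random)]
    mean = sum(null) / n_random
    sd = (sum((v - mean) ** 2 for v in null) / n_random) ** 0.5
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, mean, z


edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(count_ffl(edges))  # 1
```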

  13. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    Directory of Open Access Journals (Sweden)

    Lynch Michael

    2010-05-01

    Full Text Available Abstract Background In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties in generating reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remain a virtually unexplored issue. Results By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly than motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions Our observations (1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; (2) are expected to facilitate the understanding of the modulation of ribosomal gene expression in Paramecium; and (3) reveal a largely unexplored association, presumably not restricted to Paramecium, between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  14. Identification of sequence motifs significantly associated with antisense activity

    Directory of Open Access Journals (Sweden)

    Peek Andrew S

    2007-06-01

    Full Text Available Abstract Background Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper discusses the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, an analysis of their properties, and the results of support vector machine modelling using these significant motifs as features. Results We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, include several motifs that have previously been found to associate highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs. Conclusion The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic
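    The abstract does not detail its randomization procedure. One generic version is a one-sided permutation test: compare the mean activity of sequences containing a motif with the mean of randomly drawn subsets of the same size. The sketch below implements that idea and is not the authors' exact procedure. The sequences, activities and number of permutations are hypothetical.

```python
import random


def motif_permutation_pvalue(seqs, activity, motif, n_perm=2000, seed=0):
    """One-sided permutation p-value that sequences containing `motif`
    have higher mean activity than random subsets of the same size."""
    rng = random.Random(seed)
    has = [motif in s for s in seqs]
    k = sum(has)
    if k == 0 or k == len(seqs):
        return 1.0  # nothing to compare against
    observed = sum(a for a, h in zip(activity, has) if h) / k
    extreme = 0
    for _ in range(n_perm):
        sample = rng.sample(activity, k)
        if sum(sample) / k >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one keeps the estimate above zero


# Hypothetical oligos: the four containing "GCG" are all highly active.
seqs = ["AGCGT", "GCGAA", "TTGCG", "GCGCC", "ATATA",
        "CCCCC", "TTTTT", "AAAAA", "ACACA", "TGTGT"]
activity = [1.0] * 4 + [0.0] * 6
print(motif_permutation_pvalue(seqs, activity, "GCG") < 0.05)  # True
```

    The test for low-activity motifs is the mirror image, counting permuted means that are less than or equal to the observed mean.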

  15. Phylogenetic trees

    OpenAIRE

    Baños, Hector; Bushek, Nathaniel; Davidson, Ruth; Gross, Elizabeth; Harris, Pamela E.; Krone, Robert; Long, Colby; Stewart, Allen; Walker, Robert

    2016-01-01

    We introduce the package PhylogeneticTrees for Macaulay2 which allows users to compute phylogenetic invariants for group-based tree models. We provide some background information on phylogenetic algebraic geometry and show how the package PhylogeneticTrees can be used to calculate a generating set for a phylogenetic ideal as well as a lower bound for its dimension. Finally, we show how methods within the package can be used to compute a generating set for the join of any two ideals.

  16. Temporal motifs in time-dependent networks

    International Nuclear Information System (INIS)

    Kovanen, Lauri; Karsai, Márton; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2011-01-01

    Temporal networks are commonly used to represent systems where connections between elements are active only for restricted periods of time, such as telecommunication, neural signal processing, biochemical reaction and human social interaction networks. We introduce the framework of temporal motifs to study the mesoscale topological–temporal structure of temporal networks in which the events of nodes do not overlap in time. Temporal motifs are classes of similar event sequences, where the similarity refers not only to topology but also to the temporal order of the events. We provide a mapping from event sequences to coloured directed graphs that enables an efficient algorithm for identifying temporal motifs. We discuss some aspects of temporal motifs, including causality and null models, and present basic statistics of temporal motifs in a large mobile call network.
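    Building temporal motifs starts from deciding which events belong together in time. The sketch assumes a Δt-adjacency rule: two events are adjacent if they share a node and the later one starts no more than Δt after the earlier one ends. This follows the general framework but is stated here as an assumption, and the later steps (connected event subgraphs, coloured-graph mapping, motif counting) are not shown. Events are hypothetical (node, node, start, duration) tuples.

```python
def dt_adjacent_pairs(events, dt):
    """Pairs (i, j) where event i ends before event j starts, with a gap of at most dt,
    and the two events share at least one node. Events are (u, v, start, duration)."""
    order = sorted(range(len(events)), key=lambda i: events[i][2])
    pairs = []
    for pos, i in enumerate(order):
        ui, vi, ti, di = events[i]
        for j in order[pos + 1:]:
            uj, vj, tj, _ = events[j]
            gap = tj - (ti + di)
            if gap > dt:
                break  # later events start even later, so the gap only grows
            if gap >= 0 and {ui, vi} & {uj, vj}:
                pairs.append((i, j))
    return pairs


# Hypothetical call events: (caller, callee, start, duration).
events = [("A", "B", 0, 1), ("B", "C", 2, 1), ("C", "D", 10, 1), ("A", "E", 1.5, 0)]
print(dt_adjacent_pairs(events, dt=2))  # [(0, 3), (0, 1)]
```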

  17. Molecular phylogenetics and comparative modeling of HEN1, a methyltransferase involved in plant microRNA biogenesis

    Directory of Open Access Journals (Sweden)

    Obarska Agnieszka

    2006-01-01

    Full Text Available Abstract Background Recently, HEN1 protein from Arabidopsis thaliana was discovered as an essential enzyme in plant microRNA (miRNA) biogenesis. HEN1 transfers a methyl group from S-adenosylmethionine to the 2'-OH or 3'-OH group of the last nucleotide of miRNA/miRNA* duplexes produced by the nuclease Dicer. Previously it was found that HEN1 possesses a Rossmann-fold methyltransferase (RFM) domain and a long N-terminal extension including a putative double-stranded RNA-binding motif (DSRM). However, little is known about the details of the structure and the mechanism of action of this enzyme, or about its phylogenetic origin. Results Extensive database searches were carried out to identify orthologs and close paralogs of HEN1. Based on the multiple sequence alignment, a phylogenetic tree of the HEN1 family was constructed. The fold-recognition approach was used to identify related methyltransferases with experimentally solved structures and to guide the homology modeling of the HEN1 catalytic domain. Additionally, we identified a La-like predicted RNA-binding domain located C-terminally to the DSRM domain and a domain with a peptide prolyl cis/trans isomerase (PPIase) fold, but without the conserved PPIase active site, located N-terminally to the catalytic domain. Conclusion The bioinformatics analysis revealed that the catalytic domain of HEN1 is not closely related to any known RNA:2'-OH methyltransferases (e.g. to the RrmJ/fibrillarin superfamily), but rather to small-molecule methyltransferases. The structural model was used as a platform to identify the putative active site and substrate-binding residues of HEN1 and to propose its mechanism of action.

  18. Epitope discovery with phylogenetic hidden Markov models.

    LENUS (Irish Health Repository)

    Lacerda, Miguel

    2010-05-01

    Existing methods for the prediction of immunologically active T-cell epitopes are based on the amino acid sequence or structure of pathogen proteins. Additional information regarding the locations of epitopes may be acquired by considering the evolution of viruses in hosts with different immune backgrounds. In particular, immune-dependent evolutionary patterns at sites within or near T-cell epitopes can be used to enhance epitope identification. We have developed a mutation-selection model of T-cell epitope evolution that allows the human leukocyte antigen (HLA) genotype of the host to influence the evolutionary process. This is one of the first examples of the incorporation of environmental parameters into a phylogenetic model and has many other potential applications where the selection pressures exerted on an organism can be related directly to environmental factors. We combine this novel evolutionary model with a hidden Markov model to identify contiguous amino acid positions that appear to evolve under immune pressure in the presence of specific host immune alleles and that therefore represent potential epitopes. This phylogenetic hidden Markov model provides a rigorous probabilistic framework that can be combined with sequence or structural information to improve epitope prediction. As a demonstration, we apply the model to a data set of HIV-1 protein-coding sequences and host HLA genotypes.
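    The HMM layer, which turns per-site evidence into contiguous candidate epitopes, can be sketched with Viterbi decoding over two hidden states (0 = no immune pressure, 1 = immune pressure). In the model described above, the per-site emission terms would come from the phylogenetic mutation-selection model conditioned on host HLA alleles. Here they are made-up log-likelihoods. The abstract also does not say whether the authors decode with Viterbi or with posterior probabilities, so Viterbi is one standard choice rather than their confirmed method.

```python
import math


def viterbi(site_loglik, log_trans, log_init):
    """Most probable hidden-state path.
    site_loglik[i][s]: log-likelihood of site i under state s.
    log_trans[r][s]: log P(state s at site i+1 | state r at site i)."""
    n_states = len(log_init)
    score = [log_init[s] + site_loglik[0][s] for s in range(n_states)]
    back = []
    for lik in site_loglik[1:]:
        ptr, new = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            ptr.append(best)
            new.append(score[best] + log_trans[best][s] + lik[s])
        back.append(ptr)
        score = new
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace back the best predecessors
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical evidence: sites 2-4 fit the immune-pressure state much better.
site_loglik = [[-0.5, -4.0]] * 2 + [[-4.0, -0.5]] * 3 + [[-0.5, -4.0]] * 2
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5)] * 2
print(viterbi(site_loglik, log_trans, log_init))  # [0, 0, 1, 1, 1, 0, 0]
```

    The "sticky" transition probabilities favor runs of the same state, which is what makes the decoded segments contiguous, as in an epitope.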

  19. Phylogenetic comparative methods on phylogenetic networks with reticulations.

    Science.gov (United States)

    Bastide, Paul; Solís-Lemus, Claudia; Kriebel, Ricardo; Sparks, K William; Ané, Cécile

    2018-04-25

    The goal of Phylogenetic Comparative Methods (PCMs) is to study the distribution of quantitative traits among related species. The observed traits are often seen as the result of a Brownian Motion (BM) along the branches of a phylogenetic tree. Reticulation events such as hybridization, gene flow or horizontal gene transfer can substantially affect a species' traits but are not modeled by a tree. Phylogenetic networks have been designed to represent reticulate evolution. As they become available for downstream analyses, new models of trait evolution applicable to networks are needed. One natural extension of BM is to use a weighted-average model for the trait of a hybrid at a reticulation point. We develop here an efficient recursive algorithm to compute the phylogenetic variance matrix of a trait on a network in only one preorder traversal of the network. We then extend the standard PCM tools to this new framework, including phylogenetic regression with covariates (or phylogenetic ANOVA), ancestral trait reconstruction, and Pagel's λ test of phylogenetic signal. The trait of a hybrid is sometimes outside the range of its two parents, for instance because of hybrid vigor or hybrid depression. These two phenomena are rather commonly observed in present-day hybrids. Transgressive evolution can be modeled as a shift in the trait value following a reticulation point. We develop a general framework to handle such shifts, and take advantage of the phylogenetic regression view of the problem to design statistical tests for ancestral transgressive evolution in the evolutionary history of a group of species. We study the power of these tests in several scenarios, and show that recent events indeed have the strongest impact on the trait distribution of present-day taxa. We apply these methods to a dataset of Xiphophorus fishes to confirm and complete previous analyses in this group.
All the methods developed here are available in the Julia package PhyloNetworks.
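    Under the weighted-average model, the hybrid trait is X_h = Σ_i γ_i (X_{p_i} + ε_i), where ε_i is the Brownian increment on hybrid edge i. This gives Var(h) = Σ_i γ_i² t_i + Σ_i Σ_j γ_i γ_j Cov(p_i, p_j) and Cov(h, k) = Σ_i γ_i Cov(p_i, k). A tree node is the special case of one parent with γ = 1. The sketch below fills the matrix in a single preorder pass using these recursions. It assumes unit-rate BM, a root with zero variance, and nodes listed in preorder as (node, [(parent, branch_length, gamma), ...]). It is an illustration of the recursion, not the PhyloNetworks code.

```python
def network_vcv(nodes):
    """BM covariance between all nodes of a network, filled in one preorder pass.
    `nodes`: list of (name, [(parent, length, gamma), ...]) with parents listed first."""
    cov = {}
    done = []
    for v, parents in nodes:
        if not parents:
            cov[(v, v)] = 0.0  # root: fixed starting value
        else:
            # Covariance with every earlier node is the gamma-weighted parent covariance.
            for k in done:
                c = sum(g * cov[(p, k)] for p, _, g in parents)
                cov[(v, k)] = cov[(k, v)] = c
            var = sum(g * g * t for _, t, g in parents)  # new variance on incoming edges
            for p, _, gp in parents:
                for q, _, gq in parents:
                    var += gp * gq * cov[(p, q)]
            cov[(v, v)] = var
        done.append(v)
    return cov


# Hypothetical network: two lineages from the root merge into hybrid h with weights 0.5.
nodes = [
    ("r", []),
    ("p1", [("r", 1.0, 1.0)]),
    ("p2", [("r", 1.0, 1.0)]),
    ("h", [("p1", 0.0, 0.5), ("p2", 0.0, 0.5)]),
]
cov = network_vcv(nodes)
print(cov[("h", "h")], cov[("h", "p1")])  # 0.5 0.5
```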

  20. Model checking software for phylogenetic trees using distribution and database methods

    Directory of Open Access Journals (Sweden)

    Requeno José Ignacio

    2013-12-01

    Full Text Available Model checking, a generic and formal paradigm from computer science based on temporal logics, has been proposed for the study of biological properties that emerge from the labeling of the states defined over the phylogenetic tree. This strategy allows us to use generic software tools already present in industry. However, the performance of traditional model checking degrades when the system is scaled to large phylogenies. To address this, two strategies are presented here. The first consists of partitioning the phylogenetic tree into a set of subgraphs, each representing a subproblem to be verified, so as to speed up the computation and distribute the memory consumption. The second strategy is based on uncoupling the information associated with each state of the phylogenetic tree (mainly the DNA sequence) and exporting it to an external tool for the management of large information systems. The integration of these approaches outperforms monolithic model checking and enables the verification of properties in a real phylogenetic tree.

  1. Statistical tests to compare motif count exceptionalities

    Directory of Open Access Journals (Sweden)

    Vandewalle Vincent

    2007-03-01

    Full Text Available Abstract Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculations for motif counts, exceptional motifs can be identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Simply comparing the motif count p-values in each sequence is not sufficient to decide whether this motif is significantly more exceptional in one sequence than in the other; a statistical test is required. Results We develop and analyze two statistical tests, an exact binomial test and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly suited to small counts. For large counts, we advise using the likelihood ratio test, which is asymptotic but strongly correlated with the exact binomial test and very simple to use.
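    The idea behind an exact binomial test for two Poisson counts is a textbook result. If N1 ~ Poisson(λ1) and N2 ~ Poisson(λ2), then conditional on N1 + N2 = n, N1 follows Binomial(n, λ1 / (λ1 + λ2)). Under the null hypothesis of equal exceptionality, λ1 / λ2 equals the ratio of the expected counts e1 / e2 from each sequence's background model. The sketch below computes a two-sided p-value this way. It is an illustration of the principle and not the paper's exact test, which also handles overlapping occurrences. The counts are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def compare_exceptionality(n1, n2, e1, e2):
    """Two-sided exact test that a motif is equally exceptional in two sequences.
    n1, n2: observed counts; e1, e2: expected counts under each background model."""
    n = n1 + n2
    p = e1 / (e1 + e2)
    observed = binom_pmf(n1, n, p)
    # Sum probabilities of all outcomes at least as unlikely as the observed one.
    total = sum(binom_pmf(k, n, p) for k in range(n + 1)
                if binom_pmf(k, n, p) <= observed * (1 + 1e-9))
    return min(1.0, total)


print(compare_exceptionality(5, 5, 2.0, 2.0))   # close to 1.0: no evidence of a difference
print(compare_exceptionality(10, 0, 2.0, 2.0))  # 2/1024, about 0.00195
```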

  2. A Model of Desired Performance in Phylogenetic Tree Construction for Teaching Evolution.

    Science.gov (United States)

    Brewer, Steven D.

    This research paper examines phylogenetic tree construction, a form of problem solving in biology, by studying the strategies and heuristics used by experts. One result of the research is the development of a model of desired performance for phylogenetic tree construction. A detailed description of the model and the sample problems which illustrate…

  3. Analyzing Phylogenetic Trees with Timed and Probabilistic Model Checking: The Lactose Persistence Case Study.

    Science.gov (United States)

    Requeno, José Ignacio; Colom, José Manuel

    2014-12-01

    Model checking is a generic verification technique that allows the phylogeneticist to focus on models and specifications instead of on implementation issues. Phylogenetic trees are considered as transition systems over which we interrogate phylogenetic questions written as formulas of temporal logic. Nonetheless, standard logics become insufficient for certain practices of phylogenetic analysis since they do not allow the inclusion of explicit time and probabilities. The aim of this paper is to extend the application of model checking techniques beyond qualitative phylogenetic properties and adapt the existing logical extensions and tools to the field of phylogeny. The introduction of time and probabilities in phylogenetic specifications is motivated by the study of a real example: the analysis of the ratio of lactose intolerance in some populations and the date of appearance of this phenotype.

  4. [Personal motif in art].

    Science.gov (United States)

    Gerevich, József

    2015-01-01

    One of the basic questions of art psychology is whether a personal motif is to be found behind works of art and, if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allows us to conclude that the personal motif, which the viewer can identify through symbols (at times easily, at others with more difficulty), gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication with the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt, etc.), in self-searching, or in self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusory, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy.

  5. MotifMark: Finding regulatory motifs in DNA sequences.

    Science.gov (United States)

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

    2017-07-01

    The interaction between proteins and DNA is a key driving force in many biological processes, such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in the precise characterization of motifs; as a result, they require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity with regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for predicting motifs from protein binding microarrays and possibly other related high-throughput techniques.

  6. MSDmotif: exploring protein sites and motifs

    Directory of Open Access Journals (Sweden)

    Henrick Kim

    2008-07-01

    Background: Protein structures have conserved features, motifs, which have a significant influence on protein function. These motifs can be found in sequence as well as in 3D space. Understanding these fragments is essential for 3D structure prediction, modelling and drug design. The Protein Data Bank (PDB) is the source of this information; however, present search tools have limited 3D options to integrate protein sequence with its 3D structure. Results: We describe here a web application for querying the PDB for ligands, binding sites, small 3D structural and sequence motifs, and the underlying database. Novel algorithms for chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and small 3D structural motif association searches are incorporated. The interface provides functionality for visualization, search criteria creation, and sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino acid within a motif, correlation of amino-acid side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. Conclusion: MSDmotif is unique in combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures.

  7. Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.

    Science.gov (United States)

    Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A

    2018-01-30

    Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
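For orientation, the standard Chinese restaurant process that the phylogenetic variant extends can be sampled in a few lines: each item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to the concentration parameter alpha. This sketch deliberately omits the phylogenetic dependency structure that is the paper's contribution; `crp_assignments` is a hypothetical name.

```python
import random
from typing import List


def crp_assignments(n: int, alpha: float, rng: random.Random) -> List[int]:
    """Draw cluster labels for n items from a standard (non-phylogenetic)
    Chinese restaurant process with concentration parameter alpha."""
    labels: List[int] = []
    sizes: List[int] = []  # current size of each cluster
    for _ in range(n):
        # Existing cluster k has weight sizes[k]; a new cluster has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


labels = crp_assignments(20, 1.0, random.Random(1))
print(labels)
```

In the standard process the prior treats items as exchangeable; the abstract's extension instead lets the tree relating the strains influence which strains share a cluster.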

  8. The limits of de novo DNA motif discovery.

    Directory of Open Access Journals (Sweden)

    David Simcha

    A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify "motifs" that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery, that is, searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA "background" sequence model to represent the null hypothesis, and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are "too null," resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed on held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where "ground truth" is known, the discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced "over-fitting" in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not biologically active. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary.
An implementation of
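To make the notion of a background model concrete, the sketch below fits a first-order Markov model to a set of background sequences and scores a word under it. Fitting on real non-target sequences, rather than assuming uniform base frequencies, is in the spirit of the abstract's recommendation; the function names are hypothetical, the starting distribution is simply the overall base frequencies, and no pseudocounts are used, so unseen transitions get probability zero.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def fit_markov1(seqs: Iterable[str]) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Estimate base frequencies and first-order transition probabilities
    P(next base | current base) from background sequences."""
    base: Counter = Counter()
    trans: Counter = Counter()
    for s in seqs:
        base.update(s)
        trans.update(zip(s, s[1:]))
    total = sum(base.values())
    init = {b: c / total for b, c in base.items()}
    outgoing: Counter = Counter()
    for (a, _), c in trans.items():
        outgoing[a] += c
    cond = {(a, b): c / outgoing[a] for (a, b), c in trans.items()}
    return init, cond


def word_prob(word: str, init, cond) -> float:
    """Probability of emitting `word` at a given position under the model."""
    p = init.get(word[0], 0.0)
    for a, b in zip(word, word[1:]):
        p *= cond.get((a, b), 0.0)
    return p


init, cond = fit_markov1(["ACGTACGT"])
print(word_prob("ACG", init, cond))  # 0.25
```

A word that is rare under such a fitted model but frequent in the target sequences is a stronger candidate than one that only looks exceptional against a uniform background.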

  9. Computational analyses of synergism in small molecular network motifs.

    Directory of Open Access Journals (Sweden)

    Yili Zhang

    2014-03-01

    Cellular functions and responses to stimuli are controlled by complex regulatory networks that comprise a large diversity of molecular components and their interactions. However, achieving an intuitive understanding of the dynamical properties and responses to stimuli of these networks is hampered by their large scale and complexity. To address this issue, analyses of regulatory networks often focus on reduced models that depict distinct, recurring connectivity patterns referred to as motifs. Previous modeling studies have begun to characterize the dynamics of small motifs and to describe ways in which variations in parameters affect their responses to stimuli. The present study investigates how variations in pairs of parameters affect responses in a series of ten common network motifs, identifying concurrent variations that act synergistically (or antagonistically) to alter the responses of the motifs to stimuli. Synergism (or antagonism) was quantified using degrees of nonlinear blending and additive synergism. Simulations identified concurrent variations that maximized synergism and examined the ways in which it was affected by stimulus protocols and the architecture of a motif. Only a subset of architectures exhibited synergism following paired changes in parameters. The approach was then applied to a model describing interlocked feedback loops governing the synthesis of the CREB1 and CREB2 transcription factors. The effects of motifs on synergism for this biologically realistic model were consistent with those for the abstract models of single motifs. These results have implications for the rational design of combination drug therapies with the potential for synergistic interactions.
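The two kinds of measure can be illustrated with generic definitions, which are assumptions here rather than the study's exact formulas: additive synergism compares the combined change in a response with the sum of the single-parameter changes, and a blending-style criterion asks whether the combined change exceeds the larger of the single changes. The function names are hypothetical.

```python
def additive_synergy(base: float, only_a: float, only_b: float, both: float) -> float:
    """Combined effect minus the sum of single effects.
    > 0 suggests synergism, < 0 antagonism, 0 a purely additive interaction."""
    return (both - base) - ((only_a - base) + (only_b - base))


def exceeds_larger_single(base: float, only_a: float, only_b: float, both: float) -> bool:
    """True if the combined effect is larger than either single effect."""
    return (both - base) > max(only_a - base, only_b - base)


# Each change alone raises the response from 1.0 to 2.0; together, to 4.0.
print(additive_synergy(1.0, 2.0, 2.0, 4.0))       # 1.0
print(exceeds_larger_single(1.0, 2.0, 2.0, 2.5))  # True
```

The second example shows why the two criteria differ: a combined response of 2.5 beats either single change yet is sub-additive (additive_synergy would be -0.5).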

  10. Identifiability of tree-child phylogenetic networks under a probabilistic recombination-mutation model of evolution.

    Science.gov (United States)

    Francis, Andrew; Moulton, Vincent

    2018-06-07

    Phylogenetic networks are an extension of phylogenetic trees which are used to represent evolutionary histories in which reticulation events (such as recombination and hybridization) have occurred. A central question for such networks is that of identifiability, which essentially asks under what circumstances we can reliably identify the phylogenetic network that gave rise to the observed data. Recently, identifiability results have appeared for networks relative to a model of sequence evolution that generalizes the standard Markov models used for phylogenetic trees. However, these results are quite limited in terms of the complexity of the networks that are considered. In this paper, by introducing an alternative probabilistic model for evolution along a network that is based on some ground-breaking work by Thatte for pedigrees, we are able to obtain an identifiability result for a much larger class of phylogenetic networks (essentially the class of so-called tree-child networks). To prove our main theorem, we derive some new results for identifying tree-child networks combinatorially, and then adapt some techniques developed by Thatte for pedigrees to show that our combinatorial results imply identifiability in the probabilistic setting. We hope that the introduction of our new model for networks could lead to new approaches to reliably construct phylogenetic networks. Copyright © 2018 Elsevier Ltd. All rights reserved.
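Membership in the tree-child class is easy to check combinatorially: under the usual definition, a network is tree-child if every non-leaf vertex has at least one child of in-degree 1 (a tree vertex or a leaf). A minimal sketch over a directed edge list; `is_tree_child` is a hypothetical name, and it does not verify that the input is a rooted acyclic network.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def is_tree_child(edges: Iterable[Tuple[str, str]]) -> bool:
    """True if every non-leaf vertex has at least one child of in-degree 1."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    # Only vertices with outgoing edges are non-leaves.
    return all(
        any(indegree[c] == 1 for c in kids)
        for kids in children.values()
    )


# Reticulation h has a tree-vertex sibling under both parents: tree-child.
ok = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"), ("h", "x"), ("a", "y"), ("b", "z")]
# Both children of "a" are reticulations: not tree-child.
bad = [("r", "a"), ("r", "b"), ("a", "h1"), ("b", "h1"), ("a", "h2"), ("b", "h2"),
       ("h1", "x"), ("h2", "y")]
print(is_tree_child(ok), is_tree_child(bad))  # True False
```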

  11. Regulatory elements of the floral homeotic gene AGAMOUS identified by phylogenetic footprinting and shadowing.

    Energy Technology Data Exchange (ETDEWEB)

    Hong, R. L.; Hamaguchi, L.; Busch, M. A.; Weigel, D.

    2003-06-01

    In Arabidopsis thaliana, cis-regulatory sequences of the floral homeotic gene AGAMOUS (AG) are located in the second intron. This 3 kb intron contains binding sites for two direct activators of AG, LEAFY (LFY) and WUSCHEL (WUS), along with other putative regulatory elements. We have used phylogenetic footprinting and the related technique of phylogenetic shadowing to identify putative cis-regulatory elements in this intron. Among 29 Brassicaceae, several other motifs, but not the LFY and WUS binding sites previously identified, are largely invariant. Using reporter gene analyses, we tested six of these motifs and found that they are all functionally important for activity of AG regulatory sequences in A. thaliana. Although there is little obvious sequence similarity outside the Brassicaceae, the intron from cucumber AG has at least partial activity in A. thaliana. Our studies underscore the value of the comparative approach as a tool that complements gene-by-gene promoter dissection, but also highlight that sequence-based studies alone are insufficient for a complete identification of cis-regulatory sites.
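Phylogenetic shadowing looks for stretches that are invariant across many closely related sequences. A toy sketch, assuming an already computed, equal-length alignment with "-" for gaps (the alignment step and any scoring of nearly invariant columns are omitted); `conserved_blocks` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def conserved_blocks(alignment: Sequence[str], min_len: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) column ranges of at least min_len columns that
    are identical (and ungapped) in every aligned sequence."""
    n_cols = len(alignment[0])
    invariant = [
        alignment[0][i] != "-" and all(s[i] == alignment[0][i] for s in alignment)
        for i in range(n_cols)
    ]
    blocks: List[Tuple[int, int]] = []
    start = None
    for i, ok in enumerate(invariant + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks


print(conserved_blocks(["ACGTTA", "ACGATA", "ACGTTA"], min_len=2))  # [(0, 3), (4, 6)]
```

With 29 species, as in the abstract, strict invariance is a demanding filter; real analyses typically tolerate some variation, which this sketch does not.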

  12. Verification of the MOTIF code version 3.0

    International Nuclear Information System (INIS)

    Chan, T.; Guvanasen, V.; Nakka, B.W.; Reid, J.A.K.; Scheier, N.W.; Stanchell, F.W.

    1996-12-01

    As part of the Canadian Nuclear Fuel Waste Management Program (CNFWMP), AECL has developed a three-dimensional finite-element code, MOTIF (Model Of Transport In Fractured/Porous media), for detailed modelling of groundwater flow, heat transport and solute transport in a fractured rock mass. The code solves the transient and steady-state equations of groundwater flow, solute (including one-species radionuclide) transport, and heat transport in variably saturated fractured/porous media. The initial development was completed in 1985 (Guvanasen 1985) and version 3.0 was completed in 1986. This version is documented in detail in Guvanasen and Chan (in preparation). This report describes a series of fourteen verification cases which have been used to test the numerical solution techniques and coding of MOTIF, as well as to demonstrate some of the MOTIF analysis capabilities. For each case the MOTIF solution has been compared with a corresponding analytical or independently developed alternative numerical solution. Several of the verification cases were included in Level 1 of the International Hydrologic Code Intercomparison Project (HYDROCOIN). The MOTIF results for these cases were also described in the HYDROCOIN Secretariat's compilation and comparison of results submitted by the various project teams (Swedish Nuclear Power Inspectorate 1988). The graphical comparisons presented show that the MOTIF solutions for the fourteen verification cases are generally in excellent agreement with known analytical or numerical solutions obtained from independent sources. This series of verification studies has established the ability of the MOTIF finite-element code to accurately model the groundwater flow and solute and heat transport phenomena for which it is intended. (author). 20 refs., 14 tabs., 32 figs.
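The verification strategy (a numerical solution compared against an analytical one) can be illustrated on a much simpler problem than MOTIF addresses: 1D diffusion of a solute into an initially clean domain with a fixed unit concentration at x = 0, whose semi-infinite analytical solution is C(x, t) = erfc(x / (2 sqrt(D t))). This sketch uses an explicit finite-difference scheme, not MOTIF's finite-element method, and all parameter values are arbitrary assumptions.

```python
import math


def ftcs_vs_erfc(D: float = 1.0, length: float = 2.0, nx: int = 101,
                 t_end: float = 0.1, r: float = 0.25) -> float:
    """Solve dC/dt = D d2C/dx2 with C(0, t) = 1, C(length, t) = 0, C(x, 0) = 0
    by explicit finite differences and return the maximum absolute deviation
    from the semi-infinite analytical solution erfc(x / (2 sqrt(D t)))."""
    dx = length / (nx - 1)
    dt = r * dx * dx / D  # r <= 0.5 keeps the explicit scheme stable
    steps = int(round(t_end / dt))
    c = [0.0] * nx
    c[0] = 1.0  # fixed-concentration boundary
    for _ in range(steps):
        new = c[:]  # boundary values are carried over unchanged
        for i in range(1, nx - 1):
            new[i] = c[i] + r * (c[i + 1] - 2.0 * c[i] + c[i - 1])
        c = new
    t = steps * dt
    exact = [math.erfc(i * dx / (2.0 * math.sqrt(D * t))) for i in range(nx)]
    return max(abs(a - b) for a, b in zip(c, exact))


print(ftcs_vs_erfc())  # a small maximum deviation for these settings
```

The domain is chosen long enough that the far boundary barely affects the solution at t_end, so the finite grid is a fair stand-in for the semi-infinite analytical case.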

  13. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Directory of Open Access Journals (Sweden)

    Guido W. Grimm

    2006-01-01

    The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~ 600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly.

  14. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Science.gov (United States)

    Grimm, Guido W.; Renner, Susanne S.; Stamatakis, Alexandros; Hemleben, Vera

    2007-01-01

    The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~ 600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly. PMID:19455198
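Ambiguity coding collapses a taxon's cloned sequences into one consensus in which each polymorphic column is written with the matching IUPAC code (for example, C/T becomes Y). A minimal sketch, assuming ungapped, aligned, uppercase A/C/G/T clones; `ambiguity_consensus` is a hypothetical name, and the authors may have applied additional rules (such as frequency thresholds) that are not modeled here.

```python
from typing import Sequence

# IUPAC nucleotide codes keyed by the sorted set of bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def ambiguity_consensus(clones: Sequence[str]) -> str:
    """Column-wise consensus of aligned, equal-length A/C/G/T sequences,
    writing each polymorphic column as its IUPAC ambiguity code."""
    return "".join(
        IUPAC["".join(sorted(set(column)))] for column in zip(*clones)
    )


print(ambiguity_consensus(["ACGT", "ATGA", "ACGA"]))  # AYGW
```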

  15. Discriminative motif discovery via simulated evolution and random under-sampling.

    Directory of Open Access Journals (Sweden)

    Tao Song

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted increasing attention. However, little attention has been devoted to the data imbalance problem, which is one of the main factors affecting the performance of discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the data preprocessing stage, and at the Hidden Markov Model (HMM) training stage a random under-sampling method is introduced to address the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than those found by methods that do not consider the data imbalance problem, and that they recover most of the known targeting motifs from Minimotif Miner and InterPro. We also use the discovered motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.

  16. Discriminative motif discovery via simulated evolution and random under-sampling.

    Science.gov (United States)

    Song, Tao; Gu, Hong

    2014-01-01

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted increasing attention. However, little attention has been devoted to the data imbalance problem, which is one of the main factors affecting the performance of discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the data preprocessing stage, and at the Hidden Markov Model (HMM) training stage a random under-sampling method is introduced to address the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than those found by methods that do not consider the data imbalance problem, and that they recover most of the known targeting motifs from Minimotif Miner and InterPro. We also use the discovered motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
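Random under-sampling, as named for the HMM training stage, discards a random subset of the majority class before training. A minimal sketch, assuming the negatives outnumber the positives and a 1:1 target ratio by default; `undersample` is a hypothetical name, and the simulated-evolution preprocessing step is not shown.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def undersample(positives: Sequence[T], negatives: Sequence[T],
                rng: random.Random, ratio: float = 1.0) -> Tuple[List[T], List[T]]:
    """Keep all positives and a random subset of negatives of size
    ratio * len(positives), capped at the number of negatives."""
    k = min(len(negatives), int(round(ratio * len(positives))))
    return list(positives), rng.sample(list(negatives), k)


pos, neg = undersample(list(range(10)), list(range(100, 200)), random.Random(0))
print(len(pos), len(neg))  # 10 10
```

Sampling without replacement keeps every retained negative distinct; a different random seed per training round would expose the model to more of the majority class overall.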

  17. Path integral formulation and Feynman rules for phylogenetic branching models

    Energy Technology Data Exchange (ETDEWEB)

    Jarvis, P D; Bashford, J D; Sumner, J G [School of Mathematics and Physics, University of Tasmania, GPO Box 252C, 7001 Hobart, TAS (Australia)]

    2005-11-04

    A dynamical picture of phylogenetic evolution is given in terms of Markov models on a state space, comprising joint probability distributions for character types of taxonomic classes. Phylogenetic branching is a process which augments the number of taxa under consideration, and hence the rank of the underlying joint probability state tensor. We point out the combinatorial necessity for a second-quantized, or Fock space, setting, incorporating discrete counting labels for taxa and character types, to allow for a description in the number basis. Rate operators describing both time evolution without branching, and also phylogenetic branching events, are identified. A detailed development of these ideas is given, using standard transcriptions from the microscopic formulation of non-equilibrium reaction-diffusion or birth-death processes. These give the relations between stochastic rate matrices, the matrix elements of the corresponding evolution operators representing them, and the integral kernels needed to implement these as path integrals. The 'free' theory (without branching) is solved, and the correct trilinear 'interaction' terms (representing branching events) are presented. The full model is developed in perturbation theory via the derivation of explicit Feynman rules which establish that the probabilities (pattern frequencies of leaf colourations) arising as matrix elements of the time evolution operator are identical with those computed via the standard analysis. Simple examples (phylogenetic trees with two or three leaves) are discussed in detail. Further implications for the work are briefly considered, including the role of time reparametrization covariance.

  18. Path integral formulation and Feynman rules for phylogenetic branching models

    International Nuclear Information System (INIS)

    Jarvis, P D; Bashford, J D; Sumner, J G

    2005-01-01

    A dynamical picture of phylogenetic evolution is given in terms of Markov models on a state space, comprising joint probability distributions for character types of taxonomic classes. Phylogenetic branching is a process which augments the number of taxa under consideration, and hence the rank of the underlying joint probability state tensor. We point out the combinatorial necessity for a second-quantized, or Fock space, setting, incorporating discrete counting labels for taxa and character types, to allow for a description in the number basis. Rate operators describing both time evolution without branching, and also phylogenetic branching events, are identified. A detailed development of these ideas is given, using standard transcriptions from the microscopic formulation of non-equilibrium reaction-diffusion or birth-death processes. These give the relations between stochastic rate matrices, the matrix elements of the corresponding evolution operators representing them, and the integral kernels needed to implement these as path integrals. The 'free' theory (without branching) is solved, and the correct trilinear 'interaction' terms (representing branching events) are presented. The full model is developed in perturbation theory via the derivation of explicit Feynman rules which establish that the probabilities (pattern frequencies of leaf colourations) arising as matrix elements of the time evolution operator are identical with those computed via the standard analysis. Simple examples (phylogenetic trees with two or three leaves) are discussed in detail. Further implications for the work are briefly considered, including the role of time reparametrization covariance.
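The "standard analysis" against which the Feynman rules are checked computes leaf-pattern probabilities by summing over the unobserved ancestral state. For the simplest case mentioned, a two-leaf tree, here is a sketch under an assumed symmetric two-state Markov model with switching rate mu, for which the probability of ending in the same state after time t is 1/2 + 1/2 exp(-2 mu t), and a uniform root distribution. The model choice and function names are illustrative assumptions.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple


def transition(t: float, mu: float = 1.0) -> List[List[float]]:
    """2x2 transition matrix of the symmetric two-state model after time t."""
    same = 0.5 + 0.5 * math.exp(-2.0 * mu * t)
    return [[same, 1.0 - same], [1.0 - same, same]]


def two_leaf_patterns(t1: float, t2: float, mu: float = 1.0,
                      root: Sequence[float] = (0.5, 0.5)) -> Dict[Tuple[int, int], float]:
    """P(leaf1 = i, leaf2 = j) = sum_k root[k] * P1[k][i] * P2[k][j]."""
    p1, p2 = transition(t1, mu), transition(t2, mu)
    return {
        (i, j): sum(root[k] * p1[k][i] * p2[k][j] for k in range(2))
        for i, j in product(range(2), repeat=2)
    }


print(two_leaf_patterns(0.0, 0.0))  # {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
```

At zero branch lengths both leaves copy the root, so only the constant patterns have positive probability; as the branch lengths grow, all four patterns approach 0.25.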

  19. DNA motif elucidation using belief propagation.

    Science.gov (United States)

    Wong, Ka-Chun; Chan, Tak-Ming; Peng, Chengbin; Li, Yue; Zhang, Zhaolei

    2013-09-01

    Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8 to 10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagation. We describe an HMM-based approach using belief propagation (kmerHMM), which accepts raw PBM probe data and preprocesses them into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagation. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source code are available at the authors' websites, e.g. http://www.cs.toronto.edu/~wkc/kmerHMM.

  20. DNA motif elucidation using belief propagation

    KAUST Repository

    Wong, Ka-Chun; Chan, Tak-Ming; Peng, Chengbin; Li, Yue; Zhang, Zhaolei

    2013-01-01

    Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8 to 10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagation. We describe an HMM-based approach using belief propagation (kmerHMM), which accepts raw PBM probe data and preprocesses them into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagation. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source code are available at the authors' websites, e.g. http://www.cs.toronto.edu/~wkc/kmerHMM. © 2013 The Author(s).

  1. DNA motif elucidation using belief propagation

    KAUST Repository

    Wong, Ka-Chun

    2013-06-29

    Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8 to 10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagation. We describe an HMM-based approach using belief propagation (kmerHMM), which accepts raw PBM probe data and preprocesses them into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagation. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source code are available at the authors' websites, e.g. http://www.cs.toronto.edu/~wkc/kmerHMM. © 2013 The Author(s).
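The preprocessing step described in these records (reducing probe intensities to a median-binding intensity per k-mer, then ranking the k-mers) might look roughly like the following. It is a sketch under assumptions: probes are given as (sequence, intensity) pairs, each k-mer is counted once per probe, and reverse complements are not merged. `kmer_medians` and `rank_kmers` are hypothetical names, and the HMM training and belief-propagation steps are not shown.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence, Tuple


def kmer_medians(probes: Sequence[Tuple[str, float]], k: int) -> Dict[str, float]:
    """Median intensity of the probes that contain each k-mer."""
    hits: Dict[str, List[float]] = defaultdict(list)
    for seq, intensity in probes:
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            hits[kmer].append(intensity)
    return {kmer: median(vals) for kmer, vals in hits.items()}


def rank_kmers(medians: Dict[str, float]) -> List[str]:
    """K-mers sorted by decreasing median intensity (ties broken alphabetically)."""
    return sorted(medians, key=lambda m: (-medians[m], m))


probes = [("ACGTA", 1.0), ("CGTAA", 3.0), ("TTTTT", 0.5)]
meds = kmer_medians(probes, 3)
print(rank_kmers(meds))  # ['TAA', 'CGT', 'GTA', 'ACG', 'TTT']
```

Using the median rather than the mean makes each k-mer's score less sensitive to a few unusually bright or dim probes.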

  2. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data.

    Science.gov (United States)

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2014-02-20

    ChIP-Seq (chromatin immunoprecipitation sequencing) has made motif finding easier, because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommended that users apply multiple motif finding Web tools that implement different algorithms in order to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data.

  3. Motif formation and industry specific topologies in the Japanese business firm network

    Science.gov (United States)

    Maluck, Julian; Donner, Reik V.; Takayasu, Hideki; Takayasu, Misako

    2017-05-01

    Motifs and roles are basic quantities for the characterization of interactions among 3-node subsets in complex networks. In this work, we investigate how the distribution of 3-node motifs can be influenced by modifying the rules of an evolving network model while keeping the statistics of simpler network characteristics, such as the link density and the degree distribution, invariant. We exemplify this problem for the special case of the Japanese Business Firm Network, where a well-studied and relatively simple yet realistic evolving network model is available, and compare the resulting motif distribution in the real-world and simulated networks. To better approximate the motif distribution of the real-world network in the model, we introduce both subgraph dependent and global additional rules. We find that a specific rule that allows only for the merging process between nodes with similar link directionality patterns reduces the observed excess of densely connected motifs with bidirectional links. Our study improves the mechanistic understanding of motif formation in evolving network models to better describe the characteristic features of real-world networks with a scale-free topology.

  4. PALM: a paralleled and integrated framework for phylogenetic inference with automatic likelihood model selectors.

    Directory of Open Access Journals (Sweden)

    Shu-Hwa Chen

    Full Text Available BACKGROUND: Selecting an appropriate substitution model and deriving a tree topology for a given sequence set are essential in phylogenetic analysis. However, such time-consuming, computationally intensive tasks rely on knowledge of substitution model theories and related expertise to run through all possible combinations of several separate programs. To ensure a thorough and efficient analysis and avert tedious manipulation of various programs, this work presents an intuitive framework, the phylogenetic reconstruction with automatic likelihood model selectors (PALM), with convincing, updated algorithms and a best-fit model selection mechanism for seamless phylogenetic analysis. METHODOLOGY: As an integrated framework of ClustalW, PhyML, MODELTEST, ProtTest, and several in-house programs, PALM evaluates the fitness of 56 substitution models for nucleotide sequences and 112 substitution models for protein sequences with scores in various criteria. The input for PALM can be either sequences in FASTA format or a sequence alignment file in PHYLIP format. To accelerate the computation of maximum likelihood and bootstrapping, this work integrates MPICH2/PhyML, PalmMonitor and Palm job controller across several machines with multiple processors and adopts the task parallelism approach. Moreover, an intuitive and interactive web component, PalmTree, is developed for displaying and operating on the output tree, with options for tree rooting, branch swapping, viewing branch length values and bootstrap scores, and removing nodes to restart the analysis iteratively. SIGNIFICANCE: The workflow of PALM is straightforward and coherent. Via a succinct, user-friendly interface, researchers unfamiliar with phylogenetic analysis can easily use this server to submit sequences, retrieve the output, and re-submit a job based on a previous result if some sequences are to be deleted or added for phylogenetic reconstruction. PALM results in an inference of
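PALM delegates model scoring to tools such as MODELTEST and ProtTest. As a rough sketch of what "scores in various criteria" can mean, the snippet below ranks models by two widely used criteria, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where L is the maximized likelihood, k the number of free model parameters and n the number of alignment sites. The log-likelihood values are invented for illustration and the helper names are hypothetical; this is not PALM's implementation.

```python
import math


def aic(log_l: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_l


def bic(log_l: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L, n = number of sites."""
    return k * math.log(n) - 2 * log_l


def best_model(fits, criterion="aic", n_sites=None):
    """Return (model name, score) with the lowest criterion value.

    fits: dict mapping model name -> (maximized log-likelihood, free parameters).
    """
    scores = {}
    for name, (log_l, k) in fits.items():
        if criterion == "aic":
            scores[name] = aic(log_l, k)
        elif criterion == "bic":
            if n_sites is None:
                raise ValueError("BIC needs the number of alignment sites")
            scores[name] = bic(log_l, k, n_sites)
        else:
            raise ValueError(f"unknown criterion: {criterion}")
    name = min(scores, key=scores.get)
    return name, scores[name]


# Hypothetical log-likelihoods on one fixed tree. Parameter counts cover rates and
# base frequencies only; branch lengths would add the same constant to every model.
fits = {"JC69": (-5230.0, 0), "HKY85": (-5120.0, 4), "GTR": (-5114.0, 8)}
print(best_model(fits, "aic"))                # ('GTR', 10244.0)
print(best_model(fits, "bic", n_sites=1000))  # HKY85 wins under BIC
```

With these made-up numbers AIC prefers GTR, while BIC, whose per-parameter penalty grows with n, prefers HKY85; this is why model-selection tools commonly report several criteria side by side.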

  5. Low-dimensional morphospace of topological motifs in human fMRI brain networks

    Directory of Open Access Journals (Sweden)

    Sarah E. Morgan

    2018-06-01

    Full Text Available We present a low-dimensional morphospace of fMRI brain networks, where axes are defined in a data-driven manner based on the network motifs. The morphospace allows us to identify the key variations in healthy fMRI networks in terms of their underlying motifs, and we observe that two principal components (PCs) can account for 97% of the motif variability. The first PC of the motif distribution is correlated with efficiency and inversely correlated with transitivity. Hence this axis approximately conforms to the well-known economical small-world trade-off between integration and segregation in brain networks. Finally, we show that the economical clustering generative model proposed by Vértes et al. (2012) can approximately reproduce the motif morphospace of the real fMRI brain networks, in contrast to other generative models. Overall, the motif morphospace provides a powerful way to visualize the relationships between network properties and to investigate generative or constraining factors in the formation of complex human brain functional networks. Motifs have been described as the building blocks of complex networks. Meanwhile, a morphospace allows networks to be placed in a common space and can reveal the relationships between different network properties and elucidate the driving forces behind network topology. We combine the concepts of motifs and morphospaces to create the first motif morphospace of fMRI brain networks. Crucially, the morphospace axes are defined by the motifs, in a data-driven manner. We observe strong correlations between the networks’ positions in morphospace and their global topological properties, suggesting that motif morphospaces are a powerful way to capture the topology of networks in a low-dimensional space and to compare generative models of brain networks. Motif morphospaces could also be used to study other complex networks’ topologies.

  6. Phylogenetic turnover during subtropical forest succession across environmental and phylogenetic scales.

    Science.gov (United States)

    Purschke, Oliver; Michalski, Stefan G; Bruelheide, Helge; Durka, Walter

    2017-12-01

    Although spatial and temporal patterns of phylogenetic community structure during succession are inherently interlinked and assembly processes vary with environmental and phylogenetic scales, successional studies of community assembly have yet to integrate spatial and temporal components of community structure, while accounting for scaling issues. To gain insight into the processes that generate biodiversity after disturbance, we combine analyses of spatial and temporal phylogenetic turnover across phylogenetic scales, accounting for covariation with environmental differences. We compared phylogenetic turnover, at the species- and individual-level, within and between five successional stages, representing woody plant communities in a subtropical forest chronosequence. We decomposed turnover at different phylogenetic depths and assessed its covariation with between-plot abiotic differences. Phylogenetic turnover between stages was low relative to species turnover and was not explained by abiotic differences. However, within the late-successional stages, there was high presence-/absence-based turnover (clustering) that occurred deep in the phylogeny and covaried with environmental differentiation. Our results support a deterministic model of community assembly in which (i) phylogenetic composition is constrained through successional time, but (ii) toward late succession, species sorting into preferred habitats according to niche traits that are conserved deep in the phylogeny becomes increasingly important.

  7. Transduction motif analysis of gastric cancer based on a human signaling network

    Energy Technology Data Exchange (ETDEWEB)

    Liu, G.; Li, D.Z.; Jiang, C.S.; Wang, W. [Department of Gastroenterology, Fuzhou General Hospital of Nanjing Command, Fuzhou (China)]

    2014-04-04

    To investigate signal regulation models of gastric cancer, databases and literature were used to construct the human signaling network. Topological characteristics of the network were analyzed with Cytoscape. After marking gastric cancer-related genes extracted from the CancerResource, GeneRIF, and COSMIC databases, the FANMOD software was used to mine gastric cancer-related three-vertex motifs in the network. The significant motif difference method was adopted to identify significantly different motifs in the normal and cancer states. Finally, we conducted a series of analyses of the significantly different motifs, including gene ontology, functional annotation of genes, and model classification. The constructed human signaling network had 1643 nodes and 5089 regulatory interactions and was found to share the characteristics of other biological networks. There were 57,942 motifs marked with gastric cancer-related genes out of a total of 69,492 motifs, and 264 motifs were selected as significantly different motifs by calculating the significant motif difference (SMD) scores. Genes in significantly different motifs were mainly enriched in functions associated with cancer genesis, such as regulation of cell death, amino acid phosphorylation of proteins, and intracellular signaling cascades. The top five significantly different motifs were mainly of the cascade and positive feedback types. Almost all genes in the five motifs were cancer related, including EPOR, MAPK14, BCL2L1, KRT18, PTPN6, CASP3, TGFBR2, AR, and CASP7. The development of cancer might be curbed by inhibiting signal transduction upstream and downstream of the selected motifs.

  8. Mechanisms of zero-lag synchronization in cortical motifs.

    Directory of Open Access Journals (Sweden)

    Leonardo L Gollo

    2014-04-01

    Full Text Available Zero-lag synchronization between distant cortical areas has been observed in a diversity of experimental data sets and between many different regions of the brain. Several computational mechanisms have been proposed to account for such isochronous synchronization in the presence of long conduction delays: of these, the phenomenon of "dynamical relaying", a mechanism that relies on a specific network motif, has proven to be the most robust with respect to parameter mismatch and system noise. Surprisingly, despite a contrary belief in the community, the common driving motif is an unreliable means of establishing zero-lag synchrony. Although dynamical relaying has been validated in empirical and computational studies, the deeper dynamical mechanisms and a comparison to dynamics on other motifs are lacking. By systematically comparing synchronization on a variety of small motifs, we establish that the presence of a single reciprocally connected pair (a "resonance pair") plays a crucial role in disambiguating those motifs that foster zero-lag synchrony in the presence of conduction delays (such as dynamical relaying) from those that do not (such as the common driving triad). Remarkably, minor structural changes to the common driving motif that incorporate a reciprocal pair recover robust zero-lag synchrony. The findings are observed in computational models of spiking neurons, populations of spiking neurons and neural mass models, and arise whether the oscillatory systems are periodic, chaotic, noise-free or driven by stochastic inputs. The influence of the resonance pair is also robust to parameter mismatch and asymmetrical time delays amongst the elements of the motif. We call this manner of facilitating zero-lag synchrony resonance-induced synchronization, outline the conditions for its occurrence, and propose that it may be a general mechanism to promote zero-lag synchrony in the brain.

  9. Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets.

    Science.gov (United States)

    Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon

    2012-01-01

    Discovering a compound that inhibits multiple proteins (i.e. polypharmacological targets) is a new paradigm for treating complex diseases (e.g. cancers and diabetes). In general, polypharmacological proteins often share similar local binding environments and motifs. With the exponential growth in the number of protein structures, finding similar structural binding motifs (pharma-motifs) is an urgent task for drug discovery (e.g. side effects and new uses for old drugs) and for understanding protein functions. We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize binding motifs by searching against a protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs, which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% of interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% of the polypharmacological targets of each protein-ligand complex are annotated with the same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play key roles in protein function. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservation of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and for drug discovery.

  10. Motif-role-fingerprints: the building-blocks of motifs, clustering-coefficients and transitivities in directed networks.

    Directory of Open Access Journals (Sweden)

    Mark D McDonnell

    Full Text Available Complex networks are frequently characterized by metrics for which particular subgraphs are counted. One statistic from this category, which we refer to as motif-role fingerprints, differs from global subgraph counts in that the number of subgraphs in which each node participates is counted. As with global subgraph counts, it can be important to distinguish between motif-role fingerprints that are 'structural' (induced subgraphs) and 'functional' (partial subgraphs). Here we show mathematically that a vector of all functional motif-role fingerprints can readily be obtained from an arbitrary directed adjacency matrix, and then converted to structural motif-role fingerprints by multiplying that vector by a specific invertible conversion matrix. This result demonstrates that a unique structural motif-role fingerprint exists for any given functional motif-role fingerprint. We demonstrate a similar result for the cases of functional and structural motif-fingerprints without node roles, and global subgraph counts that form the basis of standard motif analysis. We also explicitly highlight that motif-role fingerprints are elemental to several popular metrics for quantifying the subgraph structure of directed complex networks, including motif distributions, directed clustering coefficient, and transitivity. The relationships between each of these metrics and motif-role fingerprints also suggest new subtypes of directed clustering coefficients and transitivities. Our results have potential utility in analyzing directed synaptic networks constructed from neuronal connectome data, such as in terms of centrality. Other potential applications include anomaly detection in networks, identification of similar networks and identification of similar nodes within networks. Matlab code for calculating all stated metrics following calculation of functional motif-role fingerprints is provided as S1 Matlab File.
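To make the functional/structural distinction concrete, the sketch below counts, for every node, how many feed-forward loops (i → j, j → k, i → k) it takes part in as source, middle or target. It counts partial subgraphs, so extra edges among the three nodes do not disqualify a loop. It covers a single motif by brute force and is not the paper's full fingerprint vector or its conversion matrix; the function name and the list-of-lists adjacency format are assumptions.

```python
def ffl_role_counts(adj):
    """Count functional (partial-subgraph) feed-forward-loop roles per node.

    adj: square 0/1 adjacency matrix as a list of lists, adj[i][j] == 1 for i -> j.
    An FFL is i -> j, j -> k, i -> k with i, j, k distinct. Extra edges among
    the three nodes (e.g. k -> i) are allowed, which makes the count 'functional'.
    Returns three lists: per-node counts of the source, middle and target roles.
    """
    n = len(adj)
    source, middle, target = [0] * n, [0] * n, [0] * n
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue  # the three roles must be played by distinct nodes
                if adj[i][j] and adj[j][k] and adj[i][k]:
                    source[i] += 1
                    middle[j] += 1
                    target[k] += 1
    return source, middle, target


# A bare FFL on nodes 0, 1, 2 ...
ffl = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
# ... and the same FFL with an extra edge 2 -> 0.
ffl_plus = [[0, 1, 1],
            [0, 0, 1],
            [1, 0, 0]]
print(ffl_role_counts(ffl))       # ([1, 0, 0], [0, 1, 0], [0, 0, 1])
print(ffl_role_counts(ffl_plus))  # same functional counts
```

Adding the edge 2 → 0 leaves these functional counts unchanged, whereas a structural (induced) count would drop to zero, because the induced subgraph on the three nodes is no longer an FFL.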

  11. Genetic interaction motif finding by expectation maximization – a novel statistical model for inferring gene modules from synthetic lethality

    Directory of Open Access Journals (Sweden)

    Ye Ping

    2005-12-01

    Full Text Available Abstract Background Synthetic lethality experiments identify pairs of genes with complementary function. More direct functional associations (for example, greater probability of membership in a single protein complex) may be inferred between genes that share synthetic lethal interaction partners than between genes that are directly synthetic lethal. Probabilistic algorithms that identify gene modules based on motif discovery are highly appropriate for the analysis of synthetic lethal genetic interaction data and have great potential in integrative analysis of heterogeneous datasets. Results We have developed Genetic Interaction Motif Finding (GIMF), an algorithm for unsupervised motif discovery from synthetic lethal interaction data. Interaction motifs are characterized by position weight matrices and optimized through expectation maximization. Given a seed gene, GIMF performs a nonlinear transform on the input genetic interaction data and automatically assigns genes to the motif or non-motif category. We demonstrate the capacity to extract known and novel pathways for Saccharomyces cerevisiae (budding yeast). Annotations suggested for several uncharacterized genes are supported by recent experimental evidence. GIMF is computationally efficient, requires no training and automatically down-weights promiscuous genes with high degrees. Conclusion GIMF effectively identifies pathways from synthetic lethality data with several unique features. It is most suitable for building gene modules around seed genes. Optimal choice of a single model parameter allows construction of gene networks with different levels of confidence. The impact of hub genes the generic probabilistic framework of GIMF may be used to group other types of biological entities such as proteins based on stochastic motifs. Analysis of the strongest motifs discovered by the algorithm indicates that synthetic lethal interactions are depleted between genes within a motif, suggesting that synthetic

  12. SSTRAP: A computational model for genomic motif discovery ...

    African Journals Online (AJOL)

    Computational methods can potentially provide high-quality prediction of biological molecules such as DNA binding sites and transcription factors, and can therefore reduce the time needed for experimental verification and the challenges associated with experimental methods. These biological molecules or motifs have significant ...

  13. MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.

    Science.gov (United States)

    Ozaki, Haruka; Iwasaki, Wataru

    2016-08-01

    As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif. Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs. By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Pipeline for the Analysis of ChIP-seq Data and New Motif Ranking Procedure

    KAUST Repository

    Ashoor, Haitham

    2011-06-01

    This thesis presents a computational methodology for ab initio identification of transcription factor binding sites based on ChIP-seq data. This method consists of three main steps, namely ChIP-seq data processing, motif discovery and model selection. A novel method for ranking the identified motif models is proposed. This method combines multiple factors to rank the candidate motifs: the model's coverage of the ChIP-seq fragments that contain motifs from which the model is built, a suitable background data set made up of shuffled ChIP-seq fragments, and the p-value that results from evaluating the model on actual and background data. Two ChIP-seq datasets retrieved from the ENCODE project are used to evaluate and demonstrate the ability of the method to predict correct TFBSs with high precision. The first dataset relates to the neuron-restrictive silencer factor, NRSF, while the second one corresponds to the growth-associated binding protein, GABP. The pipeline shows high-precision prediction for both datasets, as in both cases the top-ranked motif closely resembles the known motif for the respective transcription factor.
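The ingredients named above (coverage of ChIP-seq fragments, a background of shuffled fragments, and a p-value comparing the two) can be combined in many ways, and the abstract does not give the exact formula. The sketch below is one hypothetical scheme: motifs are regular expressions, the background is a mononucleotide shuffle of each fragment, enrichment is tested with a one-sided Fisher exact test, and motifs are sorted by p-value and then by coverage. All names are illustrative, not taken from the thesis.

```python
import random
import re
from math import comb


def shuffle_seq(seq, rng):
    """Mononucleotide shuffle: same base composition, random order."""
    return "".join(rng.sample(seq, len(seq)))


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for enrichment in the 2x2 table [[a, b], [c, d]].

    a: foreground fragments with a hit, b: foreground without,
    c: background fragments with a hit, d: background without.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    # Hypergeometric upper tail: P(at least `a` of the hits fall in the foreground).
    return sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1)) / denom


def rank_motifs(motifs, fragments, seed=0):
    """Rank regex motifs by enrichment over shuffled fragments (hypothetical scheme).

    Returns (motif, foreground coverage, p-value) tuples, most significant first.
    """
    rng = random.Random(seed)
    background = [shuffle_seq(s, rng) for s in fragments]
    results = []
    for motif in motifs:
        pattern = re.compile(motif)
        fg = sum(1 for s in fragments if pattern.search(s))
        bg = sum(1 for s in background if pattern.search(s))
        p = fisher_greater(fg, len(fragments) - fg, bg, len(background) - bg)
        results.append((motif, fg / len(fragments), p))
    return sorted(results, key=lambda r: (r[2], -r[1]))


# Toy fragments with a planted CACGTG site (invented data).
frags = ["ATATATATATAT" + "CACGTG" + "TATATATATATA" for _ in range(20)]
for motif, coverage, p in rank_motifs(["GGGG", "CACGTG"], frags):
    print(motif, coverage, p)
```

Shuffling preserves base composition, so a motif that is merely GC-rich or AT-rich gains little from the comparison, while a site whose specific order of bases matters stands out.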

  15. An efficient identification strategy of clonal tea cultivars using long-core motif SSR markers.

    Science.gov (United States)

    Wang, Rang Jian; Gao, Xiang Feng; Kong, Xiang Rui; Yang, Jun

    2016-01-01

    Microsatellites, or simple sequence repeats (SSRs), especially those with long-core motifs (tri-, tetra-, penta-, and hexa-nucleotide), are an excellent tool for DNA fingerprinting. SSRs with long-core motifs are preferred because neighboring alleles are more easily separated and distinguished from each other, which makes the interpretation of electropherograms and the identification of true alleles more reliable. In the present work, with the aim of characterizing a set of core SSR markers with long-core motifs for fingerprinting clonal cultivars of tea (Camellia sinensis), we analyzed 66 elite clonal tea cultivars in China with 33 initially chosen long-core motif SSR markers covering all 15 linkage groups of the tea plant genome. After further selection, a set of 6 SSR markers was chosen as the core SSR markers. The polymorphic information content (PIC) of the core SSR markers was >0.5, with ≤5 alleles in each marker containing 10 or fewer genotypes. Phylogenetic analysis revealed that the core SSR markers were not strongly correlated with the trait 'cultivar processing-property'. The combined probability of identity (PID) between two random cultivars for the whole set of 6 SSR markers was estimated to be 2.22 × 10⁻⁵; this low value confirms the usefulness of the proposed SSR markers for fingerprinting analyses in Camellia sinensis. Moreover, to quickly discriminate the clonal tea cultivars, a cultivar identification diagram (CID) was subsequently established using these core markers, which fully reflected the identification process and provided immediate information about which SSR markers were needed to identify a cultivar chosen among the tested ones. The results suggested that the long-core motif SSR markers used in the investigation contributed to the accurate and efficient identification of the clonal tea cultivars and enabled the protection of intellectual property.
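The abstract reports PIC and combined PID values but does not state the formulas used. A common choice, assumed here, is the Botstein et al. PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j² and the probability of identity for unrelated individuals PI = Σ p_i⁴ + Σ_{i<j} (2 p_i p_j)², multiplied across loci on the assumption that loci are independent. The allele frequencies below are hypothetical.

```python
from itertools import combinations
from math import prod


def pic(freqs):
    """Polymorphic information content (Botstein et al. formula)."""
    hom = sum(p * p for p in freqs)
    cross = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1 - hom - cross


def prob_identity(freqs):
    """Probability that two unrelated individuals share a genotype at one locus.

    Homozygote genotype frequency is p_i**2 and heterozygote frequency is
    2 * p_i * p_j, so PI is the sum of squared genotype frequencies.
    """
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def combined_pid(loci):
    """Combined probability of identity across loci, assuming independence."""
    return prod(prob_identity(freqs) for freqs in loci)


# Hypothetical locus with four equally frequent alleles.
locus = [0.25, 0.25, 0.25, 0.25]
print(pic(locus))                   # 0.703125
print(prob_identity(locus))         # 0.109375
print(combined_pid([locus, locus])) # 0.011962890625
```

Because per-locus PI values multiply, each additional informative marker shrinks the combined PID geometrically, which is how a small core set can reach very low values such as the one reported above.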

  16. Potentials and limitations of histone repeat sequences for phylogenetic reconstruction of Sophophora.

    Science.gov (United States)

    Baldo, A M; Les, D H; Strausbaugh, L D

    1999-11-01

    Simplified DNA sequence acquisition has provided many new data sets that are useful for phylogenetic reconstruction, including single- and multiple-copy nuclear and organellar genes. Although transcribed regions receive much attention, nontranscribed regions have recently been added to the repertoire of sequences suitable for phylogenetic studies, especially for closely related taxa. We evaluated the efficacy of a small portion of the histone repeat for phylogenetic reconstruction among Drosophila species. Histone repeats in invertebrates offer distinct advantages similar to those of widely used ribosomal repeats. First, the units are tandemly repeated and undergo concerted evolution. Second, histone repeats include both highly conserved coding and variable intergenic regions. This composition facilitates application of "universal" primers spanning potentially informative sites. We examined a small region of the histone repeat, including the intergenic spacer segments of coding regions from the divergently transcribed H2A and H2B histone genes. The spacer (about 230 bp) exists as a mosaic with highly conserved functional motifs interspersed with rapidly diverging regions; the former aid in alignment of the spacer. There are no ambiguities in alignment of coding regions. Coding and noncoding regions were analyzed together and separately for phylogenetic information. Parsimony, distance, and maximum-likelihood methods successfully retrieve the corroborated phylogeny for the taxa examined. This study demonstrates the resolving power of a small histone region which may now be added to the growing collection of phylogenetically useful DNA sequences.
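Of the three tree-building approaches mentioned, parsimony is the easiest to show in a few lines. The sketch below is the textbook Fitch algorithm for scoring a fixed, rooted binary tree one alignment column at a time; it is not the study's software, and the nested-tuple tree format and the toy alignment are assumptions made for illustration.

```python
def fitch(tree, states):
    """Fitch small-parsimony score for one alignment column on a rooted binary tree.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping leaf name -> character state at this column.
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and pay for one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Total Fitch length over all columns; alignment maps leaf -> aligned sequence."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {leaf: seq[i] for leaf, seq in alignment.items()})[1]
               for i in range(length))


# Toy alignment (invented), compared on two candidate topologies.
aln = {"A": "AAG", "B": "AAT", "C": "GCT", "D": "GCT"}
print(parsimony_length((("A", "B"), ("C", "D")), aln))  # 3
print(parsimony_length((("A", "C"), ("B", "D")), aln))  # 5
```

On this toy alignment the grouping ((A,B),(C,D)) needs 3 changes versus 5 for ((A,C),(B,D)), so parsimony prefers the former; a real analysis searches over many topologies rather than comparing two by hand.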

  17. Fast social-like learning of complex behaviors based on motor motifs

    Science.gov (United States)

    Calvo Tapia, Carlos; Tyukin, Ivan Y.; Makarov, Valeri A.

    2018-05-01

    Social learning is widely observed in many species. Less experienced agents copy successful behaviors exhibited by more experienced individuals. Nevertheless, the dynamical mechanisms behind this process remain largely unknown. Here we assume that a complex behavior can be decomposed into a sequence of n motor motifs. A neural network capable of activating motor motifs in a given sequence can then drive an agent. To account for the (n - 1)! possible sequences of motifs in a neural network, we employ the winnerless competition approach. We then consider a teacher-learner situation: one agent exhibits a complex movement, while another aims to mimic the teacher's behavior. Despite the huge variety of possible motif sequences, we show that the learner, equipped with the provided learning model, can rewire its synaptic couplings "on the fly" in no more than (n - 1) learning cycles and converge exponentially to the durations of the teacher's motifs. We validate the learning model on mobile robots. Experimental results show that the learner is indeed capable of copying the teacher's behavior composed of six motor motifs in a few learning cycles. The reported mechanism of learning is general and can be used for replicating different functions, including, for example, sound patterns or speech.

  18. The MHC motif viewer: a visualization tool for MHC binding motifs

    DEFF Research Database (Denmark)

    Rapin, Nicolas; Hoof, Ilka; Lund, Ole

    2010-01-01

    is hampered by the lack of tools for browsing and comparing specificity of these molecules. We have developed a Web server, MHC Motif Viewer, which allows the display of the binding motif for MHC class I proteins for human, chimpanzee, rhesus monkey, mouse, and swine, as well as HLA-DR protein sequences...

  19. Linear programming model to construct phylogenetic network for 16S rRNA sequences of photosynthetic organisms and influenza viruses.

    Science.gov (United States)

    Mathur, Rinku; Adlakha, Neeru

    2014-06-01

    Phylogenetic trees give information about the vertical relationships of ancestors and descendants, whereas phylogenetic networks are used to visualize the horizontal relationships among different organisms. To predict reticulate events, phylogenetic networks need to be constructed. Here, a Linear Programming (LP) model has been developed for the construction of phylogenetic networks. The model is validated using data sets of chloroplast 16S rRNA sequences of photosynthetic organisms and of Influenza A/H5N1 viruses. Results obtained are in agreement with those obtained by earlier researchers.

  20. Stochastic Resonance in Neuronal Network Motifs with Ornstein-Uhlenbeck Colored Noise

    Directory of Open Access Journals (Sweden)

    Xuyang Lou

    2014-01-01

    Full Text Available We consider here the effect of the Ornstein-Uhlenbeck colored noise on the stochastic resonance of the feed-forward-loop (FFL) network motif. The FFL motif is modeled through the FitzHugh-Nagumo neuron model as well as the chemical coupling. Our results show that the noise intensity and the correlation time of the noise process serve as the control parameters, which have great impacts on the stochastic dynamics of the FFL motif. We find that, with a proper choice of noise intensities and the correlation time of the noise process, the signal-to-noise ratio (SNR) can display more than one peak.
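The abstract treats the noise intensity and the correlation time as control parameters. A sketch of how such Ornstein-Uhlenbeck noise is commonly generated is below. It assumes the convention dη = -(η/τ) dt + (√(2D)/τ) dW, whose stationary variance is D/τ and whose autocorrelation is (D/τ) exp(-|s|/τ), and it uses the exact discrete-time update rather than Euler steps. The paper's own convention and integrator are not stated in the abstract, and the FitzHugh-Nagumo FFL model itself is not reproduced.

```python
import math
import random


def ou_noise(n_steps, dt, D, tau, seed=None):
    """Generate Ornstein-Uhlenbeck colored noise with the exact discrete update.

    Assumed convention: d(eta) = -(eta / tau) dt + (sqrt(2 D) / tau) dW,
    giving stationary variance D / tau and correlation time tau.
    """
    rng = random.Random(seed)
    decay = math.exp(-dt / tau)                  # lag-one autocorrelation
    stationary_sd = math.sqrt(D / tau)
    step_sd = stationary_sd * math.sqrt(1.0 - decay * decay)
    eta = rng.gauss(0.0, stationary_sd)          # start in the stationary state
    out = []
    for _ in range(n_steps):
        out.append(eta)
        # Exact transition: keeps the variance at D / tau for any step size dt.
        eta = eta * decay + step_sd * rng.gauss(0.0, 1.0)
    return out


# Example: intensity D = 0.5, correlation time tau = 0.1, step dt = 0.01.
samples = ou_noise(200_000, dt=0.01, D=0.5, tau=0.1, seed=1)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(var, 2))  # close to D / tau = 5.0
```

Holding D fixed while increasing τ makes the noise smoother (more strongly correlated in time) and, under this convention, smaller in variance, which is one reason the two parameters can shape the SNR curve differently.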

  1. Coffee and Cocoa in the Creation of Distinctive Jember Batik Motifs

    Directory of Open Access Journals (Sweden)

    Irfa'ina Rohana Salma

    2015-06-01

    Full Text Available Jember batik has so far been identified with the tobacco leaf motif. The visualization of the tobacco leaf in Jember batik motifs is rather weak and lacks character, because the resulting motif looks like an ordinary picture of a leaf. It is therefore necessary to create distinctive Jember batik motif designs whose inspiration is drawn from other natural resources of Jember that have specific, characteristic shapes, so that a stronger motif identity can be achieved. These distinctive natural products of Jember are coffee and cocoa. The purpose of this art creation is to produce new batik motifs with characteristics distinctive to Jember. The methods used were data collection, in-depth observation of the object of creation, study of the sources of inspiration, motif design, and realization as batik. This art creation produced six batik motifs: (1) Motif Uwoh Kopi; (2) Motif Godong Kopi; (3) Motif Ceplok Kakao; (4) Motif Kakao Raja; (5) Motif Kakao Biru; and (6) Motif Wiji Mukti. Based on an "aesthetic taste" (Selera Estetika) assessment, the most preferred motifs were Motif Uwoh Kopi and Motif Kakao Raja. Keywords: Motif Woh Kopi, Motif Godong Kopi, Motif Ceplok Kakao, Motif Kakao Raja, Motif Kakao Biru, Motif Wiji Mukti ABSTRACT Batik Jember is synonymous with the tobacco leaf motif. The tobacco leaf shape is visually quite weak, as the motif looks like a picture of leaves in general. Therefore, it is necessary to create a distinctive motif design drawn from other natural resources of Jember that have specific shapes and characteristics, so that a stronger motif identity can be obtained. The typical natural resources from Jember are coffee and cocoa. The purpose of this art creation is to produce unique, creative and innovative batik with characteristics specific to Jember. The methods used are data collection, observation of the object, reviewing inspiration sources

  2. A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction.

    Science.gov (United States)

    Guo, Yuchun; Tian, Kevin; Zeng, Haoyang; Guo, Xiaoyun; Gifford, David Kenneth

    2018-04-13

    The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consists of a set of aligned k-mers that are overrepresented at TF binding sites, and a new method called KMAC for de novo discovery of KSMs. We find that KSMs more accurately predict in vivo binding sites than position weight matrix (PWM) models and other more complex motif models across a large set of ChIP-seq experiments. Furthermore, KSMs outperform PWMs and more complex motif models in predicting in vitro binding sites. KMAC also identifies correct motifs in more experiments than five state-of-the-art motif discovery methods. In addition, KSM-derived features outperform both PWM and deep learning model derived sequence features in predicting differential regulatory activities of expression quantitative trait loci (eQTL) alleles. Finally, we have applied KMAC to 1600 ENCODE TF ChIP-seq data sets and created a public resource of KSM and PWM motifs. We expect that the KSM representation and KMAC method will be valuable in characterizing TF binding specificities and in interpreting the effects of noncoding genetic variations. © 2018 Guo et al.; Published by Cold Spring Harbor Laboratory Press.
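    A KSM starts from k-mers that are overrepresented at binding sites. As a rough, hypothetical illustration of that first ingredient only, the sketch below counts k-mers in bound versus background sequences and keeps those whose normalized count ratio passes a cutoff. The function names, the pseudocount of 1 and the 2-fold cutoff are assumptions for illustration. The sketch does not align k-mers into a set and is not the KMAC algorithm.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every k-mer occurrence (overlapping) across a list of DNA strings."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enriched_kmers(bound, background, k=4, min_fold=2.0, pseudo=1.0):
    """Return {k-mer: fold} for k-mers whose frequency in the bound set is at
    least min_fold times their frequency in the background set.

    min_fold and pseudo are hypothetical defaults, not values from KMAC.
    """
    fg = count_kmers(bound, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values()) or 1  # avoid division by zero on empty input
    bg_total = sum(bg.values()) or 1
    result = {}
    for kmer, n in fg.items():
        # The pseudocount keeps k-mers absent from the background finite.
        fold = ((n + pseudo) / fg_total) / ((bg[kmer] + pseudo) / bg_total)
        if fold >= min_fold:
            result[kmer] = fold
    # Most enriched first
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))


bound = ["TTGACGTCAA", "AGACGTCTT", "CCGACGTCGG"]  # toy "bound" sequences
background = ["ATATATATAT", "GCGCGCGCGC", "TTTTAAAACC"]
print(enriched_kmers(bound, background))
```

    A real motif model would also handle reverse complements and use a proper statistical test rather than a fixed fold cutoff.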

  3. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Abstract Background Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole-genome level. However, a much-needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base. Results This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and an information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
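    The abstract identifies the mutual information of a motif and GO term association as a key feature. One common way to compute such a quantity, shown here as a sketch, treats "sequence matches the motif" and "sequence carries the GO term" as two binary variables and computes their mutual information from 2x2 counts. The authors' exact formulation may differ (for example, they may use pointwise mutual information). The argument names are hypothetical.

```python
import math


def mutual_information(n_both, n_motif, n_term, n_total):
    """Mutual information (bits) between two binary variables:
    M = sequence matches the motif, T = sequence is annotated with the GO term.

    n_both: sequences with both; n_motif: sequences matching the motif;
    n_term: sequences with the term; n_total: all sequences considered.
    """
    # Joint counts for (M, T) in {0, 1} x {0, 1}
    cells = {
        (1, 1): n_both,
        (1, 0): n_motif - n_both,
        (0, 1): n_term - n_both,
        (0, 0): n_total - n_motif - n_term + n_both,
    }
    p_m = {1: n_motif / n_total, 0: 1 - n_motif / n_total}
    p_t = {1: n_term / n_total, 0: 1 - n_term / n_total}
    mi = 0.0
    for (m, t), count in cells.items():
        if count == 0:
            continue  # 0 * log(0) is taken as 0
        p_mt = count / n_total
        mi += p_mt * math.log2(p_mt / (p_m[m] * p_t[t]))
    return mi


print(mutual_information(n_both=40, n_motif=50, n_term=45, n_total=1000))
```

    A score near 0 means motif matches and term annotations are roughly independent. Larger values indicate a stronger association, which could then be combined with frequency-based features in a classifier.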

  4. Phylogenetic diversity and biodiversity indices on phylogenetic networks.

    Science.gov (United States)

    Wicke, Kristina; Fischer, Mareike

    2018-04-01

    In biodiversity conservation it is often necessary to prioritize the species to conserve. Existing approaches to prioritization, e.g. the Fair Proportion Index and the Shapley Value, are based on phylogenetic trees and rank species according to their contribution to overall phylogenetic diversity. However, in many cases evolution is not treelike and thus, phylogenetic networks have been developed as a generalization of phylogenetic trees, allowing for the representation of non-treelike evolutionary events, such as hybridization. Here, we extend the concepts of phylogenetic diversity and phylogenetic diversity indices from phylogenetic trees to phylogenetic networks. On the one hand, we consider the treelike content of a phylogenetic network, e.g. the (multi)set of phylogenetic trees displayed by a network and the so-called lowest stable ancestor tree associated with it. On the other hand, we derive the phylogenetic diversity of subsets of taxa and biodiversity indices directly from the internal structure of the network. We consider both approaches that are independent of so-called inheritance probabilities as well as approaches that explicitly incorporate these probabilities. Furthermore, we introduce our software package NetDiversity, which is implemented in Perl and allows for the calculation of all generalized measures of phylogenetic diversity and generalized phylogenetic diversity indices established in this note that are independent of inheritance probabilities. We apply our methods to a phylogenetic network representing the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), a group of species characterized by widespread hybridization. Copyright © 2018 Elsevier Inc. All rights reserved.
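    For orientation, the tree-based Fair Proportion Index that the abstract generalizes gives each leaf the sum, over the edges on its path to the root, of the edge length divided by the number of leaves below that edge. The sketch below assumes a rooted tree given as (parent, child, length) triples. It covers trees only, not the network generalizations or inheritance probabilities, and is not code from NetDiversity.

```python
from collections import defaultdict


def fair_proportion(edges):
    """Fair Proportion Index of each leaf in a rooted phylogenetic tree.

    edges: iterable of (parent, child, branch_length). The length of every edge
    is split equally among the leaves that descend from it.
    """
    children = defaultdict(list)
    parent_of, length = {}, {}
    for parent, child, w in edges:
        children[parent].append(child)
        parent_of[child] = parent
        length[child] = w  # length of the edge leading into `child`

    def n_leaves(v):
        kids = children.get(v, [])
        return 1 if not kids else sum(n_leaves(c) for c in kids)

    below = {v: n_leaves(v) for v in parent_of}  # leaves under each edge
    leaves = [v for v in parent_of if not children.get(v)]
    fp = {}
    for leaf in leaves:
        total, v = 0.0, leaf
        while v in parent_of:  # walk from the leaf up to the root
            total += length[v] / below[v]
            v = parent_of[v]
        fp[leaf] = total
    return fp


tree = [("root", "x", 2.0), ("root", "C", 3.0), ("x", "A", 1.0), ("x", "B", 1.0)]
print(fair_proportion(tree))
```

    By construction, the indices sum to the total branch length of the tree, which is the tree's phylogenetic diversity.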

  5. CombiMotif: A new algorithm for network motifs discovery in protein-protein interaction networks

    Science.gov (United States)

    Luo, Jiawei; Li, Guanghui; Song, Dan; Liang, Cheng

    2014-12-01

    Discovering motifs in protein-protein interaction networks has become a major challenge in computational biology, since the distribution of the number of network motifs can reveal significant systemic differences among species. However, this task can be computationally expensive because it involves graph isomorphism detection. In this paper, we present a new algorithm (CombiMotif) that incorporates combinatorial techniques to count non-induced occurrences of subgraph topologies in the form of trees. The efficiency of our algorithm is demonstrated by comparing the obtained results with current state-of-the-art subgraph counting algorithms. We also show major differences between unicellular and multicellular organisms. The datasets and source code of CombiMotif are freely available upon request.
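    The simplest tree topology shows why combinatorial counting can avoid explicit isomorphism tests. A non-induced occurrence of the star K_{1,k} (k >= 2) is just a centre vertex plus any k of its neighbours. The total count is therefore the sum over vertices of C(deg(v), k). The sketch below illustrates only this textbook identity under that assumption; it is not CombiMotif's algorithm for general trees.

```python
from collections import defaultdict
from math import comb


def star_occurrences(edges, k):
    """Number of non-induced occurrences of the star K_{1,k} (k >= 2) in an
    undirected simple graph: pick a centre v and any k of its neighbours."""
    if k < 2:
        raise ValueError("k must be at least 2")  # k = 1 would count edges twice
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; sets drop duplicate edges
            adj[u].add(v)
            adj[v].add(u)
    return sum(comb(len(nbrs), k) for nbrs in adj.values())


# Toy interaction network: a triangle with a pendant vertex
ppi = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(star_occurrences(ppi, 2))  # non-induced 3-node paths
```

    With k = 2 the function counts non-induced 3-node paths, including those that lie inside triangles. That is the difference between non-induced and induced counting.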

  6. phangorn: phylogenetic analysis in R.

    Science.gov (United States)

    Schliep, Klaus Peter

    2011-02-15

    phangorn is a package for phylogenetic reconstruction and analysis in the R language. Previously it was only possible to estimate phylogenetic trees with distance methods in R. phangorn now offers the possibility of reconstructing phylogenies with distance-based methods, maximum parsimony or maximum likelihood (ML) and performing Hadamard conjugation. Extending the general ML framework, this package provides the possibility of estimating mixture and partition models. Furthermore, phangorn offers several functions for comparing trees, phylogenetic models or splits, simulating character data and performing congruence analyses. phangorn can be obtained through the CRAN homepage http://cran.r-project.org/web/packages/phangorn/index.html. phangorn is licensed under GPL 2.

  7. MODA: an efficient algorithm for network motif discovery in biological networks.

    Science.gov (United States)

    Omidi, Saeed; Schreiber, Falk; Masoudi-Nejad, Ali

    2009-10-01

    In recent years, interest has been growing in the study of complex networks. Since Erdös and Rényi (1960) proposed their random graph model about 50 years ago, many researchers have investigated and shaped this field. Many indicators have been proposed to assess the global features of networks. Recently, an active research area has developed in studying local features named motifs as the building blocks of networks. Unfortunately, network motif discovery is a computationally hard problem and finding rather large motifs (larger than 8 nodes) by means of current algorithms is impractical as it demands too much computational effort. In this paper, we present a new algorithm (MODA) that incorporates techniques such as a pattern growth approach for extracting larger motifs efficiently. We have tested our algorithm and found it able to identify larger motifs with more than 8 nodes more efficiently than most of the current state-of-the-art motif discovery algorithms. While most of the algorithms rely on induced subgraphs as motifs of the networks, MODA is able to extract both induced and non-induced subgraphs simultaneously. The MODA source code is freely available at: http://LBB.ut.ac.ir/Download/LBBsoft/MODA/

  8. A phylogenetic study of SPBP and RAI1: evolutionary conservation of chromatin binding modules.

    Directory of Open Access Journals (Sweden)

    Sagar Darvekar

    Our genome is assembled into an array of highly dynamic nucleosome structures allowing spatial and temporal access to DNA. The nucleosomes are subject to a wide array of post-translational modifications, altering the DNA-histone interaction and serving as docking sites for proteins exhibiting effector or "reader" modules. The nuclear proteins SPBP and RAI1 are composed of several putative "reader" modules which may have the ability to recognise a set of histone modification marks. Here we have performed a phylogenetic study of their putative reader modules: the C-terminal ePHD/ADD-like domain, a novel nucleosome binding region and an AT-hook motif. Interaction studies in vitro and in yeast cells suggested that, despite the extraordinarily long loop region in their ePHD/ADD-like chromatin binding domains, the C-terminal region of both proteins seems to adopt a cross-braced topology of zinc finger interactions similar to other structurally determined ePHD/ADD structures. Both their ePHD/ADD-like domain and their novel nucleosome binding domain are highly conserved in vertebrate evolution, and construction of a phylogenetic tree displayed two well-supported clusters representing SPBP and RAI1, respectively. Their genome and domain organisation suggest that SPBP and RAI1 arose from a gene duplication event. The phylogenetic tree suggests that this duplication happened early in vertebrate evolution, since only one gene was identified in insects and lancelet. Finally, experimental data confirm that the conserved novel nucleosome binding region of RAI1 has the ability to bind the nucleosome core and histones. However, an adjacent conserved AT-hook motif as identified in SPBP is not present in RAI1, and deletion of the novel nucleosome binding region of RAI1 did not significantly affect its nuclear localisation.

  9. Estimation of rates-across-sites distributions in phylogenetic substitution models.

    Science.gov (United States)

    Susko, Edward; Field, Chris; Blouin, Christian; Roger, Andrew J

    2003-10-01

    Previous work has shown that it is often essential to account for the variation in rates at different sites in phylogenetic models in order to avoid phylogenetic artifacts such as long branch attraction. In most current models, the gamma distribution is used for the rates-across-sites distributions and is implemented as an equal-probability discrete gamma. In this article, we introduce discrete distribution estimates with large numbers of equally spaced rate categories allowing us to investigate the appropriateness of the gamma model. With large numbers of rate categories, these discrete estimates are flexible enough to approximate the shape of almost any distribution. Likelihood ratio statistical tests and a nonparametric bootstrap confidence-bound estimation procedure based on the discrete estimates are presented that can be used to test the fit of a parametric family. We applied the methodology to several different protein data sets, and found that although the gamma model often provides a good parametric model for this type of data, rate estimates from an equal-probability discrete gamma model with a small number of categories will tend to underestimate the largest rates. In cases when the gamma model assumption is in doubt, rate estimates coming from the discrete rate distribution estimate with a large number of rate categories provide a robust alternative to gamma estimates. An alternative implementation of the gamma distribution is proposed that, for equal numbers of rate categories, is computationally more efficient during optimization than the standard gamma implementation and can provide more accurate estimates of site rates.

  10. Phylogenetic Trees From Sequences

    Science.gov (United States)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data. We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.
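    Measuring sequence divergence usually begins with the proportion p of differing sites in an alignment. Under the Jukes-Cantor (JC69) model, p is corrected for multiple substitutions as d = -(3/4) ln(1 - (4/3) p). The sketch below is a minimal, illustrative version, not code from the chapter. It assumes two already aligned DNA strings of equal length, and gap handling is deliberately left out.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    diffs = sum(x != y for x, y in zip(a, b))
    return diffs / len(a)


def jukes_cantor(a, b):
    """JC69-corrected distance in expected substitutions per site."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.75 * math.log(1 - 4.0 * p / 3.0)


print(jukes_cantor("ACGTACGTAC", "ACGTACGTTC"))  # p = 0.1
```

    A matrix of such pairwise distances is the usual input to the distance-based reconstruction methods described in the chapter.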

  11. WildSpan: mining structured motifs from protein sequences

    Directory of Open Access Journals (Sweden)

    Chen Chien-Yu

    2011-03-01

    Abstract Background Automatic extraction of motifs from biological sequences is an important research problem in the study of molecular biology. For proteins, it is desirable to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually widely separated in sequences. Discovering such patterns is time-consuming because a very large number of combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to greatly reduce the mining cost. Results WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and the performance of WildSpan. 
The protein-based mining mode

  12. Rooting phylogenetic trees under the coalescent model using site pattern probabilities.

    Science.gov (United States)

    Tian, Yuan; Kubatko, Laura

    2017-12-19

    Phylogenetic tree inference is a fundamental tool to estimate ancestor-descendant relationships among different species. In phylogenetic studies, identification of the root (the most recent common ancestor of all sampled organisms) is essential for complete understanding of the evolutionary relationships. Rooted trees benefit most downstream applications of phylogenies, such as species classification or the study of adaptation. Often, trees can be rooted by using outgroups, which are species that are known to be more distantly related to the sampled organisms than any other species in the phylogeny. However, outgroups are not always available in evolutionary research. In this study, we develop a new method for rooting species trees under the coalescent model, by developing a series of hypothesis tests for rooting quartet phylogenies using site pattern probabilities. The power of this method is examined by simulation studies and by application to an empirical North American rattlesnake data set. The method shows high accuracy across the simulation conditions considered, and performs well for the rattlesnake data. Thus, it provides a computationally efficient way to accurately root species-level phylogenies that incorporates the coalescent process. The method is robust to variation in substitution model, but is sensitive to the assumption of a molecular clock. Our study establishes a computationally practical method for rooting species trees that is more efficient than traditional methods. The method will benefit numerous evolutionary studies that require rooting a phylogenetic tree without having to specify outgroups.

  13. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.

    2011-11-15

    Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, identifying the poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of the gene regulation mechanisms. In this work, we present a poly(A) motif prediction method based on properties of the human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for 12 poly(A) motif variants. The Author(s) 2011. Published by Oxford University Press. All rights reserved.
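    Both predictors described here classify candidate poly(A) hexamers found in genomic DNA. The candidate-extraction step can be sketched as a simple scan. For illustration, the sketch assumes only the canonical AATAAA and its common variant ATTAAA. The 12-variant set used by the authors is not listed in the abstract and is not reproduced here. Overlapping hits are reported, and each hit would still need a trained classifier to decide whether it is a true signal.

```python
import re

# Assumption: only two well-known hexamers; the authors' 12 variants are not listed.
HEXAMERS = ("AATAAA", "ATTAAA")


def find_polya_candidates(dna, hexamers=HEXAMERS):
    """Return sorted (position, hexamer) pairs for every occurrence, including
    overlapping ones, of the given hexamers on the forward strand."""
    dna = dna.upper()
    hits = []
    for hexamer in hexamers:
        # A zero-width lookahead lets overlapping matches be reported.
        for m in re.finditer(f"(?={hexamer})", dna):
            hits.append((m.start(), hexamer))
    return sorted(hits)


print(find_polya_candidates("ccAATAAAgtATTAAAt"))
```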

  14. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo; Jankovic, Boris R.; Bajic, Vladimir B.; Song, Le; Gao, Xin

    2013-01-01

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. 
Meanwhile, our method makes ~30% fewer error predictions relative to the other

  15. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo

    2013-06-21

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. 
Meanwhile, our method makes ~30% fewer error predictions relative to the other

  16. Identity and functions of CxxC-derived motifs.

    Science.gov (United States)

    Fomenko, Dmitri E; Gladyshev, Vadim N

    2003-09-30

    Two cysteines separated by two other residues (the CxxC motif) are employed by many redox proteins for formation, isomerization, and reduction of disulfide bonds and for other redox functions. The place of the C-terminal cysteine in this motif may be occupied by serine (the CxxS motif), modifying the functional repertoire of redox proteins. Here we found that the CxxC motif may also give rise to a motif, in which the C-terminal cysteine is replaced with threonine (the CxxT motif). Moreover, in contrast to a view that the N-terminal cysteine in the CxxC motif always serves as a nucleophilic attacking group, this residue could also be replaced with threonine (the TxxC motif), serine (the SxxC motif), or other residues. In each of these CxxC-derived motifs, the presence of a downstream alpha-helix was strongly favored. A search for conserved CxxC-derived motif/helix patterns in four complete genomes representing bacteria, archaea, and eukaryotes identified known redox proteins and suggested possible redox functions for several additional proteins. Catalytic sites in peroxiredoxins were major representatives of the TxxC motif, whereas those in glutathione peroxidases represented the CxxT motif. Structural assessments indicated that threonines in these enzymes could stabilize catalytic thiolates, suggesting revisions to previously proposed catalytic triads. Each of the CxxC-derived motifs was also observed in natural selenium-containing proteins, in which selenocysteine was present in place of a catalytic cysteine.
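    The motif definitions above translate directly into regular expressions, where "x" is any residue. The sketch below scans a protein sequence for the CxxC-derived patterns named in the abstract and reports overlapping hits. It is only a sequence-level illustration with hypothetical function names. It does not check for the downstream alpha-helix that the authors found strongly favored, so every hit is merely a candidate.

```python
import re

# Motif name -> regular expression ("." stands for any residue x).
CXXC_DERIVED = {
    "CxxC": "C..C",
    "CxxS": "C..S",
    "CxxT": "C..T",
    "SxxC": "S..C",
    "TxxC": "T..C",
}


def find_cxxc_like(protein):
    """Return sorted (position, motif_name, matched_text) triples for every
    occurrence, overlapping matches included."""
    protein = protein.upper()
    hits = []
    for name, pattern in CXXC_DERIVED.items():
        # The lookahead with a capture group reports overlapping matches.
        for m in re.finditer(f"(?=({pattern}))", protein):
            hits.append((m.start(), name, m.group(1)))
    return sorted(hits)


print(find_cxxc_like("MKWGPCGPCKMIAPILDE"))  # thioredoxin-like WCGPCK stretch
```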

  17. Motifs in triadic random graphs based on Steiner triple systems

    Science.gov (United States)

    Winkler, Marco; Reichardt, Jörg

    2013-08-01

    Conventionally, pairwise relationships between nodes are considered to be the fundamental building blocks of complex networks. However, over the last decade, the overabundance of certain subnetwork patterns, i.e., the so-called motifs, has attracted much attention. It has been hypothesized that these motifs, instead of links, serve as the building blocks of network structures. Although the relation between a network's topology and the general properties of the system, such as its function, its robustness against perturbations, or its efficiency in spreading information, is the central theme of network science, there is still a lack of sound generative models needed for testing the functional role of subgraph motifs. Our work aims to overcome this limitation. We employ the framework of exponential random graph models (ERGMs) to define models based on triadic substructures. The fact that only a small portion of triads can actually be set independently poses a challenge for the formulation of such models. To overcome this obstacle, we use Steiner triple systems (STSs). These are partitions of sets of nodes into pair-disjoint triads, which thus can be specified independently. Combining the concepts of ERGMs and STSs, we suggest generative models capable of generating ensembles of networks with nontrivial triadic Z-score profiles. Further, we discover inevitable correlations between the abundance of triad patterns, which occur solely for statistical reasons and need to be taken into account when discussing the functional implications of motif statistics. Moreover, we calculate the degree distributions of our triadic random graphs analytically.
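    The defining property of an STS used here is that every pair of nodes lies in exactly one triad. An STS on n points exists exactly when n mod 6 is 1 or 3, and the smallest nontrivial one, on 7 points, is the Fano plane. The sketch below checks the STS property and builds the 7-point system from the translates of the difference set {0, 1, 3} mod 7. It is an illustration of the combinatorial object only, not the authors' ERGM construction.

```python
from itertools import combinations


def is_steiner_triple_system(n, triples):
    """True if every pair of the points {0, ..., n-1} lies in exactly one triple."""
    seen = set()
    for t in triples:
        if len(set(t)) != 3 or not all(0 <= x < n for x in t):
            return False
        for pair in combinations(sorted(t), 2):
            if pair in seen:
                return False  # pair covered twice: triads are not pair-disjoint
            seen.add(pair)
    return len(seen) == n * (n - 1) // 2  # every pair covered


def fano_plane():
    """STS(7) as the seven translates mod 7 of the difference set {0, 1, 3}."""
    return [tuple(sorted((i, (i + 1) % 7, (i + 3) % 7))) for i in range(7)]


print(fano_plane(), is_steiner_triple_system(7, fano_plane()))
```

    Because no two triads share a pair of nodes, the edges of the triads in an STS never overlap, which is what allows them to be specified independently.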

  18. Motif decomposition of the phosphotyrosine proteome reveals a new N-terminal binding motif for SHIP2

    DEFF Research Database (Denmark)

    Miller, Martin Lee; Hanke, S.; Hinsby, A. M.

    2008-01-01

    set of 481 unique phosphotyrosine (Tyr(P)) peptides by sequence similarity to known ligands of the Src homology 2 (SH2) and the phosphotyrosine binding (PTB) domains. From 20 clusters we extracted 16 known and four new interaction motifs. Using quantitative mass spectrometry we pulled down Tyr......(P)-specific binding partners for peptides corresponding to the extracted motifs. We confirmed numerous previously known interaction motifs and found 15 new interactions mediated by phosphosites not previously known to bind SH2 or PTB. Remarkably, a novel hydrophobic N-terminal motif ((L/V/I)(L/V/I)pY) was identified...
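    The new motif ((L/V/I)(L/V/I)pY) constrains the two residues immediately N-terminal to the phosphotyrosine. The sketch below assumes that phosphosite positions are already known, for example from mass spectrometry, and that they are given as 0-based indices into a hypothetical protein sequence. It checks which sites fit that pattern. It only restates the motif and is not the clustering or pull-down pipeline of the study.

```python
HYDROPHOBIC = set("LVI")


def matches_lvi_lvi_py(sequence, ptyr_positions):
    """Return the 0-based phosphotyrosine positions whose two preceding
    residues are both L, V or I, i.e. sites fitting (L/V/I)(L/V/I)pY."""
    seq = sequence.upper()
    hits = []
    for pos in ptyr_positions:
        if seq[pos] != "Y":
            raise ValueError(f"position {pos} is not a tyrosine")
        if pos >= 2 and seq[pos - 2] in HYDROPHOBIC and seq[pos - 1] in HYDROPHOBIC:
            hits.append(pos)
    return hits


print(matches_lvi_lvi_py("AKVLYEEPTYDL", [4, 9]))  # hypothetical sequence and sites
```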

  19. Temporal motifs reveal collaboration patterns in online task-oriented networks

    Science.gov (United States)

    Xuan, Qi; Fang, Huiting; Fu, Chenbo; Filkov, Vladimir

    2015-05-01

    Real networks feature layers of interactions and complexity. In them, different types of nodes can interact with each other via a variety of events. Examples of this complexity are task-oriented social networks (TOSNs), where teams of people share tasks towards creating a quality artifact, such as academic research papers or software development in commercial or open source environments. Accomplishing those tasks involves both work, e.g., writing the papers or code, and communication, to discuss and coordinate. Taking into account the different types of activities and how they alternate over time can result in a much more precise understanding of TOSN behaviors and outcomes. That calls for modeling techniques that can accommodate both node and link heterogeneity as well as temporal change. In this paper, we report on a methodology for finding temporal motifs in TOSNs, limited to a system of two people and an artifact. We apply the methods to publicly available data of TOSNs from 31 Open Source Software projects. We find that these temporal motifs are enriched in the observed data. When applied to software development outcomes, temporal motifs reveal a distinct dependency between collaboration and communication in the code writing process. Moreover, we show that models based on temporal motifs can be used to more precisely relate both individual developer centrality and team cohesion to programmer productivity than models based on aggregated TOSNs.

  20. Functional and phylogenetic ecology in R

    CERN Document Server

    Swenson, Nathan G

    2014-01-01

    Functional and Phylogenetic Ecology in R is designed to teach readers to use R for phylogenetic and functional trait analyses. Over the past decade, a dizzying array of tools and methods were generated to incorporate phylogenetic and functional information into traditional ecological analyses. Increasingly these tools are implemented in R, thus greatly expanding their impact. Researchers getting started in R can use this volume as a step-by-step entryway into phylogenetic and functional analyses for ecology in R. More advanced users will be able to use this volume as a quick reference to understand particular analyses. The volume begins with an introduction to the R environment and handling relevant data in R. Chapters then cover phylogenetic and functional metrics of biodiversity; null modeling and randomizations for phylogenetic and functional trait analyses; integrating phylogenetic and functional trait information; and interfacing the R environment with a popular C-based program. This book presents a uni...

  1. On Tree-Based Phylogenetic Networks.

    Science.gov (United States)

    Zhang, Louxin

    2016-07-01

    A large class of phylogenetic networks can be obtained from trees by the addition of horizontal edges between the tree edges. These networks are called tree-based networks. We present a simple necessary and sufficient condition for tree-based networks and prove that a universal tree-based network exists for any number of taxa that contains as its base every phylogenetic tree on the same set of taxa. This answers two problems posed by Francis and Steel recently. A byproduct is a computer program for generating random binary phylogenetic networks under the uniform distribution model.

  2. Motif enrichment tool.

    Science.gov (United States)

    Blatti, Charles; Sinha, Saurabh

    2014-07-01

    The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
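    A common way to decide whether a motif is statistically overrepresented in a gene set is a one-sided hypergeometric test. Of N genes, K have a motif site in their regulatory region, and the query set of n genes contains k of them. The p-value is P(X >= k). The sketch below implements that test with exact integer arithmetic. The abstract does not state which statistic MET uses, so treat this as an assumed, generic version rather than MET's method.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(population N, K successes,
    n draws): the chance of seeing at least k motif-bearing genes in a query
    set of n genes when K of all N genes carry the motif."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # comb() returns 0 when n - i exceeds N - K, so impossible terms vanish.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# 12 of 40 query genes carry the motif, versus 500 of 20000 genome-wide
print(hypergeom_enrichment_p(k=12, n=40, K=500, N=20000))
```

    When many motifs are tested, the resulting p-values would normally also be corrected for multiple testing.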

  3. An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

    Science.gov (United States)

    Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

    2016-02-18

    The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding, VD, and posterior decoding, PD) and the PD/VD protocol described here were successfully applied to 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte Selaginella moellendorffii. Finally, we provide a general motif-HMM scanner which is easily accessible through
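    The Viterbi decoding named above finds the single most probable hidden-state path through a motif HMM. The sketch below assumes a hypothetical two-state model (background vs. motif) over a reduced alphabet in which `x` stands for any residue other than G, D, S or L. It illustrates only the decoding step and is not the authors' motif-HMM scanner or its parameters. Posterior decoding, the other method mentioned, would instead choose the most probable state at each position independently.

```python
import math

# Hypothetical toy model: all probabilities are illustrative assumptions.
STATES = ("bg", "motif")
START = {"bg": 0.9, "motif": 0.1}
TRANS = {"bg": {"bg": 0.9, "motif": 0.1}, "motif": {"bg": 0.2, "motif": 0.8}}
EMIT = {
    "bg": {"G": 0.05, "D": 0.05, "S": 0.05, "L": 0.05, "x": 0.8},
    "motif": {"G": 0.24, "D": 0.24, "S": 0.24, "L": 0.24, "x": 0.04},
}


def viterbi(obs, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable hidden-state path for `obs` (log-space Viterbi)."""
    # score[s]: best log-probability of any path ending in state s at the current position
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # best predecessor state r for the transition r -> s
            prev = max(states, key=lambda r: score[r] + math.log(trans[r][s]))
            new_score[s] = score[prev] + math.log(trans[prev][s]) + math.log(emit[s][symbol])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # trace back from the best final state
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi("xxxGDSLxxx"))
```

    Working in log space keeps long sequences from underflowing, which matters when whole proteomes are scanned.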

  4. Design of potent inhibitors of human RAD51 recombinase based on BRC motifs of BRCA2 protein: modeling and experimental validation of a chimera peptide.

    KAUST Repository

    Nomme, Julian; Renodon-Cornière, Axelle; Asanomi, Yuya; Sakaguchi, Kazuyasu; Stasiak, Alicja Z; Stasiak, Andrzej; Norden, Bengt; Tran, Vinh; Takahashi, Masayuki

    2010-01-01

    We have previously shown that a 28-amino acid peptide derived from the BRC4 motif of the BRCA2 tumor suppressor selectively inhibits human RAD51 recombinase (HsRad51). With the aim of designing better inhibitors for cancer treatment, we combined an in silico docking approach with in vitro biochemical testing to construct a highly efficient chimera peptide from eight existing human BRC motifs. We built a molecular model of all BRC motifs complexed with HsRad51 based on the crystal structure of the BRC4 motif-HsRad51 complex, computed the interaction energy of each residue in each BRC motif, and selected the best amino acid residue at each binding position. This analysis enabled us to propose four amino acid substitutions in the BRC4 motif. Three of these increased the inhibitory effect in vitro, and this effect was found to be additive. We thus obtained a peptide that is about 10 times more efficient in inhibiting HsRad51-ssDNA complex formation than the original peptide.
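    The residue-selection step described above can be sketched as choosing, at each aligned binding position, the residue from whichever BRC motif has the most favourable computed interaction energy. This assumes the motifs are already aligned to equal length and that a lower energy means stronger binding. The motif names, sequences and energy values are hypothetical, and the sketch treats positions independently, which the abstract supports only for the three substitutions found to be additive.

```python
def build_chimera(motifs, energies):
    """Assemble a chimera sequence position by position.

    motifs:   name -> aligned motif sequence (all the same length)
    energies: name -> per-position interaction energy (kcal/mol, lower = more favourable)
    """
    length = len(next(iter(motifs.values())))
    if any(len(seq) != length for seq in motifs.values()):
        raise ValueError("motifs must be aligned to equal length")
    chimera = []
    for i in range(length):
        # pick the motif whose residue at position i has the lowest energy
        best = min(motifs, key=lambda name: energies[name][i])
        chimera.append(motifs[best][i])
    return "".join(chimera)


# Hypothetical three-position example
motifs = {"BRC_a": "FAL", "BRC_b": "LSV"}
energies = {"BRC_a": [-3.0, -0.5, -1.0], "BRC_b": [-2.0, -1.5, -0.2]}
print(build_chimera(motifs, energies))  # FSL
```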

  5. Design of potent inhibitors of human RAD51 recombinase based on BRC motifs of BRCA2 protein: modeling and experimental validation of a chimera peptide.

    KAUST Repository

    Nomme, Julian

    2010-08-01

    We have previously shown that a 28-amino acid peptide derived from the BRC4 motif of the BRCA2 tumor suppressor selectively inhibits human RAD51 recombinase (HsRad51). With the aim of designing better inhibitors for cancer treatment, we combined an in silico docking approach with in vitro biochemical testing to construct a highly efficient chimera peptide from eight existing human BRC motifs. We built a molecular model of all BRC motifs complexed with HsRad51 based on the crystal structure of the BRC4 motif-HsRad51 complex, computed the interaction energy of each residue in each BRC motif, and selected the best amino acid residue at each binding position. This analysis enabled us to propose four amino acid substitutions in the BRC4 motif. Three of these increased the inhibitory effect in vitro, and this effect was found to be additive. We thus obtained a peptide that is about 10 times more efficient in inhibiting HsRad51-ssDNA complex formation than the original peptide.

  6. Molecular Phylogenetics: Concepts for a Newcomer.

    Science.gov (United States)

    Ajawatanawong, Pravech

    Molecular phylogenetics is the study of evolutionary relationships among organisms using molecular sequence data. The aim of this review is to introduce the important terminology and general concepts of tree reconstruction to biologists who lack a strong background in the field of molecular evolution. Some modern phylogenetic programs are easy to use because of their user-friendly interfaces, but understanding the phylogenetic algorithms and substitution models, which are based on advanced statistics, remains important for analyzing and interpreting results without expert guidance. Briefly, there are five general steps in carrying out a phylogenetic analysis: (1) sequence data preparation, (2) sequence alignment, (3) choosing a phylogenetic reconstruction method, (4) identification of the best tree, and (5) evaluating the tree. Concepts in this review enable biologists to grasp the basic ideas behind phylogenetic analysis and also help provide a sound basis for discussions with expert phylogeneticists.
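    As a concrete instance of steps (2) and (3), a distance-based analysis first converts an alignment into pairwise evolutionary distances under a substitution model. The sketch below assumes the simplest such model, Jukes-Cantor (JC69), with d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The taxon names and sequences are hypothetical, and the resulting matrix would still have to be passed to a tree-building method such as neighbor joining (not shown).

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the JC69 correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment: dict) -> dict:
    """Pairwise JC69 distances for every pair of taxa in the alignment."""
    return {(x, y): jc69_distance(alignment[x], alignment[y])
            for x, y in combinations(alignment, 2)}


# Hypothetical aligned sequences
alignment = {"taxon1": "ACGTACGTAC", "taxon2": "ACGTACGTTC", "taxon3": "ACGAACTTTC"}
for pair, d in distance_matrix(alignment).items():
    print(pair, round(d, 4))
```

    The correction matters because raw p-distances underestimate divergence once multiple substitutions at the same site become likely.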

  7. Systematic comparison of the response properties of protein and RNA mediated gene regulatory motifs.

    Science.gov (United States)

    Iyengar, Bharat Ravi; Pillai, Beena; Venkatesh, K V; Gadgil, Chetan J

    2017-05-30

    We present a framework enabling the dissection of the effects of motif structure (feedback or feedforward), the nature of the controller (RNA or protein), and the regulation mode (transcriptional, post-transcriptional or translational) on the response to a step change in the input. We have used a common model framework for gene expression where both motif structures have an activating input and repressing regulator, with the same set of parameters, to enable a comparison of the responses. We studied the global sensitivity of the system properties, such as steady-state gain, overshoot, peak time, and peak duration, to parameters. We find that, in all motifs, overshoot correlated negatively whereas peak duration varied concavely with peak time. Differences in the other system properties were found to be mainly dependent on the nature of the controller rather than the motif structure. Protein mediated motifs showed a higher degree of adaptation i.e. a tendency to return to baseline levels; in particular, feedforward motifs exhibited perfect adaptation. RNA mediated motifs had a mild regulatory effect; they also exhibited a lower peaking tendency and mean overshoot. Protein mediated feedforward motifs showed higher overshoot and lower peak time compared to the corresponding feedback motifs.
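    The perfect adaptation reported for protein-mediated feedforward motifs can be reproduced with a minimal toy model; this is an assumption for illustration, not the authors' model or parameter set. The input u activates both a regulator r and the output y, and r represses y through the hypothetical term u/r. With all rate constants set to 1, the steady state is r* = u and y* = 1 for any u, so after a step in u the output overshoots and then returns to its baseline.

```python
def simulate_iffl(u_before=1.0, u_after=2.0, dt=1e-3, t_end=20.0):
    """Euler integration of a toy incoherent feedforward loop after a step in input u.

    dr/dt = u - r        (input activates the regulator r)
    dy/dt = u / r - y    (input activates the output y; r represses it)
    Returns (final output, peak output).
    """
    r, y = u_before, 1.0  # start at the steady state for u_before
    peak = y
    for _ in range(int(round(t_end / dt))):
        dr = u_after - r
        dy = u_after / r - y
        r += dt * dr
        y += dt * dy
        peak = max(peak, y)
    return y, peak


final, peak = simulate_iffl()
print(f"final output {final:.4f}, peak output {peak:.4f}")
```

    The overshoot and peak time printed here are the kind of step-response properties the study compares across motif structures and regulation modes.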

  8. Discriminative Motif Discovery via Simulated Evolution and Random Under-Sampling

    OpenAIRE

    Song, Tao; Gu, Hong

    2014-01-01

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the sta...
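    Random under-sampling, named in the title, balances classes by discarding randomly chosen samples from the larger classes. A minimal sketch of the generic technique follows. The simulated-evolution step and the authors' exact sampling scheme are not shown, and the sample data are hypothetical.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class equal in size to the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    n_min = min(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        for s in rng.sample(items, n_min):  # sample without replacement
            out_x.append(s)
            out_y.append(y)
    return out_x, out_y


# Hypothetical imbalanced data: 8 positive sequences, 2 negative ones
X = [f"seq{i}" for i in range(10)]
y = ["pos"] * 8 + ["neg"] * 2
print(random_undersample(X, y, seed=1))
```

    Under-sampling discards information from the majority class, which is one reason it is often combined with other preprocessing, as the abstract's use of simulated evolution suggests.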

  9. Distinct configurations of protein complexes and biochemical pathways revealed by epistatic interaction network motifs

    LENUS (Irish Health Repository)

    Casey, Fergal

    2011-08-22

    Background: Gene and protein interactions are commonly represented as networks, with the genes or proteins comprising the nodes and the relationship between them as edges. Motifs, or small local configurations of edges and nodes that arise repeatedly, can be used to simplify the interpretation of networks. Results: We examined triplet motifs in a network of quantitative epistatic genetic relationships, and found a non-random distribution of particular motif classes. Individual motif classes were found to be associated with different functional properties, suggestive of an underlying biological significance. These associations were apparent not only for motif classes, but for individual positions within the motifs. As expected, NNN (all negative) motifs were strongly associated with previously reported genetic (i.e. synthetic lethal) interactions, while PPP (all positive) motifs were associated with protein complexes. The two other motif classes (NNP: a positive interaction spanned by two negative interactions, and NPP: a negative spanned by two positives) showed very distinct functional associations, with physical interactions dominating for the former but alternative enrichments, typical of biochemical pathways, dominating for the latter. Conclusion: We present a model showing how NNP motifs can be used to recognize supportive relationships between protein complexes, while NPP motifs often identify opposing or regulatory behaviour between a gene and an associated pathway. The ability to use motifs to point toward underlying biological organizational themes is likely to be increasingly important as more extensive epistasis mapping projects in higher organisms begin.
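    The four triplet classes (PPP, NPP, NNP, NNN) are determined by how many of a triangle's three edges are negative. The sketch below enumerates closed triplets in a small signed network and tallies them by class. It assumes each quantitative epistasis score has already been reduced to a sign; the thresholds for doing so are not given in the abstract, and the gene names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def triplet_motif_counts(edges):
    """Count closed triplets (triangles) in a signed network by their number of negative edges.

    edges: dict mapping frozenset({gene_a, gene_b}) -> '+' or '-'
    Returns a Counter over 'PPP', 'NPP', 'NNP', 'NNN'.
    """
    nodes = sorted({n for e in edges for n in e})
    counts = Counter()
    for a, b, c in combinations(nodes, 3):
        signs = [edges.get(frozenset(p)) for p in ((a, b), (a, c), (b, c))]
        if None in signs:
            continue  # the three genes do not form a closed triangle
        n_neg = signs.count("-")
        counts[("PPP", "NPP", "NNP", "NNN")[n_neg]] += 1
    return counts


# Hypothetical signed interactions
edges = {
    frozenset({"A", "B"}): "+",
    frozenset({"A", "C"}): "+",
    frozenset({"B", "C"}): "-",
    frozenset({"B", "D"}): "-",
    frozenset({"C", "D"}): "-",
}
print(triplet_motif_counts(edges))
```

    The brute-force enumeration over all node triples is fine for small examples; genome-scale networks would instead iterate over edges and their shared neighbours.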

  10. Phylogenetic distribution of plant snoRNA families

    DEFF Research Database (Denmark)

    Patra Bhattacharya, Deblina; Canzler, Sebastian; Kehr, Stephanie

    2016-01-01

    ... RNAs and box H/ACA snoRNAs, which are clearly distinguished by conserved sequence motifs and the type of chemical modification that they govern. Similarly to microRNAs, snoRNAs appear in distinct families of homologs that affect homologous targets. In animals, snoRNAs and their evolution have been studied ... in much detail. In plants, however, their evolution has attracted comparably little attention. RESULTS: In order to chart the phylogenetic distribution of individual snoRNA families in plants, we applied a sophisticated approach for identifying homologs of known plant snoRNAs across the plant kingdom. ... In response to the relatively fast evolution of snoRNAs, information on conserved sequence boxes, target sequences, and secondary structure is combined to identify additional snoRNAs. We identified 296 families of snoRNAs in 24 species and traced their evolution throughout the plant kingdom. Many of the plant...

  11. KERAWANG ACEH GAYO CARVINGS AS INSPIRATION FOR CREATING DISTINCTIVE GAYO BATIK MOTIFS

    Directory of Open Access Journals (Sweden)

    Irfa ina Rohana Salma

    2016-12-01

    The batik industry has begun to develop in Gayo, but the region does not yet have batik motifs of its own. It is therefore necessary to create distinctive Gayo batik motifs, taking inspiration from the carvings found on traditional houses, commonly called kerawang Gayo carvings. The aim of this artistic creation is to produce batik motifs with Gayo characteristics. The method consisted of idea exploration, design, and realization as batik motifs. Six distinctive Gayo batik motifs were created: (1) Motif Ceplok Gayo; (2) Motif Gayo Tegak; (3) Motif Gayo Lurus; (4) Motif Parang Gayo; (5) Motif Gayo Lembut; and (6) Motif Geometris Gayo. A preference test with fifty respondents showed that Motif Ceplok Gayo was chosen most often (19%), followed by Motif Parang Gayo (18%), Motif Gayo Lembut (17%), Motif Geometris Gayo (17%), Motif Gayo Lurus (15%), and Motif Gayo Tegak (14%). Overall, the motifs were well received by the respondents, so all of them are suitable for production as distinctive Gayo batik. Keywords: Gayo batik, Motif Ceplok Gayo, Motif Parang Gayo.

  12. ROMANIAN TRADITIONAL MOTIF ELEMENT OF MODERNITY IN CLOTHING

    Directory of Open Access Journals (Sweden)

    ŞUTEU Marius Darius

    2017-05-01

    This paper presents the stages of improving a clothing item, a women's T-shirt, from an aesthetic point of view, using pattern-design software, computer graphics and various modern textile technologies, including industrial embroidery, digital printing and sublimation printing. In the first stage, documentation was prepared at the University of Oradea, and traditional motifs were selected from a collection of Romanian traditional motifs from different parts of the country; these were reinterpreted and stylized while preserving the symbolism and color range specific to each area. For the styling stage, the CorelDraw vector graphics program was used, which allows the shape, size and color of the drawings to be changed without affecting the identity of the pattern. The embroidery was done using BERNINA Embroidery Software Designer Plus, which allows the design to be exported to any domestic or industrial embroidery machine, regardless of brand. Finally, we tested the printed and embroidered design for elasticity and abrasion resistance and carried out a sensory analysis of color preservation. The tests showed that the print applied to the fabric was resistant, giving a quality that makes it possible to pass on the Romanian traditional motif from generation to generation.

  13. A Novel Protein Interaction between Nucleotide Binding Domain of Hsp70 and p53 Motif

    Directory of Open Access Journals (Sweden)

    Asita Elengoe

    2015-01-01

    Currently, the protein interaction of the Homo sapiens nucleotide binding domain (NBD) of heat shock 70 kDa protein (PDB: 1HJO) with the p53 motif remains to be elucidated. The NBD-p53 motif complex enhances p53 stabilization, thereby increasing its tumor suppression activity in cancer treatment. Therefore, we identified the interaction between NBD and p53 using the STRING version 9.1 program. Then, we modeled the three-dimensional structure of the p53 motif through homology modeling and determined the binding affinity and stability of the NBD-p53 motif complex structure via molecular docking and molecular dynamics (MD) simulation. The human DNA binding domain of the p53 motif (SCMGGMNR), retrieved from UniProt (UniProtKB: P04637), was docked with the NBD protein using the AutoDock version 4.2 program. The binding energy and intermolecular energy for the NBD-p53 motif complex were −0.44 kcal/mol and −9.90 kcal/mol, respectively. Moreover, RMSD, RMSF, hydrogen bond, salt bridge, and secondary structure analyses revealed that the NBD protein had a strong bond with the p53 motif and that the protein-ligand complex was stable. Thus, the current data would be highly encouraging for designing Hsp70 structure-based drugs for cancer therapy.

  14. The space of ultrametric phylogenetic trees.

    Science.gov (United States)

    Gavryushkin, Alex; Drummond, Alexei J

    2016-08-21

    The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space, formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the squared distance to the trees from the sample might differ between parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
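    To make the difference between the two parameterisations concrete: trees that share the same ranked topology can be compared either by their absolute divergence times or by the inter-coalescent interval lengths derived from those times. The sketch below uses plain Euclidean distance on each coordinate vector. This is a simplification assumed for illustration, not the metric spaces constructed in the paper (which also handle changes of topology); it only shows that the same pair of trees can be closer in one parameterisation than in the other. The two trees are hypothetical.

```python
import math


def times_to_intervals(times):
    """Sorted divergence times (measured back from the tips at time 0) -> inter-coalescent intervals."""
    times = list(times)
    return [t - prev for prev, t in zip([0.0] + times[:-1], times)]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two hypothetical three-taxon trees with the same ranked topology
tree_a = [1.0, 2.0]  # times of the first and second divergence events
tree_b = [2.0, 3.0]
print(euclidean(tree_a, tree_b))  # about 1.414 in absolute-time coordinates
print(euclidean(times_to_intervals(tree_a), times_to_intervals(tree_b)))  # 1.0 in interval coordinates
```

    Because distances differ between the two coordinate systems, a summary tree chosen by minimising squared distances can also differ, which is the point the abstract makes.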

  15. Large-scale discovery of promoter motifs in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Thomas A Down

    2007-01-01

    A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.

  16. MHC motif viewer

    DEFF Research Database (Denmark)

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole

    2008-01-01

    ... Algorithms that predict which peptides MHC molecules bind have recently been developed and cover many different alleles, but the utility of these algorithms is hampered by the lack of tools for browsing and comparing the specificity of these molecules. We have, therefore, developed a web server, MHC motif ... A special viewing feature, MHC fight, allows for display of the specificity of two different MHC molecules side by side. We show how the web server can be used to discover and display surprising similarities as well as differences between MHC molecules within and between different species. The MHC motif...

  17. The best of both worlds: Phylogenetic eigenvector regression and mapping

    Directory of Open Access Journals (Sweden)

    José Alexandre Felizola Diniz Filho

    2015-09-01

    Eigenfunction analyses have been widely used to model patterns of autocorrelation in time, space and phylogeny. In a phylogenetic context, Diniz-Filho et al. (1998) proposed what they called Phylogenetic Eigenvector Regression (PVR), in which pairwise phylogenetic distances among species are submitted to a Principal Coordinate Analysis, and eigenvectors are then used as explanatory variables in regression, correlation or ANOVAs. More recently, a new approach called Phylogenetic Eigenvector Mapping (PEM) was proposed, with the main advantage of explicitly incorporating a model-based warping of phylogenetic distance in which an Ornstein-Uhlenbeck (O-U) process is fitted to data before eigenvector extraction. Here we compared PVR and PEM with respect to estimated phylogenetic signal, correlated evolution under alternative evolutionary models and phylogenetic imputation, using simulated data. Despite the similarity between the two approaches, PEM has a slightly higher prediction ability and is more general than the original PVR. Even so, in a conceptual sense, PEM may provide a technique in the best of both worlds, combining the flexibility of data-driven and empirical eigenfunction analyses with the sound insights provided by evolutionary models well known in comparative analyses.
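    The PVR procedure described above (Principal Coordinate Analysis of a phylogenetic distance matrix, then regression of a trait on the leading eigenvectors) can be sketched with NumPy. This assumes a symmetric pairwise distance matrix such as patristic distances as input. The O-U warping that distinguishes PEM is not shown, and the distance matrix and trait values below are hypothetical.

```python
import numpy as np


def phylo_eigenvectors(D: np.ndarray) -> np.ndarray:
    """Principal Coordinate Analysis of a pairwise distance matrix.

    Returns principal coordinates (eigenvectors scaled by sqrt(eigenvalue)) for the
    positive eigenvalues, ordered by decreasing eigenvalue.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # Gower's double-centred matrix
    vals, vecs = np.linalg.eigh(B)            # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10 * vals.max()          # drop null and negative eigenvalues
    return vecs[:, keep] * np.sqrt(vals[keep])


def pvr_r2(D: np.ndarray, y: np.ndarray, k: int) -> float:
    """Regress trait y on the first k eigenvectors (plus intercept); return R^2."""
    X = np.column_stack([np.ones(len(y)), phylo_eigenvectors(D)[:, :k]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    centred = y - y.mean()
    return float(1.0 - resid @ resid / (centred @ centred))


# Hypothetical patristic distances for the tree ((A:1,B:1):1,(C:1,D:1):1)
D = np.array([[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]], dtype=float)
trait = np.array([1.0, 1.2, 3.0, 3.1])
print(pvr_r2(D, trait, k=1))
```

    Scaling the eigenvectors by the square root of their eigenvalues does not change the regression fit; it only makes the coordinates reproduce the input distances when the matrix is Euclidean.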

  18. One motif to bind them: A small-XXX-small motif affects transmembrane domain 1 oligomerization, function, localization, and cross-talk between two yeast GPCRs.

    Science.gov (United States)

    Lock, Antonia; Forfar, Rachel; Weston, Cathryn; Bowsher, Leo; Upton, Graham J G; Reynolds, Christopher A; Ladds, Graham; Dixon, Ann M

    2014-12-01

    G protein-coupled receptors (GPCRs) are the largest family of cell-surface receptors in mammals and facilitate a range of physiological responses triggered by a variety of ligands. GPCRs were thought to function as monomers; however, it is now accepted that GPCR homo- and hetero-oligomers also exist and influence receptor properties. The Schizosaccharomyces pombe GPCR Mam2 is a pheromone-sensing receptor involved in mating and has previously been shown to form oligomers in vivo. The first transmembrane domain (TMD) of Mam2 contains a small-XXX-small motif, overrepresented in membrane proteins and well-known for promoting helix-helix interactions. An ortholog of Mam2 in Saccharomyces cerevisiae, Ste2, contains an analogous small-XXX-small motif which has been shown to contribute to receptor homo-oligomerization, localization and function. Here we have used experimental and computational techniques to characterize the role of the small-XXX-small motif in function and assembly of Mam2 for the first time. We find that disruption of the motif via mutagenesis leads to reduction of Mam2 TMD1 homo-oligomerization and pheromone-responsive cellular signaling of the full-length protein. It also impairs correct targeting to the plasma membrane. Mutation of the analogous motif in Ste2 yielded similar results, suggesting a conserved mechanism for assembly. Using co-expression of the two fungal receptors in conjunction with computational models, we demonstrate a functional change in G protein specificity and propose that this is brought about through hetero-dimeric interactions of Mam2 with Ste2 via the complementary small-XXX-small motifs. This highlights the potential of these motifs to affect a range of properties that can be investigated in other GPCRs. Copyright © 2014. Published by Elsevier B.V.

  19. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Science.gov (United States)

    Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

    2012-01-01

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif specifically bound calcium ions with affinities ranging from 33 to 79 µM, depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and at their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.
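    The PERCAL consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D is written in PROSITE-style notation and can be searched with a regular expression. The converter below is a hypothetical helper that handles only single residues, `x` and bracketed alternatives (not `{...}` exclusions or `x(n)` repeats), and the test sequence is synthetic. `finditer` reports non-overlapping matches only.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern (x = any residue, [..] = alternatives) to a regex."""
    return "".join("." if element.lower() == "x" else element for element in pattern.split("-"))


PERCAL = re.compile(prosite_to_regex("G-x-D-G-x-x-[GN]-[TN]-x-D-D"))


def find_percal(seq: str):
    """Return (0-based start, matched text) for each non-overlapping PERCAL-like match."""
    return [(m.start(), m.group()) for m in PERCAL.finditer(seq.upper())]


print(PERCAL.pattern)                    # G.DG..[GN][TN].DD
print(find_percal("MKAGADGKLGTADDKK"))  # [(3, 'GADGKLGTADD')]
```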

  20. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Directory of Open Access Journals (Sweden)

    Saray Santamaría-Hernando

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of

  1. The phylogenetic likelihood library.

    Science.gov (United States)

    Flouri, T; Izquierdo-Carrasco, F; Darriba, D; Aberer, A J; Nguyen, L-T; Minh, B Q; Von Haeseler, A; Stamatakis, A

    2015-03-01

    We introduce the Phylogenetic Likelihood Library (PLL), a highly optimized application programming interface for developing likelihood-based phylogenetic inference and postanalysis software. The PLL implements appropriate data structures and functions that allow users to quickly implement common, error-prone, and labor-intensive tasks, such as likelihood calculations, model parameter and branch length optimization, and tree space exploration. The highly optimized and parallelized implementation of the phylogenetic likelihood function and thorough documentation provide a framework for rapid development of scalable parallel phylogenetic software. Using two likelihood-based phylogenetic codes as examples, we show that the PLL improves the sequential performance of current software by a factor of 2-10 while requiring only 1 month of programming time for integration. We show that, when numerical scaling for preventing floating point underflow is enabled, the double precision likelihood calculations in the PLL are up to 1.9 times faster than those in BEAGLE. On an empirical DNA dataset with 2000 taxa the AVX version of PLL is 4 times faster than BEAGLE (scaling enabled and required). The PLL is available at http://www.libpll.org under the GNU General Public License (GPL). © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
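    The numerical scaling mentioned above keeps conditional likelihood vectors from underflowing double precision on large trees. The sketch below illustrates the general idea during Felsenstein's pruning under the JC69 model: a vector whose entries all fall below a threshold is divided by its maximum, and the logarithm of that factor is added back at the root. The three-taxon tree, branch lengths and threshold are assumptions for illustration; this is neither the PLL API nor its actual scaling scheme. On a tree this small no underflow occurs, so forcing scaling at every node simply shows that scaled and unscaled log-likelihoods agree.

```python
import math

BASES = "ACGT"


def jc69(t):
    """JC69 transition probabilities for a branch of length t (substitutions/site)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def tip(base):
    """Conditional likelihood vector for an observed nucleotide."""
    return [1.0 if b == base else 0.0 for b in BASES]


def combine(children, threshold):
    """Pruning step: combine (vector, branch_length) children into the parent's vector.

    If the largest entry is below `threshold`, rescale by it and return its log so the
    caller can add it back to the final log-likelihood.
    """
    vec = [1.0] * 4
    for child, t in children:
        p = jc69(t)
        for i in range(4):
            vec[i] *= sum(p[i][j] * child[j] for j in range(4))
    m = max(vec)
    if 0.0 < m < threshold:
        return [v / m for v in vec], math.log(m)
    return vec, 0.0


def site_loglik(a, b, c, threshold):
    """Log-likelihood of one site on the hypothetical rooted tree ((A:0.1,B:0.2):0.05,C:0.3)."""
    inner, s1 = combine([(tip(a), 0.1), (tip(b), 0.2)], threshold)
    root, s2 = combine([(inner, 0.05), (tip(c), 0.3)], threshold)
    return math.log(sum(0.25 * x for x in root)) + s1 + s2


print(site_loglik("A", "A", "C", threshold=0.0))  # scaling never triggered
print(site_loglik("A", "A", "C", threshold=2.0))  # scaling forced at every node
```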

  2. Analysis of the secondary structure of ITS transcripts in peritrich ciliates (Ciliophora, Oligohymenophorea): implications for structural evolution and phylogenetic reconstruction.

    Science.gov (United States)

    Sun, Ping; Clamp, John C; Xu, Dapeng

    2010-07-01

    Despite extensive previous morphological work, little agreement has been reached about phylogenetic relationships among peritrich ciliates, making it difficult to study the evolution of the group in a phylogenetic framework. In this study, the nucleotide characteristics and secondary structures of internal transcribed spacers 1 and 2 (ITS1 and ITS2) of 26 peritrich ciliates in 12 genera were analyzed. Information from secondary structures of ITS1 and ITS2 then was used to perform the first systematic study of ITS regions in peritrich ciliates, including one species of Rhabdostyla for which no sequence has been reported previously. Lengths of ITS1 and ITS2 sequences varied relatively little among taxa studied, but their G+C content was highly variable. General secondary structure models of ITS1 and ITS2 were proposed for peritrich ciliates and their reliability was assessed by compensatory base changes. The secondary structure of ITS1 contains three major helices in peritrich ciliates and deviations from this basic structure were found in all taxa examined. The core structure of peritrich ITS2 includes four helices, with helix III as the longest and containing a motif 5'-MAC versus GUK-3' at its apex as well as a YU-UY mismatch in helix II. In addition, the structural motifs of both ITS secondary structures were identified. Phylogenetic analyses using ITS data were performed by means of Bayesian inference, maximum likelihood and neighbor joining methods. Trees had a consistent branching pattern that included the following features: (1) Rhabdostyla always clustered with members of the family Vorticellidae, instead of members of the family Epistylididae, in which it is now classified on the basis of morphology. (2) The systematically questionable genus Ophrydium closely associated with Carchesium, forming a clearly defined, monophyletic group within the Vorticellidae. This supported the hypothesis derived from a previous study based on small subunit rRNA gene sequences
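
    The reliability of the proposed ITS1/ITS2 models was assessed by compensatory base changes (CBCs). As a minimal sketch of that bookkeeping, the hypothetical function below compares two aligned sequences over a list of paired alignment columns and counts CBCs (both sides of a pair changed, pairing kept) and hemi-CBCs (one side changed, pairing kept). It assumes the six pairs AU, UA, GC, CG, GU and UG count as paired; published CBC definitions differ in how G-U pairs are treated, so this is not the authors' procedure.

```python
# Assumed set of pairs that count as "paired" (Watson-Crick plus G-U wobble).
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def classify_pair_changes(seq_a, seq_b, helix_pairs):
    """Count compensatory and hemi-compensatory base changes.

    seq_a, seq_b: aligned RNA sequences of equal length.
    helix_pairs: list of (i, j) 0-based alignment columns that pair in the
    consensus secondary structure.
    """
    counts = {"cbc": 0, "hemi_cbc": 0}
    for i, j in helix_pairs:
        pair_a = seq_a[i] + seq_a[j]
        pair_b = seq_b[i] + seq_b[j]
        if pair_a not in PAIRS or pair_b not in PAIRS:
            continue  # pairing broken in one sequence: neither CBC nor hemi-CBC
        changed = (seq_a[i] != seq_b[i]) + (seq_a[j] != seq_b[j])
        if changed == 2:
            counts["cbc"] += 1
        elif changed == 1:
            counts["hemi_cbc"] += 1
    return counts
```

    For example, a G-C pair in one sequence that appears as C-G in the other counts as one CBC, while G-C versus G-U counts as a hemi-CBC.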

  3. Motif discovery in ranked lists of sequences

    DEFF Research Database (Denmark)

    Nielsen, Morten Muhlig; Tataru, Paula; Madsen, Tobias

    2016-01-01

    Motif analysis has long been an important method to characterize biological functionality, and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate sequence lists ranked by some functional property. There is therefore a growing need for motif analysis methods that can exploit this coupled data structure and be tailored for specific biological questions. Here, we present an exploratory motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to evaluate the correlation of motifs… advantage of the regular expression feature, including enrichments for combinations of different microRNA seed sites. The method is implemented and made publicly available as an R package and supports high parallelization on multi-core machinery.
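
    Regmex evaluates how regular-expression motifs correlate with a sequence ranking. The sketch below does not reproduce Regmex's statistics (Regmex itself is an R package). It is a hypothetical Python illustration of the general idea that uses a simple rank-sum (AUC-style) score: the fraction of (match, non-match) pairs in which the matching sequence is ranked higher.

```python
import re


def rank_enrichment(ranked_seqs, pattern):
    """Rank-sum (AUC-style) score for a regex motif in a ranked list.

    ranked_seqs: sequences ordered from highest to lowest functional score.
    Returns the fraction of (hit, non-hit) pairs in which the hit is ranked
    above the non-hit: 1.0 means all hits are at the top, 0.5 means no trend.
    """
    regex = re.compile(pattern)
    hits = [i for i, s in enumerate(ranked_seqs) if regex.search(s)]
    non_hits = [i for i, s in enumerate(ranked_seqs) if not regex.search(s)]
    if not hits or not non_hits:
        return None  # the score is undefined without both groups
    # O(n^2) pair count; adequate for a sketch, not for genome-scale lists.
    above = sum(1 for h in hits for n in non_hits if h < n)
    return above / (len(hits) * len(non_hits))
```

    A combination of two sites in the same sequence can be expressed with lookaheads, for example `(?=.*AAAA)(?=.*CCCC)`, where both words are placeholders rather than real seed sites.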

  4. Deciphering functional glycosaminoglycan motifs in development.

    Science.gov (United States)

    Townley, Robert A; Bülow, Hannes E

    2018-03-23

    Glycosaminoglycans (GAGs) such as heparan sulfate, chondroitin/dermatan sulfate, and keratan sulfate are linear glycans, which when attached to protein backbones form proteoglycans. GAGs are essential components of the extracellular space in metazoans. Extensive modifications of the glycans such as sulfation, deacetylation and epimerization create structural GAG motifs. These motifs regulate protein-protein interactions and are thereby responsible for many of the essential functions of GAGs. This review focuses on recent genetic approaches to characterize GAG motifs and their function in defined signaling pathways during development. We discuss a coding approach for GAGs that would enable computational analyses of GAG sequences such as alignments and the computation of position weight matrices to describe GAG motifs. Copyright © 2018 Elsevier Ltd. All rights reserved.
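
    The review proposes coding GAG structures so that standard sequence tools such as position weight matrices can be applied. Assuming such a coding exists, with each modified disaccharide unit mapped to one symbol (the alphabet in the example is purely hypothetical), a log-odds PWM could be built from aligned motif strings as in this sketch. The pseudocount of 0.5 and the uniform default background are also assumptions.

```python
import math
from collections import Counter


def position_weight_matrix(motifs, alphabet, background=None, pseudocount=0.5):
    """Log2-odds PWM from equal-length, pre-aligned motif strings.

    alphabet: iterable of symbols (e.g. hypothetical one-letter codes for
    disaccharide units). background: symbol -> probability (uniform if None).
    """
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("motifs must be aligned to equal length")
    if background is None:
        background = {a: 1 / len(alphabet) for a in alphabet}
    pwm = []
    for pos in range(length):
        counts = Counter(m[pos] for m in motifs)
        total = len(motifs) + pseudocount * len(alphabet)
        # Smoothed frequency of each symbol at this position, as log-odds.
        pwm.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background[a])
            for a in alphabet
        })
    return pwm


def score(pwm, word):
    """Sum of per-position log-odds scores of a word against the PWM."""
    return sum(column[symbol] for column, symbol in zip(pwm, word))
```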

  5. Fitness for synchronization of network motifs

    DEFF Research Database (Denmark)

    Vega, Y.M.; Vázquez-Prada, M.; Pacheco, A.F.

    2004-01-01

    We study the synchronization of Kuramoto oscillators in small parts of networks known as motifs. We first report on the system dynamics for the case of a scale-free network and show the existence of a non-trivial critical point. We compute the probability that network motifs synchronize, and fi… that the fitness for synchronization correlates well with the motifs' interconnectedness and structural complexity. Possible implications for present debates about network evolution in biological and other systems are discussed.
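
    To make the setting concrete, the sketch below integrates Kuramoto oscillators on a single small motif with a forward Euler step and reports the order parameter r, which approaches 1 when the phases lock. The coupling form (an unnormalized sum over neighbours), the step size and the run length are assumptions, and the function is a hypothetical helper; it does not reproduce the paper's probability-of-synchronization calculation.

```python
import cmath
import math


def kuramoto_order(adjacency, omegas, theta0, coupling, dt=0.01, steps=5000):
    """Euler integration of Kuramoto oscillators on a small network.

    d(theta_i)/dt = omega_i + K * sum_j A_ij * sin(theta_j - theta_i)
    Returns the final order parameter r = |mean(exp(i * theta))|.
    """
    n = len(omegas)
    theta = list(theta0)
    for _ in range(steps):
        dtheta = [
            omegas[i]
            + coupling * sum(adjacency[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
            for i in range(n)
        ]
        theta = [t + dt * d for t, d in zip(theta, dtheta)]
    return abs(sum(cmath.exp(1j * t) for t in theta) / n)
```

    With identical natural frequencies on a fully connected triangle, positive coupling drives r towards 1, whereas with zero coupling the phases, and hence r, stay at their initial values.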

  6. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

    Science.gov (United States)

    Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

    2011-06-20

    One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.
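
    Because HMM-SA turns each loop into a string over a structural alphabet, motif extraction reduces to counting words (k = 7 in the paper). The sketch below counts length-k words and compares each count with the number expected if letters occurred independently. This Bernoulli background is a deliberate simplification assumed here; the paper's pattern statistics for short sequences are more refined, and the function name is hypothetical.

```python
from collections import Counter


def word_overrepresentation(sequences, k):
    """Observed vs expected counts of length-k words in letter strings.

    Expected counts assume independent letters with frequencies estimated
    from the same sequences: E[w] = N_k * prod(p(letter)), where N_k is the
    number of length-k windows. Returns word -> (observed, expected, ratio).
    """
    letters = Counter(c for s in sequences for c in s)
    total = sum(letters.values())
    freq = {c: n / total for c, n in letters.items()}
    observed = Counter(s[i:i + k] for s in sequences for i in range(len(s) - k + 1))
    windows = sum(max(len(s) - k + 1, 0) for s in sequences)
    result = {}
    for word, obs in observed.items():
        p = 1.0
        for c in word:
            p *= freq[c]
        expected = windows * p
        result[word] = (obs, expected, obs / expected)
    return result
```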

  7. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

    Directory of Open Access Journals (Sweden)

    Martin Juliette

    2011-06-01

    Full Text Available Abstract Background: One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Results: Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions: Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  8. Incompletely resolved phylogenetic trees inflate estimates of phylogenetic conservatism.

    Science.gov (United States)

    Davies, T Jonathan; Kraft, Nathan J B; Salamin, Nicolas; Wolkovich, Elizabeth M

    2012-02-01

    The tendency for more closely related species to share similar traits and ecological strategies can be explained by their longer shared evolutionary histories and represents phylogenetic conservatism. How strongly species traits co-vary with phylogeny can significantly impact how we analyze cross-species data and can influence our interpretation of assembly rules in the rapidly expanding field of community phylogenetics. Phylogenetic conservatism is typically quantified by analyzing the distribution of species values on the phylogenetic tree that connects them. Many phylogenetic approaches, however, assume a completely sampled phylogeny: while we have good estimates of deeper phylogenetic relationships for many species-rich groups, such as birds and flowering plants, we often lack information on more recent interspecific relationships (i.e., within a genus). A common solution has been to represent these relationships as polytomies on trees using taxonomy as a guide. Here we show that such trees can dramatically inflate estimates of phylogenetic conservatism quantified using S. P. Blomberg et al.'s K statistic. Using simulations, we show that even randomly generated traits can appear to be phylogenetically conserved on poorly resolved trees. We provide a simple rarefaction-based solution that can reliably retrieve unbiased estimates of K, and we illustrate our method using data on first flowering times from Thoreau's woods (Concord, Massachusetts, USA).
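
    Blomberg et al.'s K compares how much trait variance the phylogeny explains with what Brownian motion would predict, using the tree's covariance matrix C (the shared root-to-ancestor branch length for each pair of tips). The sketch below computes K in plain Python under that formulation; the function names are hypothetical, and the paper's rarefaction correction is not shown. Because a polytomy forces all tips in the unresolved group to share the same covariance, tree resolution enters K directly through C.

```python
def solve(matrix, vector):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [list(row) + [b] for row, b in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def blomberg_k(cov, trait):
    """Blomberg et al.'s K from a phylogenetic covariance matrix and a trait.

    K > 1: trait more similar among relatives than Brownian motion predicts;
    K < 1: less similar.
    """
    n = len(trait)
    cinv_ones = solve(cov, [1.0] * n)
    cinv_x = solve(cov, trait)
    root = sum(cinv_x) / sum(cinv_ones)  # GLS estimate of the root state
    resid = [x - root for x in trait]
    mse0 = sum(r * r for r in resid) / (n - 1)  # ignores the tree
    mse = sum(r * q for r, q in zip(resid, solve(cov, resid))) / (n - 1)  # uses the tree
    trace = sum(cov[i][i] for i in range(n))
    expected = (trace - n / sum(cinv_ones)) / (n - 1)  # ratio expected under BM
    return (mse0 / mse) / expected
```

    On a star tree (C equal to the identity) K is 1 for any trait, while a trait that is constant within each of two sister pairs, with C = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]] and values [1, 1, -1, -1], gives K = 1.8.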

  9. The algebra of the general Markov model on phylogenetic trees and networks.

    Science.gov (United States)

    Sumner, J G; Holland, B R; Jarvis, P D

    2012-04-01

    It is known that the Kimura 3ST model of sequence evolution on phylogenetic trees can be extended quite naturally to arbitrary split systems. However, this extension relies heavily on mathematical peculiarities of the associated Hadamard transformation, and providing an analogous augmentation of the general Markov model has thus far been elusive. In this paper, we rectify this shortcoming by showing how to extend the general Markov model on trees to include incompatible edges; and even further to more general network models. This is achieved by exploring the algebra of the generators of the continuous-time Markov chain together with the “splitting” operator that generates the branching process on phylogenetic trees. For simplicity, we proceed by discussing the two-state case and then show that our results are easily extended to more states with little complication. Intriguingly, upon restriction of the two-state general Markov model to the parameter space of the binary symmetric model, our extension is indistinguishable from the Hadamard approach only on trees; as soon as any incompatible splits are introduced the two approaches give rise to differing probability distributions with disparate structure. Through exploration of a simple example, we give an argument that our extension to more general networks has desirable properties that the previous approaches do not share. In particular, our construction allows for convergent evolution of previously divergent lineages; a property that is of significant interest for biological applications.
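
    For the two-state case, the continuous-time part of the model has a closed-form transition matrix, and on a tree the splitting step copies the state at a node into each descendant edge. The sketch below shows only this standard tree case on a two-leaf tree; the paper's extension to incompatible splits and networks is not attempted, and the function names are hypothetical. Setting a = b gives the binary symmetric model.

```python
import math


def two_state_transition(a, b, t):
    """P(t) = exp(Q t) for Q = [[-a, a], [b, -b]].

    a: rate of 0 -> 1, b: rate of 1 -> 0. Closed form:
    P(t) = Pi + exp(-(a + b) t) * (I - Pi), with stationary rows (b, a)/(a + b).
    """
    s = a + b
    e = math.exp(-s * t)
    return [
        [(b + a * e) / s, a * (1 - e) / s],
        [b * (1 - e) / s, (a + b * e) / s],
    ]


def cherry_joint(root_dist, p1, p2):
    """Joint distribution of the two leaves of a two-leaf tree.

    The root state is split into two copies that evolve independently
    along edges with transition matrices p1 and p2.
    """
    return [
        [sum(root_dist[r] * p1[r][x] * p2[r][y] for r in range(2)) for y in range(2)]
        for x in range(2)
    ]
```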

  10. Application of Characteristic Maluku Ornaments to Batik Motif Design Development

    Directory of Open Access Journals (Sweden)

    Masiswo Masiswo

    2016-04-01

    Full Text Available ABSTRACT Maluku has a wealth of decorative forms inherited from its ancestral cultural values, in the form of ethnic ornaments that embody both art and craft skills. This heritage remains alive today and can be enjoyed as a source of spiritual satisfaction. To sustain the traditional ethnic values embodied in the regional ornaments of Maluku, these ornaments were developed into batik motifs on cloth for everyday use. The development emphasizes representing the ornament forms as they are applied in batik craft as motifs characteristic of Maluku. Three alternative batik motif designs derived from characteristic Maluku ornaments were developed, made into product prototypes, and tested for color fastness. In the test of color fastness to wet rubbing, the prototype with the "Motif Siwa" design was rated excellent, and those with the "Siwa Talang" and "Matahari Siwa Talang" motifs were rated good. Keywords: design, Maluku, batik motif, ornament

  11. Parole, Syntagmatic, and Paradigmatic Aspects of the Mega Mendung Batik Motif

    Directory of Open Access Journals (Sweden)

    Rudi - Nababan

    2012-04-01

    Full Text Available ABSTRACT Discussing traditional batik involves the organizational system of the fine-art elements that accompany it, whether the pattern of the motif or the technique of making it. In this respect, the Mega Mendung motif of Cirebon has patterns and rules that traditionally differ from those of motifs from other regions. Through semiotic analysis, particularly using the concepts of Saussure and Peirce, it can be traced that Cirebon batik, here the Mega Mendung motif, has a parole and langue system, as a unique fine-art language within batik, as well as a visual syntagmatic and paradigmatic structure. In the context of the batik motif as a fine-art language, it is necessarily related to the sign system as symbol and icon. Keywords: visual semiotics, Cirebon batik.

  12. Predicting rates of interspecific interaction from phylogenetic trees.

    Science.gov (United States)

    Nuismer, Scott L; Harmon, Luke J

    2015-01-01

    Integrating phylogenetic information can potentially improve our ability to explain species' traits, patterns of community assembly, the network structure of communities, and ecosystem function. In this study, we use mathematical models to explore the ecological and evolutionary factors that modulate the explanatory power of phylogenetic information for communities of species that interact within a single trophic level. We find that phylogenetic relationships among species can influence trait evolution and rates of interaction among species, but only under particular models of species interaction. For example, when interactions within communities are mediated by a mechanism of phenotype matching, phylogenetic trees make specific predictions about trait evolution and rates of interaction. In contrast, if interactions within a community depend on a mechanism of phenotype differences, phylogenetic information has little, if any, predictive power for trait evolution and interaction rate. Together, these results make clear and testable predictions for when and how evolutionary history is expected to influence contemporary rates of species interaction. © 2014 John Wiley & Sons Ltd/CNRS.
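
    The two interaction mechanisms are often written as simple functions of the trait difference between two species. The forms below (a Gaussian for phenotype matching, a logistic for phenotype differences) are common choices in coevolution models but are an assumption here, not necessarily the exact functions used in the paper; alpha is a hypothetical sensitivity parameter and the function names are illustrative.

```python
import math


def matching_rate(z_i, z_j, alpha=1.0):
    """Phenotype matching: interaction is most likely when traits are similar."""
    return math.exp(-alpha * (z_i - z_j) ** 2)


def difference_rate(z_i, z_j, alpha=1.0):
    """Phenotype differences: the outcome favours i more as z_i exceeds z_j."""
    return 1.0 / (1.0 + math.exp(-alpha * (z_i - z_j)))
```

    Under matching, the rate depends only on how similar two traits are, whereas under differences, swapping the two species turns a rate p into 1 - p; this contrast offers one intuition for why the two mechanisms can respond differently to phylogenetic structure.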

  13. Markovian Model in High Order Sequence Prediction From Log-Motif Patterns in Agbada Paralic Section, Niger Delta, Nigeria

    International Nuclear Information System (INIS)

    Olabode, S. O.; Adekoya, J. A.

    2002-01-01

    A Markovian model for elucidating high-order sequences was applied to repetitive events of regressive and transgressive phases in the Agbada paralic section, Niger Delta. The repetitive events are made up of delta front, delta topset and fluvio-deltaic sediments. The sediments consist of sands, sandstones, siltstones and shales in various proportions. Five wells (MN1, AA1, NP2, NP6 and NP8) were studied. A summary of the biostratigraphic report and well log-motif patterns were used to delineate the third-order depositional sequences in the wells. Various Markovian properties (observed transition frequency matrix, observed transition probability matrix, fixed probability vector, expected random matrix (randomised transition matrix) and difference matrix) were determined for stacked high-order sequences (high-frequency cyclic events) nested within the third-order sequences, using the log-motif patterns for the various sand bodies and shales. Flow diagrams were constructed for each of the depositional sequences to estimate the likely number of cycles. The upward transition matrix between the log-motif patterns and the flow diagrams used to elucidate cyclicity show that the overall regressive sequence of the Niger Delta has been modified by deltaic depositional elements and fluctuations in sea level. The predictions of higher-order sequences within third-order sequences from Markovian properties provide a good basis for correlation within the depositional sequences. The model has also been used to decipher the dominant depositional processes during the formation of the sequences. Discrete reservoir intervals and seal potentials within the sequences were also predicted from the flow diagrams constructed
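
    The Markovian properties listed above can be computed from a coded vertical succession of log-motif patterns. The sketch below assumes an embedded Markov chain (consecutive repeats merged), uses one common convention for the expected random matrix, r_ij = n_j / (N - n_i) with n counting state occurrences (other studies use transition row totals), and obtains the fixed probability vector by power iteration. The function name and the one-letter coding of log motifs are hypothetical.

```python
from collections import Counter


def markov_facies(sequence, iterations=500):
    """Embedded-Markov-chain summary of a vertical facies/log-motif sequence.

    Assumes every state has at least one upward transition out of it.
    """
    # Merge consecutive repeats so no state transitions to itself.
    chain = [s for i, s in enumerate(sequence) if i == 0 or s != sequence[i - 1]]
    states = sorted(set(chain))
    idx = {s: k for k, s in enumerate(states)}
    n = len(states)

    # Observed transition frequency matrix: counts of upward i -> j steps.
    freq = [[0] * n for _ in range(n)]
    for a, b in zip(chain, chain[1:]):
        freq[idx[a]][idx[b]] += 1

    # Observed transition probability matrix: row-normalized frequencies.
    prob = [[c / sum(row) if sum(row) else 0.0 for c in row] for row in freq]

    # Expected random matrix: r_ij = n_j / (N - n_i) for i != j.
    occ = Counter(chain)
    total = len(chain)
    expected = [
        [0.0 if i == j else occ[states[j]] / (total - occ[states[i]]) for j in range(n)]
        for i in range(n)
    ]
    # Difference matrix: positive entries mark transitions more frequent than random.
    diff = [[p - r for p, r in zip(p_row, r_row)] for p_row, r_row in zip(prob, expected)]

    # Fixed probability vector: power iteration on the "lazy" chain (I + P) / 2,
    # which shares P's stationary vector but cannot oscillate between states.
    vec = [1.0 / n] * n
    for _ in range(iterations):
        vec = [0.5 * vec[j] + 0.5 * sum(vec[i] * prob[i][j] for i in range(n)) for j in range(n)]
    return {"states": states, "freq": freq, "prob": prob,
            "expected": expected, "diff": diff, "fixed": vec}
```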

  14. Motif statistics and spike correlations in neuronal networks

    International Nuclear Information System (INIS)

    Hu, Yu; Shea-Brown, Eric; Trousdale, James; Josić, Krešimir

    2013-01-01

    Motifs are patterns of subgraphs of complex networks. We studied the impact of such patterns of connectivity on the level of correlated, or synchronized, spiking activity among pairs of cells in a recurrent network of integrate and fire neurons. For a range of network architectures, we find that the pairwise correlation coefficients, averaged across the network, can be closely approximated using only three statistics of network connectivity. These are the overall network connection probability and the frequencies of two second order motifs: diverging motifs, in which one cell provides input to two others, and chain motifs, in which two cells are connected via a third intermediary cell. Specifically, the prevalence of diverging and chain motifs tends to increase correlation. Our method is based on linear response theory, which enables us to express spiking statistics using linear algebra, and a resumming technique, which extrapolates from second order motifs to predict the overall effect of coupling on network correlation. Our motif-based results seek to isolate the effect of network architecture perturbatively from a known network state.
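
    The three connectivity statistics named above can be read off an adjacency matrix. The sketch below counts diverging and chain motifs by brute force over ordered triples of distinct cells and divides by the number of such triples, so that in a network with independent connections of probability p both frequencies would equal p**2 in expectation. This normalization and the function name are assumptions for illustration; the paper's exact definitions and its linear-response and resumming machinery are not reproduced.

```python
def motif_statistics(adj):
    """Connection probability and second-order motif frequencies.

    adj[i][j] = 1 if cell i projects to cell j (self-loops ignored).
    Diverging motif: k -> i and k -> j. Chain motif: i -> k -> j.
    Only ordered triples of distinct cells (i, j, k) are counted.
    """
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    p = edges / (n * (n - 1))
    triples = n * (n - 1) * (n - 2)
    diverging = sum(
        adj[k][i] * adj[k][j]
        for k in range(n) for i in range(n) for j in range(n)
        if len({i, j, k}) == 3
    )
    chain = sum(
        adj[i][k] * adj[k][j]
        for i in range(n) for k in range(n) for j in range(n)
        if len({i, j, k}) == 3
    )
    return p, diverging / triples, chain / triples
```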

  15. Bayesian centroid estimation for motif discovery.

    Science.gov (United States)

    Carvalho, Luis

    2013-01-01

    Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.
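
    The difference between centroid and MAP estimates is easy to see with posterior samples of binding-site indicators. In the sketch below, the MAP configuration is approximated by the most frequently sampled configuration, and the centroid uses a plain per-position (Hamming) loss, under which a site is called wherever its posterior marginal exceeds 0.5. This loss is a simplification assumed here, not the refined loss proposed in the paper, and the function name is hypothetical.

```python
from collections import Counter


def map_and_centroid(samples):
    """Compare MAP and Hamming-loss centroid estimates from posterior samples.

    Each sample is a tuple of 0/1 indicators (1 = a binding site starts at
    that position).
    """
    map_config = Counter(samples).most_common(1)[0][0]
    n = len(samples)
    marginals = [sum(s[pos] for s in samples) / n for pos in range(len(samples[0]))]
    centroid = tuple(1 if m > 0.5 else 0 for m in marginals)
    return map_config, centroid, marginals


samples = [(1, 0, 0)] * 3 + [(0, 1, 1)] * 2 + [(0, 1, 0), (0, 0, 1)]
map_config, centroid, marginals = map_and_centroid(samples)
# map_config == (1, 0, 0); centroid == (0, 0, 0); every marginal is 3/7
```

    Here the centroid calls no sites even though (1, 0, 0) is the single most frequent configuration, because no individual position is a site in more than half of the samples.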

  16. Bayesian centroid estimation for motif discovery.

    Directory of Open Access Journals (Sweden)

    Luis Carvalho

    Full Text Available Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.

  17. Genome-wide analysis of SINA family in plants and their phylogenetic relationships.

    Science.gov (United States)

    Wang, Meng; Jin, Ying; Fu, Junjie; Zhu, Yun; Zheng, Jun; Hu, Jian; Wang, Guoying

    2008-06-01

    SINA genes in plants are part of a multigene family with 5 members in Arabidopsis thaliana, 10 members in Populus trichocarpa, 6 members in Oryza sativa, at least 6 members in Zea mays and at least 1 member in Physcomitrella patens. Six members in maize were confirmed by RT-PCR. All SINAs have one RING domain and one SINA domain. These two domains are highly conserved in plants. According to the motif organization and phylogenetic tree, SINA family members were divided into 2 groups. In addition, through semi-quantitative RT-PCR analysis of maize members and Digital Northern analysis of Arabidopsis and rice members, we found that the tissue expression patterns are more diverse in monocots than in Arabidopsis.

  18. Super-transient scaling in time-delay autonomous Boolean network motifs

    Energy Technology Data Exchange (ETDEWEB)

    D'Huys, Otti, E-mail: otti.dhuys@phy.duke.edu; Haynes, Nicholas D. [Department of Physics, Duke University, Durham, North Carolina 27708 (United States); Lohmann, Johannes [Department of Physics, Duke University, Durham, North Carolina 27708 (United States); Institut für Theoretische Physik, Technische Universität Berlin, Hardenbergstraße 36, 10623 Berlin (Germany); Gauthier, Daniel J. [Department of Physics, Duke University, Durham, North Carolina 27708 (United States); Department of Physics, The Ohio State University, Columbus, Ohio 43210 (United States)

    2016-09-15

    Autonomous Boolean networks are commonly used to model the dynamics of gene regulatory networks and allow for the prediction of stable dynamical attractors. However, most models do not account for time delays along the network links and noise, which are crucial features of real biological systems. Concentrating on two paradigmatic motifs, the toggle switch and the repressilator, we develop an experimental testbed that explicitly includes both inter-node time delays and noise using digital logic elements on field-programmable gate arrays. We observe transients that last millions to billions of characteristic time scales and scale exponentially with the amount of time delays between nodes, a phenomenon known as super-transient scaling. We develop a hybrid model that includes time delays along network links and allows for stochastic variation in the delays. Using this model, we explain the observed super-transient scaling of both motifs and recreate the experimentally measured transient distributions.
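
    The way link delays shape the dynamics can be illustrated with a repressilator-like ring of three inverting nodes. The sketch below is a deterministic, discrete-time simplification assumed for illustration (integer delays, synchronous updates, no noise, hypothetical function name). It shows that the oscillation period equals twice the total loop delay, but it does not model the continuous-time FPGA dynamics or the super-transients reported in the paper.

```python
def simulate_delayed_ring(delays, steps, initial=False):
    """Synchronous Boolean ring of inverters with per-link integer delays.

    Node i reads node i-1 (mod n) with delay delays[i] and outputs its
    negation: x_i(t) = NOT x_{i-1}(t - delays[i]). With an odd number of
    nodes this behaves as a ring oscillator.
    """
    if any(d < 1 for d in delays):
        raise ValueError("delays must be positive integers")
    n = len(delays)
    pad = max(delays)  # constant initial history for times t < 0
    history = [[initial] * pad for _ in range(n)]
    for t in range(steps):
        # Time t - d is stored at list index t - d + pad.
        new = [not history[(i - 1) % n][t - delays[i] + pad] for i in range(n)]
        for i in range(n):
            history[i].append(new[i])
    return [h[pad:] for h in history]


x0, x1, x2 = simulate_delayed_ring([2, 3, 4], steps=100)
# Total loop delay D = 9, so x0[t + 9] == (not x0[t]) and x0 repeats every 18 steps.
```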

  19. Molecular and phylogenetic characterization of the sieve element occlusion gene family in Fabaceae and non-Fabaceae plants.

    Science.gov (United States)

    Rüping, Boris; Ernst, Antonia M; Jekat, Stephan B; Nordzieke, Steffen; Reineke, Anna R; Müller, Boje; Bornberg-Bauer, Erich; Prüfer, Dirk; Noll, Gundula A

    2010-10-08

    The phloem of dicotyledonous plants contains specialized P-proteins (phloem proteins) that accumulate during sieve element differentiation and remain parietally associated with the cisternae of the endoplasmic reticulum in mature sieve elements. Wounding causes P-protein filaments to accumulate at the sieve plates and block the translocation of photosynthate. Specialized, spindle-shaped P-proteins known as forisomes that undergo reversible calcium-dependent conformational changes have evolved exclusively in the Fabaceae. Recently, the molecular characterization of three genes encoding forisome components in the model legume Medicago truncatula (MtSEO1, MtSEO2 and MtSEO3; SEO = sieve element occlusion) was reported, but little is known about the molecular characteristics of P-proteins in non-Fabaceae. We performed a comprehensive genome-wide comparative analysis by screening the M. truncatula, Glycine max, Arabidopsis thaliana, Vitis vinifera and Solanum phureja genomes, and a Malus domestica EST library for homologs of MtSEO1, MtSEO2 and MtSEO3 and identified numerous novel SEO genes in Fabaceae and even non-Fabaceae plants, which do not possess forisomes. Even in Fabaceae some SEO genes appear to not encode forisome components. All SEO genes have a similar exon-intron structure and are expressed predominantly in the phloem. Phylogenetic analysis revealed the presence of several subgroups with Fabaceae-specific subgroups containing all of the known as well as newly identified forisome component proteins. We constructed Hidden Markov Models that identified three conserved protein domains, which characterize SEO proteins when present in combination. In addition, one common and three subgroup specific protein motifs were found in the amino acid sequences of SEO proteins. SEO genes are organized in genomic clusters and the conserved synteny allowed us to identify several M. truncatula vs G. max orthologs as well as paralogs within the G. max genome. The unexpected

  20. CONTEMPORARY USAGE OF TRADITIONAL TURKISH MOTIFS IN PRODUCT DESIGNS

    Directory of Open Access Journals (Sweden)

    Tulay Gumuser

    2012-12-01

    Full Text Available The aim of this study is to identify traditional Turkish motifs and their relationship to present-day industrial designs. Traditional Turkish motifs played a very important role from the 16th century onwards. The arts of the Ottoman Empire were used because of their symbolic meanings and unique styles. When we examine these motifs, we encounter the Tiger Stripe, Three Spot (Çintemani), Rumi, Hatayi, Penç, Cloud, Crescent, Star, Crown, Hyacinth, Tulip and Carnation motifs. Nowadays, Turkish designers have begun to use these traditional Turkish motifs in their designs to create distinctiveness and awareness in world design. Examples of these industrial designs using Turkish motifs have survived and carry Ottoman heritage and historical value. In this study, the Turkish motifs are examined with a focus on contemporary Turkish industrial designs in use today.

  1. RNA motif search with data-driven element ordering.

    Science.gov (United States)

    Rampášek, Ladislav; Jimenez, Randi M; Lupták, Andrej; Vinař, Tomáš; Brejová, Broňa

    2016-05-18

    In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .
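
    Why element ordering matters can be seen with a simplified cost model. If each element, once its predecessors are placed, has an expected number m of candidate placements, a backtracking search visits about m1 + m1*m2 + m1*m2*m3 + ... partial matches, and swapping two adjacent elements shows that sorting by increasing m minimizes this sum. The independence assumption and the function names below are hypothetical simplifications; in real RNA descriptors, hit rates depend on which elements are already placed, which is the situation a data-driven ordering has to handle.

```python
from itertools import permutations


def expected_search_cost(order, hit_rates):
    """Expected number of partial matches visited by backtracking.

    hit_rates[e]: expected placements of element e per placement of the
    elements before it (assumed independent of the order).
    """
    cost, prefix = 0.0, 1.0
    for element in order:
        prefix *= hit_rates[element]
        cost += prefix
    return cost


def greedy_order(hit_rates):
    """Place the most selective elements (fewest expected hits) first."""
    return sorted(hit_rates, key=hit_rates.get)


def best_order_brute_force(hit_rates):
    """Exhaustive check, feasible only for a handful of elements."""
    return min(permutations(hit_rates), key=lambda o: expected_search_cost(o, hit_rates))
```

    For hypothetical rates {"H1": 5.0, "H2": 0.2, "SS": 40.0, "H3": 1.5}, the greedy order H2, H3, H1, SS matches the exhaustive optimum.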

  2. Phylogenetic Structure of Foliar Spectral Traits in Tropical Forest Canopies

    Directory of Open Access Journals (Sweden)

    Kelly M. McManus

    2016-02-01

    Full Text Available The Spectranomics approach to tropical forest remote sensing has established a link between foliar reflectance spectra and the phylogenetic composition of tropical canopy tree communities vis-à-vis the taxonomic organization of biochemical trait variation. However, a direct relationship between phylogenetic affiliation and foliar reflectance spectra of species has not been established. We sought to develop this relationship by quantifying the extent to which underlying patterns of phylogenetic structure drive interspecific variation among foliar reflectance spectra within three Neotropical canopy tree communities with varying levels of soil fertility. We interpreted the resulting spectral patterns of phylogenetic signal in the context of foliar biochemical traits that may contribute to the spectral-phylogenetic link. We utilized a multi-model ensemble to elucidate trait-spectral relationships, and quantified phylogenetic signal for spectral wavelengths and traits using Pagel’s lambda statistic. Foliar reflectance spectra showed evidence of phylogenetic influence primarily within the visible and shortwave infrared spectral regions. These regions were also selected by the multi-model ensemble as those most important to the quantitative prediction of several foliar biochemical traits. Patterns of phylogenetic organization of spectra and traits varied across sites and with soil fertility, indicative of the complex interactions between the environmental and phylogenetic controls underlying patterns of biodiversity.

  3. Phylogenetic analysis, based on EPIYA repeats in the cagA gene of Indian Helicobacter pylori, and the implications of sequence variation in tyrosine phosphorylation motifs on determining the clinical outcome

    Directory of Open Access Journals (Sweden)

    Santosh K. Tiwari

    2011-01-01

    Full Text Available The population of India harbors one of the world's most highly diverse gene pools, owing to the influx of successive waves of immigrants over regular periods of time. Several phylogenetic studies involving mitochondrial DNA and Y-chromosomal variation have demonstrated Europeans to have been the first settlers in India. Nevertheless, some controversy exists because of support for the thesis that colonization was by the Austro-Asiatic group, prior to the Europeans. Thus, the aim was to investigate prehistoric colonization of India by anatomically modern humans, using conserved five-amino-acid stretches (EPIYA sequences) in the cagA gene of Helicobacter pylori. Simultaneously, the existence of a pathogenic relationship of tyrosine phosphorylation motifs (TPMs) in 32 H. pylori strains isolated from subjects with several forms of gastric disease was also explored. High-resolution sequence analysis of the above-described genes was performed. The nucleotide sequences obtained were translated into amino acids for EPIYA using MEGA (version 4.0) software. An MJ network was constructed to obtain TPM haplotypes using NETWORK (version 4.5) software. The findings of the study suggest that Indian H. pylori strains share a common ancestry with Europeans. Additional network analysis of TPMs revealed no specific association of haplotypes with disease outcome.

  4. Complete mitochondrial genomes of five skippers (Lepidoptera: Hesperiidae) and phylogenetic reconstruction of Lepidoptera.

    Science.gov (United States)

    Kim, Min Jee; Wang, Ah Rha; Park, Jeong Sun; Kim, Iksoo

    2014-10-01

    We sequenced mitogenomes of five skippers (family Hesperiidae, Lepidoptera) to obtain further insight into the characteristics of butterfly mitogenomes and performed phylogenetic reconstruction using all available gene sequences (PCGs, rRNAs, and tRNAs) from 85 species (20 families in eight superfamilies). The general genomic features found in the butterflies were also found in the five skippers: a high A+T composition (79.3%-80.9%), dominant usage of the TAA stop codon, a similar skewness pattern in both strands, a consistently long intergenic spacer sequence between tRNA(Gln) and ND2 (64-87 bp), a conserved ATACTAA motif between tRNA(Ser(UCN)) and ND1, and characteristic features of the A+T-rich region (the ATAGA motif, a poly-T stretch of varying length, and a poly-A stretch). The start codon for COI was CGA in four skippers, as is typical, but Lobocla bifasciatus evidently possessed the canonical ATG as its start codon. All species had the ancestral arrangement tRNA(Asn)/tRNA(Ser(AGN)), instead of the rearrangement tRNA(Ser(AGN))/tRNA(Asn) found in another skipper species (Erynnis). Phylogenetic analyses using all available genes (PCGs, rRNAs, and tRNAs) yielded the consensus superfamilial relationships ((((((Bombycoidea+Noctuoidea+Geometroidea)+Pyraloidea)+Papilionoidea)+Tortricoidea)+Yponomeutoidea)+Hepialoidea), confirming the validity of Macroheterocera (Bombycoidea, Noctuoidea, and Geometroidea in this study) and its sister relationship to Pyraloidea. Within Rhopalocera (butterflies and skippers) the familial relationships (Papilionidae+(Hesperiidae+(Pieridae+((Lycaenidae+Riodinidae)+Nymphalidae)))) were strongly supported in all analyses (0.98-1 by BI and 96-100 by ML methods), rendering invalid the superfamily status for Hesperioidea. On the other hand, the current mitogenome-based phylogeny did not recover consistent superfamilial relationships among Noctuoidea, Geometroidea, and Bombycoidea, or consistent familial relationships within Bombycoidea, across analyses, requiring further
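
    The composition features reported above (A+T content and strand skew) follow standard formulas: AT-skew = (A - T) / (A + T) and GC-skew = (G - C) / (G + C). A minimal sketch with a hypothetical function name, assuming gaps and ambiguity codes are simply ignored:

```python
def composition_stats(seq):
    """A+T content, AT-skew and GC-skew of a DNA sequence.

    Non-ACGT symbols (gaps, ambiguity codes) are ignored.
    """
    s = seq.upper()
    a, t, g, c = (s.count(base) for base in "ATGC")
    total = a + t + g + c
    at_content = (a + t) / total
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_content, at_skew, gc_skew
```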

  5. POWRS: position-sensitive motif discovery.

    Directory of Open Access Journals (Sweden)

    Ian W Davis

    Transcription factors and the short, often degenerate DNA sequences they recognize are central regulators of gene expression, but their regulatory code is challenging to dissect experimentally. Thus, computational approaches have long been used to identify putative regulatory elements from patterns in promoter sequences. Here we present a new algorithm, "POWRS" (POsition-sensitive WoRd Set), for identifying regulatory sequence motifs, specifically developed to address two common shortcomings of existing algorithms. First, POWRS uses the position-specific enrichment of regulatory elements near transcription start sites to significantly increase sensitivity, while providing new information about the preferred localization of those elements. Second, POWRS forgoes position weight matrices in favor of a discrete motif representation that appears more resistant to over-generalization. We apply this algorithm to discover sequences related to constitutive, high-level gene expression in the model plant Arabidopsis thaliana, and then experimentally validate the importance of those elements by systematically mutating two endogenous promoters and measuring the effect on gene expression levels. This provides a foundation for future efforts to rationally engineer gene expression in plants, a problem of great importance in developing biotech crop varieties. BSD-licensed Python code is available at http://grassrootsbio.com/papers/powrs/.
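    The position-specific enrichment idea can be illustrated with a minimal sketch. It is not POWRS itself. It assumes word hits are given as offsets relative to the TSS within a fixed upstream window, and it tests each positional bin against a uniform binomial null. The helper names `binom_sf` and `positional_enrichment`, the window size and the bin width are all hypothetical.

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def positional_enrichment(positions, window=1000, bin_width=100):
    """Hypothetical helper: per-bin enrichment of word hits upstream of the TSS.

    positions: offsets of word occurrences relative to the TSS, in [-window, 0).
    Returns (bin_start, count, p_value) per bin, where the null hypothesis
    spreads all hits uniformly over the window.
    """
    n = len(positions)
    p = bin_width / window  # probability that a uniformly placed hit lands in one bin
    results = []
    for start in range(-window, 0, bin_width):
        count = sum(start <= x < start + bin_width for x in positions)
        results.append((start, count, binom_sf(count, n, p)))
    return results


# Toy hits clustered just upstream of the TSS (invented data).
hits = [-50, -30, -80, -20, -95, -60, -400, -700]
for start, count, pval in positional_enrichment(hits):
    print(start, count, f"{pval:.2e}")
```

    The sketch applies no multiple-testing correction across bins, and it scores one word at a time rather than a word set.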

  6. Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains.

    Science.gov (United States)

    Zhao, Xiaoyan; Sze, Sing-Hoi

    2011-05-01

    One strategy to identify transcription factor binding sites is through motif finding in upstream DNA sequences of potentially co-regulated genes. Despite extensive efforts, none of the existing algorithms performs very well. We consider a string representation that allows arbitrary ignored positions within the nonconserved portion of single motifs, and use O(2^l) Markov chains to model the background distributions of motifs of length l while skipping these positions within each Markov chain. By focusing initially on positions that have fixed nucleotides to define core occurrences, we develop an algorithm to identify motifs of moderate lengths. We compare the performance of our algorithm to other motif finding algorithms on a few benchmark data sets, and show that significant improvement in accuracy can be obtained when the sites are sufficiently conserved within a given sample, while comparable performance is obtained when the site conservation rate is low. A software program (PosMotif) and detailed results are available online at http://faculty.cse.tamu.edu/shsze/posmotif.
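    The sketch below shows one way a background Markov chain can "skip" ignored positions. It assumes an order-1 chain, a keep/skip mask written with 'x' (kept) and '.' (ignored), and training on every length-l window of the input sequences after the masked columns are deleted. Building one such chain per mask gives up to 2^l chains. All function names are hypothetical, and this is not PosMotif's actual code.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def project(window: str, mask: str) -> str:
    """Keep characters where mask is 'x'; drop ignored positions marked '.'."""
    return "".join(ch for ch, m in zip(window, mask) if m == "x")


def train_skip_chain(sequences, mask):
    """Hypothetical helper: order-1 Markov chain over length-l windows with masked positions skipped."""
    l = len(mask)
    first, trans, context = Counter(), Counter(), Counter()
    for seq in sequences:
        for i in range(len(seq) - l + 1):
            proj = project(seq[i:i + l], mask)
            first[proj[0]] += 1
            for a, b in zip(proj, proj[1:]):
                trans[a, b] += 1
                context[a] += 1
    return first, trans, context


def background_prob(motif, mask, model, pseudo=1.0):
    """Background probability of the motif's kept positions, with pseudocounts."""
    first, trans, context = model
    proj = project(motif, mask)
    k = len(ALPHABET)
    prob = (first[proj[0]] + pseudo) / (sum(first.values()) + pseudo * k)
    for a, b in zip(proj, proj[1:]):
        prob *= (trans[a, b] + pseudo) / (context[a] + pseudo * k)
    return prob


def all_masks(l):
    """All keep/skip patterns of length l that keep at least one position."""
    return ["".join(m) for m in product("x.", repeat=l) if "x" in m]


seqs = ["ACGTTGCAACGT", "TTGACGTCAAGT"]  # invented training sequences
model = train_skip_chain(seqs, "xx.xx")
print(background_prob("TGNCA", "xx.xx", model))  # the 'N' position is ignored
```

    Because the ignored column never enters the chain, a motif's background probability does not depend on which base occupies that column.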

  7. Analysis of Mathematical Elements in Sulam Usus (Intestine Embroidery) Motifs

    Directory of Open Access Journals (Sweden)

    Fredi Ganda Putra

    2017-12-01

    According to interviews with research informants, intestine embroidery (sulam usus) is a genuine traditional craft. It is called intestine embroidery because the technique joins strands of cloth that resemble intestines, shaped according to a pattern and embroidered with thread. The technique was originally used to make the customary women's garment of Lampung, often referred to as bebe. However, many people, including many who live in Lampung, do not know or recognize intestine embroidery. Most know only tapis as the characteristic craft of Lampung, even though intestine embroidery is another of its cultural products. Many also do not know that intestine embroidery motifs contain mathematical knowledge. The research question is whether intestine embroidery motifs contain mathematical elements based on the concept of geometry. The purpose of this study is to determine whether such mathematical elements are present in these motifs. The subjects were 4 people selected by purposive sampling. Descriptive analysis of the data gave the following results. (1) Intestine embroidery motifs carry both mathematical and cultural meaning, often called ethnomathematics (etnomatematika). Culturally, intestine embroidery is linked to earlier cultures, for example through Hindu and Buddhist beliefs, and its motifs and decorative patterns resemble other ornamental traditions in Indonesia. (2) Regarding the relationship between intestine embroidery motifs and mathematics, the motifs contain mathematical elements such as one- and two-dimensional geometry, and the

  8. Phylogenetic Study of the Evolution of PEP-Carboxykinase

    Directory of Open Access Journals (Sweden)

    Sanjukta Aich

    2007-01-01

    Phosphoenolpyruvate carboxykinase (PCK) is the key enzyme initiating the gluconeogenic pathway in vertebrates, yeast, plants and most bacteria. Nucleotide specificity divides all PCKs into two groups. All eukaryotic mammalian PCKs and most archaeal PCKs are GTP-specific. Bacterial and fungal PCKs can be ATP- or GTP-specific, but all plant PCKs are ATP-specific. Amino acid sequence alignment of PCK enzymes shows that the nucleotide binding sites are somewhat conserved within each class, with a few exceptions that do not have any clear ATP- or GTP-specific binding motif. Although the active site residues are mostly conserved in all PCKs, little significant sequence homology persists between ATP- and GTP-dependent PCK enzymes. Only one planctomycete PCK enzyme (from Candidatus Kuenenia stuttgartiensis) shows sequence homology with both ATP- and GTP-dependent PCKs. Phylogenetic studies have been performed to understand the evolutionary relationships of various PCKs from different sources. Based on this study, a flowchart of the evolution of PCK has been proposed.

  9. Motif signatures of transcribed enhancers

    KAUST Repository

    Kleftogiannis, Dimitrios

    2017-09-14

    In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and the maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions in biology today is how the genomic characteristics of enhancers relate to enhancer activities. This is particularly critical, as several recent studies have linked enhancer sequence motifs to specific functional roles. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving room to explore the enhancers' genomic code more systematically. To address this problem, we developed a novel computational method, TELS, aimed at identifying predictive cell type/tissue-specific motif signatures. We used TELS to compile a comprehensive catalog of motif signatures for all known TrEn identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that distinct cell type/tissue-specific motif signatures characterize TrEn. These signatures successfully discriminate a) TrEn from random controls, a proxy for non-enhancer activity, and b) cell type/tissue-specific TrEn from enhancers expressed and transcribed in different cell types/tissues. TELS code and datasets are publicly available at http://www.cbrc.kaust.edu.sa/TELS.

  10. Potential pitfalls of modelling ribosomal RNA data in phylogenetic tree reconstruction: evidence from case studies in the Metazoa.

    Science.gov (United States)

    Letsch, Harald O; Kjer, Karl M

    2011-05-27

    Failure to account for covariation patterns in helical regions of ribosomal RNA (rRNA) genes has the potential to misdirect the estimation of the phylogenetic signal of the data. Furthermore, the extremes of length variation among taxa, combined with regional substitution rate variation, can mislead the alignment of rRNA sequences and thus distort subsequent tree reconstructions. However, recent developments in phylogenetic methodology now allow a comprehensive integration of secondary structures in alignment and tree reconstruction analyses based on rRNA sequences, which has been shown to correct some of these problems. Here, we explore the potential of RNA substitution models and the interactions of specific model setups with the inherent pattern of covariation in rRNA stems and substitution rate variation among loop regions. We found an explicit impact of RNA substitution models on tree reconstruction analyses. The application of specific RNA models in tree reconstructions is hampered by interaction between the appropriate modelling of covarying sites in stem regions and excessive homoplasy in some loop regions. RNA models often failed to recover reasonable trees when single-stranded regions are excessively homoplastic, because these regions contribute a greater proportion of the data when covarying sites are essentially downweighted. In this context, the RNA6A model outperformed all other models, including the more parametrized RNA7 and RNA16 models. Our results depict a trade-off between increased accuracy in estimation of interdependencies in helical regions and the risk of magnifying positions lacking phylogenetic signal. We can therefore conclude that caution is warranted when applying rRNA covariation models, and suggest that loop regions be independently screened for phylogenetic signal, and eliminated when they are indistinguishable from random noise. In addition to covariation and homoplasy, other factors, like non-stationarity of substitution rates

  11. A program to compute the soft Robinson-Foulds distance between phylogenetic networks.

    Science.gov (United States)

    Lu, Bingxin; Zhang, Louxin; Leong, Hon Wai

    2017-03-14

    Over the past two decades, phylogenetic networks have been studied to model reticulate evolutionary events. The relationships among phylogenetic networks, phylogenetic trees and clusters serve as the basis for reconstruction and comparison of phylogenetic networks. To understand these relationships, two problems are raised: the tree containment problem, which asks whether a phylogenetic tree is displayed in a phylogenetic network, and the cluster containment problem, which asks whether a cluster is represented at a node in a phylogenetic network. Both problems are NP-complete. A fast exponential-time algorithm for the cluster containment problem on arbitrary networks is developed and implemented in C. The resulting program is further extended into a computer program for fast computation of the Soft Robinson-Foulds distance between phylogenetic networks. The two programs facilitate reconstruction and validation of phylogenetic network models in evolutionary and comparative genomics. Our simulation tests indicated that they are fast enough for use in practice. Additionally, our simulation data indicate that the distribution of the Soft Robinson-Foulds distance between phylogenetic networks is unlikely to be normal.
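    The Soft Robinson-Foulds distance for networks builds on the cluster view of the classic Robinson-Foulds distance between rooted trees. The minimal sketch below covers only that tree case, not networks. It assumes trees are written as nested Python tuples of leaf names and uses hypothetical helper names.

```python
def clusters(tree):
    """Return (leaf set, set of clusters) for a rooted tree given as nested tuples.

    Each internal node contributes the frozenset of leaves below it; single
    leaves are not recorded as clusters.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    return walk(tree), found


def rf_distance(t1, t2):
    """Rooted Robinson-Foulds distance: size of the symmetric difference of cluster sets."""
    leaves1, c1 = clusters(t1)
    leaves2, c2 = clusters(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same leaf set")
    return len(c1 ^ c2)


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = (("a", ("b", "c")), ("d", "e"))
print(rf_distance(t1, t2))  # {a,b} and {b,c} differ
```

    Some conventions halve this count or apply it to unrooted splits rather than rooted clusters. The network version additionally has to decide which clusters a network "represents", which is the cluster containment problem above.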

  12. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    Science.gov (United States)

    Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

    2012-01-01

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/
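    The "position-by-position" cost mentioned above is easiest to see in a plain model-based motif scan. The minimal CPU sketch below converts per-column base counts into a log-odds matrix, then scores every offset of a sequence. It assumes uppercase ACGT input, a uniform background and a pseudocount of 0.5, and its helper names are hypothetical. It is not GPUmotif or HMS code. Each offset is scored independently, which is the kind of work a GPU can parallelize.

```python
import math


def log_odds_pwm(counts, background=None, pseudo=0.5):
    """Convert per-position base counts into a log2-odds matrix (list of dicts)."""
    background = background or {b: 0.25 for b in "ACGT"}
    pwm = []
    for col in counts:
        total = sum(col.values()) + pseudo * 4
        pwm.append({b: math.log2((col.get(b, 0) + pseudo) / total / background[b])
                    for b in "ACGT"})
    return pwm


def scan(seq, pwm, threshold):
    """Score every window of the sequence; return (offset, score) pairs above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):  # each offset is independent of the others
        score = sum(col[base] for col, base in zip(pwm, seq[i:i + width]))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Toy counts for a "TGA"-like motif (invented numbers).
pwm = log_odds_pwm([{"T": 9, "A": 1}, {"G": 10}, {"A": 10}])
print(scan("CCTGACCTGAT", pwm, threshold=3.0))
```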

  13. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    Directory of Open Access Journals (Sweden)

    Pooya Zandevakili

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/

  14. Discovery of Conservation and Diversification of miR171 Genes by Phylogenetic Analysis based on Global Genomes

    Directory of Open Access Journals (Sweden)

    Xudong Zhu

    2015-07-01

    The microRNA171 (miR171) family is widely distributed and highly conserved in a range of species and plays critical roles in regulating plant growth and development through repressing expression of ( transcription factors. However, information on the evolutionary conservation and functional diversification of the miRNA171 family members remains scanty. We reconstructed the phylogenetic relationships among miR171 precursor and mature sequences to investigate the extent and degree of evolutionary conservation of miR171 in Arabidopsis thaliana (L.) Heynh. (ath), grape (Vitis vinifera L.) (vvi), poplar (Populus trichocarpa Torr. & A.Gray ex Hook.) (ptc), and rice (Oryza sativa L.) (osa). Despite strong conservation of over 80%, some mature miR171 sequences, such as , and and , -, and -, have undergone critical sequence variation, leading to functional diversification, since they target non gene transcript(s). Phylogenetic analyses revealed a combination of old ancestral relationships and recent lineage-specific diversification in the miR171 family within the four model plants. The cis-regulatory motifs in the upstream promoter sequences of genes were highly divergent and shared some similar elements, indicating their possible contribution to the functional variation observed within the miR171 family. This study will buttress our understanding of the functional differentiation of miRNAs and the relationships of miRNA–target pairs based on the evolutionary history of genes.

  15. Structural motif screening reveals a novel, conserved carbohydrate-binding surface in the pathogenesis-related protein PR-5d

    Directory of Open Access Journals (Sweden)

    Moffatt Barbara A

    2010-08-01

    Abstract Background Aromatic amino acids play a critical role in protein-glycan interactions. Clusters of surface aromatic residues and their features may therefore be useful in distinguishing glycan-binding sites as well as predicting novel glycan-binding proteins. In this work, a structural bioinformatics approach was used to screen the Protein Data Bank (PDB) for coplanar aromatic motifs similar to those found in known glycan-binding proteins. Results The proteins identified in the screen were significantly associated with carbohydrate-related functions according to gene ontology (GO) enrichment analysis, and predicted motifs were found frequently within novel folds and glycan-binding sites not included in the training set. In addition to numerous binding sites predicted in structural genomics proteins of unknown function, one novel prediction was a surface motif (W34/W36/W192) in the tobacco pathogenesis-related protein, PR-5d. Phylogenetic analysis revealed that the surface motif is exclusive to a subfamily of PR-5 proteins from the Solanaceae family of plants, and is completely absent in more distant homologs. To confirm PR-5d's insoluble-polysaccharide binding activity, a cellulose-pulldown assay of tobacco proteins was performed and PR-5d was identified in the cellulose-binding fraction by mass spectrometry. Conclusions Based on the combined results, we propose that the putative binding site in PR-5d may be an evolutionary adaptation of Solanaceae plants including potato, tomato, and tobacco, towards defense against cellulose-containing pathogens such as species of the deadly oomycete genus, Phytophthora. More generally, the results demonstrate that coplanar aromatic clusters on protein surfaces are a structural signature of glycan-binding proteins, and can be used to computationally predict novel glycan-binding proteins from 3D structure.

  16. Structural motif screening reveals a novel, conserved carbohydrate-binding surface in the pathogenesis-related protein PR-5d.

    Science.gov (United States)

    Doxey, Andrew C; Cheng, Zhenyu; Moffatt, Barbara A; McConkey, Brendan J

    2010-08-03

    Aromatic amino acids play a critical role in protein-glycan interactions. Clusters of surface aromatic residues and their features may therefore be useful in distinguishing glycan-binding sites as well as predicting novel glycan-binding proteins. In this work, a structural bioinformatics approach was used to screen the Protein Data Bank (PDB) for coplanar aromatic motifs similar to those found in known glycan-binding proteins. The proteins identified in the screen were significantly associated with carbohydrate-related functions according to gene ontology (GO) enrichment analysis, and predicted motifs were found frequently within novel folds and glycan-binding sites not included in the training set. In addition to numerous binding sites predicted in structural genomics proteins of unknown function, one novel prediction was a surface motif (W34/W36/W192) in the tobacco pathogenesis-related protein, PR-5d. Phylogenetic analysis revealed that the surface motif is exclusive to a subfamily of PR-5 proteins from the Solanaceae family of plants, and is completely absent in more distant homologs. To confirm PR-5d's insoluble-polysaccharide binding activity, a cellulose-pulldown assay of tobacco proteins was performed and PR-5d was identified in the cellulose-binding fraction by mass spectrometry. Based on the combined results, we propose that the putative binding site in PR-5d may be an evolutionary adaptation of Solanaceae plants including potato, tomato, and tobacco, towards defense against cellulose-containing pathogens such as species of the deadly oomycete genus, Phytophthora. More generally, the results demonstrate that coplanar aromatic clusters on protein surfaces are a structural signature of glycan-binding proteins, and can be used to computationally predict novel glycan-binding proteins from 3D structure.

  17. Characterization of the Complete Mitochondrion Genome of Diurnal Moth Amata emma (Butler) (Lepidoptera: Erebidae) and Its Phylogenetic Implications

    Science.gov (United States)

    Lu, Hui-Fen; Su, Tian-Juan; Luo, A-Rong; Zhu, Chao-Dong; Wu, Chun-Sheng

    2013-01-01

    Mitogenomes can provide information for phylogenetic analyses and evolutionary biology. The complete mitochondrial genome of Amata emma (Lepidoptera: Erebidae) was sequenced and analyzed in this study. The circular genome is 15,463 bp in size, with gene content, orientation and order identical to those of other ditrysian insects. The major strand is highly A+T biased and exhibits negative AT-skew and GC-skew. The initiation codons are the canonical putative start codons ATN, with the exception of the cox1 gene, which uses CGA instead. Ten genes share the complete termination codon TAA, and three genes use the incomplete stop codons TA or T. Additionally, the codon distribution and Relative Synonymous Codon Usage of the 13 PCGs in the A. emma mitogenome are consistent with those in other Noctuid mitogenomes. All tRNA genes have typical cloverleaf secondary structures, except for the trnS1 (AGN) gene, in which the dihydrouridine (DHU) arm is simplified down to a loop. The secondary structures of the two rRNA genes broadly conform with the models proposed for these genes in other Lepidopteran insects. Apart from the A+T-rich region, there are three major intergenic spacers, each spanning at least 10 bp, and five overlapping regions. The A+T-rich region of A. emma differs markedly from those of previously reported Lepidopteran insects, except that it contains an ‘ATAGA’-like motif followed by a 19 bp poly-T stretch and an (AT)9 element preceded by the ‘ATTTA’ motif. It has neither a poly-A stretch (in the α strand) upstream of trnM nor potential stem-loop structures, only some simple structures such as (AT)nGTAT. The phylogenetic relationships based on nucleotide sequences of the 13 PCGs, using Bayesian inference and maximum likelihood methods, provided a well-supported, broader outline of Lepidoptera. This outline agrees with the traditional morphological classification and recent work, but with much higher support. PMID:24069145
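    The composition statistics quoted in mitogenome papers can be reproduced with a short sketch. It uses the standard skew formulas AT-skew = (A - T)/(A + T) and GC-skew = (G - C)/(G + C), computed on the major strand. The helper names are hypothetical, and the toy input is invented. It is not the A. emma genome.

```python
import math


def skew(x: int, y: int) -> float:
    """(x - y) / (x + y); NaN when both counts are zero."""
    return (x - y) / (x + y) if x + y else math.nan


def composition_summary(seq: str) -> dict:
    """A+T percentage plus AT-skew and GC-skew for one strand of a sequence."""
    s = seq.upper()
    a, t, g, c = (s.count(base) for base in "ATGC")
    return {
        "AT_percent": 100 * (a + t) / (a + t + g + c),
        "AT_skew": skew(a, t),
        "GC_skew": skew(g, c),
    }


print(composition_summary("AATTTTGCCC"))  # AT_percent 60.0, AT_skew about -0.333, GC_skew -0.5
```

    Negative values of both skews, as reported for A. emma, mean the strand carries more T than A and more C than G.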

  18. Triadic motifs in the dependence networks of virtual societies

    Science.gov (United States)

    Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2014-06-01

    In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other closed motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.
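    Counting a triadic motif such as the unidirectional loop can be sketched directly. The function below counts node triples whose only edges form a directed 3-cycle with no reciprocated ties. The function name is hypothetical, and this is not the authors' code. It does not decide over- or underrepresentation, which requires comparing these counts with randomized networks.

```python
from itertools import combinations


def count_unidirectional_loops(edges):
    """Count node triples whose only edges form a directed 3-cycle (no reciprocated ties)."""
    E = set(edges)
    nodes = sorted({node for edge in E for node in edge})
    count = 0
    for a, b, c in combinations(nodes, 3):
        cycle = [(a, b), (b, c), (c, a)]
        reverse = [(v, u) for u, v in cycle]
        # These six ordered pairs are every possible edge inside the triple.
        forward = all(e in E for e in cycle) and not any(e in E for e in reverse)
        backward = all(e in E for e in reverse) and not any(e in E for e in cycle)
        if forward or backward:
            count += 1
    return count


print(count_unidirectional_loops([(1, 2), (2, 3), (3, 1), (3, 4), (4, 1)]))
```

    The exhaustive loop over triples is cubic in the number of nodes, so large dependence networks would need a neighbour-based enumeration instead.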

  19. Triadic motifs in the dependence networks of virtual societies.

    Science.gov (United States)

    Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2014-06-10

    In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other closed motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.

  20. Direct AUC optimization of regulatory motifs.

    Science.gov (United States)

    Zhu, Lin; Zhang, Hong-Bo; Huang, De-Shuang

    2017-07-15

    The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the many computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the large amount of accumulated high-throughput binding data. However, they have to sacrifice accuracy for speed and may fail to fully use the information in the input sequences. We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the AUC loss function is optimized in a coordinate-wise manner, the cost function of each resulting sub-problem is a piecewise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real-world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Meanwhile, preliminary results also show that CDAUC may be useful for improving the interpretability of convolutional kernels generated by emerging deep learning approaches for predicting TF sequence specificities. CDAUC is available at https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8 (contact: dshuang@tongji.edu.cn). Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
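    The AUC criterion used to score a motif can be computed directly from motif scores on bound (positive) and unbound (negative) sequences. It equals the fraction of positive/negative pairs ranked correctly, with ties counting one half. The minimal sketch below shows only the criterion, not CDAUC's coordinate-wise optimizer, and the function name is hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy motif scores (invented): 8 of the 9 pairs are ordered correctly.
print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

    AUC depends only on the ordering of scores. Any change to a single motif weight therefore leaves it unchanged until two sequences swap rank, which is why each coordinate-wise sub-problem is piecewise constant.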

  1. A genome-wide phylogenetic reconstruction of family 1 UDP-glycosyltransferases revealed the expansion of the family during the adaptation of plants to life on land.

    Science.gov (United States)

    Caputi, Lorenzo; Malnoy, Mickael; Goremykin, Vadim; Nikiforova, Svetlana; Martens, Stefan

    2012-03-01

    For almost a decade, our knowledge of the organisation of the family 1 UDP-glycosyltransferases (UGTs) has been limited to the model plant A. thaliana. The availability of other plant genomes represents an opportunity to obtain a broader view of the family in terms of evolution and organisation. Family 1 UGTs are known to glycosylate several classes of plant secondary metabolites. A phylogeny reconstruction study was performed to gain insight into the evolution of this multigene family during the adaptation of plants to life on land. The organisation of the UGTs in the different organisms was also investigated. More than 1500 putative UGTs were identified in 12 fully sequenced and assembled plant genomes based on the highly conserved PSPG motif. Analyses by maximum likelihood (ML) method were performed to reconstruct the phylogenetic relationships existing between the sequences. The results of this study clearly show that the UGT family expanded during the transition from algae to vascular plants and that in higher plants the clustering of UGTs into phylogenetic groups appears to be conserved, although gene loss and gene gain events seem to have occurred in certain lineages. Interestingly, two new phylogenetic groups, named O and P, that are not present in A. thaliana were discovered. © 2011 The Authors. The Plant Journal © 2011 Blackwell Publishing Ltd.

  2. DMINDA: an integrated web server for DNA motif identification and analyses.

    Science.gov (United States)

    Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

    2014-07-01

    DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important for elucidating the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Phylogenetically Acquired Representations and Evolutionary Algorithms.

    OpenAIRE

    Wozniak, Adrianna

    2006-01-01

    First, we explain why Genetic Algorithms (GAs), inspired by the Modern Synthesis, do not accurately model biological evolution, being an artificial version of artificial selection rather than of natural selection. Focusing on optimisation, we propose two improvements to GAs, with the aim of successfully generating adapted, desired behaviour. The first concerns the phylogenetic grounding of meaning, a way to avoid the Symbol Grounding Problem. We give a definition of Phylogenetically Acquired Re...

  4. Fourier transform inequalities for phylogenetic trees.

    Science.gov (United States)

    Matsen, Frederick A

    2009-01-01

    Phylogenetic invariants are not the only constraints on site-pattern frequency vectors for phylogenetic trees. A mutation matrix, by its definition, is the exponential of a matrix with non-negative off-diagonal entries; this positivity requirement implies non-trivial constraints on the site-pattern frequency vectors. We call these additional constraints "edge-parameter inequalities". In this paper, we first motivate the edge-parameter inequalities by considering a pathological site-pattern frequency vector corresponding to a quartet tree with a negative internal edge. This site-pattern frequency vector nevertheless satisfies all of the constraints described up to now in the literature. We next describe two complete sets of edge-parameter inequalities for the group-based models; these constraints are square-free monomial inequalities in the Fourier transformed coordinates. These inequalities, along with the phylogenetic invariants, form a complete description of the set of site-pattern frequency vectors corresponding to bona fide trees. Said in mathematical language, this paper explicitly presents two finite lists of inequalities in Fourier coordinates of the form "monomial ≤ 1", each list characterizing the phylogenetically relevant semialgebraic subsets of the phylogenetic varieties.
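    A small numerical illustration of the "negative internal edge" pathology can be built under the simplest group-based model, the two-state symmetric (CFN) model. Each edge there has a parameter theta = exp(-2t), so a genuine branch length t >= 0 means theta <= 1. For the quartet 12|34, the Fourier coordinates are products of edge parameters along leaf-to-leaf paths, so q13*q24/(q12*q34) equals theta5 squared for the internal edge. The condition theta5 <= 1 then becomes an inequality of the form "monomial ≤ 1". This only illustrates the flavor of such inequalities; it is not necessarily one of the paper's exact lists. The helper names are hypothetical.

```python
import itertools


def cfn_prob(theta: float, a: int, b: int) -> float:
    """Two-state symmetric (CFN) transition probability; theta = exp(-2t) for branch length t."""
    return (1 + theta) / 2 if a == b else (1 - theta) / 2


def quartet_pattern_probs(t1, t2, t3, t4, t5):
    """Site-pattern probabilities on the quartet 12|34.

    Internal node u carries leaves 1 and 2, internal node v carries leaves 3
    and 4, and t5 is the internal edge u-v. The state at u is uniform.
    """
    probs = {}
    for x in itertools.product((0, 1), repeat=4):
        probs[x] = sum(
            0.5 * cfn_prob(t1, u, x[0]) * cfn_prob(t2, u, x[1]) * cfn_prob(t5, u, v)
            * cfn_prob(t3, v, x[2]) * cfn_prob(t4, v, x[3])
            for u, v in itertools.product((0, 1), repeat=2)
        )
    return probs


def fourier_coord(probs, leaves):
    """Hadamard (Fourier) coordinate q_A = sum_x p(x) * (-1)^(sum of x_i, i in A); leaves are 1-based."""
    return sum(p * (-1) ** sum(x[i - 1] for i in leaves) for x, p in probs.items())


def internal_edge_monomial(probs):
    """q13 * q24 / (q12 * q34); equals t5**2 under this CFN parameterization."""
    def q(*leaves):
        return fourier_coord(probs, leaves)
    return q(1, 3) * q(2, 4) / (q(1, 2) * q(3, 4))


normal = quartet_pattern_probs(0.9, 0.8, 0.7, 0.6, 0.5)
print(round(internal_edge_monomial(normal), 6))  # 0.25, so the inequality holds

# t5 > 1 corresponds to a negative internal branch length.
pathological = quartet_pattern_probs(0.5, 0.5, 0.5, 0.5, 1.2)
print(min(pathological.values()) >= 0, round(internal_edge_monomial(pathological), 6))  # True 1.44
```

    In the second case every pattern probability is still nonnegative and they sum to one. The violated monomial inequality is what reveals that no genuine tree produces this vector.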

  5. Efficient motif finding algorithms for large-alphabet inputs

    Directory of Open Access Journals (Sweden)

    Pavlovic Vladimir

    2010-10-01

    Background: We consider the problem of identifying motifs, recurring or conserved patterns, in biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings. Results: The proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well with the size of the alphabet. On a synthetic planted DNA motif finding problem, our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA), we observed a reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin and Zinc metallopeptidase, and supersecondary structure motifs for the Cadherin and Immunoglobulin families. Conclusions: Our algorithm reduces the computational complexity of current motif finding algorithms and demonstrates strong running time improvements over existing exact algorithms, especially in important and difficult cases of large-alphabet sequences.
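    To make the planted-motif task concrete, here is a minimal brute-force baseline in Python. It is not the authors' algorithm: it enumerates every candidate of a given length over the alphabet and keeps those that occur with at most a given number of mismatches in every input sequence (the "all" variant of the problem). The function names and parameters are hypothetical. Its cost grows as |alphabet|^length, which illustrates why large alphabets are hard for exhaustive search.

```python
from itertools import product
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(pattern: str, seq: str, mismatches: int) -> bool:
    """True if some window of seq matches pattern with at most `mismatches` differences."""
    n = len(pattern)
    return any(hamming(pattern, seq[i:i + n]) <= mismatches
               for i in range(len(seq) - n + 1))


def brute_force_motifs(seqs: List[str], length: int, mismatches: int,
                       alphabet: str = "ACGT") -> List[str]:
    """Exhaustive (length, mismatches) search: a motif must occur in every sequence."""
    motifs = []
    for letters in product(alphabet, repeat=length):  # |alphabet| ** length candidates
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, mismatches) for s in seqs):
            motifs.append(candidate)
    return motifs


print(brute_force_motifs(["TTACGTAA", "GACGTCCC", "CCCCACGT"], length=4, mismatches=0))
# prints ['ACGT']
```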

  6. RMOD: a tool for regulatory motif detection in signaling network.

    Directory of Open Access Journals (Sweden)

    Jinki Kim

    Regulatory motifs are patterns of activation and inhibition that appear repeatedly in various signaling networks and that show specific regulatory properties. However, the network structures of regulatory motifs are highly diverse and complex, rendering their identification difficult. Here, we present RMOD, a web-based system for the identification of regulatory motifs and their properties in signaling networks. RMOD finds various network structures of regulatory motifs by compressing the signaling network and detecting the compressed forms of regulatory motifs. To apply it to large-scale signaling networks, it adopts a new subgraph search algorithm using a novel data structure called path-tree, which is a tree structure composed of isomorphic graphs of query regulatory motifs. This algorithm was evaluated using signaling networks of various sizes generated from the integration of various human signaling pathways, and the results showed that its speed and scalability exceed those of other algorithms. RMOD includes interactive analysis and auxiliary tools that make it possible to manage the whole process, from building the signaling network and query regulatory motifs to analyzing regulatory motifs with graphical illustrations and summarized descriptions. As a result, RMOD provides an integrated view of the regulatory motifs and the mechanisms underlying their regulatory activities within the signaling network. RMOD is freely accessible online at the following URL: http://pks.kaist.ac.kr/rmod.

  7. Bayesian phylogenetic estimation of fossil ages.

    Science.gov (United States)

    Drummond, Alexei J; Stadler, Tanja

    2016-07-19

    Recent advances have allowed for both morphological fossil evidence and molecular sequences to be integrated into a single combined inference of divergence dates under the rule of Bayesian probability. In particular, the fossilized birth-death tree prior and the Lewis-Mk model of discrete morphological evolution allow for the estimation of both divergence times and phylogenetic relationships between fossil and extant taxa. We exploit this statistical framework to investigate the internal consistency of these models by producing phylogenetic estimates of the age of each fossil in turn, within two rich and well-characterized datasets of fossil and extant species (penguins and canids). We find that the estimation accuracy of fossil ages is generally high with credible intervals seldom excluding the true age and median relative error in the two datasets of 5.7% and 13.2%, respectively. The median relative standard error (RSD) was 9.2% and 7.2%, respectively, suggesting good precision, although with some outliers. In fact, in the two datasets we analyse, the phylogenetic estimate of fossil age is on average less than 2 Myr from the mid-point age of the geological strata from which it was excavated. The high level of internal consistency found in our analyses suggests that the Bayesian statistical model employed is an adequate fit for both the geological and morphological data, and provides evidence from real data that the framework used can accurately model the evolution of discrete morphological traits coded from fossil and extant taxa. 
We anticipate that this approach will have diverse applications beyond divergence time dating, including dating fossils that are temporally unconstrained, testing of the 'morphological clock', and uncovering potential model misspecification and/or data errors when controversial phylogenetic hypotheses are obtained based on combined divergence dating analyses. This article is part of the themed issue 'Dating species divergences using

  8. Discovery of candidate KEN-box motifs using cell cycle keyword enrichment combined with native disorder prediction and motif conservation.

    Science.gov (United States)

    Michael, Sushama; Travé, Gilles; Ramu, Chenna; Chica, Claudia; Gibson, Toby J

    2008-02-15

    KEN-box-mediated target selection is one of the mechanisms used in the proteasomal destruction of mitotic cell cycle proteins via the APC/C complex. While annotating the Eukaryotic Linear Motif resource (ELM, http://elm.eu.org/), we found that KEN motifs were significantly enriched in human protein entries with cell cycle keywords in the UniProt/Swiss-Prot database, implying that KEN-boxes might be more common than reported. Matches to short linear motifs in protein database searches are not, per se, significant. KEN-box enrichment with cell cycle Gene Ontology terms suggests that collectively these motifs are functional but does not prove that any given instance is so. Candidates were surveyed for native disorder prediction using GlobPlot and IUPred and for motif conservation in homologues. Among >25 strong new candidates, the most notable are human HIPK2, CHFR, CDC27, Dab2, Upf2, kinesin Eg5, DNA Topoisomerase 1 and yeast Cdc5 and Swi5. A similar number of weaker candidates were present. These proteins have yet to be tested for APC/C-targeted destruction, providing potential new avenues of research.

  9. Coalescent methods for estimating phylogenetic trees.

    Science.gov (United States)

    Liu, Liang; Yu, Lili; Kubatko, Laura; Pearl, Dennis K; Edwards, Scott V

    2009-10-01

    We review recent models to estimate phylogenetic trees under the multispecies coalescent. Although the distinction between gene trees and species trees has come to the fore of phylogenetics, only recently have methods been developed that explicitly estimate species trees. Of the several factors that can cause gene tree heterogeneity and discordance with the species tree, deep coalescence due to random genetic drift in branches of the species tree has been modeled most thoroughly. Bayesian approaches to estimating species trees utilize two likelihood functions: one, which has been widely used in traditional phylogenetics, involves the model of nucleotide substitution; the second, which is less familiar to phylogeneticists, involves the probability distribution of gene trees given a species tree. Other recent parametric and nonparametric methods for estimating species trees involve parsimony criteria, summary statistics, and supertree and consensus methods. Species tree approaches are an appropriate goal for systematics, appear to work well in some cases where concatenation can be misleading, and suggest that sampling many independent loci will be paramount. Such methods can also be challenging to implement because of the complexity of the models and computational time. In addition, further elaboration of the simplest coalescent models will be required to incorporate commonly known issues such as deviation from the molecular clock, gene flow and other genetic forces.
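    The simplest instance of the gene-tree-given-species-tree distribution mentioned above is the standard three-taxon result under the multispecies coalescent: for species tree ((A,B),C) with an internal branch of length T in coalescent units, the matching gene tree has probability 1 - (2/3)e^(-T) and each of the two discordant topologies has probability (1/3)e^(-T). The Python sketch below encodes that result; the function name and topology labels are hypothetical. Short internal branches push discordance toward 2/3, which is the deep-coalescence effect described in the abstract.

```python
import math
from typing import Dict


def three_taxon_gene_tree_probs(internal_branch: float) -> Dict[str, float]:
    """Gene-tree topology probabilities for species tree ((A,B),C).

    internal_branch is the length T of the species-tree internal branch in
    coalescent units (generations / 2N for a diploid population of size N).
    """
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    # A and B fail to coalesce on the internal branch with probability exp(-T);
    # the three topologies are then equally likely, each with probability 1/3.
    discordant = math.exp(-internal_branch) / 3.0
    return {
        "AB|C": 1.0 - 2.0 * discordant,  # matches the species tree
        "AC|B": discordant,
        "BC|A": discordant,
    }


for t in (0.0, 0.5, 2.0):
    print(t, three_taxon_gene_tree_probs(t))
```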

  10. DNA motif alignment by evolving a population of Markov chains.

    Science.gov (United States)

    Bi, Chengpeng

    2009-01-30

    Deciphering cis-regulatory elements, or de novo motif finding, in genomes still remains elusive, although much algorithmic effort has been expended. Markov chain Monte Carlo (MCMC) methods such as Gibbs motif samplers have been widely employed to solve the de novo motif-finding problem through local sequence alignment. Nonetheless, MCMC-based motif samplers, like EM, still suffer from local maxima. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often run independently many times, but without information exchange between the different chains. A new algorithm design that enables such information exchange would therefore be worthwhile. This paper presents a novel motif-finding algorithm that evolves a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). Each chain is progressively updated through a series of stochastically sampled local alignments. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, dubbed the IMC motif algorithm, is also devised for comparison with its PMC counterpart. Experimental studies demonstrate that performance can be improved if pooled information is used to run a population of motif samplers. The new PMC algorithm improved convergence and outperformed other popular algorithms tested on simulated and biological motif sequences.

  11. Overlapping ETS and CRE Motifs (G/CCGGAAGTGACGTCA) Preferentially Bound by GABPα and CREB Proteins

    Science.gov (United States)

    Chatterjee, Raghunath; Zhao, Jianfei; He, Ximiao; Shlyakhtenko, Andrey; Mann, Ishminder; Waterfall, Joshua J.; Meltzer, Paul; Sathyanarayana, B. K.; FitzGerald, Peter C.; Vinson, Charles

    2012-01-01

    Previously, we identified 8-bp-long DNA sequences (8-mers) that localize in human proximal promoters and grouped them into known transcription factor binding sites (TFBS). We now examine split 8-mers consisting of two 4-mers separated by 1 to 30 bp (X4-N1-30-X4) to identify pairs of TFBS that localize in proximal promoters at a precise distance. These include two overlapping TFBS: the ETS⇔ETS motif (C/GCCGGAAGCGGAA) and the ETS⇔CRE motif (C/GCGGAAGTGACGTCAC). The nucleotides shown in bold in the original article are part of both TFBS. Molecular modeling shows that the ETS⇔CRE motif can be bound simultaneously by both the ETS and the B-ZIP domains without protein-protein clashes. The electrophoretic mobility shift assay (EMSA) shows that the ETS protein GABPα and the B-ZIP protein CREB preferentially bind to the ETS⇔CRE motif only when the two TFBS overlap precisely. In contrast, the ETS domain of ETV5 and CREB interfere with each other for binding the ETS⇔CRE. The 11-mer (CGGAAGTGACG), the conserved part of the ETS⇔CRE motif, occurs 226 times in the human genome, and 83% of these occurrences are in known regulatory regions. In vivo GABPα and CREB ChIP-seq peaks identified the ETS⇔CRE as the most enriched motif occurring in promoters of genes involved in mRNA processing, cellular catabolic processes, and stress response, suggesting that a specific class of genes is regulated by this composite motif. PMID:23050235
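    The split 8-mer scheme (X4-N1-30-X4) amounts to counting every pair of 4-mers separated by a fixed gap of 1 to 30 bp. A minimal Python sketch is below; `split_kmers` is a hypothetical helper, and the paper's positional analysis relative to promoters is not reproduced here.

```python
from collections import Counter


def split_kmers(seq: str, k: int = 4, min_gap: int = 1,
                max_gap: int = 30) -> Counter:
    """Count X_k - N_gap - X_k patterns as (left k-mer, gap, right k-mer) keys."""
    seq = seq.upper()
    counts: Counter = Counter()
    for gap in range(min_gap, max_gap + 1):
        span = 2 * k + gap  # total window: left k-mer + spacer + right k-mer
        # The range is empty when the sequence is shorter than the window.
        for i in range(len(seq) - span + 1):
            left = seq[i:i + k]
            right = seq[i + k + gap:i + span]
            counts[(left, gap, right)] += 1
    return counts


print(split_kmers("AAAACTTTT", min_gap=1, max_gap=1))
# prints Counter({('AAAA', 1, 'TTTT'): 1})
```

    Keying on the gap length keeps pairs at different spacings distinct, which is what allows spacing-specific enrichment to be detected.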

  12. Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest.

    Science.gov (United States)

    Wang, Xin; Lin, Peijie; Ho, Joshua W K

    2018-01-19

    It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which the TF is expressed, even though the individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type-specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs (a motif grammar) located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type-specific motif grammars from this classifier model? We present a Random Forest (RF) based approach to build a multi-class classifier to predict the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TFs (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type-specific motif grammar. Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific.

  13. Genomic characterization, phylogenetic comparison and differential expression of the cyclic nucleotide-gated channels gene family in pear (Pyrus bretschneideri Rehd.).

    Science.gov (United States)

    Chen, Jianqing; Yin, Hao; Gu, Jinping; Li, Leiting; Liu, Zhe; Jiang, Xueting; Zhou, Hongsheng; Wei, Shuwei; Zhang, Shaoling; Wu, Juyou

    2015-01-01

    The cyclic nucleotide-gated channel (CNGC) family is involved in the uptake of various cations, such as Ca(2+), to regulate plant growth and respond to biotic and abiotic stresses. However, there is far less information about this family in woody plants such as pear. Here, we provide a genome-wide identification and analysis of the CNGC gene family in pear. Phylogenetic analysis showed that the 21 pear CNGC genes could be divided into five groups (I, II, III, IVA and IVB). The majority of gene duplications in pear appeared to have been caused by segmental duplication and occurred 32.94-39.14 million years ago. Evolutionary analysis showed that positive selection had driven the evolution of pear CNGCs. Motif analyses showed that Group I CNGCs generally contained 26 motifs, the greatest number among all CNGC groups. Among these, eight motifs were shared by every group, suggesting that these domains play a conserved role in CNGC activity. Tissue-specific expression analysis indicated that functional diversification of the duplicated CNGC genes was a major feature of long-term evolution. Our results also suggested that the P-S6 and PBC & hinge domains had co-evolved. These results provide valuable information to increase our understanding of the function, evolution and expression of the CNGC gene family in higher plants. Copyright © 2014 Elsevier Inc. All rights reserved.

  14. The identification of functional motifs in temporal gene expression analysis

    Directory of Open Access Journals (Sweden)

    Michael G. Surette

    2005-01-01

    The identification of transcription factor binding sites is essential to understanding the regulation of gene expression and reconstructing genetic regulatory networks. The in silico identification of cis-regulatory motifs is challenging due to sequence variability and a lack of sufficient data to generate consensus motifs that are of quantitative or even qualitative predictive value. To determine functional motifs in gene expression, we propose a strategy that adopts the false discovery rate (FDR) and estimates motif effects in a combinatorial analysis of motif candidates and temporal gene expression data. The method decreases the number of predicted motifs, which can then be confirmed by genetic analysis. To assess the method, we used simulated motif/expression data to evaluate parameters. We applied this approach to experimental data for a group of iron-responsive genes in Salmonella typhimurium 14028S. The method identified known and potentially new ferric-uptake regulator (Fur) binding sites. In addition, we identified uncharacterized functional motif candidates that correlated with specific patterns of expression. SAS code for the simulation and analysis of gene expression data is available from the first author upon request.
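    The abstract does not say which FDR procedure is used, so the sketch below assumes the widely used Benjamini-Hochberg step-up rule to show how an FDR threshold prunes a list of candidate motifs: sort the p-values, find the largest rank r whose p-value is at most (r/m)·alpha, and reject every hypothesis up to that rank. The function name is hypothetical; the authors' own method (SAS code) may differ.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return a reject flag per hypothesis, controlling the FDR at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Step-up rule: the largest rank r with p_(r) <= (r / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis whose rank is at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= cutoff_rank
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))  # [True, False, False, False]
print(benjamini_hochberg([0.04, 0.04]))             # [True, True]
```

    The second call shows the step-up behaviour: neither p-value passes the rank-1 threshold (0.025), but both pass at rank 2 (0.05), so both are kept as discoveries.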

  15. Two results on expected values of imbalance indices of phylogenetic trees

    OpenAIRE

    Mir, Arnau; Rossello, Francesc

    2012-01-01

    We compute an explicit formula for the expected value of the Colless index of a phylogenetic tree generated under the Yule model, and an explicit formula for the expected value of the Sackin index of a phylogenetic tree generated under the uniform model.
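    For reference, the Colless index of a rooted binary tree sums |n_left - n_right| over all internal nodes (n being the number of leaves below each child), and the Sackin index sums the depths of all leaves. The Python sketch below computes both indices for a single tree; it does not derive the expected-value formulas of the paper. Representing trees as nested tuples of taxon names is an assumption made for illustration.

```python
from typing import Tuple, Union

# A rooted binary tree as nested pairs; leaves are taxon-name strings.
Tree = Union[str, Tuple["Tree", "Tree"]]


def _walk(tree: Tree) -> Tuple[int, int, int]:
    """Return (number of leaves, Colless index, Sackin index) of a subtree."""
    if isinstance(tree, str):
        return 1, 0, 0
    left, right = tree
    n_left, colless_left, sackin_left = _walk(left)
    n_right, colless_right, sackin_right = _walk(right)
    n = n_left + n_right
    # Colless: add this internal node's leaf imbalance |n_left - n_right|.
    colless = colless_left + colless_right + abs(n_left - n_right)
    # Sackin (sum of leaf depths): this node adds 1 to the depth of every leaf below it.
    sackin = sackin_left + sackin_right + n
    return n, colless, sackin


def colless_index(tree: Tree) -> int:
    return _walk(tree)[1]


def sackin_index(tree: Tree) -> int:
    return _walk(tree)[2]


caterpillar = ((("A", "B"), "C"), "D")
balanced = (("A", "B"), ("C", "D"))
print(colless_index(caterpillar), sackin_index(caterpillar))  # 3 9
print(colless_index(balanced), sackin_index(balanced))        # 0 8
```

    Averaging these values over trees sampled from the Yule or uniform model would give Monte Carlo estimates that can be checked against the paper's closed-form expectations.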

  16. Disparate requirements for the Walker A and B ATPase motifs of human RAD51D in homologous recombination

    Energy Technology Data Exchange (ETDEWEB)

    Wiese, Claudia; Hinz, John M.; Tebbs, Robert S.; Nham, Peter B.; Urbin, Salustra S.; Collins, David W.; Thompson, Larry H.; Schild, David

    2006-04-21

    In vertebrates, homologous recombinational repair (HRR) requires RAD51 and five RAD51 paralogs (XRCC2, XRCC3, RAD51B, RAD51C, and RAD51D) that all contain conserved Walker A and B ATPase motifs. In human RAD51D we examined the requirement for these motifs in interactions with XRCC2 and RAD51C, and for survival of cells in response to DNA interstrand crosslinks. Ectopic expression of wild type human RAD51D or mutants having a non-functional A or B motif was used to test for complementation of a rad51d knockout hamster CHO cell line. Although A-motif mutants complement very efficiently, B-motif mutants do not. Consistent with these results, experiments using the yeast two- and three-hybrid systems show that the interactions between RAD51D and its XRCC2 and RAD51C partners also require a functional RAD51D B motif, but not motif A. Similarly, hamster Xrcc2 is unable to bind to the non-complementing human RAD51D B-motif mutants in co-immunoprecipitation assays. We conclude that a functional Walker B motif, but not A motif, is necessary for RAD51D's interactions with other paralogs and for efficient HRR. We present a model in which ATPase sites are formed in a bipartite manner between RAD51D and other RAD51 paralogs.

  17. Romanian traditional motif - element of modernity in clothing

    Science.gov (United States)

    Doble, L.; Stan, O.; Suteu, M. D.; Albu, A.; Bohm, G.; Tsatsarou-Michalaki, A.; Gialinou, E.

    2017-10-01

    This paper presents the phases for aesthetically improving a clothing item, namely a straight-cut women's jacket, using pattern design software, computerised graphics and various modern textile technologies, including industrial embroidery, digital printing and sublimation. In the first phase, documentation was carried out at the Ethnographic Museum of Transylvania in Cluj-Napoca, where several traditional motifs specific to the Transylvania ethnographic region were selected, then reinterpreted and stylized while preserving the symbolism and color range specific to the area. In the styling phase, the CorelDraw vector graphics program was used, which allows the shape, size and color of the drawings to be changed without affecting the identity of the pattern. In the pattern design phase, Gemini CAD software was used, and Optitex software was used for modeling and model development. The embellishment of the model was performed using embroidery machine software that reproduced the stylized motif identically. To obtain a significantly improved aesthetic look and added artistic value, the pattern chosen for the jacket was produced using a combination of modern textile technologies. This allowed a particular texture to be realized on the surface of the designed product, demonstrating that traditional patterns can be reinterpreted in modern clothing.

  18. BIMLR: a method for constructing rooted phylogenetic networks from rooted phylogenetic trees.

    Science.gov (United States)

    Wang, Juan; Guo, Maozu; Xing, Linlin; Che, Kai; Liu, Xiaoyan; Wang, Chunyu

    2013-09-15

    Rooted phylogenetic trees constructed from different datasets (e.g. from different genes) are often conflicting with one another, i.e. they cannot be integrated into a single phylogenetic tree. Phylogenetic networks have become an important tool in molecular evolution, and rooted phylogenetic networks are able to represent conflicting rooted phylogenetic trees. Hence, the development of appropriate methods to compute rooted phylogenetic networks from rooted phylogenetic trees has attracted considerable research interest of late. The CASS algorithm proposed by van Iersel et al. is able to construct much simpler networks than other available methods, but it is extremely slow, and the networks it constructs are dependent on the order of the input data. Here, we introduce an improved CASS algorithm, BIMLR. We show that BIMLR is faster than CASS and less dependent on the input data order. Moreover, BIMLR is able to construct much simpler networks than almost all other methods. BIMLR is available at http://nclab.hit.edu.cn/wangjuan/BIMLR/. © 2013 Elsevier B.V. All rights reserved.

  19. Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms.

    Science.gov (United States)

    Speiser, Daniel I; Pankey, M Sabrina; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H

    2014-11-19

    Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes onto our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository (http://bitbucket.org/osiris_phylogenetics/pia/) and we demonstrate PIA on a publicly-accessible web server (http://galaxy-dev.cnsi.ucsb.edu/pia/).
Our new

  20. Hybrid DNA i-motif: Aminoethylprolyl-PNA (pC5) enhances the stability of DNA (dC5) i-motif structure.

    Science.gov (United States)

    Gade, Chandrasekhar Reddy; Sharma, Nagendra K

    2017-12-15

    This report describes the synthesis of a C-rich sequence, a cytosine pentamer, of aep-PNA and biophysical studies of the formation of a hybrid DNA:aep-PNA i-motif structure with the DNA cytosine pentamer (dC5) under acidic pH conditions. Herein, the CD/UV/NMR/ESI-Mass studies strongly support the formation of a stable hybrid DNA i-motif structure with aep-PNA even near acidic conditions. Hence, the aep-PNA C-rich cytosine sequence could be considered a potential DNA i-motif stabilizing agent under in vivo conditions. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Purification and functional motifs of the recombinant ATPase of orf virus.

    Science.gov (United States)

    Lin, Fong-Yuan; Chan, Kun-Wei; Wang, Chi-Young; Wong, Min-Liang; Hsu, Wei-Li

    2011-10-01

    Our previous study showed that the recombinant ATPase encoded by the A32L gene of orf virus displayed ATP hydrolysis activity, as predicted from its amino acid sequence. This viral ATPase contains four known functional motifs (motifs I-IV) and a novel AYDG motif; they are essential for the ATP hydrolysis reaction, binding ATP and magnesium ions. Motifs I and II correspond to the Walker A and B motifs of the typical ATPase, respectively. To examine the biochemical roles of these five conserved motifs, recombinant ATPases of five deletion mutants derived from the Taiping strain were expressed and purified. Their ATPase functions were assayed and compared with those of two wild-type strains, Taiping and Nantou, isolated in Taiwan. Our results showed that mutants with deletions at motifs I-III or IV exhibited lower activity than the wild type. Interestingly, deletion of the AYDG motif decreased ATPase activity more significantly than deletions of motifs I-IV. Divalent ions such as magnesium and calcium were essential for ATPase activity. Moreover, our recombinant proteins of orf virus also demonstrated GTPase activity, though weaker than the original ATPase activity. Copyright © 2011 Elsevier Inc. All rights reserved.

  2. Phylogenetic analysis of Newcastle disease viruses isolated from commercial poultry in Mozambique, 2011 to 2016

    International Nuclear Information System (INIS)

    Mapaco, L.P.; Monjane, I.V.A.; Nhamusso, A.E.; Viljoen, G.J; Dundon, W.G.; Achá, S.J.

    2016-01-01

    Full text: The complete sequence of the fusion (F) protein gene from eleven Newcastle disease viruses (NDV) isolated from commercial poultry in Mozambique between 2011 and 2016 has been generated. The F gene cleavage site motif for all eleven isolates was 112RRRKRF117 indicating that the viruses are virulent. A phylogenetic analysis using the full F gene sequence revealed that the viruses clustered within genotype VIIh and showed a higher similarity to NDVs from South Africa, China and Southeast Asia than to viruses previously described in Mozambique in 1994 to 1995 and 2005. The characterization of these new NDVs has important implications for Newcastle disease management and control in Mozambique. (author)
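    The inference from the 112RRRKRF117 cleavage site to virulence can be written as a simple sequence check. The sketch below encodes, as an assumption, the molecular criterion commonly used for virulent NDV (the WOAH/OIE Terrestrial Manual definition as we understand it: at least three arginine or lysine residues between positions 113 and 116, and phenylalanine at position 117). It is not the authors' pipeline, the function name is hypothetical, and in vivo pathogenicity testing is outside its scope.

```python
def has_virulent_cleavage_motif(site: str) -> bool:
    """site: one-letter residues 112-117 of the NDV F protein, e.g. 'RRRKRF'.

    Assumed criterion: at least three R/K among residues 113-116 and F at 117.
    """
    if len(site) != 6:
        raise ValueError("expected exactly six residues (positions 112-117)")
    site = site.upper()
    # site[1:5] holds residues 113-116; site[5] is residue 117.
    basic_113_to_116 = sum(residue in "RK" for residue in site[1:5])
    return basic_113_to_116 >= 3 and site[5] == "F"


print(has_virulent_cleavage_motif("RRRKRF"))  # True, the motif reported above
print(has_virulent_cleavage_motif("GRQGRL"))  # False, a typical low-virulence motif
```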

  3. Brickworx builds recurrent RNA and DNA structural motifs into medium- and low-resolution electron-density maps

    Energy Technology Data Exchange (ETDEWEB)

    Chojnowski, Grzegorz, E-mail: gchojnowski@genesilico.pl [International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw (Poland); Waleń, Tomasz [International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw (Poland); University of Warsaw, Banacha 2, 02-097 Warsaw (Poland); Piątkowski, Paweł; Potrzebowski, Wojciech [International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw (Poland); Bujnicki, Janusz M. [International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw (Poland); Adam Mickiewicz University, Umultowska 89, 61-614 Poznan (Poland)

    2015-03-01

    Brickworx is a computer program that builds crystal structure models of nucleic acid molecules using recurrent motifs, including double-stranded helices. In a first step, the program searches for electron-density peaks that may correspond to phosphate groups; it may also take into account phosphate-group positions provided by the user. Subsequently, by comparing the three-dimensional patterns of the P atoms with a database of nucleic acid fragments, it finds the matching positions of the double-stranded helical motifs (A-RNA or B-DNA) in the unit cell. If the target structure is RNA, the helical fragments are further extended with recurrent RNA motifs from a fragment library that contains single-stranded segments. Finally, the matched motifs are merged and refined in real space to find the most likely conformations, including a fit of the sequence to the electron-density map. The Brickworx program is available for download and as a web server at http://iimcb.genesilico.pl/brickworx.

  4. Brickworx builds recurrent RNA and DNA structural motifs into medium- and low-resolution electron-density maps

    International Nuclear Information System (INIS)

    Chojnowski, Grzegorz; Waleń, Tomasz; Piątkowski, Paweł; Potrzebowski, Wojciech; Bujnicki, Janusz M.

    2015-01-01

    Brickworx is a computer program that builds crystal structure models of nucleic acid molecules using recurrent motifs, including double-stranded helices. In a first step, the program searches for electron-density peaks that may correspond to phosphate groups; it may also take into account phosphate-group positions provided by the user. Subsequently, by comparing the three-dimensional patterns of the P atoms with a database of nucleic acid fragments, it finds the matching positions of the double-stranded helical motifs (A-RNA or B-DNA) in the unit cell. If the target structure is RNA, the helical fragments are further extended with recurrent RNA motifs from a fragment library that contains single-stranded segments. Finally, the matched motifs are merged and refined in real space to find the most likely conformations, including a fit of the sequence to the electron-density map. The Brickworx program is available for download and as a web server at http://iimcb.genesilico.pl/brickworx.

  5. An experimental test of a fundamental food web motif.

    Science.gov (United States)

    Rip, Jason M K; McCann, Kevin S; Lynn, Denis H; Fawcett, Sonia

    2010-06-07

    Large-scale changes to the world's ecosystem are resulting in the deterioration of biostructure, the complex web of species interactions that make up ecological communities. A difficult, yet crucial, task is to identify food web structures, or food web motifs, that are the building blocks of this baroque network of interactions. Once identified, these food web motifs can then be examined through experiments and theory to provide mechanistic explanations for how structure governs ecosystem stability. Here, we synthesize recent ecological research to show that generalist consumers coupling resources with different interaction strengths constitute one such motif. Remarkably, this motif occurs across an enormous range of spatial scales, and so acts to distribute coupled weak and strong interactions throughout food webs. We then perform an experiment that illustrates the importance of this motif to ecological stability. We find that weak interactions coupled to strong interactions by generalist consumers dampen strong interaction strengths and increase community stability. This study takes a critical step by isolating a common food web motif and, through clear experimental manipulation, identifying the fundamental stabilizing consequences of this structure for ecological communities.

  6. Highly scalable Ab initio genomic motif identification

    KAUST Repository

    Marchand, Benoit; Bajic, Vladimir B.; Kaushik, Dinesh

    2011-01-01

    We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours, while the original serial code would have needed decades of execution time. Copyright 2011 ACM.

  7. Phylogenetic turnover during subtropical forest succession across environmental and phylogenetic scales

    OpenAIRE

    Purschke, Oliver; Michalski, Stefan G.; Bruelheide, Helge; Durka, Walter

    2017-01-01

    Abstract Although spatial and temporal patterns of phylogenetic community structure during succession are inherently interlinked and assembly processes vary with environmental and phylogenetic scales, successional studies of community assembly have yet to integrate spatial and temporal components of community structure, while accounting for scaling issues. To gain insight into the processes that generate biodiversity after disturbance, we combine analyses of spatial and temporal phylogenetic ...

  8. qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.

    Directory of Open Access Journals (Sweden)

    Hieu Dinh

    Full Text Available Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare event is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motif discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial for understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such version is (ℓ, d)-motif search, also called Planted Motif Search (PMS). A generalized version of the PMS problem, namely Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS problem, is already NP-hard, which means that, unless P = NP, any exact algorithm solving it is expected to take exponential time in the worst case. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as on challenging instances. Experimental results show that our algorithm qPMS7 is on average 5 times faster than the state-of-the-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com.
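    To make the quorum problem statement concrete, here is a minimal brute-force sketch in Python. It is not qPMS7: it simply enumerates all 4^ℓ candidate ℓ-mers and keeps those that occur with at most d mismatches in at least q of the input sequences, which shows directly why the naive approach is exponential in ℓ. The function names and the DNA-only alphabet are illustrative assumptions.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some l-mer of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def quorum_planted_motifs(seqs, l, d, q, alphabet="ACGT"):
    """Naive qPMS: every l-mer over the alphabet with a d-neighbor in >= q sequences.

    Cost grows as |alphabet|^l, so this is only usable for small l.
    """
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        support = sum(occurs_within(candidate, s, d) for s in seqs)
        if support >= q:
            motifs.append(candidate)
    return motifs


if __name__ == "__main__":
    sequences = ["ACGTT", "TACGA", "GGGGG"]
    print(quorum_planted_motifs(sequences, l=3, d=0, q=2))  # prints ['ACG']
```

    Setting q equal to the number of sequences recovers the plain PMS problem; practical algorithms such as qPMS7 prune this candidate space instead of enumerating it.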

  9. Codon based co-occurrence network motifs in human mitochondria

    Directory of Open Access Journals (Sweden)

    Pramod Shinde

    2017-10-01

    Full Text Available The nucleotide polymorphism in the human mitochondrial genome (mtDNA), shaped by codon position bias, plays an indispensable role in human population dispersion and expansion. Herein, we constructed genome-wide nucleotide co-occurrence networks using a massive dataset covering five different geographical regions, with around 3000 samples for each region. We developed a powerful network model to describe complex mitochondrial evolutionary patterns between codon and non-codon positions. Interestingly, network motifs revealed that Asian genomes evolved differently from those of the rest. We found evidence that mtDNA undergoes substantial amounts of adaptive evolution, a finding that is supported by a number of previous studies. The dominance of higher-order motifs indicated the importance of long-range nucleotide co-occurrence in genomic diversity. Most notably, codon motifs apparently underpinned the preferences among codon positions for co-evolution, which were probably highly biased during the origin of the genetic code. Our analyses showed that codon position co-evolution is very well conserved across human sub-populations and independently maintained within human sub-populations, implying a selective role of evolutionary processes in codon position co-evolution. Thus, this study provides a framework to investigate cooperative genomic interactions that underlie complex mitochondrial evolution.

  10. A speedup technique for (l, d)-motif finding algorithms

    Directory of Open Access Journals (Sweden)

    Dinh Hieu

    2011-03-01

    Full Text Available Abstract Background The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (or Planted Motif Search (PMS)), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, whereas an approximate algorithm may fail to identify some or all of the motifs. The exact version of the PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speed up PMS algorithms. Conclusions We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms. These experimental results show that our speedup technique is indeed very
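    The exponential dependence on the parameters is easy to see in a classic exact strategy: generate the d-neighborhood of every l-mer in the first sequence and keep the candidates that occur (with at most d mismatches) in all other sequences. The Python sketch below illustrates that baseline only; it is not the speedup technique proposed in the paper, and all names are hypothetical. For DNA, the neighborhood of one l-mer has sum over i = 0..d of C(l, i) * 3^i members, which is where the cost blows up.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def d_neighborhood(kmer: str, d: int, alphabet: str = "ACGT") -> set:
    """All strings at Hamming distance <= d from kmer (assumes d <= len(kmer)).

    Substituting any letter, including the original one, at every choice of d
    positions covers all distances from 0 to d.
    """
    neighbors = set()
    for positions in combinations(range(len(kmer)), d):
        for letters in product(alphabet, repeat=d):
            chars = list(kmer)
            for pos, ch in zip(positions, letters):
                chars[pos] = ch
            neighbors.add("".join(chars))
    return neighbors


def pms_exact(seqs, l: int, d: int) -> list:
    """Exact (l, d)-motif search: motifs with a d-neighbor in every sequence."""
    def occurs(motif: str, seq: str) -> bool:
        return any(hamming(motif, seq[j:j + l]) <= d for j in range(len(seq) - l + 1))

    # Any valid motif must lie within distance d of some l-mer of the first sequence.
    first = seqs[0]
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates |= d_neighborhood(first[i:i + l], d)
    return sorted(m for m in candidates if all(occurs(m, s) for s in seqs[1:]))
```

    Because every motif is guaranteed to be in the candidate set, this procedure never misses a motif, which is what makes it exact rather than approximate.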

  11. Disparate requirements for the Walker A and B ATPase motifs of human RAD51D in homologous recombination.

    Science.gov (United States)

    Wiese, Claudia; Hinz, John M; Tebbs, Robert S; Nham, Peter B; Urbin, Salustra S; Collins, David W; Thompson, Larry H; Schild, David

    2006-01-01

    In vertebrates, homologous recombinational repair (HRR) requires RAD51 and five RAD51 paralogs (XRCC2, XRCC3, RAD51B, RAD51C and RAD51D) that all contain conserved Walker A and B ATPase motifs. In human RAD51D we examined the requirement for these motifs in interactions with XRCC2 and RAD51C, and for survival of cells in response to DNA interstrand crosslinks (ICLs). Ectopic expression of wild-type human RAD51D or mutants having a non-functional A or B motif was used to test for complementation of a rad51d knockout hamster CHO cell line. Although A-motif mutants complement very efficiently, B-motif mutants do not. Consistent with these results, experiments using the yeast two- and three-hybrid systems show that the interactions between RAD51D and its XRCC2 and RAD51C partners also require a functional RAD51D B motif, but not motif A. Similarly, hamster Xrcc2 is unable to bind to the non-complementing human RAD51D B-motif mutants in co-immunoprecipitation assays. We conclude that a functional Walker B motif, but not A motif, is necessary for RAD51D's interactions with other paralogs and for efficient HRR. We present a model in which ATPase sites are formed in a bipartite manner between RAD51D and other RAD51 paralogs.

  12. Armadillo motifs involved in vesicular transport.

    Directory of Open Access Journals (Sweden)

    Harald Striegl

    Full Text Available Armadillo (ARM) repeat proteins function in various cellular processes including vesicular transport and membrane tethering. They contain an imperfect repeating sequence motif that forms a conserved three-dimensional structure. Recently, structural and functional insight into tethering mediated by the ARM-repeat protein p115 has been provided. Here we describe the p115 ARM motifs for reasons of clarity and nomenclature and show that both sequence and structure are highly conserved among ARM-repeat proteins. We argue that there is no need to invoke repeat types other than ARM repeats for a proper description of the structure of the p115 globular head region. Additionally, we propose to define a new subfamily of ARM-like proteins and show that there is no evidence that the ARM motifs found in p115 are present in other long coiled-coil tethering factors of the golgin family.

  13. Characterizing Motif Dynamics of Electric Brain Activity Using Symbolic Analysis

    Directory of Open Access Journals (Sweden)

    Massimiliano Zanin

    2014-10-01

    Full Text Available Motifs are small recurring circuits of interactions which constitute the backbone of networked systems. Characterizing motif dynamics is therefore key to understanding the functioning of such systems. Here we propose a method to define and quantify the temporal variability and time scales of electroencephalogram (EEG) motifs of resting brain activity. Given a triplet of EEG sensors, links between them are calculated by means of linear correlation; each pattern of links (i.e., each motif) is then associated with a symbol, and its appearance frequency is analyzed by means of Shannon entropy. Our results show that each motif becomes observable at different coupling thresholds and evolves at its own time scale, with fronto-temporal sensors emerging at high thresholds and changing at fast time scales, and parietal ones emerging at low thresholds and changing at slower rates. Finally, while motif dynamics differed across individuals, for each subject they were robust across experimental conditions, indicating that they could represent an individual dynamical signature.
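    A minimal Python sketch of the symbolization step, under assumptions not stated in the abstract: each window holds three equal-length signals, a link exists when the absolute Pearson correlation reaches a threshold, and the three possible links of the triplet are packed into a 3-bit symbol (0 to 7). The entropy of the resulting symbol sequence is then computed with base-2 logarithms. Window length, threshold and the bit encoding are illustrative choices, not the authors' exact parameters.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(x, y) -> float:
    """Pearson correlation; returns 0.0 if either signal is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def motif_symbol(window, threshold: float) -> int:
    """Encode which of the three sensor pairs are linked as a 3-bit integer."""
    symbol = 0
    # bit 0: sensors (0, 1); bit 1: sensors (0, 2); bit 2: sensors (1, 2)
    for bit, (i, j) in enumerate(combinations(range(3), 2)):
        if abs(pearson(window[i], window[j])) >= threshold:
            symbol |= 1 << bit
    return symbol


def shannon_entropy(symbols) -> float:
    """Shannon entropy (bits) of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

    Sliding the window along a recording yields one symbol per window; repeating the analysis for several thresholds shows at which coupling level each link pattern becomes observable.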

  14. Improved i-motif thermal stability by insertion of anthraquinone monomers

    DEFF Research Database (Denmark)

    Gouda, Alaa S; Amine, Mahasen S.; Pedersen, Erik Bjerregaard

    2017-01-01

    In order to gain insight into how to improve the thermal stability of i-motifs when used in the context of biomedical and nanotechnological applications, novel anthraquinone-modified i-motifs were synthesized by insertion of 1,8-, 1,4-, 1,5- and 2,6-disubstituted anthraquinone monomers into the TAA...... loops of a 22mer cytosine-rich human telomeric DNA sequence. The influence of the four anthraquinone linkers on i-motif thermal stability was investigated at 295 nm and pH 5.5. Anthraquinone monomers modulate i-motif stability in a position-dependent manner, and the modulation also depends...... unlocked nucleic acid monomers or twisted intercalating nucleic acid. The 2,6-disubstituted anthraquinone linker replacing T10 enabled a significant increase of 8.2 °C in i-motif thermal melting. A substantial increase of 5.0 °C in i-motif thermal melting was recorded when both A6 and T16 were modified...

  15. Efficient Detection of Repeating Sites to Accelerate Phylogenetic Likelihood Calculations.

    Science.gov (United States)

    Kobert, K; Stamatakis, A; Flouri, T

    2017-03-01

    The phylogenetic likelihood function (PLF) is the major computational bottleneck in several applications of evolutionary biology such as phylogenetic inference, species delimitation, model selection, and divergence time estimation. Given the alignment, a tree and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted to improve run-time and, using appropriate data structures, to reduce memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory savings attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 12-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the PLF currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation. [Algorithms; maximum likelihood; phylogenetic likelihood function; phylogenetics]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
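    The core idea of detecting repeated sites can be illustrated with a short Python sketch, assuming a rooted binary tree written as nested tuples and an alignment stored as a dict of equal-length strings (both hypothetical representations). Working bottom-up, each site at each node receives a class identifier such that two sites share an identifier exactly when the tip characters below that node are identical at both sites; the conditional likelihood entry then has to be computed only once per class. This is a simplified illustration of the repeats concept, not the authors' implementation or data structures.

```python
def repeat_classes(node, alignment):
    """Return (per-site class ids, number of distinct classes) for a subtree.

    node: a leaf name (str) or a (left, right) tuple of subtrees.
    alignment: dict mapping leaf name -> aligned sequence.
    Two sites get the same id iff the tip characters under `node` are identical.
    """
    if isinstance(node, str):
        ids = {}
        # setdefault assigns the next unused id the first time a character is seen
        classes = [ids.setdefault(ch, len(ids)) for ch in alignment[node]]
        return classes, len(ids)
    left, _ = repeat_classes(node[0], alignment)
    right, _ = repeat_classes(node[1], alignment)
    ids = {}
    # a site's class at an inner node is determined by its (left, right) class pair
    classes = [ids.setdefault(pair, len(ids)) for pair in zip(left, right)]
    return classes, len(ids)


if __name__ == "__main__":
    aln = {"A": "AAAC", "B": "AAGC", "C": "AATC"}
    classes, n_unique = repeat_classes((("A", "B"), "C"), aln)
    print(classes, n_unique)  # prints [0, 0, 1, 2] 3
```

    In this toy alignment the root needs 3 likelihood computations instead of 4, because sites 0 and 1 carry the same characters at every tip.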

  16. EM for phylogenetic topology reconstruction on nonhomogeneous data.

    Science.gov (United States)

    Ibáñez-Marcelo, Esther; Casanellas, Marta

    2014-06-17

    The reconstruction of the phylogenetic tree topology of four taxa is still one of the main challenges in phylogenetics. Its difficulties lie in using evolutionary models that are not too restrictive and in correctly dealing with the long-branch attraction problem. The correct reconstruction of 4-taxon trees is crucial for making quartet-based methods work and for being able to recover large phylogenies. We adapt the well-known expectation-maximization algorithm to evolutionary Markov models on phylogenetic 4-taxon trees. We then use this algorithm to estimate the substitution parameters, compute the corresponding likelihood, and infer the most likely quartet. In this paper we consider an expectation-maximization method for maximizing the likelihood of (time nonhomogeneous) evolutionary Markov models on trees. We study its success in reconstructing 4-taxon topologies and its performance as an input method in quartet-based phylogenetic reconstruction methods such as QFIT and QuartetSuite. Our results show that the method proposed here outperforms neighbor-joining and the usual (time-homogeneous continuous-time) maximum likelihood methods on 4-leaved trees with among-lineage instantaneous rate heterogeneity, and performs similarly to usual continuous-time maximum likelihood when the data satisfy the assumptions of both methods. The method presented in this paper is well suited for reconstructing the topology of any number of taxa via quartet-based methods and is highly accurate, especially for highly divergent trees and time nonhomogeneous data.

  17. GenNon-h: Generating multiple sequence alignments on nonhomogeneous phylogenetic trees

    Directory of Open Access Journals (Sweden)

    Kedzierska Anna M

    2012-08-01

    Full Text Available Abstract Background A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. On the other hand, methods of simulating the DNA MSA directly from the transition matrices do not exist. Moreover, existing software is restricted to time-reversible models and is not optimized to generate nonhomogeneous data (i.e. placing distinct substitution rates at different lineages). Results We present the first package designed to generate MSAs evolving under discrete-time Markov processes on phylogenetic trees, directly from probability substitution matrices. Based on the input model and a phylogenetic tree in the Newick format (with branch lengths measured as the expected number of substitutions per site), the algorithm produces DNA alignments of the desired length. GenNon-h is publicly available for download. Conclusion The software presented here is an efficient tool to generate DNA MSAs on a given phylogenetic tree. GenNon-h provides the user with nonstationary or nonhomogeneous phylogenetic data that are well suited for testing complex biological hypotheses and for exploring the limits of reconstruction algorithms and their robustness to such models.
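    The discrete-time, nonhomogeneous simulation described above can be sketched in a few lines of Python. Assumptions (not GenNon-h's actual input format): the tree is a nested (name, children) structure, each edge carries its own 4x4 row-stochastic substitution matrix (which is what makes the process nonhomogeneous), and the root state of every site is drawn from a given distribution over A, C, G, T. Each child state is then sampled from the row of the edge matrix indexed by the parent state.

```python
import random

BASES = "ACGT"


def simulate_msa(tree, root_dist, length, seed=0):
    """Simulate an alignment on a tree with one transition matrix per edge.

    tree: (name, [(matrix, child_subtree), ...]); leaves have an empty child list.
    matrix[i][j] = probability that base i on the parent becomes base j on the child.
    Returns a dict mapping leaf name -> simulated sequence.
    """
    rng = random.Random(seed)
    leaves = {}

    def draw(probs):
        # weighted choice of a base index 0..3
        return rng.choices(range(4), weights=probs, k=1)[0]

    def evolve(node, states):
        name, children = node
        if not children:
            leaves[name] = "".join(BASES[s] for s in states)
            return
        for matrix, child in children:
            # each site evolves independently along this edge's own matrix
            evolve(child, [draw(matrix[s]) for s in states])

    evolve(tree, [draw(root_dist) for _ in range(length)])
    return leaves
```

    Giving each edge a different matrix is what produces lineage-specific substitution behaviour; using the same matrix on every edge would reduce the sketch to a homogeneous model.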

  18. Positional bias of general and tissue-specific regulatory motifs in mouse gene promoters

    Directory of Open Access Journals (Sweden)

    Farré Domènec

    2007-12-01

    Full Text Available Abstract Background The arrangement of regulatory motifs in gene promoters, or promoter architecture, is the result of mutation and selection processes that have operated over many millions of years. In mammals, tissue-specific transcriptional regulation is related to the presence of specific protein-interacting DNA motifs in gene promoters. However, little is known about the relative location and spacing of these motifs. To fill this gap, we have performed a systematic search for motifs that show significant bias at specific promoter locations in a large collection of housekeeping and tissue-specific genes. Results We observe that promoters driving housekeeping gene expression are enriched in particular motifs with strong positional bias, such as YY1, which are of little relevance in promoters driving tissue-specific expression. We also identify a large number of motifs that show positional bias in genes expressed in a highly tissue-specific manner. They include well-known tissue-specific motifs, such as HNF1 and HNF4 motifs in liver, kidney and small intestine, or RFX motifs in testis, as well as many potentially novel regulatory motifs. Based on this analysis, we provide predictions for 559 tissue-specific motifs in mouse gene promoters. Conclusion The study shows that motif positional bias is an important feature of mammalian proximal promoters and that it affects both general and tissue-specific motifs. Motif positional constraints define very distinct promoter architectures depending on breadth of expression and type of tissue.
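    One simple way to quantify positional bias is sketched below in Python: align promoters of equal length to a common reference point (for example, all ending at the transcription start site), record the start positions of motif matches, bin them, and compare the bin counts with those expected if motif starts were uniformly distributed, using a chi-square statistic. The study's actual statistical procedure is not described in the abstract; exact string matching, equal-length promoters and the binning scheme are assumptions for illustration.

```python
def motif_positions(promoters, motif):
    """Start positions of all (possibly overlapping) exact matches in each promoter."""
    positions = []
    for seq in promoters:
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
    return positions


def positional_chi_square(positions, promoter_length, motif_length, n_bins):
    """Chi-square statistic of binned motif starts against a uniform expectation.

    Returns (statistic, observed counts per bin).
    """
    span = promoter_length - motif_length + 1  # number of possible start positions
    observed = [0] * n_bins
    width = [0] * n_bins  # possible start positions falling in each bin
    for p in range(span):
        width[p * n_bins // span] += 1
    for p in positions:
        observed[p * n_bins // span] += 1
    total = len(positions)
    if total == 0:
        return 0.0, observed
    stat = 0.0
    for obs, w in zip(observed, width):
        if w == 0:
            continue  # an empty bin can hold no starts and contributes nothing
        expected = total * w / span
        stat += (obs - expected) ** 2 / expected
    return stat, observed
```

    A large statistic relative to a chi-square distribution with (number of non-empty bins - 1) degrees of freedom would suggest that the motif prefers particular promoter locations.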

  19. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.; Rangkuti, Farania; Schramm, Michael C.; Jankovic, Boris R.; Kamau, Allan; Chowdhary, Rajesh; Archer, John A.C.; Bajic, Vladimir B.

    2011-01-01

    ... These models are trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity

  20. Fragment-based modelling of single stranded RNA bound to RNA recognition motif containing proteins

    Science.gov (United States)

    de Beauchene, Isaure Chauvot; de Vries, Sjoerd J.; Zacharias, Martin

    2016-01-01

    Abstract Protein-RNA complexes are important for many biological processes. However, structural modeling of such complexes is hampered by the high flexibility of RNA. Particularly challenging is the docking of single-stranded RNA (ssRNA). We have developed a fragment-based approach to model the structure of ssRNA bound to a protein, based only on the protein structure, the RNA sequence and conserved contacts. The conformational diversity of each RNA fragment is sampled by an exhaustive library of trinucleotides extracted from all known experimental protein–RNA complexes. The method was applied to ssRNA with up to 12 nucleotides which bind to dimers of the RNA recognition motifs (RRMs), a highly abundant eukaryotic RNA-binding domain. The fragment-based docking allows a precise de novo atomic modeling of protein-bound ssRNA chains. On a benchmark of seven experimental ssRNA–RRM complexes, near-native models (with a mean heavy-atom deviation of <3 Å from experiment) were generated for six out of seven bound RNA chains, and even more precise models (deviation < 2 Å) were obtained for five out of seven cases, a significant improvement compared to the state of the art. The method is not restricted to RRMs but was also successfully applied to Pumilio RNA binding proteins. PMID:27131381

  1. An Analysis of Multi-type Relational Interactions in FMA Using Graph Motifs with Disjointness Constraints

    Science.gov (United States)

    Zhang, Guo-Qiang; Luo, Lingyun; Ogbuji, Chime; Joslyn, Cliff; Mejino, Jose; Sahoo, Satya S

    2012-01-01

    The interaction of multiple types of relationships among anatomical classes in the Foundational Model of Anatomy (FMA) can provide inferred information valuable for quality assurance. This paper introduces a method called Motif Checking (MOCH) to study the effects of such multi-relation type interactions for detecting logical inconsistencies as well as other anomalies represented by the motifs. MOCH represents patterns of multi-type interaction as small labeled (with multiple types of edges) sub-graph motifs, whose nodes represent class variables, and labeled edges represent relational types. By representing FMA as an RDF graph and motifs as SPARQL queries, fragments of FMA are automatically obtained as auditing candidates. Leveraging the scalability and reconfigurability of Semantic Web Technology, we performed exhaustive analyses of a variety of labeled sub-graph motifs. The quality assurance feature of MOCH comes from the distinct use of a subset of the edges of the graph motifs as constraints for disjointness, thereby bringing a rule-based flavor to the approach as well. With possible disjointness implied by antonyms, we performed manual inspection of the resulting FMA fragments and tracked down sources of abnormal inferred conclusions (logical inconsistencies), which are amenable to programmatic revision of the FMA. Our results demonstrate that MOCH provides a unique source of valuable information for quality assurance. Since our approach is general, it is applicable to any ontological system with an OWL representation. PMID:23304382
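    A two-node labeled motif with a disjointness constraint can be sketched in Python over a plain list of (subject, predicate, object) triples standing in for the RDF graph; in MOCH itself the motifs are expressed as SPARQL queries (roughly, a pattern such as `?x :part_of ?y . ?x :has_part ?y`). The sketch flags every ordered pair of classes connected by two relationship types that have been declared disjoint. The relation names and example triples are hypothetical, chosen only to mirror the antonym-based disjointness mentioned in the abstract.

```python
from collections import defaultdict


def disjointness_violations(triples, disjoint_pairs):
    """Find ordered class pairs linked by two relationship types declared disjoint.

    triples: iterable of (subject, predicate, object) edges of the labeled graph.
    disjoint_pairs: iterable of (predicate1, predicate2) that must never co-occur
    on the same ordered pair of classes.
    Returns a sorted list of (subject, object, predicate1, predicate2).
    """
    disjoint_pairs = list(disjoint_pairs)  # allow generators; iterated repeatedly below
    labels = defaultdict(set)  # (subject, object) -> set of edge labels
    for s, p, o in triples:
        labels[(s, o)].add(p)
    hits = []
    for (s, o), preds in labels.items():
        for p1, p2 in disjoint_pairs:
            if p1 in preds and p2 in preds:
                hits.append((s, o, p1, p2))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical FMA-like triples; part_of / has_part treated as antonyms.
    edges = [
        ("Heart", "part_of", "Thorax"),
        ("Heart", "has_part", "Thorax"),
        ("Lung", "part_of", "Thorax"),
    ]
    print(disjointness_violations(edges, [("part_of", "has_part")]))
    # prints [('Heart', 'Thorax', 'part_of', 'has_part')]
```

    Larger motifs (three or more class variables) follow the same pattern but require joining edges on shared nodes, which is exactly what a SPARQL engine does efficiently.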

  2. An analysis of multi-type relational interactions in FMA using graph motifs with disjointness constraints.

    Science.gov (United States)

    Zhang, Guo-Qiang; Luo, Lingyun; Ogbuji, Chime; Joslyn, Cliff; Mejino, Jose; Sahoo, Satya S

    2012-01-01

    The interaction of multiple types of relationships among anatomical classes in the Foundational Model of Anatomy (FMA) can provide inferred information valuable for quality assurance. This paper introduces a method called Motif Checking (MOCH) to study the effects of such multi-relation type interactions for detecting logical inconsistencies as well as other anomalies represented by the motifs. MOCH represents patterns of multi-type interaction as small labeled (with multiple types of edges) sub-graph motifs, whose nodes represent class variables, and labeled edges represent relational types. By representing FMA as an RDF graph and motifs as SPARQL queries, fragments of FMA are automatically obtained as auditing candidates. Leveraging the scalability and reconfigurability of Semantic Web Technology, we performed exhaustive analyses of a variety of labeled sub-graph motifs. The quality assurance feature of MOCH comes from the distinct use of a subset of the edges of the graph motifs as constraints for disjointness, thereby bringing a rule-based flavor to the approach as well. With possible disjointness implied by antonyms, we performed manual inspection of the resulting FMA fragments and tracked down sources of abnormal inferred conclusions (logical inconsistencies), which are amenable to programmatic revision of the FMA. Our results demonstrate that MOCH provides a unique source of valuable information for quality assurance. Since our approach is general, it is applicable to any ontological system with an OWL representation.

  3. Structural motifs of pre-nucleation clusters.

    Science.gov (United States)

    Zhang, Y; Türkmen, I R; Wassermann, B; Erko, A; Rühl, E

    2013-10-07

    Structural motifs of pre-nucleation clusters prepared in single, optically levitated supersaturated aqueous aerosol microparticles containing CaBr2 as a model system are reported. Cluster formation is identified by means of X-ray absorption in the Br K-edge regime. The salt concentration beyond the saturation point is varied by controlling the humidity in the ambient atmosphere surrounding the 15-30 μm microdroplets. This leads to the formation of metastable supersaturated liquid particles. Distinct spectral shifts in near-edge spectra as a function of salt concentration are observed, in which the energy position of the Br K-edge is red-shifted by up to 7.1 ± 0.4 eV if the dilute solution is compared to the solid. The K-edge positions of supersaturated solutions are found between these limits. The changes in electronic structure are rationalized in terms of the formation of pre-nucleation clusters. This assumption is verified by spectral simulations using first-principles density functional theory and molecular dynamics calculations, in which structural motifs are considered, explaining the experimental results. These consist of solvated CaBr2 moieties, rather than building blocks forming calcium bromide hexahydrates, the crystal system that is formed by drying aqueous CaBr2 solutions.

  4. Reversible polymorphism-aware phylogenetic models and their application to tree inference.

    Science.gov (United States)

    Schrempf, Dominik; Minh, Bui Quang; De Maio, Nicola; von Haeseler, Arndt; Kosiol, Carolin

    2016-10-21

    We present a reversible Polymorphism-Aware Phylogenetic Model (revPoMo) for species tree estimation from genome-wide data. revPoMo enables the reconstruction of large-scale species trees for many within-species samples. It expands the alphabet of DNA substitution models to include polymorphic states, thereby naturally accounting for incomplete lineage sorting. We implemented revPoMo in the maximum likelihood software IQ-TREE. A simulation study and an application to great ape data show that the runtimes of our approach and of standard substitution models are comparable, but that revPoMo has much better accuracy in estimating trees, divergence times and mutation rates. The advantage of revPoMo is that an increase in sample size per species improves estimates but does not increase runtime. Therefore, revPoMo is a valuable tool with several applications, from speciation dating to species tree reconstruction. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
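    To illustrate how the alphabet is expanded, the Python sketch below enumerates the states of a polymorphism-aware model under the usual PoMo formulation, which we assume here: a virtual population of N individuals, monomorphic states in which all N carry the same nucleotide, and biallelic polymorphic states with i copies of one nucleotide and N - i of another (1 <= i <= N - 1). That gives 4 + 6(N - 1) states for DNA. Representing states as allele-count vectors is an illustrative choice, not IQ-TREE's internal encoding.

```python
from itertools import combinations


def pomo_states(n: int, k: int = 4):
    """Enumerate PoMo-style states as allele-count vectors over k nucleotides.

    Monomorphic states: all n virtual individuals carry one nucleotide.
    Polymorphic states: i copies of nucleotide a and n - i of nucleotide b, 1 <= i <= n - 1.
    """
    states = []
    for a in range(k):
        counts = [0] * k
        counts[a] = n
        states.append(tuple(counts))
    for a, b in combinations(range(k), 2):
        for i in range(1, n):
            counts = [0] * k
            counts[a] = i
            counts[b] = n - i
            states.append(tuple(counts))
    return states


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, len(pomo_states(n)))  # 4 + 6 * (n - 1) states for DNA
```

    Compared with the 4 states of a standard nucleotide model, the expanded state space is what lets such a model represent within-species polymorphism directly.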

  5. Sequencing of complete mitochondrial genomes confirms synonymization of Hyalomma asiaticum asiaticum and kozlovi, and advances phylogenetic hypotheses for the Ixodidae.

    Science.gov (United States)

    Liu, Zhi-Qiang; Liu, Yan-Feng; Kuermanali, Nuer; Wang, Deng-Feng; Chen, Shi-Jun; Guo, Hui-Ling; Zhao, Li; Wang, Jun-Wei; Han, Tao; Wang, Yuan-Zhi; Wang, Jie; Shen, Chen-Feng; Zhang, Zhuang-Zhi; Chen, Chuang-Fu

    2018-01-01

    Phylogeny of hard ticks (Ixodidae) remains unresolved. Mitochondrial genomes (mitogenomes) are increasingly used to resolve phylogenetic controversies, but remain unavailable for the entire large Hyalomma genus. Hyalomma asiaticum is a parasitic tick distributed throughout Asia. As a result of great morphological variability, two subspecies have historically been recognised, until a synonymization based on morphological data was proposed. However, this hypothesis was never tested using molecular data. Therefore, the objectives of this study were to: 1. sequence the first Hyalomma mitogenome; 2. scrutinise the proposed synonymization using molecular data, i.e. complete mitogenomes of both subspecies: H. a. asiaticum and kozlovi; 3. conduct phylogenomic and comparative analyses of all available Ixodidae mitogenomes. Results corroborate the proposed synonymization: the two mitogenomes are almost identical (99.6%). Genomic features of both mitogenomes are standard for Metastriata, including the presence of two control regions and all three "Tick-Box" motifs. Gene order and strand distribution are perfectly conserved across the entire Metastriata group. Suspecting compositional biases, we conducted phylogenetic analyses (29 almost complete mitogenomes) using homogeneous and heterogeneous (CAT) models of substitution. The results were congruent, apart from the deep-level topology of prostriate ticks (Ixodes): the homogeneous model produced a monophyletic Ixodes, but the CAT model produced a paraphyletic Ixodes (and thereby Prostriata), divided into Australasian and non-Australasian clades. This topology implies that all metastriate ticks have evolved from the ancestor of the non-Australasian branch of prostriate ticks. Metastriata was divided into three clades: 1. Amblyomminae and Rhipicephalinae (Rhipicephalus, Hyalomma, Dermacentor); 2. Haemaphysalinae and Bothriocrotoninae, plus Amblyomma sphenodonti; 3. Amblyomma elaphense, basal to all Metastriata. We conclude that

  6. The Verrucomicrobia LexA-binding Motif: Insights into the Evolutionary Dynamics of the SOS Response

    Directory of Open Access Journals (Sweden)

    Ivan Erill

    2016-07-01

    Full Text Available The SOS response is the primary bacterial mechanism to address DNA damage, coordinating multiple cellular processes that include DNA repair, cell division and translesion synthesis. In contrast to other regulatory systems, the composition of the SOS genetic network and the binding motif of its transcriptional repressor, LexA, have been shown to vary greatly across bacterial clades, making it an ideal system to study the co-evolution of transcription factors and their regulons. Leveraging comparative genomics approaches and prior knowledge on the core SOS regulon, here we define the binding motif of the Verrucomicrobia, a recently described phylum of emerging interest due to its association with eukaryotic hosts. Site-directed mutagenesis of the Verrucomicrobium spinosum recA promoter confirms that LexA binds a 14 bp palindromic motif with consensus sequence TGTTC-N4-GAACA. Computational analyses suggest that recognition of this novel motif is determined primarily by changes in base-contacting residues of the third alpha helix of the LexA helix-turn-helix DNA-binding motif. In conjunction with comparative genomics analysis of the LexA regulon in the Verrucomicrobia phylum, electrophoretic shift assays reveal that LexA binds to operators in the promoter region of DNA repair genes and a mutagenesis cassette in this organism, and identify previously unreported components of the SOS response. The identification of tandem LexA-binding sites generating instances of other LexA-binding motifs in the lexA gene promoter of Verrucomicrobia species leads us to postulate a novel mechanism for LexA-binding motif evolution. This model, based on gene duplication, successfully addresses outstanding questions in the intricate co-evolution of the LexA protein, its binding motif and the regulatory network it controls.
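    Because the consensus TGTTC-N4-GAACA is its own reverse complement (the right arm GAACA reverse-complements to the left arm TGTTC), scanning one strand for matches to the consensus also catches sites lying on the opposite strand. The Python sketch below shows this with a regular expression; it is a deliberate simplification, since real LexA operators are degenerate and would normally be scored with a position weight matrix rather than matched exactly. Function names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Lookahead so that overlapping candidate sites are all reported.
LEXA_CONSENSUS = re.compile(r"(?=(TGTTC[ACGT]{4}GAACA))")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_consensus_sites(seq: str):
    """(start, site) for every exact match to TGTTC-N4-GAACA on the given strand."""
    return [(m.start(), m.group(1)) for m in LEXA_CONSENSUS.finditer(seq.upper())]


if __name__ == "__main__":
    promoter = "GGTGTTCAAAAGAACACC"
    print(find_consensus_sites(promoter))  # prints [(2, 'TGTTCAAAAGAACA')]
    # The reverse-complemented promoter still matches, with the spacer reverse-complemented.
    print(find_consensus_sites(revcomp(promoter)))  # prints [(2, 'TGTTCTTTTGAACA')]
```

    The 14 bp length follows directly from the consensus: two 5 bp arms plus a 4 bp spacer.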

  7. The Verrucomicrobia LexA-Binding Motif: Insights into the Evolutionary Dynamics of the SOS Response.

    Science.gov (United States)

    Erill, Ivan; Campoy, Susana; Kılıç, Sefa; Barbé, Jordi

    2016-01-01

    The SOS response is the primary bacterial mechanism to address DNA damage, coordinating multiple cellular processes that include DNA repair, cell division, and translesion synthesis. In contrast to other regulatory systems, the composition of the SOS genetic network and the binding motif of its transcriptional repressor, LexA, have been shown to vary greatly across bacterial clades, making it an ideal system to study the co-evolution of transcription factors and their regulons. Leveraging comparative genomics approaches and prior knowledge of the core SOS regulon, here we define the binding motif of the Verrucomicrobia, a recently described phylum of emerging interest due to its association with eukaryotic hosts. Site-directed mutagenesis of the Verrucomicrobium spinosum recA promoter confirms that LexA binds a 14 bp palindromic motif with consensus sequence TGTTC-N4-GAACA. Computational analyses suggest that recognition of this novel motif is determined primarily by changes in base-contacting residues of the third alpha helix of the LexA helix-turn-helix DNA binding motif. In conjunction with comparative genomics analysis of the LexA regulon in the Verrucomicrobia phylum, electrophoretic shift assays reveal that LexA binds to operators in the promoter region of DNA repair genes and a mutagenesis cassette in this organism, and identify previously unreported components of the SOS response. The identification of tandem LexA-binding sites generating instances of other LexA-binding motifs in the lexA gene promoter of Verrucomicrobia species leads us to postulate a novel mechanism for LexA-binding motif evolution. This model, based on gene duplication, successfully addresses outstanding questions in the intricate co-evolution of the LexA protein, its binding motif and the regulatory network it controls.

  8. Methods and statistics for combining motif match scores.

    Science.gov (United States)

    Bailey, T L; Gribskov, M

    1998-01-01

    Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores, and evaluate each for search quality (classification accuracy) and for the accuracy of its estimate of statistical significance. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
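
    Method 3 needs the null distribution of a product of p-values. If the n per-motif p-values are independent and uniform on (0, 1] under the null, the standard result is Pr(product <= p) = p * sum_{i=0}^{n-1} (-ln p)^i / i!. The sketch below computes that quantity; the independence assumption and the function name are ours, and we have not checked this code against the MAST source.

```python
import math

def combined_pvalue(pvalues):
    """P-value of the product of n independent, uniformly distributed p-values.

    Uses Pr(prod <= p) = p * sum_{i<n} (-ln p)^i / i!  (assumes independence).
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    p = math.prod(pvalues)
    if p <= 0.0:
        return 0.0
    x = -math.log(p)
    term, total = 1.0, 1.0
    for i in range(1, n):
        term *= x / i          # builds (-ln p)^i / i! incrementally
        total += term
    return min(1.0, p * total)
```

    Note that adding an uninformative motif (p-value 1.0) raises the combined p-value, e.g. from 0.1 to about 0.33 for two motifs, because the product must now be judged against a two-term null.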

  9. Untangling hybrid phylogenetic signals: horizontal gene transfer and artifacts of phylogenetic reconstruction.

    Science.gov (United States)

    Beiko, Robert G; Ragan, Mark A

    2009-01-01

    Phylogenomic methods can be used to investigate the tangled evolutionary relationships among genomes. Building 'all the trees of all the genes' can potentially identify common pathways of horizontal gene transfer (HGT) among taxa at varying levels of phylogenetic depth. Phylogenetic affinities can be aggregated and merged with the information about genetic linkage and biochemical function to examine hypotheses of adaptive evolution via HGT. Additionally, the use of many genetic data sets increases the power of statistical tests for phylogenetic artifacts. However, large-scale phylogenetic analyses pose several challenges, including the necessary abandonment of manual validation techniques, the need to translate inferred phylogenetic discordance into inferred HGT events, and the difficulty of aggregating results from search-based inference methods. In this chapter we describe a tree search procedure to recover the most parsimonious pathways of HGT, and examine some of the assumptions that are made by this method.

  10. Climate-driven extinctions shape the phylogenetic structure of temperate tree floras.

    Science.gov (United States)

    Eiserhardt, Wolf L; Borchsenius, Finn; Plum, Christoffer M; Ordonez, Alejandro; Svenning, Jens-Christian

    2015-03-01

    When taxa go extinct, unique evolutionary history is lost. If extinction is selective, and the intrinsic vulnerabilities of taxa show phylogenetic signal, more evolutionary history may be lost than expected under random extinction. Under what conditions this occurs is insufficiently known. We show that late Cenozoic climate change induced phylogenetically selective regional extinction of northern temperate trees because of phylogenetic signal in cold tolerance, leading to significantly and substantially larger than random losses of phylogenetic diversity (PD). The surviving floras in regions that experienced stronger extinction are phylogenetically more clustered, indicating that non-random losses of PD are of increasing concern with increasing extinction severity. Using simulations, we show that a simple threshold model of survival given a physiological trait with phylogenetic signal reproduces our findings. Our results send a strong warning that we may expect future assemblages to be phylogenetically and possibly functionally depauperate if anthropogenic climate change affects taxa similarly. © 2015 John Wiley & Sons Ltd/CNRS.
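
    Phylogenetic diversity (PD) is commonly measured as Faith's PD: the total branch length of the part of the tree connecting a set of taxa. The sketch below uses the rooted convention (paths are followed up to the root); other conventions exist. The toy tree and branch lengths are hypothetical, chosen to show that losing a long, isolated lineage removes more PD than losing a close relative.

```python
def phylogenetic_diversity(parent, taxa):
    """Sum of branch lengths on the union of root-ward paths from `taxa`.

    `parent` maps node -> (parent_node, branch_length); the root has no entry.
    Each branch is counted once even if several taxa share it.
    """
    seen = set()
    total = 0.0
    for node in taxa:
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total

# Hypothetical tree ((A:1,B:1)X:2,C:3)root
TREE = {"A": ("X", 1.0), "B": ("X", 1.0), "X": ("root", 2.0), "C": ("root", 3.0)}
```

    With this tree, the full set {A, B, C} has PD 7; losing C leaves 4, whereas losing B leaves 6, so which taxa go extinct matters as much as how many.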

  11. Measuring fit of sequence data to phylogenetic model: gain of power using marginal tests.

    Science.gov (United States)

    Waddell, Peter J; Ota, Rissa; Penny, David

    2009-10-01

    Testing fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Such omissions discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (Unended quest: an intellectual autobiography. Fontana, London, 1976) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (Nature 297:197-200, 1982) to the present. We compare the general log-likelihood ratio statistic (the G or G² statistic) between the evolutionary tree model and the multinomial model with that of marginalized tests applied to an alignment (using placental mammal coding sequence data). It is seen that the most general test does not reject the fit of data to model (P ≈ 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (P < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (P < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa, and not necessarily among those expected from examining other regions of the genome. By marginalizing the 4^t patterns of the i.i.d. model to observed and expected parsimony counts, that is, from constant sites, to singletons, to parsimony-informative characters of a minimum possible length, the likelihood ratio test regains power, and it too rejects the evolutionary model with P < 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published trees may really be far larger than the analytical methods (e.g., bootstrap) report.
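
    The G statistic is G = 2 * sum_i O_i ln(O_i / E_i), where O_i and E_i are observed and model-expected counts per class. In the unmarginalized test the classes are the 4^t site patterns; marginalizing means summing patterns into fewer classes (for example, parsimony-count classes) before computing G. The sketch computes G only; choosing classes, obtaining expected counts from a fitted model, and the reference distribution are left out, and the example counts are hypothetical.

```python
import math

def g_statistic(observed, expected):
    """Log-likelihood ratio statistic G = 2 * sum O * ln(O / E).

    Cells with O == 0 contribute 0 (the limit of O ln O as O -> 0).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    g = 0.0
    for o, e in zip(observed, expected):
        if o > 0:
            if e <= 0:
                return math.inf    # observed a class the model says is impossible
            g += o * math.log(o / e)
    return 2.0 * g
```

    For instance, observing counts (20, 0) where the model expects (10, 10) gives G = 40 ln 2, about 27.7.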

  12. Phylogenetic comparative methods complement discriminant function analysis in ecomorphology.

    Science.gov (United States)

    Barr, W Andrew; Scott, Robert S

    2014-04-01

    In ecomorphology, Discriminant Function Analysis (DFA) has been used as evidence for the presence of functional links between morphometric variables and ecological categories. Here we conduct simulations of characters containing phylogenetic signal to explore the performance of DFA under a variety of conditions. Characters were simulated using a phylogeny of extant antelope species from known habitats. Characters were modeled with no biomechanical relationship to the habitat category; the only sources of variation were body mass, phylogenetic signal, or random "noise." DFA on the discriminability of habitat categories was performed using subsets of the simulated characters, and Phylogenetic Generalized Least Squares (PGLS) was performed for each character. Analyses were repeated with randomized habitat assignments. When simulated characters lacked phylogenetic signal and/or habitat assignments were random, […] ecomorphology. Copyright © 2013 Wiley Periodicals, Inc.
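
    PGLS fits a linear model while allowing residuals of related species to be correlated: beta = (X' V^-1 X)^-1 X' V^-1 y, where V is the phylogenetic covariance matrix. Under a Brownian-motion model, V_ij is commonly taken as the branch length shared by species i and j from the root. The sketch below assumes V is already built; the toy tree ((A:1,B:1):1,C:2) giving V = [[2,1,0],[1,2,0],[0,0,2]] is hypothetical and not from the study.

```python
import numpy as np

def pgls(X, y, V):
    """Generalized least squares coefficients beta = (X' V^-1 X)^-1 X' V^-1 y.

    X: (n, k) design matrix; y: (n,) trait values;
    V: (n, n) phylogenetic covariance (e.g. shared root-to-MRCA branch length).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    V = np.asarray(V, dtype=float)
    # Solve linear systems instead of forming V^-1 explicitly (more stable).
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
```

    With V equal to the identity matrix this reduces to ordinary least squares, which is a convenient sanity check.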

  13. Memetic algorithms for de novo motif-finding in biomedical sequences.

    Science.gov (United States)

    Bi, Chengpeng

    2012-09-01

    The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies are employed in the algorithm design, not only to explore the multiple sequence local alignment space efficiently but also to uncover the molecular signals effectively. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is implemented in C++ and exhaustively tested with various simulated and real biological sequences. The simulations show that MaMotif is the most time-efficient of the algorithms compared: it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, the new algorithm compares favorably with, or is superior to, the other algorithms. Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as correctly detect the palindromic segments in the primary micro
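
    A memetic algorithm is a genetic algorithm in which each offspring also undergoes local search ("individual learning"). The sketch below applies that idea to motif finding: a chromosome is the list of motif start positions, fitness is a simple consensus score, and local search re-places one site at a time for a bounded number of passes. It is a generic illustration under our own assumptions; the operators, parameters (`pop_size`, `generations`, `mutation_rate`) and scoring are hypothetical and are not MaMotif's.

```python
import random
from collections import Counter

def score(seqs, starts, w):
    """Consensus score: sum over motif columns of the most common base count."""
    sites = [s[p:p + w] for s, p in zip(seqs, starts)]
    return sum(Counter(col).most_common(1)[0][1] for col in zip(*sites))

def local_search(seqs, starts, w, max_passes=2):
    """Truncated hill climbing: re-place one site at a time, keep improvements."""
    starts = list(starts)
    best = score(seqs, starts, w)
    for _ in range(max_passes):
        improved = False
        for i, s in enumerate(seqs):
            for p in range(len(s) - w + 1):
                trial = starts[:i] + [p] + starts[i + 1:]
                t = score(seqs, trial, w)
                if t > best:
                    starts, best, improved = trial, t, True
        if not improved:
            break
    return starts, best

def memetic_motif(seqs, w, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Steady-state GA where every new individual is refined by local search."""
    rng = random.Random(seed)
    limits = [len(s) - w for s in seqs]

    def random_individual():
        return local_search(seqs, [rng.randint(0, m) for m in limits], w)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Binary tournament selection of two parents.
        a = max(rng.sample(population, 2), key=lambda ind: ind[1])
        b = max(rng.sample(population, 2), key=lambda ind: ind[1])
        # Uniform crossover on start positions, then random-reset mutation.
        child = [x if rng.random() < 0.5 else y for x, y in zip(a[0], b[0])]
        child = [rng.randint(0, m) if rng.random() < mutation_rate else p
                 for p, m in zip(child, limits)]
        child = local_search(seqs, child, w)      # individual learning step
        worst = min(range(pop_size), key=lambda k: population[k][1])
        if child[1] > population[worst][1]:
            population[worst] = child
    return max(population, key=lambda ind: ind[1])
```

    Usage: `memetic_motif(["ACGTTGACGTA", "TTTGACGTCCA", "GGATGACGTAA"], 5)` returns a (start positions, score) pair; a perfect alignment of the planted TGACG in all three sequences would score 15.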

  14. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    Science.gov (United States)

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26, 11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39, 18) and (40, 17). The paper serves as a useful guide to solving problems using this new accelerator technology.
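
    To make the (l, d) definition concrete, the sketch below does an exhaustive pattern-driven search: it tries every l-mer over the alphabet and keeps those found, with at most d substitutions, in at least q sequences. Its cost grows as 4^l, which is exactly why large instances need specialized approaches such as the automata-based one described here; this toy version is only practical for small l and is not the paper's algorithm.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs(motif: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))

def ld_motifs(seqs, l, d, q=None, alphabet="ACGT"):
    """All l-mers occurring with <= d substitutions in at least q sequences
    (q defaults to all sequences)."""
    q = len(seqs) if q is None else q
    hits = []
    for letters in product(alphabet, repeat=l):
        motif = "".join(letters)
        if sum(occurs(motif, s, d) for s in seqs) >= q:
            hits.append(motif)
    return hits
```

    For the toy input `["AAACGTAA", "TTACGTTT", "GGACCTGG"]`, ACGT is a (4, 1) motif present in all three sequences, but with d = 0 it appears in only two.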

  15. PISMA: A Visual Representation of Motif Distribution in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Rogelio Alcántara-Silva

    2017-03-01

    Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypotheses, PISMA aims at identifying motifs on DNA sequences, counting them, and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a barcode-like map, as a gene-map-like scheme, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. We also present two analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands, and another of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it runs on any type of hardware and on diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf .
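
    The core of a motif-distribution display is locating every occurrence and mapping positions onto a fixed-width strip. PISMA itself is written in Java; the Python sketch below only illustrates the idea, and the function names and the text "barcode" rendering (one character per bin) are our own assumptions.

```python
def motif_positions(seq: str, motif: str):
    """0-based start positions of every exact, possibly overlapping, occurrence."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)   # step by 1 to allow overlaps
    return positions

def barcode(seq: str, motif: str, width: int = 60) -> str:
    """Text bar code: '|' marks bins containing at least one motif start."""
    bins = ["."] * width
    for p in motif_positions(seq, motif):
        bins[p * width // len(seq)] = "|"
    return "".join(bins)
```

    For example, `motif_positions("ACGCGTCG", "CG")` finds CpG sites at 1, 3 and 6, and a width-8 barcode of the same sequence renders them as `.|.|..|.`.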

  16. Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network

    Directory of Open Access Journals (Sweden)

    Barabási Albert-László

    2004-01-01

    Abstract Background Transcriptional regulation of cellular functions is carried out through a complex network of interactions among transcription factors and the promoter regions of genes and operons regulated by them. To better understand the system-level function of such networks, simplification of their architecture was previously achieved by identifying the motifs present in the network, which are small, overrepresented, topologically distinct regulatory interaction patterns (subgraphs). However, the interaction of such motifs with each other, and their form of integration into the full network, has not been previously examined. Results By studying the transcriptional regulatory network of the bacterium Escherichia coli, we demonstrate that the two previously identified motif types in the network (i.e., feed-forward loops and bi-fan motifs) do not exist in isolation, but rather aggregate into homologous motif clusters that largely overlap with known biological functions. Moreover, these clusters further coalesce into a supercluster, thus establishing distinct topological hierarchies that show global statistical properties similar to the whole network. Targeted removal of motif links disintegrates the network into small, isolated clusters, while random disruptions of an equal number of links do not cause such an effect. Conclusion Individual motifs aggregate into homologous motif clusters and a supercluster forming the backbone of the E. coli transcriptional regulatory network and play a central role in defining its global topological organization.
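
    A feed-forward loop is the three-node subgraph X→Y, Y→Z, X→Z. The sketch below enumerates such loops in a directed regulator→target edge list; it does not test for overrepresentation against randomized networks, count bi-fans, or build clusters. The node names are hypothetical.

```python
from collections import defaultdict

def feed_forward_loops(edges):
    """Return sorted (x, y, z) triples with x->y, y->z and x->z, nodes distinct."""
    out = defaultdict(set)
    for a, b in edges:
        if a != b:                      # ignore self-regulation
            out[a].add(b)
    loops = []
    for x in list(out):
        for y in out[x]:
            for z in out.get(y, ()):    # .get avoids creating empty entries
                if z != x and z in out[x]:
                    loops.append((x, y, z))
    return sorted(loops)
```

    Two loops that share an edge, such as (X, Y, Z) and (Y, Z, W) in the test data, are the kind of overlap that lets individual motifs aggregate into larger clusters.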

  17. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to much existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., on the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate, or combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation, handles 3 orders of magnitude longer sequences, and scales up to 16,384 cores on a supercomputer. Copyright is held by the owner/author(s).
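
    One simple way to process a very long sequence in contiguous blocks is to extend each block by k - 1 characters so that occurrences spanning a boundary are counted exactly once; blocks can then be handled independently and their counts merged. The sketch below applies this to exact-length motifs with a minimum frequency. It is a generic illustration under our own assumptions (exact matches, an in-memory string, a hypothetical default `block_size`), not ACME's combinatorial algorithm, which also handles approximate motifs.

```python
from collections import Counter

def count_kmers_blocked(seq: str, k: int, block_size: int) -> Counter:
    """Count every length-k substring, scanning seq in contiguous blocks.

    Each block is read with k - 1 extra characters, but only windows that
    start inside the block are counted, so nothing is counted twice.
    """
    counts = Counter()
    for start in range(0, len(seq), block_size):
        chunk = seq[start:start + block_size + k - 1]
        for i in range(min(block_size, len(chunk) - k + 1)):
            counts[chunk[i:i + k]] += 1
    return counts

def frequent_motifs(seq: str, k: int, min_support: int, block_size: int = 1 << 16):
    """Exact-length motifs occurring at least min_support times."""
    counts = count_kmers_blocked(seq, k, block_size)
    return {m: c for m, c in counts.items() if c >= min_support}
```

    The result does not depend on `block_size`; for example, every block size from 1 to 12 gives the same 3-mer counts for ABRACADABRA, where ABR and BRA each occur twice.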

  18. Do motifs reflect evolved function?--No convergent evolution of genetic regulatory network subgraph topologies.

    Science.gov (United States)

    Knabe, Johannes F; Nehaniv, Chrystopher L; Schilstra, Maria J

    2008-01-01

    Methods that analyse the topological structure of networks have recently become quite popular. Whether motifs (subgraph patterns that occur more often than in randomized networks) have specific functions as elementary computational circuits has been a matter of debate. As the question is difficult to resolve with currently available biological data, we approach the issue using networks that abstractly model natural genetic regulatory networks (GRNs) that are evolved to show dynamical behaviors. Specifically, one group of networks was evolved to be capable of exhibiting two different behaviors ("differentiation"), in contrast to a group with a single target behavior. In both groups, we find motif distribution differences within the groups to be larger than differences between them, indicating that evolutionary niches (target functions) do not necessarily mold network structure uniquely. These results show that variability operators can have a stronger influence on network topologies than selection pressures, especially when many topologies can create similar dynamics. Moreover, analysis of motif functional relevance by lesioning did not suggest that motifs were of greater importance to the functioning of the network than arbitrary subgraph patterns. Only when drastically restricting network size, so that one motif corresponds to a whole functionally evolved network, was preference for particular connection patterns found. This suggests that in non-restricted, bigger networks, entanglement with the rest of the network hinders topological subgraph analysis.

  19. Evaluation of properties over phylogenetic trees using stochastic logics.

    Science.gov (United States)

    Requeno, José Ignacio; Colom, José Manuel

    2016-06-14

    Model checking has been recently introduced as an integrated framework for extracting information from phylogenetic trees using temporal logics as a querying language; temporal logics are an extension of modal logics that impose restrictions of a boolean formula along a path of events. The phylogenetic tree is considered a transition system modeling the evolution as a sequence of genomic mutations (we understand mutation as the different ways in which DNA can be changed), and these logics are suitable for traversing it in a strict and exhaustive way. Given a biological property that we desire to inspect over the phylogeny, the verifier returns true if the specification is satisfied, or otherwise a counterexample that falsifies it. However, this approach has only been considered over qualitative aspects of the phylogeny. In this paper, we address the limitations of the previous framework by including and handling quantitative information such as explicit time or probability. To this end, we apply current probabilistic continuous-time extensions of model checking to phylogenetics. We reinterpret a catalog of qualitative properties in a numerical way, and we also present new properties that could not be analyzed before. For instance, we obtain the likelihood of a tree topology according to a mutation model. As a case study, we analyze several phylogenies in order to obtain the maximum likelihood with the model checking tool PRISM. In addition, we have adapted the software for optimizing the computation of maximum likelihoods. We have shown that probabilistic model checking is a competitive framework for describing and analyzing quantitative properties over phylogenetic trees. This formalism adds soundness and readability to the definition of models and specifications. Besides, the existence of model checking tools hides the underlying technology, sparing biologists the extension, upgrade, debugging and maintenance of a software tool. A set of benchmarks justify the feasibility of our
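
    For comparison, the likelihood of a tree topology under a mutation model is conventionally computed with Felsenstein's pruning algorithm. The sketch below does this for one alignment column under the Jukes-Cantor (JC69) model, with uniform root frequencies and branch lengths in expected substitutions per site. It is the standard dynamic-programming approach, not the probabilistic model-checking formulation used in the paper; the nested-list tree encoding is our own assumption.

```python
import math

BASES = "ACGT"

def jc69(t: float, i: str, j: str) -> float:
    """Jukes-Cantor probability of state i becoming j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partials(node):
    """Conditional likelihoods L[b] of the data below node, given state b at node.

    A leaf is its observed base (a str); an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {b: 1.0 if b == node else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        child_l = partials(child)
        for b in BASES:
            result[b] *= sum(jc69(t, b, c) * child_l[c] for c in BASES)
    return result

def site_likelihood(tree) -> float:
    """Likelihood of one alignment column with uniform root frequencies."""
    root = partials(tree)
    return sum(0.25 * root[b] for b in BASES)
```

    A three-leaf column could be written as `[([("A", 0.1), ("A", 0.1)], 0.05), ("G", 0.2)]`; the alignment likelihood is the product of per-site values (in practice, a sum of logs). As a sanity check, the likelihoods of all 16 possible two-leaf columns sum to 1.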

  20. Nonbinary Tree-Based Phylogenetic Networks.

    Science.gov (United States)

    Jetten, Laura; van Iersel, Leo

    2018-01-01

    Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can, for example, represent gene transfer events. Such phylogenetic networks are called tree-based. Here, we consider two possible generalizations of this concept to nonbinary networks, which we call tree-based and strictly-tree-based nonbinary phylogenetic networks. We give simple graph-theoretic characterizations of tree-based and strictly-tree-based nonbinary phylogenetic networks. Moreover, we show for each of these two classes that it can be decided in polynomial time whether a given network is contained in the class. Our approach also provides a new view on tree-based binary phylogenetic networks. Finally, we discuss two examples of nonbinary phylogenetic networks in biology and show how our results can be applied to them.

  1. Exopolysaccharide-associated protein sorting in environmental organisms: the PEP-CTERM/EpsH system. Application of a novel phylogenetic profiling heuristic

    Directory of Open Access Journals (Sweden)

    Ward Naomi

    2006-08-01

    Abstract Background Protein translocation to the proper cellular destination may be guided by various classes of sorting signals recognizable in the primary sequence. Detection in some genomes, but not others, may reveal sorting system components by comparison of the phylogenetic profile of the class of sorting signal to that of various protein families. Results We describe a short C-terminal homology domain, sporadically distributed in bacteria, with several key characteristics of protein sorting signals. The domain includes a near-invariant motif Pro-Glu-Pro (PEP). This possible recognition or processing site is followed by a predicted transmembrane helix and a cluster rich in basic amino acids. We designate this domain PEP-CTERM. It tends to occur multiple times in a genome if it occurs at all, with a median count of eight instances; Verrucomicrobium spinosum has sixty-five. PEP-CTERM-containing proteins generally contain an N-terminal signal peptide and exhibit high diversity and little homology to known proteins. All bacteria with PEP-CTERM have both an outer membrane and exopolysaccharide (EPS) production genes. By a simple heuristic for screening phylogenetic profiles in the absence of pre-formed protein families, we discovered that a homolog of the membrane protein EpsH (exopolysaccharide locus protein H) occurs in a species when PEP-CTERM domains are found. The EpsH family contains invariant residues consistent with a transpeptidase function. Most PEP-CTERM proteins are encoded by single-gene operons preceded by large intergenic regions. In the Proteobacteria, most of these upstream regions share a DNA sequence, a probable cis-regulatory site that contains a sigma-54 binding motif. The phylogenetic profile for this DNA sequence exactly matches that of three proteins: a sigma-54-interacting response regulator (PrsR), a transmembrane histidine kinase (PrsK), and a TPR protein (PrsT). Conclusion These findings are consistent with the hypothesis
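
    A phylogenetic profile is a presence/absence vector over a fixed list of genomes, and the comparison step described above looks for features whose profiles match exactly. The sketch below shows only that final comparison. It assumes the candidate groups (here called families) and their genome memberships are already available, which is precisely the step the authors' heuristic avoids, so it is not their method; all names are hypothetical.

```python
def profile(genomes_with_feature, all_genomes):
    """Binary presence/absence vector over a fixed genome order."""
    return tuple(g in genomes_with_feature for g in all_genomes)

def matching_families(query_genomes, families, all_genomes):
    """Names of families whose profile exactly equals the query's profile."""
    target = profile(query_genomes, all_genomes)
    return sorted(name for name, members in families.items()
                  if profile(members, all_genomes) == target)
```

    For example, with genomes g1 to g4 and a domain found only in g1 and g3, a family present in exactly g1 and g3 matches, while one also present in g2 does not.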

  2. Phylogenetic analysis reveals conservation and diversification of micro RNA166 genes among diverse plant species.

    Science.gov (United States)

    Barik, Suvakanta; SarkarDas, Shabari; Singh, Archita; Gautam, Vibhav; Kumar, Pramod; Majee, Manoj; Sarkar, Ananda K

    2014-01-01

    As with the majority of microRNAs, mature miR166s are derived from multiple members of MIR166 genes (precursors) and regulate various aspects of plant development by negatively regulating their target genes (Class III HD-ZIP). The evolutionary conservation or functional diversification of miRNA166 family members remains elusive. Here, we show the phylogenetic relationships among MIR166 precursor and mature sequences from three diverse model plant species. Despite strong conservation, some mature miR166 sequences, such as ppt-miR166m, have undergone sequence variation. Critical sequence variation in ppt-miR166m has led to functional diversification, as it targets non-HD-ZIPIII gene transcript(s). MIR166 precursor sequences have diverged in a lineage-specific manner, and both precursors and mature osa-miR166i/j are highly conserved. Interestingly, polycistronic MIR166s were present in Physcomitrella and Oryza but not in Arabidopsis. The nature of cis-regulatory motifs on the upstream promoter sequences of MIR166 genes indicates their possible contribution to the functional variation observed among miR166 species. Copyright © 2013 Elsevier Inc. All rights reserved.

  3. Conserved Functional Motifs and Homology Modeling to Predict Hidden Moonlighting Functional Sites

    KAUST Repository

    Wong, Aloysius Tze

    2015-06-09

    Moonlighting functional centers within proteins can provide them with hitherto unrecognized functions. Here, we review how hidden moonlighting functional centers, which we define as binding sites that have catalytic activity or regulate protein function in a novel manner, can be identified using targeted bioinformatic searches. Functional motifs used in such searches consist of amino acid residues that are conserved across species, many of which have been assigned functional roles based on experimental evidence. Molecules identified in this manner in searches for cyclic mononucleotide cyclases in plants are used as examples. The strength of this computational approach is enhanced when good homology models can be developed to test the functionality of the predicted centers in silico, which, in turn, increases confidence in the ability of the identified candidates to perform the predicted functions. Computational characterization of moonlighting functional centers is not diagnostic for catalysis but serves as a rapid screening method, and highlights testable targets from a potentially large pool of candidates for subsequent in vitro and in vivo experiments required to confirm the functionality of the predicted moonlighting centers.

  4. Conserved Functional Motifs and Homology Modeling to Predict Hidden Moonlighting Functional Sites

    KAUST Repository

    Wong, Aloysius Tze; Gehring, Christoph A; Irving, Helen R.

    2015-01-01

    Moonlighting functional centers within proteins can provide them with hitherto unrecognized functions. Here, we review how hidden moonlighting functional centers, which we define as binding sites that have catalytic activity or regulate protein function in a novel manner, can be identified using targeted bioinformatic searches. Functional motifs used in such searches consist of amino acid residues that are conserved across species, many of which have been assigned functional roles based on experimental evidence. Molecules identified in this manner in searches for cyclic mononucleotide cyclases in plants are used as examples. The strength of this computational approach is enhanced when good homology models can be developed to test the functionality of the predicted centers in silico, which, in turn, increases confidence in the ability of the identified candidates to perform the predicted functions. Computational characterization of moonlighting functional centers is not diagnostic for catalysis but serves as a rapid screening method, and highlights testable targets from a potentially large pool of candidates for subsequent in vitro and in vivo experiments required to confirm the functionality of the predicted moonlighting centers.
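
    Targeted searches of this kind typically express a functional motif as a pattern over conserved residues and scan protein sequences for matches. The sketch below translates a subset of PROSITE-style pattern syntax (elements joined by '-', x for any residue, [..] allowed residues, {..} excluded residues, (n) or (n,m) repeats, '<' and '>' for the N and C termini) into a Python regular expression. The example pattern `[RK]-x(2)-{P}-C` is hypothetical and is not one of the cyclase motifs used by the authors.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (supported subset) into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        m = ELEMENT.fullmatch(element.strip("<>"))
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        if lo is not None:
            core += f"{{{lo}}}" if hi is None else f"{{{lo},{hi}}}"
        parts.append(("^" if anchor_start else "") + core + ("$" if anchor_end else ""))
    return "".join(parts)
```

    Usage: `re.search(prosite_to_regex("[RK]-x(2)-{P}-C"), "MAKLLGCW")` matches `KLLGC`. A sequence hit is only a candidate; as the abstract stresses, homology modeling and experiments are needed to support a predicted function.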

  5. Dynamic motifs in socio-economic networks

    Science.gov (United States)

    Zhang, Xin; Shao, Shuai; Stanley, H. Eugene; Havlin, Shlomo

    2014-12-01

    Socio-economic networks are of central importance in economic life. We develop a method of identifying and studying motifs in socio-economic networks by focusing on “dynamic motifs,” i.e., evolutionary connection patterns that, because of “node acquaintances” in the network, occur much more frequently than random patterns. We examine two evolving bipartite networks: i) the world-wide commercial ship chartering market and ii) the ship build-to-order market. We find similar dynamic motifs in both bipartite networks, even though they describe different economic activities. We also find that “influence” and “persistence” are strong factors in the interaction behavior of organizations. When two companies are doing business with the same customer, it is highly probable that another customer who currently has a business relationship with only one of these two companies will become a customer of the other in the future. This is the effect of influence. Persistence means that companies with close business ties to customers tend to maintain their relationships over a long period of time.

  6. DendroPy: a Python library for phylogenetic computing.

    Science.gov (United States)

    Sukumaran, Jeet; Holder, Mark T

    2010-06-15

    DendroPy is a cross-platform library for the Python programming language that provides for object-oriented reading, writing, simulation and manipulation of phylogenetic data, with an emphasis on phylogenetic tree operations. DendroPy uses a splits-hash mapping to perform rapid calculations of tree distances, similarities and shape under various metrics. It contains rich simulation routines to generate trees under a number of different phylogenetic and coalescent models. DendroPy's data simulation and manipulation facilities, in conjunction with its support of a broad range of phylogenetic data formats (NEXUS, Newick, PHYLIP, FASTA, NeXML, etc.), allow it to serve a useful role in various phyloinformatics and phylogeographic pipelines. The stable release of the library is available for download and automated installation through the Python Package Index site (http://pypi.python.org/pypi/DendroPy), while the active development source code repository is available to the public from GitHub (http://github.com/jeetsukumaran/DendroPy).
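
    The splits-hash idea behind fast tree comparison can be shown without the library: represent each internal edge of an unrooted tree by the leaf bipartition it induces, store those bipartitions in a hash set, and take the size of the symmetric difference as the Robinson-Foulds distance. The pure-Python sketch below does this for trees written as nested tuples. It does not use or reproduce DendroPy's API (DendroPy provides its own tree-comparison routines); the tuple encoding and function names are our own.

```python
def leaves(tree):
    """Set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def splits(tree):
    """Non-trivial bipartitions of the unrooted tree, as hashable frozensets.

    Each split is stored as the side that excludes a fixed reference leaf,
    so the same bipartition always maps to the same hash key.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    result = set()

    def walk(node):
        below = leaves(node)
        if not isinstance(node, str):
            for child in node:
                walk(child)
        side = below if ref not in below else all_leaves - below
        if 1 < len(side) < len(all_leaves) - 1:   # skip trivial splits
            result.add(side)

    walk(tree)
    return result

def robinson_foulds(t1, t2) -> int:
    """Symmetric-difference (Robinson-Foulds) distance between two trees."""
    return len(splits(t1) ^ splits(t2))
```

    Because splits ignore rooting and child order, `((A,B),(C,(D,E)))` and `(((A,B),C),(D,E))` are at distance 0, whereas the two resolutions `((A,B),(C,D))` and `((A,C),(B,D))` are at distance 2.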

  7. Assessing local structure motifs using order parameters for motif recognition, interstitial identification, and diffusion path characterization

    Science.gov (United States)

    Zimmermann, Nils E. R.; Horton, Matthew K.; Jain, Anubhav; Haranczyk, Maciej

    2017-11-01

    Structure-property relationships form the basis of many design rules in materials science, including synthesizability and long-term stability of catalysts, control of electrical and optoelectronic behavior in semiconductors, as well as the capacity of and transport properties in cathode materials for rechargeable batteries. The immediate atomic environments (i.e., the first coordination shells) of a few atomic sites are often a key factor in achieving a desired property. Some of the most frequently encountered coordination patterns are tetrahedra, octahedra, body- and face-centered cubic, as well as hexagonal close-packed-like environments. Here, we showcase the usefulness of local order parameters to identify these basic structural motifs in inorganic solid materials by developing classification criteria. We introduce a systematic testing framework, the Einstein crystal test rig, that probes the response of order parameters to distortions in perfect motifs to validate our approach. Subsequently, we highlight three important application cases. First, we map basic crystal structure information of a large materials database in an intuitive manner by screening the Materials Project (MP) database (61,422 compounds) for element-specific motif distributions. Second, we use the structure-motif recognition capabilities to automatically find interstitials in metals, semiconductors, and insulators. Our Interstitialcy Finding Tool (InFiT) facilitates high-throughput screenings of defect properties. Third, the order parameters are reliable and compact quantitative structure descriptors for characterizing diffusion hops of intercalants, as our example of magnesium in MnO2-spinel indicates. Finally, the tools developed in our work are readily and freely available as software implementations in the pymatgen library, and we expect them to be further applied to machine-learning approaches for emerging applications in materials science.
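
    As an example of a local order parameter, the tetrahedral order parameter of Errington and Debenedetti is q = 1 - (3/8) * sum_{j<k} (cos psi_jk + 1/3)^2, where psi_jk is the angle between the bonds from a central site to neighbors j and k. It equals 1 for a perfect tetrahedron and drops as the environment distorts. The numpy sketch below computes it for exactly four neighbors; it illustrates the concept only and is not claimed to match the definitions or normalizations implemented in pymatgen.

```python
import itertools
import numpy as np

def tetrahedral_order(center, neighbors):
    """Errington-Debenedetti tetrahedral order parameter for four neighbors.

    q = 1 - (3/8) * sum_{j<k} (cos psi_jk + 1/3)^2; q = 1 for a perfect tetrahedron.
    """
    bonds = np.asarray(neighbors, dtype=float) - np.asarray(center, dtype=float)
    if bonds.shape != (4, 3):
        raise ValueError("expected exactly four 3-D neighbor positions")
    unit = bonds / np.linalg.norm(bonds, axis=1, keepdims=True)
    total = sum((float(unit[j] @ unit[k]) + 1.0 / 3.0) ** 2
                for j, k in itertools.combinations(range(4), 2))
    return 1.0 - 3.0 / 8.0 * total
```

    For comparison, a square-planar arrangement of four neighbors gives q = 0.5, so the parameter separates the two coordination motifs cleanly.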

  8. Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins

    Science.gov (United States)

    Kinjo, Akira R.; Nakamura, Haruki

    2012-01-01

    Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that functional similarity can be better inferred from composite motif similarity than from the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs, each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures. PMID:22347478

  9. Probing structural changes of self assembled i-motif DNA

    KAUST Repository

    Lee, Iljoon; Patil, Sachin; Fhayli, Karim; Alsaiari, Shahad K.; Khashab, Niveen M.

    2015-01-01

    We report an i-motif structural probing system based on Thioflavin T (ThT) as a fluorescent sensor. This probe can discriminate the structural changes of RET and Rb i-motif sequences according to pH change.

  10. Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated

    Directory of Open Access Journals (Sweden)

    Down Thomas A

    2010-09-01

    Background: DNA methylation can regulate gene expression by modulating the interaction between DNA and proteins or protein complexes. Conserved consensus motifs exist across the human genome ("predicted transcription factor binding sites", or "predicted TFBS"), but chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) shows that the large majority of these are not biological transcription factor binding sites ("empirical TFBS"). We hypothesize that DNA methylation at conserved consensus motifs prevents promiscuous or disorderly transcription factor binding. Results: Using genome-wide methylation maps of the human heart and sperm, we found that all conserved consensus motifs, as well as the subset of those that reside outside CpG islands, have an aggregate profile of hyper-methylation. In contrast, empirical TFBS with conserved consensus motifs have a profile of hypo-methylation. 40% of empirical TFBS with conserved consensus motifs resided in CpG islands, whereas only 7% of all conserved consensus motifs were in CpG islands. Finally, we identified a minority subset of TF whose profiles are either hypo-methylated or neutral at their respective conserved consensus motifs, implying that these TF may be responsible for establishing or maintaining an un-methylated DNA state, or that their binding is not regulated by DNA methylation. Conclusions: Our analysis supports the hypothesis that, at least for a subset of TF, empirical binding to conserved consensus motifs genome-wide may be controlled by DNA methylation.

  11. BlockLogo: Visualization of peptide and sequence motif conservation

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian

    2013-01-01

    BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, se...
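The abstract is truncated before it defines "block entropy". As a rough sketch, assuming it means the Shannon entropy of the residue combinations found at a chosen (possibly discontinuous) set of alignment columns, the calculation could look like this. The function name and this interpretation are assumptions, not BlockLogo's documented formula.

```python
import math
from collections import Counter


def block_entropy(alignment, positions):
    """Shannon entropy, in bits, of the residue blocks at `positions`.

    Each aligned sequence contributes one block: the residues at the given
    (possibly non-contiguous) columns, read together as a single symbol.
    """
    blocks = ["".join(seq[i] for i in positions) for seq in alignment]
    n = len(blocks)
    counts = Counter(blocks)
    # H = sum over distinct blocks of p * log2(1/p)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


aln = ["ACDE", "ACDF", "GCDE", "GCDF"]
print(block_entropy(aln, (1, 2)))  # 0.0: every sequence has block "CD"
print(block_entropy(aln, (0, 3)))  # 2.0: four equally frequent blocks
```

Unlike per-column entropy, scoring the block jointly distinguishes residue combinations that co-occur in the same sequences from ones that merely appear independently.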

  12. Estimating phylogenetic trees from genome-scale data.

    Science.gov (United States)

    Liu, Liang; Xi, Zhenxiang; Wu, Shaoyuan; Davis, Charles C; Edwards, Scott V

    2015-12-01

    The heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. Phylogenetic methods known as "species tree" methods have been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Here we review theory and empirical examples that help clarify conflicts between species tree and concatenation methods, and misconceptions in the literature about the performance of species tree methods. Considering concatenation as a special case of the multispecies coalescent model helps explain differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences and long-branch attraction. We show that approaches, such as binning, designed to augment the signal in species tree analyses can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods incorporating biological realism are a key to phylogenetic analysis of whole-genome data. © 2015 New York Academy of Sciences.

  13. Visualizing phylogenetic tree landscapes.

    Science.gov (United States)

    Wilgenbusch, James C; Huang, Wen; Gallivan, Kyle A

    2017-02-02

    Genomic-scale sequence alignments are increasingly used to infer phylogenies in order to better understand the processes and patterns of evolution. Different partitions within these new alignments (e.g., genes, codon positions, and structural features) often favor hundreds if not thousands of competing phylogenies. Summarizing and comparing phylogenies obtained from multi-source data sets using current consensus tree methods discards valuable information and can disguise potential methodological problems. Efficient and accurate dimensionality reduction methods that display the relationships among these competing phylogenies at once in two or three dimensions will help practitioners diagnose the limits of current evolutionary models and potential problems with phylogenetic reconstruction methods when analyzing large multi-source data sets. We introduce several dimensionality reduction methods to visualize in two and three dimensions the relationships among competing phylogenies obtained from gene partitions found in three mid- to large-size mitochondrial genome alignments. We test the performance of these dimensionality reduction methods by applying several goodness-of-fit measures. The intrinsic dimensionality of each data set is also estimated to determine whether projections in two and three dimensions can be expected to reveal meaningful relationships among trees from different data partitions. Several new approaches to aid in the comparison of different phylogenetic landscapes are presented. Curvilinear Components Analysis (CCA) and a stochastic gradient descent (SGD) optimization method give the best representation of the original tree-to-tree distance matrix for each of the three mitochondrial genome alignments and greatly outperform the method currently used to visualize tree landscapes. The CCA + SGD method converged at least as fast as previously applied methods for visualizing tree landscapes. We demonstrate for all three mtDNA alignments that 3D
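The projections above start from a tree-to-tree distance matrix. The abstract does not name the metric, so as an assumption this sketch uses the Robinson-Foulds distance (the number of bipartitions present in exactly one of two trees), with trees written as nested Python tuples. The projection step itself (CCA, SGD, or any other method) is not shown.

```python
def leaf_set(node):
    """Taxa below a node of a tree written as nested tuples of taxon names."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def splits(tree):
    """Nontrivial bipartitions of the (unrooted) tree.

    Each split is stored as the side NOT containing a fixed reference taxon,
    so the same bipartition always gets the same key.
    """
    taxa = leaf_set(tree)
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = below if ref not in below else taxa - below
        if 2 <= len(side) <= len(taxa) - 2:  # both sides need >= 2 taxa
            found.add(side)
        return below

    walk(tree)
    return found


def rf_distance(t1, t2):
    """Robinson-Foulds distance: splits found in exactly one of the two trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must share the same taxa")
    return len(splits(t1) ^ splits(t2))


# Three hypothetical trees, e.g. inferred from different gene partitions.
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    ((("A", "B"), "E"), ("D", "C")),
]
matrix = [[rf_distance(a, b) for b in trees] for a in trees]
print(matrix)  # [[0, 2, 2], [2, 0, 4], [2, 4, 0]]
```

A matrix like this is the input a dimensionality reduction method would try to reproduce with distances between points in two or three dimensions.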

  14. Genome Analysis of Conserved Dehydrin Motifs in Vascular Plants

    Directory of Open Access Journals (Sweden)

    Ahmad A. Malik

    2017-05-01

    Dehydrins, a large family of abiotic stress proteins, are defined by the presence of a mostly conserved motif known as the K-segment, and may also contain two other conserved motifs known as the Y-segment and S-segment. Using the dehydrin literature, we developed a sequence motif definition of the K-segment, which we used to create a large dataset of dehydrin sequences by searching the Pfam00257 dehydrin dataset and the Phytozome 10 sequences of vascular plants. A comprehensive analysis of these sequences reveals that lysine residues are highly conserved in the K-segment, while the amino acid type is often conserved at other positions. Despite the Y-segment name, the central tyrosine is only somewhat conserved and can be substituted with two other small aromatic amino acids (phenylalanine or histidine). The S-segment contains a series of serine residues, but in some proteins it is also preceded by a conserved LHR sequence. In many dehydrins containing all three of these motifs, the S-segment is linked to the K-segment by a GXGGRRKK motif (where X can be any amino acid), suggesting a functional linkage between these two motifs. An analysis of the sequences shows that the dehydrin architecture and several biochemical properties (isoelectric point, molecular mass, and hydrophobicity score) are dependent on each other, and that some dehydrin architectures are overexpressed during certain abiotic stresses, suggesting that they may be optimized for a specific abiotic stress while others are involved in all forms of dehydration stress (drought, cold, and salinity).
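The GXGGRRKK linker described above translates directly into a regular expression, with X matching any single residue. A minimal sketch using Python's standard `re` module; the function name and the toy sequence are hypothetical, not taken from the paper's dataset.

```python
import re

# G, any one residue (X), then GGRRKK: the S-/K-segment linker described above.
LINKER = re.compile(r"G[A-Z]GGRRKK")


def find_linkers(protein):
    """0-based start positions of GXGGRRKK matches in a protein sequence."""
    return [m.start() for m in LINKER.finditer(protein.upper())]


print(find_linkers("MSSGAGGRRKKEE"))  # [3]  (toy sequence, not a real dehydrin)
```

The pattern cannot overlap with itself, so `finditer` reports every occurrence without needing a lookahead.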

  15. RegRNA: an integrated web server for identifying regulatory RNA motifs and elements

    OpenAIRE

    Huang, Hsi-Yuan; Chien, Chia-Hung; Jen, Kuan-Hua; Huang, Hsien-Da

    2006-01-01

    Numerous regulatory structural motifs have been identified as playing essential roles in transcriptional and post-transcriptional regulation of gene expression. RegRNA is an integrated web server for identifying the homologs of regulatory RNA motifs and elements against an input mRNA sequence. Both sequence homologs and structural homologs of regulatory RNA motifs can be recognized. The regulatory RNA motifs supported in RegRNA are categorized into several classes: (i) motifs in mRNA 5′-untra...

  16. Transforming phylogenetic networks: Moving beyond tree space

    OpenAIRE

    Huber, Katharina T.; Moulton, Vincent; Wu, Taoyang

    2016-01-01

    Phylogenetic networks are a generalization of phylogenetic trees that are used to represent reticulate evolution. Unrooted phylogenetic networks form a special class of such networks, which naturally generalize unrooted phylogenetic trees. In this paper we define two operations on unrooted phylogenetic networks, one of which is a generalization of the well-known nearest-neighbor interchange (NNI) operation on phylogenetic trees. We show that any unrooted phylogenetic network can be transforme...

  17. Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae

    Directory of Open Access Journals (Sweden)

    Christian J. Michel

    2017-12-01

    A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. These results, taken together

  18. Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.

    Science.gov (United States)

    Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D

    2017-12-03

    A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. These results, taken together, represent the first
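The frame comparison above (reading frame versus the two shifted frames) can be sketched as a simple count: each trinucleotide starting at position i is assigned to frame i mod 3. This is a simplification under stated assumptions: it counts single trinucleotides rather than the paper's X motifs of a given cardinality and length, and the toy three-trinucleotide set below is illustrative only; a real analysis would pass the full 20-trinucleotide code X.

```python
def frame_counts(seq, code):
    """Occurrences of trinucleotides from `code` in frames 0, 1 and 2.

    A trinucleotide starting at 0-based position i is assigned to frame i % 3,
    so frame 0 is the reading frame when `seq` starts at a codon boundary.
    """
    seq = seq.upper()
    counts = [0, 0, 0]
    for i in range(len(seq) - 2):
        if seq[i:i + 3] in code:
            counts[i % 3] += 1
    return counts


toy_code = {"GAA", "GTC", "TTC"}              # toy set for illustration only
print(frame_counts("GAAGTCTTC", toy_code))    # [3, 0, 0]
print(frame_counts("AGAAGTCTTC", toy_code))   # [0, 3, 0]
```

Prepending a single base moves every hit into frame 1, which is the kind of frame shift a circular code is supposed to let a reader detect and correct.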

  19. Assessing Local Structure Motifs Using Order Parameters for Motif Recognition, Interstitial Identification, and Diffusion Path Characterization

    Directory of Open Access Journals (Sweden)

    Nils E. R. Zimmermann

    2017-11-01

    Structure–property relationships form the basis of many design rules in materials science, including synthesizability and long-term stability of catalysts, control of electrical and optoelectronic behavior in semiconductors, as well as the capacity of and transport properties in cathode materials for rechargeable batteries. The immediate atomic environments (i.e., the first coordination shells) of a few atomic sites are often a key factor in achieving a desired property. Some of the most frequently encountered coordination patterns are tetrahedra, octahedra, body- and face-centered cubic, as well as hexagonal close-packed-like environments. Here, we showcase the usefulness of local order parameters to identify these basic structural motifs in inorganic solid materials by developing classification criteria. We introduce a systematic testing framework, the Einstein crystal test rig, that probes the response of order parameters to distortions in perfect motifs to validate our approach. Subsequently, we highlight three important application cases. First, we map basic crystal structure information of a large materials database in an intuitive manner by screening the Materials Project (MP) database (61,422 compounds) for element-specific motif distributions. Second, we use the structure-motif recognition capabilities to automatically find interstitials in metals, semiconductors, and insulators. Our Interstitialcy Finding Tool (InFiT) facilitates high-throughput screenings of defect properties. Third, the order parameters are reliable and compact quantitative structure descriptors for characterizing diffusion hops of intercalants, as our example of magnesium in MnO2-spinel indicates. Finally, the tools developed in our work are readily and freely available as software implementations in the pymatgen library, and we expect them to be further applied to machine-learning approaches for emerging applications in materials science.

  20. The Complete Mitochondrial Genome of Corizus tetraspilus (Hemiptera: Rhopalidae) and Phylogenetic Analysis of Pentatomomorpha

    Science.gov (United States)

    Guo, Zhong-Long; Wang, Juan; Shen, Yu-Ying

    2015-01-01

    Insect mitochondrial genomes (mitogenomes) are the most extensively used genetic information for molecular evolution, phylogenetics and population genetics. Pentatomomorpha (>14,000 species) is the second largest infraorder of Heteroptera and is of great economic importance. To better understand the diversity and phylogeny within Pentatomomorpha, we sequenced and annotated the complete mitogenome of Corizus tetraspilus (Hemiptera: Rhopalidae), an important pest of alfalfa in China. We analyzed the main features of the C. tetraspilus mitogenome and provided a comparative analysis with four other Coreoidea species. Our results reveal that gene content, gene arrangement, nucleotide composition, codon usage, rRNA structures and sequences of mitochondrial transcription termination factor are conserved in Coreoidea. Comparative analysis shows that different protein-coding genes have been subject to different evolutionary rates correlated with the G+C content. All the transfer RNA genes found in Coreoidea have the typical cloverleaf secondary structure, except for trnS1 (AGN), which lacks the dihydrouridine (DHU) arm and possesses an unusual anticodon stem (9 bp vs. the normal 5 bp). The control regions (CRs) among Coreoidea are highly variable in size, of which the CR of C. tetraspilus is the smallest (440 bp), making the C. tetraspilus mitogenome the smallest (14,989 bp) among all completely sequenced Coreoidea mitogenomes. No conserved motifs are found in the CRs of Coreoidea. In addition, the A+T content (60.68%) of the CR of C. tetraspilus is much lower than that of the entire mitogenome (74.88%), and is the lowest among Coreoidea. Phylogenetic analyses based on mitogenomic data support the monophyly of each superfamily within Pentatomomorpha, and recognize a phylogenetic relationship of (Aradoidea + (Pentatomoidea + (Lygaeoidea + (Pyrrhocoroidea + Coreoidea)))). PMID:26042898
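The comparison between the control region (60.68% A+T) and the whole mitogenome (74.88% A+T) rests on a simple percentage. A minimal sketch follows; excluding ambiguous bases such as N from the denominator is an assumption here, since the abstract does not say how they were handled.

```python
def at_content(seq):
    """Percent A+T among the unambiguous bases (A, C, G, T) of a DNA sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("A") + seq.count("T")) / acgt


print(round(at_content("AATTGC"), 2))    # 66.67
print(round(at_content("AATTNNGC"), 2))  # 66.67: N is ignored
```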

  1. Phylogenetic Inference of HIV Transmission Clusters

    Directory of Open Access Journals (Sweden)

    Vlad Novitsky

    2017-10-01

    A better understanding of the structure and dynamics of HIV transmission networks is essential for designing the most efficient interventions to prevent new HIV transmissions, and ultimately for gaining control of the HIV epidemic. The inference of phylogenetic relationships and the interpretation of results rely on the definition of the HIV transmission cluster. The definition of the HIV cluster is complex and dependent on multiple factors, including the sampling design, accuracy of sequencing, precision of sequence alignment, evolutionary models, the phylogenetic method of inference, and specified thresholds for cluster support. While the majority of studies focus on clusters, non-clustered cases could also be highly informative. A new dimension in the analysis of the global and local HIV epidemics is the concept of phylogenetically distinct HIV sub-epidemics. The identification of active HIV sub-epidemics reveals spreading viral lineages and may help in the design of targeted interventions. HIV clustering can also be affected by sampling density. Obtaining a proper sampling density may increase statistical power and reduce sampling bias, so sampling density should be taken into account in study design and in the interpretation of phylogenetic results. Finally, recent advances in long-range genotyping may enable more accurate inference of HIV transmission networks. If performed in real time, such inference could both inform public-health strategies and be clinically relevant (e.g., drug-resistance testing).
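To make the dependence on "specified thresholds" concrete, here is a minimal sketch of one common way to define clusters: link two sequences when their pairwise genetic distance is at or below a threshold, then take connected components (single linkage). This is a swapped-in illustrative technique, not the review's method; the sample IDs, distances and the 0.015 substitutions/site threshold are hypothetical.

```python
def threshold_clusters(ids, distances, threshold):
    """Single-linkage clusters: link two sequences if their distance <= threshold.

    `distances` maps frozenset({a, b}) -> pairwise genetic distance.
    Returns (clusters, singletons), i.e. clustered and non-clustered cases.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, d in distances.items():
        if d <= threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    clusters = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, singletons


ids = ["p1", "p2", "p3", "p4"]  # hypothetical sample IDs
dist = {
    frozenset(("p1", "p2")): 0.010,
    frozenset(("p2", "p3")): 0.012,
    frozenset(("p1", "p3")): 0.020,
    frozenset(("p3", "p4")): 0.050,
    frozenset(("p1", "p4")): 0.060,
    frozenset(("p2", "p4")): 0.055,
}
print(threshold_clusters(ids, dist, 0.015))  # ([['p1', 'p2', 'p3']], ['p4'])
print(threshold_clusters(ids, dist, 0.011))  # ([['p1', 'p2']], ['p3', 'p4'])
```

Note the chaining: p1 and p3 land in the same cluster despite a distance above the threshold, and a small change in the threshold splits the cluster, which illustrates why the cluster definition matters.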

  2. Community Phylogenetics: Assessing Tree Reconstruction Methods and the Utility of DNA Barcodes

    Science.gov (United States)

    Boyle, Elizabeth E.; Adamowicz, Sarah J.

    2015-01-01

    Studies examining phylogenetic community structure have become increasingly prevalent, yet little attention has been given to the influence of the input phylogeny on metrics that describe phylogenetic patterns of co-occurrence. Here, we examine the influence of branch length, tree reconstruction method, and amount of sequence data on measures of phylogenetic community structure, as well as the phylogenetic signal (Pagel’s λ) in morphological traits, using Trichoptera larval communities from Churchill, Manitoba, Canada. We find that model-based tree reconstruction methods and the use of a backbone family-level phylogeny improve estimations of phylogenetic community structure. In addition, trees built using the barcode region of cytochrome c oxidase subunit I (COI) alone accurately predict metrics of phylogenetic community structure obtained from a multi-gene phylogeny. Input tree did not alter overall conclusions drawn for phylogenetic signal, as significant phylogenetic structure was detected in two body size traits across input trees. As the discipline of community phylogenetics continues to expand, it is important to investigate the best approaches to accurately estimate patterns. Our results suggest that emerging large datasets of DNA barcode sequences provide a vast resource for studying the structure of biological communities. PMID:26110886
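Pagel's λ, used above as the measure of phylogenetic signal, rescales the phylogenetic variance-covariance matrix: off-diagonal entries (shared branch length between two species) are multiplied by λ while diagonal entries stay unchanged. A minimal sketch of that transformation follows; the three-species matrix is hypothetical, and estimating λ itself (usually by maximum likelihood) is not shown.

```python
def pagel_lambda(vcv, lam):
    """Copy of a phylogenetic variance-covariance matrix with Pagel's lambda applied.

    Off-diagonal entries (shared branch length between two species) are
    multiplied by `lam`; diagonal entries (root-to-tip lengths) are unchanged.
    lam = 1 leaves the Brownian-motion covariance intact; lam = 0 removes all
    shared history (a star phylogeny, i.e. no phylogenetic signal).
    """
    n = len(vcv)
    return [[vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical 3-species tree: species 1 and 2 share 0.6 of a root-to-tip length of 1.0.
vcv = [[1.0, 0.6, 0.0],
       [0.6, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
print(pagel_lambda(vcv, 0.5))  # off-diagonal 0.6 becomes 0.3
```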

  3. RNA recognition motif (RRM)-containing proteins in Bombyx mori

    African Journals Online (AJOL)

    STORAGESEVER

    2009-03-20

    ... Recognition Motif (RRM), sometimes referred to as RNP1, is one of the first identified domains for RNA interaction. RRM is very common ... Apart from the RRM motif, eIF3-S9 has a Trp-Asp (WD) repeat domain, Poly (A) ...

  4. GNG Motifs Can Replace a GGG Stretch during G-Quadruplex Formation in a Context Dependent Manner.

    Directory of Open Access Journals (Sweden)

    Kohal Das

    G-quadruplexes are one of the most commonly studied non-B DNA structures. Generally, these structures are formed using a minimum of four tracts of three guanines, with connecting loops ranging from one to seven nucleotides. Recent studies have reported deviations from this general convention. One such deviation is the involvement of bulges in the guanine tracts. In this study, guanines along with bulges, also referred to as GNG motifs, have been extensively studied using the recently reported HOX11 breakpoint fragile region I as a model template. Using a strategic mutagenesis approach, we show that the contribution from continuous G-tracts may be dispensable during G-quadruplex formation when such motifs are flanked by GNGs. Importantly, the positioning and number of GNG/GNGNG motifs can also influence the formation of G-quadruplexes. Further, we assessed three genomic regions from the HIF1 alpha, VEGF and SHOX genes for G-quadruplex formation using GNG motifs. We show that the HIF1 alpha sequence harbouring GNG motifs can fold into an intramolecular G-quadruplex. In contrast, GNG motifs in the mutant VEGF sequence could not participate in structure formation, suggesting that the usage of GNG is context dependent. Importantly, we show that when two continuous stretches of guanines are flanked by two independent GNG motifs in a naturally occurring sequence (SHOX), it can fold into an intramolecular G-quadruplex. Finally, we show the specific binding of the G-quadruplex binding protein Nucleolin and the G-quadruplex antibody BG4 to the SHOX G-quadruplex. Overall, our study provides novel insights into the role of GNG motifs in G-quadruplex structure formation, which may have both physiological and pathological implications.
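The "general convention" described above (four tracts of at least three guanines joined by loops of one to seven nucleotides) is often searched for with a regular expression. A minimal sketch of that canonical pattern follows, with hypothetical sequences. It also shows the limitation this study addresses: a tract replaced by a GNG motif is not detected.

```python
import re

# Four runs of >= 3 guanines separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3}G{3,}")


def find_g4(seq):
    """(start, end) spans of canonical G-quadruplex motifs on the given strand."""
    return [m.span() for m in G4_PATTERN.finditer(seq.upper())]


telomeric = "GGGTTAGGGTTAGGGTTAGGG"
bulged = "GGGTTAGGGTTAGAGTTAGGG"   # third tract replaced by a GNG (GAG) motif
print(find_g4(telomeric))  # [(0, 21)]
print(find_g4(bulged))     # []: the canonical pattern misses GNG tracts
```

Only the given strand is scanned; the complementary strand would need its own search on the reverse complement.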

  5. The macroecology of phylogenetically structured hummingbird-plant networks

    DEFF Research Database (Denmark)

    González, Ana M. Martín; Dalsgaard, Bo; Nogues, David Bravo

    2015-01-01

    Aim To investigate the association between hummingbird–plant network structure and species richness, phylogenetic signal on species' interaction pattern, insularity and historical and current climate. Location Fifty-four communities along a c. 10,000 km latitudinal gradient across the Americas (39...... approach, we examined the influence of species richness, phylogenetic signal, insularity and current and historical climate conditions on network structure (null-model-corrected specialization and modularity). Results Phylogenetically related species, especially plants, showed a tendency to interact...... with a similar array of mutualistic partners. The spatial variation in network structure exhibited a constant association with species phylogeny (R2 = 0.18–0.19); however, network structure showed the strongest association with species richness and environmental factors (R2 = 0.20–0.44 and R2 = 0...

  6. THE MOTIF OF THE PRODIGAL SON IN IVAN TURGENEV'S NOVELS

    Directory of Open Access Journals (Sweden)

    Valentina Ivanovna Gabdullina

    2013-11-01

    The author questions the perception of Ivan Turgenev as a “non-Christian writer” and studies how the prodigal son motif functions in a series of his novels. In these novels, Turgenev pictured different phases of the archetypal story originating from the Gospel parable of the prodigal son. In the novel Rudin, he depicted the phase of spiritual wandering of a hero who had lost touch with his native land, Russia. In his next novels (Home of the Gentry, Fathers and Sons and Smoke), after leading his hero in circles and sending him back to his paternal home, Turgenev reconstructs the model of human behavior represented in the parable, thereby recognizing the immutability of the idea formalized in the Gospel. The motif of the return to Russian land is completed in Turgenev's last novel, Virgin Soil, in which the author paradoxically connects the Westernist idea with the Gospel imperative. Solomin, the son of a deacon, sent by his wise father to Europe “to get education”, studies in England, masters European knowledge and returns “to his native land” to establish his own business in inland Russia. Thus, a series of Turgenev's novels portraying different phases of social life are interlinked by the motif of the prodigal son, who is represented by the novels' main characters.

  7. Evolutionarily conserved bias of amino-acid usage refines the definition of PDZ-binding motif

    Directory of Open Access Journals (Sweden)

    Launey Thomas

    2011-06-01

    Background: The interactions between PDZ (PSD-95, Dlg, ZO-1) domains and PDZ-binding motifs play central roles in signal transduction within cells. Proteins with PDZ domains bind to PDZ-binding motifs almost exclusively when the motifs are located at the carboxyl (C-) terminal ends of their binding partners. However, it remains little explored whether PDZ-binding motifs show any preferential location at the C-terminal ends of proteins at the genome level. Results: Here, we examined the distribution of the type-I (x-x-S/T-x-I/L/V) or type-II (x-x-V-x-I/V) PDZ-binding motifs in proteins encoded in the genomes of five different species (human, mouse, zebrafish, fruit fly and nematode). We first established that these PDZ-binding motifs are indeed preferentially present at the C-terminal ends. Moreover, we found a specific amino acid (AA) bias for the 'x' positions in the motifs at the C-terminal ends. In general, hydrophilic AAs were favored. Our genomics-based findings confirm and largely extend the results of previous interaction-based studies, allowing us to propose refined consensus sequences for all of the examined PDZ-binding motifs. An ontological analysis revealed that the refined motifs are functionally relevant, since a large fraction of the proteins bearing the motif appear to be involved in signal transduction. Furthermore, co-precipitation experiments confirmed two new protein interactions predicted by our genomics-based approach. Finally, we show that influenza virus pathogenicity can be correlated with the PDZ-binding motif, with high-virulence viral proteins bearing a refined PDZ-binding motif. Conclusions: Our refined definition of PDZ-binding motifs should provide important clues for identifying functional PDZ-binding motifs and proteins involved in signal transduction.
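Because the motifs only count when they sit at the carboxyl terminus, a scan has to anchor the pattern at the end of the sequence. A minimal sketch using the type-I and type-II definitions quoted in the abstract (not the refined consensus sequences the authors propose); the toy sequences and function name are hypothetical.

```python
import re

# C-terminal PDZ-binding motifs as defined in the abstract (x = any residue);
# \Z anchors each pattern at the protein's C-terminal end.
PDZ_MOTIFS = {
    "type I": re.compile(r"..[ST].[ILV]\Z"),   # x-x-S/T-x-I/L/V
    "type II": re.compile(r"..V.[IV]\Z"),      # x-x-V-x-I/V
}


def pdz_types(protein):
    """Names of the PDZ-binding motif types found at the protein's C-terminus."""
    seq = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    return [name for name, pattern in PDZ_MOTIFS.items() if pattern.search(seq)]


print(pdz_types("MQRETSV"))  # ['type I']
print(pdz_types("MQREVKI"))  # ['type II']
print(pdz_types("MQRETSA"))  # []
```

An internal occurrence such as "ETSV" followed by more residues is deliberately not reported, matching the C-terminal requirement.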

  8. Fingerprint motifs of phytases | Fan | African Journal of Biotechnology

    African Journals Online (AJOL)

    Among the 173 potential phytases identified in 11 plant genomes through MAST, PAPhys are the major phytases and HAPhys the minor ones; other phytase groups are not found in planta. Keywords: Phytase, fingerprint motif, multiple EM for motif elicitation (MEME), MAST

  9. Short Arginine Motifs Drive Protein Stickiness in the Escherichia coli Cytoplasm.

    Science.gov (United States)

    Kyne, Ciara; Crowley, Peter B

    2017-09-19

    Although essential to numerous biotech applications, knowledge of molecular recognition by arginine-rich motifs in live cells remains limited. ¹H,¹⁵N HSQC and ¹⁹F NMR spectroscopies were used to investigate the effects of C-terminal -GRn (n = 1-5) motifs on GB1 interactions in Escherichia coli cells and cell extracts. While the "biologically inert" GB1 yields high-quality in-cell spectra, the -GRn fusions with n = 4 or 5 were undetectable. This result suggests that a tetra-arginine motif is sufficient to drive interactions between a test protein and macromolecules in the E. coli cytoplasm. The inclusion of a 12-residue flexible linker between GB1 and the -GR5 motif did not improve detection of the "inert" domain. In contrast, all of the constructs were detectable in cell lysates and extracts, suggesting that the arginine-mediated complexes were weak. Together these data reveal the significance of weak interactions between short arginine-rich motifs and the E. coli cytoplasm and demonstrate the potential of such motifs to modify protein interactions in living cells. These interactions must be considered in the design of (in vivo) nanoscale assemblies that rely on arginine-rich sequences.

  10. Lipase genes in Mucor circinelloides: identification, sub-cellular location, phylogenetic analysis and expression profiling during growth and lipid accumulation.

    Science.gov (United States)

    Zan, Xinyi; Tang, Xin; Chu, Linfang; Zhao, Lina; Chen, Haiqin; Chen, Yong Q; Chen, Wei; Song, Yuanda

    2016-10-01

    Lipases, or triacylglycerol hydrolases, are widespread in nature and particularly common in the microbial world. The filamentous fungus Mucor circinelloides is a potential lipase producer, as it grows well in triacylglycerol-containing culture media. So far only one lipase from M. circinelloides has been characterized, and the majority of lipases in this fungus remain unknown. In the present study, 47 potential lipase genes in M. circinelloides WJ11 and 30 in M. circinelloides CBS 277.49 were identified by extensive bioinformatics analysis. An overview of these lipases is presented, including their characteristics, sub-cellular location, phylogenetic analysis and the expression profiling of the lipase genes during growth and lipid accumulation. All of these proteins contained the consensus sequence of a classical lipase (GXSXG motif) and were divided, based on gene annotations, into four types: α/β-hydrolase_1, α/β-hydrolase_3, class_3 and GDSL lipase (GDSL). Phylogenetic analyses revealed that the class_3 and α/β-hydrolase_3 families were the conserved lipase families in M. circinelloides. Additionally, some lipases also contained a typical acyltransferase motif, H-(X)4-D, and these lipases may play a dual role in lipid metabolism, catalyzing both lipid hydrolysis and transacylation reactions. The differential expression of all lipase genes was confirmed by quantitative real-time PCR, and the expression profiles were analyzed to predict the possible biological roles of these genes in lipid metabolism in M. circinelloides. We preliminarily hypothesize that lipases may be involved in triacylglycerol degradation, phospholipid synthesis and beta-oxidation. Moreover, the results of sub-cellular localization, the presence of a signal peptide and transcriptional analyses of lipase genes indicated that four lipases in WJ11 are most likely extracellular lipases with a signal peptide. These findings provide a platform
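
    The two sequence motifs named above (GXSXG and H-(X)4-D) can be located with simple regular expressions. This is only a sketch of a pattern scan on one-letter protein sequences; the helper `find_motifs` is hypothetical, and the study itself classified lipases from gene annotations, so a regex hit alone does not make a protein a lipase.

```python
import re

# Motifs named in the abstract, written as regular expressions ("X" = any residue).
MOTIFS = {
    "GXSXG": "G.S.G",        # classical lipase consensus
    "H-(X)4-D": "H.{4}D",    # acyltransferase motif
}


def find_motifs(protein):
    """Return 1-based start positions of each motif, allowing overlapping hits."""
    seq = protein.upper()
    # Wrapping the pattern in a lookahead makes finditer report overlapping matches.
    return {name: [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
            for name, pattern in MOTIFS.items()}


print(find_motifs("MAGHSLGAAHWWWWDK"))  # {'GXSXG': [3], 'H-(X)4-D': [10]}
```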

  11. Anion induced conformational preference of CαNN motif residues in functional proteins.

    Science.gov (United States)

    Patra, Piya; Ghosh, Mahua; Banerjee, Raja; Chakrabarti, Jaydeb

    2017-12-01

    Among different ligand-binding motifs, the anion-binding CαNN motif, consisting of the peptide backbone atoms of three consecutive residues, is important for recognising free anions such as sulphate or biphosphate and participates in different key functions. Here we study the interaction of sulphate and biphosphate with the CαNN motif present in different proteins. Instead of the whole protein, we studied a peptide fragment in which the CαNN motif is flanked by other residues. We use classical force-field-based molecular dynamics simulations to understand the stability of this motif. Our data indicate fluctuations in the conformational preferences of the motif residues in the absence of the anion. The anion stabilises one of these conformations. However, the anion-induced conformational preferences are highly sequence dependent and specific to the type of anion. In particular, polar residues are more favourable than other residues for recognising the anion. © 2017 Wiley Periodicals, Inc.

  12. On Nakhleh's metric for reduced phylogenetic networks

    OpenAIRE

    Cardona, Gabriel; Llabrés, Mercè; Rosselló, Francesc; Valiente Feruglio, Gabriel Alejandro

    2009-01-01

    We prove that Nakhleh’s metric for reduced phylogenetic networks is also a metric on the classes of tree-child phylogenetic networks, semibinary tree-sibling time consistent phylogenetic networks, and multilabeled phylogenetic trees. We also prove that it separates distinguishable phylogenetic networks. In this way, it becomes the strongest dissimilarity measure for phylogenetic networks available so far. Furthermore, we propose a generalization of that metric that separates arbitrary phyl...

  13. Phylogenetic Analysis Using Protein Mass Spectrometry.

    Science.gov (United States)

    Ma, Shiyong; Downard, Kevin M; Wong, Jason W H

    2017-01-01

    Through advances in molecular biology, comparative analysis of DNA sequences is currently the cornerstone in the study of molecular evolution and phylogenetics. Nevertheless, protein mass spectrometry offers some unique opportunities to enable phylogenetic analyses in organisms where DNA may be difficult or costly to obtain. To date, the methods of phylogenetic analysis using protein mass spectrometry can be classified into three categories: (1) de novo protein sequencing followed by classical phylogenetic reconstruction, (2) direct phylogenetic reconstruction using proteolytic peptide mass maps, and (3) mapping of mass spectral data onto classical phylogenetic trees. In this chapter, we provide a brief description of the three methods and the protocol for each method along with relevant tools and algorithms.
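
    The second category above, reconstruction from proteolytic peptide mass maps, starts by turning each protein into a set of peptide masses and comparing those sets. The sketch below assumes a simplified trypsin rule (cleave after K/R unless followed by P, no missed cleavages) and standard monoisotopic residue masses; `mass_map_distance` is a hypothetical dissimilarity chosen for illustration, not the chapter's protocol.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(protein):
    """Simplified trypsin rule: cut after K/R unless followed by P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein.upper()) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass; raises KeyError for non-standard residues."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_map(protein):
    return sorted(peptide_mass(p) for p in tryptic_peptides(protein))


def mass_map_distance(map_a, map_b, tol=0.02):
    """Hypothetical dissimilarity: 1 - (masses of A found in B within tol) / larger map size.

    Note: counting matches from A only makes this measure asymmetric in general.
    """
    matched = sum(1 for m in map_a if any(abs(m - n) <= tol for n in map_b))
    return 1.0 - matched / max(len(map_a), len(map_b))


print(tryptic_peptides("AAKWWR"))  # ['AAK', 'WWR']
print(mass_map_distance(mass_map("AAKWWR"), mass_map("AAKWWK")))  # 0.5
```

    A matrix of such pairwise distances could then be passed to any distance-based tree method; the tolerance and the normalization are illustrative choices.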

  14. Gene regulatory and signaling networks exhibit distinct topological distributions of motifs

    Science.gov (United States)

    Ferreira, Gustavo Rodrigues; Nakaya, Helder Imoto; Costa, Luciano da Fontoura

    2018-04-01

    The biological processes of cellular decision making and differentiation involve a plethora of signaling pathways and gene regulatory circuits. These networks in turn exhibit a multitude of motifs playing crucial parts in regulating network activity. Here we compare the topological placement of motifs in gene regulatory and signaling networks and observe that it suggests different evolutionary strategies in motif distribution for distinct cellular subnetworks.

  15. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-25

    Leucine-aspartic acid (LD) motifs are short helical protein-protein interaction motifs involved in cell motility, survival and communication. LD motif interactions are also implicated in cancer metastasis and are targeted by several viruses. LD motifs are notoriously difficult to detect because sequence pattern searches lead to an excessively high number of false positives. Hence, despite 20 years of research, only six LD motif–containing proteins are known in humans, three of which are close homologues of the paxillin family. To enable the proteome-wide discovery of LD motifs, we developed LD Motif Finder (LDMF), a web tool based on machine learning that combines sequence information with structural predictions to detect LD motifs with high accuracy. LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.
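
    For contrast with LDMF, the kind of plain sequence-pattern search described above as error-prone can be sketched in a few lines. The strict consensus LDxLLxxL used here is an assumption (a commonly cited form of the LD motif, not taken from this abstract), and `naive_ld_hits` is a hypothetical helper. Because the real pattern is highly ambiguous, any pattern loose enough to catch true LD motifs also returns many false positives; LDMF's machine-learning and structural-prediction layer is not reproduced here.

```python
import re

# Assumed strict consensus for LD motifs, LDxLLxxL ("x" = any residue); the
# lookahead lets overlapping candidates be reported.
LD_CONSENSUS = re.compile(r"(?=(LD.LL..L))")


def naive_ld_hits(protein):
    """Return (1-based start, matched segment) for every strict consensus match."""
    return [(m.start() + 1, m.group(1)) for m in LD_CONSENSUS.finditer(protein.upper())]


print(naive_ld_hits("MSLDELLAKLQ"))  # [(3, 'LDELLAKL')]
```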

  16. CD3 gamma contains a phosphoserine-dependent di-leucine motif involved in down-regulation of the T cell receptor

    DEFF Research Database (Denmark)

    Dietrich, J; Hou, X; Wegener, A M

    1994-01-01

    -regulation of the TCR. Furthermore, analysis of a series of CD3 gamma truncation mutants indicated that in addition to S126 phosphorylation a motif C-terminal of S126 was required for TCR down-regulation. Point mutation analyses confirmed this observation and demonstrated that a membrane-proximal di-leucine motif (L131......, indicating that the TCR was down-regulated by endocytosis via clathrin coated pits. Based on the present results and previously published observations on intracellular receptor sorting, a general model for intracellular sorting of receptors containing di-leucine- or tyrosine-based motifs is proposed....

  17. Identification of high-efficiency 3′GG gRNA motifs in indexed FASTA files with ngg2

    Directory of Open Access Journals (Sweden)

    Elisha D. Roberson

    2015-11-01

    Full Text Available CRISPR/Cas9 is emerging as one of the most widely used methods of genome modification in organisms ranging from bacteria to human cells. However, editing efficiency varies tremendously from site to site. A recent report identified a novel motif, called the 3′GG motif, which substantially increased editing efficiency at all sites tested in C. elegans. Furthermore, the authors highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a Python command-line tool, ngg2, to identify 3′GG gRNA sites from indexed FASTA files. As a proof of concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I also scanned the genomes of pig (Sus scrofa) and African elephant (Loxodonta africana) to demonstrate the utility in non-model organisms. I identified more than 60 million single-match 3′GG motifs in these genomes. More than 61% of all protein-coding genes in the reference genomes had at least one unique 3′GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein-coding genes have at least one unique, overlapping 3′GG gRNA. These identified sites can be used as a starting point in gRNA selection, and the ngg2 tool provides an important ability to identify 3′GG editing sites in any species with an available genome sequence.
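
    A scan for 3′GG sites on both strands of a sequence can be sketched as below. This is not ngg2's code. It assumes that a 3′GG site is a 20-nt protospacer ending in GG, immediately followed by an NGG PAM (N18-GG-NGG), and it omits the genome-wide uniqueness ("single match") check and FASTA indexing that the tool performs. The helper names are hypothetical.

```python
import re

# Assumption: a 3'GG site is a 20-nt protospacer ending in GG, followed
# directly by an NGG PAM (N18-GG-NGG, 23 nt in total).
SITE = re.compile(r"(?=([ACGT]{18}GG)[ACGT]GG)")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_3gg_sites(seq):
    """Return (strand, 0-based forward start of the 23-nt site, protospacer)."""
    seq = seq.upper()
    n = len(seq)
    hits = [("+", m.start(), m.group(1)) for m in SITE.finditer(seq)]
    # A reverse-strand hit at rc[s:s+23] covers seq[n-s-23:n-s] on the forward strand.
    hits += [("-", n - m.start() - 23, m.group(1))
             for m in SITE.finditer(reverse_complement(seq))]
    return hits


print(find_3gg_sites("A" * 18 + "GGTGG"))  # [('+', 0, 'AAAAAAAAAAAAAAAAAAGG')]
```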

  18. A proposed vestigial translation initiation motif in VP1 of hepatitis A virus.

    Science.gov (United States)

    Kang, Jeong-Ah; Funkhouser, Ann W

    2002-07-01

    The internal ribosome entry site (IRES) of picornaviruses has a 3' polypyrimidine tract (PPT) 16-24 bases upstream of an AUG triplet (PPT/AUG motif). This motif is critical in determining the efficiency of cap-independent translation. HAV has a conserved PPT/AUG motif consisting of a nine base sequence (AGGUUUUUC) 23 bases upstream of the preferred AUG start codon. This HAV-specific PPT/AUG motif is repeated and conserved in VP1 of HAV, but not of other picornaviruses. We proposed that the PPT/AUG motif in the open reading frame initiated translation and/or had an impact on the life cycle of the virus. In vitro translation of mutant bicistronic mRNAs and growth in cell culture of mutant viruses provided no evidence that the VP1 PPT/AUG motif had any impact on either translation or growth. HAV differs from other picornaviruses in its inefficient growth in cell culture. Since the HAV-specific PPT/AUG motif is found in only 1 in 300,000 reported viral sequences outside the hepatovirus genus, this motif may be a vestigial translation initiation element and may have played a role in determining the unusual phenotype of HAV.
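
    The PPT/AUG arrangement described above (a polypyrimidine tract 16-24 bases upstream of an AUG) can be expressed as a scan. The definitions here are assumptions for illustration only: a tract is taken to be a run of at least six C/U bases, and "16-24 bases upstream" is read as 16-24 nt between the last base of the tract and the A of the AUG. `ppt_aug_candidates` is a hypothetical helper.

```python
import re

# Hypothetical definition: a polypyrimidine tract (PPT) is a run of at least
# six C/U bases; the spacing is measured from the tract's last base to the A of AUG.
PPT = re.compile(r"[CU]{6,}")


def ppt_aug_candidates(rna, min_gap=16, max_gap=24):
    """Return (0-based PPT start, PPT sequence, 0-based AUG start) triples."""
    rna = rna.upper().replace("T", "U")  # accept DNA input as well
    hits = []
    for aug in re.finditer(r"(?=AUG)", rna):
        a = aug.start()
        for tract in PPT.finditer(rna, 0, a):  # only tracts lying upstream of this AUG
            if min_gap <= a - tract.end() <= max_gap:
                hits.append((tract.start(), tract.group(), a))
    return hits


print(ppt_aug_candidates("GG" + "UUUUUC" + "A" * 20 + "AUG"))  # [(2, 'UUUUUC', 28)]
```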

  19. CMD: A Database to Store the Bonding States of Cysteine Motifs with Secondary Structures

    Directory of Open Access Journals (Sweden)

    Hamed Bostan

    2012-01-01

    Full Text Available Computational approaches to predicting the disulphide bonding state and connectivity pattern of cysteines are based on various descriptors. One such descriptor is the amino acid sequence motif flanking each cysteine residue. Despite the existence of disulphide bonding information in many databases and applications, no complete reference or motif query resource is currently available. The cysteine motif database (CMD) is the first online resource that stores all cysteine residues and their flanking motifs, together with their secondary structure and propensity values derived from laboratory data. We extracted more than 3 million cysteine motifs from PDB and UniProt data, annotated with secondary structure assignment, propensity value assignment, frequency of occurrence and coefficiency of their bonding status. Removal of redundancies generated 15875 unique flanking motifs that are always bonded and 41577 unique patterns that are always nonbonded. Queries can be based on the protein ID, FASTA sequence, sequence motif or secondary structure, individually or in batch format, using the provided APIs, which allow remote users to query the database via third-party software and/or high-throughput screening/querying. The CMD offers extensive information about bonded and free cysteine residues and their motifs, allowing in-depth characterization of sequence motif composition.
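
    The flanking motifs that CMD stores are, in essence, fixed-width windows around each cysteine. The sketch below shows one way to extract them; the window width (`flank`) and the padding character are hypothetical choices, since the abstract does not state the flank length CMD uses.

```python
def cysteine_windows(protein, flank=3, pad="-"):
    """Return (1-based position, window) for every cysteine.

    The window holds `flank` residues on each side of the C; positions past
    either terminus are filled with `pad`.
    """
    seq = protein.upper()
    padded = pad * flank + seq + pad * flank
    # Residue i of `seq` sits at i + flank in `padded`, so its window starts at i.
    return [(i + 1, padded[i:i + 2 * flank + 1]) for i, aa in enumerate(seq) if aa == "C"]


print(cysteine_windows("ACDEC", flank=2))  # [(2, '-ACDE'), (5, 'DEC--')]
```

    Padding keeps every window the same length, which makes the windows directly usable as fixed-size features or as database keys.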

  20. BEAM web server: a tool for structural RNA motif discovery.

    Science.gov (United States)

    Pietrosanto, Marco; Adinolfi, Marta; Casula, Riccardo; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

    2018-03-15

    RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable of handling tens of thousands of RNAs as input, with a motif discovery procedure that is limited only by current secondary structure prediction accuracy. The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs associated with a measure of their statistical significance. BEAM is extremely fast thanks to the BEAR encoding, which transforms each RNA secondary structure into a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the Rfam database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling the folding and encoding of RNA sequences, giving users a choice of folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified, the logo of each motif, its significance, a graphic representation and information about its position in the RNA molecules sharing it. The web server is freely available at http://beam.uniroma2.it/ and is implemented in NodeJS and Python, with all major browsers supported. marco.pietrosanto@uniroma2.it. Supplementary data are available at Bioinformatics online.

  1. Identify Beta-Hairpin Motifs with Quadratic Discriminant Algorithm Based on the Chemical Shifts.

    Directory of Open Access Journals (Sweden)

    Feng YongE

    Full Text Available Successful prediction of the beta-hairpin motif will be helpful for understanding fold recognition. Several algorithms have been proposed for predicting beta-hairpin motifs; however, the parameters they use are primarily based on amino acid sequences. Here, we propose a novel model for predicting beta-hairpin structure based on chemical shifts. First, we analyzed the statistical distribution of the chemical shifts of six nuclei in beta-hairpin and non-beta-hairpin motifs. Second, we used these chemical shifts as features in combination with three algorithms to predict beta-hairpin structure. Finally, we achieved the best prediction, a sensitivity of 92%, a specificity of 94% and a Matthews correlation coefficient of 0.85, using the quadratic discriminant analysis algorithm; in three-fold cross-validation this is clearly superior to the same method applied to the 20 amino acid compositions. Our findings show that the chemical shift is an effective parameter for beta-hairpin prediction and suggest that quadratic discriminant analysis is a powerful algorithm for this task.
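
    The three figures of merit reported above come from a 2×2 confusion matrix. The function below computes them; the counts in the usage line are hypothetical (100 hairpin and 100 non-hairpin segments), chosen only to show the arithmetic, and are not the study's data.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and Matthews correlation coefficient from a confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if a margin is empty
    return sensitivity, specificity, mcc


# Hypothetical counts: 100 hairpin and 100 non-hairpin segments.
sens, spec, mcc = binary_metrics(tp=92, tn=94, fp=6, fn=8)
print(round(sens, 2), round(spec, 2), round(mcc, 3))  # 0.92 0.94 0.86
```

    Unlike sensitivity and specificity, the MCC also depends on the class balance, so the reported 0.85 cannot be reproduced from the two percentages alone; the balanced counts above give about 0.86.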

  2. Transforming phylogenetic networks: Moving beyond tree space.

    Science.gov (United States)

    Huber, Katharina T; Moulton, Vincent; Wu, Taoyang

    2016-09-07

    Phylogenetic networks are a generalization of phylogenetic trees that are used to represent reticulate evolution. Unrooted phylogenetic networks form a special class of such networks, which naturally generalize unrooted phylogenetic trees. In this paper we define two operations on unrooted phylogenetic networks, one of which is a generalization of the well-known nearest-neighbor interchange (NNI) operation on phylogenetic trees. We show that any unrooted phylogenetic network can be transformed into any other such network using only these operations. This generalizes the well-known fact that any phylogenetic tree can be transformed into any other such tree using only NNI operations. It also allows us to define a generalization of tree space and to define some new metrics on unrooted phylogenetic networks. To prove our main results, we employ some fascinating new connections between phylogenetic networks and cubic graphs that we have recently discovered. Our results should be useful in developing new strategies to search for optimal phylogenetic networks, a topic that has recently generated some interest in the literature, as well as for providing new ways to compare networks. Copyright © 2016 Elsevier Ltd. All rights reserved.
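
    The tree case that the paper generalizes, a nearest-neighbor interchange (NNI) across one internal edge of an unrooted tree, can be sketched as follows. The adjacency-set representation and the function `nni` are illustrative assumptions; the network operations the authors define are not reproduced here.

```python
def nni(tree, u, v, a, c):
    """Nearest-neighbor interchange across the internal edge u-v of an unrooted tree.

    `tree` maps each node to the set of its neighbors. Subtree `a` (hanging off u)
    and subtree `c` (hanging off v) swap places; the input is left unchanged.
    """
    if v not in tree[u] or a not in tree[u] or c not in tree[v] or a == v or c == u:
        raise ValueError("need an edge u-v, a neighbor a of u and a neighbor c of v")
    new = {node: set(nbrs) for node, nbrs in tree.items()}
    for x, y in ((u, a), (v, c)):      # detach the two subtrees
        new[x].discard(y)
        new[y].discard(x)
    for x, y in ((u, c), (v, a)):      # reattach them on the opposite side
        new[x].add(y)
        new[y].add(x)
    return new


# Quartet AB|CD: leaves A, B hang off u and C, D hang off v.
quartet = {"u": {"A", "B", "v"}, "v": {"C", "D", "u"},
           "A": {"u"}, "B": {"u"}, "C": {"v"}, "D": {"v"}}
swapped = nni(quartet, "u", "v", "B", "C")
print(sorted(swapped["u"]), sorted(swapped["v"]))  # ['A', 'C', 'v'] ['B', 'D', 'u']
```

    On a quartet, the two possible interchanges across the single internal edge yield the two alternative topologies, AC|BD and AD|BC.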

  3. Review article: The mountain motif in the plot of Matthew

    Directory of Open Access Journals (Sweden)

    Gert J. Volschenk

    2010-09-01

    Full Text Available This article reviewed T.L. Donaldson’s book, Jesus on the mountain: A study in Matthean theology, published in 1985 by JSOT Press, Sheffield, and focused on the mountain motif in the structure and plot of the Gospel of Matthew, in addition to the work of Donaldson on the mountain motif as a literary motif and as theological symbol. The mountain is a primary theological setting for Jesus’ ministry and thus is an important setting, serving as one of the literary devices by which Matthew structured and progressed his narrative. The Zion theological and eschatological significance and Second Temple Judaism serve as the historical and theological background for the mountain motif. The last mountain setting (Mt 28:16–20) is the culmination of the three theological themes in the plot of Matthew, namely Christology, ecclesiology and salvation history.

  4. Phylogenetic and Pathotypic Characterization of a Newcastle Disease Virus Strain Isolated from Ducks and Pigeons in Hubei, China

    Directory of Open Access Journals (Sweden)

    Y Wang

    Full Text Available ABSTRACT Newcastle disease is a highly contagious disease responsible for major outbreaks and considerable economic losses in the poultry industry in China. Little information is still available on the genetic characterization of Newcastle disease virus (NDV), especially in ducks and pigeons. The aim of this study was therefore to investigate NDV isolated from ducks and pigeons in Hubei, China. Three NDVs were isolated from ducks and pigeons between 2013 and 2015. The fusion protein (F) gene of the NDV isolates was sequenced and phylogenetically analyzed, and the clinical signs and gross histopathological lesions were examined. Phylogenetic analysis classified all the sequences as genotype II. The isolates shared the motif 112G-R-Q-G-R-L117 at the F protein cleavage site, indicating that all three isolates are lentogenic. Necropsy and histopathology showed the typical pathological changes. It was concluded that commercial ducks and pigeons in Hubei province carry lentogenic NDV strains with regular genetic divergence, indicating that these species may act as the main reservoirs of NDV in poultry. Therefore, strategies and surveillance should be undertaken to reduce the risk of ND outbreaks.
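
    The lentogenic call above rests on the F protein cleavage-site sequence. A commonly used molecular criterion (paraphrased here as an assumption from the WOAH/OIE definition of virulent NDV) treats a site as virulent-type when at least three of residues 113-116 are arginine or lysine and residue 117 is phenylalanine. The helper below is a hypothetical sketch of that rule, not the authors' method; an in vivo pathogenicity test is still needed for a definitive call.

```python
def has_virulent_cleavage_motif(site_112_117):
    """Simplified molecular criterion: >= 3 R/K among residues 113-116 and F at 117."""
    site = site_112_117.upper().replace("-", "")
    if len(site) != 6:
        raise ValueError("expected the six residues 112-117, e.g. 'G-R-Q-G-R-L'")
    basic = sum(aa in "RK" for aa in site[1:5])  # residues 113-116
    return basic >= 3 and site[5] == "F"          # residue 117


print(has_virulent_cleavage_motif("G-R-Q-G-R-L"))  # False (lentogenic-type motif)
print(has_virulent_cleavage_motif("RRQKRF"))       # True
```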

  5. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Science.gov (United States)

    Fauteux, François; Strömvik, Martina V

    2009-01-01

    Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. 
Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs
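
    Scoring promoters for known cis-elements ultimately comes down to counting motif occurrences on both DNA strands. The sketch below assumes the canonical cores CATGCA (RY) and ACGT; the motifs found in the study are degenerate "RY-like" and "ACGT-like" variants discovered de novo and were used to score promoters in a different way, so this exact-match count is only an illustration. The helper names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(promoter, core):
    """Count distinct start positions where `core` occurs on either strand (overlaps allowed).

    A palindromic core such as ACGT is its own reverse complement, so each site is
    counted once; overlapping forward and reverse hits at different positions both count.
    """
    seq, core = promoter.upper(), core.upper()
    starts = set()
    for pattern in {core, revcomp(core)}:
        starts.update(m.start() for m in re.finditer(f"(?={pattern})", seq))
    return len(starts)


promoter = "TTGCATGCAAACGTGG"  # hypothetical sequence
print(count_sites(promoter, "CATGCA"), count_sites(promoter, "ACGT"))  # 2 1
```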

  6. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Directory of Open Access Journals (Sweden)

    Fauteux François

    2009-10-01

    Full Text Available Abstract Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. 
Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination

  7. Evaluating the microbial diversity of an in vitro model of the human large intestine by phylogenetic microarray analysis

    NARCIS (Netherlands)

    Rajilic-Stojanovic, M.; Maathuis, A.; Heilig, G.H.J.; Venema, K.; Vos, de W.M.; Smidt, H.

    2010-01-01

    A high-density phylogenetic microarray targeting small subunit rRNA (SSU rRNA) sequences of over 1000 microbial phylotypes of the human gastrointestinal tract, the HITChip, was used to assess the impact of faecal inoculum preparation and operation conditions on an in vitro model of the human large

  8. On the origin of distribution patterns of motifs in biological networks

    Directory of Open Access Journals (Sweden)

    Lesk Arthur M

    2008-08-01

    Full Text Available Abstract Background Inventories of small subgraphs in biological networks have identified commonly-recurring patterns, called motifs. The inference that these motifs have been selected for function rests on the idea that their occurrences are significantly more frequent than random. Results Our analysis of several large biological networks suggests, in contrast, that the frequencies of appearance of common subgraphs are similar in natural and corresponding random networks. Conclusion Indeed, certain topological features of biological networks give rise naturally to the common appearance of the motifs. We therefore question whether frequencies of occurrences are reasonable evidence that the structures of motifs have been selected for their functional contribution to the operation of networks.
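
    The comparison at the heart of this argument, observed subgraph counts versus counts in "corresponding random networks", can be sketched for one common motif, the feed-forward loop (FFL). The assumptions are that the network is a simple directed graph given as an edge list and that "corresponding" means degree-preserving randomization by edge swaps; the abstract states neither the authors' null model nor their motif set, and all names below are hypothetical.

```python
import random


def count_ffl(edges):
    """Count feed-forward loops (a->b, b->c, a->c on three distinct nodes)."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    count = 0
    for a, outs in succ.items():
        for b in outs:
            for c in succ.get(b, ()):
                if len({a, b, c}) == 3 and c in outs:
                    count += 1
    return count


def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization by repeated swaps (a,b),(c,d) -> (a,d),(c,b)."""
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Skip swaps that would create self-loops or duplicate edges.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def ffl_vs_random(edges, n_random=100, seed=0):
    """Observed FFL count and the mean count over degree-preserving random networks."""
    rng = random.Random(seed)
    null = [count_ffl(rewire(edges, 10 * len(edges), rng)) for _ in range(n_random)]
    return count_ffl(edges), sum(null) / n_random


edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 1), (2, 4), (4, 1)]
print(ffl_vs_random(edges, n_random=20))
```

    If the observed count sits close to the random-network mean, frequency alone is weak evidence of selection, which is the point the authors raise.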

  9. Distance-dependent duplex DNA destabilization proximal to G-quadruplex/i-motif sequences

    Science.gov (United States)

    König, Sebastian L. B.; Huppert, Julian L.; Sigel, Roland K. O.; Evans, Amanda C.

    2013-01-01

    G-quadruplexes and i-motifs are complementary examples of non-canonical nucleic acid substructure conformations. G-quadruplex thermodynamic stability has been extensively studied for a variety of base sequences, but the degree of duplex destabilization that adjacent quadruplex structure formation can cause has yet to be fully addressed. Stable in vivo formation of these alternative nucleic acid structures is likely to be highly dependent on whether sufficient spacing exists between neighbouring duplex- and quadruplex-/i-motif-forming regions to accommodate quadruplexes or i-motifs without disrupting duplex stability. Prediction of putative G-quadruplex-forming regions is likely to be assisted by further understanding of what distance (number of base pairs) is required for duplexes to remain stable as quadruplexes or i-motifs form. Using oligonucleotide constructs derived from previously reported G-quadruplexes and the i-motif-forming bcl-2 P1 promoter region, initial biophysical stability studies indicate that the formation of G-quadruplex and i-motif conformations does destabilize proximal duplex regions. The undermining effect that quadruplex formation can have on duplex stability is mitigated with increased distance from the duplex region: a spacing of five base pairs or more is sufficient to maintain duplex stability proximal to predicted quadruplex/i-motif-forming regions. PMID:23771141
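
    Predicting putative G-quadruplex- and i-motif-forming regions, mentioned above, is commonly done with a pattern of four G-runs (or C-runs for i-motifs) separated by short loops. This sketch assumes the widely used rule of runs of at least three G separated by loops of 1-7 nt; it is not the procedure used in this study, and `putative_structures` is a hypothetical helper.

```python
import re

# Assumed folding rule: four runs of >= 3 G separated by loops of 1-7 nt;
# the C-rich version marks candidate i-motif regions.
G4 = re.compile(r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}")
IMOTIF = re.compile(r"C{3,}(?:[ACGTN]{1,7}C{3,}){3}")


def putative_structures(seq):
    """Return (start, end) spans of candidate G-quadruplex and i-motif regions."""
    seq = seq.upper()
    return {"G-quadruplex": [m.span() for m in G4.finditer(seq)],
            "i-motif": [m.span() for m in IMOTIF.finditer(seq)]}


print(putative_structures("GGGTTAGGGTTAGGGTTAGGG"))
# {'G-quadruplex': [(0, 21)], 'i-motif': []}
```

    The returned spans could then be compared with the positions of adjacent duplex regions to check whether the five-base-pair spacing reported above is met.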

  10. A format for phylogenetic placements.

    Directory of Open Access Journals (Sweden)

    Frederick A Matsen

    Full Text Available We have developed a unified format for phylogenetic placements, that is, mappings of environmental sequence data (e.g., short reads) into a phylogenetic tree. We are motivated to do so by the growing number of tools for computing and post-processing phylogenetic placements, and the lack of an established standard for storing them. The format is lightweight, versatile, extensible, and is based on the JSON format, which can be parsed by most modern programming languages. Our format is already implemented in several tools for computing and post-processing parsimony- and likelihood-based phylogenetic placements and has worked well in practice. We believe that establishing a standard format for analyzing read placements at this early stage will lead to more efficient development of powerful and portable post-analysis tools for the growing applications of phylogenetic placement.
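
    Because the format is plain JSON, any standard JSON parser can read a placement file. The sketch below assumes the layout used by the format (top-level `tree`, `placements`, `fields`, `version` and `metadata`; each placement with `p` rows and names in `n`, or `[name, multiplicity]` pairs in `nm`). The field names in the example are those typically written by likelihood-based placement tools and, like the toy tree, are illustrative assumptions; `placements_by_read` is a hypothetical helper.

```python
import json


def placements_by_read(jplace_text):
    """Map each read name to its candidate placements as dicts keyed by "fields"."""
    doc = json.loads(jplace_text)
    fields = doc["fields"]
    result = {}
    for entry in doc["placements"]:
        rows = [dict(zip(fields, row)) for row in entry["p"]]
        # Names come either as "n" (plain list) or "nm" ([name, multiplicity] pairs).
        names = entry.get("n") or [name for name, _ in entry.get("nm", [])]
        for name in names:
            result[name] = rows
    return result


example = json.dumps({
    "tree": "((A:0.1{0},B:0.2{1}):0.05{2},C:0.3{3});",
    "fields": ["edge_num", "likelihood", "like_weight_ratio",
               "distal_length", "pendant_length"],
    "placements": [{"p": [[0, -123.4, 0.9, 0.05, 0.01],
                          [1, -125.6, 0.1, 0.10, 0.02]],
                    "n": ["read_1"]}],
    "version": 3,
    "metadata": {"invocation": "hypothetical example"},
})
best = max(placements_by_read(example)["read_1"], key=lambda p: p["like_weight_ratio"])
print(best["edge_num"])  # 0
```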

  11. Annotating RNA motifs in sequences and alignments.

    Science.gov (United States)

    Gardner, Paul P; Eldai, Hisham

    2015-01-01

    RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment, and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of an RNA and the determination of its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure-function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs--RMfam--and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Thermal Stability of Modified i-Motif Oligonucleotides with Naphthalimide Intercalating Nucleic Acids

    DEFF Research Database (Denmark)

    El-Sayed, Ahmed Ali; Pedersen, Erik B.; Khaireldin, Nahid Y.

    2016-01-01

    In continuation of our investigation of the characteristics and thermodynamic properties of the i-motif 5′-d[(CCCTAA)3CCCT] upon insertion of intercalating nucleotides into the cytosine-rich oligonucleotide, this article evaluates the stabilities of i-motif oligonucleotides upon insertion of naphthalimide (1H-benzo[de]isoquinoline-1,3(2H)-dione) as the intercalating nucleic acid. The stabilities of i-motif structures with inserted naphthalimide intercalating nucleotides were studied using UV melting temperatures (Tm) and circular dichroism spectra at different pH values and conditions (crowding...
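
    A UV melting temperature is often read off as the midpoint of a normalized melting curve. The sketch below estimates that midpoint by linear interpolation. It assumes baseline-corrected data already scaled to a fraction unfolded between 0 and 1 and sampled at increasing temperatures; the study's own Tm procedure is not described in the abstract, and `midpoint_tm` is a hypothetical helper.

```python
def midpoint_tm(temperatures, fraction_unfolded):
    """Temperature at which a normalized melting curve crosses 0.5, by linear interpolation.

    Assumes baseline-corrected data scaled so 0 = fully folded and 1 = fully unfolded,
    sampled at increasing temperatures.
    """
    points = list(zip(temperatures, fraction_unfolded))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 != f1 and (f0 - 0.5) * (f1 - 0.5) <= 0:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("curve does not cross 0.5")


print(round(midpoint_tm([20, 30, 40, 50], [0.0, 0.2, 0.8, 1.0]), 1))  # 35.0
```

    For i-motifs, the curve would need to be recorded separately at each pH, since their stability depends strongly on cytosine protonation.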

  13. Phylogenetic signal dissection identifies the root of starfishes.

    Directory of Open Access Journals (Sweden)

    Roberto Feuda

    Full Text Available Relationships within the class Asteroidea have remained controversial for almost 100 years and, despite many attempts to resolve this problem using molecular data, no consensus has yet emerged. Using two nuclear genes and a taxon sampling covering the major asteroid clades we show that non-phylogenetic signal created by three factors--Long Branch Attraction, compositional heterogeneity and the use of poorly fitting models of evolution--have confounded accurate estimation of phylogenetic relationships. To overcome the effect of this non-phylogenetic signal we analyse the data using non-homogeneous models, site stripping and the creation of subpartitions aimed to reduce or amplify the systematic error, and calculate Bayes Factor support for a selection of previously suggested topological arrangements of asteroid orders. We show that most of the previous alternative hypotheses are not supported in the most reliable data partitions, including the previously suggested placement of either Forcipulatida or Paxillosida as sister group to the other major branches. The best-supported solution places Velatida as the sister group to other asteroids, and the implications of this finding for the morphological evolution of asteroids are presented.

  14. Phylogenetic inertia and Darwin's higher law.

    Science.gov (United States)

    Shanahan, Timothy

    2011-03-01

    The concept of 'phylogenetic inertia' is routinely deployed in evolutionary biology as an alternative to natural selection for explaining the persistence of characteristics that appear sub-optimal from an adaptationist perspective. However, in many of these contexts the precise meaning of 'phylogenetic inertia' and its relationship to selection are far from clear. After tracing the history of the concept of 'inertia' in evolutionary biology, I argue that treating phylogenetic inertia and natural selection as alternative explanations is mistaken because phylogenetic inertia is, from a Darwinian point of view, simply an expected effect of selection. Although Darwin did not discuss 'phylogenetic inertia,' he did assert the explanatory priority of selection over descent. An analysis of 'phylogenetic inertia' provides a perspective from which to assess Darwin's view. Copyright © 2010 Elsevier Ltd. All rights reserved.

  15. Sequential dynamics in the motif of excitatory coupled elements

    Science.gov (United States)

    Korotkov, Alexander G.; Kazakov, Alexey O.; Osipov, Grigory V.

    2015-11-01

    In this article, a new model of a motif (a small ensemble) of neuron-like elements is proposed. It is built using the generalized Lotka-Volterra model with excitatory couplings. The main motivation for this work comes from neuroscience, where excitatory couplings have been shown to be the predominant type of interaction between neurons in the brain. We show that, depending on the type of coupling between the elements, the system operates in one of two modes: a mode with a stable heteroclinic cycle and a mode with a stable limit cycle. Our second goal is to examine the chaotic dynamics of the generalized three-dimensional Lotka-Volterra model.
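For readers who want to experiment with this class of model, the sketch below integrates a generic generalized Lotka-Volterra system, dx_i/dt = x_i (g_i - Σ_j ρ_ij x_j), with a fixed-step fourth-order Runge-Kutta scheme. The coupling matrix `rho`, the growth rates `growth` and the step size are placeholders, not the parameters of the model proposed in the paper. In this sign convention, a negative off-diagonal entry ρ_ij lets element j raise the growth rate of element i (excitatory), and a positive entry suppresses it (inhibitory).

```python
import numpy as np


def glv_rhs(x, rho, growth):
    """Right-hand side dx_i/dt = x_i * (growth_i - sum_j rho_ij * x_j)."""
    return x * (growth - rho @ x)


def simulate_glv(x0, rho, growth, dt=0.01, steps=2000):
    """Fixed-step RK4 integration; returns an array of shape (steps + 1, n)."""
    x = np.asarray(x0, dtype=float)
    rho = np.asarray(rho, dtype=float)
    growth = np.asarray(growth, dtype=float)
    traj = np.empty((steps + 1, x.size))
    traj[0] = x
    for step in range(steps):
        k1 = glv_rhs(x, rho, growth)
        k2 = glv_rhs(x + 0.5 * dt * k1, rho, growth)
        k3 = glv_rhs(x + 0.5 * dt * k2, rho, growth)
        k4 = glv_rhs(x + dt * k3, rho, growth)
        x = x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj[step + 1] = x
    return traj


# Sanity check: uncoupled elements (rho = identity) each relax logistically to 1.
traj = simulate_glv([0.1, 0.2, 0.3], np.eye(3), np.ones(3))
print(traj[-1])
```

With `rho = np.eye(3)` the elements are uncoupled and each follows the logistic equation towards x_i = 1, which is a convenient check before substituting an actual three-element coupling matrix.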

  16. Quartet-net: a quartet-based method to reconstruct phylogenetic networks.

    Science.gov (United States)

    Yang, Jialiang; Grünewald, Stefan; Wan, Xiu-Feng

    2013-05-01

    Phylogenetic networks can model reticulate evolutionary events such as hybridization, recombination, and horizontal gene transfer. However, reconstructing such networks is not trivial. Popular character-based methods are computationally inefficient, whereas distance-based methods cannot guarantee reconstruction accuracy because pairwise genetic distances only reflect partial information about a reticulate phylogeny. To balance accuracy and computational efficiency, here we introduce a quartet-based method to construct a phylogenetic network from a multiple sequence alignment. Unlike distances, which only reflect the relationship between a pair of taxa, quartets contain information on the relationships among four taxa; these quartets provide adequate capacity to infer a more accurate phylogenetic network. In applications to simulated and biological data sets, we demonstrate that this novel method is robust and effective in reconstructing reticulate evolutionary events and that it has the potential to infer more accurate phylogenetic distances than other conventional phylogenetic network construction methods such as Neighbor-Joining, Neighbor-Net, and Split Decomposition. This method can be used to construct phylogenetic networks ranging from simple evolutionary histories involving a few reticulate events to complex ones involving a large number of reticulate events. The software, "Quartet-Net", is implemented and available at http://sysbio.cvm.msstate.edu/QuartetNet/.
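Why quartets carry more information than single pairwise distances can be illustrated with the classic four-point rule: for four taxa, the quartet topology whose within-pair distances have the smallest sum is the one best supported by an additive distance matrix. The sketch below illustrates that rule only; it is not the Quartet-Net algorithm (which works from a multiple sequence alignment), and the function names and nested-dictionary distance format are assumptions.

```python
from itertools import combinations


def quartet_topology(d, w, x, y, z):
    """Return the split (pair, pair) of four taxa minimizing the within-pair distance sum."""
    splits = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
    return min(splits, key=lambda s: d[s[0][0]][s[0][1]] + d[s[1][0]][s[1][1]])


def all_quartets(d, taxa):
    """Infer a topology for every 4-taxon subset (O(n^4) quartets)."""
    return {q: quartet_topology(d, *q) for q in combinations(taxa, 4)}


# Path lengths on the unit-branch unrooted tree ((A,B),(C,D)).
taxa = ["A", "B", "C", "D"]
pair_dist = {("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 3,
             ("A", "D"): 3, ("B", "C"): 3, ("B", "D"): 3}
d = {t: {} for t in taxa}
for (u, v), dist in pair_dist.items():
    d[u][v] = d[v][u] = dist

print(quartet_topology(d, "A", "B", "C", "D"))  # (('A', 'B'), ('C', 'D'))
```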

  17. Binding properties of SUMO-interacting motifs (SIMs) in yeast.

    Science.gov (United States)

    Jardin, Christophe; Horn, Anselm H C; Sticht, Heinrich

    2015-03-01

    Small ubiquitin-like modifier (SUMO) conjugation and interaction play an essential role in many cellular processes. A large number of yeast proteins are known to interact non-covalently with SUMO via short SUMO-interacting motifs (SIMs), but the structural details of this interaction remain poorly characterized. In the present work, sequence analysis of a large dataset of 148 yeast SIMs revealed the existence of a hydrophobic core binding motif and a preference for acidic residues either within or adjacent to the core motif. Thus, the sequence properties of yeast SIMs are highly similar to those described for human SIMs. Molecular dynamics simulations were performed to investigate the binding preferences of four representative SIM peptides differing in the number and distribution of acidic residues. Furthermore, the relative stability of two previously observed alternative binding orientations (parallel, antiparallel) was assessed. For all SIMs investigated, the antiparallel binding mode remained stable in the simulations, and the SIMs were tightly bound via their hydrophobic core residues supplemented by polar interactions of the acidic residues. In contrast, the stability of the parallel binding mode depends more on the sequence features of the SIM, such as the number and position of acidic residues or the presence of additional adjacent interaction motifs. This information should help improve the prediction of SIMs and their binding properties in different organisms and facilitate the reconstruction of the SUMO interactome.

  18. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun; Li, Yue; Peng, Chengbin

    2015-01-01

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of DNA motifs is a vital task for downstream analysis. Unfortunately, information on the long-range coupling between different DNA motifs is still lacking. To fill this gap, in a first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human cells. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved match existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also statistically enriched in promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics, such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the mechanism of gene transcription under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.
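Following the definition above, motif pairing multiplicity is the number of distinct motifs paired with a given motif. A minimal sketch, assuming the coupled pairs are already available as (motif, motif) tuples (the input format and function name are hypothetical), counts distinct partners per motif:

```python
from collections import defaultdict


def pairing_multiplicity(pairs):
    """Map each motif to the number of distinct motifs it is paired with.

    Pairs are treated as undirected; repeated observations of the same
    pair are counted once. A self-pair (m, m) counts m as its own partner.
    """
    partners = defaultdict(set)
    for m1, m2 in pairs:
        partners[m1].add(m2)
        partners[m2].add(m1)
    return {motif: len(p) for motif, p in partners.items()}


observed = [("M1", "M2"), ("M1", "M3"), ("M2", "M1")]
print(pairing_multiplicity(observed))  # {'M1': 2, 'M2': 1, 'M3': 1}
```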

  19. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun

    2015-09-27

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of DNA motifs is a vital task for downstream analysis. Unfortunately, information on the long-range coupling between different DNA motifs is still lacking. To fill this gap, in a first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human cells. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved match existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also statistically enriched in promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics, such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the mechanism of gene transcription under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.

  20. The transposition distance for phylogenetic trees

    OpenAIRE

    Rossello, Francesc; Valiente, Gabriel

    2006-01-01

    The search for similarity and dissimilarity measures on phylogenetic trees has been motivated by the computation of consensus trees, the search by similarity in phylogenetic databases, and the assessment of clustering results in bioinformatics. The transposition distance for fully resolved phylogenetic trees is a recent addition to the extensive collection of available metrics for comparing phylogenetic trees. In this paper, we generalize the transposition distance from fully resolved to arbi...

  1. Identification of the divergent calmodulin binding motif in yeast Ssb1/Hsp75 protein and in other HSP70 family members.

    Science.gov (United States)

    Heinen, R C; Diniz-Mendes, L; Silva, J T; Paschoalin, V M F

    2006-11-01

    Yeast soluble proteins were fractionated by calmodulin-agarose affinity chromatography and the Ca2+/calmodulin-binding proteins were analyzed by SDS-PAGE. One prominent protein of 66 kDa was excised from the gel, digested with trypsin, and the masses of the resulting fragments were determined by MALDI/MS. Twenty-one of 38 monoisotopic peptide masses obtained after tryptic digestion were matched to the heat shock protein Ssb1/Hsp75, covering 37% of its sequence. Computational analysis of the primary structure of Ssb1/Hsp75 identified a unique potential amphipathic alpha-helix in its N-terminal ATPase domain with features of target regions for Ca2+/calmodulin binding. This region, which shares 89% similarity with the experimentally determined calmodulin-binding domain of mouse Hsc70, is conserved in nearly half of the 113 members of the HSP70 family investigated, from yeast to plants and animals. Based on the sequence of this region, phylogenetic analysis grouped the HSP70s into three distinct branches. Two of them comprise the non-calmodulin-binding Hsp70s BiP/GRP78, a subfamily of eukaryotic HSP70 localized in the endoplasmic reticulum, and DnaK, a subfamily of prokaryotic HSP70. A third, heterogeneous group is formed by eukaryotic cytosolic HSP70s containing the new calmodulin-binding motif and other cytosolic HSP70s whose sequences do not conform to this conserved motif, indicating that not all eukaryotic cytosolic Hsp70s are targets of calmodulin regulation. Furthermore, the calmodulin-binding domain found in eukaryotic HSP70s is also the binding target of Bag-1, an enhancer of the ADP/ATP exchange activity of Hsp70s. A model in which calmodulin displaces Bag-1 and modulates Ssb1/Hsp75 chaperone activity is discussed.

  2. Identification of the divergent calmodulin binding motif in yeast Ssb1/Hsp75 protein and in other HSP70 family members

    Directory of Open Access Journals (Sweden)

    R.C. Heinen

    2006-11-01

    Yeast soluble proteins were fractionated by calmodulin-agarose affinity chromatography and the Ca2+/calmodulin-binding proteins were analyzed by SDS-PAGE. One prominent protein of 66 kDa was excised from the gel, digested with trypsin, and the masses of the resulting fragments were determined by MALDI/MS. Twenty-one of 38 monoisotopic peptide masses obtained after tryptic digestion were matched to the heat shock protein Ssb1/Hsp75, covering 37% of its sequence. Computational analysis of the primary structure of Ssb1/Hsp75 identified a unique potential amphipathic alpha-helix in its N-terminal ATPase domain with features of target regions for Ca2+/calmodulin binding. This region, which shares 89% similarity with the experimentally determined calmodulin-binding domain of mouse Hsc70, is conserved in nearly half of the 113 members of the HSP70 family investigated, from yeast to plants and animals. Based on the sequence of this region, phylogenetic analysis grouped the HSP70s into three distinct branches. Two of them comprise the non-calmodulin-binding Hsp70s BiP/GRP78, a subfamily of eukaryotic HSP70 localized in the endoplasmic reticulum, and DnaK, a subfamily of prokaryotic HSP70. A third, heterogeneous group is formed by eukaryotic cytosolic HSP70s containing the new calmodulin-binding motif and other cytosolic HSP70s whose sequences do not conform to this conserved motif, indicating that not all eukaryotic cytosolic Hsp70s are targets of calmodulin regulation. Furthermore, the calmodulin-binding domain found in eukaryotic HSP70s is also the binding target of Bag-1, an enhancer of the ADP/ATP exchange activity of Hsp70s. A model in which calmodulin displaces Bag-1 and modulates Ssb1/Hsp75 chaperone activity is discussed.

  3. Phylogenetic trees in bioinformatics

    Energy Technology Data Exchange (ETDEWEB)

    Burr, Tom L [Los Alamos National Laboratory]

    2008-01-01

    Genetic data are often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTUs). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length; and the huge number of possible trees even for a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on the aspect of bioinformatics that includes the study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.
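The "huge number of possible trees" can be made concrete with standard counts: the number of distinct unrooted, fully bifurcating topologies on n labelled taxa is (2n-5)!! = 3 × 5 × ... × (2n-5), and the number of rooted ones is (2n-3)!!. The short functions below (names are illustrative) evaluate these counts; they ignore branch lengths entirely, so they measure only the size of the topology search space.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """(2n-5)!! unrooted binary topologies on n >= 3 labelled leaves."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def num_rooted_binary_trees(n: int) -> int:
    """(2n-3)!! rooted binary topologies: a rooted tree on n leaves is an unrooted tree on n+1."""
    return num_unrooted_binary_trees(n + 1)


for n in (4, 5, 10):
    print(n, num_unrooted_binary_trees(n), num_rooted_binary_trees(n))
```

For example, 10 taxa already admit 2,027,025 unrooted topologies, which is why exhaustive search quickly gives way to heuristics.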

  4. Efficient sequential and parallel algorithms for finding edit distance based motifs.

    Science.gov (United States)

    Pal, Soumitra; Xiao, Peng; Rajasekaran, Sanguthevar

    2016-08-18

    Motif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable, and there is a pressing need to develop efficient exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (l,d) Edit-distance-based Motif Search (EMS) problem: given two integers l, d and n biological strings, find all strings of length l that appear in each input string with at most d errors of types substitution, insertion and deletion. One popular technique to solve the problem is to explore, for each input string, the set of all possible l-mers that belong to the d-neighborhood of any substring of the input string and output those which are common to all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in the neighborhood which are at a distance exactly d. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie-based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared-memory setting uses arrays for storing and a novel modification of radix sort for sorting the candidate motifs. The algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and solves instances up to (16,3) in an estimated 3 days. Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3) and (13,4), our algorithms are much faster. Our parallel algorithm has more than 600% scaling performance while using 16 threads. Our algorithms have advanced the state of the art of EMS solvers and we believe that the techniques introduced in
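To make the (l,d) EMS problem statement concrete, the sketch below solves tiny instances by brute force: it enumerates all 4^l candidate l-mers and keeps those within edit distance d of some substring of every input string, using the standard semi-global dynamic program (the first row is zero, so a match may start anywhere in the text). This is not the neighborhood-exploration, trie or radix-sort algorithm described in the abstract; it is exponential in l and is meant only as a correctness reference for small inputs. The function names are hypothetical.

```python
from itertools import product


def min_edit_to_substring(pattern, text):
    """Smallest edit distance between pattern and any substring of text (semi-global DP)."""
    m = len(pattern)
    prev = list(range(m + 1))  # empty text prefix: i pattern characters unmatched
    best = prev[m]
    for ch in text:
        cur = [0]  # a substring may start at any text position for free
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match / substitution
                           prev[i] + 1,         # extra text character
                           cur[i - 1] + 1))     # unmatched pattern character
        best = min(best, cur[m])
        prev = cur
    return best


def ems_brute_force(strings, l, d, alphabet="ACGT"):
    """All l-mers within edit distance d of some substring of every input string."""
    motifs = []
    for letters in product(alphabet, repeat=l):
        cand = "".join(letters)
        if all(min_edit_to_substring(cand, s) <= d for s in strings):
            motifs.append(cand)
    return motifs


print(ems_brute_force(["ACGT", "ACGA"], 3, 0))  # ['ACG']
```

With d = 0 the search reduces to exact shared substrings, which makes small cases easy to check by hand before comparing against a fast solver.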

  5. Physical-chemical property based sequence motifs and methods regarding same

    Science.gov (United States)

    Braun, Werner [Friendswood, TX]; Mathura, Venkatarajan S [Sarasota, FL]; Schein, Catherine H [Friendswood, TX]

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  6. Computational identification and phylogenetic analysis of the oil-body structural proteins, oleosin and caleosin, in castor bean and flax.

    Science.gov (United States)

    Hyun, Tae Kyung; Kumar, Dhinesh; Cho, Young-Yeol; Hyun, Hae-Nam; Kim, Ju-Sung

    2013-02-25

    Oil bodies (OBs) are intracellular particles derived from oilseeds. These OBs store lipids as a carbon resource and have been exploited for a variety of industrial applications, including biofuels. Oleosin and caleosin are the common OB structural proteins; by stabilizing OBs, they enable biotechnological enhancement of oil content and OB-based pharmaceutical formulations. Although draft whole-genome sequence information for Ricinus communis L. (castor bean) and Linum usitatissimum L. (flax), important oilseed plants, is available in public databases, OB structural proteins in these plants are poorly identified. Therefore, in this study, we performed a comprehensive bioinformatic analysis, including analysis of the genome sequence, conserved domains and phylogenetic relationships, to identify OB structural proteins in the castor bean and flax genomes. Using this analysis, we identified 6 and 15 OB structural proteins from castor bean and flax, respectively. A complete overview of this gene family in castor bean and flax is presented, including the gene structures, phylogeny and conserved motifs, revealing central hydrophobic regions with a proline knot motif and providing evolutionary evidence that this central hydrophobic region evolved from duplications in primitive eukaryotes. In addition, expression analysis of L-oleosin and caleosin genes using quantitative real-time PCR demonstrated that expression was highest in seeds, except for RcCLO-1, which was most highly expressed in cotyledons. Thus, our comparative genomics analysis of oleosin and caleosin genes and their putatively encoded proteins in two non-model plant species provides insights into the prospective use of gene resources for improving OB stability. Copyright © 2012 Elsevier B.V. All rights reserved.

  7. Organization of feed-forward loop motifs reveals architectural principles in natural and engineered networks.

    Science.gov (United States)

    Gorochowski, Thomas E; Grierson, Claire S; di Bernardo, Mario

    2018-03-01

    Network motifs are significantly overrepresented subgraphs that have been proposed as building blocks for natural and engineered networks. Detailed functional analysis has been performed for many types of motif in isolation, but less is known about how motifs work together to perform complex tasks. To address this issue, we measure the aggregation of network motifs via methods that extract precisely how these structures are connected. Applying this approach to a broad spectrum of networked systems and focusing on the widespread feed-forward loop motif, we uncover striking differences in motif organization. The types of connection are often highly constrained, differ between domains, and clearly capture architectural principles. We show how this information can be used to effectively predict functionally important nodes in the metabolic network of Escherichia coli. Our findings have implications for understanding how networked systems are constructed from motif parts and elucidate constraints that guide their evolution.
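Enumerating feed-forward loop (FFL) instances is the raw material for the kind of analysis described above. The sketch below lists every FFL (x→y, y→z and the shortcut x→z) in a directed edge list; it does not implement the paper's measures of how FFLs are connected to one another, and the function name is hypothetical.

```python
def feed_forward_loops(edges):
    """List every (x, y, z) with edges x->y, y->z and x->z in a directed graph.

    Self-loops are ignored; the result is sorted for reproducibility.
    """
    succ = {}
    for u, v in edges:
        if u != v:
            succ.setdefault(u, set()).add(v)
    loops = []
    for x, targets in succ.items():
        for y in targets:
            for z in succ.get(y, ()):
                if z in targets:  # the shortcut edge x->z closes the loop
                    loops.append((x, y, z))
    return sorted(loops)


edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Y", "W"), ("X", "W")]
print(feed_forward_loops(edges))  # [('X', 'Y', 'W'), ('X', 'Y', 'Z')]
```

Both loops in the example share the edge X→Y; grouping instances by shared nodes or edges is one way to start asking how FFLs aggregate.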

  8. RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.

    Science.gov (United States)

    Castro-Mondragon, Jaime Abraham; Jaeger, Sébastien; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2017-07-27

    Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families and drastically reduces motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or via the command line for integration into pipelines. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
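Clustering motifs requires a pairwise similarity between matrices. As a rough illustration (an assumption, not matrix-clustering's actual procedure), the sketch below converts two equal-width count matrices (rows are positions, columns are A, C, G, T) to frequencies and computes the Pearson correlation of the flattened values. A real tool also needs to handle alignment offsets and reverse complements, which this sketch does not; the function names and example matrices are hypothetical.

```python
import numpy as np


def to_frequencies(counts):
    """Normalise each row (position) of a count matrix to sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def motif_correlation(m1, m2):
    """Pearson correlation of two equal-width matrices, flattened (no offsets, no reverse complement)."""
    a = to_frequencies(m1).ravel()
    b = to_frequencies(m2).ravel()
    if a.shape != b.shape:
        raise ValueError("matrices must have the same width")
    return float(np.corrcoef(a, b)[0, 1])


# Hypothetical 3-position count matrices (columns A, C, G, T).
m1 = [[0, 0, 0, 10], [10, 0, 0, 0], [0, 0, 0, 10]]
m2 = [[1, 0, 0, 9], [9, 0, 1, 0], [0, 1, 0, 9]]
m3 = [[0, 5, 5, 0], [0, 10, 0, 0], [0, 0, 10, 0]]
print(motif_correlation(m1, m2) > motif_correlation(m1, m3))  # True
```

A pairwise matrix of such scores could then feed a standard hierarchical clustering routine.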

  9. Structural fragment clustering reveals novel structural and functional motifs in α-helical transmembrane proteins

    Directory of Open Access Journals (Sweden)

    Vassilev Boris

    2010-04-01

    Background: A large proportion of an organism's genome encodes membrane proteins. Membrane proteins are important for many cellular processes, and several diseases can be linked to mutations in them. With the tremendous growth of sequence data, there is an increasing need to reliably identify membrane proteins from sequence, to functionally annotate them, and to correctly predict their topology. Results: We introduce a technique called structural fragment clustering, which learns sequential motifs from 3D structural fragments. From over 500,000 fragments, we obtain 213 statistically significant, non-redundant, and novel motifs that are highly specific to α-helical transmembrane proteins. Of these 213 motifs, 58 were assigned to function and checked in the scientific literature for a biological assessment. Seventy percent of the motifs are found in co-factor, ligand, and ion binding sites, 30% at protein interaction interfaces, and 12% bind specific lipids such as glycerol or cardiolipins. The vast majority of motifs (94%) appear across evolutionarily unrelated families, highlighting the modularity of functional design in membrane proteins. We describe three novel motifs in detail: (1) a dimer interface motif found in voltage-gated chloride channels, (2) a proton transfer motif found in heme-copper oxidases, and (3) a convergently evolved interface helix motif found in an aspartate symporter, a serine protease, and cytochrome b. Conclusions: Our findings suggest that functional modules exist in membrane proteins, and that they occur in completely different evolutionary contexts and cover different binding sites. Structural fragment clustering allows us to link sequence motifs to function through clusters of structural fragments. The sequence motifs can be applied to identify and characterize membrane proteins in novel genomes.

  10. Locating a tree in a phylogenetic network

    NARCIS (Netherlands)

    Iersel, van L.J.J.; Semple, C.; Steel, M.A.

    2010-01-01

    Phylogenetic trees and networks are leaf-labelled graphs that are used to describe evolutionary histories of species. The Tree Containment problem asks whether a given phylogenetic tree is embedded in a given phylogenetic network. Given a phylogenetic network and a cluster of species, the Cluster

  11. [Cover motifs of the Tidsskrift. A 14-year cavalcade].

    Science.gov (United States)

    Nylenna, M

    1998-12-10

    In 1985 the Journal of the Norwegian Medical Association changed its cover policy, moving the table of contents inside the Journal and introducing cover illustrations. This article provides an analysis of all cover illustrations published over this 14-year period, 420 covers in all. There is great variation in cover motifs and designs, and a development towards more general motifs. The initial emphasis on historical and medical aspects is now less pronounced, while the use of works of art and nature motifs has increased, and the cover now more often has a direct bearing on the specific contents of the issue. Professor of medical history Oivind Larsen has photographed two thirds of the covers and contributed 95% of the inside essay-style reflections on the cover motif. Over the years, he has expanded the role of the historian of medicine disseminating knowledge to include that of the raconteur with a personal tone of voice. The Journal's covers are now one of its most characteristic features, emblematic of the Journal's ambition of standing for quality and timelessness vis-à-vis the news media, and of its aim of bridging the gap between medicine and the humanities.

  12. Insights into the motif preference of APOBEC3 enzymes.

    Directory of Open Access Journals (Sweden)

    Diako Ebrahimi

    We used a multivariate data analysis approach to identify motifs associated with HIV hypermutation by different APOBEC3 enzymes. The analysis showed that APOBEC3G targets G mainly within GG, TG, TGG, GGG, TGGG and also GGGT. G nucleotides flanked by a C at the 3' end (in the +1 and +2 positions) were indicated as disfavoured targets of APOBEC3G. G nucleotides within GGGG were found to be targeted at a frequency much lower than expected. We found that the infrequent G-to-A mutation within GGGG is not limited to the inaccessibility, to APOBEC3, of poly-G stretches in the central and 3' polypurine tracts (PPTs), which remain double-stranded during HIV reverse transcription. GGGG motifs outside the PPTs were also disfavoured. The motifs GGAG and GAGG were also found to be disfavoured targets for APOBEC3. Motif-dependent mutation of G within the HIV genome by members of the APOBEC3 family other than APOBEC3G was limited to GA→AA changes. The results did not show evidence of other types of context-dependent G-to-A changes in the HIV genome.
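A simple way to see the context effects described above is to tally G-to-A changes by the base immediately 3' of the mutated G in the reference (GG versus GA, for example). The sketch below assumes a gap-free, position-by-position alignment of a reference and a query sequence on the same strand; it is a plain counting illustration, not the multivariate analysis used in the study, and the function name is hypothetical.

```python
from collections import Counter


def g_to_a_contexts(ref: str, query: str) -> Counter:
    """Count G->A changes in query relative to ref, keyed by 'G' + the downstream reference base."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned and of equal length")
    counts = Counter()
    for i in range(len(ref) - 1):  # the final base has no downstream neighbour
        if ref[i] == "G" and query[i] == "A":
            counts["G" + ref[i + 1]] += 1
    return counts


print(g_to_a_contexts("GGTGAC", "AGTAAC"))  # Counter({'GG': 1, 'GA': 1})
```

Longer contexts (for example GGGG) could be tallied the same way by widening the slice of the reference taken after position i.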

  13. Robustness of ancestral sequence reconstruction to phylogenetic uncertainty.

    Science.gov (United States)

    Hanson-Smith, Victor; Kolaczkowski, Bryan; Thornton, Joseph W

    2010-09-01

    Ancestral sequence reconstruction (ASR) is widely used to formulate and test hypotheses about the sequences, functions, and structures of ancient genes. Ancestral sequences are usually inferred from an alignment of extant sequences using a maximum likelihood (ML) phylogenetic algorithm, which calculates the most likely ancestral sequence assuming a probabilistic model of sequence evolution and a specific phylogeny, typically the maximum-likelihood tree. The true phylogeny is seldom known with certainty, however. ML methods ignore this uncertainty, whereas Bayesian methods incorporate it by integrating the likelihood of each ancestral state over a distribution of possible trees. It is not known whether Bayesian approaches to phylogenetic uncertainty improve the accuracy of inferred ancestral sequences. Here, we use simulation-based experiments under both simplified and empirically derived conditions to compare the accuracy of ASR carried out using ML and Bayesian approaches. We show that incorporating phylogenetic uncertainty by integrating over topologies very rarely changes the inferred ancestral state and does not improve the accuracy of the reconstructed ancestral sequence. Ancestral state reconstructions are robust to uncertainty about the underlying tree because the conditions that produce phylogenetic uncertainty also make the ancestral state identical across plausible trees; conversely, the conditions under which different phylogenies yield different inferred ancestral states produce little or no ambiguity about the true phylogeny. Our results suggest that ML can produce accurate ASRs, even in the face of phylogenetic uncertainty. Using Bayesian integration to incorporate this uncertainty is neither necessary nor beneficial.
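The likelihood-based reconstruction described above can be illustrated on the smallest non-trivial case: one site, a star tree whose single internal node is the ancestor, and the Jukes-Cantor (JC69) model. The posterior of each ancestral base is proportional to its equilibrium frequency times the product, over leaves, of the probability of going from that base to the observed leaf base along each branch. Branch lengths are in expected substitutions per site. Real analyses apply Felsenstein's pruning algorithm to full trees with richer models, and the Bayesian integration over topologies compared in the study is not shown; the names here are hypothetical.

```python
import math

BASES = "ACGT"


def jc69_prob(same: bool, t: float) -> float:
    """JC69 probability of ending in the same (or one specific different) base after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def root_posterior(leaf_states, branch_lengths):
    """Marginal posterior of the ancestral base at the centre of a star tree (single site)."""
    weights = {}
    for x in BASES:
        w = 0.25  # uniform equilibrium frequency under JC69
        for s, t in zip(leaf_states, branch_lengths):
            w *= jc69_prob(x == s, t)
        weights[x] = w
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


post = root_posterior("AAC", [0.1, 0.1, 0.1])
print(max(post, key=post.get), post)
```

Taking the base with the highest posterior gives the ML (marginal) ancestral state; the full posterior shows how much ambiguity remains at that site.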

  14. I-motif DNA structures are formed in the nuclei of human cells

    Science.gov (United States)

    Zeraati, Mahdi; Langley, David B.; Schofield, Peter; Moye, Aaron L.; Rouet, Romain; Hughes, William E.; Bryan, Tracy M.; Dinger, Marcel E.; Christ, Daniel

    2018-06-01

    Human genome function is underpinned by the primary storage of genetic information in canonical B-form DNA, with a second layer of DNA structure providing regulatory control. I-motif structures are thought to form in cytosine-rich regions of the genome and to have regulatory functions; however, in vivo evidence for the existence of such structures has so far remained elusive. Here we report the generation and characterization of an antibody fragment (iMab) that recognizes i-motif structures with high selectivity and affinity, enabling the detection of i-motifs in the nuclei of human cells. We demonstrate that the in vivo formation of such structures is cell-cycle and pH dependent. Furthermore, we provide evidence that i-motif structures are formed in regulatory regions of the human genome, including promoters and telomeric regions. Our results support the notion that i-motif structures play key regulatory roles in the genome.

  15. Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs

    KAUST Repository

    Alam, Tanvir

    2018-03-11

    Short Linear Motifs (SLiMs) contribute to almost every cellular function by connecting appropriate protein partners. Accurate prediction of SLiMs is difficult due to their shortness and sequence degeneracy. Leucine-aspartic acid (LD) motifs are SLiMs that link paxillin family proteins to factors controlling (cancer) cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family is poorly understood. To enable a proteome-wide assessment of these motifs, we developed an active-learning based framework that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome identified a dozen proteins that contain LD motifs, all of which are involved in cell adhesion and migration, and revealed a new type of inverse LD motif consensus. Our evolutionary analysis suggested that LD motif signalling originated in the common unicellular ancestor of opisthokonts and amoebozoa by co-opting nuclear export sequences. Inter-species comparison revealed a conserved LD signalling core and the emergence of species-specific adaptive connections, while the LD motif interactome retains a strong functional focus. Collectively, our data elucidate the mechanisms underlying the origin and adaptation of an ancestral SLiM.

  16. Locating a tree in a phylogenetic network

    OpenAIRE

    van Iersel, Leo; Semple, Charles; Steel, Mike

    2010-01-01

    Phylogenetic trees and networks are leaf-labelled graphs that are used to describe evolutionary histories of species. The Tree Containment problem asks whether a given phylogenetic tree is embedded in a given phylogenetic network. Given a phylogenetic network and a cluster of species, the Cluster Containment problem asks whether the given cluster is a cluster of some phylogenetic tree embedded in the network. Both problems are known to be NP-complete in general. In this article, we consider t...

  17. Nonbinary tree-based phylogenetic networks

    OpenAIRE

    Jetten, Laura; van Iersel, Leo

    2016-01-01

    Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can for example represent gene transfer events. Such phylogenetic networks are called tree-based. Here, we consider two possible generalizations of this concept to nonbinary networks, which we call tree-based and st...

  18. Encoding phylogenetic trees in terms of weighted quartets.

    Science.gov (United States)

    Grünewald, Stefan; Huber, Katharina T; Moulton, Vincent; Semple, Charles

    2008-04-01

    One of the main problems in phylogenetics is to develop systematic methods for constructing evolutionary or phylogenetic trees. For a set of species X, an edge-weighted phylogenetic X-tree or phylogenetic tree is a (graph theoretical) tree with leaf set X and no degree 2 vertices, together with a map assigning a non-negative length to each edge of the tree. Within phylogenetics, several methods have been proposed for constructing such trees that work by trying to piece together quartet trees on X, i.e. phylogenetic trees each having four leaves in X. Hence, it is of interest to characterise when a collection of quartet trees corresponds to a (unique) phylogenetic tree. Recently, Dress and Erdös provided such a characterisation for binary phylogenetic trees, that is, phylogenetic trees all of whose internal vertices have degree 3. Here we provide a new characterisation for arbitrary phylogenetic trees.
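
    To make the notion of a quartet tree concrete: for a tree with positive edge lengths, the topology it induces on four leaves can be read off path-length sums (the four-point condition). The induced split has the strictly smallest sum, and a three-way tie signals an unresolved (star) quartet, which can arise in non-binary trees. The sketch below illustrates only that condition. It is not the characterisation proved in the article, and the names `quartet_split` and `make_dist` are hypothetical.

```python
def quartet_split(dist, w, x, y, z, tol=1e-9):
    """Return the split induced on leaves w, x, y, z as a pair of pairs,
    or None if the quartet is unresolved (no strictly smallest sum)."""
    candidates = [
        (dist(w, x) + dist(y, z), ((w, x), (y, z))),
        (dist(w, y) + dist(x, z), ((w, y), (x, z))),
        (dist(w, z) + dist(x, y), ((w, z), (x, y))),
    ]
    candidates.sort(key=lambda item: item[0])
    # Four-point condition: the other two sums are equal and exceed the
    # smallest one by twice the length of the internal path.
    if candidates[1][0] - candidates[0][0] <= tol:
        return None
    return candidates[0][1]


def make_dist(table):
    """Wrap a {frozenset({u, v}): distance} table as a distance function."""
    return lambda u, v: table[frozenset((u, v))]


# ((a,b),(c,d)) with unit pendant edges and an internal edge of length 1
resolved = make_dist({
    frozenset("ab"): 2, frozenset("cd"): 2,
    frozenset("ac"): 3, frozenset("ad"): 3,
    frozenset("bc"): 3, frozenset("bd"): 3,
})
print(quartet_split(resolved, "a", "b", "c", "d"))  # (('a', 'b'), ('c', 'd'))
```

    The sorting key compares only the sums, so ties never force a comparison of the leaf tuples.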

  19. A CRE/AP-1-like motif is essential for induced syncytin-2 expression and fusion in human trophoblast-like model.

    Directory of Open Access Journals (Sweden)

    Chirine Toufaily

    Syncytin-2 is encoded by the envelope gene of Endogenous Retrovirus-FRD (ERVFRD-1) and plays a critical role in the fusion of placental trophoblasts, leading to the formation of the multinucleated syncytiotrophoblast. Its expression is consequently tightly regulated. In the present study, we identified a forskolin-responsive region located between positions -300 and -150 of the Syncytin-2 promoter. In the context of a minimal promoter, this 150 bp region mediated an 80-fold induction of promoter activity following forskolin stimulation. EMSA competition experiments with nuclear extracts from forskolin-stimulated BeWo cells demonstrated that the -211 to -177 region specifically bound two forskolin-induced complexes, one of them involving a CRE/AP-1-like motif. Site-directed mutagenesis of the CRE/AP-1 binding site in the context of the Syncytin-2 promoter or a heterologous promoter showed that this motif was largely essential for forskolin-induced promoter activity. Transfection experiments with dominant-negative mutants and constitutively activated CREB expression vectors, together with chromatin immunoprecipitation, suggested that a CREB family member, CREB2, binds and acts through the CRE/AP-1 motif. We further demonstrated the binding of JunD to this same motif. Like forskolin and soluble cAMP, CREB2 and JunD overexpression induced both Syncytin-2 promoter activity, in a CRE/AP-1-dependent manner, and Syncytin-2 expression. In addition, BeWo cell fusion was induced by both CREB2 and JunD overexpression and was repressed following silencing of either gene. These results demonstrate that induced expression of Syncytin-2 is highly dependent on the binding of bZIP-containing transcription factors to a CRE/AP-1 motif, and that this element is important for the regulation of Syncytin-2 expression, which results in the formation of the peripheral syncytiotrophoblast layer.

  20. Sequence Analysis and Phylogenetic Profiling of the Nonstructural (NS) Genes of H9N2 Influenza A Viruses Isolated in Iran during 1998-2007

    Directory of Open Access Journals (Sweden)

    Ebrahimi, M.

    2014-11-01

    The earliest evidence of Avian Influenza (AI) virus circulation on Iranian poultry farms dates back to 1998. Great economic losses, through a dramatic drop in egg production and high mortality rates, are characteristically attributed to the H9N2 AI virus. In the present work, the non-structural (NS) genes of 10 Iranian H9N2 chicken AI viruses collected during 1998-2007 were fully sequenced and subjected to phylogenetic analysis. The observations showed that allele A was the only detectable type of the NS gene among the studied isolates. All the examined Iranian isolates fell into the Korean sublineage, with relatively broad sequence homology (91.6-98%) among the nucleotide sequences of their NS genes. The PDZ ligand (PL) motif of the group one isolates was either EDEV (N=6) or ESEV (N=1), while all group two viruses contained the PL motif "KSEV" (N=3). The present work provides useful molecular epidemiological data on the source and contemporary evolution of the H9N2 virus population in Iran.

  1. C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families

    Directory of Open Access Journals (Sweden)

    Cutler Sean R

    2007-06-01

    Background The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs that play protein-sorting roles and depend on their proximity to the C-terminus for proper function have already been characterized. As only a limited number of such motifs have been identified, genome-wide statistical analysis and comparative genomics have the potential to reveal novel peptide signatures functioning in a C-terminal-dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs, involving a simple z-statistic and several techniques for improving the signal-to-noise ratio. Results We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied and successfully reduced background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families rather than across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. To our knowledge, this approach has been able to predict all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane-bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists
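
    The abstract does not give the exact form of its z-statistic. As a hedged illustration, the sketch below scores each observed C-terminal tripeptide with a binomial z-score, (observed - N·p) / sqrt(N·p·(1 - p)). Here N is the number of proteins and p is the tripeptide probability implied by overall amino-acid composition. The sketch omits the paper's randomization models, simple-sequence masking and gene-family clustering. The function name and the toy proteins are hypothetical.

```python
import math
from collections import Counter


def cterm_tripeptide_z(proteins):
    """Binomial z-score for each C-terminal tripeptide observed in `proteins`."""
    composition = Counter()
    ends = Counter()
    n_proteins = 0
    for seq in proteins:
        seq = seq.upper().rstrip("*")  # drop a trailing stop symbol if present
        composition.update(seq)
        if len(seq) >= 3:
            ends[seq[-3:]] += 1
            n_proteins += 1
    total = sum(composition.values())
    freq = {aa: count / total for aa, count in composition.items()}
    scores = {}
    for tri, observed in ends.items():
        # Background probability of this tripeptide if residues were independent.
        p = freq[tri[0]] * freq[tri[1]] * freq[tri[2]]
        if 0 < p < 1:
            expected = n_proteins * p
            scores[tri] = (observed - expected) / math.sqrt(n_proteins * p * (1 - p))
    return scores


proteins = ["MAAASKL", "MGGGSKL", "MTTTSKL", "MAGTAAA"]
z = cterm_tripeptide_z(proteins)
print(max(z, key=z.get))  # SKL
```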

  2. Identification of a putative nuclear export signal motif in human NANOG homeobox domain

    International Nuclear Information System (INIS)

    Park, Sung-Won; Do, Hyun-Jin; Huh, Sun-Hyung; Sung, Boreum; Uhm, Sang-Jun; Song, Hyuk; Kim, Nam-Hyung; Kim, Jae-Hwan

    2012-01-01

    Highlights: ► We found a putative nuclear export signal motif within the human NANOG homeodomain. ► Leucine-rich residues are important for nuclear export of the human NANOG homeodomain. ► The CRM1-specific inhibitor LMB blocked the potent NES-mediated nuclear export of human NANOG. -- Abstract: NANOG is a homeobox-containing transcription factor that plays an important role in pluripotent stem cells and tumorigenic cells. To understand how the nuclear localization of human NANOG is regulated, the NANOG sequence was examined and a leucine-rich nuclear export signal (NES) motif (residues 125-134, MQELSNILNL) was found in the homeodomain (HD). To functionally validate the putative NES motif, deletion and site-directed mutants were fused to an EGFP expression vector and transfected into COS-7 cells, and the localization of the proteins was examined. While hNANOG HD localized exclusively to the nucleus, a mutant lacking both NLSs and retaining only the putative NES motif (hNANOG HD-ΔNLSs) was predominantly cytoplasmic, as observed by nucleo/cytoplasmic fractionation and Western blot analysis as well as confocal microscopy. Furthermore, site-directed mutagenesis of the putative NES motif in a partial hNANOG HD containing only one of the two NLS motifs led to nuclear localization, suggesting that the NES motif may play a functional role in nuclear export. In addition, the CRM1-specific nuclear export inhibitor leptomycin B (LMB) blocked the potent NES-mediated export of hNANOG, suggesting that the leucine-rich motif may function in CRM1-mediated nuclear export of hNANOG. Collectively, an NES motif is present in the hNANOG HD and may be functionally involved in the CRM1-mediated nuclear export pathway.
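
    Leucine-rich NESs are often described by a loose hydrophobic consensus. The sketch below encodes one commonly cited form as a regular expression and reports 1-based residue ranges. That form is Φ-x(2,3)-Φ-x(2,3)-Φ-x-Φ, with Φ one of L, I, V, F or M. The choice of consensus and the function name are assumptions, not the method used in the study. The example shows only that the reported hNANOG stretch fits this pattern.

```python
import re

# Assumed NES consensus: Φ-x(2,3)-Φ-x(2,3)-Φ-x-Φ with Φ in {L, I, V, F, M}.
# The zero-width lookahead lets finditer report overlapping candidates.
NES_PATTERN = re.compile(r"(?=([LIVFM].{2,3}[LIVFM].{2,3}[LIVFM].[LIVFM]))")


def find_nes_candidates(seq, first_residue=1):
    """Return (start, end, motif) tuples with 1-based, inclusive residue numbers."""
    hits = []
    for m in NES_PATTERN.finditer(seq.upper()):
        motif = m.group(1)
        start = first_residue + m.start()
        hits.append((start, start + len(motif) - 1, motif))
    return hits


print(find_nes_candidates("MQELSNILNL", first_residue=125))
# [(125, 134, 'MQELSNILNL')]
```

    Patterns this permissive match many non-functional stretches, which is one reason the study validated the motif experimentally.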

  3. Leucine-based receptor sorting motifs are dependent on the spacing relative to the plasma membrane

    DEFF Research Database (Denmark)

    Geisler, C; Dietrich, J; Nielsen, B L

    1998-01-01

    Many integral membrane proteins contain leucine-based motifs within their cytoplasmic domains that mediate internalization and intracellular sorting. Two types of leucine-based motifs have been identified. One type is dependent on phosphorylation, whereas the other type, which includes an acidic amino acid, is constitutively active. In this study, we have investigated how the spacing relative to the plasma membrane affects the function of both types of leucine-based motifs. For phosphorylation-dependent leucine-based motifs, a minimal spacing of 7 residues between the plasma membrane and the phospho-acceptor was required for phosphorylation and thereby activation of the motifs. For constitutively active leucine-based motifs, a minimal spacing of 6 residues between the plasma membrane and the acidic residue was required for optimal activity of the motifs. In addition, we found that the acidic...

  4. Functional identification of a Lippia dulcis bornyl diphosphate synthase that contains a duplicated, inhibitory arginine-rich motif.

    Science.gov (United States)

    Hurd, Matthew C; Kwon, Moonhyuk; Ro, Dae-Kyun

    2017-08-26

    Lippia dulcis (Aztec sweet herb) contains the potent natural sweetener hernandulcin, a sesquiterpene ketone found in the leaves and flowers. Utilizing the leaves for agricultural application is challenging due to the presence of the bitter-tasting and toxic monoterpene camphor. To unlock the commercial potential of L. dulcis leaves, the first step of camphor biosynthesis, catalyzed by a bornyl diphosphate synthase, needs to be elucidated. Two putative monoterpene synthases (LdTPS3 and LdTPS9) were isolated from L. dulcis leaf cDNA. To elucidate their catalytic functions, E. coli-produced recombinant enzymes with truncated chloroplast transit peptides were assayed with geranyl diphosphate (GPP). In vitro enzyme assays showed that LdTPS3 encodes a bornyl diphosphate synthase (thus named LdBPPS), while LdTPS9 encodes a linalool synthase. Interestingly, the N-terminus of LdBPPS possesses two arginine-rich (RRX₈W) motifs, and enzyme assays showed that the presence of both RRX₈W motifs completely inhibits the catalytic activity of LdBPPS. Only after removal of the putative chloroplast transit peptide and the first RRX₈W motif could LdBPPS react with GPP to produce bornyl diphosphate. In a phylogenetic analysis, LdBPPS is distantly related to the known bornyl diphosphate synthase from sage, indicating convergent evolution of camphor biosynthesis in sage and L. dulcis. The discovery of LdBPPS opens up the possibility of engineering L. dulcis to remove the undesirable product, camphor. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Global patterns of amphibian phylogenetic diversity

    DEFF Research Database (Denmark)

    Fritz, Susanne; Rahbek, Carsten

    2012-01-01

    Aim  Phylogenetic diversity can provide insight into how evolutionary processes may have shaped contemporary patterns of species richness. Here, we aim to test for the influence of phylogenetic history on global patterns of amphibian species richness, and to identify areas where macroevolutionary processes such as diversification and dispersal have left strong signatures on contemporary species richness. Location  Global; equal-area grid cells of approximately 10,000 km2. Methods  We generated an amphibian global supertree (6111 species) and repeated analyses with the largest available molecular phylogeny (2792 species). We combined each tree with global species distributions to map four indices of phylogenetic diversity. To investigate congruence between global spatial patterns of amphibian species richness and phylogenetic diversity, we selected Faith's phylogenetic diversity (PD) index...
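
    Faith's PD for a set of taxa is the total branch length of the part of the phylogeny that connects them. The sketch below is minimal and rests on three assumptions: the tree is rooted, it is stored as a hypothetical child-to-parent dictionary with per-branch lengths, and the path to the root is included (conventions differ on this point). The function name is also hypothetical.

```python
def faith_pd(taxa, parent, length):
    """Sum of branch lengths on the union of the root-to-leaf paths of `taxa`.
    Each branch is identified by its child node; the root has no parent entry."""
    branches = set()
    for node in taxa:
        # Walk towards the root, stopping early once a shared branch is reached.
        while node in parent and node not in branches:
            branches.add(node)
            node = parent[node]
    return sum(length[b] for b in branches)


# Rooted tree ((A:1.0, B:1.5)X:2.0, C:3.0)root
parent = {"A": "X", "B": "X", "X": "root", "C": "root"}
length = {"A": 1.0, "B": 1.5, "X": 2.0, "C": 3.0}
print(faith_pd({"A", "B"}, parent, length))  # 4.5
```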

  6. Tree-Based Unrooted Phylogenetic Networks.

    Science.gov (United States)

    Francis, A; Huber, K T; Moulton, V

    2018-02-01

    Phylogenetic networks are a generalization of phylogenetic trees that are used to represent non-tree-like evolutionary histories that arise in organisms such as plants and bacteria, or uncertainty in evolutionary histories. An unrooted phylogenetic network on a non-empty, finite set X of taxa, or network, is a connected, simple graph in which every vertex has degree 1 or 3 and whose leaf set is X. It is called a phylogenetic tree if the underlying graph is a tree. In this paper we consider properties of tree-based networks, that is, networks that can be constructed by adding edges into a phylogenetic tree. We show that although they have some properties in common with their rooted analogues, which have recently drawn much attention in the literature, they have some striking differences in terms of both their structural and computational properties. We expect that our results could eventually have applications to, for example, detecting horizontal gene transfer or hybridization, which are important factors in the evolution of many organisms.
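
    The definition of an unrooted phylogenetic network quoted above translates directly into a few graph checks. The sketch below tests simplicity, the degree condition, the leaf set and connectivity. It then uses the fact that a connected graph with |V| - 1 edges is a tree. The function names are hypothetical, and the sketch assumes every edge endpoint also appears in the vertex list. Whether a network is tree-based, the paper's actual subject, is a harder question not addressed here.

```python
def is_unrooted_network(vertices, edges, taxa):
    """Connected, simple, every vertex of degree 1 or 3, leaf set equal to taxa."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u == v or v in adj[u]:  # loop or parallel edge: graph is not simple
            return False
        adj[u].add(v)
        adj[v].add(u)
    if any(len(nbrs) not in (1, 3) for nbrs in adj.values()):
        return False
    if {v for v, nbrs in adj.items() if len(nbrs) == 1} != set(taxa):
        return False
    # Connectivity via an iterative depth-first search.
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(adj)


def is_phylogenetic_tree(vertices, edges, taxa):
    # A connected graph is a tree exactly when it has |V| - 1 edges.
    return is_unrooted_network(vertices, edges, taxa) and len(edges) == len(vertices) - 1
```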

  7. PhyloSift: phylogenetic analysis of genomes and metagenomes.

    Science.gov (United States)

    Darling, Aaron E; Jospin, Guillaume; Lowe, Eric; Matsen, Frederick A; Bik, Holly M; Eisen, Jonathan A

    2014-01-01

    Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).

  8. PhyloSift: phylogenetic analysis of genomes and metagenomes

    Directory of Open Access Journals (Sweden)

    Aaron E. Darling

    2014-01-01

    Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).

  9. Network motif frequency vectors reveal evolving metabolic network organisation.

    Science.gov (United States)

    Pearcy, Nicole; Crofts, Jonathan J; Chuzhanova, Nadia

    2015-01-01

    At the systems level many organisms of interest may be described by their patterns of interaction, and as such, are perhaps best characterised via network or graph models. Metabolic networks, in particular, are fundamental to the proper functioning of many important biological processes, and thus, have been widely studied over the past decade or so. Such investigations have revealed a number of shared topological features, such as a short characteristic path-length, large clustering coefficient and hierarchical modular structure. However, the extent to which evolutionary and functional properties of metabolism manifest via this underlying network architecture remains unclear. In this paper, we employ a novel graph embedding technique, based upon low-order network motifs, to compare metabolic network structure for 383 bacterial species categorised according to a number of biological features. In particular, we introduce a new global significance score which enables us to quantify important evolutionary relationships that exist between organisms and their physical environments. Using this new approach, we demonstrate a number of significant correlations between environmental factors, such as growth conditions and habitat variability, and network motif structure, providing evidence that organism adaptability leads to increased complexities in the resultant metabolic networks.
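
    The paper's embedding is built from low-order network motifs, but its exact motif set and significance score are not given here. As a simplified, hedged illustration, the sketch below computes a two-entry frequency vector for the connected 3-node motifs of an undirected graph: the open path and the triangle. Metabolic networks are often analysed as directed graphs, which have more 3-node motif classes. The undirected reduction and the function name are assumptions, and nodes are assumed to be mutually comparable (for example integers).

```python
from collections import defaultdict


def triad_frequency_vector(edges):
    """Relative frequencies (open path, triangle) of connected 3-node
    induced subgraphs in an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    # Every pair of neighbours of a vertex forms a wedge (a path of length 2).
    wedges = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    # Count each triangle once by requiring u < v < w.
    triangles = sum(
        1
        for u in adj
        for v in adj[u] if u < v
        for w in adj[u] & adj[v] if w > v
    )
    open_paths = wedges - 3 * triangles  # each triangle closes three wedges
    total = open_paths + triangles
    if total == 0:
        return (0.0, 0.0)
    return (open_paths / total, triangles / total)


# A triangle 0-1-2 with a pendant vertex 3 attached to 2: two open paths, one triangle.
print(triad_frequency_vector([(0, 1), (1, 2), (0, 2), (2, 3)]))
```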

  10. DistAMo: A web-based tool to characterize DNA-motif distribution on bacterial chromosomes

    Directory of Open Access Journals (Sweden)

    Patrick Sobetzko

    2016-03-01

    Short DNA motifs are involved in a multitude of functions, such as chromosome segregation, DNA replication and mismatch repair. The distribution of such motifs is often not random, and the specific chromosomal pattern relates to the respective motif function. Computational approaches that quantitatively assess such chromosomal motif patterns are therefore necessary. Here we present a new computer tool, DistAMo (Distribution Analysis of DNA Motifs). The algorithm uses codon redundancy to calculate the relative abundance of short DNA motifs from single genes to entire chromosomes. Comparative genomics analyses of the GATC-motif distribution in γ-proteobacterial genomes using DistAMo revealed that (i) genes near the replication origin are enriched in GATCs, (ii) the genome-wide GATC distribution follows a distinct pattern and (iii) genes involved in DNA replication and repair are enriched in GATCs. These features are specific to bacterial chromosomes encoding a Dam methyltransferase. The new software is available as a stand-alone version or as an easy-to-use web-based server version at http://www.computational.bio.uni-giessen.de/distamo.
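
    DistAMo's codon-redundancy model is not described here in enough detail to reproduce. As a simpler, hedged stand-in, the sketch below compares the observed number of GATC sites in a sequence with the number expected from the sequence's own base composition (a zero-order background). This is a different and cruder statistic than DistAMo's, and the function name `observed_expected` is hypothetical. Because GATC is its own reverse complement, counting one strand is enough.

```python
from collections import Counter


def observed_expected(seq, motif="GATC"):
    """Return (observed, expected, observed/expected) for overlapping motif
    occurrences, with the expectation taken from independent draws of the
    sequence's own base composition (assumed zero-order background)."""
    seq = seq.upper()
    n, k = len(seq), len(motif)
    windows = n - k + 1
    if windows <= 0:
        return 0, 0.0, float("nan")
    observed = sum(seq[i:i + k] == motif for i in range(windows))
    composition = Counter(seq)
    p = 1.0
    for base in motif:
        p *= composition[base] / n
    expected = windows * p
    ratio = observed / expected if expected > 0 else float("nan")
    return observed, expected, ratio


print(observed_expected("GATCGATC"))
```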

  11. Nucleophosmin integrates within the nucleolus via multi-modal interactions with proteins displaying R-rich linear motifs and rRNA.

    Science.gov (United States)

    Mitrea, Diana M; Cika, Jaclyn A; Guy, Clifford S; Ban, David; Banerjee, Priya R; Stanley, Christopher B; Nourse, Amanda; Deniz, Ashok A; Kriwacki, Richard W

    2016-02-02

    The nucleolus is a membrane-less organelle formed through liquid-liquid phase separation of its components from the surrounding nucleoplasm. Here, we show that nucleophosmin (NPM1) integrates within the nucleolus via a multi-modal mechanism involving multivalent interactions with proteins containing arginine-rich linear motifs (R-motifs) and ribosomal RNA (rRNA). Importantly, these R-motifs are found in canonical nucleolar localization signals. Based on a novel combination of biophysical approaches, we propose a model for the molecular organization within liquid-like droplets formed by the N-terminal domain of NPM1 and R-motif peptides, thus providing insights into the structural organization of the nucleolus. We identify multivalency of acidic tracts and folded nucleic acid binding domains, mediated by N-terminal domain oligomerization, as structural features required for phase separation of NPM1 with other nucleolar components in vitro and for localization within mammalian nucleoli. We propose that one mechanism of nucleolar localization involves phase separation of proteins within the nucleolus.

  12. Selection against spurious promoter motifs correlates with translational efficiency across bacteria

    Energy Technology Data Exchange (ETDEWEB)

    Froula, Jeffrey L.; Francino, M. Pilar

    2007-05-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the σ70 subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein-coding regions. In some genomes, however, -10 sites are over-represented in the coding sequences; these sites could induce pauses that play regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is detectable even in the reduced genomes of host-restricted bacteria that grow at slow rates, indicating that components of efficiency other than speed may also be important. Minimizing the number of RNAP molecules per cell required for transcription, and the corresponding energetic expense, may be most relevant in slow growers. These results indicate that genome-level properties affecting the efficiency of transcription and translation can respond in an integrated manner to optimize gene expression. The detection of selection against promoter motifs in nonfunctional regions also implies that no sequence may evolve free of selective constraints, at least in the relatively small and unstructured genomes of bacteria.
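
    A hedged sketch of how candidate -10 sites might be located: slide a window along one strand and report positions within a fixed number of mismatches of TATAAT, the textbook σ70 -10 consensus. The abstract does not give the article's actual motif definition or statistics, so the consensus, the one-mismatch threshold and the function name are assumptions. A genome-wide analysis would also scan the reverse complement.

```python
def scan_consensus(seq, consensus="TATAAT", max_mismatches=1):
    """Return 0-based start positions on the given strand where the window
    differs from the consensus at no more than max_mismatches positions."""
    seq, k = seq.upper(), len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], consensus))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


print(scan_consensus("TATAATGGGTATGAT"))  # [0, 9]
```

    Comparing such counts in coding, intergenic and regulatory regions against a shuffled or composition-matched background is one way to quantify over- or under-representation.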

  13. Nodal distances for rooted phylogenetic trees.

    Science.gov (United States)

    Cardona, Gabriel; Llabrés, Mercè; Rosselló, Francesc; Valiente, Gabriel

    2010-08-01

    Dissimilarity measures for (possibly weighted) phylogenetic trees based on the comparison of their vectors of path lengths between pairs of taxa have been present in the systematics literature since the early seventies. For rooted phylogenetic trees, however, these vectors can only separate non-weighted binary trees, and therefore these dissimilarity measures are metrics only on this class of rooted phylogenetic trees. In this paper we overcome this problem by splitting, in a suitable way, each path length between two taxa into two lengths. We prove that the resulting split path length matrices single out arbitrary rooted phylogenetic trees with nested taxa and arcs weighted in the set of positive real numbers. This allows the definition of metrics on this general class of rooted phylogenetic trees by comparing these matrices through metrics on the spaces M_n(ℝ) of real-valued n × n matrices. We conclude this paper by establishing some basic facts about the metrics for non-weighted phylogenetic trees defined in this way using Lp metrics on M_n(ℝ), with p a positive real number.
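
    The splitting idea can be illustrated on a small example. In this sketch, a rooted tree is stored as a hypothetical child-to-parent dictionary with branch lengths. Entry [i][j] of the matrix is the length of the path from the lowest common ancestor of taxa i and j down to i, so the ordinary path length between i and j splits into entries [i][j] and [j][i]. Two such matrices are then compared with the L1 metric, one member of the Lp family mentioned above. This is an assumption-laden illustration, not the paper's code.

```python
def depth(node, parent, length):
    """Distance from the root to node (the root has no parent entry)."""
    d = 0.0
    while node in parent:
        d += length[node]
        node = parent[node]
    return d


def lca(a, b, parent):
    """Lowest common ancestor of a and b."""
    ancestors = {a}
    while a in parent:
        a = parent[a]
        ancestors.add(a)
    while b not in ancestors:
        b = parent[b]
    return b


def split_matrix(taxa, parent, length):
    """Entry [i][j]: length of the path from LCA(taxa[i], taxa[j]) down to taxa[i]."""
    return [[depth(i, parent, length) - depth(lca(i, j, parent), parent, length)
             for j in taxa] for i in taxa]


def l1_distance(m1, m2):
    return sum(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))


taxa = ["A", "B", "C"]
t1 = {"A": "X", "B": "X", "X": "r", "C": "r"}   # ((A,B),C), unit branch lengths
t2 = {"A": "Y", "C": "Y", "Y": "r", "B": "r"}   # ((A,C),B), unit branch lengths
m1 = split_matrix(taxa, t1, {child: 1.0 for child in t1})
m2 = split_matrix(taxa, t2, {child: 1.0 for child in t2})
print(l1_distance(m1, m2))  # 4.0
```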

  14. Ant-Based Phylogenetic Reconstruction (ABPR): A new distance algorithm for phylogenetic estimation based on ant colony optimization

    Directory of Open Access Journals (Sweden)

    Karla Vittori

    2008-12-01

    We propose a new distance algorithm for phylogenetic estimation based on Ant Colony Optimization (ACO), named Ant-Based Phylogenetic Reconstruction (ABPR). ABPR iteratively joins two taxa based on the evolutionary distance among sequences, while also accounting for the quality of the phylogenetic tree being built, as measured by its total length. Similar to optimization algorithms for phylogenetic estimation, the algorithm allows exploration of a larger set of nearly optimal solutions. We applied the algorithm to four empirical data sets of mitochondrial DNA ranging from 12 to 186 sequences and from 898 to 16,608 base pairs, and covering taxonomic levels from populations to orders. We show that ABPR performs better than the commonly used Neighbor-Joining algorithm, except when sequences are too closely related (e.g., population-level sequences). The phylogenetic relationships recovered by ABPR at and above the species level agree with conventional views. However, like other algorithms of phylogenetic estimation, the proposed algorithm failed to recover expected relationships when distances are too similar or when rates of evolution are very variable, leading to the problem of long-branch attraction. ABPR, as well as other ACO-based algorithms, is emerging as a fast and accurate alternative method of phylogenetic estimation for large data sets.
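
    The abstract does not spell out ABPR's update rules. As a hedged illustration of the probabilistic choice an ACO method makes, the sketch below applies the generic ant-colony transition rule to pick the next pair of taxa to join. Under that rule, the probability of a pair is proportional to pheromone^α × (1/distance)^β. The function names, the α and β defaults and the uniform initial pheromone are assumptions. Pheromone updating and the tree-length scoring that ABPR uses are omitted, and all distances must be positive.

```python
import random


def join_probabilities(dist, pheromone, alpha=1.0, beta=1.0):
    """Generic ACO rule: P(pair) is proportional to
    pheromone[pair]**alpha * (1 / dist[pair])**beta. Distances must be > 0."""
    weights = {pair: pheromone[pair] ** alpha * (1.0 / d) ** beta
               for pair, d in dist.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def choose_pair(probs, rng):
    """Roulette-wheel choice of the next pair of taxa to join."""
    pairs = list(probs)
    return rng.choices(pairs, weights=[probs[p] for p in pairs], k=1)[0]


dist = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "C"): 4.0}
pheromone = {pair: 1.0 for pair in dist}       # uniform initial pheromone (assumed)
probs = join_probabilities(dist, pheromone)    # A-B: 4/7, A-C: 2/7, B-C: 1/7
next_pair = choose_pair(probs, random.Random(42))
```

    Sampling, rather than always joining the closest pair as Neighbor-Joining-style greedy methods do, is what lets an ant-based search explore several nearly optimal trees.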

  15. Aquatic insect ecophysiological traits reveal phylogenetically based differences in dissolved cadmium susceptibility.

    Science.gov (United States)

    Buchwalter, David B; Cain, Daniel J; Martin, Caitrin A; Xie, Lingtian; Luoma, Samuel N; Garland, Theodore

    2008-06-17

    We used a phylogenetically based comparative approach to evaluate the potential for physiological studies to reveal patterns of diversity in traits related to susceptibility to an environmental stressor, the trace metal cadmium (Cd). Physiological traits related to Cd bioaccumulation, compartmentalization, and ultimately susceptibility were measured in 21 aquatic insect species representing the orders Ephemeroptera, Plecoptera, and Trichoptera. We mapped these experimentally derived physiological traits onto a phylogeny and quantified the tendency for related species to be similar (phylogenetic signal). All traits related to Cd bioaccumulation and susceptibility exhibited statistically significant phylogenetic signal, although the signal strength varied among traits. Conventional and phylogenetically based regression models were compared, revealing great variability within orders but consistent, strong differences among insect families. Uptake and elimination rate constants were positively correlated among species, but only when effects of body size and phylogeny were incorporated in the analysis. Together, uptake and elimination rates predicted dramatic Cd bioaccumulation differences among species that agreed with field-based measurements. We discovered a potential tradeoff across species, particularly in mayflies, between the ability to eliminate Cd and the ability to detoxify it. The best-fit regression models were driven by phylogenetic parameters (especially differences among families) rather than functional traits, suggesting that it may eventually be possible to predict a taxon's physiological performance based on its phylogenetic position, provided adequate physiological information is available for close relatives. There appears to be great potential for evolutionary physiological approaches to augment our understanding of insect responses to environmental stressors in nature.

  16. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

    Science.gov (United States)

    Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

    2018-02-01

    The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and has led to the accumulation of information about a huge number of DNA sequences. Many web services are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. The enormous diversity of motifs makes finding them challenging. To avoid these difficulties, researchers use various stochastic approaches. Unfortunately, the efficiency of motif discovery programs declines dramatically as the size of the query set increases. As a result, only a fraction of the top "peak" ChIP-seq segments can be analyzed, or the area of analysis must be narrowed. Thus, motif discovery in massive datasets remains a challenging issue. The Argo_CUDA web service (CUDA: Compute Unified Device Architecture) is designed to process massive DNA data. It detects degenerate oligonucleotide motifs of fixed length written in the 15-letter IUPAC code. Argo_CUDA is a fully exhaustive approach based on high-performance GPU technologies. Compared with existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-seq sequences revealed motifs that correspond to known transcription factor binding sites.
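
    The 15-letter IUPAC code mentioned above can be matched by expanding each degenerate letter into a character class. The sketch below does this for a single motif on the forward strand only. It is not Argo_CUDA's exhaustive GPU search, and the function names and example sequence are hypothetical. `concrete_words` counts how many plain A/C/G/T words a degenerate motif covers, which suggests why enumerating all such motifs exhaustively is expensive.

```python
import math
import re

# IUPAC nucleotide codes: 4 bases plus 11 degenerate symbols (15 letters).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    return "".join(f"[{IUPAC[c]}]" for c in motif.upper())


def find_degenerate_motif(seq, motif):
    """0-based start positions of (possibly overlapping) forward-strand matches."""
    pattern = re.compile(f"(?={iupac_to_regex(motif)})")
    return [m.start() for m in pattern.finditer(seq.upper())]


def concrete_words(motif):
    """Number of plain A/C/G/T words the degenerate motif stands for."""
    return math.prod(len(IUPAC[c]) for c in motif.upper())


print(find_degenerate_motif("TTGACTCAGGTGAGTCAA", "TGASTCA"))  # [1, 10]
print(concrete_words("TGASTCA"))  # 2
```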

  17. iFORM: Incorporating Find Occurrence of Regulatory Motifs.

    Science.gov (United States)

    Ren, Chao; Chen, Hebing; Yang, Bite; Liu, Feng; Ouyang, Zhangyi; Bo, Xiaochen; Shu, Wenjie

    2016-01-01

    Accurately identifying the binding sites of transcription factors (TFs) is crucial to understanding the mechanisms of transcriptional regulation and human disease. We present iFORM (incorporating Find Occurrence of Regulatory Motifs), an easy-to-use and efficient tool for scanning DNA sequences with TF motifs described as position weight matrices (PWMs). Performance assessment with both a receiver operating characteristic (ROC) curve and a correlation-based approach demonstrated that iFORM achieves higher accuracy and sensitivity by integrating five classical motif discovery programs using Fisher's combined probability test. We have used iFORM to provide accurate results on a variety of data from the ENCODE Project and the NIH Roadmap Epigenomics Project, and the tool has demonstrated its utility in further elucidating the individual roles of functional elements. Both the source code and binaries for iFORM are freely available at https://github.com/wenjiegroup/iFORM. The TF binding sites identified across human cell and tissue types using iFORM have been deposited in the Gene Expression Omnibus under accession ID GSE53962.
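
    Fisher's combined probability test, which the abstract names as iFORM's integration step, combines k independent p-values into X = -2 Σ ln p_i. Under the null hypothesis, X follows a χ² distribution with 2k degrees of freedom. For even degrees of freedom the χ² survival function has a closed form, e^(-X/2) Σ_{j=0}^{k-1} (X/2)^j / j!, so the sketch needs only the standard library. How iFORM maps each program's output to a p-value is not described here. The function below is therefore a hypothetical illustration of the test alone, and p-values must lie in (0, 1].

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.
    X = -2 * sum(ln p) ~ chi-square with 2k degrees of freedom under H0;
    for even df the survival function is exp(-X/2) * sum_{j<k} (X/2)**j / j!."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


print(fisher_combined_pvalue([0.05]))       # a single p-value is returned unchanged
print(fisher_combined_pvalue([0.01, 0.04, 0.20]))
```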

  18. Lucky Motifs in Chinese Folk Art: Interpreting Paper-cut from Chinese Shaanxi

    OpenAIRE

    Xuxiao WANG

    2013-01-01

    Paper-cut is not simply a form of traditional Chinese folk art. Lucky motifs developed in paper-cut certainly acquired profound cultural connotations. As paper-cut is a time-honoured skill across the nation, interpreting those motifs requires cultural receptiveness and anthropological sensitivity. The author of this article analyzes examples of paper-cut from Northern Shaanxi, China, to identify the cohesive motifs and explore the auspiciousness of the specific concepts of Fu, Lu, Shou, Xi. T...

  19. MOMFER: A Search Engine of Thompson's Motif-Index of Folk Literature

    NARCIS (Netherlands)

    Karsdorp, F.B.; van der Meulen, Marten; Meder, Theo; van den Bosch, Antal

    2015-01-01

    More than fifty years after the first edition of Thompson's seminal Motif-Index of Folk Literature, we present an online search engine tailored to fully disclose the index digitally. This search engine, called MOMFER, greatly enhances the searchability of the Motif-Index and provides exciting new

  20. Maximum Parsimony on Phylogenetic networks

    Science.gov (United States)

    2012-01-01

    Background Phylogenetic networks are generalizations of phylogenetic trees which are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned to the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past. Results In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network, and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as the Sankoff and Fitch algorithms, extend naturally to networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied to any cost matrix, including matrices with unequal substitution costs for transforming between different characters along different edges of the network. We analyzed this on experimental data with 10 or fewer leaves and at most 2 reticulations, and found that for almost all networks the bounds returned by the heuristics matched the exhaustively determined optimum parsimony scores. Conclusion The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks. The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are
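
    The network method builds on the tree case of the Sankoff algorithm. In that algorithm, each node stores, for every possible state, the minimum cost of its subtree, and these costs are filled in from the leaves up to the root. Below is a minimal tree-only sketch with an arbitrary cost matrix. It does not handle reticulate vertices, and the function name and nested-tuple tree format are assumptions made for illustration.

```python
import math


def sankoff_score(tree, leaf_state, states, cost):
    """Minimum parsimony score of one character on a rooted tree (Sankoff algorithm).

    tree: nested tuples with leaf names as strings, e.g. (("w", "x"), ("y", "z"))
    leaf_state: leaf name -> observed character state
    cost: cost[s][t] = cost of changing state s (parent) into state t (child)
    """
    def subtree_costs(node):
        if isinstance(node, str):
            # A leaf costs 0 in its observed state and is impossible otherwise.
            return {s: 0.0 if s == leaf_state[node] else math.inf for s in states}
        children = [subtree_costs(child) for child in node]
        # For parent state s, each child contributes its cheapest transition plus subtree cost.
        return {
            s: sum(min(cost[s][t] + child[t] for t in states) for child in children)
            for s in states
        }

    return min(subtree_costs(tree).values())


STATES = "ACGT"
UNIT = {s: {t: 0 if s == t else 1 for t in STATES} for s in STATES}
print(sankoff_score((("w", "x"), ("y", "z")), {"w": "A", "x": "A", "y": "G", "z": "G"}, STATES, UNIT))  # 1.0
```

    Because `cost` is a full matrix, unequal substitution costs (for example, penalizing transversions more than transitions) need no change to the algorithm.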

  1. The Independent Evolution Method Is Not a Viable Phylogenetic Comparative Method.

    Directory of Open Access Journals (Sweden)

    Randi H Griffin

    Phylogenetic comparative methods (PCMs) use data on species traits and phylogenetic relationships to shed light on evolutionary questions. Recently, Smaers and Vinicius suggested a new PCM, Independent Evolution (IE), which purportedly employs a novel model of evolution based on Felsenstein's Adaptive Peak Model. The authors found that IE improves upon previous PCMs by producing more accurate estimates of ancestral states, as well as separate estimates of evolutionary rates for each branch of a phylogenetic tree. Here, we document substantial theoretical and computational issues with IE. When data are simulated under a simple Brownian motion model of evolution, IE produces severely biased estimates of ancestral states and of changes along individual branches. We show that these branch-specific changes are essentially ancestor-descendant or "directional" contrasts, and draw parallels between IE and previous PCMs such as "minimum evolution". Additionally, while comparisons of branch-specific changes between variables have been interpreted as reflecting the relative strength of selection on those traits, we demonstrate through simulations that regressing IE-estimated branch-specific changes against one another gives a biased estimate of the scaling relationship between these variables, and provides no advantages or insights beyond established PCMs such as phylogenetically independent contrasts. In light of our findings, we discuss the results of previous papers that employed IE. We conclude that Independent Evolution is not a viable PCM and should not be used in comparative analyses.
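
    Phylogenetically independent contrasts, the established alternative recommended above, can be computed in one post-order pass over a rooted binary tree under a Brownian motion model. Each internal node yields a standardized contrast between its two children. The node receives a branch-length-weighted average of the children's values, and its branch is lengthened to reflect estimation error. Below is a minimal sketch of Felsenstein's procedure. The tree format and function name are assumptions for illustration.

```python
import math


def independent_contrasts(node, values, contrasts):
    """Felsenstein's phylogenetically independent contrasts on a rooted binary tree.

    node: a leaf name (str) or ((left_subtree, left_length), (right_subtree, right_length)).
    values: leaf name -> trait value.
    contrasts: list that receives one standardized contrast per internal node.
    Returns (estimated node value, extra length to add to the branch above this node).
    """
    if isinstance(node, str):
        return values[node], 0.0
    (left, v_left), (right, v_right) = node
    x_left, extra_left = independent_contrasts(left, values, contrasts)
    x_right, extra_right = independent_contrasts(right, values, contrasts)
    # Lengthen branches to reflect uncertainty in estimated (non-leaf) child values.
    v_left += extra_left
    v_right += extra_right
    contrasts.append((x_left - x_right) / math.sqrt(v_left + v_right))
    # Weighted average of child values, with weights proportional to 1 / branch length.
    x_node = (x_left / v_left + x_right / v_right) / (1.0 / v_left + 1.0 / v_right)
    return x_node, v_left * v_right / (v_left + v_right)


tree = (((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0))
contrasts = []
root_value, _ = independent_contrasts(tree, {"A": 1.0, "B": 3.0, "C": 4.0}, contrasts)
```

    Contrasts computed this way for two traits are typically regressed against each other through the origin to estimate their scaling relationship.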

  2. How pathogens use linear motifs to perturb host cell networks

    KAUST Repository

    Via, Allegra; Uyar, Bora; Brun, Christine; Zanzoni, Andreas

    2015-01-01

    Molecular mimicry is one of the powerful stratagems that pathogens employ to colonise their hosts and take advantage of host cell functions to guarantee their replication and dissemination. In particular, several viruses have evolved the ability to interact with host cell components through protein short linear motifs (SLiMs) that mimic host SLiMs, thus facilitating their internalisation and the manipulation of a wide range of cellular networks. Here we present convincing evidence from the literature that motif mimicry also represents an effective, widespread hijacking strategy in prokaryotic and eukaryotic parasites. Further insights into host motif mimicry would be of great help in the elucidation of the molecular mechanisms behind host cell invasion and the development of anti-infective therapeutic strategies.

  3. Faster exact Markovian probability functions for motif occurrences: a DFA-only approach.

    Science.gov (United States)

    Ribeca, Paolo; Raineri, Emanuele

    2008-12-15

    The computation of the statistical properties of motif occurrences has an obviously relevant application: patterns that are significantly over- or under-represented in genomes or proteins are interesting candidates for biological roles. However, the problem is computationally hard; as a result, virtually all the existing motif finders use fast but approximate scoring functions, in spite of the fact that they have been shown to produce systematically incorrect results. A few interesting exact approaches are known, but they are very slow and hence not practical in the case of realistic sequences. We give an exact solution, solely based on deterministic finite-state automata (DFA), to the problem of finding the whole relevant part of the probability distribution function of a simple-word motif in a homogeneous (biological) sequence. Out of that, the z-value can always be computed, while the P-value can be obtained either when it is not too extreme with respect to the number of floating-point digits available in the implementation, or when the number of pattern occurrences is moderately low. In particular, the time complexity of the algorithms for Markov models of moderate order (0 manage to obtain an algorithm which is both easily interpretable and efficient. This approach can be used for exact statistical studies of very long genomes and protein sequences, as we illustrate with some examples on the scale of the human genome.
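
    The core idea of a DFA-based exact computation can be shown with a dynamic program over pairs of (automaton state, occurrences so far). The automaton tracks the longest prefix of the word that ends at the current position, and reaching the final state means one (possibly overlapping) occurrence has just ended. The paper treats Markov models of moderate order. The sketch below covers only a single word under an order-0 (i.i.d.) model. It is an illustration of the principle, not the authors' optimized algorithm, and the function names are hypothetical.

```python
from collections import defaultdict


def build_dfa(pattern, alphabet):
    """State = length of the longest prefix of `pattern` that is a suffix of the text
    read so far; state len(pattern) means an occurrence has just been completed."""
    m = len(pattern)
    dfa = []
    for state in range(m + 1):
        row = {}
        for c in alphabet:
            text = pattern[:state] + c
            k = min(len(text), m)
            # Shrink k until the last k characters equal the first k of the pattern.
            while k > 0 and text[-k:] != pattern[:k]:
                k -= 1
            row[c] = k
        dfa.append(row)
    return dfa


def occurrence_distribution(pattern, length, probs):
    """Exact P(number of overlapping occurrences = n) in an i.i.d. sequence of given length."""
    dfa = build_dfa(pattern, list(probs))
    m = len(pattern)
    dist = {(0, 0): 1.0}  # (automaton state, occurrences so far) -> probability
    for _ in range(length):
        nxt = defaultdict(float)
        for (state, count), p in dist.items():
            for c, pc in probs.items():
                new_state = dfa[state][c]
                nxt[(new_state, count + (new_state == m))] += p * pc
        dist = nxt
    result = defaultdict(float)
    for (_, count), p in dist.items():
        result[count] += p
    return dict(result)


UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
dist_aa = occurrence_distribution("AA", 3, UNIFORM)
```

    Both the z-value and the P-value of an observed count follow directly from such a distribution. The cost grows with sequence length times the number of automaton states and reachable counts.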

  4. LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms.

    Science.gov (United States)

    Yang, Peng; Wu, Min; Guo, Jing; Kwoh, Chee Keong; Przytycka, Teresa M; Zheng, Jie

    2014-02-17

    As fundamental genomic elements, meiotic recombination hotspots play important roles in the life sciences, so uncovering their regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators of meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, an approach that ignores the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variation for the discovery of recombination-associated DNA motifs. Recently, an algorithm called "LDsplit" was designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference in population recombination rates at a hotspot between the two alleles of a candidate SNP. Here we present an open-source software tool for LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit to SNPs inside an established 7-mer motif bound by PRDM9, we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. When run on SNP windows around hotspots, each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif-finding algorithm MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot-associated SNPs. It is the first computational method that utilizes the genetic variation of
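
    The analysis of motif-preserving versus motif-disrupting alleles rests on two small steps: deciding which allele of a SNP keeps a motif occurrence intact, and taking the difference in recombination rate between the two alleles. The sketch below shows only that bookkeeping. It assumes that the per-allele population recombination rates (`rho`) have already been estimated elsewhere, which is the statistically difficult part that LDsplit itself performs. The motif string and all function names are illustrative placeholders.

```python
def motif_overlaps(seq, pos, motif):
    """True if an exact occurrence of `motif` in `seq` covers position `pos` (0-based)."""
    k = len(motif)
    for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
        if seq[start:start + k] == motif:
            return True
    return False


def classify_alleles(context, snp_pos, allele1, allele2, motif):
    """Return (preserving, disrupting) alleles, or None if the SNP does not split the motif."""
    with1 = motif_overlaps(context[:snp_pos] + allele1 + context[snp_pos + 1:], snp_pos, motif)
    with2 = motif_overlaps(context[:snp_pos] + allele2 + context[snp_pos + 1:], snp_pos, motif)
    if with1 == with2:
        return None
    return (allele1, allele2) if with1 else (allele2, allele1)


def rate_difference(rho, preserving, disrupting):
    """Difference in (externally estimated) population recombination rate between alleles."""
    return rho[preserving] - rho[disrupting]


# Illustrative 7-mer placeholder and made-up rates, for demonstration only.
pair = classify_alleles("AACCTCCCTAA", 4, "T", "G", "CCTCCCT")
delta = rate_difference({"T": 12.0, "G": 3.5}, *pair)
```

    A positive `delta` across many such SNPs would correspond to the pattern described above, in which motif-preserving alleles show higher recombination rates.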

  5. Functional & phylogenetic diversity of copepod communities

    Science.gov (United States)

    Benedetti, F.; Ayata, S. D.; Blanco-Bercial, L.; Cornils, A.; Guilhaumon, F.

    2016-02-01

    The diversity of natural communities is classically estimated through species identification (taxonomic diversity) but can also be estimated from the ecological functions performed by the species (functional diversity), or from the phylogenetic relationships among them (phylogenetic diversity). Estimating functional diversity requires the definition of specific functional traits, i.e., phenotypic characteristics that impact fitness and are relevant to ecosystem functioning. Estimating phylogenetic diversity requires the description of phylogenetic relationships, for instance by using molecular tools. In the present study, we focused on the functional and phylogenetic diversity of copepod surface communities in the Mediterranean Sea. First, we implemented a specific trait database for the most commonly sampled and abundant copepod species of the Mediterranean Sea. Our database includes 191 species, described by seven traits encompassing diverse ecological functions: minimal and maximal body length, trophic group, feeding type, spawning strategy, diel vertical migration, and vertical habitat. Clustering analysis in the functional trait space revealed that Mediterranean copepods can be gathered into groups that have different ecological roles. Second, we reconstructed a phylogenetic tree using the available 18S rRNA sequences. Our tree included 154 of the analyzed Mediterranean copepod species. We used these two datasets to describe the functional and phylogenetic diversity of copepod surface communities in the Mediterranean Sea. The replacement component (turnover) and the species richness difference component (nestedness) of the beta diversity indices were identified. Finally, by comparing various complementary aspects of plankton diversity (taxonomic, functional, and phylogenetic diversity), we were able to gain a better understanding of the relationships among the zooplankton community, biodiversity, ecosystem function, and environmental forcing.
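
    One common way to split pairwise beta diversity into a turnover part and a nestedness part is Baselga's partition of Sørensen dissimilarity. Let a be the number of shared species and b and c the numbers of species unique to each site. Then β_sor = (b + c) / (2a + b + c) and β_sim = min(b, c) / (a + min(b, c)), and the nestedness-resultant part is their difference. The abstract does not say which partitioning framework the study used, so this choice is an assumption. The function name is hypothetical.

```python
def beta_partition(site1, site2):
    """Return (beta_sor, beta_sim, beta_nes) for two species lists (Baselga-style partition).

    beta_sor: Sorensen dissimilarity; beta_sim: turnover (Simpson) component;
    beta_nes: nestedness-resultant component = beta_sor - beta_sim.
    """
    s1, s2 = set(site1), set(site2)
    a = len(s1 & s2)   # shared species
    b = len(s1 - s2)   # species only at site 1
    c = len(s2 - s1)   # species only at site 2
    beta_sor = (b + c) / (2 * a + b + c)  # requires at least one species overall
    smaller = min(b, c)
    # If one site is empty, turnover is undefined; treat it as zero here.
    beta_sim = smaller / (a + smaller) if a + smaller > 0 else 0.0
    return beta_sor, beta_sim, beta_sor - beta_sim


nested = beta_partition(["a", "b", "c", "d"], ["a", "b"])   # pure nestedness
replaced = beta_partition(["a", "b"], ["c", "d"])           # pure turnover
```

    A nested pair (one site a subset of the other) scores entirely as nestedness. Two sites with no species in common score entirely as turnover.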

  6. treespace: Statistical exploration of landscapes of phylogenetic trees.

    Science.gov (United States)

    Jombart, Thibaut; Kendall, Michelle; Almagro-Garcia, Jacob; Colijn, Caroline

    2017-11-01

    The increasing availability of large genomic data sets, as well as the advent of Bayesian phylogenetics, facilitates the investigation of phylogenetic incongruence, which can make it impossible to represent phylogenetic relationships using a single tree. While sometimes considered a nuisance, phylogenetic incongruence can also reflect meaningful biological processes as well as relevant statistical uncertainty, both of which can yield valuable insights in evolutionary studies. We introduce a new tool for investigating phylogenetic incongruence through the exploration of phylogenetic tree landscapes. Our approach, implemented in the R package treespace, combines tree metrics and multivariate analysis to provide low-dimensional representations of the topological variability in a set of trees, which can be used for identifying clusters of similar trees and group-specific consensus phylogenies. treespace also provides a user-friendly web interface for interactive data analysis and is integrated alongside existing standards for phylogenetics. It fills a gap in the current phylogenetics toolbox in R and will facilitate the investigation of phylogenetic results. © 2017 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
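
    The first step in such a landscape analysis is a matrix of pairwise distances between trees, which multivariate methods (for example, multidimensional scaling) then project into a few dimensions. The sketch below builds that matrix with the rooted Robinson-Foulds distance, which counts the clades found in only one of the two trees. This metric is substituted here for simplicity, and treespace may use different metrics. The nested-tuple tree format and function names are assumptions.

```python
def clades(tree):
    """Clades (frozensets of leaf names) of all internal nodes of a rooted tree in nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def rf_distance(t1, t2):
    """Rooted Robinson-Foulds distance: number of clades present in exactly one tree."""
    return len(clades(t1) ^ clades(t2))


def distance_matrix(trees):
    """Pairwise distances, ready for an ordination method such as classical MDS."""
    return [[rf_distance(a, b) for b in trees] for a in trees]


T1 = (("A", "B"), ("C", "D"))
T2 = (("A", "C"), ("B", "D"))
D = distance_matrix([T1, T2, T1])
```

    Identical topologies sit at distance 0, so a cluster of trees near zero in the matrix corresponds to one well-supported topology.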

  7. Phylogenetic diversity and relationships among species of genus ...

    African Journals Online (AJOL)

    Fifty-six Nicotiana species were used to construct phylogenetic trees and to assess the genetic relationships between them. Genetic distances estimated from RAPD analysis were used to construct phylogenetic trees using the Phylogenetic Inference Package (PHYLIP). Since phylogenetic relationships estimated for closely ...
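
    RAPD data are usually scored as band presence/absence. A standard distance for such data is the Nei-Li (Dice-based) distance 1 - 2n_xy / (n_x + n_y), where n_xy counts shared bands. The sketch below computes that distance and builds a UPGMA tree in Newick format. The abstract does not say which distance or tree method was used in PHYLIP, so both choices, and all names, are assumptions.

```python
def nei_li_distance(a, b):
    """Nei-Li (Dice) distance between two 0/1 band profiles: 1 - 2*n_ab / (n_a + n_b)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    return 1.0 - 2.0 * shared / (sum(a) + sum(b))  # undefined if both profiles lack bands


def upgma(profiles):
    """Build an ultrametric UPGMA tree (Newick string) from named band profiles."""
    names = sorted(profiles)
    size = {n: 1 for n in names}
    height = {n: 0.0 for n in names}
    dist = {frozenset((a, b)): nei_li_distance(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    while len(size) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        h = dist[frozenset((a, b))] / 2.0
        node = f"({a}:{h - height[a]:.3f},{b}:{h - height[b]:.3f})"
        for c in size:
            if c not in (a, b):
                # Distance to the merged cluster is the size-weighted average.
                dist[frozenset((node, c))] = (
                    size[a] * dist[frozenset((a, c))] + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        size[node] = size.pop(a) + size.pop(b)
        height[node] = h
    return next(iter(size)) + ";"


# Hypothetical band profiles (1 = band present) for three accessions.
BANDS = {"A": [1, 1, 0, 0], "B": [1, 1, 0, 1], "C": [0, 0, 1, 1]}
newick = upgma(BANDS)
```

    UPGMA assumes a roughly constant rate of change. Neighbor-joining drops that assumption and is often preferred when rates may differ among lineages.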

  8. Lucky Motifs in Chinese Folk Art: Interpreting Paper-cut from Chinese Shaanxi

    Directory of Open Access Journals (Sweden)

    Xuxiao WANG

    2013-11-01

    Paper-cut is not simply a form of traditional Chinese folk art. Lucky motifs developed in paper-cut certainly acquired profound cultural connotations. As paper-cut is a time-honoured skill across the nation, interpreting those motifs requires cultural receptiveness and anthropological sensitivity. The author of this article analyzes examples of paper-cut from Northern Shaanxi, China, to identify the cohesive motifs and explore the auspiciousness of the specific concepts of Fu, Lu, Shou, Xi. The paper-cut of Northern Shaanxi is an ideal representative of the craft as a whole because of the relative stability of this region in history, in terms of both art and culture. Furthermore, its straightforward style provides a clear demonstration of motifs regarding folk understanding of expectations for life.

  9. The 5S rRNA loop E: chemical probing and phylogenetic data versus crystal structure.

    Science.gov (United States)

    Leontis, N B; Westhof, E

    1998-09-01

    A significant fraction of the bases in a folded, structured RNA molecule participate in noncanonical base pairing interactions, often in the context of internal loops or multi-helix junction loops. The appearance of each new high-resolution RNA structure provides welcome data to guide efforts to understand and predict RNA 3D structure, especially when the RNA in question is a functionally conserved molecule. The recent publication of the crystal structure of the "Loop E" region of bacterial 5S ribosomal RNA is such an event [Correll CC, Freeborn B, Moore PB, Steitz TA, 1997, Cell 91:705-712]. In addition to providing more examples of already established noncanonical base pairs, such as purine-purine sheared pairings, trans-Hoogsteen UA, and GU wobble pairs, the structure provides the first high-resolution views of two new purine-purine pairings and a new GU pairing. The goal of the present analysis is to expand the capabilities of both chemical probing and phylogenetic analysis to predict with greater accuracy the structures of RNA molecules. First, in light of existing chemical probing data, we investigate what lessons could be learned regarding the interpretation of this widely used method of RNA structure probing. Then we analyze the 3D structure with reference to molecular phylogeny data (assuming conservation of function) to discover what alternative base pairings are geometrically compatible with the structure. The comparisons between previous modeling efforts and crystal structures show that the intricate involvements of ions and water molecules in the maintenance of non-Watson-Crick pairs render the process of correctly identifying the interacting sites in such pairs treacherous, except in cases of trans-Hoogsteen A/U or sheared A/G pairs for the adenine N1 site. The phylogenetic analysis identifies A/A, A/C, A/U and C/A, C/C, and C/U pairings isosteric with sheared A/G, as well as A/A and A/C pairings isosteric with both G/U and G/G bifurcated pairings

  10. Fast optimization of statistical potentials for structurally constrained phylogenetic models

    Directory of Open Access Journals (Sweden)

    Rodrigue Nicolas

    2009-09-01

    Background Statistical approaches for protein design are relevant in the field of molecular evolutionary studies. In recent years, new, so-called structurally constrained (SC) models of protein-coding sequence evolution have been proposed, which use statistical potentials to assess sequence-structure compatibility. In previous work, we defined a statistical framework for optimizing knowledge-based potentials especially suited to SC models. Our method used the maximum likelihood principle and provided what we call the joint potentials. However, the method required numerical estimation using computationally heavy Markov Chain Monte Carlo sampling algorithms. Results Here, we develop an alternative optimization procedure, based on a leave-one-out argument coupled to fast gradient descent algorithms. We find that the leave-one-out potential yields results very similar to those of the joint approach developed previously, both in terms of the resulting potential parameters and by Bayes factor evaluation in a phylogenetic context. On the other hand, the leave-one-out approach yields a considerable computational benefit (up to a 1,000-fold decrease in computational time for the optimization procedure). Conclusion Due to its computational speed, the optimization method we propose offers an attractive alternative for the design and empirical evaluation of alternative forms of potentials, using large data sets and high-dimensional parameterizations.

  11. Nonbinary Tree-Based Phylogenetic Networks

    NARCIS (Netherlands)

    Jetten, L.; van Iersel, L.J.J.

    2018-01-01

    Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can for example

  12. Dimensionality of social networks using motifs and eigenvalues.

    Directory of Open Access Journals (Sweden)

    Anthony Bonato

    We consider the dimensionality of social networks, and develop experiments aimed at predicting that dimension. We find that a social network model with nodes and links sampled from an m-dimensional metric space with power-law distributed influence regions best fits samples from real-world networks when m scales logarithmically with the number of nodes of the network. This supports a logarithmic dimension hypothesis, and we provide evidence with two different social networks, Facebook and LinkedIn. Further, we employ two different methods for confirming the hypothesis: the first uses the distribution of motif counts, and the second exploits the eigenvalue distribution.
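
    Motif counts and eigenvalues are linked for the simplest network motif, the triangle. For a simple undirected graph with adjacency matrix A, trace(A³) counts closed walks of length 3, and each triangle contributes six of these walks. Since trace(A³) also equals the sum of the cubed eigenvalues of the symmetric matrix A, the triangle count equals Σλ_i³ / 6. Below is a minimal pure-Python sketch of the trace route. It is an illustration, not the authors' counting procedure.

```python
def triangle_count(adj):
    """Count triangles in a simple undirected graph via trace(A^3) / 6."""
    n = len(adj)
    # trace(A^3) = number of closed walks of length 3; each triangle yields 6 of them
    # (3 starting vertices x 2 directions).
    closed_walks = sum(
        adj[i][j] * adj[j][k] * adj[k][i]
        for i in range(n) for j in range(n) for k in range(n)
    )
    return closed_walks // 6


K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(triangle_count(K4))  # 4
```

    The triple loop is O(n³) and only suitable for small examples. Large networks would use sparse neighbor-intersection counting instead.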

  13. Sequence alignment reveals possible MAPK docking motifs on HIV proteins.

    Directory of Open Access Journals (Sweden)

    Perry Evans

    Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by the human ERK1 and ERK2 mitogen-activated protein kinases (MAPKs). MAPKs are known to phosphorylate their substrates by first binding them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain-dependent manner, are absent from HIV Rev, and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype-independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. The motif usage on HIV proteins is sufficiently different from human proteins in amino acid sequence similarity to allow for HIV-specific targeting using small-molecule drugs.
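
    A revision that relaxes the required number of basic residues from two to one can be expressed as a change to the regular expression. The patterns below are illustrative approximations of a docking-site layout (basic residues, a short spacer, then hydrophobic-X-hydrophobic). They are assumptions made for this sketch and not the exact expressions from the paper. The function name is hypothetical.

```python
import re

# Illustrative docking-site-like patterns (assumptions, not the paper's exact expressions).
STANDARD_DOCKING = re.compile(r"[KR]{2}.{1,6}[LIVM].[LIVMF]")  # two basic residues required
REVISED_DOCKING = re.compile(r"[KR].{1,6}[LIVM].[LIVMF]")      # one basic residue required


def find_sites(sequence, pattern):
    """Return (start, matched text) for each non-overlapping match in a protein sequence."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# A made-up peptide with a single basic residue: only the relaxed pattern annotates it.
peptide = "AKAAPLTL"
standard_hits = find_sites(peptide, STANDARD_DOCKING)
revised_hits = find_sites(peptide, REVISED_DOCKING)
```

    `finditer` reports non-overlapping matches only. To catch overlapping candidate sites, wrap the pattern in a lookahead and read the matched text from a capture group.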

  14. Ultrafast Approximation for Phylogenetic Bootstrap

    NARCIS (Netherlands)

    Bui Quang Minh; Nguyen, Thi; von Haeseler, Arndt

    Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and
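
    The computational bottleneck described here comes from repeating tree inference on many pseudo-replicate alignments. Each replicate is built by resampling alignment columns with replacement, and clade support is then the fraction of replicate trees containing the clade. The sketch below covers resampling and support counting only. Tree inference on each replicate is omitted and would be done by a phylogenetics program. The function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """One nonparametric bootstrap replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def clade_support(replicate_clade_sets, clade):
    """Fraction of bootstrap trees (each given as a set of clades) that contain `clade`."""
    clade = frozenset(clade)
    return sum(clade in clades for clades in replicate_clade_sets) / len(replicate_clade_sets)


rng = random.Random(42)
aln = {"t1": "ACGTAC", "t2": "ACGTTC", "t3": "AGGTTA"}
replicates = [bootstrap_alignment(aln, rng) for _ in range(100)]
```

    Fast approximations such as RBS aim to avoid running a full tree search on every one of these replicates.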

  15. Gene Isolation Using Degenerate Primers Targeting Protein Motif: A Laboratory Exercise

    Science.gov (United States)

    Yeo, Brandon Pei Hui; Foong, Lian Chee; Tam, Sheh May; Lee, Vivian; Hwang, Siaw San

    2018-01-01

    Structures and functions of protein motifs are widely included in many biology-based course syllabi. However, little emphasis is placed on linking this knowledge to applications in biotechnology to enhance the learning experience. Here, the conserved motifs of nucleotide binding site-leucine rich repeats (NBS-LRR) proteins, successfully used for the…
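
    Degenerate primers are designed by back-translating a conserved peptide motif. At each codon position, the primer uses the IUPAC code for the set of bases found among all synonymous codons, and the product of the set sizes gives the primer's degeneracy. Below is a minimal sketch using the standard genetic code. The peptide inputs are arbitrary examples, not the NBS-LRR motifs from the exercise, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC code for each set of nucleotides.
IUPAC = {frozenset(bases): code for code, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
    "V": "ACG", "N": "ACGT",
}.items()}


def degenerate_primer(peptide):
    """Back-translate a peptide into a degenerate primer; return (primer, degeneracy)."""
    primer, degeneracy = [], 1
    for aa in peptide:
        codons = [codon for codon, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa}")
        for position in range(3):
            # Union of the bases used at this codon position by all synonymous codons.
            bases = frozenset(codon[position] for codon in codons)
            primer.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(primer), degeneracy


print(degenerate_primer("MW"))  # ('ATGTGG', 1)
```

    Taking the union at each position can overstate degeneracy for six-codon amino acids. Leucine, for instance, becomes YTN, which covers 8 sequences for 6 actual codons, so motifs rich in M and W make the most specific primers.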

  16. Wayward Warriors: The Viking Motif in Swedish and English Children's Literature

    Science.gov (United States)

    Sundmark, Björn

    2014-01-01

    In this article the Viking motif in children's literature is explored--from its roots in (adult) nationalist and antiquarian discourse, over pedagogical and historical texts for children, to the eventual diversification (or dissolution) of the motif into different genres and forms. The focus is on Swedish Viking narratives, but points of…

  17. Undergraduate Students’ Difficulties in Reading and Constructing Phylogenetic Tree

    Science.gov (United States)

    Sa'adah, S.; Tapilouw, F. S.; Hidayat, T.

    2017-02-01

    Representation is a very important tool for communicating scientific concepts. Biologists produce phylogenetic representations to express their understanding of evolutionary relationships. A phylogenetic tree is a visual representation that depicts a hypothesis about evolutionary relationships and is widely used in the biological sciences. Phylogenetic trees are increasingly used across many disciplines in biology. Consequently, learning about phylogenetic trees has become an important part of biological education and an interesting area for biology education research. However, research has shown that many students struggle to interpret the information that phylogenetic trees depict. The purpose of this study was to investigate undergraduate students’ difficulties in reading and constructing phylogenetic trees. This study used a descriptive method, with questionnaires, interviews, multiple-choice and open-ended questions, reflective journals, and observations. The findings showed that students experienced difficulties, especially in constructing phylogenetic trees. The students’ responses indicated that the main difficulty in constructing a phylogenetic tree was placing taxa correctly based on the data provided, so that the constructed tree did not describe the actual evolutionary relationships (incorrect relatedness). Students also had difficulty determining sister groups, synapomorphies, and autapomorphies from the data provided (a character table), and comparing phylogenetic trees. According to the students, building a phylogenetic tree is more difficult than reading one. These findings provide information that can help undergraduate instructors and students overcome difficulties in learning to read and construct phylogenetic trees.

  18. SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.

    Science.gov (United States)

    Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

    2011-07-01

    The detection of functional motifs is an important step in determining protein functions. We present here a new web server, SA-Mot (Structural Alphabet Motif), for the extraction and location of structural motifs of interest from protein loops. Unlike other methods, SA-Mot does not focus only on functional motifs; it also extracts recurrent and conserved structural motifs involved in the structural redundancy of loops. SA-Mot uses the notion of structural words to extract all structural motifs from one-dimensional sequences corresponding to loop structures. SA-Mot then describes these structural motifs using statistics computed in the loop data set and in the SCOP superfamily, together with sequence and structural parameters. SA-Mot returns an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, users can easily locate loop regions that are important for protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.
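
    Once loops are encoded as one-dimensional strings over a structural alphabet, extracting structural words amounts to enumerating all k-letter substrings. Recurrence can then be measured by how many loops share each word. The sketch below does only that counting. The alphabet letters in the example are hypothetical placeholders, and SA-Mot's statistical descriptors (such as SCOP-based statistics) are not reproduced.

```python
from collections import Counter


def structural_words(loops, k):
    """Count k-letter words over all loop strings, and the number of loops containing each word."""
    occurrences, loop_counts = Counter(), Counter()
    for loop in loops:
        words = [loop[i:i + k] for i in range(len(loop) - k + 1)]
        occurrences.update(words)
        loop_counts.update(set(words))  # each loop counted once per word
    return occurrences, loop_counts


def recurrent_words(loops, k, min_loops):
    """Words of length k that appear in at least `min_loops` different loops."""
    _, loop_counts = structural_words(loops, k)
    return sorted(w for w, n in loop_counts.items() if n >= min_loops)


# Hypothetical loops encoded in a made-up structural alphabet.
LOOPS = ["ABCAB", "XABC"]
shared = recurrent_words(LOOPS, 3, 2)
```

    Counting loops rather than raw occurrences keeps a single repetitive loop from making a word look recurrent across structures.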

  19. Phylogenetic structure in tropical hummingbird communities

    DEFF Research Database (Denmark)

    Graham, Catherine H; Parra, Juan L; Rahbek, Carsten

    2009-01-01

    How biotic interactions, current and historical environment, and biogeographic barriers determine community structure is a fundamental question in ecology and evolution, especially in diverse tropical regions. To evaluate patterns of local and regional diversity, we quantified the phylogenetic composition of 189 hummingbird communities in Ecuador. We assessed how species and phylogenetic composition changed along environmental gradients and across biogeographic barriers. We show that humid, low-elevation communities are phylogenetically overdispersed (coexistence of distant relatives), a pattern that is consistent with the idea that competition influences the local composition of hummingbirds. At higher elevations communities are phylogenetically clustered (coexistence of close relatives), consistent with the expectation of environmental filtering, which may result from the challenge of sustaining...
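
    Overdispersion and clustering are often quantified by comparing a community's mean pairwise phylogenetic distance (MPD) with MPDs of random communities of the same size drawn from the regional pool. The standardized effect size is negative for clustering and positive for overdispersion. The net relatedness index (NRI) is this value with the sign flipped. The abstract does not name the metric or null model used, so this approach is an assumption. The toy pool and distances are made up.

```python
import random
import statistics


def mean_pairwise_distance(community, dist):
    """Average phylogenetic distance over all species pairs in a community."""
    species = sorted(community)
    pairs = [(a, b) for i, a in enumerate(species) for b in species[i + 1:]]
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def ses_mpd(community, pool, dist, n_null=999, seed=1):
    """Standardized effect size of MPD against random draws from the regional pool (a list).
    Negative values suggest clustering, positive values overdispersion (NRI = -SES)."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(community, dist)
    null = [mean_pairwise_distance(rng.sample(pool, len(community)), dist) for _ in range(n_null)]
    return (observed - statistics.mean(null)) / statistics.stdev(null)


# Toy pool: two clades ("a*" and "b*"); close relatives at distance 1, distant ones at 10.
POOL = ["a1", "a2", "a3", "b1", "b2", "b3"]
DIST = {x: {y: (1.0 if x[0] == y[0] else 10.0) for y in POOL} for x in POOL}
clustered = ses_mpd(["a1", "a2", "a3"], POOL, DIST)
overdispersed = ses_mpd(["a1", "b1"], POOL, DIST)
```

    Drawing from the full pool is the simplest null model. Studies often choose constrained nulls (for example, ones that preserve species frequencies), and that choice can change the conclusions.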

  20. Constructing phylogenetic trees using interacting pathways.

    Science.gov (United States)

    Wan, Peng; Che, Dongsheng

    2013-01-01

    Phylogenetic trees are used to represent evolutionary relationships among biological species or organisms. The construction of phylogenetic trees is based on the similarities or differences of their physical or genetic features. Traditional approaches to constructing phylogenetic trees mainly focus on physical features. The recent advancement of high-throughput technologies has led to the accumulation of huge amounts of biological data, which in turn has changed many aspects of biological research. In this paper, we report our approach to building phylogenetic trees using information on interacting pathways. We have applied hierarchical clustering to two domains of organisms, eukaryotes and prokaryotes. Our preliminary results have shown the effectiveness of using interacting pathways in revealing evolutionary relationships.

  1. Modulation of i-motif thermodynamic stability by the introduction of UNA (unlocked nucleic acid) monomers

    DEFF Research Database (Denmark)

    Pasternak, Anna; Wengel, Jesper

    2011-01-01

    The influence of acyclic RNA derivatives, UNA (unlocked nucleic acid) monomers, on i-DNA thermodynamic stability has been investigated. The 22 nt human telomeric fragment was chosen as the model sequence for stability studies. UNA monomers modulate i-motif stability in a position-dependent manner...

  2. A reconstruction problem for a class of phylogenetic networks with lateral gene transfers.

    Science.gov (United States)

    Cardona, Gabriel; Pons, Joan Carles; Rosselló, Francesc

    2015-01-01

    Lateral (or horizontal) gene transfers are a type of asymmetric evolutionary event in which genetic material is transferred from one species to another. In this paper we consider LGT networks, a general model of phylogenetic networks with lateral gene transfers which consist, roughly, of a principal rooted tree with its leaves labelled on a set of taxa, and a set of extra secondary arcs between nodes in this tree representing lateral gene transfers. An LGT network gives rise in a natural way to a principal phylogenetic subtree and a set of secondary phylogenetic subtrees, which, roughly, represent, respectively, the main line of evolution of most genes and the secondary lines of evolution through lateral gene transfers. We introduce a set of simple conditions on an LGT network that guarantee that its principal and secondary phylogenetic subtrees are pairwise different and that these subtrees determine, up to isomorphism, the LGT network. We then give an algorithm that, given a set of pairwise different phylogenetic trees [Formula: see text] on the same set of taxa, outputs, when it exists, the LGT network that satisfies these conditions and such that its principal phylogenetic tree is [Formula: see text] and its secondary phylogenetic trees are [Formula: see text].

  3. Inferring Phylogenetic Networks Using PhyloNet.

    Science.gov (United States)

    Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay

    2018-07-01

    PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

  4. Phylogenetic classification of bony fishes.

    Science.gov (United States)

    Betancur-R, Ricardo; Wiley, Edward O; Arratia, Gloria; Acero, Arturo; Bailly, Nicolas; Miya, Masaki; Lecointre, Guillaume; Ortí, Guillermo

    2017-07-06

    Fish classifications, like those of most other taxonomic groups, are being transformed drastically as new molecular phylogenies provide support for natural groups that were unanticipated by previous studies. A brief review of the main criteria used by ichthyologists to define their classifications during the last 50 years, however, reveals slow progress towards using an explicit phylogenetic framework. Instead, the trend has been to rely, in varying degrees, on deep-rooted anatomical concepts and authority, often mixing taxa with explicit phylogenetic support with arbitrary groupings. Two leading sources in ichthyology frequently used for fish classifications (JS Nelson's volumes of Fishes of the World and W. Eschmeyer's Catalog of Fishes) fail to adopt a global phylogenetic framework despite much recent progress made towards the resolution of the fish Tree of Life. The first explicit phylogenetic classification of bony fishes was published in 2013, based on a comprehensive molecular phylogeny (www.deepfin.org). We here update the first version of that classification by incorporating the most recent phylogenetic results. The updated classification presented here is based on phylogenies inferred using molecular and genomic data for nearly 2000 fishes. A total of 72 orders (and 79 suborders) are recognized in this version, compared with 66 orders in version 1. The phylogeny resolves placement of 410 families, or ~80% of the total of 514 families of bony fishes currently recognized. The ordinal status of 30 percomorph families included in this study, however, remains uncertain (incertae sedis in the series Carangaria, Ovalentaria, or Eupercaria). Comments to support taxonomic decisions and comparisons with conflicting taxonomic groups proposed by others are presented. We also highlight cases where morphological support exists for the groups being classified. This version of the phylogenetic classification of bony fishes is substantially improved, providing resolution

  5. Sequence and phylogenetic analysis of virulent Newcastle disease virus isolates from Pakistan during 2009–2013 reveals circulation of new sub genotype

    International Nuclear Information System (INIS)

    Siddique, Naila; Naeem, Khalid; Abbas, Muhammad Athar; Ali Malik, Akbar; Rashid, Farooq; Rafique, Saba; Ghafar, Abdul; Rehman, Abdul

    2013-01-01

    Despite standard biosecurity measures at commercial poultry farms and extensive use of Newcastle disease vaccines, a new genotype, VII-f, of Newcastle disease virus (NDV) was introduced into Pakistan during 2011. The 300 ND outbreaks recorded so far have caused losses of approximately USD 200 million during 2011–2013. A total of 33 NDV isolates recovered throughout Pakistan during 2009–2013 were characterized biologically and phylogenetically. Phylogenetic analysis revealed a new velogenic subgenotype, VII-f, circulating in commercial and domestic poultry along with the previously reported subgenotype VII-b. Partial sequencing of the fusion gene revealed two types of cleavage site motifs, lentogenic ¹¹²GRQGRL¹¹⁷ and velogenic ¹¹²RRQKRF¹¹⁷, along with point mutations indicative of genetic diversity. We report here a new subgenotype of virulent NDV circulating in commercial and backyard poultry in Pakistan and provide evidence for genetic diversity that may be causing new NDV outbreaks. - Highlights: • The first report of isolation of the new genotype VII-f of virulent Newcastle disease virus (NDV) in Pakistan. • We report partial fusion gene sequences of the new genotype VII-f of virulent NDV from Pakistan. • We report the phylogenetic relationship of the new NDV strains with previously reported NDV strains. • We provide the outbreak history of the new virulent NDV strain in commercial and backyard poultry in Pakistan. • We provide possible evidence for the role of backyard poultry in NDV outbreaks.
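    The two cleavage site motifs differ in exactly the features checked by the widely used WOAH (formerly OIE) sequence criterion for virulent NDV: at least three basic residues (R or K) between positions 113 and 116 of the F protein, plus a phenylalanine at position 117. The sketch below applies that criterion to a motif written from residue 112, as in the abstract; it is an illustration, not the authors' pipeline, and the function name is hypothetical.

```python
BASIC = frozenset("RK")  # arginine and lysine


def has_virulent_cleavage_site(motif: str, first_position: int = 112) -> bool:
    """Check an F-protein cleavage-site motif against the WOAH sequence criterion.

    `motif` is a one-letter amino-acid string whose first residue sits at
    `first_position`; it must cover residues 113-117.
    """
    residues = {first_position + i: aa for i, aa in enumerate(motif.upper())}
    if not all(pos in residues for pos in range(113, 118)):
        raise ValueError("motif must span residues 113-117")
    # Count basic residues at positions 113, 114, 115 and 116.
    basic = sum(residues[pos] in BASIC for pos in range(113, 117))
    return basic >= 3 and residues[117] == "F"
```

    Applied to the two motifs above, RRQKRF passes (three basic residues in 113–116 and F117) while GRQGRL fails (two basic residues and L117).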

  6. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-01

    LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.
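    The ambiguity is easiest to see against a naive baseline. Assuming the commonly cited LD-motif consensus LDxLLxxL (x = any residue; this consensus is not stated in the abstract above), a strict regular-expression scan such as the hypothetical helper below reports only perfect matches and therefore misses degenerate variants, which is the gap a dedicated predictor such as LDMF targets.

```python
import re

# Lookahead so that overlapping candidates are all reported.
LD_CONSENSUS = re.compile(r"(?=(LD.LL..L))")


def scan_ld_consensus(protein: str) -> list:
    """Return (0-based start, matched 8-mer) for every strict LDxLLxxL match."""
    return [(m.start(), m.group(1)) for m in LD_CONSENSUS.finditer(protein.upper())]
```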

  7. APOCALYPTIC MOTIFS IN THE CYCLE OF STORIES BY M.A. BULGAKOV «NOTES OF A YOUNG DOCTOR»

    Directory of Open Access Journals (Sweden)

    Evgeniy Igorevich Erokhov

    2015-10-01

    This article presents the first analysis of apocalyptic motifs in M.A. Bulgakov's cycle of stories «Notes of a Young Doctor». To identify these motifs, we used the method of motif analysis developed by B.M. Gasparov, which also helps demonstrate the interpenetration of motifs across the cycle. The study identifies apocalyptic motifs that manifest in the experiences of the main character and in the events taking place around him, refracted through the physician's perception of the world. The identified motifs show that the stories in the cycle are united not only thematically and through the figure of the main character, but also through apocalyptic motifs that recur and interpenetrate across the stories of the cycle. Bulgakov's cycle contains the following apocalyptic motifs: disease, darkness (as part of the landscape), resurrection from the dead, and the beast. All of them belong to the biblical type, which is identified on the basis of their associative links with biblical texts.

  8. Undergraduate Students’ Initial Ability in Understanding Phylogenetic Tree

    Science.gov (United States)

    Sa'adah, S.; Hidayat, T.; Sudargo, Fransisca

    2017-04-01

    A phylogenetic tree is a visual representation that depicts a hypothesis about the evolutionary relationships among taxa. Evolutionary biologists use this representation to evaluate the evidence for evolution. Phylogenetic trees are increasingly used across many disciplines in biology. Consequently, learning about phylogenetic trees has become an important part of biology education and an interesting area of biology education research. The ability to understand and reason about phylogenetic trees (called tree thinking) is an important skill for biology students. However, research has shown that many students have difficulty interpreting, constructing, and comparing phylogenetic trees, and hold misconceptions about them. Students are often not taught how to reason about the evolutionary relationships depicted in such diagrams, nor are they given information about the underlying theory and processes of phylogenetics. This study aims to investigate undergraduate students' initial ability to understand and reason about phylogenetic trees, using a descriptive method. Students were given multiple-choice and essay questions representing elements of tree thinking, the percentage of correct answers was calculated for each, and each student also completed a questionnaire. The results showed that undergraduate students' initial ability to understand and reason about phylogenetic trees is low, and many students were unable to answer questions about phylogenetic trees. Only 19% of students answered correctly on the indicator "evaluating the evolutionary relationships among taxa", 25% on "applying the concept of a clade", and 17% on "determining character evolution", and only a few students could construct a phylogenetic tree.

  9. Incorporating phylogenetic information for the definition of floristic districts in hyperdiverse Amazon forests: Implications for conservation.

    Science.gov (United States)

    Guevara Andino, Juan Ernesto; Pitman, Nigel C A; Ter Steege, Hans; Mogollón, Hugo; Ceron, Carlos; Palacios, Walter; Oleas, Nora; Fine, Paul V A

    2017-11-01

    Using complementary metrics to evaluate phylogenetic diversity can facilitate the delimitation of floristic units and conservation priority areas. In this study, we describe the spatial patterns of phylogenetic alpha and beta diversity, phylogenetic endemism, and evolutionary distinctiveness of the hyperdiverse forests of the Ecuadorian Amazon and define priority areas for conservation. We established a network of 62 one-hectare plots in terra firme forests of the Ecuadorian Amazon. In these plots, we tagged, collected, and identified every adult tree with a diameter at breast height (dbh) ≥ 10 cm. These data were combined with a regional community phylogenetic tree to calculate different phylogenetic diversity (PD) metrics in order to create spatial models. We used LOESS regression to estimate the spatial variation of taxonomic and phylogenetic beta diversity as well as phylogenetic endemism and evolutionary distinctiveness. We found evidence for the definition of three floristic districts in the Ecuadorian Amazon, supported by both taxonomic and phylogenetic diversity data. Areas with high levels of phylogenetic endemism and evolutionary distinctiveness in the Ecuadorian Amazon are unprotected. Furthermore, these areas are severely threatened by proposed large-scale oil and mining extraction and should be prioritized in conservation planning for this region.
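    The abstract does not list the individual PD metrics used, so as an illustration only: Faith's PD, one widely used metric, is the total branch length of the part of the phylogeny that connects a community's taxa to the root. A minimal sketch, assuming the tree is stored as a hypothetical mapping from each node to its (parent, branch length) pair, with the root absent from the mapping:

```python
from typing import Dict, Iterable, Tuple

ParentMap = Dict[str, Tuple[str, float]]  # node -> (parent, length of branch above node)


def faith_pd(parent: ParentMap, community: Iterable[str]) -> float:
    """Sum of branch lengths on the union of root-to-taxon paths."""
    counted = set()
    total = 0.0
    for taxon in community:
        node = taxon
        # Walk rootwards until reaching the root or a branch already counted.
        while node in parent and node not in counted:
            counted.add(node)
            ancestor, length = parent[node]
            total += length
            node = ancestor
    return total
```

    For the tree ((A:1,B:2)X:3,C:4), the community {A, B} has PD 1 + 2 + 3 = 6, and {A, C} has PD 1 + 3 + 4 = 8. Stopping at the first already-counted branch ensures shared ancestral branches are added only once.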

  10. THE INFLUENCE OF NEEDS ON MOTIVES FOR USING BANK CENTRAL ASIA (BCA) DEBIT CARDS AMONG ACTIVE STUDENTS OF THE FACULTY OF ECONOMICS, PETRA CHRISTIAN UNIVERSITY SURABAYA

    Directory of Open Access Journals (Sweden)

    Hatane Semuel

    2003-01-01

    This research was conducted among students of the 1999, 2000, and 2001 cohorts of the Faculty of Economics, Petra Christian University Surabaya. It focuses on the subjects' cognitive and affective motives for using BCA's debit card, assuming that consumers weigh both objective and subjective factors when considering the purchase of a product. The results show that achievement needs, power needs, and affiliation needs jointly influence the subjects' motive for using BCA's debit card, with the model explaining 46% of the variation in this motive. Achievement needs have a more dominant influence than power needs. In addition, an education factor (study duration) was found to influence the cognitive motive, as shown by differences in this motive between the 1999 cohort and the 2000 and 2001 cohorts. Keywords: motive, achievement, power, needs.

  11. Identification of helix capping and β-turn motifs from NMR chemical shifts

    Energy Technology Data Exchange (ETDEWEB)

    Shen Yang; Bax, Ad, E-mail: bax@nih.gov [National Institutes of Health, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases (United States)]

    2012-03-15

    We present an empirical method for identification of distinct structural motifs in proteins on the basis of experimentally determined backbone and ¹³Cβ chemical shifts. Elements identified include the N-terminal and C-terminal helix capping motifs and five types of β-turns: I, II, I′, II′ and VIII. Using a database of proteins of known structure, the NMR chemical shifts, together with the PDB-extracted amino acid preference of the helix capping and β-turn motifs, are used as input data for training an artificial neural network algorithm, which outputs the statistical probability of finding each motif at any given position in the protein. The trained neural networks, contained in the MICS (motif identification from chemical shifts) program, also provide a confidence level for each of their predictions, and values ranging from ca 0.7–0.9 for the Matthews correlation coefficient of its predictions far exceed those attainable by sequence analysis. MICS is anticipated to be useful both in the conventional NMR structure determination process and for enhancing ongoing efforts to determine protein structures solely on the basis of chemical shift information, where it can aid in identifying protein database fragments suitable for use in building such structures.
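    The Matthews correlation coefficient quoted above summarizes a per-residue binary prediction (motif present at this position or not) in a single value between -1 and 1, with 0 meaning no better than chance. A minimal sketch from confusion-matrix counts; the counts in any call are hypothetical inputs, not MICS results:

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom
```

    Unlike raw accuracy, MCC stays informative when motif positions are rare compared with non-motif positions, which is typical for helix caps and β-turns along a protein chain.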

  12. Identification of helix capping and β-turn motifs from NMR chemical shifts

    International Nuclear Information System (INIS)

    Shen Yang; Bax, Ad

    2012-01-01

    We present an empirical method for identification of distinct structural motifs in proteins on the basis of experimentally determined backbone and 13 C β chemical shifts. Elements identified include the N-terminal and C-terminal helix capping motifs and five types of β-turns: I, II, I′, II′ and VIII. Using a database of proteins of known structure, the NMR chemical shifts, together with the PDB-extracted amino acid preference of the helix capping and β-turn motifs are used as input data for training an artificial neural network algorithm, which outputs the statistical probability of finding each motif at any given position in the protein. The trained neural networks, contained in the MICS (motif identification from chemical shifts) program, also provide a confidence level for each of their predictions, and values ranging from ca 0.7–0.9 for the Matthews correlation coefficient of its predictions far exceed those attainable by sequence analysis. MICS is anticipated to be useful both in the conventional NMR structure determination process and for enhancing on-going efforts to determine protein structures solely on the basis of chemical shift information, where it can aid in identifying protein database fragments suitable for use in building such structures.

  13. Identification of putative regulatory motifs in the upstream regions of co-expressed functional groups of genes in Plasmodium falciparum

    Directory of Open Access Journals (Sweden)

    Joshi NV

    2009-01-01

    Abstract Background Regulation of gene expression in Plasmodium falciparum (Pf) remains poorly understood. While over half the genes are estimated to be regulated at the transcriptional level, few regulatory motifs and transcription regulators have been found. Results The study seeks to identify putative regulatory motifs in the upstream regions of 13 functional groups of genes expressed in the intraerythrocytic developmental cycle of Pf. Three motif-discovery programs were used for the purpose, and motifs were searched for only on the gene coding strand. Four motifs (the 'G-rich', the 'C-rich', the 'TGTG' and the 'CACA' motifs) were identified, and between zero and all four of these occur in each of the 13 sets of upstream regions. The 'CACA motif' was absent in functional groups expressed during the ring to early trophozoite transition. For functional groups expressed in each transition, the motifs tended to be similar. Upstream motifs in some functional groups showed 'positional conservation' by occurring at similar positions relative to the translational start site (TLS); this increases their significance as regulatory motifs. In the ribonucleotide synthesis, mitochondrial, proteasome and organellar translation machinery genes, G-rich, C-rich, CACA and TGTG motifs, respectively, occur with striking positional conservation. In the organellar translation machinery group, G-rich motifs occur close to the TLS. The same motifs were sometimes identified for multiple functional groups; differences in location and abundance of the motifs appear to ensure different modes of action. Conclusion The identification of positionally conserved over-represented upstream motifs throws light on putative regulatory elements for transcription in Pf.
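    Positional conservation can be checked by expressing each motif occurrence as an offset from the TLS and asking how tightly those offsets cluster within a functional group. A minimal sketch, assuming each upstream region is given 5'→3' on the coding strand and ends at the base immediately before the TLS (so offsets are negative); the helper names and any input sequences are hypothetical, and the authors' actual statistic is not specified in the abstract:

```python
import re
from statistics import pstdev
from typing import Dict, List


def motif_offsets(upstream: str, motif: str) -> List[int]:
    """Offsets of every (possibly overlapping) motif start relative to the TLS."""
    n = len(upstream)
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() - n for m in pattern.finditer(upstream.upper())]


def positional_spread(regions: Dict[str, str], motif: str) -> float:
    """Population standard deviation of the TLS-nearest occurrence per gene.

    Genes without the motif are skipped; a small spread suggests the motif
    sits at a conserved distance from the TLS.
    """
    nearest = [max(offs) for offs in (motif_offsets(s, motif) for s in regions.values()) if offs]
    return pstdev(nearest) if len(nearest) > 1 else 0.0
```

    Using the occurrence nearest the TLS is one simple choice; comparing the observed spread against shuffled sequences would be a natural next step for judging significance.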

  14. Developing a statistically powerful measure for quartet tree inference using phylogenetic identities and Markov invariants.

    Science.gov (United States)

    Sumner, Jeremy G; Taylor, Amelia; Holland, Barbara R; Jarvis, Peter D

    2017-12-01

    models with more than two states (for example, DNA sequence alignments under four-state models), we find that methods which rely on phylogenetic invariants are incapable of satisfying all three of the stated statistical properties. This is because in these cases the relevant Markov invariants belong to a class of polynomials independent of the phylogenetic invariants.

  15. treeman: an R package for efficient and intuitive manipulation of phylogenetic trees.

    Science.gov (United States)

    Bennett, Dominic J; Sutton, Mark D; Turvey, Samuel T

    2017-01-07

    Phylogenetic trees are hierarchical structures used for representing the inter-relationships between biological entities. They are the most common tool for representing evolution and are essential to a range of fields across the life sciences. Researchers often manipulate phylogenetic trees (by adding or removing tips) not just for data management but also for running simulations to understand the processes of evolution. Despite this, the most common programming language among biologists, R, has few class structures well suited to these tasks. We present an R package that contains a new class, called TreeMan, for representing the phylogenetic tree. This class has a list structure that allows phylogenetic trees to be manipulated more efficiently. Computational running times are reduced because methods can readily be vectorised and parallelised. Development is also improved because fewer lines of code are required for performing manipulation processes. We present three use cases (pinning missing taxa to a supertree, simulating evolution with a tree-growth model, and detecting significant phylogenetic turnover) that demonstrate the new package's speed and simplicity.
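    The general idea of storing a tree as a flat, ID-keyed collection of nodes, rather than as nested objects, can be sketched outside R. The Python below is hypothetical and is not treeman's API: each node records its parent ID, child IDs, and the length of the branch above it, so removing a tip touches only a handful of dictionary entries instead of rebuilding the tree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    id: str
    prid: Optional[str]          # parent ID; None for the root
    span: float = 0.0            # length of the branch above this node
    kids: List[str] = field(default_factory=list)


class NodeListTree:
    """Tree stored as a flat dict of nodes keyed by ID (hypothetical sketch)."""

    def __init__(self, root_id: str) -> None:
        self.root = root_id
        self.nodes: Dict[str, Node] = {root_id: Node(root_id, None)}

    def add_node(self, node_id: str, parent_id: str, span: float) -> None:
        """Attach a new node as a child of an existing node."""
        self.nodes[node_id] = Node(node_id, parent_id, span)
        self.nodes[parent_id].kids.append(node_id)

    def tips(self) -> List[str]:
        return sorted(n.id for n in self.nodes.values() if not n.kids)

    def remove_tip(self, tip_id: str) -> None:
        """Drop a tip and collapse its parent if the parent becomes unary."""
        if self.nodes[tip_id].kids:
            raise ValueError(f"{tip_id} is not a tip")
        tip = self.nodes.pop(tip_id)
        parent = self.nodes[tip.prid]
        parent.kids.remove(tip_id)
        if len(parent.kids) != 1:
            return
        child = self.nodes[parent.kids[0]]
        if parent.prid is None:          # parent was the root: child becomes root
            child.prid, child.span = None, 0.0
            self.root = child.id
        else:                            # splice parent out, merging branch lengths
            grand = self.nodes[parent.prid]
            grand.kids[grand.kids.index(parent.id)] = child.id
            child.prid = grand.id
            child.span += parent.span
        del self.nodes[parent.id]
```

    Removing tip B from ((A:1,B:2)X:3,C:4) splices out the now-unary node X and leaves A attached to the root with a branch length of 4.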

  16. Conserved binding of GCAC motifs by MEC-8, couch potato, and the RBPMS protein family

    Science.gov (United States)

    Soufari, Heddy

    2017-01-01

    Precise regulation of mRNA processing, translation, localization, and stability relies on specific interactions with RNA-binding proteins whose biological function and target preference are dictated by their preferred RNA motifs. The RBPMS family of RNA-binding proteins is defined by a conserved RNA recognition motif (RRM) domain found in metazoan RBPMS/Hermes and RBPMS2, Drosophila couch potato, and MEC-8 from Caenorhabditis elegans. In order to determine the parameters of RNA sequence recognition by the RBPMS family, we have first used the N-terminal domain from MEC-8 in binding assays and have demonstrated a preference for two GCAC motifs optimally separated by >6 nucleotides (nt). We have also determined the crystal structure of the dimeric N-terminal RRM domain from MEC-8 in the unbound form, and in complex with an oligonucleotide harboring two copies of the optimal GCAC motif. The atomic details reveal the molecular network that provides specificity to all four bases in the motif, including multiple hydrogen bonds to the initial guanine. Further studies with human RBPMS, as well as Drosophila couch potato, confirm a general preference for this double GCAC motif by other members of the protein family and the presence of this motif in known targets. PMID:28003515
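    To illustrate the bipartite target, the hypothetical helper below lists pairs of GCAC motifs in an RNA sequence whose spacer meets a minimum length. It assumes spacing is counted as the number of nucleotides between the end of the first GCAC and the start of the second, and defaults to 7 (that is, more than 6 nt); the abstract does not say exactly how the separation was measured.

```python
import re
from typing import List, Optional, Tuple


def gcac_pairs(rna: str, min_gap: int = 7,
               max_gap: Optional[int] = None) -> List[Tuple[int, int, int]]:
    """Return (start1, start2, gap) for GCAC pairs whose spacer length is in range."""
    seq = rna.upper().replace("T", "U")  # accept DNA-style input as well
    # GCAC cannot overlap itself, so a plain finditer finds every occurrence.
    starts = [m.start() for m in re.finditer("GCAC", seq)]
    pairs = []
    for i, first in enumerate(starts):
        for second in starts[i + 1:]:
            gap = second - (first + 4)  # nucleotides between the two motifs
            if gap >= min_gap and (max_gap is None or gap <= max_gap):
                pairs.append((first, second, gap))
    return pairs
```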

  17. Phylogenetic evidence for cladogenetic polyploidization in land plants.

    Science.gov (United States)

    Zhan, Shing H; Drori, Michal; Goldberg, Emma E; Otto, Sarah P; Mayrose, Itay

    2016-07-01

    Polyploidization is a common and recurring phenomenon in plants and is often thought to be a mechanism of "instant speciation". Whether polyploidization is associated with the formation of new species (cladogenesis) or simply occurs over time within a lineage (anagenesis), however, has never been assessed systematically. We tested this hypothesis using phylogenetic and karyotypic information from 235 plant genera (mostly angiosperms). We first constructed a large database of combined sequence and chromosome number data sets using an automated procedure. We then applied likelihood models (ClaSSE) that estimate the degree of synchronization between polyploidization and speciation events in maximum likelihood and Bayesian frameworks. Our maximum likelihood analysis indicated that 35 genera supported a model that includes cladogenetic transitions over a model with only anagenetic transitions, whereas three genera supported a model that incorporates anagenetic transitions over one with only cladogenetic transitions. Furthermore, the Bayesian analysis supported a preponderance of cladogenetic change in four genera but did not support a preponderance of anagenetic change in any genus. Overall, these phylogenetic analyses provide the first broad confirmation that polyploidization is temporally associated with speciation events, suggesting that it is indeed a major speciation mechanism in plants, at least in some genera. © 2016 Botanical Society of America.

  18. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed; Mansour, Essam; Kalnis, Panos

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to much existing work that focuses on collections of many short sequences, modern

  19. Efficient sequential and parallel algorithms for planted motif search.

    Science.gov (United States)

    Nicolae, Marius; Rajasekaran, Sanguthevar

    2014-01-31

    Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state-of-the-art algorithms have large runtimes even for moderately challenging instances. This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state-of-the-art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.
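    For intuition about the (l,d) problem definition (this is not PMS8's algorithm), a straightforward exact solver works as follows: every valid motif lies within Hamming distance d of some l-mer of the first sequence, so it suffices to enumerate those d-neighborhoods and keep the candidates that occur with at most d mismatches in every other sequence. The candidate set grows quickly with l and d, so this sketch is only practical for small instances, which is why dedicated algorithms matter. Function names are hypothetical.

```python
from itertools import combinations, product
from typing import List, Set

ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def neighborhood(kmer: str, d: int) -> Set[str]:
    """All strings over ALPHABET within Hamming distance d of kmer."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(kmer)), k):
            for subs in product(ALPHABET, repeat=k):
                chars = list(kmer)
                for pos, c in zip(positions, subs):
                    chars[pos] = c
                result.add("".join(chars))
    return result


def occurs_with_mismatches(motif: str, seq: str, d: int) -> bool:
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def planted_motif_search(seqs: List[str], l: int, d: int) -> List[str]:
    """Exact brute-force (l, d) motif search."""
    first = seqs[0]
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates |= neighborhood(first[i:i + l], d)
    return sorted(m for m in candidates
                  if all(occurs_with_mismatches(m, s, d) for s in seqs[1:]))
```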

  20. Rearrangement moves on rooted phylogenetic networks.

    Science.gov (United States)

    Gambette, Philippe; van Iersel, Leo; Jones, Mark; Lafond, Manuel; Pardi, Fabio; Scornavacca, Celine

    2017-08-01

    Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of the possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic networks, that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose "horizontal" moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and "vertical" moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves, named rNNI and rSPR, reduce to the best-known moves on rooted phylogenetic trees: nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results (separating the contributions of horizontal and vertical moves), we prove that rNNI moves are local versions of rSPR moves and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events clearly identified. Moreover, our rearrangement moves are robust to the fact that networks with higher complexity usually allow a better fit with the data. 
Our goal is to provide a solid basis for
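    On trees, the horizontal rNNI move reduces to ordinary rooted nearest-neighbor interchange, which the Python sketch below illustrates for rooted binary trees written as nested 2-tuples (the network moves of the paper are not shown, and the function names are hypothetical). For each edge from a node u to an internal child v, the subtree on u's other side is swapped with one of v's two children, giving two neighbors per internal non-root node.

```python
from typing import List, Union

Tree = Union[str, tuple]  # a leaf label or a 2-tuple of subtrees


def rooted_nni(tree: Tree) -> List[Tree]:
    """All rooted NNI neighbors of a rooted binary tree."""
    if isinstance(tree, str):
        return []
    left, right = tree
    result = []
    # Edges from this node to an internal child v; c is v's sibling subtree.
    for v, c in ((left, right), (right, left)):
        if not isinstance(v, str):
            a, b = v
            result.append(((a, c), b))  # swap c with b
            result.append(((b, c), a))  # swap c with a
    # Moves on edges deeper in the tree leave this node's other side unchanged.
    result += [(n, right) for n in rooted_nni(left)]
    result += [(left, n) for n in rooted_nni(right)]
    return result


def canon(tree: Tree) -> Tree:
    """Order-independent form, so that (A,B) and (B,A) compare equal."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canon(c) for c in tree), key=repr))
```

    For ((A,B),C) this yields the two other rooted triples, ((A,C),B) and ((B,C),A); a four-leaf caterpillar has four distinct neighbors.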

  1. Rearrangement moves on rooted phylogenetic networks.

    Directory of Open Access Journals (Sweden)

    Philippe Gambette

    2017-08-01

    Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of the possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic networks, that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose "horizontal" moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and "vertical" moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves, named rNNI and rSPR, reduce to the best-known moves on rooted phylogenetic trees: nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results (separating the contributions of horizontal and vertical moves), we prove that rNNI moves are local versions of rSPR moves and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events clearly identified. Moreover, our rearrangement moves are robust to the fact that networks with higher complexity usually allow a better fit with the data. 
Our goal is to provide

  2. Multifunctionality and diversity of GDSL esterase/lipase gene family in rice (Oryza sativa L. japonica) genome: new insights from bioinformatics analysis

    Directory of Open Access Journals (Sweden)

    Chepyshko Hanna

    2012-07-01

    Abstract Background GDSL esterases/lipases are a newly discovered subclass of lipolytic enzymes that are very important and attractive research subjects because of their multifunctional properties, such as broad substrate specificity and regiospecificity. Compared with the current knowledge regarding these enzymes in bacteria, our understanding of the plant GDSL enzymes is very limited, although the GDSL gene family includes numerous members in many fully sequenced plant genomes. Only two genes from a large rice GDSL esterase/lipase gene family were previously characterised, and the majority of the members remain unknown. In the present study, we describe the rice OsGELP (Oryza sativa GDSL esterase/lipase protein) gene family at the genomic and proteomic levels, and use this knowledge to provide insights into the multifunctionality of the rice OsGELP enzymes. Results In this study, an extensive bioinformatics analysis identified 114 genes in the rice OsGELP gene family. A complete overview of this family in rice is presented, including the chromosome locations, gene structures, phylogeny, and protein motifs. Among the OsGELPs and the plant GDSL esterase/lipase proteins of known functions, 41 motifs were found that represent the core secondary structure elements or appear specifically in different phylogenetic subclades. The specification and distribution of the identified putative conserved clade-common and clade-specific peptide motifs, and their location on the predicted protein three-dimensional structure, may signify their functional roles. Potentially important regions for substrate specificity are highlighted, in accordance with the protein three-dimensional model and the location of the phylogenetic specific conserved motifs. The differential expression of some representative genes was confirmed by quantitative real-time PCR. The phylogenetic analysis, together with protein motif architectures, and the expression profiling were

  3. Phylogenetic tests of distribution patterns in South Asia: towards

    Indian Academy of Sciences (India)

    The last four decades have seen an increasing integration of phylogenetics and biogeography. However, a dearth of phylogenetic studies has precluded such biogeographic analyses in South Asia until recently. Noting the increase in phylogenetic research and interest in phylogenetic biogeography in the region, we ...

  4. Phylogenetic search through partial tree mixing

    Science.gov (United States)

    2012-01-01

    Background Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time necessary to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a reasonable amount of time through several innovative search techniques. Results When compared to popular phylogenetic search algorithms, better trees are found much more quickly for large data sets. These algorithms are incorporated in the PSODA application available at http://dna.cs.byu.edu/psoda Conclusions The use of Partial Tree Mixing in a partition based tree space allows the algorithm to quickly converge on near optimal tree regions. These regions can then be searched in a methodical way to determine the overall optimal phylogenetic solution. PMID:23320449

  5. Salt-bridge Swapping in the EXXERFXYY Motif of Proton Coupled Oligopeptide Transporters

    DEFF Research Database (Denmark)

    Aduri, Nanda G; Prabhala, Bala K; Ernst, Heidi A

    2015-01-01

    to as E1XXE2R), located on Helix I, in interactions with the proton. In this study we investigated the intracellular substrate accumulation by motif variants with all possible combinations of glutamate residues changed to glutamine and arginine changed to a tyrosine; the latter being a natural variant......-motif salt bridge, i.e. R-E2 to R-E1, which is consistent with previous structural studies. Molecular dynamics simulations of the motif variants E1XXE2R and E1XXQ2R support this mechanism. The simulations showed that upon changing conformation, arginine pushes Helix V, through interactions with the highly...
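    Reading "all possible combinations" as independent substitutions at the three charged positions (each glutamate to glutamine, the arginine to tyrosine), the construct set can be enumerated as below; under that assumption there are 2 × 2 × 2 = 8 variants, wild type included. This is an illustrative Python sketch only: the abstract is truncated and does not confirm that every combination was made, and the position labels are hypothetical.

```python
from itertools import product

# Substitutions considered at each charged position of the E1XXE2R triad
# (labels are illustrative): each glutamate may stay E or become glutamine (Q),
# and the arginine may stay R or become tyrosine (Y).
CHOICES = {"E1": ("E", "Q"), "E2": ("E", "Q"), "R": ("R", "Y")}

def triad_variants():
    """Every combination of the per-position choices, wild type included."""
    positions = list(CHOICES)
    return [dict(zip(positions, combo)) for combo in product(*CHOICES.values())]

def label(variant):
    """Render a variant in the E1XXE2R notation, e.g. 'E1XXQ2R'."""
    return f"{variant['E1']}1XX{variant['E2']}2{variant['R']}"

for v in triad_variants():
    print(label(v))
```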

  6. Systematic discovery of regulatory motifs in Fusarium graminearum by comparing four Fusarium genomes

    Directory of Open Access Journals (Sweden)

    Kistler Corby

    2010-03-01

    Full Text Available Abstract Background Fusarium graminearum (Fg), a major fungal pathogen of cultivated cereals, is responsible for billions of dollars in agricultural losses. There is a growing interest in understanding the transcriptional regulation of this organism, especially the regulation of genes underlying its pathogenicity. The generation of whole genome sequence assemblies for Fg and three closely related Fusarium species provides a unique opportunity for such a study. Results Applying comparative genomics approaches, we developed a computational pipeline to systematically discover evolutionarily conserved regulatory motifs in the promoter, downstream and intronic regions of Fg genes, based on multiple alignments of the sequenced Fusarium genomes. Using this method, we discovered 73 candidate regulatory motifs in the promoter regions. Nearly 30% of these motifs are highly enriched in promoter regions of Fg genes that are associated with a specific functional category. Through comparison to Saccharomyces cerevisiae (Sc) and Schizosaccharomyces pombe (Sp), we observed conservation of transcription factors (TFs), their binding sites and the target genes regulated by these TFs related to pathways known to respond to stress conditions or phosphate metabolism. In addition, this study revealed 69 and 39 conserved motifs in the downstream regions and the intronic regions, respectively, of Fg genes. The top intronic motif is the splice donor site. For the downstream regions, we noticed an intriguing absence of the mammalian and Sc polyadenylation signals among the list of conserved motifs. Conclusion This study provides the first comprehensive list of candidate regulatory motifs in Fg, and underscores the power of comparative genomics in revealing functional elements among related genomes. The conservation of regulatory pathways among the Fusarium genomes and the two yeast species reveals their functional significance, and provides new insights into their
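    The abstract reports motifs "highly enriched" in promoters of genes from a specific functional category but does not name the statistic used. A common choice for this kind of test is the one-sided hypergeometric p-value, sketched below in Python under that assumption; the counts in the usage line are hypothetical and are not taken from the study.

```python
from math import comb

def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background (e.g. all genes with an aligned promoter)
    K: background genes annotated to the functional category
    n: genes whose promoter contains a conserved instance of the motif
    k: genes that are both motif-containing and in the category
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("inconsistent counts")
    hi = min(K, n)
    # Sum the probabilities of observing k, k+1, ..., hi overlapping genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1))
    return tail / comb(N, n)

# Hypothetical counts, for illustration only:
print(hypergeom_enrichment_p(N=13000, K=150, n=400, k=20))
```

    In a genome-wide scan, one such p-value is computed per motif and category, so a multiple-testing correction would normally follow.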

  7. Phylogenetic Signal in AFLP Data Sets

    NARCIS (Netherlands)

    Koopman, W.J.M.

    2005-01-01

    AFLP markers provide a potential source of phylogenetic information for molecular systematic studies. However, there are properties of restriction fragment data that limit phylogenetic interpretation of AFLPs. These are (a) possible nonindependence of fragments, (b) problems of homology assignment

  8. How does cognition evolve? Phylogenetic comparative psychology

    Science.gov (United States)

    Matthews, Luke J.; Hare, Brian A.; Nunn, Charles L.; Anderson, Rindy C.; Aureli, Filippo; Brannon, Elizabeth M.; Call, Josep; Drea, Christine M.; Emery, Nathan J.; Haun, Daniel B. M.; Herrmann, Esther; Jacobs, Lucia F.; Platt, Michael L.; Rosati, Alexandra G.; Sandel, Aaron A.; Schroepfer, Kara K.; Seed, Amanda M.; Tan, Jingzhi; van Schaik, Carel P.; Wobber, Victoria

    2014-01-01

    Now more than ever animal studies have the potential to test hypotheses regarding how cognition evolves. Comparative psychologists have developed new techniques to probe the cognitive mechanisms underlying animal behavior, and they have become increasingly skillful at adapting methodologies to test multiple species. Meanwhile, evolutionary biologists have generated quantitative approaches to investigate the phylogenetic distribution and function of phenotypic traits, including cognition. In particular, phylogenetic methods can quantitatively (1) test whether specific cognitive abilities are correlated with life history (e.g., lifespan), morphology (e.g., brain size), or socio-ecological variables (e.g., social system), (2) measure how strongly phylogenetic relatedness predicts the distribution of cognitive skills across species, and (3) estimate the ancestral state of a given cognitive trait using measures of cognitive performance from extant species. Phylogenetic methods can also be used to guide the selection of species comparisons that offer the strongest tests of a priori predictions of cognitive evolutionary hypotheses (i.e., phylogenetic targeting). Here, we explain how an integration of comparative psychology and evolutionary biology will answer a host of questions regarding the phylogenetic distribution and history of cognitive traits, as well as the evolutionary processes that drove their evolution. PMID:21927850

  9. How does cognition evolve? Phylogenetic comparative psychology.

    Science.gov (United States)

    MacLean, Evan L; Matthews, Luke J; Hare, Brian A; Nunn, Charles L; Anderson, Rindy C; Aureli, Filippo; Brannon, Elizabeth M; Call, Josep; Drea, Christine M; Emery, Nathan J; Haun, Daniel B M; Herrmann, Esther; Jacobs, Lucia F; Platt, Michael L; Rosati, Alexandra G; Sandel, Aaron A; Schroepfer, Kara K; Seed, Amanda M; Tan, Jingzhi; van Schaik, Carel P; Wobber, Victoria

    2012-03-01

    Now more than ever animal studies have the potential to test hypotheses regarding how cognition evolves. Comparative psychologists have developed new techniques to probe the cognitive mechanisms underlying animal behavior, and they have become increasingly skillful at adapting methodologies to test multiple species. Meanwhile, evolutionary biologists have generated quantitative approaches to investigate the phylogenetic distribution and function of phenotypic traits, including cognition. In particular, phylogenetic methods can quantitatively (1) test whether specific cognitive abilities are correlated with life history (e.g., lifespan), morphology (e.g., brain size), or socio-ecological variables (e.g., social system), (2) measure how strongly phylogenetic relatedness predicts the distribution of cognitive skills across species, and (3) estimate the ancestral state of a given cognitive trait using measures of cognitive performance from extant species. Phylogenetic methods can also be used to guide the selection of species comparisons that offer the strongest tests of a priori predictions of cognitive evolutionary hypotheses (i.e., phylogenetic targeting). Here, we explain how an integration of comparative psychology and evolutionary biology will answer a host of questions regarding the phylogenetic distribution and history of cognitive traits, as well as the evolutionary processes that drove their evolution.
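    As a concrete example of point (3), ancestral state estimation, the sketch below computes the maximum-likelihood root value of a continuous trait under a Brownian-motion model, using Felsenstein's pruning (independent-contrasts) recursion. The model choice, the Node class and the toy tree are assumptions for illustration; the paper does not prescribe a particular method, and real analyses would use established phylogenetics software.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    branch: float                        # length of the branch leading to this node
    value: Optional[float] = None        # observed trait value (tips only)
    children: List["Node"] = field(default_factory=list)

def prune(node: Node) -> Tuple[float, float]:
    """Return (estimate, extra branch length) for the subtree below `node`.

    Pruning on a bifurcating tree: an internal node's estimate is the
    inverse-variance-weighted mean of its two children, and its own branch is
    lengthened by v1*v2/(v1+v2) to reflect the uncertainty of that estimate.
    """
    if not node.children:
        return node.value, 0.0
    a, b = node.children
    xa, ea = prune(a)
    xb, eb = prune(b)
    va, vb = a.branch + ea, b.branch + eb
    estimate = (xa / va + xb / vb) / (1 / va + 1 / vb)
    return estimate, va * vb / (va + vb)

def root_state(tree: Node) -> float:
    """ML estimate of the root (ancestral) value under Brownian motion."""
    return prune(tree)[0]

def tip(x: float) -> Node:
    """Hypothetical helper: a tip with unit branch length and trait value x."""
    return Node(1.0, x)

# Hypothetical four-species tree ((A:1, B:1):1, (C:1, D:1):1) with scores 1-4.
tree = Node(0.0, children=[Node(1.0, children=[tip(1.0), tip(2.0)]),
                           Node(1.0, children=[tip(3.0), tip(4.0)])])
print(root_state(tree))  # approximately 2.5
```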

  10. Through the Portal: Viking Motifs Incorporated in the Romanesque Style in Telemark, Norway

    Directory of Open Access Journals (Sweden)

    Kristine Ødeby

    2013-09-01

    Full Text Available This paper presents the results of an analysis of motifs identified on six carved wooden Romanesque portal panels from the Norwegian county of Telemark. The findings suggest that animal motifs in the Late Viking style survived long into the Late Medieval period and were reused on these medieval portals. Stylistically, late expressions of Viking animal art do not differ a great deal from those of the subsequent Romanesque style. However, their symbolical differences are considered to be significant. The motifs themselves, and the issue of whether the Romanesque style adopted motifs from pre-Christian art, have attracted less attention. The motif portraying Sigurd slaying the dragon is considered in depth. It will be suggested that Sigurd, serving as a mediator between the old and the new beliefs when he appeared in late Viking contexts, was given a new role when portrayed in Christian art. Metaphor and liminality are a central part of this paper, and the theories of Alfred Gell and Margrete Andås suggest that the portal itself affects those who pass through it, and that the iconography is meaningful from a liminal perspective.

  11. The phylogenetic distribution of extrafloral nectaries in plants.

    Science.gov (United States)

    Weber, Marjorie G; Keeler, Kathleen H

    2013-06-01

    Understanding the evolutionary patterns of ecologically relevant traits is a central goal in plant biology. However, for most important traits, we lack the comprehensive understanding of their taxonomic distribution needed to evaluate their evolutionary mode and tempo across the tree of life. Here we evaluate the broad phylogenetic patterns of a common plant-defence trait found across vascular plants: extrafloral nectaries (EFNs), plant glands that secrete nectar and are located outside the flower. EFNs typically defend plants indirectly by attracting invertebrate predators that reduce herbivory. Records of EFNs published over the last 135 years were compiled. After accounting for changes in taxonomy, phylogenetic comparative methods were used to evaluate patterns of EFN evolution, using a phylogeny of over 55 000 species of vascular plants. Parametric and non-parametric models were compared to estimate how many species with EFNs are likely to exist beyond the current list. To date, EFNs have been reported in 3941 species representing 745 genera in 108 families, about 1-2 % of vascular plant species and approx. 21 % of families. They are found in 33 of 65 angiosperm orders. Foliar nectaries are known in four of 36 fern families. Extrafloral nectaries are unknown in early angiosperms, magnoliids and gymnosperms. They occur throughout monocotyledons, yet most EFNs are found within eudicots, with the bulk of species with EFNs being rosids. Phylogenetic analyses strongly support the repeated gain and loss of EFNs across plant clades, especially in more derived dicot families, and suggest that EFNs are found in a minimum of 457 independent lineages. However, model selection methods estimate that the number of unreported cases of EFNs may be as high as the number of species already reported. EFNs are widespread and evolutionarily labile traits that have evolved a remarkable number of times in vascular plants. Our current understanding of the

  12. An intracellular motif of GLUT4 regulates fusion of GLUT4-containing vesicles.

    Science.gov (United States)

    Heyward, Catherine A; Pettitt, Trevor R; Leney, Sophie E; Welsh, Gavin I; Tavaré, Jeremy M; Wakelam, Michael J O

    2008-05-20

    Insulin stimulates glucose uptake by adipocytes through increasing translocation of the glucose transporter GLUT4 from an intracellular compartment to the plasma membrane. Fusion of GLUT4-containing vesicles at the cell surface is thought to involve phospholipase D activity, generating the signalling lipid phosphatidic acid, although the mechanism of action is not yet clear. Here we report the identification of a putative phosphatidic acid-binding motif in a GLUT4 intracellular loop. Mutation of this motif causes a decrease in the insulin-induced exposure of GLUT4 at the cell surface of 3T3-L1 adipocytes via an effect on vesicle fusion. The potential phosphatidic acid-binding motif identified in this study is unique to GLUT4 among the sugar transporters; therefore, this motif may provide a unique mechanism for regulating insulin-induced translocation by phospholipase D signalling.

  13. Phylogenetic reconstruction methods: an overview.

    Science.gov (United States)

    De Bruyn, Alexandre; Martin, Darren P; Lefeuvre, Pierre

    2014-01-01

    Initially designed to infer evolutionary relationships based on morphological and physiological characters, phylogenetic reconstruction methods have greatly benefited from recent developments in molecular biology and sequencing technologies with a number of powerful methods having been developed specifically to infer phylogenies from macromolecular data. This chapter, while presenting an overview of basic concepts and methods used in phylogenetic reconstruction, is primarily intended as a simplified step-by-step guide to the construction of phylogenetic trees from nucleotide sequences using fairly up-to-date maximum likelihood methods implemented in freely available computer programs. While the analysis of chloroplast sequences from various Vanilla species is used as an illustrative example, the techniques covered here are relevant to the comparative analysis of homologous sequences datasets sampled from any group of organisms.
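    A minimal example of model-based inference from nucleotide data is the Jukes-Cantor (JC69) model for a pair of aligned sequences, for which the maximum-likelihood distance has the closed form d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. The Python sketch below computes that distance and the corresponding log-likelihood. It is not the chapter's workflow, which builds whole trees with dedicated programs, and the toy sequences are hypothetical.

```python
import math

def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def jc69_distance(p: float) -> float:
    """Closed-form ML distance (substitutions per site) under Jukes-Cantor."""
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc69_loglik(d: float, n_same: int, n_diff: int) -> float:
    """Log-likelihood of a pairwise alignment for a distance d > 0 under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # joint probability of two identical bases
    p_diff = 0.25 * (0.25 - 0.25 * e)  # joint probability of one specific mismatch
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)

s1, s2 = "ACGTACGTAC", "ACGTTCGTAA"  # hypothetical aligned sequences
p = p_distance(s1, s2)
d_hat = jc69_distance(p)
print(p, round(d_hat, 4))
```

    Evaluating jc69_loglik on a grid of d values around d_hat is a quick check that the closed form is indeed the likelihood maximum.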

  14. How to find a leucine in a haystack? Structure, ligand recognition and regulation of leucine-aspartic acid (LD) motifs

    KAUST Repository

    Alam, Tanvir

    2014-05-29

    LD motifs (leucine-aspartic acid motifs) are short helical protein-protein interaction motifs that have emerged as key players in connecting cell adhesion with cell motility and survival. LD motifs are required for embryogenesis, wound healing and the evolution of multicellularity. LD motifs also play roles in disease, such as in cancer metastasis or viral infection. First described in the paxillin family of scaffolding proteins, LD motifs and similar acidic LXXLL interaction motifs have been discovered in several other proteins, and 16 proteins have been reported to contain LDBDs (LD motif-binding domains). Collectively, structural and functional analyses have revealed a surprising multivalency in LD motif interactions and a wide diversity in LDBD architectures. In the present review, we summarize the molecular basis for function, regulation and selectivity of LD motif interactions that has emerged from more than a decade of research. This overview highlights the intricate multi-level regulation and the inherently noisy and heterogeneous nature of signalling through short protein-protein interaction motifs. © 2014 Biochemical Society.

  15. How to find a leucine in a haystack? Structure, ligand recognition and regulation of leucine-aspartic acid (LD) motifs

    KAUST Repository

    Alam, Tanvir; Alazmi, Meshari; Gao, Xin; Arold, Stefan T.

    2014-01-01

    LD motifs (leucine-aspartic acid motifs) are short helical protein-protein interaction motifs that have emerged as key players in connecting cell adhesion with cell motility and survival. LD motifs are required for embryogenesis, wound healing and the evolution of multicellularity. LD motifs also play roles in disease, such as in cancer metastasis or viral infection. First described in the paxillin family of scaffolding proteins, LD motifs and similar acidic LXXLL interaction motifs have been discovered in several other proteins, and 16 proteins have been reported to contain LDBDs (LD motif-binding domains). Collectively, structural and functional analyses have revealed a surprising multivalency in LD motif interactions and a wide diversity in LDBD architectures. In the present review, we summarize the molecular basis for function, regulation and selectivity of LD motif interactions that has emerged from more than a decade of research. This overview highlights the intricate multi-level regulation and the inherently noisy and heterogeneous nature of signalling through short protein-protein interaction motifs. © 2014 Biochemical Society.
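    The "haystack" problem can be seen with a naive pattern search. The sketch below scans a protein sequence for the commonly cited LD consensus LDxLLxxL, which is taken here as an assumption since the abstract does not state it. Because real LD motifs are degenerate, such a scan both misses genuine motifs and reports spurious ones, which is why structural and contextual information matters. The test sequence is hypothetical, not a real protein.

```python
import re

# Assumed LD-motif consensus: L-D-x-L-L-x-x-L (x = any residue).
# The lookahead lets overlapping candidates be reported.
LD_CONSENSUS = re.compile(r"(?=(LD.LL..L))")

def find_ld_candidates(seq: str):
    """Return (1-based start, peptide) for each consensus match in a protein sequence."""
    return [(m.start() + 1, m.group(1)) for m in LD_CONSENSUS.finditer(seq.upper())]

toy = "AAALDELLAKLAAAGGLDRLLEELGG"  # hypothetical sequence
print(find_ld_candidates(toy))  # [(4, 'LDELLAKL'), (17, 'LDRLLEEL')]
```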

  16. Identification of group specific motifs in Beta-lactamase family of proteins

    Directory of Open Access Journals (Sweden)

    Saxena Akansha

    2009-12-01

    Full Text Available Abstract Background Beta-lactamases are one of the most serious threats to public health. In order to combat this threat, we need to study the molecular and functional diversity of these enzymes and identify signatures specific to them. These signatures will enable us to develop inhibitors and diagnostic probes specific to lactamases. The existing classification of beta-lactamases was developed nearly 30 years ago, when few lactamases were available. The DLact database contains more than 2000 beta-lactamases, which can be used to study the molecular diversity and to identify signatures specific to this family. Methods A set of 2020 beta-lactamase proteins available in the DLact database http://59.160.102.202/DLact were classified using graph-based clustering of Best Bi-Directional Hits. Non-redundant (> 90 percent identical) protein sequences from each group were aligned using T-Coffee and annotated using information available in the literature. Motifs specific to each group were predicted using the PRATT program. Results The graph-based classification of beta-lactamase proteins resulted in the formation of six groups (four major groups containing 191, 726, 774 and 73 proteins, and two minor groups containing 50 and 8 proteins). Based on the information available in the literature, we found that each of the four major groups corresponds to one of the four classes proposed by Ambler. The two minor groups were novel and did not contain the molecular signatures of beta-lactamase proteins reported in the literature. The group-specific motifs showed high sensitivity (> 70%) and very high specificity (> 90%). The motifs from three groups (corresponding to classes A, C and D) had a high level of conservation at both the DNA and protein levels, whereas the motifs from the fourth group (corresponding to class B) showed conservation only at the protein level. 
Conclusion The graph-based classification of beta-lactamase proteins corresponds with the classification proposed by Ambler, thus there is
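    The Methods mention graph-based clustering of Best Bi-Directional Hits (BBH). One simple way to realise this, sketched below in Python, is to take each protein's best hit in every other genome, keep reciprocal pairs as edges, and report connected components as groups. The per-genome definition of "best hit", the use of plain connected components, and all names and scores are assumptions for illustration; the authors' exact procedure may differ.

```python
from collections import defaultdict

def bbh_edges(scores, genome_of):
    """Best bi-directional hits, using each protein's best hit within every other genome.

    scores: dict (query, subject) -> similarity score (e.g. a BLAST bit score)
    genome_of: dict protein -> genome identifier
    """
    best = defaultdict(dict)  # best[q][genome] = (subject, score)
    for (q, s), sc in scores.items():
        g = genome_of[s]
        if g == genome_of[q]:
            continue  # ignore hits within the same genome
        current = best[q].get(g)
        if current is None or sc > current[1]:
            best[q][g] = (s, sc)
    edges = []
    for q, per_genome in best.items():
        for s, _ in per_genome.values():
            back = best.get(s, {}).get(genome_of[q])
            if back is not None and back[0] == q and q < s:  # reciprocal, counted once
                edges.append((q, s))
    return edges

def connected_components(nodes, edges):
    """Union-find over the BBH graph; proteins without any BBH stay as singletons."""
    parent = {x: x for x in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for x in nodes:
        groups[find(x)].add(x)
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))

# Hypothetical scores between proteins from three genomes.
scores = {("p1", "p2"): 100, ("p2", "p1"): 100, ("p2", "p3"): 90, ("p3", "p2"): 90,
          ("p1", "p3"): 80, ("p3", "p1"): 80, ("q1", "q2"): 50, ("q2", "q1"): 50,
          ("p1", "q2"): 10, ("q2", "p1"): 10}
genome_of = {"p1": "G1", "q1": "G1", "p2": "G2", "q2": "G2", "p3": "G3", "r3": "G3"}
print(connected_components(genome_of, bbh_edges(scores, genome_of)))
```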

  17. Assembly and phylogenetic structure of Neotropical palm communities

    DEFF Research Database (Denmark)

    Eiserhardt, Wolf L.; Svenning, J.-C.; Balslev, Henrik

    Diversity, composition and dynamics of Neotropical palm communities are receiving an increasing amount of attention due to their economic importance, but also because their high species richness and functional diversity render them valuable model systems for overall forest biodiversity. However......, to better understand these palm communities, it is crucial to gain insight into the mechanisms responsible for their assembly. These can be dispersal limitation, environmental filtering, or biotic interactions. If the degree of niche conservatism is known for a group of organisms, patterns of community...... an unspecific assumption of “general niche conservatism”, phylogenetic signal will be analysed for Neotropical palms. Moreover, as an example for evolutionary mechanisms disrupting phylogenetic signal, speciation modes will be examined in selected genera. With the combined results we aim to show the relative...

  18. Efficient parsimony-based methods for phylogenetic network reconstruction.

    Science.gov (United States)

    Jin, Guohua; Nakhleh, Luay; Snir, Sagi; Tuller, Tamir

    2007-01-15

    Phylogenies, the evolutionary histories of groups of organisms, play a major role in representing relationships among biological entities. Although many biological processes can be effectively modeled as tree-like relationships, others, such as hybrid speciation and horizontal gene transfer (HGT), result in networks, rather than trees, of relationships. Hybrid speciation is a significant evolutionary mechanism in plants, fish and other groups of species. HGT plays a major role in bacterial genome diversification and is a significant mechanism by which bacteria develop resistance to antibiotics. Maximum parsimony is one of the most commonly used criteria for phylogenetic tree inference. Roughly speaking, inference based on this criterion seeks the tree that minimizes the amount of evolution. In 1990, Jotun Hein proposed using this criterion for inferring the evolution of sequences subject to recombination. Using preliminary results on small synthetic datasets, Nakhleh et al. (2005) demonstrated the criterion's application to phylogenetic network reconstruction in general and HGT detection in particular. However, the naive algorithms used by the authors are inapplicable to large datasets due to their demanding computational requirements. Further, no rigorous theoretical analysis of computing the criterion was given, nor was it tested on biological data. In the present work we prove that the problem of scoring the parsimony of a phylogenetic network is NP-hard and provide an improved fixed-parameter tractable algorithm for it. Further, we devise efficient heuristics for parsimony-based reconstruction of phylogenetic networks. We test our methods on both synthetic and biological data (rbcL gene in bacteria) and obtain very promising results.
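    The parsimony criterion can be sketched with Fitch's small-parsimony algorithm. For a network, the sketch assumes the score is the sum over sites of the minimum Fitch cost among the trees the network displays, which is how we read the criterion discussed here. The displayed trees are passed in explicitly; deriving them from a network is not shown and grows exponentially with the number of reticulations. The tree encoding, names and data are hypothetical.

```python
def fitch(tree, column):
    """Fitch small-parsimony pass for one site.

    tree: a leaf name (str) or a pair (left, right) of subtrees (rooted, binary)
    column: dict leaf name -> character state at this site
    Returns (candidate state set at the root, minimum number of changes).
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    ls, lc = fitch(left, column)
    rs, rc = fitch(right, column)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1  # disjoint sets cost one extra change

def columns(alignment):
    """Yield one {leaf: state} dict per alignment column."""
    length = len(next(iter(alignment.values())))
    for i in range(length):
        yield {leaf: seq[i] for leaf, seq in alignment.items()}

def tree_score(tree, alignment):
    """Parsimony score of a single tree: Fitch costs summed over sites."""
    return sum(fitch(tree, col)[1] for col in columns(alignment))

def network_score(displayed_trees, alignment):
    """Each site is charged on whichever displayed tree explains it most cheaply."""
    return sum(min(fitch(t, col)[1] for t in displayed_trees) for col in columns(alignment))

# Hypothetical network displaying two trees.
T1 = (("A", "B"), ("C", "D"))
T2 = (("A", "C"), ("B", "D"))
aln = {"A": "00", "B": "01", "C": "10", "D": "11"}
print(tree_score(T1, aln), tree_score(T2, aln), network_score([T1, T2], aln))  # 3 3 2
```

    The network scores lower than either tree because each site is explained by the tree that fits it, which is the intuition behind using parsimony to detect reticulate events such as HGT.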

  19. Origin and evolution of group XI secretory phospholipase A2 from flax (Linum usitatissimum) based on phylogenetic analysis of conserved domains.

    Science.gov (United States)

    Gupta, Payal; Saini, Raman; Dash, Prasanta K

    2017-07-01

    Phospholipase A2 (PLA2) belongs to a class of lipolytic enzymes (EC 3.1.1.4). Lysophosphatidic acid (LPA) and free fatty acids (FFAs) are the products of PLA2-catalyzed hydrolysis of phosphoglycerides at the sn-2 position. LPA and FFAs act as second mediators involved in the development and maturation of plants and animals. Mining of the flax genome identified two phospholipase A2-encoding genes, viz., LusPLA2I and LusPLA2II (Linum usitatissimum secretory phospholipase A2). Molecular simulation of the LusPLA2s with already characterized plant sPLA2s revealed the presence of conserved motifs and signature domains necessary to classify them as secretory phospholipase A2. Phylogenetic analysis of flax sPLA2s with representative sPLA2s from other organisms revealed that they evolved rapidly via gene duplication/deletion events and share a common ancestor. Our study is the first report of a detailed phylogenetic analysis of secretory phospholipase A2 in flax. A comparative genomic analysis of the two LusPLA2s and earlier reported plant sPLA2s, based on their gene architectures, sequence similarities, and domain structures, is presented, elucidating the uniqueness of flax sPLA2.

  20. Comprehensive human transcription factor binding site map for combinatory binding motifs discovery.

    Directory of Open Access Journals (Sweden)

    Arnoldo J Müller-Molina

    Full Text Available Knowing the map between transcription factors (TFs) and their binding sites is essential to reverse engineer the regulation process. Only about 10%-20% of the transcription factor binding motifs (TFBMs) have been reported. This lack of data hinders understanding of gene regulation. To address this drawback, we propose a computational method that exploits previously unused TF properties to discover the missing TFBMs and their sites in all human gene promoters. The method starts by predicting a dictionary of regulatory "DNA words." From this dictionary, it distills 4098 novel predictions. To disclose the crosstalk between motifs, an additional algorithm extracts TF combinatorial binding patterns, creating a collection of TF regulatory syntactic rules. Using these rules, we narrowed down a list of 504 novel motifs that appear frequently in syntax patterns. We tested the predictions against 509 known motifs, confirming that our system can reliably predict ab initio motifs with an accuracy of 81%, far higher than previous approaches. We found that on average, 90% of the discovered combinatorial binding patterns target at least 10 genes, suggesting that supplementary regulatory mechanisms are required to control smaller gene sets independently. Additionally, we discovered that the new TFBMs and their combinatorial patterns convey biological meaning, targeting TFs and genes related to developmental functions. Thus, among all the possible available targets in the genome, the TFs tend to regulate other TFs and genes involved in developmental functions. We provide a comprehensive resource for regulation analysis that includes a dictionary of "DNA words," newly predicted motifs and their corresponding combinatorial patterns. Combinatorial patterns are a useful filter to discover TFBMs that play a major role in orchestrating other factors and thus are likely to lock/unlock cellular functional clusters.
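    The combinatorial binding patterns in this work are richer syntactic rules, but the basic bookkeeping can be illustrated by counting how many promoters contain each pair of motifs and keeping pairs that reach a gene threshold (10 here, echoing the figure quoted in the abstract). This Python sketch and its toy data are assumptions for illustration, not the authors' algorithm.

```python
from collections import defaultdict
from itertools import combinations

def cooccurring_pairs(promoter_motifs, min_genes=10):
    """Motif pairs that co-occur in at least `min_genes` promoters.

    promoter_motifs: dict gene -> set of motif IDs predicted in that gene's promoter
    Returns {(motif_a, motif_b): set of genes}, each pair sorted alphabetically.
    """
    support = defaultdict(set)
    for gene, motifs in promoter_motifs.items():
        for pair in combinations(sorted(motifs), 2):
            support[pair].add(gene)
    return {pair: genes for pair, genes in support.items() if len(genes) >= min_genes}

# Hypothetical toy data: M1+M2 share 10 promoters, M1+M3 only 2.
toy = {f"g{i}": {"M1", "M2"} for i in range(10)}
toy.update({f"h{i}": {"M1", "M3"} for i in range(2)})
patterns = cooccurring_pairs(toy)
print({pair: len(genes) for pair, genes in patterns.items()})  # {('M1', 'M2'): 10}
```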

  1. Facilitating Students' Reasoning in Transformation Geometry through the Exploration of Malay Motifs with the Aid of a Grid

    Directory of Open Access Journals (Sweden)

    Febrian Febrian

    2017-10-01

    Full Text Available Transformation geometry is crucial knowledge in geometry that can build many other abilities, such as mathematical reasoning. Therefore, it is recommended that transformation geometry be taught to learners from an early age. Previous research shows that children have a sense for perceiving the dynamic characteristics of objects; therefore, facilitating learning that can make use of this sense is very important for building an understanding of transformation geometry. This design research study aims to help elementary school students develop their initial knowledge of the composition of transformations. The research subjects were fourth-grade students of Sekolah Dasar Negeri 001 Toapaya, Kabupaten Bintan, Kepulauan Riau. The learning approach used was PMRI, with the Malay motif itik pulang petang ("ducks returning home in the evening") as the context and with the aid of a grid. The results show that the learning setting can facilitate reasoning about transformation geometry through motif-exploration activities with the aid of a grid. Keywords: composition of transformations, reasoning, Malay motifs, grid, PMRI. Transformation geometry is crucial knowledge in geometry that can foster many skills, especially mathematical reasoning. Therefore, transformation geometry is suggested to be taught to children, especially young learners. Existing research implies that children have a particular sense for seeing the dynamic characteristics of objects. On the basis of this statement, facilitating a learning process that makes use of this sense becomes important to help develop students' reasoning about transformation geometry. The subtopic being highlighted is the composition of transformations. This design research aims to facilitate this situation. The subjects of the research are fourth graders of the State Elementary School of 001 at Toapaya, Kabupaten Bintan, Kepulauan Riau. 
The learning approach used was PMRI by using

  2. Sequence and phylogenetic analysis of virulent Newcastle disease virus isolates from Pakistan during 2009–2013 reveals circulation of new sub genotype

    Energy Technology Data Exchange (ETDEWEB)

    Siddique, Naila, E-mail: naila.nrlpd@gmail.com [National Reference Laboratory for Poultry Diseases, Animal Sciences Institute, National Agricultural Research Center, Islamabad (Pakistan); Naeem, Khalid; Abbas, Muhammad Athar; Ali Malik, Akbar; Rashid, Farooq; Rafique, Saba; Ghafar, Abdul; Rehman, Abdul [National Reference Laboratory for Poultry Diseases, Animal Sciences Institute, National Agricultural Research Center, Islamabad (Pakistan)

    2013-09-15

    Despite the observance of standard biosecurity measures at commercial poultry farms and the extensive use of Newcastle disease vaccines, a new genotype VII-f of Newcastle disease virus (NDV) was introduced into Pakistan during 2011. The 300 ND outbreaks recorded so far have resulted in losses of approximately USD 200 million during 2011–2013. A total of 33 NDV isolates recovered during 2009–2013 throughout Pakistan were characterized biologically and phylogenetically. The phylogenetic analysis revealed a new velogenic sub genotype VII-f circulating in commercial and domestic poultry along with the earlier reported sub genotype VII-b. Partial sequencing of the fusion gene revealed two types of cleavage site motifs, lentogenic ¹¹²GRQGRL¹¹⁷ and velogenic ¹¹²RRQKRF¹¹⁷, along with some point mutations indicative of genetic diversity. We report here a new sub genotype of virulent NDV circulating in commercial and backyard poultry in Pakistan and provide evidence of possible genetic diversity that may be causing new NDV outbreaks. - Highlights: • The first report of isolation of the new genotype VII-f of virulent Newcastle disease virus (NDV) in Pakistan. • We report the partial fusion gene sequences of the new genotype VII-f of virulent NDV from Pakistan. • We report the phylogenetic relationship of the new NDV strains with reported NDV strains. • We provide the outbreak history of the new virulent NDV strain in commercial and backyard poultry in Pakistan. • We provide possible evidence for the role of backyard poultry in NDV outbreaks.
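    The two cleavage-site motifs can be checked against the sequence-based criterion commonly used for NDV virulence, which we recall from the WOAH (OIE) Terrestrial Manual: at least three arginine or lysine residues between positions 113 and 116, together with phenylalanine at position 117. The Python sketch below applies that rule to residues 112-117. Treat it as a screening heuristic under that assumption, since official pathotyping can also rely on in vivo pathogenicity tests; the function name is hypothetical.

```python
BASIC = {"R", "K"}

def classify_cleavage_site(peptide: str) -> str:
    """Classify an NDV F0 cleavage-site peptide spanning residues 112-117.

    Assumed criterion: >= 3 basic residues (R or K) among positions 113-116
    plus phenylalanine at 117 indicates a virulent-type site.
    """
    if len(peptide) != 6:
        raise ValueError("expected exactly six residues (positions 112-117)")
    residue = dict(zip(range(112, 118), peptide.upper()))
    n_basic = sum(residue[pos] in BASIC for pos in range(113, 117))
    if n_basic >= 3 and residue[117] == "F":
        return "virulent"
    return "low virulence"

for site in ("GRQGRL", "RRQKRF"):
    print(site, classify_cleavage_site(site))
# GRQGRL low virulence
# RRQKRF virulent
```

    Under this rule the lentogenic motif has only two basic residues in 113-116 and leucine at 117, while the velogenic motif has three basic residues and phenylalanine at 117.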

  3. Nucleotide diversity and phylogenetic relationships among ...

    Indian Academy of Sciences (India)

    NIRAJ SINGH

    for phylogenetic analysis of Gladiolus and related taxa using combined datasets from chloroplast genome. The psbA–trnH ... phylogenetic relationships among cultivars could be useful for hybridization programmes for further improvement of the crop. [Singh N. ... breeding in nature, and exhibited diverse pollination mech-.

  4. An intracellular motif of GLUT4 regulates fusion of GLUT4-containing vesicles

    Directory of Open Access Journals (Sweden)

    Welsh Gavin I

    2008-05-01

    Full Text Available Abstract Background Insulin stimulates glucose uptake by adipocytes through increasing translocation of the glucose transporter GLUT4 from an intracellular compartment to the plasma membrane. Fusion of GLUT4-containing vesicles at the cell surface is thought to involve phospholipase D activity, generating the signalling lipid phosphatidic acid, although the mechanism of action is not yet clear. Results Here we report the identification of a putative phosphatidic acid-binding motif in a GLUT4 intracellular loop. Mutation of this motif causes a decrease in the insulin-induced exposure of GLUT4 at the cell surface of 3T3-L1 adipocytes via an effect on vesicle fusion. Conclusion The potential phosphatidic acid-binding motif identified in this study is unique to GLUT4 among the sugar transporters; therefore, this motif may provide a unique mechanism for regulating insulin-induced translocation by phospholipase D signalling.

  5. A phylogenetic transform enhances analysis of compositional microbiota data.

    Science.gov (United States)

    Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A

    2017-02-15

    Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities.
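    The core of the transform is the isometric log-ratio (ILR) on a sequential binary partition taken from the phylogeny: each internal node yields one balance comparing the geometric means of the taxa on its two sides. The Python sketch below computes these unweighted balances for a tree given as nested tuples. It omits the optional taxon and branch-length weightings described for PhILR, assumes zeros have already been replaced (e.g. with a pseudocount), does not reproduce the philr package's interface, and uses hypothetical data.

```python
import math

def leaves(tree):
    """Leaf indices below a node; a tree is an int (leaf index) or a (left, right) pair."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def mean_log(x, idx):
    """Log of the geometric mean of the abundances x[i] for i in idx."""
    return sum(math.log(x[i]) for i in idx) / len(idx)

def ilr_balances(tree, x):
    """Unweighted ILR balances, one per internal node, in pre-order (root first).

    balance = sqrt(r*s / (r+s)) * ln(g(left) / g(right)), where r and s are the
    numbers of leaves on each side and g is the geometric mean. All x[i] must be > 0.
    """
    if isinstance(tree, int):
        return []
    left, right = tree
    l_idx, r_idx = leaves(left), leaves(right)
    r, s = len(l_idx), len(r_idx)
    b = math.sqrt(r * s / (r + s)) * (mean_log(x, l_idx) - mean_log(x, r_idx))
    return [b] + ilr_balances(left, x) + ilr_balances(right, x)

tree = ((0, 1), (2, 3))            # hypothetical four-taxon phylogeny
counts = [10.0, 20.0, 30.0, 40.0]  # hypothetical abundances, no zeros
print(ilr_balances(tree, counts))
```

    Because each balance is a log-ratio, multiplying all abundances by a constant (e.g. a different sequencing depth) leaves the balances unchanged, which is what allows standard statistical tools to be applied to the transformed data.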

  6. Multiple TPR motifs characterize the Fanconi anemia FANCG protein.

    Science.gov (United States)

    Blom, Eric; van de Vrugt, Henri J; de Vries, Yne; de Winter, Johan P; Arwert, Fré; Joenje, Hans

    2004-01-05

    The genome protection pathway that is defective in patients with Fanconi anemia (FA) is controlled by at least eight genes, including BRCA2. A key step in the pathway involves the monoubiquitylation of FANCD2, which critically depends on a multi-subunit nuclear 'core complex' of at least six FANC proteins (FANCA, -C, -E, -F, -G, and -L). Except for FANCL, which has WD40 repeats and a RING finger domain, no significant domain structure has so far been recognized in any of the core complex proteins. By using a homology search strategy comparing the human FANCG protein sequence with its ortholog sequences in Oryzias latipes (Japanese rice fish) and Danio rerio (zebrafish) we identified at least seven tetratricopeptide repeat motifs (TPRs) covering a major part of this protein. TPRs are degenerate 34-amino acid repeat motifs which function as scaffolds mediating protein-protein interactions, often found in multiprotein complexes. In four out of five TPR motifs tested (TPR1, -2, -5, and -6), targeted missense mutagenesis disrupting the motifs at the critical position 8 of each TPR caused complete or partial loss of FANCG function. Loss of function was evident from failure of the mutant proteins to complement the cellular FA phenotype in FA-G lymphoblasts, which was correlated with loss of binding to FANCA. Although the TPR4 mutant fully complemented the cells, it showed a reduced interaction with FANCA, suggesting that this TPR may also be of functional importance. The recognition of FANCG as a typical TPR protein predicts this protein to play a key role in the assembly and/or stabilization of the nuclear FA protein core complex.

  7. Elucidation of functional markers from Aspergillus nidulans developmental regulator FlbB and their phylogenetic distribution.

    Directory of Open Access Journals (Sweden)

    Marc S Cortese

    Aspergillus nidulans is a filamentous fungus widely used as a model for biotechnological and clinical research. It is also used as a platform for the study of basic eukaryotic developmental processes. Previous studies identified and partially characterized a set of proteins controlling cellular transformations in this ascomycete. Among these proteins, the bZIP-type transcription factor FlbB is a key regulator of reproduction, stress responses and cell death. Our aim here was to predict, through various bioinformatic methods, key functional residues and motifs within FlbB in order to inform the design of future laboratory experiments and further the understanding of the molecular mechanisms that control fungal development. A dataset of FlbB orthologs and those of its key interaction partner FlbE was assembled from 40 members of the Pezizomycotina. Unique features were identified in each of the three structural domains of FlbB. The N-terminal region encoded a bZIP transcription factor domain with a novel histidine-containing DNA-binding motif, while the dimerization determinants exhibited two distinct profiles that segregated by class. The C-terminal region of FlbB showed high similarity with the AP-1 family of stress response regulators but with variable patterns of conserved cysteines that segregated by class and order. Motif conservation analysis revealed that nine FlbB orthologs belonging to the Eurotiales order contained a motif in the central region that could mediate interaction with FlbE. The key residues and motifs identified here provide a basis for the design of follow-up experimental investigations. Additionally, the presence or absence of these residues and motifs among the FlbB orthologs could help explain the differences in the developmental programs among fungal species as well as define putative complementation groups that could serve to extend known functional characterizations to other species.

  8. The city as a motif in Slovene youth literature

    Directory of Open Access Journals (Sweden)

    Milena Mileva Blažić

    2003-01-01

    The article presents the city as a motif in Slovenian youth literature across four periods: the first, the beginnings of original Slovenian youth literature in the second half of the 19th century; the second, the first half of the 20th century; the third, the second half of the 20th century after 1950, when significant books were produced in the field of short modern stories, with an emphasis on picture books and realistic narrative prose; and the fourth, after 1990. A discernible shift can be observed in the 1930s, during the time of socialist realism. The most significant change occurred after 1960, when industrialisation set off massive migration from rural to urban environments. The motif of the urban environment especially marked modern realistic narrative, termed problematic narrative after 1990, with its focus on the issues of growing up in such environments. The city as a motif or theme appears not only in realistic narrative but, since the early 20th century, also in fantastic narrative; it thus presents a dichotomous image of the real world in Slovenian youth realistic narrative.

  9. Bayesian selection of misspecified models is overconfident and may cause spurious posterior probabilities for phylogenetic trees.

    Science.gov (United States)

    Yang, Ziheng; Zhu, Tianqi

    2018-02-20

    The Bayesian method is noted to produce spuriously high posterior probabilities for phylogenetic trees in analysis of large datasets, but the precise reasons for this overconfidence are unknown. In general, the performance of Bayesian selection of misspecified models is poorly understood, even though this is of great scientific interest since models are never true in real data analysis. Here we characterize the asymptotic behavior of Bayesian model selection and show that when the competing models are equally wrong, Bayesian model selection exhibits surprising and polarized behaviors in large datasets, supporting one model with full force while rejecting the others. If one model is slightly less wrong than the other, the less wrong model will eventually win when the amount of data increases, but the method may become overconfident before it becomes reliable. We suggest that this extreme behavior may be a major factor for the spuriously high posterior probabilities for evolutionary trees. The philosophical implications of our results for the application of Bayesian model selection to evaluate opposing scientific hypotheses are yet to be explored, as are the behaviors of non-Bayesian methods in similar situations.
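The polarization described above can be reproduced in a toy setting of our own choosing (not the authors' phylogenetic setup): data are drawn from N(0, 1), and two equally wrong models fix the mean at +0.5 (M1) and -0.5 (M2), with unit variance and equal prior probabilities. For one observation x, log p(x|M1) - log p(x|M2) = ((x+0.5)^2 - (x-0.5)^2)/2 = x, so the log Bayes factor is simply the sum of the data. That sum has mean zero, but its magnitude grows like sqrt(n), so the posterior is pushed towards 0 or 1 even though neither model is better. The function names are hypothetical.

```python
import math
import random
from typing import Sequence


def posterior_m1(xs: Sequence[float]) -> float:
    """P(M1 | data) with equal priors; M1: N(+0.5, 1), M2: N(-0.5, 1).

    For these two models the log Bayes factor reduces to sum(xs).
    """
    log_bf = sum(xs)
    # Numerically stable logistic function of the log Bayes factor.
    if log_bf >= 0:
        return 1.0 / (1.0 + math.exp(-log_bf))
    z = math.exp(log_bf)
    return z / (1.0 + z)


def polarized_fraction(n: int, reps: int = 500, seed: int = 1, cutoff: float = 0.99) -> float:
    """Fraction of simulated N(0, 1) datasets of size n in which either model
    receives posterior probability above `cutoff`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        p = posterior_m1([rng.gauss(0.0, 1.0) for _ in range(n)])
        if p > cutoff or p < 1.0 - cutoff:
            hits += 1
    return hits / reps


for n in (10, 100, 2000):
    print(n, polarized_fraction(n))
```

With small n only a minority of datasets give a decisive posterior, while for large n most datasets "support one model with full force", even though both models are equally wrong by construction.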

  10. Binding of the cSH3 domain of Grb2 adaptor to two distinct RXXK motifs within Gab1 docker employs differential mechanisms.

    Science.gov (United States)

    McDonald, Caleb B; Seldeen, Kenneth L; Deegan, Brian J; Bhat, Vikas; Farooq, Amjad

    2011-01-01

    A ubiquitous component of cellular signaling machinery, Gab1 docker plays a pivotal role in routing extracellular information in the form of growth factors and cytokines to downstream targets such as transcription factors within the nucleus. Here, using isothermal titration calorimetry (ITC) in combination with macromolecular modeling (MM), we show that although Gab1 contains four distinct RXXK motifs, designated G1, G2, G3, and G4, only the G1 and G2 motifs bind to the cSH3 domain of the Grb2 adaptor, and they do so by distinct mechanisms. Thus, while the G1 motif strictly requires the PPRPPKP consensus sequence for high-affinity binding to the cSH3 domain, the G2 motif displays a preference for the PXVXRXLKPXR consensus. These differences in the sequence requirements of the G1 and G2 motifs arise from their ability to adopt distinct polyproline type II (PPII) and 3₁₀-helical conformations, respectively, upon binding to the cSH3 domain. Collectively, our study provides detailed biophysical insights into a key protein-protein interaction involved in a diverse array of signaling cascades central to health and disease. Copyright © 2010 John Wiley & Sons, Ltd.

  11. Peptide-binding motifs of two common equine class I MHC molecules in Thoroughbred horses.

    Science.gov (United States)

    Bergmann, Tobias; Lindvall, Mikaela; Moore, Erin; Moore, Eugene; Sidney, John; Miller, Donald; Tallmadge, Rebecca L; Myers, Paisley T; Malaker, Stacy A; Shabanowitz, Jeffrey; Osterrieder, Nikolaus; Peters, Bjoern; Hunt, Donald F; Antczak, Douglas F; Sette, Alessandro

    2017-05-01

    Quantitative peptide-binding motifs of MHC class I alleles provide a valuable tool to efficiently identify putative T cell epitopes. Detailed information on equine MHC class I alleles is still very limited, and to date, only a single equine MHC class I allele, Eqca-1*00101 (ELA-A3 haplotype), has been characterized. The present study extends the number of characterized ELA class I specificities in two additional haplotypes found commonly in the Thoroughbred breed. Accordingly, we here report quantitative binding motifs for the ELA-A2 allele Eqca-16*00101 and the ELA-A9 allele Eqca-1*00201. Utilizing analyses of endogenously bound and eluted ligands and the screening of positional scanning combinatorial libraries, detailed and quantitative peptide-binding motifs were derived for both alleles. Eqca-16*00101 preferentially binds peptides with aliphatic/hydrophobic residues in position 2 and at the C-terminus, and Eqca-1*00201 has a preference for peptides with arginine in position 2 and hydrophobic/aliphatic residues at the C-terminus. Interestingly, the Eqca-16*00101 motif resembles that of the human HLA A02-supertype, while the Eqca-1*00201 motif resembles that of the HLA B27-supertype and two macaque class I alleles. It is expected that the identified motifs will facilitate the selection of candidate epitopes for the study of immune responses in horses.

  12. High affinity recognition of a Phytophthora protein by Arabidopsis via an RGD motif

    NARCIS (Netherlands)

    Senchou, V.; Weide, R.L.; Carrasco, A.; Bouyssou, H.; Pont-Lezica, R.; Govers, F.; Canut, H.

    2004-01-01

    The RGD tripeptide sequence, a cell adhesion motif present in several extracellular matrix proteins of mammals, is involved in numerous plant processes. In plant-pathogen interactions, the RGD motif is believed to reduce plant defence responses by disrupting adhesions between the cell wall and

  13. DXD Motif-Dependent and -Independent Effects of the Chlamydia trachomatis Cytotoxin CT166

    Directory of Open Access Journals (Sweden)

    Miriam Bothe

    2015-02-01

    The Gram-negative, intracellular bacterium Chlamydia trachomatis causes acute and chronic urogenital tract infection, potentially leading to infertility and ectopic pregnancy. The as yet only partially characterized cytotoxin CT166 of serovar D exhibits a DXD motif, which is important for the enzymatic activity of many bacterial and mammalian type A glycosyltransferases, leading to the hypothesis that CT166 possesses glycosyltransferase activity. CT166-expressing HeLa cells exhibit actin reorganization, including cell rounding, which has been attributed to the inhibition of the Rho-GTPases Rac/Cdc42. Exploiting the glycosylation-sensitive Ras(27H5) antibody, we here show that CT166 induces an epitope change in Ras, resulting in inhibited ERK and PI3K signaling and delayed cell cycle progression. Consistent with the hypothesis that these effects strictly depend on the DXD motif, CT166 with the mutated DXD motif causes neither Ras-ERK inhibition nor delayed cell cycle progression. In contrast, CT166 with the mutated DXD motif is still capable of inhibiting cell migration, suggesting that CT166 with the mutated DXD motif cannot be regarded as completely inactive. Taken together, CT166 affects various fundamental cellular processes, strongly suggesting its importance for the intracellular survival of chlamydia.

  14. Regulation of TCF ETS-domain transcription factors by helix-loop-helix motifs.

    Science.gov (United States)

    Stinson, Julie; Inoue, Toshiaki; Yates, Paula; Clancy, Anne; Norton, John D; Sharrocks, Andrew D

    2003-08-15

    DNA binding by the ternary complex factor (TCF) subfamily of ETS-domain transcription factors is tightly regulated by intramolecular and intermolecular interactions. The helix-loop-helix (HLH)-containing Id proteins are trans-acting negative regulators of DNA binding by the TCFs. In the TCF, SAP-2/Net/ERP, intramolecular inhibition of DNA binding is promoted by the cis-acting NID region that also contains an HLH-like motif. The NID also acts as a transcriptional repression domain. Here, we have studied the role of HLH motifs in regulating DNA binding and transcription by the TCF protein SAP-1 and how Cdk-mediated phosphorylation affects the inhibitory activity of the Id proteins towards the TCFs. We demonstrate that the NID region of SAP-1 is an autoinhibitory motif that acts to inhibit DNA binding and also functions as a transcription repression domain. This region can be functionally replaced by fusion of Id proteins to SAP-1, whereby the Id moiety then acts to repress DNA binding in cis. Phosphorylation of the Ids by cyclin-Cdk complexes results in reduction in protein-protein interactions between the Ids and TCFs and relief of their DNA-binding inhibitory activity. In revealing distinct mechanisms through which HLH motifs modulate the activity of TCFs, our results therefore provide further insight into the role of HLH motifs in regulating TCF function and how the inhibitory properties of the trans-acting Id HLH proteins are themselves regulated by phosphorylation.

  15. Alanine substitutions in the GXXXG motif alter C99 cleavage by γ-secretase but not its dimerization.

    Science.gov (United States)

    Higashide, Hidekazu; Ishihara, Seiko; Nobuhara, Mika; Ihara, Yasuo; Funamoto, Satoru

    2017-03-01

    The amyloid β (Aβ) protein is a major component of senile plaques, one of the neuropathological hallmarks of Alzheimer's disease. Amyloidogenic processing of amyloid precursor protein (APP) by β- and γ-secretases leads to production of Aβ. APP contains tandem triple repeats of the GXXXG motif in its extracellular juxtamembrane and transmembrane regions. The GXXXG motif has been reported to be involved in protein-protein interactions, but it remains controversial whether the GXXXG motif in APP is involved in substrate dimerization and whether dimerization affects γ-secretase-dependent cleavage. The relationship between the GXXXG motifs, substrate dimerization, and γ-secretase-dependent cleavage sites therefore remains unclear. Here, we applied blue native polyacrylamide gel electrophoresis to examine the effect of alanine substitutions within the GXXXG motifs of the APP carboxyl-terminal fragment (C99) on its dimerization and Aβ production. Surprisingly, alanine substitutions in the motif failed to alter C99 dimerization in the detergent-soluble state. Cell-based and solubilized γ-secretase assays demonstrated that increasing alanine substitutions in the motif tended to decrease long Aβ species such as Aβ42 and Aβ43 and concomitantly to increase short Aβ species. Our data suggest that the GXXXG motif is crucial for Aβ production, but not for C99 dimerization. © 2016 International Society for Neurochemistry.
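Consensus motifs written in X-notation, such as GXXXG here (or RXXK and DXD in the records above), can be scanned for with a regular expression in which each X matches any residue. The sketch below is an illustration of our own, not a tool used in any of these studies; the function names are hypothetical, and the input is the commonly reported Aβ42 peptide sequence, so positions are in Aβ numbering rather than C99 or full-length APP numbering. A lookahead is used so that overlapping occurrences, like the tandem GXXXG repeats, are all reported.

```python
import re
from typing import List, Tuple


def motif_regex(consensus: str) -> "re.Pattern[str]":
    """Translate X-notation (X = any residue) into a compiled regex.

    The pattern is wrapped in a lookahead so overlapping hits are found.
    """
    body = "".join("." if c == "X" else re.escape(c) for c in consensus)
    return re.compile(f"(?=({body}))")


def find_motif(seq: str, consensus: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched residues) for every occurrence."""
    return [(m.start() + 1, m.group(1)) for m in motif_regex(consensus).finditer(seq)]


ABETA42 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"
print(find_motif(ABETA42, "GXXXG"))
# → [(25, 'GSNKG'), (29, 'GAIIG'), (33, 'GLMVG')]
```

The three overlapping hits correspond to the G25XXXG29XXXG33XXXG37 triple repeat. A plain `re.finditer("G...G", ...)` without the lookahead would consume residues and report only the non-overlapping hits at positions 25 and 33.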

  16. Growth rules based on the modularity of the Canarian Aeonium (Crassulaceae) and their phylogenetic value

    DEFF Research Database (Denmark)

    Jorgensen, T.H.; Olesen, J.M.

    2000-01-01

    Growth forms of 22 species of Aeonium (Crassulaceae) were quantified. Since all species are simple in their modular construction, models were developed to predict module length, branching mode and flowering probability using linear and logistic regression. When combined, the parameters...... of these models are species specific. A discriminant analysis generates a statistically significant separation of species at the level of phylogenetic sections. The results therefore demonstrate the phylogenetic value of growth rules in plants. This dynamic approach strongly contrasts with the traditional static...

  17. Phylogenetic relationships of typical antbirds (Thamnophilidae) and test of incongruence based on Bayes factors

    Directory of Open Access Journals (Sweden)

    Nylander Johan AA

    2004-07-01

    Background: The typical antbirds (Thamnophilidae) form a monophyletic and diverse family of suboscine passerines that inhabit neotropical forests. However, the phylogenetic relationships within this assemblage are poorly understood. Herein, we present a hypothesis of the generic relationships of this group based on Bayesian inference analyses of two nuclear introns and the mitochondrial cytochrome b gene. The level of phylogenetic congruence between the individual genes has been investigated utilizing Bayes factors. We also explore how changes in the substitution models affected the observed incongruence between partitions of our data set. Results: The phylogenetic analysis supports both novel relationships and traditional groupings. Among the more interesting novel relationships suggested is that the Terenura antwrens, the wing-banded antbird (Myrmornis torquata), the spot-winged antshrike (Pygiptila stellaris) and the russet antshrike (Thamnistes anabatinus) are sisters to all other typical antbirds. The remaining genera fall into two major clades. The first includes antshrikes, antvireos and the Herpsilochmus antwrens, while the second clade consists of most antwren genera, the Myrmeciza antbirds, the "professional" ant-following antbirds, and allied species. Our results also support the previously suggested polyphyly of Myrmotherula antwrens and Myrmeciza antbirds. The tests of phylogenetic incongruence using Bayes factors clearly suggest that allowing the gene partitions to have separate topology parameters increased the model likelihood. However, changing a component of the nucleotide substitution model had a much higher impact on the model likelihood. Conclusions: The phylogenetic results are in broad agreement with the traditional classification of the typical antbirds, but some relationships are unexpected based on external morphology. In these cases their true affinities may have been obscured by convergent evolution and

  18. Phylogenetic Position of Barbus lacerta Heckel, 1843

    Directory of Open Access Journals (Sweden)

    Mustafa Korkmaz

    2015-11-01

    As a result, five clades emerged from the phylogenetic reconstruction, and in the phylogenetic tree Barbus lacerta was determined to be the sister group of the Barbus macedonicus, Barbus oligolepis and Barbus plebejus complex.

  19. The phylogenetics of succession can guide restoration

    DEFF Research Database (Denmark)

    Shooner, Stephanie; Chisholm, Chelsea Lee; Davies, T. Jonathan

    2015-01-01

    Phylogenetic tools have increasingly been used in community ecology to describe the evolutionary relationships among co-occurring species. In studies of succession, such tools may allow us to identify the evolutionary lineages most suited for particular stages of succession and habitat...... rehabilitation. However, to date, these two applications have been largely separate. Here, we suggest that information on phylogenetic community structure might help to inform community restoration strategies following major disturbance. Our study examined phylogenetic patterns of succession based...... for species sorting along abiotic gradients (slope and aspect) on the mine sites that had been abandoned for the longest. Synthesis and applications. Understanding the trajectory of succession is critical for restoration efforts. Our results suggest that early colonizers represent a phylogenetically random...

  20. Effects of Phylogenetic Tree Style on Student Comprehension

    Science.gov (United States)

    Dees, Jonathan Andrew

    Phylogenetic trees are powerful tools of evolutionary biology that have become prominent across the life sciences. Consequently, learning to interpret and reason from phylogenetic trees is now an essential component of biology education. However, students often struggle to understand these diagrams, even after explicit instruction. One factor that has been observed to affect student understanding of phylogenetic trees is style (i.e., diagonal or bracket). The goal of this dissertation research was to systematically explore effects of style on student interpretations and construction of phylogenetic trees in the context of an introductory biology course. Before instruction, students were significantly more accurate with bracket phylogenetic trees for a variety of interpretation and construction tasks. Explicit instruction that balanced the use of diagonal and bracket phylogenetic trees mitigated some, but not all, style effects. After instruction, students were significantly more accurate for interpretation tasks involving taxa relatedness and construction exercises when using the bracket style. Based on this dissertation research and prior studies on style effects, I advocate for introductory biology instructors to use only the bracket style. Future research should examine causes of style effects and variables other than style to inform the development of research-based instruction that best supports student understanding of phylogenetic trees.

  1. An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

    Science.gov (United States)

    Liang, Ying; Liao, Bo; Zhu, Wen

    2017-01-01

    Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution from genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome, as it involves more genes, disease loci, and functional elements. Fluorescence in situ hybridization (FISH) accurately measures the copy numbers of multiple genes in hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic trees from FISH data. Topology analysis of the tumor progression trees shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation, and a classification experiment shows that tree-based features are better than data-based features at distinguishing tumors. The constructed phylogenetic trees characterize the tumor development process well and outperform those of other similar algorithms.

  2. Neighboring phosphoSer-Pro motifs in the undefined domain of IRAK1 impart bivalent advantage for Pin1 binding.

    Science.gov (United States)

    Rogals, Monique J; Greenwood, Alexander I; Kwon, Jeahoo; Lu, Kun Ping; Nicholson, Linda K

    2016-12-01

    The peptidyl prolyl isomerase Pin1 has two domains that are considered to be its binding (WW) and catalytic (PPIase) domains, both of which interact with phosphorylated Ser/Thr-Pro motifs. This shared specificity might influence substrate selection, as many known Pin1 substrates have multiple sequentially close phosphoSer/Thr-Pro motifs, including the protein interleukin-1 receptor-associated kinase-1 (IRAK1). The IRAK1 undefined domain (UD) contains two sets of such neighboring motifs (Ser131/Ser144 and Ser163/Ser173), suggesting possible bivalent interactions with Pin1. Using a series of NMR titrations with ¹⁵N-labeled full-length Pin1 (Pin1-FL), PPIase, or WW domain and phosphopeptides representing the Ser131/Ser144 and Ser163/Ser173 regions of IRAK1-UD, bivalent interactions were investigated. Binding studies using singly phosphorylated peptides showed that individual motifs displayed weak affinities (> 100 μM) for Pin1-FL and each isolated domain. Analysis of dually phosphorylated peptides binding to Pin1-FL showed that inclusion of bivalent states was necessary to fit the data. The resulting complex model and fitted parameters were applied to predict the impact of bivalent states at low micromolar concentrations, demonstrating significant affinity enhancement for both dually phosphorylated peptides (3.5 and 24 μM for peptides based on the Ser131/Ser144 and Ser163/Ser173 regions, respectively). The complementary technique biolayer interferometry confirmed the predicted affinity enhancement for a representative set of singly and dually phosphorylated Ser131/Ser144 peptides at low micromolar concentrations, validating model predictions. These studies provide novel insights regarding the complexity of interactions between Pin1 and activated IRAK1, and more broadly suggest that phosphorylation of neighboring Ser/Thr-Pro motifs in proteins might provide a competitive advantage at cellular concentrations for engaging with Pin1. © 2016 Federation of European

  3. Nucleotide diversity and phylogenetic relationships among ...

    Indian Academy of Sciences (India)

    Navya

    2 attached at the base of tree as the diverging Iridaceae relative's lineage. Present study revealed that psbA-trnH region are useful in addressing questions of phylogenetic relationships among the Gladiolus cultivars, as these intergenic spacers are more variable and have more phylogenetically informative sites than the ...

  4. Phylogenetic trees and Euclidean embeddings.

    Science.gov (United States)

    Layer, Mark; Rhodes, John A

    2017-01-01

    It was recently observed by de Vienne et al. (Syst Biol 60(6):826-832, 2011) that a simple square root transformation of distances between taxa on a phylogenetic tree allowed for an embedding of the taxa into Euclidean space. While the justification for this was based on a diffusion model of continuous character evolution along the tree, here we give a direct and elementary explanation for it that provides substantial additional insight. We use this embedding to reinterpret the differences between the NJ and BIONJ tree building algorithms, providing one illustration of how this embedding reflects tree structures in data.
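The observation can be checked numerically on a small example. The sketch below uses a made-up four-taxon tree, ((A:1,B:2):1.5,(C:0.5,D:1):0.5), hard-codes its path-length distance matrix, and applies classical (Torgerson) multidimensional scaling to the square-rooted distances. Classical MDS is a standard way to obtain coordinates and is used here as an assumption of the sketch, not as the derivation given in the paper; the function name is hypothetical. If the square-rooted distances are Euclidean, the double-centred Gram matrix has no negative eigenvalues, and the recovered coordinates reproduce sqrt(d) exactly.

```python
import numpy as np

# Path-length distances on the made-up tree ((A:1,B:2):1.5,(C:0.5,D:1):0.5).
TAXA = ["A", "B", "C", "D"]
D = np.array([
    [0.0, 3.0, 3.5, 4.0],
    [3.0, 0.0, 4.5, 5.0],
    [3.5, 4.5, 0.0, 1.5],
    [4.0, 5.0, 1.5, 0.0],
])


def sqrt_distance_embedding(d: np.ndarray) -> np.ndarray:
    """Classical MDS coordinates for the distances sqrt(d)."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    # The squared target distances (sqrt d)^2 are just d, so centre d directly.
    gram = -0.5 * j @ d @ j
    vals, vecs = np.linalg.eigh(gram)        # symmetric eigendecomposition
    if vals.min() < -1e-9 * max(1.0, np.abs(vals).max()):
        raise ValueError("sqrt-transformed distances are not Euclidean")
    # Scale each eigenvector column by sqrt(eigenvalue); clip round-off below zero.
    return vecs * np.sqrt(np.clip(vals, 0.0, None))


X = sqrt_distance_embedding(D)
recovered = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.allclose(recovered, np.sqrt(D)))  # True
```

Each row of `X` is a point for one taxon; distance-based tree builders such as NJ and BIONJ could then be compared in terms of these coordinates, as the abstract describes.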

  5. Point estimates in phylogenetic reconstructions

    OpenAIRE

    Benner, Philipp; Bacak, Miroslav; Bourguignon, Pierre-Yves

    2013-01-01

    Motivation: The construction of statistics for summarizing posterior samples returned by a Bayesian phylogenetic study has so far been hindered by the poor geometric insights available into the space of phylogenetic trees, and ad hoc methods such as the derivation of a consensus tree make up for the ill-definition of the usual concepts of posterior mean, while bootstrap methods mitigate the absence of a sound concept of variance. Yielding satisfactory results with sufficiently concentrated pos...

  6. Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids.

    Science.gov (United States)

    Jansen, Robert K; Kaittanis, Charalambos; Saski, Christopher; Lee, Seung-Bum; Tomkins, Jeffrey; Alverson, Andrew J; Daniell, Henry

    2006-04-09

    Cucumis as sister to the Myrtales and therefore do not support the monophyly of the eurosid I clade. Phylogenies based on DNA sequences from complete chloroplast genome sequences provide strong support for the position of the Vitaceae as the earliest diverging lineage of rosids. Our phylogenetic analyses support recent assertions that inadequate taxon sampling and incorrect model specification for concatenated multi-gene data sets can mislead phylogenetic inferences when using whole chloroplast genomes for phylogeny reconstruction.

  7. Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids

    Directory of Open Access Journals (Sweden)

    Alverson Andrew J

    2006-04-01

    However, maximum likelihood analyses place Cucumis as sister to the Myrtales and therefore do not support the monophyly of the eurosid I clade. Conclusion: Phylogenies based on DNA sequences from complete chloroplast genome sequences provide strong support for the position of the Vitaceae as the earliest diverging lineage of rosids. Our phylogenetic analyses support recent assertions that inadequate taxon sampling and incorrect model specification for concatenated multi-gene data sets can mislead phylogenetic inferences when using whole chloroplast genomes for phylogeny reconstruction.

  8. Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs

    KAUST Repository

    Alam, Tanvir; Alazmi, Meshari; Naser, Rayan Mohammad Mahmoud; Huser, Franceline; Momin, Afaque Ahmad Imtiyaz; Walkiewicz, Katarzyna Wiktoria; Canlas, Christian; Huser, Raphaël; Ali, Amal J.; Merzaban, Jasmeen; Bajic, Vladimir B.; Gao, Xin; Arold, Stefan T.

    2018-01-01

    and migration, and revealed a new type of inverse LD motif consensus. Our evolutionary analysis suggested that LD motif signalling originated in the common unicellular ancestor of opisthokonts and amoebozoa by co-opting nuclear export sequences. Inter

  9. TOPDOM: database of conservatively located domains and motifs in proteins.

    Science.gov (United States)

    Varga, Julia; Dobson, László; Tusnády, Gábor E

    2016-09-01

    The TOPDOM database, originally created as a collection of domains and motifs located consistently on the same side of the membranes in α-helical transmembrane proteins, has been updated and extended to include consistently localized domains and motifs in globular proteins as well. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and predict topology in the case of transmembrane proteins, and by applying a thorough search for domains and motifs as well as utilizing the most up-to-date version of all source databases, we achieved a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. The TOPDOM database is available at http://topdom.enzim.hu. The webpage uses the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high-performance computer. Contact: tusnady.gabor@ttk.mta.hu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  10. Evidence for the additions of clustered interacting nodes during the evolution of protein interaction networks from network motifs

    Directory of Open Access Journals (Sweden)

    Guo Hao

    2011-05-01

    Background: High-throughput screens have revealed large-scale protein interaction networks defining most cellular functions. How proteins were added to the protein interaction network during its growth is a basic and important question. Network motifs represent the simplest building blocks of cellular machines and are of biological significance. Results: Here we study the evolution of protein interaction networks from the perspective of network motifs. We find that in current protein interaction networks, proteins of the same age class tend to form motifs, and such co-origins of motif constituents are affected by their topologies and biological functions. Further, we find that the proteins within motifs whose constituents are of the same age class tend to be densely interconnected, co-evolve and share the same biological functions, and these motifs tend to be within protein complexes. Conclusions: Our findings provide novel evidence for the hypothesis of the additions of clustered interacting nodes and indicate that network motifs, especially motifs with dense topology and specific functions, may play important roles during this process. Our results suggest that functional constraints may be the underlying driving force for such additions of clustered interacting nodes.
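The kind of tally behind "proteins of the same age class tend to form motifs" can be illustrated for the simplest dense motif, the triangle (three mutually interacting proteins). The sketch below is a toy of our own: the graph, the age labels and the function names are made up, and the study's actual motif types and statistics are not reproduced. A real test of "tend to" would also compare the observed fraction against randomized networks (for example, degree-preserving rewiring), which is not shown.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_adj(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Every triangle exactly once, as (u, v, w) with u < v < w."""
    found = []
    for u in sorted(adj):
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                found.append((u, v, w))
    return found


def same_age_fraction(tris: List[Tuple[str, str, str]], age: Dict[str, str]) -> float:
    """Fraction of triangles whose three proteins share one age class."""
    if not tris:
        return 0.0
    same = sum(1 for t in tris if len({age[n] for n in t}) == 1)
    return same / len(tris)


# Made-up interaction network and age classes.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("C", "E"), ("B", "D")]
age = {"A": "old", "B": "old", "C": "old", "D": "young", "E": "young"}
tris = triangles(build_adj(edges))
print(tris)                          # [('A', 'B', 'C'), ('B', 'C', 'D'), ('C', 'D', 'E')]
print(same_age_fraction(tris, age))  # 0.333... (only A-B-C is age-homogeneous)
```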

  11. A Consistent Phylogenetic Backbone for the Fungi

    Science.gov (United States)

    Ebersberger, Ingo; de Matos Simoes, Ricardo; Kupczok, Anne; Gube, Matthias; Kothe, Erika; Voigt, Kerstin; von Haeseler, Arndt

    2012-01-01

    The kingdom of fungi provides model organisms for biotechnology, cell biology, genetics, and life sciences in general. Only when their phylogenetic relationships are stably resolved can individual results from fungal research be integrated into a holistic picture of biology. However, despite recent progress, many deep relationships within the fungi remain unclear. Here, we present the first phylogenomic study of an entire eukaryotic kingdom that uses a consistency criterion to strengthen phylogenetic conclusions. We reason that branches (splits) recovered with independent data and different tree reconstruction methods are likely to reflect true evolutionary relationships. Two complementary phylogenomic data sets based on 99 fungal genomes and 109 fungal expressed sequence tag (EST) sets analyzed with four different tree reconstruction methods shed light from different angles on the fungal tree of life. Eleven additional data sets specifically address the phylogenetic positions of Blastocladiomycota, Ustilaginomycotina, and Dothideomycetes. The combined evidence from the resulting trees supports the deep-level stability of the fungal groups toward a comprehensive natural system of the fungi. In addition, our analysis reveals methodologically interesting aspects. Enrichment for EST-encoded data, a common practice in phylogenomic analyses, introduces a strong bias toward slowly evolving and functionally correlated genes. Consequently, the generalization of phylogenomic data sets as collections of randomly selected genes cannot be taken for granted. A thorough characterization of the data to assess possible influences on the tree reconstruction should therefore become a standard in phylogenomic analyses. PMID:22114356
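    The consistency criterion above keeps only splits recovered by every data set and method, so it can be sketched as a set intersection over bipartitions. This is a minimal sketch, not the study's pipeline. It assumes trees are written as nested Python tuples of taxon names, which is a made-up input format. The three hypothetical trees stand in for different data set and method combinations.

```python
def leaves(tree):
    """Taxon names below a node; a tree is a taxon name or a tuple of subtrees."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions (splits) of a tree.

    Each split is keyed by the side that excludes a fixed reference taxon,
    so the same bipartition gets the same key whatever the rooting."""
    taxa = leaves(tree)
    ref = min(taxa)
    found = set()

    def walk(node):
        clade = leaves(node)
        side = clade if ref not in clade else taxa - clade
        # Skip empty, one-taxon and (n-1)-taxon sides: those splits are trivial.
        if 1 < len(side) < len(taxa) - 1:
            found.add(frozenset(side))
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(tree)
    return found


def consistent_splits(trees):
    """Splits recovered by every tree (every data set/method combination)."""
    return set.intersection(*(splits(t) for t in trees))


# Hypothetical trees for five taxa from three different analyses.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = (("A", "B"), ("C", ("D", "E")))
t3 = ((("A", "C"), "B"), ("D", "E"))
print(consistent_splits([t1, t2, t3]))
```

    Here only the split separating D and E from A, B and C is shared by all three trees. Trees t1 and t2 on their own would also agree on the split {C, D, E} | {A, B}.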

  12. FTZ-Factor1 and Fushi tarazu interact via conserved nuclear receptor and coactivator motifs

    Science.gov (United States)

    Schwartz, Carol J.E.; Sampson, Heidi M.; Hlousek, Daniela; Percival-Smith, Anthony; Copeland, John W.R.; Simmonds, Andrew J.; Krause, Henry M.

    2001-01-01

    To activate transcription, most nuclear receptor proteins require coactivators that bind to their ligand-binding domains (LBDs). The Drosophila FTZ-Factor1 (FTZ-F1) protein is a conserved member of the nuclear receptor superfamily, but was previously thought to lack an AF2 motif, a motif that is required for ligand and coactivator binding. Here we show that FTZ-F1 does have an AF2 motif and that it is required to bind a coactivator, the homeodomain-containing protein Fushi tarazu (FTZ). We also show that FTZ contains an AF2-interacting nuclear receptor box, the first to be found in a homeodomain protein. Both interaction motifs are shown to be necessary for physical interactions in vitro and for functional interactions in developing embryos. These unexpected findings have important implications for the conserved homologs of the two proteins. PMID:11157757

  13. Measures of phylogenetic differentiation provide robust and complementary insights into microbial communities.

    Science.gov (United States)

    Parks, Donovan H; Beiko, Robert G

    2013-01-01

    High-throughput sequencing techniques have made large-scale spatial and temporal surveys of microbial communities routine. Gaining insight into microbial diversity requires methods for effectively analyzing and visualizing these extensive data sets. Phylogenetic β-diversity measures address this challenge by allowing the relationship between large numbers of environmental samples to be explored using standard multivariate analysis techniques. Despite the success and widespread use of phylogenetic β-diversity measures, an extensive comparative analysis of these measures has not been performed. Here, we compare 39 measures of phylogenetic β diversity in order to establish the relative similarity of these measures along with key properties and performance characteristics. While many measures are highly correlated, those commonly used within microbial ecology were found to be distinct from those popular within classical ecology, and from the recently recommended Gower and Canberra measures. Many of the measures are surprisingly robust to different rootings of the gene tree, the choice of similarity threshold used to define operational taxonomic units, and the presence of outlying basal lineages. Measures differ considerably in their sensitivity to rare organisms, and the effectiveness of measures can vary substantially under alternative models of differentiation. Consequently, the depth of sequencing required to reveal underlying patterns of relationships between environmental samples depends on the selected measure. Our results demonstrate that using complementary measures of phylogenetic β diversity can further our understanding of how communities are phylogenetically differentiated. Open-source software implementing the phylogenetic β-diversity measures evaluated in this manuscript is available at http://kiwi.cs.dal.ca/Software/ExpressBetaDiversity.
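    Unweighted UniFrac is one widely used phylogenetic β-diversity measure in microbial ecology. It is the fraction of observed branch length that leads to OTUs present in only one of two samples. The sketch below is an illustration, not the ExpressBetaDiversity implementation. It assumes presence/absence samples and a tiny hand-built tree encoded as nested (content, branch_length) tuples, which is a made-up format.

```python
def collect(node, branches, is_root=True):
    """Record (branch_length, OTUs below the branch) for every non-root node.

    A node is (otu_name, length) for a tip or (list_of_children, length)
    for an internal node."""
    content, length = node
    if isinstance(content, str):
        below = frozenset([content])
    else:
        below = frozenset().union(*(collect(c, branches, False) for c in content))
    if not is_root:
        branches.append((length, below))
    return below


def unweighted_unifrac(tree, sample_a, sample_b):
    branches = []
    collect(tree, branches)
    observed = unique = 0.0
    for length, below in branches:
        in_a, in_b = bool(below & sample_a), bool(below & sample_b)
        if in_a or in_b:
            observed += length  # branch leads to an OTU seen in either sample
            if in_a != in_b:
                unique += length  # ...but seen in only one of the two samples
    return unique / observed if observed else 0.0


# Hypothetical tree: ((OTU1:0.1, OTU2:0.2):0.3, OTU3:0.4)
tree = ([([("OTU1", 0.1), ("OTU2", 0.2)], 0.3), ("OTU3", 0.4)], 0.0)
a = {"OTU1", "OTU2"}
b = {"OTU1", "OTU3"}
print(round(unweighted_unifrac(tree, a, b), 3))  # 0.6
```

    Because only presence matters, this measure is sensitive to rare OTUs. Abundance-weighted variants behave differently, which is the kind of contrast the comparison above examines.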

  14. Open Reading Frame Phylogenetic Analysis on the Cloud

    Directory of Open Access Journals (Sweden)

    Che-Lun Hung

    2013-01-01

    Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus.
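    Since the analysis starts from open reading frames rather than complete sequences, ORF extraction is the first step. The service's own extraction method is not described here. As a rough sketch, assuming the standard ATG start codon and TAA/TAG/TGA stop codons, the code scans the three forward reading frames only. A real pipeline would also scan the reverse complement.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) of ATG...stop ORFs on the forward strand.

    Positions are 0-based and end-exclusive; the stop codon is included.
    `min_codons` counts the codons before the stop, including the ATG."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```

    The extracted ORF sequences would then be aligned before any tree building. In a Hadoop-style setup, each sequence could be processed independently, which is what makes this step easy to distribute.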

  15. Exon silencing by UAGG motifs in response to neuronal excitation.

    Directory of Open Access Journals (Sweden)

    Ping An

    2007-02-01

    Alternative pre-mRNA splicing plays fundamental roles in neurons by generating functional diversity in proteins associated with the communication and connectivity of the synapse. The CI cassette of the NMDA R1 receptor is one of a variety of exons that show an increase in exon skipping in response to cell excitation, but the molecular nature of this splicing responsiveness is not yet understood. Here we investigate the molecular basis for the induced changes in splicing of the CI cassette exon in primary rat cortical cultures in response to KCl-induced depolarization using an expression assay with a tight neuron-specific readout. In this system, exon silencing in response to neuronal excitation was mediated by multiple UAGG-type silencing motifs, and transfer of the motifs to a constitutive exon conferred a similar responsiveness by gain of function. Biochemical analysis of protein binding to UAGG motifs in extracts prepared from treated and mock-treated cortical cultures showed an increase in nuclear hnRNP A1-RNA binding activity in parallel with excitation. Evidence for the role of the NMDA receptor and calcium signaling in the induced splicing response was shown by the use of specific antagonists, as well as cell-permeable inhibitors of signaling pathways. Finally, a wider role for exon-skipping responsiveness is shown to involve additional exons with UAGG-related silencing motifs, and transcripts involved in synaptic functions. These results suggest that, at the post-transcriptional level, excitable exons such as the CI cassette may be involved in strategies by which neurons mount adaptive responses to hyperstimulation.

  16. A Conserved GPG-Motif in the HIV-1 Nef Core Is Required for Principal Nef-Activities.

    Directory of Open Access Journals (Sweden)

    Marta Martínez-Bonet

    To identify new determinants required for Nef activity, we performed a functional alanine scanning analysis along a discrete but highly conserved region at the core of HIV-1 Nef. We identified the GPG-motif, located at the 121-137 region of HIV-1 NL4.3 Nef, as a novel protein signature strictly required for the p56Lck-dependent Nef-induced CD4-downregulation in T-cells. Since the Nef-GPG motif was dispensable for CD4-downregulation in HeLa-CD4 cells, Nef/AP-1 interaction and Nef-dependent effects on Tf-R trafficking, the observed effects on CD4 downregulation cannot be attributed to structural constraints or to alterations in general protein trafficking. In addition, we found that the GPG-motif was also required for Nef-dependent inhibition of ring actin re-organization upon TCR triggering and for MHCI downregulation, suggesting that the GPG-motif could actively cooperate with the Nef PxxP motif for these HIV-1 Nef-related effects. Finally, we observed that the Nef-GPG motif was required for optimal infectivity of viruses produced in T-cells. Based on these findings, we propose the conserved GPG-motif in HIV-1 Nef as a functional region required for HIV-1 infectivity and therefore of potential interest for interfering with Nef activity during HIV-1 infection.

  17. Characterization of Escherichia coli Phylogenetic Groups ...

    African Journals Online (AJOL)

    Background: Escherichia coli strains mainly fall into four phylogenetic groups (A, B1, B2, and D), and virulent extra‑intestinal strains mainly belong to groups B2 and D. Aim: The aim was to determine the association between phylogenetic groups of E. coli causing extraintestinal infections (ExPEC) regarding the site of ...

  18. Linear motif atlas for phosphorylation-dependent signaling

    DEFF Research Database (Denmark)

    Miller, Martin Lee; Jensen, LJ; Diella, F

    2008-01-01

    bind to them remains a challenge. NetPhorest is an atlas of consensus sequence motifs that covers 179 kinases and 104 phosphorylation-dependent binding domains [Src homology 2 (SH2), phosphotyrosine binding (PTB), BRCA1 C-terminal (BRCT), WW, and 14-3-3]. The atlas reveals new aspects of signaling...
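    Consensus sequence motifs such as those collected in an atlas of this kind can be matched against a protein sequence with regular expressions. This is a minimal sketch and does not reproduce NetPhorest's models. The two patterns are illustrative textbook consensus sequences: a basophilic R-R-x-S/T motif and a proline-directed S/T-P motif. The dictionary layout and function name are hypothetical.

```python
import re

# Illustrative consensus patterns only (not NetPhorest's models):
# name -> (regular expression, index of the phospho-acceptor within a match)
MOTIFS = {
    "basophilic (R-R-x-S/T)": ("RR.[ST]", 3),
    "proline-directed (S/T-P)": ("[ST]P", 0),
}


def scan(peptide):
    """Return (residue_number, residue, motif_name) for every motif hit."""
    hits = []
    for name, (pattern, acceptor) in MOTIFS.items():
        # The zero-width lookahead lets overlapping matches be reported.
        for m in re.finditer(f"(?=({pattern}))", peptide):
            pos = m.start() + acceptor
            hits.append((pos + 1, peptide[pos], name))  # 1-based numbering
    return sorted(hits)


print(scan("MARRASPLK"))
```

    For the toy peptide, serine 6 matches both patterns, so it is reported once per motif. Real atlases score matches probabilistically rather than as yes/no regular-expression hits.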

  19. The conjugal-bed motif in the Alcestis Barcinonensis: two notes

    Directory of Open Access Journals (Sweden)

    Rosario Moreno Soldevila

    2011-06-01

    This paper focuses on the central place of the conjugal-bed motif in the anonymous poem known as Alcestis Barcinonensis, in the light of which two new interpretations of lines 21-22 and 83-85 are provided. In the first passage, beato … toro should be read as a subtle allusion to marital love, one of the central themes of the poem; in the second, uestigia alludes to a well-known literary motif related to the bed of love, thus providing a more accurate interpretation of the post mortem fidelity which Alcestis demands from her husband.

  20. A Practical Algorithm for Reconstructing Level-1 Phylogenetic Networks

    NARCIS (Netherlands)

    K.T. Huber; L.J.J. van Iersel (Leo); S.M. Kelk (Steven); R. Suchecki

    2010-01-01

    Recently, much attention has been devoted to the construction of phylogenetic networks which generalize phylogenetic trees in order to accommodate complex evolutionary processes. Here we present an efficient, practical algorithm for reconstructing level-1 phylogenetic networks, a type of

  1. A practical algorithm for reconstructing level-1 phylogenetic networks

    NARCIS (Netherlands)

    Huber, K.T.; Iersel, van L.J.J.; Kelk, S.M.; Suchecki, R.

    2011-01-01

    Recently, much attention has been devoted to the construction of phylogenetic networks which generalize phylogenetic trees in order to accommodate complex evolutionary processes. Here, we present an efficient, practical algorithm for reconstructing level-1 phylogenetic networks, a type of network
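    The level of a rooted phylogenetic network is the largest number of reticulations in any biconnected component of its underlying undirected graph. Reticulations are nodes with two or more parents, so a level-1 network has at most one per component. The abstracts do not describe the reconstruction algorithm, so this sketch only checks the level of a given network. It assumes the network is supplied as a list of directed parent-to-child edges, which is a hypothetical input format. Components are found with the standard edge-stack depth-first search.

```python
from collections import defaultdict


def biconnected_components(edges):
    """Vertex sets of the biconnected components of the underlying
    undirected graph (edge-stack depth-first search)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc, low, stack, comps = {}, {}, [], []

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)
        for v in adj[u]:
            if v not in disc:
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:  # u separates v's subtree: close a component
                    comp = set()
                    while True:
                        edge = stack.pop()
                        comp.update(edge)
                        if edge == (u, v):
                            break
                    comps.append(comp)
            elif v != parent and disc[v] < disc[u]:  # back edge
                stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return comps


def network_level(edges):
    """Max number of reticulations inside one biconnected component."""
    parents = defaultdict(set)
    for u, v in edges:  # edges are directed parent -> child
        parents[v].add(u)
    level = 0
    for comp in biconnected_components(edges):
        # Count a reticulation only in the component holding its incoming edges.
        k = sum(1 for v in comp if len(parents[v] & comp) >= 2)
        level = max(level, k)
    return level


# Hypothetical galled tree: one reticulation h with parents a and b.
galled = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
          ("h", "x"), ("a", "y"), ("b", "z")]
print(network_level(galled), network_level(galled) <= 1)  # 1 True
```

    The recursion is fine for small examples. Very large networks would need an iterative DFS or a raised recursion limit.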

  2. Phylogenetic distribution of plant snoRNA families.

    Science.gov (United States)

    Patra Bhattacharya, Deblina; Canzler, Sebastian; Kehr, Stephanie; Hertel, Jana; Grosse, Ivo; Stadler, Peter F

    2016-11-24

    Small nucleolar RNAs (snoRNAs) are one of the most ancient families amongst non-protein-coding RNAs. They are ubiquitous in Archaea and Eukarya but absent in bacteria. Their main function is to target chemical modifications of ribosomal RNAs. They fall into two classes, box C/D snoRNAs and box H/ACA snoRNAs, which are clearly distinguished by conserved sequence motifs and the type of chemical modification that they govern. Similarly to microRNAs, snoRNAs appear in distinct families of homologs that affect homologous targets. In animals, snoRNAs and their evolution have been studied in much detail. In plants, however, their evolution has attracted comparably little attention. In order to chart the phylogenetic distribution of individual snoRNA families in plants, we applied a sophisticated approach for identifying homologs of known plant snoRNAs across the plant kingdom. To cope with the relatively fast evolution of snoRNAs, information on conserved sequence boxes, target sequences, and secondary structure is combined to identify additional snoRNAs. We identified 296 families of snoRNAs in 24 species and traced their evolution throughout the plant kingdom. Many of the plant snoRNA families comprise paralogs. We also found that targets are well-conserved for most snoRNA families. The sequence conservation of snoRNAs is sufficient to establish homologies between phyla. The degree of this conservation tapers off, however, between land plants and algae. Plant snoRNAs are frequently organized in highly conserved spatial clusters. As a resource for further investigations we provide carefully curated and annotated alignments for each snoRNA family under investigation.

  3. Phylogenetic signal in the acoustic parameters of the advertisement calls of four clades of anurans.

    Science.gov (United States)

    Gingras, Bruno; Mohandesan, Elmira; Boko, Drasko; Fitch, W Tecumseh

    2013-07-01

    Anuran vocalizations, especially their advertisement calls, are largely species-specific and can be used to identify taxonomic affiliations. Because anurans are not vocal learners, their vocalizations are generally assumed to have a strong genetic component. This suggests that the degree of similarity between advertisement calls may be related to large-scale phylogenetic relationships. To test this hypothesis, advertisement calls from 90 species belonging to four large clades (Bufo, Hylinae, Leptodactylus, and Rana) were analyzed. Phylogenetic distances were estimated based on the DNA sequences of the 12S mitochondrial ribosomal RNA gene, and, for a subset of 49 species, on the rhodopsin gene. Mean values for five acoustic parameters (coefficient of variation of root-mean-square amplitude, dominant frequency, spectral flux, spectral irregularity, and spectral flatness) were computed for each species. We then tested for phylogenetic signal on the body-size-corrected residuals of these five parameters, using three statistical tests (Moran's I, Mantel, and Blomberg's K) and three models of genetic distance (pairwise distances, Abouheif's proximities, and the variance-covariance matrix derived from the phylogenetic tree). A significant phylogenetic signal was detected for most acoustic parameters on the 12S dataset, across statistical tests and genetic distance models, both for the entire sample of 90 species and within clades in several cases. A further analysis on a subset of 49 species using genetic distances derived from rhodopsin and from 12S broadly confirmed the results obtained on the larger sample, indicating that the phylogenetic signals observed in these acoustic parameters can be detected using a variety of genetic distance models derived either from a variable mitochondrial sequence or from a conserved nuclear gene. We found a robust relationship, in a large number of species, between anuran phylogenetic relatedness and acoustic similarity in the
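    Moran's I is one of the three tests named above. It is computed as $I = \frac{n}{W}\,\frac{\sum_i \sum_j w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\sum_i (x_i-\bar{x})^2}$, where $W=\sum_{ij} w_{ij}$ and $w_{ij}$ is a phylogenetic proximity between species $i$ and $j$. This is a minimal sketch with invented trait values and a toy weight matrix in which two sister pairs get weight 1 and all other pairs get 0. The significance test, usually done by permutation, is not shown.

```python
def morans_i(values, weights):
    """Moran's I for trait values and a symmetric proximity matrix
    (zero diagonal); positive values mean related species are similar."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    spread = sum(d * d for d in dev)
    return (n / w_total) * (cross / spread)


# Hypothetical data: species 0/1 and 2/3 are sister pairs (weight 1), others 0.
w = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
residuals = [1.0, 1.2, 3.0, 3.1]  # made-up body-size-corrected residuals
print(round(morans_i(residuals, w), 3))  # 0.987
```

    Swapping the proximity matrix, for example to Abouheif's proximities, changes only `weights`. That matches the abstract's design of running the same test under several genetic distance models.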

  4. Folding and unfolding phylogenetic trees and networks.

    Science.gov (United States)

    Huber, Katharina T; Moulton, Vincent; Steel, Mike; Wu, Taoyang

    2016-12-01

    Phylogenetic networks are rooted, labelled directed acyclic graphs which are commonly used to represent reticulate evolution. There is a close relationship between phylogenetic networks and multi-labelled trees (MUL-trees). Indeed, any phylogenetic network N can be "unfolded" to obtain a MUL-tree U(N) and, conversely, a MUL-tree T can in certain circumstances be "folded" to obtain a phylogenetic network F(T) that exhibits T. In this paper, we study properties of the operations U and F in more detail. In particular, we introduce the class of stable networks, phylogenetic networks N for which F(U(N)) is isomorphic to N, characterise such networks, and show that they are related to the well-known class of tree-sibling networks. We also explore how the concept of displaying a tree in a network N can be related to displaying the tree in the MUL-tree U(N). To do this, we develop a phylogenetic analogue of graph fibrations. This allows us to view U(N) as the analogue of the universal cover of a digraph, and to establish a close connection between displaying trees in U(N) and reconciling phylogenetic trees with networks.
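    The unfolding operation U can be sketched as a recursive copy of the DAG from the root. Every subnetwork reachable along several paths is duplicated once per path, so the result is a MUL-tree. This is a minimal illustration of U only; folding (F) and the stable-network characterisation are not attempted. It assumes the network is a dictionary mapping each node to its list of children, which is a hypothetical input format. The unary node left where a reticulation was is suppressed.

```python
def unfold(children, node):
    """Unfold the network below `node` into a MUL-tree (nested tuples).

    `children` maps each node to its child list. Subnetworks reachable along
    several paths are copied once per path; the unary node left behind by a
    reticulation is suppressed."""
    kids = children.get(node, [])
    if not kids:
        return node  # leaf label
    subtrees = tuple(unfold(children, k) for k in kids)
    return subtrees[0] if len(subtrees) == 1 else subtrees


def leaf_labels(tree):
    if isinstance(tree, str):
        return [tree]
    return [label for sub in tree for label in leaf_labels(sub)]


# Hypothetical network: reticulation h has parents u and v, and one child b.
network = {"root": ["u", "v"], "u": ["h", "a"], "v": ["h", "c"], "h": ["b"]}
mul_tree = unfold(network, "root")
print(mul_tree)               # (('b', 'a'), ('b', 'c'))
print(leaf_labels(mul_tree))  # ['b', 'a', 'b', 'c']
```

    Leaf b appears twice because it sits below the reticulation. The size of U(N) can grow exponentially with the number of stacked reticulations.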

  5. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Background: Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six-base sequence GRCGYC. Results: The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), and the genes have been cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion: We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R.BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
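    The recognition sequence GRCGYC uses IUPAC degenerate nucleotide codes: R stands for A or G, and Y for C or T. A minimal sketch of locating such sites in a DNA string translates each code into a regular-expression character class. The function names and the test sequence are hypothetical. GRCGYC is its own reverse complement, so scanning one strand finds every site for this enzyme.

```python
import re

# IUPAC nucleotide codes -> regular-expression fragments.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def site_regex(site):
    return "".join(IUPAC[base] for base in site.upper())


def find_sites(seq, site="GRCGYC"):
    """0-based start positions of a degenerate recognition site on the
    given strand (a lookahead keeps overlapping sites)."""
    pattern = re.compile(f"(?=({site_regex(site)}))")
    return [m.start() for m in pattern.finditer(seq.upper())]


print(site_regex("GRCGYC"))                 # G[AG]CG[CT]C
print(find_sites("TTGACGTCAAGGCGCCTT"))     # [2, 10]
```

    For a non-palindromic site, the same function would also be run on the reverse complement of the sequence.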

  6. A Simple Decision Rule for Recognition of Poly(A) Tail Signal Motifs in Human Genome

    KAUST Repository

    AbouEisha, Hassan M.

    2015-05-12

    Background: Numerous attempts have been made to predict motifs in genomic sequences that correspond to poly(A) tail signals. A large portion of this effort has been directed to a plethora of nonlinear classification methods. Even when such approaches yield good discriminant results, identifying dominant features of regulatory mechanisms nevertheless remains a challenge. In this work, we look at decision rules that may help identify such features. Findings: We present a simple decision rule for the classification of candidate poly(A) tail signal motifs in human genomic sequence, obtained by evaluating features during the construction of gradient boosted trees. We found that the values of a single feature, based on the frequency of adenine in the genomic sequence surrounding the candidate signal and on the number of consecutive adenines in a well-defined region immediately following the motif, display good discriminative potential in the classification of poly(A) tail motifs for samples covered by the rule. Conclusions: The resulting simple rule can be used as an efficient filter in the construction of more complex poly(A) tail motif classification algorithms.
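    The two ingredients of the reported feature can be computed as follows. The abstract names them: the adenine frequency around the candidate signal and the number of consecutive adenines just after it. This sketch assumes a hypothetical 50-nt flank on each side of the candidate hexamer (for example the canonical AATAAA) and a hypothetical 20-nt region after it. The paper's exact windows, how the two quantities are combined into one feature, and the rule's threshold are not given in the abstract and are not reproduced.

```python
def adenine_fraction(seq, motif_start, motif_len, flank=50):
    """Fraction of A in up to `flank` nt on each side of the candidate motif."""
    upstream = seq[max(0, motif_start - flank):motif_start]
    downstream = seq[motif_start + motif_len:motif_start + motif_len + flank]
    window = upstream + downstream
    return window.count("A") / len(window) if window else 0.0


def longest_a_run(seq, motif_start, motif_len, region=20):
    """Longest run of consecutive A in the `region` nt right after the motif."""
    after = seq[motif_start + motif_len:motif_start + motif_len + region]
    best = run = 0
    for base in after:
        run = run + 1 if base == "A" else 0
        best = max(best, run)
    return best


seq = "GGCAATAAAGCTTAAAAAGC"  # made-up sequence; AATAAA starts at index 3
print(round(adenine_fraction(seq, 3, 6), 3), longest_a_run(seq, 3, 6))  # 0.357 5
```

    Keeping the features this simple is the point of the approach. A one-line threshold on such values is cheap enough to act as a pre-filter ahead of a heavier classifier.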

  7. PHYLOGEOrec: A QGIS plugin for spatial phylogeographic reconstruction from phylogenetic tree and geographical information data

    Science.gov (United States)

    Nashrulloh, Maulana Malik; Kurniawan, Nia; Rahardi, Brian

    2017-11-01

    The increasing availability of genetic sequence data associated with explicit geographic and environmental (including biotic and abiotic components) information offers new opportunities to study the processes that shape biodiversity and its patterns. Phylogeographic reconstruction, which integrates phylogenetic and biogeographic knowledge, provides richer and deeper visualization of and information on diversification events than ever before. Geographical information systems such as QGIS provide an environment for spatial modeling, analysis, and dissemination in which phylogenetic models can be explicitly linked with their associated spatial data and subsequently integrated with other related georeferenced datasets describing the biotic and abiotic environment. We introduce PHYLOGEOrec, a QGIS plugin based on QGIS2threejs for building spatial phylogeographic reconstructions from phylogenetic tree and geographical information data. Using PHYLOGEOrec, researchers can integrate existing phylogeny and geographical information data, resulting in three-dimensional geographic visualizations of phylogenetic trees in the Keyhole Markup Language (KML) format. These KML files can be overlaid on a map in QGIS and then viewed spatially in QGIS through the QGIS2threejs engine for further analysis. KML can also be viewed in geobrowsers with KML support (e.g., Google Earth).
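    PHYLOGEOrec's own export code is not shown in the abstract. To make the output format concrete, this minimal sketch uses only the Python standard library. It writes tree tips as KML Point placemarks and branches as 3D LineString placemarks that rise to an ancestral node. The coordinates, the altitude used to lift the ancestor, and the function names are all hypothetical.

```python
from xml.sax.saxutils import escape

KML_NS = "http://www.opengis.net/kml/2.2"


def point_placemark(name, lon, lat):
    """A tree tip as a point on the ground (KML order is lon,lat,alt)."""
    return (f"<Placemark><name>{escape(name)}</name>"
            f"<Point><coordinates>{lon},{lat},0</coordinates></Point></Placemark>")


def line_placemark(name, start, end):
    """A branch drawn in 3D; start and end are (lon, lat, altitude_m) tuples."""
    coords = " ".join(f"{lon},{lat},{alt}" for lon, lat, alt in (start, end))
    return (f"<Placemark><name>{escape(name)}</name><LineString>"
            "<altitudeMode>absolute</altitudeMode>"
            f"<coordinates>{coords}</coordinates></LineString></Placemark>")


def kml_document(placemarks):
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            f'<kml xmlns="{KML_NS}"><Document>{"".join(placemarks)}</Document></kml>')


# Hypothetical two-tip tree; the ancestor is lifted to 50 km to show depth.
ancestor = (111.5, -7.9, 50000)
doc = kml_document([
    point_placemark("Tip A", 110.37, -7.80),
    point_placemark("Tip B", 112.63, -7.98),
    line_placemark("ancestor-Tip A", ancestor, (110.37, -7.80, 0)),
    line_placemark("ancestor-Tip B", ancestor, (112.63, -7.98, 0)),
])
print(doc[:60])
```

    Saved with a .kml extension, a string like `doc` can be opened in KML-aware viewers such as Google Earth.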

  8. Topological variation in single-gene phylogenetic trees

    OpenAIRE

    Castresana, Jose

    2007-01-01

    A recent large-scale phylogenomic study has shown the great degree of topological variation that can be found among eukaryotic phylogenetic trees constructed from single genes, highlighting the problems that can be associated with gene sampling in phylogenetic studies.
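    Topological variation between single-gene trees is commonly quantified with the Robinson-Foulds distance, which counts the bipartitions found in one tree but not the other. The abstract does not say which measure the study used, so this is only an illustration. The sketch assumes trees over the same taxa, written as nested Python tuples of taxon names (a made-up format). Each split is keyed by the side that excludes a fixed reference taxon so that rooting does not matter.

```python
def leaves(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(c) for c in tree))


def splits(tree):
    """Non-trivial bipartitions, keyed by the side without the reference taxon."""
    taxa = leaves(tree)
    ref = min(taxa)
    out = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        clade = leaves(node)
        side = clade if ref not in clade else taxa - clade
        if 1 < len(side) < len(taxa) - 1:  # ignore trivial splits
            out.add(side)
        if not isinstance(node, str):
            stack.extend(node)
    return out


def robinson_foulds(t1, t2):
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same taxa")
    return len(splits(t1) ^ splits(t2))  # symmetric difference of split sets


# Hypothetical gene trees for the same five taxa.
gene1 = ((("A", "B"), "C"), ("D", "E"))
gene2 = ((("A", "C"), "B"), ("D", "E"))
gene3 = (("A", "B"), ("C", ("D", "E")))
print(robinson_foulds(gene1, gene2), robinson_foulds(gene1, gene3))  # 2 0
```

    For unrooted binary trees on n taxa the maximum value is 2(n - 3), which is often used to normalise the distance.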

  9. An improved model for whole genome phylogenetic analysis by Fourier transform.

    Science.gov (United States)

    Yin, Changchuan; Yau, Stephen S-T

    2015-10-07

    DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignment (MSA). While MSA is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity when comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information content of DNA sequences by the Discrete Fourier Transform (DFT), but it still had some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determining factor, and the 2D mapping reduces the nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra share the same dimensionality of the Fourier frequency space, and the Euclidean distances of the full Fourier power spectra of the DNA sequences are then used as the dissimilarity metrics. The improved DFT method, with computational performance increased by the 2D numerical representation, can be applied to DNA sequences of different length ranges. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences, including simulated and real datasets. The method yields accurate and reliable phylogenetic trees
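    The abstract does not give the 2D mapping or the even scaling formula. The sketch below therefore substitutes a made-up 2D coordinate per base, read as a complex number, and plain linear interpolation for the scaling step. Only the overall pipeline follows the description: map, DFT, power spectrum, rescale to a common length, then Euclidean distance. A direct O(n²) DFT is used for clarity rather than speed.

```python
import cmath
import math

# Hypothetical 2D coordinate per nucleotide, read as a complex number.
# This is NOT the mapping from the paper, which the abstract does not give.
MAP_2D = {"A": (1, 1), "C": (-1, 1), "G": (-1, -1), "T": (1, -1)}


def power_spectrum(seq):
    """|DFT|^2 of the mapped sequence (plain O(n^2) DFT for clarity)."""
    z = [complex(*MAP_2D[base]) for base in seq.upper()]
    n = len(z)
    return [abs(sum(z[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n)]


def rescale(spectrum, m):
    """Stretch a spectrum to length m by linear interpolation.

    A simple stand-in for the paper's even scaling algorithm."""
    n = len(spectrum)
    if n == m:
        return list(spectrum)
    out = []
    for k in range(m):
        x = k * (n - 1) / (m - 1)
        i = int(x)
        frac = x - i
        right = spectrum[min(i + 1, n - 1)]
        out.append(spectrum[i] * (1 - frac) + right * frac)
    return out


def dft_distance(s1, s2):
    """Euclidean distance between power spectra rescaled to a common length."""
    p1, p2 = power_spectrum(s1), power_spectrum(s2)
    m = max(len(p1), len(p2))
    p1, p2 = rescale(p1, m), rescale(p2, m)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))


print(dft_distance("ACGTACGTAA", "ACGTACGA"))
```

    A matrix of such pairwise distances could then be fed to any hierarchical clustering routine, such as UPGMA, to obtain a tree.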

  10. Accurate quantification of microRNA via single strand displacement reaction on DNA origami motif.

    Directory of Open Access Journals (Sweden)

    Jie Zhu

    DNA origami is an emerging technology that assembles hundreds of staple strands and one single-stranded DNA into a defined nanopattern. It has been widely used in various fields, including the detection of biological molecules such as DNA, RNA and proteins. MicroRNAs (miRNAs) play important roles in post-transcriptional gene repression as well as in many other biological processes such as cell growth and differentiation. Alterations of miRNA expression contribute to many human diseases. However, it is still a challenge to quantitatively detect miRNAs by origami technology. In this study, we developed a novel approach based on a streptavidin and quantum dots binding complex (STV-QDs)-labeled single strand displacement reaction on DNA origami to quantitatively detect the concentration of miRNAs. We illustrated a linear relationship between the concentration of an exemplary miRNA, miRNA-133, and the STV-QDs hybridization efficiency; the results demonstrated that it is an accurate nano-scale miRNA quantifier motif. In addition, both a symmetrical rectangular motif and an asymmetrical China-map motif were tested. With significant linearity in both motifs, our experiments suggested that DNA origami motifs of arbitrary shape can be utilized in this method. This DNA origami-based method is simple, saves time and material, can potentially test multiple targets on one motif, and is relatively accurate for certain impure samples, because targets are counted directly by atomic force microscopy rather than by fluorescence signal detection; it may therefore be widely used in the quantification of miRNAs.

  11. Accurate Quantification of microRNA via Single Strand Displacement Reaction on DNA Origami Motif

    Science.gov (United States)

    Lou, Jingyu; Li, Weidong; Li, Sheng; Zhu, Hongxin; Yang, Lun; Zhang, Aiping; He, Lin; Li, Can

    2013-01-01

    DNA origami is an emerging technology that assembles hundreds of staple strands and one single-stranded DNA into a defined nanopattern. It has been widely used in various fields, including the detection of biological molecules such as DNA, RNA and proteins. MicroRNAs (miRNAs) play important roles in post-transcriptional gene repression as well as in many other biological processes such as cell growth and differentiation. Alterations of miRNA expression contribute to many human diseases. However, it is still a challenge to quantitatively detect miRNAs by origami technology. In this study, we developed a novel approach based on a streptavidin and quantum dots binding complex (STV-QDs)-labeled single strand displacement reaction on DNA origami to quantitatively detect the concentration of miRNAs. We illustrated a linear relationship between the concentration of an exemplary miRNA, miRNA-133, and the STV-QDs hybridization efficiency; the results demonstrated that it is an accurate nano-scale miRNA quantifier motif. In addition, both a symmetrical rectangular motif and an asymmetrical China-map motif were tested. With significant linearity in both motifs, our experiments suggested that DNA origami motifs of arbitrary shape can be utilized in this method. This DNA origami-based method is simple, saves time and material, can potentially test multiple targets on one motif, and is relatively accurate for certain impure samples, because targets are counted directly by atomic force microscopy rather than by fluorescence signal detection; it may therefore be widely used in the quantification of miRNAs. PMID:23990889

  12. Accurate quantification of microRNA via single strand displacement reaction on DNA origami motif.

    Science.gov (United States)

    Zhu, Jie; Feng, Xiaolu; Lou, Jingyu; Li, Weidong; Li, Sheng; Zhu, Hongxin; Yang, Lun; Zhang, Aiping; He, Lin; Li, Can

    2013-01-01

    DNA origami is an emerging technology that assembles hundreds of staple strands and one single-stranded DNA into a defined nanopattern. It has been widely used in various fields, including the detection of biological molecules such as DNA, RNA and proteins. MicroRNAs (miRNAs) play important roles in post-transcriptional gene repression as well as in many other biological processes such as cell growth and differentiation. Alterations of miRNA expression contribute to many human diseases. However, it is still a challenge to quantitatively detect miRNAs by origami technology. In this study, we developed a novel approach based on a streptavidin and quantum dots binding complex (STV-QDs)-labeled single strand displacement reaction on DNA origami to quantitatively detect the concentration of miRNAs. We illustrated a linear relationship between the concentration of an exemplary miRNA, miRNA-133, and the STV-QDs hybridization efficiency; the results demonstrated that it is an accurate nano-scale miRNA quantifier motif. In addition, both a symmetrical rectangular motif and an asymmetrical China-map motif were tested. With significant linearity in both motifs, our experiments suggested that DNA origami motifs of arbitrary shape can be utilized in this method. This DNA origami-based method is simple, saves time and material, can potentially test multiple targets on one motif, and is relatively accurate for certain impure samples, because targets are counted directly by atomic force microscopy rather than by fluorescence signal detection; it may therefore be widely used in the quantification of miRNAs.
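    Quantification in this approach rests on the reported linear relationship between miRNA concentration and STV-QDs hybridization efficiency. In practice that means fitting a calibration line and inverting it for an unknown sample. This is a minimal least-squares sketch. The calibration points are invented, and treating "efficiency" as the fraction of probes hybridized on AFM images is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def r_squared(xs, ys, a, b):
    """Coefficient of determination of the fitted calibration line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def estimate_concentration(efficiency, a, b):
    """Invert the calibration line to read off an unknown sample."""
    return (efficiency - b) / a


# Invented calibration points: concentration (arbitrary units) vs. the
# fraction of probes hybridized, e.g. counted on AFM images.
conc = [0, 10, 20, 40]
eff = [0.05, 0.15, 0.25, 0.45]
a, b = fit_line(conc, eff)
print(round(a, 4), round(b, 4), round(r_squared(conc, eff, a, b), 4))  # 0.01 0.05 1.0
print(round(estimate_concentration(0.30, a, b), 2))                    # 25.0
```

    Readings outside the calibrated range would be extrapolations. A real assay would report them with caution or re-measure at a different dilution.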

  13. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    Phylogenetic analysis suggests that our sequences cluster with sequences reported from Japan. This is the first phylogenetic analysis of the HCV core gene from the Pakistani population. Our sequences and the sequences from Japan are grouped into the same cluster in the phylogenetic tree. Sequence comparison and ...

  14. PhyDesign: an online application for profiling phylogenetic informativeness

    Directory of Open Access Journals (Sweden)

    Townsend Jeffrey P

    2011-05-01

    Background: The rapid increase in the number of sequenced genomes for species across the tree of life is revealing a diverse suite of orthologous genes that could potentially be employed to inform molecular phylogenetic studies that encompass broader taxonomic sampling. Optimal usage of this diversity of loci requires user-friendly tools to facilitate widespread, cost-effective locus prioritization for phylogenetic sampling. The Townsend (2007) phylogenetic informativeness provides a unique empirical metric for guiding marker selection. However, no software or automated methodology to evaluate sequence alignments and estimate the phylogenetic informativeness metric has been available. Results: Here, we present PhyDesign, a platform-independent online application that implements the Townsend (2007) phylogenetic informativeness analysis, providing a quantitative prediction of the utility of loci to solve specific phylogenetic questions. An easy-to-use interface facilitates uploading of alignments and ultrametric trees to calculate and depict profiles of informativeness over specified time ranges, and provides rankings of locus prioritization for epochs of interest. Conclusions: By providing these profiles, PhyDesign facilitates locus prioritization, increasing the efficiency of sequencing for phylogenetic purposes compared to traditional studies with more laborious, low-capacity screening methods, as well as increasing the accuracy of phylogenetic studies. Together with a manual and sample files, the application is freely accessible at http://phydesign.townsend.yale.edu.
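    As we understand Townsend (2007), the informativeness of a site with substitution rate λ at time T is $16\lambda^2 T e^{-4\lambda T}$, which peaks at $T = 1/(4\lambda)$. A locus profile is the sum over its sites. Check this form against the original paper before relying on it. The sketch below takes per-site rates as given, whereas PhyDesign estimates them from an uploaded alignment and ultrametric tree. It ranks two hypothetical loci for a shallow and a deep epoch by summing the profile over a few sampled times.

```python
import math


def site_informativeness(rate, t):
    """Per-site informativeness 16 * rate^2 * t * exp(-4 * rate * t).

    Form as we understand Townsend (2007); it peaks at t = 1 / (4 * rate)."""
    return 16 * rate ** 2 * t * math.exp(-4 * rate * t)


def locus_profile(rates, times):
    """Informativeness profile of a locus: sum over its sites at each time."""
    return [sum(site_informativeness(r, t) for r in rates) for t in times]


def rank_loci(loci, epoch_times):
    """Order loci by summed informativeness over sampled times of an epoch."""
    scores = {name: sum(locus_profile(rates, epoch_times))
              for name, rates in loci.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical loci of 100 sites each with uniform per-site rates.
loci = {"fast_locus": [2.0] * 100, "slow_locus": [0.05] * 100}
print(rank_loci(loci, [0.1, 0.125, 0.15]))  # ['fast_locus', 'slow_locus']
print(rank_loci(loci, [4, 5, 6]))           # ['slow_locus', 'fast_locus']
```

    The ranking flips with the epoch. Fast-evolving loci are most informative for recent divergences, and slow ones for deep divergences, which is why the profiles are plotted over time.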

  15. Effects of phylogenetic reconstruction method on the robustness of species delimitation using single-locus data.

    Science.gov (United States)

    Tang, Cuong Q; Humphreys, Aelys M; Fontaneto, Diego; Barraclough, Timothy G; Paradis, Emmanuel

    2014-10-01

    Coalescent-based species delimitation methods combine population genetic and phylogenetic theory to provide an objective means for delineating evolutionarily significant units of diversity. The generalised mixed Yule coalescent (GMYC) and the Poisson tree process (PTP) are methods that use ultrametric (GMYC or PTP) or non-ultrametric (PTP) gene trees as input, intended for use mostly with single-locus data such as DNA barcodes. Here, we assess how robust the GMYC and PTP are to different phylogenetic reconstruction and branch smoothing methods. We reconstruct over 400 ultrametric trees using up to 30 different combinations of phylogenetic and smoothing methods and perform over 2000 separate species delimitation analyses across 16 empirical data sets. We then assess how variable diversity estimates are, in terms of richness and identity, with respect to species delimitation, phylogenetic and smoothing methods. The PTP method generally generates diversity estimates that are more robust to different phylogenetic methods. The GMYC is more sensitive, but provides consistent estimates for BEAST trees. The lower consistency of GMYC estimates is likely a result of differences among gene trees introduced by the smoothing step. Unresolved nodes (real anomalies or methodological artefacts) affect both GMYC and PTP estimates, but have a greater effect on GMYC estimates. Branch smoothing is a difficult step and perhaps an underappreciated source of bias that may be widespread among studies of diversity and diversification. Nevertheless, careful choice of phylogenetic method does produce equivalent PTP and GMYC diversity estimates. We recommend simultaneous use of the PTP model with any model-based gene tree (e.g. RAxML) and GMYC approaches with BEAST trees for obtaining species hypotheses.

  16. Analysis of Acorus calamus chloroplast genome and its phylogenetic implications.

    Science.gov (United States)

    Goremykin, Vadim V; Holland, Barbara; Hirsch-Ernst, Karen I; Hellwig, Frank H

    2005-09-01

    Determining the phylogenetic relationships among the major lines of angiosperms is a long-standing problem, yet the uncertainty as to the phylogenetic affinity of these lines persists. While a number of studies have suggested that the ANITA (Amborella-Nymphaeales-Illiciales-Trimeniales-Aristolochiales) grade is basal within angiosperms, studies of complete chloroplast genome sequences also suggested an alternative tree, wherein the line leading to the grasses branches first among the angiosperms. To improve taxon sampling in the existing chloroplast genome data, we sequenced the chloroplast genome of the monocot Acorus calamus. We generated a concatenated alignment (89,436 positions for 15 taxa), encompassing almost all sequences usable for phylogeny reconstruction within spermatophytes. The data still contain support for both the ANITA-basal and grasses-basal hypotheses. Using simulations we can show that were the ANITA-basal hypothesis true, parsimony (and distance-based methods with many models) would be expected to fail to recover it. The self-evident explanation for this failure appears to be a long-branch attraction (LBA) between the clade of grasses and the out-group. However, this LBA cannot explain the discrepancies observed between tree topology recovered using the maximum likelihood (ML) method and the topologies recovered using the parsimony and distance-based methods when grasses are deleted. Furthermore, the fact that neither maximum parsimony nor distance methods consistently recover the ML tree, when according to the simulations they would be expected to, when the out-group (Pinus) is deleted, suggests that either the generating tree is not correct or the best symmetric model is misspecified (or both). We demonstrate that the tree recovered under ML is extremely sensitive to model specification and that the best symmetric model is misspecified. Hence, we remain agnostic regarding phylogenetic relationships among basal angiosperm lineages.

  17. Phylogenetically diverse macrophyte community promotes species diversity of mobile epi-benthic invertebrates

    Science.gov (United States)

    Nakamoto, Kenta; Hayakawa, Jun; Kawamura, Tomohiko; Kodama, Masafumi; Yamada, Hideaki; Kitagawa, Takashi; Watanabe, Yoshiro

    2018-07-01

    Various aspects of plant diversity, such as species diversity and phylogenetic diversity, enhance the species diversity of associated animals in terrestrial systems. In marine systems, however, the effects of macrophyte diversity on the species diversity of associated animals have received little attention. Here, we sampled a subtropical seagrass-seaweed mixed bed to elucidate the effects of macrophyte phylogenetic diversity (based on taxonomic relatedness) and macrophyte species diversity on the species diversity of mobile epi-benthic invertebrates. Using regression analyses for each macrophyte parameter as well as multiple regression analyses, we found that macrophyte phylogenetic diversity (taxonomic diversity index: Delta) positively influenced invertebrate species richness and the diversity index (H′). Although macrophyte species richness and H′ also positively influenced invertebrate species richness, the best-fit model for invertebrate species richness did not include them, suggesting that macrophyte species diversity influenced invertebrate species diversity only indirectly. Possible explanations of the effects of macrophyte Delta on invertebrate species diversity were the niche complementarity effect and the selection effect. This is the first study that demonstrates that macrophyte phylogenetic diversity has a strong effect on the species diversity of mobile epi-benthic invertebrates.
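
    The two macrophyte indices named above can be computed as follows. This sketch assumes that Delta is Clarke and Warwick's taxonomic diversity (the abundance-weighted mean taxonomic path length between pairs of individuals) and that H′ is the natural-log Shannon index; the taxonomic weights and abundances are hypothetical, and the study's exact weighting scheme may differ.

```python
import math
from itertools import combinations


def shannon_h(abundances):
    """Shannon index H' = -sum p_i ln p_i over species with non-zero abundance."""
    total = sum(abundances)
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)


def taxonomic_diversity(abundances, omega):
    """Taxonomic diversity Delta (assumed Clarke & Warwick form).

    omega[i][j] is the taxonomic path length between species i and j; pairs of
    individuals from the same species contribute weight 0 but count in the denominator.
    """
    n_total = sum(abundances)
    weighted = sum(omega[i][j] * abundances[i] * abundances[j]
                   for i, j in combinations(range(len(abundances)), 2))
    return weighted / (n_total * (n_total - 1) / 2)


# Hypothetical plot with three macrophyte taxa; weights 1 = same genus, 3 = same order.
abundances = [10, 5, 5]
omega = [[0, 1, 3],
         [1, 0, 3],
         [3, 3, 0]]
print(round(taxonomic_diversity(abundances, omega), 3))  # 1.447
print(round(shannon_h(abundances), 3))
```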

  18. Seed plant phylogenetic diversity and species richness in conservation planning within a global biodiversity hotspot in eastern Asia.

    Science.gov (United States)

    Li, Rong; Kraft, Nathan J B; Yu, Haiying; Li, Heng

    2015-12-01

    One of the main goals of conservation biology is to understand the factors shaping variation in biodiversity across the planet. This understanding is critical for conservation planners to be able to develop effective conservation strategies. Although many studies have focused on species richness and the protection of rare and endemic species, less attention has been paid to the protection of the phylogenetic dimension of biodiversity. We explored how phylogenetic diversity, species richness, and phylogenetic community structure vary in seed plant communities along an elevational gradient in a relatively understudied high mountain region, the Dulong Valley, in southeastern Tibet, China. As expected, phylogenetic diversity was well correlated with species richness among the elevational bands and among communities. At the community level, evergreen broad-leaved forests had the highest levels of species richness and phylogenetic diversity. Using null model analyses, we found evidence of nonrandom phylogenetic structure across the region. Evergreen broad-leaved forests were phylogenetically overdispersed, whereas other vegetation types tended to be phylogenetically clustered. We suggest that communities with high species richness or overdispersed phylogenetic structure should be a focus for biodiversity conservation within the Dulong Valley because these areas may help maximize the potential of this flora to respond to future global change. In biodiversity hotspots worldwide, we suggest that the phylogenetic structure of a community may serve as a useful measure of phylogenetic diversity in the context of conservation planning. © 2015 Society for Conservation Biology.

  19. Phylogenetic relationships of Hemiptera inferred from mitochondrial and nuclear genes.

    Science.gov (United States)

    Song, Nan; Li, Hu; Cai, Wanzhi; Yan, Fengming; Wang, Jianyun; Song, Fan

    2016-11-01

    Here, we reconstructed the Hemiptera phylogeny separately from the expanded set of mitochondrial protein-coding genes and from the nuclear 18S rRNA gene. Differential rates of change across lineages may be associated with the long-branch attraction (LBA) effect and result in conflicting estimates of phylogeny from different types of data. To reduce the potential effects of systematic biases on inferences of topology, various data coding schemes, site removal methods, and different algorithms were used in phylogenetic reconstruction. We show that the outgroups Phthiraptera and Thysanoptera and the ingroup Sternorrhyncha share similar base composition and exhibit "long branches" relative to other hemipterans. Thus, long-branch attraction between these groups is suspected to cause the failure to recover Hemiptera under the homogeneous model. In contrast, a monophyletic Hemiptera is supported when a heterogeneous model is used in the analysis. Although higher-level phylogenetic relationships within Hemiptera remain unresolved, consensus between analyses is beginning to converge on a stable phylogeny.

  20. Spike Pattern Structure Influences Synaptic Efficacy Variability Under STDP and Synaptic Homeostasis. I: Spike Generating Models on Converging Motifs

    Directory of Open Access Journals (Sweden)

    Zedong Bi

    2016-02-01

    In neural systems, synaptic plasticity is usually driven by spike trains. Due to the inherent noise of neurons and synapses, as well as the randomness of connection details, spike trains typically exhibit variability such as spatial randomness and temporal stochasticity, resulting in variability of synaptic changes under plasticity, which we call efficacy variability. How the variability of spike trains influences the efficacy variability of synapses remains unclear. In this paper, we try to understand this influence under pair-wise additive spike-timing dependent plasticity (STDP) when the mean strength of plastic synapses into a neuron is bounded (synaptic homeostasis). Specifically, we systematically study, analytically and numerically, how four aspects of statistical features, i.e. synchronous firing, burstiness/regularity, heterogeneity of rates and heterogeneity of cross-correlations, as well as their interactions, influence the efficacy variability in converging motifs (simple networks in which one neuron receives input from many other neurons). Neurons (including the post-synaptic neuron) in a converging motif generate spikes according to statistical models with tunable parameters. In this way, we can explicitly control the statistics of the spike patterns and investigate their influence on the efficacy variability, without worrying about the feedback from synaptic changes onto the dynamics of the post-synaptic neuron. We separate efficacy variability into two parts: the drift part (DriftV), induced by the heterogeneity of change rates of different synapses, and the diffusion part (DiffV), induced by weight diffusion caused by the stochasticity of spike trains. Our main findings are: (1) synchronous firing and burstiness tend to increase DiffV, (2) heterogeneity of rates induces DriftV when potentiation and depression in STDP are not balanced, and (3) heterogeneity of cross-correlations induces DriftV together with heterogeneity of rates. 
We anticipate our
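
    Pair-wise additive STDP in a converging motif with a bounded mean weight can be illustrated as follows. In this sketch the exponential STDP window, its amplitudes and time constants, and the use of a uniform subtractive shift to hold the mean weight fixed are all assumptions chosen for illustration; the paper's own homeostasis mechanism and spike-generating models are not reproduced here.

```python
import math

# Hypothetical parameters (not taken from the paper).
A_PLUS, A_MINUS = 0.005, 0.00525   # potentiation / depression amplitudes
TAU_PLUS, TAU_MINUS = 20.0, 20.0   # time constants in ms


def stdp_kernel(dt):
    """Weight change for one spike pair, with dt = t_post - t_pre in ms."""
    if dt > 0:
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    if dt < 0:
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0


def update_converging_motif(weights, pre_trains, post_train):
    """Additive all-pairs STDP for synapses onto one neuron, followed by a uniform
    shift so the mean weight is unchanged (one simple way to impose homeostasis)."""
    deltas = [sum(stdp_kernel(t_post - t_pre) for t_pre in train for t_post in post_train)
              for train in pre_trains]
    mean_delta = sum(deltas) / len(deltas)
    return [w + d - mean_delta for w, d in zip(weights, deltas)]


# Two presynaptic neurons: one fires 10 ms before the postsynaptic spike, one 10 ms after.
new_w = update_converging_motif([0.5, 0.5], [[10.0], [30.0]], [20.0])
print(new_w)  # first synapse gains, second loses, mean stays 0.5
```

    Setting A_MINUS slightly above A_PLUS, as here, makes potentiation and depression unbalanced, which is the regime in which the abstract reports that rate heterogeneity induces DriftV.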

  1. Maximizing the phylogenetic diversity of seed banks.

    Science.gov (United States)

    Griffiths, Kate E; Balding, Sharon T; Dickie, John B; Lewis, Gwilym P; Pearce, Tim R; Grenyer, Richard

    2015-04-01

    Ex situ conservation efforts such as those of zoos, botanical gardens, and seed banks will form a vital complement to in situ conservation actions over the coming decades. It is therefore necessary to pay the same attention to the biological diversity represented in ex situ conservation facilities as is often paid to protected-area networks. Building the phylogenetic diversity of ex situ collections will strengthen our capacity to respond to biodiversity loss. Since 2000, the Millennium Seed Bank Partnership has banked seed from 14% of the world's plant species. We assessed the taxonomic, geographic, and phylogenetic diversity of the Millennium Seed Bank collection of legumes (Leguminosae). We compared the collection with all known legume genera, their known geographic range (at country and regional levels), and a genus-level phylogeny of the legume family constructed for this study. Over half the phylogenetic diversity of legumes at the genus level was represented in the Millennium Seed Bank. However, pragmatic prioritization of species of economic importance and endangerment has led to the banking of a less-than-optimal phylogenetic diversity and prioritization of range-restricted species risks an underdispersed collection. The current state of the phylogenetic diversity of legumes in the Millennium Seed Bank could be substantially improved through the strategic banking of relatively few additional taxa. Our method draws on tools that are widely applied to in situ conservation planning, and it can be used to evaluate and improve the phylogenetic diversity of ex situ collections. © 2014 Society for Conservation Biology.
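
    A common way to quantify the phylogenetic diversity of a collection is Faith's phylogenetic diversity (PD), the total branch length of the tree spanned by the collected taxa. The sketch below computes rooted PD on a toy tree and greedily picks the additional taxon that adds the most branch length to an existing collection; the tree, the taxon names, and the greedy procedure are illustrative assumptions, not the authors' legume phylogeny or their exact prioritization method.

```python
# Toy rooted tree as {child: (parent, branch_length)}; "R" is the root.
TREE = {
    "X": ("R", 1.0), "Y": ("R", 4.0),
    "A": ("X", 1.0), "B": ("X", 1.5),
    "C": ("Y", 0.5), "D": ("Y", 0.5),
}


def path_edges(tree, leaf):
    """Edges (named by their child node) on the path from a leaf up to the root."""
    edges = set()
    node = leaf
    while node in tree:
        edges.add(node)
        node = tree[node][0]
    return edges


def phylogenetic_diversity(tree, taxa):
    """Rooted Faith PD: total length of the union of root-to-leaf paths."""
    covered = set()
    for taxon in taxa:
        covered |= path_edges(tree, taxon)
    return sum(tree[edge][1] for edge in covered)


def greedy_additions(tree, candidates, k, already=()):
    """Add up to k candidates, each time maximizing PD of the growing collection."""
    chosen = list(already)
    pool = [c for c in candidates if c not in chosen]
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda t: phylogenetic_diversity(tree, chosen + [t]))
        chosen.append(best)
        pool.remove(best)
    return chosen


# "C" is already banked; which single extra taxon adds the most PD?
collection = greedy_additions(TREE, ["A", "B", "C", "D"], k=1, already=["C"])
print(collection, phylogenetic_diversity(TREE, collection))  # ['C', 'B'] 7.0
```

    Adding D, the sister of the already banked C, would contribute only 0.5 units of branch length, whereas B contributes 2.5; this is the intuition behind improving a collection through a few strategically chosen taxa.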

  2. Reconstructing phylogenetic networks using maximum parsimony.

    Science.gov (United States)

    Nakhleh, Luay; Jin, Guohua; Zhao, Fengmei; Mellor-Crummey, John

    2005-01-01

    Phylogenies (the evolutionary histories of groups of organisms) are one of the most widely used tools throughout the life sciences, as well as objects of research within systematics, evolutionary biology, epidemiology, etc. Almost every tool devised to date to reconstruct phylogenies produces trees; yet it is widely understood and accepted that trees oversimplify the evolutionary histories of many groups of organisms, most prominently bacteria (because of horizontal gene transfer) and plants (because of hybrid speciation). Various methods and criteria have been introduced for phylogenetic tree reconstruction. Parsimony is one of the most widely used and studied criteria, and various accurate and efficient heuristics for reconstructing trees based on parsimony have been devised. Jotun Hein suggested a straightforward extension of the parsimony criterion to phylogenetic networks. In this paper we formalize this concept and provide the first experimental study of the quality of parsimony as a criterion for constructing and evaluating phylogenetic networks. Our results show that, when extended to phylogenetic networks, the parsimony criterion produces promising results. In the great majority of cases in our experiments, the parsimony criterion accurately predicts the numbers and placements of non-tree events.
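
    The parsimony criterion on trees, and one way to extend it to networks, can be sketched as follows. The code uses Fitch's small-parsimony algorithm on binary trees and scores a network as the sum, over sites, of the minimum Fitch cost among the trees the network displays; treating the displayed trees as already enumerated is an assumption of this sketch, and the example alignment is hypothetical.

```python
def fitch(tree, states):
    """Fitch small parsimony on a binary tree of nested tuples; returns (state set, cost)."""
    if isinstance(tree, str):          # leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def column(alignment, i):
    """States of every taxon at site i."""
    return {taxon: seq[i] for taxon, seq in alignment.items()}


def tree_score(tree, alignment):
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, column(alignment, i))[1] for i in range(n_sites))


def network_score(displayed_trees, alignment):
    """Per site, use whichever displayed tree explains it most cheaply."""
    n_sites = len(next(iter(alignment.values())))
    return sum(min(fitch(t, column(alignment, i))[1] for t in displayed_trees)
               for i in range(n_sites))


ALIGNMENT = {"A": "TT", "B": "TG", "C": "GT", "D": "GG"}
T1 = (("A", "B"), ("C", "D"))
T2 = (("A", "C"), ("B", "D"))
print(tree_score(T1, ALIGNMENT), tree_score(T2, ALIGNMENT),
      network_score([T1, T2], ALIGNMENT))  # 3 3 2
```

    Each tree alone costs 3 steps, while a network displaying both costs 2, because each site is explained by whichever tree fits it better; a full method would also have to penalize or bound the number of non-tree events.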

  3. Mapping Phylogenetic Trees to Reveal Distinct Patterns of Evolution.

    Science.gov (United States)

    Kendall, Michelle; Colijn, Caroline

    2016-10-01

    Evolutionary relationships are frequently described by phylogenetic trees, but a central barrier in many fields is the difficulty of interpreting data containing conflicting phylogenetic signals. We present a metric-based method for comparing trees which extracts distinct alternative evolutionary relationships embedded in data. We demonstrate detection and resolution of phylogenetic uncertainty in a recent study of anole lizards, leading to alternate hypotheses about their evolutionary relationships. We use our approach to compare trees derived from different genes of Ebolavirus and find that the VP30 gene has a distinct phylogenetic signature composed of three alternatives that differ in the deep branching structure. Keywords: phylogenetics, evolution, tree metrics, genetics, sequencing. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
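
    The idea of a tree metric can be illustrated with the topological part of the Kendall-Colijn metric: for every pair of tips, record how many edges separate the root from their most recent common ancestor, append a 1 for each pendant edge, and take the Euclidean distance between two trees' vectors. This sketch covers only the branch-length-free case (lambda = 0 in the published metric) and assumes both trees share the same tip labels; the nested-tuple tree format is a hypothetical choice.

```python
import math
from itertools import combinations


def ancestor_paths(tree):
    """Map each tip to the ids of its internal ancestors, listed from the root down."""
    paths, counter = {}, [0]

    def walk(node, prefix):
        if isinstance(node, str):
            paths[node] = prefix
            return
        node_id = counter[0]
        counter[0] += 1
        for child in node:
            walk(child, prefix + [node_id])

    walk(tree, [])
    return paths


def kc_topology_vector(tree):
    """Root-to-MRCA edge counts for all tip pairs, then 1 per pendant edge."""
    paths = ancestor_paths(tree)
    tips = sorted(paths)
    vector = []
    for a, b in combinations(tips, 2):
        pa, pb = paths[a], paths[b]
        shared = 0
        while shared < min(len(pa), len(pb)) and pa[shared] == pb[shared]:
            shared += 1
        vector.append(shared - 1)      # the root itself is at depth 0
    vector.extend([1] * len(tips))
    return tips, vector


def kc_distance(tree1, tree2):
    tips1, v1 = kc_topology_vector(tree1)
    tips2, v2 = kc_topology_vector(tree2)
    if tips1 != tips2:
        raise ValueError("trees must have the same tip labels")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))


print(kc_distance((("A", "B"), "C"), (("A", "C"), "B")))  # about 1.414, i.e. sqrt(2)
```

    Computing this distance for every pair of trees in a set yields a distance matrix that can then be projected or clustered to expose groups of trees with distinct topologies.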

  4. Molecular Phylogenetics: Mathematical Framework and Unsolved Problems

    Science.gov (United States)

    Xia, Xuhua

    Phylogenetic relationships are essential for dating evolutionary events, reconstructing ancestral genes, predicting sites that are important to natural selection, and, ultimately, understanding genomic evolution. Three categories of phylogenetic methods are currently used: distance-based, maximum parsimony, and maximum likelihood methods. Here, I present the mathematical framework of these methods and their rationales, provide computational details for each of them, illustrate analytically and numerically the potential biases inherent in these methods, and outline computational challenges and unresolved problems. This is followed by a brief discussion of the Bayesian approach that has recently been used in molecular phylogenetics.
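
    As a concrete instance of the distance-based category, the Jukes-Cantor (JC69) correction turns the observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The sketch below is a minimal illustration, not code from the paper; skipping sites with a gap in either sequence is a simplifying assumption.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites, ignoring sites with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(seq1, seq2):
    """Jukes-Cantor corrected distance d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(round(jc69_distance("AAAAAAAAAA", "AAAAAAAAAT"), 4))  # 0.1073
```

    The corrected distance is always at least p and grows without bound as p approaches 0.75, which is how the model accounts for multiple substitutions at the same site.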

  5. Transnationalism as a motif in family stories.

    Science.gov (United States)

    Stone, Elizabeth; Gomez, Erica; Hotzoglou, Despina; Lipnitsky, Jane Y

    2005-12-01

    Family stories have long been recognized as a vehicle for assessing components of a family's emotional and social life, including the degree to which an immigrant family has been willing to assimilate. Transnationalism, defined as living in one or more cultures and maintaining connections to both, is now increasingly common. A qualitative study of family stories in families whose members appear completely "American" suggests that an affiliation with one's home country is nevertheless detectable in the stories through motifs such as (1) positively connoted home remedies, (2) continuing denigration of home-country "enemies," (3) extensive knowledge of home-country history and politics, (4) praise of endogamy and negative assessment of exogamy, (5) superiority of the home country to America, and (6) beauty of the home country. Furthermore, an awareness of which model (assimilationist or transnational) governs a family's experience may help clarify a clinician's understanding of a family's strengths, vulnerabilities, and mode of framing their cultural experiences.

  6. Insertion of tetracysteine motifs into dopamine transporter extracellular domains.

    Directory of Open Access Journals (Sweden)

    Deanna M Navaroli

    The neuronal dopamine transporter (DAT) is a major determinant of extracellular dopamine (DA) levels and is the primary target for a variety of addictive and therapeutic psychoactive drugs. DAT is acutely regulated by protein kinase C (PKC) activation and amphetamine exposure, both of which modulate DAT surface expression by endocytic trafficking. In order to use live imaging approaches to study DAT endocytosis, methods are needed to exclusively label the DAT surface pool. The use of membrane-impermeant, sulfonated biarsenic dyes holds potential as one such approach and requires the introduction of an extracellular tetracysteine motif (tetraCys; CCPGCC) to facilitate dye binding. In the current study, we took advantage of intrinsic proline-glycine (Pro-Gly) dipeptides encoded in predicted DAT extracellular domains to introduce tetraCys motifs into DAT extracellular loops 2, 3, and 4. [3H]DA uptake studies, surface biotinylation and fluorescence microscopy in PC12 cells indicate that tetraCys insertion into the DAT second extracellular loop results in a functional transporter that maintains PKC-mediated downregulation. Introduction of tetraCys into extracellular loops 3 and 4 yielded DATs with severely compromised function that failed to mature and traffic to the cell surface. This is the first demonstration of successful introduction of a tetracysteine motif into a DAT extracellular domain, and it may hold promise for the use of biarsenic dyes in live DAT imaging studies.

  7. Phylogenetic community structure: temporal variation in fish assemblage

    OpenAIRE

    Santorelli, Sergio; Magnusson, William; Ferreira, Efrem; Caramaschi, Erica; Zuanon, Jansen; Amadio, Sidnéia

    2014-01-01

    Hypotheses about phylogenetic relationships among species allow inferences about the mechanisms that affect species coexistence. Nevertheless, most studies assume that phylogenetic patterns identified are stable over time. We used data on monthly samples of fish from a single lake over 10 years to show that the structure in phylogenetic assemblages varies over time and conclusions depend heavily on the time scale investigated. The data set was organized in guild structures and temporal scales...

  8. phylo-node: A molecular phylogenetic toolkit using Node.js.

    Science.gov (United States)

    O'Halloran, Damien M

    2017-01-01

    Node.js is an open-source, cross-platform environment that provides a JavaScript codebase for back-end, server-side applications. JavaScript has been used to develop very fast and user-friendly front-end tools for bioinformatic and phylogenetic analyses. However, no toolkits were available that use Node.js to conduct comprehensive molecular phylogenetic analysis. To address this problem, I have developed phylo-node, a stable and scalable toolkit built with Node.js that allows the user to perform diverse molecular and phylogenetic tasks. phylo-node can execute analyses and process the resulting outputs from a suite of software options that provide tools for read processing and genome alignment, sequence retrieval, multiple sequence alignment, primer design, evolutionary modeling, and phylogeny reconstruction. Furthermore, phylo-node enables the user to deploy server-dependent applications, provides simple integration and interoperation with other Node modules and languages using Node inheritance patterns, and includes a customized piping module to support the production of diverse pipelines. phylo-node is open-source and freely available to all users without sign-up or login requirements. All source code and user guidelines are openly available at the GitHub repository: https://github.com/dohalloran/phylo-node.

  9. MicroRNA categorization using sequence motifs and k-mers.

    Science.gov (United States)

    Yousef, Malik; Khalifa, Waleed; Acar, İlhan Erkin; Allmer, Jens

    2017-03-14

    Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer, and microRNAs (miRNAs) play a key role in modulating translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotes. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features have been proposed to describe pre-miRNAs, and we have previously introduced sequence motifs and k-mers as useful ones. There have been reports of xeno-miRNAs detected via next-generation sequencing. However, these may be contaminants, and to aid that important decision-making process, we aimed to establish a means to differentiate pre-miRNAs from different species. To distinguish species, we used one species' pre-miRNAs as the positive and another species' pre-miRNAs as the negative training and test data to build machine-learned models with sequence motifs and k-mers as features. This approach resulted in higher accuracy between distantly related species, while more closely related species produced lower accuracy. We were able to differentiate among species with increasing success as the evolutionary distance increased. The fairly good discrimination achieved even between relatively closely related species is consistent with previous reports of fast evolutionary change in miRNAs.
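
    A k-mer feature vector of the kind used as classifier input can be sketched as follows: count every overlapping substring of length k and normalize by the number of counted windows. The RNA alphabet, the T-to-U conversion, and the use of relative frequencies are assumptions of this sketch rather than the authors' exact feature definitions; the resulting vectors would then be labelled by species and passed to a classifier of choice.

```python
from itertools import product


def kmer_frequencies(seq, k, alphabet="ACGU"):
    """Relative frequency of every possible k-mer (in a fixed order) in an RNA sequence."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:           # windows with ambiguous bases are skipped
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


features = kmer_frequencies("UGAGGUAGUAGGUUGUAUAGUU", k=2)  # example sequence
print(len(features), round(sum(features), 6))  # 16 1.0
```

    Because every sequence maps to a vector of the same length (4^k entries), pre-miRNAs of different lengths can be compared directly; larger k captures longer motifs at the cost of sparser vectors.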

  10. Comparative genomics of metabolic capacities of regulons controlled by cis-regulatory RNA motifs in bacteria.

    Science.gov (United States)

    Sun, Eric I; Leyn, Semen A; Kazanov, Marat D; Saier, Milton H; Novichkov, Pavel S; Rodionov, Dmitry A

    2013-09-02

    In silico comparative genomics approaches have been efficiently used for functional prediction and reconstruction of metabolic and regulatory networks. Riboswitches are metabolite-sensing structures often found in bacterial mRNA leaders controlling gene expression on transcriptional or translational levels. An increasing number of riboswitches and other cis-regulatory RNAs have been recently classified into numerous RNA families in the Rfam database. High conservation of these RNA motifs provides a unique advantage for their genomic identification and comparative analysis. A comparative genomics approach implemented in the RegPredict tool was used for reconstruction and functional annotation of regulons controlled by RNAs from 43 Rfam families in diverse taxonomic groups of Bacteria. The inferred regulons include ~5200 cis-regulatory RNAs and more than 12000 target genes in 255 microbial genomes. All predicted RNA-regulated genes were classified into specific and overall functional categories. Analysis of taxonomic distribution of these categories allowed us to establish major functional preferences for each analyzed cis-regulatory RNA motif family. Overall, most RNA motif regulons showed predictable functional content in accordance with their experimentally established effector ligands. Our results suggest that some RNA motifs (including thiamin pyrophosphate and cobalamin riboswitches that control the cofactor metabolism) are widespread and likely originated from the last common ancestor of all bacteria. However, many more analyzed RNA motifs are restricted to a narrow taxonomic group of bacteria and likely represent more recent evolutionary innovations. The reconstructed regulatory networks for major known RNA motifs substantially expand the existing knowledge of transcriptional regulation in bacteria. The inferred regulons can be used for genetic experiments, functional annotations of genes, metabolic reconstruction and evolutionary analysis. 
The obtained genome

  11. Spatiotemporal network motif reveals the biological traits of developmental gene regulatory networks in Drosophila melanogaster

    Directory of Open Access Journals (Sweden)

    Kim Man-Sun

    2012-05-01

    Background: Network motifs have provided a “conceptual tool” for understanding the functional principles of biological networks, but such motifs have primarily been used to consider static network structures. Static networks, however, cannot reveal time- and region-specific traits of biological systems. To overcome this limitation, we proposed the concept of a “spatiotemporal network motif”, a spatiotemporal sequence of network motifs of sub-networks that are active only at specific time points and in specific body parts. Results: On the basis of this concept, we analyzed the developmental gene regulatory network of the Drosophila melanogaster embryo. We identified spatiotemporal network motifs and investigated their distribution patterns in time and space. As a result, we determined how key developmental processes are temporally and spatially regulated by the gene network. In particular, we found that nested feedback loops appeared frequently throughout the entire developmental process. From mathematical simulations, we found that mutual inhibition in the nested feedback loops contributes to the formation of spatial expression patterns. Conclusions: Taken together, the proposed concept and the simulations can be used to unravel the design principles of developmental gene regulatory networks.
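
    Mutual inhibition between two genes is the classic way a feedback loop produces two alternative stable states, which is one route to spatially distinct expression patterns. The sketch below integrates a generic two-gene mutual-repression model with Euler steps; the equations, the parameters alpha = 4 and n = 2, and the function name are illustrative assumptions, not the Drosophila network analysed in the paper. With these parameters the two stable states are (2 + sqrt(3), 2 - sqrt(3)) and its mirror image, and the initial condition decides which one is reached.

```python
def simulate_toggle(x0, y0, alpha=4.0, n=2, dt=0.01, steps=5000):
    """Euler integration of dx/dt = alpha/(1+y^n) - x, dy/dt = alpha/(1+x^n) - y."""
    x, y = x0, y0
    for _ in range(steps):
        dx = alpha / (1.0 + y ** n) - x
        dy = alpha / (1.0 + x ** n) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y


# Same parameters, different starting points, different final states.
print(simulate_toggle(2.0, 0.1))   # close to (3.732, 0.268)
print(simulate_toggle(0.1, 2.0))   # close to (0.268, 3.732)
```

    If neighbouring cells start with slightly different levels of the two genes, each settles into the state its starting bias selects, so a shallow initial gradient can be converted into a sharp expression boundary.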

  12. On universal common ancestry, sequence similarity, and phylogenetic structure: the sins of P-values and the virtues of Bayesian evidence

    Directory of Open Access Journals (Sweden)

    Theobald Douglas L

    2011-11-01

    Background: The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. Nevertheless, both the status and the nature of UCA have recently been questioned. In earlier work I presented a formal, quantitative test of UCA in which model selection criteria overwhelmingly choose common ancestry over independent ancestry, based on a dataset of universally conserved proteins. These model-based tests are founded in likelihoodist and Bayesian probability theory, in opposition to classical frequentist null hypothesis tests such as Karlin-Altschul E-values for sequence similarity. In a recent comment, Koonin and Wolf (K&W) claim that the model preference for UCA is "a trivial consequence of significant sequence similarity". They support this claim with a computational simulation, derived from universally conserved proteins, which produces similar sequences lacking phylogenetic structure. The model selection tests prefer common ancestry for this artificial data set. Results: For the real universal protein sequences, hierarchical phylogenetic structure (induced by genealogical history) is the overriding reason why the tests choose UCA; sequence similarity is a relatively minor factor. First, for cases of conflicting phylogenetic structure, the tests choose independent ancestry even with highly similar sequences. Second, certain models, like star trees and K&W's profile model (corresponding to their simulation), readily explain sequence similarity yet lack phylogenetic structure. However, these are extremely poor models for the real proteins, even worse than independent ancestry models, though they explain K&W's artificial data well. Finally, K&W's simulation is an implementation of a well-known phylogenetic model, and it produces sequences that mimic homologous proteins. 
Therefore the model selection tests work appropriately with the artificial

  13. Modelling a 3D structure for EgDf1 from Echinococcus granulosus: putative epitopes, phosphorylation motifs and ligand

    Science.gov (United States)

    Paulino, M.; Esteves, A.; Vega, M.; Tabares, G.; Ehrlich, R.; Tapia, O.

    1998-07-01

    EgDf1 is a developmentally regulated protein from the parasite Echinococcus granulosus related to a family of hydrophobic ligand binding proteins. This protein could play a crucial role during the parasite's life cycle, since this organism is unable to synthesize most of its own lipids de novo. Furthermore, it has been shown that two related proteins from other parasitic platyhelminths (Fh15 from Fasciola hepatica and Sm14 from Schistosoma mansoni) are able to confer protective immunity against experimental infection in animal models. A three-dimensional structure would help establish structure/function relationships in a knowledge-based manner. 3D structures for the EgDf1 protein were modelled using myelin P2 (mP2) and intestinal fatty acid binding protein (I-FABP) as templates. Molecular dynamics techniques were used to validate the models. Template mP2 yielded the best 3D structure for EgDf1. Palmitic and oleic acids were docked inside EgDf1. The present theoretical results suggest definite locations in the secondary structure for the epitopic regions and consensus phosphorylation motifs, and point to oleic acid as a good ligand candidate for EgDf1. This protein might well be involved in supplying hydrophobic metabolites for membrane biosynthesis and for signaling pathways.

  14. Functional motifs responsible for human metapneumovirus M2-2-mediated innate immune evasion.

    Science.gov (United States)

    Chen, Yu; Deng, Xiaoling; Deng, Junfang; Zhou, Jiehua; Ren, Yuping; Liu, Shengxuan; Prusak, Deborah J; Wood, Thomas G; Bao, Xiaoyong

    2016-12-01

    Human metapneumovirus (hMPV) is a major cause of lower respiratory infection in young children. Repeated infections occur throughout life, but its immune evasion mechanisms are largely unknown. We recently found that the hMPV M2-2 protein elicits immune evasion by targeting mitochondrial antiviral-signaling protein (MAVS), an antiviral signaling molecule. However, the molecular mechanisms underlying such inhibition are not known. Our mutagenesis studies revealed that the PDZ-binding motifs 29-DEMI-32 and 39-KEALSDGI-46, located in an immune inhibitory region of M2-2, are responsible for M2-2-mediated immune evasion. We also found that both motifs prevent TRAF5 and TRAF6, the MAVS downstream adaptors, from being recruited to MAVS, while the motif 39-KEALSDGI-46 also blocks TRAF3 from migrating to MAVS. In parallel, these TRAFs are important for the activation of the transcription factors NF-kB and/or IRF-3 by hMPV. Our findings collectively demonstrate that M2-2 uses its PDZ motifs to launch hMPV immune evasion by blocking the interaction of MAVS with its downstream TRAFs. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. Core signalling motif displaying multistability through multi-state enzymes

    DEFF Research Database (Denmark)

    Feng, Song; Saez Cornellana, Meritxell; Wiuf, Carsten Henrik

    2016-01-01

    Bistability, and more generally multistability, is a key system dynamics feature enabling decision-making and memory in cells. Deciphering the molecular determinants of multistability is thus crucial for a better understanding of cellular pathways and their (re)engineering in synthetic biology. ... Here, we show that a key motif found predominantly in eukaryotic signalling systems, namely a futile signalling cycle, can display bistability when featuring a two-state kinase. We provide necessary and sufficient mathematical conditions on the kinetic parameters of this motif that guarantee ... the existence of multiple steady states. These conditions foster the intuition that bistability arises as a consequence of competition between the two states of the kinase. Extending from this result, we find that increasing the number of kinase states linearly translates into an increase in the number...

  16. PDL1 Signals through Conserved Sequence Motifs to Overcome Interferon-Mediated Cytotoxicity

    Directory of Open Access Journals (Sweden)

    Maria Gato-Cañas

    2017-08-01

    PDL1 blockade produces remarkable clinical responses, thought to occur by T cell reactivation through prevention of PDL1-PD1 T cell inhibitory interactions. Here, we find that PDL1 cell-intrinsic signaling protects cancer cells from interferon (IFN) cytotoxicity and accelerates tumor progression. PDL1 inhibited IFN signal transduction through a conserved class of sequence motifs that mediate crosstalk with IFN signaling. Abrogation of PDL1 expression or antibody-mediated PDL1 blockade strongly sensitized cancer cells to IFN cytotoxicity through a STAT3/caspase-7-dependent pathway. Moreover, somatic mutations found in human carcinomas within these PDL1 sequence motifs disrupted motif regulation, resulting in PDL1 molecules with enhanced protective activities from type I and type II IFN cytotoxicity. Overall, our results reveal a mode of action of PDL1 in cancer cells as a first line of defense against IFN cytotoxicity.

  17. Virulence, serotype and phylogenetic groups of diarrhoeagenic ...

    African Journals Online (AJOL)

    Dr DADIE Thomas

    2014-02-17

    The virulence, serotype and phylogenetic traits of diarrhoeagenic Escherichia coli were detected in 502 strains isolated during digestive infections. Molecular detection of the target virulence genes, the rfb gene of operon O and the phylogenetic grouping genes chuA, yjaA and TspE4.C2 was performed.

  18. Discovery and validation of information theory-based transcription factor and cofactor binding site motifs.

    Science.gov (United States)

    Lu, Ruipeng; Mucaki, Eliseos J; Rogan, Peter K

    2017-03-17

    Data from ChIP-seq experiments can be used to derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Identification of a Baeyer-Villiger monooxygenase sequence motif

    NARCIS (Netherlands)

    Fraaije, MW; Kamerbeek, NM; van Berkel, WJH; Janssen, DB

    2002-01-01

    Baeyer-Villiger monooxygenases (BVMOs) form a distinct class of flavoproteins that catalyze the insertion of an oxygen atom in a C-C bond using dioxygen and NAD(P)H. Using newly characterized BVMO sequences, we have uncovered a BVMO-identifying sequence motif: FXGXXXRXXXW(P/D). Studies with
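
    The motif above uses the usual sequence-pattern shorthand, in which X stands for any residue and (P/D) for either proline or aspartate. As a hedged illustration only (not the authors' procedure), the pattern can be scanned with a regular expression. The function name `find_bvmo_motifs` and the toy sequence are hypothetical, and a regex hit is merely a candidate, not proof that a protein is a BVMO.

```python
import re

# FXGXXXRXXXW(P/D): "X" (any residue) becomes ".", "(P/D)" becomes "[PD]".
BVMO_PATTERN = "F.G...R...W[PD]"
# Wrapping the pattern in a lookahead with a capture group lets
# finditer report overlapping occurrences as well.
_SCAN = re.compile(f"(?=({BVMO_PATTERN}))")


def find_bvmo_motifs(protein_seq: str) -> list[tuple[int, str]]:
    """Hypothetical helper: return (1-based start, matched segment) per hit."""
    return [(m.start() + 1, m.group(1)) for m in _SCAN.finditer(protein_seq.upper())]


toy = "MKTFSGAAARLLLWPQE"  # made-up sequence, not a real BVMO
print(find_bvmo_motifs(toy))  # [(4, 'FSGAAARLLLWP')]
```

    The input is upper-cased first so that lowercase sequence strings are scanned the same way.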

  20. Phylogenetic framework for coevolutionary studies: a compass for exploring jungles of tangled trees.

    Science.gov (United States)

    Martínez-Aquino, Andrés

    2016-08-01

    Phylogenetics is used to detect past evolutionary events, from how species originated to how their ecological interactions with other species arose, which can mirror cophylogenetic patterns. Cophylogenetic reconstructions uncover past ecological relationships between taxa through inferred coevolutionary events on trees, for example, codivergence, duplication, host-switching, and loss. These events can be detected by cophylogenetic analyses based on nodes and the length and branching pattern of the phylogenetic trees of symbiotic associations, for example, host-parasite. In the past 2 decades, algorithms have been developed for cophylogenetic analyses and implemented in different software, for example, statistical congruence index and event-based methods. Based on the combination of these approaches, it is possible to integrate temporal information into cophylogenetic inference, such as estimates of lineage divergence times between 2 taxa, for example, hosts and parasites. Additionally, advances in phylogenetic biogeography that apply methods based on parametric process models and combined Bayesian approaches can be useful for interpreting coevolutionary histories in a scenario of biogeographical area connectivity through time. This article briefly reviews the basics of parasitology and provides an overview of software packages for cophylogenetic methods. Thus, the objective here is to present a phylogenetic framework for coevolutionary studies, with special emphasis on groups of parasitic organisms. Researchers wishing to undertake phylogeny-based coevolutionary studies can use this review as a "compass" when "walking" through jungles of tangled phylogenetic trees.

  1. Phylogenetic analysis and protein structure modelling identifies distinct Ca(2+)/Cation antiporters and conservation of gene family structure within Arabidopsis and rice species.

    Science.gov (United States)

    Pittman, Jon K; Hirschi, Kendal D

    2016-12-01

    The Ca(2+)/Cation Antiporter (CaCA) superfamily is an ancient and widespread family of ion-coupled cation transporters found in nearly all kingdoms of life. In animals, K(+)-dependent and K(+)-independent Na(+)/Ca(2+) exchangers (NCKX and NCX) are important CaCA members. Recently it was proposed that all rice and Arabidopsis CaCA proteins should be classified as NCX proteins. Here we performed phylogenetic analysis of CaCA genes and protein structure homology modelling to further characterise members of this transporter superfamily. Phylogenetic analysis of rice and Arabidopsis CaCAs in comparison with selected CaCA members from non-plant species demonstrated that these genes form clearly distinct families, with the H(+)/Cation exchanger (CAX) and cation/Ca(2+) exchanger (CCX) families dominant in higher plants but the NCKX and NCX families absent. NCX-related Mg(2+)/H(+) exchanger (MHX) and CAX-related Na(+)/Ca(2+) exchanger-like (NCL) proteins are instead present. Analysis of genomes of ten closely-related rice species and four Arabidopsis-related species found that CaCA gene family structures are highly conserved within related plants, apart from minor variation. Protein structures were modelled for OsCAX1a and OsMHX1. Despite broad structural conservation, clear structural differences were observed between the different CaCA types. Members of the CaCA superfamily form clearly distinct families with different phylogenetic, structural and functional characteristics, and therefore should not be simply classified as NCX proteins, which should remain as a separate gene family.

  2. Assessment of algorithms for inferring positional weight matrix motifs of transcription factor binding sites using protein binding microarray data.

    Directory of Open Access Journals (Sweden)

    Yaron Orenstein

    The new technology of protein binding microarrays (PBMs) allows simultaneous measurement of the binding intensities of a transcription factor to tens of thousands of synthetic double-stranded DNA probes, covering all possible 10-mers. A key computational challenge is inferring the binding motif from these data. We present a systematic comparison of four methods developed specifically for reconstructing a binding site motif represented as a positional weight matrix from PBM data. The reconstructed motifs were evaluated in terms of three criteria: concordance with reference motifs from the literature and ability to predict in vivo and in vitro binding. The evaluation encompassed over 200 transcription factors and some 300 assays. The results show a tradeoff between how the methods perform according to the different criteria, and a dichotomy of method types. Algorithms that construct motifs with low information content predict PBM probe ranking more faithfully, while methods that produce highly informative motifs match reference motifs better. Interestingly, in predicting high-affinity binding, all methods give far poorer results for in vivo assays compared to in vitro assays.
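
    The tradeoff above is framed in terms of motif information content. As a minimal sketch (not any of the four compared algorithms), the standard information content of a PWM column, measured in bits against an assumed uniform A/C/G/T background, is 2 minus the Shannon entropy of that column. The helper names and the toy matrix below are hypothetical.

```python
import math


def column_information(probs: dict[str, float]) -> float:
    """Bits of information in one PWM column, assuming a uniform ACGT background."""
    # Shannon entropy in bits; zero-probability bases contribute nothing.
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return 2.0 - entropy  # log2(4) = 2 bits is the maximum for DNA


def motif_information(pwm: list[dict[str, float]]) -> float:
    """Total information content: the sum over all PWM columns."""
    return sum(column_information(col) for col in pwm)


toy_pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # fully conserved: 2 bits
    {"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0},      # two-way choice: 1 bit
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative: 0 bits
]
print(motif_information(toy_pwm))  # 3.0
```

    Under a non-uniform genomic background, the relative-entropy form (the sum of p log2(p/q) over bases, with q the background frequency) would replace the "2 minus entropy" shortcut; the two coincide when every q is 0.25.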

  3. Computational study of the fibril organization of polyglutamine repeats reveals a common motif identified in beta-helices.

    Science.gov (United States)

    Zanuy, David; Gunasekaran, Kannan; Lesk, Arthur M; Nussinov, Ruth

    2006-04-21

    The formation of fibril aggregates by long polyglutamine sequences is assumed to play a major role in neurodegenerative diseases such as Huntington's disease. Here, we model peptides rich in glutamine, through a series of molecular dynamics simulations. Starting from a rigid nanotube-like conformation, we have obtained a new conformational template that shares structural features of a tubular helix and of a beta-helix conformational organization. Our new model can be described as a super-helical arrangement of flat beta-sheet segments linked by planar turns or bends. Interestingly, our comprehensive analysis of the Protein Data Bank reveals that this is a common motif in beta-helices (termed beta-bend), although it has not been identified so far. The motif is based on the alternation of beta-sheet and helical conformation as the protein sequence is followed from the N to the C termini (beta-alpha(R)-beta-polyPro-beta). We further identify this motif in the ssNMR structure of the protofibril of the amyloidogenic peptide Abeta(1-40). The recurrence of the beta-bend suggests a general mode of connecting long parallel beta-sheet segments that would allow the growth of partially ordered fibril structures. The design allows the peptide backbone to change direction with a minimal loss of main chain hydrogen bonds. The identification of a coherent organization beyond that of the beta-sheet segments in different folds rich in parallel beta-sheets suggests a higher degree of ordered structure in protein fibrils, in agreement with their low solubility and dense molecular packing.

  4. The prevalence of terraced treescapes in analyses of phylogenetic data sets.

    Science.gov (United States)

    Dobrin, Barbara H; Zwickl, Derrick J; Sanderson, Michael J

    2018-04-04

    The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature and investigated the terraces to which the trees would belong, under a common set of inference assumptions. We examined terrace size as a function of the sampling properties of the data sets, including taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling "sufficiency". We evaluated each data set in relation to the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree, and explored the impact of the terraces found in replicate trees in bootstrap methods. Terraces were identified in nearly all data sets with taxon coverage densities [...] tree. Terraces found during bootstrap resampling reduced overall support. If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates with data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced. Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), unlinked models have been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.

  5. Factoring local sequence composition in motif significance analysis.

    Science.gov (United States)

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  6. The SOD gene family in tomato: identification, phylogenetic relationships and expression patterns

    Directory of Open Access Journals (Sweden)

    Kun Feng

    2016-08-01

    Superoxide dismutases (SODs) are critical antioxidant enzymes that protect organisms from reactive oxygen species (ROS) caused by adverse conditions, and have been widely found in the cytoplasm, chloroplasts, and mitochondria of eukaryotic and prokaryotic cells. Tomato (Solanum lycopersicum L.) is an important economic crop and is cultivated worldwide. However, abiotic and biotic stresses severely hinder growth and development of the plant, which affects the production and quality of the crop. To reveal the potential roles of SOD genes under various stresses, we performed a systematic analysis of the tomato SOD gene family and analyzed the expression patterns of SlSOD genes in response to abiotic stresses at the whole-genome level. The characteristics of the SlSOD gene family were determined by analyzing gene structure, conserved motifs, chromosomal distribution, phylogenetic relationships, and expression patterns. We determined that there are at least nine SOD genes in tomato, including four Cu/ZnSODs, three FeSODs, and one MnSOD, and they are unevenly distributed on 12 chromosomes. Phylogenetic analyses separated SOD genes from tomato and other plant species into two groups with high bootstrap values, indicating that these SOD genes were present before the monocot-dicot split. Additionally, many cis-elements that respond to different stresses were found in the promoters of nine SlSOD genes. Gene expression analysis based on RNA-seq data showed that most genes were expressed in all tested tissues, with the exception of SlSOD6 and SlSOD8, which were only expressed in young fruits. Microarray data analysis showed that the expression of most members of the SlSOD gene family was altered under salt- and drought-stress conditions. This genome-wide analysis of SlSOD genes helps to clarify the function of SlSOD genes under different stress conditions and provides information to aid in further understanding the evolutionary relationships of SOD genes in plants.

  7. ["Long-branch Attraction" artifact in phylogenetic reconstruction].

    Science.gov (United States)

    Li, Yi-Wei; Yu, Li; Zhang, Ya-Ping

    2007-06-01

    Phylogenetic reconstruction among various organisms not only helps us understand their evolutionary history but also sheds light on several fundamental evolutionary questions. Understanding the evolutionary relationships among organisms lays the foundation for investigations in other biological disciplines. However, almost all widely used phylogenetic methods have limitations that prevent them from effectively eliminating systematic errors, which can hinder reconstruction of the true organismal relationships. The "Long-branch Attraction" (LBA) artifact is one of the most disturbing factors in phylogenetic reconstruction. In this review, we summarize the concept of LBA, methods for analyzing it, and strategies for avoiding it, and provide several typical examples. Approaches to avoiding and resolving the LBA artifact are also discussed.

  8. A Maximum Parsimony Model to Reconstruct Phylogenetic Network in Honey Bee Evolution

    OpenAIRE

    Usha Chouhan; K. R. Pardasani

    2007-01-01

    Phylogenies, the evolutionary histories of groups of species, are one of the most widely used tools throughout the life sciences, as well as objects of research within systematic and evolutionary biology. In every phylogenetic analysis, reconstruction produces trees. These trees represent the evolutionary histories of many groups of organisms, bacteria due to horizontal gene transfer and plants due to process of hybridization. The process of gene transfer in bacteria and hyb...

  9. Phylogenetic patterns of extinction risk in the eastern arc ecosystems, an African biodiversity hotspot.

    Science.gov (United States)

    Yessoufou, Kowiyou; Daru, Barnabas H; Davies, T Jonathan

    2012-01-01

    There is an urgent need to reduce drastically the rate at which biodiversity is declining worldwide. Phylogenetic methods are increasingly being recognised as providing a useful framework for predicting future losses, and guiding efforts for pre-emptive conservation actions. In this study, we used a reconstructed phylogenetic tree of angiosperm species of the Eastern Arc Mountains - an important African biodiversity hotspot - and described the distribution of extinction risk across taxonomic ranks and phylogeny. We provide evidence for both taxonomic and phylogenetic selectivity in extinction risk. However, we found that selectivity varies with IUCN extinction risk category. Vulnerable species are more closely related than expected by chance, whereas endangered and critically endangered species are not significantly clustered on the phylogeny. We suggest that the general observation of taxonomic and phylogenetic selectivity (i.e. phylogenetic signal, the tendency of closely related species to share similar traits) in extinction risks is therefore largely driven by vulnerable species, and not necessarily the most highly threatened. We also used information on altitudinal distribution and climate to generate a predictive model of at-risk species richness, and found that greater threatened species richness is found at higher altitude, allowing for more informed conservation decision making. Our results indicate that evolutionary history can help predict plant susceptibility to extinction threats in the hyper-diverse but woefully understudied Eastern Arc Mountains, and illustrate the contribution of phylogenetic approaches in conserving African floristic biodiversity where detailed ecological and evolutionary data are often lacking.

  10. Phylogenetic patterns of extinction risk in the eastern arc ecosystems, an African biodiversity hotspot.

    Directory of Open Access Journals (Sweden)

    Kowiyou Yessoufou

    There is an urgent need to reduce drastically the rate at which biodiversity is declining worldwide. Phylogenetic methods are increasingly being recognised as providing a useful framework for predicting future losses, and guiding efforts for pre-emptive conservation actions. In this study, we used a reconstructed phylogenetic tree of angiosperm species of the Eastern Arc Mountains - an important African biodiversity hotspot - and described the distribution of extinction risk across taxonomic ranks and phylogeny. We provide evidence for both taxonomic and phylogenetic selectivity in extinction risk. However, we found that selectivity varies with IUCN extinction risk category. Vulnerable species are more closely related than expected by chance, whereas endangered and critically endangered species are not significantly clustered on the phylogeny. We suggest that the general observation of taxonomic and phylogenetic selectivity (i.e. phylogenetic signal, the tendency of closely related species to share similar traits) in extinction risks is therefore largely driven by vulnerable species, and not necessarily the most highly threatened. We also used information on altitudinal distribution and climate to generate a predictive model of at-risk species richness, and found that greater threatened species richness is found at higher altitude, allowing for more informed conservation decision making. Our results indicate that evolutionary history can help predict plant susceptibility to extinction threats in the hyper-diverse but woefully understudied Eastern Arc Mountains, and illustrate the contribution of phylogenetic approaches in conserving African floristic biodiversity where detailed ecological and evolutionary data are often lacking.

  11. DNA regulatory motif selection based on support vector machine ...

    African Journals Online (AJOL)

    ... machine (SVM) and its application in microarray experiment of Kashin-Beck disease. ... speed and amount of the corresponding mRNA in gene replication process. ... and revealed that some motifs may be related to the immune reactions.

  12. A novel fibron