WorldWideScience

Sample records for eukaryotic structural genomics

  1. Susceptibilities to DNA Structural Transitions within Eukaryotic Genomes

    Science.gov (United States)

    Zhabinskaya, Dina; Benham, Craig; Madden, Sally

    2012-02-01

    We analyze the competitive transitions to alternate secondary DNA structures in a negatively supercoiled DNA molecule of kilobase length and specified base sequence. We use statistical mechanics to calculate the competition among all regions within the sequence that are susceptible to transitions to alternate structures. We use an approximate numerical method since the calculation of an exact partition function is numerically cumbersome for DNA molecules of lengths longer than hundreds of base pairs. This method yields accurate results in reasonable computational times. We implement algorithms that calculate the competition between transitions to denatured states and to Z-form DNA. We analyze these transitions near the transcription start sites (TSS) of a set of eukaryotic genes. We find an enhancement of Z-forming regions upstream of the TSS and a depletion of denatured regions around the start sites. We confirm that these finding are statistically significant by comparing our results to a set of randomized genes with preserved base composition at each position relative to the gene start sites. When we study the correlation of these transitions in orthologous mouse and human genes we find a clear evolutionary conservation of both types of transitions around the TSS.

  2. The eukaryotic genome is structurally and functionally more like a social insect colony than a book.

    Science.gov (United States)

    Qiu, Guo-Hua; Yang, Xiaoyan; Zheng, Xintian; Huang, Cuiqin

    2017-11-01

    Traditionally, the genome has been described as the 'book of life'. However, the metaphor of a book may not reflect the dynamic nature of the structure and function of the genome. In the eukaryotic genome, the number of centrally located protein-coding sequences is relatively constant across species, but the amount of noncoding DNA increases considerably with the increase of organismal evolutional complexity. Therefore, it has been hypothesized that the abundant peripheral noncoding DNA protects the genome and the central protein-coding sequences in the eukaryotic genome. Upon comparison with the habitation, sociality and defense mechanisms of a social insect colony, it is found that the genome is similar to a social insect colony in various aspects. A social insect colony may thus be a better metaphor than a book to describe the spatial organization and physical functions of the genome. The potential implications of the metaphor are also discussed.

  3. Comparative Genomics of Eukaryotes.

    NARCIS (Netherlands)

    Noort, V. van

    2007-01-01

    This thesis focuses on developing comparative genomics methods in eukaryotes, with an emphasis on applications for gene function prediction and regulatory element detection. In the past, methods have been developed to predict functional associations between gene pairs in prokaryotes. The challenge

  4. Long-Range Order and Fractality in the Structure and Organization of Eukaryotic Genomes

    Science.gov (United States)

    Polychronopoulos, Dimitris; Tsiagkas, Giannis; Athanasopoulou, Labrini; Sellis, Diamantis; Almirantis, Yannis

    2014-12-01

    The late Professor J.S. Nicolis always emphasized, both in his writings and in presentations and discussions with students and friends, the relevance of a dynamical systems approach to biology. In particular, viewing the genome as a "biological text" captures the dynamical character of both the evolution and function of the organisms in the form of correlations indicating the presence of a long-range order. This genomic structure can be expressed in forms reminiscent of natural languages and several temporal and spatial traces l by the functioning of dynamical systems: Zipf laws, self-similarity and fractality. Here we review several works of our group and recent unpublished results, focusing on the chromosomal distribution of biologically active genomic components: Genes and protein-coding segments, CpG islands, transposable elements belonging to all major classes and several types of conserved non-coding genomic elements. We report the systematic appearance of power-laws in the size distribution of the distances between elements belonging to each of these types of functional genomic elements. Moreover, fractality is also found in several cases, using box-counting and entropic scaling.We present here, for the first time in a unified way, an aggregative model of the genomic dynamics which can explain the observed patterns on the grounds of known phenomena accompanying genome evolution. Our results comply with recent findings about a "fractal globule" geometry of chromatin in the eukaryotic nucleus.

  5. Comparative genomics and evolution of eukaryotic phospholipidbiosynthesis

    Energy Technology Data Exchange (ETDEWEB)

    Lykidis, Athanasios

    2006-12-01

    Phospholipid biosynthetic enzymes produce diverse molecular structures and are often present in multiple forms encoded by different genes. This work utilizes comparative genomics and phylogenetics for exploring the distribution, structure and evolution of phospholipid biosynthetic genes and pathways in 26 eukaryotic genomes. Although the basic structure of the pathways was formed early in eukaryotic evolution, the emerging picture indicates that individual enzyme families followed unique evolutionary courses. For example, choline and ethanolamine kinases and cytidylyltransferases emerged in ancestral eukaryotes, whereas, multiple forms of the corresponding phosphatidyltransferases evolved mainly in a lineage specific manner. Furthermore, several unicellular eukaryotes maintain bacterial-type enzymes and reactions for the synthesis of phosphatidylglycerol and cardiolipin. Also, base-exchange phosphatidylserine synthases are widespread and ancestral enzymes. The multiplicity of phospholipid biosynthetic enzymes has been largely generated by gene expansion in a lineage specific manner. Thus, these observations suggest that phospholipid biosynthesis has been an actively evolving system. Finally, comparative genomic analysis indicates the existence of novel phosphatidyltransferases and provides a candidate for the uncharacterized eukaryotic phosphatidylglycerol phosphate phosphatase.

  6. Evidence of pervasive biologically functional secondary structures within the genomes of eukaryotic single-stranded DNA viruses.

    Science.gov (United States)

    Muhire, Brejnev Muhizi; Golden, Michael; Murrell, Ben; Lefeuvre, Pierre; Lett, Jean-Michel; Gray, Alistair; Poon, Art Y F; Ngandu, Nobubelo Kwanele; Semegni, Yves; Tanov, Emil Pavlov; Monjane, Adérito Luis; Harkins, Gordon William; Varsani, Arvind; Shepherd, Dionne Natalie; Martin, Darren Patrick

    2014-02-01

    Single-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here.

  7. Compositional patterns in the genomes of unicellular eukaryotes.

    Science.gov (United States)

    Costantini, Maria; Alvarez-Valin, Fernando; Costantini, Susan; Cammarano, Rosalia; Bernardi, Giorgio

    2013-11-05

    The genomes of multicellular eukaryotes are compartmentalized in mosaics of isochores, large and fairly homogeneous stretches of DNA that belong to a small number of families characterized by different average GC levels, by different gene concentration (that increase with GC), different chromatin structures, different replication timing in the cell cycle, and other different properties. A question raised by these basic results concerns how far back in evolution the compartmentalized organization of the eukaryotic genomes arose. In the present work we approached this problem by studying the compositional organization of the genomes from the unicellular eukaryotes for which full sequences are available, the sample used being representative. The average GC levels of the genomes from unicellular eukaryotes cover an extremely wide range (19%-60% GC) and the compositional patterns of individual genomes are extremely different but all genomes tested show a compositional compartmentalization. The average GC range of the genomes of unicellular eukaryotes is very broad (as broad as that of prokaryotes) and individual compositional patterns cover a very broad range from very narrow to very complex. Both features are not surprising for organisms that are very far from each other both in terms of phylogenetic distances and of environmental life conditions. Most importantly, all genomes tested, a representative sample of all supergroups of unicellular eukaryotes, are compositionally compartmentalized, a major difference with prokaryotes.

  8. Insight into structure and assembly of the nuclear pore complex by utilizing the genome of a eukaryotic thermophile

    DEFF Research Database (Denmark)

    Amlacher, Stefan; Sarges, Phillip; Flemming, Dirk

    2011-01-01

    is composed of two large Nups, Nup192 and Nup170, which are flexibly bridged by short linear motifs made up of linker Nups, Nic96 and Nup53. This assembly illustrates how Nup interactions can generate structural plasticity within the NPC scaffold. Our findings therefore demonstrate the utility of the genome...

  9. GenColors-based comparative genome databases for small eukaryotic genomes.

    Science.gov (United States)

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  10. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~;;10k genes. ~;;12percent of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~;;9percent rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also,>90percent of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. Weconclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  11. Genomic impact of eukaryotic transposable elements.

    Science.gov (United States)

    Arkhipova, Irina R; Batzer, Mark A; Brosius, Juergen; Feschotte, Cédric; Moran, John V; Schmitz, Jürgen; Jurka, Jerzy

    2012-11-21

    The third international conference on the genomic impact of eukaryotic transposable elements (TEs) was held 24 to 28 February 2012 at the Asilomar Conference Center, Pacific Grove, CA, USA. Sponsored in part by the National Institutes of Health grant 5 P41 LM006252, the goal of the conference was to bring together researchers from around the world who study the impact and mechanisms of TEs using multiple computational and experimental approaches. The meeting drew close to 170 attendees and included invited floor presentations on the biology of TEs and their genomic impact, as well as numerous talks contributed by young scientists. The workshop talks were devoted to computational analysis of TEs with additional time for discussion of unresolved issues. Also, there was ample opportunity for poster presentations and informal evening discussions. The success of the meeting reflects the important role of Repbase in comparative genomic studies, and emphasizes the need for close interactions between experimental and computational biologists in the years to come.

  12. Genomic impact of eukaryotic transposable elements

    Directory of Open Access Journals (Sweden)

    Arkhipova Irina R

    2012-11-01

    Full Text Available Abstract The third international conference on the genomic impact of eukaryotic transposable elements (TEs was held 24 to 28 February 2012 at the Asilomar Conference Center, Pacific Grove, CA, USA. Sponsored in part by the National Institutes of Health grant 5 P41 LM006252, the goal of the conference was to bring together researchers from around the world who study the impact and mechanisms of TEs using multiple computational and experimental approaches. The meeting drew close to 170 attendees and included invited floor presentations on the biology of TEs and their genomic impact, as well as numerous talks contributed by young scientists. The workshop talks were devoted to computational analysis of TEs with additional time for discussion of unresolved issues. Also, there was ample opportunity for poster presentations and informal evening discussions. The success of the meeting reflects the important role of Repbase in comparative genomic studies, and emphasizes the need for close interactions between experimental and computational biologists in the years to come.

  13. The Genome of Naegleria gruberi Illuminates Early Eukaryotic Versatility

    Energy Technology Data Exchange (ETDEWEB)

    Fritz-Laylin, Lillian K.; Prochnik, Simon E.; Ginger, Michael L.; Dacks, Joel; Carpenter, Meredith L.; Field, Mark C.; Kuo, Alan; Paredez, Alex; Chapman, Jarrod; Pham, Jonathan; Shu, Shengqiang; Neupane, Rochak; Cipriano, Michael; Mancuso, Joel; Tu, Hank; Salamov, Asaf; Lindquist, Erika; Shapiro, Harris; Lucas, Susan; Grigoriev, Igor V.; Cande, W. Zacheus; Fulton, Chandler; Rokhsar, Daniel S.; Dawson, Scott C.

    2010-03-01

    Genome sequences of diverse free-living protists are essential for understanding eukaryotic evolution and molecular and cell biology. The free-living amoeboflagellate Naegleria gruberi belongs to a varied and ubiquitous protist clade (Heterolobosea) that diverged from other eukaryotic lineages over a billion years ago. Analysis of the 15,727 protein-coding genes encoded by Naegleria's 41 Mb nuclear genome indicates a capacity for both aerobic respiration and anaerobic metabolism with concomitant hydrogen production, with fundamental implications for the evolution of organelle metabolism. The Naegleria genome facilitates substantially broader phylogenomic comparisons of free-living eukaryotes than previously possible, allowing us to identify thousands of genes likely present in the pan-eukaryotic ancestor, with 40% likely eukaryotic inventions. Moreover, we construct a comprehensive catalog of amoeboid-motility genes. The Naegleria genome, analyzed in the context of other protists, reveals a remarkably complex ancestral eukaryote with a rich repertoire of cytoskeletal, sexual, signaling, and metabolic modules.

  14. diArk – a resource for eukaryotic genome research

    Directory of Open Access Journals (Sweden)

    Kollmar Martin

    2007-04-01

    Full Text Available Abstract Background The number of completed eukaryotic genome sequences and cDNA projects has increased exponentially in the past few years although most of them have not been published yet. In addition, many microarray analyses yielded thousands of sequenced EST and cDNA clones. For the researcher interested in single gene analyses (from a phylogenetic, a structural biology or other perspective it is therefore important to have up-to-date knowledge about the various resources providing primary data. Description The database is built around 3 central tables: species, sequencing projects and publications. The species table contains commonly and alternatively used scientific names, common names and the complete taxonomic information. For projects the sequence type and links to species project web-sites and species homepages are stored. All publications are linked to projects. The web-interface provides comprehensive search modules with detailed options and three different views of the selected data. We have especially focused on developing an elaborate taxonomic tree search tool that allows the user to instantaneously identify e.g. the closest relative to the organism of interest. Conclusion We have developed a database, called diArk, to store, organize, and present the most relevant information about completed genome projects and EST/cDNA data from eukaryotes. Currently, diArk provides information about 415 eukaryotes, 823 sequencing projects, and 248 publications.

  15. Origin and evolution of SINEs in eukaryotic genomes.

    Science.gov (United States)

    Kramerov, D A; Vassetzky, N S

    2011-12-01

    Short interspersed elements (SINEs) are one of the two most prolific mobile genomic elements in most of the higher eukaryotes. Although their biology is still not thoroughly understood, unusual life cycle of these simple elements amplified as genomic parasites makes their evolution unique in many ways. In contrast to most genetic elements including other transposons, SINEs emerged de novo many times in evolution from available molecules (for example, tRNA). The involvement of reverse transcription in their amplification cycle, huge number of genomic copies and modular structure allow variation mechanisms in SINEs uncommon or rare in other genetic elements (module exchange between SINE families, dimerization, and so on.). Overall, SINE evolution includes their emergence, progressive optimization and counteraction to the cell's defense against mobile genetic elements.

  16. Genome-reconstruction for eukaryotes from complex natural microbial communities.

    Science.gov (United States)

    West, Patrick T; Probst, Alexander J; Grigoriev, Igor V; Thomas, Brian C; Banfield, Jillian F

    2018-04-01

    Microbial eukaryotes are integral components of natural microbial communities, and their inclusion is critical for many ecosystem studies, yet the majority of published metagenome analyses ignore eukaryotes. In order to include eukaryotes in environmental studies, we propose a method to recover eukaryotic genomes from complex metagenomic samples. A key step for genome recovery is separation of eukaryotic and prokaryotic fragments. We developed a k -mer-based strategy, EukRep, for eukaryotic sequence identification and applied it to environmental samples to show that it enables genome recovery, genome completeness evaluation, and prediction of metabolic potential. We used this approach to test the effect of addition of organic carbon on a geyser-associated microbial community and detected a substantial change of the community metabolism, with selection against almost all candidate phyla bacteria and archaea and for eukaryotes. Near complete genomes were reconstructed for three fungi placed within the Eurotiomycetes and an arthropod. While carbon fixation and sulfur oxidation were important functions in the geyser community prior to carbon addition, the organic carbon-impacted community showed enrichment for secreted proteases, secreted lipases, cellulose targeting CAZymes, and methanol oxidation. We demonstrate the broader utility of EukRep by reconstructing and evaluating relatively high-quality fungal, protist, and rotifer genomes from complex environmental samples. This approach opens the way for cultivation-independent analyses of whole microbial communities. © 2018 West et al.; Published by Cold Spring Harbor Laboratory Press.

  17. Single Cell Genomics and Transcriptomics for Unicellular Eukaryotes

    Energy Technology Data Exchange (ETDEWEB)

    Ciobanu, Doina; Clum, Alicia; Singh, Vasanth; Salamov, Asaf; Han, James; Copeland, Alex; Grigoriev, Igor; James, Timothy; Singer, Steven; Woyke, Tanja; Malmstrom, Rex; Cheng, Jan-Fang

    2014-03-14

    Despite their small size, unicellular eukaryotes have complex genomes with a high degree of plasticity that allow them to adapt quickly to environmental changes. Unicellular eukaryotes live with prokaryotes and higher eukaryotes, frequently in symbiotic or parasitic niches. To this day their contribution to the dynamics of the environmental communities remains to be understood. Unfortunately, the vast majority of eukaryotic microorganisms are either uncultured or unculturable, making genome sequencing impossible using traditional approaches. We have developed an approach to isolate unicellular eukaryotes of interest from environmental samples, and to sequence and analyze their genomes and transcriptomes. We have tested our methods with six species: an uncharacterized protist from cellulose-enriched compost identified as Platyophrya, a close relative of P. vorax; the fungus Metschnikowia bicuspidate, a parasite of water flea Daphnia; the mycoparasitic fungi Piptocephalis cylindrospora, a parasite of Cokeromyces and Mucor; Caulochytrium protosteloides, a parasite of Sordaria; Rozella allomycis, a parasite of the water mold Allomyces; and the microalgae Chlamydomonas reinhardtii. Here, we present the four components of our approach: pre-sequencing methods, sequence analysis for single cell genome assembly, sequence analysis of single cell transcriptomes, and genome annotation. This technology has the potential to uncover the complexity of single cell eukaryotes and their role in the environmental samples.

  18. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

    Science.gov (United States)

    Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G

    2000-12-15

    The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.

  19. Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes.

    Directory of Open Access Journals (Sweden)

    Yubo Hou

    Full Text Available The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log(10-transformed protein-coding gene number (Y' versus log(10-transformed genome size (X', genome size in kbp were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y' = ln(-46.200+22.678X', whereas non-eukaryotes a linear model, Y' = 0.045+0.977X', both with high significance (p0.91. Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1% compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%. The eukaryotic regression models project that the smallest dinoflagellate genome (3x10(6 kbp contains 38,188 protein-coding (40,086 total genes and the largest (245x10(6 kbp 87,688 protein-coding (92,013 total genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.

  20. Strong eukaryotic IRESs have weak secondary structure.

    Directory of Open Access Journals (Sweden)

    Xuhua Xia

    Full Text Available BACKGROUND: The objective of this work was to investigate the hypothesis that eukaryotic Internal Ribosome Entry Sites (IRES lack secondary structure and to examine the generality of the hypothesis. METHODOLOGY/PRINCIPAL FINDINGS: IRESs of the yeast and the fruit fly are located in the 5'UTR immediately upstream of the initiation codon. The minimum folding energy (MFE of 60 nt RNA segments immediately upstream of the initiation codons was calculated as a proxy of secondary structure stability. MFE of the reverse complements of these 60 nt segments was also calculated. The relationship between MFE and empirically determined IRES activity was investigated to test the hypothesis that strong IRES activity is associated with weak secondary structure. We show that IRES activity in the yeast and the fruit fly correlates strongly with the structural stability, with highest IRES activity found in RNA segments that exhibit the weakest secondary structure. CONCLUSIONS: We found that a subset of eukaryotic IRESs exhibits very low secondary structure in the 5'-UTR sequences immediately upstream of the initiation codon. The consistency in results between the yeast and the fruit fly suggests a possible shared mechanism of cap-independent translation initiation that relies on an unstructured RNA segment.

  1. Introns Protect Eukaryotic Genomes from Transcription-Associated Genetic Instability.

    Science.gov (United States)

    Bonnet, Amandine; Grosso, Ana R; Elkaoutari, Abdessamad; Coleno, Emeline; Presle, Adrien; Sridhara, Sreerama C; Janbon, Guilhem; Géli, Vincent; de Almeida, Sérgio F; Palancade, Benoit

    2017-08-17

    Transcription is a source of genetic instability that can notably result from the formation of genotoxic DNA:RNA hybrids, or R-loops, between the nascent mRNA and its template. Here we report an unexpected function for introns in counteracting R-loop accumulation in eukaryotic genomes. Deletion of endogenous introns increases R-loop formation, while insertion of an intron into an intronless gene suppresses R-loop accumulation and its deleterious impact on transcription and recombination in yeast. Recruitment of the spliceosome onto the mRNA, but not splicing per se, is shown to be critical to attenuate R-loop formation and transcription-associated genetic instability. Genome-wide analyses in a number of distant species differing in their intron content, including human, further revealed that intron-containing genes and the intron-richest genomes are best protected against R-loop accumulation and subsequent genetic instability. Our results thereby provide a possible rationale for the conservation of introns throughout the eukaryotic lineage. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments

    Energy Technology Data Exchange (ETDEWEB)

    Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R

    2007-12-10

    EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

  3. GFFview: A Web Server for Parsing and Visualizing Annotation Information of Eukaryotic Genome.

    Science.gov (United States)

    Deng, Feilong; Chen, Shi-Yi; Wu, Zhou-Lin; Hu, Yongsong; Jia, Xianbo; Lai, Song-Jia

    2017-10-01

    Owing to wide application of RNA sequencing (RNA-seq) technology, more and more eukaryotic genomes have been extensively annotated, such as the gene structure, alternative splicing, and noncoding loci. Annotation information of genome is prevalently stored as plain text in General Feature Format (GFF), which could be hundreds or thousands Mb in size. Therefore, it is a challenge for manipulating GFF file for biologists who have no bioinformatic skill. In this study, we provide a web server (GFFview) for parsing the annotation information of eukaryotic genome and then generating statistical description of six indices for visualization. GFFview is very useful for investigating quality and difference of the de novo assembled transcriptome in RNA-seq studies.

  4. EUPAN enables pan-genome studies of a large number of eukaryotic genomes.

    Science.gov (United States)

    Hu, Zhiqiang; Sun, Chen; Lu, Kuang-Chen; Chu, Xixia; Zhao, Yue; Lu, Jinyuan; Shi, Jianxin; Wei, Chaochun

    2017-08-01

    Pan-genome analyses are routinely carried out for bacteria to interpret the within-species gene presence/absence variations (PAVs). However, pan-genome analyses are rare for eukaryotes due to the large sizes and higher complexities of their genomes. Here we proposed EUPAN, a eukaryotic pan-genome analysis toolkit, enabling automatic large-scale eukaryotic pan-genome analyses and detection of gene PAVs at a relatively low sequencing depth. In the previous studies, we demonstrated the effectiveness and high accuracy of EUPAN in the pan-genome analysis of 453 rice genomes, in which we also revealed widespread gene PAVs among individual rice genomes. Moreover, EUPAN can be directly applied to the current re-sequencing projects primarily focusing on single nucleotide polymorphisms. EUPAN is implemented in Perl, R and C ++. It is supported under Linux and preferred for a computer cluster with LSF and SLURM job scheduling system. EUPAN together with its standard operating procedure (SOP) is freely available for non-commercial use (CC BY-NC 4.0) at http://cgm.sjtu.edu.cn/eupan/index.html . ccwei@sjtu.edu.cn or jianxin.shi@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  5. Functions and structures of eukaryotic recombination proteins

    International Nuclear Information System (INIS)

    Ogawa, Tomoko

    1994-01-01

    We have found that Rad51 and RecA Proteins form strikingly similar structures together with dsDNA and ATP. Their right handed helical nucleoprotein filaments extend the B-form DNA double helixes to 1.5 times in length and wind the helix. The similarity and uniqueness of their structures must reflect functional homologies between these proteins. Therefore, it is highly probable that similar recombination proteins are present in various organisms of different evolutional states. We have succeeded to clone RAD51 genes from human, mouse, chicken and fission yeast genes, and found that the homologues are widely distributed in eukaryotes. The HsRad51 and MmRad51 or ChRad51 proteins consist of 339 amino acids differing only by 4 or 12 amino acids, respectively, and highly homologous to both yeast proteins, but less so to Dmcl. All of these proteins are homologous to the region from residues 33 to 240 of RecA which was named ''homologous core. The homologous core is likely to be responsible for functions common for all of them, such as the formation of helical nucleoprotein filament that is considered to be involved in homologous pairing in the recombination reaction. The mouse gene is transcribed at a high level in thymus, spleen, testis, and ovary, at lower level in brain and at a further lower level in some other tissues. It is transcribed efficiently in recombination active tissues. A clear functional difference of Rad51 homologues from RecA was suggested by the failure of heterologous genes to complement the deficiency of Scrad51 mutants. This failure seems to reflect the absence of a compatible partner, such as ScRad52 protein in the case of ScRad51 protein, between different species. Thus, these discoveries play a role of the starting point to understand the fundamental gene targeting in mammalian cells and in gene therapy. (J.P.N.)

  6. Discrepancy variation of dinucleotide microsatellite repeats in eukaryotic genomes

    Directory of Open Access Journals (Sweden)

    HUAN GAO

    2009-01-01

    Full Text Available To address whether there are differences of variation among repeat motif types and among taxonomic groups, we present here an analysis of variation and correlation of dinucleotide microsatellite repeats in eukaryotic genomes. Ten taxonomic groups were compared, those being primates, mammalia (excluding primates and rodentia, rodentia, birds, fish, amphibians and reptiles, insects, molluscs, plants and fungi, respectively. The data used in the analysis is from the literature published in the Journal of Molecular Ecology Notes. Analysis of variation reveals that there are no significant differences between AC and AG repeat motif types. Moreover, the number of alleles correlates positively with the copy number in both AG and AC repeats. Similar conclusions can be obtained from each taxonomic group. These results strongly suggest that the increase of SSR variation is almost linear with the increase of the copy number of each repeat motif. As well, the results suggest that the variability of SSR in the genomes of low-ranking species seem to be more than that of high-ranking species, excluding primates and fungi.

  7. [Structure and evolution of the eukaryotic FANCJ-like proteins].

    Science.gov (United States)

    Wuhe, Jike; Zefeng, Wu; Sanhong, Fan; Xuguang, Xi

    2015-02-01

    The FANCJ-like protein family is a class of ATP-dependent helicases that can catalytically unwind duplex DNA along the 5'-3' direction. It is involved in the processes of DNA damage repair, homologous recombination and G-quadruplex DNA unwinding, and plays a critical role in maintaining genome integrity. In this study, we systemically analyzed FNACJ-like proteins from 47 eukaryotic species and discussed their sequences diversity, origin and evolution, motif organization patterns and spatial structure differences. Four members of FNACJ-like proteins, including XPD, CHL1, RTEL1 and FANCJ, were found in eukaryotes, but some of them were seriously deficient in most fungi and some insects. For example, the Zygomycota fungi lost RTEL1, Basidiomycota and Ascomycota fungi lost RTEL1 and FANCJ, and Diptera insect lost FANCJ. FANCJ-like proteins contain canonical motor domains HD1 and HD2, and the HD1 domain further integrates with three unique domains Fe-S, Arch and Extra-D. Fe-S and Arch domains are relatively conservative in all members of the family, but the Extra-D domain is lost in XPD and differs from one another in rest members. There are 7, 10 and 2 specific motifs found from the three unique domains respectively, while 5 and 12 specific motifs are found from HD1 and HD2 domains except the conserved motifs reported previously. By analyzing the arrangement pattern of these specific motifs, we found that RTEL1 and FANCJ are more closer and share two specific motifs Vb2 and Vc in HD2 domain, which are likely related with their G-quadruplex DNA unwinding activity. The evidence of evolution showed that FACNJ-like proteins were originated from a helicase, which has a HD1 domain inserted by extra Fe-S domain and Arch domain. By three continuous gene duplication events and followed specialization, eukaryotes finally possessed the current four members of FANCJ-like proteins.

  8. Positive selection for unpreferred codon usage in eukaryotic genomes

    Directory of Open Access Journals (Sweden)

    Galagan James E

    2007-07-01

    Full Text Available Abstract Background Natural selection has traditionally been understood as a force responsible for pushing genes to states of higher translational efficiency, whereas lower translational efficiency has been explained by neutral mutation and genetic drift. We looked for evidence of directional selection resulting in increased unpreferred codon usage (and presumably reduced translational efficiency in three divergent clusters of eukaryotic genomes using a simple optimal-codon-based metric (Kp/Ku. Results Here we show that for some genes natural selection is indeed responsible for causing accelerated unpreferred codon substitution, and document the scope of this selection. In Cryptococcus and to a lesser extent Drosophila, we find many genes showing a statistically significant signal of selection for unpreferred codon usage in one or more lineages. We did not find evidence for this type of selection in Saccharomyces. The signal of positive selection observed from unpreferred synonymous codon substitutions is coincident in Cryptococcus and Drosophila with the distribution of upstream open reading frames (uORFs, another genic feature known to reduce translational efficiency. Functional enrichment analysis of genes exhibiting low Kp/Ku ratios reveals that genes in regulatory roles are particularly subject to this type of selection. Conclusion Through genome-wide scans, we find recent selection for unpreferred codon usage at approximately 1% of genetic loci in a Cryptococcus and several genes in Drosophila. Unpreferred codons can impede translation efficiency, and we find that genes with translation-impeding uORFs are enriched for this selection signal. We find that regulatory genes are particularly likely to be subject to selection for unpreferred codon usage. Given that expression noise can propagate through regulatory cascades, and that low translational efficiency can reduce expression noise, this finding supports the hypothesis that translational

  9. Meeting Report: Minutes from EMBO: Ten Years of Comparative Genomics of Eukaryotic Microorganisms

    Czech Academy of Sciences Publication Activity Database

    Lukeš, Julius; López-García, P.; Louis, E.; Boekhout, T.

    2016-01-01

    Roč. 167, č. 3 (2016), s. 217-221 ISSN 1434-4610 Institutional support: RVO:60077344 Keywords : protist * eukaryotic microorganisms * genomics Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 2.794, year: 2016

  10. Structure and Mechanism of a Eukaryotic FMN Adenylyltransferase

    OpenAIRE

    Huerta, Carlos; Borek, Dominika; Machius, Mischa; Grishin, Nick V.; Zhang, Hong

    2009-01-01

    Flavin mononucleotide adenylyltransferase (FMNAT) catalyzes the formation of the essential flavocoenzyme FAD and plays an important role in flavocoenzyme homeostasis regulation. By sequence comparison, bacterial and eukaryotic FMNAT enzymes belong to two different protein superfamilies and apparently utilize different set of active site residues to accomplish the same chemistry. Here we report the first structural characterization of a eukaryotic FMNAT from a pathogenic yeast Candida glabrata...

  11. Mapping and characterizing N6-methyladenine in eukaryotic genomes using single molecule real-time sequencing.

    Science.gov (United States)

    Zhu, Shijia; Beaulaurier, John; Deikus, Gintaras; Wu, Tao; Strahl, Maya; Hao, Ziyang; Luo, Guanzheng; Gregory, James A; Chess, Andrew; He, Chuan; Xiao, Andrew; Sebra, Robert; Schadt, Eric E; Fang, Gang

    2018-05-15

    N6-methyladenine (m6dA) has been discovered as a novel form of DNA methylation prevalent in eukaryotes, however, methods for high resolution mapping of m6dA events are still lacking. Single-molecule real-time (SMRT) sequencing has enabled the detection of m6dA events at single-nucleotide resolution in prokaryotic genomes, but its application to detecting m6dA in eukaryotic genomes has not been rigorously examined. Herein, we identified unique characteristics of eukaryotic m6dA methylomes that fundamentally differ from those of prokaryotes. Based on these differences, we describe the first approach for mapping m6dA events using SMRT sequencing specifically designed for the study of eukaryotic genomes, and provide appropriate strategies for designing experiments and carrying out sequencing in future studies. We apply the novel approach to study two eukaryotic genomes. For green algae, we construct the first complete genome-wide map of m6dA at single nucleotide and single molecule resolution. For human lymphoblastoid cells (hLCLs), joint analyses of SMRT sequencing and independent sequencing data suggest that putative m6dA events are enriched in the promoters of young, full length LINE-1 elements (L1s). These analyses demonstrate a general method for rigorous mapping and characterization of m6dA events in eukaryotic genomes. Published by Cold Spring Harbor Laboratory Press.

  12. Genome-wide analysis of eukaryote thaumatin-like proteins (TLPs with an emphasis on poplar

    Directory of Open Access Journals (Sweden)

    Duplessis Sébastien

    2011-02-01

    Full Text Available Abstract Background Plant inducible immunity includes the accumulation of a set of defense proteins during infection called pathogenesis-related (PR proteins, which are grouped into families termed PR-1 to PR-17. The PR-5 family is composed of thaumatin-like proteins (TLPs, which are responsive to biotic and abiotic stress and are widely studied in plants. TLPs were also recently discovered in fungi and animals. In the poplar genome, TLPs are over-represented compared with annual species and their transcripts strongly accumulate during stress conditions. Results Our analysis of the poplar TLP family suggests that the expansion of this gene family was followed by diversification, as differences in expression patterns and predicted properties correlate with phylogeny. In particular, we identified a clade of poplar TLPs that cluster to a single 350 kb locus of chromosome I and that are up-regulated by poplar leaf rust infection. A wider phylogenetic analysis of eukaryote TLPs - including plant, animal and fungi sequences - shows that TLP gene content and diversity increased markedly during land plant evolution. Mapping the reported functions of characterized TLPs to the eukaryote phylogenetic tree showed that antifungal or glycan-lytic properties are widespread across eukaryote phylogeny, suggesting that these properties are shared by most TLPs and are likely associated with the presence of a conserved acidic cleft in their 3D structure. Also, we established an exhaustive catalog of TLPs with atypical architectures such as small-TLPs, TLP-kinases and small-TLP-kinases, which have potentially developed alternative functions (such as putative receptor kinases for pathogen sensing and signaling. Conclusion Our study, based on the most recent plant genome sequences, provides evidence for TLP gene family diversification during land plant evolution. We have shown that the diverse functions described for TLPs are not restricted to specific clades but seem

  13. EuMicroSatdb: A database for microsatellites in the sequenced genomes of eukaryotes

    Directory of Open Access Journals (Sweden)

    Grover Atul

    2007-07-01

    Full Text Available Abstract Background Microsatellites have immense utility as molecular markers in different fields like genome characterization and mapping, phylogeny and evolutionary biology. Existing microsatellite databases are of limited utility for experimental and computational biologists with regard to their content and information output. EuMicroSatdb (Eukaryotic MicroSatellite database http://ipu.ac.in/usbt/EuMicroSatdb.htm is a web based relational database for easy and efficient positional mining of microsatellites from sequenced eukaryotic genomes. Description A user friendly web interface has been developed for microsatellite data retrieval using Active Server Pages (ASP. The backend database codes for data extraction and assembly have been written using Perl based scripts and C++. Precise need based microsatellites data retrieval is possible using different input parameters like microsatellite type (simple perfect or compound perfect, repeat unit length (mono- to hexa-nucleotide, repeat number, microsatellite length and chromosomal location in the genome. Furthermore, information about clustering of different microsatellites in the genome can also be retrieved. Finally, to facilitate primer designing for PCR amplification of any desired microsatellite locus, 200 bp upstream and downstream sequences are provided. Conclusion The database allows easy systematic retrieval of comprehensive information about simple and compound microsatellites, microsatellite clusters and their locus coordinates in 31 sequenced eukaryotic genomes. The information content of the database is useful in different areas of research like gene tagging, genome mapping, population genetics, germplasm characterization and in understanding microsatellite dynamics in eukaryotic genomes.

  14. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    Science.gov (United States)

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  15. Roadmap for annotating transposable elements in eukaryote genomes.

    Science.gov (United States)

    Permal, Emmanuelle; Flutre, Timothée; Quesneville, Hadi

    2012-01-01

    Current high-throughput techniques have made it feasible to sequence even the genomes of non-model organisms. However, the annotation process now represents a bottleneck to genome analysis, especially when dealing with transposable elements (TE). Combined approaches, using both de novo and knowledge-based methods to detect TEs, are likely to produce reasonably comprehensive and sensitive results. This chapter provides a roadmap for researchers involved in genome projects to address this issue. At each step of the TE annotation process, from the identification of TE families to the annotation of TE copies, we outline the tools and good practices to be used.

  16. Neural Network Prediction of Translation Initiation Sites in Eukaryotes: Perspectives for EST and Genome analysis

    DEFF Research Database (Denmark)

    Pedersen, Anders Gorm; Nielsen, Henrik

    1997-01-01

    Translation in eukaryotes does not always start at the first AUG in an mRNA, implying that context information also plays a role.This makes prediction of translation initiation sites a non-trivial task, especially when analysing EST and genome data where the entire mature mRNA sequence is not known...

  17. Systematics of Short-range Correlations in Eukaryotic Genomes

    Science.gov (United States)

    Hameister, Jörn; Helm, Werner E.; Hütt, Marc-Thorsten; Dehnert, Manuel

    Attempts to identify a species on the basis of its DNA sequence on purely statistical grounds have been formulated for more than a decade. Solving this problem could have a huge impact on understanding processes of genome evolution and on the design of classification schemes for DNA sequences.

  18. MCM Paradox: Abundance of Eukaryotic Replicative Helicases and Genomic Integrity.

    Science.gov (United States)

    Das, Mitali; Singh, Sunita; Pradhan, Satyajit; Narayan, Gopeshwar

    2014-01-01

    As a crucial component of DNA replication licensing system, minichromosome maintenance (MCM) 2-7 complex acts as the eukaryotic DNA replicative helicase. The six related MCM proteins form a heterohexamer and bind with ORC, CDC6, and Cdt1 to form the prereplication complex. Although the MCMs are well known as replicative helicases, their overabundance and distribution patterns on chromatin present a paradox called the "MCM paradox." Several approaches had been taken to solve the MCM paradox and describe the purpose of excess MCMs distributed beyond the replication origins. Alternative functions of these MCMs rather than a helicase had also been proposed. This review focuses on several models and concepts generated to solve the MCM paradox coinciding with their helicase function and provides insight into the concept that excess MCMs are meant for licensing dormant origins as a backup during replication stress. Finally, we extend our view towards the effect of alteration of MCM level. Though an excess MCM constituent is needed for normal cells to withstand stress, there must be a delineation of the threshold level in normal and malignant cells. This review also outlooks the future prospects to better understand the MCM biology.

  19. The DNA-encoded nucleosome organization of a eukaryotic genome.

    Science.gov (United States)

    Kaplan, Noam; Moore, Irene K; Fondufe-Mittendorf, Yvonne; Gossett, Andrea J; Tillo, Desiree; Field, Yair; LeProust, Emily M; Hughes, Timothy R; Lieb, Jason D; Widom, Jonathan; Segal, Eran

    2009-03-19

    Nucleosome organization is critical for gene regulation. In living cells this organization is determined by multiple factors, including the action of chromatin remodellers, competition with site-specific DNA-binding proteins, and the DNA sequence preferences of the nucleosomes themselves. However, it has been difficult to estimate the relative importance of each of these mechanisms in vivo, because in vivo nucleosome maps reflect the combined action of all influencing factors. Here we determine the importance of nucleosome DNA sequence preferences experimentally by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map, in which nucleosome occupancy is governed only by the intrinsic sequence preferences of nucleosomes, is similar to in vivo nucleosome maps generated in three different growth conditions. In vitro, nucleosome depletion is evident at many transcription factor binding sites and around gene start and end sites, indicating that nucleosome depletion at these sites in vivo is partly encoded in the genome. We confirm these results with a micrococcal nuclease-independent experiment that measures the relative affinity of nucleosomes for approximately 40,000 double-stranded 150-base-pair oligonucleotides. Using our in vitro data, we devise a computational model of nucleosome sequence preferences that is significantly correlated with in vivo nucleosome occupancy in Caenorhabditis elegans. Our results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.

  20. Insights into the red algae and eukaryotic evolution from the genome of Porphyra umbilicalis (Bangiophyceae, Rhodophyta).

    Science.gov (United States)

    Brawley, Susan H; Blouin, Nicolas A; Ficko-Blean, Elizabeth; Wheeler, Glen L; Lohr, Martin; Goodson, Holly V; Jenkins, Jerry W; Blaby-Haas, Crysten E; Helliwell, Katherine E; Chan, Cheong Xin; Marriage, Tara N; Bhattacharya, Debashish; Klein, Anita S; Badis, Yacine; Brodie, Juliet; Cao, Yuanyu; Collén, Jonas; Dittami, Simon M; Gachon, Claire M M; Green, Beverley R; Karpowicz, Steven J; Kim, Jay W; Kudahl, Ulrich Johan; Lin, Senjie; Michel, Gurvan; Mittag, Maria; Olson, Bradley J S C; Pangilinan, Jasmyn L; Peng, Yi; Qiu, Huan; Shu, Shengqiang; Singer, John T; Smith, Alison G; Sprecher, Brittany N; Wagner, Volker; Wang, Wenfei; Wang, Zhi-Yong; Yan, Juying; Yarish, Charles; Zäuner-Riek, Simone; Zhuang, Yunyun; Zou, Yong; Lindquist, Erika A; Grimwood, Jane; Barry, Kerrie W; Rokhsar, Daniel S; Schmutz, Jeremy; Stiller, John W; Grossman, Arthur R; Prochnik, Simon E

    2017-08-01

    Porphyra umbilicalis (laver) belongs to an ancient group of red algae (Bangiophyceae), is harvested for human food, and thrives in the harsh conditions of the upper intertidal zone. Here we present the 87.7-Mbp haploid Porphyra genome (65.8% G + C content, 13,125 gene loci) and elucidate traits that inform our understanding of the biology of red algae as one of the few multicellular eukaryotic lineages. Novel features of the Porphyra genome shared by other red algae relate to the cytoskeleton, calcium signaling, the cell cycle, and stress-tolerance mechanisms including photoprotection. Cytoskeletal motor proteins in Porphyra are restricted to a small set of kinesins that appear to be the only universal cytoskeletal motors within the red algae. Dynein motors are absent, and most red algae, including Porphyra , lack myosin. This surprisingly minimal cytoskeleton offers a potential explanation for why red algal cells and multicellular structures are more limited in size than in most multicellular lineages. Additional discoveries further relating to the stress tolerance of bangiophytes include ancestral enzymes for sulfation of the hydrophilic galactan-rich cell wall, evidence for mannan synthesis that originated before the divergence of green and red algae, and a high capacity for nutrient uptake. Our analyses provide a comprehensive understanding of the red algae, which are both commercially important and have played a major role in the evolution of other algal groups through secondary endosymbioses.

  1. Genome-wide mapping reveals single-origin chromosome replication in Leishmania, a eukaryotic microbe.

    Science.gov (United States)

    Marques, Catarina A; Dickens, Nicholas J; Paape, Daniel; Campbell, Samantha J; McCulloch, Richard

    2015-10-19

    DNA replication initiates on defined genome sites, termed origins. Origin usage appears to follow common rules in the eukaryotic organisms examined to date: all chromosomes are replicated from multiple origins, which display variations in firing efficiency and are selected from a larger pool of potential origins. To ask if these features of DNA replication are true of all eukaryotes, we describe genome-wide origin mapping in the parasite Leishmania. Origin mapping in Leishmania suggests a striking divergence in origin usage relative to characterized eukaryotes, since each chromosome appears to be replicated from a single origin. By comparing two species of Leishmania, we find evidence that such origin singularity is maintained in the face of chromosome fusion or fission events during evolution. Mapping Leishmania origins suggests that all origins fire with equal efficiency, and that the genomic sites occupied by origins differ from related non-origins sites. Finally, we provide evidence that origin location in Leishmania displays striking conservation with Trypanosoma brucei, despite the latter parasite replicating its chromosomes from multiple, variable strength origins. The demonstration of chromosome replication for a single origin in Leishmania, a microbial eukaryote, has implications for the evolution of origin multiplicity and associated controls, and may explain the pervasive aneuploidy that characterizes Leishmania chromosome architecture.

  2. The Persistent Contributions of RNA to Eukaryotic Gen(om)e Architecture and Cellular Function

    Science.gov (United States)

    Brosius, Jürgen

    2014-01-01

    Currently, the best scenario for earliest forms of life is based on RNA molecules as they have the proven ability to catalyze enzymatic reactions and harbor genetic information. Evolutionary principles valid today become apparent in such models already. Furthermore, many features of eukaryotic genome architecture might have their origins in an RNA or RNA/protein (RNP) world, including the onset of a further transition, when DNA replaced RNA as the genetic bookkeeper of the cell. Chromosome maintenance, splicing, and regulatory function via RNA may be deeply rooted in the RNA/RNP worlds. Mostly in eukaryotes, conversion from RNA to DNA is still ongoing, which greatly impacts the plasticity of extant genomes. Raw material for novel genes encoding protein or RNA, or parts of genes including regulatory elements that selection can act on, continues to enter the evolutionary lottery. PMID:25081515

  3. Yeast 2.0-connecting the dots in the construction of the world's first functional synthetic eukaryotic genome.

    Science.gov (United States)

    Pretorius, I S; Boeke, J D

    2018-06-01

    biosafety, bioethics and regulatory aspects of their pioneering work. This article presents a shared vision of constructing a synthetic eukaryotic genome in a safe model organism by using novel concepts and advanced technologies. This multidisciplinary and collaborative project is conducted under a sound governance structure that does not only respect the scientific achievements and lessons from the past, but that is also focussed on leading the present and helping to secure a brighter future for all.

  4. The genome of the polar eukaryotic microalga Coccomyxa subellipsoidea reveals traits of cold adaptation

    Energy Technology Data Exchange (ETDEWEB)

    Blanc, Guillaume; Agarkova, Irina; Grimwood, Jane; Kuo, Alan; Brueggeman, Andrew; Dunigan, David D.; Gurnon, James; Ladunga, Istvan; Lindquist, Erika; Lucas, Susan; Pangilinan, Jasmyn; Proschold, Thomas; Salamov, Asaf; Schmutz, Jeremy; Weeks, Donald; Tamada, Takashi; Lomsadze, Alexandre; Borodovsky, Mark; Claverie, Jean-Michel; Grigoriev, Igor V.; Van Etten, James L.

    2012-02-13

    Background Little is known about the mechanisms of adaptation of life to the extreme environmental conditions encountered in polar regions. Here we present the genome sequence of a unicellular green alga from the division chlorophyta, Coccomyxa subellipsoidea C-169, which we will hereafter refer to as C-169. This is the first eukaryotic microorganism from a polar environment to have its genome sequenced. Results The 48.8 Mb genome contained in 20 chromosomes exhibits significant synteny conservation with the chromosomes of its relatives Chlorella variabilis and Chlamydomonas reinhardtii. The order of the genes is highly reshuffled within synteny blocks, suggesting that intra-chromosomal rearrangements were more prevalent than inter-chromosomal rearrangements. Remarkably, Zepp retrotransposons occur in clusters of nested elements with strictly one cluster per chromosome probably residing at the centromere. Several protein families overrepresented in C. subellipsoidae include proteins involved in lipid metabolism, transporters, cellulose synthases and short alcohol dehydrogenases. Conversely, C-169 lacks proteins that exist in all other sequenced chlorophytes, including components of the glycosyl phosphatidyl inositol anchoring system, pyruvate phosphate dikinase and the photosystem 1 reaction center subunit N (PsaN). Conclusions We suggest that some of these gene losses and gains could have contributed to adaptation to low temperatures. Comparison of these genomic features with the adaptive strategies of psychrophilic microbes suggests that prokaryotes and eukaryotes followed comparable evolutionary routes to adapt to cold environments.

  5. An Interactive Exercise To Learn Eukaryotic Cell Structure and Organelle Function.

    Science.gov (United States)

    Klionsky, Daniel J.; Tomashek, John J.

    1999-01-01

    Describes a cooperative, interactive problem-solving exercise for studying eukaryotic cell structure and function. Highlights the dynamic aspects of movement through the cell. Contains 15 references. (WRM)

  6. Structural similarities between prokaryotic and eukaryotic 5S ribosomal RNAs

    International Nuclear Information System (INIS)

    Welfle, H.; Boehm, S.; Damaschun, G.; Fabian, H.; Gast, K.; Misselwitz, R.; Mueller, J.J.; Zirwer, D.; Filimonov, V.V.; Venyaminov, S.Yu.; Zalkova, T.N.

    1986-01-01

    5S RNAs from rat liver and E. coli have been studied by diffuse X-ray and dynamic light scattering and by infrared and Raman spectroscopy. Identical structures at a resolution of 1 nm can be deduced from the comparison of the experimental X-ray scattering curves and electron distance distribution functions and from the agreement of the shape parameters. A flat shape model with a compact central region and two protruding arms was derived. Double helical stems are eleven-fold helices with a mean base pair distance of 0.28 nm. The number of base pairs (26 GC, 9 AU for E. coli; 27 GC, 9 AU for rat liver) and the degree of base stacking are the same within the experimental error. A very high regularity in the ribophosphate backbone is indicated for both 5S RNAs. The observed structural similarity and the consensus secondary structure pattern derived from comparative sequence analyses suggest the conclusion that prokaryotic and eukaryotic 5S RNAs are in general very similar with respect to their fundamental structural features. (author)

  7. Structure of the prolyl-tRNA synthetase from the eukaryotic pathogen Giardia lamblia

    Energy Technology Data Exchange (ETDEWEB)

    Larson, Eric T.; Kim, Jessica E.; Napuli, Alberto J.; Verlinde, Christophe L. M. J.; Fan, Erkang; Zucker, Frank H.; Van Voorhis, Wesley C.; Buckner, Frederick S.; Hol, Wim G. J.; Merritt, Ethan A., E-mail: merritt@u.washington.edu [Medical Structural Genomics of Pathogenic Protozoa, (United States); University of Washington, Seattle, WA 98195 (United States)

    2012-09-01

    The structure of Giardia prolyl-tRNA synthetase cocrystallized with proline and ATP shows evidence for half-of-the-sites activity, leading to a corresponding mixture of reaction substrates and product (prolyl-AMP) in the two active sites of the dimer. The genome of the human intestinal parasite Giardia lamblia contains only a single aminoacyl-tRNA synthetase gene for each amino acid. The Giardia prolyl-tRNA synthetase gene product was originally misidentified as a dual-specificity Pro/Cys enzyme, in part owing to its unexpectedly high off-target activation of cysteine, but is now believed to be a normal representative of the class of archaeal/eukaryotic prolyl-tRNA synthetases. The 2.2 Å resolution crystal structure of the G. lamblia enzyme presented here is thus the first structure determination of a prolyl-tRNA synthetase from a eukaryote. The relative occupancies of substrate (proline) and product (prolyl-AMP) in the active site are consistent with half-of-the-sites reactivity, as is the observed biphasic thermal denaturation curve for the protein in the presence of proline and MgATP. However, no corresponding induced asymmetry is evident in the structure of the protein. No thermal stabilization is observed in the presence of cysteine and ATP. The implied low affinity for the off-target activation product cysteinyl-AMP suggests that translational fidelity in Giardia is aided by the rapid release of misactivated cysteine.

  8. An HMM-based comparative genomic framework for detecting introgression in eukaryotes.

    Directory of Open Access Journals (Sweden)

    Kevin J Liu

    2014-06-01

    Full Text Available One outcome of interspecific hybridization and subsequent effects of evolutionary forces is introgression, which is the integration of genetic material from one species into the genome of an individual in another species. The evolution of several groups of eukaryotic species has involved hybridization, and cases of adaptation through introgression have been already established. In this work, we report on PhyloNet-HMM-a new comparative genomic framework for detecting introgression in genomes. PhyloNet-HMM combines phylogenetic networks with hidden Markov models (HMMs to simultaneously capture the (potentially reticulate evolutionary history of the genomes and dependencies within genomes. A novel aspect of our work is that it also accounts for incomplete lineage sorting and dependence across loci. Application of our model to variation data from chromosome 7 in the mouse (Mus musculus domesticus genome detected a recently reported adaptive introgression event involving the rodent poison resistance gene Vkorc1, in addition to other newly detected introgressed genomic regions. Based on our analysis, it is estimated that about 9% of all sites within chromosome 7 are of introgressive origin (these cover about 13 Mbp of chromosome 7, and over 300 genes. Further, our model detected no introgression in a negative control data set. We also found that our model accurately detected introgression and other evolutionary processes from synthetic data sets simulated under the coalescent model with recombination, isolation, and migration. Our work provides a powerful framework for systematic analysis of introgression while simultaneously accounting for dependence across sites, point mutations, recombination, and ancestral polymorphism.

  9. Insights into structural variations and genome rearrangements in prokaryotic genomes.

    Science.gov (United States)

    Periwal, Vinita; Scaria, Vinod

    2015-01-01

    Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most of the SVs such as inversions, deletions and translocations have been largely studied in context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into organization of bacterial genomes at a much better resolution. SVs can confer change in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much refined resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss its potential applicability in the emerging fields of synthetic biology and genome engineering where targeted SVs could serve to create sophisticated and accurate genome editing. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. Leucine-Rich repeat receptor kinases are sporadically distributed in eukaryotic genomes

    Directory of Open Access Journals (Sweden)

    Diévart Anne

    2011-12-01

    Full Text Available Abstract Background Plant leucine-rich repeat receptor-like kinases (LRR-RLKs are receptor kinases that contain LRRs in their extracellular domain. In the last 15 years, many research groups have demonstrated major roles played by LRR-RLKs in plants during almost all developmental processes throughout the life of the plant and in defense/resistance against a large range of pathogens. Recently, a breakthrough has been made in this field that challenges the dogma of the specificity of plant LRR-RLKs. Results We analyzed ~1000 complete genomes and show that LRR-RK genes have now been identified in 8 non-plant genomes. We performed an exhaustive phylogenetic analysis of all of these receptors, revealing that all of the LRR-containing receptor subfamilies form lineage-specific clades. Our results suggest that the association of LRRs with RKs appeared independently at least four times in eukaryotic evolutionary history. Moreover, the molecular evolutionary history of the LRR-RKs found in oomycetes is reminiscent of the pattern observed in plants: expansion with amplification/deletion and evolution of the domain organization leading to the functional diversification of members of the gene family. Finally, the expression data suggest that oomycete LRR-RKs may play a role in several stages of the oomycete life cycle. Conclusions In view of the key roles that LRR-RLKs play throughout the entire lifetime of plants and plant-environment interactions, the emergence and expansion of this type of receptor in several phyla along the evolution of eukaryotes, and particularly in oomycete genomes, questions their intrinsic functions in mimicry and/or in the coevolution of receptors between hosts and pathogens.

  11. Structural and biomechanical basis of mitochondrial movement in eukaryotic cells

    Directory of Open Access Journals (Sweden)

    Wu M

    2013-10-01

    Full Text Available Min Wu,1 Aruna Kalyanasundaram,2 Jie Zhu1 1Laboratory of Biomechanics and Engineering, Institute of Biophysics, College of Science, Northwest A&F University, Yangling, Shaanxi, People's Republic of China; 2College of Pharmacology, University of Illinois at Chicago, Chicago, IL, USA Abstract: Mitochondria serve as energy-producing organelles in eukaryotic cells. In addition to providing the energy supply for cells, the mitochondria are also involved in other processes, such as proliferation, differentiation, information transfer, and apoptosis, and play an important role in regulation of cell growth and the cell cycle. In order to achieve these functions, the mitochondria need to move to the corresponding location. Therefore, mitochondrial movement has a crucial role in normal physiologic activity, and any mitochondrial movement disorder will cause irreparable damage to the organism. For example, recent studies have shown that abnormal movement of the mitochondria is likely to be the reason for Charcot–Marie–Tooth disease, amyotrophic lateral sclerosis, Alzheimer's disease, Huntington's disease, Parkinson's disease, and schizophrenia. So, in the cell, especially in the particular polarized cell, the appropriate distribution of mitochondria is crucial to the function and survival of the cell. Mitochondrial movement is mainly associated with the cytoskeleton and related proteins. However, those components play different roles according to cell type. In this paper, we summarize the structural basis of mitochondrial movement, including microtubules, actin filaments, motor proteins, and adaptin, and review studies of the biomechanical mechanisms of mitochondrial movement in different types of cells. Keywords: mitochondrial movement, microtubules, actin filaments, motor proteins, adaptin

  12. Visualization of genome signatures of eukaryote genomes by batch-learning self-organizing map with a special emphasis on Drosophila genomes.

    Science.gov (United States)

    Abe, Takashi; Hamano, Yuta; Ikemura, Toshimichi

    2014-01-01

    A strategy of evolutionary studies that can compare vast numbers of genome sequences is becoming increasingly important with the remarkable progress of high-throughput DNA sequencing methods. We previously established a sequence alignment-free clustering method "BLSOM" for di-, tri-, and tetranucleotide compositions in genome sequences, which can characterize sequence characteristics (genome signatures) of a wide range of species. In the present study, we generated BLSOMs for tetra- and pentanucleotide compositions in approximately one million sequence fragments derived from 101 eukaryotes, for which almost complete genome sequences were available. BLSOM recognized phylotype-specific characteristics (e.g., key combinations of oligonucleotide frequencies) in the genome sequences, permitting phylotype-specific clustering of the sequences without any information regarding the species. In our detailed examination of 12 Drosophila species, the correlation between their phylogenetic classification and the classification on the BLSOMs was observed to visualize oligonucleotides diagnostic for species-specific clustering.

  13. The Eukaryotic Pathogen Databases: a functional genomic resource integrating data from human and veterinary parasites.

    Science.gov (United States)

    Harb, Omar S; Roos, David S

    2015-01-01

    Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.

  14. An SVD-based comparison of nine whole eukaryotic genomes supports a coelomate rather than ecdysozoan lineage

    Directory of Open Access Journals (Sweden)

    Stuart Gary W

    2004-12-01

    Full Text Available Abstract Background Eukaryotic whole genome sequences are accumulating at an impressive rate. Effective methods for comparing multiple whole eukaryotic genomes on a large scale are needed. Most attempted solutions involve the production of large scale alignments, and many of these require a high stringency pre-screen for putative orthologs in order to reduce the effective size of the dataset and provide a reasonably high but unknown fraction of correctly aligned homologous sites for comparison. As an alternative, highly efficient methods that do not require the pre-alignment of operationally defined orthologs are also being explored. Results A non-alignment method based on the Singular Value Decomposition (SVD was used to compare the predicted protein complement of nine whole eukaryotic genomes ranging from yeast to man. This analysis resulted in the simultaneous identification and definition of a large number of well conserved motifs and gene families, and produced a species tree supporting one of two conflicting hypotheses of metazoan relationships. Conclusions Our SVD-based analysis of the entire protein complement of nine whole eukaryotic genomes suggests that highly conserved motifs and gene families can be identified and effectively compared in a single coherent definition space for the easy extraction of gene and species trees. While this occurs without the explicit definition of orthologs or homologous sites, the analysis can provide a basis for these definitions.

  15. Exploring repetitive DNA landscapes using REPCLASS, a tool that automates the classification of transposable elements in eukaryotic genomes.

    Science.gov (United States)

    Feschotte, Cédric; Keswani, Umeshkumar; Ranganathan, Nirmal; Guibotsy, Marcel L; Levine, David

    2009-07-23

    Eukaryotic genomes contain large amount of repetitive DNA, most of which is derived from transposable elements (TEs). Progress has been made to develop computational tools for ab initio identification of repeat families, but there is an urgent need to develop tools to automate the annotation of TEs in genome sequences. Here we introduce REPCLASS, a tool that automates the classification of TE sequences. Using control repeat libraries, we show that the program can classify accurately virtually any known TE types. Combining REPCLASS to ab initio repeat finding in the genomes of Caenorhabditis elegans and Drosophila melanogaster allowed us to recover the contrasting TE landscape characteristic of these species. Unexpectedly, REPCLASS also uncovered several novel TE families in both genomes, augmenting the TE repertoire of these model species. When applied to the genomes of distant Caenorhabditis and Drosophila species, the approach revealed a remarkable conservation of TE composition profile within each genus, despite substantial interspecific covariations in genome size and in the number of TEs and TE families. Lastly, we applied REPCLASS to analyze 10 fungal genomes from a wide taxonomic range, most of which have not been analyzed for TE content previously. The results showed that TE diversity varies widely across the fungi "kingdom" and appears to positively correlate with genome size, in particular for DNA transposons. Together, these data validate REPCLASS as a powerful tool to explore the repetitive DNA landscapes of eukaryotes and to shed light onto the evolutionary forces shaping TE diversity and genome architecture.

  16. EuGI: a novel resource for studying genomic islands to facilitate horizontal gene transfer detection in eukaryotes.

    Science.gov (United States)

    Clasen, Frederick Johannes; Pierneef, Rian Ewald; Slippers, Bernard; Reva, Oleg

    2018-05-03

    Genomic islands (GIs) are inserts of foreign DNA that have potentially arisen through horizontal gene transfer (HGT). There are evidences that GIs can contribute significantly to the evolution of prokaryotes. The acquisition of GIs through HGT in eukaryotes has, however, been largely unexplored. In this study, the previously developed GI prediction tool, SeqWord Gene Island Sniffer (SWGIS), is modified to predict GIs in eukaryotic chromosomes. Artificial simulations are used to estimate ratios of predicting false positive and false negative GIs by inserting GIs into different test chromosomes and performing the SWGIS v2.0 algorithm. Using SWGIS v2.0, GIs are then identified in 36 fungal, 22 protozoan and 8 invertebrate genomes. SWGIS v2.0 predicts GIs in large eukaryotic chromosomes based on the atypical nucleotide composition of these regions. Averages for predicting false negative and false positive GIs were 20.1% and 11.01% respectively. A total of 10,550 GIs were identified in 66 eukaryotic species with 5299 of these GIs coding for at least one functional protein. The EuGI web-resource, freely accessible at http://eugi.bi.up.ac.za , was developed that allows browsing the database created from identified GIs and genes within GIs through an interactive and visual interface. SWGIS v2.0 along with the EuGI database, which houses GIs identified in 66 different eukaryotic species, and the EuGI web-resource, provide the first comprehensive resource for studying HGT in eukaryotes.

  17. xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.

    Science.gov (United States)

    Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P

    2016-04-01

    Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved.

  18. Archaeal Genome Guardians Give Insights into Eukaryotic DNA Replication and Damage Response Proteins

    Directory of Open Access Journals (Sweden)

    David S. Shin

    2014-01-01

    Full Text Available As the third domain of life, archaea, like the eukarya and bacteria, must have robust DNA replication and repair complexes to ensure genome fidelity. Archaea moreover display a breadth of unique habitats and characteristics, and structural biologists increasingly appreciate these features. As archaea include extremophiles that can withstand diverse environmental stresses, they provide fundamental systems for understanding enzymes and pathways critical to genome integrity and stress responses. Such archaeal extremophiles provide critical data on the periodic table for life as well as on the biochemical, geochemical, and physical limitations to adaptive strategies allowing organisms to thrive under environmental stress relevant to determining the boundaries for life as we know it. Specifically, archaeal enzyme structures have informed the architecture and mechanisms of key DNA repair proteins and complexes. With added abilities to temperature-trap flexible complexes and reveal core domains of transient and dynamic complexes, these structures provide insights into mechanisms of maintaining genome integrity despite extreme environmental stress. The DNA damage response protein structures noted in this review therefore inform the basis for genome integrity in the face of environmental stress, with implications for all domains of life as well as for biomanufacturing, astrobiology, and medicine.

  19. Dormant origins as a built-in safeguard in eukaryotic DNA replication against genome instability and disease development.

    Science.gov (United States)

    Shima, Naoko; Pederson, Kayla D

    2017-08-01

    DNA replication is a prerequisite for cell proliferation, yet it can be increasingly challenging for a eukaryotic cell to faithfully duplicate its genome as its size and complexity expands. Dormant origins now emerge as a key component for cells to successfully accomplish such a demanding but essential task. In this perspective, we will first provide an overview of the fundamental processes eukaryotic cells have developed to regulate origin licensing and firing. With a special focus on mammalian systems, we will then highlight the role of dormant origins in preventing replication-associated genome instability and their functional interplay with proteins involved in the DNA damage repair response for tumor suppression. Lastly, deficiencies in the origin licensing machinery will be discussed in relation to their influence on stem cell maintenance and human diseases. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Genome-wide Purification of Extrachromosomal Circular DNA from Eukaryotic Cells

    DEFF Research Database (Denmark)

    Møller, Henrik D.; Bojsen, Rasmus Kenneth; Tachibana, Chris

    2016-01-01

    Extrachromosomal circular DNAs (eccDNAs) are common genetic elements in Saccharomyces cerevisiae and are reported in other eukaryotes as well. EccDNAs contribute to genetic variation among somatic cells in multicellular organisms and to evolution of unicellular eukaryotes. Sensitive methods for d...

  1. Genome-wide Purification of Extrachromosomal Circular DNA from Eukaryotic Cells

    DEFF Research Database (Denmark)

    Møller, Henrik D.; Bojsen, Rasmus Kenneth; Tachibana, Chris

    2016-01-01

    Extrachromosomal circular DNAs (eccDNAs) are common genetic elements in Saccharomyces cerevisiae and are reported in other eukaryotes as well. EccDNAs contribute to genetic variation among somatic cells in multicellular organisms and to evolution of unicellular eukaryotes. Sensitive methods...

  2. Hybrid and rogue kinases encoded in the genomes of model eukaryotes.

    Directory of Open Access Journals (Sweden)

    Ramaswamy Rakshambikai

    Full Text Available The highly modular nature of protein kinases generates diverse functional roles mediated by evolutionary events such as domain recombination, insertion and deletion of domains. Usually domain architecture of a kinase is related to the subfamily to which the kinase catalytic domain belongs. However outlier kinases with unusual domain architectures serve in the expansion of the functional space of the protein kinase family. For example, Src kinases are made-up of SH2 and SH3 domains in addition to the kinase catalytic domain. A kinase which lacks these two domains but retains sequence characteristics within the kinase catalytic domain is an outlier that is likely to have modes of regulation different from classical src kinases. This study defines two types of outlier kinases: hybrids and rogues depending on the nature of domain recombination. Hybrid kinases are those where the catalytic kinase domain belongs to a kinase subfamily but the domain architecture is typical of another kinase subfamily. Rogue kinases are those with kinase catalytic domain characteristic of a kinase subfamily but the domain architecture is typical of neither that subfamily nor any other kinase subfamily. This report provides a consolidated set of such hybrid and rogue kinases gleaned from six eukaryotic genomes-S.cerevisiae, D. melanogaster, C.elegans, M.musculus, T.rubripes and H.sapiens-and discusses their functions. The presence of such kinases necessitates a revisiting of the classification scheme of the protein kinase family using full length sequences apart from classical classification using solely the sequences of kinase catalytic domains. The study of these kinases provides a good insight in engineering signalling pathways for a desired output. Lastly, identification of hybrids and rogues in pathogenic protozoa such as P.falciparum sheds light on possible strategies in host-pathogen interactions.

  3. Initiation of translation in bacteria by a structured eukaryotic IRES RNA.

    Science.gov (United States)

    Colussi, Timothy M; Costantino, David A; Zhu, Jianyu; Donohue, John Paul; Korostelev, Andrei A; Jaafar, Zane A; Plank, Terra-Dawn M; Noller, Harry F; Kieft, Jeffrey S

    2015-03-05

    The central dogma of gene expression (DNA to RNA to protein) is universal, but in different domains of life there are fundamental mechanistic differences within this pathway. For example, the canonical molecular signals used to initiate protein synthesis in bacteria and eukaryotes are mutually exclusive. However, the core structures and conformational dynamics of ribosomes that are responsible for the translation steps that take place after initiation are ancient and conserved across the domains of life. We wanted to explore whether an undiscovered RNA-based signal might be able to use these conserved features, bypassing mechanisms specific to each domain of life, and initiate protein synthesis in both bacteria and eukaryotes. Although structured internal ribosome entry site (IRES) RNAs can manipulate ribosomes to initiate translation in eukaryotic cells, an analogous RNA structure-based mechanism has not been observed in bacteria. Here we report our discovery that a eukaryotic viral IRES can initiate translation in live bacteria. We solved the crystal structure of this IRES bound to a bacterial ribosome to 3.8 Å resolution, revealing that despite differences between bacterial and eukaryotic ribosomes this IRES binds directly to both and occupies the space normally used by transfer RNAs. Initiation in both bacteria and eukaryotes depends on the structure of the IRES RNA, but in bacteria this RNA uses a different mechanism that includes a form of ribosome repositioning after initial recruitment. This IRES RNA bridges billions of years of evolutionary divergence and provides an example of an RNA structure-based translation initiation signal capable of operating in two domains of life.

  4. Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote.

    Directory of Open Access Journals (Sweden)

    Jonathan A Eisen

    2006-09-01

    Full Text Available The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC, which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases, using diverse resources (e.g., proteases and transporters, and generating structural complexity (e.g., kinesins and dyneins. In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates, no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from

  5. Transcription factor IID in the Archaea: sequences in the Thermococcus celer genome would encode a product closely related to the TATA-binding protein of eukaryotes

    Science.gov (United States)

    Marsh, T. L.; Reich, C. I.; Whitelock, R. B.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

    1994-01-01

    The first step in transcription initiation in eukaryotes is mediated by the TATA-binding protein, a subunit of the transcription factor IID complex. We have cloned and sequenced the gene for a presumptive homolog of this eukaryotic protein from Thermococcus celer, a member of the Archaea (formerly archaebacteria). The protein encoded by the archaeal gene is a tandem repeat of a conserved domain, corresponding to the repeated domain in its eukaryotic counterparts. Molecular phylogenetic analyses of the two halves of the repeat are consistent with the duplication occurring before the divergence of the archael and eukaryotic domains. In conjunction with previous observations of similarity in RNA polymerase subunit composition and sequences and the finding of a transcription factor IIB-like sequence in Pyrococcus woesei (a relative of T. celer) it appears that major features of the eukaryotic transcription apparatus were well-established before the origin of eukaryotic cellular organization. The divergence between the two halves of the archael protein is less than that between the halves of the individual eukaryotic sequences, indicating that the average rate of sequence change in the archael protein has been less than in its eukaryotic counterparts. To the extent that this lower rate applies to the genome as a whole, a clearer picture of the early genes (and gene families) that gave rise to present-day genomes is more apt to emerge from the study of sequences from the Archaea than from the corresponding sequences from eukaryotes.

  6. Comparative and functional genomics of Legionella identified eukaryotic like proteins as key players in host-pathogen interactions

    Directory of Open Access Journals (Sweden)

    Laura eGomez-Valero

    2011-10-01

    Full Text Available Although best known for its ability to cause severe pneumonia in people whose immune defenses are weakened, Legionella pneumophila and Legionella longbeachae are two species of a large genus of bacteria that are ubiquitous in nature, where they parasitize protozoa. Adaptation to the host environment and exploitation of host cell functions are critical for the success of these intracellular pathogens. The establishment and publication of the complete genome sequences of L. pneumophila and L. longbeachae isolates paved the way for major breakthroughs in understanding the biology of these organisms. In this review we present the knowledge gained from the analyses and comparison of the complete genome sequences of different L. pneumophila and L. longbeachae strains. Emphasis is given on putative virulence and Legionella life cycle related functions, such as the identification of an extended array of eukaryotic-like proteins, many of which have been shown to modulate host cell functions to the pathogen's advantage. Surprisingly, many of the eukaryotic domain proteins identified in L. pneumophila as well as many substrates of the Dot/Icm type IV secretion system essential for intracellular replication are different between these two species, although they cause the same disease. Finally, evolutionary aspects regarding the eukaryotic like proteins in Legionella are discussed.

  7. Eukaryotic ribonucleases P/MRP: the crystal structure of the P3 domain.

    Science.gov (United States)

    Perederina, Anna; Esakova, Olga; Quan, Chao; Khanova, Elena; Krasilnikov, Andrey S

    2010-02-17

    Ribonuclease (RNase) P is a site-specific endoribonuclease found in all kingdoms of life. Typical RNase P consists of a catalytic RNA component and a protein moiety. In the eukaryotes, the RNase P lineage has split into two, giving rise to a closely related enzyme, RNase MRP, which has similar components but has evolved to have different specificities. The eukaryotic RNases P/MRP have acquired an essential helix-loop-helix protein-binding RNA domain P3 that has an important function in eukaryotic enzymes and distinguishes them from bacterial and archaeal RNases P. Here, we present a crystal structure of the P3 RNA domain from Saccharomyces cerevisiae RNase MRP in a complex with RNase P/MRP proteins Pop6 and Pop7 solved to 2.7 A. The structure suggests similar structural organization of the P3 RNA domains in RNases P/MRP and possible functions of the P3 domains and proteins bound to them in the stabilization of the holoenzymes' structures as well as in interactions with substrates. It provides the first insight into the structural organization of the eukaryotic enzymes of the RNase P/MRP family.

  8. Crystal structures of two eukaryotic nucleases involved in RNA metabolism

    DEFF Research Database (Denmark)

    Jonstrup, Anette Thyssen; Midtgaard, Søren Fuglsang; Van, Lan Bich

    RNA serves a number of functions in the cell: mRNAs are the carriers of information between gene and protein, tRNAs and rRNAs are involved in the synthesis of proteins, whereas a number of additional RNA species are responsible for other functions in the cell. The quality of the different RNAs...... RNAs. We have solved the structures of two nucleases involved in 3'-5' degradation of RNA; the S. pombe Pop2p and the S. cerevisiae Rrp6p. Pop2p is part of the main cytoplasmatic deadenylation complex in yeast, which also contains the nuclease Ccr4p. Deadenylation, where the poly(A)-tail is removed...... specific transcripts. Here, we present the crystal structure of the S. pombe Pop2p protein to 1.4 Å resolution. The high resolution structure provides a clear picture of the active site architecture. Structural alignment of single nucleotides and poly(A)-oligonucleotides from earlier co-crystal structures...

  9. Maximum likelihood phylogenetic reconstruction from high-resolution whole-genome data and a tree of 68 eukaryotes.

    Science.gov (United States)

    Lin, Yu; Hu, Fei; Tang, Jijun; Moret, Bernard M E

    2013-01-01

    The rapid accumulation of whole-genome data has renewed interest in the study of the evolution of genomic architecture, under such events as rearrangements, duplications, losses. Comparative genomics, evolutionary biology, and cancer research all require tools to elucidate the mechanisms, history, and consequences of those evolutionary events, while phylogenetics could use whole-genome data to enhance its picture of the Tree of Life. Current approaches in the area of phylogenetic analysis are limited to very small collections of closely related genomes using low-resolution data (typically a few hundred syntenic blocks); moreover, these approaches typically do not include duplication and loss events. We describe a maximum likelihood (ML) approach for phylogenetic analysis that takes into account genome rearrangements as well as duplications, insertions, and losses. Our approach can handle high-resolution genomes (with 40,000 or more markers) and can use in the same analysis genomes with very different numbers of markers. Because our approach uses a standard ML reconstruction program (RAxML), it scales up to large trees. We present the results of extensive testing on both simulated and real data showing that our approach returns very accurate results very quickly. In particular, we analyze a dataset of 68 high-resolution eukaryotic genomes, with from 3,000 to 42,000 genes, from the eGOB database; the analysis, including bootstrapping, takes just 3 hours on a desktop system and returns a tree in agreement with all well supported branches, while also suggesting resolutions for some disputed placements.

  10. Genome-wide computational identification of microRNAs and their targets in the deep-branching eukaryote Giardia lamblia.

    Science.gov (United States)

    Zhang, Yan-Qiong; Chen, Dong-Liang; Tian, Hai-Feng; Zhang, Bao-Hong; Wen, Jian-Fan

    2009-10-01

    Using a combined computational program, we identified 50 potential microRNAs (miRNAs) in Giardia lamblia, one of the most primitive unicellular eukaryotes. These miRNAs are unique to G. lamblia and no homologues have been found in other organisms; miRNAs, currently known in other species, were not found in G. lamblia. This suggests that miRNA biogenesis and miRNA-mediated gene regulation pathway may evolve independently, especially in evolutionarily distant lineages. A majority (43) of the predicted miRNAs are located at one single locus; however, some miRNAs have two or more copies in the genome. Among the 58 miRNA genes, 28 are located in the intergenic regions whereas 30 are present in the anti-sense strands of the protein-coding sequences. Five predicted miRNAs are expressed in G. lamblia trophozoite cells evidenced by expressed sequence tags or RT-PCR. Thirty-seven identified miRNAs may target 50 protein-coding genes, including seven variant-specific surface proteins (VSPs). Our findings provide a clue that miRNA-mediated gene regulation may exist in the early stage of eukaryotic evolution, suggesting that it is an important regulation system ubiquitous in eukaryotes.

  11. Three-dimensional structural analysis of eukaryotic flagella/cilia by electron cryo-tomography

    International Nuclear Information System (INIS)

    Bui, Khanh Huy; Pigino, Gaia; Ishikawa, Takashi

    2011-01-01

    Based on the molecular architecture revealed by electron cryo-tomography, the mechanism of the bending motion of eukaryotic flagella/cilia is discussed. Electron cryo-tomography is a potential approach to analyzing the three-dimensional conformation of frozen hydrated biological macromolecules using electron microscopy. Since projections of each individual object illuminated from different orientations are merged, electron tomography is capable of structural analysis of such heterogeneous environments as in vivo or with polymorphism, although radiation damage and the missing wedge are severe problems. Here, recent results on the structure of eukaryotic flagella, which is an ATP-driven bending organelle, from green algae Chlamydomonas are presented. Tomographic analysis reveals asymmetric molecular arrangements, especially that of the dynein motor proteins, in flagella, giving insight into the mechanism of planar asymmetric bending motion. Methodological challenges to obtaining higher-resolution structures from this technique are also discussed

  12. High throughput platforms for structural genomics of integral membrane proteins.

    Science.gov (United States)

    Mancia, Filippo; Love, James

    2011-08-01

    Structural genomics approaches on integral membrane proteins have been postulated for over a decade, yet specific efforts are lagging years behind their soluble counterparts. Indeed, high throughput methodologies for production and characterization of prokaryotic integral membrane proteins are only now emerging, while large-scale efforts for eukaryotic ones are still in their infancy. Presented here is a review of recent literature on actively ongoing structural genomics of membrane protein initiatives, with a focus on those aimed at implementing interesting techniques aimed at increasing our rate of success for this class of macromolecules. Copyright © 2011 Elsevier Ltd. All rights reserved.

  13. Camps 2.0: exploring the sequence and structure space of prokaryotic, eukaryotic, and viral membrane proteins.

    Science.gov (United States)

    Neumann, Sindy; Hartmann, Holger; Martin-Galiano, Antonio J; Fuchs, Angelika; Frishman, Dmitrij

    2012-03-01

    Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers both sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ∼1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/. Copyright © 2011 Wiley Periodicals, Inc.

  14. An alternative method for cDNA cloning from surrogate eukaryotic cells transfected with the corresponding genomic DNA.

    Science.gov (United States)

    Hu, Lin-Yong; Cui, Chen-Chen; Song, Yu-Jie; Wang, Xiang-Guo; Jin, Ya-Ping; Wang, Ai-Hua; Zhang, Yong

    2012-07-01

    cDNA is widely used in gene function elucidation and/or transgenics research but often suitable tissues or cells from which to isolate mRNA for reverse transcription are unavailable. Here, an alternative method for cDNA cloning is described and tested by cloning the cDNA of human LALBA (human alpha-lactalbumin) from genomic DNA. First, genomic DNA containing all of the coding exons was cloned from human peripheral blood and inserted into a eukaryotic expression vector. Next, by delivering the plasmids into either 293T or fibroblast cells, surrogate cells were constructed. Finally, the total RNA was extracted from the surrogate cells and cDNA was obtained by RT-PCR. The human LALBA cDNA that was obtained was compared with the corresponding mRNA published in GenBank. The comparison showed that the two sequences were identical. The novel method for cDNA cloning from surrogate eukaryotic cells described here uses well-established techniques that are feasible and simple to use. We anticipate that this alternative method will have widespread applications.

  15. C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families

    Directory of Open Access Journals (Sweden)

    Cutler Sean R

    2007-06-01

    Full Text Available Abstract Background The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio. Results We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied to the successful reduction of background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. This approach has been able to predict, to our knowledge, all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*, the ER-retention signal (K/HDEL*, the ER-retrieval signal for membrane bound proteins (KKxx*, the prenylation signal (CC* and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists

  16. Informational laws of genome structures

    Science.gov (United States)

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-06-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, to which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.

  17. Intermittency as a universal characteristic of the complete chromosome DNA sequences of eukaryotes: From protozoa to human genomes

    Science.gov (United States)

    Rybalko, S.; Larionov, S.; Poptsova, M.; Loskutov, A.

    2011-10-01

    Large-scale dynamical properties of complete chromosome DNA sequences of eukaryotes are considered. Using the proposed deterministic models with intermittency and symbolic dynamics we describe a wide spectrum of large-scale patterns inherent in these sequences, such as segmental duplications, tandem repeats, and other complex sequence structures. It is shown that the recently discovered gene number balance on the strands is not of a random nature, and certain subsystems of a complete chromosome DNA sequence exhibit the properties of deterministic chaos.

  18. New insights into the structural organization of eukaryotic and prokaryotic cytoskeletons using cryo-electron tomography

    International Nuclear Information System (INIS)

    Kuerner, Julia; Medalia, Ohad; Linaroudis, Alexandros A.; Baumeister, Wolfgang

    2004-01-01

    Cryo-electron tomography (cryo-ET) is an emerging imaging technology that combines the potential of three-dimensional (3-D) imaging at molecular resolution (<5 nm) with a close-to-life preservation of the specimen. In conjunction with pattern recognition techniques, it enables us to map the molecular landscape inside cells. The application of cryo-ET to intact cells provides novel insights into the structure and the spatial organization of the cytoskeleton in prokaryotic and eukaryotic cells

  19. Structural genomics in endocrinology

    NARCIS (Netherlands)

    Smit, J. W.; Romijn, J. A.

    2001-01-01

    Traditionally, endocrine research evolved from the phenotypical characterisation of endocrine disorders to the identification of underlying molecular pathophysiology. This approach has been, and still is, extremely successful. The introduction of genomics and proteomics has resulted in a reversal of

  20. Solution structure of an archaeal DNA binding protein with an eukaryotic zinc finger fold.

    Directory of Open Access Journals (Sweden)

    Florence Guillière

    Full Text Available While the basal transcription machinery in archaea is eukaryal-like, transcription factors in archaea and their viruses are usually related to bacterial transcription factors. Nevertheless, some of these organisms show predicted classical zinc fingers motifs of the C2H2 type, which are almost exclusively found in proteins of eukaryotes and most often associated with transcription regulators. In this work, we focused on the protein AFV1p06 from the hyperthermophilic archaeal virus AFV1. The sequence of the protein consists of the classical eukaryotic C2H2 motif with the fourth histidine coordinating zinc missing, as well as of N- and C-terminal extensions. We showed that the protein AFV1p06 binds zinc and solved its solution structure by NMR. AFV1p06 displays a zinc finger fold with a novel structure extension and disordered N- and C-termini. Structure calculations show that a glutamic acid residue that coordinates zinc replaces the fourth histidine of the C2H2 motif. Electromobility gel shift assays indicate that the protein binds to DNA with different affinities depending on the DNA sequence. AFV1p06 is the first experimentally characterised archaeal zinc finger protein with a DNA binding activity. The AFV1p06 protein family has homologues in diverse viruses of hyperthermophilic archaea. A phylogenetic analysis points out a common origin of archaeal and eukaryotic C2H2 zinc fingers.

  1. Structural studies demonstrating a bacteriophage-like replication cycle of the eukaryote-infecting Paramecium bursaria chlorella virus-1.

    Directory of Open Access Journals (Sweden)

    Elad Milrot

    2017-08-01

    Full Text Available A fundamental stage in viral infection is the internalization of viral genomes in host cells. Although extensively studied, the mechanisms and factors responsible for the genome internalization process remain poorly understood. Here we report our observations, derived from diverse imaging methods on genome internalization of the large dsDNA Paramecium bursaria chlorella virus-1 (PBCV-1. Our studies reveal that early infection stages of this eukaryotic-infecting virus occurs by a bacteriophage-like pathway, whereby PBCV-1 generates a hole in the host cell wall and ejects its dsDNA genome in a linear, base-pair-by-base-pair process, through a membrane tunnel generated by the fusion of the virus internal membrane with the host membrane. Furthermore, our results imply that PBCV-1 DNA condensation that occurs shortly after infection probably plays a role in genome internalization, as hypothesized for the infection of some bacteriophages. The subsequent perforation of the host photosynthetic membranes presumably enables trafficking of viral genomes towards host nuclei. Previous studies established that at late infection stages PBCV-1 generates cytoplasmic organelles, termed viral factories, where viral assembly takes place, a feature characteristic of many large dsDNA viruses that infect eukaryotic organisms. PBCV-1 thus appears to combine a bacteriophage-like mechanism during early infection stages with a eukaryotic-like infection pathway in its late replication cycle.

  2. Functional 5' UTR mRNA structures in eukaryotic translation regulation and how to find them.

    Science.gov (United States)

    Leppek, Kathrin; Das, Rhiju; Barna, Maria

    2018-03-01

    RNA molecules can fold into intricate shapes that can provide an additional layer of control of gene expression beyond that of their sequence. In this Review, we discuss the current mechanistic understanding of structures in 5' untranslated regions (UTRs) of eukaryotic mRNAs and the emerging methodologies used to explore them. These structures may regulate cap-dependent translation initiation through helicase-mediated remodelling of RNA structures and higher-order RNA interactions, as well as cap-independent translation initiation through internal ribosome entry sites (IRESs), mRNA modifications and other specialized translation pathways. We discuss known 5' UTR RNA structures and how new structure probing technologies coupled with prospective validation, particularly compensatory mutagenesis, are likely to identify classes of structured RNA elements that shape post-transcriptional control of gene expression and the development of multicellular organisms.

  3. Structural basis for the initiation of eukaryotic transcription-coupled DNA repair.

    Science.gov (United States)

    Xu, Jun; Lahiri, Indrajit; Wang, Wei; Wier, Adam; Cianfrocco, Michael A; Chong, Jenny; Hare, Alissa A; Dervan, Peter B; DiMaio, Frank; Leschziner, Andres E; Wang, Dong

    2017-11-30

    Eukaryotic transcription-coupled repair (TCR) is an important and well-conserved sub-pathway of nucleotide excision repair that preferentially removes DNA lesions from the template strand that block translocation of RNA polymerase II (Pol II). Cockayne syndrome group B (CSB, also known as ERCC6) protein in humans (or its yeast orthologues, Rad26 in Saccharomyces cerevisiae and Rhp26 in Schizosaccharomyces pombe) is among the first proteins to be recruited to the lesion-arrested Pol II during the initiation of eukaryotic TCR. Mutations in CSB are associated with the autosomal-recessive neurological disorder Cockayne syndrome, which is characterized by progeriod features, growth failure and photosensitivity. The molecular mechanism of eukaryotic TCR initiation remains unclear, with several long-standing unanswered questions. How cells distinguish DNA lesion-arrested Pol II from other forms of arrested Pol II, the role of CSB in TCR initiation, and how CSB interacts with the arrested Pol II complex are all unknown. The lack of structures of CSB or the Pol II-CSB complex has hindered our ability to address these questions. Here we report the structure of the S. cerevisiae Pol II-Rad26 complex solved by cryo-electron microscopy. The structure reveals that Rad26 binds to the DNA upstream of Pol II, where it markedly alters its path. Our structural and functional data suggest that the conserved Swi2/Snf2-family core ATPase domain promotes the forward movement of Pol II, and elucidate key roles for Rad26 in both TCR and transcription elongation.

  4. Comparative Genomic Analysis Reveals a Diverse Repertoire of Genes Involved in Prokaryote-Eukaryote Interactions within the Pseudovibrio Genus.

    Science.gov (United States)

    Romano, Stefano; Fernàndez-Guerra, Antonio; Reen, F Jerry; Glöckner, Frank O; Crowley, Susan P; O'Sullivan, Orla; Cotter, Paul D; Adams, Claire; Dobson, Alan D W; O'Gara, Fergal

    2016-01-01

    Strains of the Pseudovibrio genus have been detected worldwide, mainly as part of bacterial communities associated with marine invertebrates, particularly sponges. This recurrent association has been considered as an indication of a symbiotic relationship between these microbes and their host. Until recently, the availability of only two genomes, belonging to closely related strains, has limited the knowledge on the genomic and physiological features of the genus to a single phylogenetic lineage. Here we present 10 newly sequenced genomes of Pseudovibrio strains isolated from marine sponges from the west coast of Ireland, and including the other two publicly available genomes we performed an extensive comparative genomic analysis. Homogeneity was apparent in terms of both the orthologous genes and the metabolic features shared amongst the 12 strains. At the genomic level, a key physiological difference observed amongst the isolates was the presence only in strain P. axinellae AD2 of genes encoding proteins involved in assimilatory nitrate reduction, which was then proved experimentally. We then focused on studying those systems known to be involved in the interactions with eukaryotic and prokaryotic cells. This analysis revealed that the genus harbors a large diversity of toxin-like proteins, secretion systems and their potential effectors. Their distribution in the genus was not always consistent with the phylogenetic relationship of the strains. Finally, our analyses identified new genomic islands encoding potential toxin-immunity systems, previously unknown in the genus. Our analyses shed new light on the Pseudovibrio genus, indicating a large diversity of both metabolic features and systems for interacting with the host. The diversity in both distribution and abundance of these systems amongst the strains underlines how metabolically and phylogenetically similar bacteria may use different strategies to interact with the host and find a niche within its

  5. Dispersed repetitive sequences in eukaryotic genomes and their possible biological significance

    International Nuclear Information System (INIS)

    Georgiev, G.P.; Kramerov, D.A.; Ryskov, A.P.; Skryabin, K.G.; Lukanidin, E.M.

    1983-01-01

    In this paper is described the properties of a novel mouse mdg-like element, the A2 sequence, which is the most abundant repetitive sequence. We also characterized an ubiquitous B2 sequence that represents, after B1, the dominant family among the short interspersed repeats of the mouse genome. The existence of some putative transposition intermediates was shown for repeats of both A and B types of the mouse genome. These are closed circular DNA of the A type and small polyadenylated B + RNAs. The fundamental question that arises is whether these sequences are simply selfish DNA capable of transpositions or do they fulfill some useful biological functions within the genome. 66 references, 11 figures, 1 table

  6. Comparative genomic analysis of multi-subunit tethering complexes demonstrates an ancient pan-eukaryotic complement and sculpting in Apicomplexa.

    Directory of Open Access Journals (Sweden)

    Christen M Klinger

    Full Text Available Apicomplexa are obligate intracellular parasites that cause tremendous disease burden world-wide. They utilize a set of specialized secretory organelles in their invasive process that require delivery of components for their biogenesis and function, yet the precise mechanisms underpinning such processes remain unclear. One set of potentially important components is the multi-subunit tethering complexes (MTCs, factors increasingly implicated in all aspects of vesicle-target interactions. Prompted by the results of previous studies indicating a loss of membrane trafficking factors in Apicomplexa, we undertook a bioinformatic analysis of MTC conservation. Building on knowledge of the ancient presence of most MTC proteins, we demonstrate the near complete retention of MTCs in the newly available genomes for Guillardiatheta and Bigelowiellanatans. The latter is a key taxonomic sampling point as a basal sister taxa to the group including Apicomplexa. We also demonstrate an ancient origin of the CORVET complex subunits Vps8 and Vps3, as well as the TRAPPII subunit Tca17. Having established that the lineage leading to Apicomplexa did at one point possess the complete eukaryotic complement of MTC components, we undertook a deeper taxonomic investigation in twelve apicomplexan genomes. We observed excellent conservation of the VpsC core of the HOPS and CORVET complexes, as well as the core TRAPP subunits, but sparse conservation of TRAPPII, COG, Dsl1, and HOPS/CORVET-specific subunits. However, those subunits that we did identify appear to be expressed with similar patterns to the fully conserved MTC proteins, suggesting that they may function as minimal complexes or with analogous partners. Strikingly, we failed to identify any subunits of the exocyst complex in all twelve apicomplexan genomes, as well as the dinoflagellate Perkinsus marinus. Overall, we demonstrate reduction of MTCs in Apicomplexa and their ancestors, consistent with modification during

  7. The candidate phylum Poribacteria by single-cell genomics: new insights into phylogeny, cell-compartmentation, eukaryote-like repeat proteins, and other genomic features.

    Directory of Open Access Journals (Sweden)

    Janine Kamke

    Full Text Available The candidate phylum Poribacteria is one of the most dominant and widespread members of the microbial communities residing within marine sponges. Cell compartmentalization had been postulated along with their discovery about a decade ago and their phylogenetic association to the Planctomycetes, Verrucomicrobia, Chlamydiae superphylum was proposed soon thereafter. In the present study we revised these features based on genomic data obtained from six poribacterial single cells. We propose that Poribacteria form a distinct monophyletic phylum contiguous to the PVC superphylum together with other candidate phyla. Our genomic analyses supported the possibility of cell compartmentalization in form of bacterial microcompartments. Further analyses of eukaryote-like protein domains stressed the importance of such proteins with features including tetratricopeptide repeats, leucin rich repeats as well as low density lipoproteins receptor repeats, the latter of which are reported here for the first time from a sponge symbiont. Finally, examining the most abundant protein domain family on poribacterial genomes revealed diverse phyH family proteins, some of which may be related to dissolved organic posphorus uptake.

  8. Structure of Prokaryotic Polyamine Deacetylase Reveals Evolutionary Functional Relationships with Eukaryotic Histone Deacetylases

    Energy Technology Data Exchange (ETDEWEB)

    P Lombardi; H Angell; D Whittington; E Flynn; K Rajashankar; D Christianson

    2011-12-31

    Polyamines are a ubiquitous class of polycationic small molecules that can influence gene expression by binding to nucleic acids. Reversible polyamine acetylation regulates nucleic acid binding and is required for normal cell cycle progression and proliferation. Here, we report the structures of Mycoplana ramosa acetylpolyamine amidohydrolase (APAH) complexed with a transition state analogue and a hydroxamate inhibitor and an inactive mutant complexed with two acetylpolyamine substrates. The structure of APAH is the first of a histone deacetylase-like oligomer and reveals that an 18-residue insert in the L2 loop promotes dimerization and the formation of an 18 {angstrom} long 'L'-shaped active site tunnel at the dimer interface, accessible only to narrow and flexible substrates. The importance of dimerization for polyamine deacetylase function leads to the suggestion that a comparable dimeric or double-domain histone deacetylase could catalyze polyamine deacetylation reactions in eukaryotes.

  9. Genome-wide analysis of the phosphoinositide kinome from two ciliates reveals novel evolutionary links for phosphoinositide kinases in eukaryotic cells.

    Directory of Open Access Journals (Sweden)

    George Leondaritis

    Full Text Available BACKGROUND: The complexity of phosphoinositide signaling in higher eukaryotes is partly due to expansion of specific families and types of phosphoinositide kinases (PIKs that can generate all phosphoinositides via multiple routes. This is particularly evident in the PI3Ks and PIPKs, and it is considered an evolutionary trait associated with metazoan diversification. Yet, there are limited comprehensive studies on the PIK repertoire of free living unicellular organisms. METHODOLOGY/PRINCIPAL FINDINGS: We undertook a genome-wide analysis of putative PIK genes in two free living ciliated cells, Tetrahymena and Paramecium. The Tetrahymena thermophila and Paramecium tetraurelia genomes were probed with representative kinases from all families and types. Putative homologs were verified by EST, microarray and deep RNA sequencing database searches and further characterized for domain structure, catalytic efficiency, expression patterns and phylogenetic relationships. In total, we identified and characterized 22 genes in the Tetrahymena thermophila genome and 62 highly homologues genes in Paramecium tetraurelia suggesting a tight evolutionary conservation in the ciliate lineage. Comparison to the kinome of fungi reveals a significant expansion of PIK genes in ciliates. CONCLUSIONS/SIGNIFICANCE: Our study highlights four important aspects concerning ciliate and other unicellular PIKs. First, ciliate-specific expansion of PI4KIII-like genes. Second, presence of class I PI3Ks which, at least in Tetrahymena, are associated with a metazoan-type machinery for PIP3 signaling. Third, expansion of divergent PIPK enzymes such as the recently described type IV transmembrane PIPKs. Fourth, presence of possible type II PIPKs and presumably inactive PIKs (hence, pseudo-PIKs not previously described. Taken together, our results provide a solid framework for future investigation of the roles of PIKs in ciliates and indicate that novel functions and novel regulatory

  10. xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud[OPEN

    Science.gov (United States)

    Merchant, Nirav

    2016-01-01

    Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957

  11. Structure of a Eukaryotic CLC Transporter Defines an Intermediate State in the Transport Cycle

    International Nuclear Information System (INIS)

    Feng, Liang; Campbell, Ernest B.; Hsiung, Yichun; MacKinnon, Roderick

    2010-01-01

    CLC proteins transport chloride (Cl - ) ions across cell membranes to control the electrical potential of muscle cells, transfer electrolytes across epithelia, and control the pH and electrolyte composition of intracellular organelles. Some members of this protein family are Cl - ion channels, whereas others are secondary active transporters that exchange Cl - ions and protons (H + ) with a 2:1 stoichiometry. We have determined the structure of a eukaryotic CLC transporter at 3.5 angstrom resolution. Cytoplasmic cystathionine beta-synthase (CBS) domains are strategically positioned to regulate the ion-transport pathway, and many disease-causing mutations in human CLCs reside on the CBS-transmembrane interface. Comparison with prokaryotic CLC shows that a gating glutamate residue changes conformation and suggests a basis for 2:1 Cl - /H + exchange and a simple mechanistic connection between CLC channels and transporters.

  12. Structure of a eukaryotic CLC transporter defines an intermediate state in the transport cycle

    Science.gov (United States)

    Feng, Liang; Campbell, Ernest B.; Hsiung, Yichun; MacKinnon, Roderick

    2011-01-01

    CLC proteins transport Cl− ions across cell membranes to control the electrical potential of muscle cells, transfer electrolytes across epithelia, and control the pH and electrolyte composition of intracellular organelles. Some members of this protein family are Cl− ion channels, while others are secondary active transporters that exchange Cl− ions and H+ with a 2:1 stoichiometry. We have determined the structure of a eukaryotic CLC transporter at 3.5 Å resolution. Cytoplasmic CBS domains are strategically positioned to regulate the ion transport pathway, and many disease-causing mutations in human CLCs reside on the CBS-transmembrane interface. Comparison with prokaryotic CLC shows that a gating glutamate changes conformation and suggests a basis for 2:1 Cl−/H+ exchange and a simple mechanistic connection between CLC channels and transporters. PMID:20929736

  13. Structural-Functional Organization of the Eukaryotic Cell Nucleus and Transcription Regulation: Introduction to This Special Issue of Biochemistry (Moscow).

    Science.gov (United States)

    Razin, S V

    2018-04-01

    This issue of Biochemistry (Moscow) is devoted to the cell nucleus and mechanisms of transcription regulation. Over the years, biochemical processes in the cell nucleus have been studied in isolation, outside the context of their spatial organization. Now it is clear that segregation of functional processes within a compartmentalized cell nucleus is very important for the implementation of basic genetic processes. The functional compartmentalization of the cell nucleus is closely related to the spatial organization of the genome, which in turn plays a key role in the operation of epigenetic mechanisms. In this issue of Biochemistry (Moscow), we present a selection of review articles covering the functional architecture of the eukaryotic cell nucleus, the mechanisms of genome folding, the role of stochastic processes in establishing 3D architecture of the genome, and the impact of genome spatial organization on transcription regulation.

  14. Functional Insights from Structural Genomics

    Energy Technology Data Exchange (ETDEWEB)

    Forouhar,F.; Kuzin, A.; Seetharaman, J.; Lee, I.; Zhou, W.; Abashidze, M.; Chen, Y.; Montelione, G.; Tong, L.; et al

    2007-01-01

    Structural genomics efforts have produced structural information, either directly or by modeling, for thousands of proteins over the past few years. While many of these proteins have known functions, a large percentage of them have not been characterized at the functional level. The structural information has provided valuable functional insights on some of these proteins, through careful structural analyses, serendipity, and structure-guided functional screening. Some of the success stories based on structures solved at the Northeast Structural Genomics Consortium (NESG) are reported here. These include a novel methyl salicylate esterase with important role in plant innate immunity, a novel RNA methyltransferase (H. influenzae yggJ (HI0303)), a novel spermidine/spermine N-acetyltransferase (B. subtilis PaiA), a novel methyltransferase or AdoMet binding protein (A. fulgidus AF{_}0241), an ATP:cob(I)alamin adenosyltransferase (B. subtilis YvqK), a novel carboxysome pore (E. coli EutN), a proline racemase homolog with a disrupted active site (B. melitensis BME11586), an FMN-dependent enzyme (S. pneumoniae SP{_}1951), and a 12-stranded {beta}-barrel with a novel fold (V. parahaemolyticus VPA1032).

  15. Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches

    Energy Technology Data Exchange (ETDEWEB)

    Chandonia, John-Marc; Brenner, Steven E.

    2004-07-14

    The structural genomics project is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy which is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the Pfam5000 strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These include including complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random selection of approximately 5000 targets from sequenced genomes. We measure the impact that successful implementation of these strategies would have upon structural interpretation of the proteins in Swiss-Prot, TrEMBL, and 131 complete proteomes (including 10 of eukaryotes) from the Proteome Analysis database at EBI. Solving the structures of proteins from the 5000 largest Pfam families would allow accurate fold assignment for approximately 68 percent of all prokaryotic proteins (covering 59 percent of residues) and 61 percent of eukaryotic proteins (40 percent of residues). More fine-grained coverage which would allow accurate modeling of these proteins would require an order of magnitude more targets. The Pfam5000 strategy may be modified in several ways, for example to focus on larger families, bacterial sequences, or eukaryotic sequences; as long as secondary consideration is given to large families within Pfam, coverage results vary only slightly. In contrast, focusing structural genomics on a single tractable genome would have only a limited impact in structural knowledge of other proteomes: a significant fraction (about 30-40 percent of the proteins, and 40-60 percent of the residues) of each proteome is classified in small

  16. 2004 Structural, Function and Evolutionary Genomics

    Energy Technology Data Exchange (ETDEWEB)

    Douglas L. Brutlag Nancy Ryan Gray

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  17. Megabase replication domains along the human genome: relation to chromatin structure and genome organisation.

    Science.gov (United States)

    Audit, Benjamin; Zaghloul, Lamia; Baker, Antoine; Arneodo, Alain; Chen, Chun-Long; d'Aubenton-Carafa, Yves; Thermes, Claude

    2013-01-01

    In higher eukaryotes, the absence of specific sequence motifs, marking the origins of replication has been a serious hindrance to the understanding of (i) the mechanisms that regulate the spatio-temporal replication program, and (ii) the links between origins activation, chromatin structure and transcription. In this chapter, we review the partitioning of the human genome into megabased-size replication domains delineated as N-shaped motifs in the strand compositional asymmetry profiles. They collectively span 28.3% of the genome and are bordered by more than 1,000 putative replication origins. We recapitulate the comparison of this partition of the human genome with high-resolution experimental data that confirms that replication domain borders are likely to be preferential replication initiation zones in the germline. In addition, we highlight the specific distribution of experimental and numerical chromatin marks along replication domains. Domain borders correspond to particular open chromatin regions, possibly encoded in the DNA sequence, and around which replication and transcription are highly coordinated. These regions also present a high evolutionary breakpoint density, suggesting that susceptibility to breakage might be linked to local open chromatin fiber state. Altogether, this chapter presents a compartmentalization of the human genome into replication domains that are landmarks of the human genome organization and are likely to play a key role in genome dynamics during evolution and in pathological situations.

  18. Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters

    Directory of Open Access Journals (Sweden)

    Gagniuc Paul

    2012-09-01

    Full Text Available Abstract Background The main function of gene promoters appears to be the integration of different gene products in their biological pathways in order to maintain homeostasis. Generally, promoters have been classified in two major classes, namely TATA and CpG. Nevertheless, many genes using the same combinatorial formation of transcription factors have different gene expression patterns. Accordingly, we tried to ask ourselves some fundamental questions: Why certain genes have an overall predisposition for higher gene expression levels than others? What causes such a predisposition? Is there a structural relationship of these sequences in different tissues? Is there a strong phylogenetic relationship between promoters of closely related species? Results In order to gain valuable insights into different promoter regions, we obtained a series of image-based patterns which allowed us to identify 10 generic classes of promoters. A comprehensive analysis was undertaken for promoter sequences from Arabidopsis thaliana, Drosophila melanogaster, Homo sapiens and Oryza sativa, and a more extensive analysis of tissue-specific promoters in humans. We observed a clear preference for these species to use certain classes of promoters for specific biological processes. Moreover, in humans, we found that different tissues use distinct classes of promoters, reflecting an emerging promoter network. Depending on the tissue type, comparisons made between these classes of promoters reveal a complementarity between their patterns whereas some other classes of promoters have been observed to occur in competition. Furthermore, we also noticed the existence of some transitional states between these classes of promoters that may explain certain evolutionary mechanisms, which suggest a possible predisposition for specific levels of gene expression and perhaps for a different number of factors responsible for triggering gene expression. Our conclusions are based on

  19. Prokaryotic and eukaryotic features observed on the secondary structures of Giardia SSU rRNAs and its phylogenetic implications.

    Science.gov (United States)

    Hwang, Ui Wook

    2007-04-01

    Phylogenetic position of a diplomonad protist Giardia, a principle cause of diarrhea, among eukaryotes has been vigorously debated so far. Through the comparisons of primary and secondary structures of SSU rRNAs of G. intestinalis, G. microti, G. ardeae, and G. muris, I found two major indel regions (a 6-nt indel and a 22-26-nt indel), which correspond to the helix 10 of the V2 region and helices E23-8 to E23-9 of the V4 region, respectively. As generally shown in eukaryotes, G. intestinalis and G. microti have commonly a relatively longer helix 10 (a 7-bp stem and a 4-nt loop), and also the eukaryote-specific helices E23-6 to E23-9. On the other hand, G. muris and G. ardeae have a shorter helix 10: a 2-bp stem and a 6-nt loop in G. ardeae and a 3-bp stem and a 6-nt loop in G. muris. In the V4, they have a single long helix (like the P23-1 helix in prokaryotes) instead of the helices E23-6 to E23-9. Among the four Giardia species, co-appearance of prokaryote- and eukaryote-typical features might be significant evidence to suggest that Giardia (Archezoa) is a living fossil showing an "intermediate stage" during the evolution from prokaryotes to eukaryotes.

  20. De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis.

    Science.gov (United States)

    Nowrousian, Minou; Stajich, Jason E; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D; Pöggeler, Stefanie; Read, Nick D; Seiler, Stephan; Smith, Kristina M; Zickler, Denise; Kück, Ulrich; Freitag, Michael

    2010-04-08

    Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for

  1. De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis.

    Directory of Open Access Journals (Sweden)

    Minou Nowrousian

    2010-04-01

    Full Text Available Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data

  2. Structural and dynamic characterization of eukaryotic gene regulatory protein domains in solution

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Andrew Loyd [Univ. of California, Berkeley, CA (United States). Dept. of Chemistry

    1996-05-01

    Solution NMR was primarily used to characterize structure and dynamics in two different eukaryotic protein systems: the δ-Al-ε activation domain from c-jun and the Drosophila RNA-binding protein Sex-lethal. The second system is the Drosophila Sex-lethal (Sxl) protein, an RNA-binding protein which is the ``master switch`` in sex determination. Sxl contains two adjacent RNA-binding domains (RBDs) of the RNP consensus-type. The NMR spectrum of the second RBD (Sxl-RBD2) was assigned using multidimensional heteronuclear NMR, and an intermediate-resolution family of structures was calculated from primarily NOE distance restraints. The overall fold was determined to be similar to other RBDs: a βαβ-βαβ pattern of secondary structure, with the two helices packed against a 4-stranded anti-parallel β-sheet. In addition 15N T1, T2, and 15N/1H NOE relaxation measurements were carried out to characterize the backbone dynamics of Sxl-RBD2 in solution. RNA corresponding to the polypyrimidine tract of transformer pre-mRNA was generated and titrated into 3 different Sxl-RBD protein constructs. Combining Sxl-RBD1+2 (bht RBDs) with this RNA formed a specific, high affinity protein/RNA complex that is amenable to further NMR characterization. The backbone 1H, 13C, and 15N resonances of Sxl-RBD1+2 were assigned using a triple-resonance approach, and 15N relaxation experiments were carried out to characterize the backbone dynamics of this complex. The changes in chemical shift in Sxl-RBD1+2 upon binding RNA are observed using Sxl-RBD2 as a substitute for unbound Sxl-RBD1+2. This allowed the binding interface to be qualitatively mapped for the second domain.

  3. The ARTT motif and a unified structural understanding of substraterecognition in ADP ribosylating bacterial toxins and eukaryotic ADPribosyltransferases

    Energy Technology Data Exchange (ETDEWEB)

    Han, S.; Tainer, J.A.

    2001-08-01

    ADP-ribosylation is a widely occurring and biologically critical covalent chemical modification process in pathogenic mechanisms, intracellular signaling systems, DNA repair, and cell division. The reaction is catalyzed by ADP-ribosyltransferases, which transfer the ADP-ribose moiety of NAD to a target protein with nicotinamide release. A family of bacterial toxins and eukaryotic enzymes has been termed the mono-ADP-ribosyltransferases, in distinction to the poly-ADP-ribosyltransferases, which catalyze the addition of multiple ADP-ribose groups to the carboxyl terminus of eukaryotic nucleoproteins. Despite the limited primary sequence homology among the different ADP-ribosyltransferases, a central cleft bearing NAD-binding pocket formed by the two perpendicular b-sheet core has been remarkably conserved between bacterial toxins and eukaryotic mono- and poly-ADP-ribosyltransferases. The majority of bacterial toxins and eukaryotic mono-ADP-ribosyltransferases are characterized by conserved His and catalytic Glu residues. In contrast, Diphtheria toxin, Pseudomonas exotoxin A, and eukaryotic poly-ADP-ribosyltransferases are characterized by conserved Arg and catalytic Glu residues. The NAD-binding core of a binary toxin and a C3-like toxin family identified an ARTT motif (ADP-ribosylating turn-turn motif) that is implicated in substrate specificity and recognition by structural and mutagenic studies. Here we apply structure-based sequence alignment and comparative structural analyses of all known structures of ADP-ribosyltransfeases to suggest that this ARTT motif is functionally important in many ADP-ribosylating enzymes that bear a NAD binding cleft as characterized by conserved Arg and catalytic Glu residues. Overall, structure-based sequence analysis reveals common core structures and conserved active sites of ADP-ribosyltransferases to support similar NAD binding mechanisms but differing mechanisms of target protein binding via sequence variations within the ARTT

  4. Origin and evolution of the self-organizing cytoskeleton in the network of eukaryotic organelles.

    Science.gov (United States)

    Jékely, Gáspár

    2014-09-02

    The eukaryotic cytoskeleton evolved from prokaryotic cytomotive filaments. Prokaryotic filament systems show bewildering structural and dynamic complexity and, in many aspects, prefigure the self-organizing properties of the eukaryotic cytoskeleton. Here, the dynamic properties of the prokaryotic and eukaryotic cytoskeleton are compared, and how these relate to function and evolution of organellar networks is discussed. The evolution of new aspects of filament dynamics in eukaryotes, including severing and branching, and the advent of molecular motors converted the eukaryotic cytoskeleton into a self-organizing "active gel," the dynamics of which can only be described with computational models. Advances in modeling and comparative genomics hold promise of a better understanding of the evolution of the self-organizing cytoskeleton in early eukaryotes, and its role in the evolution of novel eukaryotic functions, such as amoeboid motility, mitosis, and ciliary swimming. Copyright © 2014 Cold Spring Harbor Laboratory Press; all rights reserved.

  5. Genomics technologies to study structural variations in the grapevine genome

    Directory of Open Access Journals (Sweden)

    Cardone Maria Francesca

    2016-01-01

    Full Text Available Grapevine is one of the most important crop plants in the world. Recently there was great expansion of genomics resources about grapevine genome, thus providing increasing efforts for molecular breeding. Current cultivars display a great level of inter-specific differentiation that needs to be investigated to reach a comprehensive understanding of the genetic basis of phenotypic differences, and to find responsible genes selected by cross breeding programs. While there have been significant advances in resolving the pattern and nature of single nucleotide polymorphisms (SNPs on plant genomes, few data are available on copy number variation (CNV. Furthermore association between structural variations and phenotypes has been described in only a few cases. We combined high throughput biotechnologies and bioinformatics tools, to reveal the first inter-varietal atlas of structural variation (SV for the grapevine genome. We sequenced and compared four table grape cultivars with the Pinot noir inbred line PN40024 genome as the reference. We detected roughly 8% of the grapevine genome affected by genomic variations. Taken into account phenotypic differences existing among the studied varieties we performed comparison of SVs among them and the reference and next we performed an in-depth analysis of gene content of polymorphic regions. This allowed us to identify genes showing differences in copy number as putative functional candidates for important traits in grapevine cultivation.

  6. Multi-scale coding of genomic information: From DNA sequence to genome structure and function

    International Nuclear Information System (INIS)

    Arneodo, Alain; Vaillant, Cedric; Audit, Benjamin; Argoul, Francoise; D'Aubenton-Carafa, Yves; Thermes, Claude

    2011-01-01

    Understanding how chromatin is spatially and dynamically organized in the nucleus of eukaryotic cells and how this affects genome functions is one of the main challenges of cell biology. Since the different orders of packaging in the hierarchical organization of DNA condition the accessibility of DNA sequence elements to trans-acting factors that control the transcription and replication processes, there is actually a wealth of structural and dynamical information to learn in the primary DNA sequence. In this review, we show that when using concepts, methodologies, numerical and experimental techniques coming from statistical mechanics and nonlinear physics combined with wavelet-based multi-scale signal processing, we are able to decipher the multi-scale sequence encoding of chromatin condensation-decondensation mechanisms that play a fundamental role in regulating many molecular processes involved in nuclear functions.

  7. Functional 5′ UTR mRNA structures in eukaryotic translation regulation and how to find them

    Science.gov (United States)

    Leppek, Kathrin; Das, Rhiju; Barna, Maria

    2017-01-01

    RNA molecules can fold into intricate shapes that can provide an additional layer of control of gene expression beyond that of their sequence. In this Review, we discuss the current mechanistic understanding of structures in 5′ untranslated regions (UTRs) of eukaryotic mRNAs and the emerging methodologies used to explore them. These structures may regulate cap-dependent translation initiation through helicase-mediated remodelling of RNA structures and higher-order RNA interactions, as well as cap-independent translation initiation through internal ribosome entry sites (IRESs), mRNA modifications and other specialized translation pathways. We discuss known 5′ UTR RNA structures and how new structure probing technologies coupled with prospective validation, particularly compensatory mutagenesis, are likely to identify classes of structured RNA elements that shape post-transcriptional control of gene expression and the development of multicellular organisms. PMID:29165424

  8. Genome-wide analyses and functional classification of proline repeat-rich proteins: potential role of eIF5A in eukaryotic evolution.

    Directory of Open Access Journals (Sweden)

    Ajeet Mandal

    Full Text Available The eukaryotic translation factor, eIF5A has been recently reported as a sequence-specific elongation factor that facilitates peptide bond formation at consecutive prolines in Saccharomyces cerevisiae, as its ortholog elongation factor P (EF-P does in bacteria. We have searched the genome databases of 35 representative organisms from six kingdoms of life for PPP (Pro-Pro-Pro and/or PPG (Pro-Pro-Gly-encoding genes whose expression is expected to depend on eIF5A. We have made detailed analyses of proteome data of 5 selected species, Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster, Mus musculus and Homo sapiens. The PPP and PPG motifs are low in the prokaryotic proteomes. However, their frequencies markedly increase with the biological complexity of eukaryotic organisms, and are higher in newly derived proteins than in those orthologous proteins commonly shared in all species. Ontology classifications of S. cerevisiae and human genes encoding the highest level of polyprolines reveal their strong association with several specific biological processes, including actin/cytoskeletal associated functions, RNA splicing/turnover, DNA binding/transcription and cell signaling. Previously reported phenotypic defects in actin polarity and mRNA decay of eIF5A mutant strains are consistent with the proposed role for eIF5A in the translation of the polyproline-containing proteins. Of all the amino acid tandem repeats (≥3 amino acids, only the proline repeat frequency correlates with functional complexity of the five organisms examined. Taken together, these findings suggest the importance of proline repeat-rich proteins and a potential role for eIF5A and its hypusine modification pathway in the course of eukaryotic evolution.

  9. Archaeal MCM Proteins as an Analog for the Eukaryotic Mcm2–7 Helicase to Reveal Essential Features of Structure and Function

    Science.gov (United States)

    Miller, Justin M.; Enemark, Eric J.

    2015-01-01

    In eukaryotes, the replicative helicase is the large multisubunit CMG complex consisting of the Mcm2–7 hexameric ring, Cdc45, and the tetrameric GINS complex. The Mcm2–7 ring assembles from six different, related proteins and forms the core of this complex. In archaea, a homologous MCM hexameric ring functions as the replicative helicase at the replication fork. Archaeal MCM proteins form thermostable homohexamers, facilitating their use as models of the eukaryotic Mcm2–7 helicase. Here we review archaeal MCM helicase structure and function and how the archaeal findings relate to the eukaryotic Mcm2–7 ring. PMID:26539061

  10. SINEs, evolution and genome structure in the opossum.

    Science.gov (United States)

    Gu, Wanjun; Ray, David A; Walker, Jerilyn A; Barnes, Erin W; Gentles, Andrew J; Samollow, Paul B; Jurka, Jerzy; Batzer, Mark A; Pollock, David D

    2007-07-01

    Short INterspersed Elements (SINEs) are non-autonomous retrotransposons, usually between 100 and 500 base pairs (bp) in length, which are ubiquitous components of eukaryotic genomes. Their activity, distribution, and evolution can be highly informative on genomic structure and evolutionary processes. To determine recent activity, we amplified more than one hundred SINE1 loci in a panel of 43 M. domestica individuals derived from five diverse geographic locations. The SINE1 family has expanded recently enough that many loci were polymorphic, and the SINE1 insertion-based genetic distances among populations reflected geographic distance. Genome-wide comparisons of SINE1 densities and GC content revealed that high SINE1 density is associated with high GC content in a few long and many short spans. Young SINE1s, whether fixed or polymorphic, showed an unbiased GC content preference for insertion, indicating that the GC preference accumulates over long time periods, possibly in periodic bursts. SINE1 evolution is thus broadly similar to human Alu evolution, although it has an independent origin. High GC content adjacent to SINE1s is strongly correlated with bias towards higher AT to GC substitutions and lower GC to AT substitutions. This is consistent with biased gene conversion, and also indicates that like chickens, but unlike eutherian mammals, GC content heterogeneity (isochore structure) is reinforced by substitution processes in the M. domestica genome. Nevertheless, both high and low GC content regions are apparently headed towards lower GC content equilibria, possibly due to a relative shift to lower recombination rates in the recent Monodelphis ancestral lineage. Like eutherians, metatherian (marsupial) mammals have evolved high CpG substitution rates, but this is apparently a convergence in process rather than a shared ancestral state.

  11. Genome of Phaeocystis globosa virus PgV-16T highlights the common ancestry of the largest known DNA viruses infecting eukaryotes

    Science.gov (United States)

    Santini, Sebastien; Jeudy, Sandra; Bartoli, Julia; Poirot, Olivier; Lescot, Magali; Abergel, Chantal; Barbe, Valérie; Wommack, K. Eric; Noordeloos, Anna A. M.; Brussaard, Corina P. D.; Claverie, Jean-Michel

    2013-01-01

    Large dsDNA viruses are involved in the population control of many globally distributed species of eukaryotic phytoplankton and have a prominent role in bloom termination. The genus Phaeocystis (Haptophyta, Prymnesiophyceae) includes several high-biomass-forming phytoplankton species, such as Phaeocystis globosa, the blooms of which occur mostly in the coastal zone of the North Atlantic and the North Sea. Here, we report the 459,984-bp-long genome sequence of P. globosa virus strain PgV-16T, encoding 434 proteins and eight tRNAs and, thus, the largest fully sequenced genome to date among viruses infecting algae. Surprisingly, PgV-16T exhibits no phylogenetic affinity with other viruses infecting microalgae (e.g., phycodnaviruses), including those infecting Emiliania huxleyi, another ubiquitous bloom-forming haptophyte. Rather, PgV-16T belongs to an emerging clade (the Megaviridae) clustering the viruses endowed with the largest known genomes, including Megavirus, Mimivirus (both infecting acanthamoeba), and a virus infecting the marine microflagellate grazer Cafeteria roenbergensis. Seventy-five percent of the best matches of PgV-16T–predicted proteins correspond to two viruses [Organic Lake phycodnavirus (OLPV)1 and OLPV2] from a hypersaline lake in Antarctica (Organic Lake), the hosts of which are unknown. As for OLPVs and other Megaviridae, the PgV-16T sequence data revealed the presence of a virophage-like genome. However, no virophage particle was detected in infected P. globosa cultures. The presence of many genes found only in Megaviridae in its genome and the presence of an associated virophage strongly suggest that PgV-16T shares a common ancestry with the largest known dsDNA viruses, the host range of which already encompasses the earliest diverging branches of domain Eukarya. PMID:23754393

  12. Origins and evolution of viruses of eukaryotes: The ultimate modularity

    International Nuclear Information System (INIS)

    Koonin, Eugene V.; Dolja, Valerian V.; Krupovic, Mart

    2015-01-01

    Viruses and other selfish genetic elements are dominant entities in the biosphere, with respect to both physical abundance and genetic diversity. Various selfish elements parasitize on all cellular life forms. The relative abundances of different classes of viruses are dramatically different between prokaryotes and eukaryotes. In prokaryotes, the great majority of viruses possess double-stranded (ds) DNA genomes, with a substantial minority of single-stranded (ss) DNA viruses and only limited presence of RNA viruses. In contrast, in eukaryotes, RNA viruses account for the majority of the virome diversity although ssDNA and dsDNA viruses are common as well. Phylogenomic analysis yields tangible clues for the origins of major classes of eukaryotic viruses and in particular their likely roots in prokaryotes. Specifically, the ancestral genome of positive-strand RNA viruses of eukaryotes might have been assembled de novo from genes derived from prokaryotic retroelements and bacteria although a primordial origin of this class of viruses cannot be ruled out. Different groups of double-stranded RNA viruses derive either from dsRNA bacteriophages or from positive-strand RNA viruses. The eukaryotic ssDNA viruses apparently evolved via a fusion of genes from prokaryotic rolling circle-replicating plasmids and positive-strand RNA viruses. Different families of eukaryotic dsDNA viruses appear to have originated from specific groups of bacteriophages on at least two independent occasions. Polintons, the largest known eukaryotic transposons, predicted to also form virus particles, most likely, were the evolutionary intermediates between bacterial tectiviruses and several groups of eukaryotic dsDNA viruses including the proposed order “Megavirales” that unites diverse families of large and giant viruses. Strikingly, evolution of all classes of eukaryotic viruses appears to have involved fusion between structural and replicative gene modules derived from different sources

  13. Origins and evolution of viruses of eukaryotes: The ultimate modularity

    Energy Technology Data Exchange (ETDEWEB)

    Koonin, Eugene V., E-mail: koonin@ncbi.nlm.nih.gov [National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 (United States); Dolja, Valerian V., E-mail: doljav@science.oregonstate.edu [Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331 (United States); Krupovic, Mart, E-mail: krupovic@pasteur.fr [Institut Pasteur, Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Department of Microbiology, Paris 75015 (France)

    2015-05-15

    Viruses and other selfish genetic elements are dominant entities in the biosphere, with respect to both physical abundance and genetic diversity. Various selfish elements parasitize on all cellular life forms. The relative abundances of different classes of viruses are dramatically different between prokaryotes and eukaryotes. In prokaryotes, the great majority of viruses possess double-stranded (ds) DNA genomes, with a substantial minority of single-stranded (ss) DNA viruses and only limited presence of RNA viruses. In contrast, in eukaryotes, RNA viruses account for the majority of the virome diversity although ssDNA and dsDNA viruses are common as well. Phylogenomic analysis yields tangible clues for the origins of major classes of eukaryotic viruses and in particular their likely roots in prokaryotes. Specifically, the ancestral genome of positive-strand RNA viruses of eukaryotes might have been assembled de novo from genes derived from prokaryotic retroelements and bacteria although a primordial origin of this class of viruses cannot be ruled out. Different groups of double-stranded RNA viruses derive either from dsRNA bacteriophages or from positive-strand RNA viruses. The eukaryotic ssDNA viruses apparently evolved via a fusion of genes from prokaryotic rolling circle-replicating plasmids and positive-strand RNA viruses. Different families of eukaryotic dsDNA viruses appear to have originated from specific groups of bacteriophages on at least two independent occasions. Polintons, the largest known eukaryotic transposons, predicted to also form virus particles, most likely, were the evolutionary intermediates between bacterial tectiviruses and several groups of eukaryotic dsDNA viruses including the proposed order “Megavirales” that unites diverse families of large and giant viruses. Strikingly, evolution of all classes of eukaryotic viruses appears to have involved fusion between structural and replicative gene modules derived from different sources

  14. Current Perspectives of Telomerase Structure and Function in Eukaryotes with Emerging Views on Telomerase in Human Parasites.

    Science.gov (United States)

    Dey, Abhishek; Chakrabarti, Kausik

    2018-01-24

    Replicative capacity of a cell is strongly correlated with telomere length regulation. Aberrant lengthening or reduction in the length of telomeres can lead to health anomalies, such as cancer or premature aging. Telomerase is a master regulator for maintaining replicative potential in most eukaryotic cells. It does so by controlling telomere length at chromosome ends. Akin to cancer cells, most single-cell eukaryotic pathogens are highly proliferative and require persistent telomerase activity to maintain constant length of telomere and propagation within their host. Although telomerase is key to unlimited cellular proliferation in both cases, not much was known about the role of telomerase in human parasites (malaria, Trypanosoma , etc.) until recently. Since telomerase regulation is mediated via its own structural components, interactions with catalytic reverse transcriptase and several factors that can recruit and assemble telomerase to telomeres in a cell cycle-dependent manner, we compare and discuss here recent findings in telomerase biology in cancer, aging and parasitic diseases to give a broader perspective of telomerase function in human diseases.

  15. Consistent mutational paths predict eukaryotic thermostability

    Directory of Open Access Journals (Sweden)

    van Noort Vera

    2013-01-01

    Full Text Available Abstract Background Proteomes of thermophilic prokaryotes have been instrumental in structural biology and successfully exploited in biotechnology, however many proteins required for eukaryotic cell function are absent from bacteria or archaea. With Chaetomium thermophilum, Thielavia terrestris and Thielavia heterothallica three genome sequences of thermophilic eukaryotes have been published. Results Studying the genomes and proteomes of these thermophilic fungi, we found common strategies of thermal adaptation across the different kingdoms of Life, including amino acid biases and a reduced genome size. A phylogenetics-guided comparison of thermophilic proteomes with those of other, mesophilic Sordariomycetes revealed consistent amino acid substitutions associated to thermophily that were also present in an independent lineage of thermophilic fungi. The most consistent pattern is the substitution of lysine by arginine, which we could find in almost all lineages but has not been extensively used in protein stability engineering. By exploiting mutational paths towards the thermophiles, we could predict particular amino acid residues in individual proteins that contribute to thermostability and validated some of them experimentally. By determining the three-dimensional structure of an exemplar protein from C. thermophilum (Arx1, we could also characterise the molecular consequences of some of these mutations. Conclusions The comparative analysis of these three genomes not only enhances our understanding of the evolution of thermophily, but also provides new ways to engineer protein stability.

  16. Universal internucleotide statistics in full genomes: a footprint of the DNA structure and packaging?

    Directory of Open Access Journals (Sweden)

    Mikhail I Bogachev

    Full Text Available Uncovering the fundamental laws that govern the complex DNA structural organization remains challenging and is largely based upon reconstructions from the primary nucleotide sequences. Here we investigate the distributions of the internucleotide intervals and their persistence properties in complete genomes of various organisms from Archaea and Bacteria to H. Sapiens aiming to reveal the manifestation of the universal DNA architecture. We find that in all considered organisms the internucleotide interval distributions exhibit the same [Formula: see text]-exponential form. While in prokaryotes a single [Formula: see text]-exponential function makes the best fit, in eukaryotes the PDF contains additionally a second [Formula: see text]-exponential, which in the human genome makes a perfect approximation over nearly 10 decades. We suggest that this functional form is a footprint of the heterogeneous DNA structure, where the first [Formula: see text]-exponential reflects the universal helical pitch that appears both in pro- and eukaryotic DNA, while the second [Formula: see text]-exponential is a specific marker of the large-scale eukaryotic DNA organization.

  17. Structural genomic variation in ischemic stroke

    Science.gov (United States)

    Matarin, Mar; Simon-Sanchez, Javier; Fung, Hon-Chung; Scholz, Sonja; Gibbs, J. Raphael; Hernandez, Dena G.; Crews, Cynthia; Britton, Angela; Wavrant De Vrieze, Fabienne; Brott, Thomas G.; Brown, Robert D.; Worrall, Bradford B.; Silliman, Scott; Case, L. Douglas; Hardy, John A.; Rich, Stephen S.; Meschia, James F.; Singleton, Andrew B.

    2008-01-01

    Technological advances in molecular genetics allow rapid and sensitive identification of genomic copy number variants (CNVs). This, in turn, has sparked interest in the function such variation may play in disease. While a role for copy number mutations as a cause of Mendelian disorders is well established, it is unclear whether CNVs may affect risk for common complex disorders. We sought to investigate whether CNVs may modulate risk for ischemic stroke (IS) and to provide a catalog of CNVs in patients with this disorder by analyzing copy number metrics produced as a part of our previous genome-wide single-nucleotide polymorphism (SNP)-based association study of ischemic stroke in a North American white population. We examined CNVs in 263 patients with ischemic stroke (IS). Each identified CNV was compared with changes identified in 275 neurologically normal controls. Our analysis identified 247 CNVs, corresponding to 187 insertions (76%; 135 heterozygous; 25 homozygous duplications or triplications; 2 heterosomic) and 60 deletions (24%; 40 heterozygous deletions;3 homozygous deletions; 14 heterosomic deletions). Most alterations (81%) were the same as, or overlapped with, previously reported CNVs. We report here the first genome-wide analysis of CNVs in IS patients. In summary, our study did not detect any common genomic structural variation unequivocally linked to IS, although we cannot exclude that smaller CNVs or CNVs in genomic regions poorly covered by this methodology may confer risk for IS. The application of genome-wide SNP arrays now facilitates the evaluation of structural changes through the entire genome as part of a genome-wide genetic association study. PMID:18288507

  18. Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets, and Homology Models.

    Directory of Open Access Journals (Sweden)

    2005-08-01

    Full Text Available The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB, target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB, it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.

  19. Higher order structure in the 3'-minor domain of small subunit ribosomal RNAs from a gram negative bacterium, a gram positive bacterium and a eukaryote

    DEFF Research Database (Denmark)

    Douthwaite, S; Christensen, A; Garrett, R A

    1983-01-01

    of additional higher order structure in the renatured free RNA. It can be concluded that a high level of conservation of higher order structure has occurred during the evolution of the gram negative and gram positive eubacteria and the eukaryote in both the double helical regions and the "unstructured" regions...

  20. Quantitation of base substitutions in eukaryotic 5S rRNA: selection for the maintenance of RNA secondary structure.

    Science.gov (United States)

    Curtiss, W C; Vournakis, J N

    1984-01-01

    Eukaryotic 5S rRNA sequences from 34 diverse species were compared by the following method: (1) The sequences were aligned; (2) the positions of substitutions were located by comparison of all possible pairs of sequences; (3) the substitution sites were mapped to an assumed general base pairing model; and (4) the R-Y model of base stacking was used to study stacking pattern relationships in the structure. An analysis of the sequence and structure variability in each region of the molecule is presented. It was found that the degree of base substitution varies over a wide range, from absolute conservation to occurrence of over 90% of the possible observable substitutions. The substitutions are located primarily in stem regions of the 5S rRNA secondary structure. More than 88% of the substitutions in helical regions maintain base pairing. The disruptive substitutions are primarily located at the edges of helical regions, resulting in shortening of the helical regions and lengthening of the adjacent nonpaired regions. Base stacking patterns determined by the R-Y model are mapped onto the general secondary structure. Intrastrand and interstrand stacking could stabilize alternative coaxial structures and limit the conformational flexibility of nonpaired regions. Two short contiguous regions are 100% conserved in all species. This may reflect evolutionary constraints imposed at the DNA level by the requirement for binding of a 5S gene transcription initiation factor during gene expression.

  1. The Jigsaw Puzzle of mRNA Translation Initiation in Eukaryotes: A Decade of Structures Unraveling the Mechanics of the Process.

    Science.gov (United States)

    Hashem, Yaser; Frank, Joachim

    2018-03-01

    Translation initiation in eukaryotes is a highly regulated and rate-limiting process. It results in the assembly and disassembly of numerous transient and intermediate complexes involving over a dozen eukaryotic initiation factors (eIFs). This process culminates in the accommodation of a start codon marking the beginning of an open reading frame at the appropriate ribosomal site. Although this process has been extensively studied by hundreds of groups for nearly half a century, it has been only recently, especially during the last decade, that we have gained deeper insight into the mechanics of the eukaryotic translation initiation process. This advance in knowledge is due in part to the contributions of structural biology, which have shed light on the molecular mechanics underlying the different functions of various eukaryotic initiation factors. In this review, we focus exclusively on the contribution of structural biology to the understanding of the eukaryotic initiation process, a long-standing jigsaw puzzle that is just starting to yield the bigger picture. Expected final online publication date for the Annual Review of Biophysics Volume 47 is May 20, 2018. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

  2. Using Genomics for Natural Product Structure Elucidation.

    Science.gov (United States)

    Tietz, Jonathan I; Mitchell, Douglas A

    2016-01-01

    Natural products (NPs) are the most historically bountiful source of chemical matter for drug development-especially for anti-infectives. With insights gleaned from genome mining, interest in natural product discovery has been reinvigorated. An essential stage in NP discovery is structural elucidation, which sheds light not only on the chemical composition of a molecule but also its novelty, properties, and derivatization potential. The history of structure elucidation is replete with techniquebased revolutions: combustion analysis, crystallography, UV, IR, MS, and NMR have each provided game-changing advances; the latest such advance is genomics. All natural products have a genetic basis, and the ability to obtain and interpret genomic information for structure elucidation is increasingly available at low cost to non-specialists. In this review, we describe the value of genomics as a structural elucidation technique, especially from the perspective of the natural product chemist approaching an unknown metabolite. Herein we first introduce the databases and programs of interest to the natural products chemist, with an emphasis on those currently most suited for general usability. We describe strategies for linking observed natural product-linked phenotypes to their corresponding gene clusters. We then discuss techniques for extracting structural information from genes, illustrated with numerous case examples. We also provide an analysis of the biases and limitations of the field with recommendations for future development. Our overview is not only aimed at biologically-oriented researchers already at ease with bioinformatic techniques, but also, in particular, at natural product, organic, and/or medicinal chemists not previously familiar with genomic techniques.

  3. Structural analyses of Legionella LepB reveal a new GAP fold that catalytically mimics eukaryotic RasGAP.

    Science.gov (United States)

    Yu, Qin; Hu, Liyan; Yao, Qing; Zhu, Yongqun; Dong, Na; Wang, Da-Cheng; Shao, Feng

    2013-06-01

    Rab GTPases are emerging targets of diverse bacterial pathogens. Here, we perform biochemical and structural analyses of LepB, a Rab GTPase-activating protein (GAP) effector from Legionella pneumophila. We map LepB GAP domain to residues 313-618 and show that the GAP domain is Rab1 specific with a catalytic activity higher than the canonical eukaryotic TBC GAP and the newly identified VirA/EspG family of bacterial RabGAP effectors. Exhaustive mutation analyses identify Arg444 as the arginine finger, but no catalytically essential glutamine residues. Crystal structures of LepB313-618 alone and the GAP domain of Legionella drancourtii LepB in complex with Rab1-GDP-AlF3 support the catalytic role of Arg444, and also further reveal a 3D architecture and a GTPase-binding mode distinct from all known GAPs. Glu449, structurally equivalent to TBC RabGAP glutamine finger in apo-LepB, undergoes a drastic movement upon Rab1 binding, which induces Rab1 Gln70 side-chain flipping towards GDP-AlF3 through a strong ionic interaction. This conformationally rearranged Gln70 acts as the catalytic cis-glutamine, therefore uncovering an unexpected RasGAP-like catalytic mechanism for LepB. Our studies highlight an extraordinary structural and catalytic diversity of RabGAPs, particularly those from bacterial pathogens.

  4. Interrogating the druggable genome with structural informatics.

    Science.gov (United States)

    Hambly, Kevin; Danzer, Joseph; Muskal, Steven; Debe, Derek A

    2006-08-01

    Structural genomics projects are producing protein structure data at an unprecedented rate. In this paper, we present the Target Informatics Platform (TIP), a novel structural informatics approach for amplifying the rapidly expanding body of experimental protein structure information to enhance the discovery and optimization of small molecule protein modulators on a genomic scale. In TIP, existing experimental structure information is augmented using a homology modeling approach, and binding sites across multiple target families are compared using a clique detection algorithm. We report here a detailed analysis of the structural coverage for the set of druggable human targets, highlighting drug target families where the level of structural knowledge is currently quite high, as well as those areas where structural knowledge is sparse. Furthermore, we demonstrate the utility of TIP's intra- and inter-family binding site similarity analysis using a series of retrospective case studies. Our analysis underscores the utility of a structural informatics infrastructure for extracting drug discovery-relevant information from structural data, aiding researchers in the identification of lead discovery and optimization opportunities as well as potential "off-target" liabilities.

  5. 3D structure of eukaryotic flagella/cilia by cryo-electron tomography.

    Science.gov (United States)

    Ishikawa, Takashi

    2013-01-01

    Flagella/cilia are motile organelles with more than 400 proteins. To understand the mechanism of such complex systems, we need methods to describe molecular arrange-ments and conformations three-dimensionally in vivo. Cryo-electron tomography enabled us such a 3D structural analysis. Our group has been working on 3D structure of flagella/cilia using this method and revealed highly ordered and beautifully organized molecular arrangement. 3D structure gave us insights into the mechanism to gener-ate bending motion with well defined waveforms. In this review, I summarize our recent structural studies on fla-gella/cilia by cryo-electron tomography, mainly focusing on dynein microtubule-based ATPase motor proteins and the radial spoke, a regulatory protein complex.

  6. Crystal Structure of a Eukaryotic GEN1 Resolving Enzyme Bound to DNA

    Directory of Open Access Journals (Sweden)

    Yijin Liu

    2015-12-01

    Full Text Available We present the crystal structure of the junction-resolving enzyme GEN1 bound to DNA at 2.5 Å resolution. The structure of the GEN1 protein reveals it to have an elaborated FEN-XPG family fold that is modified for its role in four-way junction resolution. The functional unit in the crystal is a monomer of active GEN1 bound to the product of resolution cleavage, with an extensive DNA binding interface for both helical arms. Within the crystal lattice, a GEN1 dimer interface juxtaposes two products, whereby they can be reconnected into a four-way junction, the structure of which agrees with that determined in solution. The reconnection requires some opening of the DNA structure at the center, in agreement with permanganate probing and 2-aminopurine fluorescence. The structure shows that a relaxation of the DNA structure accompanies cleavage, suggesting how second-strand cleavage is accelerated to ensure productive resolution of the junction.

  7. National Academy of Sciences and Academy of Sciences of the USSR workshop on structure of the eucaryotic genome and regulation of its expression. Final report

    Energy Technology Data Exchange (ETDEWEB)

    1990-12-31

    This report provides a brief overview of the Workshop on Structure of the Eukaryotic Genome and Regulation of its Expression held in Tbilisi, Georgia, USSR. The report describes the presentations made at the meeting but also goes on to describe the state of molecular biology and genetics research in the Soviet Union and makes recommendations on how to improve future such meetings.

  8. National Academy of Sciences and Academy of Sciences of the USSR workshop on structure of the eucaryotic genome and regulation of its expression

    Energy Technology Data Exchange (ETDEWEB)

    1990-01-01

    This report provides a brief overview of the Workshop on Structure of the Eukaryotic Genome and Regulation of its Expression held in Tbilisi, Georgia, USSR. The report describes the presentations made at the meeting but also goes on to describe the state of molecular biology and genetics research in the Soviet Union and makes recommendations on how to improve future such meetings.

  9. How often do they have sex? A comparative analysis of the population structure of seven eukaryotic microbial pathogens.

    Directory of Open Access Journals (Sweden)

    Nicolás Tomasini

    Full Text Available The model of predominant clonal evolution (PCE proposed for micropathogens does not state that genetic exchange is totally absent, but rather, that it is too rare to break the prevalent PCE pattern. However, the actual impact of this "residual" genetic exchange should be evaluated. Multilocus Sequence Typing (MLST is an excellent tool to explore the problem. Here, we compared online available MLST datasets for seven eukaryotic microbial pathogens: Trypanosoma cruzi, the Fusarium solani complex, Aspergillus fumigatus, Blastocystis subtype 3, the Leishmania donovani complex, Candida albicans and Candida glabrata. We first analyzed phylogenetic relationships among genotypes within each dataset. Then, we examined different measures of branch support and incongruence among loci as signs of genetic structure and levels of past recombination. The analyses allow us to identify three types of genetic structure. The first was characterized by trees with well-supported branches and low levels of incongruence suggesting well-structured populations and PCE. This was the case for the T. cruzi and F. solani datasets. The second genetic structure, represented by Blastocystis spp., A. fumigatus and the L. donovani complex datasets, showed trees with weakly-supported branches but low levels of incongruence among loci, whereby genetic structuration was not clearly defined by MLST. Finally, trees showing weakly-supported branches and high levels of incongruence among loci were observed for Candida species, suggesting that genetic exchange has a higher evolutionary impact in these mainly clonal yeast species. Furthermore, simulations showed that MLST may fail to show right clustering in population datasets even in the absence of genetic exchange. In conclusion, these results make it possible to infer variable impacts of genetic exchange in populations of predominantly clonal micro-pathogens. Moreover, our results reveal different problems of MLST to determine the

  10. 3D structure of eukaryotic flagella in a quiescent state revealed by cryo-electron tomography

    Science.gov (United States)

    Nicastro, Daniela; McIntosh, J. Richard; Baumeister, Wolfgang

    2005-01-01

    We have used cryo-electron tomography to investigate the 3D structure and macromolecular organization of intact, frozen-hydrated sea urchin sperm flagella in a quiescent state. The tomographic reconstructions provide information at a resolution better than 6 nm about the in situ arrangements of macromolecules that are key for flagellar motility. We have visualized the heptameric rings of the motor domains in the outer dynein arm complex and determined that they lie parallel to the plane that contains the axes of neighboring flagellar microtubules. Both the material associated with the central pair of microtubules and the radial spokes display a plane of symmetry that helps to explain the planar beat pattern of these flagella. Cryo-electron tomography has proven to be a powerful technique for helping us understand the relationships between flagellar structure and function and the design of macromolecular machines in situ. PMID:16246999

  11. Structure of a eukaryotic voltage-gated sodium channel at near-atomic resolution.

    Science.gov (United States)

    Shen, Huaizong; Zhou, Qiang; Pan, Xiaojing; Li, Zhangqiang; Wu, Jianping; Yan, Nieng

    2017-03-03

    Voltage-gated sodium (Na v ) channels are responsible for the initiation and propagation of action potentials. They are associated with a variety of channelopathies and are targeted by multiple pharmaceutical drugs and natural toxins. Here, we report the cryogenic electron microscopy structure of a putative Na v channel from American cockroach (designated Na v PaS) at 3.8 angstrom resolution. The voltage-sensing domains (VSDs) of the four repeats exhibit distinct conformations. The entrance to the asymmetric selectivity filter vestibule is guarded by heavily glycosylated and disulfide bond-stabilized extracellular loops. On the cytoplasmic side, a conserved amino-terminal domain is placed below VSD I , and a carboxy-terminal domain binds to the III-IV linker. The structure of Na v PaS establishes an important foundation for understanding function and disease mechanism of Na v and related voltage-gated calcium channels. Copyright © 2017, American Association for the Advancement of Science.

  12. Labeling of eukaryotic messenger RNA 5' terminus with phosphorus -32: use of tobacco acid pyrophosphatase for removal of cap structures

    International Nuclear Information System (INIS)

    Lockard, R.E.; Rieser, L.; Vournakis, J.N.

    1981-01-01

    In recent years, there has been a growing appreciation of the potential applications of 5'- 32 P-end-labeled mRNA, not only for screening recombinant clones and mapping gene structure, but also for revealing possible nucleotide sequence and structural signals within mRNA molecules themselves, which may be important for eukaryotic mRNA processing and turnover and for controlling differential rates of translational initiation. Three major problems, however, have retarded progress in this area, lack of methods for efficient and reproducible removal of m7G5ppp5'-cap structures, which maintain the integrity of an RNA molecule; inability to generate a sufficient amount of labeled mRNA, owing to the limited availability of most pure mRNA species; and the frequent problem of RNA degradation during in vitro end-labeling owing to RNAse contamination. The procedures presented here permit one to decap and label minute quantities of mRNA, effectively. Tobacco acid pyrophosphatase is relatively efficient in removing cap structures from even nanogram quantities of available mRNA, and enough radioactivity can be easily generated from minute amounts ofintact mRNA with very high-specific-activity [gamma- 32 P]ATP and the inhibition of ribonuclease contamination with diethylpyrocarbonate. These procedures can be modified and applied to almost any other type of RNA molecule as well. In Section III of this volume, we explore in detail how effectively 5'-end-labeled mRNA can be used not only for nucleotide sequence analysis, but also for mapping mRNA secondary structure

  13. Gene Composer in a structural genomics environment

    International Nuclear Information System (INIS)

    Lorimer, Don; Raymond, Amy; Mixon, Mark; Burgin, Alex; Staker, Bart; Stewart, Lance

    2011-01-01

    For structural biology applications, protein-construct engineering is guided by comparative sequence analysis and structural information, which allow the researcher to better define domain boundaries for terminal deletions and nonconserved regions for surface mutants. A database software application called Gene Composer has been developed to facilitate construct design. The structural genomics effort at the Seattle Structural Genomics Center for Infectious Disease (SSGCID) requires the manipulation of large numbers of amino-acid sequences and the underlying DNA sequences which are to be cloned into expression vectors. To improve efficiency in high-throughput protein structure determination, a database software package, Gene Composer, has been developed which facilitates the information-rich design of protein constructs and their underlying gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bioinformatics steps used in modern structure-guided protein engineering and synthetic gene engineering. An example of the structure determination of H1N1 RNA-dependent RNA polymerase PB2 subunit is given

  14. Structural genomic variations and Parkinson's disease.

    Science.gov (United States)

    Bandrés-Ciga, Sara; Ruz, Clara; Barrero, Francisco J; Escamilla-Sevilla, Francisco; Pelegrina, Javier; Vives, Francisco; Duran, Raquel

    2017-10-01

    Parkinson's disease (PD) is the second most common neurodegenerative disease, whose prevalence is projected to be between 8.7 and 9.3 million by 2030. Until about 20 years ago, PD was considered to be the textbook example of a "non-genetic" disorder. Nowadays, PD is generally considered a multifactorial disorder that arises from the combination and complex interaction of genes and environmental factors. To date, a total of 7 genes including SNCA, LRRK2, PARK2, DJ-1, PINK 1, VPS35 and ATP13A2 have been seen to cause unequivocally Mendelian PD. Also, variants with incomplete penetrance in the genes LRRK2 and GBA are considered to be strong risk factors for PD worldwide. Although genetic studies have provided valuable insights into the pathogenic mechanisms underlying PD, the role of structural variation in PD has been understudied in comparison with other genomic variations. Structural genomic variations might substantially account for such genetic substrates yet to be discovered. The present review aims to provide an overview of the structural genomic variants implicated in the pathogenesis of PD.

  15. Comparative genome analysis of three eukaryotic parasites with differing abilities to transform leukocytes reveals key mediators of theileria-induced leukocyte transformation

    KAUST Repository

    Hayashida, Kyoko

    2012-09-04

    We sequenced the genome of Theileria orientalis, a tick-borne apicomplexan protozoan parasite of cattle. The focus of this study was a comparative genome analysis of T. orientalis relative to other highly pathogenic Theileria species, T. parva and T. annulata. T. parva and T. annulata induce transformation of infected cells of lymphocyte or macrophage/monocyte lineages; in contrast, T. orientalis does not induce uncontrolled proliferation of infected leukocytes and multiplies predominantly within infected erythrocytes. While synteny across homologous chromosomes of the three Theileria species was found to be well conserved overall, subtelomeric structures were found to differ substantially, as T. orientalis lacks the large tandemly arrayed subtelomere-encoded variable secreted protein-encoding gene family. Moreover, expansion of particular gene families by gene duplication was found in the genomes of the two transforming Theileria species, most notably, the TashAT/TpHN and Tar/Tpr gene families. Gene families that are present only in T. parva and T. annulata and not in T. orientalis, Babesia bovis, or Plasmo-dium were also identified. Identification of differences between the genome sequences of Theileria species with different abilities to transform and immortalize bovine leukocytes will provide insight into proteins and mechanisms that have evolved to induce and regulate this process. The T. orientalis genome database is available at http://totdb.czc.hokudai.ac.jp/. 2012 Hayashida et al. T.

  16. Patterns of prokaryotic lateral gene transfers affecting parasitic microbial eukaryotes

    DEFF Research Database (Denmark)

    Alsmark, Cecilia; Foster, Peter G; Sicheritz-Pontén, Thomas

    2013-01-01

    BACKGROUND: The influence of lateral gene transfer on gene origins and biology in eukaryotes is poorly understood compared with those of prokaryotes. A number of independent investigations focusing on specific genes, individual genomes, or specific functional categories from various eukaryotes have...... approach to systematically investigate lateral gene transfer affecting the proteomes of thirteen, mainly parasitic, microbial eukaryotes, representing four of the six eukaryotic super-groups. All of the genomes investigated have been significantly affected by prokaryote-to-eukaryote lateral gene transfers...... indicated that lateral gene transfer does indeed affect eukaryotic genomes. However, the lack of common methodology and criteria in these studies makes it difficult to assess the general importance and influence of lateral gene transfer on eukaryotic genome evolution. RESULTS: We used a phylogenomic...

  17. RNase MRP and the RNA processing cascade in the eukaryotic ancestor.

    Science.gov (United States)

    Woodhams, Michael D; Stadler, Peter F; Penny, David; Collins, Lesley J

    2007-02-08

    Within eukaryotes there is a complex cascade of RNA-based macromolecules that process other RNA molecules, especially mRNA, tRNA and rRNA. An example is RNase MRP processing ribosomal RNA (rRNA) in ribosome biogenesis. One hypothesis is that this complexity was present early in eukaryotic evolution; an alternative is that an initial simpler network later gained complexity by gene duplication in lineages that led to animals, fungi and plants. Recently there has been a rapid increase in support for the complexity-early theory because the vast majority of these RNA-processing reactions are found throughout eukaryotes, and thus were likely to be present in the last common ancestor of living eukaryotes, herein called the Eukaryotic Ancestor. We present an overview of the RNA processing cascade in the Eukaryotic Ancestor and investigate in particular, RNase MRP which was previously thought to have evolved later in eukaryotes due to its apparent limited distribution in fungi and animals and plants. Recent publications, as well as our own genomic searches, find previously unknown RNase MRP RNAs, indicating that RNase MRP has a wide distribution in eukaryotes. Combining secondary structure and promoter region analysis of RNAs for RNase MRP, along with analysis of the target substrate (rRNA), allows us to discuss this distribution in the light of eukaryotic evolution. We conclude that RNase MRP can now be placed in the RNA-processing cascade of the Eukaryotic Ancestor, highlighting the complexity of RNA-processing in early eukaryotes. Promoter analyses of MRP-RNA suggest that regulation of the critical processes of rRNA cleavage can vary, showing that even these key cellular processes (for which we expect high conservation) show some species-specific variability. We present our consensus MRP-RNA secondary structure as a useful model for further searches.

  18. Structural Molecular Components of Septate Junctions in Cnidarians Point to the Origin of Epithelial Junctions in Eukaryotes

    KAUST Repository

    Ganot, P.

    2014-09-21

    Septate junctions (SJs) insure barrier properties and control paracellular diffusion of solutes across epithelia in invertebrates. However, the origin and evolution of their molecular constituents in Metazoa have not been firmly established. Here, we investigated the genomes of early branching metazoan representatives to reconstruct the phylogeny of the molecular components of SJs. Although Claudins and SJ cytoplasmic adaptor components appeared successively throughout metazoan evolution, the structural components of SJs arose at the time of Placozoa/Cnidaria/Bilateria radiation. We also show that in the scleractinian coral Stylophora pistillata, the structural SJ component Neurexin IV colocalizes with the cortical actin network at the apical border of the cells, at the place of SJs. We propose a model for SJ components in Cnidaria. Moreover, our study reveals an unanticipated diversity of SJ structural component variants in cnidarians. This diversity correlates with gene-specific expression in calcifying and noncalcifying tissues, suggesting specific paracellular pathways across the cell layers of these diploblastic animals.

  19. The COG database: an updated version includes eukaryotes

    Directory of Open Access Journals (Sweden)

    Sverdlov Alexander V

    2003-09-01

    Full Text Available Abstract Background The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. Results We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens, one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe, and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set. This conserved portion of the

  20. Local chromatin structure of heterochromatin regulates repeated DNA stability, nucleolus structure, and genome integrity

    Energy Technology Data Exchange (ETDEWEB)

    Peng, Jamy C. [Univ. of California, Berkeley, CA (United States)

    2007-01-01

    Heterochromatin constitutes a significant portion of the genome in higher eukaryotes; approximately 30% in Drosophila and human. Heterochromatin contains a high repeat DNA content and a low density of protein-encoding genes. In contrast, euchromatin is composed mostly of unique sequences and contains the majority of single-copy genes. Genetic and cytological studies demonstrated that heterochromatin exhibits regulatory roles in chromosome organization, centromere function and telomere protection. As an epigenetically regulated structure, heterochromatin formation is not defined by any DNA sequence consensus. Heterochromatin is characterized by its association with nucleosomes containing methylated-lysine 9 of histone H3 (H3K9me), heterochromatin protein 1 (HP1) that binds H3K9me, and Su(var)3-9, which methylates H3K9 and binds HP1. Heterochromatin formation and functions are influenced by HP1, Su(var)3-9, and the RNA interference (RNAi) pathway. My thesis project investigates how heterochromatin formation and function impact nuclear architecture, repeated DNA organization, and genome stability in Drosophila melanogaster. H3K9me-based chromatin reduces extrachromosomal DNA formation; most likely by restricting the access of repair machineries to repeated DNAs. Reducing extrachromosomal ribosomal DNA stabilizes rDNA repeats and the nucleolus structure. H3K9me-based chromatin also inhibits DNA damage in heterochromatin. Cells with compromised heterochromatin structure, due to Su(var)3-9 or dcr-2 (a component of the RNAi pathway) mutations, display severe DNA damage in heterochromatin compared to wild type. In these mutant cells, accumulated DNA damage leads to chromosomal defects such as translocations, defective DNA repair response, and activation of the G2-M DNA repair and mitotic checkpoints that ensure cellular and animal viability. My thesis research suggests that DNA replication, repair, and recombination mechanisms in heterochromatin differ from those in

  1. DNA to DNA transcription might exist in eukaryotic cells

    OpenAIRE

    Li, Gao-De

    2016-01-01

    Till now, in biological sciences, the term, transcription, mainly refers to DNA to RNA transcription. But our recently published experimental findings obtained from Plasmodium falciparum strongly suggest the existence of DNA to DNA transcription in the genome of eukaryotic cells, which could shed some light on the functions of certain noncoding DNA in the human and other eukaryotic genomes.

  2. Comparative genomics of phylogenetically diverse unicellular eukaryotes provide new insights into the genetic basis for the evolution of the programmed cell death machinery.

    Science.gov (United States)

    Nedelcu, Aurora M

    2009-03-01

    Programmed cell death (PCD) represents a significant component of normal growth and development in multicellular organisms. Recently, PCD-like processes have been reported in single-celled eukaryotes, implying that some components of the PCD machinery existed early in eukaryotic evolution. This study provides a comparative analysis of PCD-related sequences across more than 50 unicellular genera from four eukaryotic supergroups: Unikonts, Excavata, Chromalveolata, and Plantae. A complex set of PCD-related sequences that correspond to domains or proteins associated with all main functional classes--from ligands and receptors to executors of PCD--was found in many unicellular lineages. Several PCD domains and proteins previously thought to be restricted to animals or land plants are also present in unicellular species. Noteworthy, the yeast, Saccharomyces cerevisiae--used as an experimental model system for PCD research, has a rather reduced set of PCD-related sequences relative to other unicellular species. The phylogenetic distribution of the PCD-related sequences identified in unicellular lineages suggests that the genetic basis for the evolution of the complex PCD machinery present in extant multicellular lineages has been established early in the evolution of eukaryotes. The shaping of the PCD machinery in multicellular lineages involved the duplication, co-option, recruitment, and shuffling of domains already present in their unicellular ancestors.

  3. How natural a kind is "eukaryote?".

    Science.gov (United States)

    Doolittle, W Ford

    2014-06-02

    Systematics balances uneasily between realism and nominalism, uncommitted as to whether biological taxa are discoveries or inventions. If the former, they might be taken as natural kinds. I briefly review some philosophers' concepts of natural kinds and then argue that several of these apply well enough to "eukaryote." Although there are some sticky issues around genomic chimerism and when eukaryotes first appeared, if we allow for degrees in the naturalness of kinds, existing eukaryotes rank highly, higher than prokaryotes. Most biologists feel this intuitively: All I attempt to do here is provide some conceptual justification. Copyright © 2014 Cold Spring Harbor Laboratory Press; all rights reserved.

  4. Genomic hypomethylation in the human germline associates with selective structural mutability in the human genome.

    Directory of Open Access Journals (Sweden)

    Jian Li

    Full Text Available The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR mediated by low-copy repeats (LCRs. Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ~1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR-mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease.

  5. Child Development and Structural Variation in the Human Genome

    Science.gov (United States)

    Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

    2013-01-01

    Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…

  6. Automatic generation of gene finders for eukaryotic species

    DEFF Research Database (Denmark)

    Terkelsen, Kasper Munch; Krogh, A.

    2006-01-01

    and quality of reliable gene annotation grows. Results We present a procedure, Agene, that automatically generates a species-specific gene predictor from a set of reliable mRNA sequences and a genome. We apply a Hidden Markov model (HMM) that implements explicit length distribution modelling for all gene......Background The number of sequenced eukaryotic genomes is rapidly increasing. This means that over time it will be hard to keep supplying customised gene finders for each genome. This calls for procedures to automatically generate species-specific gene finders and to re-train them as the quantity...... structure blocks using acyclic discrete phase type distributions. The state structure of the each HMM is generated dynamically from an array of sub-models to include only gene features represented in the training set. Conclusion Acyclic discrete phase type distributions are well suited to model sequence...

  7. Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure.

    Science.gov (United States)

    Gordon, Sean P; Contreras-Moreira, Bruno; Woods, Daniel P; Des Marais, David L; Burgess, Diane; Shu, Shengqiang; Stritt, Christoph; Roulin, Anne C; Schackwitz, Wendy; Tyler, Ludmila; Martin, Joel; Lipzen, Anna; Dochy, Niklas; Phillips, Jeremy; Barry, Kerrie; Geuten, Koen; Budak, Hikmet; Juenger, Thomas E; Amasino, Richard; Caicedo, Ana L; Goodstein, David; Davidson, Patrick; Mur, Luis A J; Figueroa, Melania; Freeling, Michael; Catalan, Pilar; Vogel, John P

    2017-12-19

    While prokaryotic pan-genomes have been shown to contain many more genes than any individual organism, the prevalence and functional significance of differentially present genes in eukaryotes remains poorly understood. Whole-genome de novo assembly and annotation of 54 lines of the grass Brachypodium distachyon yield a pan-genome containing nearly twice the number of genes found in any individual genome. Genes present in all lines are enriched for essential biological functions, while genes present in only some lines are enriched for conditionally beneficial functions (e.g., defense and development), display faster evolutionary rates, lie closer to transposable elements and are less likely to be syntenic with orthologous genes in other grasses. Our data suggest that differentially present genes contribute substantially to phenotypic variation within a eukaryote species, these genes have a major influence in population genetics, and transposable elements play a key role in pan-genome evolution.

  8. Transfer of DNA from Bacteria to Eukaryotes

    Directory of Open Access Journals (Sweden)

    Benoît Lacroix

    2016-07-01

    Full Text Available Historically, the members of the Agrobacterium genus have been considered the only bacterial species naturally able to transfer and integrate DNA into the genomes of their eukaryotic hosts. Yet, increasing evidence suggests that this ability to genetically transform eukaryotic host cells might be more widespread in the bacterial world. Indeed, analyses of accumulating genomic data reveal cases of horizontal gene transfer from bacteria to eukaryotes and suggest that it represents a significant force in adaptive evolution of eukaryotic species. Specifically, recent reports indicate that bacteria other than Agrobacterium, such as Bartonella henselae (a zoonotic pathogen, Rhizobium etli (a plant-symbiotic bacterium related to Agrobacterium, or even Escherichia coli, have the ability to genetically transform their host cells under laboratory conditions. This DNA transfer relies on type IV secretion systems (T4SSs, the molecular machines that transport macromolecules during conjugative plasmid transfer and also during transport of proteins and/or DNA to the eukaryotic recipient cells. In this review article, we explore the extent of possible transfer of genetic information from bacteria to eukaryotic cells as well as the evolutionary implications and potential applications of this transfer.

  9. Comparative genomics of the relationship between gene structure and expression

    NARCIS (Netherlands)

    Ren, X.

    2006-01-01

    The relationship between the structure of genes and their expression is a relatively new aspect of genome organization and regulation. With more genome sequences and expression data becoming available, bioinformatics approaches can help the further elucidation of the relationships between gene

  10. Structural biology at York Structural Biology Laboratory; laboratory information management systems for structural genomics

    Czech Academy of Sciences Publication Activity Database

    Dohnálek, Jan

    2005-01-01

    Roč. 12, č. 1 (2005), s. 3 ISSN 1211-5894. [Meeting of Structural Biologists /4./. 10.03.2005-12.03.2005, Nové Hrady] R&D Projects: GA MŠk(CZ) 1K05008 Keywords : structural biology * LIMS * structural genomics Subject RIV: CD - Macromolecular Chemistry

  11. Structural Genomics of Minimal Organisms: Pipeline and Results

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain a near complete three-dimensional (3D) structural information of all soluble proteins of two minimal organisms, closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up from target selection, cloning, expression, purification, and ultimately structural determination. At the time of this writing, structural information of more than 93percent of all soluble proteins of M. genitalium is avail able. This chapter summarizes the approaches taken by the authors' center.

  12. Visualization of RNA structure models within the Integrative Genomics Viewer.

    Science.gov (United States)

    Busan, Steven; Weeks, Kevin M

    2017-07-01

    Analyses of the interrelationships between RNA structure and function are increasingly important components of genomic studies. The SHAPE-MaP strategy enables accurate RNA structure probing and realistic structure modeling of kilobase-length noncoding RNAs and mRNAs. Existing tools for visualizing RNA structure models are not suitable for efficient analysis of long, structurally heterogeneous RNAs. In addition, structure models are often advantageously interpreted in the context of other experimental data and gene annotation information, for which few tools currently exist. We have developed a module within the widely used and well supported open-source Integrative Genomics Viewer (IGV) that allows visualization of SHAPE and other chemical probing data, including raw reactivities, data-driven structural entropies, and data-constrained base-pair secondary structure models, in context with linear genomic data tracks. We illustrate the usefulness of visualizing RNA structure in the IGV by exploring structure models for a large viral RNA genome, comparing bacterial mRNA structure in cells with its structure under cell- and protein-free conditions, and comparing a noncoding RNA structure modeled using SHAPE data with a base-pairing model inferred through sequence covariation analysis. © 2017 Busan and Weeks; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  13. Eukaryotic acquisition of a bacterial operon

    Science.gov (United States)

    The yeast Saccharomyces cerevisiae is one of the champions of basic biomedical research due to its compact eukaryotic genome and ease of experimental manipulation. Despite these immense strengths, its impact on understanding the genetic basis of natural phenotypic variation has been limited by strai...

  14. Whole genome sequencing and assembly of Eukaryotic microbes isolated from ISS environmental surface Kirovograd region soil Chernobyl Nuclear Power Plant and Chernobyl Exclusion Zone

    Data.gov (United States)

    National Aeronautics and Space Administration — The whole-genome sequences of eight fungal strains that were selected for exposure to microgravity at the International Space Station are presented here. These...

  15. Pathgroups, a dynamic data structure for genome reconstruction problems.

    Science.gov (United States)

    Zheng, Chunfang

    2010-07-01

    Ancestral gene order reconstruction problems, including the median problem, quartet construction, small phylogeny, guided genome halving and genome aliquoting, are NP hard. Available heuristics dedicated to each of these problems are computationally costly for even small instances. We present a data structure enabling rapid heuristic solution to all these ancestral genome reconstruction problems. A generic greedy algorithm with look-ahead based on an automatically generated priority system suffices for all the problems using this data structure. The efficiency of the algorithm is due to fast updating of the structure during run time and to the simplicity of the priority scheme. We illustrate with the first rapid algorithm for quartet construction and apply this to a set of yeast genomes to corroborate a recent gene sequence-based phylogeny. http://albuquerque.bioinformatics.uottawa.ca/pathgroup/Quartet.html chunfang313@gmail.com Supplementary data are available at Bioinformatics online.

  16. Eukaryotic DNA Replicases

    KAUST Repository

    Zaher, Manal S.; Oke, Muse; Hamdan, Samir

    2014-01-01

    The current model of the eukaryotic DNA replication fork includes three replicative DNA polymerases, polymerase α/primase complex (Pol α), polymerase δ (Pol δ), and polymerase ε (Pol ε). The primase synthesizes 8–12 nucleotide RNA primers that are extended by the DNA polymerization activity of Pol α into 30–35 nucleotide RNA-DNA primers. Replication factor C (RFC) opens the polymerase clamp-like processivity factor, proliferating cell nuclear antigen (PCNA), and loads it onto the primer-template. Pol δ utilizes PCNA to mediate highly processive DNA synthesis, while Pol ε has intrinsic high processivity that is modestly stimulated by PCNA. Pol ε replicates the leading strand and Pol δ replicates the lagging strand in a division of labor that is not strict. The three polymerases are comprised of multiple subunits and share unifying features in their large catalytic and B subunits. The remaining subunits are evolutionarily not related and perform diverse functions. The catalytic subunits are members of family B, which are distinguished by their larger sizes due to inserts in their N- and C-terminal regions. The sizes of these inserts vary among the three polymerases, and their functions remain largely unknown. Strikingly, the quaternary structures of Pol α, Pol δ, and Pol ε are arranged similarly. The catalytic subunits adopt a globular structure that is linked via its conserved C-terminal region to the B subunit. The remaining subunits are linked to the catalytic and B subunits in a highly flexible manner.

  17. Eukaryotic DNA Replicases

    KAUST Repository

    Zaher, Manal S.

    2014-11-21

    The current model of the eukaryotic DNA replication fork includes three replicative DNA polymerases, polymerase α/primase complex (Pol α), polymerase δ (Pol δ), and polymerase ε (Pol ε). The primase synthesizes 8–12 nucleotide RNA primers that are extended by the DNA polymerization activity of Pol α into 30–35 nucleotide RNA-DNA primers. Replication factor C (RFC) opens the polymerase clamp-like processivity factor, proliferating cell nuclear antigen (PCNA), and loads it onto the primer-template. Pol δ utilizes PCNA to mediate highly processive DNA synthesis, while Pol ε has intrinsic high processivity that is modestly stimulated by PCNA. Pol ε replicates the leading strand and Pol δ replicates the lagging strand in a division of labor that is not strict. The three polymerases are comprised of multiple subunits and share unifying features in their large catalytic and B subunits. The remaining subunits are evolutionarily not related and perform diverse functions. The catalytic subunits are members of family B, which are distinguished by their larger sizes due to inserts in their N- and C-terminal regions. The sizes of these inserts vary among the three polymerases, and their functions remain largely unknown. Strikingly, the quaternary structures of Pol α, Pol δ, and Pol ε are arranged similarly. The catalytic subunits adopt a globular structure that is linked via its conserved C-terminal region to the B subunit. The remaining subunits are linked to the catalytic and B subunits in a highly flexible manner.

  18. Autophagy in unicellular eukaryotes

    NARCIS (Netherlands)

    Kiel, J.A.K.W.

    2010-01-01

    Cells need a constant supply of precursors to enable the production of macromolecules to sustain growth and survival. Unlike metazoans, unicellular eukaryotes depend exclusively on the extracellular medium for this supply. When environmental nutrients become depleted, existing cytoplasmic components

  19. Structural dynamics of retroviral genome and the packaging.

    Science.gov (United States)

    Miyazaki, Yasuyuki; Miyake, Ariko; Nomaguchi, Masako; Adachi, Akio

    2011-01-01

    Retroviruses can cause diseases such as AIDS, leukemia, and tumors, but are also used as vectors for human gene therapy. All retroviruses, except foamy viruses, package two copies of unspliced genomic RNA into their progeny viruses. Understanding the molecular mechanisms of retroviral genome packaging will aid the design of new anti-retroviral drugs targeting the packaging process and improve the efficacy of retroviral vectors. Retroviral genomes have to be specifically recognized by the cognate nucleocapsid domain of the Gag polyprotein from among an excess of cellular and spliced viral mRNA. Extensive virological and structural studies have revealed how retroviral genomic RNA is selectively packaged into the viral particles. The genomic area responsible for the packaging is generally located in the 5' untranslated region (5' UTR), and contains dimerization site(s). Recent studies have shown that retroviral genome packaging is modulated by structural changes of RNA at the 5' UTR accompanied by the dimerization. In this review, we focus on three representative retroviruses, Moloney murine leukemia virus, human immunodeficiency virus type 1 and 2, and describe the molecular mechanism of retroviral genome packaging.

  20. Structural dynamics of retroviral genome and the packaging

    Directory of Open Access Journals (Sweden)

    Yasuyuki eMiyazaki

    2011-12-01

    Full Text Available Retroviruses can cause diseases such as AIDS, leukemia and tumors, but are also used as vectors for human gene therapy. All retroviruses, except foamy viruses, package two copies of unspliced genomic RNA into their progeny viruses. Understanding the molecular mechanisms of retroviral genome packaging will aid the design of new anti-retroviral drugs targeting the packaging process and improve the efficacy of retroviral vectors. Retroviral genomes have to be specifically recognized by the cognate nucleocapsid (NC domain of the Gag polyprotein from among an excess of cellular and spliced viral mRNA. Extensive virological and structural studies have revealed how retroviral genomic RNA is selectively packaged into the viral particles. The genomic area responsible for the packaging is generally located in the 5’ untranslated region (5’ UTR, and contains dimerization site(s. Recent studies have shown that retroviral genome packaging is modulated by structural changes of RNA at the 5’ UTR accompanied by the dimerization. In this review, we focus on three representative retroviruses, Moloney murine leukemia virus (MoMLV, human immunodeficiency virus type 1 (HIV-1 and 2 (HIV-2, and describe the molecular mechanism of retroviral genome packaging.

  1. Functional RNA structures throughout the Hepatitis C Virus genome.

    Science.gov (United States)

    Adams, Rebecca L; Pirakitikulr, Nathan; Pyle, Anna Marie

    2017-06-01

    The single-stranded Hepatitis C Virus (HCV) genome adopts a set of elaborate RNA structures that are involved in every stage of the viral lifecycle. Recent advances in chemical probing, sequencing, and structural biology have facilitated analysis of RNA folding on a genome-wide scale, revealing novel structures and networks of interactions. These studies have underscored the active role played by RNA in every function of HCV and they open the door to new types of RNA-targeted therapeutics. Copyright © 2017 Elsevier B.V. All rights reserved.

  2. Spectral entropy criteria for structural segmentation in genomic DNA sequences

    International Nuclear Information System (INIS)

    Chechetkin, V.R.; Lobzin, V.V.

    2004-01-01

    The spectral entropy is calculated with Fourier structure factors and characterizes the level of structural ordering in a sequence of symbols. It may efficiently be applied to the assessment and reconstruction of the modular structure in genomic DNA sequences. We present the relevant spectral entropy criteria for the local and non-local structural segmentation in DNA sequences. The results are illustrated with the model examples and analysis of intervening exon-intron segments in the protein-coding regions

  3. Structural genomics of infectious disease drug targets: the SSGCID

    International Nuclear Information System (INIS)

    Stacy, Robin; Begley, Darren W.; Phan, Isabelle; Staker, Bart L.; Van Voorhis, Wesley C.; Varani, Gabriele; Buchko, Garry W.; Stewart, Lance J.; Myler, Peter J.

    2011-01-01

    An introduction and overview of the focus, goals and overall mission of the Seattle Structural Genomics Center for Infectious Disease (SSGCID) is given. The Seattle Structural Genomics Center for Infectious Disease (SSGCID) is a consortium of researchers at Seattle BioMed, Emerald BioStructures, the University of Washington and Pacific Northwest National Laboratory that was established to apply structural genomics approaches to drug targets from infectious disease organisms. The SSGCID is currently funded over a five-year period by the National Institute of Allergy and Infectious Diseases (NIAID) to determine the three-dimensional structures of 400 proteins from a variety of Category A, B and C pathogens. Target selection engages the infectious disease research and drug-therapy communities to identify drug targets, essential enzymes, virulence factors and vaccine candidates of biomedical relevance to combat infectious diseases. The protein-expression systems, purified proteins, ligand screens and three-dimensional structures produced by SSGCID constitute a valuable resource for drug-discovery research, all of which is made freely available to the greater scientific community. This issue of Acta Crystallographica Section F, entirely devoted to the work of the SSGCID, covers the details of the high-throughput pipeline and presents a series of structures from a broad array of pathogenic organisms. Here, a background is provided on the structural genomics of infectious disease, the essential components of the SSGCID pipeline are discussed and a survey of progress to date is presented

  4. Structure of a prokaryotic sodium channel pore reveals essential gating elements and an outer ion binding site common to eukaryotic channels.

    Science.gov (United States)

    Shaya, David; Findeisen, Felix; Abderemane-Ali, Fayal; Arrigoni, Cristina; Wong, Stephanie; Nurva, Shailika Reddy; Loussouarn, Gildas; Minor, Daniel L

    2014-01-23

    Voltage-gated sodium channels (NaVs) are central elements of cellular excitation. Notwithstanding advances from recent bacterial NaV (BacNaV) structures, key questions about gating and ion selectivity remain. Here, we present a closed conformation of NaVAe1p, a pore-only BacNaV derived from NaVAe1, a BacNaV from the arsenite oxidizer Alkalilimnicola ehrlichei found in Mono Lake, California, that provides insight into both fundamental properties. The structure reveals a pore domain in which the pore-lining S6 helix connects to a helical cytoplasmic tail. Electrophysiological studies of full-length BacNaVs show that two elements defined by the NaVAe1p structure, an S6 activation gate position and the cytoplasmic tail "neck", are central to BacNaV gating. The structure also reveals the selectivity filter ion entry site, termed the "outer ion" site. Comparison with mammalian voltage-gated calcium channel (CaV) selectivity filters, together with functional studies, shows that this site forms a previously unknown determinant of CaV high-affinity calcium binding. Our findings underscore commonalities between BacNaVs and eukaryotic voltage-gated channels and provide a framework for understanding gating and ion permeation in this superfamily. © 2013. Published by Elsevier Ltd. All rights reserved.

  5. In the search for the low-complexity sequences in prokaryotic and eukaryotic genomes: how to derive a coherent picture from global and local entropy measures

    Energy Technology Data Exchange (ETDEWEB)

    Acquisti, Claudia; Allegrini, Paolo E-mail: allegrip@ilc.cnr.it; Bogani, Patrizia; Buiatti, Marcello; Catanese, Elena; Fronzoni, Leone; Grigolini, Paolo; Mersi, Giuseppe; Palatella, Luigi

    2004-04-01

    We investigate on a possible way to connect the presence of low-complexity sequences (LCS) in DNA genomes and the non-stationary properties of base correlations. Under the hypothesis that these variations signal a change in the DNA function, we use a new technique, called non-stationarity entropic index (NSEI) method, and we prove that this technique is an efficient way to detect functional changes with respect to a random baseline. The remarkable aspect is that NSEI does not imply any training data or fitting parameter, the only arbitrarity being the choice of a marker in the sequence. We make this choice on the basis of biological information about LCS distributions in genomes. We show that there exists a correlation between changing the amount in LCS and the ratio of long- to short-range correlation.

  6. In the search for the low-complexity sequences in prokaryotic and eukaryotic genomes: how to derive a coherent picture from global and local entropy measures

    Science.gov (United States)

    Acquisti, Claudia; Allegrini, Paolo; Bogani, Patrizia; Buiatti, Marcello; Catanese, Elena; Fronzoni, Leone; Grigolini, Paolo; Mersi, Giuseppe; Palatella, Luigi

    2004-04-01

    We investigate on a possible way to connect the presence of Low-Complexity Sequences (LCS) in DNA genomes and the nonstationary properties of base correlations. Under the hypothesis that these variations signal a change in the DNA function, we use a new technique, called Non-Stationarity Entropic Index (NSEI) method, and we prove that this technique is an efficient way to detect functional changes with respect to a random baseline. The remarkable aspect is that NSEI does not imply any training data or fitting parameter, the only arbitrarity being the choice of a marker in the sequence. We make this choice on the basis of biological information about LCS distributions in genomes. We show that there exists a correlation between changing the amount in LCS and the ratio of long- to short-range correlation.

  7. In the search for the low-complexity sequences in prokaryotic and eukaryotic genomes: how to derive a coherent picture from global and local entropy measures

    International Nuclear Information System (INIS)

    Acquisti, Claudia; Allegrini, Paolo; Bogani, Patrizia; Buiatti, Marcello; Catanese, Elena; Fronzoni, Leone; Grigolini, Paolo; Mersi, Giuseppe; Palatella, Luigi

    2004-01-01

    We investigate on a possible way to connect the presence of low-complexity sequences (LCS) in DNA genomes and the non-stationary properties of base correlations. Under the hypothesis that these variations signal a change in the DNA function, we use a new technique, called non-stationarity entropic index (NSEI) method, and we prove that this technique is an efficient way to detect functional changes with respect to a random baseline. The remarkable aspect is that NSEI does not imply any training data or fitting parameter, the only arbitrarity being the choice of a marker in the sequence. We make this choice on the basis of biological information about LCS distributions in genomes. We show that there exists a correlation between changing the amount in LCS and the ratio of long- to short-range correlation

  8. Structural Genomics and Drug Discovery for Infectious Diseases

    International Nuclear Information System (INIS)

    Anderson, W.F.

    2009-01-01

    The application of structural genomics methods and approaches to proteins from organisms causing infectious diseases is making available the three dimensional structures of many proteins that are potential drug targets and laying the groundwork for structure aided drug discovery efforts. There are a number of structural genomics projects with a focus on pathogens that have been initiated worldwide. The Center for Structural Genomics of Infectious Diseases (CSGID) was recently established to apply state-of-the-art high throughput structural biology technologies to the characterization of proteins from the National Institute for Allergy and Infectious Diseases (NIAID) category A-C pathogens and organisms causing emerging, or re-emerging infectious diseases. The target selection process emphasizes potential biomedical benefits. Selected proteins include known drug targets and their homologs, essential enzymes, virulence factors and vaccine candidates. The Center also provides a structure determination service for the infectious disease scientific community. The ultimate goal is to generate a library of structures that are available to the scientific community and can serve as a starting point for further research and structure aided drug discovery for infectious diseases. To achieve this goal, the CSGID will determine protein crystal structures of 400 proteins and protein-ligand complexes using proven, rapid, highly integrated, and cost-effective methods for such determination, primarily by X-ray crystallography. High throughput crystallographic structure determination is greatly aided by frequent, convenient access to high-performance beamlines at third-generation synchrotron X-ray sources.

  9. Structural determinants and mechanism of HIV-1 genome packaging.

    Science.gov (United States)

    Lu, Kun; Heng, Xiao; Summers, Michael F

    2011-07-22

    Like all retroviruses, the human immunodeficiency virus selectively packages two copies of its unspliced RNA genome, both of which are utilized for strand-transfer-mediated recombination during reverse transcription-a process that enables rapid evolution under environmental and chemotherapeutic pressures. The viral RNA appears to be selected for packaging as a dimer, and there is evidence that dimerization and packaging are mechanistically coupled. Both processes are mediated by interactions between the nucleocapsid domains of a small number of assembling viral Gag polyproteins and RNA elements within the 5'-untranslated region of the genome. A number of secondary structures have been predicted for regions of the genome that are responsible for packaging, and high-resolution structures have been determined for a few small RNA fragments and protein-RNA complexes. However, major questions regarding the RNA structures (and potentially the structural changes) that are responsible for dimeric genome selection remain unanswered. Here, we review efforts that have been made to identify the molecular determinants and mechanism of human immunodeficiency virus type 1 genome packaging. Copyright © 2011 Elsevier Ltd. All rights reserved.

  10. Multi-scale structural community organisation of the human genome.

    Science.gov (United States)

    Boulos, Rasha E; Tremblay, Nicolas; Arneodo, Alain; Borgnat, Pierre; Audit, Benjamin

    2017-04-11

    Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This ensues a new methodological challenge for computational biology which consists in objectively extracting from these data the structural motifs characteristic of genome organisation. We deployed the fast multi-scale community mining algorithm based on spectral graph wavelets to characterise the networks of intra-chromosomal interactions in human cell lines. We observed that there exist structural domains of all sizes up to chromosome length and demonstrated that the set of structural communities forms a hierarchy of chromosome segments. Hence, at all scales, chromosome folding predominantly involves interactions between neighbouring sites rather than the formation of links between distant loci. Multi-scale structural decomposition of human chromosomes provides an original framework to question structural organisation and its relationship to functional regulation across the scales. By construction the proposed methodology is independent of the precise assembly of the reference genome and is thus directly applicable to genomes whose assembly is not fully determined.

  11. The Impact of Structural Genomics: Expectations and Outcomes

    Energy Technology Data Exchange (ETDEWEB)

    Chandonia, John-Marc; Brenner, Steven E.

    2005-12-21

    Structural Genomics (SG) projects aim to expand our structural knowledge of biological macromolecules, while lowering the average costs of structure determination. We quantitatively analyzed the novelty, cost, and impact of structures solved by SG centers, and contrast these results with traditional structural biology. The first structure from a protein family is particularly important to reveal the fold and ancient relationships to other proteins. In the last year, approximately half of such structures were solved at a SG center rather than in a traditional laboratory. Furthermore, the cost of solving a structure at the most efficient U.S. center has now dropped to one-quarter the estimated cost of solving a structure by traditional methods. However, top structural biology laboratories are much more efficient than the average, and comparable to SG centers despite working on very challenging structures. Moreover, traditional structural biology papers are cited significantly more often, suggesting greater current impact.

  12. Defensins: antifungal lessons from eukaryotes

    Directory of Open Access Journals (Sweden)

    Patrícia M. Silva

    2014-03-01

    Full Text Available Over the last years, antimicrobial peptides (AMPs have been the focus of intense research towards the finding of a viable alternative to current antifungal drugs. Defensins are one of the major families of AMPs and the most represented among all eukaryotic groups, providing an important first line of host defense against pathogenic microorganisms. Several of these cysteine-stabilized peptides present a relevant effect against fungi. Defensins are the AMPs with the broader distribution across all eukaryotic kingdoms, namely, Fungi, Plantæ and Animalia, and were recently shown to have an ancestor in a bacterial organism. As a part of the host defense, defensins act as an important vehicle of information between innate and adaptive immune system and have a role in immunomodulation. This multidimensionality represents a powerful host shield, hard for microorganisms to overcome using single approach resistance strategies. Pathogenic fungi resistance to conventional antimycotic drugs is becoming a major problem. Defensins, as other AMPs, have shown to be an effective alternative to the current antimycotic therapies, demonstrating potential as novel therapeutic agents or drug leads. In this review, we summarize the current knowledge on some eukaryotic defensins with antifungal action. An overview of the main targets in the fungal cell and the mechanism of action of these AMPs (namely, the selectivity for some fungal membrane components are presented. Additionally, recent works on antifungal defensins structure, activity and citotoxicity are also reviewed.

  13. GOBASE: an organelle genome database

    OpenAIRE

    O?Brien, Emmet A.; Zhang, Yue; Wang, Eric; Marie, Veronique; Badejoko, Wole; Lang, B. Franz; Burger, Gertraud

    2008-01-01

    The organelle genome database GOBASE, now in its 21st release (June 2008), contains all published mitochondrion-encoded sequences (?913 000) and chloroplast-encoded sequences (?250 000) from a wide range of eukaryotic taxa. For all sequences, information on related genes, exons, introns, gene products and taxonomy is available, as well as selected genome maps and RNA secondary structures. Recent major enhancements to database functionality include: (i) addition of an interface for RNA editing...

  14. Atypical mitochondrial inheritance patterns in eukaryotes.

    Science.gov (United States)

    Breton, Sophie; Stewart, Donald T

    2015-10-01

    Mitochondrial DNA (mtDNA) is predominantly maternally inherited in eukaryotes. Diverse molecular mechanisms underlying the phenomenon of strict maternal inheritance (SMI) of mtDNA have been described, but the evolutionary forces responsible for its predominance in eukaryotes remain to be elucidated. Exceptions to SMI have been reported in diverse eukaryotic taxa, leading to the prediction that several distinct molecular mechanisms controlling mtDNA transmission are present among the eukaryotes. We propose that these mechanisms will be better understood by studying the deviations from the predominating pattern of SMI. This minireview summarizes studies on eukaryote species with unusual or rare mitochondrial inheritance patterns, i.e., other than the predominant SMI pattern, such as maternal inheritance of stable heteroplasmy, paternal leakage of mtDNA, biparental and strictly paternal inheritance, and doubly uniparental inheritance of mtDNA. The potential genes and mechanisms involved in controlling mitochondrial inheritance in these organisms are discussed. The linkage between mitochondrial inheritance and sex determination is also discussed, given that the atypical systems of mtDNA inheritance examined in this minireview are frequently found in organisms with uncommon sexual systems such as gynodioecy, monoecy, or andromonoecy. The potential of deviations from SMI for facilitating a better understanding of a number of fundamental questions in biology, such as the evolution of mtDNA inheritance, the coevolution of nuclear and mitochondrial genomes, and, perhaps, the role of mitochondria in sex determination, is considerable.

  15. Evolutionary genomics and population structure of Entamoeba histolytica

    Directory of Open Access Journals (Sweden)

    Koushik Das

    2014-11-01

    Full Text Available Amoebiasis caused by the gastrointestinal parasite Entamoeba histolytica has diverse disease outcomes. Study of genome and evolution of this fascinating parasite will help us to understand the basis of its virulence and explain why, when and how it causes diseases. In this review, we have summarized current knowledge regarding evolutionary genomics of E. histolytica and discussed their association with parasite phenotypes and its differential pathogenic behavior. How genetic diversity reveals parasite population structure has also been discussed. Queries concerning their evolution and population structure which were required to be addressed have also been highlighted. This significantly large amount of genomic data will improve our knowledge about this pathogenic species of Entamoeba.

  16. Structural Studies of RNA Helicases Involved in Eukaryotic Pre-mRNA Splicing, Ribosome Biogenesis, and Translation Initiation

    DEFF Research Database (Denmark)

    He, Yangzi

    and ligates the neighbouring exons to generate mature mRNAs. Prp43 is an RNA helicase of the DEAH/RHA family. In yeast, once mRNAs are released, Prp43 catalyzes the disassembly of spliceosomes. The 18S, 5.8S and 25S rRNAs are transcribed as a single polycistronic transcript—the 35S pre......-rRNA. It is nucleolytically cleaved and chemically modified to generate mature rRNAs, which assemble with ribosomal proteins to form the ribosome. Prp43 is required for the processing of the 18S rRNA. Using X-ray crystallography, I determined a high resolution structure of Prp43 bound to ADP, the first structure of a DEAH....../RHA helicase. It defined the conserved structural features of all DEAH/RHA helicases, and unveiled a novel nucleotide binding site. Additionally a preliminary low resolution structure of a ternary complex comprising Prp43, a non-hydrolyzable ATP analogue, and a single-stranded RNA, was obtained. The ribosome...

  17. Mammalian translation elongation factor eEF1A2: X-ray structure and new features of GDP/GTP exchange mechanism in higher eukaryotes.

    Science.gov (United States)

    Crepin, Thibaut; Shalak, Vyacheslav F; Yaremchuk, Anna D; Vlasenko, Dmytro O; McCarthy, Andrew; Negrutskii, Boris S; Tukalo, Michail A; El'skaya, Anna V

    2014-11-10

    Eukaryotic elongation factor eEF1A transits between the GTP- and GDP-bound conformations during the ribosomal polypeptide chain elongation. eEF1A*GTP establishes a complex with the aminoacyl-tRNA in the A site of the 80S ribosome. Correct codon-anticodon recognition triggers GTP hydrolysis, with subsequent dissociation of eEF1A*GDP from the ribosome. The structures of both the 'GTP'- and 'GDP'-bound conformations of eEF1A are unknown. Thus, the eEF1A-related ribosomal mechanisms were anticipated only by analogy with the bacterial homolog EF-Tu. Here, we report the first crystal structure of the mammalian eEF1A2*GDP complex which indicates major differences in the organization of the nucleotide-binding domain and intramolecular movements of eEF1A compared to EF-Tu. Our results explain the nucleotide exchange mechanism in the mammalian eEF1A and suggest that the first step of eEF1A*GDP dissociation from the 80S ribosome is the rotation of the nucleotide-binding domain observed after GTP hydrolysis. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

    DEFF Research Database (Denmark)

    Li, Yingrui; Zheng, Hancheng; Luo, Ruibang

    2011-01-01

    Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise...

  19. Structured RNAs and synteny regions in the pig genome

    DEFF Research Database (Denmark)

    Anthon, Christian; Tafer, Hakim; Havgaard, Jakob H

    2014-01-01

    BACKGROUND: Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However......, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals. RESULTS: We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure...... lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the here fore mentioned classes. When running the pipeline on a local shuffled version of the genome...

  20. cDNA structure, genomic organization and expression patterns of ...

    African Journals Online (AJOL)

    Visfatin was a newly identified adipocytokine, which was involved in various physiologic and pathologic processes of organisms. The cDNA structure, genomic organization and expression patterns of silver Prussian carp visfatin were described in this report. The silver Prussian carp visfatin cDNA cloned from the liver was ...

  1. Eukaryotic Cell Panorama

    Science.gov (United States)

    Goodsell, David S.

    2011-01-01

    Diverse biological data may be used to create illustrations of molecules in their cellular context. This report describes the scientific results that support an illustration of a eukaryotic cell, enlarged by one million times to show the distribution and arrangement of macromolecules. The panoramic cross section includes eight panels that extend…

  2. Eukaryotic cell flattening

    Science.gov (United States)

    Bae, Albert; Westendorf, Christian; Erlenkamper, Christoph; Galland, Edouard; Franck, Carl; Bodenschatz, Eberhard; Beta, Carsten

    2010-03-01

    Eukaryotic cell flattening is valuable for improving microscopic observations, ranging from bright field to total internal reflection fluorescence microscopy. In this talk, we will discuss traditional overlay techniques, and more modern, microfluidic based flattening, which provides a greater level of control. We demonstrate these techniques on the social amoebae Dictyostelium discoideum, comparing the advantages and disadvantages of each method.

  3. Functional and Structural Overview of G-Protein-Coupled Receptors Comprehensively Obtained from Genome Sequences

    Directory of Open Access Journals (Sweden)

    Makiko Suwa

    2011-04-01

    Full Text Available An understanding of the functional mechanisms of G-protein-coupled receptors (GPCRs is very important for GPCR-related drug design. We have developed an integrated GPCR database (SEVENS http://sevens.cbrc.jp/ that includes 64,090 reliable GPCR genes comprehensively identified from 56 eukaryote genome sequences, and overviewed the sequences and structure spaces of the GPCRs. In vertebrates, the number of receptors for biological amines, peptides, etc. is conserved in most species, whereas the number of chemosensory receptors for odorant, pheromone, etc. significantly differs among species. The latter receptors tend to be single exon type or a few exon type and show a high ratio in the numbers of GPCRs, whereas some families, such as Class B and Class C receptors, have long lengths due to the presence of many exons. Statistical analyses of amino acid residues reveal that most of the conserved residues in Class A GPCRs are found in the cytoplasmic half regions of transmembrane (TM helices, while residues characteristic to each subfamily found on the extracellular half regions. The 69 of Protein Data Bank (PDB entries of complete or fragmentary structures could be mapped on the TM/loop regions of Class A GPCRs covering 14 subfamilies.

  4. Norovirus translation requires an interaction between the C Terminus of the genome-linked viral protein VPg and eukaryotic translation initiation factor 4G.

    Science.gov (United States)

    Chung, Liliane; Bailey, Dalan; Leen, Eoin N; Emmott, Edward P; Chaudhry, Yasmin; Roberts, Lisa O; Curry, Stephen; Locker, Nicolas; Goodfellow, Ian G

    2014-08-01

    Viruses have evolved a variety of mechanisms to usurp the host cell translation machinery to enable translation of the viral genome in the presence of high levels of cellular mRNAs. Noroviruses, a major cause of gastroenteritis in man, have evolved a mechanism that relies on the interaction of translation initiation factors with the virus-encoded VPg protein covalently linked to the 5' end of the viral RNA. To further characterize this novel mechanism of translation initiation, we have used proteomics to identify the components of the norovirus translation initiation factor complex. This approach revealed that VPg binds directly to the eIF4F complex, with a high affinity interaction occurring between VPg and eIF4G. Mutational analyses indicated that the C-terminal region of VPg is important for the VPg-eIF4G interaction; viruses with mutations that alter or disrupt this interaction are debilitated or non-viable. Our results shed new light on the unusual mechanisms of protein-directed translation initiation. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.

  5. [Structure-functional organization of eukaryotic high-affinity copper importer CTR1 determines its ability to transport copper, silver and cisplatin].

    Science.gov (United States)

    Skvortsov, A N; Zatulovskiĭ, E A; Puchkova, L V

    2012-01-01

    It was shown recently, that high affinity Cu(I) importer eukaryotic protein CTR1 can also transport in vitro abiogenic Ag(I) ions and anticancer drug cisplatin. At present there is no rational explanation how CTR1 can transfer platinum group, which is different by coordination properties from highly similar Cu(I) and Ag(I). To understand this phenomenon we analyzed 25 sequences of chordate CTR1 proteins, and found out conserved patterns of organization of N-terminal extracellular part of CTR1 which correspond to initial metal binding. Extracellular copper-binding motifs were qualified by their coordination properties. It was shown that relative position of Met- and His-rich copper-binding motifs in CTR1 predisposes the extracellular CTR1 part to binding of copper, silver and cisplatin. Relation between tissue-specific expression of CTR1 gene, steady-state copper concentration, and silver and platinum accumulation in organs of mice in vivo was analyzed. Significant positive but incomplete correlation exists between these variables. Basing on structural and functional peculiarities of N-terminal part of CTR1 a hypothesis of coupled transport of copper and cisplatin has been suggested, which avoids the disagreement between CTR1-mediated cisplatin transport in vitro, and irreversible binding of platinum to Met-rich peptides.

  6. Anaerobic energy metabolism in unicellular photosynthetic eukaryotes.

    Science.gov (United States)

    Atteia, Ariane; van Lis, Robert; Tielens, Aloysius G M; Martin, William F

    2013-02-01

    Anaerobic metabolic pathways allow unicellular organisms to tolerate or colonize anoxic environments. Over the past ten years, genome sequencing projects have brought a new light on the extent of anaerobic metabolism in eukaryotes. A surprising development has been that free-living unicellular algae capable of photoautotrophic lifestyle are, in terms of their enzymatic repertoire, among the best equipped eukaryotes known when it comes to anaerobic energy metabolism. Some of these algae are marine organisms, common in the oceans, others are more typically soil inhabitants. All these species are important from the ecological (O(2)/CO(2) budget), biotechnological, and evolutionary perspectives. In the unicellular algae surveyed here, mixed-acid type fermentations are widespread while anaerobic respiration, which is more typical of eukaryotic heterotrophs, appears to be rare. The presence of a core anaerobic metabolism among the algae provides insights into its evolutionary origin, which traces to the eukaryote common ancestor. The predicted fermentative enzymes often exhibit an amino acid extension at the N-terminus, suggesting that these proteins might be compartmentalized in the cell, likely in the chloroplast or the mitochondrion. The green algae Chlamydomonas reinhardtii and Chlorella NC64 have the most extended set of fermentative enzymes reported so far. Among the eukaryotes with secondary plastids, the diatom Thalassiosira pseudonana has the most pronounced anaerobic capabilities as yet. From the standpoints of genomic, transcriptomic, and biochemical studies, anaerobic energy metabolism in C. reinhardtii remains the best characterized among photosynthetic protists. This article is part of a Special Issue entitled: The evolutionary aspects of bioenergetic systems. Copyright © 2012 Elsevier B.V. All rights reserved.

  7. Evolution of the Exon-Intron Structure in Ciliate Genomes.

    Directory of Open Access Journals (Sweden)

    Vladyslav S Bondarenko

    Full Text Available A typical eukaryotic gene is comprised of alternating stretches of regions, exons and introns, retained in and spliced out a mature mRNA, respectively. Although the length of introns may vary substantially among organisms, a large fraction of genes contains short introns in many species. Notably, some Ciliates (Paramecium and Nyctotherus possess only ultra-short introns, around 25 bp long. In Paramecium, ultra-short introns with length divisible by three (3n are under strong evolutionary pressure and have a high frequency of in-frame stop codons, which, in the case of intron retention, cause premature termination of mRNA translation and consequent degradation of the mis-spliced mRNA by the nonsense-mediated decay mechanism. Here, we analyzed introns in five genera of Ciliates, Paramecium, Tetrahymena, Ichthyophthirius, Oxytricha, and Stylonychia. Introns can be classified into two length classes in Tetrahymena and Ichthyophthirius (with means 48 bp, 69 bp, and 55 bp, 64 bp, respectively, but, surprisingly, comprise three distinct length classes in Oxytricha and Stylonychia (with means 33-35 bp, 47-51 bp, and 78-80 bp. In most ranges of the intron lengths, 3n introns are underrepresented and have a high frequency of in-frame stop codons in all studied species. Introns of Paramecium, Tetrahymena, and Ichthyophthirius are preferentially located at the 5' and 3' ends of genes, whereas introns of Oxytricha and Stylonychia are strongly skewed towards the 5' end. Analysis of evolutionary conservation shows that, in each studied genome, a significant fraction of intron positions is conserved between the orthologs, but intron lengths are not correlated between the species. In summary, our study provides a detailed characterization of introns in several genera of Ciliates and highlights some of their distinctive properties, which, together, indicate that splicing spellchecking is a universal and evolutionarily conserved process in the biogenesis of short

  8. Chromatin structure and evolution in the human genome

    Directory of Open Access Journals (Sweden)

    Dunlop Malcolm G

    2007-05-01

    Full Text Available Abstract Background Evolutionary rates are not constant across the human genome but genes in close proximity have been shown to experience similar levels of divergence and selection. The higher-order organisation of chromosomes has often been invoked to explain such phenomena but previously there has been insufficient data on chromosome structure to investigate this rigorously. Using the results of a recent genome-wide analysis of open and closed human chromatin structures we have investigated the global association between divergence, selection and chromatin structure for the first time. Results In this study we have shown that, paradoxically, synonymous site divergence (dS at non-CpG sites is highest in regions of open chromatin, primarily as a result of an increased number of transitions, while the rates of other traditional measures of mutation (intergenic, intronic and ancient repeat divergence as well as SNP density are highest in closed regions of the genome. Analysis of human-chimpanzee divergence across intron-exon boundaries indicates that although genes in relatively open chromatin generally display little selection at their synonymous sites, those in closed regions show markedly lower divergence at their fourfold degenerate sites than in neighbouring introns and intergenic regions. Exclusion of known Exonic Splice Enhancer hexamers has little affect on the divergence observed at fourfold degenerate sites across chromatin categories; however, we show that closed chromatin is enriched with certain classes of ncRNA genes whose RNA secondary structure may be particularly important. Conclusion We conclude that, overall, non-CpG mutation rates are lowest in open regions of the genome and that regions of the genome with a closed chromatin structure have the highest background mutation rate. This might reflect lower rates of DNA damage or enhanced DNA repair processes in regions of open chromatin. Our results also indicate that dS is a poor

  9. Crystal structure of the regulatory subunit of archaeal initiation factor 2B (aIF2B) from hyperthermophilic archaeon Pyrococcus horikoshii OT3: a proposed structure of the regulatory subcomplex of eukaryotic IF2B

    International Nuclear Information System (INIS)

    Kakuta, Yoshimitsu; Tahara, Maino; Maetani, Shigehiro; Yao, Min; Tanaka, Isao; Kimura, Makoto

    2004-01-01

    Eukaryotic translation initiation factor 2B (eIF2B) is the guanine-nucleotide exchange factor for eukaryotic initiation factor 2 (eIF2). eIF2B is a heteropentameric protein composed of α-ε subunits. The α, β, and δ subunits form a regulatory subcomplex, while the γ and ε form a catalytic subcomplex. Archaea possess homologues of α, β, and δ subunits of eIF2B. Here, we report the three-dimensional structure of an archaeal regulatory subunit (aIF2Bα) from the hyperthermophilic archaeon Pyrococcus horikoshii OT3 determined by X-ray crystallography at 2.2 A resolution. aIF2Bα consists of two subdomains, an N-domain (residues 1-95) and a C-domain (residues 96-276), connected by a long α-helix (α5: 78-106). The N-domain contains a five helix bundle structure, while the C-domain folds into the α/β structure, thus showing similarity to D-ribose-5-phosphate isomerase structure. The presence of two molecules in the crystallographic asymmetric unit and the gel filtration analysis suggest a dimeric structure of aIF2Bα in solution, interacting with each other by C-domains. Furthermore, the crystallographic 3-fold symmetry generates a homohexameric structure of aIF2Bα; the interaction is primarily mediated by the long α-helix at the N-domains. This structure suggests an architecture of the three subunits, α, β, and δ, in the regulatory subcomplex within eIF2B

  10. Classification and Lineage Tracing of SH2 Domains Throughout Eukaryotes.

    Science.gov (United States)

    Liu, Bernard A

    2017-01-01

    Today there exists a rapidly expanding number of sequenced genomes. Cataloging protein interaction domains such as the Src Homology 2 (SH2) domain across these various genomes can be accomplished with ease due to existing algorithms and predictions models. An evolutionary analysis of SH2 domains provides a step towards understanding how SH2 proteins integrated with existing signaling networks to position phosphotyrosine signaling as a crucial driver of robust cellular communication networks in metazoans. However organizing and tracing SH2 domain across organisms and understanding their evolutionary trajectory remains a challenge. This chapter describes several methodologies towards analyzing the evolutionary trajectory of SH2 domains including a global SH2 domain classification system, which facilitates annotation of new SH2 sequences essential for tracing the lineage of SH2 domains throughout eukaryote evolution. This classification utilizes a combination of sequence homology, protein domain architecture and the boundary positions between introns and exons within the SH2 domain or genes encoding these domains. Discrete SH2 families can then be traced across various genomes to provide insight into its origins. Furthermore, additional methods for examining potential mechanisms for divergence of SH2 domains from structural changes to alterations in the protein domain content and genome duplication will be discussed. Therefore a better understanding of SH2 domain evolution may enhance our insight into the emergence of phosphotyrosine signaling and the expansion of protein interaction domains.

  11. Elucidation of Operon Structures across Closely Related Bacterial Genomes

    Science.gov (United States)

    Li, Guojun

    2014-01-01

    About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. With the evolution of genomes, operon structures are undergoing changes which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG) and a pair of nodes will be connected if any two genes, from the corresponding two OGGs respectively, are located in the same operon as immediate neighbors in any of the considered genomes. Through identifying the connected components in the above graph, we found that genes in a connected component are likely to be functionally related and these identified components tend to form treelike topology, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation as follows. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional roles of the central node of this component, such as the ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components. PMID:24959722

  12. Genomic analysis of the hierarchical structure of regulatory networks

    Science.gov (United States)

    Yu, Haiyuan; Gerstein, Mark

    2006-01-01

    A fundamental question in biology is how the cell uses transcription factors (TFs) to coordinate the expression of thousands of genes in response to various stimuli. The relationships between TFs and their target genes can be modeled in terms of directed regulatory networks. These relationships, in turn, can be readily compared with commonplace “chain-of-command” structures in social networks, which have characteristic hierarchical layouts. Here, we develop algorithms for identifying generalized hierarchies (allowing for various loop structures) and use these approaches to illuminate extensive pyramid-shaped hierarchical structures existing in the regulatory networks of representative prokaryotes (Escherichia coli) and eukaryotes (Saccharomyces cerevisiae), with most TFs at the bottom levels and only a few master TFs on top. These masters are situated near the center of the protein–protein interaction network, a different type of network from the regulatory one, and they receive most of the input for the whole regulatory hierarchy through protein interactions. Moreover, they have maximal influence over other genes, in terms of affecting expression-level changes. Surprisingly, however, TFs at the bottom of the regulatory hierarchy are more essential to the viability of the cell. Finally, one might think master TFs achieve their wide influence through directly regulating many targets, but TFs with most direct targets are in the middle of the hierarchy. We find, in fact, that these midlevel TFs are “control bottlenecks” in the hierarchy, and this great degree of control for “middle managers” has parallels in efficient social structures in various corporate and governmental settings. PMID:17003135

  13. The reduced kinome of Ostreococcus tauri: core eukaryotic signalling components in a tractable model species.

    Science.gov (United States)

    Hindle, Matthew M; Martin, Sarah F; Noordally, Zeenat B; van Ooijen, Gerben; Barrios-Llerena, Martin E; Simpson, T Ian; Le Bihan, Thierry; Millar, Andrew J

    2014-08-02

    The current knowledge of eukaryote signalling originates from phenotypically diverse organisms. There is a pressing need to identify conserved signalling components among eukaryotes, which will lead to the transfer of knowledge across kingdoms. Two useful properties of a eukaryote model for signalling are (1) reduced signalling complexity, and (2) conservation of signalling components. The alga Ostreococcus tauri is described as the smallest free-living eukaryote. With less than 8,000 genes, it represents a highly constrained genomic palette. Our survey revealed 133 protein kinases and 34 protein phosphatases (1.7% and 0.4% of the proteome). We conducted phosphoproteomic experiments and constructed domain structures and phylogenies for the catalytic protein-kinases. For each of the major kinases families we review the completeness and divergence of O. tauri representatives in comparison to the well-studied kinomes of the laboratory models Arabidopsis thaliana and Saccharomyces cerevisiae, and of Homo sapiens. Many kinase clades in O. tauri were reduced to a single member, in preference to the loss of family diversity, whereas TKL and ABC1 clades were expanded. We also identified kinases that have been lost in A. thaliana but retained in O. tauri. For three, contrasting eukaryotic pathways - TOR, MAPK, and the circadian clock - we established the subset of conserved components and demonstrate conserved sites of substrate phosphorylation and kinase motifs. We conclude that O. tauri satisfies our two central requirements. Several of its kinases are more closely related to H. sapiens orthologs than S. cerevisiae is to H. sapiens. The greatly reduced kinome of O. tauri is therefore a suitable model for signalling in free-living eukaryotes.

  14. Structured Matrix Completion with Applications to Genomic Data Integration.

    Science.gov (United States)

    Cai, Tianxi; Cai, T Tony; Zhang, Anru

    2016-01-01

    Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.

  15. Impact of nuclear organization and chromatin structure on DNA repair and genome stability

    International Nuclear Information System (INIS)

    Batte, Amandine

    2016-01-01

    The non-random organization of the eukaryotic cell nucleus and the folding of genome in chromatin more or less condensed can influence many functions related to DNA metabolism, including genome stability. Double-strand breaks (DSBs) are the most deleterious DNA damages for the cells. To preserve genome integrity, eukaryotic cells thus developed DSB repair mechanisms conserved from yeast to human, among which homologous recombination (HR) that uses an intact homologous sequence to repair a broken chromosome. HR can be separated in two sub-pathways: Gene Conversion (GC) transfers genetic information from one molecule to its homologous and Break Induced Replication (BIR) establishes a replication fork than can proceed until the chromosome end. My doctorate work was focused on the contribution of the chromatin context and 3D genome organization on DSB repair. In S. cerevisiae, nuclear organization and heterochromatin spreading at sub-telomeres can be modified through the overexpression of the Sir3 or sir3A2Q mutant proteins. We demonstrated that reducing the physical distance between homologous sequences increased GC rates, reinforcing the notion that homology search is a limiting step for recombination. We also showed that hetero-chromatinization of DSB site fine-tunes DSB resection, limiting the loss of the DSB ends required to perform homology search and complete HR. Finally, we noticed that the presence of heterochromatin at the donor locus decreased both GC and BIR efficiencies, probably by affecting strand invasion. This work highlights new regulatory pathways of DNA repair. (author) [fr

  16. Unicellular eukaryotes as models in cell and molecular biology: critical appraisal of their past and future value.

    Science.gov (United States)

    Simon, Martin; Plattner, Helmut

    2014-01-01

    Unicellular eukaryotes have been appreciated as model systems for the analysis of crucial questions in cell and molecular biology. This includes Dictyostelium (chemotaxis, amoeboid movement, phagocytosis), Tetrahymena (telomere structure, telomerase function), Paramecium (variant surface antigens, exocytosis, phagocytosis cycle) or both ciliates (ciliary beat regulation, surface pattern formation), Chlamydomonas (flagellar biogenesis and beat), and yeast (S. cerevisiae) for innumerable aspects. Nowadays many problems may be tackled with "higher" eukaryotic/metazoan cells for which full genomic information as well as domain databases, etc., were available long before protozoa. Established molecular tools, commercial antibodies, and established pharmacology are additional advantages available for higher eukaryotic cells. Moreover, an increasing number of inherited genetic disturbances in humans have become elucidated and can serve as new models. Among lower eukaryotes, yeast will remain a standard model because of its peculiarities, including its reduced genome and availability in the haploid form. But do protists still have a future as models? This touches not only the basic understanding of biology but also practical aspects of research, such as fund raising. As we try to scrutinize, due to specific advantages some protozoa should and will remain favorable models for analyzing novel genes or specific aspects of cell structure and function. Outstanding examples are epigenetic phenomena-a field of rising interest. © 2014 Elsevier Inc. All rights reserved.

  17. New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes

    DEFF Research Database (Denmark)

    Parker, Brian John; Moltke, Ida; Roth, Adam

    2011-01-01

    a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein...

  18. Solution Structure of Archaeoglobus fulgidis Peptidyl-tRNA Hydrolase(Pth2) Provides Evidence for an Extensive Conserved Family of Pth2 Enzymes in Archaea, Bacteria and Eukaryotes.

    Energy Technology Data Exchange (ETDEWEB)

    Powers, Robert; Mirkovic, Nebojsa; Goldsmith-Fischman, Sharon; Acton, Thomas; Chiang, Yiwen; Huang, Yuanpeng; Ma, LiChung; Rajan, Paranji K.; Cort, John R.; Kennedy, Michael A.; Liu, Jinfeng; Rost, Burkhard; Honig, Barry; Murray, Diana; Montelione, Gaetano

    2005-11-01

    The solution structure of protein AF2095 from the thermophilic archaea Archaeglobus fulgidis, a 123-residue (13.6 kDa) protein, has been determined by NMR methods. The structure of AF2095 is comprised of four a-helices and a mixed b-sheet consisting of four parallel and anti-parallel b-strands, where the a-helices sandwich the b-sheet. Sequence and structural comparison of AF2095 with proteins from Homo sapiens, Methanocaldococcus jannaschii and Sulfolobus solfataricus, reveals that AF2095 is a peptidyl-tRNA hydrolase (Pth2). This structural comparison also identifies putative catalytic residues and a tRNA interaction region for AF2095. The structure of AF2095 is also similar to the structure of protein TA0108 from archaea Thermoplasma acidophilum, which is deposited in the Protein Database but not functionally annotated. The NMR structure of AF2095 has been further leveraged to obtain good quality structural models for 55 other proteins. Although earlier studies have proposed that the Pth2 protein family is restricted to archeal and eukaryotic organisms, the similarity of the AF2095 structure to human Pth2, the conservation of key active-site residues, and the good quality of the resulting homology models demonstrate a large family of homologous Pth2 proteins that are conserved in eukaryotic, archaeal and bacterial organisms, providing novel insights in the evolution of the Pth and Pth2 enzyme families.

  19. Refining the structure and content of clinical genomic reports.

    Science.gov (United States)

    Dorschner, Michael O; Amendola, Laura M; Shirts, Brian H; Kiedrowski, Lesli; Salama, Joseph; Gordon, Adam S; Fullerton, Stephanie M; Tarczy-Hornoch, Peter; Byers, Peter H; Jarvik, Gail P

    2014-03-01

    To effectively articulate the results of exome and genome sequencing we refined the structure and content of molecular test reports. To communicate results of a randomized control trial aimed at the evaluation of exome sequencing for clinical medicine, we developed a structured narrative report. With feedback from genetics and non-genetics professionals, we developed separate indication-specific and incidental findings reports. Standard test report elements were supplemented with research study-specific language, which highlighted the limitations of exome sequencing and provided detailed, structured results, and interpretations. The report format we developed to communicate research results can easily be transformed for clinical use by removal of research-specific statements and disclaimers. The development of clinical reports for exome sequencing has shown that accurate and open communication between the clinician and laboratory is ideally an ongoing process to address the increasing complexity of molecular genetic testing. © 2014 Wiley Periodicals, Inc.

  20. Macromolecular structure determination in the post-genome era

    CERN Document Server

    Kuhn, P

    2001-01-01

    Recent advances in genetics, molecular biology and crystallographic instrumentation and methodology have led to a revolution in the field of Structural Molecular Biology (SMB). These combined advances have paved the way to a more complete and detailed understanding of the biological macromolecules that make up an organism, both in terms of their individual functions and also the interactions between them. In this paper we describe a large-scale, genomic approach to the three-dimensional structure determination of macromolecules and their complexes, using high-throughput methodology to streamline all aspects of the process. This task requires the development of automated high-intensity synchrotron beam lines for X-ray diffraction data collection from single crystal samples. Furthermore, these beam lines must be operated within a sophisticated software and hardware environment, which is capable of delivering a completely automated structure determination pipeline. The SMB resource at SSRL is developing a system...

  1. Eukaryotic DNA Replication Fork.

    Science.gov (United States)

    Burgers, Peter M J; Kunkel, Thomas A

    2017-06-20

    This review focuses on the biogenesis and composition of the eukaryotic DNA replication fork, with an emphasis on the enzymes that synthesize DNA and repair discontinuities on the lagging strand of the replication fork. Physical and genetic methodologies aimed at understanding these processes are discussed. The preponderance of evidence supports a model in which DNA polymerase ε (Pol ε) carries out the bulk of leading strand DNA synthesis at an undisturbed replication fork. DNA polymerases α and δ carry out the initiation of Okazaki fragment synthesis and its elongation and maturation, respectively. This review also discusses alternative proposals, including cellular processes during which alternative forks may be utilized, and new biochemical studies with purified proteins that are aimed at reconstituting leading and lagging strand DNA synthesis separately and as an integrated replication fork.

  2. The complete chloroplast genome sequence of Podocarpus lambertii: genome structure, evolutionary aspects, gene content and SSR detection.

    Directory of Open Access Journals (Sweden)

    Leila do Nascimento Vieira

    Full Text Available BACKGROUND: Podocarpus lambertii (Podocarpaceae is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. METHODOLOGY/PRINCIPAL FINDINGS: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR. It contains 118 unique genes and one duplicated tRNA (trnN-GUU, which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi and Araucariaceae (Agathis dammara. Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. CONCLUSION: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparently loss of rps16 gene in Podocarpaceae cp genome. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of

  3. Evolutionary Inference across Eukaryotes Identifies Specific Pressures Favoring Mitochondrial Gene Retention.

    Science.gov (United States)

    Johnston, Iain G; Williams, Ben P

    2016-02-24

    Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modeling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondrial genomes, we inferred evolutionary trajectories of mtDNA gene loss across the eukaryotic tree of life. We find that proteins comprising the structural cores of the electron transport chain are preferentially encoded within mitochondrial genomes across eukaryotes. A combination of high GC content and high protein hydrophobicity is required to explain patterns of mtDNA gene retention; a model that accounts for these selective pressures can also predict the success of artificial gene transfer experiments in vivo. This work provides a general method for data-driven inference of the ordering of evolutionary and progressive events, here identifying the distinct features shaping mitochondrial genomes of present-day species. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. The origin of the eukaryotic cell

    Science.gov (United States)

    Hartman, H.

    1984-01-01

    The endosymbiotic hypothesis for the origin of the eukaryotic cell has been applied to the origin of the mitochondria and chloroplasts. However as has been pointed out by Mereschowsky in 1905, it should also be applied to the nucleus as well. If the nucleus, mitochondria and chloroplasts are endosymbionts, then it is likely that the organism that did the engulfing was not a DNA-based organism. In fact, it is useful to postulate that this organism was a primitive RNA-based organism. This hypothesis would explain the preponderance of RNA viruses found in eukaryotic cells. The centriole and basal body do not have a double membrane or DNA. Like all MTOCs (microtubule organising centres), they have a structural or morphic RNA implicated in their formation. This would argue for their origin in the early RNA-based organism rather than in an endosymbiotic event involving bacteria. Finally, the eukaryotic cell uses RNA in ways quite unlike bacteria, thus pointing to a greater emphasis of RNA in both control and structure in the cell. The origin of the eukaryotic cell may tell us why it rather than its prokaryotic relative evolved into the metazoans who are reading this paper.

  5. Recognizing genes and other components of genomic structure

    Energy Technology Data Exchange (ETDEWEB)

    Burks, C. (Los Alamos National Lab., NM (USA)); Myers, E. (Arizona Univ., Tucson, AZ (USA). Dept. of Computer Science); Stormo, G.D. (Colorado Univ., Boulder, CO (USA). Dept. of Molecular, Cellular and Developmental Biology)

    1991-01-01

    The Aspen Center for Physics (ACP) sponsored a three-week workshop, with 26 scientists participating, from 28 May to 15 June, 1990. The workshop, entitled Recognizing Genes and Other Components of Genomic Structure, focussed on discussion of current needs and future strategies for developing the ability to identify and predict the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally-based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are -- as composite entities -- the true object of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians. No participant stayed for less than one week, and most attended for two or three weeks. Computers, software, and databases were available for use as electronic blackboards'' and as the basis for collaborative exploration of ideas being discussed and developed at the workshop. 23 refs., 2 tabs.

  6. Structural constraints in the packaging of bluetongue virus genomic segments.

    Science.gov (United States)

    Burkhardt, Christiane; Sung, Po-Yu; Celma, Cristina C; Roy, Polly

    2014-10-01

    The mechanism used by bluetongue virus (BTV) to ensure the sorting and packaging of its 10 genomic segments is still poorly understood. In this study, we investigated the packaging constraints for two BTV genomic segments from two different serotypes. Segment 4 (S4) of BTV serotype 9 was mutated sequentially and packaging of mutant ssRNAs was investigated by two newly developed RNA packaging assay systems, one in vivo and the other in vitro. Modelling of the mutated ssRNA followed by biochemical data analysis suggested that a conformational motif formed by interaction of the 5' and 3' ends of the molecule was necessary and sufficient for packaging. A similar structural signal was also identified in S8 of BTV serotype 1. Furthermore, the same conformational analysis of secondary structures for positive-sense ssRNAs was used to generate a chimeric segment that maintained the putative packaging motif but contained unrelated internal sequences. This chimeric segment was packaged successfully, confirming that the motif identified directs the correct packaging of the segment. © 2014 The Authors.

  7. Identification of genomic indels and structural variations using split reads

    Directory of Open Access Journals (Sweden)

    Urban Alexander E

    2011-07-01

    Full Text Available Abstract Background Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC, a sequence-based method for SV detection. Results We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read. All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions. A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models. This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions. We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Conclusions Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole

  8. Secure web book to store structural genomics research data.

    Science.gov (United States)

    Manjasetty, Babu A; Höppner, Klaus; Mueller, Uwe; Heinemann, Udo

    2003-01-01

    Recently established collaborative structural genomics programs aim at significantly accelerating the crystal structure analysis of proteins. These large-scale projects require efficient data management systems to ensure seamless collaboration between different groups of scientists working towards the same goal. Within the Berlin-based Protein Structure Factory, the synchrotron X-ray data collection and the subsequent crystal structure analysis tasks are located at BESSY, a third-generation synchrotron source. To organize file-based communication and data transfer at the BESSY site of the Protein Structure Factory, we have developed the web-based BCLIMS, the BESSY Crystallography Laboratory Information Management System. BCLIMS is a relational data management system which is powered by MySQL as the database engine and Apache HTTP as the web server. The database interface routines are written in Python programing language. The software is freely available to academic users. Here we describe the storage, retrieval and manipulation of laboratory information, mainly pertaining to the synchrotron X-ray diffraction experiments and the subsequent protein structure analysis, using BCLIMS.

  9. Training set optimization under population structure in genomic selection.

    Science.gov (United States)

    Isidro, Julio; Jannink, Jean-Luc; Akdemir, Deniz; Poland, Jesse; Heslot, Nicolas; Sorrells, Mark E

    2015-01-01

    Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the coefficient of determination (CDmean), mean of predictor error variance (PEVmean), stratified CDmean (StratCDmean) and random sampling, were evaluated for prediction accuracy in the presence of different levels of population structure. In the presence of population structure, the most phenotypic variation captured by a sampling method in the TRS is desirable. The wheat dataset showed mild population structure, and CDmean and stratified CDmean methods showed the highest accuracies for all the traits except for test weight and heading date. The rice dataset had strong population structure and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes in the TRS, maximizing the relationship between TRS and the test set. This makes it suitable as an optimization criterion for long-term selection. Our results indicated that the best selection criterion used to optimize the TRS seems to depend on the interaction of trait architecture and population structure.

  10. Use of the Operon Structure of the C. elegans Genome as a Tool to Identify Functionally Related Proteins

    Directory of Open Access Journals (Sweden)

    Silvia Dossena

    2013-12-01

    Full Text Available One of the most pressing challenges in the post genomic era is the identification and characterization of protein-protein interactions (PPIs, as these are essential in understanding the cellular physiology of health and disease. Experimental techniques suitable for characterizing PPIs (X-ray crystallography or nuclear magnetic resonance spectroscopy, among others are usually laborious, time-consuming and often difficult to apply to membrane proteins, and therefore require accurate prediction of the candidate interacting partners. High-throughput experimental methods (yeast two-hybrid and affinity purification succumb to the same shortcomings, and can also lead to high rates of false positive and negative results. Therefore, reliable tools for predicting PPIs are needed. The use of the operon structure in the eukaryote Caenorhabditis elegans genome is a valuable, though underserved, tool for identifying physically or functionally interacting proteins. Based on the concept that genes organized in the same operon may encode physically or functionally related proteins, this algorithm is easy to be applied and, importantly, gives a limited number of candidate partners of a given protein, allowing for focused experimental verification. Moreover, this approach can be successfully used to predict PPIs in the human system, including those of membrane proteins.

  11. Simultaneous Structural Variation Discovery in Multiple Paired-End Sequenced Genomes

    Science.gov (United States)

    Hormozdiari, Fereydoun; Hajirasouliha, Iman; McPherson, Andrew; Eichler, Evan E.; Sahinalp, S. Cenk

    Next generation sequencing technologies have been decreasing the costs and increasing the world-wide capacity for sequence production at an unprecedented rate, making the initiation of large scale projects aiming to sequence almost 2000 genomes [1]. Structural variation detection promises to be one of the key diagnostic tools for cancer and other diseases with genomic origin. In this paper, we study the problem of detecting structural variation events in two or more sequenced genomes through high throughput sequencing . We propose to move from the current model of (1) detecting genomic variations in single next generation sequenced (NGS) donor genomes independently, and (2) checking whether two or more donor genomes indeed agree or disagree on the variations (in this paper we name this framework Independent Structural Variation Discovery and Merging - ISV&M), to a new model in which we detect structural variation events among multiple genomes simultaneously.

  12. RNA structural constraints in the evolution of the influenza A virus genome NP segment

    NARCIS (Netherlands)

    A.P. Gultyaev (Alexander); A. Tsyganov-Bodounov (Anton); M.I. Spronken (Monique); S. Van Der Kooij (Sander); R.A.M. Fouchier (Ron); R.C.L. Olsthoorn (René)

    2014-01-01

    textabstractConserved RNA secondary structures were predicted in the nucleoprotein (NP) segment of the influenza A virus genome using comparative sequence and structure analysis. A number of structural elements exhibiting nucleotide covariations were identified over the whole segment length,

  13. An algorithm for detecting eukaryotic sequences in metagenomic ...

    Indian Academy of Sciences (India)

    species but also from accidental contamination from the genome of eukaryotic host cells. The latter scenario generally occurs in the case of host-associated metagenomes, e.g. microbes living in human gut. In such cases, one needs to identify and remove contaminating host DNA sequences, since the latter sequences will ...

  14. Uncoupling of Sister Replisomes during Eukaryotic DNA Replication

    NARCIS (Netherlands)

    Yardimci, Hasan; Loveland, Anna B.; Habuchi, Satoshi; van Oijen, Antoine M.; Walter, Johannes C.

    2010-01-01

    The duplication of eukaryotic genomes involves the replication of DNA from multiple origins of replication. In S phase, two sister replisomes assemble at each active origin, and they replicate DNA in opposite directions. Little is known about the functional relationship between sister replisomes.

  15. Macromolecular structure determination in the post-genome era

    International Nuclear Information System (INIS)

    Kuhn, P.; Soltis, S.M.

    2001-01-01

    Recent advances in genetics, molecular biology and crystallographic instrumentation and methodology have led to a revolution in the field of Structural Molecular Biology (SMB). These combined advances have paved the way to a more complete and detailed understanding of the biological macromolecules that make up an organism, both in terms of their individual functions and also the interactions between them. In this paper we describe a large-scale, genomic approach to the three-dimensional structure determination of macromolecules and their complexes, using high-throughput methodology to streamline all aspects of the process. This task requires the development of automated high-intensity synchrotron beam lines for X-ray diffraction data collection from single crystal samples. Furthermore, these beam lines must be operated within a sophisticated software and hardware environment, which is capable of delivering a completely automated structure determination pipeline. The SMB resource at SSRL is developing a system for the structure determination steps of this process, starting with the initial characterization of the frozen sample, followed by data collection, data reduction, phase determination, and model building. This paper focuses on the data collection elements of this high-throughput system

  16. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain

    DEFF Research Database (Denmark)

    Sükösd, Zsuzsanna; Andersen, Ebbe Sloth; Seemann, Ernst Stefan

    2015-01-01

    of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interesting, a set of long distance interactions form a core organizing structure (COS) that organize the genome into three major structural domains. Despite overlapping...

  17. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan

    2014-01-01

    mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost......-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion. RESULTS: Utilizing nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger...... fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides...

  18. Insular Celtic population structure and genomic footprints of migration.

    Directory of Open Access Journals (Sweden)

    Ross P Byrne

    2018-01-01

    Full Text Available Previous studies of the genetic landscape of Ireland have suggested homogeneity, with population substructure undetectable using single-marker methods. Here we have harnessed the haplotype-based method fineSTRUCTURE in an Irish genome-wide SNP dataset, identifying 23 discrete genetic clusters which segregate with geographical provenance. Cluster diversity is pronounced in the west of Ireland but reduced in the east where older structure has been eroded by historical migrations. Accordingly, when populations from the neighbouring island of Britain are included, a west-east cline of Celtic-British ancestry is revealed along with a particularly striking correlation between haplotypes and geography across both islands. A strong relationship is revealed between subsets of Northern Irish and Scottish populations, where discordant genetic and geographic affinities reflect major migrations in recent centuries. Additionally, Irish genetic proximity of all Scottish samples likely reflects older strata of communication across the narrowest inter-island crossing. Using GLOBETROTTER we detected Irish admixture signals from Britain and Europe and estimated dates for events consistent with the historical migrations of the Norse-Vikings, the Anglo-Normans and the British Plantations. The influence of the former is greater than previously estimated from Y chromosome haplotypes. In all, we paint a new picture of the genetic landscape of Ireland, revealing structure which should be considered in the design of studies examining rare genetic variation and its association with traits.

  19. Eukaryotic systematics: a user's guide for cell biologists and parasitologists.

    Science.gov (United States)

    Walker, Giselle; Dorrell, Richard G; Schlacht, Alexander; Dacks, Joel B

    2011-11-01

    Single-celled parasites like Entamoeba, Trypanosoma, Phytophthora and Plasmodium wreak untold havoc on human habitat and health. Understanding the position of the various protistan pathogens in the larger context of eukaryotic diversity informs our study of how these parasites operate on a cellular level, as well as how they have evolved. Here, we review the literature that has brought our understanding of eukaryotic relationships from an idea of parasites as primitive cells to a crystallized view of diversity that encompasses 6 major divisions, or supergroups, of eukaryotes. We provide an updated taxonomic scheme (for 2011), based on extensive genomic, ultrastructural and phylogenetic evidence, with three differing levels of taxonomic detail for ease of referencing and accessibility (see supplementary material at Cambridge Journals On-line). Two of the most pressing issues in cellular evolution, the root of the eukaryotic tree and the evolution of photosynthesis in complex algae, are also discussed along with ideas about what the new generation of genome sequencing technologies may contribute to the field of eukaryotic systematics. We hope that, armed with this user's guide, cell biologists and parasitologists will be encouraged about taking an increasingly evolutionary point of view in the battle against parasites representing real dangers to our livelihoods and lives.

  20. On the Diversification of the Translation Apparatus across Eukaryotes

    Directory of Open Access Journals (Sweden)

    Greco Hernández

    2012-01-01

    Full Text Available Diversity is one of the most remarkable features of living organisms. Current assessments of eukaryote biodiversity reaches 1.5 million species, but the true figure could be several times that number. Diversity is ingrained in all stages and echelons of life, namely, the occupancy of ecological niches, behavioral patterns, body plans and organismal complexity, as well as metabolic needs and genetics. In this review, we will discuss that diversity also exists in a key biochemical process, translation, across eukaryotes. Translation is a fundamental process for all forms of life, and the basic components and mechanisms of translation in eukaryotes have been largely established upon the study of traditional, so-called model organisms. By using modern genome-wide, high-throughput technologies, recent studies of many nonmodel eukaryotes have unveiled a surprising diversity in the configuration of the translation apparatus across eukaryotes, showing that this apparatus is far from being evolutionarily static. For some of the components of this machinery, functional differences between different species have also been found. The recent research reviewed in this article highlights the molecular and functional diversification the translational machinery has undergone during eukaryotic evolution. A better understanding of all aspects of organismal diversity is key to a more profound knowledge of life.

  1. Systematic determination of the mosaic structure of bacterial genomes: species backbone versus strain-specific loops

    Directory of Open Access Journals (Sweden)

    Gendrault-Jacquemard A

    2005-07-01

    Full Text Available Abstract Background Public databases now contain multitude of complete bacterial genomes, including several genomes of the same species. The available data offers new opportunities to address questions about bacterial genome evolution, a task that requires reliable fine comparison data of closely related genomes. Recent analyses have shown, using pairwise whole genome alignments, that it is possible to segment bacterial genomes into a common conserved backbone and strain-specific sequences called loops. Results Here, we generalize this approach and propose a strategy that allows systematic and non-biased genome segmentation based on multiple genome alignments. Segmentation analyses, as applied to 13 different bacterial species, confirmed the feasibility of our approach to discern the 'mosaic' organization of bacterial genomes. Segmentation results are available through a Web interface permitting functional analysis, extraction and visualization of the backbone/loops structure of documented genomes. To illustrate the potential of this approach, we performed a precise analysis of the mosaic organization of three E. coli strains and functional characterization of the loops. Conclusion The segmentation results including the backbone/loops structure of 13 bacterial species genomes are new and available for use by the scientific community at the URL: http://genome.jouy.inra.fr/mosaic.

  2. A novel rat genomic simple repeat DNA with RNA-homology shows triplex (H-DNA)-like structure and tissue-specific RNA expression

    International Nuclear Information System (INIS)

    Dey, Indranil; Rath, Pramod C.

    2005-01-01

    Mammalian genome contains a wide variety of repetitive DNA sequences of relatively unknown function. We report a novel 227 bp simple repeat DNA (3.3 DNA) with a d {(GA) 7 A (AG) 7 } dinucleotide mirror repeat from the rat (Rattus norvegicus) genome. 3.3 DNA showed 75-85% homology with several eukaryotic mRNAs due to (GA/CU) n dinucleotide repeats by nBlast search and a dispersed distribution in the rat genome by Southern blot hybridization with [ 32 P]3.3 DNA. The d {(GA) 7 A (AG) 7 } mirror repeat formed a triplex (H-DNA)-like structure in vitro. Two large RNAs of 9.1 and 7.5 kb were detected by [ 32 P]3.3 DNA in rat brain by Northern blot hybridization indicating expression of such simple sequence repeats at RNA level in vivo. Further, several cDNAs were isolated from a rat cDNA library by [ 32 P]3.3 DNA probe. Three such cDNAs showed tissue-specific RNA expression in rat. pRT 4.1 cDNA showed strong expression of a 2.39 kb RNA in brain and spleen, pRT 5.5 cDNA showed strong expression of a 2.8 kb RNA in brain and a 3.9 kb RNA in lungs, and pRT 11.4 cDNA showed weak expression of a 2.4 kb RNA in lungs. Thus, genomic simple sequence repeats containing d (GA/CT) n dinucleotides are transcriptionally expressed and regulated in rat tissues. Such d (GA/CT) n dinucleotide repeats may form structural elements (e.g., triplex) which may be sites for functional regulation of genomic coding sequences as well as RNAs. This may be a general function of such transcriptionally active simple sequence repeats widely dispersed in mammalian genome

  3. Massive expansion of the calpain gene family in unicellular eukaryotes

    Directory of Open Access Journals (Sweden)

    Zhao Sen

    2012-09-01

    Full Text Available Abstract Background Calpains are Ca2+-dependent cysteine proteases that participate in a range of crucial cellular processes. Dysfunction of these enzymes may cause, for instance, life-threatening diseases in humans, the loss of sex determination in nematodes and embryo lethality in plants. Although the calpain family is well characterized in animal and plant model organisms, there is a great lack of knowledge about these genes in unicellular eukaryote species (i.e. protists. Here, we study the distribution and evolution of calpain genes in a wide range of eukaryote genomes from major branches in the tree of life. Results Our investigations reveal 24 types of protein domains that are combined with the calpain-specific catalytic domain CysPc. In total we identify 41 different calpain domain architectures, 28 of these domain combinations have not been previously described. Based on our phylogenetic inferences, we propose that at least four calpain variants were established in the early evolution of eukaryotes, most likely before the radiation of all the major supergroups of eukaryotes. Many domains associated with eukaryotic calpain genes can be found among eubacteria or archaebacteria but never in combination with the CysPc domain. Conclusions The analyses presented here show that ancient modules present in prokaryotes, and a few de novo eukaryote domains, have been assembled into many novel domain combinations along the evolutionary history of eukaryotes. Some of the new calpain genes show a narrow distribution in a few branches in the tree of life, likely representing lineage-specific innovations. Hence, the functionally important classical calpain genes found among humans and vertebrates make up only a tiny fraction of the calpain family. In fact, a massive expansion of the calpain family occurred by domain shuffling among unicellular eukaryotes and contributed to a wealth of functionally different genes.

  4. Massive expansion of the calpain gene family in unicellular eukaryotes.

    Science.gov (United States)

    Zhao, Sen; Liang, Zhe; Demko, Viktor; Wilson, Robert; Johansen, Wenche; Olsen, Odd-Arne; Shalchian-Tabrizi, Kamran

    2012-09-29

    Calpains are Ca2+-dependent cysteine proteases that participate in a range of crucial cellular processes. Dysfunction of these enzymes may cause, for instance, life-threatening diseases in humans, the loss of sex determination in nematodes and embryo lethality in plants. Although the calpain family is well characterized in animal and plant model organisms, there is a great lack of knowledge about these genes in unicellular eukaryote species (i.e. protists). Here, we study the distribution and evolution of calpain genes in a wide range of eukaryote genomes from major branches in the tree of life. Our investigations reveal 24 types of protein domains that are combined with the calpain-specific catalytic domain CysPc. In total we identify 41 different calpain domain architectures, 28 of these domain combinations have not been previously described. Based on our phylogenetic inferences, we propose that at least four calpain variants were established in the early evolution of eukaryotes, most likely before the radiation of all the major supergroups of eukaryotes. Many domains associated with eukaryotic calpain genes can be found among eubacteria or archaebacteria but never in combination with the CysPc domain. The analyses presented here show that ancient modules present in prokaryotes, and a few de novo eukaryote domains, have been assembled into many novel domain combinations along the evolutionary history of eukaryotes. Some of the new calpain genes show a narrow distribution in a few branches in the tree of life, likely representing lineage-specific innovations. Hence, the functionally important classical calpain genes found among humans and vertebrates make up only a tiny fraction of the calpain family. In fact, a massive expansion of the calpain family occurred by domain shuffling among unicellular eukaryotes and contributed to a wealth of functionally different genes.

  5. Enzymes involved in organellar DNA replication in photosynthetic eukaryotes.

    Science.gov (United States)

    Moriyama, Takashi; Sato, Naoki

    2014-01-01

    Plastids and mitochondria possess their own genomes. Although the replication mechanisms of these organellar genomes remain unclear in photosynthetic eukaryotes, several organelle-localized enzymes related to genome replication, including DNA polymerase, DNA primase, DNA helicase, DNA topoisomerase, single-stranded DNA maintenance protein, DNA ligase, primer removal enzyme, and several DNA recombination-related enzymes, have been identified. In the reference Eudicot plant Arabidopsis thaliana, the replication-related enzymes of plastids and mitochondria are similar because many of them are dual targeted to both organelles, whereas in the red alga Cyanidioschyzon merolae, plastids and mitochondria contain different replication machinery components. The enzymes involved in organellar genome replication in green plants and red algae were derived from different origins, including proteobacterial, cyanobacterial, and eukaryotic lineages. In the present review, we summarize the available data for enzymes related to organellar genome replication in green plants and red algae. In addition, based on the type and distribution of replication enzymes in photosynthetic eukaryotes, we discuss the transitional history of replication enzymes in the organelles of plants.

  6. Genomic minimalism in the early diverging intestinal parasite Giardia lamblia.

    Science.gov (United States)

    Morrison, Hilary G; McArthur, Andrew G; Gillin, Frances D; Aley, Stephen B; Adam, Rodney D; Olsen, Gary J; Best, Aaron A; Cande, W Zacheus; Chen, Feng; Cipriano, Michael J; Davids, Barbara J; Dawson, Scott C; Elmendorf, Heidi G; Hehl, Adrian B; Holder, Michael E; Huse, Susan M; Kim, Ulandt U; Lasek-Nesselquist, Erica; Manning, Gerard; Nigam, Anuranjini; Nixon, Julie E J; Palm, Daniel; Passamaneck, Nora E; Prabhu, Anjali; Reich, Claudia I; Reiner, David S; Samuelson, John; Svard, Staffan G; Sogin, Mitchell L

    2007-09-28

    The genome of the eukaryotic protist Giardia lamblia, an important human intestinal parasite, is compact in structure and content, contains few introns or mitochondrial relics, and has simplified machinery for DNA replication, transcription, RNA processing, and most metabolic pathways. Protein kinases comprise the single largest protein class and reflect Giardia's requirement for a complex signal transduction network for coordinating differentiation. Lateral gene transfer from bacterial and archaeal donors has shaped Giardia's genome, and previously unknown gene families, for example, cysteine-rich structural proteins, have been discovered. Unexpectedly, the genome shows little evidence of heterozygosity, supporting recent speculations that this organism is sexual. This genome sequence will not only be valuable for investigating the evolution of eukaryotes, but will also be applied to the search for new therapeutics for this parasite.

  7. Towards New Antifolates Targeting Eukaryotic Opportunistic Infections

    Energy Technology Data Exchange (ETDEWEB)

    Liu, J.; Bolstad, D; Bolstad, E; Wright, D; Anderson, A

    2009-01-01

    Trimethoprim, an antifolate commonly prescribed in combination with sulfamethoxazole, potently inhibits several prokaryotic species of dihydrofolate reductase (DHFR). However, several eukaryotic pathogenic organisms are resistant to trimethoprim, preventing its effective use as a therapeutic for those infections. We have been building a program to reengineer trimethoprim to more potently and selectively inhibit eukaryotic species of DHFR as a viable strategy for new drug discovery targeting several opportunistic pathogens. We have developed a series of compounds that exhibit potent and selective inhibition of DHFR from the parasitic protozoa Cryptosporidium and Toxoplasma as well as the fungus Candida glabrata. A comparison of the structures of DHFR from the fungal species Candida glabrata and Pneumocystis suggests that the compounds may also potently inhibit Pneumocystis DHFR.

  8. Inorganic phosphate uptake in unicellular eukaryotes.

    Science.gov (United States)

    Dick, Claudia F; Dos-Santos, André L A; Meyer-Fernandes, José R

    2014-07-01

    Inorganic phosphate (Pi) is an essential nutrient for all organisms. The route of Pi utilization begins with Pi transport across the plasma membrane. Here, we analyzed the gene sequences and compared the biochemical profiles, including kinetic and modulator parameters, of Pi transporters in unicellular eukaryotes. The objective of this review is to evaluate the recent findings regarding Pi uptake mechanisms in microorganisms, such as the fungi Neurospora crassa and Saccharomyces cerevisiae and the parasite protozoans Trypanosoma cruzi, Trypanosoma rangeli, Leishmania infantum and Plasmodium falciparum. Pi uptake is the key step of Pi homeostasis and in the subsequent signaling event in eukaryotic microorganisms. Biochemical and structural studies are important for clarifying mechanisms of Pi homeostasis, as well as Pi sensor and downstream pathways, and raise possibilities for future studies in this field. Copyright © 2014 Elsevier B.V. All rights reserved.

  9. Producing genome structure populations with the dynamic and automated PGS software.

    Science.gov (United States)

    Hua, Nan; Tjong, Harianto; Shin, Hanjun; Gong, Ke; Zhou, Xianghong Jasmine; Alber, Frank

    2018-05-01

    Chromosome conformation capture technologies such as Hi-C are widely used to investigate the spatial organization of genomes. Because genome structures can vary considerably between individual cells of a population, interpreting ensemble-averaged Hi-C data can be challenging, in particular for long-range and interchromosomal interactions. We pioneered a probabilistic approach for the generation of a population of distinct diploid 3D genome structures consistent with all the chromatin-chromatin interaction probabilities from Hi-C experiments. Each structure in the population is a physical model of the genome in 3D. Analysis of these models yields new insights into the causes and the functional properties of the genome's organization in space and time. We provide a user-friendly software package, called PGS, which runs on local machines (for practice runs) and high-performance computing platforms. PGS takes a genome-wide Hi-C contact frequency matrix, along with information about genome segmentation, and produces an ensemble of 3D genome structures entirely consistent with the input. The software automatically generates an analysis report, and provides tools to extract and analyze the 3D coordinates of specific domains. Basic Linux command-line knowledge is sufficient for using this software. A typical running time of the pipeline is ∼3 d with 300 cores on a computer cluster to generate a population of 1,000 diploid genome structures at topological-associated domain (TAD)-level resolution.

  10. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung-Hou; Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-02

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  11. Organizational heterogeneity of vertebrate genomes.

    Science.gov (United States)

    Frenkel, Svetlana; Kirzhner, Valery; Korol, Abraham

    2012-01-01

    Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.

  12. Organizational heterogeneity of vertebrate genomes.

    Directory of Open Access Journals (Sweden)

    Svetlana Frenkel

    Full Text Available Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.

  13. topIb, a phylogenetic hallmark gene of Thaumarchaeota encodes a functional eukaryote-like topoisomerase IB.

    Science.gov (United States)

    Dahmane, Narimane; Gadelle, Danièle; Delmas, Stéphane; Criscuolo, Alexis; Eberhard, Stephan; Desnoues, Nicole; Collin, Sylvie; Zhang, Hongliang; Pommier, Yves; Forterre, Patrick; Sezonov, Guennadi

    2016-04-07

    Type IB DNA topoisomerases can eliminate torsional stresses produced during replication and transcription. These enzymes are found in all eukaryotes and a short version is present in some bacteria and viruses. Among prokaryotes, the long eukaryotic version is only observed in archaea of the phylum Thaumarchaeota. However, the activities and the roles of these topoisomerases have remained an open question. Here, we demonstrate that all available thaumarchaeal genomes contain a topoisomerase IB gene that defines a monophyletic group closely related to the eukaryotic enzymes. We show that the topIB gene is expressed in the model thaumarchaeon Nitrososphaera viennensis and we purified the recombinant enzyme from the uncultivated thaumarchaeon Candidatus Caldiarchaeum subterraneum. This enzyme is active in vitro at high temperature, making it the first thermophilic topoisomerase IB characterized so far. We have compared this archaeal type IB enzyme to its human mitochondrial and nuclear counterparts. The archaeal enzyme relaxes both negatively and positively supercoiled DNA like the eukaryotic enzymes. However, its pattern of DNA cleavage specificity is different and it is resistant to camptothecins (CPTs) and non-CPT Top1 inhibitors, LMP744 and lamellarin D. This newly described thermostable topoisomerases IB should be a promising new model for evolutionary, mechanistic and structural studies. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Functional and evolutionary analysis of alternatively spliced genes is consistent with an early eukaryotic origin of alternative splicing

    Directory of Open Access Journals (Sweden)

    Penny David

    2007-10-01

    Full Text Available Abstract Background Alternative splicing has been reported in various eukaryotic groups including plants, apicomplexans, diatoms, amoebae, animals and fungi. However, whether widespread alternative splicing has evolved independently in the different eukaryotic groups or was inherited from their last common ancestor, and may therefore predate multicellularity, is still unknown. To better understand the origin and evolution of alternative splicing and its usage in diverse organisms, we studied alternative splicing in 12 eukaryotic species, comparing rates of alternative splicing across genes of different functional classes, cellular locations, intron/exon structures and evolutionary origins. Results For each species, we find that genes from most functional categories are alternatively spliced. Ancient genes (shared between animals, fungi and plants show high levels of alternative splicing. Genes with products expressed in the nucleus or plasma membrane are generally more alternatively spliced while those expressed in extracellular location show less alternative splicing. We find a clear correspondence between incidence of alternative splicing and intron number per gene both within and between genomes. In general, we find several similarities in patterns of alternative splicing across these diverse eukaryotes. Conclusion Along with previous studies indicating intron-rich genes with weak intron boundary consensus and complex spliceosomes in ancestral organisms, our results suggest that at least a simple form of alternative splicing may already have been present in the unicellular ancestor of plants, fungi and animals. A role for alternative splicing in the evolution of multicellularity then would largely have arisen by co-opting the preexisting process.

  15. Eukaryotic transcription factors

    DEFF Research Database (Denmark)

    Staby, Lasse; O'Shea, Charlotte; Willemoës, Martin

    2017-01-01

    Gene-specific transcription factors (TFs) are key regulatory components of signaling pathways, controlling, for example, cell growth, development, and stress responses. Their biological functions are determined by their molecular structures, as exemplified by their structured DNA-binding domains...... regions with function-related, short sequence motifs and molecular recognition features with structural propensities. This review focuses on molecular aspects of TFs, which represent paradigms of ID-related features. Through specific examples, we review how the ID-associated flexibility of TFs enables....... It is furthermore emphasized how classic biochemical concepts like allostery, conformational selection, induced fit, and feedback regulation are undergoing a revival with the appreciation of ID. The review also describes the most recent advances based on computational simulations of ID-based interaction mechanisms...

  16. Alignment-free comparative genomic screen for structured RNAs using coarse-grained secondary structure dot plots

    DEFF Research Database (Denmark)

    Kato, Yuki; Gorodkin, Jan; Havgaard, Jakob Hull

    2017-01-01

    . Methods: Here we present a fast and efficient method, DotcodeR, for detecting structurally similar RNAs in genomic sequences by comparing their corresponding coarse-grained secondary structure dot plots at string level. This allows us to perform an all-against-all scan of all window pairs from two genomes...... without alignment. Results: Our computational experiments with simulated data and real chromosomes demonstrate that the presented method has good sensitivity. Conclusions: DotcodeR can be useful as a pre-filter in a genomic comparative scan for structured RNAs....

  17. Crystal Structure of the Homing Endonuclease I-CvuI Provides a New Template for Genome Modification

    DEFF Research Database (Denmark)

    Molina, Rafael; Redondo, Pilar; López-Méndez, Blanca

    2015-01-01

    Homing endonucleases recognize and generate a DNA double-strand break, which has been used to promote gene targeting. These enzymes recognize long DNA stretches; they are highly sequence-specific enzymes and display a very low frequency of cleavage even in complete genomes. Although a large number...... of homing endonucleases have been identified, the landscape of possible target sequences is still very limited to cover the complexity of the whole eukaryotic genome. Therefore, the finding and molecular analysis of homing endonucleases identified but not yet characterized may widen the landscape...

  18. RNA Export through the NPC in Eukaryotes.

    Science.gov (United States)

    Okamura, Masumi; Inose, Haruko; Masuda, Seiji

    2015-03-20

    In eukaryotic cells, RNAs are transcribed in the nucleus and exported to the cytoplasm through the nuclear pore complex. The RNA molecules that are exported from the nucleus into the cytoplasm include messenger RNAs (mRNAs), ribosomal RNAs (rRNAs), transfer RNAs (tRNAs), small nuclear RNAs (snRNAs), micro RNAs (miRNAs), and viral mRNAs. Each RNA is transported by a specific nuclear export receptor. It is believed that most of the mRNAs are exported by Nxf1 (Mex67 in yeast), whereas rRNAs, snRNAs, and a certain subset of mRNAs are exported in a Crm1/Xpo1-dependent manner. tRNAs and miRNAs are exported by Xpot and Xpo5. However, multiple export receptors are involved in the export of some RNAs, such as 60S ribosomal subunit. In addition to these export receptors, some adapter proteins are required to export RNAs. The RNA export system of eukaryotic cells is also used by several types of RNA virus that depend on the machineries of the host cell in the nucleus for replication of their genome, therefore this review describes the RNA export system of two representative viruses. We also discuss the NPC anchoring-dependent mRNA export factors that directly recruit specific genes to the NPC.

  19. From structure prediction to genomic screens for novel non-coding RNAs

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Hofacker, Ivo L.

    2011-01-01

    Abstract: Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction....... This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early...... upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other....

  20. Genomic evolution of 11 type strains within family Planctomycetaceae.

    Directory of Open Access Journals (Sweden)

    Min Guo

    Full Text Available The species in family Planctomycetaceae are ideal groups for investigating the origin of eukaryotes. Their cells are divided by a lipidic intracytoplasmic membrane and they share a number of eukaryote-like molecular characteristics. However, their genomic structures, potential abilities, and evolutionary status are still unknown. In this study, we searched for common protein families and a core genome/pan genome based on 11 sequenced species in family Planctomycetaceae. Then, we constructed phylogenetic tree based on their 832 common protein families. We also annotated the 11 genomes using the Clusters of Orthologous Groups database. Moreover, we predicted and reconstructed their core/pan metabolic pathways using the KEGG (Kyoto Encyclopedia of Genes and Genomes orthology system. Subsequently, we identified genomic islands (GIs and structural variations (SVs among the five complete genomes and we specifically investigated the integration of two Planctomycetaceae plasmids in all 11 genomes. The results indicate that Planctomycetaceae species share diverse genomic variations and unique genomic characteristics, as well as have huge potential for human applications.

  1. Archaeal “Dark Matter” and the Origin of Eukaryotes

    Science.gov (United States)

    Williams, Tom A.; Embley, T. Martin

    2014-01-01

    Current hypotheses about the history of cellular life are mainly based on analyses of cultivated organisms, but these represent only a small fraction of extant biodiversity. The sequencing of new environmental lineages therefore provides an opportunity to test, revise, or reject existing ideas about the tree of life and the origin of eukaryotes. According to the textbook three domains hypothesis, the eukaryotes emerge as the sister group to a monophyletic Archaea. However, recent analyses incorporating better phylogenetic models and an improved sampling of the archaeal domain have generally supported the competing eocyte hypothesis, in which core genes of eukaryotic cells originated from within the Archaea, with important implications for eukaryogenesis. Given this trend, it was surprising that a recent analysis incorporating new genomes from uncultivated Archaea recovered a strongly supported three domains tree. Here, we show that this result was due in part to the use of a poorly fitting phylogenetic model and also to the inclusion by an automated pipeline of genes of putative bacterial origin rather than nucleocytosolic versions for some of the eukaryotes analyzed. When these issues were resolved, analyses including the new archaeal lineages placed core eukaryotic genes within the Archaea. These results are consistent with a number of recent studies in which improved archaeal sampling and better phylogenetic models agree in supporting the eocyte tree over the three domains hypothesis. PMID:24532674

  2. Conservation and Variability of Meiosis Across the Eukaryotes.

    Science.gov (United States)

    Loidl, Josef

    2016-11-23

    Comparisons among a variety of eukaryotes have revealed considerable variability in the structures and processes involved in their meiosis. Nevertheless, conventional forms of meiosis occur in all major groups of eukaryotes, including early-branching protists. This finding confirms that meiosis originated in the common ancestor of all eukaryotes and suggests that primordial meiosis may have had many characteristics in common with conventional extant meiosis. However, it is possible that the synaptonemal complex and the delicate crossover control related to its presence were later acquisitions. Later still, modifications to meiotic processes occurred within different groups of eukaryotes. Better knowledge on the spectrum of derived and uncommon forms of meiosis will improve our understanding of many still mysterious aspects of the meiotic process and help to explain the evolutionary basis of functional adaptations to the meiotic program.

  3. Radiotaxons and reliability of a genome

    International Nuclear Information System (INIS)

    Korogodin, V.I.

    1982-01-01

    Radiosensitivity of cells (D 0 ) is considered with regard to the structural organization of the genome. The following terms are introduced: ''karyotaxon'', organisms with identical structural organization of the genome, and ''specific genome stability'' K=D 0 C, where C is the quantity of DNA in the cell nucleus; K is the amount of energy (eV) the sorption of which in DNA is necessary and sufficient for one elementary damage to occur. It was shown that Ksub(i)=const. within every karyotaxon ''i''. K 1 =100 eV for viruses, and K 4 =61000 eV for the highest level of genome organization (diploid eukaryotes including man). Potential mechanisms of increasing Ksub(i) with increasing level of genome organization and the role of this factor in evolution are discussed [ru

  4. Genomes

    National Research Council Canada - National Science Library

    Brown, T. A. (Terence A.)

    2002-01-01

    ... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...

  5. Structurally Complex Organization of Repetitive DNAs in the Genome of Cobia (Rachycentron canadum).

    Science.gov (United States)

    Costa, Gideão W W F; Cioffi, Marcelo de B; Bertollo, Luiz A C; Molina, Wagner F

    2015-06-01

    Repetitive DNAs comprise the largest fraction of the eukaryotic genome. They include microsatellites or simple sequence repeats (SSRs), which play an important role in the chromosome differentiation among fishes. Rachycentron canadum is the only representative of the family Rachycentridae. This species has been focused on several multidisciplinary studies in view of its important potential for marine fish farming. In the present study, distinct classes of repetitive DNAs, with emphasis on SSRs, were mapped in the chromosomes of this species to improve the knowledge of its genome organization. Microsatellites exhibited a diversified distribution, both dispersed in euchromatin and clustered in the heterochromatin. The multilocus location of SSRs strengthened the heterochromatin heterogeneity in this species, as suggested by some previous studies. The colocalization of SSRs with retrotransposons and transposons pointed to a close evolutionary relationship between these repetitive sequences. A number of heterochromatic regions highlighted a greater complex organization than previously supposed, harboring a diversity of repetitive elements. In this sense, there was also evidence of colocalization of active genetic regions and different classes of repetitive DNAs in a common heterochromatic region, which offers a potential opportunity for further researches regarding the interaction of these distinct fractions in fish genomes.

  6. The genome of obligately intracellular Ehrlichia canis revealsthemes of complex membrane structure and immune evasion strategies

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, K.; Kuyler Doyle, C.; Lykidis, A.; Ivanova, N.; Francino, P.; Chain, P.; Shin, M.; Malfatti, S.; Larimer, F.; Copeland,A.; Detter, J.C.; Land, M.; Richardson, P.M.; Yu, X.J.; Walker, D.H.; McBride, J.W.; Kyrpides, N.C.

    2005-09-01

    Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative, a-proteobacterium is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp predicted to encode 925 proteins, 40 stable RNA species, and 17 putative pseudogenes, and a substantial proportion of non-coding sequence (27 percent). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences, and a unique serine-threonine bias associated with the potential for O-glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein families associated with immune evasion were identified, one of which contains poly G:C tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Proteins associated with pathogen-host interactions were identified including a small group of proteins (12) with tandem repeats and another with eukaryotic-like ankyrin domains (7).

  7. RPA-1 from Leishmania amazonensis (LaRPA-1) structurally differs from other eukaryote RPA-1 and interacts with telomeric DNA via its N-terminal OB-fold domain.

    Science.gov (United States)

    Pavani, R S; Fernandes, C; Perez, A M; Vasconcelos, E J R; Siqueira-Neto, J L; Fontes, M R; Cano, M I N

    2014-12-20

    Replication protein A-1 (RPA-1) is a single-stranded DNA-binding protein involved in DNA metabolism. We previously demonstrated the interaction between LaRPA-1 and telomeric DNA. Here, we expressed and purified truncated mutants of LaRPA-1 and used circular dichroism measurements and molecular dynamics simulations to demonstrate that the tertiary structure of LaRPA-1 differs from human and yeast RPA-1. LaRPA-1 interacts with telomeric ssDNA via its N-terminal OB-fold domain, whereas RPA from higher eukaryotes show different binding modes to ssDNA. Our results show that LaRPA-1 is evolutionary distinct from other RPA-1 proteins and can potentially be used for targeting trypanosomatid telomeres. Copyright © 2014 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  8. Statistical properties of thermodynamically predicted RNA secondary structures in viral genomes

    Science.gov (United States)

    Spanò, M.; Lillo, F.; Miccichè, S.; Mantegna, R. N.

    2008-10-01

    By performing a comprehensive study on 1832 segments of 1212 complete genomes of viruses, we show that in viral genomes the hairpin structures of thermodynamically predicted RNA secondary structures are more abundant than expected under a simple random null hypothesis. The detected hairpin structures of RNA secondary structures are present both in coding and in noncoding regions for the four groups of viruses categorized as dsDNA, dsRNA, ssDNA and ssRNA. For all groups, hairpin structures of RNA secondary structures are detected more frequently than expected for a random null hypothesis in noncoding rather than in coding regions. However, potential RNA secondary structures are also present in coding regions of dsDNA group. In fact, we detect evolutionary conserved RNA secondary structures in conserved coding and noncoding regions of a large set of complete genomes of dsDNA herpesviruses.

  9. Eukaryotic snoRNAs: a paradigm for gene expression flexibility.

    Science.gov (United States)

    Dieci, Giorgio; Preti, Milena; Montanini, Barbara

    2009-08-01

    Small nucleolar RNAs (snoRNAs) are one of the most ancient and numerous families of non-protein-coding RNAs (ncRNAs). The main function of snoRNAs - to guide site-specific rRNA modification - is the same in Archaea and all eukaryotic lineages. In contrast, as revealed by recent genomic and RNomic studies, their genomic organization and expression strategies are the most varied. Seemingly snoRNA coding units have adopted, in the course of evolution, all the possible ways of being transcribed, thus providing a unique paradigm of gene expression flexibility. By focusing on representative fungal, plant and animal genomes, we review here all the documented types of snoRNA gene organization and expression, and we provide a comprehensive account of snoRNA expressional freedom by precisely estimating the frequency, in each genome, of each type of genomic organization. We finally discuss the relevance of snoRNA genomic studies for our general understanding of ncRNA family evolution and expression in eukaryotes.

  10. The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis.

    Science.gov (United States)

    Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

    2015-01-01

    Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5' portion of the psbA gene and IR contraction occurred in Actinidia. The clap gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16 taxa angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.

  11. A sequence-based survey of the complex structural organization of tumor genomes

    Energy Technology Data Exchange (ETDEWEB)

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  12. Multiple roles of genome-attached bacteriophage terminal proteins

    International Nuclear Information System (INIS)

    Redrejo-Rodríguez, Modesto; Salas, Margarita

    2014-01-01

    Protein-primed replication constitutes a generalized mechanism to initiate DNA or RNA synthesis in linear genomes, including viruses, gram-positive bacteria, linear plasmids and mobile elements. By this mechanism a specific amino acid primes replication and becomes covalently linked to the genome ends. Despite the fact that TPs lack sequence homology, they share a similar structural arrangement, with the priming residue in the C-terminal half of the protein and an accumulation of positively charged residues at the N-terminal end. In addition, various bacteriophage TPs have been shown to have DNA-binding capacity that targets TPs and their attached genomes to the host nucleoid. Furthermore, a number of bacteriophage TPs from different viral families and with diverse hosts also contain putative nuclear localization signals and localize in the eukaryotic nucleus, which could lead to the transport of the attached DNA. This suggests a possible role of bacteriophage TPs in prokaryote-to-eukaryote horizontal gene transfer. - Highlights: • Protein-primed genome replication constitutes a strategy to initiate DNA or RNA synthesis in linear genomes. • Bacteriophage terminal proteins (TPs) are covalently attached to viral genomes by their primary function priming DNA replication. • TPs are also DNA-binding proteins and target phage genomes to the host nucleoid. • TPs can also localize in the eukaryotic nucleus and may have a role in phage-mediated interkingdom gene transfer

  13. Multiple roles of genome-attached bacteriophage terminal proteins

    Energy Technology Data Exchange (ETDEWEB)

    Redrejo-Rodríguez, Modesto; Salas, Margarita, E-mail: msalas@cbm.csic.es

    2014-11-15

    Protein-primed replication constitutes a generalized mechanism to initiate DNA or RNA synthesis in linear genomes, including viruses, gram-positive bacteria, linear plasmids and mobile elements. By this mechanism a specific amino acid primes replication and becomes covalently linked to the genome ends. Despite the fact that TPs lack sequence homology, they share a similar structural arrangement, with the priming residue in the C-terminal half of the protein and an accumulation of positively charged residues at the N-terminal end. In addition, various bacteriophage TPs have been shown to have DNA-binding capacity that targets TPs and their attached genomes to the host nucleoid. Furthermore, a number of bacteriophage TPs from different viral families and with diverse hosts also contain putative nuclear localization signals and localize in the eukaryotic nucleus, which could lead to the transport of the attached DNA. This suggests a possible role of bacteriophage TPs in prokaryote-to-eukaryote horizontal gene transfer. - Highlights: • Protein-primed genome replication constitutes a strategy to initiate DNA or RNA synthesis in linear genomes. • Bacteriophage terminal proteins (TPs) are covalently attached to viral genomes by their primary function priming DNA replication. • TPs are also DNA-binding proteins and target phage genomes to the host nucleoid. • TPs can also localize in the eukaryotic nucleus and may have a role in phage-mediated interkingdom gene transfer.

  14. Beyond Agrobacterium-Mediated Transformation: Horizontal Gene Transfer from Bacteria to Eukaryotes.

    Science.gov (United States)

    Lacroix, Benoît; Citovsky, Vitaly

    2018-03-03

    Besides the massive gene transfer from organelles to the nuclear genomes, which occurred during the early evolution of eukaryote lineages, the importance of horizontal gene transfer (HGT) in eukaryotes remains controversial. Yet, increasing amounts of genomic data reveal many cases of bacterium-to-eukaryote HGT that likely represent a significant force in adaptive evolution of eukaryotic species. However, DNA transfer involved in genetic transformation of plants by Agrobacterium species has traditionally been considered as the unique example of natural DNA transfer and integration into eukaryotic genomes. Recent discoveries indicate that the repertoire of donor bacterial species and of recipient eukaryotic hosts potentially are much wider than previously thought, including donor bacterial species, such as plant symbiotic nitrogen-fixing bacteria (e.g., Rhizobium etli) and animal bacterial pathogens (e.g., Bartonella henselae, Helicobacter pylori), and recipient species from virtually all eukaryotic clades. Here, we review the molecular pathways and potential mechanisms of these trans-kingdom HGT events and discuss their utilization in biotechnology and research.

  15. Large-scale trends in the evolution of gene structures within 11 animal genomes.

    Directory of Open Access Journals (Sweden)

    Mark Yandell

    2006-03-01

    Full Text Available We have used the annotations of six animal genomes (Homo sapiens, Mus musculus, Ciona intestinalis, Drosophila melanogaster, Anopheles gambiae, and Caenorhabditis elegans together with the sequences of five unannotated Drosophila genomes to survey changes in protein sequence and gene structure over a variety of timescales--from the less than 5 million years since the divergence of D. simulans and D. melanogaster to the more than 500 million years that have elapsed since the Cambrian explosion. To do so, we have developed a new open-source software library called CGL (for "Comparative Genomics Library". Our results demonstrate that change in intron-exon structure is gradual, clock-like, and largely independent of coding-sequence evolution. This means that genome annotations can be used in new ways to inform, corroborate, and test conclusions drawn from comparative genomics analyses that are based upon protein and nucleotide sequence similarities.

  16. Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome

    NARCIS (Netherlands)

    Collins, Ryan L; Brand, Harrison; Redin, Claire E.; Hanscom, Carrie; Antolik, Caroline; Stone, Matthew R; Glessner, Joseph T.; Mason, Tamara; Pregno, Giulia; Dorrani, Naghmeh; Mandrile, Giorgia; Giachino, Daniela; Perrin, Danielle; Walsh, Cole; Cipicchio, Michelle; Costello, Maura; Stortchevoi, Alexei; An, Joon Yong; Currall, Benjamin B; Seabra, Catarina M; Ragavendran, Ashok; Margolin, Lauren; Martinez-Agosto, Julian A.; Lucente, Diane; Levy, Brynn; Sanders, Jan-Stephan; Wapner, Ronald J.; Quintero-Rivera, Fabiola; Kloosterman, Wigard; Talkowski, Michael E.

    2017-01-01

    Background: Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. Results: We sequenced 689 participants with autism spectrum disorder (ASD) and other

  17. From structure prediction to genomic screens for novel non-coding RNAs.

    Science.gov (United States)

    Gorodkin, Jan; Hofacker, Ivo L

    2011-08-01

    Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.

  18. Structural constraints in the packaging of bluetongue virus genomic segments

    OpenAIRE

    Burkhardt, Christiane; Sung, Po-Yu; Celma, Cristina C.; Roy, Polly

    2014-01-01

    : The mechanism used by bluetongue virus (BTV) to ensure the sorting and packaging of its 10 genomic segments is still poorly understood. In this study, we investigated the packaging constraints for two BTV genomic segments from two different serotypes. Segment 4 (S4) of BTV serotype 9 was mutated sequentially and packaging of mutant ssRNAs was investigated by two newly developed RNA packaging assay systems, one in vivo and the other in vitro. Modelling of the mutated ssRNA followed by bioche...

  19. Leishmania naiffi and Leishmania guyanensis reference genomes highlight genome structure and gene evolution in the Viannia subgenus.

    Science.gov (United States)

    Coughlan, Simone; Taylor, Ali Shirley; Feane, Eoghan; Sanders, Mandy; Schonian, Gabriele; Cotton, James A; Downing, Tim

    2018-04-01

    The unicellular protozoan parasite Leishmania causes the neglected tropical disease leishmaniasis, affecting 12 million people in 98 countries. In South America, where the Viannia subgenus predominates, so far only L. ( Viannia ) braziliensis and L. ( V. ) panamensis have been sequenced, assembled and annotated as reference genomes. Addressing this deficit in molecular information can inform species typing, epidemiological monitoring and clinical treatment. Here, L. ( V. ) naiffi and L. ( V. ) guyanensis genomic DNA was sequenced to assemble these two genomes as draft references from short sequence reads. The methods used were tested using short sequence reads for L. braziliensis M2904 against its published reference as a comparison. This assembly and annotation pipeline identified 70 additional genes not annotated on the original M2904 reference. Phylogenetic and evolutionary comparisons of L. guyanensis and L. naiffi with 10 other Viannia genomes revealed four traits common to all Viannia : aneuploidy, 22 orthologous groups of genes absent in other Leishmania subgenera, elevated TATE transposon copies and a high NADH-dependent fumarate reductase gene copy number. Within the Viannia , there were limited structural changes in genome architecture specific to individual species: a 45 Kb amplification on chromosome 34 was present in all bar L. lainsoni , L. naiffi had a higher copy number of the virulence factor leishmanolysin, and laboratory isolate L. shawi M8408 had a possible minichromosome derived from the 3' end of chromosome 34 . This combination of genome assembly, phylogenetics and comparative analysis across an extended panel of diverse Viannia has uncovered new insights into the origin and evolution of this subgenus and can help improve diagnostics for leishmaniasis surveillance.

  20. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution.

    Science.gov (United States)

    Rogozin, Igor B; Wolf, Yuri I; Sorokin, Alexander V; Mirkin, Boris G; Koonin, Eugene V

    2003-09-02

    Sequencing of eukaryotic genomes allows one to address major evolutionary problems, such as the evolution of gene structure. We compared the intron positions in 684 orthologous gene sets from 8 complete genomes of animals, plants, fungi, and protists and constructed parsimonious scenarios of evolution of the exon-intron structure for the respective genes. Approximately one-third of the introns in the malaria parasite Plasmodium falciparum are shared with at least one crown group eukaryote; this number indicates that these introns have been conserved through >1.5 billion years of evolution that separate Plasmodium from the crown group. Paradoxically, humans share many more introns with the plant Arabidopsis thaliana than with the fly or nematode. The inferred evolutionary scenario holds that the common ancestor of Plasmodium and the crown group and, especially, the common ancestor of animals, plants, and fungi had numerous introns. Most of these ancestral introns, which are retained in the genomes of vertebrates and plants, have been lost in fungi, nematodes, arthropods, and probably Plasmodium. In addition, numerous introns have been inserted into vertebrate and plant genes, whereas, in other lineages, intron gain was much less prominent.

  1. Lateral gene transfer between prokaryotes and multicellular eukaryotes: ongoing and significant?

    NARCIS (Netherlands)

    Ros, V.I.D.; Hurst, G.D.D.

    2009-01-01

    The expansion of genome sequencing projects has produced accumulating evidence for lateral transfer of genes between prokaryotic and eukaryotic genomes. However, it remains controversial whether these genes are of functional importance in their recipient host. Nikoh and Nakabachi, in a recent paper

  2. Ultrastructural diversity between centrioles of eukaryotes.

    Science.gov (United States)

    Gupta, Akshari; Kitagawa, Daiju

    2018-02-16

    Several decades of centriole research have revealed the beautiful symmetry present in these microtubule-based organelles, which are required to form centrosomes, cilia, and flagella in many eukaryotes. Centriole architecture is largely conserved across most organisms, however, individual centriolar features such as the central cartwheel or microtubule walls exhibit considerable variability when examined with finer resolution. Here, we review the ultrastructural characteristics of centrioles in commonly studied organisms, highlighting the subtle and not-so-subtle differences between specific structural components of these centrioles. Additionally, we survey some non-canonical centriole structures that have been discovered in various species, from the coaxial bicentrioles of protists and lower land plants to the giant irregular centrioles of the fungus gnat Sciara. Finally, we speculate on the functional significance of these differences between centrioles, and the contribution of individual structural elements such as the cartwheel or microtubules towards the stability of centrioles.Centriole structure, cartwheel, triplet microtubules, SAS-6, centrosome.

  3. Structural genomic variation as risk factor for idiopathic recurrent miscarriage

    DEFF Research Database (Denmark)

    Nagirnaja, Liina; Palta, Priit; Kasak, Laura

    2014-01-01

    Recurrent miscarriage (RM) is a multifactorial disorder with acknowledged genetic heritability that affects ∼3% of couples aiming at childbirth. As copy number variants (CNVs) have been shown to contribute to reproductive disease susceptibility, we aimed to describe genome-wide profile of CNVs an...

  4. Structural modelling and phylogenetic analyses of PgeIF4A2 (Eukaryotic translation initiation factor) from Pennisetum glaucum reveal signature motifs with a role in stress tolerance and development.

    Science.gov (United States)

    Agarwal, Aakrati; Mudgil, Yashwanti; Pandey, Saurabh; Fartyal, Dhirendra; Reddy, Malireddy K

    2016-01-01

    Eukaryotic translation initiation factor 4A (eIF4A) is an indispensable component of the translation machinery and also play a role in developmental processes and stress alleviation in plants and animals. Different eIF4A isoforms are present in the cytosol of the cell, namely, eIF4A1, eIF4A2, and eIF4A3 and their expression is tightly regulated in cap-dependent translation. We revealed the structural model of PgeIF4A2 protein using the crystal structure of Homo sapiens eIF4A3 (PDB ID: 2J0S) as template by Modeller 9.12. The resultant PgeIF4A2 model structure was refined by PROCHECK, ProSA, Verify3D and RMSD that showed the model structure is reliable with 77 % amino acid sequence identity with template. Investigation revealed two conserved signatures for ATP-dependent RNA Helicase DEAD-box conserved site (VLDEADEML) and RNA helicase DEAD-box type, Q-motif in sheet-turn-helix and α-helical region respectively. All these conserved motifs are responsible for response during developmental stages and stress tolerance in plants.

  5. Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution.

    Science.gov (United States)

    Yap, Jia-Yee S; Rohner, Thore; Greenfield, Abigail; Van Der Merwe, Marlien; McPherson, Hannah; Glenn, Wendy; Kornfeld, Geoff; Marendy, Elessa; Pan, Annie Y H; Wilton, Alan; Wilkins, Marc R; Rossetto, Maurizio; Delaney, Sven K

    2015-01-01

    The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine.

  6. A genomic overview of the population structure of Salmonella.

    Directory of Open Access Journals (Sweden)

    Nabil-Fareed Alikhan

    2018-04-01

    Full Text Available For many decades, Salmonella enterica has been subdivided by serological properties into serovars or further subdivided for epidemiological tracing by a variety of diagnostic tests with higher resolution. Recently, it has been proposed that so-called eBurst groups (eBGs based on the alleles of seven housekeeping genes (legacy multilocus sequence typing [MLST] corresponded to natural populations and could replace serotyping. However, this approach lacks the resolution needed for epidemiological tracing and the existence of natural populations had not been independently validated by independent criteria. Here, we describe EnteroBase, a web-based platform that assembles draft genomes from Illumina short reads in the public domain or that are uploaded by users. EnteroBase implements legacy MLST as well as ribosomal gene MLST (rMLST, core genome MLST (cgMLST, and whole genome MLST (wgMLST and currently contains over 100,000 assembled genomes from Salmonella. It also provides graphical tools for visual interrogation of these genotypes and those based on core single nucleotide polymorphisms (SNPs. eBGs based on legacy MLST are largely consistent with eBGs based on rMLST, thus demonstrating that these correspond to natural populations. rMLST also facilitated the selection of representative genotypes for SNP analyses of the entire breadth of diversity within Salmonella. In contrast, cgMLST provides the resolution needed for epidemiological investigations. These observations show that genomic genotyping, with the assistance of EnteroBase, can be applied at all levels of diversity within the Salmonella genus.

  7. A genomic overview of the population structure of Salmonella.

    Science.gov (United States)

    Alikhan, Nabil-Fareed; Zhou, Zhemin; Sergeant, Martin J; Achtman, Mark

    2018-04-01

    For many decades, Salmonella enterica has been subdivided by serological properties into serovars or further subdivided for epidemiological tracing by a variety of diagnostic tests with higher resolution. Recently, it has been proposed that so-called eBurst groups (eBGs) based on the alleles of seven housekeeping genes (legacy multilocus sequence typing [MLST]) corresponded to natural populations and could replace serotyping. However, this approach lacks the resolution needed for epidemiological tracing and the existence of natural populations had not been independently validated by independent criteria. Here, we describe EnteroBase, a web-based platform that assembles draft genomes from Illumina short reads in the public domain or that are uploaded by users. EnteroBase implements legacy MLST as well as ribosomal gene MLST (rMLST), core genome MLST (cgMLST), and whole genome MLST (wgMLST) and currently contains over 100,000 assembled genomes from Salmonella. It also provides graphical tools for visual interrogation of these genotypes and those based on core single nucleotide polymorphisms (SNPs). eBGs based on legacy MLST are largely consistent with eBGs based on rMLST, thus demonstrating that these correspond to natural populations. rMLST also facilitated the selection of representative genotypes for SNP analyses of the entire breadth of diversity within Salmonella. In contrast, cgMLST provides the resolution needed for epidemiological investigations. These observations show that genomic genotyping, with the assistance of EnteroBase, can be applied at all levels of diversity within the Salmonella genus.

  8. Specificity and evolvability in eukaryotic protein interaction networks.

    Directory of Open Access Journals (Sweden)

    Pedro Beltrao

    2007-02-01

    Full Text Available Progress in uncovering the protein interaction networks of several species has led to questions of what underlying principles might govern their organization. Few studies have tried to determine the impact of protein interaction network evolution on the observed physiological differences between species. Using comparative genomics and structural information, we show here that eukaryotic species have rewired their interactomes at a fast rate of approximately 10(-5 interactions changed per protein pair, per million years of divergence. For Homo sapiens this corresponds to 10(3 interactions changed per million years. Additionally we find that the specificity of binding strongly determines the interaction turnover and that different biological processes show significantly different link dynamics. In particular, human proteins involved in immune response, transport, and establishment of localization show signs of positive selection for change of interactions. Our analysis suggests that a small degree of molecular divergence can give rise to important changes at the network level. We propose that the power law distribution observed in protein interaction networks could be partly explained by the cell's requirement for different degrees of protein binding specificity.

  9. Gonococcal attachment to eukaryotic cells

    International Nuclear Information System (INIS)

    James, J.F.; Lammel, C.J.; Draper, D.L.; Brown, D.A.; Sweet, R.L.; Brooks, G.F.

    1983-01-01

    The attachment of Neisseria gonorrhoeae to eukaryotic cells grown in tissue culture was analyzed by use of light and electron microscopy and by labeling of the bacteria with [ 3 H]- and [ 14 C]adenine. Isogenic piliated and nonpiliated N. gonorrhoeae from opaque and transparent colonies were studied. The results of light microscopy studies showed that the gonococci attached to cells of human origin, including Flow 2000, HeLa 229, and HEp 2. Studies using radiolabeled gonococci gave comparable results. Piliated N. gonorrhoeae usually attached in larger numbers than nonpiliated organisms, and those from opaque colonies attached more often than isogenic variants from transparent colonies. Day-to-day variation in rate of attachment was observed. Scanning electron microscopy studies showed the gonococcal attachment to be specific for microvilli of the host cells. It is concluded that more N. gonorrhoeae from opaque colonies, as compared with isogenic variants from transparent colonies, attach to eukaryotic cells grown in tissue culture

  10. On the Archaeal Origins of Eukaryotes and the Challenges of Inferring Phenotype from Genotype.

    Science.gov (United States)

    Dey, Gautam; Thattai, Mukund; Baum, Buzz

    2016-07-01

    If eukaryotes arose through a merger between archaea and bacteria, what did the first true eukaryotic cell look like? A major step toward an answer came with the discovery of Lokiarchaeum, an archaeon whose genome encodes small GTPases related to those used by eukaryotes to regulate membrane traffic. Although 'Loki' cells have yet to be seen, their existence has prompted the suggestion that the archaeal ancestor of eukaryotes engulfed the future mitochondrion by phagocytosis. We propose instead that the archaeal ancestor was a relatively simple cell, and that eukaryotic cellular organization arose as the result of a gradual transfer of bacterial genes and membranes driven by an ever-closer symbiotic partnership between a bacterium and an archaeon. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.

  11. Phylogenetic analysis of P5 P-type ATPases, a eukaryotic lineage of secretory pathway pumps

    DEFF Research Database (Denmark)

    Møller, Annette; Asp, Torben; Holm, Preben Bach

    2008-01-01

    prokaryotic genome. Based on a protein alignment we could group the P5 ATPases into two subfamilies, P5A and P5B that, based on the number of negative charges in conserved trans-membrane segment 4, are likely to have different ion specificities. P5A ATPases are present in all eukaryotic genomes sequenced so......Eukaryotes encompass a remarkable variety of organisms and unresolved lineages. Different phylogenetic analyses have lead to conflicting conclusions as to the origin and associations between lineages and species. In this work, we investigated evolutionary relationship of a family of cation pumps...... exclusive for the secretory pathway of eukaryotes by combining the identification of lineage-specific genes with phylogenetic evolution of common genes. Sequences of P5 ATPases, which are regarded to be cation pumps in the endoplasmic reticulum (ER), were identified in all eukaryotic lineages but not in any...

  12. The genomic structure of the DMBT1 gene

    DEFF Research Database (Denmark)

    Mollenhauer, J; Holmskov, U; Wiemann, S

    1999-01-01

    Increasing evidence has accumulated for an involvement of the inactivation of tumour suppressor genes at chromosome 10q in the carcinogenesis of brain tumours, melanomas, and carcinomas of the lung, the prostate, the pancreas, and the endometrium. The gene DMBT1 (Deleted in Malignant Brain Tumours...... 1) is located at chromosome 10q25.3-q26.1, within one of the putative intervals for tumour suppressor genes. DMBT1 is a member of the scavenger-receptor cysteine-rich (SRCR) superfamily and displays homozygous deletions or lack of expression in glioblastoma multiforme, medulloblastoma......, and in gastrointestinal and lung cancers. Based on these properties, DMBT1 has been proposed to be a candidate tumour suppressor gene. We have determined the genomic sequence of DMBT1 to allow analyses of mutations. The gene has at least 54 exons that span a genomic region of about 80 kb. We have identified a putative...

  13. Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome.

    Science.gov (United States)

    Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne

    2015-02-10

    Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.

  14. RNA 3D modules in genome-wide predictions of RNA 2D structure

    DEFF Research Database (Denmark)

    Theis, Corinna; Zirbel, Craig L; Zu Siederdissen, Christian Höner

    2015-01-01

    . These modules can, for example, occur inside structural elements which in RNA 2D predictions appear as internal loops. Hence one question is if the use of such RNA 3D information can improve the prediction accuracy of RNA secondary structure at a genome-wide level. Here, we use RNAz in combination with 3D......Recent experimental and computational progress has revealed a large potential for RNA structure in the genome. This has been driven by computational strategies that exploit multiple genomes of related organisms to identify common sequences and secondary structures. However, these computational...... approaches have two main challenges: they are computationally expensive and they have a relatively high false discovery rate (FDR). Simultaneously, RNA 3D structure analysis has revealed modules composed of non-canonical base pairs which occur in non-homologous positions, apparently by independent evolution...

  15. The SGC beyond structural genomics: redefining the role of 3D structures by coupling genomic stratification with fragment-based discovery.

    Science.gov (United States)

    Bradley, Anthony R; Echalier, Aude; Fairhead, Michael; Strain-Damerell, Claire; Brennan, Paul; Bullock, Alex N; Burgess-Brown, Nicola A; Carpenter, Elisabeth P; Gileadi, Opher; Marsden, Brian D; Lee, Wen Hwa; Yue, Wyatt; Bountra, Chas; von Delft, Frank

    2017-11-08

    The ongoing explosion in genomics data has long since outpaced the capacity of conventional biochemical methodology to verify the large number of hypotheses that emerge from the analysis of such data. In contrast, it is still a gold-standard for early phenotypic validation towards small-molecule drug discovery to use probe molecules (or tool compounds), notwithstanding the difficulty and cost of generating them. Rational structure-based approaches to ligand discovery have long promised the efficiencies needed to close this divergence; in practice, however, this promise remains largely unfulfilled, for a host of well-rehearsed reasons and despite the huge technical advances spearheaded by the structural genomics initiatives of the noughties. Therefore the current, fourth funding phase of the Structural Genomics Consortium (SGC), building on its extensive experience in structural biology of novel targets and design of protein inhibitors, seeks to redefine what it means to do structural biology for drug discovery. We developed the concept of a Target Enabling Package (TEP) that provides, through reagents, assays and data, the missing link between genetic disease linkage and the development of usefully potent compounds. There are multiple prongs to the ambition: rigorously assessing targets' genetic disease linkages through crowdsourcing to a network of collaborating experts; establishing a systematic approach to generate the protocols and data that comprise each target's TEP; developing new, X-ray-based fragment technologies for generating high quality chemical matter quickly and cheaply; and exploiting a stringently open access model to build multidisciplinary partnerships throughout academia and industry. By learning how to scale these approaches, the SGC aims to make structures finally serve genomics, as originally intended, and demonstrate how 3D structures systematically allow new modes of druggability to be discovered for whole classes of targets. © 2017 The

  16. Patterns of intron gain and conservation in eukaryotic genes

    Directory of Open Access Journals (Sweden)

    Wolf Yuri I

    2007-10-01

    Full Text Available Abstract Background: The presence of introns in protein-coding genes is a universal feature of eukaryotic genome organization, and the genes of multicellular eukaryotes, typically, contain multiple introns, a substantial fraction of which share position in distant taxa, such as plants and animals. Depending on the methods and data sets used, researchers have reached opposite conclusions on the causes of the high fraction of shared introns in orthologous genes from distant eukaryotes. Some studies conclude that shared intron positions reflect, almost entirely, a remarkable evolutionary conservation, whereas others attribute it to parallel gain of introns. To resolve these contradictions, it is crucial to analyze the evolution of introns by using a model that minimally relies on arbitrary assumptions. Results: We developed a probabilistic model of evolution that allows for variability of intron gain and loss rates over branches of the phylogenetic tree, individual genes, and individual sites. Applying this model to an extended set of conserved eukaryotic genes, we find that parallel gain, on average, accounts for only ~8% of the shared intron positions. However, the distribution of parallel gains over the phylogenetic tree of eukaryotes is highly non-uniform. There are, practically, no parallel gains in closely related lineages, whereas for distant lineages, such as animals and plants, parallel gains appear to contribute up to 20% of the shared intron positions. In accord with these findings, we estimated that ancestral introns have a high probability to be retained in extant genomes, and conversely, that a substantial fraction of extant introns have retained their positions since the early stages of eukaryotic evolution. In addition, the density of sites that are available for intron insertion is estimated to be, approximately, one in seven basepairs. Conclusion: We obtained robust estimates of the contribution of parallel gain to the observed

  17. Development of a radiation track structure clustering algorithm for the prediction of DNA DSB yields and radiation induced cell death in Eukaryotic cells.

    Science.gov (United States)

    Douglass, Michael; Bezak, Eva; Penfold, Scott

    2015-04-21

    The preliminary framework of a combined radiobiological model is developed and calibrated in the current work. The model simulates the production of individual cells forming a tumour, the spatial distribution of individual ionization events (using Geant4-DNA) and the stochastic biochemical repair of DNA double strand breaks (DSBs) leading to the prediction of survival or death of individual cells. In the current work, we expand upon a previously developed tumour generation and irradiation model to include a stochastic ionization damage clustering and DNA lesion repair model. The Geant4 code enabled the positions of each ionization event in the cells to be simulated and recorded for analysis. An algorithm was developed to cluster the ionization events in each cell into simple and complex double strand breaks. The two lesion kinetic (TLK) model was then adapted to predict DSB repair kinetics and the resultant cell survival curve. The parameters in the cell survival model were then calibrated using experimental cell survival data of V79 cells after low energy proton irradiation. A monolayer of V79 cells was simulated using the tumour generation code developed previously. The cells were then irradiated by protons with mean energies of 0.76 MeV and 1.9 MeV using a customized version of Geant4. By replicating the experimental parameters of a low energy proton irradiation experiment and calibrating the model with two sets of data, the model is now capable of predicting V79 cell survival after low energy (cell survival probability, the cell survival probability is calculated for each cell in the geometric tumour model developed in the current work. This model uses fundamental measurable microscopic quantities such as genome length rather than macroscopic radiobiological quantities such as alpha/beta ratios. This means that the model can be theoretically used under a wide range of conditions with a single set of input parameters once calibrated for a given cell line.

  18. Structure and genome organization of AFV2, a novel archaeal lipothrixvirus with unusual terminal and core structures

    DEFF Research Database (Denmark)

    Häring, Monika; Vestergaard, Gisle Alberg; Brügger, Kim

    2005-01-01

    A novel filamentous virus, AFV2, from the hyperthermophilic archaeal genus Acidianus shows structural similarity to lipothrixviruses but differs from them in its unusual terminal and core structures. The double-stranded DNA genome contains 31,787 bp and carries eight open reading frames homologous...

  19. The genome and structural proteome of an ocean siphovirus: a new window into the cyanobacterial 'mobilome'.

    Science.gov (United States)

    Sullivan, Matthew B; Krastins, Bryan; Hughes, Jennifer L; Kelly, Libusha; Chase, Michael; Sarracino, David; Chisholm, Sallie W

    2009-11-01

    Prochlorococcus, an abundant phototroph in the oceans, are infected by members of three families of viruses: myo-, podo- and siphoviruses. Genomes of myo- and podoviruses isolated on Prochlorococcus contain DNA replication machinery and virion structural genes homologous to those from coliphages T4 and T7 respectively. They also contain a suite of genes of cyanobacterial origin, most notably photosynthesis genes, which are expressed during infection and appear integral to the evolutionary trajectory of both host and phage. Here we present the first genome of a cyanobacterial siphovirus, P-SS2, which was isolated from Atlantic slope waters using a Prochlorococcus host (MIT9313). The P-SS2 genome is larger than, and considerably divergent from, previously sequenced siphoviruses. It appears most closely related to lambdoid siphoviruses, with which it shares 13 functional homologues. The approximately 108 kb P-SS2 genome encodes 131 predicted proteins and notably lacks photosynthesis genes which have consistently been found in other marine cyanophage, but does contain 14 other cyanobacterial homologues. While only six structural proteins were identified from the genome sequence, 35 proteins were detected experimentally; these mapped onto capsid and tail structural modules in the genome. P-SS2 is potentially capable of integration into its host as inferred from bioinformatically identified genetic machinery int, bet, exo and a 53 bp attachment site. The host attachment site appears to be a genomic island that is tied to insertion sequence (IS) activity that could facilitate mobility of a gene involved in the nitrogen-stress response. The homologous region and a secondary IS-element hot-spot in Synechococcus RS9917 are further evidence of IS-mediated genome evolution coincident with a probable relic prophage integration event. This siphovirus genome provides a glimpse into the biology of a deep-photic zone phage as well as the ocean cyanobacterial prophage and IS element

  20. G2S: A web-service for annotating genomic variants on 3D protein structures.

    Science.gov (United States)

    Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

    2018-01-27

    Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that support programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. G2S API uses REST-inspired design conception and it can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source codes are freely available at https://g2s.genomenexus.org. g2s@genomenexus.org. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  1. Phylogenetic analysis of ferlin genes reveals ancient eukaryotic origins

    Directory of Open Access Journals (Sweden)

    Lek Monkol

    2010-07-01

    Full Text Available Abstract Background The ferlin gene family possesses a rare and identifying feature consisting of multiple tandem C2 domains and a C-terminal transmembrane domain. Much currently remains unknown about the fundamental function of this gene family, however, mutations in its two most well-characterised members, dysferlin and otoferlin, have been implicated in human disease. The availability of genome sequences from a wide range of species makes it possible to explore the evolution of the ferlin family, providing contextual insight into characteristic features that define the ferlin gene family in its present form in humans. Results Ferlin genes were detected from all species of representative phyla, with two ferlin subgroups partitioned within the ferlin phylogenetic tree based on the presence or absence of a DysF domain. Invertebrates generally possessed two ferlin genes (one with DysF and one without, with six ferlin genes in most vertebrates (three DysF, three non-DysF. Expansion of the ferlin gene family is evident between the divergence of lamprey (jawless vertebrates and shark (cartilaginous fish. Common to almost all ferlins is an N-terminal C2-FerI-C2 sandwich, a FerB motif, and two C-terminal C2 domains (C2E and C2F adjacent to the transmembrane domain. Preservation of these structural elements throughout eukaryotic evolution suggests a fundamental role of these motifs for ferlin function. In contrast, DysF, C2DE, and FerA are optional, giving rise to subtle differences in domain topologies of ferlin genes. Despite conservation of multiple C2 domains in all ferlins, the C-terminal C2 domains (C2E and C2F displayed higher sequence conservation and greater conservation of putative calcium binding residues across paralogs and orthologs. Interestingly, the two most studied non-mammalian ferlins (Fer-1 and Misfire in model organisms C. elegans and D. melanogaster, present as outgroups in the phylogenetic analysis, with results suggesting

  2. Three-dimensional Structure of a Viral Genome-delivery Portal Vertex

    Energy Technology Data Exchange (ETDEWEB)

    A Olia; P Prevelige Jr.; J Johnson; G Cingolani

    2011-12-31

    DNA viruses such as bacteriophages and herpesviruses deliver their genome into and out of the capsid through large proteinaceous assemblies, known as portal proteins. Here, we report two snapshots of the dodecameric portal protein of bacteriophage P22. The 3.25-{angstrom}-resolution structure of the portal-protein core bound to 12 copies of gene product 4 (gp4) reveals a {approx}1.1-MDa assembly formed by 24 proteins. Unexpectedly, a lower-resolution structure of the full-length portal protein unveils the unique topology of the C-terminal domain, which forms a {approx}200-{angstrom}-long {alpha}-helical barrel. This domain inserts deeply into the virion and is highly conserved in the Podoviridae family. We propose that the barrel domain facilitates genome spooling onto the interior surface of the capsid during genome packaging and, in analogy to a rifle barrel, increases the accuracy of genome ejection into the host cell.

  3. Mapping the pericentric heterochromatin by comparative genomic hybridization analysis and chromosome deletions in Drosophila melanogaster

    OpenAIRE

    He, Bing; Caudy, Amy; Parsons, Lance; Rosebrock, Adam; Pane, Attilio; Raj, Sandeep; Wieschaus, Eric

    2012-01-01

    Heterochromatin represents a significant portion of eukaryotic genomes and has essential structural and regulatory functions. Its molecular organization is largely unknown due to difficulties in sequencing through and assembling repetitive sequences enriched in the heterochromatin. Here we developed a novel strategy using chromosomal rearrangements and embryonic phenotypes to position unmapped Drosophila melanogaster heterochromatic sequence to specific chromosomal regions. By excluding seque...

  4. Structured RNAs in the ENCODE selected regions of the human genome

    DEFF Research Database (Denmark)

    Washietl, Stefan; Pedersen, Jakob Skou; Korbel, Jan O

    2007-01-01

    Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack...... with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70% many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz...

  5. Terminal structures of West Nile virus genomic RNA and their interactions with viral NS5 protein

    International Nuclear Information System (INIS)

    Dong Hongping; Zhang Bo; Shi Peiyong

    2008-01-01

    Genome cyclization is essential for flavivirus replication. We used RNases to probe the structures formed by the 5'-terminal 190 nucleotides and the 3'-terminal 111 nucleotides of the West Nile virus (WNV) genomic RNA. When analyzed individually, the two RNAs adopt stem-loop structures as predicted by the thermodynamic-folding program. However, when mixed together, the two RNAs form a duplex that is mediated through base-pairings of two sets of RNA elements (5'CS/3'CSI and 5'UAR/3'UAR). Formation of the RNA duplex facilitates a conformational change that leaves the 3'-terminal nucleotides of the genome (position - 8 to - 16) to be single-stranded. Viral NS5 binds specifically to the 5'-terminal stem-loop (SL1) of the genomic RNA. The 5'SL1 RNA structure is essential for WNV replication. The study has provided further evidence to suggest that flavivirus genome cyclization and NS5/5'SL1 RNA interaction facilitate NS5 binding to the 3' end of the genome for the initiation of viral minus-strand RNA synthesis

  6. Genome-wide identification of structural variants in genes encoding drug targets

    DEFF Research Database (Denmark)

    Rasmussen, Henrik Berg; Dahmcke, Christina Mackeprang

    2012-01-01

    The objective of the present study was to identify structural variants of drug target-encoding genes on a genome-wide scale. We also aimed at identifying drugs that are potentially amenable for individualization of treatments based on knowledge about structural variation in the genes encoding...

  7. Phylogenetic distribution of large-scale genome patchiness

    Directory of Open Access Journals (Sweden)

    Hackenberg Michael

    2008-04-01

    Full Text Available Abstract Background The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences, and the recently developed computational methods to directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. Results The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments of selected genome regions, computationally-identified, isochore-like regions were identified as the biological source for the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris, birds (Gallus gallus, fishes (Danio rerio, invertebrates (Drosophila melanogaster and Caenorhabditis elegans, plants (Arabidopsis thaliana and yeasts (Saccharomyces cerevisiae. We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. Conclusion Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.

  8. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

    Directory of Open Access Journals (Sweden)

    Jiang Du

    2009-07-01

    Full Text Available The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbelGen, with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs. SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome. To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of

  9. Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

    Science.gov (United States)

    Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

    2013-01-01

    Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).

  10. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features.

    Science.gov (United States)

    Ding, Yiliang; Tang, Yin; Kwok, Chun Kit; Zhang, Yu; Bevilacqua, Philip C; Assmann, Sarah M

    2014-01-30

    RNA structure has critical roles in processes ranging from ligand sensing to the regulation of translation, polyadenylation and splicing. However, a lack of genome-wide in vivo RNA structural data has limited our understanding of how RNA structure regulates gene expression in living cells. Here we present a high-throughput, genome-wide in vivo RNA structure probing method, structure-seq, in which dimethyl sulphate methylation of unprotected adenines and cytosines is identified by next-generation sequencing. Application of this method to Arabidopsis thaliana seedlings yielded the first in vivo genome-wide RNA structure map at nucleotide resolution for any organism, with quantitative structural information across more than 10,000 transcripts. Our analysis reveals a three-nucleotide periodic repeat pattern in the structure of coding regions, as well as a less-structured region immediately upstream of the start codon, and shows that these features are strongly correlated with translation efficiency. We also find patterns of strong and weak secondary structure at sites of alternative polyadenylation, as well as strong secondary structure at 5' splice sites that correlates with unspliced events. Notably, in vivo structures of messenger RNAs annotated for stress responses are poorly predicted in silico, whereas mRNA structures of genes related to cell function maintenance are well predicted. Global comparison of several structural features between these two categories shows that the mRNAs associated with stress responses tend to have more single-strandedness, longer maximal loop length and higher free energy per nucleotide, features that may allow these RNAs to undergo conformational changes in response to environmental conditions. Structure-seq allows the RNA structurome and its biological roles to be interrogated on a genome-wide scale and should be applicable to any organism.

  11. Evolution of DNA replication protein complexes in eukaryotes and Archaea.

    Directory of Open Access Journals (Sweden)

    Nicholas Chia

    Full Text Available BACKGROUND: The replication of DNA in Archaea and eukaryotes requires several ancillary complexes, including proliferating cell nuclear antigen (PCNA, replication factor C (RFC, and the minichromosome maintenance (MCM complex. Bacterial DNA replication utilizes comparable proteins, but these are distantly related phylogenetically to their archaeal and eukaryotic counterparts at best. METHODOLOGY/PRINCIPAL FINDINGS: While the structures of each of the complexes do not differ significantly between the archaeal and eukaryotic versions thereof, the evolutionary dynamic in the two cases does. The number of subunits in each complex is constant across all taxa. However, they vary subtly with regard to composition. In some taxa the subunits are all identical in sequence, while in others some are homologous rather than identical. In the case of eukaryotes, there is no phylogenetic variation in the makeup of each complex-all appear to derive from a common eukaryotic ancestor. This is not the case in Archaea, where the relationship between the subunits within each complex varies taxon-to-taxon. We have performed a detailed phylogenetic analysis of these relationships in order to better understand the gene duplications and divergences that gave rise to the homologous subunits in Archaea. CONCLUSION/SIGNIFICANCE: This domain level difference in evolution suggests that different forces have driven the evolution of DNA replication proteins in each of these two domains. In addition, the phylogenies of all three gene families support the distinctiveness of the proposed archaeal phylum Thaumarchaeota.

  12. From structure prediction to genomic screens for novel non-coding RNAs.

    Directory of Open Access Journals (Sweden)

    Jan Gorodkin

    2011-08-01

    Full Text Available Non-coding RNAs (ncRNAs are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs. A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.

  13. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    Energy Technology Data Exchange (ETDEWEB)

    Condon, Bradford J.; Leng, Yueqiang; Wu, Dongliang; Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinlzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

    2013-01-24

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25 higher than those between inbred lines and 50 lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveal a strong correlation with a role in virulence.

  14. Genomic structural variation contributes to phenotypic change of industrial bioethanol yeast Saccharomyces cerevisiae.

    Science.gov (United States)

    Zhang, Ke; Zhang, Li-Jie; Fang, Ya-Hong; Jin, Xin-Na; Qi, Lei; Wu, Xue-Chang; Zheng, Dao-Qiong

    2016-03-01

    Genomic structural variation (GSV) is a ubiquitous phenomenon observed in the genomes of Saccharomyces cerevisiae strains with different genetic backgrounds; however, the physiological and phenotypic effects of GSV are not well understood. Here, we first revealed the genetic characteristics of a widely used industrial S. cerevisiae strain, ZTW1, by whole genome sequencing. ZTW1 was identified as an aneuploidy strain and a large-scale GSV was observed in the ZTW1 genome compared with the genome of a diploid strain YJS329. These GSV events led to copy number variations (CNVs) in many chromosomal segments as well as one whole chromosome in the ZTW1 genome. Changes in the DNA dosage of certain functional genes directly affected their expression levels and the resultant ZTW1 phenotypes. Moreover, CNVs of large chromosomal regions triggered an aneuploidy stress in ZTW1. This stress decreased the proliferation ability and tolerance of ZTW1 to various stresses, while aneuploidy response stress may also provide some benefits to the fermentation performance of the yeast, including increased fermentation rates and decreased byproduct generation. This work reveals genomic characters of the bioethanol S. cerevisiae strain ZTW1 and suggests that GSV is an important kind of mutation that changes the traits of industrial S. cerevisiae strains. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. Regulated eukaryotic DNA replication origin firing with purified proteins.

    Science.gov (United States)

    Yeeles, Joseph T P; Deegan, Tom D; Janska, Agnieszka; Early, Anne; Diffley, John F X

    2015-03-26

    Eukaryotic cells initiate DNA replication from multiple origins, which must be tightly regulated to promote precise genome duplication in every cell cycle. To accomplish this, initiation is partitioned into two temporally discrete steps: a double hexameric minichromosome maintenance (MCM) complex is first loaded at replication origins during G1 phase, and then converted to the active CMG (Cdc45-MCM-GINS) helicase during S phase. Here we describe the reconstitution of budding yeast DNA replication initiation with 16 purified replication factors, made from 42 polypeptides. Origin-dependent initiation recapitulates regulation seen in vivo. Cyclin-dependent kinase (CDK) inhibits MCM loading by phosphorylating the origin recognition complex (ORC) and promotes CMG formation by phosphorylating Sld2 and Sld3. Dbf4-dependent kinase (DDK) promotes replication by phosphorylating MCM, and can act either before or after CDK. These experiments define the minimum complement of proteins, protein kinase substrates and co-factors required for regulated eukaryotic DNA replication.

  16. Exploring the role of genome and structural ions in preventing viral capsid collapse during dehydration

    Science.gov (United States)

    Martín-González, Natalia; Guérin Darvas, Sofía M.; Durana, Aritz; Marti, Gerardo A.; Guérin, Diego M. A.; de Pablo, Pedro J.

    2018-03-01

    Even though viruses evolve mainly in liquid milieu, their horizontal transmission routes often include episodes of dry environment. Along their life cycle, some insect viruses, such as viruses from the Dicistroviridae family, withstand dehydrated conditions with presently unknown consequences to their structural stability. Here, we use atomic force microscopy to monitor the structural changes of viral particles of Triatoma virus (TrV) after desiccation. Our results demonstrate that TrV capsids preserve their genome inside, conserving their height after exposure to dehydrating conditions, which is in stark contrast with other viruses that expel their genome when desiccated. Moreover, empty capsids (without genome) resulted in collapsed particles after desiccation. We also explored the role of structural ions in the dehydration process of the virions (capsid containing genome) by chelating the accessible cations from the external solvent milieu. We observed that ion suppression helps to keep the virus height upon desiccation. Our results show that under drying conditions, the genome of TrV prevents the capsid from collapsing during dehydration, while the structural ions are responsible for promoting solvent exchange through the virion wall.

  17. Horizontal transfer of a eukaryotic plastid-targeted protein gene to cyanobacteria

    Directory of Open Access Journals (Sweden)

    Keeling Patrick J

    2007-06-01

    Full Text Available Abstract Background Horizontal or lateral transfer of genetic material between distantly related prokaryotes has been shown to play a major role in the evolution of bacterial and archaeal genomes, but exchange of genes between prokaryotes and eukaryotes is not as well understood. In particular, gene flow from eukaryotes to prokaryotes is rarely documented with strong support, which is unusual since prokaryotic genomes appear to readily accept foreign genes. Results Here, we show that abundant marine cyanobacteria in the related genera Synechococcus and Prochlorococcus acquired a key Calvin cycle/glycolytic enzyme from a eukaryote. Two non-homologous forms of fructose bisphosphate aldolase (FBA are characteristic of eukaryotes and prokaryotes respectively. However, a eukaryotic gene has been inserted immediately upstream of the ancestral prokaryotic gene in several strains (ecotypes of Synechococcus and Prochlorococcus. In one lineage this new gene has replaced the ancestral gene altogether. The eukaryotic gene is most closely related to the plastid-targeted FBA from red algae. This eukaryotic-type FBA once replaced the plastid/cyanobacterial type in photosynthetic eukaryotes, hinting at a possible functional advantage in Calvin cycle reactions. The strains that now possess this eukaryotic FBA are scattered across the tree of Synechococcus and Prochlorococcus, perhaps because the gene has been transferred multiple times among cyanobacteria, or more likely because it has been selectively retained only in certain lineages. Conclusion A gene for plastid-targeted FBA has been transferred from red algae to cyanobacteria, where it has inserted itself beside its non-homologous, functional analogue. Its current distribution in Prochlorococcus and Synechococcus is punctate, suggesting a complex history since its introduction to this group.

  18. Insights into the genome structure and copy-number variation of Eimeria tenella

    Directory of Open Access Journals (Sweden)

    Lim Lik-Sin

    2012-08-01

    method to improve the assembly of the genome of E. tenella from shotgun data, and to help reveal its overall structure. A preliminary assessment of copy-number variation (extra or missing copies of genomic segments between strains of E. tenella was also carried out. The emerging picture is of a very unusual genome architecture displaying inter-strain copy-number variation. We suggest that these features may be related to the known ability of this parasite to rapidly develop drug resistance.

  19. The primary structures of ribosomal proteins S14 and S16 from the archaebacterium Halobacterium marismortui. Comparison with eubacterial and eukaryotic ribosomal proteins.

    Science.gov (United States)

    Kimura, J; Kimura, M

    1987-09-05

    The amino acid sequences of two ribosomal proteins, S14 and S16, from the archaebacterium Halobacterium marismortui have been determined. Sequence data were obtained by the manual and solid-phase sequencing of peptides derived from enzymatic digestions with trypsin, chymotrypsin, pepsin, and Staphylococcus aureus protease as well as by chemical cleavage with cyanogen bromide. Proteins S14 and S16 contain 109 and 126 amino acid residues and have Mr values of 11,964 and 13,515, respectively. Comparison of the sequences with those of ribosomal proteins from other organisms demonstrates that S14 has a significant homology with the rat liver ribosomal protein S11 (36% identity) as well as with the Escherichia coli ribosomal protein S17 (37%), and that S16 is related to the yeast ribosomal protein YS22 (40%) and proteins S8 from E. coli (28%) and Bacillus stearothermophilus (30%). A comparison of the amino acid residues in the homologous regions of halophilic and nonhalophilic ribosomal proteins reveals that halophilic proteins have more glutamic acids, asparatic acids, prolines, and alanines, and less lysines, arginines, and isoleucines than their nonhalophilic counterparts. These amino acid substitutions probably contribute to the structural stability of halophilic ribosomal proteins.

  20. Dynamic Architecture of Eukaryotic DNA Replication Forks In Vivo, Visualized by Electron Microscopy.

    Science.gov (United States)

    Zellweger, Ralph; Lopes, Massimo

    2018-01-01

    The DNA replication process can be heavily perturbed by several different conditions of genotoxic stress, particularly relevant for cancer onset and therapy. The combination of psoralen crosslinking and electron microscopy has proven instrumental to reveal the fine architecture of in vivo DNA replication intermediates and to uncover their remodeling upon specific conditions of genotoxic stress. The replication structures are stabilized in vivo (by psoralen crosslinking) prior to extraction and enrichment procedures, allowing their visualization at the transmission electron microscope. This chapter outlines the procedures required to visualize and interpret in vivo replication intermediates of eukaryotic genomic DNA, and includes an improved method for enrichment of replication intermediates, compared to previously used BND-cellulose columns.

  1. GeneViTo: Visualizing gene-product functional and structural features in genomic datasets

    Directory of Open Access Journals (Sweden)

    Promponas Vasilis J

    2003-10-01

    Full Text Available Abstract Background The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. Results GeneViTo is a JAVA-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to offer a supplement to the provided information, especially in cases of "poor" annotation, or an evaluation of available predictions. Moreover, desired information can be output in high quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. Conclusions GeneViTo offers an inspectional view of genomic functional elements, concerning data stemming both from database annotation and analysis tools for an overall analysis of existing genomes. The application is compatible with Linux or Windows ME-2000-XP operating

  2. GenRGenS: Software for Generating Random Genomic Sequences and Structures

    OpenAIRE

    Ponty , Yann; Termier , Michel; Denise , Alain

    2006-01-01

    International audience; GenRGenS is a software tool dedicated to randomly generating genomic sequences and structures. It handles several classes of models useful for sequence analysis, such as Markov chains, hidden Markov models, weighted context-free grammars, regular expressions and PROSITE expressions. GenRGenS is the only program that can handle weighted context-free grammars, thus allowing the user to model and to generate structured objects (such as RNA secondary structures) of any giv...

  3. SL1 revisited: functional analysis of the structure and conformation of HIV-1 genome RNA.

    Science.gov (United States)

    Sakuragi, Sayuri; Yokoyama, Masaru; Shioda, Tatsuo; Sato, Hironori; Sakuragi, Jun-Ichi

    2016-11-11

    The dimer initiation site/dimer linkage sequence (DIS/DLS) region of HIV is located on the 5' end of the viral genome and suggested to form complex secondary/tertiary structures. Within this structure, stem-loop 1 (SL1) is believed to be most important and an essential key to dimerization, since the sequence and predicted secondary structure of SL1 are highly stable and conserved among various virus subtypes. In particular, a six-base palindromic sequence is always present at the hairpin loop of SL1 and the formation of kissing-loop structure at this position between the two strands of genomic RNA is suggested to trigger dimerization. Although the higher-order structure model of SL1 is well accepted and perhaps even undoubted lately, there could be stillroom for consideration to depict the functional SL1 structure while in vivo (in virion or cell). In this study, we performed several analyses to identify the nucleotides and/or basepairing within SL1 which are necessary for HIV-1 genome dimerization, encapsidation, recombination and infectivity. We unexpectedly found that some nucleotides that are believed to contribute the formation of the stem do not impact dimerization or infectivity. On the other hand, we found that one G-C basepair involved in stem formation may serve as an alternative dimer interactive site. We also report on our further investigation of the roles of the palindromic sequences on viral replication. Collectively, we aim to assemble a more-comprehensive functional map of SL1 on the HIV-1 viral life cycle. We discovered several possibilities for a novel structure of SL1 in HIV-1 DLS. The newly proposed structure model suggested that the hairpin loop of SL1 appeared larger, and genome dimerization process might consist of more complicated mechanism than previously understood. Further investigations would be still required to fully understand the genome packaging and dimerization of HIV.

  4. Gene order data from a model amphibian (Ambystoma: new perspectives on vertebrate genome structure and evolution

    Directory of Open Access Journals (Sweden)

    Voss S Randal

    2006-08-01

    Full Text Available Abstract Background Because amphibians arise from a branch of the vertebrate evolutionary tree that is juxtaposed between fishes and amniotes, they provide important comparative perspective for reconstructing character changes that have occurred during vertebrate evolution. Here, we report the first comparative study of vertebrate genome structure that includes a representative amphibian. We used 491 transcribed sequences from a salamander (Ambystoma genetic map and whole genome assemblies for human, mouse, rat, dog, chicken, zebrafish, and the freshwater pufferfish Tetraodon nigroviridis to compare gene orders and rearrangement rates. Results Ambystoma has experienced a rate of genome rearrangement that is substantially lower than mammalian species but similar to that of chicken and fish. Overall, we found greater conservation of genome structure between Ambystoma and tetrapod vertebrates, nevertheless, 57% of Ambystoma-fish orthologs are found in conserved syntenies of four or more genes. Comparisons between Ambystoma and amniotes reveal extensive conservation of segmental homology for 57% of the presumptive Ambystoma-amniote orthologs. Conclusion Our analyses suggest relatively constant interchromosomal rearrangement rates from the euteleost ancestor to the origin of mammals and illustrate the utility of amphibian mapping data in establishing ancestral amniote and tetrapod gene orders. Comparisons between Ambystoma and amniotes reveal some of the key events that have structured the human genome since diversification of the ancestral amniote lineage.

  5. The genomic structure of the human UFO receptor.

    Science.gov (United States)

    Schulz, A S; Schleithoff, L; Faust, M; Bartram, C R; Janssen, J W

    1993-02-01

    Using a DNA transfection-tumorigenicity assay we have recently identified the UFO oncogene. It encodes a tyrosine kinase receptor characterized by the juxtaposition of two immunoglobulin-like and two fibronectin type III repeats in its extracellular domain. Here we describe the genomic organization of the human UFO locus. The UFO receptor is encoded by 20 exons that are distributed over a region of 44 kb. Different isoforms of UFO mRNA are generated by alternative splicing of exon 10 and differential usage of two imperfect polyadenylation sites resulting in the presence or absence of 1.5-kb 3' untranslated sequences. Primer extension and S1 nuclease analyses revealed multiple transcriptional initiation sites including a major site 169 bp upstream of the translation start site. The promoter region is GC rich, lacks TATA and CAAT boxes, but contains potential recognition sites for a variety of trans-acting factors, including Sp1, AP-2 and the cyclic AMP response element-binding protein. Proto-UFO and its oncogenic counterpart exhibit identical cDNA and promoter regions sequences. Possible modes of UFO activation are discussed.

  6. Protein structure similarity clustering (PSSC) and natural product structure as inspiration sources for drug development and chemical genomics

    NARCIS (Netherlands)

    Dekker, Frank J; Koch, Marcus A; Waldmann, Herbert; Dekker, Frans

    Finding small molecules that modulate protein function is of primary importance in drug development and in the emerging field of chemical genomics. To facilitate the identification of such molecules, we developed a novel strategy making use of structural conservatism found in protein domain

  7. Tree decomposition based fast search of RNA structures including pseudoknots in genomes.

    Science.gov (United States)

    Song, Yinglei; Liu, Chunmei; Malmberg, Russell; Pan, Fangfang; Cai, Liming

    2005-01-01

    Searching genomes for RNA secondary structure with computational methods has become an important approach to the annotation of non-coding RNAs. However, due to the lack of efficient algorithms for accurate RNA structure-sequence alignment, computer programs capable of fast and effectively searching genomes for RNA secondary structures have not been available. In this paper, a novel RNA structure profiling model is introduced based on the notion of a conformational graph to specify the consensus structure of an RNA family. Tree decomposition yields a small tree width t for such conformation graphs (e.g., t = 2 for stem loops and only a slight increase for pseudo-knots). Within this modelling framework, the optimal alignment of a sequence to the structure model corresponds to finding a maximum valued isomorphic subgraph and consequently can be accomplished through dynamic programming on the tree decomposition of the conformational graph in time O(k(t)N(2)), where k is a small parameter; and N is the size of the projiled RNA structure. Experiments show that the application of the alignment algorithm to search in genomes yields the same search accuracy as methods based on a Covariance model with a significant reduction in computation time. In particular; very accurate searches of tmRNAs in bacteria genomes and of telomerase RNAs in yeast genomes can be accomplished in days, as opposed to months required by other methods. The tree decomposition based searching tool is free upon request and can be downloaded at our site h t t p ://w.uga.edu/RNA-informatics/software/index.php.

  8. The Chlamydomonas Genome Reveals the Evolution of Key Animal and Plant Functions

    Energy Technology Data Exchange (ETDEWEB)

    Merchant, Sabeeha S

    2007-04-09

    Chlamydomonas reinhardtii is a unicellular green alga whose lineage diverged from land plants over 1 billion years ago. It is a model system for studying chloroplast-based photosynthesis, as well as the structure, assembly, and function of eukaryotic flagella (cilia), which were inherited from the common ancestor of plants and animals, but lost in land plants. We sequenced the 120-megabase nuclear genome of Chlamydomonas and performed comparative phylogenomic analyses, identifying genes encoding uncharacterized proteins that are likely associated with the function and biogenesis of chloroplasts or eukaryotic flagella. Analyses of the Chlamydomonas genome advance our understanding of the ancestral eukaryotic cell, reveal previously unknown genes associated with photosynthetic and flagellar functions, and establish links between ciliopathy and the composition and function of flagella.

  9. In silico ionomics segregates parasitic from free-living eukaryotes.

    Science.gov (United States)

    Greganova, Eva; Steinmann, Michael; Mäser, Pascal; Fankhauser, Niklaus

    2013-01-01

    Ion transporters are fundamental to life. Due to their ancient origin and conservation in sequence, ion transporters are also particularly well suited for comparative genomics of distantly related species. Here, we perform genome-wide ion transporter profiling as a basis for comparative genomics of eukaryotes. From a given predicted proteome, we identify all bona fide ion channels, ion porters, and ion pumps. Concentrating on unicellular eukaryotes (n = 37), we demonstrate that clustering of species according to their repertoire of ion transporters segregates obligate endoparasites (n = 23) on the one hand, from free-living species and facultative parasites (n = 14) on the other hand. This surprising finding indicates strong convergent evolution of the parasites regarding the acquisition and homeostasis of inorganic ions. Random forest classification identifies transporters of ammonia, plus transporters of iron and other transition metals, as the most informative for distinguishing the obligate parasites. Thus, in silico ionomics further underscores the importance of iron in infection biology and suggests access to host sources of nitrogen and transition metals to be selective forces in the evolution of parasitism. This finding is in agreement with the phenomenon of iron withholding as a primordial antimicrobial strategy of infected mammals.

  10. Genomic mid-range inhomogeneity correlates with an abundance of RNA secondary structures

    Directory of Open Access Journals (Sweden)

    Song Jun

    2008-06-01

    Full Text Available Abstract Background Genomes possess different levels of non-randomness, in particular, an inhomogeneity in their nucleotide composition. Inhomogeneity is manifest from the short-range where neighboring nucleotides influence the choice of base at a site, to the long-range, commonly known as isochores, where a particular base composition can span millions of nucleotides. A separate genomic issue that has yet to be thoroughly elucidated is the role that RNA secondary structure (SS plays in gene expression. Results We present novel data and approaches that show that a mid-range inhomogeneity (~30 to 1000 nt not only exists in mammalian genomes but is also significantly associated with strong RNA SS. A whole-genome bioinformatics investigation of local SS in a set of 11,315 non-redundant human pre-mRNA sequences has been carried out. Four distinct components of these molecules (5'-UTRs, exons, introns and 3'-UTRs were considered separately, since they differ in overall nucleotide composition, sequence motifs and periodicities. For each pre-mRNA component, the abundance of strong local SS ( Conclusion We demonstrate that the excess of strong local SS in pre-mRNAs is linked to the little explored phenomenon of genomic mid-range inhomogeneity (MRI. MRI is an interdependence between nucleotide choice and base composition over a distance of 20–1000 nt. Additionally, we have created a public computational resource to support further study of genomic MRI.

  11. Chloroplast genomes of Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea: Structures and comparative analysis.

    Science.gov (United States)

    Asaf, Sajjad; Khan, Abdul Latif; Khan, Muhammad Aaqil; Waqas, Muhammad; Kang, Sang-Mo; Yun, Byung-Wook; Lee, In-Jung

    2017-08-08

    We investigated the complete chloroplast (cp) genomes of non-model Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea using Illumina paired-end sequencing to understand their genetic organization and structure. Detailed bioinformatics analysis revealed genome sizes of both subspecies ranging between 154.4~154.5 kbp, with a large single-copy region (84,197~84,158 bp), a small single-copy region (17,738~17,813 bp) and pair of inverted repeats (IRa/IRb; 26,264~26,259 bp). Both cp genomes encode 130 genes, including 85 protein-coding genes, eight ribosomal RNA genes and 37 transfer RNA genes. Whole cp genome comparison of A. halleri ssp. gemmifera and A. lyrata ssp. petraea, along with ten other Arabidopsis species, showed an overall high degree of sequence similarity, with divergence among some intergenic spacers. The location and distribution of repeat sequences were determined, and sequence divergences of shared genes were calculated among related species. Comparative phylogenetic analysis of the entire genomic data set and 70 shared genes between both cp genomes confirmed the previous phylogeny and generated phylogenetic trees with the same topologies. The sister species of A. halleri ssp. gemmifera is A. umezawana, whereas the closest relative of A. lyrata spp. petraea is A. arenicola.

  12. Elucidating the triplicated ancestral genome structure of radish based on chromosome-level comparison with the Brassica genomes.

    Science.gov (United States)

    Jeong, Young-Min; Kim, Namshin; Ahn, Byung Ohg; Oh, Mijin; Chung, Won-Hyong; Chung, Hee; Jeong, Seongmun; Lim, Ki-Byung; Hwang, Yoon-Jung; Kim, Goon-Bo; Baek, Seunghoon; Choi, Sang-Bong; Hyung, Dae-Jin; Lee, Seung-Won; Sohn, Seong-Han; Kwon, Soo-Jin; Jin, Mina; Seol, Young-Joo; Chae, Won Byoung; Choi, Keun Jin; Park, Beom-Seok; Yu, Hee-Ju; Mun, Jeong-Hwan

    2016-07-01

    This study presents a chromosome-scale draft genome sequence of radish that is assembled into nine chromosomal pseudomolecules. A comprehensive comparative genome analysis with the Brassica genomes provides genomic evidences on the evolution of the mesohexaploid radish genome. Radish (Raphanus sativus L.) is an agronomically important root vegetable crop and its origin and phylogenetic position in the tribe Brassiceae is controversial. Here we present a comprehensive analysis of the radish genome based on the chromosome sequences of R. sativus cv. WK10039. The radish genome was sequenced and assembled into 426.2 Mb spanning >98 % of the gene space, of which 344.0 Mb were integrated into nine chromosome pseudomolecules. Approximately 36 % of the genome was repetitive sequences and 46,514 protein-coding genes were predicted and annotated. Comparative mapping of the tPCK-like ancestral genome revealed that the radish genome has intermediate characteristics between the Brassica A/C and B genomes in the triplicated segments, suggesting an internal origin from the genus Brassica. The evolutionary characteristics shared between radish and other Brassica species provided genomic evidences that the current form of nine chromosomes in radish was rearranged from the chromosomes of hexaploid progenitor. Overall, this study provides a chromosome-scale draft genome sequence of radish as well as novel insight into evolution of the mesohexaploid genomes in the tribe Brassiceae.

  13. Structural and sequence diversity of the transposon Galileo in the Drosophila willistoni genome.

    Science.gov (United States)

    Gonçalves, Juliana W; Valiati, Victor Hugo; Delprat, Alejandra; Valente, Vera L S; Ruiz, Alfredo

    2014-09-13

    Galileo is one of three members of the P superfamily of DNA transposons. It was originally discovered in Drosophila buzzatii, in which three segregating chromosomal inversions were shown to have been generated by ectopic recombination between Galileo copies. Subsequently, Galileo was identified in six of 12 sequenced Drosophila genomes, indicating its widespread distribution within this genus. Galileo is strikingly abundant in Drosophila willistoni, a neotropical species that is highly polymorphic for chromosomal inversions, suggesting a role for this transposon in the evolution of its genome. We carried out a detailed characterization of all Galileo copies present in the D. willistoni genome. A total of 191 copies, including 133 with two terminal inverted repeats (TIRs), were classified according to structure in six groups. The TIRs exhibited remarkable variation in their length and structure compared to the most complete copy. Three copies showed extended TIRs due to internal tandem repeats, the insertion of other transposable elements (TEs), or the incorporation of non-TIR sequences into the TIRs. Phylogenetic analyses of the transposase (TPase)-encoding and TIR segments yielded two divergent clades, which we termed Galileo subfamilies V and W. Target-site duplications (TSDs) in D. willistoni Galileo copies were 7- or 8-bp in length, with the consensus sequence GTATTAC. Analysis of the region around the TSDs revealed a target site motif (TSM) with a 15-bp palindrome that may give rise to a stem-loop secondary structure. There is a remarkable abundance and diversity of Galileo copies in the D. willistoni genome, although no functional copies were found. The TIRs in particular have a dynamic structure and extend in different ways, but their ends (required for transposition) are more conserved than the rest of the element. The D. willistoni genome harbors two Galileo subfamilies (V and W) that diverged ~9 million years ago and may have descended from an ancestral

  14. Structural genomic variation in childhood epilepsies with complex phenotypes

    DEFF Research Database (Denmark)

    Helbig, Ingo; Swinkels, Marielle E M; Aten, Emmelien

    2014-01-01

    of CNVs in patients with unclassified epilepsies and complex phenotypes. A total of 222 patients from three European countries, including patients with structural lesions on magnetic resonance imaging (MRI), dysmorphic features, and multiple congenital anomalies, were clinically evaluated and screened.......9%). Segregation of all identified variants could be assessed in 42 patients, 11 of which were de novo. The frequency of all structural variants and de novo variants was not statistically different between patients with or without MRI abnormalities or MRI subcategories. Patients with dysmorphic features were more...

  15. Structured RNAs and synteny regions in the pig genome

    DEFF Research Database (Denmark)

    Anthon, Christian; Tafer, Hakim; Havgaard, Jakob Hull

    2014-01-01

    annotation. To further enhance the reliability, 571 of the 3,556 structured RNAs were manually curated by methods depending on the RNA class while 1,581 were declared as pseudogenes. We further created a multiple alignment of pig against 20 representative vertebrates, from which RNAz predicted 83,859 de novo...

  16. Genomes in Turmoil: Frugality Drives Microbial Community Structure in Extremely Acidic Environments

    Science.gov (United States)

    Holmes, D. S.

    2016-12-01

    Extremely acidic environments (To gain insight into these issues, we have conducted deep bioinformatic analyses, including metabolic reconstruction of key assimilatory pathways, phylogenomics and network scrutiny of >160 genomes of acidophiles, including representatives from Archaea, Bacteria and Eukarya and at least ten metagenomes of acidic environments [Cardenas JP, et al. pp 179-197 in Acidophiles, eds R. Quatrini and D. B. Johnson, Caister Academic Press, UK (2016)]. Results yielded valuable insights into cellular processes, including carbon and nitrogen management and energy production, linking biogeochemical processes to organismal physiology. They also provided insight into the evolutionary forces that shape the genomic structure of members of acidophile communities. Niche partitioning can explain diversity patterns in rapidly changing acidic environments such as bioleaching heaps. However, in spatially and temporally homogeneous acidic environments genome flux appears to provide deeper insight into the composition and evolution of acidic consortia. Acidophiles have undergone genome streamlining by gene loss promoting mutual coexistence of species that exploit complementarity use of scarce resources consistent with the Black Queen hypothesis [Morris JJ et al. mBio 3: e00036-12 (2012)]. Acidophiles also have a large pool of accessory genes (the microbial super-genome) that can be accessed by horizontal gene transfer. This further promotes dependency relationships as drivers of community structure and the evolution of keystone species. Acknowledgements: Fondecyt 1130683; Basal CCTE PFB16

  17. A high-quality human reference panel reveals the complexity and distribution of genomic structural variants

    NARCIS (Netherlands)

    Hehir-Kwa, J.Y.; Marschall, T.; Kloosterman, W.P.; Francioli, L.C.; Baaijens, J.A.; Dijkstra, L.J.; Abdellaoui, A.; Koval, V.; Thung, D.T.; Wardenaar, R.; Renkens, I.; Coe, B.P.; Deelen, P.; de Ligt, J.; Lameijer, E.W.; Dijk, F.; Hormozdiari, F.; Uitterlinden, A.G.; van Duijn, C.M.; Eichler, E.E.; Bakker, P.I.W.; Swertz, M.A.; Wijmenga, C.; van Ommen, G.J.B; Slagboom, P.E.; Boomsma, D.I.; Schönhuth, A.; Ye, K.; Guryev, V.

    2016-01-01

    Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies provide neither a global view of the full spectrum of these variants nor integrate them into reference panels of genetic

  18. New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes.

    Science.gov (United States)

    Parker, Brian J; Moltke, Ida; Roth, Adam; Washietl, Stefan; Wen, Jiayu; Kellis, Manolis; Breaker, Ronald; Pedersen, Jakob Skou

    2011-11-01

    Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.

  19. Evidence of ancient genome reduction in red algae (Rhodophyta).

    Science.gov (United States)

    Qiu, Huan; Price, Dana C; Yang, Eun Chan; Yoon, Hwan Su; Bhattacharya, Debashish

    2015-08-01

    Red algae (Rhodophyta) comprise a monophyletic eukaryotic lineage of ~6,500 species with a fossil record that extends back 1.2 billion years. A surprising aspect of red algal evolution is that sequenced genomes encode a relatively limited gene inventory (~5-10 thousand genes) when compared with other free-living algae or to other eukaryotes. This suggests that the common ancestor of red algae may have undergone extensive genome reduction, which can result from lineage specialization to a symbiotic or parasitic lifestyle or adaptation to an extreme or oligotrophic environment. We gathered genome and transcriptome data from a total of 14 red algal genera that represent the major branches of this phylum to study genome evolution in Rhodophyta. Analysis of orthologous gene gains and losses identifies two putative major phases of genome reduction: (i) in the stem lineage leading to all red algae resulting in the loss of major functions such as flagellae and basal bodies, the glycosyl-phosphatidylinositol anchor biosynthesis pathway, and the autophagy regulation pathway; and (ii) in the common ancestor of the extremophilic Cyanidiophytina. Red algal genomes are also characterized by the recruitment of hundreds of bacterial genes through horizontal gene transfer that have taken on multiple functions in shared pathways and have replaced eukaryotic gene homologs. Our results suggest that Rhodophyta may trace their origin to a gene depauperate ancestor. Unlike plants, it appears that a limited gene inventory is sufficient to support the diversification of a major eukaryote lineage that possesses sophisticated multicellular reproductive structures and an elaborate triphasic sexual cycle. © 2015 Phycological Society of America.

  20. Recognition of extremophilic archaeal viruses by eukaryotic cells

    DEFF Research Database (Denmark)

    Uldahl, Kristine Buch; Wu, Linping; Hall, Arnaldur

    2016-01-01

    followed viral uptake, intracellular trafficking and cell viability in human endothelial cells of brain (hCMEC/D3 cells) and umbilical vein (HUVEC) origin. Whereas SMV1 is efficiently internalized into both types of human cells, SSV2 differentiates between HUVECs and hCMEC/D3 cells, thus opening a path......Viruses from the third domain of life, Archaea, exhibit unusual features including extreme stability that allow their survival in harsh environments. In addition, these species have never been reported to integrate into human or any other eukaryotic genomes, and could thus serve for exploration...

  1. Population Structure Analysis of Bull Genomes of European and Western Ancestry

    DEFF Research Database (Denmark)

    Chung, Neo Christopher; Szyda, Joanna; Frąszczak, Magdalena

    2017-01-01

    Since domestication, population bottlenecks, breed formation, and selective breeding have radically shaped the genealogy and genetics of Bos taurus. In turn, characterization of population structure among diverse bull (males of Bos taurus) genomes enables detailed assessment of genetic resources...... and origins. By analyzing 432 unrelated bull genomes from 13 breeds and 16 countries, we demonstrate genetic diversity and structural complexity among the European/Western cattle population. Importantly, we relaxed a strong assumption of discrete or admixed population, by adapting latent variable models...... harboring largest genetic differentiation suggest positive selection underlying population structure. We carried out gene set analysis using SNP annotations to identify enriched functional categories such as energy-related processes and multiple development stages. Our population structure analysis of bull...

  2. INVESTIGATIONS INTO MOLECULAR PATHWAYS IN THE POST GENOME ERA: CROSS SPECIES COMPARATIVE GENOMICS APPROACH

    Science.gov (United States)

    Genome sequencing efforts in the past decade were aimed at generating draft sequences of many prokaryotic and eukaryotic model organisms. Successful completion of unicellular eukaryotes, worm, fly and human genome have opened up the new field of molecular biology and function...

  3. Elevated Rate of Genome Rearrangements in Radiation-Resistant Bacteria

    OpenAIRE

    Repar, Jelena; Supek, Fran; Klanjscek, Tin; Warnecke, Tobias; Zahradka, Ksenija; Zahradka, Davor

    2017-01-01

    A number of bacterial, archaeal, and eukaryotic species are known for their resistance to ionizing radiation. One of the challenges these species face is a potent environmental source of DNA double-strand breaks, potential drivers of genome structure evolution. Efficient and accurate DNA double-strand break repair systems have been demonstrated in several unrelated radiation-resistant species and are putative adaptations to the DNA damaging environment. Such adaptations are expected to compen...

  4. The Drosophila Helicase MLE Targets Hairpin Structures in Genomic Transcripts.

    Directory of Open Access Journals (Sweden)

    Simona Cugusi

    2016-01-01

    Full Text Available RNA hairpins are a common type of secondary structures that play a role in every aspect of RNA biochemistry including RNA editing, mRNA stability, localization and translation of transcripts, and in the activation of the RNA interference (RNAi and microRNA (miRNA pathways. Participation in these functions often requires restructuring the RNA molecules by the association of single-strand (ss RNA-binding proteins or by the action of helicases. The Drosophila MLE helicase has long been identified as a member of the MSL complex responsible for dosage compensation. The complex includes one of two long non-coding RNAs and MLE was shown to remodel the roX RNA hairpin structures in order to initiate assembly of the complex. Here we report that this function of MLE may apply to the hairpins present in the primary RNA transcripts that generate the small molecules responsible for RNA interference. Using stocks from the Transgenic RNAi Project and the Vienna Drosophila Research Center, we show that MLE specifically targets hairpin RNAs at their site of transcription. The association of MLE at these sites is independent of sequence and chromosome location. We use two functional assays to test the biological relevance of this association and determine that MLE participates in the RNAi pathway.

  5. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    Science.gov (United States)

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC 3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC 3 -rich genes (GC 3  ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC 3 -rich and intronless), as well as those associated with important functions, such as FA

  6. The complete mitochondrial genome structure of the jaguar (Panthera onca).

    Science.gov (United States)

    Caragiulo, Anthony; Dougherty, Eric; Soto, Sofia; Rabinowitz, Salisa; Amato, George

    2016-01-01

    The jaguar (Panthera onca) is the largest felid in the Western hemisphere, and the only member of the Panthera genus in the New World. The jaguar inhabits most countries within Central and South America, and is considered near threatened by the International Union for the Conservation of Nature. This study represents the first sequence of the entire jaguar mitogenome, which was the only Panthera mitogenome that had not been sequenced. The jaguar mitogenome is 17,049 bases and possesses the same molecular structure as other felid mitogenomes. Bayesian inference (BI) and maximum likelihood (ML) were used to determine the phylogenetic placement of the jaguar within the Panthera genus. Both BI and ML analyses revealed the jaguar to be sister to the tiger/leopard/snow leopard clade.

  7. A Functional, Genome-wide Evaluation of Liposensitive Yeast Identifies the “ARE2 Required for Viability” (ARV1) Gene Product as a Major Component of Eukaryotic Fatty Acid Resistance*

    Science.gov (United States)

    Ruggles, Kelly V.; Garbarino, Jeanne; Liu, Ying; Moon, James; Schneider, Kerry; Henneberry, Annette; Billheimer, Jeff; Millar, John S.; Marchadier, Dawn; Valasek, Mark A.; Joblin-Mills, Aidan; Gulati, Sonia; Munkacsi, Andrew B.; Repa, Joyce J.; Rader, Dan; Sturley, Stephen L.

    2014-01-01

    The toxic subcellular accumulation of lipids predisposes several human metabolic syndromes, including obesity, type 2 diabetes, and some forms of neurodegeneration. To identify pathways that prevent lipid-induced cell death, we performed a genome-wide fatty acid sensitivity screen in Saccharomyces cerevisiae. We identified 167 yeast mutants as sensitive to 0.5 mm palmitoleate, 45% of which define pathways that were conserved in humans. 63 lesions also impacted the status of the lipid droplet; however, this was not correlated to the degree of fatty acid sensitivity. The most liposensitive yeast strain arose due to deletion of the “ARE2 required for viability” (ARV1) gene, encoding an evolutionarily conserved, potential lipid transporter that localizes to the endoplasmic reticulum membrane. Down-regulation of mammalian ARV1 in MIN6 pancreatic β-cells or HEK293 cells resulted in decreased neutral lipid synthesis, increased fatty acid sensitivity, and lipoapoptosis. Conversely, elevated expression of human ARV1 in HEK293 cells or mouse liver significantly increased triglyceride mass and lipid droplet number. The ARV1-induced hepatic triglyceride accumulation was accompanied by up-regulation of DGAT1, a triglyceride synthesis gene, and the fatty acid transporter, CD36. Furthermore, ARV1 was identified as a transcriptional of the protein peroxisome proliferator-activated receptor α (PPARα), a key regulator of lipid homeostasis whose transcriptional targets include DGAT1 and CD36. These results implicate ARV1 as a protective factor in lipotoxic diseases due to modulation of fatty acid metabolism. In conclusion, a lipotoxicity-based genetic screen in a model microorganism has identified 75 human genes that may play key roles in neutral lipid metabolism and disease. PMID:24273168

  8. A functional, genome-wide evaluation of liposensitive yeast identifies the "ARE2 required for viability" (ARV1) gene product as a major component of eukaryotic fatty acid resistance.

    Science.gov (United States)

    Ruggles, Kelly V; Garbarino, Jeanne; Liu, Ying; Moon, James; Schneider, Kerry; Henneberry, Annette; Billheimer, Jeff; Millar, John S; Marchadier, Dawn; Valasek, Mark A; Joblin-Mills, Aidan; Gulati, Sonia; Munkacsi, Andrew B; Repa, Joyce J; Rader, Dan; Sturley, Stephen L

    2014-02-14

    The toxic subcellular accumulation of lipids predisposes several human metabolic syndromes, including obesity, type 2 diabetes, and some forms of neurodegeneration. To identify pathways that prevent lipid-induced cell death, we performed a genome-wide fatty acid sensitivity screen in Saccharomyces cerevisiae. We identified 167 yeast mutants as sensitive to 0.5 mm palmitoleate, 45% of which define pathways that were conserved in humans. 63 lesions also impacted the status of the lipid droplet; however, this was not correlated to the degree of fatty acid sensitivity. The most liposensitive yeast strain arose due to deletion of the "ARE2 required for viability" (ARV1) gene, encoding an evolutionarily conserved, potential lipid transporter that localizes to the endoplasmic reticulum membrane. Down-regulation of mammalian ARV1 in MIN6 pancreatic β-cells or HEK293 cells resulted in decreased neutral lipid synthesis, increased fatty acid sensitivity, and lipoapoptosis. Conversely, elevated expression of human ARV1 in HEK293 cells or mouse liver significantly increased triglyceride mass and lipid droplet number. The ARV1-induced hepatic triglyceride accumulation was accompanied by up-regulation of DGAT1, a triglyceride synthesis gene, and the fatty acid transporter, CD36. Furthermore, ARV1 was identified as a transcriptional of the protein peroxisome proliferator-activated receptor α (PPARα), a key regulator of lipid homeostasis whose transcriptional targets include DGAT1 and CD36. These results implicate ARV1 as a protective factor in lipotoxic diseases due to modulation of fatty acid metabolism. In conclusion, a lipotoxicity-based genetic screen in a model microorganism has identified 75 human genes that may play key roles in neutral lipid metabolism and disease.

  9. Complete Chloroplast Genomes of Papaver rhoeas and Papaver orientale: Molecular Structures, Comparative Analysis, and Phylogenetic Analysis

    Directory of Open Access Journals (Sweden)

    Jianguo Zhou

    2018-02-01

    Full Text Available Papaver rhoeas L. and P. orientale L., which belong to the family Papaveraceae, are used as ornamental and medicinal plants. The chloroplast genome has been used for molecular markers, evolutionary biology, and barcoding identification. In this study, the complete chloroplast genome sequences of P. rhoeas and P. orientale are reported. Results show that the complete chloroplast genomes of P. rhoeas and P. orientale have typical quadripartite structures, which are comprised of circular 152,905 and 152,799-bp-long molecules, respectively. A total of 130 genes were identified in each genome, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Sequence divergence analysis of four species from Papaveraceae indicated that the most divergent regions are found in the non-coding spacers with minimal differences among three Papaver species. These differences include the ycf1 gene and intergenic regions, such as rpoB-trnC, trnD-trnT, petA-psbJ, psbE-petL, and ccsA-ndhD. These regions are hypervariable regions, which can be used as specific DNA barcodes. This finding suggested that the chloroplast genome could be used as a powerful tool to resolve the phylogenetic positions and relationships of Papaveraceae. These results offer valuable information for future research in the identification of Papaver species and will benefit further investigations of these species.

  10. Intermediary metabolism in protists: a sequence-based view of facultative anaerobic metabolism in evolutionarily diverse eukaryotes.

    Science.gov (United States)

    Ginger, Michael L; Fritz-Laylin, Lillian K; Fulton, Chandler; Cande, W Zacheus; Dawson, Scott C

    2010-12-01

    Protists account for the bulk of eukaryotic diversity. Through studies of gene and especially genome sequences the molecular basis for this diversity can be determined. Evident from genome sequencing are examples of versatile metabolism that go far beyond the canonical pathways described for eukaryotes in textbooks. In the last 2-3 years, genome sequencing and transcript profiling has unveiled several examples of heterotrophic and phototrophic protists that are unexpectedly well-equipped for ATP production using a facultative anaerobic metabolism, including some protists that can (Chlamydomonas reinhardtii) or are predicted (Naegleria gruberi, Acanthamoeba castellanii, Amoebidium parasiticum) to produce H(2) in their metabolism. It is possible that some enzymes of anaerobic metabolism were acquired and distributed among eukaryotes by lateral transfer, but it is also likely that the common ancestor of eukaryotes already had far more metabolic versatility than was widely thought a few years ago. The discussion of core energy metabolism in unicellular eukaryotes is the subject of this review. Since genomic sequencing has so far only touched the surface of protist diversity, it is anticipated that sequences of additional protists may reveal an even wider range of metabolic capabilities, while simultaneously enriching our understanding of the early evolution of eukaryotes. Copyright © 2010 Elsevier GmbH. All rights reserved.

  11. Complete plastid genomes from Ophioglossum californicum, Psilotum nudum, and Equisetum hyemale reveal an ancestral land plant genome structure and resolve the position of Equisetales among monilophytes

    Directory of Open Access Journals (Sweden)

    Grewe Felix

    2013-01-01

    Full Text Available Abstract Background Plastid genome structure and content is remarkably conserved in land plants. This widespread conservation has facilitated taxon-rich phylogenetic analyses that have resolved organismal relationships among many land plant groups. However, the relationships among major fern lineages, especially the placement of Equisetales, remain enigmatic. Results In order to understand the evolution of plastid genomes and to establish phylogenetic relationships among ferns, we sequenced the plastid genomes from three early diverging species: Equisetum hyemale (Equisetales, Ophioglossum californicum (Ophioglossales, and Psilotum nudum (Psilotales. A comparison of fern plastid genomes showed that some lineages have retained inverted repeat (IR boundaries originating from the common ancestor of land plants, while other lineages have experienced multiple IR changes including expansions and inversions. Genome content has remained stable throughout ferns, except for a few lineage-specific losses of genes and introns. Notably, the losses of the rps16 gene and the rps12i346 intron are shared among Psilotales, Ophioglossales, and Equisetales, while the gain of a mitochondrial atp1 intron is shared between Marattiales and Polypodiopsida. These genomic structural changes support the placement of Equisetales as sister to Ophioglossales + Psilotales and Marattiales as sister to Polypodiopsida. This result is augmented by some molecular phylogenetic analyses that recover the same relationships, whereas others suggest a relationship between Equisetales and Polypodiopsida. Conclusions Although molecular analyses were inconsistent with respect to the position of Marattiales and Equisetales, several genomic structural changes have for the first time provided a clear placement of these lineages within the ferns. These results further demonstrate the power of using rare genomic structural changes in cases where molecular data fail to provide strong phylogenetic

  12. Reference-quality genome sequence of Aegilops tauschii, the source of wheat D genome, shows that recombination shapes genome structure and evolution

    Science.gov (United States)

    Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat and an important genetic resource for wheat. A reference-quality sequence for the Ae. tauschii genome was produced with a combination of ordered-clone sequencing, whole-genome shotgun sequencing, and BioNano optical geno...

  13. De novo prediction of human chromosome structures: Epigenetic marking patterns encode genome architecture.

    Science.gov (United States)

    Di Pierro, Michele; Cheng, Ryan R; Lieberman Aiden, Erez; Wolynes, Peter G; Onuchic, José N

    2017-11-14

    Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible. Copyright © 2017 the Author(s). Published by PNAS.

  14. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites

    DEFF Research Database (Denmark)

    Nielsen, Henrik; Engelbrecht, Jacob; Brunak, Søren

    1997-01-01

    We have developed a new method for the identification of signal peptides and their cleavage based on neural networks trained on separate sets of prokaryotic and eukaryotic sequence. The method performs significantly better than previous prediction schemes and can easily be applied on genome...

  15. Plastid genome structure and loss of photosynthetic ability in the parasitic genus Cuscuta.

    Science.gov (United States)

    Revill, Meredith J W; Stanley, Susan; Hibberd, Julian M

    2005-09-01

    The genus Cuscuta (dodder) is composed of parasitic plants, some species of which appear to be losing the ability to photosynthesize. A molecular phylogeny was constructed using 15 species of Cuscuta in order to assess whether changes in photosynthetic ability and alterations in structure of the plastid genome relate to phylogenetic position within the genus. The molecular phylogeny provides evidence for four major clades within Cuscuta. Although DNA blot analysis showed that Cuscuta species have smaller plastid genomes than tobacco, and that plastome size varied significantly even within one Cuscuta clade, dot blot analysis indicated that the dodders possess homologous sequence to 101 genes from the tobacco plastome. Evidence is provided for significant rates of DNA transfer from plastid to nucleus in Cuscuta. Size and structure of Cuscuta plastid genomes, as well as photosynthetic ability, appear to vary independently of position within the phylogeny, thus supporting the hypothesis that within Cuscuta photosynthetic ability and organization of the plastid genome are changing in an unco-ordinated manner.

  16. Universal Temporal Profile of Replication Origin Activation in Eukaryotes

    Science.gov (United States)

    Goldar, Arach

    2011-03-01

    The complete and faithful transmission of eukaryotic genome to daughter cells involves the timely duplication of mother cell's DNA. DNA replication starts at multiple chromosomal positions called replication origin. From each activated replication origin two replication forks progress in opposite direction and duplicate the mother cell's DNA. While it is widely accepted that in eukaryotic organisms replication origins are activated in a stochastic manner, little is known on the sources of the observed stochasticity. It is often associated to the population variability to enter S phase. We extract from a growing Saccharomyces cerevisiae population the average rate of origin activation in a single cell by combining single molecule measurements and a numerical deconvolution technique. We show that the temporal profile of the rate of origin activation in a single cell is similar to the one extracted from a replicating cell population. Taking into account this observation we exclude the population variability as the origin of observed stochasticity in origin activation. We confirm that the rate of origin activation increases in the early stage of S phase and decreases at the latter stage. The population average activation rate extracted from single molecule analysis is in prefect accordance with the activation rate extracted from published micro-array data, confirming therefore the homogeneity and genome scale invariance of dynamic of replication process. All these observations point toward a possible role of replication fork to control the rate of origin activation.

  17. A genome wide survey of SNP variation reveals the genetic structure of sheep breeds.

    Directory of Open Access Journals (Sweden)

    James W Kijas

    Full Text Available The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identifying the first genome-wide set of SNP for sheep, we report on levels of genetic variability both within and between a diverse sample of ovine populations. Then, using cluster analysis and the partitioning of genetic variation, we demonstrate sheep are characterised by weak phylogeographic structure, overlapping genetic similarity and generally low differentiation which is consistent with their short evolutionary history. The degree of population substructure was, however, sufficient to cluster individuals based on geographic origin and known breed history. Specifically, African and Asian populations clustered separately from breeds of European origin sampled from Australia, New Zealand, Europe and North America. Furthermore, we demonstrate the presence of stratification within some, but not all, ovine breeds. The results emphasize that careful documentation of genetic structure will be an essential prerequisite when mapping the genetic basis of complex traits. Furthermore, the identification of a subset of SNP able to assign individuals into broad groupings demonstrates even a small panel of markers may be suitable for applications such as traceability.

  18. Morphology, genome sequence, and structural proteome of type phage P335 from Lactococcus lactis

    DEFF Research Database (Denmark)

    Labrie, Simon J.; Josephsen, Jytte; Neve, Horst

    2008-01-01

    for a shorter tail and a different collar/whisker structure. Its 33,613-bp double-stranded DNA genome had 50 open reading frames. Putative functions were assigned to 29 of them. Unlike other sequenced genomes from lactococcal phages belonging to this species, P335 did not have a lysogeny module. However, it did...... genome. The genetic diversity of the P335 species indicates that they are exceptional models for studying the modular theory of phage evolution....

  19. Do lipids shape the eukaryotic cell cycle?

    Science.gov (United States)

    Furse, Samuel; Shearman, Gemma C

    2018-01-01

    Successful passage through the cell cycle presents a number of structural challenges to the cell. Inceptive studies carried out in the last five years have produced clear evidence of modulations in the lipid profile (sometimes referred to as the lipidome) of eukaryotes as a function of the cell cycle. This mounting body of evidence indicates that lipids play key roles in the structural transformations seen across the cycle. The accumulation of this evidence coincides with a revolution in our understanding of how lipid composition regulates a plethora of biological processes ranging from protein activity through to cellular signalling and membrane compartmentalisation. In this review, we discuss evidence from biological, chemical and physical studies of the lipid fraction across the cell cycle that demonstrate that lipids are well-developed cellular components at the heart of the biological machinery responsible for managing progress through the cell cycle. Furthermore, we discuss the mechanisms by which this careful control is exercised. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.

  20. Comparative RNA genomics

    DEFF Research Database (Denmark)

    Backofen, Rolf; Gorodkin, Jan; Hofacker, Ivo L.

    2018-01-01

    Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly...... small RNAs is their reliance of conserved secondary structures. Large scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs...... that exert a vastly diverse array of molecule functions. In this chapter we provide a—necessarily incomplete—overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world....

  1. Identification of balanced chromosomal rearrangements previously unknown among participants in the 1000 Genomes Project: implications for interpretation of structural variation in genomes and the future of clinical cytogenetics.

    Science.gov (United States)

    Dong, Zirui; Wang, Huilin; Chen, Haixiao; Jiang, Hui; Yuan, Jianying; Yang, Zhenjun; Wang, Wen-Jing; Xu, Fengping; Guo, Xiaosen; Cao, Ye; Zhu, Zhenzhen; Geng, Chunyu; Cheung, Wan Chee; Kwok, Yvonne K; Yang, Huanming; Leung, Tak Yeung; Morton, Cynthia C; Cheung, Sau Wai; Choy, Kwong Wai

    2017-11-02

    PurposeRecent studies demonstrate that whole-genome sequencing enables detection of cryptic rearrangements in apparently balanced chromosomal rearrangements (also known as balanced chromosomal abnormalities, BCAs) previously identified by conventional cytogenetic methods. We aimed to assess our analytical tool for detecting BCAs in the 1000 Genomes Project without knowing which bands were affected.MethodsThe 1000 Genomes Project provides an unprecedented integrated map of structural variants in phenotypically normal subjects, but there is no information on potential inclusion of subjects with apparent BCAs akin to those traditionally detected in diagnostic cytogenetics laboratories. We applied our analytical tool to 1,166 genomes from the 1000 Genomes Project with sufficient physical coverage (8.25-fold).ResultsWith this approach, we detected four reciprocal balanced translocations and four inversions, ranging in size from 57.9 kb to 13.3 Mb, all of which were confirmed by cytogenetic methods and polymerase chain reaction studies. One of these DNAs has a subtle translocation that is not readily identified by chromosome analysis because of the similarity of the banding patterns and size of exchanged segments, and another results in disruption of all transcripts of an OMIM gene.ConclusionOur study demonstrates the extension of utilizing low-pass whole-genome sequencing for unbiased detection of BCAs including translocations and inversions previously unknown in the 1000 Genomes Project.GENETICS in MEDICINE advance online publication, 2 November 2017; doi:10.1038/gim.2017.170.

  2. Effects of aneuploidy on genome structure, expression, and interphase organization in Arabidopsis thaliana.

    Directory of Open Access Journals (Sweden)

    Bruno Huettel

    2008-10-01

    Full Text Available Aneuploidy refers to losses and/or gains of individual chromosomes from the normal chromosome set. The resulting gene dosage imbalance has a noticeable affect on the phenotype, as illustrated by aneuploid syndromes, including Down syndrome in humans, and by human solid tumor cells, which are highly aneuploid. Although the phenotypic manifestations of aneuploidy are usually apparent, information about the underlying alterations in structure, expression, and interphase organization of unbalanced chromosome sets is still sparse. Plants generally tolerate aneuploidy better than animals, and, through colchicine treatment and breeding strategies, it is possible to obtain inbred sibling plants with different numbers of chromosomes. This possibility, combined with the genetic and genomics tools available for Arabidopsis thaliana, provides a powerful means to assess systematically the molecular and cytological consequences of aberrant numbers of specific chromosomes. Here, we report on the generation of Arabidopsis plants in which chromosome 5 is present in triplicate. We compare the global transcript profiles of normal diploids and chromosome 5 trisomics, and assess genome integrity using array comparative genome hybridization. We use live cell imaging to determine the interphase 3D arrangement of transgene-encoded fluorescent tags on chromosome 5 in trisomic and triploid plants. The results indicate that trisomy 5 disrupts gene expression throughout the genome and supports the production and/or retention of truncated copies of chromosome 5. Although trisomy 5 does not grossly distort the interphase arrangement of fluorescent-tagged sites on chromosome 5, it may somewhat enhance associations between transgene alleles. Our analysis reveals the complex genomic changes that can occur in aneuploids and underscores the importance of using multiple experimental approaches to investigate how chromosome numerical changes condition abnormal phenotypes and

  3. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure

    OpenAIRE

    Zuccolo, Andrea; Bowers, John E; Estill, James C; Xiong, Zhiyong; Luo, Meizhong; Sebastian, Aswathy; Goicoechea, Jos? Luis; Collura, Kristi; Yu, Yeisoo; Jiao, Yuannian; Duarte, Jill; Tang, Haibao; Ayyampalayam, Saravanaraj; Rounsley, Steve; Kudrna, Dave

    2011-01-01

    Background Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome....

  4. Improvisation in evolution of genes and genomes: whose structure is it anyway?

    Science.gov (United States)

    Shakhnovich, Boris E; Shakhnovich, Eugene I

    2008-06-01

    Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales--from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides first robust answers to these questions.

  5. Comparative Annotation of Viral Genomes with Non-Conserved Gene Structure

    DEFF Research Database (Denmark)

    de Groot, Saskia; Mailund, Thomas; Hein, Jotun

    2007-01-01

    Motivation: Detecting genes in viral genomes is a complex task. Due to the biological necessity of them being constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleic acids, up to three genes may be coded...... allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences. Results...... and HIV2, as well as of two different Hepatitis Viruses, attaining results of ~87% sensitivity and ~98.5% specificity. We subsequently incorporate prior knowledge by "knowing" the gene structure of one sequence and annotating the other conditional on it. Boosting accuracy close to perfect we demonstrate...

  6. High-throughput SHAPE analysis reveals structures in HIV-1 genomic RNA strongly conserved across distinct biological states.

    Directory of Open Access Journals (Sweden)

    Kevin A Wilkinson

    2008-04-01

    Full Text Available Replication and pathogenesis of the human immunodeficiency virus (HIV is tightly linked to the structure of its RNA genome, but genome structure in infectious virions is poorly understood. We invent high-throughput SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension technology, which uses many of the same tools as DNA sequencing, to quantify RNA backbone flexibility at single-nucleotide resolution and from which robust structural information can be immediately derived. We analyze the structure of HIV-1 genomic RNA in four biologically instructive states, including the authentic viral genome inside native particles. Remarkably, given the large number of plausible local structures, the first 10% of the HIV-1 genome exists in a single, predominant conformation in all four states. We also discover that noncoding regions functioning in a regulatory role have significantly lower (p-value < 0.0001 SHAPE reactivities, and hence more structure, than do viral coding regions that function as the template for protein synthesis. By directly monitoring protein binding inside virions, we identify the RNA recognition motif for the viral nucleocapsid protein. Seven structurally homologous binding sites occur in a well-defined domain in the genome, consistent with a role in directing specific packaging of genomic RNA into nascent virions. In addition, we identify two distinct motifs that are targets for the duplex destabilizing activity of this same protein. The nucleocapsid protein destabilizes local HIV-1 RNA structure in ways likely to facilitate initial movement both of the retroviral reverse transcriptase from its tRNA primer and of the ribosome in coding regions. Each of the three nucleocapsid interaction motifs falls in a specific genome domain, indicating that local protein interactions can be organized by the long-range architecture of an RNA. High-throughput SHAPE reveals a comprehensive view of HIV-1 RNA genome structure, and further

  7. Chromatin Structure in Cell Differentiation, Aging and Cancer

    NARCIS (Netherlands)

    S. Kheradmand Kia (Sima)

    2009-01-01

    textabstractChromatin is the structure that the eukaryotic genome is packaged into, allowing over a metre of DNA to fit into the small volume of the nucleus. It is composed of DNA and proteins, most of which are histones. This DNA-protein complex is the template for a number of essential cell

  8. Evolution of glutamate dehydrogenase genes: evidence for lateral gene transfer within and between prokaryotes and eukaryotes

    Directory of Open Access Journals (Sweden)

    Roger Andrew J

    2003-06-01

    Full Text Available Abstract Background Lateral gene transfer can introduce genes with novel functions into genomes or replace genes with functionally similar orthologs or paralogs. Here we present a study of the occurrence of the latter gene replacement phenomenon in the four gene families encoding different classes of glutamate dehydrogenase (GDH, to evaluate and compare the patterns and rates of lateral gene transfer (LGT in prokaryotes and eukaryotes. Results We extend the taxon sampling of gdh genes with nine new eukaryotic sequences and examine the phylogenetic distribution pattern of the various GDH classes in combination with maximum likelihood phylogenetic analyses. The distribution pattern analyses indicate that LGT has played a significant role in the evolution of the four gdh gene families. Indeed, a number of gene transfer events are identified by phylogenetic analyses, including numerous prokaryotic intra-domain transfers, some prokaryotic inter-domain transfers and several inter-domain transfers between prokaryotes and microbial eukaryotes (protists. Conclusion LGT has apparently affected eukaryotes and prokaryotes to a similar extent within the gdh gene families. In the absence of indications that the evolution of the gdh gene families is radically different from other families, these results suggest that gene transfer might be an important evolutionary mechanism in microbial eukaryote genome evolution.

  9. Diversity of Eukaryotic Translational Initiation Factor eIF4E in Protists.

    Science.gov (United States)

    Jagus, Rosemary; Bachvaroff, Tsvetan R; Joshi, Bhavesh; Place, Allen R

    2012-01-01

    The greatest diversity of eukaryotic species is within the microbial eukaryotes, the protists, with plants and fungi/metazoa representing just two of the estimated seventy five lineages of eukaryotes. Protists are a diverse group characterized by unusual genome features and a wide range of genome sizes from 8.2 Mb in the apicomplexan parasite Babesia bovis to 112,000-220,050 Mb in the dinoflagellate Prorocentrum micans. Protists possess numerous cellular, molecular and biochemical traits not observed in "text-book" model organisms. These features challenge some of the concepts and assumptions about the regulation of gene expression in eukaryotes. Like multicellular eukaryotes, many protists encode multiple eIF4Es, but few functional studies have been undertaken except in parasitic species. An earlier phylogenetic analysis of protist eIF4Es indicated that they cannot be grouped within the three classes that describe eIF4E family members from multicellular organisms. Many more protist sequences are now available from which three clades can be recognized that are distinct from the plant/fungi/metazoan classes. Understanding of the protist eIF4Es will be facilitated as more sequences become available particularly for the under-represented opisthokonts and amoebozoa. Similarly, a better understanding of eIF4Es within each clade will develop as more functional studies of protist eIF4Es are completed.

  10. Recombination-dependent replication and gene conversion homogenize repeat sequences and diversify plastid genome structure.

    Science.gov (United States)

    Ruhlman, Tracey A; Zhang, Jin; Blazier, John C; Sabir, Jamal S M; Jansen, Robert K

    2017-04-01

    There is a misinterpretation in the literature regarding the variable orientation of the small single copy region of plastid genomes (plastomes). The common phenomenon of small and large single copy inversion, hypothesized to occur through intramolecular recombination between inverted repeats (IR) in a circular, single unit-genome, in fact, more likely occurs through recombination-dependent replication (RDR) of linear plastome templates. If RDR can be primed through both intra- and intermolecular recombination, then this mechanism could not only create inversion isomers of so-called single copy regions, but also an array of alternative sequence arrangements. We used Illumina paired-end and PacBio single-molecule real-time (SMRT) sequences to characterize repeat structure in the plastome of Monsonia emarginata (Geraniaceae). We used OrgConv and inspected nucleotide alignments to infer ancestral nucleotides and identify gene conversion among repeats and mapped long (>1 kb) SMRT reads against the unit-genome assembly to identify alternative sequence arrangements. Although M. emarginata lacks the canonical IR, we found that large repeats (>1 kilobase; kb) represent ∼22% of the plastome nucleotide content. Among the largest repeats (>2 kb), we identified GC-biased gene conversion and mapping filtered, long SMRT reads to the M. emarginata unit-genome assembly revealed alternative, substoichiometric sequence arrangements. We offer a model based on RDR and gene conversion between long repeated sequences in the M. emarginata plastome and provide support that both intra-and intermolecular recombination between large repeats, particularly in repeat-rich plastomes, varies unit-genome structure while homogenizing the nucleotide sequence of repeats. © 2017 Botanical Society of America.

  11. Universal Internucleotide Statistics in Full Genomes: A Footprint of the DNA Structure and Packaging?

    OpenAIRE

    Bogachev, Mikhail I.; Kayumov, Airat R.; Bunde, Armin

    2014-01-01

    Uncovering the fundamental laws that govern the complex DNA structural organization remains challenging and is largely based upon reconstructions from the primary nucleotide sequences. Here we investigate the distributions of the internucleotide intervals and their persistence properties in complete genomes of various organisms from Archaea and Bacteria to H. Sapiens aiming to reveal the manifestation of the universal DNA architecture. We find that in all considered organisms the internucleot...

  12. The MCM Helicase Motor of the Eukaryotic Replisome.

    Science.gov (United States)

    Abid Ali, Ferdos; Costa, Alessandro

    2016-05-08

    The MCM motor of the CMG helicase powers ahead of the eukaryotic replication machinery to unwind DNA, in a process that requires ATP hydrolysis. The reconstitution of DNA replication in vitro has established the succession of events that lead to replication origin activation by the MCM and recent studies have started to elucidate the structural basis of duplex DNA unwinding. Despite the exciting progress, how the MCM translocates on DNA remains a matter of debate. Copyright © 2016. Published by Elsevier Ltd.

  13. Structural analysis of a set of proteins resulting from a bacterial genomics project.

    Science.gov (United States)

    Badger, J; Sauder, J M; Adams, J M; Antonysamy, S; Bain, K; Bergseid, M G; Buchanan, S G; Buchanan, M D; Batiyenko, Y; Christopher, J A; Emtage, S; Eroshkina, A; Feil, I; Furlong, E B; Gajiwala, K S; Gao, X; He, D; Hendle, J; Huber, A; Hoda, K; Kearins, P; Kissinger, C; Laubert, B; Lewis, H A; Lin, J; Loomis, K; Lorimer, D; Louie, G; Maletic, M; Marsh, C D; Miller, I; Molinari, J; Muller-Dieckmann, H J; Newman, J M; Noland, B W; Pagarigan, B; Park, F; Peat, T S; Post, K W; Radojicic, S; Ramos, A; Romero, R; Rutter, M E; Sanderson, W E; Schwinn, K D; Tresser, J; Winhoven, J; Wright, T A; Wu, L; Xu, J; Harris, T J R

    2005-09-01

    The targets of the Structural GenomiX (SGX) bacterial genomics project were proteins conserved in multiple prokaryotic organisms with no obvious sequence homolog in the Protein Data Bank of known structures. The outcome of this work was 80 structures, covering 60 unique sequences and 49 different genes. Experimental phase determination from proteins incorporating Se-Met was carried out for 45 structures with most of the remainder solved by molecular replacement using members of the experimentally phased set as search models. An automated tool was developed to deposit these structures in the Protein Data Bank, along with the associated X-ray diffraction data (including refined experimental phases) and experimentally confirmed sequences. BLAST comparisons of the SGX structures with structures that had appeared in the Protein Data Bank over the intervening 3.5 years since the SGX target list had been compiled identified homologs for 49 of the 60 unique sequences represented by the SGX structures. This result indicates that, for bacterial structures that are relatively easy to express, purify, and crystallize, the structural coverage of gene space is proceeding rapidly. More distant sequence-structure relationships between the SGX and PDB structures were investigated using PDB-BLAST and Combinatorial Extension (CE). Only one structure, SufD, has a truly unique topology compared to all folds in the PDB. Copyright 2005 Wiley-Liss, Inc.

  14. Structural and functional analysis of the finished genome of the recently isolated toxic Anabaena sp. WA102.

    Science.gov (United States)

    Brown, Nathan M; Mueller, Ryan S; Shepardson, Jonathan W; Landry, Zachary C; Morré, Jeffrey T; Maier, Claudia S; Hardy, F Joan; Dreher, Theo W

    2016-06-13

    Very few closed genomes of the cyanobacteria that commonly produce toxic blooms in lakes and reservoirs are available, limiting our understanding of the properties of these organisms. A new anatoxin-a-producing member of the Nostocaceae, Anabaena sp. WA102, was isolated from a freshwater lake in Washington State, USA, in 2013 and maintained in non-axenic culture. The Anabaena sp. WA102 5.7 Mbp genome assembly has been closed with long-read, single-molecule sequencing and separately a draft genome assembly has been produced with short-read sequencing technology. The closed and draft genome assemblies are compared, showing a correlation between long repeats in the genome and the many gaps in the short-read assembly. Anabaena sp. WA102 encodes anatoxin-a biosynthetic genes, as does its close relative Anabaena sp. AL93 (also introduced in this study). These strains are distinguished by differences in the genes for light-harvesting phycobilins, with Anabaena sp. AL93 possessing a phycoerythrocyanin operon. Biologically relevant structural variants in the Anabaena sp. WA102 genome were detected only by long-read sequencing: a tandem triplication of the anaBCD promoter region in the anatoxin-a synthase gene cluster (not triplicated in Anabaena sp. AL93) and a 5-kbp deletion variant present in two-thirds of the population. The genome has a large number of mobile elements (160). Strikingly, there was no synteny with the genome of its nearest fully assembled relative, Anabaena sp. 90. Structural and functional genome analyses indicate that Anabaena sp. WA102 has a flexible genome. Genome closure, which can be readily achieved with long-read sequencing, reveals large scale (e.g., gene order) and local structural features that should be considered in understanding genome evolution and function.

  15. Arabinogalactan proteins have deep roots in eukaryotes

    DEFF Research Database (Denmark)

    Hervé, Cécile; Siméon, Amandine; Jam, Murielle

    2016-01-01

    Arabinogalactan proteins (AGPs) are highly glycosylated, hydroxyproline-rich proteins found at the cell surface of plants, where they play key roles in developmental processes. Brown algae are marine, multicellular, photosynthetic eukaryotes. They belong to the phylum Stramenopiles, which...

  16. Global MLST of Salmonella Typhi Revisited in Post-Genomic Era: Genetic conservation, Population Structure and Comparative genomics of rare sequence types

    Directory of Open Access Journals (Sweden)

    Kien-Pong eYap

    2016-03-01

    Full Text Available Typhoid fever, caused by Salmonella enterica serovar Typhi, remains an important public health burden in Southeast Asia and other endemic countries. Various genotyping methods have been applied to study the genetic variations of this human-restricted pathogen. Multilocus Sequence Typing (MLST is one of the widely accepted methods, and recently, there is a growing interest in the re-application of MLST in the post-genomic era. In this study, we provide the global MLST distribution of S. Typhi utilizing both publicly available 1,826 S. Typhi genome sequences in addition to performing conventional MLST on S. Typhi strains isolated from various endemic regions spanning over a century. Our global MLST analysis confirms the predominance of two sequence types (ST1 and ST2 co-existing in the endemic regions. Interestingly, S. Typhi strains with ST8 are currently confined within the African continent. Comparative genomic analyses of ST8 and other rare STs with genomes of ST1/ST2 revealed unique mutations in important virulence genes such as flhB, sipC and tviD that may explain the variations that differentiate between seemingly successful (widespread and unsuccessful (poor dissemination S. Typhi populations. Large scale whole-genome phylogeny demonstrated evidence of phylogeographical structuring and showed that ST8 may have diverged from the earlier ancestral population of ST1 and ST2, which later lost some of its fitness advantages, leading to poor worldwide dissemination. In response to the unprecedented increase in genomic data, this study demonstrates and highlights the utility of large-scale genome-based MLST as a quick and effective approach to narrow the scope of in-depth comparative genomic analysis and consequently provide new insights into the fine scale of pathogen evolution and population structure.

  17. A structural model of the genome packaging process in a membrane-containing double stranded DNA virus.

    Directory of Open Access Journals (Sweden)

    Chuan Hong

    2014-12-01

    Full Text Available Two crucial steps in the virus life cycle are genome encapsidation to form an infective virion and genome exit to infect the next host cell. In most icosahedral double-stranded (ds DNA viruses, the viral genome enters and exits the capsid through a unique vertex. Internal membrane-containing viruses possess additional complexity as the genome must be translocated through the viral membrane bilayer. Here, we report the structure of the genome packaging complex with a membrane conduit essential for viral genome encapsidation in the tailless icosahedral membrane-containing bacteriophage PRD1. We utilize single particle electron cryo-microscopy (cryo-EM and symmetry-free image reconstruction to determine structures of PRD1 virion, procapsid, and packaging deficient mutant particles. At the unique vertex of PRD1, the packaging complex replaces the regular 5-fold structure and crosses the lipid bilayer. These structures reveal that the packaging ATPase P9 and the packaging efficiency factor P6 form a dodecameric portal complex external to the membrane moiety, surrounded by ten major capsid protein P3 trimers. The viral transmembrane density at the special vertex is assigned to be a hexamer of heterodimer of proteins P20 and P22. The hexamer functions as a membrane conduit for the DNA and as a nucleating site for the unique vertex assembly. Our structures show a conformational alteration in the lipid membrane after the P9 and P6 are recruited to the virion. The P8-genome complex is then packaged into the procapsid through the unique vertex while the genome terminal protein P8 functions as a valve that closes the channel once the genome is inside. Comparing mature virion, procapsid, and mutant particle structures led us to propose an assembly pathway for the genome packaging apparatus in the PRD1 virion.

  18. A structural model of the genome packaging process in a membrane-containing double stranded DNA virus.

    Science.gov (United States)

    Hong, Chuan; Oksanen, Hanna M; Liu, Xiangan; Jakana, Joanita; Bamford, Dennis H; Chiu, Wah

    2014-12-01

    Two crucial steps in the virus life cycle are genome encapsidation to form an infective virion and genome exit to infect the next host cell. In most icosahedral double-stranded (ds) DNA viruses, the viral genome enters and exits the capsid through a unique vertex. Internal membrane-containing viruses possess additional complexity as the genome must be translocated through the viral membrane bilayer. Here, we report the structure of the genome packaging complex with a membrane conduit essential for viral genome encapsidation in the tailless icosahedral membrane-containing bacteriophage PRD1. We utilize single particle electron cryo-microscopy (cryo-EM) and symmetry-free image reconstruction to determine structures of PRD1 virion, procapsid, and packaging deficient mutant particles. At the unique vertex of PRD1, the packaging complex replaces the regular 5-fold structure and crosses the lipid bilayer. These structures reveal that the packaging ATPase P9 and the packaging efficiency factor P6 form a dodecameric portal complex external to the membrane moiety, surrounded by ten major capsid protein P3 trimers. The viral transmembrane density at the special vertex is assigned to be a hexamer of heterodimer of proteins P20 and P22. The hexamer functions as a membrane conduit for the DNA and as a nucleating site for the unique vertex assembly. Our structures show a conformational alteration in the lipid membrane after the P9 and P6 are recruited to the virion. The P8-genome complex is then packaged into the procapsid through the unique vertex while the genome terminal protein P8 functions as a valve that closes the channel once the genome is inside. Comparing mature virion, procapsid, and mutant particle structures led us to propose an assembly pathway for the genome packaging apparatus in the PRD1 virion.

  19. Considerations in the identification of functional RNA structural elements in genomic alignments

    Directory of Open Access Journals (Sweden)

    Blencowe Benjamin J

    2007-01-01

    Full Text Available Abstract Background Accurate identification of novel, functional noncoding (nc RNA features in genome sequence has proven more difficult than for exons. Current algorithms identify and score potential RNA secondary structures on the basis of thermodynamic stability, conservation, and/or covariance in sequence alignments. Neither the algorithms nor the information gained from the individual inputs have been independently assessed. Furthermore, due to issues in modelling background signal, it has been difficult to gauge the precision of these algorithms on a genomic scale, in which even a seemingly small false-positive rate can result in a vast excess of false discoveries. Results We developed a shuffling algorithm, shuffle-pair.pl, that simultaneously preserves dinucleotide frequency, gaps, and local conservation in pairwise sequence alignments. We used shuffle-pair.pl to assess precision and recall of six ncRNA search tools (MSARI, QRNA, ddbRNA, RNAz, Evofold, and several variants of simple thermodynamic stability on a test set of 3046 alignments of known ncRNAs. Relative to mononucleotide shuffling, preservation of dinucleotide content in shuffling the alignments resulted in a drastic increase in estimated false-positive detection rates for ncRNA elements, precluding evaluation of higher order alignments, which cannot not be adequately shuffled maintaining both dinucleotides and alignment structure. On pairwise alignments, none of the covariance-based tools performed markedly better than thermodynamic scoring alone. Although the high false-positive rates call into question the veracity of any individual predicted secondary structural element in our analysis, we nevertheless identified intriguing global trends in human genome alignments. The distribution of ncRNA prediction scores in 75-base windows overlapping UTRs, introns, and intergenic regions analyzed using both thermodynamic stability and EvoFold (which has no thermodynamic component was

  20. Combining Functional and Structural Genomics to Sample the Essential Burkholderia Structome

    Science.gov (United States)

    Baugh, Loren; Gallagher, Larry A.; Patrapuvich, Rapatbhorn; Clifton, Matthew C.; Gardberg, Anna S.; Edwards, Thomas E.; Armour, Brianna; Begley, Darren W.; Dieterich, Shellie H.; Dranow, David M.; Abendroth, Jan; Fairman, James W.; Fox, David; Staker, Bart L.; Phan, Isabelle; Gillespie, Angela; Choi, Ryan; Nakazawa-Hewitt, Steve; Nguyen, Mary Trang; Napuli, Alberto; Barrett, Lynn; Buchko, Garry W.; Stacy, Robin; Myler, Peter J.; Stewart, Lance J.; Manoil, Colin; Van Voorhis, Wesley C.

    2013-01-01

    Background The genus Burkholderia includes pathogenic gram-negative bacteria that cause melioidosis, glanders, and pulmonary infections of patients with cancer and cystic fibrosis. Drug resistance has made development of new antimicrobials critical. Many approaches to discovering new antimicrobials, such as structure-based drug design and whole cell phenotypic screens followed by lead refinement, require high-resolution structures of proteins essential to the parasite. Methodology/Principal Findings We experimentally identified 406 putative essential genes in B. thailandensis, a low-virulence species phylogenetically similar to B. pseudomallei, the causative agent of melioidosis, using saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq). We selected 315 protein products of these genes based on structure-determination criteria, such as excluding very large and/or integral membrane proteins, and entered them into the Seattle Structural Genomics Center for Infection Disease (SSGCID) structure determination pipeline. To maximize structural coverage of these targets, we applied an “ortholog rescue” strategy for those producing insoluble or difficult to crystallize proteins, resulting in the addition of 387 orthologs (or paralogs) from seven other Burkholderia species into the SSGCID pipeline. This structural genomics approach yielded structures from 31 putative essential targets from B. thailandensis, and 25 orthologs from other Burkholderia species, yielding an overall structural coverage for 49 of the 406 essential gene families, with a total of 88 depositions into the Protein Data Bank. Of these, 25 proteins have properties of a potential antimicrobial drug target i.e., no close human homolog, part of an essential metabolic pathway, and a deep binding pocket. We describe the structures of several potential drug targets in detail. Conclusions/Significance This collection of structures, solubility and experimental essentiality data

  1. Combining functional and structural genomics to sample the essential Burkholderia structome.

    Directory of Open Access Journals (Sweden)

    Loren Baugh

    Full Text Available The genus Burkholderia includes pathogenic gram-negative bacteria that cause melioidosis, glanders, and pulmonary infections of patients with cancer and cystic fibrosis. Drug resistance has made development of new antimicrobials critical. Many approaches to discovering new antimicrobials, such as structure-based drug design and whole cell phenotypic screens followed by lead refinement, require high-resolution structures of proteins essential to the parasite.We experimentally identified 406 putative essential genes in B. thailandensis, a low-virulence species phylogenetically similar to B. pseudomallei, the causative agent of melioidosis, using saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq. We selected 315 protein products of these genes based on structure-determination criteria, such as excluding very large and/or integral membrane proteins, and entered them into the Seattle Structural Genomics Center for Infection Disease (SSGCID structure determination pipeline. To maximize structural coverage of these targets, we applied an "ortholog rescue" strategy for those producing insoluble or difficult to crystallize proteins, resulting in the addition of 387 orthologs (or paralogs from seven other Burkholderia species into the SSGCID pipeline. This structural genomics approach yielded structures from 31 putative essential targets from B. thailandensis, and 25 orthologs from other Burkholderia species, yielding an overall structural coverage for 49 of the 406 essential gene families, with a total of 88 depositions into the Protein Data Bank. Of these, 25 proteins have properties of a potential antimicrobial drug target i.e., no close human homolog, part of an essential metabolic pathway, and a deep binding pocket. We describe the structures of several potential drug targets in detail.This collection of structures, solubility and experimental essentiality data provides a resource for development of drugs against

  2. Combining functional and structural genomics to sample the essential Burkholderia structome.

    Science.gov (United States)

    Baugh, Loren; Gallagher, Larry A; Patrapuvich, Rapatbhorn; Clifton, Matthew C; Gardberg, Anna S; Edwards, Thomas E; Armour, Brianna; Begley, Darren W; Dieterich, Shellie H; Dranow, David M; Abendroth, Jan; Fairman, James W; Fox, David; Staker, Bart L; Phan, Isabelle; Gillespie, Angela; Choi, Ryan; Nakazawa-Hewitt, Steve; Nguyen, Mary Trang; Napuli, Alberto; Barrett, Lynn; Buchko, Garry W; Stacy, Robin; Myler, Peter J; Stewart, Lance J; Manoil, Colin; Van Voorhis, Wesley C

    2013-01-01

    The genus Burkholderia includes pathogenic gram-negative bacteria that cause melioidosis, glanders, and pulmonary infections of patients with cancer and cystic fibrosis. Drug resistance has made development of new antimicrobials critical. Many approaches to discovering new antimicrobials, such as structure-based drug design and whole cell phenotypic screens followed by lead refinement, require high-resolution structures of proteins essential to the parasite. We experimentally identified 406 putative essential genes in B. thailandensis, a low-virulence species phylogenetically similar to B. pseudomallei, the causative agent of melioidosis, using saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq). We selected 315 protein products of these genes based on structure-determination criteria, such as excluding very large and/or integral membrane proteins, and entered them into the Seattle Structural Genomics Center for Infection Disease (SSGCID) structure determination pipeline. To maximize structural coverage of these targets, we applied an "ortholog rescue" strategy for those producing insoluble or difficult to crystallize proteins, resulting in the addition of 387 orthologs (or paralogs) from seven other Burkholderia species into the SSGCID pipeline. This structural genomics approach yielded structures from 31 putative essential targets from B. thailandensis, and 25 orthologs from other Burkholderia species, yielding an overall structural coverage for 49 of the 406 essential gene families, with a total of 88 depositions into the Protein Data Bank. Of these, 25 proteins have properties of a potential antimicrobial drug target i.e., no close human homolog, part of an essential metabolic pathway, and a deep binding pocket. We describe the structures of several potential drug targets in detail. This collection of structures, solubility and experimental essentiality data provides a resource for development of drugs against infections and diseases

  3. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea

    OpenAIRE

    Wolf Yuri I; Novichkov Pavel S; Sorokin Alexander V; Makarova Kira S; Koonin Eugene V

    2007-01-01

    Abstract Background An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt on such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs ...

  4. Structural genomics: keeping up with expanding knowledge of the protein universe

    Science.gov (United States)

    Grabowski, Marek; Joachimiak, Andrzej; Otwinowski, Zbyszek; Minor, Wladek

    2010-01-01

    Structural characterization of the protein universe is the main mission of Structural Genomics (SG) programs. However, progress in gene sequencing technology, set in motion in the 1990s, has resulted in rapid expansion of protein sequence space — a twelvefold increase in the past seven years. For the SG field, this creates new challenges and necessitates a reassessment of its strategies. Nevertheless, despite the growth of sequence space, at present nearly half of the content of the Swiss-Prot database and over 40% of Pfam protein families can be structurally modeled based on structures determined so far, with SG projects making an increasingly significant contribution. The SG contribution of new Pfam structures nearly doubled from 27.2% in 2003 to 51.6% in 2006. PMID:17587562

  5. Structural genomics: keeping up with expanding knowledge of the protein universe.

    Science.gov (United States)

    Grabowski, Marek; Joachimiak, Andrzej; Otwinowski, Zbyszek; Minor, Wladek

    2007-06-01

    Structural characterization of the protein universe is the main mission of Structural Genomics (SG) programs. However, progress in gene sequencing technology, set in motion in the 1990s, has resulted in rapid expansion of protein sequence space--a twelvefold increase in the past seven years. For the SG field, this creates new challenges and necessitates a re-assessment of its strategies. Nevertheless, despite the growth of sequence space, at present nearly half of the content of the Swiss-Prot database and over 40% of Pfam protein families can be structurally modeled based on structures determined so far, with SG projects making an increasingly significant contribution. The SG contribution of new Pfam structures nearly doubled from 27.2% in 2003 to 51.6% in 2006.

  6. Grass genomes

    OpenAIRE

    Bennetzen, Jeffrey L.; SanMiguel, Phillip; Chen, Mingsheng; Tikhonov, Alexander; Francki, Michael; Avramova, Zoya

    1998-01-01

    For the most part, studies of grass genome structure have been limited to the generation of whole-genome genetic maps or the fine structure and sequence analysis of single genes or gene clusters. We have investigated large contiguous segments of the genomes of maize, sorghum, and rice, primarily focusing on intergenic spaces. Our data indicate that much (>50%) of the maize genome is composed of interspersed repetitive DNAs, primarily nested retrotransposons that in...

  7. SV2: accurate structural variation genotyping and de novo mutation detection from whole genomes.

    Science.gov (United States)

    Antaki, Danny; Brandler, William M; Sebat, Jonathan

    2018-05-15

    Structural variation (SV) detection from short-read whole genome sequencing is error prone, presenting significant challenges for population or family-based studies of disease. Here, we describe SV2, a machine-learning algorithm for genotyping deletions and duplications from paired-end sequencing data. SV2 can rapidly integrate variant calls from multiple structural variant discovery algorithms into a unified call set with high genotyping accuracy and capability to detect de novo mutations. SV2 is freely available on GitHub (https://github.com/dantaki/SV2). jsebat@ucsd.edu. Supplementary data are available at Bioinformatics online.

  8. Population genomic structure and linkage disequilibrium analysis of South African goat breeds using genome-wide SNP data.

    Science.gov (United States)

    Mdladla, K; Dzomba, E F; Huson, H J; Muchadeyi, F C

    2016-08-01

    The sustainability of goat farming in marginal areas of southern Africa depends on local breeds that are adapted to specific agro-ecological conditions. Unimproved non-descript goats are the main genetic resources used for the development of commercial meat-type breeds of South Africa. Little is known about genetic diversity and the genetics of adaptation of these indigenous goat populations. This study investigated the genetic diversity, population structure and breed relations, linkage disequilibrium, effective population size and persistence of gametic phase in goat populations of South Africa. Three locally developed meat-type breeds of the Boer (n = 33), Savanna (n = 31), Kalahari Red (n = 40), a feral breed of Tankwa (n = 25) and unimproved non-descript village ecotypes (n = 110) from four goat-producing provinces of the Eastern Cape, KwaZulu-Natal, Limpopo and North West were assessed using the Illumina Goat 50K SNP Bead Chip assay. The proportion of SNPs with minor allele frequencies >0.05 ranged from 84.22% in the Tankwa to 97.58% in the Xhosa ecotype, with a mean of 0.32 ± 0.13 across populations. Principal components analysis, admixture and pairwise FST identified Tankwa as a genetically distinct population and supported clustering of the populations according to their historical origins. Genome-wide FST identified 101 markers potentially under positive selection in the Tankwa. Average linkage disequilibrium was highest in the Tankwa (r(2)  = 0.25 ± 0.26) and lowest in the village ecotypes (r(2) range = 0.09 ± 0.12 to 0.11 ± 0.14). We observed an effective population size of 100 kb with the exception of those in Savanna and Tswana populations. This study highlights the high level of genetic diversity in South African indigenous goats as well as the utility of the genome-wide SNP marker panels in genetic studies of these populations. © 2016 Stichting International Foundation for Animal Genetics.

  9. Comparative radiobiology of genetic loci of eukaryots as the basis of the general theory of mutations

    International Nuclear Information System (INIS)

    Aleksandrov, I.D.

    1983-01-01

    One of the fundamental problems of modern molecular cellular radiobiology is to reveal general and peculiar processes of the formation of gene mutations and chromosome aberrations in each stage of their formation in the irradiated genome of the higher eukaryots. The solution of the problems depends on the development of research within the framework of comparative radiobiology of genetic loci of the higher eukaryots that makes it possible to study quantitative regularities in the formation of gene (point) mutations and chromosome aberrations in one object and in the same experiment

  10. Once in a lifetime: strategies for preventing re-replication in prokaryotic and eukaryotic cells

    DEFF Research Database (Denmark)

    Nielsen, Olaf; Løbner-Olesen, Anders

    2008-01-01

    DNA replication is an extremely accurate process and cells have evolved intricate control mechanisms to ensure that each region of their genome is replicated only once during S phase. Here, we compare what is known about the processes that prevent re-replication in prokaryotic and eukaryotic cells...... prokaryotes and eukaryotes are inactivated until the next cell cycle. Furthermore, in both systems the beta-clamp of the replicative polymerase associates with enzymatic activities that contribute to the inactivation of the helicase loaders. Finally, recent studies suggest that the control mechanism...

  11. Population Genomics of Paramecium Species.

    Science.gov (United States)

    Johri, Parul; Krenek, Sascha; Marinov, Georgi K; Doak, Thomas G; Berendonk, Thomas U; Lynch, Michael

    2017-05-01

    Population-genomic analyses are essential to understanding factors shaping genomic variation and lineage-specific sequence constraints. The dearth of such analyses for unicellular eukaryotes prompted us to assess genomic variation in Paramecium, one of the most well-studied ciliate genera. The Paramecium aurelia complex consists of ∼15 morphologically indistinguishable species that diverged subsequent to two rounds of whole-genome duplications (WGDs, as long as 320 MYA) and possess extremely streamlined genomes. We examine patterns of both nuclear and mitochondrial polymorphism, by sequencing whole genomes of 10-13 worldwide isolates of each of three species belonging to the P. aurelia complex: P. tetraurelia, P. biaurelia, P. sexaurelia, as well as two outgroup species that do not share the WGDs: P. caudatum and P. multimicronucleatum. An apparent absence of global geographic population structure suggests continuous or recent dispersal of Paramecium over long distances. Intergenic regions are highly constrained relative to coding sequences, especially in P. caudatum and P. multimicronucleatum that have shorter intergenic distances. Sequence diversity and divergence are reduced up to ∼100-150 bp both upstream and downstream of genes, suggesting strong constraints imposed by the presence of densely packed regulatory modules. In addition, comparison of sequence variation at non-synonymous and synonymous sites suggests similar recent selective pressures on paralogs within and orthologs across the deeply diverging species. This study presents the first genome-wide population-genomic analysis in ciliates and provides a valuable resource for future studies in evolutionary and functional genetics in Paramecium. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  12. Supplementary Material for: Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance

    KAUST Repository

    Phelan, Jody; Coll, Francesc; McNerney, Ruth; Ascher, David; Pires, Douglas; Furnham, Nick; Coeck, Nele; Hill-Cawthorne, Grant; Nair, Mridul; Mallard, Kim; Ramsay, Andrew; Campino, Susana; Hibberd, Martin; Pain, Arnab; Rigouts, Leen; Clark, Taane

    2016-01-01

    Abstract Background Combating the spread of drug resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure

  13. Ribosomal DNA sequence heterogeneity reflects intraspecies phylogenies and predicts genome structure in two contrasting yeast species.

    Science.gov (United States)

    West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N

    2014-07-01

    The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of

  14. Gene Transfer in Eukaryotic Cells Using Activated Dendrimers

    Science.gov (United States)

    Dennig, Jörg

    Gene transfer into eukaryotic cells plays an important role in cell biology. Over the last 30 years a number of transfection methods have been developed to mediate gene transfer into eukaryotic cells. Classical methods include co-precipitation of DNA with calcium phosphate, charge-dependent precipitation of DNA with DEAE-dextran, electroporation of nucleic acids, and formation of transfection complexes between DNA and cationic liposomes. Gene transfer technologies based on activated PAMAM-dendrimers provide another class of transfection reagents. PAMAM-dendrimers are highly branched, spherical molecules. Activation of newly synthesized dendrimers involves hydrolytic removal of some of the branches, and results in a molecule with a higher degree of flexibility. Activated dendrimers assemble DNA into compact structures via charge interactions. Activated dendrimer - DNA complexes bind to the cell membrane of eukaryotic cells, and are transported into the cell by non-specific endocytosis. A structural model of the activated dendrimer - DNA complex and a potential mechanism for its uptake into cells will be discussed.

  15. SINEs as driving forces in genome evolution.

    Science.gov (United States)

    Schmitz, J

    2012-01-01

    SINEs are short interspersed elements derived from cellular RNAs that repetitively retropose via RNA intermediates and integrate more or less randomly back into the genome. SINEs propagate almost entirely vertically within their host cells and, once established in the germline, are passed on from generation to generation. As non-autonomous elements, their reverse transcription (from RNA to cDNA) and genomic integration depends on the activity of the enzymatic machinery of autonomous retrotransposons, such as long interspersed elements (LINEs). SINEs are widely distributed in eukaryotes, but are especially effectively propagated in mammalian species. For example, more than a million Alu-SINE copies populate the human genome (approximately 13% of genomic space), and few master copies of them are still active. In the organisms where they occur, SINEs are a challenge to genomic integrity, but in the long term also can serve as beneficial building blocks for evolution, contributing to phenotypic heterogeneity and modifying gene regulatory networks. They substantially expand the genomic space and introduce structural variation to the genome. SINEs have the potential to mutate genes, to alter gene expression, and to generate new parts of genes. A balanced distribution and controlled activity of such properties is crucial to maintaining the organism's dynamic and thriving evolution. Copyright © 2012 S. Karger AG, Basel.

  16. De Novo Discovery of Structured ncRNA Motifs in Genomic Sequences

    DEFF Research Database (Denmark)

    Ruzzo, Walter L; Gorodkin, Jan

    2014-01-01

    De novo discovery of "motifs" capturing the commonalities among related noncoding ncRNA structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphas...... on an approach based on the CMfinder CMfinder program as a case study. Applications to genomic screens for novel de novo structured ncRNA ncRNA s, including structured RNA elements in untranslated portions of protein-coding genes, are presented.......De novo discovery of "motifs" capturing the commonalities among related noncoding ncRNA structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis...

  17. Structure and mechanism of the ATPase that powers viral genome packaging.

    Science.gov (United States)

    Hilbert, Brendan J; Hayes, Janelle A; Stone, Nicholas P; Duffy, Caroline M; Sankaran, Banumathi; Kelch, Brian A

    2015-07-21

    Many viruses package their genomes into procapsids using an ATPase machine that is among the most powerful known biological motors. However, how this motor couples ATP hydrolysis to DNA translocation is still unknown. Here, we introduce a model system with unique properties for studying motor structure and mechanism. We describe crystal structures of the packaging motor ATPase domain that exhibit nucleotide-dependent conformational changes involving a large rotation of an entire subdomain. We also identify the arginine finger residue that catalyzes ATP hydrolysis in a neighboring motor subunit, illustrating that previous models for motor structure need revision. Our findings allow us to derive a structural model for the motor ring, which we validate using small-angle X-ray scattering and comparisons with previously published data. We illustrate the model's predictive power by identifying the motor's DNA-binding and assembly motifs. Finally, we integrate our results to propose a mechanistic model for DNA translocation by this molecular machine.

  18. Repair of DNA DSB in higher eukaryotes

    International Nuclear Information System (INIS)

    Wang, H.; Perrault, A.R.; Takeda, Y.; Iliakis, G.

    2003-01-01

    Cells of higher eukaryotes process within minutes double strand breaks (DSBs) in their genome using a NHEJ apparatus that engages DNA-PKcs, Ku, DNA ligase IV, XRCC4, and other as of yet unidentified factors. Although chemical inhibition, or mutation, in any of these factors delays processing, cells ultimately remove the majority of DNA DSBs using an alternative pathway operating with slower kinetics. This alternative pathway is active in mutants deficient in genes of the RAD52 epistasis group. We proposed, therefore, that it reflects an alternative form of NHEJ that operates as a backup (B-NHEJ) to the DNA-PK- dependent (D-NHEJ) pathway, rather than homology directed repair of DSBs. We studied the role of Ku and DNA-PKcs in the coordination of these pathways using as a model end joining of restriction endonuclease linearized plasmid DNA in whole cell extracts. Efficient error-free endjoining observed in such in-vitro reactions is strongly inhibited by anti-Ku antibodies. The inhibition requires DNA-PKcs, despite that fact that Ku efficiently binds DNA ends in the presence of antibodies, or in the absence of DNA-PKcs. Strong inhibition of DNA endjoining is also mediated by wortmannin, an inhibitor of DNA-PKcs, in the presence but not in the absence of Ku, and this inhibition can be rescued by pre-incubating the reaction with double stranded oligonucleotides. The results are compatible with a role of Ku in directing endjoining to a DNA-PK dependent pathway, mediated by efficient end binding and productive interactions with DNA-PKcs. On the other hand, efficient end joining is observed in extracts of cells lacking DNA-PKcs, as well as in Ku-depleted extracts sugggesting the operation of alternative pathways. Extracts depleted of Ku and DNA-PKcs rejoin blunt ends, as well as homologous ends with 3' or 5' protruding single strands with similar efficiency, but addition of Ku suppresses joining of blunt ends and homologous ends with 3' overhangs. We propose that the

  19. The interplay between genome and network evolution in eukaryotes

    NARCIS (Netherlands)

    Fokkens, L.

    2013-01-01

    We assume that the functional relations that a protein engages in, influences it’s evolutionary dynamics. An extreme case is a functional module in which all components are strongly dependent on each other, for example a protein complex or metabolic pathway. If a protein’s function is only relevant

  20. Phylogenetic analysis of the core histone doublet and DNA topo II genes of Marseilleviridae: evidence of proto-eukaryotic provenance.

    Science.gov (United States)

    Erives, Albert J

    2017-11-28

    While the genomes of eukaryotes and Archaea both encode the histone-fold domain, only eukaryotes encode the core histone paralogs H2A, H2B, H3, and H4. With DNA, these core histones assemble into the nucleosomal octamer underlying eukaryotic chromatin. Importantly, core histones for H2A and H3 are maintained as neofunctionalized paralogs adapted for general bulk chromatin (canonical H2 and H3) or specialized chromatin (H2A.Z enriched at gene promoters and cenH3s enriched at centromeres). In this context, the identification of core histone-like "doublets" in the cytoplasmic replication factories of the Marseilleviridae (MV) is a novel finding with possible relevance to understanding the origin of eukaryotic chromatin. Here, we analyze and compare the core histone doublet genes from all known MV genomes as well as other MV genes relevant to the origin of the eukaryotic replisome. Using different phylogenetic approaches, we show that MV histone domains encode obligate H2B-H2A and H4-H3 dimers of possible proto-eukaryotic origin. MV core histone moieties form sister clades to each of the four eukaryotic clades of canonical and variant core histones. This suggests that MV core histone moieties diverged prior to eukaryotic neofunctionalizations associated with paired linear chromosomes and variant histone octamer assembly. We also show that MV genomes encode a proto-eukaryotic DNA topoisomerase II enzyme that forms a sister clade to eukaryotes. This is a relevant finding given that DNA topo II influences histone deposition and chromatin compaction and is the second most abundant nuclear protein after histones. The combined domain architecture and phylogenomic analyses presented here suggest that a primitive origin for MV histone genes is a more parsimonious explanation than horizontal gene transfers + gene fusions + sufficient divergence to eliminate relatedness to eukaryotic neofunctionalizations within the H2A and H3 clades without loss of relatedness to each of

  1. Elevated Rate of Genome Rearrangements in Radiation-Resistant Bacteria.

    Science.gov (United States)

    Repar, Jelena; Supek, Fran; Klanjscek, Tin; Warnecke, Tobias; Zahradka, Ksenija; Zahradka, Davor

    2017-04-01

    A number of bacterial, archaeal, and eukaryotic species are known for their resistance to ionizing radiation. One of the challenges these species face is a potent environmental source of DNA double-strand breaks, potential drivers of genome structure evolution. Efficient and accurate DNA double-strand break repair systems have been demonstrated in several unrelated radiation-resistant species and are putative adaptations to the DNA damaging environment. Such adaptations are expected to compensate for the genome-destabilizing effect of environmental DNA damage and may be expected to result in a more conserved gene order in radiation-resistant species. However, here we show that rates of genome rearrangements, measured as loss of gene order conservation with time, are higher in radiation-resistant species in multiple, phylogenetically independent groups of bacteria. Comparison of indicators of selection for genome organization between radiation-resistant and phylogenetically matched, nonresistant species argues against tolerance to disruption of genome structure as a strategy for radiation resistance. Interestingly, an important mechanism affecting genome rearrangements in prokaryotes, the symmetrical inversions around the origin of DNA replication, shapes genome structure of both radiation-resistant and nonresistant species. In conclusion, the opposing effects of environmental DNA damage and DNA repair result in elevated rates of genome rearrangements in radiation-resistant bacteria. Copyright © 2017 Repar et al.

  2. Whole genome PCR scanning reveals the syntenic genome structure of toxigenic Vibrio cholerae strains in the O1/O139 population.

    Directory of Open Access Journals (Sweden)

    Bo Pang

    Full Text Available Vibrio cholerae is commonly found in estuarine water systems. Toxigenic O1 and O139 V. cholerae strains have caused cholera epidemics and pandemics, whereas the nontoxigenic strains within these serogroups only occasionally lead to disease. To understand the differences in the genome and clonality between the toxigenic and nontoxigenic strains of V. cholerae serogroups O1 and O139, we employed a whole genome PCR scanning (WGPScanning method, an rrn operon-mediated fragment rearrangement analysis and comparative genomic hybridization (CGH to analyze the genome structure of different strains. WGPScanning in conjunction with CGH revealed that the genomic contents of the toxigenic strains were conservative, except for a few indels located mainly in mobile elements. Minor nucleotide variation in orthologous genes appeared to be the major difference between the toxigenic strains. rrn operon-mediated rearrangements were infrequent in El Tor toxigenic strains tested using I-CeuI digested pulsed-field gel electrophoresis (PFGE analysis and PCR analysis based on flanking sequence of rrn operons. Using these methods, we found that the genomic structures of toxigenic El Tor and O139 strains were syntenic. The nontoxigenic strains exhibited more extensive sequence variations, but toxin coregulated pilus positive (TCP+ strains had a similar structure. TCP+ nontoxigenic strains could be subdivided into multiple lineages according to the TCP type, suggesting the existence of complex intermediates in the evolution of toxigenic strains. The data indicate that toxigenic O1 El Tor and O139 strains were derived from a single lineage of intermediates from complex clones in the environment. The nontoxigenic strains with non-El Tor type TCP may yet evolve into new epidemic clones after attaining toxigenic attributes.

  3. Population Structure and Genomic Breed Composition in an Angus-Brahman Crossbred Cattle Population.

    Science.gov (United States)

    Gobena, Mesfin; Elzo, Mauricio A; Mateescu, Raluca G

    2018-01-01

    Crossbreeding is a common strategy used in tropical and subtropical regions to enhance beef production, and having accurate knowledge of breed composition is essential for the success of a crossbreeding program. Although pedigree records have been traditionally used to obtain the breed composition of crossbred cattle, the accuracy of pedigree-based breed composition can be reduced by inaccurate and/or incomplete records and Mendelian sampling. Breed composition estimation from genomic data has multiple advantages including higher accuracy without being affected by missing, incomplete, or inaccurate records and the ability to be used as independent authentication of breed in breed-labeled beef products. The present study was conducted with 676 Angus-Brahman crossbred cattle with genotype and pedigree information to evaluate the feasibility and accuracy of using genomic data to determine breed composition. We used genomic data in parametric and non-parametric methods to detect population structure due to differences in breed composition while accounting for the confounding effect of close familial relationships. By applying principal component analysis (PCA) and the maximum likelihood method of ADMIXTURE to genomic data, it was possible to successfully characterize population structure resulting from heterogeneous breed ancestry, while accounting for close familial relationships. PCA results offered additional insight into the different hierarchies of genetic variation structuring. The first principal component was strongly correlated with Angus-Brahman proportions, and the second represented variation within animals that have a relatively more extended Brangus lineage-indicating the presence of a distinct pattern of genetic variation in these cattle. Although there was strong agreement between breed proportions estimated from pedigree and genetic information, there were significant discrepancies between these two methods for certain animals. This was most likely due

  4. Population Structure and Genomic Breed Composition in an Angus–Brahman Crossbred Cattle Population

    Directory of Open Access Journals (Sweden)

    Mesfin Gobena

    2018-03-01

    Full Text Available Crossbreeding is a common strategy used in tropical and subtropical regions to enhance beef production, and having accurate knowledge of breed composition is essential for the success of a crossbreeding program. Although pedigree records have been traditionally used to obtain the breed composition of crossbred cattle, the accuracy of pedigree-based breed composition can be reduced by inaccurate and/or incomplete records and Mendelian sampling. Breed composition estimation from genomic data has multiple advantages including higher accuracy without being affected by missing, incomplete, or inaccurate records and the ability to be used as independent authentication of breed in breed-labeled beef products. The present study was conducted with 676 Angus–Brahman crossbred cattle with genotype and pedigree information to evaluate the feasibility and accuracy of using genomic data to determine breed composition. We used genomic data in parametric and non-parametric methods to detect population structure due to differences in breed composition while accounting for the confounding effect of close familial relationships. By applying principal component analysis (PCA and the maximum likelihood method of ADMIXTURE to genomic data, it was possible to successfully characterize population structure resulting from heterogeneous breed ancestry, while accounting for close familial relationships. PCA results offered additional insight into the different hierarchies of genetic variation structuring. The first principal component was strongly correlated with Angus–Brahman proportions, and the second represented variation within animals that have a relatively more extended Brangus lineage—indicating the presence of a distinct pattern of genetic variation in these cattle. Although there was strong agreement between breed proportions estimated from pedigree and genetic information, there were significant discrepancies between these two methods for certain animals

  5. Insights into the Initiation of Eukaryotic DNA Replication.

    Science.gov (United States)

    Bruck, Irina; Perez-Arnaiz, Patricia; Colbert, Max K; Kaplan, Daniel L

    2015-01-01

    The initiation of DNA replication is a highly regulated event in eukaryotic cells to ensure that the entire genome is copied once and only once during S phase. The primary target of cellular regulation of eukaryotic DNA replication initiation is the assembly and activation of the replication fork helicase, the 11-subunit assembly that unwinds DNA at a replication fork. The replication fork helicase, called CMG for Cdc45-Mcm2-7, and GINS, assembles in S phase from the constituent Cdc45, Mcm2-7, and GINS proteins. The assembly and activation of the CMG replication fork helicase during S phase is governed by 2 S-phase specific kinases, CDK and DDK. CDK stimulates the interaction between Sld2, Sld3, and Dpb11, 3 initiation factors that are each required for the initiation of DNA replication. DDK, on the other hand, phosphorylates the Mcm2, Mcm4, and Mcm6 subunits of the Mcm2-7 complex. Sld3 recruits Cdc45 to Mcm2-7 in a manner that depends on DDK, and recent work suggests that Sld3 binds directly to Mcm2-7 and also to single-stranded DNA. Furthermore, recent work demonstrates that Sld3 and its human homolog Treslin substantially stimulate DDK phosphorylation of Mcm2. These data suggest that the initiation factor Sld3/Treslin coordinates the assembly and activation of the eukaryotic replication fork helicase by recruiting Cdc45 to Mcm2-7, stimulating DDK phosphorylation of Mcm2, and binding directly to single-stranded DNA as the origin is melted.

  6. A second pathway to degrade pyrimidine nucleic acid precursors in eukaryotes

    DEFF Research Database (Denmark)

    Andersen, Gorm; Bjornberg, Olof; Polakova, Silvia

    2008-01-01

    Pyrimidine bases are the central precursors for RNA and DNA, and their intracellular pools are determined by de novo, salvage and catabolic pathways. In eukaryotes, degradation of uracil has been believed to proceed only via the reduction to dihydrouracil. Using a yeast model, Saccharomyces kluyv...... of the eukaryotic or prokaryotic genes involved in pyrimidine degradation described to date.......Pyrimidine bases are the central precursors for RNA and DNA, and their intracellular pools are determined by de novo, salvage and catabolic pathways. In eukaryotes, degradation of uracil has been believed to proceed only via the reduction to dihydrouracil. Using a yeast model, Saccharomyces......, respectively. The gene products of URC1 and URC4 are highly conserved proteins with so far unknown functions and they are present in a variety of prokaryotes and fungi. In bacteria and in some fungi, URC1 and URC4 are linked on the genome together with the gene for uracil phosphoribosyltransferase (URC6). Urc1...

  7. A novel component of the mitochondrial genome segregation machinery in trypanosomes

    Directory of Open Access Journals (Sweden)

    Anneliese Hoffmann

    2016-07-01

    Full Text Available We recently described a new component (TAC102 of the mitochondrial genome segregation machinery (mtGSM in the protozoan parasite Trypanosoma brucei. T. brucei belongs to a group of organisms that contain a single mitochondrial organelle with a single mitochondrial genome (mt-genome per cell. The mt-genome consists of 5000 minicircles (1 kb and 25 maxicircles (23 kb that are catenated into a large network. After replication of the network its segregation is driven by the separating basal bodies, which are homologous structures to the centrioles organizing the spindle apparatus in many eukaryotes. The structure connecting the basal body to the mt-genome was named the Tripartite Attachment Complex (TAC owing its name to the distribution across three areas in the cell including the two mitochondrial membranes.

  8. Common structural and epigenetic changes in the genome of castration-resistant prostate cancer.

    Science.gov (United States)

    Friedlander, Terence W; Roy, Ritu; Tomlins, Scott A; Ngo, Vy T; Kobayashi, Yasuko; Azameera, Aruna; Rubin, Mark A; Pienta, Kenneth J; Chinnaiyan, Arul; Ittmann, Michael M; Ryan, Charles J; Paris, Pamela L

    2012-02-01

    Progression of primary prostate cancer to castration-resistant prostate cancer (CRPC) is associated with numerous genetic and epigenetic alterations that are thought to promote survival at metastatic sites. In this study, we investigated gene copy number and CpG methylation status in CRPC to gain insight into specific pathophysiologic pathways that are active in this advanced form of prostate cancer. Our analysis defined and validated 495 genes exhibiting significant differences in CRPC in gene copy number, including gains in androgen receptor (AR) and losses of PTEN and retinoblastoma 1 (RB1). Significant copy number differences existed between tumors with or without AR gene amplification, including a common loss of AR repressors in AR-unamplified tumors. Simultaneous gene methylation and allelic deletion occurred frequently in RB1 and HSD17B2, the latter of which is involved in testosterone metabolism. Lastly, genomic DNA from most CRPC was hypermethylated compared with benign prostate tissue. Our findings establish a comprehensive methylation signature that couples epigenomic and structural analyses, thereby offering insights into the genomic alterations in CRPC that are associated with a circumvention of hormonal therapy. Genes identified in this integrated genomic study point to new drug targets in CRPC, an incurable disease state which remains the chief therapeutic challenge. ©2012 AACR.

  9. Primary structure of the human follistatin precursor and its genomic organization

    International Nuclear Information System (INIS)

    Shimasaki, Shunichi; Koga, Makoto; Esch, F.

    1988-01-01

    Follistatin is a single-chain gonadal protein that specifically inhibits follicle-stimulating hormone release. By use of the recently characterized porcine follistatin cDNA as a probe to screen a human testis cDNA library and a genomic library, the structure of the complete human follistatin precursor as well as its genomic organization have been determined. Three of eight cDNA clones that were sequenced predicted a precursor with 344 amino acids, whereas the remaining five cDNA clones encoded a 317 amino acid precursor, resulting from alternative splicing of the precursor mRNA. Mature follistatins contain four contiguous domains that are encoded by precisely separated exons; three of the domains are highly similar to each other, as well as to human epidermal growth factor and human pancreatic secretory trypsin inhibitor. The genomic organization of the human follistatin is similar to that of the human epidermal growth factor gene and thus supports the notion of exon shuffling during evolution

  10. DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo.

    Science.gov (United States)

    Zubradt, Meghan; Gupta, Paromita; Persad, Sitara; Lambowitz, Alan M; Weissman, Jonathan S; Rouskin, Silvi

    2017-01-01

    Coupling of structure-specific in vivo chemical modification to next-generation sequencing is transforming RNA secondary structure studies in living cells. The dominant strategy for detecting in vivo chemical modifications uses reverse transcriptase truncation products, which introduce biases and necessitate population-average assessments of RNA structure. Here we present dimethyl sulfate (DMS) mutational profiling with sequencing (DMS-MaPseq), which encodes DMS modifications as mismatches using a thermostable group II intron reverse transcriptase. DMS-MaPseq yields a high signal-to-noise ratio, can report multiple structural features per molecule, and allows both genome-wide studies and focused in vivo investigations of even low-abundance RNAs. We apply DMS-MaPseq for the first analysis of RNA structure within an animal tissue and to identify a functional structure involved in noncanonical translation initiation. Additionally, we use DMS-MaPseq to compare the in vivo structure of pre-mRNAs with their mature isoforms. These applications illustrate DMS-MaPseq's capacity to dramatically expand in vivo analysis of RNA structure.

  11. Communities of microbial eukaryotes in the mammalian gut within the context of environmental eukaryotic diversity

    Energy Technology Data Exchange (ETDEWEB)

    Parfrey, Laura Wegener; Walters, William A.; Lauber, Christian L.; Clemente, Jose C.; Berg-Lyons, Donna; Teiling, Clotilde; Kodira, Chinnappa; Mohiuddin, Mohammed; Brunelle, Julie; Driscoll, Mark; Fierer, Noah; Gilbert, Jack A.; Knight, Rob

    2014-06-19

    Eukaryotic microbes (protists) residing in the vertebrate gut influence host health and disease, but their diversity and distribution in healthy hosts is poorly understood. Protists found in the gut are typically considered parasites, but many are commensal and some are beneficial. Further, the hygiene hypothesis predicts that association with our co-evolved microbial symbionts may be important to overall health. It is therefore imperative that we understand the normal diversity of our eukaryotic gut microbiota to test for such effects and avoid eliminating commensal organisms. We assembled a dataset of healthy individuals from two populations, one with traditional, agrarian lifestyles and a second with modern, westernized lifestyles, and characterized the human eukaryotic microbiota via high-throughput sequencing. To place the human gut microbiota within a broader context our dataset also includes gut samples from diverse mammals and samples from other aquatic and terrestrial environments. We curated the SILVA ribosomal database to reflect current knowledge of eukaryotic taxonomy and employ it as a phylogenetic framework to compare eukaryotic diversity across environment. We show that adults from the non-western population harbor a diverse community of protists, and diversity in the human gut is comparable to that in other mammals. However, the eukaryotic microbiota of the western population appears depauperate. The distribution of symbionts found in mammals reflects both host phylogeny and diet. Eukaryotic microbiota in the gut are less diverse and more patchily distributed than bacteria. More broadly, we show that eukaryotic communities in the gut are less diverse than in aquatic and terrestrial habitats, and few taxa are shared across habitat types, and diversity patterns of eukaryotes are correlated with those observed for bacteria. These results outline the distribution and diversity of microbial eukaryotic communities in the mammalian gut and across

  12. Characterization of an eukaryotic peptide deformylase from Plasmodium falciparum.

    Science.gov (United States)

    Bracchi-Ricard, V; Nguyen, K T; Zhou, Y; Rajagopalan, P T; Chakrabarti, D; Pei, D

    2001-12-15

    Ribosomal protein synthesis in eubacteria and eukaryotic organelles initiates with an N-formylmethionyl-tRNA(i), resulting in N-terminal formylation of all nascent polypeptides. Peptide deformylase (PDF) catalyzes the subsequent removal of the N-terminal formyl group from the majority of bacterial proteins. Until recently, PDF has been thought as an enzyme unique to the bacterial kingdom. Searches of the genomic DNA databases identified several genes that encode proteins of high sequence homology to bacterial PDF from eukaryotic organisms. The cDNA encoding Plasmodium falciparum PDF (PfPDF) has been cloned and overexpressed in Escherichia coli. The recombinant protein is catalytically active in deformylating N-formylated peptides, shares many of the properties of bacterial PDF, and is inhibited by specific PDF inhibitors. Western blot analysis indicated expression of mature PfPDF in trophozoite, schizont, and segmenter stages of intraerythrocytic development. These results provide strong evidence that a functional PDF is present in P. falciparum. In addition, PDF inhibitors inhibited the growth of P. falciparum in the intraerythrocytic culture. (c)2001 Elsevier Science.

  13. Eukaryotes first: how could that be?

    Science.gov (United States)

    Mariscal, Carlos; Doolittle, W Ford

    2015-09-26

    In the half century since the formulation of the prokaryote : eukaryote dichotomy, many authors have proposed that the former evolved from something resembling the latter, in defiance of common (and possibly common sense) views. In such 'eukaryotes first' (EF) scenarios, the last universal common ancestor is imagined to have possessed significantly many of the complex characteristics of contemporary eukaryotes, as relics of an earlier 'progenotic' period or RNA world. Bacteria and Archaea thus must have lost these complex features secondarily, through 'streamlining'. If the canonical three-domain tree in which Archaea and Eukarya are sisters is accepted, EF entails that Bacteria and Archaea are convergently prokaryotic. We ask what this means and how it might be tested. © 2015 The Author(s).

  14. Cloning, production, and purification of proteins for a medium-scale structural genomics project.

    Science.gov (United States)

    Quevillon-Cheruel, Sophie; Collinet, Bruno; Trésaugues, Lionel; Minard, Philippe; Henckes, Gilles; Aufrère, Robert; Blondeau, Karine; Zhou, Cong-Zhao; Liger, Dominique; Bettache, Nabila; Poupon, Anne; Aboulfath, Ilham; Leulliot, Nicolas; Janin, Joël; van Tilbeurgh, Herman

    2007-01-01

    The South-Paris Yeast Structural Genomics Pilot Project (http://www.genomics.eu.org) aims at systematically expressing, purifying, and determining the three-dimensional structures of Saccharomyces cerevisiae proteins. We have already cloned 240 yeast open reading frames in the Escherichia coli pET system. Eighty-two percent of the targets can be expressed in E. coli, and 61% yield soluble protein. We have currently purified 58 proteins. Twelve X-ray structures have been solved, six are in progress, and six other proteins gave crystals. In this chapter, we present the general experimental flowchart applied for this project. One of the main difficulties encountered in this pilot project was the low solubility of a great number of target proteins. We have developed parallel strategies to recover these proteins from inclusion bodies, including refolding, coexpression with chaperones, and an in vitro expression system. A limited proteolysis protocol, developed to localize flexible regions in proteins that could hinder crystallization, is also described.

  15. Multiple independent structural dynamic events in the evolution of snake mitochondrial genomes.

    Science.gov (United States)

    Qian, Lifu; Wang, Hui; Yan, Jie; Pan, Tao; Jiang, Shanqun; Rao, Dingqi; Zhang, Baowei

    2018-05-10

    Mitochondrial DNA sequences have long been used in phylogenetic studies. However, little attention has been paid to the changes in gene arrangement patterns in the snake's mitogenome. Here, we analyzed the complete mitogenome sequences and structures of 65 snake species from 14 families and examined their structural patterns, organization and evolution. Our purpose was to further investigate the evolutionary implications and possible rearrangement mechanisms of the mitogenome within snakes. In total, eleven types of mitochondrial gene arrangement patterns were detected (Type I, II, III, III-A, III-B, III-B1, III-C, III-D, III-E, III-F, III-G), with mitochondrial genome rearrangements being a major trend in snakes, especially in Alethinophidia. In snake mitogenomes, the rearrangements mainly involved three processes, gene loss, translocation and duplication. Within Scolecophidia, the O L was lost several times in Typhlopidae and Leptotyphlopidae, but persisted as a plesiomorphy in the Alethinophidia. Duplication of the control region and translocation of the tRNA Leu gene are two visible features in Alethinophidian mitochondrial genomes. Independently and stochastically, the duplication of pseudo-Pro (P*) emerged in seven different lineages of unequal size in three families, indicating that the presence of P* was a polytopic event in the mitogenome. The WANCY tRNA gene cluster and the control regions and their adjacent segments were hotspots for mitogenome rearrangement. Maintenance of duplicate control regions may be the source for snake mitogenome structural diversity.

  16. Mitochondrial Genome Sequences and Structures Aid in the Resolution of Piroplasmida phylogeny

    Science.gov (United States)

    Marr, Henry S.; Tarigo, Jaime L.; Cohn, Leah A.; Bird, David M.; Scholl, Elizabeth H.; Levy, Michael G.; Wiegmann, Brian M.; Birkenheuer, Adam J.

    2016-01-01

    The taxonomy of the order Piroplasmida, which includes a number of clinically and economically relevant organisms, is a hotly debated topic amongst parasitologists. Three genera (Babesia, Theileria, and Cytauxzoon) are recognized based on parasite life cycle characteristics, but molecular phylogenetic analyses of 18S sequences have suggested the presence of five or more distinct Piroplasmida lineages. Despite these important advancements, a few studies have been unable to define the taxonomic relationships of some organisms (e.g. C. felis and T. equi) with respect to other Piroplasmida. Additional evidence from mitochondrial genome sequences and synteny should aid in the inference of Piroplasmida phylogeny and resolution of taxonomic uncertainties. In this study, we have amplified, sequenced, and annotated seven previously uncharacterized mitochondrial genomes (Babesia canis, Babesia vogeli, Babesia rossi, Babesia sp. Coco, Babesia conradae, Babesia microti-like sp., and Cytauxzoon felis) and identified additional ribosomal fragments in ten previously characterized mitochondrial genomes. Phylogenetic analysis of concatenated mitochondrial and 18S sequences as well as cox1 amino acid sequence identified five distinct Piroplasmida groups, each of which possesses a unique mitochondrial genome structure. Specifically, our results confirm the existence of four previously identified clades (B. microti group, Babesia sensu stricto, Theileria equi, and a Babesia sensu latu group that includes B. conradae) while supporting the integration of Theileria and Cytauxzoon species into a single fifth taxon. Although known biological characteristics of Piroplasmida corroborate the proposed phylogeny, more investigation into parasite life cycles is warranted to further understand the evolution of the Piroplasmida. Our results provide an evolutionary framework for comparative biology of these important animal and human pathogens and help focus renewed efforts toward understanding the

  17. Mitochondrial Genome Sequences and Structures Aid in the Resolution of Piroplasmida phylogeny.

    Directory of Open Access Journals (Sweden)

    Megan E Schreeg

    Full Text Available The taxonomy of the order Piroplasmida, which includes a number of clinically and economically relevant organisms, is a hotly debated topic amongst parasitologists. Three genera (Babesia, Theileria, and Cytauxzoon are recognized based on parasite life cycle characteristics, but molecular phylogenetic analyses of 18S sequences have suggested the presence of five or more distinct Piroplasmida lineages. Despite these important advancements, a few studies have been unable to define the taxonomic relationships of some organisms (e.g. C. felis and T. equi with respect to other Piroplasmida. Additional evidence from mitochondrial genome sequences and synteny should aid in the inference of Piroplasmida phylogeny and resolution of taxonomic uncertainties. In this study, we have amplified, sequenced, and annotated seven previously uncharacterized mitochondrial genomes (Babesia canis, Babesia vogeli, Babesia rossi, Babesia sp. Coco, Babesia conradae, Babesia microti-like sp., and Cytauxzoon felis and identified additional ribosomal fragments in ten previously characterized mitochondrial genomes. Phylogenetic analysis of concatenated mitochondrial and 18S sequences as well as cox1 amino acid sequence identified five distinct Piroplasmida groups, each of which possesses a unique mitochondrial genome structure. Specifically, our results confirm the existence of four previously identified clades (B. microti group, Babesia sensu stricto, Theileria equi, and a Babesia sensu latu group that includes B. conradae while supporting the integration of Theileria and Cytauxzoon species into a single fifth taxon. Although known biological characteristics of Piroplasmida corroborate the proposed phylogeny, more investigation into parasite life cycles is warranted to further understand the evolution of the Piroplasmida. Our results provide an evolutionary framework for comparative biology of these important animal and human pathogens and help focus renewed efforts toward

  18. Complete Sequence and Analysis of the Mitochondrial Genome of Hemiselmis andersenii CCMP644 (Cryptophyceae

    Directory of Open Access Journals (Sweden)

    Bowman Sharen

    2008-05-01

    Full Text Available Abstract Background Cryptophytes are an enigmatic group of unicellular eukaryotes with plastids derived by secondary (i.e., eukaryote-eukaryote endosymbiosis. Cryptophytes are unusual in that they possess four genomes–a host cell-derived nuclear and mitochondrial genome and an endosymbiont-derived plastid and 'nucleomorph' genome. The evolutionary origins of the host and endosymbiont components of cryptophyte algae are at present poorly understood. Thus far, a single complete mitochondrial genome sequence has been determined for the cryptophyte Rhodomonas salina. Here, the second complete mitochondrial genome of the cryptophyte alga Hemiselmis andersenii CCMP644 is presented. Results The H. andersenii mtDNA is 60,553 bp in size and encodes 30 structural RNAs and 36 protein-coding genes, all located on the same strand. A prominent feature of the genome is the presence of a ~20 Kbp long intergenic region comprised of numerous tandem and dispersed repeat units of between 22–336 bp. Adjacent to these repeats are 27 copies of palindromic sequences predicted to form stable DNA stem-loop structures. One such stem-loop is located near a GC-rich and GC-poor region and may have a regulatory function in replication or transcription. The H. andersenii mtDNA shares a number of features in common with the genome of the cryptophyte Rhodomonas salina, including general architecture, gene content, and the presence of a large repeat region. However, the H. andersenii mtDNA is devoid of inverted repeats and introns, which are present in R. salina. Comparative analyses of the suite of tRNAs encoded in the two genomes reveal that the H. andersenii mtDNA has lost or converted its original trnK(uuu gene and possesses a trnS-derived 'trnK(uuu', which appears unable to produce a functional tRNA. Mitochondrial protein coding gene phylogenies strongly support a variety of previously established eukaryotic groups, but fail to resolve the relationships among higher

  19. A genomic perspective on protein tyrosine phosphatases: gene structure, pseudogenes, and genetic disease linkage

    DEFF Research Database (Denmark)

    Andersen, Jannik N; Jansen, Peter G; Echwald, Søren M

    2004-01-01

    sequence databases, we discovered one novel human PTP gene and defined chromosomal loci and exon structure of the additional 37 genes encoding known PTP transcripts. Direct orthologs were present in the mouse genome for all 38 human PTP genes. In addition, we identified 12 PTP pseudogenes unique to humans...... that have probably contaminated previous bioinformatics analysis of this gene family. PCR amplification and transcript sequencing indicate that some PTP pseudogenes are expressed, but their function (if any) is unknown. Furthermore, we analyzed the enhanced diversity generated by alternative splicing...

  20. Structural and functional insights of β-glucosidases identified from the genome of Aspergillus fumigatus

    Science.gov (United States)

    Dodda, Subba Reddy; Aich, Aparajita; Sarkar, Nibedita; Jain, Piyush; Jain, Sneha; Mondal, Sudipa; Aikat, Kaustav; Mukhopadhyay, Sudit S.

    2018-03-01

    Thermostable glucose tolerant β-glucosidase from Aspergillus species has attracted worldwide interest for their potentiality in industrial applications and bioethanol production. A strain of Aspergillus fumigatus (AfNITDGPKA3) identified by our laboratory from straw retting ground showed higher cellulase activity, specifically the β-glucosidase activity, compared to other contemporary strains. Though A. fumigatus has been known for high cellulase activity, detailed identification and characterization of the cellulase genes from their genome is yet to be done. In this work we have been analyzed the cellulase genes from the genome sequence database of Aspergillus fumigatus (Af293). Genome analysis suggests two cellobiohydrolase, eleven endoglucanase and seventeen β-glucosidase genes present. β-Glucosidase genes belong to either Glycohydro1 (GH1 or Bgl1) or Glycohydro3 (GH3 or Bgl3) family. The sequence similarity suggests that Bgl1 and Bgl3 of A. fumagatus are phylogenetically close to those of A. fisheri and A. oryzae. The modelled structure of the Bgl1 predicts the (β/α)8 barrel type structure with deep and narrow active site, whereas, Bgl3 shows the (α/β)8 barrel and (α/β)6 sandwich structure with shallow and open active site. Docking results suggest that amino acids Glu544, Glu466, Trp408,Trp567,Tyr44,Tyr222,Tyr770,Asp844,Asp537,Asn212,Asn217 of Bgl3 and Asp224,Asn242,Glu440, Glu445, Tyr367, Tyr365,Thr994,Trp435,Trp446 of Bgl1 are involved in the hydrolysis. Binding affinity analyses suggest that Bgl3 and Bgl1 enzymes are more active on the substrates like 4-methylumbelliferyl glycoside (MUG) and p-nitrophenyl-β-D-1, 4-glucopyranoside (pNPG) than on cellobiose. Further docking with glucose suggests that Bgl1 is more glucose tolerant than Bgl3. Analysis of the Aspergillus fumigatus genome may help to identify a β-glucosidase enzyme with better property and the structural information may help to develop an engineered recombinant enzyme.

  1. Reproduction, symbiosis, and the eukaryotic cell

    Science.gov (United States)

    Godfrey-Smith, Peter

    2015-01-01

    This paper develops a conceptual framework for addressing questions about reproduction, individuality, and the units of selection in symbiotic associations, with special attention to the origin of the eukaryotic cell. Three kinds of reproduction are distinguished, and a possible evolutionary sequence giving rise to a mitochondrion-containing eukaryotic cell from an endosymbiotic partnership is analyzed as a series of transitions between each of the three forms of reproduction. The sequence of changes seen in this “egalitarian” evolutionary transition is compared with those that apply in “fraternal” transitions, such as the evolution of multicellularity in animals. PMID:26286983

  2. Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

    Science.gov (United States)

    Caporale, Lynn Helena

    2012-09-01

    This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.

  3. The structures of bovine herpesvirus 1 virion and concatemeric DNA: implications for cleavage and packaging of herpesvirus genomes

    International Nuclear Information System (INIS)

    Schynts, Frederic; McVoy, Michael A.; Meurens, Francois; Detry, Bruno; Epstein, Alberto L.; Thiry, Etienne

    2003-01-01

    Herpesvirus genomes are often characterized by the presence of direct and inverted repeats that delineate their grouping into six structural classes. Class D genomes consist of a long (L) segment and a short (S) segment. The latter is flanked by large inverted repeats. DNA replication produces concatemers of head-to-tail linked genomes that are cleaved into unit genomes during the process of packaging DNA into capsids. Packaged class D genomes are an equimolar mixture of two isomers in which S is in either of two orientations, presumably a consequence of homologous recombination between the inverted repeats. The L segment remains predominantly fixed in a prototype (P) orientation; however, low levels of genomes having inverted L (I L ) segments have been reported for some class D herpesviruses. Inefficient formation of class D I L genomes has been attributed to infrequent L segment inversion, but recent detection of frequent inverted L segments in equine herpesvirus 1 concatemers [Virology 229 (1997) 415-420] suggests that the defect may be at the level of cleavage and packaging rather than inversion. In this study, the structures of virion and concatemeric DNA of another class D herpesvirus, bovine herpesvirus 1, were determined. Virion DNA contained low levels of I L genomes, whereas concatemeric DNA contained significant amounts of L segments in both P and I L orientations. However, concatemeric termini exhibited a preponderance of L termini derived from P isomers which was comparable to the preponderance of P genomes found in virion DNA. Thus, the defect in formation of I L genomes appears to lie at the level of concatemer cleavage. These results have important implications for the mechanisms by which herpesvirus DNA cleavage and packaging occur

  4. A Three-Dimensional Model of the Yeast Genome

    Science.gov (United States)

    Noble, William; Duan, Zhi-Jun; Andronescu, Mirela; Schutz, Kevin; McIlwain, Sean; Kim, Yoo Jung; Lee, Choli; Shendure, Jay; Fields, Stanley; Blau, C. Anthony

    Layered on top of information conveyed by DNA sequence and chromatin are higher order structures that encompass portions of chromosomes, entire chromosomes, and even whole genomes. Interphase chromosomes are not positioned randomly within the nucleus, but instead adopt preferred conformations. Disparate DNA elements co-localize into functionally defined aggregates or factories for transcription and DNA replication. In budding yeast, Drosophila and many other eukaryotes, chromosomes adopt a Rabl configuration, with arms extending from centromeres adjacent to the spindle pole body to telomeres that abut the nuclear envelope. Nonetheless, the topologies and spatial relationships of chromosomes remain poorly understood. Here we developed a method to globally capture intra- and inter-chromosomal interactions, and applied it to generate a map at kilobase resolution of the haploid genome of Saccharomyces cerevisiae. The map recapitulates known features of genome organization, thereby validating the method, and identifies new features. Extensive regional and higher order folding of individual chromosomes is observed. Chromosome XII exhibits a striking conformation that implicates the nucleolus as a formidable barrier to interaction between DNA sequences at either end. Inter-chromosomal contacts are anchored by centromeres and include interactions among transfer RNA genes, among origins of early DNA replication and among sites where chromosomal breakpoints occur. Finally, we constructed a three-dimensional model of the yeast genome. Our findings provide a glimpse of the interface between the form and function of a eukaryotic genome.

  5. Target Selection and Deselection at the Berkeley StructuralGenomics Center

    Energy Technology Data Exchange (ETDEWEB)

    Chandonia, John-Marc; Kim, Sung-Hou; Brenner, Steven E.

    2005-03-22

    At the Berkeley Structural Genomics Center (BSGC), our goalis to obtain a near-complete structural complement of proteins in theminimal organisms Mycoplasma genitalium and M. pneumoniae, two closelyrelated pathogens. Current targets for structure determination have beenselected in six major stages, starting with those predicted to be mosttractable to high throughput study and likely to yield new structuralinformation. We report on the process used to select these proteins, aswell as our target deselection procedure. Target deselection reducesexperimental effort by eliminating targets similar to those recentlysolved by the structural biology community or other centers. We measurethe impact of the 69 structures solved at the BSGC as of July 2004 onstructure prediction coverage of the M. pneumoniae and M. genitaliumproteomes. The number of Mycoplasma proteins for which thefold couldfirst be reliably assigned based on structures solved at the BSGC (24 M.pneumoniae and 21 M. genitalium) is approximately 25 percent of the totalresulting from work at all structural genomics centers and the worldwidestructural biology community (94 M. pneumoniae and 86M. genitalium)during the same period. As the number of structures contributed by theBSGC during that period is less than 1 percent of the total worldwideoutput, the benefits of a focused target selection strategy are apparent.If the structures of all current targets were solved, the percentage ofM. pneumoniae proteins for which folds could be reliably assigned wouldincrease from approximately 57 percent (391 of 687) at present to around80 percent (550 of 687), and the percentage of the proteome that could beaccurately modeled would increase from around 37 percent (254 of 687) toabout 64 percent (438 of 687). In M. genitalium, the percentage of theproteome that could be structurally annotated based on structures of ourremaining targets would rise from 72 percent (348 of 486) to around 76percent (371 of 486), with the

  6. Genetic exchange in eukaryotes through horizontal transfer: connected by the mobilome.

    Science.gov (United States)

    Wallau, Gabriel Luz; Vieira, Cristina; Loreto, Élgion Lúcio Silva

    2018-01-01

    All living species contain genetic information that was once shared by their common ancestor. DNA is being inherited through generations by vertical transmission (VT) from parents to offspring and from ancestor to descendant species. This process was considered the sole pathway by which biological entities exchange inheritable information. However, Horizontal Transfer (HT), the exchange of genetic information by other means than parents to offspring, was discovered in prokaryotes along with strong evidence showing that it is a very important process by which prokaryotes acquire new genes. For some time now, it has been a scientific consensus that HT events were rare and non-relevant for evolution of eukaryotic species, but there is growing evidence supporting that HT is an important and frequent phenomenon in eukaryotes as well. Here, we will discuss the latest findings regarding HT among eukaryotes, mainly HT of transposons (HTT), establishing HTT once and for all as an important phenomenon that should be taken into consideration to fully understand eukaryotes genome evolution. In addition, we will discuss the latest development methods to detect such events in a broader scale and highlight the new approaches which should be pursued by researchers to fill the knowledge gaps regarding HTT among eukaryotes.

  7. The three-dimensional genome organization of Drosophila melanogaster through data integration.

    Science.gov (United States)

    Li, Qingjiao; Tjong, Harianto; Li, Xiao; Gong, Ke; Zhou, Xianghong Jasmine; Chiolo, Irene; Alber, Frank

    2017-07-31

    Genome structures are dynamic and non-randomly organized in the nucleus of higher eukaryotes. To maximize the accuracy and coverage of three-dimensional genome structural models, it is important to integrate all available sources of experimental information about a genome's organization. It remains a major challenge to integrate such data from various complementary experimental methods. Here, we present an approach for data integration to determine a population of complete three-dimensional genome structures that are statistically consistent with data from both genome-wide chromosome conformation capture (Hi-C) and lamina-DamID experiments. Our structures resolve the genome at the resolution of topological domains, and reproduce simultaneously both sets of experimental data. Importantly, this data deconvolution framework allows for structural heterogeneity between cells, and hence accounts for the expected plasticity of genome structures. As a case study we choose Drosophila melanogaster embryonic cells, for which both data types are available. Our three-dimensional genome structures have strong predictive power for structural features not directly visible in the initial data sets, and reproduce experimental hallmarks of the D. melanogaster genome organization from independent and our own imaging experiments. Also they reveal a number of new insights about genome organization and its functional relevance, including the preferred locations of heterochromatic satellites of different chromosomes, and observations about homologous pairing that cannot be directly observed in the original Hi-C or lamina-DamID data. Our approach allows systematic integration of Hi-C and lamina-DamID data for complete three-dimensional genome structure calculation, while also explicitly considering genome structural variability.

  8. The genome and structural proteome of an ocean siphovirus: a new window into the cyanobacterial ‘mobilome’

    Science.gov (United States)

    Sullivan, Matthew B; Krastins, Bryan; Hughes, Jennifer L; Kelly, Libusha; Chase, Michael; Sarracino, David; Chisholm, Sallie W

    2009-01-01

    Prochlorococcus, an abundant phototroph in the oceans, are infected by members of three families of viruses: myo-, podo- and siphoviruses. Genomes of myo- and podoviruses isolated on Prochlorococcus contain DNA replication machinery and virion structural genes homologous to those from coliphages T4 and T7 respectively. They also contain a suite of genes of cyanobacterial origin, most notably photosynthesis genes, which are expressed during infection and appear integral to the evolutionary trajectory of both host and phage. Here we present the first genome of a cyanobacterial siphovirus, P-SS2, which was isolated from Atlantic slope waters using a Prochlorococcus host (MIT9313). The P-SS2 genome is larger than, and considerably divergent from, previously sequenced siphoviruses. It appears most closely related to lambdoid siphoviruses, with which it shares 13 functional homologues. The ∼108 kb P-SS2 genome encodes 131 predicted proteins and notably lacks photosynthesis genes which have consistently been found in other marine cyanophage, but does contain 14 other cyanobacterial homologues. While only six structural proteins were identified from the genome sequence, 35 proteins were detected experimentally; these mapped onto capsid and tail structural modules in the genome. P-SS2 is potentially capable of integration into its host as inferred from bioinformatically identified genetic machinery int, bet, exo and a 53 bp attachment site. The host attachment site appears to be a genomic island that is tied to insertion sequence (IS) activity that could facilitate mobility of a gene involved in the nitrogen-stress response. The homologous region and a secondary IS-element hot-spot in Synechococcus RS9917 are further evidence of IS-mediated genome evolution coincident with a probable relic prophage integration event. This siphovirus genome provides a glimpse into the biology of a deep-photic zone phage as well as the ocean cyanobacterial prophage and IS element

  9. Discovery of new enzymes and metabolic pathways using structure and genome context

    Science.gov (United States)

    Zhao, Suwen; Kumar, Ritesh; Sakai, Ayano; Vetting, Matthew W.; Wood, B. McKay; Brown, Shoshana; Bonanno, Jeffery B.; Hillerich, Brandan S.; Seidel, Ronald D.; Babbitt, Patricia C.; Almo, Steven C.; Sweedler, Jonathan V.; Gerlt, John A.; Cronan, John E.; Jacobson, Matthew P.

    2014-01-01

    Assigning valid functions to proteins identified in genome projects is challenging, with over-prediction and database annotation errors major concerns1. We, and others2, are developing computation-guided strategies for functional discovery using “metabolite docking” to experimentally derived3 or homology-based4 three-dimensional structures. Bacterial metabolic pathways often are encoded by “genome neighborhoods” (gene clusters and/or operons), which can provide important clues for functional assignment. We recently demonstrated the synergy of docking and pathway context by “predicting” the intermediates in the glycolytic pathway in E. coli5. Metabolite docking to multiple binding proteins/enzymes in the same pathway increases the reliability of in silico predictions of substrate specificities because the pathway intermediates are structurally similar. We report that structure-guided approaches for predicting the substrate specificities of several enzymes encoded by a bacterial gene cluster allowed i) the correct prediction of the in vitro activity of a structurally characterized enzyme of unknown function (PDB 2PMQ), 2-epimerization of trans-4-hydroxy-L-proline betaine (tHyp-B) and cis-4-hydroxy-D-proline betaine (cHyp-B), and ii) the correct identification of the catabolic pathway in which Hyp-B 2-epimerase participates. The substrate-liganded pose predicted by virtual library screening (docking) was confirmed experimentally. The enzymatic activities in the predicted pathway were confirmed by in vitro assays and genetic analyses; the intermediates were identified by metabolomics; and repression of the genes encoding the pathway by high salt was established by transcriptomics, confirming the osmolyte role of tHyp-B. This study establishes the utility of structure-guide functional predictions to enable the discovery of new metabolic pathways. PMID:24056934

  10. Assessment of Genetic Heterogeneity in Structured Plant Populations Using Multivariate Whole-Genome Regression Models.

    Science.gov (United States)

    Lehermeier, Christina; Schön, Chris-Carolin; de Los Campos, Gustavo

    2015-09-01

    Plant breeding populations exhibit varying levels of structure and admixture; these features are likely to induce heterogeneity of marker effects across subpopulations. Traditionally, structure has been dealt with as a potential confounder, and various methods exist to "correct" for population stratification. However, these methods induce a mean correction that does not account for heterogeneity of marker effects. The animal breeding literature offers a few recent studies that consider modeling genetic heterogeneity in multibreed data, using multivariate models. However, these methods have received little attention in plant breeding where population structure can have different forms. In this article we address the problem of analyzing data from heterogeneous plant breeding populations, using three approaches: (a) a model that ignores population structure [A-genome-based best linear unbiased prediction (A-GBLUP)], (b) a stratified (i.e., within-group) analysis (W-GBLUP), and (c) a multivariate approach that uses multigroup data and accounts for heterogeneity (MG-GBLUP). The performance of the three models was assessed on three different data sets: a diversity panel of rice (Oryza sativa), a maize (Zea mays L.) half-sib panel, and a wheat (Triticum aestivum L.) data set that originated from plant breeding programs. The estimated genomic correlations between subpopulations varied from null to moderate, depending on the genetic distance between subpopulations and traits. Our assessment of prediction accuracy features cases where ignoring population structure leads to a parsimonious more powerful model as well as others where the multivariate and stratified approaches have higher predictive power. In general, the multivariate approach appeared slightly more robust than either the A- or the W-GBLUP. Copyright © 2015 by the Genetics Society of America.

  11. Modeling structure of G protein-coupled receptors in huan genome

    KAUST Repository

    Zhang, Yang

    2016-01-26

    G protein-coupled receptors (or GPCRs) are integral transmembrane proteins responsible to various cellular signal transductions. Human GPCR proteins are encoded by 5% of human genes but account for the targets of 40% of the FDA approved drugs. Due to difficulties in crystallization, experimental structure determination remains extremely difficult for human GPCRs, which have been a major barrier in modern structure-based drug discovery. We proposed a new hybrid protocol, GPCR-I-TASSER, to construct GPCR structure models by integrating experimental mutagenesis data with ab initio transmembrane-helix assembly simulations, assisted by the predicted transmembrane-helix interaction networks. The method was tested in recent community-wide GPCRDock experiments and constructed models with a root mean square deviation 1.26 Å for Dopamine-3 and 2.08 Å for Chemokine-4 receptors in the transmembrane domain regions, which were significantly closer to the native than the best templates available in the PDB. GPCR-I-TASSER has been applied to model all 1,026 putative GPCRs in the human genome, where 923 are found to have correct folds based on the confidence score analysis and mutagenesis data comparison. The successfully modeled GPCRs contain many pharmaceutically important families that do not have previously solved structures, including Trace amine, Prostanoids, Releasing hormones, Melanocortins, Vasopressin and Neuropeptide Y receptors. All the human GPCR models have been made publicly available through the GPCR-HGmod database at http://zhanglab.ccmb.med.umich.edu/GPCR-HGmod/ The results demonstrate new progress on genome-wide structure modeling of transmembrane proteins which should bring useful impact on the effort of GPCR-targeted drug discovery.

  12. Radiogenetic effects in eukaryote cells

    International Nuclear Information System (INIS)

    Mosseh, I.B.

    1987-01-01

    It is shown that under DNA radiation effect the primary reactions begin from nitrogen base injuries. Some of them are fixed as point mutations. In other cases bases for occurrence of chromosome structural mutations arise. The conclusion is made that point mutations induced by irradiation are formed during a short time interval and they depend to a lasser extent on different intra cellular processes including repair than on chromosomal rearrangement. Protein, as a component of chromosome, which influences undoubtedly on repair of premutation changes, plays a considerable role when forming chromosomal rearrangement. Mutations formed depending on their functional values lead either to change in a cell genotype or to cell destruction

  13. Macronuclear genome structure of the ciliate Nyctotherus ovalis: Single-gene chromosomes and tiny introns

    Directory of Open Access Journals (Sweden)

    Landweber Laura F

    2008-12-01

    Full Text Available Abstract Background Nyctotherus ovalis is a single-celled eukaryote that has hydrogen-producing mitochondria and lives in the hindgut of cockroaches. Like all members of the ciliate taxon, it has two types of nuclei, a micronucleus and a macronucleus. N. ovalis generates its macronuclear chromosomes by forming polytene chromosomes that subsequently develop into macronuclear chromosomes by DNA elimination and rearrangement. Results We examined the structure of these gene-sized macronuclear chromosomes in N. ovalis. We determined the telomeres, subtelomeric regions, UTRs, coding regions and introns by sequencing a large set of macronuclear DNA sequences (4,242 and cDNAs (5,484 and comparing them with each other. The telomeres consist of repeats CCC(AAAACCCCn, similar to those in spirotrichous ciliates such as Euplotes, Sterkiella (Oxytricha and Stylonychia. Per sequenced chromosome we found evidence for either a single protein-coding gene, a single tRNA, or the complete ribosomal RNAs cluster. Hence the chromosomes appear to encode single transcripts. In the short subtelomeric regions we identified a few overrepresented motifs that could be involved in gene regulation, but there is no consensus polyadenylation site. The introns are short (21–29 nucleotides, and a significant fraction (1/3 of the tiny introns is conserved in the distantly related ciliate Paramecium tetraurelia. As has been observed in P. tetraurelia, the N. ovalis introns tend to contain in-frame stop codons or have a length that is not dividable by three. This pattern causes premature termination of mRNA translation in the event of intron retention, and potentially degradation of unspliced mRNAs by the nonsense-mediated mRNA decay pathway. Conclusion The combination of short leaders, tiny introns and single genes leads to very minimal macronuclear chromosomes. The smallest we identified contained only 150 nucleotides.

  14. Inferring network structure in non-normal and mixed discrete-continuous genomic data.

    Science.gov (United States)

    Bhadra, Anindya; Rao, Arvind; Baladandayuthapani, Veerabhadran

    2018-03-01

    Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach. © 2017, The International Biometric Society.

  15. Genome Scan for Selection in Structured Layer Chicken Populations Exploiting Linkage Disequilibrium Information.

    Directory of Open Access Journals (Sweden)

    Mahmood Gholami

    Full Text Available An increasing interest is being placed in the detection of genes, or genomic regions, that have been targeted by selection because identifying signatures of selection can lead to a better understanding of genotype-phenotype relationships. A common strategy for the detection of selection signatures is to compare samples from distinct populations and to search for genomic regions with outstanding genetic differentiation. The aim of this study was to detect selective signatures in layer chicken populations using a recently proposed approach, hapFLK, which exploits linkage disequilibrium information while accounting appropriately for the hierarchical structure of populations. We performed the analysis on 70 individuals from three commercial layer breeds (White Leghorn, White Rock and Rhode Island Red, genotyped for approximately 1 million SNPs. We found a total of 41 and 107 regions with outstanding differentiation or similarity using hapFLK and its single SNP counterpart FLK respectively. Annotation of selection signature regions revealed various genes and QTL corresponding to productions traits, for which layer breeds were selected. A number of the detected genes were associated with growth and carcass traits, including IGF-1R, AGRP and STAT5B. We also annotated an interesting gene associated with the dark brown feather color mutational phenotype in chickens (SOX10. We compared FST, FLK and hapFLK and demonstrated that exploiting linkage disequilibrium information and accounting for hierarchical population structure decreased the false detection rate.

  16. Genomic structure and evolution of the mating type locus in the green seaweed Ulva partita.

    Science.gov (United States)

    Yamazaki, Tomokazu; Ichihara, Kensuke; Suzuki, Ryogo; Oshima, Kenshiro; Miyamura, Shinichi; Kuwano, Kazuyoshi; Toyoda, Atsushi; Suzuki, Yutaka; Sugano, Sumio; Hattori, Masahira; Kawano, Shigeyuki

    2017-09-15

    The evolution of sex chromosomes and mating loci in organisms with UV systems of sex/mating type determination in haploid phases via genes on UV chromosomes is not well understood. We report the structure of the mating type (MT) locus and its evolutionary history in the green seaweed Ulva partita, which is a multicellular organism with an isomorphic haploid-diploid life cycle and mating type determination in the haploid phase. Comprehensive comparison of a total of 12.0 and 16.6 Gb of genomic next-generation sequencing data for mt - and mt + strains identified highly rearranged MT loci of 1.0 and 1.5 Mb in size and containing 46 and 67 genes, respectively, including 23 gametologs. Molecular evolutionary analyses suggested that the MT loci diverged over a prolonged period in the individual mating types after their establishment in an ancestor. A gene encoding an RWP-RK domain-containing protein was found in the mt - MT locus but was not an ortholog of the chlorophycean mating type determination gene MID. Taken together, our results suggest that the genomic structure and its evolutionary history in the U. partita MT locus are similar to those on other UV chromosomes and that the MT locus genes are quite different from those of Chlorophyceae.

  17. Core genome conservation of Staphylococcus haemolyticus limits sequence based population structure analysis.

    Science.gov (United States)

    Cavanagh, Jorunn Pauline; Klingenberg, Claus; Hanssen, Anne-Merethe; Fredheim, Elizabeth Aarag; Francois, Patrice; Schrenzel, Jacques; Flægstad, Trond; Sollid, Johanna Ericson

    2012-06-01

    The notoriously multi-resistant Staphylococcus haemolyticus is an emerging pathogen causing serious infections in immunocompromised patients. Defining the population structure is important to detect outbreaks and spread of antimicrobial resistant clones. Currently, the standard typing technique is pulsed-field gel electrophoresis (PFGE). In this study we describe novel molecular typing schemes for S. haemolyticus using multi locus sequence typing (MLST) and multi locus variable number of tandem repeats (VNTR) analysis. Seven housekeeping genes (MLST) and five VNTR loci (MLVF) were selected for the novel typing schemes. A panel of 45 human and veterinary S. haemolyticus isolates was investigated. The collection had diverse PFGE patterns (38 PFGE types) and was sampled over a 20 year-period from eight countries. MLST resolved 17 sequence types (Simpsons index of diversity [SID]=0.877) and MLVF resolved 14 repeat types (SID=0.831). We found a low sequence diversity. Phylogenetic analysis clustered the isolates in three (MLST) and one (MLVF) clonal complexes, respectively. Taken together, neither the MLST nor the MLVF scheme was suitable to resolve the population structure of this S. haemolyticus collection. Future MLVF and MLST schemes will benefit from addition of more variable core genome sequences identified by comparing different fully sequenced S. haemolyticus genomes. Copyright © 2012 Elsevier B.V. All rights reserved.

  18. Coordination of genomic structure and transcription by the main bacterial nucleoid-associated protein HU

    Science.gov (United States)

    Berger, Michael; Farcas, Anca; Geertz, Marcel; Zhelyazkova, Petya; Brix, Klaudia; Travers, Andrew; Muskhelishvili, Georgi

    2010-01-01

    The histone-like protein HU is a highly abundant DNA architectural protein that is involved in compacting the DNA of the bacterial nucleoid and in regulating the main DNA transactions, including gene transcription. However, the coordination of the genomic structure and function by HU is poorly understood. Here, we address this question by comparing transcript patterns and spatial distributions of RNA polymerase in Escherichia coli wild-type and hupA/B mutant cells. We demonstrate that, in mutant cells, upregulated genes are preferentially clustered in a large chromosomal domain comprising the ribosomal RNA operons organized on both sides of OriC. Furthermore, we show that, in parallel to this transcription asymmetry, mutant cells are also impaired in forming the transcription foci—spatially confined aggregations of RNA polymerase molecules transcribing strong ribosomal RNA operons. Our data thus implicate HU in coordinating the global genomic structure and function by regulating the spatial distribution of RNA polymerase in the nucleoid. PMID:20010798

  19. Population structure and genomic inbreeding in nine Swiss dairy cattle populations.

    Science.gov (United States)

    Signer-Hasler, Heidi; Burren, Alexander; Neuditschko, Markus; Frischknecht, Mirjam; Garrick, Dorian; Stricker, Christian; Gredler, Birgit; Bapst, Beat; Flury, Christine

    2017-11-07

    Domestication, breed formation and intensive selection have resulted in divergent cattle breeds that likely exhibit their own genomic signatures. In this study, we used genotypes from 27,612 autosomal single nucleotide polymorphisms to characterize population structure based on 9214 sires representing nine Swiss dairy cattle populations: Brown Swiss (BS), Braunvieh (BV), Original Braunvieh (OB), Holstein (HO), Red Holstein (RH), Swiss Fleckvieh (SF), Simmental (SI), Eringer (ER) and Evolèner (EV). Genomic inbreeding (F ROH ) and signatures of selection were determined by calculating runs of homozygosity (ROH). The results build the basis for a better understanding of the genetic development of Swiss dairy cattle populations and highlight differences between the original populations (i.e. OB, SI, ER and EV) and those that have become more popular in Switzerland as currently reflected by their larger populations (i.e. BS, BV, HO, RH and SF). The levels of genetic diversity were highest and lowest in the SF and BS breeds, respectively. Based on F ST values, we conclude that, among all pairwise comparisons, BS and HO (0.156) differ more than the other pairs of populations. The original Swiss cattle populations OB, SI, ER, and EV are clearly genetically separated from the Swiss cattle populations that are now more common and represented by larger numbers of cows. Mean levels of F ROH ranged from 0.027 (ER) to 0.091 (BS). Three of the original Swiss cattle populations, ER (F ROH : 0.027), OB (F ROH : 0.029), and SI (F ROH : 0.039), showed low levels of genomic inbreeding, whereas it was much higher in EV (F ROH : 0.074). Private signatures of selection for the original Swiss cattle populations are reported for BTA4, 5, 11 and 26. The low levels of genomic inbreeding observed in the original Swiss cattle populations ER, OB and SI compared to the other breeds are explained by a lesser use of artificial insemination and greater use of natural service. Natural service

  20. From Genome to Structure and Back Again: A Family Portrait of the Transcarbamylases

    Directory of Open Access Journals (Sweden)

    Dashuang Shi

    2015-08-01

    Full Text Available Enzymes in the transcarbamylase family catalyze the transfer of a carbamyl group from carbamyl phosphate (CP to an amino group of a second substrate. The two best-characterized members, aspartate transcarbamylase (ATCase and ornithine transcarbamylase (OTCase, are present in most organisms from bacteria to humans. Recently, structures of four new transcarbamylase members, N-acetyl-l-ornithine transcarbamylase (AOTCase, N-succinyl-l-ornithine transcarbamylase (SOTCase, ygeW encoded transcarbamylase (YTCase and putrescine transcarbamylase (PTCase have also been determined. Crystal structures of these enzymes have shown that they have a common overall fold with a trimer as their basic biological unit. The monomer structures share a common CP binding site in their N-terminal domain, but have different second substrate binding sites in their C-terminal domain. The discovery of three new transcarbamylases, l-2,3-diaminopropionate transcarbamylase (DPTCase, l-2,4-diaminobutyrate transcarbamylase (DBTCase and ureidoglycine transcarbamylase (UGTCase, demonstrates that our knowledge and understanding of the spectrum of the transcarbamylase family is still incomplete. In this review, we summarize studies on the structures and function of transcarbamylases demonstrating how structural information helps to define biological function and how small structural differences govern enzyme specificity. Such information is important for correctly annotating transcarbamylase sequences in the genome databases and for identifying new members of the transcarbamylase family.

  1. Structure of the acidianus filamentous virus 3 and comparative genomics of related archaeal lipothrixviruses

    DEFF Research Database (Denmark)

    Vestergaard, Gisle Alberg; Aramayo, Ricardo; Basta, Tamara

    2008-01-01

    Four novel filamentous viruses with double-stranded DNA genomes, namely, Acidianus filamentous virus 3 (AFV3), AFV6, AFV7, and AFV8, have been characterized from the hyperthermophilic archaeal genus Acidianus, and they are assigned to the Betalipothrixvirus genus of the family Lipothrixviridae....... The structures of the approximately 2-mum-long virions are similar, and one of them, AFV3, was studied in detail. It consists of a cylindrical envelope containing globular subunits arranged in a helical formation that is unique for any known double-stranded DNA virus. The envelope is 3.1 nm thick and encases...... structural proteins; (iii) multiple overlapping open reading frames, which may be indicative of gene recoding; (iv) putative 12-bp genetic elements; and (v) partial gene sequences corresponding closely to spacer sequences of chromosomal repeat clusters....

  2. High-throughput crystal-optimization strategies in the South Paris Yeast Structural Genomics Project: one size fits all?

    Science.gov (United States)

    Leulliot, Nicolas; Trésaugues, Lionel; Bremang, Michael; Sorel, Isabelle; Ulryck, Nathalie; Graille, Marc; Aboulfath, Ilham; Poupon, Anne; Liger, Dominique; Quevillon-Cheruel, Sophie; Janin, Joël; van Tilbeurgh, Herman

    2005-06-01

    Crystallization has long been regarded as one of the major bottlenecks in high-throughput structural determination by X-ray crystallography. Structural genomics projects have addressed this issue by using robots to set up automated crystal screens using nanodrop technology. This has moved the bottleneck from obtaining the first crystal hit to obtaining diffraction-quality crystals, as crystal optimization is a notoriously slow process that is difficult to automatize. This article describes the high-throughput optimization strategies used in the Yeast Structural Genomics project, with selected successful examples.

  3. SCHEMA computational design of virus capsid chimeras: calibrating how genome packaging, protection, and transduction correlate with calculated structural disruption.

    Science.gov (United States)

    Ho, Michelle L; Adler, Benjamin A; Torre, Michael L; Silberg, Jonathan J; Suh, Junghae

    2013-12-20

    Adeno-associated virus (AAV) recombination can result in chimeric capsid protein subunits whose ability to assemble into an oligomeric capsid, package a genome, and transduce cells depends on the inheritance of sequence from different AAV parents. To develop quantitative design principles for guiding site-directed recombination of AAV capsids, we have examined how capsid structural perturbations predicted by the SCHEMA algorithm correlate with experimental measurements of disruption in seventeen chimeric capsid proteins. In our small chimera population, created by recombining AAV serotypes 2 and 4, we found that protection of viral genomes and cellular transduction were inversely related to calculated disruption of the capsid structure. Interestingly, however, we did not observe a correlation between genome packaging and calculated structural disruption; a majority of the chimeric capsid proteins formed at least partially assembled capsids and more than half packaged genomes, including those with the highest SCHEMA disruption. These results suggest that the sequence space accessed by recombination of divergent AAV serotypes is rich in capsid chimeras that assemble into 60-mer capsids and package viral genomes. Overall, the SCHEMA algorithm may be useful for delineating quantitative design principles to guide the creation of libraries enriched in genome-protecting virus nanoparticles that can effectively transduce cells. Such improvements to the virus design process may help advance not only gene therapy applications but also other bionanotechnologies dependent upon the development of viruses with new sequences and functions.

  4. DNA is structured as a linear "jigsaw puzzle" in the genomes of Arabidopsis, rice, and budding yeast.

    Science.gov (United States)

    Liu, Yun-Hua; Zhang, Meiping; Wu, Chengcang; Huang, James J; Zhang, Hong-Bin

    2014-01-01

    Knowledge of how a genome is structured and organized from its constituent elements is crucial to understanding its biology and evolution. Here, we report the genome structuring and organization pattern as revealed by systems analysis of the sequences of three model species, Arabidopsis, rice and yeast, at the whole-genome and chromosome levels. We found that all fundamental function elements (FFE) constituting the genomes, including genes (GEN), DNA transposable elements (DTE), retrotransposable elements (RTE), simple sequence repeats (SSR), and (or) low complexity repeats (LCR), are structured in a nonrandom and correlative manner, thus leading to a hypothesis that the DNA of the species is structured as a linear "jigsaw puzzle". Furthermore, we showed that different FFE differ in their importance in the formation and evolution of the DNA jigsaw puzzle structure between species. DTE and RTE play more important roles than GEN, LCR, and SSR in Arabidopsis, whereas GEN and RTE play more important roles than LCR, SSR, and DTE in rice. The genes having multiple recognized functions play more important roles than those having single functions. These results provide useful knowledge necessary for better understanding genome biology and evolution of the species and for effective molecular breeding of rice.

  5. Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing.

    Science.gov (United States)

    Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl

    2014-07-04

    Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco that are included in Solanaceae revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously-reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, large number of small sequence segments which were absent or found on different locations in Jeju mitochondrial genome were combined together. The incorporation of repeats and overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes. Although large portion of sequence context was

  6. Complete Mitochondrial Genome of the Medicinal Mushroom Ganoderma lucidum

    Science.gov (United States)

    Chen, Haimei; Chen, Xiangdong; Lan, Jin; Liu, Chang

    2013-01-01

    Ganoderma lucidum is one of the well-known medicinal basidiomycetes worldwide. The mitochondrion, referred to as the second genome, is an organelle found in most eukaryotic cells and participates in critical cellular functions. Elucidating the structure and function of this genome is important to understand completely the genetic contents of G. lucidum. In this study, we assembled the mitochondrial genome of G. lucidum and analyzed the differential expressions of its encoded genes across three developmental stages. The mitochondrial genome is a typical circular DNA molecule of 60,630 bp with a GC content of 26.67%. Genome annotation identified genes that encode 15 conserved proteins, 27 tRNAs, small and large rRNAs, four homing endonucleases, and two hypothetical proteins. Except for genes encoding trnW and two hypothetical proteins, all genes were located on the positive strand. For the repeat structure analysis, eight forward, two inverted, and three tandem repeats were detected. A pair of fragments with a total length around 5.5 kb was found in both the nuclear and mitochondrial genomes, which suggests the possible transfer of DNA sequences between two genomes. RNA-Seq data for samples derived from three stages, namely, mycelia, primordia, and fruiting bodies, were mapped to the mitochondrial genome and qualified. The protein-coding genes were expressed higher in mycelia or primordial stages compared with those in the fruiting bodies. The rRNA abundances were significantly higher in all three stages. Two regions were transcribed but did not contain any identified protein or tRNA genes. Furthermore, three RNA-editing sites were detected. Genome synteny analysis showed that significant genome rearrangements occurred in the mitochondrial genomes. This study provides valuable information on the gene contents of the mitochondrial genome and their differential expressions at various developmental stages of G. lucidum. The results contribute to the understanding of the

  7. Genome characterization and population genetic structure of the zoonotic pathogen, Streptococcus canis

    Directory of Open Access Journals (Sweden)

    Richards Vincent P

    2012-12-01

    Full Text Available Abstract Background Streptococcus canis is an important opportunistic pathogen of dogs and cats that can also infect a wide range of additional mammals including cows where it can cause mastitis. It is also an emerging human pathogen. Results Here we provide characterization of the first genome sequence for this species, strain FSL S3-227 (milk isolate from a cow with an intra-mammary infection. A diverse array of putative virulence factors was encoded by the S. canis FSL S3-227 genome. Approximately 75% of these gene sequences were homologous to known Streptococcal virulence factors involved in invasion, evasion, and colonization. Present in the genome are multiple potentially mobile genetic elements (MGEs [plasmid, phage, integrative conjugative element (ICE] and comparison to other species provided convincing evidence for lateral gene transfer (LGT between S. canis and two additional bovine mastitis causing pathogens (Streptococcus agalactiae, and Streptococcus dysgalactiae subsp. dysgalactiae, with this transfer possibly contributing to host adaptation. Population structure among isolates obtained from Europe and USA [bovine = 56, canine = 26, and feline = 1] was explored. Ribotyping of all isolates and multi locus sequence typing (MLST of a subset of the isolates (n = 45 detected significant differentiation between bovine and canine isolates (Fisher exact test: P = 0.0000 [ribotypes], P = 0.0030 [sequence types], suggesting possible host adaptation of some genotypes. Concurrently, the ancestral clonal complex (54% of isolates occurred in many tissue types, all hosts, and all geographic locations suggesting the possibility of a wide and diverse niche. Conclusion This study provides evidence highlighting the importance of LGT in the evolution of the bacteria S. canis, specifically, its possible role in host adaptation and acquisition of virulence factors. Furthermore, recent LGT detected between S. canis and human

  8. Genome characterization and population genetic structure of the zoonotic pathogen, Streptococcus canis.

    Science.gov (United States)

    Richards, Vincent P; Zadoks, Ruth N; Pavinski Bitar, Paulina D; Lefébure, Tristan; Lang, Ping; Werner, Brenda; Tikofsky, Linda; Moroni, Paolo; Stanhope, Michael J

    2012-12-18

    Streptococcus canis is an important opportunistic pathogen of dogs and cats that can also infect a wide range of additional mammals including cows where it can cause mastitis. It is also an emerging human pathogen. Here we provide characterization of the first genome sequence for this species, strain FSL S3-227 (milk isolate from a cow with an intra-mammary infection). A diverse array of putative virulence factors was encoded by the S. canis FSL S3-227 genome. Approximately 75% of these gene sequences were homologous to known Streptococcal virulence factors involved in invasion, evasion, and colonization. Present in the genome are multiple potentially mobile genetic elements (MGEs) [plasmid, phage, integrative conjugative element (ICE)] and comparison to other species provided convincing evidence for lateral gene transfer (LGT) between S. canis and two additional bovine mastitis causing pathogens (Streptococcus agalactiae, and Streptococcus dysgalactiae subsp. dysgalactiae), with this transfer possibly contributing to host adaptation. Population structure among isolates obtained from Europe and USA [bovine = 56, canine = 26, and feline = 1] was explored. Ribotyping of all isolates and multi locus sequence typing (MLST) of a subset of the isolates (n = 45) detected significant differentiation between bovine and canine isolates (Fisher exact test: P = 0.0000 [ribotypes], P = 0.0030 [sequence types]), suggesting possible host adaptation of some genotypes. Concurrently, the ancestral clonal complex (54% of isolates) occurred in many tissue types, all hosts, and all geographic locations suggesting the possibility of a wide and diverse niche. This study provides evidence highlighting the importance of LGT in the evolution of the bacteria S. canis, specifically, its possible role in host adaptation and acquisition of virulence factors. Furthermore, recent LGT detected between S. canis and human bacteria (Streptococcus urinalis) is cause for concern

  9. Variation in the OC locus of Acinetobacter baumannii genomes predicts extensive structural diversity in the lipooligosaccharide.

    Directory of Open Access Journals (Sweden)

    Johanna J Kenyon

    Full Text Available Lipooligosaccharide (LOS is a complex surface structure that is linked to many pathogenic properties of Acinetobacter baumannii. In A. baumannii, the genes responsible for the synthesis of the outer core (OC component of the LOS are located between ilvE and aspS. The content of the OC locus is usually variable within a species, and examination of 6 complete and 227 draft A. baumannii genome sequences available in GenBank non-redundant and Whole Genome Shotgun databases revealed nine distinct new types, OCL4-OCL12, in addition to the three known ones. The twelve gene clusters fell into two distinct groups, designated Group A and Group B, based on similarities in the genes present. OCL6 (Group B was unique in that it included genes for the synthesis of L-Rhamnosep. Genetic exchange of the different configurations between strains has occurred as some OC forms were found in several different sequence types (STs. OCL1 (Group A was the most widely distributed being present in 18 STs, and OCL6 was found in 16 STs. Variation within clones was also observed, with more than one OC locus type found in the two globally disseminated clones, GC1 and GC2, that include the majority of multiply antibiotic resistant isolates. OCL1 was the most abundant gene cluster in both GC1 and GC2 genomes but GC1 isolates also carried OCL2, OCL3 or OCL5, and OCL3 was also present in GC2. As replacement of the OC locus in the major global clones indicates the presence of sub-lineages, a PCR typing scheme was developed to rapidly distinguish Group A and Group B types, and to distinguish the specific forms found in GC1 and GC2 isolates.

  10. Population genomic structure and adaptation in the zoonotic malaria parasite Plasmodium knowlesi

    KAUST Repository

    Assefa, Samuel

    2015-10-06

    Malaria cases caused by the zoonotic parasite Plasmodium knowlesi are being increasingly reported throughout Southeast Asia and in travelers returning from the region. To test for evidence of signatures of selection or unusual population structure in this parasite, we surveyed genome sequence diversity in 48 clinical isolates recently sampled from Malaysian Borneo and in five lines maintained in laboratory rhesus macaques after isolation in the 1960s from Peninsular Malaysia and the Philippines. Overall genomewide nucleotide diversity (π = 6.03 × 10) was much higher than has been seen in worldwide samples of either of the major endemic malaria parasite species Plasmodium falciparum and Plasmodium vivax. A remarkable substructure is revealed within P. knowlesi, consisting of two major sympatric clusters of the clinical isolates and a third cluster comprising the laboratory isolates. There was deep differentiation between the two clusters of clinical isolates [mean genomewide fixation index (F) = 0.21, with 9,293 SNPs having fixed differences of F = 1.0]. This differentiation showed marked heterogeneity across the genome, with mean F values of different chromosomes ranging from 0.08 to 0.34 and with further significant variation across regions within several chromosomes. Analysis of the largest cluster (cluster 1, 38 isolates) indicated long-term population growth, with negatively skewed allele frequency distributions (genomewide average Tajima\\'s D = -1.35). Against this background there was evidence of balancing selection on particular genes, including the circumsporozoite protein (csp) gene, which had the top Tajima\\'s D value (1.57), and scans of haplotype homozygosity implicate several genomic regions as being under recent positive selection.

  11. Comparative genome analyses reveal distinct structure in the saltwater crocodile MHC.

    Directory of Open Access Journals (Sweden)

    Weerachai Jaratlerdsiri

    Full Text Available The major histocompatibility complex (MHC is a dynamic genome region with an essential role in the adaptive immunity of vertebrates, especially antigen presentation. The MHC is generally divided into subregions (classes I, II and III containing genes of similar function across species, but with different gene number and organisation. Crocodylia (crocodilians are widely distributed and represent an evolutionary distinct group among higher vertebrates, but the genomic organisation of MHC within this lineage has been largely unexplored. Here, we studied the MHC region of the saltwater crocodile (Crocodylus porosus and compared it with that of other taxa. We characterised genomic clusters encompassing MHC class I and class II genes in the saltwater crocodile based on sequencing of bacterial artificial chromosomes. Six gene clusters spanning ∼452 kb were identified to contain nine MHC class I genes, six MHC class II genes, three TAP genes, and a TRIM gene. These MHC class I and class II genes were in separate scaffold regions and were greater in length (2-6 times longer than their counterparts in well-studied fowl B loci, suggesting that the compaction of avian MHC occurred after the crocodilian-avian split. Comparative analyses between the saltwater crocodile MHC and that from the alligator and gharial showed large syntenic areas (>80% identity with similar gene order. Comparisons with other vertebrates showed that the saltwater crocodile had MHC class I genes located along with TAP, consistent with birds studied. Linkage between MHC class I and TRIM39 observed in the saltwater crocodile resembled MHC in eutherians compared, but absent in avian MHC, suggesting that the saltwater crocodile MHC appears to have gene organisation intermediate between these two lineages. These observations suggest that the structure of the saltwater crocodile MHC, and other crocodilians, can help determine the MHC that was present in the ancestors of archosaurs.

  12. Population genomic structure and adaptation in the zoonotic malaria parasite Plasmodium knowlesi

    KAUST Repository

    Assefa, Samuel; Lim, Caeul; Preston, Mark D.; Duffy, Craig W.; Nair, Mridul; Adroub, Sabir; Kadir, Khamisah A.; Goldberg, Jonathan M.; Neafsey, Daniel E.; Divis, Paul; Clark, Taane G.; Duraisingh, Manoj T.; Conway, David J.; Pain, Arnab; Singh, Balbir

    2015-01-01

    Malaria cases caused by the zoonotic parasite Plasmodium knowlesi are being increasingly reported throughout Southeast Asia and in travelers returning from the region. To test for evidence of signatures of selection or unusual population structure in this parasite, we surveyed genome sequence diversity in 48 clinical isolates recently sampled from Malaysian Borneo and in five lines maintained in laboratory rhesus macaques after isolation in the 1960s from Peninsular Malaysia and the Philippines. Overall genomewide nucleotide diversity (π = 6.03 × 10) was much higher than has been seen in worldwide samples of either of the major endemic malaria parasite species Plasmodium falciparum and Plasmodium vivax. A remarkable substructure is revealed within P. knowlesi, consisting of two major sympatric clusters of the clinical isolates and a third cluster comprising the laboratory isolates. There was deep differentiation between the two clusters of clinical isolates [mean genomewide fixation index (F) = 0.21, with 9,293 SNPs having fixed differences of F = 1.0]. This differentiation showed marked heterogeneity across the genome, with mean F values of different chromosomes ranging from 0.08 to 0.34 and with further significant variation across regions within several chromosomes. Analysis of the largest cluster (cluster 1, 38 isolates) indicated long-term population growth, with negatively skewed allele frequency distributions (genomewide average Tajima's D = -1.35). Against this background there was evidence of balancing selection on particular genes, including the circumsporozoite protein (csp) gene, which had the top Tajima's D value (1.57), and scans of haplotype homozygosity implicate several genomic regions as being under recent positive selection.

  13. Genome characterization and population genetic structure of the zoonotic pathogen, Streptococcus canis

    Science.gov (United States)

    2012-01-01

    Background Streptococcus canis is an important opportunistic pathogen of dogs and cats that can also infect a wide range of additional mammals including cows where it can cause mastitis. It is also an emerging human pathogen. Results Here we provide characterization of the first genome sequence for this species, strain FSL S3-227 (milk isolate from a cow with an intra-mammary infection). A diverse array of putative virulence factors was encoded by the S. canis FSL S3-227 genome. Approximately 75% of these gene sequences were homologous to known Streptococcal virulence factors involved in invasion, evasion, and colonization. Present in the genome are multiple potentially mobile genetic elements (MGEs) [plasmid, phage, integrative conjugative element (ICE)] and comparison to other species provided convincing evidence for lateral gene transfer (LGT) between S. canis and two additional bovine mastitis causing pathogens (Streptococcus agalactiae, and Streptococcus dysgalactiae subsp. dysgalactiae), with this transfer possibly contributing to host adaptation. Population structure among isolates obtained from Europe and USA [bovine = 56, canine = 26, and feline = 1] was explored. Ribotyping of all isolates and multi locus sequence typing (MLST) of a subset of the isolates (n = 45) detected significant differentiation between bovine and canine isolates (Fisher exact test: P = 0.0000 [ribotypes], P = 0.0030 [sequence types]), suggesting possible host adaptation of some genotypes. Concurrently, the ancestral clonal complex (54% of isolates) occurred in many tissue types, all hosts, and all geographic locations suggesting the possibility of a wide and diverse niche. Conclusion This study provides evidence highlighting the importance of LGT in the evolution of the bacteria S. canis, specifically, its possible role in host adaptation and acquisition of virulence factors. Furthermore, recent LGT detected between S. canis and human bacteria (Streptococcus

  14. Structure modeling of all identified G protein-coupled receptors in the human genome.

    Science.gov (United States)

    Zhang, Yang; Devries, Mark E; Skolnick, Jeffrey

    2006-02-01

    G protein-coupled receptors (GPCRs), encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signal into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER) method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global C(alpha) root-mean-squared deviation from native of 4.6 angstroms, with a root-mean-squared deviation in the transmembrane helix region of 2.1 angstroms. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse. These results demonstrate the usefulness and robustness

  15. Structure modeling of all identified G protein-coupled receptors in the human genome.

    Directory of Open Access Journals (Sweden)

    Yang Zhang

    2006-02-01

    Full Text Available G protein-coupled receptors (GPCRs, encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signal into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global C(alpha root-mean-squared deviation from native of 4.6 angstroms, with a root-mean-squared deviation in the transmembrane helix region of 2.1 angstroms. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse. These results demonstrate the usefulness

  16. Population genomic analysis of ancient and modern genomes yields new insights into the genetic ancestry of the Tyrolean Iceman and the genetic structure of Europe.

    Directory of Open Access Journals (Sweden)

    Martin Sikora

    2014-05-01

    Full Text Available Genome sequencing of the 5,300-year-old mummy of the Tyrolean Iceman, found in 1991 on a glacier near the border of Italy and Austria, has yielded new insights into his origin and relationship to modern European populations. A key finding of that study was an apparent recent common ancestry with individuals from Sardinia, based largely on the Y chromosome haplogroup and common autosomal SNP variation. Here, we compiled and analyzed genomic datasets from both modern and ancient Europeans, including genome sequence data from over 400 Sardinians and two ancient Thracians from Bulgaria, to investigate this result in greater detail and determine its implications for the genetic structure of Neolithic Europe. Using whole-genome sequencing data, we confirm that the Iceman is, indeed, most closely related to Sardinians. Furthermore, we show that this relationship extends to other individuals from cultural contexts associated with the spread of agriculture during the Neolithic transition, in contrast to individuals from a hunter-gatherer context. We hypothesize that this genetic affinity of ancient samples from different parts of Europe with Sardinians represents a common genetic component that was geographically widespread across Europe during the Neolithic, likely related to migrations and population expansions associated with the spread of agriculture.

  17. The population genomics of begomoviruses: global scale population structure and gene flow

    Directory of Open Access Journals (Sweden)

    Prasanna HC

    2010-09-01

    Full Text Available Abstract Background The rapidly growing availability of diverse full genome sequences from across the world is increasing the feasibility of studying the large-scale population processes that underly observable pattern of virus diversity. In particular, characterizing the genetic structure of virus populations could potentially reveal much about how factors such as geographical distributions, host ranges and gene flow between populations combine to produce the discontinuous patterns of genetic diversity that we perceive as distinct virus species. Among the richest and most diverse full genome datasets that are available is that for the dicotyledonous plant infecting genus, Begomovirus, in the Family Geminiviridae. The begomoviruses all share the same whitefly vector, are highly recombinogenic and are distributed throughout tropical and subtropical regions where they seriously threaten the food security of the world's poorest people. Results We focus here on using a model-based population genetic approach to identify the genetically distinct sub-populations within the global begomovirus meta-population. We demonstrate the existence of at least seven major sub-populations that can further be sub-divided into as many as thirty four significantly differentiated and genetically cohesive minor sub-populations. Using the population structure framework revealed in the present study, we further explored the extent of gene flow and recombination between genetic populations. Conclusions Although geographical barriers are apparently the most significant underlying cause of the seven major population sub-divisions, within the framework of these sub-divisions, we explore patterns of gene flow to reveal that both host range differences and genetic barriers to recombination have probably been major contributors to the minor population sub-divisions that we have identified. We believe that the global Begomovirus population structure revealed here could

  18. Meeting Report: Towards the Visualization of Genome Activity at Nanoscale Dimensions

    International Nuclear Information System (INIS)

    Ritland Politz, Joan C.

    2006-01-01

    A report on the Fifth Annual Nanostructural Genomics meeting, Bar Harbor, USA, 7-10 September 2005. It is a rare meeting where one can hear the latest developments in comparative genome analysis, relate these findings to advances in understanding both the linear and three-dimensional organization of the eukaryotic genome, and see it all beginning to fit into the context of the structure and function of the nucleus, visualized using state-of-the art labeling and microscopic techniques. These cross-disciplinary areas of research have been presented by a diverse group of scientists for the past five years at the Nanostructural Genomics meeting at the Jackson Laboratory in Bar Harbor, and the 2005 meeting again gave attendees much food for thought. In summary, the meeting provided a delightfully unique perspective on the application of exciting experimental breakthroughs at the interface of genomics, cell biology and optical physics.

  19. Origin of phagotrophic eukaryotes as social cheaters in microbial biofilms

    Directory of Open Access Journals (Sweden)

    Jékely Gáspár

    2007-01-01

    Full Text Available Abstract Background The origin of eukaryotic cells was one of the most dramatic evolutionary transitions in the history of life. It is generally assumed that eukaryotes evolved later then prokaryotes by the transformation or fusion of prokaryotic lineages. However, as yet there is no consensus regarding the nature of the prokaryotic group(s ancestral to eukaryotes. Regardless of this, a hardly debatable fundamental novel characteristic of the last eukaryotic common ancestor was the ability to exploit prokaryotic biomass by the ingestion of entire cells, i.e. phagocytosis. The recent advances in our understanding of the social life of prokaryotes may help to explain the origin of this form of total exploitation. Presentation of the hypothesis Here I propose that eukaryotic cells originated in a social environment, a differentiated microbial mat or biofilm that was maintained by the cooperative action of its members. Cooperation was costly (e.g. the production of developmental signals or an extracellular matrix but yielded benefits that increased the overall fitness of the social group. I propose that eukaryotes originated as selfish cheaters that enjoyed the benefits of social aggregation but did not contribute to it themselves. The cheaters later evolved into predators that lysed other cells and eventually became professional phagotrophs. During several cycles of social aggregation and dispersal the number of cheaters was contained by a chicken game situation, i.e. reproductive success of cheaters was high when they were in low abundance but was reduced when they were over-represented. Radical changes in cell structure, including the loss of the rigid prokaryotic cell wall and the development of endomembranes, allowed the protoeukaryotes to avoid cheater control and to exploit nutrients more efficiently. Cellular changes were buffered by both the social benefits and the protective physico-chemical milieu of the interior of biofilms. Symbiosis

  20. Crystal Structures of DNA-Whirly Complexes and Their Role in Arabidopsis Organelle Genome Repair

    Energy Technology Data Exchange (ETDEWEB)

    Cappadocia, Laurent; Maréchal, Alexandre; Parent, Jean-Sébastien; Lepage, Étienne; Sygusch, Jurgen; Brisson, Normand (Montreal)

    2010-09-07

    DNA double-strand breaks are highly detrimental to all organisms and need to be quickly and accurately repaired. Although several proteins are known to maintain plastid and mitochondrial genome stability in plants, little is known about the mechanisms of DNA repair in these organelles and the roles of specific proteins. Here, using ciprofloxacin as a DNA damaging agent specific to the organelles, we show that plastids and mitochondria can repair DNA double-strand breaks through an error-prone pathway similar to the microhomology-mediated break-induced replication observed in humans, yeast, and bacteria. This pathway is negatively regulated by the single-stranded DNA (ssDNA) binding proteins from the Whirly family, thus indicating that these proteins could contribute to the accurate repair of plant organelle genomes. To understand the role of Whirly proteins in this process, we solved the crystal structures of several Whirly-DNA complexes. These reveal a nonsequence-specific ssDNA binding mechanism in which DNA is stabilized between domains of adjacent subunits and rendered unavailable for duplex formation and/or protein interactions. Our results suggest a model in which the binding of Whirly proteins to ssDNA would favor accurate repair of DNA double-strand breaks over an error-prone microhomology-mediated break-induced replication repair pathway.

  1. Structural variation discovery in the cancer genome using next generation sequencing: Computational solutions and perspectives

    Science.gov (United States)

    Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin

    2015-01-01

    Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by a NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide of relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937

  2. Whole-Genome Analysis of a Novel Fish Reovirus (MsReV Discloses Aquareovirus Genomic Structure Relationship with Host in Saline Environments

    Directory of Open Access Journals (Sweden)

    Zhong-Yuan Chen

    2015-08-01

    Full Text Available Aquareoviruses are serious pathogens of aquatic animals. Here, genome characterization and functional gene analysis of a novel aquareovirus, largemouth bass Micropterus salmoides reovirus (MsReV, was described. It comprises 11 dsRNA segments (S1–S11 covering 24,024 bp, and encodes 12 putative proteins including the inclusion forming-related protein NS87 and the fusion-associated small transmembrane (FAST protein NS22. The function of NS22 was confirmed by expression in fish cells. Subsequently, MsReV was compared with two representative aquareoviruses, saltwater fish turbot Scophthalmus maximus reovirus (SMReV and freshwater fish grass carp reovirus strain 109 (GCReV-109. MsReV NS87 and NS22 genes have the same structure and function with those of SMReV, whereas GCReV-109 is either missing the coiled-coil region in NS79 or the gene-encoding NS22. Significant similarities are also revealed among equivalent genome segments between MsReV and SMReV, but a difference is found between MsReV and GCReV-109. Furthermore, phylogenetic analysis showed that 13 aquareoviruses could be divided into freshwater and saline environments subgroups, and MsReV was closely related to SMReV in saline environments. Consequently, these viruses from hosts in saline environments have more genomic structural similarities than the viruses from hosts in freshwater. This is the first study of the relationships between aquareovirus genomic structure and their host environments.

  3. A Taste of Algal Genomes from the Joint Genome Institute

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2012-06-17

    Algae play profound roles in aquatic food chains and the carbon cycle, can impose health and economic costs through toxic blooms, provide models for the study of symbiosis, photosynthesis, and eukaryotic evolution, and are candidate sources for bio-fuels; all of these research areas are part of the mission of DOE's Joint Genome Institute (JGI). To date JGI has sequenced, assembled, annotated, and released to the public the genomes of 18 species and strains of algae, sampling almost all of the major clades of photosynthetic eukaryotes. With more algal genomes currently undergoing analysis, JGI continues its commitment to driving forward basic and applied algal science. Among these ongoing projects are the pan-genome of the dominant coccolithophore Emiliania huxleyi, the interrelationships between the 4 genomes in the nucleomorph-containing Bigelowiella natans and Guillardia theta, and the search for symbiosis genes of lichens.

  4. A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites

    DEFF Research Database (Denmark)

    Nielsen, Henrik; Engelbrecht, Jacob; Brunak, Søren

    1997-01-01

    We have developed a new method for the identication of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequences. The method performs signicantly better than previous prediction schemes, and can easily be applied to genome...

  5. Extensive expansion of A1 family aspartic proteinases in fungi revealed by evolutionary analyses of 107 complete eukaryotic proteomes

    NARCIS (Netherlands)

    Revuelta, M.V.; Kan, van J.A.L.; Kay, J.; Have, ten A.

    2014-01-01

    The A1 family of eukaryotic aspartic proteinases (APs) forms one of the 16 AP families. Although one of the best characterized families, the recent increase in genome sequence data has revealed many fungal AP homologs with novel sequence characteristics. This study was performed to explore the

  6. Predicting effects of structural stress in a genome-reduced model bacterial metabolism

    Science.gov (United States)

    Güell, Oriol; Sagués, Francesc; Serrano, M. Ángeles

    2012-08-01

    Mycoplasma pneumoniae is a human pathogen recently proposed as a genome-reduced model for bacterial systems biology. Here, we study the response of its metabolic network to different forms of structural stress, including removal of individual and pairs of reactions and knockout of genes and clusters of co-expressed genes. Our results reveal a network architecture as robust as that of other model bacteria regarding multiple failures, although less robust against individual reaction inactivation. Interestingly, metabolite motifs associated to reactions can predict the propagation of inactivation cascades and damage amplification effects arising in double knockouts. We also detect a significant correlation between gene essentiality and damages produced by single gene knockouts, and find that genes controlling high-damage reactions tend to be expressed independently of each other, a functional switch mechanism that, simultaneously, acts as a genetic firewall to protect metabolism. Prediction of failure propagation is crucial for metabolic engineering or disease treatment.

  7. Processive and nonprocessive cellulases for biofuel production. Lessons from bacterial genomes and structural analysis

    Energy Technology Data Exchange (ETDEWEB)

    Wilson, David B. [Cornell Univ. Ithaca, New York, NY (United States). Dept. of Molecular Biology and Genetics

    2012-01-15

    Cellulases are key enzymes used in many processes for producing liquid fuels from biomass. Currently there many efforts to reduce the cost of cellulases using both structural approaches to improve the properties of individual cellulases and genomic approaches to identify new cellulases as well as other proteins that increase the activity of cellulases in degrading pretreated biomass materials. Fungal GH-61 proteins are important new enzymes that increase the activity of current commercial cellulases leading to lower total protein loading and thus lower cost. Recent work has greatly increased our knowledge of these novel enzymes that appear to be oxido-reductases that target crystalline cellulose and increase its accessibility to cellulases. They appear to carry out the C1 activity originally proposed by Dr Reese. Cellobiose dehydrogenase appears to interact with GH-61 proteins in this function, providing a role for this puzzling enzyme. Cellulase research is making considerable progress and appears to be poised for even greater advances. (orig.)

  8. Broad genomic and transcriptional analysis reveals a highly derived genome in dinoflagellate mitochondria

    Directory of Open Access Journals (Sweden)

    Keeling Patrick J

    2007-09-01

    Full Text Available Abstract Background Dinoflagellates comprise an ecologically significant and diverse eukaryotic phylum that is sister to the phylum containing apicomplexan endoparasites. The mitochondrial genome of apicomplexans is uniquely reduced in gene content and size, encoding only three proteins and two ribosomal RNAs (rRNAs within a highly compacted 6 kb DNA. Dinoflagellate mitochondrial genomes have been comparatively poorly studied: limited available data suggest some similarities with apicomplexan mitochondrial genomes but an even more radical type of genomic organization. Here, we investigate structure, content and expression of dinoflagellate mitochondrial genomes. Results From two dinoflagellates, Crypthecodinium cohnii and Karlodinium micrum, we generated over 42 kb of mitochondrial genomic data that indicate a reduced gene content paralleling that of mitochondrial genomes in apicomplexans, i.e., only three protein-encoding genes and at least eight conserved components of the highly fragmented large and small subunit rRNAs. Unlike in apicomplexans, dinoflagellate mitochondrial genes occur in multiple copies, often as gene fragments, and in numerous genomic contexts. Analysis of cDNAs suggests several novel aspects of dinoflagellate mitochondrial gene expression. Polycistronic transcripts were found, standard start codons are absent, and oligoadenylation occurs upstream of stop codons, resulting in the absence of termination codons. Transcripts of at least one gene, cox3, are apparently trans-spliced to generate full-length mRNAs. RNA substitutional editing, a process previously identified for mRNAs in dinoflagellate mitochondria, is also implicated in rRNA expression. Conclusion The dinoflagellate mitochondrial genome shares the same gene complement and fragmentation of rRNA genes with its apicomplexan counterpart. However, it also exhibits several unique characteristics. Most notable are the expansion of gene copy numbers and their arrangements

  9. Symbiosis and the origin of eukaryotic motility

    Science.gov (United States)

    Margulis, L.; Hinkle, G.

    1991-01-01

    Ongoing work to test the hypothesis of the origin of eukaryotic cell organelles by microbial symbioses is discussed. Because of the widespread acceptance of the serial endosymbiotic theory (SET) of the origin of plastids and mitochondria, the idea of the symbiotic origin of the centrioles and axonemes for spirochete bacteria motility symbiosis was tested. Intracellular microtubular systems are purported to derive from symbiotic associations between ancestral eukaryotic cells and motile bacteria. Four lines of approach to this problem are being pursued: (1) cloning the gene of a tubulin-like protein discovered in Spirocheata bajacaliforniesis; (2) seeking axoneme proteins in spirochets by antibody cross-reaction; (3) attempting to cultivate larger, free-living spirochetes; and (4) studying in detail spirochetes (e.g., Cristispira) symbiotic with marine animals. Other aspects of the investigation are presented.

  10. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    Francioli, Laurent C.; Menelaou, Andronild; Pulit, Sara L.; Van Dijk, Freerk; Palamara, Pier Francesco; Elbers, Clara C.; Neerincx, Pieter B. T.; Ye, Kai; Guryev, Victor; Kloosterman, Wigard P.; Deelen, Patrick; Abdellaoui, Abdel; Van Leeuwen, Elisabeth M.; Van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F. J.; Karssen, Lennart C.; Kanterakis, Alexandros; Amin, Najaf; Hottenga, Jouke Jan; Lameijer, Eric-Wubbo; Kattenberg, Mathijs; Dijkstra, Martijn; Byelas, Heorhiy; Van Settenl, Jessica; Van Schaik, Barbera D. C.; Bot, Jan; Nijman, Isaac J.; Renkens, Ivo; Marscha, Tobias; Schonhuth, Alexander; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Polak, Paz; Sohail, Mashaal; Vuzman, Dana; Hormozdiari, Fereydoun; Van Enckevort, David; Mei, Hailiang; Koval, Vyacheslav; Moed, Ma-Tthijs H.; Van der Velde, K. Joeri; Rivadeneira, Fernando; Estrada, Karol; Medina-Gomez, Carolina; Isaacs, Aaron; Platteel, Mathieu; Swertz, Morris A.; Wijmenga, Cisca

    Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring

  11. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    The Genome of the Netherlands Consortium; T. Marschall (Tobias); A. Schönhuth (Alexander)

    2014-01-01

    htmlabstractWhole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch

  12. Mathematical model of reproductive death of irradiated eukaryotic cells, which considers saturation of DNA reparation system

    International Nuclear Information System (INIS)

    Knyigavko, V.G.; Ponomarenko, N.S.; Meshcheryakova, O.P.; Protasenya, S.Yu.

    2009-01-01

    A mathematical model of the processes determining reproductive death of the exposed cells was built. The model takes into account the phenomenon of saturation of the system of DNA radiation lesion reparation and structural functional peculiarities of chromatin structure in eukaryotes. The problem of assessment of the model parameters using experimental data was discussed.

  13. Identification of transcriptional signals in Encephalitozoon cuniculi widespread among Microsporidia phylum: support for accurate structural genome annotation

    Directory of Open Access Journals (Sweden)

    Wincker Patrick

    2009-12-01

    Full Text Available Abstract Background Microsporidia are obligate intracellular eukaryotic parasites with genomes ranging in size from 2.3 Mbp to more than 20 Mbp. The extremely small (2.9 Mbp and highly compact (~1 gene/kb genome of the human parasite Encephalitozoon cuniculi has been fully sequenced. The aim of this study was to characterize noncoding motifs that could be involved in regulation of gene expression in E. cuniculi and to show whether these motifs are conserved among the phylum Microsporidia. Results To identify such signals, 5' and 3'RACE-PCR experiments were performed on different E. cuniculi mRNAs. This analysis confirmed that transcription overrun occurs in E. cuniculi and may result from stochastic recognition of the AAUAAA polyadenylation signal. Such experiments also showed highly reduced 5'UTR's (E. cuniculi genes presented a CCC-like motif immediately upstream from the coding start. To characterize other signals involved in differential transcriptional regulation, we then focused our attention on the gene family coding for ribosomal proteins. An AAATTT-like signal was identified upstream from the CCC-like motif. In rare cases the cytosine triplet was shown to be substituted by a GGG-like motif. Comparative genomic studies confirmed that these different signals are also located upstream from genes encoding ribosomal proteins in other microsporidian species including Antonospora locustae, Enterocytozoon bieneusi, Anncaliia algerae (syn. Brachiola algerae and Nosema ceranae. Based on these results a systematic analysis of the ~2000 E. cuniculi coding DNA sequences was then performed and brings to highlight that 364 translation initiation codons (18.29% of total CDSs had been badly predicted. Conclusion We identified various signals involved in the maturation of E. cuniculi mRNAs. Presence of such signals, in phylogenetically distant microsporidian species, suggests that a common regulatory mechanism exists among the microsporidia. Furthermore

  14. Enzymes from Higher Eukaryotes for Industrial Biocatalysis

    Directory of Open Access Journals (Sweden)

    Zhibin Liu

    2004-01-01

    Full Text Available The industrial production of fine chemicals, feed and food ingredients, pharmaceuticals, agrochemicals and their respective intermediates relies on an increasing application of biocatalysis, i.e. on enzyme or whole-cell catalyzed conversions of molecules. Simple procedures for discovery, cloning and over-expression as well as fast growth favour fungi, yeasts and especially bacteria as sources of biocatalysts. Higher eukaryotes also harbour an almost unlimited number of potential biocatalysts, although to date the limited supply of enzymes, the high heterogeneity of enzyme preparations and the hazard of infectious contaminants keep some interesting candidates out of reach for industrial bioprocesses. In the past only a few animal and plant enzymes from agricultural waste materials were employed in food processing. The use of bacterial expression strains or non-conventional yeasts for the heterologous production of efficient eukaryotic enzymes can overcome the bottleneck in enzyme supply and provide sufficient amounts of homogenous enzyme preparations for reliable and economically feasible applications at large scale. Ideal enzymatic processes represent an environmentally friendly, »near-to-completion« conversion of (mostly non-natural substrates to pure products. Recent developments demonstrate the commercial feasibility of large-scale biocatalytic processes employing enzymes from higher eukaryotes (e.g. plants, animals and also their usefulness in some small-scale industrial applications.

  15. Arsenic and Antimony Transporters in Eukaryotes

    Directory of Open Access Journals (Sweden)

    Ewa Maciaszczyk-Dziubinska

    2012-03-01

    Full Text Available Arsenic and antimony are toxic metalloids, naturally present in the environment and all organisms have developed pathways for their detoxification. The most effective metalloid tolerance systems in eukaryotes include downregulation of metalloid uptake, efflux out of the cell, and complexation with phytochelatin or glutathione followed by sequestration into the vacuole. Understanding of arsenic and antimony transport system is of high importance due to the increasing usage of arsenic-based drugs in the treatment of certain types of cancer and diseases caused by protozoan parasites as well as for the development of bio- and phytoremediation strategies for metalloid polluted areas. However, in contrast to prokaryotes, the knowledge about specific transporters of arsenic and antimony and the mechanisms of metalloid transport in eukaryotes has been very limited for a long time. Here, we review the recent advances in understanding of arsenic and antimony transport pathways in eukaryotes, including a dual role of aquaglyceroporins in uptake and efflux of metalloids, elucidation of arsenic transport mechanism by the yeast Acr3 transporter and its role in arsenic hyperaccumulation in ferns, identification of vacuolar transporters of arsenic-phytochelatin complexes in plants and forms of arsenic substrates recognized by mammalian ABC transporters.

  16. Arsenic and Antimony Transporters in Eukaryotes

    Science.gov (United States)

    Maciaszczyk-Dziubinska, Ewa; Wawrzycka, Donata; Wysocki, Robert

    2012-01-01

    Arsenic and antimony are toxic metalloids, naturally present in the environment and all organisms have developed pathways for their detoxification. The most effective metalloid tolerance systems in eukaryotes include downregulation of metalloid uptake, efflux out of the cell, and complexation with phytochelatin or glutathione followed by sequestration into the vacuole. Understanding of arsenic and antimony transport system is of high importance due to the increasing usage of arsenic-based drugs in the treatment of certain types of cancer and diseases caused by protozoan parasites as well as for the development of bio- and phytoremediation strategies for metalloid polluted areas. However, in contrast to prokaryotes, the knowledge about specific transporters of arsenic and antimony and the mechanisms of metalloid transport in eukaryotes has been very limited for a long time. Here, we review the recent advances in understanding of arsenic and antimony transport pathways in eukaryotes, including a dual role of aquaglyceroporins in uptake and efflux of metalloids, elucidation of arsenic transport mechanism by the yeast Acr3 transporter and its role in arsenic hyperaccumulation in ferns, identification of vacuolar transporters of arsenic-phytochelatin complexes in plants and forms of arsenic substrates recognized by mammalian ABC transporters. PMID:22489166

  17. Genome Structural Diversity among 31 Bordetella pertussis Isolates from Two Recent U.S. Whooping Cough Statewide Epidemics.

    Science.gov (United States)

    Bowden, Katherine E; Weigand, Michael R; Peng, Yanhui; Cassiday, Pamela K; Sammons, Scott; Knipe, Kristen; Rowe, Lori A; Loparev, Vladimir; Sheth, Mili; Weening, Keeley; Tondella, M Lucia; Williams, Margaret M

    2016-01-01

    During 2010 and 2012, California and Vermont, respectively, experienced statewide epidemics of pertussis with differences seen in the demographic affected, case clinical presentation, and molecular epidemiology of the circulating strains. To overcome limitations of the current molecular typing methods for pertussis, we utilized whole-genome sequencing to gain a broader understanding of how current circulating strains are causing large epidemics. Through the use of combined next-generation sequencing technologies, this study compared de novo, single-contig genome assemblies from 31 out of 33 Bordetella pertussis isolates collected during two separate pertussis statewide epidemics and 2 resequenced vaccine strains. Final genome architecture assemblies were verified with whole-genome optical mapping. Sixteen distinct genome rearrangement profiles were observed in epidemic isolate genomes, all of which were distinct from the genome structures of the two resequenced vaccine strains. These rearrangements appear to be mediated by repetitive sequence elements, such as high-copy-number mobile genetic elements and rRNA operons. Additionally, novel and previously identified single nucleotide polymorphisms were detected in 10 virulence-related genes in the epidemic isolates. Whole-genome variation analysis identified state-specific variants, and coding regions bearing nonsynonymous mutations were classified into functional annotated orthologous groups. Comprehensive studies on whole genomes are needed to understand the resurgence of pertussis and develop novel tools to better characterize the molecular epidemiology of evolving B. pertussis populations. IMPORTANCE Pertussis, or whooping cough, is the most poorly controlled vaccine-preventable bacterial disease in the United States, which has experienced a resurgence for more than a decade. Once viewed as a monomorphic pathogen, B. pertussis strains circulating during epidemics exhibit diversity visible on a genome structural

  18. Brain Genomics Superstruct Project initial data release with structural, functional, and behavioral measures.

    Science.gov (United States)

    Holmes, Avram J; Hollinshead, Marisa O; O'Keefe, Timothy M; Petrov, Victor I; Fariello, Gabriele R; Wald, Lawrence L; Fischl, Bruce; Rosen, Bruce R; Mair, Ross W; Roffman, Joshua L; Smoller, Jordan W; Buckner, Randy L

    2015-01-01

    The goal of the Brain Genomics Superstruct Project (GSP) is to enable large-scale exploration of the links between brain function, behavior, and ultimately genetic variation. To provide the broader scientific community data to probe these associations, a repository of structural and functional magnetic resonance imaging (MRI) scans linked to genetic information was constructed from a sample of healthy individuals. The initial release, detailed in the present manuscript, encompasses quality screened cross-sectional data from 1,570 participants ages 18 to 35 years who were scanned with MRI and completed demographic and health questionnaires. Personality and cognitive measures were obtained on a subset of participants. Each dataset contains a T1-weighted structural MRI scan and either one (n=1,570) or two (n=1,139) resting state functional MRI scans. Test-retest reliability datasets are included from 69 participants scanned within six months of their initial visit. For the majority of participants self-report behavioral and cognitive measures are included (n=926 and n=892 respectively). Analyses of data quality, structure, function, personality, and cognition are presented to demonstrate the dataset's utility.

  19. Distinct Mechanisms of Nuclease-Directed DNA-Structure-Induced Genetic Instability in Cancer Genomes.

    Science.gov (United States)

    Zhao, Junhua; Wang, Guliang; Del Mundo, Imee M; McKinney, Jennifer A; Lu, Xiuli; Bacolla, Albino; Boulware, Stephen B; Zhang, Changsheng; Zhang, Haihua; Ren, Pengyu; Freudenreich, Catherine H; Vasquez, Karen M

    2018-01-30

    Sequences with the capacity to adopt alternative DNA structures have been implicated in cancer etiology; however, the mechanisms are unclear. For example, H-DNA-forming sequences within oncogenes have been shown to stimulate genetic instability in mammals. Here, we report that H-DNA-forming sequences are enriched at translocation breakpoints in human cancer genomes, further implicating them in cancer etiology. H-DNA-induced mutations were suppressed in human cells deficient in the nucleotide excision repair nucleases, ERCC1-XPF and XPG, but were stimulated in cells deficient in FEN1, a replication-related endonuclease. Further, we found that these nucleases cleaved H-DNA conformations, and the interactions of modeled H-DNA with ERCC1-XPF, XPG, and FEN1 proteins were explored at the sub-molecular level. The results suggest mechanisms of genetic instability triggered by H-DNA through distinct structure-specific, cleavage-based replication-independent and replication-dependent pathways, providing critical evidence for a role of the DNA structure itself in the etiology of cancer and other human diseases. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  20. Gene name ambiguity of eukaryotic nomenclatures.

    Science.gov (United States)

    Chen, Lifeng; Liu, Hongfang; Friedman, Carol

    2005-01-15

    With more and more scientific literature published online, the effective management and reuse of this knowledge has become problematic. Natural language processing (NLP) may be a potential solution by extracting, structuring and organizing biomedical information in online literature in a timely manner. One essential task is to recognize and identify genomic entities in text. 'Recognition' can be accomplished using pattern matching and machine learning. But for 'identification' these techniques are not adequate. In order to identify genomic entities, NLP needs a comprehensive resource that specifies and classifies genomic entities as they occur in text and that associates them with normalized terms and also unique identifiers so that the extracted entities are well defined. Online organism databases are an excellent resource to create such a lexical resource. However, gene name ambiguity is a serious problem because it affects the appropriate identification of gene entities. In this paper, we explore the extent of the problem and suggest ways to address it. We obtained gene information from 21 organisms and quantified naming ambiguities within species, across species, with English words and with medical terms. When the case (of letters) was retained, official symbols displayed negligible intra-species ambiguity (0.02%) and modest ambiguities with general English words (0.57%) and medical terms (1.01%). In contrast, the across-species ambiguity was high (14.20%). The inclusion of gene synonyms increased intra-species ambiguity substantially and full names contributed greatly to gene-medical-term ambiguity. A comprehensive lexical resource that covers gene information for the 21 organisms was then created and used to identify gene names by using a straightforward string matching program to process 45,000 abstracts associated with the mouse model organism while ignoring case and gene names that were also English words. We found that 85.1% of correctly retrieved mouse

  1. Eukaryotic RNA polymerase subunit RPB8 is a new relative of the OB family.

    Science.gov (United States)

    Krapp, S; Kelly, G; Reischl, J; Weinzierl, R O; Matthews, S

    1998-02-01

    RNA polymerase II subunit RPB8 is an essential subunit that is highly conserved throughout eukaryotic evolution and is present in all three types of nuclear RNA polymerases. We report the first high resolution structural insight into eukaryotic RNA polymerase architecture with the solution structure of RPB8 from Saccharomyces cerevisiae. It consists of an eight stranded, antiparallel beta-barrel, four short helical regions and a large, unstructured omega-loop. The strands are connected in classic Greek-key fashion. The overall topology is unusual and contains a striking C2 rotational symmetry. Furthermore, it is most likely a novel associate of the oligonucleotide/oligosaccharide (OB) binding protein class.

  2. Synthesis of eukaryotic lipid biomarkers in the bacterial domain

    Science.gov (United States)

    Welander, P. V.; Banta, A. B.; Lee, A. K.; Wei, J. H.

    2017-12-01

    Lipid biomarkers are organic molecules preserved in sediments and sedimentary rocks that can function as geological proxies for certain microbial taxa or for specific environmental conditions. These molecular fossils provide a link between organisms and their environments in both modern and ancient settings and have afforded significant insight into ancient climatic events, mass extinctions, and various evolutionary transitions throughout Earth's history. However, the proper interpretation of lipid biomarkers is dependent on a broad understanding of their diagenetic precursors in modern systems. This includes understanding the taphonomic transformations that these molecules undergo, their biosynthetic pathways, and the ecological conditions that affect their cellular production. In this study, we focus on one group of lipid biomarkers - the sterols. These are polycyclic isoprenoidal lipids that have a high preservation potential and play a critical role in the physiology of most eukaryotes. However, the synthesis and function of these lipids in the bacterial domain has not been fully explored. Here we utilize a combination of bioinformatics, microbial genetics, and biochemistry to demonstrate that bacterial sterol producers are more prevalent in environmental metagenomic samples than in the genomic databases of cultured organisms and to identify novel proteins required to synthesize and modify sterols in bacteria. These proteins represent a distinct pathway for sterol synthesis exclusive to bacteria and indicate that sterol synthesis in bacteria may have evolved independently of eukaryotic sterol biosynthesis. Taken together, these results demonstrate how studies in extant bacteria can provide insight into the biological sources and the biosynthetic pathways of specific lipid biomarkers and in turn may allow for more robust interpretation of biomarker signatures.

  3. XTHs from Fragaria vesca: genomic structure and transcriptomic analysis in ripening fruit and other tissues.

    Science.gov (United States)

    Opazo, María Cecilia; Lizana, Rodrigo; Stappung, Yazmina; Davis, Thomas M; Herrera, Raúl; Moya-León, María Alejandra

    2017-11-07

    Fragaria vesca or 'woodland strawberry' has emerged as an attractive model for the study of ripening of non-climacteric fruit. It has several advantages, such as its small genome and its diploidy. The recent availability of the complete sequence of its genome opens the possibility for further analysis and its use as a reference species. Fruit softening is a physiological event and involves many biochemical changes that take place at the final stages of fruit development; among them, the remodeling of cell walls by the action of a set of enzymes. Xyloglucan endotransglycosylase/hydrolase (XTH) is a cell wall-associated enzyme, which is encoded by a multigene family. Its action modifies the structure of xyloglucans, a diverse group of polysaccharides that crosslink with cellulose microfibrills, affecting therefore the functional structure of the cell wall. The aim of this work is to identify the XTH-encoding genes present in F. vesca and to determine its transcription level in ripening fruit. The search resulted in identification of 26 XTH-encoding genes named as FvXTHs. Genetic structure and phylogenetic analyses were performed allowing the classification of FvXTH genes into three phylogenetic groups: 17 in group I/II, 2 in group IIIA and 4 in group IIIB. Two sequences were included into the ancestral group. Through a comparative analysis, characteristic structural protein domains were found in FvXTH protein sequences. In complement, expression analyses of FvXTHs by qPCR were performed in fruit at different developmental and ripening stages, as well as, in other tissues. The results showed a diverse expression pattern of FvXTHs in several tissues, although most of them are highly expressed in roots. Their expression patterns are not related to their respective phylogenetic groups. In addition, most FvXTHs are expressed in ripe fruit, and interestingly, some of them (FvXTH 18 and 20, belonging to phylogenic group I/II, and FvXTH 25 and 26 to group IIIB) display an

  4. insights from the genome of Melitaea cinxia

    OpenAIRE

    Ahola, Virpi; Wahlberg, Niklas; Frilander, Mikko J.

    2017-01-01

    The first lepidopteran genome (Bombyx mori) was published in 2004. Ten years later the genome of Melitaea cinxia came out as the third butterfly genome published, and the first eukaryotic genome sequenced in Finland. Owing to Ilkka Hanski, the M. cinxia system in the angstrom land Islands has become a famous model for metapopulation biology. More than 20 years of research on this system provides a strong ecological basis upon which a genetic framework could be built. Genetic knowledge is an e...

  5. Genomic Organization of Zebrafish microRNAs

    Directory of Open Access Journals (Sweden)

    Paydar Ima

    2008-05-01

    Full Text Available Abstract Background microRNAs (miRNAs are small (~22 nt non-coding RNAs that regulate cell movement, specification, and development. Expression of miRNAs is highly regulated, both spatially and temporally. Based on direct cloning, sequence conservation, and predicted secondary structures, a large number of miRNAs have been identified in higher eukaryotic genomes but whether these RNAs are simply a subset of a much larger number of noncoding RNA families is unknown. This is especially true in zebrafish where genome sequencing and annotation is not yet complete. Results We analyzed the zebrafish genome to identify the number and location of proven and predicted miRNAs resulting in the identification of 35 new miRNAs. We then grouped all 415 zebrafish miRNAs into families based on seed sequence identity as a means to identify possible functional redundancy. Based on genomic location and expression analysis, we also identified those miRNAs that are likely to be encoded as part of polycistronic transcripts. Lastly, as a resource, we compiled existing zebrafish miRNA expression data and, where possible, listed all experimentally proven mRNA targets. Conclusion Current analysis indicates the zebrafish genome encodes 415 miRNAs which can be grouped into 44 families. The largest of these families (the miR-430 family contains 72 members largely clustered in two main locations along chromosome 4. Thus far, most zebrafish miRNAs exhibit tissue specific patterns of expression.

  6. Comparisons of Copy Number, Genomic Structure, and Conserved Motifs for α-Amylase Genes from Barley, Rice, and Wheat

    Directory of Open Access Journals (Sweden)

    Qisen Zhang

    2017-10-01

    Full Text Available Barley is an important crop for the production of malt and beer. However, crops such as rice and wheat are rarely used for malting. α-amylase is the key enzyme that degrades starch during malting. In this study, we compared the genomic properties, gene copies, and conserved promoter motifs of α-amylase genes in barley, rice, and wheat. In all three crops, α-amylase consists of four subfamilies designated amy1, amy2, amy3, and amy4. In wheat and barley, members of amy1 and amy2 genes are localized on chromosomes 6 and 7, respectively. In rice, members of amy1 genes are found on chromosomes 1 and 2, and amy2 genes on chromosome 6. The barley genome has six amy1 members and three amy2 members. The wheat B genome contains four amy1 members and three amy2 members, while the rice genome has three amy1 members and one amy2 member. The B genome has mostly amy1 and amy2 members among the three wheat genomes. Amy1 promoters from all three crop genomes contain a GA-responsive complex consisting of a GA-responsive element (CAATAAA, pyrimidine box (CCTTTT and TATCCAT/C box. This study has shown that amy1 and amy2 from both wheat and barley have similar genomic properties, including exon/intron structures and GA-responsive elements on promoters, but these differ in rice. Like barley, wheat should have sufficient amy activity to degrade starch completely during malting. Other factors, such as high protein with haze issues and the lack of husk causing Lauting difficulty, may limit the use of wheat for brewing.

  7. Are maternal mitochondria the selfish entities that are masters of the cells of eukaryotic multicellular organisms?