WorldWideScience

Sample records for eukaryotic structural genomics

  1. The Center for Eukaryotic Structural Genomics.

    Science.gov (United States)

    Markley, John L; Aceti, David J; Bingman, Craig A; Fox, Brian G; Frederick, Ronnie O; Makino, Shin-ichi; Nichols, Karl W; Phillips, George N; Primm, John G; Sahu, Sarata C; Vojtik, Frank C; Volkman, Brian F; Wrobel, Russell L; Zolnai, Zsolt

    2009-04-01

    The Center for Eukaryotic Structural Genomics (CESG) is a "specialized" or "technology development" center supported by the Protein Structure Initiative (PSI). CESG's mission is to develop improved methods for the high-throughput solution of structures from eukaryotic proteins, with a very strong weighting toward human proteins of biomedical relevance. During the first three years of PSI-2, CESG selected targets representing 601 proteins from Homo sapiens, 33 from mouse, 10 from rat, 139 from Galdieria sulphuraria, 35 from Arabidopsis thaliana, 96 from Cyanidioschyzon merolae, 80 from Plasmodium falciparum, 24 from yeast, and about 25 from other eukaryotes. Notably, 30% of all structures of human proteins solved by the PSI Centers were determined at CESG. Whereas eukaryotic proteins generally are considered to be much more challenging targets than prokaryotic proteins, the technology now in place at CESG yields success rates that are comparable to those of the large production centers that work primarily on prokaryotic proteins. We describe here the technological innovations that underlie CESG's platforms for bioinformatics and laboratory information management, target selection, protein production, and structure determination by X-ray crystallography or NMR spectroscopy.

  2. Structural genomics of eukaryotic targets at a laboratory scale.

    Science.gov (United States)

    Busso, Didier; Poussin-Courmontagne, Pierre; Rosé, David; Ripp, Raymond; Litt, Alain; Thierry, Jean-Claude; Moras, Dino

    2005-01-01

    Structural genomics programs are distributed worldwide and funded by large institutions such as the NIH in the United States, RIKEN in Japan, or the European Commission through the SPINE network in Europe. Such initiatives, essentially managed by large consortia, led to technology and method developments at the different steps required to produce biological samples compatible with structural studies. Besides specific applications, method developments resulted mainly from miniaturization and parallelization. The challenge that academic laboratories face in pursuing structural genomics programs is to produce protein samples at a higher rate. The Structural Biology and Genomics Department (IGBMC, Illkirch, France) is involved in a structural genomics program on higher eukaryotes whose goal is to solve crystal structures of proteins and their complexes (including large complexes) related to human health and biotechnology. To achieve such a challenging goal, the Department has established a medium-throughput pipeline for producing protein samples suitable for structural biology studies. Here, we describe the setting up of our initiative from cloning to crystallization and we demonstrate that structural genomics may be manageable by academic laboratories through strategic investments in robotics and by adapting classical bench protocols and new developments, in particular in the field of protein expression, to parallelization.

  3. Selecting targets from eukaryotic parasites for structural genomics and drug discovery.

    Science.gov (United States)

    Phan, Isabelle Q H; Stacy, Robin; Myler, Peter J

    2014-01-01

    The selection of targets is the first step for any structural genomics project. The application of structural genomics approaches to drug discovery also starts with the selection of targets. Here, three protocols are described that were developed to select targets from eukaryotic pathogens. These protocols could also be applied to other drug discovery projects.

  4. Selecting targets from eukaryotic parasites for structural genomics and drug discovery

    Science.gov (United States)

    Phan, Isabelle Q. H.; Stacy, Robin; Myler, Peter J.

    2015-01-01

    The selection of targets is the first step for any structural genomics project. The application of structural genomics approaches to drug discovery also starts with the selection of targets. Here, three protocols are described that were developed to select targets from eukaryotic pathogens. These protocols could also be applied to other drug discovery projects. PMID:24590708

  5. Comparative genomics of Eukaryotes

    NARCIS (Netherlands)

    Noort, Vera van

    2007-01-01

    This thesis focuses on developing comparative genomics methods in eukaryotes, with an emphasis on applications for gene function prediction and regulatory element detection. In the past, methods have been developed to predict functional associations between gene pairs in prokaryotes. The challenge

  6. Diversity of eukaryotic DNA replication origins revealed by genome-wide analysis of chromatin structure.

    Directory of Open Access Journals (Sweden)

    Nicolas M Berbenetz

    2010-09-01

    Eukaryotic DNA replication origins differ both in their efficiency and in the characteristic time during S phase when they become active. The biological basis for these differences remains unknown, but they could be a consequence of chromatin structure. The availability of genome-wide maps of nucleosome positions has led to an explosion of information about how nucleosomes are assembled at transcription start sites, but no similar maps exist for DNA replication origins. Here we combine high-resolution genome-wide nucleosome maps with comprehensive annotations of DNA replication origins to identify patterns of nucleosome occupancy at eukaryotic replication origins. On average, replication origins contain a nucleosome depleted region centered next to the ACS element, flanked on both sides by arrays of well-positioned nucleosomes. Our analysis identified DNA sequence properties that correlate with nucleosome occupancy at replication origins genome-wide and that are correlated with the nucleosome-depleted region. Clustering analysis of all annotated replication origins revealed a surprising diversity of nucleosome occupancy patterns. We provide evidence that the origin recognition complex, which binds to the origin, acts as a barrier element to position and phase nucleosomes on both sides of the origin. Finally, analysis of chromatin reconstituted in vitro reveals that origins are inherently nucleosome depleted. Together our data provide a comprehensive, genome-wide view of chromatin structure at replication origins and suggest a model of nucleosome positioning at replication origins in which the underlying sequence occludes nucleosomes to permit binding of the origin recognition complex, which then (likely in concert with nucleosome modifiers and remodelers) positions nucleosomes adjacent to the origin to promote replication origin function.

  7. Insight into structure and assembly of the nuclear pore complex by utilizing the genome of a eukaryotic thermophile

    DEFF Research Database (Denmark)

    Amlacher, Stefan; Sarges, Phillip; Flemming, Dirk

    2011-01-01

    Despite decades of research, the structure and assembly of the nuclear pore complex (NPC), which is composed of ~30 nucleoporins (Nups), remain elusive. Here, we report the genome of the thermophilic fungus Chaetomium thermophilum (ct) and identify the complete repertoire of Nups therein. The the...... of a thermophilic eukaryote for studying complex molecular machines....

  8. Structural disorder in eukaryotes.

    Directory of Open Access Journals (Sweden)

    Rita Pancsa

    Based on early bioinformatic studies on a handful of species, the frequency of structural disorder of proteins is generally thought to be much higher in eukaryotes than in prokaryotes. To refine this view, we present here a comparative prediction study and analysis of 194 fully described eukaryotic proteomes and 87 reference prokaryotes for structural disorder. We found that structural disorder does distinguish eukaryotes from prokaryotes, but its frequency spans a very wide range in the two superkingdoms, and these ranges largely overlap. The number of disordered binding regions and of different Pfam domain types also helps to distinguish eukaryotes from prokaryotes. Unexpectedly, the highest levels, and the highest variability, of predicted disorder are found in protists, i.e. single-celled eukaryotes, often surpassing more complex eukaryotic organisms, plants and animals. This trend contrasts with that of the number of domain types, which increases rather monotonically toward more complex organisms. The level of structural disorder appears to be strongly correlated with lifestyle, because some obligate intracellular parasites and endosymbionts have the lowest levels, whereas host-changing parasites have the highest level of predicted disorder. We conclude that protists have been the evolutionary hot-bed of experimentation with structural disorder, in a period when structural disorder was actively invented and the major functional classes of disordered proteins established.

  9. Comparative genomics and evolution of eukaryotic phospholipid biosynthesis

    Energy Technology Data Exchange (ETDEWEB)

    Lykidis, Athanasios

    2006-12-01

    Phospholipid biosynthetic enzymes produce diverse molecular structures and are often present in multiple forms encoded by different genes. This work utilizes comparative genomics and phylogenetics for exploring the distribution, structure and evolution of phospholipid biosynthetic genes and pathways in 26 eukaryotic genomes. Although the basic structure of the pathways was formed early in eukaryotic evolution, the emerging picture indicates that individual enzyme families followed unique evolutionary courses. For example, choline and ethanolamine kinases and cytidylyltransferases emerged in ancestral eukaryotes, whereas multiple forms of the corresponding phosphatidyltransferases evolved mainly in a lineage-specific manner. Furthermore, several unicellular eukaryotes maintain bacterial-type enzymes and reactions for the synthesis of phosphatidylglycerol and cardiolipin. Also, base-exchange phosphatidylserine synthases are widespread and ancestral enzymes. The multiplicity of phospholipid biosynthetic enzymes has been largely generated by gene expansion in a lineage-specific manner. Thus, these observations suggest that phospholipid biosynthesis has been an actively evolving system. Finally, comparative genomic analysis indicates the existence of novel phosphatidyltransferases and provides a candidate for the uncharacterized eukaryotic phosphatidylglycerol phosphate phosphatase.

  10. The Future of Multiplexed Eukaryotic Genome Engineering.

    Science.gov (United States)

    Thompson, David B; Aboulhouda, Soufiane; Hysolli, Eriona; Smith, Cory J; Wang, Stan; Castanon, Oscar; Church, George M

    2017-12-28

    Multiplex genome editing is the simultaneous introduction of multiple distinct modifications to a given genome. Though in its infancy, maturation of this field will facilitate powerful new biomedical research approaches and will enable a host of far-reaching biological engineering applications, including new therapeutic modalities and industrial applications, as well as "genome writing" and de-extinction efforts. In this Perspective, we focus on multiplex editing of large eukaryotic genomes. We describe the current state of multiplexed genome editing, the current limits of our ability to multiplex edits, and provide perspective on the many applications that fully realized multiplex editing technologies would enable in higher eukaryotic genomes. We offer a broad look at future directions, covering emergent CRISPR-based technologies, advances in intracellular delivery, and new DNA assembly approaches that may enable future genome editing on a massively multiplexed scale.

  11. What's in a genome? The C-value enigma and the evolution of eukaryotic genome content.

    Science.gov (United States)

    Elliott, Tyler A; Gregory, T Ryan

    2015-09-26

    Some notable exceptions aside, eukaryotic genomes are distinguished from those of Bacteria and Archaea in a number of ways, including chromosome structure and number, repetitive DNA content, and the presence of introns in protein-coding regions. One of the most notable differences between eukaryotic and prokaryotic genomes is in size. Unlike their prokaryotic counterparts, eukaryotes exhibit enormous (more than 60,000-fold) variability in genome size which is not explained by differences in gene number. Genome size is known to correlate with cell size and division rate, and by extension with numerous organism-level traits such as metabolism, developmental rate or body size. Less well described are the relationships between genome size and other properties of the genome, such as gene content, transposable element content, base pair composition and related features. The rapid expansion of 'complete' genome sequencing projects has, for the first time, made it possible to examine these relationships across a wide range of eukaryotes in order to shed new light on the causes and correlates of genome size diversity. This study presents the results of phylogenetically informed comparisons of genome data for more than 500 species of eukaryotes. Several relationships are described between genome size and other genomic parameters, and some recommendations are presented for how these insights can be extended even more broadly in the future. © 2015 The Author(s).

  12. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. About 12% of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9% rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90% of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  13. Structure and function of eukaryotic chromosomes

    Energy Technology Data Exchange (ETDEWEB)

    Hennig, W.

    1987-01-01

    Contents: Introduction; Polytene Chromosomes; Giant Chromosomes in Ciliates; The sp-I Genes in the Balbiani Rings of Chironomus Salivary Glands; The White Locus of Drosophila melanogaster; The Genetic and Molecular Organization of the Dense Cluster of Functionally Related Vital Genes in the DOPA Decarboxylase Region of the Drosophila melanogaster Genome; Heat Shock Puffs and Response to Environmental Stress; The Y Chromosomal Lampbrush Loops of Drosophila; Contributions of Electron Microscopic Spreading Preparations ("Miller Spreads") to the Analysis of Chromosome Structure; Replication of DNA in Eukaryotic Chromosomes; Gene Amplification in Dipteran Chromosomes; The Significance of Plant Transposable Elements in Biologically Relevant Processes; Arrangement of Chromosomes in Interphase Cell Nuclei; Heterochromatin and the Phenomenon of Chromosome Banding; Multiple Nonhistone Protein-DNA Complexes in Chromatin Regulate the Cell- and Stage-Specific Activity of an Eukaryotic Gene; Genetics of Sex Determination in Eukaryotes; Application of Basic Chromosome Research in Biotechnology and Medicine. This book presents an overview of various aspects of chromosome research.

  14. The Genome of Naegleria gruberi Illuminates Early Eukaryotic Versatility

    Energy Technology Data Exchange (ETDEWEB)

    Fritz-Laylin, Lillian K.; Prochnik, Simon E.; Ginger, Michael L.; Dacks, Joel; Carpenter, Meredith L.; Field, Mark C.; Kuo, Alan; Paredez, Alex; Chapman, Jarrod; Pham, Jonathan; Shu, Shengqiang; Neupane, Rochak; Cipriano, Michael; Mancuso, Joel; Tu, Hank; Salamov, Asaf; Lindquist, Erika; Shapiro, Harris; Lucas, Susan; Grigoriev, Igor V.; Cande, W. Zacheus; Fulton, Chandler; Rokhsar, Daniel S.; Dawson, Scott C.

    2010-03-01

    Genome sequences of diverse free-living protists are essential for understanding eukaryotic evolution and molecular and cell biology. The free-living amoeboflagellate Naegleria gruberi belongs to a varied and ubiquitous protist clade (Heterolobosea) that diverged from other eukaryotic lineages over a billion years ago. Analysis of the 15,727 protein-coding genes encoded by Naegleria's 41 Mb nuclear genome indicates a capacity for both aerobic respiration and anaerobic metabolism with concomitant hydrogen production, with fundamental implications for the evolution of organelle metabolism. The Naegleria genome facilitates substantially broader phylogenomic comparisons of free-living eukaryotes than previously possible, allowing us to identify thousands of genes likely present in the pan-eukaryotic ancestor, with 40% likely eukaryotic inventions. Moreover, we construct a comprehensive catalog of amoeboid-motility genes. The Naegleria genome, analyzed in the context of other protists, reveals a remarkably complex ancestral eukaryote with a rich repertoire of cytoskeletal, sexual, signaling, and metabolic modules.

  15. NMR screening and crystal quality of bacterially expressed prokaryotic and eukaryotic proteins in a structural genomics pipeline

    Science.gov (United States)

    Page, Rebecca; Peti, Wolfgang; Wilson, Ian A.; Stevens, Raymond C.; Wüthrich, Kurt

    2005-01-01

    In the Joint Center for Structural Genomics, one-dimensional (1D) 1H NMR spectroscopy is routinely used to characterize the folded state of protein targets and, thus, serves to guide subsequent crystallization efforts and to identify proteins for NMR structure determination. Here, we describe 1D 1H NMR screening of a group of 79 mouse homologue proteins, which correlates the NMR data with the outcome of subsequent crystallization experiments and crystallographic structure determination. Based on the 1D 1H NMR spectra, the proteins are classified into four groups, “A” to “D.” A-type proteins are candidates for structure determination by NMR or crystallography; “B”-type are earmarked for crystallography; “C” indicates folded globular proteins with broadened line shapes; and “D” are nonglobular, “unfolded” polypeptides. The results obtained from coarse- and fine-screen crystallization trials imply that only A- and B-type proteins should be used for extensive crystallization trials in the future, with C and D proteins subjected only to coarse-screen crystallization trials. Of the presently studied 79 soluble protein targets, 63% yielded A- or B-quality 1D 1H NMR spectra. Although similar yields of crystallization hits were obtained for all four groups, A to D, crystals from A- and B-type proteins diffracted on average to significantly higher resolution than crystals produced from C- or D-type proteins. Furthermore, the output of refined crystal structures from this test set of proteins was 4-fold higher for A- and B-type than for C- and D-type proteins. PMID:15677718
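The triage rule this abstract recommends (full crystallization effort for A- and B-type proteins, coarse screens only for C and D) can be written down as a small lookup. The following Python sketch is illustrative only: the names `NmrGrade`, `crystallization_screens`, and `nmr_structure_candidate` are hypothetical and do not come from the Joint Center for Structural Genomics pipeline.

```python
from enum import Enum
from typing import Tuple


class NmrGrade(Enum):
    """Quality classes assigned from 1D 1H NMR spectra, as described in the abstract."""
    A = "A"  # candidate for structure determination by NMR or crystallography
    B = "B"  # earmarked for crystallography
    C = "C"  # folded globular protein with broadened line shapes
    D = "D"  # nonglobular, "unfolded" polypeptide


def crystallization_screens(grade: NmrGrade) -> Tuple[str, ...]:
    """Return the screens suggested for a grade (hypothetical helper).

    A and B proteins go through coarse and fine screens; C and D proteins
    are limited to coarse screens, following the abstract's recommendation.
    """
    if grade in (NmrGrade.A, NmrGrade.B):
        return ("coarse", "fine")
    return ("coarse",)


def nmr_structure_candidate(grade: NmrGrade) -> bool:
    """Only A-type proteins are described as candidates for NMR structure determination."""
    return grade is NmrGrade.A


if __name__ == "__main__":
    for grade in NmrGrade:
        print(grade.value, crystallization_screens(grade), nmr_structure_candidate(grade))
```

Encoding the grades as an enum keeps the four classes closed, so a mistyped grade fails at lookup rather than silently falling through to the coarse-only branch.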

  16. diArk – a resource for eukaryotic genome research

    Directory of Open Access Journals (Sweden)

    Kollmar Martin

    2007-04-01

    Background: The number of completed eukaryotic genome sequences and cDNA projects has increased exponentially in the past few years, although most of them have not been published yet. In addition, many microarray analyses yielded thousands of sequenced EST and cDNA clones. For the researcher interested in single gene analyses (from a phylogenetic, a structural biology or other perspective) it is therefore important to have up-to-date knowledge about the various resources providing primary data. Description: The database is built around 3 central tables: species, sequencing projects and publications. The species table contains commonly and alternatively used scientific names, common names and the complete taxonomic information. For projects the sequence type and links to species project web-sites and species homepages are stored. All publications are linked to projects. The web-interface provides comprehensive search modules with detailed options and three different views of the selected data. We have especially focused on developing an elaborate taxonomic tree search tool that allows the user to instantaneously identify e.g. the closest relative to the organism of interest. Conclusion: We have developed a database, called diArk, to store, organize, and present the most relevant information about completed genome projects and EST/cDNA data from eukaryotes. Currently, diArk provides information about 415 eukaryotes, 823 sequencing projects, and 248 publications.
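As a rough illustration of the three-table layout described in this abstract (species, sequencing projects, publications, with publications linked to projects), the Python sketch below builds a toy in-memory SQLite database. It is not diArk's actual schema: every table name, column name, and row is an assumption made for the example, and the species name is a deliberate placeholder.

```python
import sqlite3

# Hypothetical schema: three central tables, publications linked to projects.
SCHEMA = """
CREATE TABLE species (
    id              INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL,
    common_name     TEXT,
    taxonomy        TEXT
);
CREATE TABLE project (
    id            INTEGER PRIMARY KEY,
    species_id    INTEGER NOT NULL REFERENCES species(id),
    sequence_type TEXT NOT NULL,   -- e.g. 'genome' or 'EST/cDNA'
    website       TEXT
);
CREATE TABLE publication (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    title      TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the toy schema and load placeholder rows (not real diArk data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO species VALUES (1, 'Examplea demonstrans', 'demo organism', 'Eukaryota')"
    )
    conn.execute("INSERT INTO project VALUES (1, 1, 'genome', NULL)")
    conn.execute("INSERT INTO project VALUES (2, 1, 'EST/cDNA', NULL)")
    conn.execute("INSERT INTO publication VALUES (1, 1, 'Placeholder genome paper')")
    conn.commit()
    return conn


def projects_for(conn: sqlite3.Connection, scientific_name: str):
    """List (sequence_type, publication_count) for every project of one species."""
    query = """
        SELECT p.sequence_type, COUNT(pub.id)
        FROM species AS s
        JOIN project AS p ON p.species_id = s.id
        LEFT JOIN publication AS pub ON pub.project_id = p.id
        WHERE s.scientific_name = ?
        GROUP BY p.id
        ORDER BY p.id
    """
    return conn.execute(query, (scientific_name,)).fetchall()


if __name__ == "__main__":
    print(projects_for(build_demo_db(), "Examplea demonstrans"))
```

The `LEFT JOIN` keeps projects that have no publication yet, which matters given the abstract's remark that most sequencing projects had not been published.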

  17. Single Cell Genomics and Transcriptomics for Unicellular Eukaryotes

    Energy Technology Data Exchange (ETDEWEB)

    Ciobanu, Doina; Clum, Alicia; Singh, Vasanth; Salamov, Asaf; Han, James; Copeland, Alex; Grigoriev, Igor; James, Timothy; Singer, Steven; Woyke, Tanja; Malmstrom, Rex; Cheng, Jan-Fang

    2014-03-14

    Despite their small size, unicellular eukaryotes have complex genomes with a high degree of plasticity that allows them to adapt quickly to environmental changes. Unicellular eukaryotes live with prokaryotes and higher eukaryotes, frequently in symbiotic or parasitic niches. To this day their contribution to the dynamics of environmental communities remains to be understood. Unfortunately, the vast majority of eukaryotic microorganisms are either uncultured or unculturable, making genome sequencing impossible using traditional approaches. We have developed an approach to isolate unicellular eukaryotes of interest from environmental samples, and to sequence and analyze their genomes and transcriptomes. We have tested our methods with six species: an uncharacterized protist from cellulose-enriched compost identified as Platyophrya, a close relative of P. vorax; the fungus Metschnikowia bicuspidata, a parasite of the water flea Daphnia; the mycoparasitic fungus Piptocephalis cylindrospora, a parasite of Cokeromyces and Mucor; Caulochytrium protosteloides, a parasite of Sordaria; Rozella allomycis, a parasite of the water mold Allomyces; and the microalga Chlamydomonas reinhardtii. Here, we present the four components of our approach: pre-sequencing methods, sequence analysis for single cell genome assembly, sequence analysis of single cell transcriptomes, and genome annotation. This technology has the potential to uncover the complexity of single cell eukaryotes and their role in environmental samples.

  18. Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes.

    Directory of Open Access Journals (Sweden)

    Yubo Hou

    The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y') versus log10-transformed genome size (X', genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y' = ln(-46.200+22.678X'), whereas non-eukaryotes a linear model, Y' = 0.045+0.977X', both with high significance (p<0.001, R2>0.91). Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3x10^6 kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245x10^6 kbp) 87,688 protein-coding (92,013 total) genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.
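The two regression models quoted in this abstract can be evaluated directly. The Python sketch below assumes that X' and Y' are base-10 logarithms (as the abstract states) and that "ln" denotes the natural logarithm; the function names are illustrative. It uses the rounded coefficients exactly as printed, so its outputs fall in the same range as the abstract's dinoflagellate estimates but should not be expected to reproduce them digit for digit.

```python
import math


def eukaryote_protein_coding_genes(genome_size_kbp: float) -> float:
    """Logarithmic model from the abstract: Y' = ln(-46.200 + 22.678 * X').

    X' is log10(genome size in kbp); Y' is log10(protein-coding gene number).
    """
    x = math.log10(genome_size_kbp)
    inner = -46.200 + 22.678 * x
    if inner <= 0:
        # The logarithm is undefined for very small genomes (below roughly 110 kbp).
        raise ValueError("genome size is outside the domain of the model")
    y = math.log(inner)  # natural logarithm, as written in the abstract
    return 10 ** y       # undo the log10 transform of the gene number


def non_eukaryote_protein_coding_genes(genome_size_kbp: float) -> float:
    """Linear model from the abstract: Y' = 0.045 + 0.977 * X'."""
    x = math.log10(genome_size_kbp)
    return 10 ** (0.045 + 0.977 * x)


if __name__ == "__main__":
    # Smallest and largest dinoflagellate genome sizes quoted in the abstract.
    for size_kbp in (3e6, 245e6):
        genes = eukaryote_protein_coding_genes(size_kbp)
        print(f"{size_kbp:.3g} kbp -> about {genes:,.0f} protein-coding genes")
```

Because the eukaryotic model takes a logarithm of an already log-transformed quantity, predicted gene number grows very slowly with genome size, which is the flattening the abstract attributes to falling gene-coding percentages.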

  19. An Evolutionary Network of Genes Present in the Eukaryote Common Ancestor Polls Genomes on Eukaryotic and Mitochondrial Origin

    Science.gov (United States)

    Thiergart, Thorsten; Landan, Giddy; Schenk, Marc; Dagan, Tal; Martin, William F.

    2012-01-01

    To test the predictions of competing and mutually exclusive hypotheses for the origin of eukaryotes, we identified from a sample of 27 sequenced eukaryotic and 994 sequenced prokaryotic genomes 571 genes that were present in the eukaryote common ancestor and that have homologues among eubacterial and archaebacterial genomes. Maximum-likelihood trees identified the prokaryotic genomes that most frequently contained genes branching as the sister to the eukaryotic nuclear homologues. Among the archaebacteria, euryarchaeote genomes most frequently harbored the sister to the eukaryotic nuclear gene, whereas among eubacteria, the α-proteobacteria were most frequently represented within the sister group. Only 3 genes out of 571 gave a 3-domain tree. Homologues from α-proteobacterial genomes that branched as the sister to nuclear genes were found more frequently in genomes of facultatively anaerobic members of the Rhizobiales and Rhodospirillales than in obligate intracellular rickettsial parasites. Following α-proteobacteria, the most frequent eubacterial sister lineages were γ-proteobacteria, δ-proteobacteria, and firmicutes, which were also the prokaryote genomes least frequently found as monophyletic groups in our trees. Although all 22 higher prokaryotic taxa sampled (crenarchaeotes, γ-proteobacteria, spirochaetes, chlamydias, etc.) harbor genes that branch as the sister to homologues present in the eukaryotic common ancestor, that is not evidence of 22 different prokaryotic cells participating at eukaryote origins because prokaryotic “lineages” have laterally acquired genes for more than 1.5 billion years since eukaryote origins. The data underscore the archaebacterial (host) nature of the eukaryotic informational genes and the eubacterial (mitochondrial) nature of eukaryotic energy metabolism. The network linking genes of the eukaryote ancestor to contemporary homologues distributed across prokaryotic genomes elucidates eukaryote gene origins in a

  20. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments

    Energy Technology Data Exchange (ETDEWEB)

    Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R

    2007-12-10

    EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

  1. Eukaryotic and Prokaryotic Cytoskeletons: Structure and Mechanics

    Science.gov (United States)

    Gopinathan, Ajay

    2013-03-01

    The eukaryotic cytoskeleton is an assembly of filamentous proteins and a host of associated proteins that collectively serve functional needs ranging from spatial organization and transport to the production and transmission of forces. These systems can exhibit a wide variety of non-equilibrium, self-assembled phases depending on context and function. While much recent progress has been made in understanding the self-organization, rheology and nonlinear mechanical properties of such active systems, in this talk, we will concentrate on some emerging aspects of cytoskeletal physics that are promising. One such aspect is the influence of cytoskeletal network topology and its dynamics on both active and passive intracellular transport. Another aspect we will highlight is the interplay between chirality of filaments, their elasticity and their interactions with the membrane that can lead to novel conformational states with functional implications. Finally we will consider homologs of cytoskeletal proteins in bacteria, which are involved in templating cell growth, segregating genetic material and force production, which we will discuss with particular reference to contractile forces during cell division. These prokaryotic structures function in remarkably similar yet fascinatingly different ways from their eukaryotic counterparts and can enrich our understanding of cytoskeletal functioning as a whole.

  2. Eukaryote-to-eukaryote gene transfer gives rise to genome mosaicism in euglenids

    Directory of Open Access Journals (Sweden)

    Weber Andreas PM

    2011-04-01

    Background: Euglenophytes are a group of photosynthetic flagellates possessing a plastid derived from a green algal endosymbiont, which was incorporated into an ancestral host cell via secondary endosymbiosis. However, the impact of endosymbiosis on the euglenophyte nuclear genome is not fully understood due to its complex nature as a 'hybrid' of a non-photosynthetic host cell and a secondary endosymbiont. Results: We analyzed an EST dataset of the model euglenophyte Euglena gracilis using a gene mining program designed to detect laterally transferred genes. We found E. gracilis genes showing affinity not only with green algae, from which the secondary plastid in euglenophytes evolved, but also red algae and/or secondary algae containing red algal-derived plastids. Phylogenetic analyses of these 'red lineage' genes suggest that E. gracilis acquired at least 14 genes via eukaryote-to-eukaryote lateral gene transfer from algal sources other than the green algal endosymbiont that gave rise to its current plastid. We constructed an EST library of the aplastidic euglenid Peranema trichophorum, which is a eukaryovorous relative of euglenophytes, and also identified 'red lineage' genes in its genome. Conclusions: Our data show genome mosaicism in E. gracilis and P. trichophorum. One possible explanation for the presence of these genes in these organisms is that some or all of them were independently acquired by lateral gene transfer and contributed to the successful integration and functioning of the green algal endosymbiont as a secondary plastid. Alternative hypotheses include the presence of a phagocytosed alga as the single source of those genes, or a cryptic tertiary endosymbiont harboring a secondary plastid of red algal origin, which the eukaryovorous ancestor of euglenophytes had acquired prior to the secondary endosymbiosis of a green alga.

  3. The evolutionary dynamics of operon distributions in eukaryote genomes.

    Science.gov (United States)

    Cutter, Asher D; Agrawal, Aneil F

    2010-06-01

    Genes in nematode and ascidian genomes frequently occur in operons--multiple genes sharing a common promoter to generate a polycistronic primary transcript--and such genes comprise 15-20% of the coding genome for Caenorhabditis elegans and Ciona intestinalis. Recent work in nematodes has demonstrated that the identity of genes within operons is highly conserved among species and that the unifying feature of genes within operons is that they are expressed in germline tissue. However, it is generally unknown what processes are responsible for generating the distribution of operon sizes across the genome, which are composed of up to eight genes per operon. Here we investigate several models for operon evolution to better understand their abundance, distribution of sizes, and evolutionary dynamics over time. We find that birth-death models of operon evolution reasonably describe the relative abundance of operons of different sizes in the C. elegans and Ciona genomes and generate predictions about the number of monocistronic, nonoperon genes that likely participate in the birth-death process. This theory, and applications to C. elegans and Ciona, motivates several new and testable hypotheses about eukaryote operon evolution.

  4. The language of methylation in genomics of eukaryotes.

    Science.gov (United States)

    Volpe, P

    2005-05-01

    Background studies have shown that 6-methylaminopurine (m6A) and 5-methylcytosine (m5C), detected in DNA, are products of its post-synthetic modification. At variance with bacterial genomes exhibiting both, eukaryotic genomes essentially carry only m5C in m5CpG doublets. This served to establish that, although a slight extra-S phase asymmetric methylation occurs de novo on 5'-CpC-3'/3'-GpG-5', 5'-CpT-3'/3'-GpA-5', and 5'-CpA-3'/3'-GpT-5' dinucleotide pairs, a heavy methylation during S involves Okazaki fragments and thus semiconservatively newly made chains to guarantee genetic maintenance of -CH3 patterns in symmetrically dimethylated 5'-m5CpG-3'/3'-Gpm5C-5' dinucleotide pairs. On the other hand, whilst inverse correlation was observed between bulk DNA methylation, in S, and bulk RNA transcription, in G1 and G2, probes of methylated DNA helped to discover the presence of coding (exon) and uncoding (intron) sequences in the eukaryotic gene. These achievements led to the search for a language that genes regulated by methylation should have in common. Such a deciphering, initially providing restriction minimaps of hypermethylatable promoters and introns vs. hypomethylable exons, became feasible when bisulfite methodology allowed the direct sequencing of m5C. It emerged that, while in lymphocytes, where the transglutaminase gene (hTGc) is inactive, the promoter shows two fully methylated CpG-rich domains at 5' and one fully unmethylated CpG-rich domain at 3' (including the site +1 and a 5'-UTR), in HUVEC cells, where hTGc is active, in the first CpG-rich domain of its promoter four CpGs lack -CH3: a result suggesting new hypotheses on the mechanism of transcription, particularly in connection with radio-induced DNA demethylation.

  5. From the Cover: Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features

    Science.gov (United States)

    Derelle, Evelyne; Ferraz, Conchita; Rombauts, Stephane; Rouzé, Pierre; Worden, Alexandra Z.; Robbens, Steven; Partensky, Frédéric; Degroeve, Sven; Echeynié, Sophie; Cooke, Richard; Saeys, Yvan; Wuyts, Jan; Jabbari, Kamel; Bowler, Chris; Panaud, Olivier; Piégu, Benoît; Ball, Steven G.; Ral, Jean-Philippe; Bouget, François-Yves; Piganeau, Gwenael; de Baets, Bernard; Picard, André; Delseny, Michel; Demaille, Jacques; van de Peer, Yves; Moreau, Hervé

    2006-08-01

    The green lineage is reportedly 1,500 million years old, evolving shortly after the endosymbiosis event that gave rise to early photosynthetic eukaryotes. In this study, we unveil the complete genome sequence of an ancient member of this lineage, the unicellular green alga Ostreococcus tauri (Prasinophyceae). This cosmopolitan marine primary producer is the world's smallest free-living eukaryote known to date. Features likely reflecting optimization of environmentally relevant pathways, including resource acquisition, unusual photosynthesis apparatus, and genes potentially involved in C4 photosynthesis, were observed, as was downsizing of many gene families. Overall, the 12.56-Mb nuclear genome has an extremely high gene density, in part because of extensive reduction of intergenic regions and other forms of compaction such as gene fusion. However, the genome is structurally complex. It exhibits previously unobserved levels of heterogeneity for a eukaryote. Two chromosomes differ structurally from the other eighteen. Both have a significantly biased G+C content, and, remarkably, they contain the majority of transposable elements. Many chromosome 2 genes also have unique codon usage and splicing, but phylogenetic analysis and composition do not support alien gene origin. In contrast, most chromosome 19 genes show no similarity to green lineage genes and a large number of them are specialized in cell surface processes. Taken together, the complete genome sequence, unusual features, and downsized gene families, make O. tauri an ideal model system for research on eukaryotic genome evolution, including chromosome specialization and green lineage ancestry. genome heterogeneity | genome sequence | green alga | Prasinophyceae | gene prediction

  6. Positive selection for unpreferred codon usage in eukaryotic genomes

    Directory of Open Access Journals (Sweden)

    Galagan James E

    2007-07-01

    Full Text Available Abstract Background Natural selection has traditionally been understood as a force responsible for pushing genes to states of higher translational efficiency, whereas lower translational efficiency has been explained by neutral mutation and genetic drift. We looked for evidence of directional selection resulting in increased unpreferred codon usage (and presumably reduced translational efficiency) in three divergent clusters of eukaryotic genomes using a simple optimal-codon-based metric (Kp/Ku). Results Here we show that for some genes natural selection is indeed responsible for causing accelerated unpreferred codon substitution, and document the scope of this selection. In Cryptococcus and to a lesser extent Drosophila, we find many genes showing a statistically significant signal of selection for unpreferred codon usage in one or more lineages. We did not find evidence for this type of selection in Saccharomyces. The signal of positive selection observed from unpreferred synonymous codon substitutions is coincident in Cryptococcus and Drosophila with the distribution of upstream open reading frames (uORFs), another genic feature known to reduce translational efficiency. Functional enrichment analysis of genes exhibiting low Kp/Ku ratios reveals that genes in regulatory roles are particularly subject to this type of selection. Conclusion Through genome-wide scans, we find recent selection for unpreferred codon usage at approximately 1% of genetic loci in a Cryptococcus and several genes in Drosophila. Unpreferred codons can impede translation efficiency, and we find that genes with translation-impeding uORFs are enriched for this selection signal. We find that regulatory genes are particularly likely to be subject to selection for unpreferred codon usage. Given that expression noise can propagate through regulatory cascades, and that low translational efficiency can reduce expression noise, this finding supports the hypothesis that translational

  7. Patterns of exon-intron architecture variation of genes in eukaryotic genomes

    Directory of Open Access Journals (Sweden)

    Chen Jian-Qun

    2009-01-01

    Full Text Available Abstract Background The origin and importance of exon-intron architecture comprises one of the remaining mysteries of gene evolution. Several studies have investigated the variations of intron length, GC content, ordinal position in a gene and divergence. However, there is little study about the structural variation of exons and introns. Results We investigated the length, GC content, ordinal position and divergence in both exons and introns of 13 eukaryotic genomes, representing plant and animal. Our analyses revealed that three basic patterns of exon-intron variation were present in nearly all analyzed genomes (P Conclusion Although the factors contributing to these patterns have not been identified, our results provide three important clues: common factor(s) exist and may shape both exons and introns; the ordinal reduction patterns may reflect a time-orderly evolution; and the larger first and last exons may be splicing-required. These clues provide a framework for elucidating mechanisms involved in the organization of eukaryotic genomes and particularly in building exon-intron structures.

  8. Genomic and experimental evidence suggests that Verrucomicrobium spinosum interacts with eukaryotes

    Directory of Open Access Journals (Sweden)

    Michelle eSait

    2011-10-01

    Full Text Available Our knowledge of pathogens and symbionts is heavily biased towards phyla containing species that are straightforward to isolate in pure culture. Novel bacterial phyla are often represented by a handful of strains, and the number of species interacting with eukaryotes is likely underestimated. Identification of predicted pathogenesis and symbiosis determinants such as the Type III Secretion System (T3SS) in the genomes of ‘free-living’ bacteria suggests that these microbes participate in uncharacterized interactions with eukaryotes. Our study aimed to test this hypothesis on Verrucomicrobium spinosum (phylum Verrucomicrobia) and to begin characterization of its predicted T3SS. We showed the putative T3SS structural genes to be transcriptionally active, and that expression of predicted effector proteins was toxic to yeast in an established functional screen. Our results suggest that the predicted T3SS genes of V. spinosum could encode a functional T3SS, although further work is needed to determine whether V. spinosum produces a T3SS injectisome that delivers the predicted effectors. In the absence of a known eukaryotic host, we made use of invertebrate infection models. The injection or feeding of V. spinosum to Drosophila melanogaster and Caenorhabditis elegans, respectively, was shown to result in increased mortality rates relative to controls, a phenomenon exaggerated in C. elegans mutants hypersensitive to pathogen infection. This finding, although not conclusively demonstrating pathogenesis, suggests that V. spinosum is capable of pathogenic activity towards an invertebrate host. Symbiotic interactions with a natural host provide an alternative explanation for the results seen in the invertebrate models. Further work is needed to determine whether V. spinosum can establish and maintain interactions with eukaryotic species found in its natural habitat, and whether the predicted T3SS is directly involved in pathogenic or symbiotic activity.

  9. EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome

    Directory of Open Access Journals (Sweden)

    Hamilton John P

    2007-10-01

    Full Text Available Abstract Background Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate (1) the submission of gene annotation to an annotation project, (2) the review of the submitted models by project annotators, and (3) the incorporation of the submitted models in the ongoing annotation effort. Results We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CA) has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website http://rice.tigr.org, as well as in the Community Annotation track of the Genome Browser. Conclusion We have applied EuCAP to rice. As of July 2007, the

  10. Genome-wide analysis of eukaryote thaumatin-like proteins (TLPs) with an emphasis on poplar

    Directory of Open Access Journals (Sweden)

    Duplessis Sébastien

    2011-02-01

    Full Text Available Abstract Background Plant inducible immunity includes the accumulation of a set of defense proteins during infection called pathogenesis-related (PR) proteins, which are grouped into families termed PR-1 to PR-17. The PR-5 family is composed of thaumatin-like proteins (TLPs), which are responsive to biotic and abiotic stress and are widely studied in plants. TLPs were also recently discovered in fungi and animals. In the poplar genome, TLPs are over-represented compared with annual species and their transcripts strongly accumulate during stress conditions. Results Our analysis of the poplar TLP family suggests that the expansion of this gene family was followed by diversification, as differences in expression patterns and predicted properties correlate with phylogeny. In particular, we identified a clade of poplar TLPs that cluster to a single 350 kb locus of chromosome I and that are up-regulated by poplar leaf rust infection. A wider phylogenetic analysis of eukaryote TLPs - including plant, animal and fungi sequences - shows that TLP gene content and diversity increased markedly during land plant evolution. Mapping the reported functions of characterized TLPs to the eukaryote phylogenetic tree showed that antifungal or glycan-lytic properties are widespread across eukaryote phylogeny, suggesting that these properties are shared by most TLPs and are likely associated with the presence of a conserved acidic cleft in their 3D structure. Also, we established an exhaustive catalog of TLPs with atypical architectures such as small-TLPs, TLP-kinases and small-TLP-kinases, which have potentially developed alternative functions (such as putative receptor kinases for pathogen sensing and signaling). Conclusion Our study, based on the most recent plant genome sequences, provides evidence for TLP gene family diversification during land plant evolution. We have shown that the diverse functions described for TLPs are not restricted to specific clades but seem

  11. Genome-wide Purification of Extrachromosomal Circular DNA from Eukaryotic Cells

    DEFF Research Database (Denmark)

    Møller, Henrik D.; Bojsen, Rasmus Kenneth; Tachibana, Chris

    2016-01-01

    Extrachromosomal circular DNAs (eccDNAs) are common genetic elements in Saccharomyces cerevisiae and are reported in other eukaryotes as well. EccDNAs contribute to genetic variation among somatic cells in multicellular organisms and to evolution of unicellular eukaryotes. Sensitive methods...... for detecting eccDNA are needed to clarify how these elements affect genome stability and how environmental and biological factors induce their formation in eukaryotic cells. This video presents a sensitive eccDNA-purification method called Circle-Seq. The method encompasses column purification of circular DNA...... circularization is conserved between strains at these loci. In sum, the Circle-Seq method has broad applicability for genome-scale screening for eccDNA in eukaryotes as well as for detecting specific eccDNA types....

  12. EuMicroSatdb: A database for microsatellites in the sequenced genomes of eukaryotes

    Directory of Open Access Journals (Sweden)

    Grover Atul

    2007-07-01

    Full Text Available Abstract Background Microsatellites have immense utility as molecular markers in different fields like genome characterization and mapping, phylogeny and evolutionary biology. Existing microsatellite databases are of limited utility for experimental and computational biologists with regard to their content and information output. EuMicroSatdb (Eukaryotic MicroSatellite database; http://ipu.ac.in/usbt/EuMicroSatdb.htm) is a web based relational database for easy and efficient positional mining of microsatellites from sequenced eukaryotic genomes. Description A user friendly web interface has been developed for microsatellite data retrieval using Active Server Pages (ASP). The backend database codes for data extraction and assembly have been written using Perl based scripts and C++. Precise need based microsatellites data retrieval is possible using different input parameters like microsatellite type (simple perfect or compound perfect), repeat unit length (mono- to hexa-nucleotide), repeat number, microsatellite length and chromosomal location in the genome. Furthermore, information about clustering of different microsatellites in the genome can also be retrieved. Finally, to facilitate primer designing for PCR amplification of any desired microsatellite locus, 200 bp upstream and downstream sequences are provided. Conclusion The database allows easy systematic retrieval of comprehensive information about simple and compound microsatellites, microsatellite clusters and their locus coordinates in 31 sequenced eukaryotic genomes. The information content of the database is useful in different areas of research like gene tagging, genome mapping, population genetics, germplasm characterization and in understanding microsatellite dynamics in eukaryotic genomes.
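
    The kind of positional mining described in this record can be illustrated with a minimal sketch. It is not the database's own Perl/C++ code: the function name `find_microsatellites`, its parameters, and the default threshold of three repeats are assumptions made for illustration. It reports simple perfect repeats with mono- to hexa-nucleotide units.

```python
import re

def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return sorted (start, unit, repeat_count) tuples for perfect tandem repeats.

    Hypothetical helper: the thresholds are illustrative, not EuMicroSatdb's.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # A unit of unit_len bases followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are repeats of a shorter motif (e.g. "AA" is really "A").
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)

print(find_microsatellites("TTGCACACACAGGT"))
```

    Start positions are 0-based. Flanking sequence for primer design could then be taken by slicing the genome around each hit, clipped at the sequence ends.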

  13. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    Science.gov (United States)

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  14. An optimized approach for annotation of large eukaryotic genomic sequences using genetic algorithm.

    Science.gov (United States)

    Chowdhury, Biswanath; Garai, Arnav; Garai, Gautam

    2017-10-24

    Detection of important functional and/or structural elements and identification of their positions in a large eukaryotic genomic sequence are an active research area. Gene is an important functional and structural unit of DNA. The computation of gene prediction is, therefore, very essential for detailed genome annotation. In this paper, we propose a new gene prediction technique based on Genetic Algorithm (GA) to determine the optimal positions of exons of a gene in a chromosome or genome. The correct identification of the coding and non-coding regions is difficult and computationally demanding. The proposed genetic-based method, named Gene Prediction with Genetic Algorithm (GPGA), reduces this problem by searching only one exon at a time instead of all exons along with its introns. This representation carries a significant advantage in that it breaks the entire gene-finding problem into a number of smaller sub-problems, thereby reducing the computational complexity. We tested the performance of the GPGA with existing benchmark datasets and compared the results with well-known and relevant techniques. The comparison shows the better or comparable performance of the proposed method. We also used GPGA for annotating the human chromosome 21 (HS21) using cross-species comparisons with the mouse orthologs. It was noted that the GPGA predicted true genes with better accuracy than other well-known approaches.

  15. Widespread Horizontal Gene Transfer from Circular Single-stranded DNA Viruses to Eukaryotic Genomes

    Directory of Open Access Journals (Sweden)

    Xie Jiatao

    2011-09-01

    Full Text Available Abstract Background In addition to vertical transmission, organisms can also acquire genes from other distantly related species or from their extra-chromosomal elements (plasmids and viruses) via horizontal gene transfer (HGT). It has been suggested that phages represent substantial forces in prokaryotic evolution. In eukaryotes, retroviruses, which can integrate into host genome as an obligate step in their replication strategy, comprise approximately 8% of the human genome. Unlike retroviruses, few members of other virus families are known to transfer genes to host genomes. Results Here we performed a systematic search for sequences related to circular single-stranded DNA (ssDNA) viruses in publicly available eukaryotic genome databases followed by comprehensive phylogenetic analysis. We conclude that the replication initiation protein (Rep)-related sequences of geminiviruses, nanoviruses and circoviruses have been frequently transferred to a broad range of eukaryotic species, including plants, fungi, animals and protists. Some of the transferred viral genes were conserved and expressed, suggesting that these genes have been coopted to assume cellular functions in the host genomes. We also identified geminivirus-like and parvovirus-like transposable elements in genomes of fungi and lower animals, respectively, and thereby provide direct evidence that eukaryotic transposons could derive from ssDNA viruses. Conclusions Our discovery extends the host range of circular ssDNA viruses and sheds light on the origin and evolution of these viruses. It also suggests that ssDNA viruses act as an unforeseen source of genetic innovation in their hosts.

  16. MicroRNAs: The Mega Regulators in Eukaryotic Genomes

    Directory of Open Access Journals (Sweden)

    Iftekhar Ahmed Baloch

    2013-09-01

    Full Text Available MicroRNAs (miRNAs) are endogenous, small, noncoding RNAs of 18-25 nucleotides (nt) in length that negatively regulate their complementary messenger RNAs (mRNAs) at the transcriptional and posttranscriptional level in many eukaryotic organisms. By affecting gene regulation, miRNAs are likely to be involved in most biological processes. The majority of miRNA genes are found in intergenic regions or in anti-sense orientation to genes and have their own miRNA gene promoter and regulatory units. In contrast to their name and size, the miRNAs perform mega functions in eukaryotic organisms. They perform important functions in plants and animals during growth, organogenesis, transgene suppression, signaling pathway, environmental stresses, disease development and defense against invading viruses. miRNAs are evolutionarily conserved from species to species within the same kingdom. However, there is a controversy among scientists about their conservation from animals to plants. Their conserved nature makes them an important logical tool for homology-based discovery of miRNAs in other species. This review is aimed at describing some basic concepts regarding biogenesis and functions of miRNAs.

  17. N6-adenine DNA methylation demystified in eukaryotic genome: From biology to pathology.

    Science.gov (United States)

    Parashar, Nidarshana Chaturvedi; Parashar, Gaurav; Nayyar, Harsh; Sandhir, Rajat

    2018-01-01

    N6-methyl-2'-deoxyadenosine (m6dA) is a well characterized DNA modification in prokaryotes. Its existence in eukaryotic DNA remained doubtful until recently. Evidence suggests that the m6dA levels decrease with the increasing complexity of eukaryotic genomes. Analysis of m6dA levels in genome of lower eukaryotes reveals its role in gene regulation, nucleosome positioning and early development. In higher eukaryotes m6dA is enriched in nongenic region compared to genic region, preferentially in chromosome X and 13 suggesting a chromosome bias. High levels of m6dA during embryogenesis as compared to adult tissues are indicative of its importance during development and possible association with regeneration capabilities. Further, decreased levels of m6dA in diabetic patients has been correlated with expression of Fat mass and obesity-associated (FTO) which acts as m6A demethylase. m6dA levels have also been reported to be decreased in different types of cancers. The present review highlights the role of m6dA modification in eukaryotic genomes and its functional importance in regulation of physiological and pathological processes. Copyright © 2017 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.

  18. Analysis of genomic sequence motifs for deciphering transcription factor binding and transcriptional regulation in eukaryotic cells

    Directory of Open Access Journals (Sweden)

    Valentina eBoeva

    2016-02-01

    Full Text Available Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation.
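
    As a hedged illustration of the position weight matrix model mentioned in this record, the sketch below builds a log-odds matrix from a handful of aligned sites and scans a sequence with it. The function names (`build_pwm`, `scan`, `best_hit`), the uniform background of 0.25, the pseudocount of 1, and the toy TATA-like sites are all assumptions for illustration; the tools the review covers use more elaborate statistics.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in "ACGT"})
    return pwm

def scan(seq, pwm):
    """Score every window of the sequence; returns (position, score) pairs."""
    width = len(pwm)
    return [(i, sum(pwm[j][seq[i + j]] for j in range(width)))
            for i in range(len(seq) - width + 1)]

def best_hit(seq, pwm):
    """Return the (position, score) pair of the highest-scoring window."""
    return max(scan(seq, pwm), key=lambda hit: hit[1])

# Toy example: four hypothetical aligned binding sites.
pwm = build_pwm(["TATAAA", "TATAAA", "TATATA", "TATAAA"])
print(best_hit("GCGCTATAAAGCGC", pwm))
```

    A window's score is the sum of per-position log-odds, so positive scores indicate a better fit to the motif than to the background. Sequences containing ambiguity codes such as N would need extra handling that this sketch omits.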

  19. Neural Network Prediction of Translation Initiation Sites in Eukaryotes: Perspectives for EST and Genome analysis

    DEFF Research Database (Denmark)

    Pedersen, Anders Gorm; Nielsen, Henrik

    1997-01-01

    Translation in eukaryotes does not always start at the first AUG in an mRNA, implying that context information also plays a role. This makes prediction of translation initiation sites a non-trivial task, especially when analysing EST and genome data where the entire mature mRNA sequence is not known...
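
    To make the point about sequence context concrete, here is a minimal sketch that lists every AUG in a transcript and flags those in a strong Kozak-like context (a purine at position -3 and G at position +4). This simple rule is a stand-in chosen for illustration and is not the neural network described in this record; the function name `aug_candidates` is hypothetical.

```python
def aug_candidates(mrna):
    """Return (position, strong_context) for each AUG in the sequence.

    strong_context is True when the base at -3 is a purine (A or G)
    and the base at +4 is G, relative to the A of the AUG (+1).
    """
    mrna = mrna.upper().replace("T", "U")
    candidates = []
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] != "AUG":
            continue
        minus3 = mrna[i - 3] if i >= 3 else None
        plus4 = mrna[i + 3] if i + 3 < len(mrna) else None
        candidates.append((i, minus3 in ("A", "G") and plus4 == "G"))
    return candidates

print(aug_candidates("CCUAUGCCCGCCACCAUGGCG"))
```

    In this toy sequence the first AUG sits in a weak context and the second in a strong one, which is the situation in which simply choosing the first AUG may be misleading.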

  20. MCM Paradox: Abundance of Eukaryotic Replicative Helicases and Genomic Integrity

    Directory of Open Access Journals (Sweden)

    Mitali Das

    2014-01-01

    Full Text Available As a crucial component of the DNA replication licensing system, the minichromosome maintenance (MCM) 2–7 complex acts as the eukaryotic DNA replicative helicase. The six related MCM proteins form a heterohexamer and bind with ORC, CDC6, and Cdt1 to form the prereplication complex. Although the MCMs are well known as replicative helicases, their overabundance and distribution patterns on chromatin present a paradox called the “MCM paradox.” Several approaches have been taken to solve the MCM paradox and describe the purpose of excess MCMs distributed beyond the replication origins. Alternative functions of these MCMs, other than as a helicase, have also been proposed. This review focuses on several models and concepts generated to solve the MCM paradox coinciding with their helicase function and provides insight into the concept that excess MCMs are meant for licensing dormant origins as a backup during replication stress. Finally, we extend our view towards the effect of alteration of MCM level. Though an excess MCM constituent is needed for normal cells to withstand stress, there must be a delineation of the threshold level in normal and malignant cells. This review also outlines future prospects for better understanding MCM biology.

  1. Insights into the red algae and eukaryotic evolution from the genome of Porphyra umbilicalis (Bangiophyceae, Rhodophyta).

    Science.gov (United States)

    Brawley, Susan H; Blouin, Nicolas A; Ficko-Blean, Elizabeth; Wheeler, Glen L; Lohr, Martin; Goodson, Holly V; Jenkins, Jerry W; Blaby-Haas, Crysten E; Helliwell, Katherine E; Chan, Cheong Xin; Marriage, Tara N; Bhattacharya, Debashish; Klein, Anita S; Badis, Yacine; Brodie, Juliet; Cao, Yuanyu; Collén, Jonas; Dittami, Simon M; Gachon, Claire M M; Green, Beverley R; Karpowicz, Steven J; Kim, Jay W; Kudahl, Ulrich Johan; Lin, Senjie; Michel, Gurvan; Mittag, Maria; Olson, Bradley J S C; Pangilinan, Jasmyn L; Peng, Yi; Qiu, Huan; Shu, Shengqiang; Singer, John T; Smith, Alison G; Sprecher, Brittany N; Wagner, Volker; Wang, Wenfei; Wang, Zhi-Yong; Yan, Juying; Yarish, Charles; Zäuner-Riek, Simone; Zhuang, Yunyun; Zou, Yong; Lindquist, Erika A; Grimwood, Jane; Barry, Kerrie W; Rokhsar, Daniel S; Schmutz, Jeremy; Stiller, John W; Grossman, Arthur R; Prochnik, Simon E

    2017-08-01

    Porphyra umbilicalis (laver) belongs to an ancient group of red algae (Bangiophyceae), is harvested for human food, and thrives in the harsh conditions of the upper intertidal zone. Here we present the 87.7-Mbp haploid Porphyra genome (65.8% G + C content, 13,125 gene loci) and elucidate traits that inform our understanding of the biology of red algae as one of the few multicellular eukaryotic lineages. Novel features of the Porphyra genome shared by other red algae relate to the cytoskeleton, calcium signaling, the cell cycle, and stress-tolerance mechanisms including photoprotection. Cytoskeletal motor proteins in Porphyra are restricted to a small set of kinesins that appear to be the only universal cytoskeletal motors within the red algae. Dynein motors are absent, and most red algae, including Porphyra, lack myosin. This surprisingly minimal cytoskeleton offers a potential explanation for why red algal cells and multicellular structures are more limited in size than in most multicellular lineages. Additional discoveries further relating to the stress tolerance of bangiophytes include ancestral enzymes for sulfation of the hydrophilic galactan-rich cell wall, evidence for mannan synthesis that originated before the divergence of green and red algae, and a high capacity for nutrient uptake. Our analyses provide a comprehensive understanding of the red algae, which are both commercially important and have played a major role in the evolution of other algal groups through secondary endosymbioses.

  2. Genome-wide mapping reveals single-origin chromosome replication in Leishmania, a eukaryotic microbe.

    Science.gov (United States)

    Marques, Catarina A; Dickens, Nicholas J; Paape, Daniel; Campbell, Samantha J; McCulloch, Richard

    2015-10-19

    DNA replication initiates on defined genome sites, termed origins. Origin usage appears to follow common rules in the eukaryotic organisms examined to date: all chromosomes are replicated from multiple origins, which display variations in firing efficiency and are selected from a larger pool of potential origins. To ask if these features of DNA replication are true of all eukaryotes, we describe genome-wide origin mapping in the parasite Leishmania. Origin mapping in Leishmania suggests a striking divergence in origin usage relative to characterized eukaryotes, since each chromosome appears to be replicated from a single origin. By comparing two species of Leishmania, we find evidence that such origin singularity is maintained in the face of chromosome fusion or fission events during evolution. Mapping Leishmania origins suggests that all origins fire with equal efficiency, and that the genomic sites occupied by origins differ from related non-origin sites. Finally, we provide evidence that origin location in Leishmania displays striking conservation with Trypanosoma brucei, despite the latter parasite replicating its chromosomes from multiple, variable strength origins. The demonstration of chromosome replication from a single origin in Leishmania, a microbial eukaryote, has implications for the evolution of origin multiplicity and associated controls, and may explain the pervasive aneuploidy that characterizes Leishmania chromosome architecture.

  3. Optimizing eukaryotic cell hosts for protein production through systems biotechnology and genome-scale modeling.

    Science.gov (United States)

    Gutierrez, Jahir M; Lewis, Nathan E

    2015-07-01

    Eukaryotic cell lines, including Chinese hamster ovary cells, yeast, and insect cells, are invaluable hosts for the production of many recombinant proteins. With the advent of genomic resources, one can now leverage genome-scale computational modeling of cellular pathways to rationally engineer eukaryotic host cells. Genome-scale models of metabolism include all known biochemical reactions occurring in a specific cell. By describing these mathematically and using tools such as flux balance analysis, the models can simulate cell physiology and provide targets for cell engineering that could lead to enhanced cell viability, titer, and productivity. Here we review examples in which metabolic models in eukaryotic cell cultures have been used to rationally select targets for genetic modification, improve cellular metabolic capabilities, design media supplementation, and interpret high-throughput omics data. As more comprehensive models of metabolism and other cellular processes are developed for eukaryotic cell culture, these will enable further exciting developments in cell line engineering, thus accelerating recombinant protein production and biotechnology in the years to come. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Alternative Splicing: A Potential Source of Functional Innovation in the Eukaryotic Genome

    Directory of Open Access Journals (Sweden)

    Lu Chen

    2012-01-01

    Alternative splicing (AS) is a common posttranscriptional process in eukaryotic organisms, by which multiple distinct functional transcripts are produced from a single gene. The release of the human genome draft revealed a much smaller number of genes than anticipated. Because of its potential role in expanding protein diversity, interest in alternative splicing has been increasing over the last decade. Although recent studies have shown that 94% of human multiexon genes undergo AS, the evolution of AS and thus its potential role in functional innovation in eukaryotic genomes remain largely unexplored. Here we review available evidence regarding the evolution of AS prevalence and functional role. In addition, we stress the need to correct for the strong effect of transcript coverage in AS detection and set out a strategy to ultimately elucidate the extent of the role of AS in functional innovation on a genomic scale.

  5. Oceanographic structure drives the assembly processes of microbial eukaryotic communities

    Science.gov (United States)

    Monier, Adam; Comte, Jérôme; Babin, Marcel; Forest, Alexandre; Matsuoka, Atsushi; Lovejoy, Connie

    2015-01-01

    Arctic Ocean microbial eukaryote phytoplankton form a subsurface chlorophyll maximum (SCM), where much of the annual summer production occurs. This SCM is particularly persistent in the Western Arctic Ocean, which is strongly salinity stratified. The recent loss of multiyear sea ice and increased particulate-rich river discharge in the Arctic Ocean result in a greater volume of fresher water that may displace nutrient-rich saltier waters to deeper depths and decrease light penetration in areas affected by river discharge. Here, we surveyed microbial eukaryotic assemblages in the surface waters, and within and below the SCM. In most samples, we detected the pronounced SCM that usually occurs at the interface of the upper mixed layer and Pacific Summer Water (PSW). A poorly developed SCM was seen under two conditions, one above PSW and associated with a downwelling eddy, and the second in a region influenced by the Mackenzie River plume. Four phylogenetically distinct communities were identified: surface, pronounced SCM, weak SCM and a deeper community just below the SCM. Distance–decay relationships and phylogenetic structure suggested distinct ecological processes operating within these communities. In the pronounced SCM, picophytoplankton were prevalent and community assembly was attributed to water mass history. In contrast, environmental filtering impacted the composition of the weak SCM communities, where heterotrophic Picozoa were more numerous. These results imply that displacement of Pacific waters to greater depth and increased terrigenous input may act as a control on SCM development and result in lower net summer primary production with a more heterotroph-dominated eukaryotic microbial community. PMID:25325383

  6. An algorithm for the reconstruction of consensus sequences of ancient segmental duplications and transposon copies in eukaryotic genomes.

    Science.gov (United States)

    Singh, Abanish; Keswani, Umeshkumar; Levine, David; Feschotte, Cedric

    2010-01-01

    Interspersed repeats, mostly resulting from the activity and accumulation of transposable elements, occupy a significant fraction of many eukaryotic genomes. More than half of human genomic sequence consists of known repeats; however, a very large part has not yet been associated with either repetitive structures or functional units. We have postulated that most of the seemingly unique content of mammalian genomes is also a result of transposon activity, written software to look for weak signals which would help us reconstruct the ancient elements with substantially mutated copies, and integrated it into a system for de novo identification and classification of interspersed repeats. In this manuscript we describe our approach, and report on our methods for building the consensus sequences of these transposons.

  7. Identifying contamination with advanced visualization and analysis practices: metagenomic approaches for eukaryotic genome assemblies

    Directory of Open Access Journals (Sweden)

    Tom O. Delmont

    2016-03-01

    High-throughput sequencing provides a fast and cost-effective means to recover genomes of organisms from all domains of life. However, adequate curation of the assembly results against potential contamination of non-target organisms requires advanced bioinformatics approaches and practices. Here, we re-analyzed the sequencing data generated for the tardigrade Hypsibius dujardini, and created a holistic display of the eukaryotic genome assembly using DNA data originating from two groups and eleven sequencing libraries. By using bacterial single-copy genes, k-mer frequencies, and coverage values of scaffolds we could identify and characterize multiple near-complete bacterial genomes from the raw assembly, and curate a 182 Mbp draft genome for H. dujardini supported by RNA-Seq data. Our results indicate that most contaminant scaffolds were assembled from Moleculo long-read libraries, and most of these contaminants differed between library preparations. Our re-analysis shows that visualization and curation of eukaryotic genome assemblies can benefit from tools designed to address the needs of today’s microbiologists, who are constantly challenged by the difficulties associated with the identification of distinct microbial genomes in complex environmental metagenomes.
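
    As an illustration only, two of the per-scaffold signals mentioned in this abstract (base composition and k-mer frequencies) can be computed with a few lines of Python. This is a minimal sketch, not the authors' pipeline: the function names, the choice of k = 4, and the toy scaffold sequences are all assumptions made for the example.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def kmer_counts(seq, k=4):
    """Count overlapping k-mers, skipping any that contain a non-ACGT symbol."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


# Hypothetical toy scaffolds; real input would come from an assembly FASTA file.
scaffolds = {
    "scaffold_1": "ATATATTTAAATATAT",
    "scaffold_2": "GCGCGGCCGCGGGCCC",
}

for name, seq in scaffolds.items():
    print(name, round(gc_fraction(seq), 2), kmer_counts(seq).most_common(1))
```

    Scaffolds whose composition profile departs sharply from the bulk of the assembly are candidates for closer inspection; composition alone does not establish contamination.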

  8. Evolution of domain promiscuity in eukaryotic genomes--a perspective from the inferred ancestral domain architectures.

    Science.gov (United States)

    Cohen-Gihon, Inbar; Fong, Jessica H; Sharan, Roded; Nussinov, Ruth; Przytycka, Teresa M; Panchenko, Anna R

    2011-03-01

    Most eukaryotic proteins are composed of two or more domains. These assemble in a modular manner to create new proteins usually by the acquisition of one or more domains to an existing protein. Promiscuous domains which are found embedded in a variety of proteins and co-exist with many other domains are of particular interest and were shown to have roles in signaling pathways and mediating network communication. The evolution of domain promiscuity is still an open problem, mostly due to the lack of sequenced ancestral genomes. Here we use inferred domain architectures of ancestral genomes to trace the evolution of domain promiscuity in eukaryotic genomes. We find an increase in average promiscuity along many branches of the eukaryotic tree. Moreover, domain promiscuity can proceed at almost a steady rate over long evolutionary time or exhibit lineage-specific acceleration. We also observe that many signaling and regulatory domains gained domain promiscuity around the Bilateria divergence. In addition we show that those domains that played a role in the creation of two body axes and existed before the divergence of the bilaterians from fungi/metazoan achieve a boost in their promiscuities during the bilaterian evolution.

  9. Comparative genomics of eukaryotic small nucleolar RNAs reveals deep evolutionary ancestry amidst ongoing intragenomic mobility

    Directory of Open Access Journals (Sweden)

    Hoeppner Marc P

    2012-09-01

    Background: Small nucleolar RNAs (snoRNAs) are required for posttranscriptional processing and modification of ribosomal, spliceosomal and messenger RNAs. Their presence in both eukaryotes and archaea indicates that snoRNAs are evolutionarily ancient. The location of some snoRNAs within the introns of ribosomal protein genes has been suggested to belie an RNA world origin, with the exons of the earliest protein-coding genes having evolved around snoRNAs after the advent of templated protein synthesis. Alternatively, this intronic location may reflect more recent selection for coexpression of snoRNAs and ribosomal components, ensuring rRNA modification by snoRNAs during ribosome synthesis. To gain insight into the evolutionary origins of this genetic organization, we examined the antiquity of snoRNA families and the stability of their genomic location across 44 eukaryote genomes. Results: We report that dozens of snoRNA families are traceable to the Last Eukaryotic Common Ancestor (LECA), but find only weak similarities between the oldest eukaryotic snoRNAs and archaeal snoRNA-like genes. Moreover, many of these LECA snoRNAs are located within the introns of host genes independently traceable to the LECA. Comparative genomic analyses reveal the intronic location of LECA snoRNAs is not ancestral however, suggesting the pattern we observe is the result of ongoing intragenomic mobility. Analysis of human transcriptome data indicates that the primary requirement for hosting intronic snoRNAs is a broad expression profile. Consistent with ongoing mobility across broadly-expressed genes, we report a case of recent migration of a non-LECA snoRNA from the intron of a ubiquitously expressed non-LECA host gene into the introns of two LECA genes during the evolution of primates. Conclusions: Our analyses show that snoRNAs were a well-established family of RNAs at the time when eukaryotes began to diversify. While many are intronic, this association is not

  10. Structure of a eukaryotic SWEET transporter in a homotrimeric complex.

    Science.gov (United States)

    Tao, Yuyong; Cheung, Lily S; Li, Shuo; Eom, Joon-Seob; Chen, Li-Qing; Xu, Yan; Perry, Kay; Frommer, Wolf B; Feng, Liang

    2015-11-12

    Eukaryotes rely on efficient distribution of energy and carbon skeletons between organs in the form of sugars. Glucose in animals and sucrose in plants serve as the dominant distribution forms. Cellular sugar uptake and release require vesicular and/or plasma membrane transport proteins. Humans and plants use proteins from three superfamilies for sugar translocation: the major facilitator superfamily (MFS), the sodium solute symporter family (SSF; only in the animal kingdom), and SWEETs. SWEETs carry mono- and disaccharides across vacuolar or plasma membranes. Plant SWEETs play key roles in sugar translocation between compartments, cells, and organs, notably in nectar secretion, phloem loading for long distance translocation, pollen nutrition, and seed filling. Plant SWEETs cause pathogen susceptibility possibly by sugar leakage from infected cells. The vacuolar Arabidopsis thaliana AtSWEET2 sequesters sugars in root vacuoles; loss-of-function mutants show increased susceptibility to Pythium infection. Here we show that its orthologue, the vacuolar glucose transporter OsSWEET2b from rice (Oryza sativa), consists of an asymmetrical pair of triple-helix bundles, connected by an inversion linker transmembrane helix (TM4) to create the translocation pathway. Structural and biochemical analyses show OsSWEET2b in an apparent inward (cytosolic) open state forming homomeric trimers. TM4 tightly interacts with the first triple-helix bundle within a protomer and mediates key contacts among protomers. Structure-guided mutagenesis of the close paralogue SWEET1 from Arabidopsis identified key residues in substrate translocation and protomer crosstalk. Insights into the structure-function relationship of SWEETs are valuable for understanding the transport mechanism of eukaryotic SWEETs and may be useful for engineering sugar flux.

  11. Structural and evolutionary divergence of eukaryotic protein kinases in Apicomplexa

    Directory of Open Access Journals (Sweden)

    Talevich Eric

    2011-11-01

    Background: The Apicomplexa constitute an evolutionarily divergent phylum of protozoan pathogens responsible for widespread parasitic diseases such as malaria and toxoplasmosis. Many cellular functions in these medically important organisms are controlled by protein kinases, which have emerged as promising drug targets for parasitic diseases. However, an incomplete understanding of how apicomplexan kinases structurally and mechanistically differ from their host counterparts has hindered drug development efforts to target parasite kinases. Results: We used the wealth of sequence data recently made available for 15 apicomplexan species to identify the kinome of each species and quantify the evolutionary constraints imposed on each family of apicomplexan kinases. Our analysis revealed lineage-specific adaptations in selected families, namely cyclin-dependent kinase (CDK), calcium-dependent protein kinase (CDPK) and CLK/LAMMER, which have been identified as important in the pathogenesis of these organisms. Bayesian analysis of selective constraints imposed on these families identified the sequence and structural features that most distinguish apicomplexan protein kinases from their homologs in model organisms and other eukaryotes. In particular, in a subfamily of CDKs orthologous to Plasmodium falciparum crk-5, the activation loop contains a novel PTxC motif which is absent from all CDKs outside Apicomplexa. Our analysis also suggests a convergent mode of regulation in a subset of apicomplexan CDPKs and mammalian MAPKs involving a commonly conserved arginine in the αC helix. In all recognized apicomplexan CLKs, we find a set of co-conserved residues involved in substrate recognition and docking that are distinct from metazoan CLKs. Conclusions: We pinpoint key conserved residues that can be predicted to mediate functional differences from eukaryotic homologs in three identified kinase families. We discuss the structural, functional and

  12. Genome sequence analysis indicates that the model eukaryote Nematostella vectensis harbors bacterial consorts.

    Science.gov (United States)

    Artamonova, Irena I; Mushegian, Arcady R

    2013-11-01

    Analysis of the genome sequence of the starlet sea anemone, Nematostella vectensis, reveals many genes whose products are phylogenetically closer to proteins encoded by bacteria or bacteriophages than to any metazoan homologs. One explanation for such sequence affinities could be that these genes have been horizontally transferred from bacteria to the Nematostella lineage. We show, however, that bacterium-like and phage-like genes sequenced by the N. vectensis genome project tend to cluster on separate scaffolds, which typically do not include eukaryotic genes and differ from the latter in their GC contents. Moreover, most of the bacterium-like genes in N. vectensis either lack introns or the introns annotated in such genes are false predictions that, when translated, often restore the missing portions of their predicted protein products. In a freshwater cnidarian, Hydra, for which a proteobacterial endosymbiont is known, these gene features have been used to delineate the DNA of that endosymbiont sampled by the genome sequencing project. We predict that a large fraction of bacterium-like genes identified in the N. vectensis genome similarly are drawn from the contemporary bacterial consorts of the starlet sea anemone. These uncharacterized bacteria associated with N. vectensis are a proteobacterium and a representative of the phylum Bacteroidetes, each represented in the database by an apparently random sample of informational and operational genes. A substantial portion of a putative bacteriophage genome was also detected, which would be especially unlikely to have been transferred to a eukaryote.
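
    The scaffold-level criteria named in this abstract (bacterium-like genes clustering on scaffolds without eukaryotic genes, absence of introns, and deviating GC content) can be combined into a simple screening rule. The following is a hedged sketch, not the authors' method: the `Scaffold` record, the `looks_like_consort` function, and the 0.05 GC tolerance are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str
    gc: float                # GC fraction of the scaffold
    bacterial_like: int      # genes whose closest homologs are bacterial or phage
    eukaryotic_like: int     # genes whose closest homologs are eukaryotic
    genes_with_introns: int  # genes with supported intron annotations


def looks_like_consort(scaffold, host_gc, gc_tolerance=0.05):
    """Flag a scaffold that meets all three illustrative criteria.

    The tolerance is an arbitrary assumption, not a published threshold.
    """
    only_bacterial = scaffold.bacterial_like > 0 and scaffold.eukaryotic_like == 0
    intronless = scaffold.genes_with_introns == 0
    gc_differs = abs(scaffold.gc - host_gc) > gc_tolerance
    return only_bacterial and intronless and gc_differs


# Hypothetical example records; the numbers are invented for the sketch.
host_gc = 0.40
candidates = [
    Scaffold("scaffold_A", gc=0.62, bacterial_like=12, eukaryotic_like=0, genes_with_introns=0),
    Scaffold("scaffold_B", gc=0.41, bacterial_like=1, eukaryotic_like=9, genes_with_introns=8),
]
flagged = [s.name for s in candidates if looks_like_consort(s, host_gc)]
print(flagged)
```

    A scaffold that mixes a bacterium-like gene with intron-containing eukaryotic genes, like the second record here, is not flagged by this rule; such cases are where horizontal transfer remains a live hypothesis and needs separate evaluation.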

  13. Expression screening, protein purification and NMR analysis of human protein domains for structural genomics

    NARCIS (Netherlands)

    Folkers, G.E.; van Buuren, B.N.M.; Kaptein, R.

    2004-01-01

    Structural genomics, the determination of protein structures on a genome-wide scale, is still in its infancy for eukaryotes due to the number and size of their genes. Low protein expression and solubility of eukaryotic gene products are the major bottlenecks in high-throughput (HTP) recombinant

  14. Novel Features of Eukaryotic Photosystem II Revealed by Its Crystal Structure Analysis from a Red Alga

    OpenAIRE

    Ago, Hideo; Adachi, Hideyuki; Umena, Yasufumi; Tashiro, Takayoshi; Kawakami, Keisuke; Kamiya, Nobuo; Tian, Lirong; Han, Guangye; Kuang, Tingyun; Liu, Zheyi; Wang, Fangjun; Zou, Hanfa; Enami, Isao; Miyano, Masashi; Shen, Jian-Ren

    2016-01-01

    Photosystem II (PSII) catalyzes light-induced water splitting, leading to the evolution of molecular oxygen indispensable for life on the earth. The crystal structure of PSII from cyanobacteria has been solved at an atomic level, but the structure of eukaryotic PSII has not been analyzed. Because eukaryotic PSII possesses additional subunits not found in cyanobacterial PSII, it is important to solve the structure of eukaryotic PSII to elucidate their detailed functions, as well as evolutionar...

  15. An Interactive Exercise To Learn Eukaryotic Cell Structure and Organelle Function.

    Science.gov (United States)

    Klionsky, Daniel J.; Tomashek, John J.

    1999-01-01

    Describes a cooperative, interactive problem-solving exercise for studying eukaryotic cell structure and function. Highlights the dynamic aspects of movement through the cell. Contains 15 references. (WRM)

  16. The genome of the polar eukaryotic microalga Coccomyxa subellipsoidea reveals traits of cold adaptation

    Energy Technology Data Exchange (ETDEWEB)

    Blanc, Guillaume; Agarkova, Irina; Grimwood, Jane; Kuo, Alan; Brueggeman, Andrew; Dunigan, David D.; Gurnon, James; Ladunga, Istvan; Lindquist, Erika; Lucas, Susan; Pangilinan, Jasmyn; Proschold, Thomas; Salamov, Asaf; Schmutz, Jeremy; Weeks, Donald; Tamada, Takashi; Lomsadze, Alexandre; Borodovsky, Mark; Claverie, Jean-Michel; Grigoriev, Igor V.; Van Etten, James L.

    2012-02-13

    Background: Little is known about the mechanisms of adaptation of life to the extreme environmental conditions encountered in polar regions. Here we present the genome sequence of a unicellular green alga from the division Chlorophyta, Coccomyxa subellipsoidea C-169, which we will hereafter refer to as C-169. This is the first eukaryotic microorganism from a polar environment to have its genome sequenced. Results: The 48.8 Mb genome contained in 20 chromosomes exhibits significant synteny conservation with the chromosomes of its relatives Chlorella variabilis and Chlamydomonas reinhardtii. The order of the genes is highly reshuffled within synteny blocks, suggesting that intra-chromosomal rearrangements were more prevalent than inter-chromosomal rearrangements. Remarkably, Zepp retrotransposons occur in clusters of nested elements with strictly one cluster per chromosome probably residing at the centromere. Several protein families overrepresented in C. subellipsoidea include proteins involved in lipid metabolism, transporters, cellulose synthases and short alcohol dehydrogenases. Conversely, C-169 lacks proteins that exist in all other sequenced chlorophytes, including components of the glycosyl phosphatidyl inositol anchoring system, pyruvate phosphate dikinase and the photosystem 1 reaction center subunit N (PsaN). Conclusions: We suggest that some of these gene losses and gains could have contributed to adaptation to low temperatures. Comparison of these genomic features with the adaptive strategies of psychrophilic microbes suggests that prokaryotes and eukaryotes followed comparable evolutionary routes to adapt to cold environments.

  17. Programmable Site-Specific Nucleases for Targeted Genome Engineering in Higher Eukaryotes.

    Science.gov (United States)

    Govindan, Ganesan; Ramalingam, Sivaprakash

    2016-11-01

    Recent advances in targeted genome engineering enable molecular biologists to generate sequence-specific modifications with greater efficiency and higher specificity in complex eukaryotic genomes. Programmable site-specific DNA cleavage reagents and cellular DNA repair mechanisms have made this possible. These reagents have become powerful tools for delivering a site-specific genomic double-strand break (DSB) at the desired chromosomal locus, which produces sequence alterations through error-prone non-homologous end joining (NHEJ) resulting in gene inactivations/knockouts. Alternatively, the DSB can be repaired through homology-directed repair (HDR) using a donor DNA template, which leads to the introduction of desired sequence modifications at the predetermined site. Here, we summarize the role of three classes of nucleases: zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) system in achieving targeted genome modifications. Further, we discuss the progress towards the applications of programmable site-specific nucleases (SSNs) in treating human diseases and other biological applications in economically important higher eukaryotic organisms such as plants and livestock. J. Cell. Physiol. 231: 2380-2392, 2016. © 2016 Wiley Periodicals, Inc.

  18. Evolutionary dynamics of the kinetochore network in eukaryotes as revealed by comparative genomics

    NARCIS (Netherlands)

    van Hooff, Jolien Je; Tromer, Eelco; van Wijk, Leny; Snel, Berend; Kops, Geert Jpl

    2017-01-01

    During eukaryotic cell division, the sister chromatids of duplicated chromosomes are pulled apart by microtubules, which connect via kinetochores. The kinetochore is a multiprotein structure that links centromeres to microtubules, and that emits molecular signals in order to safeguard the equal

  19. Structure of the prolyl-tRNA synthetase from the eukaryotic pathogen Giardia lamblia

    Energy Technology Data Exchange (ETDEWEB)

    Larson, Eric T.; Kim, Jessica E.; Napuli, Alberto J.; Verlinde, Christophe L. M. J.; Fan, Erkang; Zucker, Frank H.; Van Voorhis, Wesley C.; Buckner, Frederick S.; Hol, Wim G. J.; Merritt, Ethan A., E-mail: merritt@u.washington.edu [Medical Structural Genomics of Pathogenic Protozoa, (United States); University of Washington, Seattle, WA 98195 (United States)

    2012-09-01

    The structure of Giardia prolyl-tRNA synthetase cocrystallized with proline and ATP shows evidence for half-of-the-sites activity, leading to a corresponding mixture of reaction substrates and product (prolyl-AMP) in the two active sites of the dimer. The genome of the human intestinal parasite Giardia lamblia contains only a single aminoacyl-tRNA synthetase gene for each amino acid. The Giardia prolyl-tRNA synthetase gene product was originally misidentified as a dual-specificity Pro/Cys enzyme, in part owing to its unexpectedly high off-target activation of cysteine, but is now believed to be a normal representative of the class of archaeal/eukaryotic prolyl-tRNA synthetases. The 2.2 Å resolution crystal structure of the G. lamblia enzyme presented here is thus the first structure determination of a prolyl-tRNA synthetase from a eukaryote. The relative occupancies of substrate (proline) and product (prolyl-AMP) in the active site are consistent with half-of-the-sites reactivity, as is the observed biphasic thermal denaturation curve for the protein in the presence of proline and MgATP. However, no corresponding induced asymmetry is evident in the structure of the protein. No thermal stabilization is observed in the presence of cysteine and ATP. The implied low affinity for the off-target activation product cysteinyl-AMP suggests that translational fidelity in Giardia is aided by the rapid release of misactivated cysteine.

  20. C-methods to study 3D organization of the eukaryotic genome

    Directory of Open Access Journals (Sweden)

    Iarovaia O. V.

    2012-07-01

    It is becoming increasingly evident that spatial organization of the eukaryotic genome plays an important role in regulation of gene expression. The three-dimensional (3D) genome organization can be studied using different types of microscopy, in particular those coupled with fluorescence in situ hybridization. However, when it comes to the analysis of spatial interaction between specific genome regions, chromosome conformation capture (3C) methods demonstrate much higher performance. They are based on the proximity ligation approach, which consists of preferential ligation of the ends of DNA fragments joined via protein bridges in living cells by formaldehyde fixation. It is assumed that such bridges link DNA fragments that are located in close spatial proximity in the cell nucleus. In this review we describe current 3C-based approaches, from 3C and ChIP-loop to Hi-C and ChIA-PET, going under the collective name of C-methods.

  1. Comprehensive analysis of endogenous bornavirus-like elements in eukaryote genomes

    Science.gov (United States)

    Horie, Masayuki; Kobayashi, Yuki; Suzuki, Yoshiyuki; Tomonaga, Keizo

    2013-01-01

    Bornaviruses are the only animal RNA viruses that establish a persistent infection in their host cell nucleus. Studies of bornaviruses have provided unique information about viral replication strategies and virus–host interactions. Although bornaviruses do not integrate into the host genome during their replication cycle, we and others have recently reported that there are DNA sequences derived from the mRNAs of ancient bornaviruses in the genomes of vertebrates, including humans, and these have been designated endogenous borna-like (EBL) elements. Therefore, bornaviruses have been interacting with their hosts as driving forces in the evolution of host genomes in a previously unexpected way. Studies of EBL elements have provided new models for virology, evolutionary biology and general cell biology. In this review, we summarize the data on EBL elements, including those we have newly identified in eukaryotic genomes, and discuss the biological significance of EBL elements, with a focus on EBL nucleoprotein elements in mammalian genomes. Surprisingly, EBL elements were detected in the genomes of invertebrates, suggesting that the host range of bornaviruses may be much wider than previously thought. We also review our new data on non-retroviral integration of Borna disease virus. PMID:23938751

  2. Bonus Organisms in High-Throughput Eukaryotic Whole-Genome Shotgun Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank; Platt, Darren

    2006-02-06

    The DOE Joint Genome Institute has sequenced over 50 eukaryotic genomes, ranging in size from 15 MB to 1.6 GB, over a wide range of organism types. In the course of doing so, it has become clear that a substantial fraction of these data sets contains bonus organisms, usually prokaryotes, in addition to the desired genome. While some of these additional organisms are extraneous contamination, they are sometimes symbionts, and so can be of biological interest. Therefore, it is desirable to assemble the bonus organisms along with the main genome. This transforms the problem into one of metagenomic assembly, which is considerably more challenging than traditional whole-genome shotgun (WGS) assembly. The different organisms will usually be present at different sequence depths, which is difficult to handle in most WGS assemblers. In addition, with multiple distinct genomes present, chimerism can produce cross-organism combinations. Finally, there is no guarantee that only a single bonus organism will be present. For example, one JGI project contained at least two different prokaryotic contaminants, plus a 145 KB plasmid of unknown origin. We have developed techniques to routinely identify and handle such bonus organisms in a high-throughput sequencing environment. Approaches include screening and partitioning the unassembled data, and iterative subassemblies. These methods are applicable not only to bonus organisms, but also to desired components such as organelles. These procedures have the additional benefit of identifying, and allowing for the removal of, cloning artifacts such as E. coli and spurious vector inclusions.

  3. Crystal structures of two eukaryotic nucleases involved in RNA metabolism

    DEFF Research Database (Denmark)

    Jonstrup, Anette Thyssen; Midtgaard, Søren Fuglsang; Van, Lan Bich

    as well as the controlled turnover of these in response to changing surrounding conditions is of vital importance to ensure optimal fitness of a cell. Central to both these processes is the degradation of RNA, either as a means of decreasing the level of particular RNAs or as a way to get rid of aberrant...... from the 3'-end of mRNA, is normally the first and also rate-limiting step in cellular mRNA degradation and therefore a key process in the control of eukaryotic mRNA turnover. Since Ccr4p is believed to be the main deadenylase the precise role of Pop2p in the complex is less clear. Nevertheless, Pop2p....... In the nucleus Rrp6p associates with the exosome and participates in the degradation of improperly processed precursor mRNAs and trimming of stable RNAs. The crystal structure of S. cerevisiae Rrp6p presented here displays a conserved DEDD nuclease core with a flanking HRDC domain believed to be involved in RNA...

  4. [Bacterial infections as seen from the eukaryotic genome: DNA double strand breaks, inflammation and cancer].

    Science.gov (United States)

    Lemercier, Claudie

    2014-01-01

    An increasing number of studies report that infection by pathogenic bacteria alters the host genome, producing highly hazardous DNA double strand breaks for the eukaryotic cell. Even when DNA repair occurs, it often leaves "scars" on chromosomes that might generate genomic instability at the next cell division. Chronic intestinal inflammation promotes the expansion of genotoxic bacteria in the intestinal microbiota, which in turn triggers tumor formation and colon carcinomas. Bacteria act at the level of the host DNA repair machinery. They also hijack the host cell cycle to allow themselves time for replication in an appropriate reservoir. However, except in the case of bacteria carrying the CDT nuclease, the molecular mechanisms responsible for DNA lesions are not well understood, even if reactive oxygen species released during infection make good candidates. © 2014 médecine/sciences – Inserm.

  5. Prokaryotic genes in eukaryotic genome sequences: when to infer horizontal gene transfer and when to suspect an actual microbe.

    Science.gov (United States)

    Artamonova, Irena I; Lappi, Tanya; Zudina, Liudmila; Mushegian, Arcady R

    2015-07-01

    Assessment of phylogenetic positions of predicted gene and protein sequences is a routine step in any genome project, useful for validating the species' taxonomic position and for evaluating hypotheses about genome evolution and function. Several recent eukaryotic genome projects have reported multiple gene sequences that were much more similar to homologues in bacteria than to any eukaryotic sequence. In the spirit of the times, horizontal gene transfer from bacteria to eukaryotes has been invoked in some of these cases. Here, we show, using comparative sequence analysis, that some of those bacteria-like genes indeed appear likely to have been horizontally transferred from bacteria to eukaryotes. In other cases, however, the evidence strongly indicates that the eukaryotic DNA sequenced in the genome project contains a sample of non-integrated DNA from the actual bacteria, possibly providing a window into the host microbiome. Recent literature suggests also that common reagents, kits and laboratory equipment may be systematically contaminated with bacterial DNA, which appears to be sampled by metagenome projects non-specifically. We review several bioinformatic criteria that help to distinguish putative horizontal gene transfers from the admixture of genes from autonomously replicating bacteria in their hosts' genome databases or from the reagent contamination. © 2015 Society for Applied Microbiology and John Wiley & Sons Ltd.

  6. Neutral Theory Predicts the Relative Abundance and Diversity of Genetic Elements in a Broad Array of Eukaryotic Genomes

    Science.gov (United States)

    Serra, François; Becher, Verónica; Dopazo, Hernán

    2013-01-01

    It is universally true in ecological communities, terrestrial or aquatic, temperate or tropical, that some species are very abundant, others are moderately common, and the majority are rare. Likewise, eukaryotic genomes also contain classes or “species” of genetic elements that vary greatly in abundance: DNA transposons, retrotransposons, satellite sequences, simple repeats and their less abundant functional sequences such as RNA or genes. Are the patterns of relative species abundance and diversity similar among ecological communities and genomes? Previous dynamical models of genomic diversity have focused on the selective forces shaping the abundance and diversity of transposable elements (TEs). However, ideally, models of genome dynamics should consider not only TEs, but also the diversity of all genetic classes or “species” populating eukaryotic genomes. Here, in an analysis of the diversity and abundance of genetic elements in >500 eukaryotic chromosomes, we show that the patterns are consistent with a neutral hypothesis of genome assembly in virtually all chromosomes tested. The distributions of relative abundance of genetic elements are quite precisely predicted by the dynamics of an ecological model for which the principle of functional equivalence is the main assumption. We hypothesize that at large temporal scales an overarching neutral or nearly neutral process governs the evolution of abundance and diversity of genetic elements in eukaryotic genomes. PMID:23798991

  7. Neutral theory predicts the relative abundance and diversity of genetic elements in a broad array of eukaryotic genomes.

    Directory of Open Access Journals (Sweden)

    François Serra

    Full Text Available It is universally true in ecological communities, terrestrial or aquatic, temperate or tropical, that some species are very abundant, others are moderately common, and the majority are rare. Likewise, eukaryotic genomes also contain classes or "species" of genetic elements that vary greatly in abundance: DNA transposons, retrotransposons, satellite sequences, simple repeats and their less abundant functional sequences such as RNA or genes. Are the patterns of relative species abundance and diversity similar among ecological communities and genomes? Previous dynamical models of genomic diversity have focused on the selective forces shaping the abundance and diversity of transposable elements (TEs). However, ideally, models of genome dynamics should consider not only TEs, but also the diversity of all genetic classes or "species" populating eukaryotic genomes. Here, in an analysis of the diversity and abundance of genetic elements in >500 eukaryotic chromosomes, we show that the patterns are consistent with a neutral hypothesis of genome assembly in virtually all chromosomes tested. The distributions of relative abundance of genetic elements are quite precisely predicted by the dynamics of an ecological model for which the principle of functional equivalence is the main assumption. We hypothesize that at large temporal scales an overarching neutral or nearly neutral process governs the evolution of abundance and diversity of genetic elements in eukaryotic genomes.

  8. Neutral theory predicts the relative abundance and diversity of genetic elements in a broad array of eukaryotic genomes.

    Science.gov (United States)

    Serra, François; Becher, Verónica; Dopazo, Hernán

    2013-01-01

    It is universally true in ecological communities, terrestrial or aquatic, temperate or tropical, that some species are very abundant, others are moderately common, and the majority are rare. Likewise, eukaryotic genomes also contain classes or "species" of genetic elements that vary greatly in abundance: DNA transposons, retrotransposons, satellite sequences, simple repeats and their less abundant functional sequences such as RNA or genes. Are the patterns of relative species abundance and diversity similar among ecological communities and genomes? Previous dynamical models of genomic diversity have focused on the selective forces shaping the abundance and diversity of transposable elements (TEs). However, ideally, models of genome dynamics should consider not only TEs, but also the diversity of all genetic classes or "species" populating eukaryotic genomes. Here, in an analysis of the diversity and abundance of genetic elements in >500 eukaryotic chromosomes, we show that the patterns are consistent with a neutral hypothesis of genome assembly in virtually all chromosomes tested. The distributions of relative abundance of genetic elements are quite precisely predicted by the dynamics of an ecological model for which the principle of functional equivalence is the main assumption. We hypothesize that at large temporal scales an overarching neutral or nearly neutral process governs the evolution of abundance and diversity of genetic elements in eukaryotic genomes.

  9. Structural Characterization of a Eukaryotic Cyanase from Tetranychus urticae.

    Science.gov (United States)

    Schlachter, Caleb R; Klapper, Vincent; Wybouw, Nicky; Radford, Taylor; Van Leeuwen, Thomas; Grbic, Miodrag; Chruszcz, Maksymilian

    2017-07-12

    The two-spotted spider mite Tetranychus urticae is a polyphagous agricultural pest and poses a high risk to global crop production as it is rapidly developing pesticide resistance. Genomic and transcriptomic analysis has revealed the presence of a remarkable cyanase gene in T. urticae and related mite species within the Acariformes lineage. Cyanase catalyzes the detoxification of cyanate and is potentially an attractive protein target for the development of new acaricides. Phylogenetic analysis indicates that within the Acariformes, the cyanase gene originates from a single horizontal gene transfer event, which precedes subsequent speciation. Our structural studies presented here compare and contrast prokaryotic cyanases to T. urticae cyanase, which all form homodecamers and have conserved active site residues, but display different surface areas between homodimers in the overall decameric structure.

  10. An HMM-based comparative genomic framework for detecting introgression in eukaryotes.

    Directory of Open Access Journals (Sweden)

    Kevin J Liu

    2014-06-01

    Full Text Available One outcome of interspecific hybridization and subsequent effects of evolutionary forces is introgression, which is the integration of genetic material from one species into the genome of an individual in another species. The evolution of several groups of eukaryotic species has involved hybridization, and cases of adaptation through introgression have been already established. In this work, we report on PhyloNet-HMM, a new comparative genomic framework for detecting introgression in genomes. PhyloNet-HMM combines phylogenetic networks with hidden Markov models (HMMs) to simultaneously capture the (potentially reticulate) evolutionary history of the genomes and dependencies within genomes. A novel aspect of our work is that it also accounts for incomplete lineage sorting and dependence across loci. Application of our model to variation data from chromosome 7 in the mouse (Mus musculus domesticus) genome detected a recently reported adaptive introgression event involving the rodent poison resistance gene Vkorc1, in addition to other newly detected introgressed genomic regions. Based on our analysis, it is estimated that about 9% of all sites within chromosome 7 are of introgressive origin (these cover about 13 Mbp of chromosome 7, and over 300 genes). Further, our model detected no introgression in a negative control data set. We also found that our model accurately detected introgression and other evolutionary processes from synthetic data sets simulated under the coalescent model with recombination, isolation, and migration. Our work provides a powerful framework for systematic analysis of introgression while simultaneously accounting for dependence across sites, point mutations, recombination, and ancestral polymorphism.
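
    To illustrate the hidden Markov model component in isolation, the sketch below runs standard Viterbi decoding over a two-state chain ("native" versus "introgressed"). It is not PhyloNet-HMM: the phylogenetic-network part is omitted entirely, and the states, the observation symbols "A" and "B", and every probability are invented for illustration.

```python
import math

# Hypothetical two-state model; all numbers are illustrative assumptions.
STATES = ("native", "introgressed")
START_P = {"native": 0.9, "introgressed": 0.1}
TRANS_P = {
    "native": {"native": 0.95, "introgressed": 0.05},
    "introgressed": {"native": 0.10, "introgressed": 0.90},
}
# "A" and "B" stand for two classes of site pattern along a chromosome.
EMIT_P = {
    "native": {"A": 0.9, "B": 0.1},
    "introgressed": {"A": 0.2, "B": 0.8},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path, computed in log space."""
    scores = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for arriving in state s at this site.
            prev = max(states, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

    Calling `viterbi(list("AABBBBBAA"), STATES, START_P, TRANS_P, EMIT_P)` labels each site with one of the two states; contiguous runs of "introgressed" would correspond to candidate introgressed regions in this toy setting.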

  11. Lessons from Structural Genomics

    Science.gov (United States)

    Terwilliger, Thomas C.; Stuart, David; Yokoyama, Shigeyuki

    2010-01-01

    A decade of structural genomics, the large-scale determination of protein structures, has generated a wealth of data and many important lessons for structural biology and for future large-scale projects. These lessons include a confirmation that it is possible to construct large-scale facilities that can determine the structures of a hundred or more proteins per year, that these structures can be of high quality, and that these structures can have an important impact. Technology development has played a critical role in structural genomics, the difficulties at each step of determining a structure of a particular protein can be quantified, and validation of technologies is nearly as important as the technologies themselves. Finally, rapid deposition of data in public databases has increased the impact and usefulness of the data and international cooperation has advanced the field and improved data sharing. PMID:19416074

  12. Lessons from structural genomics.

    Science.gov (United States)

    Terwilliger, Thomas C; Stuart, David; Yokoyama, Shigeyuki

    2009-01-01

    A decade of structural genomics, the large-scale determination of protein structures, has generated a wealth of data and many important lessons for structural biology and for future large-scale projects. These lessons include a confirmation that it is possible to construct large-scale facilities that can determine the structures of a hundred or more proteins per year, that these structures can be of high quality, and that these structures can have an important impact. Technology development has played a critical role in structural genomics, the difficulties at each step of determining a structure of a particular protein can be quantified, and validation of technologies is nearly as important as the technologies themselves. Finally, rapid deposition of data in public databases has increased the impact and usefulness of the data and international cooperation has advanced the field and improved data sharing.

  13. Preferential duplication of intermodular hub genes: an evolutionary signature in eukaryotes genome networks.

    Directory of Open Access Journals (Sweden)

    Ricardo M Ferreira

    Full Text Available Whole genome protein-protein association networks are not random and their topological properties stem from genome evolution mechanisms. In fact, more connected, but less clustered proteins are related to genes that, in general, present more paralogs as compared to other genes, indicating frequent previous gene duplication episodes. On the other hand, genes related to conserved biological functions present few or no paralogs and yield proteins that are highly connected and clustered. These general network characteristics must have an evolutionary explanation. Considering data from the STRING database, we present here experimental evidence that, beyond not being scale free, protein degree distributions of organisms present an increased probability for high degree nodes. Furthermore, based on this experimental evidence, we propose a simulation model for genome evolution, where genes in a network are either acquired de novo using a preferential attachment rule, or duplicated with a probability that linearly grows with gene degree and decreases with its clustering coefficient. For the first time a model yields results that simultaneously describe different topological distributions. Also, this model correctly predicts that, to produce protein-protein association networks with number of links and number of nodes in the observed range for Eukaryotes, 90% gene duplication and 10% de novo gene acquisition are necessary. This scenario implies a universal mechanism for genome evolution.
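
    The growth rule described above can be sketched as a small simulation. The abstract gives only the qualitative form of the duplication probability, so several details here are assumptions: the duplication weight is taken as degree × (1 − clustering coefficient), a duplicate inherits all of its parent's links and is also linked to the parent, and a de novo gene attaches to a single existing gene. The function names `clustering` and `grow` are hypothetical.

```python
import random
from itertools import combinations


def clustering(adj, node):
    """Local clustering coefficient of node in an adjacency-set graph."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def grow(n_genes, p_dup=0.9, seed=0):
    """Grow a network to n_genes nodes by duplication or de novo acquisition."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # start from a single interacting pair
    while len(adj) < n_genes:
        nodes = list(adj)
        new = len(adj)
        if rng.random() < p_dup:
            # Duplication: weight rises with degree, falls with clustering.
            # The small constant keeps weights valid if every node is
            # fully clustered.
            weights = [
                len(adj[v]) * (1.0 - clustering(adj, v)) + 1e-9 for v in nodes
            ]
            parent = rng.choices(nodes, weights=weights)[0]
            partners = set(adj[parent]) | {parent}
        else:
            # De novo acquisition: preferential attachment by degree.
            weights = [len(adj[v]) for v in nodes]
            partners = {rng.choices(nodes, weights=weights)[0]}
        adj[new] = set()
        for v in partners:
            adj[new].add(v)
            adj[v].add(new)
    return adj
```

    The default `p_dup=0.9` mirrors the 90% duplication to 10% de novo ratio reported in the abstract; degree and clustering distributions of the returned graph could then be compared with observed networks.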

  14. A general cloning system to selectively isolate any eukaryotic or prokaryotic genomic region in yeast

    Science.gov (United States)

    Noskov, Vladimir N; Kouprina, Natalay; Leem, Sun-Hee; Ouspenski, Ilia; Barrett, J Carl; Larionov, Vladimir

    2003-01-01

    Background Transformation-associated recombination (TAR) cloning in yeast is a unique method for selective isolation of large chromosomal fragments or entire genes from complex genomes. The technique involves homologous recombination, during yeast spheroplast transformation, between genomic DNA and a TAR vector that has short (~ 60 bp) 5' and 3' gene targeting sequences (hooks). Result TAR cloning requires that the cloned DNA fragment carry at least one autonomously replicating sequence (ARS) that can function as the origin of replication in yeast, which prevents wide application of the method. In this paper, we describe a novel TAR cloning system that allows isolation of genomic regions lacking yeast ARS-like sequences. ARS is inserted into the TAR vector along with URA3 as a counter-selectable marker. The hooks are placed between the TATA box and the transcription initiation site of URA3. Insertion of any sequence between hooks results in inactivation of URA3 expression. That inactivation confers resistance to 5-fluoroorotic acid, allowing selection of TAR cloning events against background vector recircularization events. Conclusion The new system greatly expands the area of application of TAR cloning by allowing isolation of any chromosomal region from eukaryotic and prokaryotic genomes regardless of the presence of autonomously replicating sequences. PMID:12720573

  15. A general cloning system to selectively isolate any eukaryotic or prokaryotic genomic region in yeast

    Directory of Open Access Journals (Sweden)

    Barrett J Carl

    2003-04-01

    Full Text Available Abstract Background Transformation-associated recombination (TAR) cloning in yeast is a unique method for selective isolation of large chromosomal fragments or entire genes from complex genomes. The technique involves homologous recombination, during yeast spheroplast transformation, between genomic DNA and a TAR vector that has short (~ 60 bp) 5' and 3' gene targeting sequences (hooks). Result TAR cloning requires that the cloned DNA fragment carry at least one autonomously replicating sequence (ARS) that can function as the origin of replication in yeast, which prevents wide application of the method. In this paper, we describe a novel TAR cloning system that allows isolation of genomic regions lacking yeast ARS-like sequences. ARS is inserted into the TAR vector along with URA3 as a counter-selectable marker. The hooks are placed between the TATA box and the transcription initiation site of URA3. Insertion of any sequence between hooks results in inactivation of URA3 expression. That inactivation confers resistance to 5-fluoroorotic acid, allowing selection of TAR cloning events against background vector recircularization events. Conclusion The new system greatly expands the area of application of TAR cloning by allowing isolation of any chromosomal region from eukaryotic and prokaryotic genomes regardless of the presence of autonomously replicating sequences.

  16. Leucine-Rich repeat receptor kinases are sporadically distributed in eukaryotic genomes

    Directory of Open Access Journals (Sweden)

    Diévart Anne

    2011-12-01

    Full Text Available Abstract Background Plant leucine-rich repeat receptor-like kinases (LRR-RLKs) are receptor kinases that contain LRRs in their extracellular domain. In the last 15 years, many research groups have demonstrated major roles played by LRR-RLKs in plants during almost all developmental processes throughout the life of the plant and in defense/resistance against a large range of pathogens. Recently, a breakthrough has been made in this field that challenges the dogma of the specificity of plant LRR-RLKs. Results We analyzed ~1000 complete genomes and show that LRR-RK genes have now been identified in 8 non-plant genomes. We performed an exhaustive phylogenetic analysis of all of these receptors, revealing that all of the LRR-containing receptor subfamilies form lineage-specific clades. Our results suggest that the association of LRRs with RKs appeared independently at least four times in eukaryotic evolutionary history. Moreover, the molecular evolutionary history of the LRR-RKs found in oomycetes is reminiscent of the pattern observed in plants: expansion with amplification/deletion and evolution of the domain organization leading to the functional diversification of members of the gene family. Finally, the expression data suggest that oomycete LRR-RKs may play a role in several stages of the oomycete life cycle. Conclusions In view of the key roles that LRR-RLKs play throughout the entire lifetime of plants and plant-environment interactions, the emergence and expansion of this type of receptor in several phyla along the evolution of eukaryotes, and particularly in oomycete genomes, questions their intrinsic functions in mimicry and/or in the coevolution of receptors between hosts and pathogens.

  17. Inter-species differences of co-expression of neighboring genes in eukaryotic genomes

    Directory of Open Access Journals (Sweden)

    Inaoka Hidenori

    2004-01-01

    Full Text Available Abstract Background There is increasing evidence that gene order within the eukaryotic genome is not random. In yeast and worm, adjacent or neighboring genes tend to be co-expressed. Clustering of co-expressed genes has been found in humans, worm and fruit flies. However, in mice and rats, an effect of chromosomal distance (CD) on co-expression has not been investigated yet. Also, no cross-species comparison has been made so far. We analyzed the effect of CD as well as normalized distance (ND) using expression data in six eukaryotic species: yeast, fruit fly, worm, rat, mouse and human. Results We analyzed 24 sets of expression data from the six species. Highly co-expressed pairs were sorted into bins of equal sized intervals of CD, and a co-expression rate (CoER) in each bin was calculated. In all datasets, a higher CoER was obtained in a short CD range than a long distance range. These results show that across all studied species, there was a consistent effect of CD on co-expression. However, the results using the ND show more diversity. Intra- and inter-species comparisons of CoER reveal that there are significant differences in the co-expression rates of neighboring genes among the species. A pair-wise BLAST analysis finds 8–30% of the highly co-expressed pairs are duplicated genes. Conclusion We confirmed that in the six eukaryotic species, there was a consistent tendency that neighboring genes are likely to be co-expressed. Results of pair-wise BLAST indicate a significant effect of non-duplicated pairs on co-expression. A comparison of CD and ND suggests the dominant effect of CD.
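
    The binning step can be sketched as follows. The abstract does not define CoER precisely, so this example assumes it is the fraction of gene pairs in a distance bin whose expression correlation meets a chosen threshold. The function name, the `(distance, correlation)` pair representation, and the threshold are all illustrative assumptions.

```python
from collections import defaultdict


def coexpression_rate_by_bin(pairs, bin_width, threshold):
    """Compute an assumed co-expression rate per chromosomal-distance bin.

    pairs: iterable of (chromosomal_distance, correlation) for gene pairs
           located on the same chromosome.
    Returns {bin_index: fraction of pairs with correlation >= threshold},
    where bin_index = distance // bin_width (equal-sized intervals).
    """
    totals = defaultdict(int)
    high = defaultdict(int)
    for distance, correlation in pairs:
        bin_index = int(distance // bin_width)
        totals[bin_index] += 1
        if correlation >= threshold:
            high[bin_index] += 1
    return {b: high[b] / totals[b] for b in sorted(totals)}
```

    A higher value in the low-index bins than in the high-index bins would correspond to the short-range effect of CD that the abstract reports.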

  18. Visualization of Genome Signatures of Eukaryote Genomes by Batch-Learning Self-Organizing Map with a Special Emphasis on Drosophila Genomes

    Directory of Open Access Journals (Sweden)

    Takashi Abe

    2014-01-01

    Full Text Available A strategy of evolutionary studies that can compare vast numbers of genome sequences is becoming increasingly important with the remarkable progress of high-throughput DNA sequencing methods. We previously established a sequence alignment-free clustering method “BLSOM” for di-, tri-, and tetranucleotide compositions in genome sequences, which can characterize sequence characteristics (genome signatures) of a wide range of species. In the present study, we generated BLSOMs for tetra- and pentanucleotide compositions in approximately one million sequence fragments derived from 101 eukaryotes, for which almost complete genome sequences were available. BLSOM recognized phylotype-specific characteristics (e.g., key combinations of oligonucleotide frequencies) in the genome sequences, permitting phylotype-specific clustering of the sequences without any information regarding the species. In our detailed examination of 12 Drosophila species, the correlation between their phylogenetic classification and the classification on the BLSOMs was observed to visualize oligonucleotides diagnostic for species-specific clustering.
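
    The input to such a map is an oligonucleotide composition vector per sequence fragment. The sketch below computes one for a given oligonucleotide length; it covers only this feature-extraction step, not the self-organizing map itself, and details such as counting overlapping windows on a single strand and skipping windows with ambiguous bases are assumptions rather than the BLSOM authors' stated procedure.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return relative frequencies of all 4**k oligonucleotides of length k.

    Overlapping windows are counted on the given strand only; windows that
    contain any character other than A, C, G or T are ignored. The vector
    is ordered lexicographically (AAAA, AAAC, ..., TTTT for k=4).
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]
```

    With `k=4` each fragment becomes a 256-dimensional vector, and with `k=5` a 1024-dimensional one; vectors of this kind could then be passed to any clustering or SOM implementation.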

  19. Precise Editing at DNA Replication Forks Enables Multiplex Genome Engineering in Eukaryotes.

    Science.gov (United States)

    Barbieri, Edward M; Muir, Paul; Akhuetie-Oni, Benjamin O; Yellman, Christopher M; Isaacs, Farren J

    2017-11-15

    We describe a multiplex genome engineering technology in Saccharomyces cerevisiae based on annealing synthetic oligonucleotides at the lagging strand of DNA replication. The mechanism is independent of Rad51-directed homologous recombination and avoids the creation of double-strand DNA breaks, enabling precise chromosome modifications at single base-pair resolution with an efficiency of >40%, without unintended mutagenic changes at the targeted genetic loci. We observed the simultaneous incorporation of up to 12 oligonucleotides with as many as 60 targeted mutations in one transformation. Iterative transformations of a complex pool of oligonucleotides rapidly produced large combinatorial genomic diversity >10^5. This method was used to diversify a heterologous β-carotene biosynthetic pathway that produced genetic variants with precise mutations in promoters, genes, and terminators, leading to altered carotenoid levels. Our approach of engineering the conserved processes of DNA replication, repair, and recombination could be automated and establishes a general strategy for multiplex combinatorial genome engineering in eukaryotes. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. Eukaryote-to-eukaryote gene transfer events revealed by the genome sequence of the wine yeast Saccharomyces cerevisiae EC1118.

    Science.gov (United States)

    Novo, Maite; Bigey, Frédéric; Beyne, Emmanuelle; Galeote, Virginie; Gavory, Frédérick; Mallet, Sandrine; Cambon, Brigitte; Legras, Jean-Luc; Wincker, Patrick; Casaregola, Serge; Dequin, Sylvie

    2009-09-22

    Saccharomyces cerevisiae has been used for millennia in winemaking, but little is known about the selective forces acting on the wine yeast genome. We sequenced the complete genome of the diploid commercial wine yeast EC1118, resulting in an assembly of 31 scaffolds covering 97% of the S288c reference genome. The wine yeast differed strikingly from the other S. cerevisiae isolates in possessing 3 unique large regions, 2 of which were subtelomeric, the other being inserted within an EC1118 chromosome. These regions encompass 34 genes involved in key wine fermentation functions. Phylogeny and synteny analyses showed that 1 of these regions originated from a species closely related to the Saccharomyces genus, whereas the 2 other regions were of non-Saccharomyces origin. We identified Zygosaccharomyces bailii, a major contaminant of wine fermentations, as the donor species for 1 of these 2 regions. Although natural hybridization between Saccharomyces strains has been described, this report provides evidence that gene transfer may occur between Saccharomyces and non-Saccharomyces species. We show that the regions identified are frequent and differentially distributed among S. cerevisiae clades, being found almost exclusively in wine strains, suggesting acquisition through recent transfer events. Overall, these data show that the wine yeast genome is subject to constant remodeling through the contribution of exogenous genes. Our results suggest that these processes are favored by ecologic proximity and are involved in the molecular adaptation of wine yeasts to conditions of high sugar, low nitrogen, and high ethanol concentrations.

  1. Amino acids biosynthesis and nitrogen assimilation pathways: a great genomic deletion during eukaryotes evolution

    Science.gov (United States)

    2011-01-01

    Background Besides being building blocks for proteins, amino acids are also key metabolic intermediates in living cells. Surprisingly a variety of organisms are incapable of synthesizing some of them, thus named Essential Amino Acids (EAAs). How certain ancestral organisms successfully competed for survival after losing key genes involved in amino acids anabolism remains an open question. Comparative genomics searches on current protein databases including sequences from both complete and incomplete genomes among diverse taxonomic groups help us to understand amino acids auxotrophy distribution. Results Here, we applied a methodology based on clustering of homologous genes to seed sequences from autotrophic organisms Saccharomyces cerevisiae (yeast) and Arabidopsis thaliana (plant). Thus we depict evidences of presence/absence of EAA biosynthetic and nitrogen assimilation enzymes at phyla level. Results show broad loss of the phenotype of EAAs biosynthesis in several groups of eukaryotes, followed by multiple secondary gene losses. A subsequent inability for nitrogen assimilation is observed in derived metazoans. Conclusions A Great Deletion model is proposed here as a broad phenomenon generating the phenotype of amino acids essentiality followed, in metazoans, by organic nitrogen dependency. This phenomenon is probably associated to a relaxed selective pressure conferred by heterotrophy and, taking advantage of available homologous clustering tools, a complete and updated picture of it is provided. PMID:22369087

  2. The success of structural genomics

    OpenAIRE

    Terwilliger, Thomas C.

    2011-01-01

    The International Conference on Structural Genomics (ICSG 2011, http://www.icsg2011.org), held in Toronto Canada May 10–14, 2011 was a rich and exciting demonstration of how far structural genomics has come. Structural genomics has now matured into a field that includes both structure and the biology that structure enables. This has allowed targeting based on systematic approaches and on known biological importance and allows biochemical studies to be closely tied to structure determination. ...

  3. Data management in structural genomics: an overview.

    Science.gov (United States)

    Haquin, Sabrina; Oeuillet, Eric; Pajon, Anne; Harris, Mark; Jones, Alwyn T; van Tilbeurgh, Herman; Markley, John L; Zolnai, Zolt; Poupon, Anne

    2008-01-01

    Data management has been identified as a crucial issue in all large-scale experimental projects. In this type of project, many different persons manipulate multiple objects in different locations; thus, unless complete and accurate records are maintained, it is extremely difficult to understand exactly what has been done, when it was done, who did it, and what exact protocol was used. All of this information is essential for use in publications, reusing successful protocols, determining why a target has failed, and validating and optimizing protocols. Although data management solutions have been in place for certain focused activities (e.g., genome sequencing and microarray experiments), they are just emerging for more widespread projects, such as structural genomics, metabolomics, and systems biology as a whole. The complexity of experimental procedures, and the diversity and high rate of development of protocols used in a single center, or across various centers, have important consequences for the design of information management systems. Because procedures are carried out by both machines and hand, the system must be capable of handling data entry both from robotic systems and by means of a user-friendly interface. The information management system needs to be flexible so it can handle changes in existing protocols or newly added protocols. Because no commercial information management systems have had the needed features, most structural genomics groups have developed their own solutions. This chapter discusses the advantages of using a LIMS (laboratory information management system), for day-to-day management of structural genomics projects, and also for data mining. This chapter reviews different solutions currently in place or under development with emphasis on three systems developed by the authors: Xtrack, Sesame (developed at the Center for Eukaryotic Structural Genomics under the US Protein Structural Genomics Initiative), and HalX (developed at the

  4. xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.

    Science.gov (United States)

    Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P

    2016-04-01

    Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved.

  5. Archaeal Genome Guardians Give Insights into Eukaryotic DNA Replication and Damage Response Proteins

    Directory of Open Access Journals (Sweden)

    David S. Shin

    2014-01-01

    As the third domain of life, archaea, like the eukarya and bacteria, must have robust DNA replication and repair complexes to ensure genome fidelity. Archaea moreover display a breadth of unique habitats and characteristics, and structural biologists increasingly appreciate these features. As archaea include extremophiles that can withstand diverse environmental stresses, they provide fundamental systems for understanding enzymes and pathways critical to genome integrity and stress responses. Such archaeal extremophiles provide critical data on the periodic table for life as well as on the biochemical, geochemical, and physical limitations to adaptive strategies allowing organisms to thrive under environmental stress relevant to determining the boundaries for life as we know it. Specifically, archaeal enzyme structures have informed the architecture and mechanisms of key DNA repair proteins and complexes. With added abilities to temperature-trap flexible complexes and reveal core domains of transient and dynamic complexes, these structures provide insights into mechanisms of maintaining genome integrity despite extreme environmental stress. The DNA damage response protein structures noted in this review therefore inform the basis for genome integrity in the face of environmental stress, with implications for all domains of life as well as for biomanufacturing, astrobiology, and medicine.

  6. The classification, structure and functioning of Ago proteins in Eukaryotes

    Directory of Open Access Journals (Sweden)

    Aleksandra Poterala

    2016-09-01

    Ago proteins are members of the highly specialized and conserved Argonaute family, primarily responsible for regulation of gene expression. As a part of RNA-induced silencing complexes (RISCs), Ago proteins are responsible for binding a short RNA and cleavage/inhibition of translation of target mRNAs. Phosphorylation may work as the switch between those two functions, but the role of magnesium ion concentration is also taken into consideration. Recent reports indicate that Ago proteins can interact with an mRNA and cause inhibition of translation without the participation of a short RNA. As key elements in RNA interference processes, Ago proteins are an important and intensively exploited area of research. Furthermore, these proteins are involved in the repair of DNA double-strand breaks by homologous recombination, modifications of chromatin, and alternative splicing. Their role in the cell cycle and senescence is also being studied. In addition, Ago expression is tissue-specific, which potentially may be used for diagnostic purposes. Understanding the mechanisms of Ago functioning is therefore crucial for understanding many cellular processes. The following article presents a detailed description of the Ago proteins including their post-translational modifications, recent data and hypotheses concerning their interactions with short RNAs and mRNAs as well as the mechanisms of siRNA/miRNA sorting into individual members of the Ago subfamily, and their role in eukaryotic cells. The latest classification of Ago proteins within the Argonaute family based on evolutionary studies and their possible interactions with DNA are also described.

  7. Dormant origins as a built-in safeguard in eukaryotic DNA replication against genome instability and disease development.

    Science.gov (United States)

    Shima, Naoko; Pederson, Kayla D

    2017-08-01

    DNA replication is a prerequisite for cell proliferation, yet it can be increasingly challenging for a eukaryotic cell to faithfully duplicate its genome as its size and complexity expands. Dormant origins now emerge as a key component for cells to successfully accomplish such a demanding but essential task. In this perspective, we will first provide an overview of the fundamental processes eukaryotic cells have developed to regulate origin licensing and firing. With a special focus on mammalian systems, we will then highlight the role of dormant origins in preventing replication-associated genome instability and their functional interplay with proteins involved in the DNA damage repair response for tumor suppression. Lastly, deficiencies in the origin licensing machinery will be discussed in relation to their influence on stem cell maintenance and human diseases. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. The success of structural genomics.

    Science.gov (United States)

    Terwilliger, Thomas C

    2011-07-01

    The International Conference on Structural Genomics (ICSG 2011, http://sgc.utoronto.ca/ICSG2011/index.php) [corrected], held in Toronto, Canada, May 10-14, 2011, was a rich and exciting demonstration of how far structural genomics has come. Structural genomics has now matured into a field that includes both structure and the biology that structure enables. This has allowed targeting based on systematic approaches and on known biological importance and allows biochemical studies to be closely tied to structure determination. The wealth of purified proteins, clones, and chemical probes produced by structural genomics groups will enable a vast amount of follow-on research. The technologies, the structures, and the biology that were described at the meeting were at the cutting edge of science. Structural genomics has become a success.

  9. Hybrid and rogue kinases encoded in the genomes of model eukaryotes.

    Directory of Open Access Journals (Sweden)

    Ramaswamy Rakshambikai

    The highly modular nature of protein kinases generates diverse functional roles mediated by evolutionary events such as domain recombination, insertion and deletion of domains. Usually, the domain architecture of a kinase is related to the subfamily to which the kinase catalytic domain belongs. However, outlier kinases with unusual domain architectures serve in the expansion of the functional space of the protein kinase family. For example, Src kinases are made up of SH2 and SH3 domains in addition to the kinase catalytic domain. A kinase which lacks these two domains but retains sequence characteristics within the kinase catalytic domain is an outlier that is likely to have modes of regulation different from classical Src kinases. This study defines two types of outlier kinases, hybrids and rogues, depending on the nature of domain recombination. Hybrid kinases are those where the catalytic kinase domain belongs to a kinase subfamily but the domain architecture is typical of another kinase subfamily. Rogue kinases are those with a kinase catalytic domain characteristic of a kinase subfamily but a domain architecture typical of neither that subfamily nor any other kinase subfamily. This report provides a consolidated set of such hybrid and rogue kinases gleaned from six eukaryotic genomes (S. cerevisiae, D. melanogaster, C. elegans, M. musculus, T. rubripes and H. sapiens) and discusses their functions. The presence of such kinases necessitates a revisiting of the classification scheme of the protein kinase family using full-length sequences, apart from classical classification using solely the sequences of kinase catalytic domains. The study of these kinases provides useful insight into engineering signalling pathways for a desired output. Lastly, identification of hybrids and rogues in pathogenic protozoa such as P. falciparum sheds light on possible strategies in host-pathogen interactions.

  10. Eukaryotic Ribonucleases P/MRP: the Crystal Structure of the P3 Domain

    Energy Technology Data Exchange (ETDEWEB)

    Perederina, A.; Esakova, O; Quan, C; Khanova, E; Krasilnikov, A

    2010-01-01

    Ribonuclease (RNase) P is a site-specific endoribonuclease found in all kingdoms of life. Typical RNase P consists of a catalytic RNA component and a protein moiety. In the eukaryotes, the RNase P lineage has split into two, giving rise to a closely related enzyme, RNase MRP, which has similar components but has evolved to have different specificities. The eukaryotic RNases P/MRP have acquired an essential helix-loop-helix protein-binding RNA domain P3 that has an important function in eukaryotic enzymes and distinguishes them from bacterial and archaeal RNases P. Here, we present a crystal structure of the P3 RNA domain from Saccharomyces cerevisiae RNase MRP in a complex with RNase P/MRP proteins Pop6 and Pop7 solved to 2.7 Å. The structure suggests similar structural organization of the P3 RNA domains in RNases P/MRP and possible functions of the P3 domains and proteins bound to them in the stabilization of the holoenzymes' structures as well as in interactions with substrates. It provides the first insight into the structural organization of the eukaryotic enzymes of the RNase P/MRP family.

  11. Eukaryotic ribonucleases P/MRP: the crystal structure of the P3 domain.

    Science.gov (United States)

    Perederina, Anna; Esakova, Olga; Quan, Chao; Khanova, Elena; Krasilnikov, Andrey S

    2010-02-17

    Ribonuclease (RNase) P is a site-specific endoribonuclease found in all kingdoms of life. Typical RNase P consists of a catalytic RNA component and a protein moiety. In the eukaryotes, the RNase P lineage has split into two, giving rise to a closely related enzyme, RNase MRP, which has similar components but has evolved to have different specificities. The eukaryotic RNases P/MRP have acquired an essential helix-loop-helix protein-binding RNA domain P3 that has an important function in eukaryotic enzymes and distinguishes them from bacterial and archaeal RNases P. Here, we present a crystal structure of the P3 RNA domain from Saccharomyces cerevisiae RNase MRP in a complex with RNase P/MRP proteins Pop6 and Pop7 solved to 2.7 Å. The structure suggests similar structural organization of the P3 RNA domains in RNases P/MRP and possible functions of the P3 domains and proteins bound to them in the stabilization of the holoenzymes' structures as well as in interactions with substrates. It provides the first insight into the structural organization of the eukaryotic enzymes of the RNase P/MRP family.

  12. Extreme genome diversity in the hyper-prevalent parasitic eukaryote Blastocystis.

    Science.gov (United States)

    Gentekaki, Eleni; Curtis, Bruce A; Stairs, Courtney W; Klimeš, Vladimír; Eliáš, Marek; Salas-Leiva, Dayana E; Herman, Emily K; Eme, Laura; Arias, Maria C; Henrissat, Bernard; Hilliou, Frédérique; Klute, Mary J; Suga, Hiroshi; Malik, Shehre-Banoo; Pightling, Arthur W; Kolisko, Martin; Rachubinski, Richard A; Schlacht, Alexander; Soanes, Darren M; Tsaousis, Anastasios D; Archibald, John M; Ball, Steven G; Dacks, Joel B; Clark, C Graham; van der Giezen, Mark; Roger, Andrew J

    2017-09-01

    Blastocystis is the most prevalent eukaryotic microbe colonizing the human gut, infecting approximately 1 billion individuals worldwide. Although Blastocystis has been linked to intestinal disorders, its pathogenicity remains controversial because most carriers are asymptomatic. Here, the genome sequence of Blastocystis subtype (ST) 1 is presented and compared to previously published sequences for ST4 and ST7. Despite a conserved core of genes, there is unexpected diversity between these STs in terms of their genome sizes, guanine-cytosine (GC) content, intron numbers, and gene content. ST1 has 6,544 protein-coding genes, which is several hundred more than reported for ST4 and ST7. The percentage of proteins unique to each ST ranges from 6.2% to 20.5%, greatly exceeding the differences observed within parasite genera. Orthologous proteins also display extreme divergence in amino acid sequence identity between STs (i.e., 59%-61% median identity), on par with observations of the most distantly related species pairs of parasite genera. The STs also display substantial variation in gene family distributions and sizes, especially for protein kinase and protease gene families, which could reflect differences in virulence. It remains to be seen to what extent these inter-ST differences persist at the intra-ST level. A full 26% of genes in ST1 have stop codons that are created on the mRNA level by a novel polyadenylation mechanism found only in Blastocystis. Reconstructions of pathways and organellar systems revealed that ST1 has a relatively complete membrane-trafficking system and a near-complete meiotic toolkit, possibly indicating a sexual cycle. Unlike some intestinal protistan parasites, Blastocystis ST1 has near-complete de novo pyrimidine, purine, and thiamine biosynthesis pathways and is unique amongst studied stramenopiles in being able to metabolize α-glucans rather than β-glucans. It lacks all genes encoding heme-containing cytochrome P450 proteins

  13. Extreme genome diversity in the hyper-prevalent parasitic eukaryote Blastocystis

    Science.gov (United States)

    Gentekaki, Eleni; Stairs, Courtney W.; Klimeš, Vladimír; Eliáš, Marek; Salas-Leiva, Dayana E.; Herman, Emily K.; Eme, Laura; Arias, Maria C.; Henrissat, Bernard; Hilliou, Frédérique; Klute, Mary J.; Suga, Hiroshi; Malik, Shehre-Banoo; Pightling, Arthur W.; Kolisko, Martin; Rachubinski, Richard A.; Schlacht, Alexander; Soanes, Darren M.; Tsaousis, Anastasios D.; Archibald, John M.; Ball, Steven G.; Dacks, Joel B.; Clark, C. Graham; van der Giezen, Mark; Roger, Andrew J.

    2017-01-01

    Blastocystis is the most prevalent eukaryotic microbe colonizing the human gut, infecting approximately 1 billion individuals worldwide. Although Blastocystis has been linked to intestinal disorders, its pathogenicity remains controversial because most carriers are asymptomatic. Here, the genome sequence of Blastocystis subtype (ST) 1 is presented and compared to previously published sequences for ST4 and ST7. Despite a conserved core of genes, there is unexpected diversity between these STs in terms of their genome sizes, guanine-cytosine (GC) content, intron numbers, and gene content. ST1 has 6,544 protein-coding genes, which is several hundred more than reported for ST4 and ST7. The percentage of proteins unique to each ST ranges from 6.2% to 20.5%, greatly exceeding the differences observed within parasite genera. Orthologous proteins also display extreme divergence in amino acid sequence identity between STs (i.e., 59%–61% median identity), on par with observations of the most distantly related species pairs of parasite genera. The STs also display substantial variation in gene family distributions and sizes, especially for protein kinase and protease gene families, which could reflect differences in virulence. It remains to be seen to what extent these inter-ST differences persist at the intra-ST level. A full 26% of genes in ST1 have stop codons that are created on the mRNA level by a novel polyadenylation mechanism found only in Blastocystis. Reconstructions of pathways and organellar systems revealed that ST1 has a relatively complete membrane-trafficking system and a near-complete meiotic toolkit, possibly indicating a sexual cycle. Unlike some intestinal protistan parasites, Blastocystis ST1 has near-complete de novo pyrimidine, purine, and thiamine biosynthesis pathways and is unique amongst studied stramenopiles in being able to metabolize α-glucans rather than β-glucans. It lacks all genes encoding heme-containing cytochrome P450 proteins

  14. Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote.

    Directory of Open Access Journals (Sweden)

    Jonathan A Eisen

    2006-09-01

    The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from

  15. An integrated approach for genome annotation of the eukaryotic thermophile Chaetomium thermophilum

    Science.gov (United States)

    Bock, Thomas; Chen, Wei-Hua; Ori, Alessandro; Malik, Nayab; Silva-Martin, Noella; Huerta-Cepas, Jaime; Powell, Sean T.; Kastritis, Panagiotis L.; Smyshlyaev, Georgy; Vonkova, Ivana; Kirkpatrick, Joanna; Doerks, Tobias; Nesme, Leo; Baßler, Jochen; Kos, Martin; Hurt, Ed; Carlomagno, Teresa; Gavin, Anne-Claude; Barabas, Orsolya; Müller, Christoph W.; van Noort, Vera; Beck, Martin; Bork, Peer

    2014-01-01

    The thermophilic fungus Chaetomium thermophilum holds great promise for structural biology. To increase the efficiency of its biochemical and structural characterization and to explore its thermophilic properties beyond those of individual proteins, we obtained transcriptomics and proteomics data, and integrated them with computational annotation methods and a multitude of biochemical experiments conducted by the structural biology community. We considerably improved the genome annotation of Chaetomium thermophilum and characterized the transcripts and expression of thousands of genes. We furthermore show that the composition and structure of the expressed proteome of Chaetomium thermophilum is similar to its mesophilic relatives. Data were deposited in a publicly available repository and provide a rich source to the structural biology community. PMID:25398899

  16. Solution NMR in structural genomics.

    Science.gov (United States)

    Yee, Adelinda; Gutmanas, Aleksandras; Arrowsmith, Cheryl H

    2006-10-01

    Structural genomics (also known as structural proteomics) aims to generate accurate three-dimensional models for all folded, globular proteins and domains in the protein universe to understand the relationship between protein sequence, structure and function. NMR spectroscopy of small (structural genomics projects for more than six years now. Recent advances coming from traditional NMR structural biology laboratories as well as large-scale centers and consortia using NMR for structural genomics promise to facilitate NMR analysis, making it an even more efficient and increasingly automated procedure.

  17. Comparative and functional genomics of Legionella identified eukaryotic like proteins as key players in host-pathogen interactions

    Directory of Open Access Journals (Sweden)

    Laura Gomez-Valero

    2011-10-01

    Although best known for their ability to cause severe pneumonia in people whose immune defenses are weakened, Legionella pneumophila and Legionella longbeachae are two species of a large genus of bacteria that are ubiquitous in nature, where they parasitize protozoa. Adaptation to the host environment and exploitation of host cell functions are critical for the success of these intracellular pathogens. The establishment and publication of the complete genome sequences of L. pneumophila and L. longbeachae isolates paved the way for major breakthroughs in understanding the biology of these organisms. In this review we present the knowledge gained from the analyses and comparison of the complete genome sequences of different L. pneumophila and L. longbeachae strains. Emphasis is placed on putative virulence and Legionella life cycle related functions, such as the identification of an extended array of eukaryotic-like proteins, many of which have been shown to modulate host cell functions to the pathogen's advantage. Surprisingly, many of the eukaryotic domain proteins identified in L. pneumophila as well as many substrates of the Dot/Icm type IV secretion system essential for intracellular replication are different between these two species, although they cause the same disease. Finally, evolutionary aspects regarding the eukaryotic-like proteins in Legionella are discussed.

  18. Structural variations in pig genomes

    NARCIS (Netherlands)

    Paudel, Y.

    2015-01-01

    Paudel, Y. (2015). Structural variations in pig genomes. PhD thesis, Wageningen University, the Netherlands. Structural variations are chromosomal rearrangements such as insertions-deletions (INDELs), duplications, inversions, translocations, and copy number variations

  19. Three-dimensional structural analysis of eukaryotic flagella/cilia by electron cryo-tomography

    Energy Technology Data Exchange (ETDEWEB)

    Bui, Khanh Huy; Pigino, Gaia; Ishikawa, Takashi, E-mail: takashi.ishikawa@psi.ch [Paul Scherrer Institute, 5232 Villigen PSI (Switzerland); ETH Zurich (Switzerland)]

    2011-01-01

    Based on the molecular architecture revealed by electron cryo-tomography, the mechanism of the bending motion of eukaryotic flagella/cilia is discussed. Electron cryo-tomography is a potential approach to analyzing the three-dimensional conformation of frozen hydrated biological macromolecules using electron microscopy. Since projections of each individual object illuminated from different orientations are merged, electron tomography is capable of structural analysis of such heterogeneous environments as in vivo or with polymorphism, although radiation damage and the missing wedge are severe problems. Here, recent results on the structure of the eukaryotic flagellum, an ATP-driven bending organelle, from the green alga Chlamydomonas are presented. Tomographic analysis reveals asymmetric molecular arrangements, especially that of the dynein motor proteins, in flagella, giving insight into the mechanism of planar asymmetric bending motion. Methodological challenges to obtaining higher-resolution structures from this technique are also discussed.

  20. Camps 2.0: exploring the sequence and structure space of prokaryotic, eukaryotic, and viral membrane proteins.

    Science.gov (United States)

    Neumann, Sindy; Hartmann, Holger; Martin-Galiano, Antonio J; Fuchs, Angelika; Frishman, Dmitrij

    2012-03-01

    Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers both sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ∼1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/. Copyright © 2011 Wiley Periodicals, Inc.

  1. High throughput platforms for structural genomics of integral membrane proteins.

    Science.gov (United States)

    Mancia, Filippo; Love, James

    2011-08-01

    Structural genomics approaches to integral membrane proteins have been postulated for over a decade, yet specific efforts are lagging years behind those for their soluble counterparts. Indeed, high-throughput methodologies for production and characterization of prokaryotic integral membrane proteins are only now emerging, while large-scale efforts for eukaryotic ones are still in their infancy. Presented here is a review of recent literature on ongoing structural genomics initiatives for membrane proteins, with a focus on those implementing techniques aimed at increasing the rate of success for this class of macromolecules. Copyright © 2011 Elsevier Ltd. All rights reserved.

  2. Structural genomics in endocrinology

    NARCIS (Netherlands)

    Smit, J. W.; Romijn, J. A.

    2001-01-01

    Traditionally, endocrine research evolved from the phenotypical characterisation of endocrine disorders to the identification of underlying molecular pathophysiology. This approach has been, and still is, extremely successful. The introduction of genomics and proteomics has resulted in a reversal of

  3. Constitutive aneuploidy and genomic instability in the single-celled eukaryote Giardia intestinalis.

    Science.gov (United States)

    Tůmová, Pavla; Uzlíková, Magdalena; Jurczyk, Tomáš; Nohýnková, Eva

    2016-08-01

    Giardia intestinalis is an important single-celled human pathogen. Interestingly, this organism has two equal-sized transcriptionally active nuclei, each considered diploid. By evaluating condensed chromosome numbers and visualizing homologous chromosomes by fluorescent in situ hybridization, we determined that the Giardia cells are constitutively aneuploid. We observed inter- and intra-population karyotype heterogeneity in eight cell lines from two clinical isolates, suggesting constant karyotype evolution during in vitro cultivation. High levels of chromosomal instability and frequent mitotic missegregations observed in four cell lines correlated with a proliferative disadvantage and growth retardation. Other cell lines, although derived from the same clinical isolate, revealed a stable yet aneuploid karyotype. We suggest that both chromatid missegregations and structural rearrangements contribute to shaping the Giardia genome, leading to whole-chromosome aneuploidy, unequal gene distribution, and a genomic divergence of the two nuclei within one cell. Aneuploidy in Giardia is further propagated without p53-mediated cell cycle arrest and might have been a key mechanism in generating the genetic diversity of this human pathogen. © 2016 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.

  4. Scaling properties and fractality in the distribution of coding segments in eukaryotic genomes revealed through a block entropy approach

    Science.gov (United States)

    Athanasopoulou, Labrini; Athanasopoulos, Stavros; Karamanos, Kostas; Almirantis, Yannis

    2010-11-01

    Statistical methods, including block entropy based approaches, have already been used in the study of long-range features of genomic sequences seen as symbol series, either considering the full alphabet of the four nucleotides or the binary purine or pyrimidine character set. Here we explore the alternation of short protein-coding segments with long noncoding spacers in entire chromosomes, focusing on the scaling properties of block entropy. In previous studies, it has been shown that the sizes of noncoding spacers follow power-law-like distributions in most chromosomes of eukaryotic organisms from distant taxa. We have developed a simple evolutionary model based on well-known molecular events (segmental duplications followed by elimination of most of the duplicated genes) which reproduces the observed linearity in log-log plots. The scaling properties of block entropy H(n) have been studied in several works. Their findings suggest that linearity in semilogarithmic scale characterizes symbol sequences which exhibit fractal properties and long-range order, while this linearity has been shown in the case of the logistic map at the Feigenbaum accumulation point. The present work starts with the observation that the block entropy of the Cantor-like binary symbol series scales in a similar way. Then, we perform the same analysis for the full set of human chromosomes and for several chromosomes of other eukaryotes. A similar but less extended linearity in semilogarithmic scale, indicating fractality, is observed, while randomly formed surrogate sequences clearly lack this type of scaling. Genomic sequences always present entropy values much lower than their random surrogates. Symbol sequences produced by the aforementioned evolutionary model follow the scaling found in genomic sequences, thus corroborating the conjecture that “segmental duplication-gene elimination” dynamics may have contributed to the observed long rangeness in the coding or noncoding alternation in

  5. C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families

    Directory of Open Access Journals (Sweden)

    Cutler Sean R

    2007-06-01

    Background: The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio. Results: We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied to the successful reduction of background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. This approach has been able to predict, to our knowledge, all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists

  6. Ultradian clocks in eukaryotic microbes: from behavioural observation to functional genomics.

    Science.gov (United States)

    Kippert, F; Hunt, P

    2000-01-01

    Period homeostasis is the defining characteristic of a biological clock. Strict period homeostasis is found for the ultradian clocks of eukaryotic microbes. In addition to being temperature-compensated, the period of these rhythms is unaffected by differences in nutrient composition or changes in other environmental variables. The best-studied examples of ultradian clocks are those of the ciliates Paramecium tetraurelia and Tetrahymena sp. and of the fission yeast, Schizosaccharomyces pombe. In these single cell eukaryotes, up to seven different parameters display ultradian rhythmicity with the same, species- and strain-specific period. In fission yeast, the molecular genetic analysis of ultradian clock mechanisms has begun with the systematic analysis of mutants in identified candidate genes. More than 40 "clock mutants" have already been identified, most of them affected in components of major regulatory and signalling pathways. These results indicate a high degree of complexity for a eukaryotic clock mechanism. BioEssays 22:16-22, 2000. Copyright 2000 John Wiley & Sons, Inc.

  7. Structures to complement the archaeo-eukaryotic primases catalytic cycle description: What's next?

    Directory of Open Access Journals (Sweden)

    Julien Boudet

    2015-01-01

    Primase activity has been studied in the last decades but the detailed molecular steps explaining some unique features remain unclear. High-resolution structures of free and bound primase domains have brought significant insights into the primase reaction cycle. Here, we give a short review of the structural work conducted in the field of archaeo-eukaryotic primases and we underline the missing “pictures” of the active forms of the enzyme which are of major interest. We organized our analysis with respect to the progression through the catalytic pathway.

  8. Cis-motifs upstream of the transcription and translation initiation sites are effectively revealed by their positional disequilibrium in eukaryote genomes using frequency distribution curves

    Directory of Open Access Journals (Sweden)

    Harter Klaus

    2006-11-01

    Full Text Available Abstract Background The discovery of cis-regulatory motifs still remains a challenging task even though the number of sequenced genomes is constantly growing. Computational analyses using pattern search algorithms have been valuable in phylogenetic footprinting approaches as have expression profile experiments to predict co-occurring motifs. Surprisingly little is known about the nature of cis-regulatory element (CRE) distribution in promoters. Results In this paper we used the Motif Mapper open-source collection of Visual Basic scripts for the analysis of motifs in any aligned set of DNA sequences. We focused on promoter motif distribution curves to identify positional over-representation of DNA motifs. Using differentially aligned datasets from the model species Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster and Saccharomyces cerevisiae, we convincingly demonstrated the importance of the position and orientation for motif discovery. Analysis with known CREs and all possible hexanucleotides showed that some functional elements gather close to the transcription and translation initiation sites and that elements other than the TATA-box motif are conserved between eukaryote promoters. While a high background frequency usually decreases the effectiveness of such an enumerative investigation, we improved our analysis by conducting motif distribution maps using large datasets. Conclusion This is the first study to reveal positional over-representation of CREs and promoter motifs in a cross-species approach. CREs and motifs shared between eukaryotic promoters support the observation that a eukaryotic promoter structure has been conserved throughout evolutionary time. Furthermore, with the information on positional enrichment of a motif or a known functional CRE, it is possible to get a more detailed insight into where an element appears to function. This in turn might accelerate the in depth examination of known and yet unknown
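
    The idea of a motif distribution curve can be sketched in a few lines. The example below is not Motif Mapper (which the abstract describes as a collection of Visual Basic scripts); it is a hypothetical Python illustration that counts, per position bin, where a motif starts across sequences assumed to be aligned on a common reference point such as the transcription start site. The function name, bin size, and toy sequences are assumptions, and only one strand is scanned.

```python
from collections import Counter


def motif_position_histogram(sequences, motif, bin_size=10):
    """Count motif start positions, binned, across aligned sequences.

    Positions are 0-based offsets from the left end of each sequence; all
    sequences are assumed to share the same alignment reference point.
    """
    histogram = Counter()
    motif = motif.upper()
    for seq in sequences:
        seq = seq.upper()
        start = seq.find(motif)
        while start != -1:
            histogram[start // bin_size] += 1
            start = seq.find(motif, start + 1)  # step by 1 to allow overlaps
    return dict(sorted(histogram.items()))


if __name__ == "__main__":
    # Hypothetical toy promoters, 20 nt each.
    promoters = [
        "GGGGGGGGGGTATAAAGGGG",
        "CCCCCCCCCCCTATAAACCC",
        "TATAAATTTTTTTTTTTTTT",
    ]
    print(motif_position_histogram(promoters, "TATAAA"))
```

    A peak in one bin relative to the others is the kind of positional over-representation the abstract refers to; judging its significance would need a background model that this sketch does not include.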

  9. Structural basis for the initiation of eukaryotic transcription-coupled DNA repair.

    Science.gov (United States)

    Xu, Jun; Lahiri, Indrajit; Wang, Wei; Wier, Adam; Cianfrocco, Michael A; Chong, Jenny; Hare, Alissa A; Dervan, Peter B; DiMaio, Frank; Leschziner, Andres E; Wang, Dong

    2017-11-30

    Eukaryotic transcription-coupled repair (TCR) is an important and well-conserved sub-pathway of nucleotide excision repair that preferentially removes DNA lesions from the template strand that block translocation of RNA polymerase II (Pol II). Cockayne syndrome group B (CSB, also known as ERCC6) protein in humans (or its yeast orthologues, Rad26 in Saccharomyces cerevisiae and Rhp26 in Schizosaccharomyces pombe) is among the first proteins to be recruited to the lesion-arrested Pol II during the initiation of eukaryotic TCR. Mutations in CSB are associated with the autosomal-recessive neurological disorder Cockayne syndrome, which is characterized by progeroid features, growth failure and photosensitivity. The molecular mechanism of eukaryotic TCR initiation remains unclear, with several long-standing unanswered questions. How cells distinguish DNA lesion-arrested Pol II from other forms of arrested Pol II, the role of CSB in TCR initiation, and how CSB interacts with the arrested Pol II complex are all unknown. The lack of structures of CSB or the Pol II-CSB complex has hindered our ability to address these questions. Here we report the structure of the S. cerevisiae Pol II-Rad26 complex solved by cryo-electron microscopy. The structure reveals that Rad26 binds to the DNA upstream of Pol II, where it markedly alters its path. Our structural and functional data suggest that the conserved Swi2/Snf2-family core ATPase domain promotes the forward movement of Pol II, and elucidate key roles for Rad26 in both TCR and transcription elongation.

  10. Structural Genomics of Protein Phosphatases

    Energy Technology Data Exchange (ETDEWEB)

    Almo, S.; Bonanno, J.; Sauder, J.; Emtage, S.; Dilorenzo, T.; Malashkevich, V.; Wasserman, S.; Swaminathan, S.; Eswaramoorthy, S.; et al

    2007-01-01

    The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.

  11. Crystal Structure of a Legionella pneumophila Ecto-Triphosphate Diphosphohydrolase, A Structural and Functional Homolog of the Eukaryotic NTPDases

    Energy Technology Data Exchange (ETDEWEB)

    Vivian, Julian P.; Riedmaier, Patrice; Ge, Honghua; Le Nours, Jérôme; Sansom, Fiona M.; Wilce, Matthew C.J.; Byres, Emma; Dias, Manisha; Schmidberger, Jason W.; Cowan, Peter J.; d'Apice, Anthony J.F.; Hartland, Elizabeth L.; Rossjohn, Jamie; Beddoe, Travis (Monash); (Melbourne)

    2010-04-19

    Many pathogenic bacteria have sophisticated mechanisms to interfere with the mammalian immune response. These include the disruption of host extracellular ATP levels, which in humans are tightly regulated by the nucleoside triphosphate diphosphohydrolase family (NTPDases). NTPDases are found almost exclusively in eukaryotes, the notable exception being their presence in some pathogenic prokaryotes. To address the function of bacterial NTPDases, we describe the structures of an NTPDase from the pathogen Legionella pneumophila (Lpg1905/Lp1NTPDase) in its apo state and in complex with the ATP analog AMPPNP and the subtype-specific NTPDase inhibitor ARL 67156. Lp1NTPDase is structurally and catalytically related to eukaryotic NTPDases and the structure provides a basis for NTPDase-specific inhibition. Furthermore, we demonstrate that the activity of Lp1NTPDase correlates directly with intracellular replication of Legionella within macrophages. Collectively, these findings provide insight into the mechanism of this enzyme and highlight its role in host-pathogen interactions.

  12. A comparison of the crystal structures of eukaryotic and bacterial SSU ribosomal RNAs reveals common structural features in the hypervariable regions.

    Directory of Open Access Journals (Sweden)

    Jung C Lee

    Full Text Available While the majority of the ribosomal RNA structure is conserved in the three major domains of life--archaea, bacteria, and eukaryotes, specific regions of the rRNA structure are unique to at least one of these three primary forms of life. In particular, the comparative secondary structure for the eukaryotic SSU rRNA contains several regions that are different from the analogous regions in the bacteria. Our detailed analysis of two recently determined eukaryotic 40S ribosomal crystal structures, Tetrahymena thermophila and Saccharomyces cerevisiae, and the comparison of these results with the bacterial Thermus thermophilus 30S ribosomal crystal structure: (1) revealed that the vast majority of the comparative structure model for the eukaryotic SSU rRNA is substantiated, including the secondary structure that is similar to both bacteria and archaea as well as specific for the eukaryotes, (2) resolved the secondary structure for regions of the eukaryotic SSU rRNA that were not determined with comparative methods, (3) identified eukaryotic helices that are equivalent to the bacterial helices in several of the hypervariable regions, (4) revealed that, while the coaxially stacked compound helix in the 540 region in the central domain maintains the constant length of 10 base pairs, its two constituent helices contain 5+5 bp rather than the 6+4 bp predicted with comparative analysis of archaeal and eukaryotic SSU rRNAs.

  13. Structural view on recycling of archaeal and eukaryotic ribosomes after canonical termination and ribosome rescue.

    Science.gov (United States)

    Franckenberg, Sibylle; Becker, Thomas; Beckmann, Roland

    2012-12-01

    Ribosome recycling usually occurs after canonical termination triggered by a stop codon. Additionally, ribosomes that are stalled by aberrant mRNAs need to be recognized and subsequently recycled. In eukaryotes and archaea, the factors involved in canonical termination and ribosome rescue are structurally and functionally related. Both termination and ribosome rescue are mediated by class I release factors (eRF1/aRF1 in eukaryotic/archaeal termination) or their paralogs (Pelota/aPelota for ribosome rescue) and homologs of translational GTPases (eRF3/aEF1α in termination, Hbs1/aEF1α in ribosome rescue). These events are followed by recycling of the ribosome. Recently the ATPase ABCE1 was shown to be the main ribosome recycling factor. In concert with eRF1 or Pelota, ABCE1 dissociates the ribosome into subunits. During the past two years, several structures of ribosome rescue and ribosome recycling complexes have been solved by cryo-electron microscopy and crystallography. These structures along with recent functional data make it possible to propose a molecular model of these late translation events in termination and recycling. Copyright © 2012 Elsevier Ltd. All rights reserved.

  14. Comparative genomic analysis reveals a diverse repertoire of genes involved in prokaryote-eukaryote interactions within the Pseudovibrio genus.

    Directory of Open Access Journals (Sweden)

    Stefano Romano

    2016-03-01

    Full Text Available Strains of the Pseudovibrio genus have been detected worldwide, mainly as part of bacterial communities associated with marine invertebrates, particularly sponges. This recurrent association has been considered as an indication of a symbiotic relationship between these microbes and their host. Until recently, the availability of only two genomes, belonging to closely related strains, has limited the knowledge on the genomic and physiological features of the genus to a single phylogenetic lineage. Here we present 10 newly sequenced genomes of Pseudovibrio strains isolated from marine sponges from the west coast of Ireland, and including the other two publicly available genomes we performed an extensive comparative genomic analysis. Homogeneity was apparent in terms of both the orthologous genes and the metabolic features shared amongst the 12 strains. At the genomic level, a key physiological difference observed amongst the isolates was the presence only in strain P. axinellae AD2 of genes encoding proteins involved in assimilatory nitrate reduction, which was then proved experimentally. We then focused on studying those systems known to be involved in the interactions with eukaryotic and prokaryotic cells. This analysis revealed that the genus harbors a large diversity of toxin-like proteins, secretion systems and their potential effectors. Their distribution in the genus was not always consistent with the phylogenetic relationship of the strains. Finally, our analyses identified new genomic islands encoding potential toxin-immunity systems, previously unknown in the genus. Our analyses shed new light on the Pseudovibrio genus, indicating a large diversity of both metabolic features and systems for interacting with the host. The diversity in both distribution and abundance of these systems amongst the strains underlines how metabolically and phylogenetically similar bacteria may use different strategies to interact with the host and find a niche

  15. Structural genomics of human proteins.

    Science.gov (United States)

    Osman, Khan Tanjid; Edwards, Aled

    2014-01-01

    Structural genomics efforts focused on the human proteome have had three aims: to understand the structural and functional variations within protein families; to understand the structural basis of disease and genetic variation; and to determine the structures of human integral membrane proteins. The overarching theme is to advance the understanding of human health and to provide a structural platform to aid in the development of therapeutics. A decade or more of work in this field has identified optimal experimental strategies that can be used to expedite expression and crystallization of human proteins, and we provide some guidance to this end.

  16. Molecular Data are Transforming Hypotheses on the Origin and Diversification of Eukaryotes.

    Science.gov (United States)

    Tekle, Yonas I; Parfrey, Laura Wegener; Katz, Laura A

    2009-06-01

    The explosion of molecular data has transformed hypotheses on both the origin of eukaryotes and the structure of the eukaryotic tree of life. Early ideas about the evolution of eukaryotes arose through analyses of morphology by light microscopy and later electron microscopy. Though such studies have proven powerful at resolving more recent events, theories on origins and diversification of eukaryotic life have been substantially revised in light of analyses of molecular data including gene and, increasingly, whole genome sequences. By combining these approaches, progress has been made in elucidating both the origin and diversification of eukaryotes. Yet many aspects of the evolution of eukaryotic life remain to be illuminated.

  17. The protein structure initiative structural genomics knowledgebase.

    Science.gov (United States)

    Berman, Helen M; Westbrook, John D; Gabanyi, Margaret J; Tao, Wendy; Shah, Raship; Kouranov, Andrei; Schwede, Torsten; Arnold, Konstantin; Kiefer, Florian; Bordoli, Lorenza; Kopp, Jürgen; Podvinec, Michael; Adams, Paul D; Carter, Lester G; Minor, Wladek; Nair, Rajesh; La Baer, Joshua

    2009-01-01

    The Protein Structure Initiative Structural Genomics Knowledgebase (PSI SGKB, http://kb.psi-structuralgenomics.org) has been created to turn the products of the PSI structural genomics effort into knowledge that can be used by the biological research community to understand living systems and disease. This resource provides central access to structures in the Protein Data Bank (PDB), along with functional annotations, associated homology models, worldwide protein target tracking information, available protocols and the potential to obtain DNA materials for many of the targets. It also offers the ability to search all of the structural and methodological publications and the innovative technologies that were catalyzed by the PSI's high-throughput research efforts. In collaboration with the Nature Publishing Group, the PSI SGKB provides a research library, editorials about new research advances, news and an events calendar to present a broader view of structural biology and structural genomics. By making these resources freely available, the PSI SGKB serves as a bridge to connect the structural biology and the greater biomedical communities.

  18. Structure of Prokaryotic Polyamine Deacetylase Reveals Evolutionary Functional Relationships with Eukaryotic Histone Deacetylases

    Energy Technology Data Exchange (ETDEWEB)

    P Lombardi; H Angell; D Whittington; E Flynn; K Rajashankar; D Christianson

    2011-12-31

    Polyamines are a ubiquitous class of polycationic small molecules that can influence gene expression by binding to nucleic acids. Reversible polyamine acetylation regulates nucleic acid binding and is required for normal cell cycle progression and proliferation. Here, we report the structures of Mycoplana ramosa acetylpolyamine amidohydrolase (APAH) complexed with a transition state analogue and a hydroxamate inhibitor and an inactive mutant complexed with two acetylpolyamine substrates. The structure of APAH is the first of a histone deacetylase-like oligomer and reveals that an 18-residue insert in the L2 loop promotes dimerization and the formation of an 18 Å long 'L'-shaped active site tunnel at the dimer interface, accessible only to narrow and flexible substrates. The importance of dimerization for polyamine deacetylase function leads to the suggestion that a comparable dimeric or double-domain histone deacetylase could catalyze polyamine deacetylation reactions in eukaryotes.

  19. Crystal structure of the homology domain of the eukaryotic DNA replication proteins Sld3/Treslin.

    Science.gov (United States)

    Itou, Hiroshi; Muramatsu, Sachiko; Shirakihara, Yasuo; Araki, Hiroyuki

    2014-09-02

    The initiation of eukaryotic chromosomal DNA replication requires the formation of an active replicative helicase at the replication origins of chromosomal DNA. Yeast Sld3 and its metazoan counterpart Treslin are the hub proteins mediating protein associations critical for the helicase formation. Here, we show the crystal structure of the central domain of Sld3 that is conserved in the Sld3/Treslin family of proteins. The domain consists of two segments with 12 helices and is sufficient to bind to Cdc45, the essential helicase component. A structural model of the Sld3-Cdc45 complex, which is crucial for the formation of the active helicase, is proposed. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. Methods and Applications of CRISPR-Mediated Base Editing in Eukaryotic Genomes.

    Science.gov (United States)

    Hess, Gaelen T; Tycko, Josh; Yao, David; Bassik, Michael C

    2017-10-05

    The past several years have seen an explosion in development of applications for the CRISPR-Cas9 system, from efficient genome editing, to high-throughput screening, to recruitment of a range of DNA and chromatin-modifying enzymes. While homology-directed repair (HDR) coupled with Cas9 nuclease cleavage has been used with great success to repair and re-write genomes, recently developed base-editing systems present a useful orthogonal strategy to engineer nucleotide substitutions. Base editing relies on recruitment of cytidine deaminases to introduce changes (rather than double-stranded breaks and donor templates) and offers potential improvements in efficiency while limiting damage and simplifying the delivery of editing machinery. At the same time, these systems enable novel mutagenesis strategies to introduce sequence diversity for engineering and discovery. Here, we review the different base-editing platforms, including their deaminase recruitment strategies and editing outcomes, and compare them to other CRISPR genome-editing technologies. Additionally, we discuss how these systems have been applied in therapeutic, engineering, and research settings. Lastly, we explore future directions of this emerging technology. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Comparative genomic analysis of multi-subunit tethering complexes demonstrates an ancient pan-eukaryotic complement and sculpting in Apicomplexa.

    Directory of Open Access Journals (Sweden)

    Christen M Klinger

    Full Text Available Apicomplexa are obligate intracellular parasites that cause tremendous disease burden world-wide. They utilize a set of specialized secretory organelles in their invasive process that require delivery of components for their biogenesis and function, yet the precise mechanisms underpinning such processes remain unclear. One set of potentially important components is the multi-subunit tethering complexes (MTCs), factors increasingly implicated in all aspects of vesicle-target interactions. Prompted by the results of previous studies indicating a loss of membrane trafficking factors in Apicomplexa, we undertook a bioinformatic analysis of MTC conservation. Building on knowledge of the ancient presence of most MTC proteins, we demonstrate the near complete retention of MTCs in the newly available genomes for Guillardia theta and Bigelowiella natans. The latter is a key taxonomic sampling point as a basal sister taxon to the group including Apicomplexa. We also demonstrate an ancient origin of the CORVET complex subunits Vps8 and Vps3, as well as the TRAPPII subunit Tca17. Having established that the lineage leading to Apicomplexa did at one point possess the complete eukaryotic complement of MTC components, we undertook a deeper taxonomic investigation in twelve apicomplexan genomes. We observed excellent conservation of the VpsC core of the HOPS and CORVET complexes, as well as the core TRAPP subunits, but sparse conservation of TRAPPII, COG, Dsl1, and HOPS/CORVET-specific subunits. However, those subunits that we did identify appear to be expressed with similar patterns to the fully conserved MTC proteins, suggesting that they may function as minimal complexes or with analogous partners. Strikingly, we failed to identify any subunits of the exocyst complex in all twelve apicomplexan genomes, as well as the dinoflagellate Perkinsus marinus. Overall, we demonstrate reduction of MTCs in Apicomplexa and their ancestors, consistent with modification during

  2. Functional insights from structural genomics.

    Science.gov (United States)

    Forouhar, Farhad; Kuzin, Alexandre; Seetharaman, Jayaraman; Lee, Insun; Zhou, Weihong; Abashidze, Mariam; Chen, Yang; Yong, Wei; Janjua, Haleema; Fang, Yingyi; Wang, Dongyan; Cunningham, Kellie; Xiao, Rong; Acton, Thomas B; Pichersky, Eran; Klessig, Daniel F; Porter, Carl W; Montelione, Gaetano T; Tong, Liang

    2007-09-01

    Structural genomics efforts have produced structural information, either directly or by modeling, for thousands of proteins over the past few years. While many of these proteins have known functions, a large percentage of them have not been characterized at the functional level. The structural information has provided valuable functional insights on some of these proteins, through careful structural analyses, serendipity, and structure-guided functional screening. Some of the success stories based on structures solved at the Northeast Structural Genomics Consortium (NESG) are reported here. These include a novel methyl salicylate esterase with an important role in plant innate immunity, a novel RNA methyltransferase (H. influenzae yggJ (HI0303)), a novel spermidine/spermine N-acetyltransferase (B. subtilis PaiA), a novel methyltransferase or AdoMet binding protein (A. fulgidus AF_0241), an ATP:cob(I)alamin adenosyltransferase (B. subtilis YvqK), a novel carboxysome pore (E. coli EutN), a proline racemase homolog with a disrupted active site (B. melitensis BME11586), an FMN-dependent enzyme (S. pneumoniae SP_1951), and a 12-stranded beta-barrel with a novel fold (V. parahaemolyticus VPA1032).

  3. Structure of a Eukaryotic CLC Transporter Defines an Intermediate State in the Transport Cycle

    Energy Technology Data Exchange (ETDEWEB)

    Feng, Liang; Campbell, Ernest B.; Hsiung, Yichun; MacKinnon, Roderick (Rockefeller)

    2010-12-02

    CLC proteins transport chloride (Cl⁻) ions across cell membranes to control the electrical potential of muscle cells, transfer electrolytes across epithelia, and control the pH and electrolyte composition of intracellular organelles. Some members of this protein family are Cl⁻ ion channels, whereas others are secondary active transporters that exchange Cl⁻ ions and protons (H⁺) with a 2:1 stoichiometry. We have determined the structure of a eukaryotic CLC transporter at 3.5 Å resolution. Cytoplasmic cystathionine beta-synthase (CBS) domains are strategically positioned to regulate the ion-transport pathway, and many disease-causing mutations in human CLCs reside on the CBS-transmembrane interface. Comparison with prokaryotic CLC shows that a gating glutamate residue changes conformation and suggests a basis for 2:1 Cl⁻/H⁺ exchange and a simple mechanistic connection between CLC channels and transporters.

  4. Structural Genomics on the Web

    OpenAIRE

    Wixon, Jo

    2001-01-01

    In this review we provide a brief guide to some of the resources and databases that can be used to locate information and aid research in the growing field of structural genomics. The review will provide examples, for less experienced users, of what can be achieved using a selection of the available sites. We hope that this will encourage you to use these sites to their full potential and whet your appetite to search for other related sites.

  5. Effect of environmental variables on eukaryotic microbial community structure of land-fast Arctic sea ice.

    Science.gov (United States)

    Eddie, Brian; Juhl, Andrew; Krembs, Christopher; Baysinger, Charles; Neuer, Susanne

    2010-03-01

    Sea ice microbial community structure affects carbon and nutrient cycling in polar seas, but its susceptibility to changing environmental conditions is not well understood. We studied the eukaryotic microbial community in sea ice cores recovered near Point Barrow, AK in May 2006 by documenting the composition of the community in relation to vertical depth within the cores, as well as light availability (mainly as variable snow cover) and nutrient concentrations. We applied a combination of epifluorescence microscopy, denaturing gradient gel electrophoresis and clone libraries of a section of the 18S rRNA gene in order to compare the community structure of the major eukaryotic microbial phylotypes in the ice. We find that the community composition of the sea ice is more affected by the depth horizon in the ice than by light availability, although there are significant differences in the abundance of some groups between light regimes. Epifluorescence microscopy shows a shift from predominantly heterotrophic life styles in the upper ice to autotrophy prevailing in the bottom ice. This is supported by the statistical analysis of the similarity between the samples based on the denaturing gradient gel electrophoresis banding patterns, which shows a clear difference between upper and lower ice sections with respect to phylotypes and their proportional abundance. Clone libraries constructed using diatom-specific primers confirm the high diversity of diatoms in the sea ice, and support the microscopic counts. Evidence of protistan grazing upon diatoms was also found in lower sections of the core, with implications for carbon and nutrient recycling in the ice.

  6. 2004 Structural, Function and Evolutionary Genomics

    Energy Technology Data Exchange (ETDEWEB)

    Douglas L. Brutlag; Nancy Ryan Gray

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  7. Structural and functional analysis of rice genome

    Indian Academy of Sciences (India)

    Journal of Genetics, Volume 83, Issue 1. Rice is an excellent system for plant genomics as it represents a modest size genome of 430 Mb. It feeds more than half the population of the world. Draft sequences of the rice genome, derived by ...

  8. Protein production from the structural genomics perspective: achievements and future needs.

    Science.gov (United States)

    Almo, Steven C; Garforth, Scott J; Hillerich, Brandan S; Love, James D; Seidel, Ronald D; Burley, Stephen K

    2013-06-01

    Despite a multitude of recent technical breakthroughs speeding high-resolution structural analysis of biological macromolecules, production of sufficient quantities of well-behaved, active protein continues to represent the rate-limiting step in many structure determination efforts. These challenges are only amplified when considered in the context of ongoing structural genomics efforts, which are now contending with multi-domain eukaryotic proteins, secreted proteins, and ever-larger macromolecular assemblies. Exciting new developments in eukaryotic expression platforms, including insect and mammalian-based systems, promise enhanced opportunities for structural approaches to some of the most important biological problems. Development and implementation of automated eukaryotic expression techniques promise to significantly improve production of materials for structural, functional, and biomedical research applications. Copyright © 2013 Elsevier Ltd. All rights reserved.

  9. Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches

    Energy Technology Data Exchange (ETDEWEB)

    Chandonia, John-Marc; Brenner, Steven E.

    2004-07-14

    The structural genomics project is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy which is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the Pfam5000 strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These include complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random selection of approximately 5000 targets from sequenced genomes. We measure the impact that successful implementation of these strategies would have upon structural interpretation of the proteins in Swiss-Prot, TrEMBL, and 131 complete proteomes (including 10 of eukaryotes) from the Proteome Analysis database at EBI. Solving the structures of proteins from the 5000 largest Pfam families would allow accurate fold assignment for approximately 68 percent of all prokaryotic proteins (covering 59 percent of residues) and 61 percent of eukaryotic proteins (40 percent of residues). More fine-grained coverage which would allow accurate modeling of these proteins would require an order of magnitude more targets. The Pfam5000 strategy may be modified in several ways, for example to focus on larger families, bacterial sequences, or eukaryotic sequences; as long as secondary consideration is given to large families within Pfam, coverage results vary only slightly. In contrast, focusing structural genomics on a single tractable genome would have only a limited impact in structural knowledge of other proteomes: a significant fraction (about 30-40 percent of the proteins, and 40-60 percent of the residues) of each proteome is classified in small
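
    The protein-level coverage figures quoted in this abstract rest on a simple calculation: pick the N largest families, then ask what fraction of proteins contain at least one of them. The sketch below illustrates that calculation only. The function name `top_family_coverage`, the input layout, the definition of family size as the number of member proteins, and the toy data are all assumptions; residue-level coverage and the authors' actual family ranking are not modeled.

```python
from collections import Counter


def top_family_coverage(protein_families, n_families):
    """Fraction of proteins with at least one domain in the n largest families.

    protein_families maps a protein ID to the set of family IDs found in it.
    Family size is taken as the number of proteins containing the family,
    which is a simplifying assumption.
    """
    if not protein_families:
        return 0.0
    sizes = Counter(f for fams in protein_families.values() for f in set(fams))
    selected = {fam for fam, _ in sizes.most_common(n_families)}
    covered = sum(1 for fams in protein_families.values() if selected & set(fams))
    return covered / len(protein_families)


if __name__ == "__main__":
    # Hypothetical toy proteome: six proteins, three families.
    toy = {
        "p1": {"PF1"},
        "p2": {"PF1", "PF2"},
        "p3": {"PF2"},
        "p4": {"PF3"},
        "p5": set(),  # no family assignment, can never be covered
        "p6": {"PF1"},
    }
    for n in (1, 2, 3):
        print(n, round(top_family_coverage(toy, n), 3))
```

    Because family sizes are heavily skewed in practice, coverage computed this way typically rises quickly for the first families and then flattens, which is consistent with the abstract's remark that finer-grained coverage would need many more targets.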

  10. The quaternary structure of the eukaryotic DNA replication proteins Sld7 and Sld3.

    Science.gov (United States)

    Itou, Hiroshi; Shirakihara, Yasuo; Araki, Hiroyuki

    2015-08-01

    The initiation of eukaryotic chromosomal DNA replication requires the formation of an active replicative helicase at the replication origins of chromosomes. Yeast Sld3 and its metazoan counterpart treslin are the hub proteins mediating protein associations critical for formation of the helicase. The Sld7 protein interacts with Sld3, and the complex formed is thought to regulate the function of Sld3. Although Sld7 is a non-essential DNA replication protein that is found in only a limited range of yeasts, its depletion slowed the growth of cells and caused a delay in the S phase. Recently, the Mdm2-binding protein was found to bind to treslin in humans, and its depletion causes defects in cells similar to the depletion of Sld7 in yeast, suggesting their functional relatedness and importance during the initiation step of DNA replication. Here, the crystal structure of Sld7 in complex with Sld3 is presented. Sld7 comprises two structural domains. The N-terminal domain of Sld7 binds to Sld3, and the C-terminal domains connect two Sld7 molecules in an antiparallel manner. The quaternary structure of the Sld3-Sld7 complex shown from the crystal structures appears to be suitable to activate two helicase molecules loaded onto replication origins in a head-to-head manner.

  11. xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud

    Science.gov (United States)

    Merchant, Nirav

    2016-01-01

    Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957

  12. Structural Genomics of Bacterial Virulence Factors

    National Research Council Canada - National Science Library

    Liddington, Robert

    2004-01-01

    We are applying a comprehensive yet focused structural genomics approach to determine the atomic resolution crystal structures of key bacterial virulence factors from high priority bacterial pathogens...

  13. Effect of disinfectant, water age, and pipe materials on bacterial and eukaryotic community structure in drinking water biofilm.

    Science.gov (United States)

    Wang, Hong; Masters, Sheldon; Edwards, Marc A; Falkinham, Joseph O; Pruden, Amy

    2014-01-01

    Availability of safe, pathogen-free drinking water is vital to public health; however, it is impossible to deliver sterile drinking water to consumers. Recent microbiome research is bringing new understanding to the true extent and diversity of microbes that inhabit water distribution systems. The purpose of this study was to determine how water chemistry in main distribution lines shapes the microbiome in drinking water biofilms and to explore potential associations between opportunistic pathogens and indigenous drinking water microbes. Effects of disinfectant (chloramines, chlorine), water age (2.3 days, 5.7 days), and pipe material (cement, iron, PVC) were compared in parallel triplicate simulated water distribution systems. Pyrosequencing was employed to characterize bacteria and terminal restriction fragment polymorphism was used to profile both bacteria and eukaryotes inhabiting pipe biofilms. Disinfectant and water age were both observed to be strong factors in shaping bacterial and eukaryotic community structures. Pipe material only influenced the bacterial community structure (ANOSIM test, P water age on both bacteria and eukaryotes were noted. Disinfectant concentration had the strongest effect on bacteria, while dissolved oxygen appeared to be a major driver for eukaryotes (BEST test). Several correlations of similarity metrics among populations of bacteria, eukaryotes, and opportunistic pathogens, as well as one significant association between mycobacterial and proteobacterial operational taxonomic units, provide insight into means by which manipulating the microbiome may lead to new avenues for limiting the growth of opportunistic pathogens (e.g., Legionella) or other nuisance organisms (e.g., nitrifiers).

  14. Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes

    NARCIS (Netherlands)

    Kourmpetis, Y.A.I.; Dijk, van A.D.J.; Braak, ter C.J.F.

    2013-01-01

    Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self-contradictory. For example, a protein may be predicted to

  15. The structure of TON1937 from archaeon Thermococcus onnurineus NA1 reveals a eukaryotic HEAT-like architecture.

    Science.gov (United States)

    Jeong, Jae-Hee; Kim, Yi-Seul; Rojviriya, Catleya; Cha, Hyung Jin; Ha, Sung-Chul; Kim, Yeon-Gil

    2013-10-01

    The members of the ARM/HEAT repeat-containing protein superfamily in eukaryotes have been known to mediate protein-protein interactions by using their concave surface. However, little is known about the ARM/HEAT repeat proteins in prokaryotes. Here we report the crystal structure of TON1937, a hypothetical protein from the hyperthermophilic archaeon Thermococcus onnurineus NA1. The structure reveals a crescent-shaped molecule composed of a double layer of α-helices with seven anti-parallel α-helical repeats. A structure-based sequence alignment of the α-helical repeats identified a conserved pattern of hydrophobic or aliphatic residues reminiscent of the consensus sequence of eukaryotic HEAT repeats. The individual repeats of TON1937 also share high structural similarity with the canonical eukaryotic HEAT repeats. In addition, the concave surface of TON1937 is proposed to be its potential binding interface based on this structural comparison and its surface properties. These observations lead us to speculate that the archaeal HEAT-like repeats of TON1937 have evolved to engage in protein-protein interactions in the same manner as eukaryotic HEAT repeats. Copyright © 2013 Elsevier B.V. All rights reserved.

  16. Design and chemical synthesis of eukaryotic chromosomes.

    Science.gov (United States)

    Xie, Ze-Xiong; Liu, Duo; Li, Bing-Zhi; Zhao, Meng; Zeng, Bo-Xuan; Wu, Yi; Shen, Yue; Lin, Tao; Yang, Ping; Dai, Junbiao; Cai, Yizhi; Yang, Huanming; Yuan, Ying-Jin

    2017-11-27

    Following the discovery of the DNA double helix structure and the advancement of genome sequencing, we have entered a promising stage with regard to genome writing. Recently, a milestone breakthrough was achieved in the chemical synthesis of designer yeast chromosomes. Here, we review the systematic approaches to the de novo synthesis of designer eukaryotic chromosomes, with an emphasis on technologies and methodologies that enable design, building, testing and debugging. The achievement of chemically synthesized genomes with customized genetic features offers an opportunity to rebuild genome organization, remold biological functions and promote life evolution, which will be of great benefit for application in medicine and industrial manufacturing.

  17. Structural characterization of genomes by large scale sequence-structure threading

    Directory of Open Access Journals (Sweden)

    Cherkasov Artem

    2004-04-01

    Background: Using sequence-structure threading, we have conducted structural characterization of the complete proteomes of 37 archaeal, bacterial and eukaryotic organisms (including worm, fly, mouse and human), totaling 167,888 genes. Results: The reported data represent a first, rather general evaluation of the performance of full sequence-structure threading on multiple genomes, providing an opportunity to evaluate its general applicability for large-scale studies. According to the estimated results, sequence-structure threading has assigned protein folds to more than 60% of eukaryotic, 68% of archaeal and 70% of bacterial proteomes. The repertoires of protein classes, architectures, topologies and homologous superfamilies (according to the CATH 2.4 classification) have been established for distant organisms and superkingdoms. It has been found that the average abundance of CATH classes decreases from "alpha and beta" to "mainly beta", followed by "mainly alpha" and "few secondary structures". 3-Layer (aba) Sandwich has been characterized as the most abundant protein architecture and the Rossmann fold as the most common topology. Conclusion: The analysis of genomic occurrences of CATH 2.4 protein homologous superfamilies and topologies has revealed the power-law character of their distributions. The corresponding double logarithmic "frequency – genomic occurrence" dependences characteristic of scale-free systems have been established for individual organisms and for three superkingdoms. Supplementary materials to this work are available at 1.
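
    As a rough illustration of the double-logarithmic analysis mentioned in the abstract above, the following Python sketch fits a straight line to log(frequency) against log(genomic occurrence). The function name `fit_power_law`, the input format, and the synthetic counts are assumptions made for illustration; they are not taken from the paper, and ordinary least squares on the log-log points is used here only for simplicity, not as the authors' method.

```python
import math


def fit_power_law(occurrences):
    """Fit frequency(k) ~ c * k**(-alpha) by least squares in log-log space.

    `occurrences` is a hypothetical list with one entry per superfamily:
    the number of times that superfamily occurs in a genome.
    Returns the pair (alpha, c).
    """
    # frequency[k] = how many superfamilies occur exactly k times
    frequency = {}
    for k in occurrences:
        if k > 0:
            frequency[k] = frequency.get(k, 0) + 1

    ks = sorted(frequency)
    if len(ks) < 2:
        raise ValueError("need at least two distinct occurrence values")

    xs = [math.log(k) for k in ks]
    ys = [math.log(frequency[k]) for k in ks]

    # Ordinary least-squares line through the (log k, log frequency) points
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # A power law appears as a line with slope -alpha and intercept log(c)
    return -slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic counts built so that frequency(k) = 1000 / k**2 exactly
    synthetic = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
    alpha, c = fit_power_law(synthetic)
    print(f"alpha = {alpha:.3f}, c = {c:.1f}")
```

    On real occurrence data a straight line in the log-log plot is only suggestive of a power law; maximum-likelihood fitting and goodness-of-fit tests are generally more reliable than a least-squares line.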

  18. Unmet challenges of structural genomics.

    Science.gov (United States)

    Chruszcz, Maksymilian; Domagalski, Marcin; Osinski, Tomasz; Wlodawer, Alexander; Minor, Wladek

    2010-10-01

    Structural genomics (SG) programs have developed during the last decade many novel methodologies for faster and more accurate structure determination. These new tools and approaches led to the determination of thousands of protein structures. The generation of enormous amounts of experimental data resulted in significant improvements in the understanding of many biological processes at molecular levels. However, the amount of data collected so far is so large that traditional analysis methods are limiting the rate of extraction of biological and biochemical information from 3D models. This situation has prompted us to review the challenges that remain unmet by SG, as well as the areas in which the potential impact of SG could exceed what has been achieved so far. Copyright © 2010 Elsevier Ltd. All rights reserved.

  19. Postcards from the edge: structural genomics of archaeal viruses.

    Science.gov (United States)

    Krupovic, Mart; White, Malcolm F; Forterre, Patrick; Prangishvili, David

    2012-01-01

    Ever since their discovery, archaeal viruses have fascinated biologists with their unusual virion morphotypes and their ability to thrive in extreme environments. Attempts to understand the biology of these viruses through genome sequence analysis were not efficient. Genomes of archaeoviruses proved to be terra incognita with only a few genes with predictable functions but uncertain provenance. In order to facilitate functional characterization of archaeal virus proteins, several research groups undertook a structural genomics approach. This chapter summarizes the outcome of these efforts. High-resolution structures of 30 proteins encoded by archaeal viruses have been solved so far. Some of these proteins possess new structural folds, whereas others display previously known topologies, albeit without detectable sequence similarity to their structural homologues. Structures of the major capsid proteins have illuminated intriguing evolutionary connections between viruses infecting hosts from different domains of life and also revealed new structural folds not yet observed in currently known bacterial and eukaryotic viruses. Structural studies, discussed here, have advanced our understanding of the archaeal virosphere and provided precious information on different aspects of biology of archaeal viruses and evolution of viruses in general. Copyright © 2012 Elsevier Inc. All rights reserved.

  20. The structural basis of substrate recognition by the eukaryotic chaperonin TRiC/CCT.

    Science.gov (United States)

    Joachimiak, Lukasz A; Walzthoeni, Thomas; Liu, Corey W; Aebersold, Ruedi; Frydman, Judith

    2014-11-20

    The eukaryotic chaperonin TRiC (also called CCT) is the obligate chaperone for many essential proteins. TRiC is hetero-oligomeric, comprising two stacked rings of eight different subunits each. Subunit diversification from simpler archaeal chaperonins appears linked to proteome expansion. Here, we integrate structural, biophysical, and modeling approaches to identify the hitherto unknown substrate-binding site in TRiC and uncover the basis of substrate recognition. NMR and modeling provided a structural model of a chaperonin-substrate complex. Mutagenesis and crosslinking-mass spectrometry validated the identified substrate-binding interface and demonstrate that TRiC contacts full-length substrates combinatorially in a subunit-specific manner. The binding site of each subunit has a distinct, evolutionarily conserved pattern of polar and hydrophobic residues specifying recognition of discrete substrate motifs. The combinatorial recognition of polypeptides broadens the specificity of TRiC and may direct the topology of bound polypeptides along a productive folding trajectory, contributing to TRiC's unique ability to fold obligate substrates.

  1. Protein production from the structural genomics perspective: achievements and future needs

    OpenAIRE

    Almo, Steven C; Garforth, Scott J; Hillerich, Brandan S; Love, James D; Seidel, Ronald D; Burley, Stephen K

    2013-01-01

    Despite a multitude of recent technical breakthroughs speeding high-resolution structural analysis of biological macromolecules, production of sufficient quantities of well-behaved, active protein continues to represent the rate-limiting step in many structure determination efforts. These challenges are only amplified when considered in the context of ongoing structural genomics efforts, which are now contending with multi-domain eukaryotic proteins, secreted proteins, and ever-larger macromo...

  2. Vertical structure of small eukaryotes in three lakes that differ by their trophic status: a quantitative approach.

    Science.gov (United States)

    Lepère, Cecile; Masquelier, Sylvie; Mangot, Jean-François; Debroas, Didier; Domaizon, Isabelle

    2010-12-01

    In lakes, the diversity of eukaryotic picoplankton has been recently studied by the analysis of 18S ribosomal RNA gene sequences; however, quantitative data are rare. In this study, the vertical structure and abundance of the small eukaryotic size fraction (0.2-5 μm) were investigated in three lakes by tyramide signal amplification-fluorescent in situ hybridization targeting six phylogenetic groups: Chlorophyta, Haptophyta, Cercozoa, LKM11, Perkinsozoa and fungi. The groups targeted in this study are found in all lakes; however, both the abundance and structure of small eukaryotes are dependent on the system's productivity and depth. These data highlighted the presence of Chlorophyta contributing on an average to 19.3%, 14.7% and 41.2% of total small eukaryotes in lakes Bourget, Aydat and Pavin, respectively. This study also revealed the unexpected importance of Haptophyta, reaching 62.8% of eukaryotes in the euphotic zone of Lake Bourget. The high proportions of these pigmented cells highlight the underestimation of these groups by PCR-based methods. The presence of pigmented Chlorophyta in the deepest zones of the lakes suggests a mixotrophic behaviour of these taxa. We also confirmed the presence of putative parasites such as Perkinsozoa (5.1% of small eukaryotes in Lake Pavin and Bourget) and, with lower abundances, fungi (targeted by the MY1574 probe). Cells targeted by LKM11 probes represented the second group of abundance within heterotrophs. Open questions regarding the functional roles of the targeted groups arise from this study, especially regarding parasitism and mixotrophy, which are interactions poorly taken into account in planktonic food web models.

  3. Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters

    Directory of Open Access Journals (Sweden)

    Gagniuc Paul

    2012-09-01

    Background: The main function of gene promoters appears to be the integration of different gene products in their biological pathways in order to maintain homeostasis. Generally, promoters have been classified in two major classes, namely TATA and CpG. Nevertheless, many genes using the same combinatorial formation of transcription factors have different gene expression patterns. Accordingly, we tried to ask ourselves some fundamental questions: Why do certain genes have an overall predisposition for higher gene expression levels than others? What causes such a predisposition? Is there a structural relationship of these sequences in different tissues? Is there a strong phylogenetic relationship between promoters of closely related species? Results: In order to gain valuable insights into different promoter regions, we obtained a series of image-based patterns which allowed us to identify 10 generic classes of promoters. A comprehensive analysis was undertaken for promoter sequences from Arabidopsis thaliana, Drosophila melanogaster, Homo sapiens and Oryza sativa, and a more extensive analysis of tissue-specific promoters in humans. We observed a clear preference for these species to use certain classes of promoters for specific biological processes. Moreover, in humans, we found that different tissues use distinct classes of promoters, reflecting an emerging promoter network. Depending on the tissue type, comparisons made between these classes of promoters reveal a complementarity between their patterns whereas some other classes of promoters have been observed to occur in competition. Furthermore, we also noticed the existence of some transitional states between these classes of promoters that may explain certain evolutionary mechanisms, which suggest a possible predisposition for specific levels of gene expression and perhaps for a different number of factors responsible for triggering gene expression. Our conclusions are based on

  4. Structure and Dynamics of Membrane Proteins and Membrane Associated Proteins with Native Bicelles from Eukaryotic Tissues.

    Science.gov (United States)

    Smrt, Sean T; Draney, Adrian W; Singaram, Indira; Lorieau, Justin L

    2017-10-10

    In vitro studies of protein structure, function, and dynamics typically preclude the complex range of molecular interactions found in living tissues. In vivo studies elucidate these complex relationships, yet they are typically incompatible with the extensive and controlled biophysical experiments available in vitro. We present an alternative approach by extracting membranes from eukaryotic tissues to produce native bicelles to capture the rich and complex molecular environment of in vivo studies while retaining the advantages of in vitro experiments. Native bicelles derived from chicken egg or mouse cerebrum tissues contain a rich composition of phosphatidylcholine (PC), phosphatidylethanolamine (PE), phosphatidylglycerol (PG), phosphatidylserine (PS), phosphatidylinositol (PI), phosphatidic acid (PA), lysolipids, cholesterol, ceramides (CM), and sphingomyelin (SM). The bicelles also contain source-specific lipids such as triacylglycerides (TAGs) and sulfatides from egg and brain tissues, respectively. With the influenza hemagglutinin fusion peptide (HAfp) and the C-terminal Src homology domain of lymphocyte-specific protein-tyrosine kinase (lck-cSH2), we show that membrane proteins and membrane associated proteins reconstituted in native bicelles produce high-resolution NMR data and probe native protein-lipid interactions.

  5. Surprising prokaryotic and eukaryotic diversity, community structure and biogeography of Ethiopian soda lakes.

    Directory of Open Access Journals (Sweden)

    Anders Lanzén

    Soda lakes are intriguing ecosystems harboring extremely productive microbial communities in spite of their extreme environmental conditions. This makes them valuable model systems for studying the connection between community structure and abiotic parameters such as pH and salinity. For the first time, we apply high-throughput sequencing to accurately estimate phylogenetic richness and composition in five soda lakes, located in the Ethiopian Rift Valley. The lakes were selected for their contrasting pH, salinities and stratification and several depths or spatial positions were covered in each lake. DNA was extracted and analyzed from all lakes at various depths and RNA extracted from two of the lakes, analyzed using both amplicon- and shotgun sequencing. We reveal a surprisingly high biodiversity in all of the studied lakes, similar to that of freshwater lakes. Interestingly, diversity appeared uncorrelated or positively correlated to pH and salinity, with the most "extreme" lakes showing the highest richness. Together, pH, dissolved oxygen, sodium- and potassium concentration explained approximately 30% of the compositional variation between samples. A diversity of prokaryotic and eukaryotic taxa could be identified, including several putatively involved in carbon-, sulfur- or nitrogen cycling. Key processes like methane oxidation, ammonia oxidation and 'nitrifier denitrification' were also confirmed by mRNA transcript analyses.

  6. ATLs and BTLs, plant-specific and general eukaryotic structurally-related E3 ubiquitin ligases.

    Science.gov (United States)

    Guzmán, Plinio

    2014-02-01

    Major components of the ubiquitin proteasome system are the enzymes that operate on the transfer of ubiquitin to selected target substrate, known as ubiquitin ligases. The RING finger is a domain that is present in key classes of ubiquitin ligases. This domain coordinates the interaction with a suitable E2 conjugase and the transfer of ubiquitin from the E2 to protein targets. Additional domains coupled to the same polypeptide are important for modulating the function of these ubiquitin ligases. Plants contain several types of E3 ubiquitin ligases that in many cases have expanded as multigene families. Some families are specific to the plant lineage, whereas others may have a common ancestor among plants and other eukaryotic lineages. Arabidopsis Tóxicos en Levadura (ATLs) and BCA2 zinc finger ATLs (BTLs) are two families of ubiquitin ligases that share some common structural features. These are intronless genes that encode a highly related RING finger domain, and yet during evolutionary history, their mode of gene expansion and function is rather different. In each of these two families, the co-occurrence of transmembrane helices or C2/C2 (BZF finger) domains with a selected variation on the RING finger has been subjected to strong selection pressure in order to preserve their unique domain architectures during evolution. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  7. The PRESAGE database for structural genomics.

    OpenAIRE

    Brenner, S E; Barken, D; Levitt, M

    1999-01-01

    The PRESAGE database is a collaborative resource for structural genomics. It provides a database of proteins to which researchers add annotations indicating current experimental status, structural predictions and suggestions. The database is intended to enhance communication among structural genomics researchers and aid dissemination of their results. The PRESAGE database may be accessed at http://presage.stanford.edu/

  8. Unravelling cis-regulatory elements in the genome of the smallest photosynthetic eukaryote: phylogenetic footprinting in Ostreococcus.

    Science.gov (United States)

    Piganeau, Gwenael; Vandepoele, Klaas; Gourbière, Sébastien; Van de Peer, Yves; Moreau, Hervé

    2009-09-01

    We used a phylogenetic footprinting approach, adapted to high levels of divergence, to estimate the level of constraint in intergenic regions of the extremely gene dense Ostreococcus algae genomes (Chlorophyta, Prasinophyceae). We first benchmarked our method against the Saccharomyces sensu stricto genome data and found that the proportion of conserved non-coding sites was consistent with those obtained with methods using calibration by the neutral substitution rate. We then applied our method to the complete genomes of Ostreococcus tauri and O. lucimarinus, which are the most divergent species from the same genus sequenced so far. We found that 77% of intergenic regions in Ostreococcus still contain some phylogenetic footprints, as compared to 88% for Saccharomyces, corresponding to an average rate of constraint on intergenic region of 17% and 30%, respectively. A comparison with some known functional cis-regulatory elements enabled us to investigate whether some transcriptional regulatory pathways were conserved throughout the green lineage. Strikingly, the size of the phylogenetic footprints depends on gene orientation of neighboring genes, and appears to be genus-specific. In Ostreococcus, 5' intergenic regions contain four times more conserved sites than 3' intergenic regions, whereas in yeast a higher frequency of constrained sites in intergenic regions between genes on the same DNA strand suggests a higher frequency of bidirectional regulatory elements. The phylogenetic footprinting approach can be used despite high levels of divergence in the ultrasmall Ostreococcus algae, to decipher structure of constrained regulatory motifs, and identify putative regulatory pathways conserved within the green lineage.

  9. Structural organization of very small chromosomes: study on a single-celled evolutionary distant eukaryote Giardia intestinalis.

    Science.gov (United States)

    Tůmová, Pavla; Uzlíková, Magdalena; Wanner, Gerhard; Nohýnková, Eva

    2015-03-01

    During mitotic prophase, chromosomes of the pathogenic unicellular eukaryote Giardia intestinalis condense in each of the cell's two nuclei. In this study, Giardia chromosomes were investigated using light microscopy, high-resolution field emission scanning electron microscopy, and in situ hybridization. For the first time, we describe the overall morphology, condensation stages, and mitotic segregation of these chromosomes. Despite the absence of several genes involved in the cohesion and condensation pathways in the Giardia genome, we observed chromatin organization similar to those found in eukaryotes, i.e., 10-nm nucleosomal fibrils, 30-nm fibrils coiled to chromomeres or in parallel arrangements, and closely aligned sister chromatids. DNA molecules of Giardia terminate with telomeric repeats that we visualized on each of the four chromatid endings of metaphase chromosomes. Giardia chromosomes lack primary and secondary constrictions, thus preventing their classification based on the position of the centromere. The anaphase poleward segregation of sister chromatids is atypical in orientation and tends to generate lagging chromatids between daughter nuclei. In the Giardia genome database, we identified two putative members of the kleisin family thought to be responsible for condensin ring establishment. Thus far, Giardia chromosomes (300 nm to 1.5 μm) are the smallest chromosomes that were analyzed at the ultrastructural level. This study complements the existing molecular and sequencing data on Giardia chromosomes with cytological and ultrastructural information.

  10. Structural and dynamic characterization of eukaryotic gene regulatory protein domains in solution

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Andrew Loyd [Univ. of California, Berkeley, CA (United States). Dept. of Chemistry

    1996-05-01

    Solution NMR was primarily used to characterize structure and dynamics in two different eukaryotic protein systems: the δ-Al-ε activation domain from c-jun and the Drosophila RNA-binding protein Sex-lethal. The second system is the Drosophila Sex-lethal (Sxl) protein, an RNA-binding protein which is the "master switch" in sex determination. Sxl contains two adjacent RNA-binding domains (RBDs) of the RNP consensus-type. The NMR spectrum of the second RBD (Sxl-RBD2) was assigned using multidimensional heteronuclear NMR, and an intermediate-resolution family of structures was calculated from primarily NOE distance restraints. The overall fold was determined to be similar to other RBDs: a βαβ-βαβ pattern of secondary structure, with the two helices packed against a 4-stranded anti-parallel β-sheet. In addition, 15N T1, T2, and 15N/1H NOE relaxation measurements were carried out to characterize the backbone dynamics of Sxl-RBD2 in solution. RNA corresponding to the polypyrimidine tract of transformer pre-mRNA was generated and titrated into 3 different Sxl-RBD protein constructs. Combining Sxl-RBD1+2 (both RBDs) with this RNA formed a specific, high affinity protein/RNA complex that is amenable to further NMR characterization. The backbone 1H, 13C, and 15N resonances of Sxl-RBD1+2 were assigned using a triple-resonance approach, and 15N relaxation experiments were carried out to characterize the backbone dynamics of this complex. The changes in chemical shift in Sxl-RBD1+2 upon binding RNA are observed using Sxl-RBD2 as a substitute for unbound Sxl-RBD1+2. This allowed the binding interface to be qualitatively mapped for the second domain.

  11. The ARTT motif and a unified structural understanding of substrate recognition in ADP ribosylating bacterial toxins and eukaryotic ADP-ribosyltransferases

    Energy Technology Data Exchange (ETDEWEB)

    Han, S.; Tainer, J.A.

    2001-08-01

    ADP-ribosylation is a widely occurring and biologically critical covalent chemical modification process in pathogenic mechanisms, intracellular signaling systems, DNA repair, and cell division. The reaction is catalyzed by ADP-ribosyltransferases, which transfer the ADP-ribose moiety of NAD to a target protein with nicotinamide release. A family of bacterial toxins and eukaryotic enzymes has been termed the mono-ADP-ribosyltransferases, in distinction to the poly-ADP-ribosyltransferases, which catalyze the addition of multiple ADP-ribose groups to the carboxyl terminus of eukaryotic nucleoproteins. Despite the limited primary sequence homology among the different ADP-ribosyltransferases, a central cleft bearing an NAD-binding pocket formed by the two perpendicular β-sheet cores has been remarkably conserved between bacterial toxins and eukaryotic mono- and poly-ADP-ribosyltransferases. The majority of bacterial toxins and eukaryotic mono-ADP-ribosyltransferases are characterized by conserved His and catalytic Glu residues. In contrast, Diphtheria toxin, Pseudomonas exotoxin A, and eukaryotic poly-ADP-ribosyltransferases are characterized by conserved Arg and catalytic Glu residues. The NAD-binding core of a binary toxin and a C3-like toxin family identified an ARTT motif (ADP-ribosylating turn-turn motif) that is implicated in substrate specificity and recognition by structural and mutagenic studies. Here we apply structure-based sequence alignment and comparative structural analyses of all known structures of ADP-ribosyltransferases to suggest that this ARTT motif is functionally important in many ADP-ribosylating enzymes that bear an NAD-binding cleft as characterized by conserved Arg and catalytic Glu residues. Overall, structure-based sequence analysis reveals common core structures and conserved active sites of ADP-ribosyltransferases to support similar NAD binding mechanisms but differing mechanisms of target protein binding via sequence variations within the ARTT

  12. Ruler arrays reveal haploid genomic structural variation.

    Directory of Open Access Journals (Sweden)

    P Alexander Rolfe

    Despite the known relevance of genomic structural variants to pathogen behavior, cancer, development, and evolution, certain repeat-based structural variants may evade detection by existing high-throughput techniques. Here, we present ruler arrays, a technique to detect genomic structural variants including insertions and deletions (indels), duplications, and translocations. A ruler array exploits DNA polymerase's processivity to detect physical distances between defined genomic sequences regardless of the intervening sequence. The method combines a sample preparation protocol, tiling genomic microarrays, and a new computational analysis. The analysis of ruler array data from two genomic samples enables the identification of structural variation between the samples. In an empirical test between two closely related haploid strains of yeast, ruler arrays detected 78% of the structural variants larger than 100 bp.

  13. Eukaryotic beta-alanine synthases are functionally related but have a high degree of structural diversity

    DEFF Research Database (Denmark)

    Gojkovic, Zoran; Sandrini, Michael; Piskur, Jure

    2001-01-01

    activity was used to clone analogous genes from different eukaryotes. Putative PYD3 sequences from the yeast S. kluyveri, the slime mold Dictyostelium discoideum, and the fruit fly Drosophila melanogaster complemented the pyd3 defect. When the S. kluyveri PYD3 gene was expressed in S. cerevisiae, which has...

  14. Genome-wide analysis of core promoter structures in Schizosaccharomyces pombe with DeepCAGE

    Science.gov (United States)

    Li, Hua; Hou, Jingyi; Bai, Ling; Hu, Chuansheng; Tong, Pan; Kang, Yani; Zhao, Xiaodong; Shao, Zhifeng

    2015-01-01

    The core promoter, which immediately flanks the transcription start site (TSS), plays a critical role in transcriptional regulation of eukaryotes. Recent studies on higher eukaryotes have revealed an unprecedented complexity of core promoter structures that underscores diverse regulatory mechanisms of gene expression. For unicellular eukaryotes, however, the structures of core promoters have not been investigated in detail. As an important model organism, Schizosaccharomyces pombe still lacks the precise annotation for TSSs, thus hampering the analysis of core promoter structures and their relationship to higher eukaryotes. Here we used a deep sequencing-based approach (DeepCAGE) to generate 16 million uniquely mapped tags, corresponding to 93,736 positions in the S. pombe genome. The high-resolution TSS landscape enabled identification of over 8,000 core promoters, characterization of 4 promoter classes and observation of widespread alternative promoters. The landscape also allowed precise determination of the representative TSSs within core promoters, thus redefining the 5' UTR for 82.8% of S. pombe genes. We further identified the consensus initiator (Inr) sequence – PyPyPuN(A/C)(C/A), the TATA-enriched region (between position −25 and −37) and an Inr immediate downstream motif – CC(T/A)(T/C)(T/C/A)(A/G)CCA(A/T/C), all of which were associated with highly expressed promoters. In conclusion, the detailed analysis of core promoters not only significantly improves the genome annotation of S. pombe, but also reveals that this unicellular eukaryote shares a highly similar organization in the core promoters with higher eukaryotes. These findings lend additional evidence for the power of this model system in delineating complex regulatory processes in multicellular organisms, despite its perceived simplicity. PMID:25747261
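
    The two consensus motifs quoted in this abstract can be written as regular expressions. The following is a minimal sketch, assuming the usual IUPAC reading of the abstract's shorthand (Py = C or T, Pu = A or G, N = any base); the function name `find_motif` and the test sequences are hypothetical and are not taken from the study.

```python
import re

# Inr consensus PyPyPuN(A/C)(C/A), assuming Py = C/T, Pu = A/G, N = any base.
# A lookahead is used so that overlapping occurrences are all reported.
INR = re.compile(r"(?=([CT][CT][AG][ACGT][AC][AC]))")

# Inr immediate downstream motif CC(T/A)(T/C)(T/C/A)(A/G)CCA(A/T/C).
DOWNSTREAM = re.compile(r"(?=(CC[AT][CT][ACT][AG]CCA[ACT]))")


def find_motif(pattern, seq):
    """Return (0-based start, matched text) for every match, overlaps included."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Made-up sequences for illustration only.
inr_hits = find_motif(INR, "GGCTAGACGG")
downstream_hits = find_motif(DOWNSTREAM, "aaccTTTACCAAGG")
print(inr_hits)
print(downstream_hits)
```

    Scanning with a lookahead keeps overlapping candidate sites, which matters for short degenerate motifs; scoring hits by position relative to a TSS would need the annotation data, which is not reproduced here.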

  15. Genome-wide analysis of core promoter structures in Schizosaccharomyces pombe with DeepCAGE.

    Science.gov (United States)

    Li, Hua; Hou, Jingyi; Bai, Ling; Hu, Chuansheng; Tong, Pan; Kang, Yani; Zhao, Xiaodong; Shao, Zhifeng

    2015-01-01

    The core promoter, which immediately flanks the transcription start site (TSS), plays a critical role in transcriptional regulation of eukaryotes. Recent studies on higher eukaryotes have revealed an unprecedented complexity of core promoter structures that underscores diverse regulatory mechanisms of gene expression. For unicellular eukaryotes, however, the structures of core promoters have not been investigated in detail. As an important model organism, Schizosaccharomyces pombe still lacks the precise annotation for TSSs, thus hampering the analysis of core promoter structures and their relationship to higher eukaryotes. Here we used a deep sequencing-based approach (DeepCAGE) to generate 16 million uniquely mapped tags, corresponding to 93,736 positions in the S. pombe genome. The high-resolution TSS landscape enabled identification of over 8,000 core promoters, characterization of 4 promoter classes and observation of widespread alternative promoters. The landscape also allowed precise determination of the representative TSSs within core promoters, thus redefining the 5' UTR for 82.8% of S. pombe genes. We further identified the consensus initiator (Inr) sequence--PyPyPuN(A/C)(C/A), the TATA-enriched region (between position -25 and -37) and an Inr immediate downstream motif--CC(T/A)(T/C)(T/C/A)(A/G)CCA(A/T/C), all of which were associated with highly expressed promoters. In conclusion, the detailed analysis of core promoters not only significantly improves the genome annotation of S. pombe, but also reveals that this unicellular eukaryote shares a highly similar organization in the core promoters with higher eukaryotes. These findings lend additional evidence for the power of this model system in delineating complex regulatory processes in multicellular organisms, despite its perceived simplicity.

  16. Genomics technologies to study structural variations in the grapevine genome

    Directory of Open Access Journals (Sweden)

    Cardone Maria Francesca

    2016-01-01

    Grapevine is one of the most important crop plants in the world. Genomics resources for the grapevine genome have recently expanded greatly, supporting increasing efforts in molecular breeding. Current cultivars display a great level of inter-specific differentiation that needs to be investigated to reach a comprehensive understanding of the genetic basis of phenotypic differences, and to find responsible genes selected by cross breeding programs. While there have been significant advances in resolving the pattern and nature of single nucleotide polymorphisms (SNPs) in plant genomes, few data are available on copy number variation (CNV). Furthermore, association between structural variations and phenotypes has been described in only a few cases. We combined high-throughput biotechnologies and bioinformatics tools to reveal the first inter-varietal atlas of structural variation (SV) for the grapevine genome. We sequenced and compared four table grape cultivars with the Pinot noir inbred line PN40024 genome as the reference. We detected roughly 8% of the grapevine genome affected by genomic variations. Taking into account the phenotypic differences among the studied varieties, we compared SVs among them and the reference and then performed an in-depth analysis of the gene content of polymorphic regions. This allowed us to identify genes showing differences in copy number as putative functional candidates for important traits in grapevine cultivation.

  17. Target selection for structural genomics: an overview.

    Science.gov (United States)

    Marsden, Russell L; Orengo, Christine A

    2008-01-01

    The success of the whole genome sequencing projects brought considerable credence to the belief that high-throughput approaches, rather than traditional hypothesis-driven research, would be essential to structurally and functionally annotate the rapid growth in available sequence data within a reasonable time frame. Such observations supported the emerging field of structural genomics, which is now faced with the task of providing a library of protein structures that represent the biological diversity of the protein universe. To run efficiently, structural genomics projects aim to define a set of targets that maximize the potential of each structure discovery whether it represents a novel structure, novel function, or missing evolutionary link. However, not all protein sequences make suitable structural genomics targets: It takes considerably more effort to determine the structure of a protein than the sequence of its gene because of the increased complexity of the methods involved and also because the behavior of targeted proteins can be extremely variable at the different stages in the structural genomics "pipeline." Therefore, structural genomics target selection must identify and prioritize the most suitable candidate proteins for structure determination, avoiding "problematic" proteins while also ensuring the ultimate goals of the project are followed.

  18. Multi-scale coding of genomic information: From DNA sequence to genome structure and function

    Energy Technology Data Exchange (ETDEWEB)

    Arneodo, Alain, E-mail: alain.arneodo@ens-lyon.f [Universite de Lyon, F-69000 Lyon (France); Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, Ecole Normale Superieure de Lyon, F-69007 Lyon (France); Vaillant, Cedric, E-mail: cedric.vaillant@ens-lyon.f [Universite de Lyon, F-69000 Lyon (France); Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, Ecole Normale Superieure de Lyon, F-69007 Lyon (France); Audit, Benjamin, E-mail: benjamin.audit@ens-lyon.f [Universite de Lyon, F-69000 Lyon (France); Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, Ecole Normale Superieure de Lyon, F-69007 Lyon (France); Argoul, Francoise, E-mail: francoise.argoul@ens-lyon.f [Universite de Lyon, F-69000 Lyon (France); Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, Ecole Normale Superieure de Lyon, F-69007 Lyon (France); D' Aubenton-Carafa, Yves, E-mail: daubenton@cgm.cnrs-gif.f [Centre de Genetique Moleculaire, CNRS, Allee de la Terrasse, 91198 Gif-sur-Yvette (France); Thermes, Claude, E-mail: claude.thermes@cgm.cnrs-gif.f [Centre de Genetique Moleculaire, CNRS, Allee de la Terrasse, 91198 Gif-sur-Yvette (France)

    2011-02-15

    Understanding how chromatin is spatially and dynamically organized in the nucleus of eukaryotic cells and how this affects genome functions is one of the main challenges of cell biology. Since the different orders of packaging in the hierarchical organization of DNA condition the accessibility of DNA sequence elements to trans-acting factors that control the transcription and replication processes, there is actually a wealth of structural and dynamical information to learn in the primary DNA sequence. In this review, we show that when using concepts, methodologies, numerical and experimental techniques coming from statistical mechanics and nonlinear physics combined with wavelet-based multi-scale signal processing, we are able to decipher the multi-scale sequence encoding of chromatin condensation-decondensation mechanisms that play a fundamental role in regulating many molecular processes involved in nuclear functions.

  19. The Protein Data Bank and structural genomics

    OpenAIRE

    Westbrook, John; Feng, Zukang; Chen, Li; Yang, Huanwang; Berman, Helen M.

    2003-01-01

    The Protein Data Bank (PDB; http://www.pdb.org/) continues to be actively involved in various aspects of the informatics of structural genomics projects—developing and maintaining the Target Registration Database (TargetDB), organizing data dictionaries that will define the specification for the exchange and deposition of data with the structural genomics centers and creating software tools to capture data from standard structure determination applications.

  20. The Protein Data Bank and structural genomics.

    Science.gov (United States)

    Westbrook, John; Feng, Zukang; Chen, Li; Yang, Huanwang; Berman, Helen M

    2003-01-01

    The Protein Data Bank (PDB; http://www.pdb.org/) continues to be actively involved in various aspects of the informatics of structural genomics projects--developing and maintaining the Target Registration Database (TargetDB), organizing data dictionaries that will define the specification for the exchange and deposition of data with the structural genomics centers and creating software tools to capture data from standard structure determination applications.

  1. Origin and evolution of the self-organizing cytoskeleton in the network of eukaryotic organelles.

    Science.gov (United States)

    Jékely, Gáspár

    2014-09-02

    The eukaryotic cytoskeleton evolved from prokaryotic cytomotive filaments. Prokaryotic filament systems show bewildering structural and dynamic complexity and, in many aspects, prefigure the self-organizing properties of the eukaryotic cytoskeleton. Here, the dynamic properties of the prokaryotic and eukaryotic cytoskeleton are compared, and how these relate to function and evolution of organellar networks is discussed. The evolution of new aspects of filament dynamics in eukaryotes, including severing and branching, and the advent of molecular motors converted the eukaryotic cytoskeleton into a self-organizing "active gel," the dynamics of which can only be described with computational models. Advances in modeling and comparative genomics hold promise of a better understanding of the evolution of the self-organizing cytoskeleton in early eukaryotes, and its role in the evolution of novel eukaryotic functions, such as amoeboid motility, mitosis, and ciliary swimming. Copyright © 2014 Cold Spring Harbor Laboratory Press; all rights reserved.

  2. Structural basis for the initiation of eukaryotic transcription-coupled DNA repair

    OpenAIRE

    Xu, Jun; Lahiri, Indrajit; Wang, Wei; Wier, Adam; Cianfrocco, Michael A.; Chong, Jenny; Hare, Alissa A.; Dervan, Peter B.; DiMaio, Frank; Leschziner, Andres E.; Wang, Dong

    2017-01-01

    Eukaryotic transcription-coupled repair (TCR) is an important and well-conserved sub-pathway of nucleotide excision repair that preferentially removes DNA lesions from the template strand that block translocation of RNA polymerase II (Pol II). Cockayne syndrome group B (CSB, also known as ERCC6) protein in humans (or its yeast orthologues, Rad26 in Saccharomyces cerevisiae and Rhp26 in Schizosaccharomyces pombe) is among the first proteins to be recruited to the lesion-arrested Pol II during ...

  3. Conservation of functional domain structure in bicarbonate-regulated “soluble” adenylyl cyclases in bacteria and eukaryotes

    Science.gov (United States)

    Kobayashi, Mime; Buck, Jochen; Levin, Lonny R.

    2013-01-01

    Soluble adenylyl cyclase (sAC) is an evolutionarily conserved bicarbonate sensor. In mammals, it is responsible for bicarbonate-induced, cAMP-dependent processes in sperm required for fertilization and postulated to be involved in other bicarbonate- and carbon dioxide-dependent functions throughout the body. Among eukaryotes, sAC-like cyclases have been detected in mammals and in the fungi Dictyostelium; these enzymes display extensive similarity extending through two cyclase catalytic domains and a long carboxy terminal extension. sAC-like cyclases are also found in a number of bacterial phyla (Cyanobacteria, Actinobacteria, and Proteobacteria), but these enzymes generally possess only a single catalytic domain and little, if any, homology with the remainder of the mammalian protein. Database mining through a number of recently sequenced genomes identified sAC orthologues in additional metazoan phyla (Arthropoda and Chordata) and additional bacterial phyla (Chloroflexi). Interestingly, the Chloroflexi sAC-like cyclases, a family of three enzymes from the thermophilic eubacterium, Chloroflexus aurantiacus, are more similar to eukaryotic sAC-like cyclases (i.e., mammalian sAC and Dictyostelium SgcA) than they are to other bacterial adenylyl cyclases (ACs) (i.e., from Cyanobacteria). The Chloroflexus sAC-like cyclases each possess two cyclase catalytic domains and extensive similarity with mammalian enzymes through their carboxy termini. We cloned one of the Chloroflexus sAC-like cyclases and confirmed it to be stimulated by bicarbonate. These data extend the family of organisms possessing bicarbonate-responsive ACs to numerous phyla within the bacterial and eukaryotic kingdoms. PMID:15322879

  4. Chapter 6: Structural variation and medical genomics.

    Science.gov (United States)

    Raphael, Benjamin J

    2012-01-01

    Differences between individual human genomes, or between human and cancer genomes, range in scale from single nucleotide variants (SNVs) through intermediate and large-scale duplications, deletions, and rearrangements of genomic segments. The latter class, called structural variants (SVs), has received considerable attention in the past several years as a previously underappreciated source of variation in human genomes. Much of this recent attention is the result of the availability of higher-resolution technologies for measuring these variants, including both microarray-based techniques and, more recently, high-throughput DNA sequencing. We describe the genomic technologies and computational techniques currently used to measure SVs, focusing on applications in human and cancer genomics.

  5. Chapter 6: Structural variation and medical genomics.

    Directory of Open Access Journals (Sweden)

    Benjamin J Raphael

    Differences between individual human genomes, or between human and cancer genomes, range in scale from single nucleotide variants (SNVs) through intermediate and large-scale duplications, deletions, and rearrangements of genomic segments. The latter class, called structural variants (SVs), has received considerable attention in the past several years as a previously underappreciated source of variation in human genomes. Much of this recent attention is the result of the availability of higher-resolution technologies for measuring these variants, including both microarray-based techniques and, more recently, high-throughput DNA sequencing. We describe the genomic technologies and computational techniques currently used to measure SVs, focusing on applications in human and cancer genomics.

  6. Of bits and bugs--on the use of bioinformatics and a bacterial crystal structure to solve a eukaryotic repeat-protein structure.

    Directory of Open Access Journals (Sweden)

    Almut Graebsch

    Pur-α is a nucleic acid-binding protein involved in cell cycle control, transcription, and neuronal function. Initially no prediction of the three-dimensional structure of Pur-α was possible. However, recently we solved the X-ray structure of Pur-α from the fruit fly Drosophila melanogaster and showed that it contains a so-called PUR domain. Here we explain how we exploited bioinformatics tools in combination with X-ray structure determination of a bacterial homolog to obtain diffracting crystals and the high-resolution structure of Drosophila Pur-α. First, we used sensitive methods for remote-homology detection to find three repetitive regions in Pur-α. We realized that our lack of understanding of how these repeats interact to form a globular domain was a major problem for crystallization and structure determination. With our information on the repeat motifs we then identified a distant bacterial homolog that contains only one repeat. We determined the bacterial crystal structure and found that two of the repeats interact to form a globular domain. Based on this bacterial structure, we calculated a computational model of the eukaryotic protein. The model allowed us to design a crystallizable fragment and to determine the structure of Drosophila Pur-α. Key to success was the fact that single repeats of the bacterial protein self-assembled into a globular domain, instructing us on the number and boundaries of repeats to be included for crystallization trials with the eukaryotic protein. This study demonstrates that the simpler structural domain arrangement of a distant prokaryotic protein can guide the design of eukaryotic crystallization constructs. Since many eukaryotic proteins contain multiple repeats or repeating domains, this approach might be instructive for structural studies of a range of proteins.

  7. Molecular paleontology and complexity in the last eukaryotic common ancestor.

    Science.gov (United States)

    Koumandou, V Lila; Wickstead, Bill; Ginger, Michael L; van der Giezen, Mark; Dacks, Joel B; Field, Mark C

    2013-01-01

    Eukaryogenesis, the origin of the eukaryotic cell, represents one of the fundamental evolutionary transitions in the history of life on earth. This event, which is estimated to have occurred over one billion years ago, remains rather poorly understood. While some well-validated examples of fossil microbial eukaryotes for this time frame have been described, these can provide only basic morphology and the molecular machinery present in these organisms has remained unknown. Complete and partial genomic information has begun to fill this gap, and is being used to trace proteins and cellular traits to their roots and to provide unprecedented levels of resolution of structures, metabolic pathways and capabilities of organisms at these earliest points within the eukaryotic lineage. This is essentially allowing a molecular paleontology. What has emerged from these studies is spectacular cellular complexity prior to expansion of the eukaryotic lineages. Multiple reconstructed cellular systems indicate a very sophisticated biology, which by implication arose following the initial eukaryogenesis event but prior to eukaryotic radiation and provides a challenge in terms of explaining how these early eukaryotes arose and in understanding how they lived. Here, we provide brief overviews of several cellular systems and the major emerging conclusions, together with predictions for subsequent directions in evolution leading to extant taxa. We also consider what these reconstructions suggest about the life styles and capabilities of these earliest eukaryotes and the period of evolution between the radiation of eukaryotes and the eukaryogenesis event itself.

  8. Two-dimensional data binning for the analysis of genome architecture in filamentous plant pathogens and other eukaryotes.

    Science.gov (United States)

    Saunders, Diane G O; Win, Joe; Kamoun, Sophien; Raffaele, Sylvain

    2014-01-01

    Genome architecture often reflects an organism's lifestyle and can therefore provide insights into gene function, regulation, and adaptation. In several lineages of plant pathogenic fungi and oomycetes, characteristic repeat-rich and gene-sparse regions harbor pathogenicity-related genes such as effectors. In these pathogens, analysis of genome architecture has assisted the mining for novel candidate effector genes and investigations into patterns of gene regulation and evolution at the whole genome level. Here we describe a two-dimensional data binning method in R with a heatmap-style graphical output to facilitate analysis and visualization of whole genome architecture. The method is flexible, combining whole genome architecture heatmaps with scatter plots of the genomic environment of selected gene sets. This enables analysis of specific values associated with genes such as gene expression and sequence polymorphisms, according to genome architecture. This method enables the investigation of whole genome architecture and reveals local properties of genomic neighborhoods in a clear and concise manner.
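
    The idea of two-dimensional binning can be illustrated in a few lines. The authors describe an implementation in R; what follows is only a plain-Python sketch of the general technique, not their code. The choice of axes (two per-gene distances in bp), the bin edges, and the names `bin_index` and `bin_2d` are assumptions made for illustration.

```python
def bin_index(value, edges):
    """Return i such that edges[i] <= value < edges[i + 1], or None if outside."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None


def bin_2d(points, x_edges, y_edges):
    """Count (x, y) points per 2D bin; the result is indexed [y_bin][x_bin]."""
    counts = [[0] * (len(x_edges) - 1) for _ in range(len(y_edges) - 1)]
    for x, y in points:
        i, j = bin_index(x, x_edges), bin_index(y, y_edges)
        if i is not None and j is not None:  # points outside the grid are skipped
            counts[j][i] += 1
    return counts


# Hypothetical per-gene value pairs (for example, two flanking distances in bp).
genes = [(150, 200), (120, 5000), (8000, 9000), (9500, 12000), (50, 80)]
edges = [10, 100, 1000, 10000, 100000]  # assumed log-spaced bin edges
matrix = bin_2d(genes, edges, edges)
for row in matrix:
    print(row)
```

    Each cell of `matrix` holds the number of genes falling in that pair of bins, which is the quantity a heatmap-style plot would colour; replacing the count with a per-bin average of another gene-associated value (such as expression) follows the same pattern.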

  9. Eukaryotic beta-alanine synthases are functionally related but have a high degree of structural diversity.

    Science.gov (United States)

    Gojković, Z; Sandrini, M P; Piskur, J

    2001-01-01

    beta-Alanine synthase (EC 3.5.1.6), which catalyzes the final step of pyrimidine catabolism, has only been characterized in mammals. A Saccharomyces kluyveri pyd3 mutant that is unable to grow on N-carbamyl-beta-alanine as the sole nitrogen source and exhibits diminished beta-alanine synthase activity was used to clone analogous genes from different eukaryotes. Putative PYD3 sequences from the yeast S. kluyveri, the slime mold Dictyostelium discoideum, and the fruit fly Drosophila melanogaster complemented the pyd3 defect. When the S. kluyveri PYD3 gene was expressed in S. cerevisiae, which has no pyrimidine catabolic pathway, it enabled growth on N-carbamyl-beta-alanine as the sole nitrogen source. The D. discoideum and D. melanogaster PYD3 gene products are similar to mammalian beta-alanine synthases. In contrast, the S. kluyveri protein is quite different from these and more similar to bacterial N-carbamyl amidohydrolases. All three beta-alanine synthases are to some degree related to various aspartate transcarbamylases, which catalyze the second step of the de novo pyrimidine biosynthetic pathway. PYD3 expression in yeast seems to be inducible by dihydrouracil and N-carbamyl-beta-alanine, but not by uracil. This work establishes S. kluyveri as a model organism for studying pyrimidine degradation and beta-alanine production in eukaryotes. PMID:11454750

  10. Contemplating effects of genomic structural variation.

    Science.gov (United States)

    Buchanan, Janet A; Scherer, Stephen W

    2008-09-01

    Two developments have sparked new directions in the genetics-to-genomics transition for research and medical applications: the advance of whole-genome assays by array or DNA sequencing technologies, and the discovery among human genomes of extensive submicroscopic genomic structural variation, including copy number variation. For health care to benefit from interpretation of genomic data, we need to know how these variants contribute to the phenotype of the individual. Research is revealing the spectrum, both in size and complexity, of structural genotypic variation, and its association with a broad range of human phenotypes. Genomic disorders associated with relatively large, recurrent contiguous variants have been recognized for some time, as have certain Mendelian traits associated with functional disruption of single genes by structural variation. More recent examples from phenotype- and genotype-driven studies demonstrate a greater level of complexity, with evidence of incremental dosage effects, gene interaction networks, buffering and modifiers, and position effects. Mechanisms underlying such variation are emerging to provide a handle on the bulk of human variation, which is associated with complex traits and adaptive potential. Interpreting genotypes for personalized health care and communicating knowledge to the individual will be significant challenges for genomics professionals.

  11. Discovery of novel DENN proteins: implications for the evolution of eukaryotic intracellular membrane structures and human disease

    Directory of Open Access Journals (Sweden)

    Dapeng Zhang

    2012-12-01

    The tripartite DENN module, comprised of a N-terminal longin domain, followed by DENN and d-DENN domains, is a GDP-GTP exchange factor (GEFs) for Rab GTPases, which are regulators of practically all membrane trafficking events in eukaryotes. Using sequence and structure analysis we identify multiple novel homologs of the DENN module, many of which can be traced back to the ancestral eukaryote. These findings provide unexpected leads regarding key cellular processes such as autophagy, vesicle-vacuole interactions, chromosome segregation and human disease. Of these, SMCR8, the folliculin interacting protein-1 and 2 (FNIP1 and FNIP2), nitrogen permease regulator 2 (NPR2) and NPR3 are proposed to function in recruiting Rab GTPases during different steps of autophagy, fusion of autophagosomes with the vacuole and regulation of cellular metabolism. Another novel DENN protein identified in this study is C9ORF72; expansions of the hexanucleotide GGGGCC in its first intron have been recently implicated in amyotrophic lateral sclerosis (ALS) and fronto-temporal dementia (FTD). While this mutation is proposed to cause a RNA-level defect, the identification of C9ORF72 as a potential DENN-type GEF raises the possibility that at least part of the pathology might relate to a specific Rab-dependent vesicular trafficking process, as has been observed in the case of some other neurological conditions with similar phenotypes. We present evidence that the longin domain, such as those found in the DENN module, are likely to have been ultimately derived from the related domains found in prokaryotic GTPase-activating proteins of MglA-like GTPases. Thus, the origin of the longin domains from this ancient GTPase-interacting domain, concomitant with the radiation of GTPases, especially of the Rab clade, played an important role in the dynamics of eukaryotic intracellular membrane systems.

  12. Discovery of Novel DENN Proteins: Implications for the Evolution of Eukaryotic Intracellular Membrane Structures and Human Disease.

    Science.gov (United States)

    Zhang, Dapeng; Iyer, Lakshminarayan M; He, Fang; Aravind, L

    2012-01-01

    The tripartite DENN module, comprised of a N-terminal longin domain, followed by DENN, and d-DENN domains, is a GDP-GTP exchange factor (GEFs) for Rab GTPases, which are regulators of practically all membrane trafficking events in eukaryotes. Using sequence and structure analysis we identify multiple novel homologs of the DENN module, many of which can be traced back to the ancestral eukaryote. These findings provide unexpected leads regarding key cellular processes such as autophagy, vesicle-vacuole interactions, chromosome segregation, and human disease. Of these, SMCR8, the folliculin interacting protein-1 and 2 (FNIP1 and FNIP2), nitrogen permease regulator 2 (NPR2), and NPR3 are proposed to function in recruiting Rab GTPases during different steps of autophagy, fusion of autophagosomes with the vacuole and regulation of cellular metabolism. Another novel DENN protein identified in this study is C9ORF72; expansions of the hexanucleotide GGGGCC in its first intron have been recently implicated in amyotrophic lateral sclerosis (ALS) and fronto-temporal dementia (FTD). While this mutation is proposed to cause a RNA-level defect, the identification of C9ORF72 as a potential DENN-type GEF raises the possibility that at least part of the pathology might relate to a specific Rab-dependent vesicular trafficking process, as has been observed in the case of some other neurological conditions with similar phenotypes. We present evidence that the longin domain, such as those found in the DENN module, are likely to have been ultimately derived from the related domains found in prokaryotic GTPase-activating proteins of MglA-like GTPases. Thus, the origin of the longin domains from this ancient GTPase-interacting domain, concomitant with the radiation of GTPases, especially of the Rab clade, played an important role in the dynamics of eukaryotic intracellular membrane systems.

  13. Endosymbiosis and Eukaryotic Cell Evolution.

    Science.gov (United States)

    Archibald, John M

    2015-10-05

    Understanding the evolution of eukaryotic cellular complexity is one of the grand challenges of modern biology. It has now been firmly established that mitochondria and plastids, the classical membrane-bound organelles of eukaryotic cells, evolved from bacteria by endosymbiosis. In the case of mitochondria, evidence points very clearly to an endosymbiont of α-proteobacterial ancestry. The precise nature of the host cell that partnered with this endosymbiont is, however, very much an open question. And while the host for the cyanobacterial progenitor of the plastid was undoubtedly a fully-fledged eukaryote, how - and how often - plastids moved from one eukaryote to another during algal diversification is vigorously debated. In this article I frame modern views on endosymbiotic theory in a historical context, highlighting the transformative role DNA sequencing played in solving early problems in eukaryotic cell evolution, and posing key unanswered questions emerging from the age of comparative genomics. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint.

    Science.gov (United States)

    Marsden, Russell L; Lewis, Tony A; Orengo, Christine A

    2007-03-09

    Structural genomics initiatives were established with the aim of solving protein structures on a large scale. For many initiatives, such as the Protein Structure Initiative (PSI), the primary aim of target selection is focussed towards structurally characterising protein families which, so far, lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and what efforts may be required to achieve a comprehensive structural coverage across all protein families. In this analysis we have derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and Newfam domain families. We consider what proportions of structurally uncharacterized families are accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from structurally uncharacterized domain families whilst, in addition, pursuing targets from large structurally characterised protein superfamilies. This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve a comprehensive structural coverage of the genomes, leading to greater insights into structure and the mechanisms that underlie protein evolution.

  15. Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint

    Directory of Open Access Journals (Sweden)

    Orengo Christine A

    2007-03-01

    Background: Structural genomics initiatives were established with the aim of solving protein structures on a large scale. For many initiatives, such as the Protein Structure Initiative (PSI), the primary aim of target selection is focussed towards structurally characterising protein families which, so far, lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and what efforts may be required to achieve a comprehensive structural coverage across all protein families. Results: In this analysis we have derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and Newfam domain families. We consider what proportions of structurally uncharacterised families are accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from structurally uncharacterised domain families whilst, in addition, pursuing targets from large structurally characterised protein superfamilies. Conclusion: This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve a comprehensive structural coverage of the genomes, leading to greater insights into structure and the mechanisms that underlie protein evolution.

  16. Structures of eukaryotic ribosomal stalk proteins and its complex with trichosanthin, and their implications in recruiting ribosome-inactivating proteins to the ribosomes.

    Science.gov (United States)

    Choi, Andrew K H; Wong, Eddie C K; Lee, Ka-Ming; Wong, Kam-Bo

    2015-02-25

    Ribosome-inactivating proteins (RIP) are RNA N-glycosidases that inactivate ribosomes by specifically depurinating a conserved adenine residue at the α-sarcin/ricin loop of 28S rRNA. Recent studies have pointed to the involvement of the C-terminal domain of the eukaryotic stalk proteins in facilitating the toxic action of RIPs. This review highlights how structural studies of eukaryotic stalk proteins provide insights into the recruitment of RIPs to the ribosomes. Since the C-terminal domain of eukaryotic stalk proteins is involved in specific recognition of elongation factors and some eukaryote-specific RIPs (e.g., trichosanthin and ricin), we postulate that these RIPs may have evolved to hijack the translation-factor-recruiting function of ribosomal stalk in reaching their target site of rRNA.

  17. Bacterial eukaryotic type serine-threonine protein kinases: from structural biology to targeted anti-infective drug design.

    Science.gov (United States)

    Danilenko, Valery N; Osolodkin, Dmitry I; Lakatosh, Sergey A; Preobrazhenskaya, Maria N; Shtil, Alexander A

    2011-01-01

    Signaling through protein kinases is an evolutionarily conserved, widespread language of biological regulation. The eukaryotic type serine-threonine protein kinases (STPKs) found in normal human microbiota and in pathogenic bacteria play a key role in regulation of microbial survival, virulence and pathogenicity. Therefore, down-regulation of bacterial STPKs emerges as an attractive approach to cure infections. In this review we focus on actinobacterial STPKs to demonstrate that these enzymes can be used for crystal structure studies, modeling of 3D structure, construction of test systems and design of novel chemical libraries of low molecular weight inhibitors. In particular, the prototypic pharmacological antagonists of Mycobacterium tuberculosis STPKs are promising for the development of a novel generation of drugs to combat this socially important disease. These inhibitors may modulate both actinobacterial and host STPKs and trigger programmed death of pathogenic bacteria.

  18. Origins and evolution of viruses of eukaryotes: The ultimate modularity

    Energy Technology Data Exchange (ETDEWEB)

    Koonin, Eugene V., E-mail: koonin@ncbi.nlm.nih.gov [National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 (United States)]; Dolja, Valerian V., E-mail: doljav@science.oregonstate.edu [Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331 (United States)]; Krupovic, Mart, E-mail: krupovic@pasteur.fr [Institut Pasteur, Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Department of Microbiology, Paris 75015 (France)]

    2015-05-15

    Viruses and other selfish genetic elements are dominant entities in the biosphere, with respect to both physical abundance and genetic diversity. Various selfish elements parasitize on all cellular life forms. The relative abundances of different classes of viruses are dramatically different between prokaryotes and eukaryotes. In prokaryotes, the great majority of viruses possess double-stranded (ds) DNA genomes, with a substantial minority of single-stranded (ss) DNA viruses and only limited presence of RNA viruses. In contrast, in eukaryotes, RNA viruses account for the majority of the virome diversity although ssDNA and dsDNA viruses are common as well. Phylogenomic analysis yields tangible clues for the origins of major classes of eukaryotic viruses and in particular their likely roots in prokaryotes. Specifically, the ancestral genome of positive-strand RNA viruses of eukaryotes might have been assembled de novo from genes derived from prokaryotic retroelements and bacteria although a primordial origin of this class of viruses cannot be ruled out. Different groups of double-stranded RNA viruses derive either from dsRNA bacteriophages or from positive-strand RNA viruses. The eukaryotic ssDNA viruses apparently evolved via a fusion of genes from prokaryotic rolling circle-replicating plasmids and positive-strand RNA viruses. Different families of eukaryotic dsDNA viruses appear to have originated from specific groups of bacteriophages on at least two independent occasions. Polintons, the largest known eukaryotic transposons, predicted to also form virus particles, most likely, were the evolutionary intermediates between bacterial tectiviruses and several groups of eukaryotic dsDNA viruses including the proposed order “Megavirales” that unites diverse families of large and giant viruses. Strikingly, evolution of all classes of eukaryotic viruses appears to have involved fusion between structural and replicative gene modules derived from different sources

  19. The cancer genome: from structure to function.

    Science.gov (United States)

    Geurts van Kessel, Ad

    2014-06-01

    The 2014 joint meeting of the International Society for Cellular Oncology (ISCO) and the European Workshop on Cytogenetics and Molecular Genetics of Solid Tumors (EWCMST), organized by Nick Gilbert, Juan Cigudosa and Bauke Ylstra, was held from 11 to 14 May in Malaga, Spain. Since the previous meeting in 2012, the ever increasing availability of new sequencing technologies has enabled the analysis of cancer genomes in increasingly greater detail. In addition to structural changes in the genome (i.e., translocations, deletions, amplifications), frequent mutations in important regulatory genes have been found to occur, as have frequent alterations in a large number of epigenetic factors. The challenge now is to relate structural changes in cancer genomes to the underlying disease mechanisms and to reveal opportunities for the design of novel (targeted) therapies. During the meeting, various topics related to these challenges and opportunities were addressed, including those dealing with functional genomics, genome instability, biomarkers and diagnostics, cancer genetics and epigenomics. Special attention was paid to therapy-driven cancer evolution (keynote lecture) and relationships between DNA repair, cancer and ageing (Prof. Ploem lecture). Based on the information presented at the meeting, several aspects of the cancer genome and its functional implications are provided in this report.

  20. Genome structural variation discovery and genotyping

    OpenAIRE

    Alkan, Can; Coe, Bradley P.; Eichler, Evan E.

    2011-01-01

    Comparisons of human genomes show that more base pairs are altered as a result of structural variation — including copy number variation — than as a result of point mutations. Here we review advances and challenges in the discovery and genotyping of structural variation. The recent application of massively parallel sequencing methods has complemented microarray-based methods and has led to an exponential increase in the discovery of smaller structural-variation events. Some glo...

  1. Structural genomics: bridging functional genomics and structure-based drug design.

    Science.gov (United States)

    Buchanan, Sean G

    2002-05-01

    Considerable advances in structural genomics have been witnessed in the last year. Several pilot studies have begun to report their initial results, and new centers have been funded to join the endeavor. The legacies of the genome sequencing efforts, namely high-throughput molecular biology and whole-organism genome sequences, have been integrated as front-end modules for structural genomics pipelines. Impressive advances have been made in NMR spectroscopy and X-ray crystallography. New methods in structural bioinformatics and computational chemistry have been published that provide the means to exploit the wealth of new information in drug discovery. Not surprisingly, the biopharmaceutical industry has been quick to recognize the benefits of these new developments and has begun to adopt them. This article reviews recent results from structural genomics initiatives and the potential applications of new information and technologies in the drug discovery process.

  2. Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets, and Homology Models.

    Directory of Open Access Journals (Sweden)

    2005-08-01

    The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.

  3. Universal internucleotide statistics in full genomes: a footprint of the DNA structure and packaging?

    Science.gov (United States)

    Bogachev, Mikhail I; Kayumov, Airat R; Bunde, Armin

    2014-01-01

    Uncovering the fundamental laws that govern the complex DNA structural organization remains challenging and is largely based upon reconstructions from the primary nucleotide sequences. Here we investigate the distributions of the internucleotide intervals and their persistence properties in complete genomes of various organisms from Archaea and Bacteria to H. sapiens, aiming to reveal the manifestation of the universal DNA architecture. We find that in all considered organisms the internucleotide interval distributions exhibit the same q-exponential form. While in prokaryotes a single q-exponential function makes the best fit, in eukaryotes the PDF contains additionally a second q-exponential, which in the human genome makes a perfect approximation over nearly 10 decades. We suggest that this functional form is a footprint of the heterogeneous DNA structure, where the first q-exponential reflects the universal helical pitch that appears both in pro- and eukaryotic DNA, while the second q-exponential is a specific marker of the large-scale eukaryotic DNA organization.
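
    The one- and two-component fits described above can be sketched in a few lines. This is only an illustration, assuming the exponential form in question is the Tsallis q-exponential, e_q(-x/lam) = [1 - (1 - q) x/lam]^(1/(1-q)); the function names, the mixing weight w and all parameter values are hypothetical and are not taken from the paper.

```python
import math


def q_exponential(x, q, lam):
    """Tsallis q-exponential decay e_q(-x/lam); equals exp(-x/lam) when q == 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(-x / lam)
    base = 1.0 - (1.0 - q) * x / lam
    if base <= 0.0:
        return 0.0  # for q < 1 the function is cut off to zero beyond a finite x
    return base ** (1.0 / (1.0 - q))


def two_component(x, w, q1, lam1, q2, lam2):
    """Weighted sum of two q-exponentials (hypothetical mixing weight w in [0, 1])."""
    return w * q_exponential(x, q1, lam1) + (1.0 - w) * q_exponential(x, q2, lam2)


if __name__ == "__main__":
    # Illustrative values only: a short-range and a long-range component.
    for x in (1, 10, 100, 1000):
        print(x, two_component(x, 0.7, 1.1, 10.0, 1.5, 1000.0))
```

    For q > 1 the tail decays as a power law rather than exponentially, which is why a second component with a large scale parameter can dominate at long intervals.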

  4. Universal internucleotide statistics in full genomes: a footprint of the DNA structure and packaging?

    Directory of Open Access Journals (Sweden)

    Mikhail I Bogachev

    Uncovering the fundamental laws that govern the complex DNA structural organization remains challenging and is largely based upon reconstructions from the primary nucleotide sequences. Here we investigate the distributions of the internucleotide intervals and their persistence properties in complete genomes of various organisms from Archaea and Bacteria to H. sapiens, aiming to reveal the manifestation of the universal DNA architecture. We find that in all considered organisms the internucleotide interval distributions exhibit the same q-exponential form. While in prokaryotes a single q-exponential function makes the best fit, in eukaryotes the PDF contains additionally a second q-exponential, which in the human genome makes a perfect approximation over nearly 10 decades. We suggest that this functional form is a footprint of the heterogeneous DNA structure, where the first q-exponential reflects the universal helical pitch that appears both in pro- and eukaryotic DNA, while the second q-exponential is a specific marker of the large-scale eukaryotic DNA organization.

  5. Genome structural variation discovery and genotyping.

    Science.gov (United States)

    Alkan, Can; Coe, Bradley P; Eichler, Evan E

    2011-05-01

    Comparisons of human genomes show that more base pairs are altered as a result of structural variation - including copy number variation - than as a result of point mutations. Here we review advances and challenges in the discovery and genotyping of structural variation. The recent application of massively parallel sequencing methods has complemented microarray-based methods and has led to an exponential increase in the discovery of smaller structural-variation events. Some global discovery biases remain, but the integration of experimental and computational approaches is proving fruitful for accurate characterization of the copy, content and structure of variable regions. We argue that the long-term goal should be routine, cost-effective and high quality de novo assembly of human genomes to comprehensively assess all classes of structural variation.

  6. Gene Composer in a structural genomics environment.

    Science.gov (United States)

    Lorimer, Don; Raymond, Amy; Mixon, Mark; Burgin, Alex; Staker, Bart; Stewart, Lance

    2011-09-01

    The structural genomics effort at the Seattle Structural Genomics Center for Infectious Disease (SSGCID) requires the manipulation of large numbers of amino-acid sequences and the underlying DNA sequences which are to be cloned into expression vectors. To improve efficiency in high-throughput protein structure determination, a database software package, Gene Composer, has been developed which facilitates the information-rich design of protein constructs and their underlying gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bioinformatics steps used in modern structure-guided protein engineering and synthetic gene engineering. An example of the structure determination of H1N1 RNA-dependent RNA polymerase PB2 subunit is given.

  7. Structure of Mth11/Mth Rpp29, an essential protein subunit of archaeal and eukaryotic RNase P.

    Science.gov (United States)

    Boomershine, William P; McElroy, Craig A; Tsai, Hsin-Yue; Wilson, Ross C; Gopalan, Venkat; Foster, Mark P

    2003-12-23

    We have determined the solution structure of Mth11 (Mth Rpp29), an essential subunit of the RNase P enzyme from the archaebacterium Methanothermobacter thermoautotrophicus (Mth). RNase P is a ubiquitous ribonucleoprotein enzyme primarily responsible for cleaving the 5' leader sequence during maturation of tRNAs in all three domains of life. In eubacteria, this enzyme is made up of two subunits: a large RNA (approximately 120 kDa) responsible for mediating catalysis, and a small protein cofactor (approximately 15 kDa) that modulates substrate recognition and is required for efficient in vivo catalysis. In contrast, multiple proteins are associated with eukaryotic and archaeal RNase P, and these proteins exhibit no recognizable homology to the conserved bacterial protein subunit. In reconstitution experiments with recombinantly expressed and purified protein subunits, we found that Mth Rpp29, a homolog of the Rpp29 protein subunit from eukaryotic RNase P, is an essential protein component of the archaeal holoenzyme. Consistent with its role in mediating protein-RNA interactions, we report that Mth Rpp29 is a member of the oligonucleotide/oligosaccharide binding fold family. In addition to a structured beta-barrel core, it possesses unstructured N- and C-terminal extensions bearing several highly conserved amino acid residues. To identify possible RNA contacts in the protein-RNA complex, we examined the interaction of the 11-kDa protein with the full 100-kDa Mth RNA subunit by using NMR chemical shift perturbation. Our findings represent a critical step toward a structural model of the RNase P holoenzyme from archaebacteria and higher organisms.

  8. ExDom: an integrated database for comparative analysis of the exon-intron structures of protein domains in eukaryotes.

    Science.gov (United States)

    Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan

    2009-01-01

    We have developed ExDom, a unique database for the comparative analysis of the exon-intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon-intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon-intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon-intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/.

  9. Consistent mutational paths predict eukaryotic thermostability

    Directory of Open Access Journals (Sweden)

    van Noort Vera

    2013-01-01

    Background: Proteomes of thermophilic prokaryotes have been instrumental in structural biology and successfully exploited in biotechnology; however, many proteins required for eukaryotic cell function are absent from bacteria or archaea. With Chaetomium thermophilum, Thielavia terrestris and Thielavia heterothallica, three genome sequences of thermophilic eukaryotes have been published. Results: Studying the genomes and proteomes of these thermophilic fungi, we found common strategies of thermal adaptation across the different kingdoms of Life, including amino acid biases and a reduced genome size. A phylogenetics-guided comparison of thermophilic proteomes with those of other, mesophilic Sordariomycetes revealed consistent amino acid substitutions associated with thermophily that were also present in an independent lineage of thermophilic fungi. The most consistent pattern is the substitution of lysine by arginine, which we could find in almost all lineages but has not been extensively used in protein stability engineering. By exploiting mutational paths towards the thermophiles, we could predict particular amino acid residues in individual proteins that contribute to thermostability and validated some of them experimentally. By determining the three-dimensional structure of an exemplar protein from C. thermophilum (Arx1), we could also characterise the molecular consequences of some of these mutations. Conclusions: The comparative analysis of these three genomes not only enhances our understanding of the evolution of thermophily, but also provides new ways to engineer protein stability.
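
    The idea of tallying directional substitutions such as lysine to arginine can be sketched as follows. This is a minimal illustration, assuming two pre-aligned protein sequences of equal length with "-" for gaps; the function name and the toy sequences are hypothetical, and the paper's phylogenetics-guided procedure is not reproduced here.

```python
from collections import Counter


def substitution_counts(mesophile, thermophile):
    """Count (from, to) residue substitutions between two aligned sequences.

    Both sequences must have the same length; columns containing a gap
    character "-" in either sequence are skipped.
    """
    if len(mesophile) != len(thermophile):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for a, b in zip(mesophile, thermophile):
        if a != b and a != "-" and b != "-":
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration only.
    counts = substitution_counts("MKKLAK-E", "MRKLARGD")
    print(counts[("K", "R")])  # number of lysine-to-arginine columns
```

    Aggregating such counts over many orthologous pairs, and comparing K-to-R against R-to-K, would give a rough directional bias; that aggregation step is left out of the sketch.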

  10. A eukaryotic genome of 660 kb: electrophoretic karyotype of nucleomorph and cell nucleus of the cryptomonad alga, Pyrenomonas salina.

    OpenAIRE

    Eschbach, S.; Hofmann, C J; Maier, U. G.; Sitte, P; Hansmann, P.

    1991-01-01

    Cryptomonads are unicellular algae with chloroplasts surrounded by four membranes. Between the inner and the outer pairs of membranes is a narrow plasmatic compartment which contains a nucleus-like organelle called the nucleomorph. Using pulsed field gel electrophoresis it is shown that the nucleomorph of the cryptomonad Pyrenomonas salina contains three linear chromosomes of 195 kb, 225 kb and 240 kb all of which encode rRNAs. Thus, this vestigial nucleus has a haploid genome size of 660 kb,...

  11. Using Genomics for Natural Product Structure Elucidation.

    Science.gov (United States)

    Tietz, Jonathan I; Mitchell, Douglas A

    2016-01-01

    Natural products (NPs) are the most historically bountiful source of chemical matter for drug development, especially for anti-infectives. With insights gleaned from genome mining, interest in natural product discovery has been reinvigorated. An essential stage in NP discovery is structural elucidation, which sheds light not only on the chemical composition of a molecule but also its novelty, properties, and derivatization potential. The history of structure elucidation is replete with technique-based revolutions: combustion analysis, crystallography, UV, IR, MS, and NMR have each provided game-changing advances; the latest such advance is genomics. All natural products have a genetic basis, and the ability to obtain and interpret genomic information for structure elucidation is increasingly available at low cost to non-specialists. In this review, we describe the value of genomics as a structural elucidation technique, especially from the perspective of the natural product chemist approaching an unknown metabolite. Herein we first introduce the databases and programs of interest to the natural products chemist, with an emphasis on those currently most suited for general usability. We describe strategies for linking observed natural product-linked phenotypes to their corresponding gene clusters. We then discuss techniques for extracting structural information from genes, illustrated with numerous case examples. We also provide an analysis of the biases and limitations of the field with recommendations for future development. Our overview is not only aimed at biologically-oriented researchers already at ease with bioinformatic techniques, but also, in particular, at natural product, organic, and/or medicinal chemists not previously familiar with genomic techniques.

  12. The tuberculosis structural genomics consortium: a structural genomics approach to drug discovery.

    Science.gov (United States)

    Musa, Tracey L; Ioerger, Thomas R; Sacchettini, James C

    2009-01-01

    Structural genomics is changing the way we study and understand biological systems, providing insight into the biology and life cycle of an organism at the molecular level through determination of protein structures. Structural genomics can be a particularly useful tool in the study of infectious diseases, especially to facilitate the development of new chemotherapeutics by providing a structural foundation for drug discovery. The Tuberculosis Structural Genomics Consortium (TBSGC) is applying a structural genomics approach to solving the structures of biologically and medically important proteins in the pathogenic bacterium Mycobacterium tuberculosis, adding to the scientific knowledge base essential for developing novel and effective antitubercular drugs. Tuberculosis (TB) has been declared a global health emergency by the World Health Organization (WHO). With the rise in the number of multidrug resistant (MDR) and extensively drug resistant (XDR) TB strains, the need for more effective TB treatments has become urgent. In contrast to other structural genomics projects, the TBSGC specifically prioritizes proteins based on their potential as drug targets. We describe the consortium's high-throughput (HT) structure determination pipeline that enables an efficient distribution of resources while also incorporating knowledge from several scientific fields. The success of this pipeline is illustrated in the number of successful structure solutions as demonstrated in the case studies presented in this chapter. Copyright 2009 Elsevier Inc. All rights reserved.

  13. Exploiting Genome Structure in Association Analysis

    Science.gov (United States)

    Kim, Seyoung

    2014-01-01

    A genome-wide association study involves examining a large number of single-nucleotide polymorphisms (SNPs) to identify SNPs that are significantly associated with the given phenotype, while trying to reduce the false positive rate. Although haplotype-based association methods have been proposed to accommodate correlation information across nearby SNPs that are in linkage disequilibrium, none of these methods directly incorporates structural information such as recombination events along the chromosome. In this paper, we propose a new approach called stochastic block lasso for association mapping that exploits prior knowledge on linkage disequilibrium structure in the genome, such as recombination rates and distances between adjacent SNPs, in order to increase the power of detecting true associations while reducing false positives. Following a typical linear regression framework with the genotypes as inputs and the phenotype as output, our proposed method employs a sparsity-enforcing Laplacian prior for the regression coefficients, augmented by a first-order Markov process along the sequence of SNPs that incorporates the prior information on the linkage disequilibrium structure. The Markov-chain prior models the structural dependencies between a pair of adjacent SNPs, and allows us to look for association SNPs in a coupled manner, combining strength from multiple nearby SNPs. Our results on HapMap-simulated datasets and mouse datasets show that there is a significant advantage in incorporating the prior knowledge on linkage disequilibrium structure for marker identification under whole-genome association. PMID:21548809
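
    The sparsity-enforcing Laplacian prior mentioned above corresponds, in its plain form, to L1-penalized (lasso) regression. The sketch below shows only that baseline, solved by coordinate descent; it deliberately omits the Markov-chain coupling between adjacent SNPs that defines the stochastic block lasso. The genotype coding (0/1/2 allele counts), the toy data, the penalty value and all names are assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal step for the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of n rows (one per individual), each with p genotype values;
    y is a list of n phenotype values. No intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP carries no information
            # Correlation of SNP j with the residual that excludes SNP j's own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


if __name__ == "__main__":
    # Toy data: six individuals, three SNPs; only SNP 0 affects the phenotype.
    X = [[0, 1, 2], [1, 0, 1], [2, 1, 0], [0, 2, 1], [1, 1, 1], [2, 0, 2]]
    y = [2.0 * row[0] for row in X]
    print(lasso_coordinate_descent(X, y, lam=0.1))
```

    On this toy input the penalty sets the two uninformative coefficients exactly to zero and shrinks the causal one slightly below its true value of 2, which is the selection behaviour the Laplacian prior is meant to provide.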

  14. Evaluating protein structures determined by structural genomics consortia.

    Science.gov (United States)

    Bhattacharya, Aneerban; Tejero, Roberto; Montelione, Gaetano T

    2007-03-01

    Structural genomics projects are providing large quantities of new 3D structural data for proteins. To monitor the quality of these data, we have developed the protein structure validation software suite (PSVS), for assessment of protein structures generated by NMR or X-ray crystallographic methods. PSVS is broadly applicable for structure quality assessment in structural biology projects. The software integrates under a single interface analyses from several widely-used structure quality evaluation tools, including PROCHECK (Laskowski et al., J Appl Crystallog 1993;26:283-291), MolProbity (Lovell et al., Proteins 2003;50:437-450), Verify3D (Luthy et al., Nature 1992;356:83-85), ProsaII (Sippl, Proteins 1993;17: 355-362), the PDB validation software, and various structure-validation tools developed in our own laboratory. PSVS provides standard constraint analyses, statistics on goodness-of-fit between structures and experimental data, and knowledge-based structure quality scores in standardized format suitable for database integration. The analysis provides both global and site-specific measures of protein structure quality. Global quality measures are reported as Z scores, based on calibration with a set of high-resolution X-ray crystal structures. PSVS is particularly useful in assessing protein structures determined by NMR methods, but is also valuable for assessing X-ray crystal structures or homology models. Using these tools, we assessed protein structures generated by the Northeast Structural Genomics Consortium and other international structural genomics projects, over a 5-year period. Protein structures produced from structural genomics projects exhibit quality score distributions similar to those of structures produced in traditional structural biology projects during the same time period. However, while some NMR structures have structure quality scores similar to those seen in higher-resolution X-ray crystal structures, the majority of NMR structures
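
    The Z-score normalization of global quality measures mentioned above can be sketched as follows. This is a generic illustration, not the PSVS implementation: the abstract does not give the calibration details, so the use of the sample standard deviation, the function name and the reference values are all assumptions.

```python
from statistics import mean, stdev


def z_score(raw_score, reference_scores):
    """Express a raw quality score relative to a reference distribution.

    reference_scores stands in for scores computed on a calibration set
    (here: hypothetical values for high-resolution X-ray structures).
    """
    mu = mean(reference_scores)
    sigma = stdev(reference_scores)  # sample standard deviation (assumed)
    return (raw_score - mu) / sigma


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0, 4.0, 5.0]  # invented calibration scores
    print(z_score(3.0, reference))  # a structure scoring at the reference mean
    print(z_score(5.0, reference))  # a structure scoring above the mean
```

    Reporting every tool's output on this common scale is what allows scores from different validation programs to be compared side by side.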

  15. Production and characterization of novel recombinant adeno-associated virus replicative-form genomes: a eukaryotic source of DNA for gene transfer.

    Directory of Open Access Journals (Sweden)

    Lina Li

    Conventional non-viral gene transfer uses bacterial plasmid DNA containing antibiotic resistance genes, cis-acting bacterial sequence elements, and prokaryotic methylation patterns that may adversely affect transgene expression and vector stability in vivo. Here, we describe novel replicative forms of a eukaryotic vector DNA that consist solely of an expression cassette flanked by adeno-associated virus (AAV) inverted terminal repeats. Extensive structural analyses revealed that this AAV-derived vector DNA consists of linear, duplex molecules with covalently closed ends (termed closed-ended, linear duplex, or "CELiD", DNA). CELiD vectors, produced in Sf9 insect cells, require AAV rep gene expression for amplification. Amounts of CELiD DNA produced from insect cell lines stably transfected with an ITR-flanked transgene exceeded 60 mg per 5 × 10^9 Sf9 cells, and 1-15 mg from a comparable number of parental Sf9 cells in which the transgene was introduced via recombinant baculovirus infection. In mice, systemically delivered CELiD DNA resulted in long-term, stable transgene expression in the liver. CELiD vectors represent a novel eukaryotic alternative to bacterial plasmid DNA.

  16. High-throughput crystallography for structural genomics.

    Science.gov (United States)

    Joachimiak, Andrzej

    2009-10-01

    Protein X-ray crystallography recently celebrated its 50th anniversary. The structures of myoglobin and hemoglobin determined by Kendrew and Perutz provided the first glimpses into the complex protein architecture and chemistry. Since then, the field of structural molecular biology has experienced extraordinary progress and now more than 55000 protein structures have been deposited into the Protein Data Bank. In the past decade many advances in macromolecular crystallography have been driven by world-wide structural genomics efforts. This was made possible because of third-generation synchrotron sources, structure phasing approaches using anomalous signal, and cryo-crystallography. Complementary progress in molecular biology, proteomics, hardware and software for crystallographic data collection, structure determination and refinement, computer science, databases, robotics and automation improved and accelerated many processes. These advancements provide the robust foundation for structural molecular biology and assure strong contribution to science in the future. In this report we focus mainly on reviewing structural genomics high-throughput X-ray crystallography technologies and their impact.

  17. Structural genomics on membrane proteins: mini review.

    Science.gov (United States)

    Lundstrom, K

    2004-08-01

    Structural genomics, structure-based analysis of gene products, has so far mainly concentrated on soluble proteins because of their less demanding requirements for overexpression, purification and crystallisation compared to membrane proteins. This so-called "low-hanging fruit" approach has generated more than 25,000 structures deposited in databases. In contrast, the substantially more complex membrane proteins, in relation to all steps from overexpression to high-resolution structure determination, represent less than 1% of available crystal structures. This is in sharp contrast to the importance of this type of proteins, particularly G protein-coupled receptors (GPCRs), as today 60-70% of the current drug targets are based on membrane proteins. The key to improved success with membrane protein structural elucidation is technology development. The most efficient approach constitutes parallel studies on a large number of targets and evaluation of various systems for expression. Next, high throughput format solubilisation and refolding screening methods for a wide range of detergents and additives in numerous concentrations should be established. Today, several networks dealing with structural genomics approaches of membrane proteins have been initiated, among them the Membrane Protein Network (MePNet) programme that deals with the pharmaceutically important mammalian GPCRs. In MePNet, three overexpression systems have been employed for the evaluation of 101 GPCRs, which has generated large quantities of numerous recombinant GPCRs, compatible for structural biology applications.

  18. High-throughput Crystallography for Structural Genomics

    Science.gov (United States)

    Joachimiak, Andrzej

    2009-01-01

    Protein X-ray crystallography recently celebrated its 50th anniversary. The structures of myoglobin and hemoglobin determined by Kendrew and Perutz provided the first glimpses into the complex protein architecture and chemistry. Since then, the field of structural molecular biology has experienced extraordinary progress and now over 53,000 protein structures have been deposited into the Protein Data Bank. In the past decade many advances in macromolecular crystallography have been driven by world-wide structural genomics efforts. This was made possible because of third-generation synchrotron sources, structure phasing approaches using anomalous signal and cryo-crystallography. Complementary progress in molecular biology, proteomics, hardware and software for crystallographic data collection, structure determination and refinement, computer science, databases, robotics and automation improved and accelerated many processes. These advancements provide the robust foundation for structural molecular biology and assure strong contribution to science in the future. In this report we focus mainly on reviewing structural genomics high-throughput X-ray crystallography technologies and their impact. PMID:19765976

  19. Crystal Structure of a Eukaryotic GEN1 Resolving Enzyme Bound to DNA.

    Science.gov (United States)

    Liu, Yijin; Freeman, Alasdair D J; Déclais, Anne-Cécile; Wilson, Timothy J; Gartner, Anton; Lilley, David M J

    2015-12-22

    We present the crystal structure of the junction-resolving enzyme GEN1 bound to DNA at 2.5 Å resolution. The structure of the GEN1 protein reveals it to have an elaborated FEN-XPG family fold that is modified for its role in four-way junction resolution. The functional unit in the crystal is a monomer of active GEN1 bound to the product of resolution cleavage, with an extensive DNA binding interface for both helical arms. Within the crystal lattice, a GEN1 dimer interface juxtaposes two products, whereby they can be reconnected into a four-way junction, the structure of which agrees with that determined in solution. The reconnection requires some opening of the DNA structure at the center, in agreement with permanganate probing and 2-aminopurine fluorescence. The structure shows that a relaxation of the DNA structure accompanies cleavage, suggesting how second-strand cleavage is accelerated to ensure productive resolution of the junction. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  20. Crystal Structure of a Eukaryotic GEN1 Resolving Enzyme Bound to DNA

    Directory of Open Access Journals (Sweden)

    Yijin Liu

    2015-12-01

    We present the crystal structure of the junction-resolving enzyme GEN1 bound to DNA at 2.5 Å resolution. The structure of the GEN1 protein reveals it to have an elaborated FEN-XPG family fold that is modified for its role in four-way junction resolution. The functional unit in the crystal is a monomer of active GEN1 bound to the product of resolution cleavage, with an extensive DNA binding interface for both helical arms. Within the crystal lattice, a GEN1 dimer interface juxtaposes two products, whereby they can be reconnected into a four-way junction, the structure of which agrees with that determined in solution. The reconnection requires some opening of the DNA structure at the center, in agreement with permanganate probing and 2-aminopurine fluorescence. The structure shows that a relaxation of the DNA structure accompanies cleavage, suggesting how second-strand cleavage is accelerated to ensure productive resolution of the junction.

  1. Practical applications of structural genomics technologies for mutagen research.

    Science.gov (United States)

    Zemla, Adam; Segelke, Brent W

    2011-06-17

    Here we present a perspective on a range of practical uses of structural genomics for mutagen research. Structural genomics is an overloaded term and requires some definition to bound the discussion; we give a brief description of public and private structural genomics endeavors, along with some of their objectives, their activities, their capabilities, and their limitations. We discuss how structural genomics might impact mutagen research in three different scenarios: at a structural genomics center, at a lab with modest resources that also conducts structural biology research, and at a lab that is conducting mutagen research without in-house experimental structural biology. Applications range from functional annotation of single genes or SNPs, to constructing gene networks and pathways, to an integrated systems biology approach. Structural genomics centers can take advantage of systems biology models to prioritize high-value targets for structure determination and in turn extend systems models to better understand systems biology diseases or phenomena. Individual investigator-run structural biology laboratories can collaborate with structural genomics centers, but can also take advantage of technical advances and tools developed by structural genomics centers and can employ a structural genomics approach to advancing biological understanding. Individual investigator-run non-structural biology laboratories can also collaborate with structural genomics centers, possibly influencing targeting decisions, but can also use structure-based annotation tools enabled by the growing coverage of protein fold space provided by structural genomics. Better functional annotation can inform pathway and systems biology models. Copyright © 2011 Elsevier B.V. All rights reserved.

  2. The quality and validation of structures from structural genomics.

    Science.gov (United States)

    Domagalski, Marcin J; Zheng, Heping; Zimmerman, Matthew D; Dauter, Zbigniew; Wlodawer, Alexander; Minor, Wladek

    2014-01-01

    Quality control of three-dimensional structures of macromolecules is a critical step to ensure the integrity of structural biology data, especially those produced by structural genomics centers. Whereas the Protein Data Bank (PDB) has proven to be a remarkable success overall, the inconsistent quality of structures reveals a lack of universal standards for structure/deposit validation. Here, we review the state-of-the-art methods used in macromolecular structure validation, focusing on validation of structures determined by X-ray crystallography. We describe some general protocols used in the rebuilding and re-refinement of problematic structural models. We also briefly discuss some frontier areas of structure validation, including refinement of protein-ligand complexes, automation of structure redetermination, and the use of NMR structures and computational models to solve X-ray crystal structures by molecular replacement.

  3. How often do they have sex? A comparative analysis of the population structure of seven eukaryotic microbial pathogens.

    Directory of Open Access Journals (Sweden)

    Nicolás Tomasini

    The model of predominant clonal evolution (PCE) proposed for micropathogens does not state that genetic exchange is totally absent, but rather, that it is too rare to break the prevalent PCE pattern. However, the actual impact of this "residual" genetic exchange should be evaluated. Multilocus Sequence Typing (MLST) is an excellent tool to explore the problem. Here, we compared MLST datasets available online for seven eukaryotic microbial pathogens: Trypanosoma cruzi, the Fusarium solani complex, Aspergillus fumigatus, Blastocystis subtype 3, the Leishmania donovani complex, Candida albicans and Candida glabrata. We first analyzed phylogenetic relationships among genotypes within each dataset. Then, we examined different measures of branch support and incongruence among loci as signs of genetic structure and levels of past recombination. The analyses allow us to identify three types of genetic structure. The first was characterized by trees with well-supported branches and low levels of incongruence suggesting well-structured populations and PCE. This was the case for the T. cruzi and F. solani datasets. The second genetic structure, represented by Blastocystis spp., A. fumigatus and the L. donovani complex datasets, showed trees with weakly-supported branches but low levels of incongruence among loci, whereby genetic structuration was not clearly defined by MLST. Finally, trees showing weakly-supported branches and high levels of incongruence among loci were observed for Candida species, suggesting that genetic exchange has a higher evolutionary impact in these mainly clonal yeast species. Furthermore, simulations showed that MLST may fail to show the correct clustering in population datasets even in the absence of genetic exchange. In conclusion, these results make it possible to infer variable impacts of genetic exchange in populations of predominantly clonal micro-pathogens. Moreover, our results reveal different problems of MLST to determine the

  4. National Academy of Sciences and Academy of Sciences of the USSR workshop on structure of the eucaryotic genome and regulation of its expression

    Energy Technology Data Exchange (ETDEWEB)

    1990-01-01

    This report provides a brief overview of the Workshop on Structure of the Eukaryotic Genome and Regulation of its Expression held in Tbilisi, Georgia, USSR. The report describes the presentations made at the meeting but also goes on to describe the state of molecular biology and genetics research in the Soviet Union and makes recommendations on how to improve future such meetings.

  5. National Academy of Sciences and Academy of Sciences of the USSR workshop on structure of the eucaryotic genome and regulation of its expression. Final report

    Energy Technology Data Exchange (ETDEWEB)

    1990-12-31

    This report provides a brief overview of the Workshop on Structure of the Eukaryotic Genome and Regulation of its Expression held in Tbilisi, Georgia, USSR. The report describes the presentations made at the meeting but also goes on to describe the state of molecular biology and genetics research in the Soviet Union and makes recommendations on how to improve future such meetings.

  6. The contribution of co-transcriptional RNA:DNA hybrid structures to DNA damage and genome instability

    Science.gov (United States)

    Hamperl, Stephan; Cimprich, Karlene A.

    2014-01-01

    Accurate DNA replication and DNA repair are crucial for the maintenance of genome stability, and it is generally accepted that failure of these processes is a major source of DNA damage in cells. Intriguingly, recent evidence suggests that DNA damage is more likely to occur at genomic loci with high transcriptional activity. Furthermore, loss of certain RNA processing factors in eukaryotic cells is associated with increased formation of co-transcriptional RNA:DNA hybrid structures known as R-loops, resulting in double-strand breaks (DSBs) and DNA damage. However, the molecular mechanisms by which R-loop structures ultimately lead to DNA breaks and genome instability are not well understood. In this review, we summarize the current knowledge about the formation, recognition and processing of RNA:DNA hybrids, and discuss possible mechanisms by which these structures contribute to DNA damage and genome instability in the cell. PMID:24746923

  7. The contribution of co-transcriptional RNA:DNA hybrid structures to DNA damage and genome instability.

    Science.gov (United States)

    Hamperl, Stephan; Cimprich, Karlene A

    2014-07-01

    Accurate DNA replication and DNA repair are crucial for the maintenance of genome stability, and it is generally accepted that failure of these processes is a major source of DNA damage in cells. Intriguingly, recent evidence suggests that DNA damage is more likely to occur at genomic loci with high transcriptional activity. Furthermore, loss of certain RNA processing factors in eukaryotic cells is associated with increased formation of co-transcriptional RNA:DNA hybrid structures known as R-loops, resulting in double-strand breaks (DSBs) and DNA damage. However, the molecular mechanisms by which R-loop structures ultimately lead to DNA breaks and genome instability are not well understood. In this review, we summarize the current knowledge about the formation, recognition and processing of RNA:DNA hybrids, and discuss possible mechanisms by which these structures contribute to DNA damage and genome instability in the cell. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Structural Studies of RNA Helicases Involved in Eukaryotic Pre-mRNA Splicing, Ribosome Biogenesis, and Translation Initiation

    DEFF Research Database (Denmark)

    He, Yangzi

    Ribonucleic acids (RNAs) take centre stage in gene expression. In eukaryotes, most RNAs are transcribed as precursors, and these precursors are co- or post-transcriptionally processed and assemble with particular proteins to form ribonucleoproteins (RNPs). Mature RNPs participate in various gene...... and ligates the neighbouring exons to generate mature mRNAs. Prp43 is an RNA helicase of the DEAH/RHA family. In yeast, once mRNAs are released, Prp43 catalyzes the disassembly of spliceosomes. The 18S, 5.8S and 25S rRNAs are transcribed as a single polycistronic transcript—the 35S pre-rRNA....... It is nucleolytically cleaved and chemically modified to generate mature rRNAs, which assemble with ribosomal proteins to form the ribosome. Prp43 is required for the processing of the 18S rRNA. Using X-ray crystallography, I determined a high resolution structure of Prp43 bound to ADP, the first structure of a DEAH...

  9. Structural genomics is the largest contributor of novel structural leverage.

    Science.gov (United States)

    Nair, Rajesh; Liu, Jinfeng; Soong, Ta-Tsen; Acton, Thomas B; Everett, John K; Kouranov, Andrei; Fiser, Andras; Godzik, Adam; Jaroszewski, Lukasz; Orengo, Christine; Montelione, Gaetano T; Rost, Burkhard

    2009-04-01

    The Protein Structure Initiative (PSI) at the US National Institutes of Health (NIH) is funding four large-scale centers for structural genomics (SG). These centers systematically target many large families without structural coverage, as well as very large families with inadequate structural coverage. Here, we report a few simple metrics that demonstrate how successfully these efforts optimize structural coverage: while the PSI-2 (2005-now) contributed more than 8% of all structures deposited into the PDB, it contributed over 20% of all novel structures (i.e. structures for protein sequences with no structural representative in the PDB on the date of deposition). The structural coverage of the protein universe represented by today's UniProt (v12.8) has increased linearly from 1992 to 2008; structural genomics has contributed significantly to the maintenance of this growth rate. Success in increasing novel leverage (defined in Liu et al. in Nat Biotechnol 25:849-851, 2007) has resulted from systematic targeting of large families. PSI's per-structure contribution to novel leverage was over 4-fold higher than that for non-PSI structural biology efforts during the past 8 years. If the success of the PSI continues, it may just take another approximately 15 years to cover most sequences in the current UniProt database.
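
    The two percentages quoted in this abstract can be turned into a rough per-structure comparison. The sketch below is only back-of-the-envelope arithmetic on the quoted lower bounds (8% of all structures, 20% of novel structures); it counts novel structures, not the "novel leverage" metric of Liu et al., so it is not expected to reproduce the 4-fold figure. The function name `per_structure_enrichment` is hypothetical.

```python
def per_structure_enrichment(share_of_all, share_of_novel):
    """Novel structures per deposited structure for one group, relative to everyone else."""
    group_rate = share_of_novel / share_of_all              # novel output per unit of deposits
    rest_rate = (1 - share_of_novel) / (1 - share_of_all)   # same rate for all other depositors
    return group_rate / rest_rate

# Lower bounds quoted in the abstract: >8% of all structures, >20% of novel ones.
# (0.20 / 0.08) / (0.80 / 0.92) is close to 2.9.
print(per_structure_enrichment(0.08, 0.20))
```

    On these rounded inputs a PSI structure was a bit under three times as likely to be novel as a non-PSI structure; the larger factor reported for novel leverage reflects a different, family-size-weighted measure.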

  10. Long-Range Correlations in Genomic DNA: A Signature of the Nucleosomal Structure

    Science.gov (United States)

    Audit, B.; Thermes, C.; Vaillant, C.; D'Aubenton-Carafa, Y.; Muzy, J. F.; Arneodo, A.

    2001-03-01

    We use the "wavelet transform microscope" to carry out a comparative statistical analysis of DNA bending profiles and of the corresponding DNA texts. In the three kingdoms, one reveals on both signals a characteristic scale of 100-200 bp that separates two different regimes of power-law correlations (PLC). In the small-scale regime, PLC are observed in eukaryotic, in double-strand DNA viral, and in archaeal genomes, which contrasts with their total absence in the genomes of eubacteria and their viruses. This strongly suggests that small-scale PLC are related to the mechanisms underlying the wrapping of DNA in the nucleosomal structure. We further speculate that the large scale PLC are the signature of the higher-order structure and dynamics of chromatin.

  11. The structural code of cyanobacterial genomes.

    Science.gov (United States)

    Lehmann, Robert; Machné, Rainer; Herzel, Hanspeter

    2014-08-01

    A periodic bias in nucleotide frequency with a period of about 11 bp is characteristic for bacterial genomes. This signal is commonly interpreted to relate to the helical pitch of negatively supercoiled DNA. Functions in supercoiling-dependent RNA transcription or as a 'structural code' for DNA packaging have been suggested. Cyanobacterial genomes showed especially strong periodic signals and, on the other hand, DNA supercoiling and supercoiling-dependent transcription are highly dynamic and underlie circadian rhythms of these phototrophic bacteria. Focusing on this phylum and dinucleotides, we find that a minimal motif of AT-tracts (AT2) yields the strongest signal. Strong genome-wide periodicity is ancestral to a clade of unicellular and polyploid species but lost upon morphological transitions into two baeocyte-forming and a symbiotic species. The signal is intermediate in heterocystous species and weak in monoploid picocyanobacteria. A pronounced 'structural code' may support efficient nucleoid condensation and segregation in polyploid cells. The major source of the AT2 signal are protein-coding regions, where it is encoded preferentially in the first and third codon positions. The signal shows only few relations to supercoiling-dependent and diurnal RNA transcription in Synechocystis sp. PCC 6803. Strong and specific signals in two distinct transposons suggest roles in transposase transcription and transpososome formation. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
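
    One simple way to look for a periodic signal of the kind described in this abstract is to histogram the distances between occurrences of a short motif. The sketch below is an illustration only: the abstract does not describe the authors' actual method, the function name `distance_counts` is hypothetical, the motif "AA" is a stand-in, and the input is a synthetic sequence rather than a genome.

```python
from collections import Counter

def distance_counts(sequence, motif="AA", max_distance=40):
    """Count pairs of motif occurrences separated by each distance up to max_distance."""
    # Start positions of every (possibly overlapping) motif occurrence, in ascending order.
    positions = [i for i in range(len(sequence) - len(motif) + 1)
                 if sequence.startswith(motif, i)]
    counts = Counter()
    for a_index, a in enumerate(positions):
        for b in positions[a_index + 1:]:
            d = b - a
            if d > max_distance:
                break  # positions are sorted, so later ones are even farther away
            counts[d] += 1
    return counts

# Synthetic sequence with "AA" planted every 11 bp (illustrative only, not real data).
toy = ("AA" + "CGTCGTCGT") * 20
counts = distance_counts(toy)
print(counts.most_common(3))  # [(11, 19), (22, 18), (33, 17)]
```

    In a real genome the peaks at multiples of the period would sit on top of a noisy background, so the histogram is usually compared against a shuffled-sequence control or examined through its Fourier spectrum.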

  12. Comparative genome analysis of three eukaryotic parasites with differing abilities to transform leukocytes reveals key mediators of theileria-induced leukocyte transformation

    KAUST Repository

    Hayashida, Kyoko

    2012-09-04

    We sequenced the genome of Theileria orientalis, a tick-borne apicomplexan protozoan parasite of cattle. The focus of this study was a comparative genome analysis of T. orientalis relative to other highly pathogenic Theileria species, T. parva and T. annulata. T. parva and T. annulata induce transformation of infected cells of lymphocyte or macrophage/monocyte lineages; in contrast, T. orientalis does not induce uncontrolled proliferation of infected leukocytes and multiplies predominantly within infected erythrocytes. While synteny across homologous chromosomes of the three Theileria species was found to be well conserved overall, subtelomeric structures were found to differ substantially, as T. orientalis lacks the large tandemly arrayed subtelomere-encoded variable secreted protein-encoding gene family. Moreover, expansion of particular gene families by gene duplication was found in the genomes of the two transforming Theileria species, most notably, the TashAT/TpHN and Tar/Tpr gene families. Gene families that are present only in T. parva and T. annulata and not in T. orientalis, Babesia bovis, or Plasmodium were also identified. Identification of differences between the genome sequences of Theileria species with different abilities to transform and immortalize bovine leukocytes will provide insight into proteins and mechanisms that have evolved to induce and regulate this process. The T. orientalis genome database is available at http://totdb.czc.hokudai.ac.jp/. 2012 Hayashida et al. T.

  13. Structural genomics of histone tail recognition.

    Science.gov (United States)

    Wang, Minghua; Mok, Man Wai; Harper, Hong; Lee, Wen Hwa; Min, Jinrong; Knapp, Stefan; Oppermann, Udo; Marsden, Brian; Schapira, Matthieu

    2010-10-15

    The structural genomics of histone tail recognition web server is an open access resource that presents within mini articles all publicly available experimental structures of histone tails in complex with human proteins. Each article is composed of interactive 3D slides that dissect the structural mechanism underlying the recognition of specific sequences and histone marks. A concise text html-linked to interactive graphics guides the reader through the main features of the interaction. This resource can be used to analyze and compare binding modes across multiple histone recognition modules, to evaluate the chemical tractability of binding sites involved in epigenetic signaling and to design small molecule inhibitors. Availability: http://www.thesgc.org/resources/histone_tails/ Contact: matthieu.schapira@utoronto.ca Supplementary information: Supplementary data are available at Bioinformatics online.

  14. Structural and Functional Genomics of Tomato

    Science.gov (United States)

    Barone, Amalia; Chiusano, Maria Luisa; Ercolano, Maria Raffaella; Giuliano, Giovanni; Grandillo, Silvana; Frusciante, Luigi

    2008-01-01

    Tomato (Solanum lycopersicum L.) is the most intensively investigated Solanaceous species both in genetic and genomics studies. It is a diploid species with a haploid set of 12 chromosomes and a small genome (950 Mb). Based on the detailed knowledge on tomato structural genomics, the sequencing of the euchromatic regions started in the year 2005 as a common effort of different countries. The manuscript focuses on markers used for tomato, on mapping efforts mainly based on exploitation of natural biodiversity, and it gives an updated report on the international sequencing activities. The principal tools developed to explore the function of tomato genes are also summarized, including mutagenesis, genetic transformation, and transcriptome analysis. The current progress in bioinformatic strategies available to manage the overwhelming amount of data generated from different tomato “omics” approaches is reported, and emphasis is given to the effort of producing a computational workbench for the analysis of the organization, as well as the functionality and evolution of the Solanaceae family. PMID:18317508

  15. Metabolic symbiosis at the origin of eukaryotes.

    Science.gov (United States)

    López-García, P; Moreira, D

    1999-03-01

    Thirty years after Margulis revived the endosymbiosis theory for the origin of mitochondria and chloroplasts, two novel symbiosis hypotheses for the origin of eukaryotes have been put forward. Both propose that eukaryotes arose through metabolic symbiosis (syntrophy) between eubacteria and methanogenic Archaea. They also propose that this was mediated by interspecies hydrogen transfer and that, initially, mitochondria were anaerobic. These hypotheses explain the mosaic character of eukaryotes (i.e. an archaeal-like genetic machinery and a eubacterial-like metabolism), as well as distinct eukaryotic characteristics (which are proposed to be products of symbiosis). Combined data from comparative genomics, microbial ecology and the fossil record should help to test their validity.

  16. Structural Molecular Components of Septate Junctions in Cnidarians Point to the Origin of Epithelial Junctions in Eukaryotes

    KAUST Repository

    Ganot, P.

    2014-09-21

    Septate junctions (SJs) ensure barrier properties and control paracellular diffusion of solutes across epithelia in invertebrates. However, the origin and evolution of their molecular constituents in Metazoa have not been firmly established. Here, we investigated the genomes of early branching metazoan representatives to reconstruct the phylogeny of the molecular components of SJs. Although Claudins and SJ cytoplasmic adaptor components appeared successively throughout metazoan evolution, the structural components of SJs arose at the time of Placozoa/Cnidaria/Bilateria radiation. We also show that in the scleractinian coral Stylophora pistillata, the structural SJ component Neurexin IV colocalizes with the cortical actin network at the apical border of the cells, at the place of SJs. We propose a model for SJ components in Cnidaria. Moreover, our study reveals an unanticipated diversity of SJ structural component variants in cnidarians. This diversity correlates with gene-specific expression in calcifying and noncalcifying tissues, suggesting specific paracellular pathways across the cell layers of these diploblastic animals.

  17. The RCSB PDB information portal for structural genomics.

    Science.gov (United States)

    Kouranov, Andrei; Xie, Lei; de la Cruz, Joanna; Chen, Li; Westbrook, John; Bourne, Philip E; Berman, Helen M

    2006-01-01

    The RCSB Protein Data Bank (PDB) offers online tools, summary reports and target information related to the worldwide structural genomics initiatives from its portal at http://sg.pdb.org. There are currently three components to this site: Structural Genomics Initiatives contains information and links on each structural genomics site, including progress reports, target lists, target status, targets in the PDB and level of sequence redundancy; Targets provides combined target information, protocols and other data associated with protein structure determination; and Structures offers an assessment of the progress of structural genomics based on the functional coverage of the human genome by PDB structures, structural genomics targets and homology models. Functional coverage can be examined according to enzyme classification, gene ontology (biological process, cell component and molecular function) and disease.

  18. Patterns of prokaryotic lateral gene transfers affecting parasitic microbial eukaryotes

    DEFF Research Database (Denmark)

    Alsmark, Cecilia; Foster, Peter G; Sicheritz-Pontén, Thomas

    2013-01-01

    BACKGROUND: The influence of lateral gene transfer on gene origins and biology in eukaryotes is poorly understood compared with those of prokaryotes. A number of independent investigations focusing on specific genes, individual genomes, or specific functional categories from various eukaryotes have...... indicated that lateral gene transfer does indeed affect eukaryotic genomes. However, the lack of common methodology and criteria in these studies makes it difficult to assess the general importance and influence of lateral gene transfer on eukaryotic genome evolution. RESULTS: We used a phylogenomic...... approach to systematically investigate lateral gene transfer affecting the proteomes of thirteen, mainly parasitic, microbial eukaryotes, representing four of the six eukaryotic super-groups. All of the genomes investigated have been significantly affected by prokaryote-to-eukaryote lateral gene transfers...

  19. LOCnet and LOCtarget: sub-cellular localization for structural genomics targets

    Science.gov (United States)

    Nair, Rajesh; Rost, Burkhard

    2004-01-01

    LOCtarget is a web server and database that predicts and annotates sub-cellular localization for structural genomics targets; LOCnet is one of the methods used in LOCtarget that can predict sub-cellular localization for all eukaryotic and prokaryotic proteins. Targets are taken from the central registration database for structural genomics, namely, TargetDB. LOCtarget predicts localization through a combination of four different methods: known nuclear localization signals (PredictNLS), homology-based transfer of experimental annotations (LOChom), inference through automatic text analysis of SWISS-PROT keywords (LOCkey) and de novo prediction through a system of neural networks (LOCnet). Additionally, we report predictions from SignalP. The final prediction is based on the method with the highest confidence. The web server can be used to predict sub-cellular localization of proteins from their amino acid sequence. The LOCtarget database currently contains localization predictions for all eukaryotic proteins from TargetDB and is updated every week. The server is available at http://www.rostlab.org/services/LOCtarget/. PMID:15215440
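
    The selection rule stated in this abstract (the final prediction comes from the method with the highest confidence) can be sketched in a few lines. The four method names are taken from the abstract; everything else is an assumption for illustration: the `Prediction` type, the `final_prediction` helper, the confidence values, and the premise that confidences from different methods share a comparable scale.

```python
from typing import NamedTuple, Optional, Sequence

class Prediction(NamedTuple):
    method: str
    localization: str
    confidence: float  # assumed to be comparable across methods

def final_prediction(predictions: Sequence[Prediction]) -> Optional[Prediction]:
    """Return the prediction with the highest confidence, or None if there are none."""
    return max(predictions, key=lambda p: p.confidence, default=None)

# Hypothetical outputs for one target; the numbers are made up for illustration.
candidates = [
    Prediction("PredictNLS", "nucleus", 0.62),
    Prediction("LOChom", "cytoplasm", 0.91),
    Prediction("LOCkey", "cytoplasm", 0.75),
    Prediction("LOCnet", "nucleus", 0.58),
]

best = final_prediction(candidates)
print(best.method, best.localization)  # LOChom cytoplasm
```

    A winner-takes-all rule like this is simple to audit, since each final annotation can be traced to exactly one method, but it only works well if the per-method confidences are calibrated against each other.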

  20. Genetic Architect: Discovering Genomic Structure with Learned Neural Architectures

    OpenAIRE

    Deming, Laura; Targ, Sasha; Sauder, Nate; Almeida, Diogo; Ye, Chun Jimmie

    2016-01-01

    Each human genome is a 3 billion base pair set of encoding instructions. Decoding the genome using deep learning fundamentally differs from most tasks, as we do not know the full structure of the data and therefore cannot design architectures to suit it. As such, architectures that fit the structure of genomics should be learned not prescribed. Here, we develop a novel search algorithm, applicable across domains, that discovers an optimal architecture which simultaneously learns general genom...

  1. RNase MRP and the RNA processing cascade in the eukaryotic ancestor.

    Science.gov (United States)

    Woodhams, Michael D; Stadler, Peter F; Penny, David; Collins, Lesley J

    2007-02-08

    Within eukaryotes there is a complex cascade of RNA-based macromolecules that process other RNA molecules, especially mRNA, tRNA and rRNA. An example is RNase MRP processing ribosomal RNA (rRNA) in ribosome biogenesis. One hypothesis is that this complexity was present early in eukaryotic evolution; an alternative is that an initial simpler network later gained complexity by gene duplication in lineages that led to animals, fungi and plants. Recently there has been a rapid increase in support for the complexity-early theory because the vast majority of these RNA-processing reactions are found throughout eukaryotes, and thus were likely to be present in the last common ancestor of living eukaryotes, herein called the Eukaryotic Ancestor. We present an overview of the RNA processing cascade in the Eukaryotic Ancestor and investigate, in particular, RNase MRP, which was previously thought to have evolved later in eukaryotes due to its apparent limited distribution in fungi, animals and plants. Recent publications, as well as our own genomic searches, find previously unknown RNase MRP RNAs, indicating that RNase MRP has a wide distribution in eukaryotes. Combining secondary structure and promoter region analysis of RNAs for RNase MRP, along with analysis of the target substrate (rRNA), allows us to discuss this distribution in the light of eukaryotic evolution. We conclude that RNase MRP can now be placed in the RNA-processing cascade of the Eukaryotic Ancestor, highlighting the complexity of RNA-processing in early eukaryotes. Promoter analyses of MRP-RNA suggest that regulation of the critical processes of rRNA cleavage can vary, showing that even these key cellular processes (for which we expect high conservation) show some species-specific variability. We present our consensus MRP-RNA secondary structure as a useful model for further searches.

  2. Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics

    Directory of Open Access Journals (Sweden)

    Brunham Robert C

    2004-07-01

    Background: We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, maintainability and safety work to model time-to-failure of mechanical devices, mechanisms, building constructions and equipment. Results: We have found that the Weibull function describes protein fold distribution within and among genomes more accurately than conventional power functions which have been used in a number of structural genomic studies reported to date. It has also been found that the Weibull reliability parameter β for protein fold distributions varies between genomes and may reflect differences in rates of gene duplication in evolutionary history of organisms. Conclusions: The results of this work demonstrate that reliability analysis can provide useful insights and testable predictions in the fields of comparative and structural genomics.
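    For readers unfamiliar with the Weibull function, the sketch below evaluates the standard two-parameter form used in reliability theory. The abstract does not state which parameterization the authors fitted, so the shape parameter `beta` (matching the β mentioned above) and the scale parameter `eta` are assumptions about notation, and the function names are illustrative.

```python
import math


def weibull_cdf(x: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull CDF: F(x) = 1 - exp(-(x / eta) ** beta)."""
    if beta <= 0 or eta <= 0:
        raise ValueError("beta and eta must be positive")
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / eta) ** beta))


def weibull_reliability(x: float, beta: float, eta: float) -> float:
    """Reliability (survival) function: R(x) = 1 - F(x)."""
    return 1.0 - weibull_cdf(x, beta, eta)


# With beta = 1 the Weibull form reduces to an exponential distribution;
# at x = eta the reliability equals exp(-1) whatever the value of beta.
print(weibull_reliability(2.0, 3.0, 2.0))
```

    In this form β controls the shape of the curve, which is presumably why differences in β between genomes are read as differences in how fold occurrences are distributed.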

  3. Structure of eukaryotic CMG helicase at a replication fork and implications to replisome architecture and origin initiation.

    Science.gov (United States)

    Georgescu, Roxana; Yuan, Zuanning; Bai, Lin; de Luna Almeida Santos, Ruda; Sun, Jingchuan; Zhang, Dan; Yurieva, Olga; Li, Huilin; O'Donnell, Michael E

    2017-01-31

    The eukaryotic CMG (Cdc45, Mcm2-7, GINS) helicase consists of the Mcm2-7 hexameric ring along with five accessory factors. The Mcm2-7 heterohexamer, like other hexameric helicases, is shaped like a ring with two tiers, an N-tier ring composed of the N-terminal domains, and a C-tier of C-terminal domains; the C-tier contains the motor. In principle, either tier could translocate ahead of the other during movement on DNA. We have used cryo-EM single-particle 3D reconstruction to solve the structure of CMG in complex with a DNA fork. The duplex stem penetrates into the central channel of the N-tier and the unwound leading single-strand DNA traverses the channel through the N-tier into the C-tier motor, 5'-3' through CMG. Therefore, the N-tier ring is pushed ahead by the C-tier ring during CMG translocation, opposite the currently accepted polarity. The polarity of the N-tier ahead of the C-tier places the leading Pol ε below CMG and Pol α-primase at the top of CMG at the replication fork. Surprisingly, the new N-tier to C-tier polarity of translocation reveals an unforeseen quality-control mechanism at the origin. Thus, upon assembly of head-to-head CMGs that encircle double-stranded DNA at the origin, the two CMGs must pass one another to leave the origin and both must remodel onto opposite strands of single-stranded DNA to do so. We propose that head-to-head motors may generate energy that underlies initial melting at the origin.

  4. The RCSB PDB information portal for structural genomics

    OpenAIRE

    Kouranov, Andrei; Xie, Lei; de la Cruz, Joanna; Chen, Li; Westbrook, John; Bourne, Philip E.; Berman, Helen M.

    2005-01-01

    The RCSB Protein Data Bank (PDB) offers online tools, summary reports and target information related to the worldwide structural genomics initiatives from its portal at . There are currently three components to this site: Structural Genomics Initiatives contains information and links on each structural genomics site, including progress reports, target lists, target status, targets in the PDB and level of sequence redundancy; Targets provides combined target information, protocols and other da...

  5. Local chromatin structure of heterochromatin regulates repeated DNA stability, nucleolus structure, and genome integrity

    Energy Technology Data Exchange (ETDEWEB)

    Peng, Jamy C. [Univ. of California, Berkeley, CA (United States)]

    2007-01-01

    Heterochromatin constitutes a significant portion of the genome in higher eukaryotes; approximately 30% in Drosophila and human. Heterochromatin contains a high repeat DNA content and a low density of protein-encoding genes. In contrast, euchromatin is composed mostly of unique sequences and contains the majority of single-copy genes. Genetic and cytological studies demonstrated that heterochromatin exhibits regulatory roles in chromosome organization, centromere function and telomere protection. As an epigenetically regulated structure, heterochromatin formation is not defined by any DNA sequence consensus. Heterochromatin is characterized by its association with nucleosomes containing methylated-lysine 9 of histone H3 (H3K9me), heterochromatin protein 1 (HP1) that binds H3K9me, and Su(var)3-9, which methylates H3K9 and binds HP1. Heterochromatin formation and functions are influenced by HP1, Su(var)3-9, and the RNA interference (RNAi) pathway. My thesis project investigates how heterochromatin formation and function impact nuclear architecture, repeated DNA organization, and genome stability in Drosophila melanogaster. H3K9me-based chromatin reduces extrachromosomal DNA formation; most likely by restricting the access of repair machineries to repeated DNAs. Reducing extrachromosomal ribosomal DNA stabilizes rDNA repeats and the nucleolus structure. H3K9me-based chromatin also inhibits DNA damage in heterochromatin. Cells with compromised heterochromatin structure, due to Su(var)3-9 or dcr-2 (a component of the RNAi pathway) mutations, display severe DNA damage in heterochromatin compared to wild type. In these mutant cells, accumulated DNA damage leads to chromosomal defects such as translocations, defective DNA repair response, and activation of the G2-M DNA repair and mitotic checkpoints that ensure cellular and animal viability. My thesis research suggests that DNA replication, repair, and recombination mechanisms in heterochromatin differ from those in

  6. [Methylation of adenine residues in DNA of eukaryotes].

    Science.gov (United States)

    Baniushin, B F

    2005-01-01

    Like in bacteria, DNA in these organisms is subjected to enzymatic modification (methylation) at both adenine and cytosine residues. There is indirect evidence that adenine DNA methylation also takes place in animals. In plants, m6A was detected in total, mitochondrial and nuclear DNAs; in plants, one and the same gene (DRM2) can be methylated at both adenine and cytosine residues. ORFs homologous to bacterial adenine DNA-methyltransferases are present in the nuclear DNA of protozoa, yeasts, insects, nematodes, higher plants, vertebrates and other eukaryotes. Thus, adenine DNA-methyltransferases can be found in various evolutionarily distant eukaryotes. The first N6-adenine DNA-methyltransferase (wadmtase) of higher eukaryotes was isolated from the vacuolar fraction of vesicles obtained from aging wheat coleoptiles; in the presence of S-adenosyl-L-methionine, this Mg2+-, Ca2+-dependent enzyme de novo methylates the first adenine residue in the TGATCA sequence in single- and double-stranded DNA, but it prefers single-stranded DNA structures. Adenine DNA methylation in eukaryotes seems to be involved in the regulation of both gene expression and DNA replication, including replication of mitochondrial DNA. It can control the persistence of foreign DNA in a cell and seems to be an element of the R-M system in plants. Thus, in the eukaryotic cell there are at least two different systems of enzymatic DNA methylation (adenine and cytosine) and a special type of regulation of gene functioning based on the combinatory hierarchy of these interdependent genome modifications.

  7. The COG database: an updated version includes eukaryotes

    Directory of Open Access Journals (Sweden)

    Sverdlov Alexander V

    2003-09-01

    Background: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. Results: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set. This conserved portion of the
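    The idea of a conserved core defined by phyletic patterns can be illustrated with a short sketch. It assumes that a phyletic pattern is simply the set of species in which a cluster is represented; the function name, the cluster identifiers and the species codes are hypothetical and are not taken from the COG/KOG database.

```python
from typing import Dict, Iterable, List, Set


def conserved_core(phyletic_patterns: Dict[str, Set[str]],
                   species: Iterable[str]) -> List[str]:
    """Return the clusters that are represented in every analyzed species."""
    all_species = set(species)
    return sorted(
        cluster
        for cluster, present_in in phyletic_patterns.items()
        if all_species <= present_in
    )


# Toy data: three species and three hypothetical clusters.
species = ["Hsa", "Sce", "Ath"]
patterns = {
    "KOG_x": {"Hsa", "Sce", "Ath"},
    "KOG_y": {"Hsa", "Sce"},
    "KOG_z": {"Hsa", "Sce", "Ath"},
}
core = conserved_core(patterns, species)
core_fraction = len(core) / len(patterns)
print(core, core_fraction)
```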

  8. Insight into the Recent Genome Duplication of the Halophilic Yeast Hortaea werneckii: Combining an Improved Genome with Gene Expression and Chromatin Structure

    Directory of Open Access Journals (Sweden)

    Sunita Sinha

    2017-07-01

    Extremophilic organisms demonstrate the flexibility and adaptability of basic biological processes by highlighting how cell physiology adapts to environmental extremes. Few eukaryotic extremophiles have been well studied and only a small number are amenable to laboratory cultivation and manipulation. A detailed characterization of the genome architecture of such organisms is important to illuminate how they adapt to environmental stresses. One excellent example of a fungal extremophile is the halophile Hortaea werneckii (Pezizomycotina, Dothideomycetes, Capnodiales), a yeast-like fungus able to thrive at near-saturating concentrations of sodium chloride and which is also tolerant to both UV irradiation and desiccation. Given its unique lifestyle and its remarkably recent whole genome duplication, H. werneckii provides opportunities for testing the role of genome duplications and adaptability to extreme environments. We previously assembled the genome of H. werneckii using short-read sequencing technology and found a remarkable degree of gene duplication. Technology limitations, however, precluded high-confidence annotation of the entire genome. We therefore revisited the H. werneckii genome using long-read, single-molecule sequencing and provide an improved genome assembly which, combined with transcriptome and nucleosome analysis, provides a useful resource for fungal halophile genomics. Remarkably, the ∼50 Mb H. werneckii genome contains 15,974 genes of which 95% (7608) are duplicates formed by a recent whole genome duplication (WGD), with an average of 5% protein sequence divergence between them. We found that the WGD is extraordinarily recent, and compared to Saccharomyces cerevisiae, the majority of the genome’s ohnologs have not diverged at the level of gene expression or chromatin structure.

  9. Comparative genomics of Eucalyptus and Corymbia reveals low rates of genome structural rearrangement.

    Science.gov (United States)

    Butler, J B; Vaillancourt, R E; Potts, B M; Lee, D J; King, G J; Baten, A; Shepherd, M; Freeman, J S

    2017-05-22

    Previous studies suggest genome structure is largely conserved between Eucalyptus species. However, it is unknown if this conservation extends to more divergent eucalypt taxa. We performed comparative genomics between the eucalypt genera Eucalyptus and Corymbia. Our results will facilitate transfer of genomic information between these important taxa and provide further insights into the rate of structural change in tree genomes. We constructed three high density linkage maps for two Corymbia species (Corymbia citriodora subsp. variegata and Corymbia torelliana) which were used to compare genome structure between both species and Eucalyptus grandis. Genome structure was highly conserved between the Corymbia species. However, the comparison of Corymbia and E. grandis suggests large (from 1-13 MB) intra-chromosomal rearrangements have occurred on seven of the 11 chromosomes. Most rearrangements were supported through comparisons of the three independent Corymbia maps to the E. grandis genome sequence, and to other independently constructed Eucalyptus linkage maps. These are the first large scale chromosomal rearrangements discovered between eucalypts. Nonetheless, in the general context of plants, the genomic structure of the two genera was remarkably conserved; adding to a growing body of evidence that conservation of genome structure is common amongst woody angiosperms.

  10. Child Development and Structural Variation in the Human Genome

    Science.gov (United States)

    Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

    2013-01-01

    Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…

  11. Structural flexibility in the Burkholderia mallei genome.

    Science.gov (United States)

    Nierman, William C; DeShazer, David; Kim, H Stanley; Tettelin, Herve; Nelson, Karen E; Feldblyum, Tamara; Ulrich, Ricky L; Ronning, Catherine M; Brinkac, Lauren M; Daugherty, Sean C; Davidsen, Tanja D; Deboy, Robert T; Dimitrov, George; Dodson, Robert J; Durkin, A Scott; Gwinn, Michelle L; Haft, Daniel H; Khouri, Hoda; Kolonay, James F; Madupu, Ramana; Mohammoud, Yasmin; Nelson, William C; Radune, Diana; Romero, Claudia M; Sarria, Saul; Selengut, Jeremy; Shamblin, Christine; Sullivan, Steven A; White, Owen; Yu, Yan; Zafar, Nikhat; Zhou, Liwei; Fraser, Claire M

    2004-09-28

    The complete genome sequence of Burkholderia mallei ATCC 23344 provides insight into this highly infectious bacterium's pathogenicity and evolutionary history. B. mallei, the etiologic agent of glanders, has come under renewed scientific investigation as a result of recent concerns about its past and potential future use as a biological weapon. Genome analysis identified a number of putative virulence factors whose function was supported by comparative genome hybridization and expression profiling of the bacterium in hamster liver in vivo. The genome contains numerous insertion sequence elements that have mediated extensive deletions and rearrangements of the genome relative to Burkholderia pseudomallei. The genome also contains a vast number (>12,000) of simple sequence repeats. Variation in simple sequence repeats in key genes can provide a mechanism for generating antigenic variation that may account for the mammalian host's inability to mount a durable adaptive immune response to a B. mallei infection.

  12. Structural flexibility in the Burkholderia mallei genome

    OpenAIRE

    William C. Nierman; DeShazer, David; Kim, H Stanley; Tettelin, Herve; Nelson, Karen E.; Feldblyum, Tamara; Ulrich, Ricky L.; Ronning, Catherine M.; Brinkac, Lauren M.; Daugherty, Sean C.; Davidsen, Tanja D.; DeBoy, Robert T.; Dimitrov, George; Dodson, Robert J.; Durkin, A. Scott

    2004-01-01

    The complete genome sequence of Burkholderia mallei ATCC 23344 provides insight into this highly infectious bacterium's pathogenicity and evolutionary history. B. mallei, the etiologic agent of glanders, has come under renewed scientific investigation as a result of recent concerns about its past and potential future use as a biological weapon. Genome analysis identified a number of putative virulence factors whose function was supported by comparative genome hybridization and expression prof...

  13. Genome defense against exogenous nucleic acids in eukaryotes by non-coding DNA occurs through CRISPR-like mechanisms in the cytosol and the bodyguard protection in the nucleus.

    Science.gov (United States)

    Qiu, Guo-Hua

    2016-01-01

    In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein about the presence of an additional line of defense formed by small RNAs in the cytosol in addition to their bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense. Copyright © 2016 Elsevier B.V. All rights reserved.

  14. High-throughput computational and experimental techniques in structural genomics.

    Science.gov (United States)

    Chance, Mark R; Fiser, Andras; Sali, Andrej; Pieper, Ursula; Eswar, Narayanan; Xu, Guiping; Fajardo, J Eduardo; Radhakannan, Thirumuruhan; Marinkovic, Nebojsa

    2004-10-01

    Structural genomics has as its goal the provision of structural information for all possible ORF sequences through a combination of experimental and computational approaches. The access to genome sequences and cloning resources from an ever-widening array of organisms is driving high-throughput structural studies by the New York Structural Genomics Research Consortium. In this report, we outline the progress of the Consortium in establishing its pipeline for structural genomics, and some of the experimental and bioinformatics efforts leading to structural annotation of proteins. The Consortium has established a pipeline for structural biology studies, automated modeling of ORF sequences using solved (template) structures, and a novel high-throughput approach (metallomics) to examining the metal binding to purified protein targets. The Consortium has so far produced 493 purified proteins from >1077 expression vectors. A total of 95 have resulted in crystal structures, and 81 are deposited in the Protein Data Bank (PDB). Comparative modeling of these structures has generated >40,000 structural models. We also initiated a high-throughput metal analysis of the purified proteins; this has determined that 10%-15% of the targets contain a stoichiometric structural or catalytic transition metal atom. The progress of the structural genomics centers in the U.S. and around the world suggests that the goal of providing useful structural information on most all ORF domains will be realized. This projected resource will provide structural biology information important to understanding the function of most proteins of the cell.

  15. Gorilla genome structural variation reveals evolutionary parallelisms with chimpanzee

    Science.gov (United States)

    Ventura, Mario; Catacchio, Claudia R.; Alkan, Can; Marques-Bonet, Tomas; Sajjadian, Saba; Graves, Tina A.; Hormozdiari, Fereydoun; Navarro, Arcadi; Malig, Maika; Baker, Carl; Lee, Choli; Turner, Emily H.; Chen, Lin; Kidd, Jeffrey M.; Archidiacono, Nicoletta; Shendure, Jay; Wilson, Richard K.; Eichler, Evan E.

    2011-01-01

    Structural variation has played an important role in the evolutionary restructuring of human and great ape genomes. Recent analyses have suggested that the genomes of chimpanzee and human have been particularly enriched for this form of genetic variation. Here, we set out to assess the extent of structural variation in the gorilla lineage by generating 10-fold genomic sequence coverage from a western lowland gorilla and integrating these data into a physical and cytogenetic framework of structural variation. We discovered and validated over 7665 structural changes within the gorilla lineage, including sequence resolution of inversions, deletions, duplications, and mobile element insertions. A comparison with human and other ape genomes shows that the gorilla genome has been subjected to the highest rate of segmental duplication. We show that both the gorilla and chimpanzee genomes have experienced independent yet convergent patterns of structural mutation that have not occurred in humans, including the formation of subtelomeric heterochromatic caps, the hyperexpansion of segmental duplications, and bursts of retroviral integrations. Our analysis suggests that the chimpanzee and gorilla genomes are structurally more derived than either orangutan or human genomes. PMID:21685127

  16. Transcriptional consequences of genomic structural aberrations in breast cancer

    OpenAIRE

    Inaki, Koichiro; Hillmer, Axel M.; Ukil, Leena; Yao, Fei; Woo, Xing Yi; Vardy, Leah A.; Zawack, Kelson Folkvard Braaten; Lee, Charlie Wah Heng; Ariyaratne, Pramila Nuwantha; Chan, Yang Sun; Desai, Kartiki Vasant; Bergh, Jonas; Hall, Per; Putti, Thomas Choudary; Ong, Wai Loon

    2011-01-01

    Using a long-span, paired-end deep sequencing strategy, we have comprehensively identified cancer genome rearrangements in eight breast cancer genomes. Herein, we show that 40%–54% of these structural genomic rearrangements result in different forms of fusion transcripts and that 44% are potentially translated. We find that single segmental tandem duplication spanning several genes is a major source of the fusion gene transcripts in both cell lines and primary tumors involving adjacent genes ...

  17. Introductory overview: X-ray absorption spectroscopy and structural genomics.

    Science.gov (United States)

    Ascone, Isabella; Fourme, Roger; Hasnain, S Samar

    2003-01-01

    A special Issue of the Journal is presented, dedicated to biological applications of X-ray absorption spectroscopy (BioXAS) and examining the role of this technique in post-genomic biology. The Issue confirms that BioXAS has come of age and it can be expected to make a significant contribution in the structural genomics effort on metalloproteins, which are estimated to make up about 30% of proteins coded by genomes.

  18. Structural genomics of microbial pathogens – An Indian programme

    OpenAIRE

    Vijayan, M

    2003-01-01

    Structural genomics, simply stated, seeks to determine the structures of all proteins coded by genomes of known sequence, using X-ray crystallography, NMR and bioinformatics. The known principles of protein architecture and the available information on the structural and functional classification of proteins, make this an approachable objective. The early excessive preoccupation with folds has now been substantially overcome. The emphasis is now on the determination of a collection of related...

  19. Functional differentiation of proteins: implications for structural genomics.

    Science.gov (United States)

    Friedberg, Iddo; Godzik, Adam

    2007-04-01

    Structural genomics is a broad initiative of various centers aiming to provide complete coverage of protein structure space. Because it is not feasible to experimentally determine the structures of all proteins, it is generally agreed that the only viable strategy to achieve such coverage is to carefully select specific proteins (targets), determine their structure experimentally, and then use comparative modeling techniques to model the rest. Here we suggest that structural genomics centers refine the structure-driven approach in target selection by adopting function-based criteria. We suggest targeting functionally divergent superfamilies within a given structural fold so that each function receives a structural characterization. We have developed a method to do so, and an itemized survey of several functionally rich folds shows that they are only partially functionally characterized. We call upon structural genomics centers to consider this approach and upon computational biologists to further develop function-based targeting methods.

  20. Structured RNAs and synteny regions in the pig genome

    DEFF Research Database (Denmark)

    Anthon, Christian; Tafer, Hakim; Havgaard, Jakob H

    2014-01-01

    BACKGROUND: Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals. RESULTS: We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure... for Laurasiatheria (pig, cow, dolphin, horse, cat, dog, hedgehog). CONCLUSIONS: We have obtained one of the most comprehensive annotations for structured ncRNAs of a mammalian genome, which is likely to play central roles in both health modelling and production. The core annotation is available in Ensembl 70...

  1. Comparative genomics of the relationship between gene structure and expression

    NARCIS (Netherlands)

    Ren, X.

    2006-01-01

    The relationship between the structure of genes and their expression is a relatively new aspect of genome organization and regulation. With more genome sequences and expression data becoming available, bioinformatics approaches can help the further elucidation of the relationships between gene

  2. Structural genomics: the ultimate approach for rational drug design.

    Science.gov (United States)

    Lundstrom, Kenneth

    2006-10-01

    Structural genomics can be defined as structural biology on a large number of target proteins in parallel. This approach plays an important role in modern structure-based drug design. Although a number of structural genomics initiatives have been initiated, relatively few are associated with integral membrane proteins. This indicates the difficulties in expression, purification, and crystallization of membrane proteins, which has also been confirmed by the existence of some 100 high-resolution structures of membrane proteins among the more than 30,000 entries in public databases. Paradoxically, membrane proteins represent 60-70% of current drug targets and structural knowledge could both improve and speed up the drug discovery process. In order to improve the success rate for structure resolution of membrane proteins structural genomics networks have been established.

  3. Comparative RNA genomics

    DEFF Research Database (Denmark)

    Backofen, Rolf; Gorodkin, Jan; Hofacker, Ivo L.

    2018-01-01

    small RNAs is their reliance of conserved secondary structures. Large scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs...

  4. Update on the Pfam5000 Strategy for Selection of Structural Genomics Targets

    Energy Technology Data Exchange (ETDEWEB)

    Chandonia, John-Marc; Brenner, Steven E.

    2005-06-27

    Structural Genomics is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy that is medically and biologically relevant, of good financial value, and tractable. In 2003, we presented the "Pfam5000" strategy, which involves selecting the 5,000 most important families from the Pfam database as sources for targets. In this update, we show that although both the Pfam database and the number of sequenced genomes have increased in size, the expected benefits of the Pfam5000 strategy have not changed substantially. Solving the structures of proteins from the 5,000 largest Pfam families would allow accurate fold assignment for approximately 65 percent of all prokaryotic proteins (covering 54 percent of residues) and 63 percent of eukaryotic proteins (42 percent of residues). Fewer than 2,300 of the largest families on this list remain to be solved, making the project feasible in the next five years given the expected throughput to be achieved in the production phase of the Protein Structure Initiative.

  5. Optimized guide RNA structure for genome editing via Cas9.

    Science.gov (United States)

    Xu, Jianyong; Lian, Wei; Jia, Yuning; Li, Lingyun; Huang, Zhong

    2017-11-07

    The genome editing tool Cas9-gRNA (guide RNA) has been successfully applied in different cell types and organisms with high efficiency. However, more effort is needed to enhance both efficiency and specificity. In the current study, we optimized the guide RNA structure of the Streptococcus pyogenes CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas (CRISPR-associated) system to improve its genome editing efficiency. Compared with the original functional structure of guide RNA, which is composed of crRNA and tracrRNA, the widely used chimeric gRNA has shorter crRNA and tracrRNA sequences. The deleted RNA sequence could form an extra loop structure, which might enhance the stability of the guide RNA structure and subsequently the genome editing efficiency. We therefore tested the genome editing efficiency of different forms of guide RNA. We found that the chimeric structure of gRNA with the original full length of crRNA and tracrRNA showed higher genome editing efficiency than the conventional chimeric structure or other types of gRNA we tested. Our data therefore uncover a new type of gRNA structure with higher genome editing efficiency.

  6. Structural Genomics of Minimal Organisms: Pipeline and Results

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up, covering target selection, cloning, expression, purification, and ultimately structure determination. At the time of this writing, structural information for more than 93 percent of all soluble proteins of M. genitalium is available. This chapter summarizes the approaches taken by the authors' center.

  7. Structural genomics of minimal organisms: pipeline and results.

    Science.gov (United States)

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2008-01-01

    The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up, covering target selection, cloning, expression, purification, and ultimately structure determination. At the time of this writing, structural information for more than 93% of all soluble proteins of M. genitalium is available. This chapter summarizes the approaches taken by the authors' center.

  8. Visualization of RNA structure models within the Integrative Genomics Viewer.

    Science.gov (United States)

    Busan, Steven; Weeks, Kevin M

    2017-07-01

    Analyses of the interrelationships between RNA structure and function are increasingly important components of genomic studies. The SHAPE-MaP strategy enables accurate RNA structure probing and realistic structure modeling of kilobase-length noncoding RNAs and mRNAs. Existing tools for visualizing RNA structure models are not suitable for efficient analysis of long, structurally heterogeneous RNAs. In addition, structure models are often advantageously interpreted in the context of other experimental data and gene annotation information, for which few tools currently exist. We have developed a module within the widely used and well supported open-source Integrative Genomics Viewer (IGV) that allows visualization of SHAPE and other chemical probing data, including raw reactivities, data-driven structural entropies, and data-constrained base-pair secondary structure models, in context with linear genomic data tracks. We illustrate the usefulness of visualizing RNA structure in the IGV by exploring structure models for a large viral RNA genome, comparing bacterial mRNA structure in cells with its structure under cell- and protein-free conditions, and comparing a noncoding RNA structure modeled using SHAPE data with a base-pairing model inferred through sequence covariation analysis. © 2017 Busan and Weeks; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  9. Energetics and genetics across the prokaryote-eukaryote divide

    Science.gov (United States)

    2011-01-01

    Background: All complex life on Earth is eukaryotic. All eukaryotic cells share a common ancestor that arose just once in four billion years of evolution. Prokaryotes show no tendency to evolve greater morphological complexity, despite their metabolic virtuosity. Here I argue that the eukaryotic cell originated in a unique prokaryotic endosymbiosis, a singular event that transformed the selection pressures acting on both host and endosymbiont. Results: The reductive evolution and specialisation of endosymbionts to mitochondria resulted in an extreme genomic asymmetry, in which the residual mitochondrial genomes enabled the expansion of bioenergetic membranes over several orders of magnitude, overcoming the energetic constraints on prokaryotic genome size, and permitting the host cell genome to expand (in principle) over 200,000-fold. This energetic transformation was permissive, not prescriptive; I suggest that the actual increase in early eukaryotic genome size was driven by a heavy early bombardment of genes and introns from the endosymbiont to the host cell, producing a high mutation rate. Unlike prokaryotes, with lower mutation rates and heavy selection pressure to lose genes, early eukaryotes without genome-size limitations could mask mutations by cell fusion and genome duplication, as in allopolyploidy, giving rise to a proto-sexual cell cycle. The side effect was that a large number of shared eukaryotic basal traits accumulated in the same population, a sexual eukaryotic common ancestor, radically different to any known prokaryote. Conclusions: The combination of massive bioenergetic expansion, release from genome-size constraints, and high mutation rate favoured a protosexual cell cycle and the accumulation of eukaryotic traits. These factors explain the unique origin of eukaryotes, the absence of true evolutionary intermediates, and the evolution of sex in eukaryotes but not prokaryotes. Reviewers: This article was reviewed by Eugene Koonin, William Martin

  10. Evolutionary genomics and population structure of Entamoeba histolytica

    Science.gov (United States)

    Das, Koushik; Ganguly, Sandipan

    2014-01-01

    Amoebiasis caused by the gastrointestinal parasite Entamoeba histolytica has diverse disease outcomes. Study of the genome and evolution of this fascinating parasite will help us to understand the basis of its virulence and explain why, when and how it causes disease. In this review, we summarize current knowledge of the evolutionary genomics of E. histolytica and discuss its association with parasite phenotypes and differential pathogenic behavior. We also discuss how genetic diversity reveals parasite population structure, and highlight questions concerning its evolution and population structure that remain to be addressed. The large amount of genomic data now available will improve our knowledge of this pathogenic species of Entamoeba. PMID:25505504

  11. Transfer of DNA from Bacteria to Eukaryotes

    Directory of Open Access Journals (Sweden)

    Benoît Lacroix

    2016-07-01

    Historically, the members of the Agrobacterium genus have been considered the only bacterial species naturally able to transfer and integrate DNA into the genomes of their eukaryotic hosts. Yet, increasing evidence suggests that this ability to genetically transform eukaryotic host cells might be more widespread in the bacterial world. Indeed, analyses of accumulating genomic data reveal cases of horizontal gene transfer from bacteria to eukaryotes and suggest that it represents a significant force in adaptive evolution of eukaryotic species. Specifically, recent reports indicate that bacteria other than Agrobacterium, such as Bartonella henselae (a zoonotic pathogen), Rhizobium etli (a plant-symbiotic bacterium related to Agrobacterium), or even Escherichia coli, have the ability to genetically transform their host cells under laboratory conditions. This DNA transfer relies on type IV secretion systems (T4SSs), the molecular machines that transport macromolecules during conjugative plasmid transfer and also during transport of proteins and/or DNA to the eukaryotic recipient cells. In this review article, we explore the extent of possible transfer of genetic information from bacteria to eukaryotic cells as well as the evolutionary implications and potential applications of this transfer.

  12. Structural and functional analysis of rice genome

    Indian Academy of Sciences (India)

    Unknown

    PARUL KHURANA and SULABHA SHARMA. Department of Plant Molecular Biology, University of Delhi South Campus, Benito Juarez Road, .... of the genome, which are sometimes overlooked by biological means. The development .... The map was updated by Saji et al. (2001), who utilized 1439 markers out of the ...

  13. Assessing structural variation in a personal genome-towards a human reference diploid genome.

    Science.gov (United States)

    English, Adam C; Salerno, William J; Hampton, Oliver A; Gonzaga-Jauregui, Claudia; Ambreth, Shruthi; Ritter, Deborah I; Beck, Christine R; Davis, Caleb F; Dahdouli, Mahmoud; Ma, Singer; Carroll, Andrew; Veeraraghavan, Narayanan; Bruestle, Jeremy; Drees, Becky; Hastie, Alex; Lam, Ernest T; White, Simon; Mishra, Pamela; Wang, Min; Han, Yi; Zhang, Feng; Stankiewicz, Pawel; Wheeler, David A; Reid, Jeffrey G; Muzny, Donna M; Rogers, Jeffrey; Sabo, Aniko; Worley, Kim C; Lupski, James R; Boerwinkle, Eric; Gibbs, Richard A

    2015-04-11

    Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods. We demonstrate Parliament's efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus. HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.

  14. CONTIGuator: a bacterial genomes finishing tool for structural insights on draft genomes

    Directory of Open Access Journals (Sweden)

    Bazzicalupo Marco

    2011-06-01

    Recent developments in sequencing technologies have given the opportunity to sequence many bacterial genomes with limited cost and labor, compared to previous techniques. However, a limiting step of genome sequencing is the finishing process, needed to infer the relative position of each contig and close sequencing gaps. An additional degree of complexity is given by bacterial species harboring more than one replicon, which are not contemplated by the currently available programs. The availability of a large number of bacterial genomes allows geneticists to use complete genomes (possibly from the same species) as templates for contig mapping. Here we present CONTIGuator, a software tool for mapping contigs onto a reference genome, which allows the visualization of a map of contigs, underlining loss and/or gain of genetic elements and permitting the finishing of multipartite genomes. The functionality of CONTIGuator was tested using four genomes, demonstrating its improved performance compared to currently available programs. Our approach appears efficient, with a clear visualization, allowing the user to perform comparative structural genomics analysis on draft genomes. CONTIGuator is a Python script for Linux environments that can be used on normal desktop machines and can be downloaded from http://contiguator.sourceforge.net.

  15. Structural genomics-impact on biomedicine and drug discovery.

    Science.gov (United States)

    Weigelt, Johan

    2010-05-01

    The field of structural genomics emerged as one of many 'omics disciplines more than a decade ago, and a multitude of large scale initiatives have been launched across the world. Development and implementation of methods for high-throughput structural biology represents a common denominator among different structural genomics programs. From another perspective a distinction between "biology-driven" versus "structure-driven" approaches can be made. This review outlines the general themes of structural genomics, its achievements and its impact on biomedicine and drug discovery. The growing number of high resolution structures of known and potential drug target proteins is expected to have tremendous value for future drug discovery programs. Moreover, the availability of large numbers of purified proteins enables generation of tool reagents, such as chemical probes and antibodies, to further explore protein function in the cell. Copyright 2010 Elsevier Inc. All rights reserved.

  16. PSI-2: structural genomics to cover protein domain family space.

    Science.gov (United States)

    Dessailly, Benoît H; Nair, Rajesh; Jaroszewski, Lukasz; Fajardo, J Eduardo; Kouranov, Andrei; Lee, David; Fiser, Andras; Godzik, Adam; Rost, Burkhard; Orengo, Christine

    2009-06-10

    One major objective of structural genomics efforts, including the NIH-funded Protein Structure Initiative (PSI), has been to increase the structural coverage of protein sequence space. Here, we present the target selection strategy used during the second phase of PSI (PSI-2). This strategy, jointly devised by the bioinformatics groups associated with the PSI-2 large-scale production centers, targets representatives from large, structurally uncharacterized protein domain families, and from structurally uncharacterized subfamilies in very large and diverse families with incomplete structural coverage. These very large families are extremely diverse both structurally and functionally, and are highly overrepresented in known proteomes. On the basis of several metrics, we then discuss to what extent PSI-2, during its first 3 years, has increased the structural coverage of genomes, and contributed structural and functional novelty. Together, the results presented here suggest that PSI-2 is successfully meeting its objectives and provides useful insights into structural and functional space.

  17. Alu recombination-mediated structural deletions in the chimpanzee genome.

    Directory of Open Access Journals (Sweden)

    Kyudong Han

    2007-10-01

    With more than 1.2 million copies, Alu elements are one of the most important sources of structural variation in primate genomes. Here, we compare the chimpanzee and human genomes to determine the extent of Alu recombination-mediated deletion (ARMD) in the chimpanzee genome since the divergence of the chimpanzee and human lineages (approximately 6 million years ago). Combining computational data analysis and experimental verification, we have identified 663 chimpanzee lineage-specific deletions (involving a total of approximately 771 kb of genomic sequence) attributable to this process. The ARMD events essentially counteract the genomic expansion caused by chimpanzee-specific Alu inserts. The RefSeq databases indicate that 13 exons in six genes, annotated as either demonstrably or putatively functional in the human genome, and 299 intronic regions have been deleted through ARMDs in the chimpanzee lineage. Therefore, our data suggest that this process may contribute to the genomic and phenotypic diversity between chimpanzees and humans. In addition, we found four independent ARMD events at orthologous loci in the gorilla or orangutan genomes. This suggests that human orthologs of loci at which ARMD events have already occurred in other nonhuman primate genomes may be "at-risk" motifs for future deletions, which may subsequently contribute to human lineage-specific genetic rearrangements and disorders.

  18. Genome Pool Strategy for Structural Coverage of Protein Families

    Science.gov (United States)

    Jaroszewski, Lukasz; Slabinski, Lukasz; Wooley, John; Deacon, Ashley M.; Lesley, Scott A.; Wilson, Ian A.; Godzik, Adam

    2010-01-01

    As noticed by generations of structural biologists, closely homologous proteins may have substantially different crystallization properties and propensities. These observations can be used to systematically introduce additional dimensionality into crystallization trials by targeting homologous proteins from multiple genomes in a “genome pool” strategy. Through extensive use of our recently introduced “crystallization feasibility score” (Slabinski et al., 2007a), we can explain that the genome pool strategy works well because the crystallization feasibility scores are surprisingly broad within families of homologous proteins, with most families containing a range of optimal to very difficult targets. We also show that some families can be regarded as relatively “easy”, where a significant number of proteins are predicted to have optimal crystallization features, and others are “very difficult”, where almost none are predicted to result in a crystal structure. Thus, such variable distributions of crystallizability lead to uneven structural coverage of known families, with “easier” or “optimal” families having several times more solved structures than “very difficult” ones. Nevertheless, this latter category can be successfully targeted by increasing the number of genomes that are used to select targets from a given family. On average, adding 10 new genomes to the “genome pool” provides more promising targets for 7 “very difficult” families. In contrast, our crystallization feasibility score does not indicate that any specific microbial genomes can be readily classified as “easier” or “very difficult” with respect to providing suitable candidates for crystallization and structure determination. Finally, our analyses show that specific physicochemical properties of the protein sequence favor successful outcomes for structure determination and, hence, the group of proteins with known 3D

  19. A Genome-Wide Survey of Switchgrass Genome Structure and Organization

    Science.gov (United States)

    Sharma, Manoj K.; Sharma, Rita; Cao, Peijian; Jenkins, Jerry; Bartley, Laura E.; Qualls, Morgan; Grimwood, Jane; Schmutz, Jeremy; Rokhsar, Daniel; Ronald, Pamela C.

    2012-01-01

    The perennial grass, switchgrass (Panicum virgatum L.), is a promising bioenergy crop and the target of whole genome sequencing. We constructed two bacterial artificial chromosome (BAC) libraries from the AP13 clone of switchgrass to gain insight into the genome structure and organization, initiate functional and comparative genomic studies, and assist with genome assembly. Together representing 16 haploid genome equivalents of switchgrass, each library comprises 101,376 clones with average insert sizes of 144 kb (HindIII-generated) and 110 kb (BstYI-generated). A total of 330,297 high quality BAC-end sequences (BES) were generated, accounting for 263.2 Mbp (16.4%) of the switchgrass genome. Analysis of the BES identified 279,099 known repetitive elements, >50,000 SSRs, and 2,528 novel repeat elements, named switchgrass repetitive elements (SREs). Comparative mapping of 47 full-length BAC sequences and 330K BES revealed high levels of synteny with the grass genomes of sorghum, rice, maize, and Brachypodium. Our data indicate that the sorghum genome has retained larger microsyntenous regions with switchgrass, in addition to high gene order conservation with rice. The resources generated in this effort will be useful for a broad range of applications. PMID:22511929
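
    The library figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes that the 16.4% figure is measured against the haploid genome size, and that library coverage is simply clones times average insert size divided by genome size. All function and variable names are hypothetical.

```python
# Figures quoted in the abstract above.
CLONES_PER_LIBRARY = 101_376
AVG_INSERT_KB = {"HindIII": 144, "BstYI": 110}
BES_MBP = 263.2          # total length of BAC-end sequences
BES_FRACTION = 0.164     # share of the genome they account for


def implied_genome_size_mbp(bes_mbp: float, bes_fraction: float) -> float:
    """Genome size implied by 'X Mbp is Y% of the genome' (assumption: haploid size)."""
    return bes_mbp / bes_fraction


def library_coverage(clones: int, avg_insert_kb: float, genome_mbp: float) -> float:
    """Genome equivalents in one library: total cloned DNA divided by genome size."""
    total_insert_mbp = clones * avg_insert_kb / 1000.0
    return total_insert_mbp / genome_mbp


genome_mbp = implied_genome_size_mbp(BES_MBP, BES_FRACTION)
coverage = {
    enzyme: library_coverage(CLONES_PER_LIBRARY, insert_kb, genome_mbp)
    for enzyme, insert_kb in AVG_INSERT_KB.items()
}
total_coverage = sum(coverage.values())

print(f"implied genome size: {genome_mbp:.0f} Mbp")
for enzyme, value in coverage.items():
    print(f"{enzyme} library: {value:.1f} genome equivalents")
print(f"both libraries: {total_coverage:.1f} genome equivalents")
```

    Under these assumptions the two libraries together come out at roughly 16 genome equivalents, which is consistent with the figure stated in the abstract.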

  20. Genome-wide profiling of structural genomic variations in Korean HapMap individuals.

    Science.gov (United States)

    Bae, Joon Seol; Cheong, Hyun Sub; Park, Byung Lae; Kim, Lyoung Hyo; Han, Chang Soo; Park, Tae Joon; Kim, Jason Yongha; Pasaje, Charisse Flerida A; Lee, Jin Sol; Shin, Hyoung Doo

    2010-07-02

    The study of structural genomic variation, along with the development of microarray technology, has provided many genomic resources related to the architecture of the human genome, and has revealed that human genome structure is a lot more complicated than previously thought. In the International HapMap Project, Epstein-Barr virus-immortalized cell lines were used in preference to blood in order to obtain a larger amount of genomic DNA. However, genomic aberrations stemming from the immortalization process, biased representation of the donor tissue, and the culture process may influence the accuracy of SNP genotypes. In order to identify chromosome aberrations including loss of heterozygosity (LOH) and large-scale and small-scale copy number variations, we used the Illumina HumanHap500 BeadChip (555,352 markers) on Korean HapMap individuals (n = 90) to obtain Log R ratio and B allele frequency information, and then analyzed the data with various programs including Illumina ChromoZone, cnvPartition and PennCNV. As a result, we identified 28 LOHs (>3 Mb) and 35 large-scale CNVs (>1 Mb), with 4 samples having a completely duplicated chromosome. In addition, after checking the sample quality (standard deviation of log R ratio HapMap individuals, and expect that these findings will provide more meaningful information on the human genome.

  1. The high-throughput protein sample production platform of the Northeast Structural Genomics Consortium.

    Science.gov (United States)

    Xiao, Rong; Anderson, Stephen; Aramini, James; Belote, Rachel; Buchwald, William A; Ciccosanti, Colleen; Conover, Ken; Everett, John K; Hamilton, Keith; Huang, Yuanpeng Janet; Janjua, Haleema; Jiang, Mei; Kornhaber, Gregory J; Lee, Dong Yup; Locke, Jessica Y; Ma, Li-Chung; Maglaqui, Melissa; Mao, Lei; Mitra, Saheli; Patel, Dayaban; Rossi, Paolo; Sahdev, Seema; Sharma, Seema; Shastry, Ritu; Swapna, G V T; Tong, Saichu N; Wang, Dongyan; Wang, Huang; Zhao, Li; Montelione, Gaetano T; Acton, Thomas B

    2010-10-01

    We describe the core Protein Production Platform of the Northeast Structural Genomics Consortium (NESG) and outline the strategies used for producing high-quality protein samples. The platform is centered on the cloning, expression and purification of 6X-His-tagged proteins using T7-based Escherichia coli systems. The 6X-His tag allows for similar purification procedures for most targets and implementation of high-throughput (HTP) parallel methods. In most cases, the 6X-His-tagged proteins are sufficiently purified (>97% homogeneity) using a HTP two-step purification protocol for most structural studies. Using this platform, the open reading frames of over 16,000 different targeted proteins (or domains) have been cloned as >26,000 constructs. Over the past 10 years, more than 16,000 of these expressed protein, and more than 4400 proteins (or domains) have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html). Using these samples, the NESG has deposited more than 900 new protein structures to the Protein Data Bank (PDB). The methods described here are effective in producing eukaryotic and prokaryotic protein samples in E. coli. This paper summarizes some of the updates made to the protein production pipeline in the last 5 years, corresponding to phase 2 of the NIGMS Protein Structure Initiative (PSI-2) project. The NESG Protein Production Platform is suitable for implementation in a large individual laboratory or by a small group of collaborating investigators. These advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are of broad value to the structural biology, functional proteomics, and structural genomics communities.

  2. Eukaryotic DNA Replicases

    KAUST Repository

    Zaher, Manal S.

    2014-11-21

    The current model of the eukaryotic DNA replication fork includes three replicative DNA polymerases, polymerase α/primase complex (Pol α), polymerase δ (Pol δ), and polymerase ε (Pol ε). The primase synthesizes 8–12 nucleotide RNA primers that are extended by the DNA polymerization activity of Pol α into 30–35 nucleotide RNA-DNA primers. Replication factor C (RFC) opens the polymerase clamp-like processivity factor, proliferating cell nuclear antigen (PCNA), and loads it onto the primer-template. Pol δ utilizes PCNA to mediate highly processive DNA synthesis, while Pol ε has intrinsic high processivity that is modestly stimulated by PCNA. Pol ε replicates the leading strand and Pol δ replicates the lagging strand in a division of labor that is not strict. The three polymerases are comprised of multiple subunits and share unifying features in their large catalytic and B subunits. The remaining subunits are evolutionarily not related and perform diverse functions. The catalytic subunits are members of family B, which are distinguished by their larger sizes due to inserts in their N- and C-terminal regions. The sizes of these inserts vary among the three polymerases, and their functions remain largely unknown. Strikingly, the quaternary structures of Pol α, Pol δ, and Pol ε are arranged similarly. The catalytic subunits adopt a globular structure that is linked via its conserved C-terminal region to the B subunit. The remaining subunits are linked to the catalytic and B subunits in a highly flexible manner.

  3. A Structural Hinge in Eukaryotic MutY Homologues Mediates Catalytic Activity and Rad9-Rad1-Hus1 Checkpoint Complex Interactions

    Energy Technology Data Exchange (ETDEWEB)

    P Luncsford; D Chang; G Shi; J Bernstein; A Madabushi; D Patterson; A Lu; E Toth

    2011-12-31

    The DNA glycosylase MutY homologue (MYH or MUTYH) removes adenines misincorporated opposite 8-oxoguanine as part of the base excision repair pathway. Importantly, defects in human MYH (hMYH) activity cause the inherited colorectal cancer syndrome MYH-associated polyposis. A key feature of MYH activity is its coordination with cell cycle checkpoint via interaction with the Rad9-Rad1-Hus1 (9-1-1) complex. The 9-1-1 complex facilitates cell cycle checkpoint activity and coordinates this activity with ongoing DNA repair. The interdomain connector (IDC, residues 295-350) between the catalytic domain and the 8-oxoguanine recognition domain of hMYH is a critical element that maintains interactions with the 9-1-1 complex. We report the first crystal structure of a eukaryotic MutY protein, a fragment of hMYH (residues 65-350) that consists of the catalytic domain and the IDC. Our structure reveals that the IDC adopts a stabilized conformation projecting away from the catalytic domain to form a docking scaffold for 9-1-1. We further examined the role of the IDC using Schizosaccharomyces pombe MYH as model system. In vitro studies of S. pombe MYH identified residues I261 and E262 of the IDC (equivalent to V315 and E316 of the hMYH IDC) as critical for maintaining the MYH/9-1-1 interaction. We determined that the eukaryotic IDC is also required for DNA damage selection and robust enzymatic activity. Our studies also provide the first evidence that disruption of the MYH/9-1-1 interaction diminishes the repair of oxidative DNA damage in vivo. Thus, preserving the MYH/9-1-1 interaction contributes significantly to minimizing the mutagenic potential of oxidative DNA damage.

  4. Structural genomics and drug discovery: all in the family.

    Science.gov (United States)

    Weigelt, Johan; McBroom-Cerajewski, Linda D B; Schapira, Matthieu; Zhao, Yong; Arrowsmith, Cheryl H

    2008-02-01

    Structural genomics is starting to have an impact on the early stages of drug discovery and target validation through the contribution of new structures of known and potential drug targets, their complexes with ligands, and protocols and reagents for additional structural work within a drug discovery program. Recent progress includes structures of targets from bacterial, viral and protozoan human pathogens, and human targets from known or potential druggable protein families such as kinases, phosphatases, dehydrogenases/oxidoreductases, sulfo-, acetyl- and methyl-transferases, and a number of other key metabolic enzymes. Importantly, many of these structures contained ligands in the active sites, including, for example, the first structures of target-bound therapeutics. Structural genomics of protein families combined with ligand discovery holds particular promise for advancing early stage discovery programs.

  5. Structural Genomics and Drug Discovery for Infectious Diseases

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, W.F.

    2010-09-03

    The application of structural genomics methods and approaches to proteins from organisms causing infectious diseases is making available the three dimensional structures of many proteins that are potential drug targets and laying the groundwork for structure aided drug discovery efforts. There are a number of structural genomics projects with a focus on pathogens that have been initiated worldwide. The Center for Structural Genomics of Infectious Diseases (CSGID) was recently established to apply state-of-the-art high throughput structural biology technologies to the characterization of proteins from the National Institute for Allergy and Infectious Diseases (NIAID) category A-C pathogens and organisms causing emerging, or re-emerging infectious diseases. The target selection process emphasizes potential biomedical benefits. Selected proteins include known drug targets and their homologs, essential enzymes, virulence factors and vaccine candidates. The Center also provides a structure determination service for the infectious disease scientific community. The ultimate goal is to generate a library of structures that are available to the scientific community and can serve as a starting point for further research and structure aided drug discovery for infectious diseases. To achieve this goal, the CSGID will determine protein crystal structures of 400 proteins and protein-ligand complexes using proven, rapid, highly integrated, and cost-effective methods for such determination, primarily by X-ray crystallography. High throughput crystallographic structure determination is greatly aided by frequent, convenient access to high-performance beamlines at third-generation synchrotron X-ray sources.

  6. Structural genomics and drug discovery for infectious diseases.

    Science.gov (United States)

    Anderson, W F

    2009-11-01

    The application of structural genomics methods and approaches to proteins from organisms causing infectious diseases is making available the three dimensional structures of many proteins that are potential drug targets and laying the groundwork for structure aided drug discovery efforts. There are a number of structural genomics projects with a focus on pathogens that have been initiated worldwide. The Center for Structural Genomics of Infectious Diseases (CSGID) was recently established to apply state-of-the-art high throughput structural biology technologies to the characterization of proteins from the National Institute for Allergy and Infectious Diseases (NIAID) category A-C pathogens and organisms causing emerging, or re-emerging infectious diseases. The target selection process emphasizes potential biomedical benefits. Selected proteins include known drug targets and their homologs, essential enzymes, virulence factors and vaccine candidates. The Center also provides a structure determination service for the infectious disease scientific community. The ultimate goal is to generate a library of structures that are available to the scientific community and can serve as a starting point for further research and structure aided drug discovery for infectious diseases. To achieve this goal, the CSGID will determine protein crystal structures of 400 proteins and protein-ligand complexes using proven, rapid, highly integrated, and cost-effective methods for such determination, primarily by X-ray crystallography. High throughput crystallographic structure determination is greatly aided by frequent, convenient access to high-performance beamlines at third-generation synchrotron X-ray sources.

  7. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan

    2014-01-01

    BACKGROUND: Structural variants (SVs) are less common than single nucleotide polymorphisms and indels in the population, but collectively account for a significant fraction of genetic polymorphism and diseases. Base pair differences arising from SVs are on a much higher order (>100 fold) than point mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost... mapping technology as a comprehensive and cost-effective method for detecting structural variation and studying complex regions in the human genome, as well as deciphering viral integration into the host genome.

  8. Autophagy in unicellular eukaryotes

    NARCIS (Netherlands)

    Kiel, J.A.K.W.

    2010-01-01

    Cells need a constant supply of precursors to enable the production of macromolecules to sustain growth and survival. Unlike metazoans, unicellular eukaryotes depend exclusively on the extracellular medium for this supply. When environmental nutrients become depleted, existing cytoplasmic components

  9. The Impact of Structural Genomics: Expectations and Outcomes

    Energy Technology Data Exchange (ETDEWEB)

    Chandonia, John-Marc; Brenner, Steven E.

    2005-12-21

    Structural Genomics (SG) projects aim to expand our structural knowledge of biological macromolecules, while lowering the average costs of structure determination. We quantitatively analyzed the novelty, cost, and impact of structures solved by SG centers, and contrasted these results with traditional structural biology. The first structure from a protein family is particularly important to reveal the fold and ancient relationships to other proteins. In the last year, approximately half of such structures were solved at an SG center rather than in a traditional laboratory. Furthermore, the cost of solving a structure at the most efficient U.S. center has now dropped to one-quarter the estimated cost of solving a structure by traditional methods. However, top structural biology laboratories are much more efficient than the average, and comparable to SG centers despite working on very challenging structures. Moreover, traditional structural biology papers are cited significantly more often, suggesting greater current impact.

  10. Structured RNAs and synteny regions in the pig genome

    DEFF Research Database (Denmark)

    Anthon, Christian; Tafer, Hakim; Havgaard, Jakob Hull

    2014-01-01

    , a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals. RESULTS: We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure...... similarity search as well as class specific methods, we obtained a conservative set with a total of 3,391 structured RNA loci of which 1,011 and 2,314, respectively, hold strong sequence and structure similarity to structured RNAs in existing databases. The RNA loci cover 139 cis-regulatory element loci, 58......, we obtained no matches at the highest confidence level. Additional analysis of RNA-seq data from a pooled library from 10 different pig tissues added another 165 miRNA loci, yielding an overall annotation of 3,556 structured RNA loci. This annotation represents our best effort at making an automated...

  11. Myxoma Virus Immunomodulatory Protein M156R is a Structural Mimic of Eukaryotic Translation Initiation Factor eIF2 alpha

    Energy Technology Data Exchange (ETDEWEB)

    Ramelot, Theresa A.; Cort, John R.; Yee, Adelinda; Liu, Furong; Goshe, Michael B.; Edwards, Aled M.; Smith, Richard D.; Arrowsmith, Cheryl H.; Dever, Thomas E.; Kennedy, Michael A.

    2002-10-04

    M156R, the product of the myxoma virus M156R open reading frame, is a protein of unknown function. However, several homologs of M156R from other viruses are immunomodulatory proteins that bind to interferon-induced protein kinase PKR and inhibit phosphorylation of the eukaryotic translation initiation factor eIF2α. In this study, we have determined the nuclear magnetic resonance (NMR) structure of M156R, the first structure of a myxoma virus protein. The fold consists of a five-stranded antiparallel β-barrel with two of the strands connected by a long loop and a short α-helix. The similarity between M156R and the predicted S1 motif structure of eIF2α suggests that the viral homologs are pseudosubstrate inhibitors of PKR that mimic eIF2α in order to compete for binding to PKR. A homology modeled structure of the well studied vaccinia virus K3L was generated based on alignment with M156R. Residues important for binding to PKR are conserved residues on the surface of the β-barrel and in the mobile loop, identifying the putative PKR recognition motif.

  12. Structural genomics approach to drug discovery for Mycobacterium tuberculosis.

    Science.gov (United States)

    Ioerger, Thomas R; Sacchettini, James C

    2009-06-01

    Structural genomics has become a powerful tool for studying microorganisms at the molecular level. Advances in technology have enabled the assembly of high-throughput pipelines that can be used to automate X-ray crystal structure determination for many proteins in the genome of a target organism. In this paper, we describe the methods used in the Tuberculosis Structural Genomics Consortium (TBSGC), ranging from protein production and crystallization to diffraction data collection and processing. The TBSGC is unique in that it uses biological importance as a primary criterion for target selection. The over-riding goal is to solve structures of proteins that may be potential drug targets, in order to support drug discovery efforts. We describe the crystal structures of several significant proteins in the M. tuberculosis genome that have been solved by the TBSGC over the past few years. We conclude by describing the high-throughput screening facilities and virtual screening facilities we have implemented for identifying small-molecule inhibitors of proteins whose structures have been solved.

  13. Multi-scale structural community organisation of the human genome.

    Science.gov (United States)

    Boulos, Rasha E; Tremblay, Nicolas; Arneodo, Alain; Borgnat, Pierre; Audit, Benjamin

    2017-04-11

    Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This raises a new methodological challenge for computational biology, which consists in objectively extracting from these data the structural motifs characteristic of genome organisation. We deployed the fast multi-scale community mining algorithm based on spectral graph wavelets to characterise the networks of intra-chromosomal interactions in human cell lines. We observed that there exist structural domains of all sizes up to chromosome length and demonstrated that the set of structural communities forms a hierarchy of chromosome segments. Hence, at all scales, chromosome folding predominantly involves interactions between neighbouring sites rather than the formation of links between distant loci. Multi-scale structural decomposition of human chromosomes provides an original framework to question structural organisation and its relationship to functional regulation across the scales. By construction the proposed methodology is independent of the precise assembly of the reference genome and is thus directly applicable to genomes whose assembly is not fully determined.

  14. Benefits of structural genomics for drug discovery research.

    Science.gov (United States)

    Grabowski, Marek; Chruszcz, Maksymilian; Zimmerman, Matthew D; Kirillova, Olga; Minor, Wladek

    2009-11-01

    While three-dimensional structures have long been used to search for new drug targets, only a fraction of new drugs coming to the market has been developed with the use of a structure-based drug discovery approach. However, recent years have brought not only an avalanche of new macromolecular structures, but also significant advances in protein structure determination methodology that are only now making their way into structure-based drug discovery. In this paper, we review recent developments resulting from the Structural Genomics (SG) programs, focusing on the methods and results most likely to improve our understanding of the molecular foundation of human diseases. SG programs have been around for almost a decade, and in that time, have contributed a significant part of the structural coverage of both the genomes of pathogens causing infectious diseases and structurally uncharacterized biological processes in general. Perhaps most importantly, SG programs have developed new methodology at all steps of the structure determination process, not only to determine new structures highly efficiently, but also to screen protein/ligand interactions. We describe the methodologies, experience and technologies developed by SG, which range from improvements to cloning protocols to improved procedures for crystallographic structure solution that may be applied in "traditional" structural biology laboratories, particularly those performing drug discovery. We also discuss the conditions that must be met to convert the present high-throughput structure determination pipeline into a high-output structure-based drug discovery system.

  15. Crystal Structure of the Eukaryotic Strong Inward-Rectifier K⁺ Channel Kir2.2 at 3.1 Å Resolution

    Energy Technology Data Exchange (ETDEWEB)

    Tao, Xiao; Avalos, Jose L.; Chen, Jiayun; MacKinnon, Roderick; (Rockefeller)

    2010-03-29

    Inward-rectifier potassium (K⁺) channels conduct K⁺ ions most efficiently in one direction, into the cell. Kir2 channels control the resting membrane voltage in many electrically excitable cells, and heritable mutations cause periodic paralysis and cardiac arrhythmia. We present the crystal structure of Kir2.2 from chicken, which, excluding the unstructured amino and carboxyl termini, is 90% identical to human Kir2.2. Crystals containing rubidium (Rb⁺), strontium (Sr²⁺), and europium (Eu³⁺) reveal binding sites along the ion conduction pathway that are both conductive and inhibitory. The sites correlate with extensive electrophysiological data and provide a structural basis for understanding rectification. The channel's extracellular surface, with large structured turrets and an unusual selectivity filter entryway, might explain the relative insensitivity of eukaryotic inward rectifiers to toxins. These same surface features also suggest a possible approach to the development of inhibitory agents specific to each member of the inward-rectifier K⁺ channel family.

  16. Structural classification of proteins and structural genomics: new insights into protein folding and evolution.

    Science.gov (United States)

    Andreeva, Antonina; Murzin, Alexey G

    2010-10-01

    During the past decade, the Protein Structure Initiative (PSI) centres have become major contributors of new families, superfamilies and folds to the Structural Classification of Proteins (SCOP) database. The PSI results have increased the diversity of protein structural space and accelerated our understanding of it. This review article surveys a selection of protein structures determined by the Joint Center for Structural Genomics (JCSG). It presents previously undescribed β-sheet architectures such as the double barrel and spiral β-roll and discusses new examples of unusual topologies and peculiar structural features observed in proteins characterized by the JCSG and other Structural Genomics centres.

  17. Evolutionary genomics and population structure of Entamoeba histolytica

    Directory of Open Access Journals (Sweden)

    Koushik Das

    2014-11-01

    Amoebiasis caused by the gastrointestinal parasite Entamoeba histolytica has diverse disease outcomes. Study of the genome and evolution of this fascinating parasite will help us to understand the basis of its virulence and explain why, when and how it causes disease. In this review, we have summarized current knowledge regarding the evolutionary genomics of E. histolytica and discussed its association with parasite phenotypes and differential pathogenic behavior. How genetic diversity reveals parasite population structure has also been discussed. Questions concerning its evolution and population structure that remain to be addressed have also been highlighted. This significantly large amount of genomic data will improve our knowledge about this pathogenic species of Entamoeba.

  18. Comparative genomics of Burkholderia multivorans, a ubiquitous pathogen with a highly conserved genomic structure.

    Directory of Open Access Journals (Sweden)

    Charlotte Peeters

    The natural environment serves as a reservoir of opportunistic pathogens. A well-established method for studying the epidemiology of such opportunists is multilocus sequence typing, which in many cases has defined strains predisposed to causing infection. Burkholderia multivorans is an important pathogen in people with cystic fibrosis (CF) and its epidemiology suggests that strains are acquired from non-human sources such as the natural environment. This raises the central question of whether the isolation source (CF or environment) or the multilocus sequence type (ST) of B. multivorans better predicts their genomic content and functionality. We identified four pairs of B. multivorans isolates, representing distinct STs and consisting of one CF and one environmental isolate each. All genomes were sequenced using the PacBio SMRT sequencing technology, which resulted in eight high-quality B. multivorans genome assemblies. The present study demonstrated that the genomic structure of the examined B. multivorans STs is highly conserved and that the B. multivorans genomic lineages are defined by their ST. Orthologous protein families were not uniformly distributed among chromosomes, with core orthologs being enriched on the primary chromosome and ST-specific orthologs being enriched on the second and third chromosome. The ST-specific orthologs were enriched in genes involved in defense mechanisms and secondary metabolism, corroborating the strain-specificity of these virulence characteristics. Finally, the same B. multivorans genomic lineages occur in both CF and environmental samples and on different continents, demonstrating their ubiquity and evolutionary persistence.

  19. Comparative genomics of Burkholderia multivorans, a ubiquitous pathogen with a highly conserved genomic structure.

    Science.gov (United States)

    Peeters, Charlotte; Cooper, Vaughn S; Hatcher, Philip J; Verheyde, Bart; Carlier, Aurélien; Vandamme, Peter

    2017-01-01

    The natural environment serves as a reservoir of opportunistic pathogens. A well-established method for studying the epidemiology of such opportunists is multilocus sequence typing, which in many cases has defined strains predisposed to causing infection. Burkholderia multivorans is an important pathogen in people with cystic fibrosis (CF) and its epidemiology suggests that strains are acquired from non-human sources such as the natural environment. This raises the central question of whether the isolation source (CF or environment) or the multilocus sequence type (ST) of B. multivorans better predicts their genomic content and functionality. We identified four pairs of B. multivorans isolates, representing distinct STs and consisting of one CF and one environmental isolate each. All genomes were sequenced using the PacBio SMRT sequencing technology, which resulted in eight high-quality B. multivorans genome assemblies. The present study demonstrated that the genomic structure of the examined B. multivorans STs is highly conserved and that the B. multivorans genomic lineages are defined by their ST. Orthologous protein families were not uniformly distributed among chromosomes, with core orthologs being enriched on the primary chromosome and ST-specific orthologs being enriched on the second and third chromosome. The ST-specific orthologs were enriched in genes involved in defense mechanisms and secondary metabolism, corroborating the strain-specificity of these virulence characteristics. Finally, the same B. multivorans genomic lineages occur in both CF and environmental samples and on different continents, demonstrating their ubiquity and evolutionary persistence.

  20. Structures of the CRISPR genome integration complex.

    Science.gov (United States)

    Wright, Addison V; Liu, Jun-Jie; Knott, Gavin J; Doxzen, Kevin W; Nogales, Eva; Doudna, Jennifer A

    2017-09-15

    CRISPR-Cas systems depend on the Cas1-Cas2 integrase to capture and integrate short foreign DNA fragments into the CRISPR locus, enabling adaptation to new viruses. We present crystal structures of Cas1-Cas2 bound to both donor and target DNA in intermediate and product integration complexes, as well as a cryo-electron microscopy structure of the full CRISPR locus integration complex, including the accessory protein IHF (integration host factor). The structures show unexpectedly that indirect sequence recognition dictates integration site selection by favoring deformation of the repeat and the flanking sequences. IHF binding bends the DNA sharply, bringing an upstream recognition motif into contact with Cas1 to increase both the specificity and efficiency of integration. These results explain how the Cas1-Cas2 CRISPR integrase recognizes a sequence-dependent DNA structure to ensure site-selective CRISPR array expansion during the initial step of bacterial adaptive immunity. Copyright © 2017, American Association for the Advancement of Science.

  1. In the search for the low-complexity sequences in prokaryotic and eukaryotic genomes: how to derive a coherent picture from global and local entropy measures

    Energy Technology Data Exchange (ETDEWEB)

    Acquisti, Claudia; Allegrini, Paolo; Bogani, Patrizia; Buiatti, Marcello; Catanese, Elena; Fronzoni, Leone; Grigolini, Paolo; Mersi, Giuseppe; Palatella, Luigi

    2004-04-01

    We investigate a possible way to connect the presence of low-complexity sequences (LCS) in DNA genomes and the non-stationary properties of base correlations. Under the hypothesis that these variations signal a change in the DNA function, we use a new technique, called the non-stationarity entropic index (NSEI) method, and we prove that this technique is an efficient way to detect functional changes with respect to a random baseline. The remarkable aspect is that NSEI does not require any training data or fitting parameter, the only arbitrariness being the choice of a marker in the sequence. We make this choice on the basis of biological information about LCS distributions in genomes. We show that there exists a correlation between changes in the amount of LCS and the ratio of long- to short-range correlation.

  2. Viruses and viruslike particles of eukaryotic algae.

    OpenAIRE

    Van Etten, J L; Lane, L C; Meints, R H

    1991-01-01

    Until recently there was little interest or information on viruses and viruslike particles of eukaryotic algae. However, this situation is changing. In the past decade many large double-stranded DNA-containing viruses that infect two culturable, unicellular, eukaryotic green algae have been discovered. These viruses can be produced in large quantities, assayed by plaque formation, and analyzed by standard bacteriophage techniques. The viruses are structurally similar to animal iridoviruses, t...

  3. Metabolic Constraints on the Eukaryotic Transition

    Science.gov (United States)

    Wallace, Rodrick

    2009-04-01

    Mutualism, obligate mutualism, symbiosis, and the eukaryotic ‘fusion’ of Serial Endosymbiosis Theory represent progressively more rapid and less distorted real-time communication between biological structures instantiating information sources. Such progression in accurate information transmission requires, in turn, progressively greater channel capacity that, through the homology between information source uncertainty and free energy density, requires ever more energetic metabolism. The eukaryotic transition, according to this model, may have been entrained by an ecosystem resilience shift from anaerobic to aerobic metabolism.

  4. Decoding the fine-scale structure of a breast cancer genome and transcriptome

    OpenAIRE

    Volik, Stanislav; Raphael, Benjamin J.; Huang, Guiqing; Stratton, Michael R.; Bignel, Graham; Murnane, John; Brebner, John H.; Bajsarowicz, Krystyna; Paris, Pamela L.; Tao, Quanzhou; Kowbel, David; Lapuk, Anna; Shagin, Dmitri A.; Shagina, Irina A.; Gray, Joe W.

    2006-01-01

    A comprehensive understanding of cancer is predicated upon knowledge of the structure of malignant genomes underlying its many variant forms and the molecular mechanisms giving rise to them. It is well established that solid tumor genomes accumulate a large number of genome rearrangements during tumorigenesis. End Sequence Profiling (ESP) maps and clones genome breakpoints associated with all types of genome rearrangements elucidating the structural organization of tumor genomes. Here we exte...

  5. Structural genomics of infectious disease drug targets: the SSGCID.

    Science.gov (United States)

    Stacy, Robin; Begley, Darren W; Phan, Isabelle; Staker, Bart L; Van Voorhis, Wesley C; Varani, Gabriele; Buchko, Garry W; Stewart, Lance J; Myler, Peter J

    2011-09-01

    The Seattle Structural Genomics Center for Infectious Disease (SSGCID) is a consortium of researchers at Seattle BioMed, Emerald BioStructures, the University of Washington and Pacific Northwest National Laboratory that was established to apply structural genomics approaches to drug targets from infectious disease organisms. The SSGCID is currently funded over a five-year period by the National Institute of Allergy and Infectious Diseases (NIAID) to determine the three-dimensional structures of 400 proteins from a variety of Category A, B and C pathogens. Target selection engages the infectious disease research and drug-therapy communities to identify drug targets, essential enzymes, virulence factors and vaccine candidates of biomedical relevance to combat infectious diseases. The protein-expression systems, purified proteins, ligand screens and three-dimensional structures produced by SSGCID constitute a valuable resource for drug-discovery research, all of which is made freely available to the greater scientific community. This issue of Acta Crystallographica Section F, entirely devoted to the work of the SSGCID, covers the details of the high-throughput pipeline and presents a series of structures from a broad array of pathogenic organisms. Here, a background is provided on the structural genomics of infectious disease, the essential components of the SSGCID pipeline are discussed and a survey of progress to date is presented.

  6. Structural genomics plucks high-hanging membrane proteins.

    Science.gov (United States)

    Kloppmann, Edda; Punta, Marco; Rost, Burkhard

    2012-06-01

    Recent years have seen the establishment of structural genomics centers that explicitly target integral membrane proteins. Here, we review the advances in targeting these extremely high-hanging fruits of structural biology in high-throughput mode. We observe that the experimental determination of high-resolution structures of integral membrane proteins is increasingly successful both in terms of getting structures and of covering important protein families, for example, from Pfam. Structural genomics has begun to contribute significantly toward this progress. An important component of this contribution is the set up of robotic pipelines that generate a wealth of experimental data for membrane proteins. We argue that prediction methods for the identification of membrane regions and for the comparison of membrane proteins largely suffice to meet the challenges of target selection for structural genomics of membrane proteins. In contrast, we need better methods to prioritize the most promising members in a family of closely related proteins and to annotate protein function from sequence and structure in absence of homology. Copyright © 2012 Elsevier Ltd. All rights reserved.

  7. cDNA structure, genomic organization and expression patterns of ...

    African Journals Online (AJOL)

    Visfatin is a newly identified adipocytokine that is involved in various physiologic and pathologic processes of organisms. The cDNA structure, genomic organization and expression patterns of silver Prussian carp visfatin are described in this report. The silver Prussian carp visfatin cDNA cloned from the liver was ...

  8. Eukaryotic expression, purification and structure/function analysis of native, recombinant CRISP3 from human and mouse

    Science.gov (United States)

    Volpert, Marianna; Mangum, Jonathan E.; Jamsai, Duangporn; D'Sylva, Rebecca; O'Bryan, Moira K.; McIntyre, Peter

    2014-02-01

    While the Cysteine-Rich Secretory Proteins (CRISPs) have been broadly proposed as regulators of reproduction and immunity, physiological roles have yet to be established for individual members of this family. Past efforts to investigate their functions have been limited by the difficulty of purifying correctly folded CRISPs from bacterial expression systems, which yield low quantities of correctly folded protein containing the eight disulfide bonds that define the CRISP family. Here we report the expression and purification of native, glycosylated CRISP3 from human and mouse, expressed in HEK 293 cells and isolated using ion exchange and size exclusion chromatography. Functional authenticity was verified by substrate-affinity, native glycosylation characteristics and quaternary structure (monomer in solution). Validated protein was used in comparative structure/function studies to characterise sites and patterns of N-glycosylation in CRISP3, revealing interesting inter-species differences.

  9. Architecture of the 90S Pre-ribosome: A Structural View on the Birth of the Eukaryotic Ribosome.

    Science.gov (United States)

    Kornprobst, Markus; Turk, Martin; Kellner, Nikola; Cheng, Jingdong; Flemming, Dirk; Koš-Braun, Isabelle; Koš, Martin; Thoms, Matthias; Berninghausen, Otto; Beckmann, Roland; Hurt, Ed

    2016-07-14

    The 90S pre-ribosome is an early biogenesis intermediate formed during co-transcriptional ribosome formation, composed of ∼70 assembly factors and several small nucleolar RNAs (snoRNAs) that associate with nascent pre-rRNA. We report the cryo-EM structure of the Chaetomium thermophilum 90S pre-ribosome, revealing how a network of biogenesis factors including 19 β-propellers and large α-solenoid proteins engulfs the pre-rRNA. Within the 90S pre-ribosome, we identify the UTP-A, UTP-B, Mpp10-Imp3-Imp4, Bms1-Rcl1, and U3 snoRNP modules, which are organized around 5'-ETS and partially folded 18S rRNA. The U3 snoRNP is strategically positioned at the center of the 90S particle to perform its multiple tasks during pre-rRNA folding and processing. The architecture of the elusive 90S pre-ribosome gives unprecedented structural insight into the early steps of pre-rRNA maturation. Nascent rRNA that is co-transcriptionally folded and given a particular shape by encapsulation within a dedicated mold-like structure is reminiscent of how polypeptides use chaperone chambers for their protein folding. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

    DEFF Research Database (Denmark)

    Li, Yingrui; Zheng, Hancheng; Luo, Ruibang

    2011-01-01

    Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their preci...

  11. Defensins: antifungal lessons from eukaryotes

    Directory of Open Access Journals (Sweden)

    Patrícia M. Silva

    2014-03-01

    Over the last years, antimicrobial peptides (AMPs) have been the focus of intense research towards the finding of a viable alternative to current antifungal drugs. Defensins are one of the major families of AMPs and the most represented among all eukaryotic groups, providing an important first line of host defense against pathogenic microorganisms. Several of these cysteine-stabilized peptides present a relevant effect against fungi. Defensins are the AMPs with the broadest distribution across all eukaryotic kingdoms, namely, Fungi, Plantæ and Animalia, and were recently shown to have an ancestor in a bacterial organism. As a part of the host defense, defensins act as an important vehicle of information between the innate and adaptive immune systems and have a role in immunomodulation. This multidimensionality represents a powerful host shield, hard for microorganisms to overcome using single-approach resistance strategies. Resistance of pathogenic fungi to conventional antimycotic drugs is becoming a major problem. Defensins, like other AMPs, have been shown to be an effective alternative to the current antimycotic therapies, demonstrating potential as novel therapeutic agents or drug leads. In this review, we summarize the current knowledge on some eukaryotic defensins with antifungal action. An overview of the main targets in the fungal cell and the mechanism of action of these AMPs (namely, the selectivity for some fungal membrane components) is presented. Additionally, recent works on antifungal defensin structure, activity and cytotoxicity are also reviewed.

  12. The eukaryotic promoter database (EPD).

    Science.gov (United States)

    Périer, R C; Praz, V; Junier, T; Bonnard, C; Bucher, P

    2000-01-01

    The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes a description of the initiation site mapping data, exhaustive cross-references to the EMBL nucleotide sequence database, SWISS-PROT, TRANSFAC and other databases, as well as bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. WWW-based interfaces have been developed that enable the user to view EPD entries in different formats, to select and extract promoter sequences according to a variety of criteria, and to navigate to related databases exploiting different cross-references. The EPD web site also features yearly updated base frequency matrices for major eukaryotic promoter elements. EPD can be accessed at http://www.epd.isb-sib.ch

  13. TOPSAN: a dynamic web database for structural genomics.

    Science.gov (United States)

    Ellrott, Kyle; Zmasek, Christian M; Weekes, Dana; Sri Krishna, S; Bakolitsa, Constantina; Godzik, Adam; Wooley, John

    2011-01-01

    The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.

  14. The History of Bordetella pertussis Genome Evolution Includes Structural Rearrangement.

    Science.gov (United States)

    Weigand, Michael R; Peng, Yanhui; Loparev, Vladimir; Batra, Dhwani; Bowden, Katherine E; Burroughs, Mark; Cassiday, Pamela K; Davis, Jamie K; Johnson, Taccara; Juieng, Phalasy; Knipe, Kristen; Mathis, Marsenia H; Pruitt, Andrea M; Rowe, Lori; Sheth, Mili; Tondella, M Lucia; Williams, Margaret M

    2017-04-15

    Despite high pertussis vaccine coverage, reported cases of whooping cough (pertussis) have increased over the last decade in the United States and other developed countries. Although Bordetella pertussis is well known for its limited gene sequence variation, recent advances in long-read sequencing technology have begun to reveal genomic structural heterogeneity among otherwise indistinguishable isolates, even within geographically or temporally defined epidemics. We have compared rearrangements among complete genome assemblies from 257 B. pertussis isolates to examine the potential evolution of the chromosomal structure in a pathogen with minimal gene nucleotide sequence diversity. Discrete changes in gene order were identified that differentiated genomes from vaccine reference strains and clinical isolates of various genotypes, frequently along phylogenetic boundaries defined by single nucleotide polymorphisms. The observed rearrangements were primarily large inversions centered on the replication origin or terminus and flanked by IS481, a mobile genetic element with >240 copies per genome and previously suspected to mediate rearrangements and deletions by homologous recombination. These data illustrate that structural genome evolution in B. pertussis is not limited to reduction but also includes rearrangement. Therefore, although genomes of clinical isolates are structurally diverse, specific changes in gene order are conserved, perhaps due to positive selection, providing novel information for investigating disease resurgence and molecular epidemiology. IMPORTANCE Whooping cough, primarily caused by Bordetella pertussis, has resurged in the United States even though the coverage with pertussis-containing vaccines remains high. The rise in reported cases has included increased disease rates among all vaccinated age groups, provoking questions about the pathogen's evolution. The chromosome of B. pertussis includes a large number of repetitive mobile genetic

  15. Functional and Structural Overview of G-Protein-Coupled Receptors Comprehensively Obtained from Genome Sequences

    Directory of Open Access Journals (Sweden)

    Makiko Suwa

    2011-04-01

    An understanding of the functional mechanisms of G-protein-coupled receptors (GPCRs) is very important for GPCR-related drug design. We have developed an integrated GPCR database (SEVENS; http://sevens.cbrc.jp/) that includes 64,090 reliable GPCR genes comprehensively identified from 56 eukaryote genome sequences, and surveyed the sequence and structure spaces of the GPCRs. In vertebrates, the number of receptors for biological amines, peptides, etc. is conserved in most species, whereas the number of chemosensory receptors for odorants, pheromones, etc. differs significantly among species. The latter receptors tend to be of the single-exon or few-exon type and account for a high proportion of GPCRs, whereas some families, such as Class B and Class C receptors, have long lengths due to the presence of many exons. Statistical analyses of amino acid residues reveal that most of the conserved residues in Class A GPCRs are found in the cytoplasmic half regions of transmembrane (TM) helices, while residues characteristic of each subfamily are found in the extracellular half regions. Sixty-nine Protein Data Bank (PDB) entries of complete or fragmentary structures could be mapped onto the TM/loop regions of Class A GPCRs, covering 14 subfamilies.

  16. Structural effects of the Solanum steroids solasodine, diosgenin and solanine on human erythrocytes and molecular models of eukaryotic membranes.

    Science.gov (United States)

    Manrique-Moreno, Marcela; Londoño-Londoño, Julián; Jemioła-Rzemińska, Małgorzata; Strzałka, Kazimierz; Villena, Fernando; Avello, Marcia; Suwalsky, Mario

    2014-01-01

    This report presents evidence that the Solanum steroids solasodine, diosgenin and solanine interact with human erythrocytes and molecular models of their membranes as follows: a) X-ray diffraction studies showed that the compounds at low molar ratios (0.1-10.0 mol%) induced increasing structural perturbation to dimyristoylphosphatidylcholine bilayers and, to a considerably lower extent, to those of dimyristoylphosphatidylethanolamine; b) differential scanning calorimetry data showed that the compounds were able to alter the cooperativity of dimyristoylphosphatidylcholine, dimyristoylphosphatidylethanolamine and dimyristoylphosphatidylserine phase transitions in a concentration-dependent manner; c) in the presence of steroids, the fluorescence of Merocyanine 540 incorporated into the membranes decreased, suggesting a fluidization of the lipid system; d) scanning electron microscopy observations showed that all steroids altered the normal shape of human erythrocytes, inducing mainly echinocytosis, characterized by the formation of blebs on their surfaces, an indication that their molecules are located in the outer monolayer of the erythrocyte membrane.

  17. Structural insights into a unique Hsp70-Hsp40 interaction in the eukaryotic ribosome-associated complex.

    Science.gov (United States)

    Weyer, Felix Alexander; Gumiero, Andrea; Gesé, Genís Valentín; Lapouge, Karine; Sinning, Irmgard

    2017-02-01

    Cotranslational chaperones assist de novo folding of nascent polypeptides, prevent them from aggregating and modulate translation. The ribosome-associated complex (RAC) is unique in that the Hsp40 protein Zuo1 and the atypical Hsp70 chaperone Ssz1 form a stable heterodimer, which acts as a cochaperone for the Hsp70 chaperone Ssb. Here we present the structure of the Chaetomium thermophilum RAC core comprising Ssz1 and the Zuo1 N terminus. We show how the conserved allostery of Hsp70 proteins is abolished and this Hsp70-Hsp40 pair is molded into a functional unit. Zuo1 stabilizes Ssz1 in trans through interactions that in canonical Hsp70s occur in cis. Ssz1 is catalytically inert and cannot adopt the closed conformation, but the substrate binding domain β is completed by Zuo1. Our study offers insights into the coupling of a special Hsp70-Hsp40 pair, which evolved to link protein folding and translation.

  18. The Database of Genomic Variants: a curated collection of structural variation in the human genome.

    Science.gov (United States)

    MacDonald, Jeffrey R; Ziman, Robert; Yuen, Ryan K C; Feuk, Lars; Scherer, Stephen W

    2014-01-01

    Over the past decade, the Database of Genomic Variants (DGV; http://dgv.tcag.ca/) has provided a publicly accessible, comprehensive curated catalogue of structural variation (SV) found in the genomes of control individuals from worldwide populations. Here, we describe updates and new features, which have expanded the utility of DGV for both the basic research and clinical diagnostic communities. The current version of DGV consists of 55 published studies, comprising >2.5 million entries identified in >22,300 genomes. Studies included in DGV are selected from the accessioned data sets in the archival SV databases dbVar (NCBI) and DGVa (EBI), and then further curated for accuracy and validity. The core visualization tool (gbrowse) has been upgraded with additional functions to facilitate data analysis and comparison, and a new query tool has been developed to provide flexible and interactive access to the data. The content from DGV is regularly incorporated into other large-scale genome reference databases and represents a standard data resource for new product and database development, in particular for copy number variation testing in clinical labs. The accurate cataloguing of variants in DGV will continue to enable medical genetics and genome sequencing research.

  19. Atypical mitochondrial inheritance patterns in eukaryotes.

    Science.gov (United States)

    Breton, Sophie; Stewart, Donald T

    2015-10-01

    Mitochondrial DNA (mtDNA) is predominantly maternally inherited in eukaryotes. Diverse molecular mechanisms underlying the phenomenon of strict maternal inheritance (SMI) of mtDNA have been described, but the evolutionary forces responsible for its predominance in eukaryotes remain to be elucidated. Exceptions to SMI have been reported in diverse eukaryotic taxa, leading to the prediction that several distinct molecular mechanisms controlling mtDNA transmission are present among the eukaryotes. We propose that these mechanisms will be better understood by studying the deviations from the predominating pattern of SMI. This minireview summarizes studies on eukaryote species with unusual or rare mitochondrial inheritance patterns, i.e., other than the predominant SMI pattern, such as maternal inheritance of stable heteroplasmy, paternal leakage of mtDNA, biparental and strictly paternal inheritance, and doubly uniparental inheritance of mtDNA. The potential genes and mechanisms involved in controlling mitochondrial inheritance in these organisms are discussed. The linkage between mitochondrial inheritance and sex determination is also discussed, given that the atypical systems of mtDNA inheritance examined in this minireview are frequently found in organisms with uncommon sexual systems such as gynodioecy, monoecy, or andromonoecy. The potential of deviations from SMI for facilitating a better understanding of a number of fundamental questions in biology, such as the evolution of mtDNA inheritance, the coevolution of nuclear and mitochondrial genomes, and, perhaps, the role of mitochondria in sex determination, is considerable.

  20. Hierarchical structure analysis describing abnormal base composition of genomes

    Science.gov (United States)

    Ouyang, Zhengqing; Liu, Jian-Kun; She, Zhen-Su

    2005-10-01

    Abnormal base compositional patterns of genomic DNA sequences are studied in the framework of a hierarchical structure (HS) model originally proposed for the study of fully developed turbulence [She and Lévêque, Phys. Rev. Lett. 72, 336 (1994)]. The HS similarity law is verified over scales between 10^3 bp and 10^5 bp, and the HS parameter β is proposed to describe the degree of heterogeneity in the base composition patterns. More than one hundred bacteria, archaea, virus, yeast, and human genome sequences have been analyzed and the results show that the HS analysis efficiently captures abnormal base composition patterns, and the parameter β is a characteristic measure of the genome. Detailed examination of the values of β reveals an intriguing link to the evolutionary events of genetic material transfer. Finally, a sequence complexity (S) measure is proposed to characterize gradual increase of organizational complexity of the genome during the evolution. The present study raises several interesting issues in the evolutionary history of genomes.

  1. Applications of NMR to structure-based drug design in structural genomics.

    Science.gov (United States)

    Powers, Robert

    2002-01-01

    Structural genomics is poised to have a tremendous impact on traditional structure-based drug design programs. As a result, there is a growing need to obtain rapid structural information in a reliable form that is amenable to rational drug design. In this manner, NMR has been expanding and evolving its role in aiding the design process. A variety of NMR methodologies that cover a range of inherent resolution are described in the context of structure-based drug design in the era of structural genomics.

  2. Cryoelectron Microscopic Structures of Eukaryotic Translation Termination Complexes Containing eRF1-eRF3 or eRF1-ABCE1

    Directory of Open Access Journals (Sweden)

    Anne Preis

    2014-07-01

    Termination and ribosome recycling are essential processes in translation. In eukaryotes, a stop codon in the ribosomal A site is decoded by a ternary complex consisting of release factors eRF1 and guanosine triphosphate (GTP)-bound eRF3. After GTP hydrolysis, eRF3 dissociates, and ABCE1 can bind to eRF1-loaded ribosomes to stimulate peptide release and ribosomal subunit dissociation. Here, we present cryoelectron microscopic (cryo-EM) structures of a pretermination complex containing eRF1-eRF3 and a termination/prerecycling complex containing eRF1-ABCE1. eRF1 undergoes drastic conformational changes: its central domain harboring the catalytically important GGQ loop is either packed against eRF3 or swung toward the peptidyl transferase center when bound to ABCE1. Additionally, in complex with eRF3, the N-terminal domain of eRF1 positions the conserved NIKS motif proximal to the stop codon, supporting its suggested role in decoding, yet it appears to be delocalized in the presence of ABCE1. These results suggest that stop codon decoding and peptide release can be uncoupled during termination.

  3. Comprehensive analysis of the numbers, lengths and amino acid compositions of transmembrane helices in prokaryotic, eukaryotic and viral integral membrane proteins of high-resolution structure.

    Science.gov (United States)

    Saidijam, Massoud; Azizpour, Sonia; Patching, Simon G

    2017-02-15

    We report a comprehensive analysis of the numbers, lengths and amino acid compositions of transmembrane helices in 235 high-resolution structures of integral membrane proteins. The properties of 1551 transmembrane helices in the structures were compared with those obtained by analysis of the same amino acid sequences using topology prediction tools. Explanations for the 81 (5.2%) missing or additional transmembrane helices in the prediction results were identified. Main reasons for missing transmembrane helices were mis-identification of N-terminal signal peptides, breaks in α-helix conformation or charged residues in the middle of transmembrane helices and transmembrane helices with unusual amino acid composition. The main reason for additional transmembrane helices was mis-identification of amphipathic helices, extramembrane helices or hairpin re-entrant loops. Transmembrane helix length had an overall median of 24 residues and an average of 24.9 ± 7.0 residues and the most common length was 23 residues. The overall content of residues in transmembrane helices as a percentage of the full proteins had a median of 56.8% and an average of 55.7 ± 16.0%. Amino acid composition was analysed for the full proteins, transmembrane helices and extramembrane regions. Individual proteins or types of proteins with transmembrane helices containing extremes in contents of individual amino acids or combinations of amino acids with similar physicochemical properties were identified and linked to structure and/or function. In addition to overall median and average values, all results were analysed for proteins originating from different types of organism (prokaryotic, eukaryotic, viral) and for subgroups of receptors, channels, transporters and others.

  4. The Evolution of Genome Structure by Natural and Sexual Selection.

    Science.gov (United States)

    Kirkpatrick, Mark

    2017-01-01

    Progress on understanding how genome structure evolves is accelerating with the arrival of new genomic, comparative, and theoretical approaches. This article reviews progress in understanding how chromosome inversions and sex chromosomes evolve, and how their evolution affects species' ecology. Analyses of clines in inversion frequencies in flies and mosquitoes imply strong local adaptation, and roles for both overdominant and underdominant selection. Those results are consistent with the hypothesis that inversions become established when they capture locally adapted alleles. Inversions can carry alleles that are beneficial to closely related species, causing them to introgress following hybridization. Models show that this "adaptive cassette" scenario can trigger large range expansions, as recently happened in malaria mosquitoes. Sex chromosomes are the most rapidly evolving genome regions of some taxa. Sexually antagonistic selection may be the key force driving transitions of sex determination between different pairs of chromosomes and between XY and ZW systems. Fusions between sex chromosomes and autosomes most often involve the Y chromosome, a pattern that can be explained if fusions are mildly deleterious and fix by drift. Sexually antagonistic selection is one of several hypotheses to explain the recent discovery that the sex determination system has strong effects on the adult sex ratios of tetrapods. The emerging view of how genome structure evolves invokes a much richer constellation of forces than was envisioned during the Golden Age of research on Drosophila karyotypes.

  5. Genetic linkage map of a wild genome: genomic structure, recombination and sexual dimorphism in bighorn sheep

    Science.gov (United States)

    2010-01-01

    Background: The construction of genetic linkage maps in free-living populations is a promising tool for the study of evolution. However, such maps are rare because it is difficult to develop both wild pedigrees and corresponding sets of molecular markers that are sufficiently large. We took advantage of two long-term field studies of pedigreed individuals and genomic resources originally developed for domestic sheep (Ovis aries) to construct a linkage map for bighorn sheep, Ovis canadensis. We then assessed variability in genomic structure and recombination rates between bighorn sheep populations and sheep species. Results: Bighorn sheep population-specific maps differed slightly in contiguity but were otherwise very similar in terms of genomic structure and recombination rates. The joint analysis of the two pedigrees resulted in a highly contiguous map composed of 247 microsatellite markers distributed along all 26 autosomes and the X chromosome. The map is estimated to cover about 84% of the bighorn sheep genome and contains 240 unique positions spanning a sex-averaged distance of 3051 cM with an average inter-marker distance of 14.3 cM. Marker synteny, order, sex-averaged interval lengths and sex-averaged total map lengths were all very similar between sheep species. However, in contrast to domestic sheep, but consistent with the usual pattern for a placental mammal, recombination rates in bighorn sheep were significantly greater in females than in males (~12% difference), resulting in an autosomal female map of 3166 cM and an autosomal male map of 2831 cM. Despite differing genome-wide patterns of heterochiasmy between the sheep species, sexual dimorphism in recombination rates was correlated between orthologous intervals. Conclusions: We have developed a first-generation bighorn sheep linkage map that will facilitate future studies of the genetic architecture of trait variation in this species. While domestication has been hypothesized to be responsible for the
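
    The map statistics quoted in this record can be checked with a few lines of arithmetic. The sketch below is not taken from the paper: it assumes that the average inter-marker distance is the total map length divided by the number of intervals within linkage groups (unique positions minus the 27 linkage groups, i.e. 26 autosomes plus X), and that the "~12% difference" is expressed relative to the male autosomal map. The function names are hypothetical. Under these assumptions the quoted 14.3 cM and ~12% figures are reproduced.

```python
def mean_intermarker_distance(total_cm, n_positions, n_linkage_groups):
    """Average spacing in cM, assuming intervals are counted within each
    linkage group (each group with k positions contributes k - 1 intervals)."""
    n_intervals = n_positions - n_linkage_groups
    if n_intervals <= 0:
        raise ValueError("need more positions than linkage groups")
    return total_cm / n_intervals


def relative_excess(female_cm, male_cm):
    """Female map length excess as a fraction of the male map length
    (the choice of the male map as baseline is an assumption)."""
    return (female_cm - male_cm) / male_cm


# Values quoted in the abstract: 240 unique positions, 3051 cM sex-averaged,
# 26 autosomes + X = 27 linkage groups; autosomal maps of 3166 cM and 2831 cM.
spacing = mean_intermarker_distance(3051, 240, 27)
excess = relative_excess(3166, 2831)
print(f"{spacing:.1f} cM average spacing, {100 * excess:.1f}% female excess")
```

    Dividing by the female map instead would give roughly 10.6%, so the male baseline appears to be the closer match to the rounded value reported.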

  6. Multiple Origins of Eukaryotic cox15 Suggest Horizontal Gene Transfer from Bacteria to Jakobid Mitochondrial DNA.

    Science.gov (United States)

    He, Ding; Fu, Cheng-Jie; Baldauf, Sandra L

    2016-01-01

    The most gene-rich and bacterial-like mitochondrial genomes known are those of Jakobida (Excavata). Of these, the most extreme example to date is the Andalucia godoyi mitochondrial DNA (mtDNA), including a cox15 gene encoding the respiratory enzyme heme A synthase (HAS), which is nuclear-encoded in nearly all other mitochondriate eukaryotes. Thus cox15 in eukaryotes appears to be a classic example of mitochondrion-to-nucleus (endosymbiotic) gene transfer, with A. godoyi uniquely retaining the ancestral state. However, our analyses reveal two highly distinct HAS types (encoded by cox15-1 and cox15-2 genes) and identify A. godoyi mitochondrial cox15-encoded HAS as type-1 and all other eukaryotic cox15-encoded HAS as type-2. Molecular phylogeny places the two HAS types in widely separated clades with eukaryotic type-2 HAS clustering with the bulk of α-proteobacteria (>670 sequences), whereas A. godoyi type-1 HAS clusters with an eclectic set of bacteria and archaea including two α-proteobacteria missing from the type-2 clade. This wide phylogenetic separation of the two HAS types is reinforced by unique features of their predicted protein structures. Meanwhile, RNA-sequencing and genomic analyses fail to detect either cox15 type in the nuclear genome of any jakobid including A. godoyi. This suggests that not only is cox15-1 a relatively recent acquisition unique to the Andalucia lineage but also the jakobid last common ancestor probably lacked both cox15 types. These results indicate that uptake of foreign genes by mtDNA is more taxonomically widespread than previously thought. They also caution against the assumption that all α-proteobacterial-like features of eukaryotes are ancient remnants of endosymbiosis.

  7. Target Selection and Determination of Function in Structural Genomics

    Science.gov (United States)

    Watson, James D.; Todd, Annabel E.; Bray, James; Laskowski, Roman A.; Edwards, Aled; Joachimiak, Andrzej; Orengo, Christine A.; Thornton, Janet M.

    2011-01-01

    Summary The first crucial step in any structural genomics project is the selection and prioritization of target proteins for structure determination. There may be a number of selection criteria to be satisfied, including that the proteins have novel folds, that they be representatives of large families for which no structure is known, and so on. The better the selection at this stage, the greater is the value of the structures obtained at the end of the experimental process. This value can be further enhanced once the protein structures have been solved if the functions of the given proteins can also be determined. Here we describe the methods used at either end of the experimental process: firstly, sensitive sequence comparison techniques for selecting a high-quality list of target proteins, and secondly the various computational methods that can be applied to the eventual 3D structures to determine the most likely biochemical function of the proteins in question. PMID:12880206

  8. Chromatin structure and evolution in the human genome

    Directory of Open Access Journals (Sweden)

    Dunlop Malcolm G

    2007-05-01

    Background: Evolutionary rates are not constant across the human genome but genes in close proximity have been shown to experience similar levels of divergence and selection. The higher-order organisation of chromosomes has often been invoked to explain such phenomena but previously there has been insufficient data on chromosome structure to investigate this rigorously. Using the results of a recent genome-wide analysis of open and closed human chromatin structures we have investigated the global association between divergence, selection and chromatin structure for the first time. Results: In this study we have shown that, paradoxically, synonymous site divergence (dS) at non-CpG sites is highest in regions of open chromatin, primarily as a result of an increased number of transitions, while the rates of other traditional measures of mutation (intergenic, intronic and ancient repeat divergence as well as SNP density) are highest in closed regions of the genome. Analysis of human-chimpanzee divergence across intron-exon boundaries indicates that although genes in relatively open chromatin generally display little selection at their synonymous sites, those in closed regions show markedly lower divergence at their fourfold degenerate sites than in neighbouring introns and intergenic regions. Exclusion of known Exonic Splice Enhancer hexamers has little effect on the divergence observed at fourfold degenerate sites across chromatin categories; however, we show that closed chromatin is enriched with certain classes of ncRNA genes whose RNA secondary structure may be particularly important. Conclusion: We conclude that, overall, non-CpG mutation rates are lowest in open regions of the genome and that regions of the genome with a closed chromatin structure have the highest background mutation rate. This might reflect lower rates of DNA damage or enhanced DNA repair processes in regions of open chromatin. Our results also indicate that dS is a poor

  9. Structural genomics studies of human caries pathogen Streptococcus mutans.

    Science.gov (United States)

    Li, Lanfen; Nan, Jie; Li, Dan; Brostromer, Erik; Wang, Zixi; Liu, Cong; Hou, Qiaoming; Fan, Xuexin; Ye, Zhaoyang; Su, Xiao-Dong

    2014-09-01

    The Gram-positive bacterium Streptococcus mutans is the primary causative agent of human dental caries. To better understand this pathogen at the atomic structure level and to establish potential drug and vaccine targets, we have carried out structural genomics research since 2005. To achieve this goal, we have developed various in-house automation systems including novel high-throughput crystallization equipment and methods, based on which a large-scale, high-efficiency and low-cost platform has been established in our laboratory. From a total of 1,963 annotated open reading frames, 1,391 non-membrane targets were selected, prioritized by protein sequence similarities to unknown structures, and clustered by restriction sites to allow for cost-effective high-throughput conventional cloning. Selected proteins were over-expressed in different strains of Escherichia coli. Clones expressing soluble proteins were selected and expanded, and the expressed proteins were purified and subjected to crystallization trials. Finally, protein crystals were subjected to X-ray analysis and structures were determined by crystallographic methods. Using the previously established procedures, we have so far obtained more than 200 kinds of protein crystals and 100 kinds of crystal structures involved in different biological pathways. In this paper we demonstrate and review the feasibility of performing structural genomics studies at a moderate laboratory scale. Furthermore, the techniques and methods developed in our study can be widely applied to conventional structural biology research practice.

  10. Assessing the accuracy of template-based structure prediction metaservers by comparison with structural genomics structures.

    Science.gov (United States)

    Gront, Dominik; Grabowski, Marek; Zimmerman, Matthew D; Raynor, John; Tkaczuk, Karolina L; Minor, Wladek

    2012-12-01

    The explosion of the size of the universe of known protein sequences has stimulated two complementary approaches to structural mapping of these sequences: theoretical structure prediction and experimental determination by structural genomics (SG). In this work, we assess the accuracy of structure prediction by two automated template-based structure prediction metaservers (genesilico.pl and bioinfo.pl) by measuring the structural similarity of the predicted models to corresponding experimental models determined a posteriori. Of 199 targets chosen from SG programs, the metaservers predicted the structures of about a fourth of them "correctly." (In this case, "correct" was defined as placing more than 70 % of the alpha carbon atoms in the model within 2 Å of the experimentally determined positions.) Almost all of the targets that could be modeled to this accuracy were those with an available template in the Protein Data Bank (PDB) with more than 25 % sequence identity. The majority of those SG targets with lower sequence identity to structures in the PDB were not predicted by the metaservers with this accuracy. We also compared metaserver results to CASP8 results, finding that the models obtained by participants in the CASP competition were significantly better than those produced by the metaservers.
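
    The "correct" criterion quoted in this record (more than 70 % of alpha carbon atoms within 2 Å of the experimental positions) can be expressed as a short check. The sketch below is illustrative only and is not the authors' evaluation code: it assumes the model and experimental C-alpha coordinates are already superimposed and listed in the same residue order, treats a distance of exactly 2 Å as "within", and uses hypothetical function names.

```python
import math


def fraction_within_cutoff(model_ca, experimental_ca, cutoff=2.0):
    """Fraction of paired C-alpha atoms no farther apart than `cutoff` (in Å).

    Both inputs are sequences of (x, y, z) tuples, assumed to be already
    superimposed and in the same residue order.
    """
    if not model_ca or len(model_ca) != len(experimental_ca):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    n_close = sum(
        1 for m, e in zip(model_ca, experimental_ca) if math.dist(m, e) <= cutoff
    )
    return n_close / len(model_ca)


def is_correct(model_ca, experimental_ca, cutoff=2.0, min_fraction=0.70):
    """Apply the abstract's criterion: strictly more than 70 % within 2 Å."""
    return fraction_within_cutoff(model_ca, experimental_ca, cutoff) > min_fraction


# Toy coordinates (made up for illustration): three of four atoms lie 1 Å away.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
experiment = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0), (3.0, 0.0, 5.0)]
print(fraction_within_cutoff(model, experiment), is_correct(model, experiment))
```

    In practice the superposition step matters as much as the threshold; this sketch deliberately leaves it out and only covers the counting.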

  11. Structural genomic variation in childhood epilepsies with complex phenotypes

    DEFF Research Database (Denmark)

    Helbig, Ingo; Swinkels, Marielle E M; Aten, Emmelien

    2014-01-01

    of CNVs in patients with unclassified epilepsies and complex phenotypes. A total of 222 patients from three European countries, including patients with structural lesions on magnetic resonance imaging (MRI), dysmorphic features, and multiple congenital anomalies, were clinically evaluated and screened.......9%). Segregation of all identified variants could be assessed in 42 patients, 11 of which were de novo. The frequency of all structural variants and de novo variants was not statistically different between patients with or without MRI abnormalities or MRI subcategories. Patients with dysmorphic features were more...... likely to carry a rare CNV. Genome-wide screening methods for rare CNVs may provide clues for the genetic etiology in patients with a broader range of epilepsies than previously anticipated, including in patients with various brain anomalies detectable by MRI. Performing genome-wide screens for rare CNVs...

  12. Norovirus translation requires an interaction between the C Terminus of the genome-linked viral protein VPg and eukaryotic translation initiation factor 4G.

    Science.gov (United States)

    Chung, Liliane; Bailey, Dalan; Leen, Eoin N; Emmott, Edward P; Chaudhry, Yasmin; Roberts, Lisa O; Curry, Stephen; Locker, Nicolas; Goodfellow, Ian G

    2014-08-01

    Viruses have evolved a variety of mechanisms to usurp the host cell translation machinery to enable translation of the viral genome in the presence of high levels of cellular mRNAs. Noroviruses, a major cause of gastroenteritis in man, have evolved a mechanism that relies on the interaction of translation initiation factors with the virus-encoded VPg protein covalently linked to the 5' end of the viral RNA. To further characterize this novel mechanism of translation initiation, we have used proteomics to identify the components of the norovirus translation initiation factor complex. This approach revealed that VPg binds directly to the eIF4F complex, with a high affinity interaction occurring between VPg and eIF4G. Mutational analyses indicated that the C-terminal region of VPg is important for the VPg-eIF4G interaction; viruses with mutations that alter or disrupt this interaction are debilitated or non-viable. Our results shed new light on the unusual mechanisms of protein-directed translation initiation. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.

  13. Comparative Analysis of the 15.5kD Box C/D snoRNP Core Protein in the Primitive Eukaryote Giardia lamblia Reveals Unique Structural and Functional Features

    Energy Technology Data Exchange (ETDEWEB)

    Biswas, Shyamasri; Buhrman, Greg; Gagnon, Keith; Mattos, Carla; Brown, II, Bernard A.; Maxwell, E. Stuart (NCSU); (UTSMC)

    2012-07-11

    Box C/D ribonucleoproteins (RNP) guide the 2'-O-methylation of targeted nucleotides in archaeal and eukaryotic rRNAs. The archaeal L7Ae and eukaryotic 15.5kD box C/D RNP core protein homologues initiate RNP assembly by recognizing kink-turn (K-turn) motifs. The crystal structure of the 15.5kD core protein from the primitive eukaryote Giardia lamblia is described here to a resolution of 1.8 Å. The Giardia 15.5kD protein exhibits the typical α-β-α sandwich fold exhibited by both archaeal L7Ae and eukaryotic 15.5kD proteins. Characteristic of eukaryotic homologues, the Giardia 15.5kD protein binds the K-turn motif but not the variant K-loop motif. The highly conserved residues of loop 9, critical for RNA binding, also exhibit conformations similar to those of the human 15.5kD protein when bound to the K-turn motif. However, comparative sequence analysis indicated a distinct evolutionary position between Archaea and Eukarya. Indeed, assessment of the Giardia 15.5kD protein in denaturing experiments demonstrated an intermediate stability in protein structure when compared with that of the eukaryotic mouse 15.5kD and archaeal Methanocaldococcus jannaschii L7Ae proteins. Most notable was the ability of the Giardia 15.5kD protein to assemble in vitro a catalytically active chimeric box C/D RNP utilizing the archaeal M. jannaschii Nop56/58 and fibrillarin core proteins. In contrast, a catalytically competent chimeric RNP could not be assembled using the mouse 15.5kD protein. Collectively, these analyses suggest that the G. lamblia 15.5kD protein occupies a unique position in the evolution of this box C/D RNP core protein retaining structural and functional features characteristic of both archaeal L7Ae and higher eukaryotic 15.5kD homologues.

  14. Elucidation of Operon Structures across Closely Related Bacterial Genomes

    Science.gov (United States)

    Li, Guojun

    2014-01-01

    About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. With the evolution of genomes, operon structures are undergoing changes which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG) and a pair of nodes will be connected if any two genes, from the corresponding two OGGs respectively, are located in the same operon as immediate neighbors in any of the considered genomes. Through identifying the connected components in the above graph, we found that genes in a connected component are likely to be functionally related and these identified components tend to form treelike topology, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation as follows. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional roles of the central node of this component, such as the ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components. PMID:24959722
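
    The graph construction described in the abstract above can be sketched as follows: nodes are orthologous gene groups (OGGs), an edge joins two OGGs whenever their genes are immediate neighbors in the same operon in any genome, and connected components are then extracted. This is a minimal illustration under stated assumptions, not the authors' implementation. The input layout (operons as ordered gene lists pooled over all genomes, plus a gene-to-OGG dictionary) and all names are hypothetical.

```python
from collections import defaultdict

def build_ogg_graph(operons, gene_to_ogg):
    """Build an undirected OGG graph from operons pooled over all genomes.

    Assumption: each operon is an ordered list of gene identifiers, and
    gene_to_ogg maps every gene identifier to its OGG label.
    """
    graph = defaultdict(set)
    for operon in operons:
        for gene in operon:
            graph[gene_to_ogg[gene]]  # make sure every OGG appears as a node
        for left, right in zip(operon, operon[1:]):
            a, b = gene_to_ogg[left], gene_to_ogg[right]
            if a != b:  # skip self-loops between paralogs in one OGG
                graph[a].add(b)
                graph[b].add(a)
    return dict(graph)

def connected_components(graph):
    """Return the connected components as a list of node sets (iterative DFS)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components

# Toy input: two genomes sharing OGGs B and C, plus an unrelated operon.
toy_operons = [["a1", "b1", "c1"], ["b2", "c2"], ["x1", "y1"]]
toy_gene_to_ogg = {"a1": "A", "b1": "B", "c1": "C",
                   "b2": "B", "c2": "C", "x1": "X", "y1": "Y"}
toy_components = connected_components(build_ogg_graph(toy_operons, toy_gene_to_ogg))
```

    Classifying each component as a path or a star, as the abstract goes on to do, would additionally require inspecting node degrees within the component.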

  15. Precambrian Skeletonized Microbial Eukaryotes

    Science.gov (United States)

    Lipps, Jere H.

    2017-04-01

    Skeletal heterotrophic eukaryotes are mostly absent from the Precambrian, although algal eukaryotes appear about 2.2 billion years ago. Tintinnids, radiolaria and foraminifera have molecular origins well back into the Precambrian yet no representatives of these groups are known with certainty in that time. These data infer times of the last common ancestors, not the appearance of true representatives of these groups which may well have diversified or not been preserved since those splits. Previous reports of these groups in the Precambrian are misinterpretations of other objects in the fossil record. Reported tintinnids at 1600 mya from China are metamorphic shards or mineral artifacts, the many specimens from 635-715 mya in Mongolia may be eukaryotes but they are not tintinnids, and the putative tintinnids at 580 mya in the Doushantuo formation of China are diagenetic alterations of well-known acritarchs. The identification of the oldest supposed foraminiferan, Titanotheca, from 550 to 565 mya rocks in South America and Africa, is based on the occurrence of rutile in the tests and in a few modern agglutinated foraminifera, as well as the agglutinated tests. Neither of these nor the morphology are characteristic of foraminifera; hence these fossils remain as indeterminate microfossils. Platysolenites, an agglutinated tube identical to the modern foraminiferan Bathysiphon, occurs in the latest Neoproterozoic in Russia, Canada, and the USA (California). Some of the larger fossils occurring in typical Ediacaran (late Neoproterozoic) assemblages may be xenophyophorids (very large foraminifera), but the comparison is disputed and flawed. Radiolaria, on occasion, have been reported in the Precambrian, but the earliest known clearly identifiable ones are in the Cambrian. The only certain Precambrian heterotrophic skeletal eukaryotes (thecamoebians) occur in fresh-water rocks at about 750 mya. Skeletonized radiolaria and foraminifera appear sparsely in the Cambrian and radiate in the Ordovician

  16. Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species

    Directory of Open Access Journals (Sweden)

    Kurita Manabu

    2008-06-01

    Background: The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis, have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. Results: The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. Conclusion: The observed differences in genomic structure between C. japonica and

  17. TOPSAN: a collaborative annotation environment for structural genomics.

    Science.gov (United States)

    Weekes, Dana; Krishna, S Sri; Bakolitsa, Constantina; Wilson, Ian A; Godzik, Adam; Wooley, John

    2010-08-17

    Many protein structures determined in high-throughput structural genomics centers, despite their significant novelty and importance, are available only as PDB depositions and are not accompanied by a peer-reviewed manuscript. Because of this they are not accessible by the standard tools of literature searches, remaining underutilized by the broad biological community. To address this issue we have developed TOPSAN, The Open Protein Structure Annotation Network, a web-based platform that combines the openness of the wiki model with the quality control of scientific communication. TOPSAN enables research collaborations and scientific dialogue among globally distributed participants, the results of which are reviewed by experts and eventually validated by peer review. The immediate goal of TOPSAN is to harness the combined experience, knowledge, and data from such collaborations in order to enhance the impact of the astonishing number and diversity of structures being determined by structural genomics centers and high-throughput structural biology. TOPSAN combines features of automated annotation databases and formal, peer-reviewed scientific research literature, providing an ideal vehicle to bridge a gap between rapidly accumulating data from high-throughput technologies and a much slower pace for its analysis and integration with other, relevant research.

  18. TOPSAN: a collaborative annotation environment for structural genomics

    Directory of Open Access Journals (Sweden)

    Weekes Dana

    2010-08-01

    Background: Many protein structures determined in high-throughput structural genomics centers, despite their significant novelty and importance, are available only as PDB depositions and are not accompanied by a peer-reviewed manuscript. Because of this they are not accessible by the standard tools of literature searches, remaining underutilized by the broad biological community. Results: To address this issue we have developed TOPSAN, The Open Protein Structure Annotation Network, a web-based platform that combines the openness of the wiki model with the quality control of scientific communication. TOPSAN enables research collaborations and scientific dialogue among globally distributed participants, the results of which are reviewed by experts and eventually validated by peer review. The immediate goal of TOPSAN is to harness the combined experience, knowledge, and data from such collaborations in order to enhance the impact of the astonishing number and diversity of structures being determined by structural genomics centers and high-throughput structural biology. Conclusions: TOPSAN combines features of automated annotation databases and formal, peer-reviewed scientific research literature, providing an ideal vehicle to bridge a gap between rapidly accumulating data from high-throughput technologies and a much slower pace for its analysis and integration with other, relevant research.

  19. The major architects of chromatin: architectural proteins in bacteria, archaea and eukaryotes.

    Science.gov (United States)

    Luijsterburg, Martijn S; White, Malcolm F; van Driel, Roel; Dame, Remus Th

    2008-01-01

    The genomic DNA of all organisms across the three kingdoms of life needs to be compacted and functionally organized. Key players in these processes are DNA supercoiling, macromolecular crowding and architectural proteins that shape DNA by binding to it. The architectural proteins in bacteria, archaea and eukaryotes generally do not exhibit sequence or structural conservation especially across kingdoms. Instead, we propose that they are functionally conserved. Most of these proteins can be classified according to their architectural mode of action: bending, wrapping or bridging DNA. In order for DNA transactions to occur within a compact chromatin context, genome organization cannot be static. Indeed chromosomes are subject to a whole range of remodeling mechanisms. In this review, we discuss the role of (i) DNA supercoiling, (ii) macromolecular crowding and (iii) architectural proteins in genome organization, as well as (iv) mechanisms used to remodel chromosome structure and to modulate genomic activity. We conclude that the underlying mechanisms that shape and remodel genomes are remarkably similar among bacteria, archaea and eukaryotes.

  20. The TB Structural Genomics Consortium: a decade of progress.

    Science.gov (United States)

    Chim, Nicholas; Habel, Jeff E; Johnston, Jodie M; Krieger, Inna; Miallau, Linda; Sankaranarayanan, Ramasamy; Morse, Robert P; Bruning, John; Swanson, Stephanie; Kim, Haelee; Kim, Chang-Yub; Li, Hongye; Bulloch, Esther M; Payne, Richard J; Manos-Turvey, Alexandra; Hung, Li-Wei; Baker, Edward N; Lott, J Shaun; James, Michael N G; Terwilliger, Thomas C; Eisenberg, David S; Sacchettini, James C; Goulding, Celia W

    2011-03-01

    The TB Structural Genomics Consortium is a worldwide organization of collaborators whose mission is the comprehensive structural determination and analyses of Mycobacterium tuberculosis proteins to ultimately aid in tuberculosis diagnosis and treatment. Congruent to the overall vision, Consortium members have additionally established an integrated facilities core to streamline M. tuberculosis structural biology and developed bioinformatics resources for data mining. This review aims to share the latest Consortium developments with the TB community, including recent structures of proteins that play significant roles within M. tuberculosis. Atomic resolution details may unravel mechanistic insights and reveal unique and novel protein features, as well as important protein-protein and protein-ligand interactions, which ultimately lead to a better understanding of M. tuberculosis biology and may be exploited for rational, structure-based therapeutics design. Copyright © 2010 Elsevier Ltd. All rights reserved.

  1. Target selection for structural genomics of infectious diseases.

    Science.gov (United States)

    Yeats, Corin; Dessailly, Benoit H; Glass, Elizabeth M; Fremont, Daved H; Orengo, Christine A

    2014-01-01

    This chapter describes the protocols used to identify, filter, and annotate potential protein targets from an organism associated with infectious diseases. Protocols often combine computational approaches for mining information in public databases or for checking whether the protein has already been targeted for structure determination, with manual strategies that examine the literature for information on the biological role of the protein or the experimental strategies that explore the effects of knocking out the protein. Publicly available computational tools have been cited as much as possible. Where these do not exist, the concepts underlying in-house tools developed for the Center for Structural Genomics of Infectious Diseases have been described.

  2. Anaerobic energy metabolism in unicellular photosynthetic eukaryotes.

    Science.gov (United States)

    Atteia, Ariane; van Lis, Robert; Tielens, Aloysius G M; Martin, William F

    2013-02-01

    Anaerobic metabolic pathways allow unicellular organisms to tolerate or colonize anoxic environments. Over the past ten years, genome sequencing projects have brought a new light on the extent of anaerobic metabolism in eukaryotes. A surprising development has been that free-living unicellular algae capable of a photoautotrophic lifestyle are, in terms of their enzymatic repertoire, among the best equipped eukaryotes known when it comes to anaerobic energy metabolism. Some of these algae are marine organisms, common in the oceans, others are more typically soil inhabitants. All these species are important from the ecological (O₂/CO₂ budget), biotechnological, and evolutionary perspectives. In the unicellular algae surveyed here, mixed-acid type fermentations are widespread while anaerobic respiration, which is more typical of eukaryotic heterotrophs, appears to be rare. The presence of a core anaerobic metabolism among the algae provides insights into its evolutionary origin, which traces to the eukaryote common ancestor. The predicted fermentative enzymes often exhibit an amino acid extension at the N-terminus, suggesting that these proteins might be compartmentalized in the cell, likely in the chloroplast or the mitochondrion. The green algae Chlamydomonas reinhardtii and Chlorella NC64 have the most extended set of fermentative enzymes reported so far. Among the eukaryotes with secondary plastids, the diatom Thalassiosira pseudonana has the most pronounced anaerobic capabilities as yet. From the standpoints of genomic, transcriptomic, and biochemical studies, anaerobic energy metabolism in C. reinhardtii remains the best characterized among photosynthetic protists. This article is part of a Special Issue entitled: The evolutionary aspects of bioenergetic systems. Copyright © 2012 Elsevier B.V. All rights reserved.

  3. Fragment Screening of Infectious Disease Targets in a Structural Genomics Environment

    OpenAIRE

    Begley, Darren W; Davies, Douglas R; Hartley, Robert; Edwards, Thomas E; Staker, Bart L; Van Voorhis, Wesley C; Myler, Peter J; Stewart, Lance J

    2011-01-01

    Structural genomics efforts have traditionally focused on generating single protein structures of unique and diverse targets. However, a lone structure for a given target is often insufficient to firmly assign function or to drive drug discovery. As part of the Seattle Structural Genomics Center for Infectious Disease, we seek to expand the focus of structural genomics by elucidating ensembles of structures that examine small molecule-protein interactions for selected infectious disease targe...

  4. Recombination in circulating Human enterovirus B: independent evolution of structural and non-structural genome regions.

    Science.gov (United States)

    Lukashev, Alexander N; Lashkevich, Vasilii A; Ivanova, Olga E; Koroleva, Galina A; Hinkkanen, Ari E; Ilonen, Jorma

    2005-12-01

    The complete nucleotide sequences of eight Human enterovirus B (HEV-B) strains were determined, representing five serotypes, E6, E7, E11, CVB3 and CVB5, which were isolated in the former Soviet Union between 1998 and 2002. All strains were mosaic recombinants and only the VP2-VP3-VP1 genome region was similar to that of the corresponding prototype HEV-B strains. In seven of the eight strains studied, the 2C-3D genome region was most similar to the prototype E30, EV74 and EV75 strains, whilst the remaining strain was most similar to the prototype E1 and E9 strains in the non-structural protein genome region. Most viruses also bore marks of additional recombination events in this part of the genome. In the 5' non-translated region, all strains were more similar to the prototype E9 than to other enteroviruses. In most cases, recombination mapped to the VP4 and 2ABC genome regions. This, together with the star-like topology of the phylogenetic trees for these genome regions, identified these genome parts as recombination hot spots. These findings further support the concept of independent evolution of enterovirus genome fragments and indicate a requirement for more advanced typing approaches. A range of available phylogenetic methods was also compared for efficient detection of recombination in enteroviruses.

  5. TSTMP: target selection for structural genomics of human transmembrane proteins.

    Science.gov (United States)

    Varga, Julia; Dobson, László; Reményi, István; Tusnády, Gábor E

    2017-01-04

    The TSTMP database is designed to help the target selection of human transmembrane proteins for structural genomics projects and structure modeling studies. Currently, there are only 60 known 3D structures among the polytopic human transmembrane proteins and about a further 600 could be modeled using existing structures. Although there are a great number of human transmembrane protein structures left to be determined, surprisingly only a small fraction of these proteins have 'selected' (or above) status according to the current version of the TargetDB/TargetTrack database. This figure is even worse regarding those transmembrane proteins that would contribute the most to the structural coverage of the human transmembrane proteome. The database was built by sorting out proteins from the human transmembrane proteome with known structure and searching for suitable model structures for the remaining proteins by combining the results of a state-of-the-art transmembrane specific fold recognition algorithm and a sequence similarity search algorithm. Proteins were searched for homologues among the human transmembrane proteins in order to select targets whose successful structure determination would lead to the best structural coverage of the human transmembrane proteome. The pipeline constructed for creating the TSTMP database guarantees to keep the database up-to-date. The database is available at http://tstmp.enzim.ttk.mta.hu. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Structural genomics of the Thermotoga maritima proteome implemented in a high-throughput structure determination pipeline

    Science.gov (United States)

    Lesley, Scott A.; Kuhn, Peter; Godzik, Adam; Deacon, Ashley M.; Mathews, Irimpan; Kreusch, Andreas; Spraggon, Glen; Klock, Heath E.; McMullan, Daniel; Shin, Tanya; Vincent, Juli; Robb, Alyssa; Brinen, Linda S.; Miller, Mitchell D.; McPhillips, Timothy M.; Miller, Mark A.; Scheibe, Daniel; Canaves, Jaume M.; Guda, Chittibabu; Jaroszewski, Lukasz; Selby, Thomas L.; Elsliger, Marc-Andre; Wooley, John; Taylor, Susan S.; Hodgson, Keith O.; Wilson, Ian A.; Schultz, Peter G.; Stevens, Raymond C.

    2002-01-01

    Structural genomics is emerging as a principal approach to define protein structure–function relationships. To apply this approach on a genomic scale, novel methods and technologies must be developed to determine large numbers of structures. We describe the design and implementation of a high-throughput structural genomics pipeline and its application to the proteome of the thermophilic bacterium Thermotoga maritima. By using this pipeline, we successfully cloned and attempted expression of 1,376 of the predicted 1,877 genes (73%) and have identified crystallization conditions for 432 proteins, comprising 23% of the T. maritima proteome. Representative structures from TM0423 glycerol dehydrogenase and TM0449 thymidylate synthase-complementing protein are presented as examples of final outputs from the pipeline. PMID:12193646
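
    The percentages quoted in the abstract above can be checked with a few lines of arithmetic. The gene and protein counts come from the abstract. The variable names, and the extra ratio of crystallization hits to attempted targets, are illustrative additions that the paper does not report.

```python
# Counts quoted in the abstract.
PREDICTED_GENES = 1877        # predicted T. maritima genes
CLONED_AND_ATTEMPTED = 1376   # genes cloned with expression attempted
CRYSTALLIZATION_HITS = 432    # proteins with crystallization conditions

def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

cloned_pct = percent(CLONED_AND_ATTEMPTED, PREDICTED_GENES)
crystal_pct = percent(CRYSTALLIZATION_HITS, PREDICTED_GENES)
# Illustrative extra ratio, not stated in the abstract:
hits_per_attempt_pct = percent(CRYSTALLIZATION_HITS, CLONED_AND_ATTEMPTED)

print(f"cloned/attempted: {cloned_pct:.1f}% of predicted genes")
print(f"crystallization conditions: {crystal_pct:.1f}% of the proteome")
print(f"crystallization hits per attempted target: {hits_per_attempt_pct:.1f}%")
```

    Rounded to whole percentages, the first two values agree with the 73% and 23% given in the abstract.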

  7. Meet me halfway: when genomics meets structural bioinformatics.

    Science.gov (United States)

    Gong, Sungsam; Worth, Catherine L; Cheng, Tammy M K; Blundell, Tom L

    2011-06-01

    The DNA sequencing technology developed by Frederick Sanger in the 1970s established genomics as the basis of comparative genetics. The recent invention of next-generation sequencing (NGS) platforms has added a new dimension to genome research by generating ultra-fast and high-throughput sequencing data in an unprecedented manner. The advent of NGS technology also provides the opportunity to study genetic diseases where sequence variants or mutations are sought to establish a causal relationship with disease phenotypes. However, it is not a trivial task to seek genetic variants responsible for genetic diseases and even harder for complex diseases such as diabetes and cancers. In such polygenic diseases, multiple genes and alleles, which can exist in healthy individuals, come together to contribute to common disease phenotypes in a complex manner. Hence, it is desirable to have an approach that integrates omics data with both knowledge of protein structure and function and an understanding of networks/pathways, i.e. functional genomics and systems biology; in this way, genotype-phenotype relationships can be better understood. In this review, we bring this 'bottom-up' approach alongside the current NGS-driven genetic study of genetic variations and disease aetiology. We describe experimental and computational techniques for assessing genetic variants and their deleterious effects on protein structure and function.

  8. Coverage of protein sequence space by current structural genomics targets.

    Science.gov (United States)

    O'Toole, Nicholas; Raymond, Stéphane; Cygler, Miroslaw

    2003-01-01

    By its purest definition the ultimate goal of structural genomics (SG) is the determination of the structures of all proteins encoded by genomes. Most of these will be obtained by homology modeling using the structures of a set of target proteins for experimental determination. Thanks to the open exchange of SG target information, we are able to analyze the sequences of the current target list to evaluate the extent of its coverage of protein sequence space. The presence of homologous sequences currently either in the Protein Data Bank (PDB) or among SG targets has been determined for each of the protein sequences in several organisms. In this way we are able to evaluate the coverage by existing or targeted structural data for the non-membranous parts of entire proteomes. For small bacterial proteomes such as that of H. influenzae almost all proteins have homologous sequences among SG targets or in the PDB. There is significantly lower coverage for more complex organisms, such as C. elegans. We have mapped the SG target list onto the ProtoMap clustering of protein sequences. Clusters occupied by SG targets represent over 150,000 protein sequences, which is approximately 44% of the total protein sequences classified by ProtoMap. The mapping of SG targets also enables an evaluation of the degree of overlap within the target list. An SG target typically occupies a ProtoMap cluster with more than six other homologous targets.

  9. The Seattle Structural Genomics Center for Infectious Disease (SSGCID).

    Science.gov (United States)

    Myler, P J; Stacy, R; Stewart, L; Staker, B L; Van Voorhis, W C; Varani, G; Buchko, G W

    2009-11-01

    The NIAID-funded Seattle Structural Genomics Center for Infectious Disease (SSGCID) is a consortium established to apply structural genomics approaches to potential drug targets from NIAID priority organisms for biodefense and emerging and re-emerging diseases. The mission of the SSGCID is to determine approximately 400 protein structures over the next five years. In order to maximize biomedical impact, ligand-based drug-lead discovery campaigns will be pursued for a small number of high-impact targets. Here we review the center's target selection processes, which include pro-active engagement of the infectious disease research and drug therapy communities to identify drug targets, essential enzymes, virulence factors and vaccine candidates of biomedical relevance to combat infectious diseases. This is followed by a brief overview of the SSGCID structure determination pipeline and ligand screening methodology. Finally, specifics of our resources available to the scientific community are presented. Physical materials and data produced by SSGCID will be made available to the scientific community, with the aim that they will provide essential groundwork benefiting future research and drug discovery.

  10. Structure-seq2: sensitive and accurate genome-wide profiling of RNA structure in vivo.

    Science.gov (United States)

    Ritchey, Laura E; Su, Zhao; Tang, Yin; Tack, David C; Assmann, Sarah M; Bevilacqua, Philip C

    2017-08-21

    RNA serves many functions in biology such as splicing, temperature sensing, and innate immunity. These functions are often determined by the structure of RNA. There is thus a pressing need to understand RNA structure and how it changes during diverse biological processes both in vivo and genome-wide. Here, we present Structure-seq2, which provides nucleotide-resolution RNA structural information in vivo and genome-wide. This optimized version of our original Structure-seq method increases sensitivity by at least 4-fold and improves data quality by minimizing formation of a deleterious by-product, reducing ligation bias, and improving read coverage. We also present a variation of Structure-seq2 in which a biotinylated nucleotide is incorporated during reverse transcription, which greatly facilitates the protocol by eliminating two PAGE purification steps. We benchmark Structure-seq2 on both mRNA and rRNA structure in rice (Oryza sativa). We demonstrate that Structure-seq2 can lead to new biological insights. Our Structure-seq2 datasets uncover hidden breaks in chloroplast rRNA and identify a previously unreported N1-methyladenosine (m1A) in a nuclear-encoded Oryza sativa rRNA. Overall, Structure-seq2 is a rapid, sensitive, and unbiased method to probe RNA in vivo and genome-wide that facilitates new insights into RNA biology. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. The impact of structural genomics: the first quindecennial.

    Science.gov (United States)

    Grabowski, Marek; Niedzialkowska, Ewa; Zimmerman, Matthew D; Minor, Wladek

    2016-03-01

    The period 2000-2015 brought the advent of high-throughput approaches to protein structure determination. With overall funding on the order of $2 billion (in 2010 dollars), the structural genomics (SG) consortia established worldwide have developed pipelines for target selection, protein production, sample preparation, crystallization, and structure determination by X-ray crystallography and NMR. These efforts resulted in the determination of over 13,500 protein structures, mostly from unique protein families, and increased the structural coverage of the expanding protein universe. SG programs contributed over 4400 publications to the scientific literature. The NIH-funded Protein Structure Initiatives alone have produced over 2000 scientific publications, which to date have attracted more than 93,000 citations. Software and database developments that were necessary to handle high-throughput structure determination workflows have led to structures of better quality and improved integrity of the associated data. Organized and accessible data have a positive impact on the reproducibility of scientific experiments. Most of the experimental data generated by the SG centers are freely available to the community and have been utilized by scientists in various fields of research. SG projects have created, improved, streamlined, and validated many protocols for protein production and crystallization, data collection, and functional analysis, significantly benefiting biological and biomedical research.

  12. Refining the structure and content of clinical genomic reports.

    Science.gov (United States)

    Dorschner, Michael O; Amendola, Laura M; Shirts, Brian H; Kiedrowski, Lesli; Salama, Joseph; Gordon, Adam S; Fullerton, Stephanie M; Tarczy-Hornoch, Peter; Byers, Peter H; Jarvik, Gail P

    2014-03-01

    To effectively articulate the results of exome and genome sequencing we refined the structure and content of molecular test reports. To communicate results of a randomized control trial aimed at the evaluation of exome sequencing for clinical medicine, we developed a structured narrative report. With feedback from genetics and non-genetics professionals, we developed separate indication-specific and incidental findings reports. Standard test report elements were supplemented with research study-specific language, which highlighted the limitations of exome sequencing and provided detailed, structured results, and interpretations. The report format we developed to communicate research results can easily be transformed for clinical use by removal of research-specific statements and disclaimers. The development of clinical reports for exome sequencing has shown that accurate and open communication between the clinician and laboratory is ideally an ongoing process to address the increasing complexity of molecular genetic testing. © 2014 Wiley Periodicals, Inc.

  13. Macromolecular structure determination in the post-genome era

    CERN Document Server

    Kuhn, P

    2001-01-01

    Recent advances in genetics, molecular biology and crystallographic instrumentation and methodology have led to a revolution in the field of Structural Molecular Biology (SMB). These combined advances have paved the way to a more complete and detailed understanding of the biological macromolecules that make up an organism, both in terms of their individual functions and also the interactions between them. In this paper we describe a large-scale, genomic approach to the three-dimensional structure determination of macromolecules and their complexes, using high-throughput methodology to streamline all aspects of the process. This task requires the development of automated high-intensity synchrotron beam lines for X-ray diffraction data collection from single crystal samples. Furthermore, these beam lines must be operated within a sophisticated software and hardware environment, which is capable of delivering a completely automated structure determination pipeline. The SMB resource at SSRL is developing a system...

  14. Solution Structure of Archaeoglobus fulgidis Peptidyl-tRNA Hydrolase(Pth2) Provides Evidence for an Extensive Conserved Family of Pth2 Enzymes in Archaea, Bacteria and Eukaryotes.

    Energy Technology Data Exchange (ETDEWEB)

    Powers, Robert; Mirkovic, Nebojsa; Goldsmith-Fischman, Sharon; Acton, Thomas; Chiang, Yiwen; Huang, Yuanpeng; Ma, LiChung; Rajan, Paranji K.; Cort, John R.; Kennedy, Michael A.; Liu, Jinfeng; Rost, Burkhard; Honig, Barry; Murray, Diana; Montelione, Gaetano

    2005-11-01

    The solution structure of protein AF2095 from the thermophilic archaeon Archaeoglobus fulgidus, a 123-residue (13.6 kDa) protein, has been determined by NMR methods. The structure of AF2095 is comprised of four α-helices and a mixed β-sheet consisting of four parallel and anti-parallel β-strands, where the α-helices sandwich the β-sheet. Sequence and structural comparison of AF2095 with proteins from Homo sapiens, Methanocaldococcus jannaschii and Sulfolobus solfataricus reveals that AF2095 is a peptidyl-tRNA hydrolase (Pth2). This structural comparison also identifies putative catalytic residues and a tRNA interaction region for AF2095. The structure of AF2095 is also similar to the structure of protein TA0108 from the archaeon Thermoplasma acidophilum, which is deposited in the Protein Database but not functionally annotated. The NMR structure of AF2095 has been further leveraged to obtain good quality structural models for 55 other proteins. Although earlier studies have proposed that the Pth2 protein family is restricted to archaeal and eukaryotic organisms, the similarity of the AF2095 structure to human Pth2, the conservation of key active-site residues, and the good quality of the resulting homology models demonstrate a large family of homologous Pth2 proteins that are conserved in eukaryotic, archaeal and bacterial organisms, providing novel insights into the evolution of the Pth and Pth2 enzyme families.

  15. From DNA Sequences to Chemical Structures – Methods for Mining Microbial Genomic and Metagenomic Data Sets for New Natural Products

    Directory of Open Access Journals (Sweden)

    Jurica Zucko

    2010-01-01

    Rapid mining of large genomic and metagenomic data sets for modular polyketide synthases, non-ribosomal peptide synthetases and hybrid polyketide synthase/non-ribosomal peptide synthetase biosynthetic gene clusters has been achieved using the generic computer program packages ClustScan and CompGen. These program packages perform the annotation with the hierarchical structuring into polypeptides, modules and domains, as well as storage and graphical presentations of the data. This aims to achieve the most accurate predictions of the activities and specificities of catalytically active domains that can be made with present knowledge, leading to a prediction of the most likely chemical structures produced by these enzymes. The program packages also allow generation of novel clusters by homologous recombination of the annotated genes in silico. ClustScan and CompGen were used to construct a custom database of known compounds (CSDB) and of predicted entirely novel recombinant products (r-CSDB) that can be used for in silico screening with computer aided drug design technology. The use of these programs has been exemplified by analysing genomic sequences from terrestrial prokaryotes and eukaryotic microorganisms, a marine metagenomic data set and a newly discovered example of a 'shared metabolic pathway' in marine-microbial endosymbiosis.

  16. The role of chromatin insulators in nuclear architecture and genome function

    Science.gov (United States)

    Van Bortle, Kevin; Corces, Victor G.

    2013-01-01

    Eukaryotic genomes are intricately arranged into highly organized yet dynamic structures that underlie patterns of gene expression and cellular identity. The recent adaptation of novel genomic strategies for assaying nuclear architecture has significantly extended and accelerated our ability to query the nature of genome organization and the players involved. In particular, recent explorations of physical arrangements and chromatin landscapes in higher eukaryotes have demonstrated that chromatin insulators, which mediate functional interactions between regulatory elements, appear to play an important role in these processes. Here we reflect on current findings and our rapidly expanding understanding of insulators and their role in nuclear architecture and genome function. PMID:23298659

  17. Progress of structural genomics initiatives: an analysis of solved target structures.

    Science.gov (United States)

    Todd, Annabel E; Marsden, Russell L; Thornton, Janet M; Orengo, Christine A

    2005-05-20

    The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes.
From the

  18. WebAUGUSTUS—a web service for training AUGUSTUS and predicting genes in eukaryotes

    Science.gov (United States)

    Hoff, Katharina J.; Stanke, Mario

    2013-01-01

    The prediction of protein coding genes is an important step in the annotation of newly sequenced and assembled genomes. AUGUSTUS is one of the most accurate tools for eukaryotic gene prediction. Here, we present WebAUGUSTUS, a web interface for training AUGUSTUS and predicting genes with AUGUSTUS. Depending on the needs of the user, WebAUGUSTUS generates training gene structures automatically. Besides a genome file, either a file with expressed sequence tags or a file with protein sequences is required for this step. Alternatively, it is possible to submit an externally generated training gene structure file and a genome file. The web service optimizes AUGUSTUS parameters and predicts genes with those parameters. WebAUGUSTUS is available at http://bioinf.uni-greifswald.de/webaugustus. PMID:23700307

  19. WebAUGUSTUS--a web service for training AUGUSTUS and predicting genes in eukaryotes.

    Science.gov (United States)

    Hoff, Katharina J; Stanke, Mario

    2013-07-01

    The prediction of protein coding genes is an important step in the annotation of newly sequenced and assembled genomes. AUGUSTUS is one of the most accurate tools for eukaryotic gene prediction. Here, we present WebAUGUSTUS, a web interface for training AUGUSTUS and predicting genes with AUGUSTUS. Depending on the needs of the user, WebAUGUSTUS generates training gene structures automatically. Besides a genome file, either a file with expressed sequence tags or a file with protein sequences is required for this step. Alternatively, it is possible to submit an externally generated training gene structure file and a genome file. The web service optimizes AUGUSTUS parameters and predicts genes with those parameters. WebAUGUSTUS is available at http://bioinf.uni-greifswald.de/webaugustus.

  20. Endosymbiotic theories for eukaryote origin.

    Science.gov (United States)

    Martin, William F; Garg, Sriram; Zimorski, Verena

    2015-09-26

    For over 100 years, endosymbiotic theories have figured in thoughts about the differences between prokaryotic and eukaryotic cells. More than 20 different versions of endosymbiotic theory have been presented in the literature to explain the origin of eukaryotes and their mitochondria. Very few of those models account for eukaryotic anaerobes. The role of energy and the energetic constraints that prokaryotic cell organization placed on evolutionary innovation in cell history has recently come to bear on endosymbiotic theory. Only cells that possessed mitochondria had the bioenergetic means to attain eukaryotic cell complexity, which is why there are no true intermediates in the prokaryote-to-eukaryote transition. Current versions of endosymbiotic theory have it that the host was an archaeon (an archaebacterium), not a eukaryote. Hence the evolutionary history and biology of archaea increasingly comes to bear on eukaryotic origins, more than ever before. Here, we have compiled a survey of endosymbiotic theories for the origin of eukaryotes and mitochondria, and for the origin of the eukaryotic nucleus, summarizing the essentials of each and contrasting some of their predictions to the observations. A new aspect of endosymbiosis in eukaryote evolution comes into focus from these considerations: the host for the origin of plastids was a facultative anaerobe. © 2015 The Authors.

  1. Endosymbiotic theories for eukaryote origin

    Science.gov (United States)

    Martin, William F.; Garg, Sriram; Zimorski, Verena

    2015-01-01

    For over 100 years, endosymbiotic theories have figured in thoughts about the differences between prokaryotic and eukaryotic cells. More than 20 different versions of endosymbiotic theory have been presented in the literature to explain the origin of eukaryotes and their mitochondria. Very few of those models account for eukaryotic anaerobes. The role of energy and the energetic constraints that prokaryotic cell organization placed on evolutionary innovation in cell history has recently come to bear on endosymbiotic theory. Only cells that possessed mitochondria had the bioenergetic means to attain eukaryotic cell complexity, which is why there are no true intermediates in the prokaryote-to-eukaryote transition. Current versions of endosymbiotic theory have it that the host was an archaeon (an archaebacterium), not a eukaryote. Hence the evolutionary history and biology of archaea increasingly comes to bear on eukaryotic origins, more than ever before. Here, we have compiled a survey of endosymbiotic theories for the origin of eukaryotes and mitochondria, and for the origin of the eukaryotic nucleus, summarizing the essentials of each and contrasting some of their predictions to the observations. A new aspect of endosymbiosis in eukaryote evolution comes into focus from these considerations: the host for the origin of plastids was a facultative anaerobe. PMID:26323761

  2. Mitochondrion-related organelles in eukaryotic protists.

    Science.gov (United States)

    Shiflett, April M; Johnson, Patricia J

    2010-01-01

    The discovery of mitochondrion-type genes in organisms thought to lack mitochondria led to the demonstration that hydrogenosomes share a common ancestry with mitochondria, as well as the discovery of mitosomes in multiple eukaryotic lineages. No examples of examined eukaryotes lacking a mitochondrion-related organelle exist, implying that the endosymbiont that gave rise to the mitochondrion was present in the first eukaryote. These organelles, known as hydrogenosomes, mitosomes, or mitochondrion-like organelles, are typically reduced, both structurally and biochemically, relative to classical mitochondria. However, despite their diversification and adaptation to different niches, all appear to play a role in Fe-S cluster assembly, as observed for mitochondria. Although evidence supports the use of common protein targeting mechanisms in the biogenesis of these diverse organelles, divergent features are also apparent. This review examines the metabolism and biogenesis of these organelles in divergent unicellular microbes, with a focus on parasitic protists.

  3. Patenting nonassociated polymeric structures (NAPS): implications for structural genomic data release.

    Science.gov (United States)

    Sung, Lawrence M

    2003-01-01

    The intellectual property laws that govern patent rights should provide a reasonable balance between the competing concerns of open access and exclusivity. Open access can facilitate knowledge dissemination and collaboration in furthering science. On the other hand, exclusivity can ensure interest and financial investment in scientific research and development. In recent days, the appropriate balance between open access and exclusivity has been a focus of public debate, particularly with regard to genomic inventions and their applications. In seeking to reconcile the timing of structural genomic data release with certain efforts to secure intellectual property rights, the International Structural Genomics Organisation joins others confronting this controversy. This paper seeks to inform the discussion with an overview of the U.S. standards for patenting nonassociated polymeric structures (NAPS), which include polynucleotides or polypeptides of unknown biological significance, and their corresponding structural data. In the United States, the present ability to obtain patent rights to these discoveries appears problematic given the requirement of specific, substantial and credible utility, among other things. Without demonstrable utility, NAPS and NAPS-related data likely will not be entitled to patent protection, whether the U.S. Patent & Trademark Office rejects NAPS claims as unpatentable in the first instance, or the U.S. federal courts invalidate NAPS claims in later patent litigation. As such, the improbability of obtaining enforceable patent rights to NAPS might undermine the rationale for delaying structural genomic data release to allow for the filing of patent applications in this regard.

  4. Backbone Solution Structures of Proteins Using Residual Dipolar Couplings: Application to a Novel Structural Genomics Target

    Science.gov (United States)

    Valafar, H.; Mayer, K. L.; Bougault, C. M.; LeBlond, P. D.; Jenney, F. E.; Brereton, P. S.; Adams, M.W.W.; Prestegard, J.H.

    2006-01-01

    Structural genomics (or proteomics) activities are critically dependent on the availability of high-throughput structure determination methodology. Development of such methodology has been a particular challenge for NMR based structure determination because of the demands for isotopic labeling of proteins and the requirements for very long data acquisition times. We present here a methodology that gains efficiency from a focus on determination of backbone structures of proteins as opposed to full structures with all side chains in place. This focus is appropriate given the presumption that many protein structures in the future will be built using computational methods that start from representative fold family structures and replace as many as 70% of the side chains in the course of structure determination. The methodology we present is based primarily on residual dipolar couplings (RDCs), readily accessible NMR observables that constrain the orientation of backbone fragments irrespective of separation in space. A new software tool is described for the assembly of backbone fragments under RDC constraints and an application to a structural genomics target is presented. The target is an 8.7 kDa protein from Pyrococcus furiosus, PF1061, that was previously not well annotated, and had a nearest structurally characterized neighbor with only 33% sequence identity. The structure produced shows structural similarity to this sequence homologue, but also shows similarity to other proteins that suggests a functional role in sulfur transfer. Given the backbone structure and a possible functional link this should be an ideal target for development of modeling methods. PMID:15704012

  5. Unicellular eukaryotes as models in cell and molecular biology: critical appraisal of their past and future value.

    Science.gov (United States)

    Simon, Martin; Plattner, Helmut

    2014-01-01

    Unicellular eukaryotes have been appreciated as model systems for the analysis of crucial questions in cell and molecular biology. This includes Dictyostelium (chemotaxis, amoeboid movement, phagocytosis), Tetrahymena (telomere structure, telomerase function), Paramecium (variant surface antigens, exocytosis, phagocytosis cycle) or both ciliates (ciliary beat regulation, surface pattern formation), Chlamydomonas (flagellar biogenesis and beat), and yeast (S. cerevisiae) for innumerable aspects. Nowadays many problems may be tackled with "higher" eukaryotic/metazoan cells for which full genomic information as well as domain databases, etc., were available long before protozoa. Established molecular tools, commercial antibodies, and established pharmacology are additional advantages available for higher eukaryotic cells. Moreover, an increasing number of inherited genetic disturbances in humans have become elucidated and can serve as new models. Among lower eukaryotes, yeast will remain a standard model because of its peculiarities, including its reduced genome and availability in the haploid form. But do protists still have a future as models? This touches not only the basic understanding of biology but also practical aspects of research, such as fund raising. As we try to scrutinize, due to specific advantages some protozoa should and will remain favorable models for analyzing novel genes or specific aspects of cell structure and function. Outstanding examples are epigenetic phenomena, a field of rising interest. © 2014 Elsevier Inc. All rights reserved.

  6. The complete mitochondrial genome structure of snow leopard Panthera uncia.

    Science.gov (United States)

    Wei, Lei; Wu, Xiaobing; Jiang, Zhigang

    2009-05-01

    The complete mitochondrial genome (mtDNA) of snow leopard Panthera uncia was obtained by using the polymerase chain reaction (PCR) technique based on the PCR fragments of 30 primers we designed. The entire mtDNA sequence was 16,773 base pairs (bp) in length, and the base composition was: A-5,357 bp (31.9%); C-4,444 bp (26.5%); G-2,428 bp (14.5%); T-4,544 bp (27.1%). The structural characteristics of the P. uncia mitochondrial genome were highly similar to those of Felis catus, Acinonyx jubatus, Neofelis nebulosa and other mammals. However, we found several distinctive features of the mitochondrial genome of Panthera uncia. First, the termination codon of COIII was TAA, which differed from those of F. catus, A. jubatus and N. nebulosa. Second, tRNA(Ser) (AGY), which lacked the "DHU" arm, could not be folded into the typical cloverleaf-shaped structure. Third, in the control region, a long repetitive sequence in the RS-2 (32 bp) region was found with 2 repeats while one short repetitive segment (9 bp) was found with 15 repeats in the RS-3 region. We performed phylogenetic analysis based on a 3,816 bp concatenated sequence of 12S rRNA, 16S rRNA, ND2, ND4, ND5, Cyt b and ATP8 for P. uncia and other related species; the result indicated that P. uncia and P. leo were sister species, which differed from previous findings.
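    The base composition reported in this record can be checked with a few lines of arithmetic. The sketch below is illustrative only: the per-base counts are copied from the abstract, while the helper name `base_composition` and the toy sequence are assumptions introduced here, not part of the original study.

```python
from collections import Counter

# Per-base counts as reported in the abstract (A, C, G, T in bp).
REPORTED_COUNTS = {"A": 5357, "C": 4444, "G": 2428, "T": 4544}


def percentages(counts):
    """Convert {base: count} into {base: percent}, rounded to one decimal."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no bases to count")
    return {base: round(100.0 * n / total, 1) for base, n in counts.items()}


def base_composition(sequence):
    """Hypothetical helper: count A/C/G/T in a DNA string (case-insensitive)."""
    seen = Counter(sequence.upper())
    return {base: seen[base] for base in "ACGT"}


if __name__ == "__main__":
    # The four counts should add up to the stated genome length.
    print("total length:", sum(REPORTED_COUNTS.values()))
    print("percentages:", percentages(REPORTED_COUNTS))
    # The same helpers applied to a toy sequence (not real data).
    print(percentages(base_composition("ACGTTGCAAT")))
```

    With the reported counts, the total and the rounded percentages agree with the figures quoted in the abstract.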

  7. Recognizing genes and other components of genomic structure

    Energy Technology Data Exchange (ETDEWEB)

    Burks, C. (Los Alamos National Lab., NM (USA)); Myers, E. (Arizona Univ., Tucson, AZ (USA). Dept. of Computer Science); Stormo, G.D. (Colorado Univ., Boulder, CO (USA). Dept. of Molecular, Cellular and Developmental Biology)

    1991-01-01

    The Aspen Center for Physics (ACP) sponsored a three-week workshop, with 26 scientists participating, from 28 May to 15 June, 1990. The workshop, entitled Recognizing Genes and Other Components of Genomic Structure, focussed on discussion of current needs and future strategies for developing the ability to identify and predict the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally-based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are -- as composite entities -- the true object of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians. No participant stayed for less than one week, and most attended for two or three weeks. Computers, software, and databases were available for use as "electronic blackboards" and as the basis for collaborative exploration of ideas being discussed and developed at the workshop. 23 refs., 2 tabs.

  8. Endosymbiotic gene transfer from prokaryotic pangenomes: Inherited chimerism in eukaryotes.

    Science.gov (United States)

    Ku, Chuan; Nelson-Sathi, Shijulal; Roettger, Mayo; Garg, Sriram; Hazkani-Covo, Einat; Martin, William F

    2015-08-18

    Endosymbiotic theory in eukaryotic-cell evolution rests upon a foundation of three cornerstone partners--the plastid (a cyanobacterium), the mitochondrion (a proteobacterium), and its host (an archaeon)--and carries a corollary that, over time, the majority of genes once present in the organelle genomes were relinquished to the chromosomes of the host (endosymbiotic gene transfer). However, notwithstanding eukaryote-specific gene inventions, single-gene phylogenies have never traced eukaryotic genes to three single prokaryotic sources, an issue that hinges crucially upon factors influencing phylogenetic inference. In the age of genomes, single-gene trees, once used to test the predictions of endosymbiotic theory, now spawn new theories that stand to eventually replace endosymbiotic theory with descriptive, gene tree-based variants featuring supernumerary symbionts: prokaryotic partners distinct from the cornerstone trio and whose existence is inferred solely from single-gene trees. We reason that the endosymbiotic ancestors of mitochondria and chloroplasts brought into the eukaryotic--and plant and algal--lineage a genome-sized sample of genes from the proteobacterial and cyanobacterial pangenomes of their respective day and that, even if molecular phylogeny were artifact-free, sampling prokaryotic pangenomes through endosymbiotic gene transfer would lead to inherited chimerism. Recombination in prokaryotes (transduction, conjugation, transformation) differs from recombination in eukaryotes (sex). Prokaryotic recombination leads to pangenomes, and eukaryotic recombination leads to vertical inheritance. Viewed from the perspective of endosymbiotic theory, the critical transition at the eukaryote origin that allowed escape from Muller's ratchet--the origin of eukaryotic recombination, or sex--might have required surprisingly little evolutionary innovation.

  9. The complete chloroplast genome sequence of Podocarpus lambertii: genome structure, evolutionary aspects, gene content and SSR detection.

    Directory of Open Access Journals (Sweden)

    Leila do Nascimento Vieira

    BACKGROUND: Podocarpus lambertii (Podocarpaceae) is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp) genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. METHODOLOGY/PRINCIPAL FINDINGS: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR). It contains 118 unique genes and one duplicated tRNA (trnN-GUU), which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi) and Araucariaceae (Agathis dammara). Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. CONCLUSION: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparent loss of the rps16 gene in the Podocarpaceae cp genome. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of

  10. Identification of genomic indels and structural variations using split reads

    Directory of Open Access Journals (Sweden)

    Urban Alexander E

    2011-07-01

    Background: Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. Results: We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Conclusions: Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole
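    The calibration idea in this record rests on two standard quantities, sensitivity and positive predictive value (PPV), measured on simulated data where the true events are known. The sketch below shows their textbook definitions and one simple way such figures could be used to correct an observed call count. The correction formula and all function names are assumptions made for illustration; the abstract does not state how SRiC actually combines these quantities.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real events that were called: TP / (TP + FN)."""
    denom = true_pos + false_neg
    if denom == 0:
        raise ValueError("no real events in the simulated set")
    return true_pos / denom


def positive_predictive_value(true_pos, false_pos):
    """Fraction of calls that are real events: TP / (TP + FP)."""
    denom = true_pos + false_pos
    if denom == 0:
        raise ValueError("no calls were made")
    return true_pos / denom


def corrected_event_count(observed_calls, sens, ppv):
    """Assumed correction: keep the share of calls expected to be real
    (observed * PPV), then scale up for the real events that were missed
    (divide by sensitivity)."""
    if not 0 < sens <= 1 or not 0 <= ppv <= 1:
        raise ValueError("sensitivity must be in (0, 1] and PPV in [0, 1]")
    return observed_calls * ppv / sens


if __name__ == "__main__":
    # Hypothetical simulation outcome for one event class.
    sens = sensitivity(true_pos=60, false_neg=40)
    ppv = positive_predictive_value(true_pos=60, false_pos=15)
    print(sens, ppv, corrected_event_count(1000, sens, ppv))
```

    In practice the calibration would be repeated per event class and length range, since the abstract notes that biases differ between, for example, long deletions and short insertions.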

  11. Applications of Recombinant DNA Technology in Gastrointestinal Medicine and Hepatology: Basic Paradigms of Molecular Cell Biology. Part A: Eukaryotic Gene Structure and DNA Replication

    Directory of Open Access Journals (Sweden)

    Gary E Wild

    2000-01-01

    Progress in the basic sciences of cell and molecular biology has provided an exciting dimension that has translated into clinically relevant information in every medical subspecialty. Importantly, the application of recombinant DNA technology has played a major role in unravelling the intricacies related to the molecular pathophysiology of disease. This series of review articles constitutes a framework for the integration of the database of new information into the core knowledge base of concepts related to the pathogenesis of gastrointestinal disorders and liver disease. The goal of this series of three articles is to review the basic principles of eukaryotic gene expression. The first article examines the role of DNA in directing the flow of genetic information in eukaryotic cells.

  12. Structural genomics target selection for the New York consortium on membrane protein structure.

    Science.gov (United States)

    Punta, Marco; Love, James; Handelman, Samuel; Hunt, John F; Shapiro, Lawrence; Hendrickson, Wayne A; Rost, Burkhard

    2009-12-01

    The New York Consortium on Membrane Protein Structure (NYCOMPS), a part of the Protein Structure Initiative (PSI) in the USA, has as its mission to establish a high-throughput pipeline for determination of novel integral membrane protein structures. Here we describe our current target selection protocol, which applies structural genomics approaches informed by the collective experience of our team of investigators. We first extract all annotated proteins from our reagent genomes, i.e. the 96 fully sequenced prokaryotic genomes from which we clone DNA. We filter this initial pool of sequences and obtain a list of valid targets. NYCOMPS defines valid targets as those that, among other features, have at least two predicted transmembrane helices, no predicted long disordered regions and, except for community nominated targets, no significant sequence similarity in the predicted transmembrane region to any known protein structure. Proteins that feed our experimental pipeline are selected by defining a protein seed and searching the set of all valid targets for proteins that are likely to have a transmembrane region structurally similar to that of the seed. We require sequence similarity aligning at least half of the predicted transmembrane region of seed and target. Seeds are selected according to their feasibility and/or biological interest, and they include both centrally selected targets and community nominated targets. As of December 2008, over 6,000 targets have been selected and are currently being processed by the experimental pipeline. We discuss how our target list may impact structural coverage of the membrane protein space.
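
    The valid-target criteria named in the abstract above can be sketched as a simple filter. This is a minimal illustration, not the NYCOMPS pipeline: the `Candidate` fields and example records are hypothetical, and only the three criteria stated explicitly in the abstract are modeled (the abstract notes there are "other features" as well).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record holding per-protein predictions."""
    name: str
    predicted_tm_helices: int
    has_long_disordered_region: bool
    similar_to_known_structure: bool  # significant similarity in the predicted TM region
    community_nominated: bool = False


def is_valid_target(c: Candidate) -> bool:
    """Apply the three criteria named in the abstract."""
    if c.predicted_tm_helices < 2:
        return False  # needs at least two predicted transmembrane helices
    if c.has_long_disordered_region:
        return False  # no predicted long disordered regions
    if c.similar_to_known_structure and not c.community_nominated:
        return False  # novelty requirement, waived for community-nominated targets
    return True


if __name__ == "__main__":
    pool = [
        Candidate("p1", 4, False, False),
        Candidate("p2", 1, False, False),
        Candidate("p3", 6, False, True, community_nominated=True),
        Candidate("p4", 3, True, False),
    ]
    print([c.name for c in pool if is_valid_target(c)])
```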

  13. Identification of two Penelope-like elements with different structures and chromosome localization in kuruma shrimp genome.

    Science.gov (United States)

    Koyama, Takashi; Kondo, Hidehiro; Aoki, Takashi; Hirono, Ikuo

    2013-02-01

    Penelope, originally found as a key element responsible for the hybrid dysgenesis in Drosophila virilis, has been widely conserved throughout eukaryotic genomes. In other organisms, they are often referred to as Penelope-like elements or PLEs. In this study, we found two types of PLEs, designated MjPLE01 and MjPLE02, from kuruma shrimp, Marsupenaeus japonicus. There was no observed nucleotide similarity between MjPLE01 and 02, and both elements differed from each other in terms of their structure; MjPLE02 has a distinctive endonuclease (EN) domain at the C-terminus while MjPLE01 does not. A phylogenetic tree that includes publicly available PLEs and TERTs showed that MjPLE01 and 02 were closely related to Coprina elements, which have been reported as EN-deficient PLEs, and to the Penelope-Poseidon group, which possesses an EN domain, respectively. Genomic Southern blot analysis using MjPLE01 as a probe showed multiple bands that differ among individual shrimps. On the other hand, two major identical bands were observed when MjPLE02 was used. Colony hybridization showed co-localization of MjPLE01 and GGTTA repeats, suggesting that MjPLE01 might be prevalent in subtelomeric regions of the kuruma shrimp genome. These results suggest that the kuruma shrimp genome has at least two types of PLEs with different domain compositions, phylogenetic positions, and probably chromosomal localization. Such distinctive types of PLEs in an organism have never been described and hence could be a potential source to understand how multiple PLE types evolved.

  14. Structure of the archaeal Kae1/Bud32 fusion protein MJ1130: a model for the eukaryotic EKC/KEOPS subcomplex.

    Science.gov (United States)

    Hecker, Arnaud; Lopreiato, Raffaele; Graille, Marc; Collinet, Bruno; Forterre, Patrick; Libri, Domenico; van Tilbeurgh, Herman

    2008-09-03

    The EKC/KEOPS yeast complex is involved in telomere maintenance and transcription. The Bud32p and kinase-associated endopeptidase 1 (Kae1p) components of the complex are totally conserved in eukarya and archaea. Their genes are fused in several archaeal genomes, suggesting that they physically interact. We report here the structure of the Methanocaldococcus jannaschii Kae1/Bud32 fusion protein MJ1130. Kae1 is an iron protein with an ASKHA fold and Bud32 is an atypical small RIO-type kinase. The structure of MJ1130 suggests that association with Kae1 maintains the Bud32 kinase in an inactive state. We indeed show that yeast Kae1p represses the kinase activity of yeast Bud32p. Extensive conserved interactions between MjKae1 and MjBud32 suggest that Kae1p and Bud32p directly interact in both yeast and archaea. Mutations that disrupt the Kae1p/Bud32p interaction in the context of the yeast complex have dramatic effects in vivo and in vitro, similar to those observed with deletion mutations of the respective components. Direct interaction between Kae1p and Bud32p in yeast is required both for the transcription and the telomere homeostasis function of EKC/KEOPS.

  15. Eukaryotic DNA Replication Fork.

    Science.gov (United States)

    Burgers, Peter M J; Kunkel, Thomas A

    2017-06-20

    This review focuses on the biogenesis and composition of the eukaryotic DNA replication fork, with an emphasis on the enzymes that synthesize DNA and repair discontinuities on the lagging strand of the replication fork. Physical and genetic methodologies aimed at understanding these processes are discussed. The preponderance of evidence supports a model in which DNA polymerase ε (Pol ε) carries out the bulk of leading strand DNA synthesis at an undisturbed replication fork. DNA polymerases α and δ carry out the initiation of Okazaki fragment synthesis and its elongation and maturation, respectively. This review also discusses alternative proposals, including cellular processes during which alternative forks may be utilized, and new biochemical studies with purified proteins that are aimed at reconstituting leading and lagging strand DNA synthesis separately and as an integrated replication fork.

  16. Use of the Operon Structure of the C. elegans Genome as a Tool to Identify Functionally Related Proteins

    Directory of Open Access Journals (Sweden)

    Silvia Dossena

    2013-12-01

    Full Text Available One of the most pressing challenges in the post genomic era is the identification and characterization of protein-protein interactions (PPIs), as these are essential in understanding the cellular physiology of health and disease. Experimental techniques suitable for characterizing PPIs (X-ray crystallography or nuclear magnetic resonance spectroscopy, among others) are usually laborious, time-consuming and often difficult to apply to membrane proteins, and therefore require accurate prediction of the candidate interacting partners. High-throughput experimental methods (yeast two-hybrid and affinity purification) succumb to the same shortcomings, and can also lead to high rates of false positive and negative results. Therefore, reliable tools for predicting PPIs are needed. The use of the operon structure in the eukaryote Caenorhabditis elegans genome is a valuable, though underserved, tool for identifying physically or functionally interacting proteins. Based on the concept that genes organized in the same operon may encode physically or functionally related proteins, this algorithm is easy to apply and, importantly, gives a limited number of candidate partners of a given protein, allowing for focused experimental verification. Moreover, this approach can be successfully used to predict PPIs in the human system, including those of membrane proteins.
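
    The core idea of the abstract above (genes in the same operon are candidate partners) reduces to a group-by lookup. The following is a minimal sketch under stated assumptions: the gene names and operon identifiers are placeholders, and a real analysis would load an operon annotation for C. elegans rather than a hand-written dictionary.

```python
def candidate_partners(gene_to_operon: dict, query: str) -> list:
    """Return the other genes annotated to the same operon as `query`.

    `gene_to_operon` maps a gene name to an operon identifier; genes that are
    not part of any operon are simply absent from the mapping.
    """
    operon = gene_to_operon.get(query)
    if operon is None:
        return []  # query gene is not in an annotated operon
    return sorted(
        gene for gene, op in gene_to_operon.items()
        if op == operon and gene != query
    )


# Placeholder annotation (hypothetical names, for illustration only).
annotation = {
    "geneA": "OP1",
    "geneB": "OP1",
    "geneC": "OP1",
    "geneD": "OP2",
}

if __name__ == "__main__":
    print(candidate_partners(annotation, "geneA"))
```

    Because operons contain few genes, the returned list is short, which matches the abstract's point that the approach yields a limited number of candidates for experimental verification.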

  17. Schistosoma comparative genomics: integrating genome structure, parasite biology and anthelmintic discovery

    Science.gov (United States)

    Swain, Martin T.; Larkin, Denis M.; Caffrey, Conor R.; Davies, Stephen J.; Loukas, Alex; Skelly, Patrick J.; Hoffmann, Karl F.

    2011-01-01

    Schistosoma genomes provide a comprehensive resource for identifying the molecular processes that shape parasite evolution and for discovering novel chemotherapeutic or immunoprophylactic targets. Here, we demonstrate how intra- and intergenus comparative genomics can be used to drive these investigations forward, illustrate the advantages and limitations of these approaches and review how post genomic technologies offer complementary strategies for genome characterisation. While sequencing and functional characterisation of other schistosome/platyhelminth genomes continues to expedite anthelmintic discovery, we contend that future priorities should equally focus on improving assembly quality, and chromosomal assignment, of existing schistosome/platyhelminth genomes. PMID:22024648

  18. Genetic Structure and Distribution of the Colibactin Genomic Island among Members of the Family Enterobacteriaceae

    Science.gov (United States)

    Putze, Johannes; Hennequin, Claire; Nougayrède, Jean-Philippe; Zhang, Wenlan; Homburg, Stefan; Karch, Helge; Bringer, Marie-Agnés; Fayolle, Corinne; Carniel, Elisabeth; Rabsch, Wolfgang; Oelschlaeger, Tobias A.; Oswald, Eric; Forestier, Christiane; Hacker, Jörg; Dobrindt, Ulrich

    2009-01-01

    A genomic island encoding the biosynthesis and secretion pathway of putative hybrid nonribosomal peptide-polyketide colibactin has been recently described in Escherichia coli. Colibactin acts as a cyclomodulin and blocks the eukaryotic cell cycle. The origin and prevalence of the colibactin island among enterobacteria are unknown. We therefore screened 1,565 isolates of different genera and species related to the Enterobacteriaceae by PCR for the presence of this DNA element. The island was detected not only in E. coli but also in Klebsiella pneumoniae, Enterobacter aerogenes, and Citrobacter koseri isolates. It was highly conserved among these species and was always associated with the yersiniabactin determinant. Structural variations between individual strains were only observed in an intergenic region containing variable numbers of tandem repeats. In E. coli, the colibactin island was usually restricted to isolates of phylogenetic group B2 and inserted at the asnW tRNA locus. Interestingly, in K. pneumoniae, E. aerogenes, C. koseri, and three E. coli strains of phylogenetic group B1, the functional colibactin determinant was associated with a genetic element similar to the integrative and conjugative elements ICEEc1 and ICEKp1 and to several enterobacterial plasmids. Different asn tRNA genes served as chromosomal insertion sites of the ICE-associated colibactin determinant: asnU in the three E. coli strains of ECOR group B1, and different asn tRNA loci in K. pneumoniae. The detection of the colibactin genes associated with an ICE-like element in several enterobacteria provides new insights into the spread of this gene cluster and its putative mode of transfer. Our results shed light on the mechanisms of genetic exchange between members of the family Enterobacteriaceae. PMID:19720753

  19. PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics

    Directory of Open Access Journals (Sweden)

    Rychlewski Leszek

    2006-02-01

    Full Text Available Abstract Background The number of protein structures from structural genomics centers dramatically increases in the Protein Data Bank (PDB). Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity. Results Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties using structure-function relationship. The assignments were conducted for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes. Conclusion We suggest that the structure-based prediction of an EC number should be conducted with a different similarity score cutoff for different protein folds. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file. Availability http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF
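
    The two suggestions in the conclusion above (fold-specific score cutoffs, and agreement between two algorithms) can be combined in a small decision rule. This is a hypothetical sketch, not the PDB-UF method: the function name, the fold labels, the cutoff values, and the assumption that both algorithms report scores on a common scale are all illustrative.

```python
from typing import Optional, Tuple

Prediction = Tuple[str, float]  # (EC number, similarity score)


def consensus_ec(
    pred_a: Prediction,
    pred_b: Prediction,
    fold: str,
    fold_cutoffs: dict,
    default_cutoff: float,
) -> Optional[str]:
    """Accept an EC number only if both methods agree and both clear the cutoff.

    The cutoff is looked up per fold, falling back to `default_cutoff`
    for folds without a dedicated value.
    """
    cutoff = fold_cutoffs.get(fold, default_cutoff)
    ec_a, score_a = pred_a
    ec_b, score_b = pred_b
    if ec_a == ec_b and score_a >= cutoff and score_b >= cutoff:
        return ec_a
    return None  # disagreement or weak score: leave the structure unannotated


# Hypothetical cutoffs on an assumed common score scale.
cutoffs = {"fold_x": 0.8, "fold_y": 0.6}

if __name__ == "__main__":
    print(consensus_ec(("2.1.1.-", 0.85), ("2.1.1.-", 0.90), "fold_x", cutoffs, 0.7))
    print(consensus_ec(("2.1.1.-", 0.85), ("3.4.-.-", 0.90), "fold_x", cutoffs, 0.7))
```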

  20. Genomic structure and expression of immunoglobulins in Squamata.

    Science.gov (United States)

    Olivieri, David N; Garet, Elina; Estevez, Olivia; Sánchez-Espinel, Christian; Gambón-Deza, Francisco

    2016-04-01

    The Squamata order represents a major evolutionary reptile lineage, yet the structure and expression of immunoglobulins in this order has been scarcely studied in detail. From the genome sequences of four Squamata species (Gekko japonicus, Ophisaurus gracilis, Pogona vitticeps and Ophiophagus hannah) and RNA-seq datasets from 18 other Squamata species, we identified the immunoglobulins present in these animals as well as the tissues in which they are found. All Squamata have at least three immunoglobulin classes; namely, the immunoglobulins M, D, and Y. Unlike mammals, however, we provide evidence that some Squamata lineages possess more than one Cμ gene which is located downstream from the Cδ gene. The existence of two evolutionary lineages of immunoglobulin Y is shown. Additionally, it is demonstrated that while all Squamata species possess the λ light chain, only Iguanidae species possess the κ light chain. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. Expanding the eukaryotic genetic code

    Science.gov (United States)

    Chin, Jason W.; Cropp, T. Ashton; Anderson, J. Christopher; Schultz, Peter G.

    2013-01-22

    This invention provides compositions and methods for producing translational components that expand the number of genetically encoded amino acids in eukaryotic cells. The components include orthogonal tRNAs, orthogonal aminoacyl-tRNA synthetases, orthogonal pairs of tRNAs/synthetases and unnatural amino acids. Proteins and methods of producing proteins with unnatural amino acids in eukaryotic cells are also provided.

  2. Expanding the eukaryotic genetic code

    Energy Technology Data Exchange (ETDEWEB)

    Chin, Jason W.; Cropp, T. Ashton; Anderson, J. Christopher; Schultz, Peter G.

    2017-02-28

    This invention provides compositions and methods for producing translational components that expand the number of genetically encoded amino acids in eukaryotic cells. The components include orthogonal tRNAs, orthogonal aminoacyl-tRNA synthetases, orthogonal pairs of tRNAs/synthetases and unnatural amino acids. Proteins and methods of producing proteins with unnatural amino acids in eukaryotic cells are also provided.

  3. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure

    Science.gov (United States)

    2011-01-01

    Background Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome. Results Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexiploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella. Conclusions When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution. PMID:21619600

  4. Mosaic structure of Mycobacterium bovis BCG genomes as a representation of phage sequences' mobility.

    Science.gov (United States)

    Voronina, Olga L; Kunda, Marina S; Aksenova, Ekaterina I; Semenov, Andrey N; Ryzhova, Natalia N; Lunin, Vladimir G; Gintsburg, Alexandr L

    2016-12-28

    The control of genome stability is relevant for the worldwide BCG vaccine preventing the acute forms of childhood tuberculosis. BCG sub-strains whole genome comparative analysis and revealing the triggers of sub-strains transition were the purpose of our investigation. Whole genome sequencing of three BCG Russia seed lots (1963, 1982, 2006 years) confirmed the stability of vaccine sub-strain genome. Comparative analysis of three Mycobacterium bovis and nine M. bovis BCG genomes showed that differences between "early" and "late" sub-strains BCG genomes were associated with specific prophage profiles. Several prophages common to all BCG genomes included ORFs which were homologues to Caudovirales. Surprisingly, very different prophage profiles characterized BCG Tice and BCG Montreal genomes. These prophages contained ORFs which were homologues to Herpesviruses. Phylogeny of strains cohort based on genome maps restriction analysis and whole genomes sequence data were in agreement with prophage profiles. Pair-wise alignment of unique BCG Tice and BCG Montreal prophage sequences and BCG Russia 368 genome demonstrated only similarity of fragmentary sequences that suggested the contribution of prophages in genome mosaic structure formation. Control of the extended sequences is important for genome with mosaic structure. Prophage search tools are effective instruments in this analysis.

  5. Towards a comprehensive structural variation map of an individual human genome.

    Science.gov (United States)

    Pang, Andy W; MacDonald, Jeffrey R; Pinto, Dalila; Wei, John; Rafiq, Muhammad A; Conrad, Donald F; Park, Hansoo; Hurles, Matthew E; Lee, Charles; Venter, J Craig; Kirkness, Ewen F; Levy, Samuel; Feuk, Lars; Scherer, Stephen W

    2010-01-01

    Several genomes have now been sequenced, with millions of genetic variants annotated. While significant progress has been made in mapping single nucleotide polymorphisms (SNPs) and small (genome differs from the reference assembly, and the analysis of the genomes sequenced to date have shown varying results for copy number variation (CNV) and inversions. We have combined computational re-analysis of existing whole genome sequence data with novel microarray-based analysis, and detect 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing of the first published personal genome. We estimate a total non-SNP variation content of 48.8 Mb in a single genome. Our results indicate that this genome differs from the consensus reference sequence by approximately 1.2% when considering indels/CNVs, 0.1% by SNPs and approximately 0.3% by inversions. The structural variants impact 4,867 genes, and >24% of structural variants would not be imputed by SNP-association. Our results indicate that a large number of structural variants have been unreported in the individual genomes published to date. This significant extent and complexity of structural variants, as well as the growing recognition of their medical relevance, necessitate they be actively studied in health-related analyses of personal genomes. The new catalogue of structural variants generated for this genome provides a crucial resource for future comparison studies.

  6. Conservation of coevolving protein interfaces bridges prokaryote–eukaryote homologies in the twilight zone

    Science.gov (United States)

    Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

    2016-01-01

    Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. PMID:27965389

  7. Cytokinesis in eukaryotes.

    Science.gov (United States)

    Guertin, David A; Trautmann, Susanne; McCollum, Dannel

    2002-06-01

    Cytokinesis is the final event of the cell division cycle, and its completion results in irreversible partition of a mother cell into two daughter cells. Cytokinesis was one of the first cell cycle events observed by simple cell biological techniques; however, molecular characterization of cytokinesis has been slowed by its particular resistance to in vitro biochemical approaches. In recent years, the use of genetic model organisms has greatly advanced our molecular understanding of cytokinesis. While the outcome of cytokinesis is conserved in all dividing organisms, the mechanism of division varies across the major eukaryotic kingdoms. Yeasts and animals, for instance, use a contractile ring that ingresses to the cell middle in order to divide, while plant cells build new cell wall outward to the cortex. As would be expected, there is considerable conservation of molecules involved in cytokinesis between yeast and animal cells, while at first glance, plant cells seem quite different. However, in recent years, it has become clear that some aspects of division are conserved between plant, yeast, and animal cells. In this review we discuss the major recent advances in defining cytokinesis, focusing on deciding where to divide, building the division apparatus, and dividing. In addition, we discuss the complex problem of coordinating the division cycle with the nuclear cycle, which has recently become an area of intense research. In conclusion, we discuss how certain cells have utilized cytokinesis to direct development.

  8. The challenge of protein structure determination—lessons from structural genomics

    Science.gov (United States)

    Slabinski, Lukasz; Jaroszewski, Lukasz; Rodrigues, Ana P.C.; Rychlewski, Leszek; Wilson, Ian A.; Lesley, Scott A.; Godzik, Adam

    2007-01-01

    The process of experimental determination of protein structure is marred with a high ratio of failures at many stages. With availability of large quantities of data from high-throughput structure determination in structural genomics centers, we can now learn to recognize protein features correlated with failures; thus, we can recognize proteins more likely to succeed and eventually learn how to modify those that are less likely to succeed. Here, we identify several protein features that correlate strongly with successful protein production and crystallization and combine them into a single score that assesses “crystallization feasibility.” The formula derived here was tested with a jackknife procedure and validated on independent benchmark sets. The “crystallization feasibility” score described here is being applied to target selection in the Joint Center for Structural Genomics, and is now contributing to increasing the success rate, lowering the costs, and shortening the time for protein structure determination. Analyses of PDB depositions suggest that very similar features also play a role in non-high-throughput structure determination, suggesting that this crystallization feasibility score would also be of significant interest to structural biology, as well as to molecular and biochemistry laboratories. PMID:17962404

  9. Eukaryotic checkpoints are absent in the cell division cycle of ...

    Indian Academy of Sciences (India)

    Fidelity in transmission of genetic characters is ensured by the faithful duplication of the genome, followed by equal segregation of the genetic material in the progeny. Thus, alternation of DNA duplication (S-phase) and chromosome segregation during the M-phase are hallmarks of most well studied eukaryotes. Several ...

  10. Uncoupling of Sister Replisomes during Eukaryotic DNA Replication

    NARCIS (Netherlands)

    Yardimci, Hasan; Loveland, Anna B.; Habuchi, Satoshi; van Oijen, Antoine M.; Walter, Johannes C.

    2010-01-01

    The duplication of eukaryotic genomes involves the replication of DNA from multiple origins of replication. In S phase, two sister replisomes assemble at each active origin, and they replicate DNA in opposite directions. Little is known about the functional relationship between sister replisomes.

  11. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain

    DEFF Research Database (Denmark)

    Sükösd, Zsuzsanna; Andersen, Ebbe Sloth; Seemann, Ernst Stefan

    2015-01-01

    of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interesting, a set of long distance interactions form a core organizing structure (COS) that organize the genome into three major structural domains. Despite overlapping...... protein-coding regions the COS is supported by a particular high frequency of compensatory base changes, suggesting functional importance for this element. This new structural element potentially organizes the whole genome into three major domains protruding from a conserved core structure with potential...

  12. Molecular Data are Transforming Hypotheses on the Origin and Diversification of Eukaryotes

    OpenAIRE

    Tekle, Yonas I.; Parfrey, Laura Wegener; Katz, Laura A.

    2009-01-01

    The explosion of molecular data has transformed hypotheses on both the origin of eukaryotes and the structure of the eukaryotic tree of life. Early ideas about the evolution of eukaryotes arose through analyses of morphology by light microscopy and later electron microscopy. Though such studies have proven powerful at resolving more recent events, theories on origins and diversification of eukaryotic life have been substantially revised in light of analyses of molecular data including gene an...

  13. Towards Fully Automated Structure-Based Function Prediction In Structural Genomics: A Case Study

    Science.gov (United States)

    Watson, James D.; Sanderson, Steve; Ezersky, Alexandra; Savchenko, Alexei; Edwards, Aled; Orengo, Christine; Joachimiak, Andrzej; Laskowski, Roman A.; Thornton, Janet M.

    2007-01-01

    Summary As the global Structural Genomics projects have picked up pace the number of structures annotated in the Protein Data Bank as “hypothetical protein” or “unknown function” has grown significantly. A major challenge now involves the development of computational methods to accurately and automatically assign functions to these proteins. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI) we review the success of the pipeline and the importance of structure-based function prediction. As a dataset we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity difficult to identify using current sequence methods. No one method is successful in all cases so through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chance that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment. PMID:17316683

  14. Eukaryotic transcription factors

    DEFF Research Database (Denmark)

    Staby, Lasse; O'Shea, Charlotte; Willemoës, Martin

    2017-01-01

    Gene-specific transcription factors (TFs) are key regulatory components of signaling pathways, controlling, for example, cell growth, development, and stress responses. Their biological functions are determined by their molecular structures, as exemplified by their structured DNA-binding domains...

  15. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center.

    Science.gov (United States)

    Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-01

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  16. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung-Hou; Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-02

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  17. Systematic Prioritization of Druggable Mutations in ∼5000 Genomes Across 16 Cancer Types Using a Structural Genomics-based Approach.

    Science.gov (United States)

    Zhao, Junfei; Cheng, Feixiong; Wang, Yuanyuan; Arteaga, Carlos L; Zhao, Zhongming

    2016-02-01

    A massive number of somatic mutations have been cataloged in large-scale projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium projects. The majority of the somatic mutations found in tumor genomes are neutral 'passenger' rather than damaging 'driver' mutations. Now, understanding their biological consequences and prioritizing them for druggable targets are urgently needed. Thanks to the rapid advances in structural genomics technologies (e.g. X-ray), large-scale protein structural data has now been made available, providing critical information for deciphering functional roles of mutations in cancer and prioritizing those alterations that may mediate drug binding at atomic resolution and, as such, be druggable targets. We hypothesized that mutations at protein-ligand binding-site residues are likely to be druggable targets. Thus, to prioritize druggable mutations, we developed SGDriver, a structural genomics-based method incorporating the somatic missense mutations into protein-ligand binding-site residues using a Bayes inference statistical framework. We applied SGDriver to 746,631 missense mutations observed in 4997 tumor-normal pairs across 16 cancer types from The Cancer Genome Atlas. SGDriver detected 14,471 potential druggable mutations in 2091 proteins (including 1,516 recurrently mutated proteins) across 3558 cancer genomes (71.2%), and further identified 298 proteins harboring mutations that were significantly enriched at protein-ligand binding-site residues (adjusted p value < 0.05). The identified proteins are significantly enriched in both oncoproteins and tumor suppressors. The follow-up drug-target network analysis suggested 98 known and 126 repurposed druggable anticancer targets (e.g. SPOP and NR3C1). Furthermore, our integrative analysis indicated that 13% of patients might benefit from current targeted therapy, and this proportion would increase to 31% when considering drug repositioning. This study

  18. On the Diversification of the Translation Apparatus across Eukaryotes

    Directory of Open Access Journals (Sweden)

    Greco Hernández

    2012-01-01

    Diversity is one of the most remarkable features of living organisms. Current assessments of eukaryote biodiversity reach 1.5 million species, but the true figure could be several times that number. Diversity is ingrained in all stages and echelons of life, namely, the occupancy of ecological niches, behavioral patterns, body plans and organismal complexity, as well as metabolic needs and genetics. In this review, we discuss how diversity also exists in a key biochemical process, translation, across eukaryotes. Translation is a fundamental process for all forms of life, and the basic components and mechanisms of translation in eukaryotes have been largely established upon the study of traditional, so-called model organisms. By using modern genome-wide, high-throughput technologies, recent studies of many nonmodel eukaryotes have unveiled a surprising diversity in the configuration of the translation apparatus across eukaryotes, showing that this apparatus is far from being evolutionarily static. For some of the components of this machinery, functional differences between different species have also been found. The recent research reviewed in this article highlights the molecular and functional diversification the translational machinery has undergone during eukaryotic evolution. A better understanding of all aspects of organismal diversity is key to a more profound knowledge of life.

  19. The Eukaryotic Promoter Database (EPD)

    OpenAIRE

    Perier, R. C.; Praz, V; Junier, T; Bonnard, C.; Bucher, P

    2000-01-01

    The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes a description of the initiation site mapping data, exhaustive cross-references to the EMBL nucleotide sequence database, SWISS-PROT, TRANSFAC and other databases, as well a...

  20. Integrated database of information from structural genomics experiments.

    Science.gov (United States)

    Asada, Yukuhiko; Sugahara, Michihiro; Mizutani, Hisashi; Naitow, Hisashi; Tanaka, Tomoyuki; Matsuura, Yoshinori; Agari, Yoshihiro; Ebihara, Akio; Shinkai, Akeo; Kuramitsu, Seiki; Yokoyama, Shigeyuki; Kaminuma, Eri; Kobayashi, Norio; Nishikata, Koro; Shimoyama, Sayoko; Toyoda, Tetsuro; Ishikawa, Tetsuya; Kunishima, Naoki

    2013-05-01

    Information from structural genomics experiments at the RIKEN SPring-8 Center, Japan has been compiled and published as an integrated database. The contents of the database are (i) experimental data from nine species of bacteria that cover a large variety of protein molecules in terms of both evolution and properties (http://database.riken.jp/db/bacpedia), (ii) experimental data from mutant proteins that were designed systematically to study the influence of mutations on the diffraction quality of protein crystals (http://database.riken.jp/db/bacpedia) and (iii) experimental data from heavy-atom-labelled proteins from the heavy-atom database HATODAS (http://database.riken.jp/db/hatodas). The database integration adopts the semantic web, which is suitable for data reuse and automatic processing, thereby allowing batch downloads of full data and data reconstruction to produce new databases. In addition, to enhance the use of data (i) and (ii) by general researchers in biosciences, a comprehensible user interface, Bacpedia (http://bacpedia.harima.riken.jp), has been developed.

  1. The genomic structure of the DMBT1 gene

    DEFF Research Database (Denmark)

    Mollenhauer, J; Holmskov, U; Wiemann, S

    1999-01-01

    , and in gastrointestinal and lung cancers. Based on these properties, DMBT1 has been proposed to be a candidate tumour suppressor gene. We have determined the genomic sequence of DMBT1 to allow analyses of mutations. The gene has at least 54 exons that span a genomic region of about 80 kb. We have identified a putative...

  2. DHPC: a new tool to express genome structural features.

    Science.gov (United States)

    Deng, Xuegong; Deng, Xuemei; Rayner, Simon; Liu, Xiangdong; Zhang, Qingling; Yang, Yupu; Li, Ning

    2008-05-01

    The DHPC (DNA Hilbert-Peano curve) is a new tool for visualizing large-scale genome sequences by mapping sequences into a two-dimensional square. It utilizes the space-filling function of Hilbert-Peano mapping. By applying a Gauss smoothing technique and a user-defined color function, a large-scale genome sequence can be mapped into a two-dimensional color image. In the calculated DHPCs, many genome characteristics are revealed. In this article we introduce the method and show how DHPCs may be used to identify regions of different base composition. The power of the method is demonstrated by presenting multiple examples such as repeating sequences, degree of base bias, regions of homogeneity and their boundaries, and mark of annotated segments. We also present several genome curves generated by DHPC to demonstrate how DHPC can be used to find previously unidentified sequence features in these genomes.
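
    The mapping step described above can be illustrated with a minimal sketch. It assumes the standard Hilbert curve index-to-coordinate conversion and places each base of a sequence on a square grid so that neighbours in the sequence stay neighbours in the image. The function names `hilbert_d2xy` and `sequence_to_grid` are hypothetical and are not taken from the DHPC software; the Gauss smoothing and color function mentioned in the abstract are left out.

```python
def hilbert_d2xy(order, d):
    """Convert position d along a Hilbert curve to (x, y) on a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sequence_to_grid(seq, order, fill="."):
    """Lay a DNA string along the Hilbert curve; unused cells keep `fill`."""
    side = 1 << order
    if len(seq) > side * side:
        raise ValueError("sequence does not fit on a grid of this order")
    grid = [[fill] * side for _ in range(side)]
    for d, base in enumerate(seq):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = base
    return grid


if __name__ == "__main__":
    # Illustrative sequence only: 16 bases fill a 4 x 4 grid (order 2).
    for row in sequence_to_grid("AAAACCCCGGGGTTTT", 2):
        print(" ".join(row))
```

    Because consecutive positions on the curve are always adjacent cells, runs of similar base composition appear as compact patches rather than long stripes, which is the property the abstract relies on for spotting regions of homogeneity.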

  3. Serial endosymbiosis or singular event at the origin of eukaryotes?

    Science.gov (United States)

    Lane, Nick

    2017-12-07

    'On the Origin of Mitosing Cells' heralded a new way of seeing cellular evolution, with symbiosis at its heart. Lynn Margulis (then Sagan) marshalled an impressive array of evidence for endosymbiosis, from cell biology to atmospheric chemistry and Earth history. Despite her emphasis on symbiosis, she saw plenty of evidence for gradualism in eukaryotic evolution, with multiple origins of mitosis and sex, repeated acquisitions of plastids, and putative evolutionary intermediates throughout the microbial world. Later on, Margulis maintained her view of multiple endosymbioses giving rise to other organelles such as hydrogenosomes, in keeping with the polyphyletic assumptions of the serial endosymbiosis theory. She stood at the threshold of the phylogenetic era, and anticipated its potential. Yet while predicting that the nucleotide sequences of genes would enable a detailed reconstruction of eukaryotic evolution, Margulis did not, and could not, imagine the radically different story that would eventually emerge from comparative genomics. The last eukaryotic common ancestor now seems to have been essentially a modern eukaryotic cell that had already evolved mitosis, meiotic sex, organelles and endomembrane systems. The long search for missing evolutionary intermediates has failed to turn up a single example, and those discussed by Margulis turn out to have evolved reductively from more complex ancestors. Strikingly, Margulis argued that all eukaryotes had mitochondria in her 1967 paper (a conclusion that she later disavowed). But she developed her ideas in the context of atmospheric oxygen and aerobic respiration, neither of which is consistent with more recent geological and phylogenetic findings. Instead, a modern synthesis of genomics and bioenergetics points to the endosymbiotic restructuring of eukaryotic genomes in relation to bioenergetic membranes as the singular event that permitted the evolution of morphological complexity. Copyright © 2017 Elsevier Ltd. 

  4. Core histone genes of Giardia intestinalis: genomic organization, promoter structure, and expression

    Directory of Open Access Journals (Sweden)

    Adam Rodney D

    2007-04-01

    Background: Giardia intestinalis is a protist found in freshwaters worldwide, and is the most common cause of parasitic diarrhea in humans. The phylogenetic position of this parasite is still much debated. Histones are small, highly conserved proteins that associate tightly with DNA to form chromatin within the nucleus. There are two classes of core histone genes in higher eukaryotes: DNA replication-independent histones and DNA replication-dependent ones. Results: We identified two copies each of the core histone H2a, H2b and H3 genes, and three copies of the H4 gene, at separate locations on chromosomes 3, 4 and 5 within the genome of Giardia intestinalis, but no gene encoding a H1 linker histone could be recognized. The copies of each gene share extensive DNA sequence identities throughout their coding and 5' noncoding regions, which suggests these copies have arisen from relatively recent gene duplications or gene conversions. The transcription start sites are at triplet A sequences 1–27 nucleotides upstream of the translation start codon for each gene. We determined that a 50 bp region upstream from the start of the histone H4 coding region is the minimal promoter, and a highly conserved 15 bp sequence called the histone motif (him) is essential for its activity. The Giardia core histone genes are constitutively expressed at approximately equivalent levels and their mRNAs are polyadenylated. Competition gel-shift experiments suggest that a factor within the protein complex that binds him may also be a part of the protein complexes that bind other promoter elements described previously in Giardia. Conclusion: In contrast to other eukaryotes, the Giardia genome has only a single class of core histone genes that encode replication-independent histones. Our inability to locate a gene encoding the linker histone H1 leads us to speculate that the H1 protein may not be required for the compaction of Giardia's small and gene-rich genome.

  5. Massive expansion of the calpain gene family in unicellular eukaryotes

    Directory of Open Access Journals (Sweden)

    Zhao Sen

    2012-09-01

    Background: Calpains are Ca2+-dependent cysteine proteases that participate in a range of crucial cellular processes. Dysfunction of these enzymes may cause, for instance, life-threatening diseases in humans, the loss of sex determination in nematodes and embryo lethality in plants. Although the calpain family is well characterized in animal and plant model organisms, there is a great lack of knowledge about these genes in unicellular eukaryote species (i.e. protists). Here, we study the distribution and evolution of calpain genes in a wide range of eukaryote genomes from major branches in the tree of life. Results: Our investigations reveal 24 types of protein domains that are combined with the calpain-specific catalytic domain CysPc. In total we identify 41 different calpain domain architectures; 28 of these domain combinations have not been previously described. Based on our phylogenetic inferences, we propose that at least four calpain variants were established in the early evolution of eukaryotes, most likely before the radiation of all the major supergroups of eukaryotes. Many domains associated with eukaryotic calpain genes can be found among eubacteria or archaebacteria but never in combination with the CysPc domain. Conclusions: The analyses presented here show that ancient modules present in prokaryotes, and a few de novo eukaryote domains, have been assembled into many novel domain combinations along the evolutionary history of eukaryotes. Some of the new calpain genes show a narrow distribution in a few branches in the tree of life, likely representing lineage-specific innovations. Hence, the functionally important classical calpain genes found among humans and vertebrates make up only a tiny fraction of the calpain family. In fact, a massive expansion of the calpain family occurred by domain shuffling among unicellular eukaryotes and contributed to a wealth of functionally different genes.

  6. DNA microarrays: from structural genomics to functional genomics. The applications of gene chips in dermatology and dermatopathology.

    Science.gov (United States)

    Sellheyer, Klaus; Belbin, Thomas J

    2004-11-01

    The human genome project was successful in sequencing the entire human genome and ended earlier than expected. The vast genetic information now available will have far-reaching consequences for medicine in the twenty-first century. The knowledge gained from the mapping and sequencing of human genes on a genome-wide scale--commonly referred to as structural genomics--is a prerequisite for studies that focus on the functional aspects of genes. A recently invented technique, known as gene chip, or DNA microarray, technology, allows the study of the function of thousands of genes at once, thereby opening the door to the new field of functional genomics. At its core, the DNA microarray utilizes a unique feature of DNA known as complementary hybridization. As such, it is not different from Southern (DNA) blot or northern (RNA) blot hybridizations, or the polymerase chain reaction, with the exception that it allows expression profiling of the entire human genome in a single hybridization experiment. The article highlights the principles, technology, and applications of DNA microarrays as they pertain to the field of dermatology and dermatopathology. The most important application is the gene expression profiling of skin cancer, especially of melanoma. Other potential applications include gene expression profiling of inflammatory skin diseases, the mutational analysis of genodermatoses, and polymorphism screening, as well as drug development and chemosensitivity prediction. cDNA microarrays will shape the future diagnostic approach of dermatology and dermatopathology and may lead to new therapeutic options.

  7. Chromosome Conformation Capture on Chip (4C): Meeting genomic neighbors

    NARCIS (Netherlands)

    M.J. Simonis (Marieke)

    2008-01-01

    The eukaryotic genome is extensively folded to fit in the small volume of the cell nucleus. Several lines of evidence have suggested a functional relationship between the structural folding of chromosomes and gene expression; however methods to systematically analyze

  8. Chapter 1. Target selection in structural genomics projects to increase knowledge of protein structure and function space.

    Science.gov (United States)

    Carter, Phil; Lee, David; Orengo, Christine

    2008-01-01

    Structural genomics aims to solve the three-dimensional structures of proteins at a rapid rate and in a cost-effective manner, with the hope of significantly impacting on the life sciences, biotechnology, and drug discovery in the long-term. Structural genomics initiatives started in Japan in 1997 with the advent of the Protein Folds Project. Since then many new initiatives have begun worldwide, with diverse aims motivating the selection of proteins for structure determination. In this chapter, we consider the biological goals of high-throughput structural biology, while focusing on the Protein Structure Initiative in the United States. This is the most productive of the structural genomics initiatives, having solved 3,363 new structures between September 2000 and October 2008.

  9. Genomic Signals of Reoriented ORFs

    Directory of Open Access Journals (Sweden)

    Paul Dan Cristea

    2004-01-01

    Complex representation of nucleotides is used to convert DNA sequences into complex digital genomic signals. The analysis of the cumulated phase and unwrapped phase of DNA genomic signals reveals large-scale features of eukaryote and prokaryote chromosomes that result from statistical regularities of base and base-pair distributions along DNA strands. By reorienting the chromosome coding regions, a “hidden” linear variation of the cumulated phase has been revealed, along with the conspicuous almost linear variation of the unwrapped phase. A model of chromosome longitudinal structure is inferred on these bases.
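
    The two quantities named above can be sketched in a few lines. The abstract does not give the nucleotide-to-complex mapping, so the table below (each base on one corner of a square in the complex plane) is an assumption made for illustration, as are the function names. The cumulated phase is taken here as the running sum of the phases, and the unwrapped phase as the running sum of phase steps after each step has been folded into the interval (-pi, pi].

```python
import cmath
import math

# Assumed mapping for illustration only; the abstract does not specify one.
NUCLEOTIDE_TO_COMPLEX = {
    "A": 1 + 1j,
    "C": -1 - 1j,
    "G": -1 + 1j,
    "T": 1 - 1j,
}


def to_signal(seq):
    """Turn a DNA string into a list of complex samples."""
    return [NUCLEOTIDE_TO_COMPLEX[base] for base in seq.upper()]


def cumulated_phase(signal):
    """Running sum of the phase of every sample."""
    total = 0.0
    out = []
    for z in signal:
        total += cmath.phase(z)
        out.append(total)
    return out


def unwrapped_phase(signal):
    """Phase with jumps larger than pi removed between successive samples."""
    out = []
    previous_raw = None
    for z in signal:
        raw = cmath.phase(z)
        if previous_raw is None:
            out.append(raw)
        else:
            step = raw - previous_raw
            # Fold the step into (-pi, pi]; steps of exactly +/-pi are ambiguous.
            while step > math.pi:
                step -= 2 * math.pi
            while step <= -math.pi:
                step += 2 * math.pi
            out.append(out[-1] + step)
        previous_raw = raw
    return out


if __name__ == "__main__":
    signal = to_signal("AAGGCT")  # illustrative sequence only
    print(cumulated_phase(signal))
    print(unwrapped_phase(signal))
```

    Under this assumed mapping the cumulated phase tracks the balance of individual bases, while the unwrapped phase accumulates the direction of transitions between successive bases, which is why the two curves can behave differently along a chromosome.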

  10. Defining the genome structure of 'Tongil' rice, an important cultivar in the Korean "Green Revolution".

    Science.gov (United States)

    Kim, Backki; Kim, Dong-Gwan; Lee, Gileung; Seo, Jeonghwan; Choi, Ik-Young; Choi, Beom-Soon; Yang, Tae-Jin; Kim, Kwang Soo; Lee, Joohyun; Chin, Joong Hyoun; Koh, Hee-Jong

    2014-12-01

    Tongil (IR667-98-1-2) rice, developed in 1972, is a high-yield rice variety derived from a three-way cross between indica and japonica varieties. Tongil contributed to the self-sufficiency of staple food production in Korea during a period known as the 'Korean Green Revolution'. We analyzed the nucleotide-level genome structure of Tongil rice and compared it to those of the parental varieties. A total of 17.3 billion Illumina HiSeq reads, 47× genome coverage, were generated for Tongil rice. Three parental accessions of Tongil rice, two indica types and one japonica type, were also sequenced at approximately 30× genome coverage. A total of 2,149,991 SNPs were detected between Tongil and Nipponbare varieties. The average SNP frequency of Tongil was 5.77 per kb. Genome composition was determined based on SNP data by comparing Tongil with three parental genome sequences using the sliding window approach. Analyses revealed that 91.8% of the Tongil genome originated from the indica parents and 7.9% from the japonica parent. Copy numbers of SSR motifs, ORF gene distribution throughout the whole genome, gene ontology (GO) annotation, and some yield-related QTLs or gene locations were also comparatively analyzed between Tongil and parental varieties using sequence-based tools. Each genetic factor was transferred from the parents into Tongil rice in amounts that were in proportion to the whole genome composition. Tongil was derived from a three-way cross among two indica and one japonica varieties. Defining the genome structure of Tongil rice demonstrates that the Tongil genome is derived primarily from the indica genome with a small proportion of japonica genome introgression. Comparative gene distribution, SSR, GO, and yield-related gene analysis support the finding that the Tongil genome is primarily made up of the indica genome.
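
    A sliding-window assignment of parental origin from SNP data might look like the sketch below. It is not the authors' pipeline: the data layout, the majority-vote rule, the handling of ties and all names (`assign_windows`, `composition`) are assumptions chosen to keep the example small.

```python
from collections import Counter


def assign_windows(snps, parents, window, step, chrom_length):
    """Assign each window to the parent whose alleles the sample matches most.

    snps: list of (position, sample_allele, {parent_name: parent_allele}).
    Returns a list of (start, end, parent_or_None); None marks windows with
    no SNPs or with a tie between parents.
    """
    results = []
    for start in range(0, chrom_length, step):
        end = min(start + window, chrom_length)
        matches = Counter()
        for pos, allele, parent_alleles in snps:
            if start <= pos < end:
                for parent in parents:
                    if parent_alleles.get(parent) == allele:
                        matches[parent] += 1
        origin = None
        if matches:
            best_count = max(matches.values())
            best = [p for p, n in matches.items() if n == best_count]
            if len(best) == 1:
                origin = best[0]
        results.append((start, end, origin))
    return results


def composition(results):
    """Fraction of assigned windows attributed to each parent."""
    called = [origin for _, _, origin in results if origin is not None]
    counts = Counter(called)
    return {parent: n / len(called) for parent, n in counts.items()} if called else {}


if __name__ == "__main__":
    # Invented toy data: positions and alleles are not from the study.
    toy_snps = [
        (10, "A", {"indica": "A", "japonica": "G"}),
        (20, "C", {"indica": "C", "japonica": "T"}),
        (110, "T", {"indica": "C", "japonica": "T"}),
        (210, "G", {"indica": "G", "japonica": "A"}),
        (220, "A", {"indica": "A", "japonica": "C"}),
    ]
    windows = assign_windows(toy_snps, ["indica", "japonica"], 100, 100, 300)
    print(windows)
    print(composition(windows))
```

    With `step` smaller than `window` the windows overlap, which smooths the boundaries between blocks of different origin at the cost of counting some SNPs more than once.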

  11. Towards New Antifolates Targeting Eukaryotic Opportunistic Infections

    Energy Technology Data Exchange (ETDEWEB)

    Liu, J.; Bolstad, D; Bolstad, E; Wright, D; Anderson, A

    2009-01-01

    Trimethoprim, an antifolate commonly prescribed in combination with sulfamethoxazole, potently inhibits several prokaryotic species of dihydrofolate reductase (DHFR). However, several eukaryotic pathogenic organisms are resistant to trimethoprim, preventing its effective use as a therapeutic for those infections. We have been building a program to reengineer trimethoprim to more potently and selectively inhibit eukaryotic species of DHFR as a viable strategy for new drug discovery targeting several opportunistic pathogens. We have developed a series of compounds that exhibit potent and selective inhibition of DHFR from the parasitic protozoa Cryptosporidium and Toxoplasma as well as the fungus Candida glabrata. A comparison of the structures of DHFR from the fungal species Candida glabrata and Pneumocystis suggests that the compounds may also potently inhibit Pneumocystis DHFR.

  12. The Eukaryotic Promoter Database (EPD): recent developments.

    Science.gov (United States)

    Périer, R C; Junier, T; Bonnard, C; Bucher, P

    1999-01-01

    The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes description of the initiation site mapping data, cross-references to other databases, and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. Recent efforts have focused on exhaustive cross-referencing to the EMBL nucleotide sequence database, and on the improvement of the WWW-based user interfaces and data retrieval mechanisms. EPD can be accessed at http://www.epd.isb-sib.ch

  13. Horizontal transfers of transposable elements in eukaryotes: The flying genes.

    Science.gov (United States)

    Panaud, Olivier

    2016-01-01

    Transposable elements (TEs) are the major components of eukaryotic genomes. Their propensity to densely populate and in some cases invade the genomes of plants and animals is in contradiction with the fact that transposition is strictly controlled by several molecular pathways acting at either transcriptional or post-transcriptional levels. Horizontal transfers, defined as the transmission of genetic material between sexually isolated species, have long been considered as rare phenomena. Here, we show that the horizontal transfers of transposable elements (HTTs) are very frequent in ecosystems. The exact mechanisms of such transfers are not well understood, but species involved in close biotic interactions, like parasitism, show a propensity to exchange genetic material horizontally. We propose that HTTs allow TEs to escape the silencing machinery of their host genome and may therefore be an important mechanism for their survival and their dissemination in eukaryotes. Copyright © 2016 Académie des sciences. Published by Elsevier SAS. All rights reserved.

  14. Fragment screening of infectious disease targets in a structural genomics environment.

    Science.gov (United States)

    Begley, Darren W; Davies, Douglas R; Hartley, Robert C; Edwards, Thomas E; Staker, Bart L; Van Voorhis, Wesley C; Myler, Peter J; Stewart, Lance J

    2011-01-01

    Structural genomics efforts have traditionally focused on generating single protein structures of unique and diverse targets. However, a lone structure for a given target is often insufficient to firmly assign function or to drive drug discovery. As part of the Seattle Structural Genomics Center for Infectious Disease (SSGCID), we seek to expand the focus of structural genomics by elucidating ensembles of structures that examine small molecule-protein interactions for selected infectious disease targets. In this chapter, we discuss two applications for small molecule libraries in structural genomics: unbiased fragment screening, to provide inspiration for lead development, and targeted, knowledge-based screening, to confirm or correct the functional annotation of a given gene product. This shift in emphasis results in a structural genomics effort that is more engaged with the infectious disease research community, and one that produces structures of greater utility to researchers interested in both protein function and inhibitor development. We also describe specific methods for conducting high-throughput fragment screening in a structural genomics context by X-ray crystallography. Copyright © 2011 Elsevier Inc. All rights reserved.

  15. Genomic minimalism in the early diverging intestinal parasite Giardia lamblia.

    Science.gov (United States)

    Morrison, Hilary G; McArthur, Andrew G; Gillin, Frances D; Aley, Stephen B; Adam, Rodney D; Olsen, Gary J; Best, Aaron A; Cande, W Zacheus; Chen, Feng; Cipriano, Michael J; Davids, Barbara J; Dawson, Scott C; Elmendorf, Heidi G; Hehl, Adrian B; Holder, Michael E; Huse, Susan M; Kim, Ulandt U; Lasek-Nesselquist, Erica; Manning, Gerard; Nigam, Anuranjini; Nixon, Julie E J; Palm, Daniel; Passamaneck, Nora E; Prabhu, Anjali; Reich, Claudia I; Reiner, David S; Samuelson, John; Svard, Staffan G; Sogin, Mitchell L

    2007-09-28

    The genome of the eukaryotic protist Giardia lamblia, an important human intestinal parasite, is compact in structure and content, contains few introns or mitochondrial relics, and has simplified machinery for DNA replication, transcription, RNA processing, and most metabolic pathways. Protein kinases comprise the single largest protein class and reflect Giardia's requirement for a complex signal transduction network for coordinating differentiation. Lateral gene transfer from bacterial and archaeal donors has shaped Giardia's genome, and previously unknown gene families, for example, cysteine-rich structural proteins, have been discovered. Unexpectedly, the genome shows little evidence of heterozygosity, supporting recent speculations that this organism is sexual. This genome sequence will not only be valuable for investigating the evolution of eukaryotes, but will also be applied to the search for new therapeutics for this parasite.

  16. Decoding the fine-scale structure of a breast cancer genome and transcriptome.

    Science.gov (United States)

    Volik, Stanislav; Raphael, Benjamin J; Huang, Guiqing; Stratton, Michael R; Bignel, Graham; Murnane, John; Brebner, John H; Bajsarowicz, Krystyna; Paris, Pamela L; Tao, Quanzhou; Kowbel, David; Lapuk, Anna; Shagin, Dmitri A; Shagina, Irina A; Gray, Joe W; Cheng, Jan-Fang; de Jong, Pieter J; Pevzner, Pavel; Collins, Colin

    2006-03-01

    A comprehensive understanding of cancer is predicated upon knowledge of the structure of malignant genomes underlying its many variant forms and the molecular mechanisms giving rise to them. It is well established that solid tumor genomes accumulate a large number of genome rearrangements during tumorigenesis. End Sequence Profiling (ESP) maps and clones genome breakpoints associated with all types of genome rearrangements elucidating the structural organization of tumor genomes. Here we extend the ESP methodology in several directions using the breast cancer cell line MCF-7. First, targeted ESP is applied to multiple amplified loci, revealing a complex process of rearrangement and co-amplification in these regions reminiscent of breakage/fusion/bridge cycles. Second, genome breakpoints identified by ESP are confirmed using a combination of DNA sequencing and PCR. Third, in vitro functional studies assign biological function to a rearranged tumor BAC clone, demonstrating that it encodes anti-apoptotic activity. Finally, ESP is extended to the transcriptome identifying four novel fusion transcripts and providing evidence that expression of fusion genes may be common in tumors. These results demonstrate the distinct advantages of ESP including: (1) the ability to detect all types of rearrangements and copy number changes; (2) straightforward integration of ESP data with the annotated genome sequence; (3) immortalization of the genome; (4) ability to generate tumor-specific reagents for in vitro and in vivo functional studies. Given these properties, ESP could play an important role in a tumor genome project.

  17. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations.

    Science.gov (United States)

    Gremme, Gordon; Steinbiss, Sascha; Kurtz, Stefan

    2013-01-01

    Genome annotations are often published as plain text files describing genomic features and their subcomponents by an implicit annotation graph. In this paper, we present the GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. The GenomeTools strictly follow the annotation graph approach, offering a unified graph-based representation. This gives the developer intuitive and immediate access to genomic features and tools for their manipulation. To process large annotation sets with low memory overhead, we have designed and implemented an efficient pull-based approach for sequential processing of annotations. This makes it possible to handle even the largest annotation sets, such as a complete catalogue of human variations. Our object-oriented C-based software library enables a developer to conveniently implement their own functionality on annotation graphs and to integrate it into larger workflows, simultaneously accessing compressed sequence data if required. The careful C implementation of the GenomeTools not only ensures a light-weight memory footprint while allowing full sequential as well as random access to the annotation graph, but also facilitates the creation of bindings to a variety of script programming languages (like Python and Ruby) sharing the same interface.
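
    The idea of pull-based sequential processing can be illustrated with plain Python generators. This sketch does not use the GenomeTools library or its bindings; the function names are hypothetical, and it assumes tab-separated GFF3-style input with nine columns. Each stage requests one feature at a time from the stage before it, so only a single record needs to be in memory at once.

```python
import io


def read_features(lines):
    """Lazily parse GFF3-style lines into dictionaries, one at a time."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # ignore malformed rows in this simple sketch
        yield {
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attributes": cols[8],
        }


def only_type(features, feature_type):
    """Pass through only features of the requested type."""
    for feature in features:
        if feature["type"] == feature_type:
            yield feature


def total_length(features):
    """Pull features from upstream and sum their lengths (1-based, inclusive)."""
    return sum(f["end"] - f["start"] + 1 for f in features)


if __name__ == "__main__":
    # Invented example annotation, not real data.
    text = (
        "##gff-version 3\n"
        "chr1\tdemo\tgene\t1\t1000\t.\t+\t.\tID=gene1\n"
        "chr1\tdemo\texon\t1\t200\t.\t+\t.\tParent=gene1\n"
        "chr1\tdemo\texon\t500\t1000\t.\t+\t.\tParent=gene1\n"
    )
    stream = only_type(read_features(io.StringIO(text)), "exon")
    print(total_length(stream))
```

    Nothing is parsed until the final consumer asks for the next item, which is the property that keeps memory use flat regardless of file size.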

  18. Automatic generation of gene finders for eukaryotic species

    DEFF Research Database (Denmark)

    Terkelsen, Kasper Munch; Krogh, A.

    2006-01-01

    Background The number of sequenced eukaryotic genomes is rapidly increasing. This means that over time it will be hard to keep supplying customised gene finders for each genome. This calls for procedures to automatically generate species-specific gene finders and to re-train them as the quantity...... length distributions. The performance of each individual gene predictor on each individual genome is comparable to the best of the manually optimised species-specific gene finders. It is shown that species-specific gene finders are superior to gene finders trained on other species....

  19. Structural genomic variation as risk factor for idiopathic recurrent miscarriage

    DEFF Research Database (Denmark)

    Nagirnaja, Liina; Palta, Priit; Kasak, Laura

    2014-01-01

    Recurrent miscarriage (RM) is a multifactorial disorder with acknowledged genetic heritability that affects ∼3% of couples aiming at childbirth. As copy number variants (CNVs) have been shown to contribute to reproductive disease susceptibility, we aimed to describe genome-wide profile of CNVs...... and identify common rearrangements modulating risk to RM. Genome-wide screening of Estonian RM patients and fertile controls identified excessive cumulative burden of CNVs (5.4 and 6.1 Mb per genome) in two RM cases possibly increasing their individual disease risk. Functional profiling of all rearranged genes...... and Denmark (meta-analysis, n = 309/205, odds ratio = 4.82, P = 0.012). Comparison to Estonian population-based cohort (total, n = 1000) confirmed the risk for Estonian female cases (P = 7.9 × 10^-4). Datasets of four cohorts from the Database of Genomic Variants (total, n = 5,846 subjects) exhibited...

  20. Mapping and phasing of structural variation in patient genomes using nanopore sequencing

    NARCIS (Netherlands)

    Cretu Stancu, Mircea|info:eu-repo/dai/nl/413641880; van Roosmalen, Markus J|info:eu-repo/dai/nl/413995429; Renkens, Ivo; Nieboer, Marleen M|info:eu-repo/dai/nl/413968073; Middelkamp, Sjors|info:eu-repo/dai/nl/413967247; de Ligt, Joep|info:eu-repo/dai/nl/374312117; Pregno, Giulia; Giachino, Daniela; Mandrile, Giorgia; Espejo Valle-Inclan, Jose|info:eu-repo/dai/nl/413647560; Korzelius, Jerome; de Bruijn, Ewart; Cuppen, Edwin|info:eu-repo/dai/nl/183050487; Talkowski, Michael E.; Marschall, Tobias; de Ridder, Jeroen|info:eu-repo/dai/nl/304110299; Kloosterman, Wigard P|info:eu-repo/dai/nl/304076953

    2017-01-01

    Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel

  1. Organizational heterogeneity of vertebrate genomes.

    Directory of Open Access Journals (Sweden)

    Svetlana Frenkel

    Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter, GDM), allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
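    As a rough illustration of scoring k-mer occurrences, the sketch below builds a normalised k-mer frequency profile for a sequence and compares two profiles. It is a simplification under stated assumptions: the abstract does not specify the word set, any mismatch tolerance, or the distance measure used in CS analysis, so the exact-match counting and the L1 distance here, and the names `kmer_spectrum` and `spectrum_distance`, are illustrative choices only.

```python
from collections import Counter
from typing import Dict


def kmer_spectrum(seq: str, k: int) -> Dict[str, float]:
    """Relative frequency of every overlapping k-mer that occurs in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def spectrum_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """L1 distance between two spectra (an assumed, illustrative measure)."""
    keys = set(a) | set(b)
    return sum(abs(a.get(x, 0.0) - b.get(x, 0.0)) for x in keys)


# Two toy sequences: one mixed, one homopolymer.
s1 = kmer_spectrum("ACGTACGTAC", 2)
s2 = kmer_spectrum("AAAAAAAAAA", 2)

print(sorted(s1))                 # the 2-mers present in the first sequence
print(spectrum_distance(s1, s2))  # large value: the spectra share no k-mers
```

    Computing such profiles in windows along a chromosome and comparing them would give one simple notion of regional heterogeneity.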

  2. DNA mismatch repair and its many roles in eukaryotic cells

    DEFF Research Database (Denmark)

    Liu, Dekang; Keijzers, Guido; Rasmussen, Lene Juel

    2017-01-01

    DNA mismatch repair (MMR) is an important DNA repair pathway that plays critical roles in DNA replication fidelity, mutation avoidance and genome stability, all of which contribute significantly to the viability of cells and organisms. MMR is widely used as a diagnostic biomarker for human cancers in the clinic, and as a biomarker of cancer susceptibility in animal model systems. Prokaryotic MMR is well-characterized at the molecular and mechanistic level; however, MMR is considerably more complex in eukaryotic cells than in prokaryotic cells, and in recent years, it has become evident that MMR plays novel roles in eukaryotic cells, several of which are not yet well-defined or understood. Many MMR-deficient human cancer cells lack mutations in known human MMR genes, which strongly suggests that essential eukaryotic MMR components/cofactors remain unidentified and uncharacterized. Furthermore...

  3. The genome of obligately intracellular Ehrlichia canis reveals themes of complex membrane structure and immune evasion strategies

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, K.; Kuyler Doyle, C.; Lykidis, A.; Ivanova, N.; Francino, P.; Chain, P.; Shin, M.; Malfatti, S.; Larimer, F.; Copeland, A.; Detter, J.C.; Land, M.; Richardson, P.M.; Yu, X.J.; Walker, D.H.; McBride, J.W.; Kyrpides, N.C.

    2005-09-01

    Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative α-proteobacterium, is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp predicted to encode 925 proteins, 40 stable RNA species, and 17 putative pseudogenes, and a substantial proportion of non-coding sequence (27 percent). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences, and a unique serine-threonine bias associated with the potential for O-glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein families associated with immune evasion were identified, one of which contains poly G:C tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Proteins associated with pathogen-host interactions were identified including a small group of proteins (12) with tandem repeats and another with eukaryotic-like ankyrin domains (7).

  4. A sequence-based survey of the complex structural organization of tumor genomes

    Energy Technology Data Exchange (ETDEWEB)

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  5. Alternative splice variants in TIM barrel proteins from human genome correlate with the structural and evolutionary modularity of this versatile protein fold.

    Directory of Open Access Journals (Sweden)

    Adrián Ochoa-Leyva

    After the surprisingly low number of genes identified in the human genome, alternative splicing emerged as a major mechanism to generate protein diversity in higher eukaryotes. However, it is still not known if its prevalence during genome evolution has contributed to the overall functional protein diversity or if it simply reflects splicing noise. The (βα)8 barrel or TIM barrel is one of the most frequent, versatile, and ancient folds encountered among enzymes. Here, we analyze the structural modifications present in TIM barrel proteins from the human genome that are products of alternative splicing events. We found that 87% of all splicing events involved deletions; most of these events resulted in protein fragments that corresponded to the (βα)2, (βα)4, (βα)5, (βα)6, and (βα)7 subdomains of TIM barrels. Because approximately 7% of all the splicing events involved internal β-strand substitutions, we decided, based on the genomic data, to design β-strand and α-helix substitutions in a well-studied TIM barrel enzyme. The biochemical characterization of one of the chimeric variants suggests that some of the splice variants in the human genome with β-strand substitutions may be evolving novel functions via either the oligomeric state or substrate specificity. We provide results of how the splice variants represent subdomains that correlate with the independently folding and evolving structural units previously reported. This work is the first to observe a link between the structural features of the barrel and a recurrent genetic mechanism. Our results suggest that it is reasonable to expect that a sizeable fraction of splice variants found in the human genome represent structurally viable functional proteins. Our data provide additional support for the hypothesis of the origin of the TIM barrel fold through the assembly of smaller subdomains. We suggest a model of how nature explores new proteins through alternative splicing as a mechanism to

  6. In silico prediction and screening of modular crystal structures via a high-throughput genomic approach

    National Research Council Canada - National Science Library

    Li, Yi; Li, Xu; Liu, Jiancong; Duan, Fangzheng; Yu, Jihong

    2015-01-01

    .... Here we demonstrate the application of a new genomic approach to ABC-6 zeolites, a family of industrially important catalysts whose structures are built from the stacking of modular six-ring layers...

  7. Genomic evolution of 11 type strains within family Planctomycetaceae.

    Directory of Open Access Journals (Sweden)

    Min Guo

    The species in family Planctomycetaceae are ideal groups for investigating the origin of eukaryotes. Their cells are divided by a lipidic intracytoplasmic membrane and they share a number of eukaryote-like molecular characteristics. However, their genomic structures, potential abilities, and evolutionary status are still unknown. In this study, we searched for common protein families and a core genome/pan genome based on 11 sequenced species in family Planctomycetaceae. Then, we constructed a phylogenetic tree based on their 832 common protein families. We also annotated the 11 genomes using the Clusters of Orthologous Groups database. Moreover, we predicted and reconstructed their core/pan metabolic pathways using the KEGG (Kyoto Encyclopedia of Genes and Genomes) orthology system. Subsequently, we identified genomic islands (GIs) and structural variations (SVs) among the five complete genomes and we specifically investigated the integration of two Planctomycetaceae plasmids in all 11 genomes. The results indicate that Planctomycetaceae species share diverse genomic variations and unique genomic characteristics, as well as have huge potential for human applications.
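    The core genome/pan genome distinction used in this record reduces to set operations once each genome is represented by the protein families it contains. The sketch below assumes that family assignment has already been done by some clustering step; the strain names, family identifiers and the function `core_and_pan` are hypothetical and are not taken from the study.

```python
from typing import Dict, Set, Tuple


def core_and_pan(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Core = families present in every genome; pan = families in any genome."""
    family_sets = [set(families) for families in genomes.values()]
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    return core, pan


# Made-up family content for three hypothetical strains.
GENOMES = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam3", "fam4"},
    "strain_C": {"fam1", "fam3", "fam5"},
}

core, pan = core_and_pan(GENOMES)
print(sorted(core))  # ['fam1', 'fam3']
print(len(pan))      # 5
```

    In this framing, the 832 common protein families reported above would correspond to the size of `core` across the 11 genomes.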

  8. Conservation and Variability of Meiosis Across the Eukaryotes.

    Science.gov (United States)

    Loidl, Josef

    2016-11-23

    Comparisons among a variety of eukaryotes have revealed considerable variability in the structures and processes involved in their meiosis. Nevertheless, conventional forms of meiosis occur in all major groups of eukaryotes, including early-branching protists. This finding confirms that meiosis originated in the common ancestor of all eukaryotes and suggests that primordial meiosis may have had many characteristics in common with conventional extant meiosis. However, it is possible that the synaptonemal complex and the delicate crossover control related to its presence were later acquisitions. Later still, modifications to meiotic processes occurred within different groups of eukaryotes. Better knowledge of the spectrum of derived and uncommon forms of meiosis will improve our understanding of many still mysterious aspects of the meiotic process and help to explain the evolutionary basis of functional adaptations to the meiotic program.

  9. Horizontal gene transfer in eukaryotic plant pathogens.

    Science.gov (United States)

    Soanes, Darren; Richards, Thomas A

    2014-01-01

    Gene transfer has been identified as a prevalent and pervasive phenomenon and an important source of genomic innovation in bacteria. The role of gene transfer in microbial eukaryotes seems to be of a reduced magnitude but in some cases can drive important evolutionary innovations, such as new functions that underpin the colonization of different niches. The aim of this review is to summarize published cases that support the hypothesis that horizontal gene transfer (HGT) has played a role in the evolution of phytopathogenic traits in fungi and oomycetes. Our survey of the literature identifies 46 proposed cases of transfer of genes that have a putative or experimentally demonstrable phytopathogenic function. When considering the life-cycle steps through which a pathogen must progress, the majority of the HGTs identified are associated with invading, degrading, and manipulating the host. Taken together, these data suggest HGT has played a role in shaping how fungi and oomycetes colonize plant hosts.

  10. The discrepancies in the results of bioinformatics tools for genomic structural annotation

    Science.gov (United States)

    Pawełkowicz, Magdalena; Nowak, Robert; Osipowski, Paweł; Rymuszka, Jacek; Świerkula, Katarzyna; Wojcieszek, Michał; Przybecki, Zbigniew

    2014-11-01

    A major focus of sequencing projects is to identify genes in genomes. However, it is necessary to define the variety of genes and the criteria for identifying them. In this work we present discrepancies and dependencies arising from the application of different bioinformatic programs for structural annotation to the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used Fgenesh, GenScan and GeneMark for automated structural annotation, and compared the results to a reference annotation.
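    One common way to quantify discrepancies between a gene finder's output and a reference annotation is nucleotide-level agreement. The sketch below is an assumption about how such a comparison could be set up, not the procedure used in this record: the abstract does not state the metric, and the interval values and function names are invented. Intervals are 1-based and inclusive, as in GFF-style coordinates.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def covered_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of genomic positions they cover."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_agreement(predicted: Iterable[Interval],
                         reference: Iterable[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, precision) of predicted vs. reference coverage."""
    pred = covered_positions(predicted)
    ref = covered_positions(reference)
    shared = len(pred & ref)
    sensitivity = shared / len(ref) if ref else 0.0
    precision = shared / len(pred) if pred else 0.0
    return sensitivity, precision


# Toy example: the prediction overlaps half of the reference gene.
reference = [(1, 100)]
predicted = [(51, 150)]
sensitivity, precision = nucleotide_agreement(predicted, reference)
print(sensitivity, precision)  # 0.5 0.5
```

    The set-based expansion is adequate for small examples; for whole genomes an interval-merging approach would use far less memory.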

  11. A new chicken genome assembly provides insight into avian genome structure

    Science.gov (United States)

    The importance of the Gallus gallus (chicken) as a model organism and agricultural animal merits a continuation of sequence assembly improvement efforts. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3) built from combined long single molecule sequencing t...

  12. TAR cloning: insights into gene function, long-range haplotypes and genome structure and evolution.

    Science.gov (United States)

    Kouprina, Natalay; Larionov, Vladimir

    2006-10-01

    The structural and functional analysis of mammalian genomes would benefit from the ability to isolate from multiple DNA samples any targeted chromosomal segment that is the size of an average human gene. A cloning technique that is based on transformation-associated recombination (TAR) in the yeast Saccharomyces cerevisiae satisfies this need. It is a unique tool to selectively recover chromosome segments that are up to 250 kb in length from complex genomes. In addition, TAR cloning can be used to characterize gene function and genome variation, including polymorphic structural rearrangements, mutations and the evolution of gene families, and for long-range haplotyping.

  13. Studying Cattle Genomic Structural Variations in the Green Economy Era

    Science.gov (United States)

    Transgenic cattle carrying multiple genomic modifications have been produced by serial rounds of somatic cell chromatin transfer (cloning) of sequentially genetically targeted somatic cells. However, cloning efficiency tends to decline with the increase of rounds of cloning. It is possible that mult...

  14. Eukaryotic organisms in Proterozoic oceans.

    Science.gov (United States)

    Knoll, A H; Javaux, E J; Hewitt, D; Cohen, P

    2006-06-29

    The geological record of protists begins well before the Ediacaran and Cambrian diversification of animals, but the antiquity of that history, its reliability as a chronicle of evolution and the causal inferences that can be drawn from it remain subjects of debate. Well-preserved protists are known from a relatively small number of Proterozoic formations, but taphonomic considerations suggest that they capture at least broad aspects of early eukaryotic evolution. A modest diversity of problematic, possibly stem group protists occurs in ca 1800-1300 Myr old rocks. 1300-720 Myr fossils document the divergence of major eukaryotic clades, but only with the Ediacaran-Cambrian radiation of animals did diversity increase within most clades with fossilizable members. While taxonomic placement of many Proterozoic eukaryotes may be arguable, the presence of characters used for that placement is not. Focus on character evolution permits inferences about the innovations in cell biology and development that underpin the taxonomic and morphological diversification of eukaryotic organisms.

  15. Eukaryotic vs. cyanobacterial oxygenic photosynthesis

    OpenAIRE

    Schmelling, Nicolas

    2015-01-01

    Slides of my talk about the differences between eukaryotic and cyanobacterial oxygenic photosynthesis. The talk is a more general overview of the differences between the two systems. Slides and figures are my own. For comments, questions and suggestions please contact me via twitter @derschmelling or via mail

  16. Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution

    Science.gov (United States)

    Yap, Jia-Yee S.; Rohner, Thore; Greenfield, Abigail; Van Der Merwe, Marlien; McPherson, Hannah; Glenn, Wendy; Kornfeld, Geoff; Marendy, Elessa; Pan, Annie Y. H.; Wilkins, Marc R.; Rossetto, Maurizio; Delaney, Sven K.

    2015-01-01

    The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine. PMID:26061691

  17. Structural modelling and phylogenetic analyses of PgeIF4A2 (Eukaryotic translation initiation factor) from Pennisetum glaucum reveal signature motifs with a role in stress tolerance and development.

    Science.gov (United States)

    Agarwal, Aakrati; Mudgil, Yashwanti; Pandey, Saurabh; Fartyal, Dhirendra; Reddy, Malireddy K

    2016-01-01

    Eukaryotic translation initiation factor 4A (eIF4A) is an indispensable component of the translation machinery and also plays a role in developmental processes and stress alleviation in plants and animals. Different eIF4A isoforms are present in the cytosol of the cell, namely, eIF4A1, eIF4A2, and eIF4A3, and their expression is tightly regulated in cap-dependent translation. We revealed the structural model of PgeIF4A2 protein using the crystal structure of Homo sapiens eIF4A3 (PDB ID: 2J0S) as template by Modeller 9.12. The resultant PgeIF4A2 model structure was refined by PROCHECK, ProSA, Verify3D and RMSD that showed the model structure is reliable with 77 % amino acid sequence identity with template. Investigation revealed two conserved signatures for ATP-dependent RNA Helicase DEAD-box conserved site (VLDEADEML) and RNA helicase DEAD-box type, Q-motif in sheet-turn-helix and α-helical region respectively. All these conserved motifs are responsible for response during developmental stages and stress tolerance in plants.

  18. Replication and transcription on a collision course: eukaryotic regulation mechanisms and implications for DNA stability.

    Directory of Open Access Journals (Sweden)

    Alessandra Brambati

    2015-04-01

    DNA replication and transcription are vital cellular processes during which the genetic information is copied into complementary DNA and RNA molecules. Highly complex machineries required for DNA and RNA synthesis compete for the same DNA template, therefore being on a collision course. Unscheduled replication-transcription clashes alter the gene transcription program and generate replication stress, reducing fork speed. Molecular pathways and mechanisms that minimize the conflict between replication and transcription have been extensively characterized in prokaryotic cells and recently identified also in eukaryotes. A pathological outcome of replication-transcription collisions is the formation of stable RNA:DNA hybrids in molecular structures called R-loops. Growing evidence suggests that R-loop accumulation promotes both genetic and epigenetic instability, thus severely affecting genome functionality. In the present review, we summarize the current knowledge related to replication and transcription conflicts in eukaryotes, their consequences on genome instability and the pathways involved in their resolution. These findings are relevant to clarify the molecular basis of cancer and neurodegenerative diseases.

  19. Integrated databases and computer systems for studying eukaryotic gene expression.

    Science.gov (United States)

    Kolchanov, N A; Ponomarenko, M P; Frolov, A S; Ananko, E A; Kolpakov, F A; Ignatieva, E V; Podkolodnaya, O A; Goryachkovskaya, T N; Stepanenko, I L; Merkulova, T I; Babenko, V V; Ponomarenko, Y V; Kochetov, A V; Podkolodny, N L; Vorobiev, D V; Lavryushev, S V; Grigorovich, D A; Kondrakhin, Y V; Milanesi, L; Wingender, E; Solovyev, V; Overton, G C

    1999-01-01

    The goal of the work was to develop a WWW-oriented computer system providing a maximal integration of informational and software resources on the regulation of gene expression and navigation through them. Rapid growth of the variety and volume of information accumulated in the databases on regulation of gene expression necessarily requires the development of computer systems for automated discovery of the knowledge that can be further used for analysis of regulatory genomic sequences. The GeneExpress system developed includes the following major informational and software modules: (1) Transcription Regulation (TRRD) module, which contains the databases on transcription regulatory regions of eukaryotic genes and TRRD Viewer for data visualization; (2) Site Activity Prediction (ACTIVITY), the module for analysis of functional site activity and its prediction; (3) Site Recognition module, which comprises (a) B-DNA-VIDEO system for detecting the conformational and physicochemical properties of DNA sites significant for their recognition, (b) Consensus and Weight Matrices (ConsFrec) and (c) Transcription Factor Binding Sites Recognition (TFBSR) systems for detecting conservative contextual regions of functional sites and their recognition; (4) Gene Networks (GeneNet), which contains an object-oriented database accumulating the data on gene networks and signal transduction pathways, and the Java-based Viewer for exploration and visualization of the GeneNet information; (5) mRNA Translation (Leader mRNA), designed to analyze structural and contextual properties of mRNA 5'-untranslated regions (5'-UTRs) and predict their translation efficiency; (6) other program modules designed to study the structure-function organization of regulatory genomic sequences and regulatory proteins. GeneExpress is available at http://wwwmgs.bionet.nsc.ru/systems/GeneExpress/ and the links to the mirror site(s) can be found at http://wwwmgs.bionet.nsc.ru/mgs/links/mirrors.html.
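    Weight matrices of the kind mentioned under the Site Recognition module are a standard way to recognise binding sites. The sketch below shows the generic textbook technique (a log-odds position weight matrix built from aligned sites, then slid along a sequence); it is not the GeneExpress, ConsFrec or TFBSR implementation, and the training sites, pseudocount, uniform background and function names are all illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

PWM = List[Dict[str, float]]


def build_pwm(sites: List[str], pseudocount: float = 1.0,
              background: float = 0.25) -> PWM:
    """Log2-odds score per base and column, from equal-length aligned sites."""
    width = len(sites[0])
    pwm: PWM = []
    for i in range(width):
        column = [site[i] for site in sites]
        scores = {}
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / (len(sites) + 4 * pseudocount)
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def best_hit(pwm: PWM, seq: str) -> Tuple[float, int]:
    """Score every window of seq and return (best score, 0-based offset)."""
    width = len(pwm)
    scored = [
        (sum(pwm[i][seq[offset + i]] for i in range(width)), offset)
        for offset in range(len(seq) - width + 1)
    ]
    return max(scored)


# Hypothetical aligned sites resembling a TATA-like motif.
SITES = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]
pwm = build_pwm(SITES)
score, pos = best_hit(pwm, "GGCGTATAAAGGC")
print(pos)  # 4: the window starting at offset 4 is TATAAA
```

    A real scanner would also need a score threshold and handling of ambiguous bases such as N, which this sketch omits.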

  20. Multiple roles of genome-attached bacteriophage terminal proteins

    Energy Technology Data Exchange (ETDEWEB)

    Redrejo-Rodríguez, Modesto; Salas, Margarita, E-mail: msalas@cbm.csic.es

    2014-11-15

    Protein-primed replication constitutes a generalized mechanism to initiate DNA or RNA synthesis in linear genomes, including viruses, gram-positive bacteria, linear plasmids and mobile elements. By this mechanism a specific amino acid primes replication and becomes covalently linked to the genome ends. Despite the fact that TPs lack sequence homology, they share a similar structural arrangement, with the priming residue in the C-terminal half of the protein and an accumulation of positively charged residues at the N-terminal end. In addition, various bacteriophage TPs have been shown to have DNA-binding capacity that targets TPs and their attached genomes to the host nucleoid. Furthermore, a number of bacteriophage TPs from different viral families and with diverse hosts also contain putative nuclear localization signals and localize in the eukaryotic nucleus, which could lead to the transport of the attached DNA. This suggests a possible role of bacteriophage TPs in prokaryote-to-eukaryote horizontal gene transfer. - Highlights: • Protein-primed genome replication constitutes a strategy to initiate DNA or RNA synthesis in linear genomes. • Bacteriophage terminal proteins (TPs) are covalently attached to viral genomes by their primary function priming DNA replication. • TPs are also DNA-binding proteins and target phage genomes to the host nucleoid. • TPs can also localize in the eukaryotic nucleus and may have a role in phage-mediated interkingdom gene transfer.

  1. RNA 3D modules in genome-wide predictions of RNA 2D structure

    DEFF Research Database (Denmark)

    Theis, Corinna; Zirbel, Craig L; Zu Siederdissen, Christian Höner

    2015-01-01

    Recent experimental and computational progress has revealed a large potential for RNA structure in the genome. This has been driven by computational strategies that exploit multiple genomes of related organisms to identify common sequences and secondary structures. However, these computational...... approaches have two main challenges: they are computationally expensive and they have a relatively high false discovery rate (FDR). Simultaneously, RNA 3D structure analysis has revealed modules composed of non-canonical base pairs which occur in non-homologous positions, apparently by independent evolution....... These modules can, for example, occur inside structural elements which in RNA 2D predictions appear as internal loops. Hence one question is if the use of such RNA 3D information can improve the prediction accuracy of RNA secondary structure at a genome-wide level. Here, we use RNAz in combination with 3D...

  2. Unsupervised pattern discovery in human chromatin structure through genomic segmentation.

    Science.gov (United States)

    Hoffman, Michael M; Buske, Orion J; Wang, Jie; Weng, Zhiping; Bilmes, Jeff A; Noble, William Stafford

    2012-03-18

    We trained Segway, a dynamic Bayesian network method, simultaneously on chromatin data from multiple experiments, including positions of histone modifications, transcription-factor binding and open chromatin, all derived from a human chronic myeloid leukemia cell line. In an unsupervised fashion, we identified patterns associated with transcription start sites, gene ends, enhancers, transcriptional regulator CTCF-binding regions and repressed regions. Software and genome browser tracks are at http://noble.gs.washington.edu/proj/segway/.

  3. A statistical anomaly indicates symbiotic origins of eukaryotic membranes

    Science.gov (United States)

    Bansal, Suneyna; Mittal, Aditya

    2015-01-01

    Compositional analyses of nucleic acids and proteins have shed light on possible origins of living cells. In this work, rigorous compositional analyses of ∼5000 plasma membrane lipid constituents of 273 species in the three life domains (archaea, eubacteria, and eukaryotes) revealed a remarkable statistical paradox, indicating symbiotic origins of eukaryotic cells involving eubacteria. For lipids common to plasma membranes of the three domains, the number of carbon atoms in eubacteria was found to be similar to that in eukaryotes. However, mutually exclusive subsets of the same data show exactly the opposite: the number of carbon atoms in lipids of eukaryotes was higher than in eubacteria. This statistical paradox, called Simpson's paradox, was absent for lipids in archaea and for lipids not common to plasma membranes of the three domains. This indicates the presence of interaction(s) and/or association(s) in lipids forming plasma membranes of eubacteria and eukaryotes but not for those in archaea. Further inspection of membrane lipid structures affecting physicochemical properties of plasma membranes provides the first evidence (to our knowledge) on the symbiotic origins of eukaryotic cells based on the “third front” (i.e., lipids) in addition to the growing compositional data from nucleic acids and proteins. PMID:25631820
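    Simpson's paradox, the statistical effect named in this abstract, is easy to reproduce with a few numbers. The values below are invented purely to show the mechanism and are not the lipid data from the study: within each subset the "eukaryote" mean is higher, yet the pooled "eubacteria" mean is higher, because the two groups are distributed unevenly across the subsets.

```python
from statistics import mean

# Made-up carbon-atom counts per lipid, split into two hypothetical subsets.
eukaryote = {"subset_1": [16] * 8, "subset_2": [20] * 2}
eubacteria = {"subset_1": [15] * 2, "subset_2": [19] * 8}


def pooled_mean(groups):
    """Mean over all values, ignoring which subset they came from."""
    return mean(v for values in groups.values() for v in values)


# Within every subset, eukaryote lipids have more carbon atoms on average...
within = {
    name: (mean(eukaryote[name]), mean(eubacteria[name]))
    for name in eukaryote
}

# ...but pooling reverses the direction, because most eukaryote values sit
# in the low subset and most eubacteria values sit in the high subset.
pooled = (pooled_mean(eukaryote), pooled_mean(eubacteria))

print(within)
print(pooled)
```

    The reversal depends entirely on the unequal subset sizes; with equal sizes the pooled comparison would agree with the within-subset comparisons.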

  4. The SGC beyond structural genomics: redefining the role of 3D structures by coupling genomic stratification with fragment-based discovery.

    Science.gov (United States)

    Bradley, Anthony R; Echalier, Aude; Fairhead, Michael; Strain-Damerell, Claire; Brennan, Paul; Bullock, Alex N; Burgess-Brown, Nicola A; Carpenter, Elisabeth P; Gileadi, Opher; Marsden, Brian D; Lee, Wen Hwa; Yue, Wyatt; Bountra, Chas; von Delft, Frank

    2017-11-08

    The ongoing explosion in genomics data has long since outpaced the capacity of conventional biochemical methodology to verify the large number of hypotheses that emerge from the analysis of such data. In contrast, it is still a gold-standard for early phenotypic validation towards small-molecule drug discovery to use probe molecules (or tool compounds), notwithstanding the difficulty and cost of generating them. Rational structure-based approaches to ligand discovery have long promised the efficiencies needed to close this divergence; in practice, however, this promise remains largely unfulfilled, for a host of well-rehearsed reasons and despite the huge technical advances spearheaded by the structural genomics initiatives of the noughties. Therefore the current, fourth funding phase of the Structural Genomics Consortium (SGC), building on its extensive experience in structural biology of novel targets and design of protein inhibitors, seeks to redefine what it means to do structural biology for drug discovery. We developed the concept of a Target Enabling Package (TEP) that provides, through reagents, assays and data, the missing link between genetic disease linkage and the development of usefully potent compounds. There are multiple prongs to the ambition: rigorously assessing targets' genetic disease linkages through crowdsourcing to a network of collaborating experts; establishing a systematic approach to generate the protocols and data that comprise each target's TEP; developing new, X-ray-based fragment technologies for generating high quality chemical matter quickly and cheaply; and exploiting a stringently open access model to build multidisciplinary partnerships throughout academia and industry. By learning how to scale these approaches, the SGC aims to make structures finally serve genomics, as originally intended, and demonstrate how 3D structures systematically allow new modes of druggability to be discovered for whole classes of targets. © 2017 The

  5. The emerging biofuel crop Camelina sativa retains a highly undifferentiated hexaploid genome structure.

    Science.gov (United States)

    Kagale, Sateesh; Koh, Chushin; Nixon, John; Bollina, Venkatesh; Clarke, Wayne E; Tuteja, Reetu; Spillane, Charles; Robinson, Stephen J; Links, Matthew G; Clarke, Carling; Higgins, Erin E; Huebert, Terry; Sharpe, Andrew G; Parkin, Isobel A P

    2014-04-23

    Camelina sativa is an oilseed with desirable agronomic and oil-quality attributes for a viable industrial oil platform crop. Here we generate the first chromosome-scale high-quality reference genome sequence for C. sativa and annotate 89,418 protein-coding genes, representing a whole-genome triplication event relative to the crucifer model Arabidopsis thaliana. C. sativa represents the first crop species to be sequenced from lineage I of the Brassicaceae. The well-preserved hexaploid genome structure of C. sativa surprisingly mirrors those of economically important amphidiploid Brassica crop species from lineage II as well as wheat and cotton. The three genomes of C. sativa show no evidence of fractionation bias and limited expression-level bias, both characteristics commonly associated with polyploid evolution. The highly undifferentiated polyploid genome of C. sativa presents significant consequences for breeding and genetic manipulation of this industrial oil crop.

  6. Associations between inverted repeats and the structural evolution of bacterial genomes.

    Science.gov (United States)

    Achaz, Guillaume; Coissac, Eric; Netter, Pierre; Rocha, Eduardo P C

    2003-01-01

    The stability of the structure of bacterial genomes is challenged by recombination events. Since major rearrangements (i.e., inversions) are thought to frequently operate by homologous recombination between inverted repeats, we analyzed the presence and distribution of such repeats in bacterial genomes and their relation to the conservation of chromosomal structure. First, we show that there is a strong under-representation of inverted repeats, relative to direct repeats, in most chromosomes, especially among the ones regarded as most stable. Second, we show that the avoidance of repeats is frequently associated with the stability of the genomes. Closely related genomes reported to differ in terms of stability are also found to differ in the number of inverted repeats. Third, when using replication strand bias as a proxy for genome stability, we find a significant negative correlation between this strand bias and the abundance of inverted repeats. Fourth, when measuring the recombining potential of inverted repeats and their eventual impact on different features of the chromosomal structure, we observe a tendency of repeats to be located in the chromosome in such a way that rearrangements produce a smaller strand switch and smaller asymmetries than expected by chance. Finally, we discuss the limitations of our analysis and the influence of factors such as the nature of repeats, e.g., transposases, or the differences in the recombination machinery among bacteria. These results shed light on the challenges imposed on the genome structure by the presence of inverted repeats. PMID:12930739
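
    As a rough illustration of what "inverted repeat" means computationally, the sketch below finds pairs of non-overlapping k-mers where the second is the reverse complement of the first. It is a minimal exact-match approach chosen for brevity, not the detection method used in the study; the function names and the fixed word length `k` are assumptions made for the example.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(seq, k):
    """Return (i, j) pairs with i < j where seq[j:j+k] is the reverse
    complement of seq[i:i+k] and the two copies do not overlap."""
    # Index every k-mer by its start positions.
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    pairs = []
    for i in range(len(seq) - k + 1):
        partner = reverse_complement(seq[i:i + k])
        for j in positions.get(partner, []):
            if j >= i + k:  # keep only non-overlapping, downstream copies
                pairs.append((i, j))
    return pairs


print(find_inverted_repeats("GATTACACCCCTGTAATC", 7))
```

    Counting direct repeats with the same index (looking up the k-mer itself instead of its reverse complement) would give the baseline against which under-representation of inverted repeats could be assessed.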

  7. The structure of a rigorously conserved RNA element within the SARS virus genome.

    Directory of Open Access Journals (Sweden)

    Michael P Robertson

    2005-01-01

    We have solved the three-dimensional crystal structure of the stem-loop II motif (s2m) RNA element of the SARS virus genome to 2.7-Å resolution. SARS and related coronaviruses and astroviruses all possess a motif at the 3' end of their RNA genomes, called the s2m, whose pathogenic importance is inferred from its rigorous sequence conservation in an otherwise rapidly mutable RNA genome. We find that this extreme conservation is clearly explained by the requirement to form a highly structured RNA whose unique tertiary structure includes a sharp 90° kink of the helix axis and several novel longer-range tertiary interactions. The tertiary base interactions create a tunnel that runs perpendicular to the main helical axis whose interior is negatively charged and binds two magnesium ions. These unusual features likely form interaction surfaces with conserved host cell components or other reactive sites required for virus function. Based on its conservation in viral pathogen genomes and its absence in the human genome, we suggest that these unusual structural features in the s2m RNA element are attractive targets for the design of anti-viral therapeutic agents. Structural genomics has sought to deduce protein function based on three-dimensional homology. Here we have extended this approach to RNA by proposing potential functions for a rigorously conserved set of RNA tertiary structural interactions that occur within the SARS RNA genome itself. Based on tertiary structural comparisons, we propose the s2m RNA binds one or more proteins possessing an oligomer-binding-like fold, and we suggest a possible mechanism for SARS viral RNA hijacking of host protein synthesis, both based upon observed s2m RNA macromolecular mimicry of a relevant ribosomal RNA fold.

  8. Eukaryotic Penelope-Like Retroelements Encode Hammerhead Ribozyme Motifs

    Science.gov (United States)

    Cervera, Amelia; De la Peña, Marcos

    2014-01-01

    Small self-cleaving RNAs, such as the paradigmatic Hammerhead ribozyme (HHR), have been recently found widespread in DNA genomes across all kingdoms of life. In this work, we found that new HHR variants are preserved in the ancient family of Penelope-like elements (PLEs), a group of eukaryotic retrotransposons regarded as exceptional for encoding telomerase-like retrotranscriptases and spliceosomal introns. Our bioinformatic analysis revealed not only the presence of minimalist HHRs in the two flanking repeats of PLEs but also their massive and widespread occurrence in metazoan genomes. The architecture of these ribozymes indicates that they may work as dimers, although their low self-cleavage activity in vitro suggests the requirement of other factors in vivo. In plants, however, PLEs show canonical HHRs, whereas fungi and protist PLEs encode ribozyme variants with a stable active conformation as monomers. Overall, our data confirm the connection of self-cleaving RNAs with eukaryotic retroelements and unveil these motifs as a significant fraction of the encoded information in eukaryotic genomes. PMID:25135949

  9. G2S: A web-service for annotating genomic variants on 3D protein structures.

    Science.gov (United States)

    Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

    2018-01-27

    Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. The G2S API uses a REST-inspired design and can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source code are freely available at https://g2s.genomenexus.org. Contact: g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.
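
    The core step behind this kind of service, carrying a residue position from a protein sequence onto a structure through a pairwise alignment, can be sketched in a few lines. The example below is a generic illustration and makes no claim about how G2S is implemented or what its API looks like; the function name, its parameters and the toy alignment are all hypothetical.

```python
def map_residue(query_aln, target_aln, query_start, target_start, query_pos):
    """Map a 1-based residue number in the query sequence to the numbering
    of the aligned target (e.g. a structure chain).

    query_aln / target_aln are equal-length aligned strings using '-' for
    gaps; query_start / target_start are the residue numbers of the first
    non-gap character of each string. Returns None when the query residue
    is aligned to a gap or lies outside the alignment.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned strings must have equal length")

    q = query_start - 1   # number of the last query residue consumed
    t = target_start - 1  # number of the last target residue consumed
    for q_char, t_char in zip(query_aln, target_aln):
        if q_char != "-":
            q += 1
        if t_char != "-":
            t += 1
        if q_char != "-" and q == query_pos:
            return t if t_char != "-" else None
    return None


# Toy alignment: query residues 10-14 against target residues 5-9.
print(map_residue("MKT-LV", "MK-ALV", 10, 5, 13))
```

    A real pipeline would first translate a genomic coordinate to a protein position and would obtain the alignment from a sequence search against structure sequences; both steps are omitted here.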

  10. Specificity and evolvability in eukaryotic protein interaction networks.

    Directory of Open Access Journals (Sweden)

    Pedro Beltrao

    2007-02-01

    Progress in uncovering the protein interaction networks of several species has led to questions of what underlying principles might govern their organization. Few studies have tried to determine the impact of protein interaction network evolution on the observed physiological differences between species. Using comparative genomics and structural information, we show here that eukaryotic species have rewired their interactomes at a fast rate of approximately 10^-5 interactions changed per protein pair, per million years of divergence. For Homo sapiens this corresponds to 10^3 interactions changed per million years. Additionally we find that the specificity of binding strongly determines the interaction turnover and that different biological processes show significantly different link dynamics. In particular, human proteins involved in immune response, transport, and establishment of localization show signs of positive selection for change of interactions. Our analysis suggests that a small degree of molecular divergence can give rise to important changes at the network level. We propose that the power law distribution observed in protein interaction networks could be partly explained by the cell's requirement for different degrees of protein binding specificity.
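
    The two rates quoted in this abstract (about 10^-5 changes per protein pair per million years, and about 10^3 changes per million years for human) can be related by simple arithmetic. The sketch below is a back-of-envelope consistency check, not the authors' calculation; it assumes every unordered pair of distinct proteins counts as one pair.

```python
import math

rate_per_pair_per_myr = 1e-5   # interactions changed per protein pair per Myr
human_changes_per_myr = 1e3    # interactions changed per Myr in H. sapiens

# Number of protein pairs implied by the two figures.
implied_pairs = human_changes_per_myr / rate_per_pair_per_myr


def proteins_for_pairs(pairs):
    """Solve n * (n - 1) / 2 = pairs for n (unordered pairs of n proteins)."""
    return (1 + math.sqrt(1 + 8 * pairs)) / 2


implied_proteins = proteins_for_pairs(implied_pairs)
print(f"implied pairs: {implied_pairs:.3g}")
print(f"implied proteome size: {implied_proteins:.0f}")
```

    Under that assumption the two figures imply on the order of 10^8 pairs, that is a proteome of roughly 1.4 × 10^4 proteins, which is the right order of magnitude for human.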

  11. Lateral gene transfer between prokaryotes and multicellular eukaryotes: ongoing and significant?

    NARCIS (Netherlands)

    Ros, V.I.D.; Hurst, G.D.D.

    2009-01-01

    The expansion of genome sequencing projects has produced accumulating evidence for lateral transfer of genes between prokaryotic and eukaryotic genomes. However, it remains controversial whether these genes are of functional importance in their recipient host. Nikoh and Nakabachi, in a recent paper

  12. Mapping and phasing of structural variation in patient genomes using nanopore sequencing.

    Science.gov (United States)

    Cretu Stancu, Mircea; van Roosmalen, Markus J; Renkens, Ivo; Nieboer, Marleen M; Middelkamp, Sjors; de Ligt, Joep; Pregno, Giulia; Giachino, Daniela; Mandrile, Giorgia; Espejo Valle-Inclan, Jose; Korzelius, Jerome; de Bruijn, Ewart; Cuppen, Edwin; Talkowski, Michael E; Marschall, Tobias; de Ridder, Jeroen; Kloosterman, Wigard P

    2017-11-06

    Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline, NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine the parental origin of all de novo chromothripsis breakpoints and to resolve the structure of these complex rearrangements. Additionally, genome-wide surveillance of inherited SVs reveals novel variants, missed in short-read data sets, a large proportion of which are retrotransposon insertions. We provide a first exploration of patient genome sequencing with a nanopore sequencer and demonstrate the value of long-read sequencing in mapping and phasing of SVs for both clinical and research applications.

  13. Development of a radiation track structure clustering algorithm for the prediction of DNA DSB yields and radiation induced cell death in Eukaryotic cells.

    Science.gov (United States)

    Douglass, Michael; Bezak, Eva; Penfold, Scott

    2015-04-21

    The preliminary framework of a combined radiobiological model is developed and calibrated in the current work. The model simulates the production of individual cells forming a tumour, the spatial distribution of individual ionization events (using Geant4-DNA) and the stochastic biochemical repair of DNA double strand breaks (DSBs) leading to the prediction of survival or death of individual cells. In the current work, we expand upon a previously developed tumour generation and irradiation model to include a stochastic ionization damage clustering and DNA lesion repair model. The Geant4 code enabled the positions of each ionization event in the cells to be simulated and recorded for analysis. An algorithm was developed to cluster the ionization events in each cell into simple and complex double strand breaks. The two lesion kinetic (TLK) model was then adapted to predict DSB repair kinetics and the resultant cell survival curve. The parameters in the cell survival model were then calibrated using experimental cell survival data of V79 cells after low energy proton irradiation. A monolayer of V79 cells was simulated using the tumour generation code developed previously. The cells were then irradiated by protons with mean energies of 0.76 MeV and 1.9 MeV using a customized version of Geant4. By replicating the experimental parameters of a low energy proton irradiation experiment and calibrating the model with two sets of data, the model is now capable of predicting V79 cell survival after low energy (cell survival probability, the cell survival probability is calculated for each cell in the geometric tumour model developed in the current work. This model uses fundamental measurable microscopic quantities such as genome length rather than macroscopic radiobiological quantities such as alpha/beta ratios. This means that the model can be theoretically used under a wide range of conditions with a single set of input parameters once calibrated for a given cell line.

  14. Gonococcal attachment to eukaryotic cells

    Energy Technology Data Exchange (ETDEWEB)

    James, J.F.; Lammel, C.J.; Draper, D.L.; Brown, D.A.; Sweet, R.L.; Brooks, G.F.

    The attachment of Neisseria gonorrhoeae to eukaryotic cells grown in tissue culture was analyzed by use of light and electron microscopy and by labeling of the bacteria with (³H)- and (¹⁴C)adenine. Isogenic piliated and nonpiliated N. gonorrhoeae from opaque and transparent colonies were studied. The results of light microscopy studies showed that the gonococci attached to cells of human origin, including Flow 2000, HeLa 229, and HEp 2. Studies using radiolabeled gonococci gave comparable results. Piliated N. gonorrhoeae usually attached in larger numbers than nonpiliated organisms, and those from opaque colonies attached more often than isogenic variants from transparent colonies. Day-to-day variation in rate of attachment was observed. Scanning electron microscopy studies showed the gonococcal attachment to be specific for microvilli of the host cells. It is concluded that more N. gonorrhoeae from opaque colonies, as compared with isogenic variants from transparent colonies, attach to eukaryotic cells grown in tissue culture.

  15. New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes

    DEFF Research Database (Denmark)

    Parker, Brian John; Moltke, Ida; Roth, Adam

    2011-01-01

    a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein...... identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one...... involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we...

  16. Honey Bee Deformed Wing Virus Structures Reveal that Conformational Changes Accompany Genome Release.

    Science.gov (United States)

    Organtini, Lindsey J; Shingler, Kristin L; Ashley, Robert E; Capaldi, Elizabeth A; Durrani, Kulsoom; Dryden, Kelly A; Makhov, Alexander M; Conway, James F; Pizzorno, Marie C; Hafenstein, Susan

    2017-01-15

    The picornavirus-like deformed wing virus (DWV) has been directly linked to colony collapse; however, little is known about the mechanisms of host attachment or entry for DWV or its molecular and structural details. Here we report the three-dimensional (3-D) structures of DWV capsids isolated from infected honey bees, including the immature procapsid, the genome-filled virion, the putative entry intermediate (A-particle), and the empty capsid that remains after genome release. The capsids are decorated by large spikes around the 5-fold vertices. The 5-fold spikes had an open flower-like conformation for the procapsid and genome-filled capsids, whereas the putative A-particle and empty capsids that had released the genome had a closed tube-like spike conformation. Between the two conformations, the spikes undergo a significant hinge-like movement that we predicted using a Robetta model of the structure comprising the spike. We conclude that the spike structures likely serve a function during host entry, changing conformation to release the genome, and that the genome may escape from a 5-fold vertex to initiate infection. Finally, the structures illustrate that, similarly to picornaviruses, DWV forms alternate particle conformations implicated in assembly, host attachment, and RNA release. Honey bees are critical for global agriculture, but dramatic losses of entire hives have been reported in numerous countries since 2006. Deformed wing virus (DWV) and infestation with the ectoparasitic mite Varroa destructor have been linked to colony collapse disorder. DWV was purified from infected adult worker bees to pursue biochemical and structural studies that allowed the first glimpse into the conformational changes that may be required during transmission and genome release for DWV. Copyright © 2017 American Society for Microbiology.

  17. Genome-wide identification of structural variants in genes encoding drug targets

    DEFF Research Database (Denmark)

    Rasmussen, Henrik Berg; Dahmcke, Christina Mackeprang

    2012-01-01

    The objective of the present study was to identify structural variants of drug target-encoding genes on a genome-wide scale. We also aimed at identifying drugs that are potentially amenable for individualization of treatments based on knowledge about structural variation in the genes encoding...

  18. The Isochores as a Fundamental Level of Genome Structure and Organization: A General Overview.

    Science.gov (United States)

    Costantini, Maria; Musto, Héctor

    2017-03-01

    The recent availability of a number of fully sequenced genomes (including marine organisms) has made it possible to map the isochores very precisely from DNA sequences, confirming the results obtained before genome sequencing by ultracentrifugation in CsCl. In fact, the analytical profile of human DNA showed that the vertebrate genome is a mosaic of isochores, typically megabase-size DNA segments that belong to a small number of families characterized by different GC levels. In this review, we will concentrate on some general genome features regarding the compositional organization of different organisms and their evolution, ranging from vertebrates through invertebrates to unicellular organisms. Since isochores are tightly linked to biological properties such as gene density, replication timing, and recombination, the new level of detail provided by the isochore map helped the understanding of genome structure, function, and evolution. All the findings reported here confirm the idea that the isochores can be considered as a "fundamental level of genome structure and organization." We stress that we do not discuss in this review the origin of isochores, which is still a matter of controversy, but we focus on well established structural and physiological aspects.
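
    Because isochores are defined by their GC level, the basic quantity behind any sequence-based isochore map is the GC fraction of consecutive genome windows. The sketch below shows that computation only; the window size is a free parameter here, the function names are illustrative, and no segmentation into isochore families is attempted.

```python
def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases; None if there are none."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return (seq.count("G") + seq.count("C")) / called


def windowed_gc(seq, window):
    """GC fraction of consecutive non-overlapping windows.

    Returns a list of (start, gc) tuples; a trailing partial window
    is dropped.
    """
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, window)
    ]


print(windowed_gc("GGCCAATTGCAT", 4))
```

    On a real chromosome the windows would be far larger than in this toy call, and neighbouring windows with similar GC levels would then be merged into candidate isochores.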

  19. Horizontal DNA transfer from bacteria to eukaryotes and a lesson from experimental transfers.

    Science.gov (United States)

    Suzuki, Katsunori; Moriguchi, Kazuki; Yamamoto, Shinji

    2015-12-01

    Horizontal gene transfer (HGT) is widespread among bacteria and plays a key role in genome dynamics. HGT is much less common in eukaryotes, but is being reported there with increasing frequency. The mechanism by which eukaryotes acquired genes from distantly related organisms remains obscure. This paper cites examples of bacteria-derived genes found in eukaryotic organisms, and then describes experimental DNA transfers to eukaryotes by bacterial type 4 secretion systems under optimized conditions. The mechanisms of the latter are efficient, quite reproducible in vitro and predictable, and thereby would provide insight into natural HGT and aid the development of new research tools. Copyright © 2015 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  20. RNA structure: merging chemistry and genomics for a holistic perspective.

    Science.gov (United States)

    Kubota, Miles; Chan, Dalen; Spitale, Robert C

    2015-10-01

    The advent of deep sequencing technology has unexpectedly advanced our structural understanding of molecules composed of nucleic acids. A significant amount of progress has been made recently extrapolating the chemical methods to probe RNA structure into sequencing methods. Herein we review some of the canonical methods to analyze RNA structure, and then we outline how these have been used to probe the structure of many RNAs in parallel. The key is the transformation of structural biology problems into sequencing problems, whereby sequencing power can be interpreted to understand nucleic acid proximity, nucleic acid conformation, or nucleic acid-protein interactions. Utilizing such technologies in this way has the promise to provide novel structural insights into the mechanisms that control normal cellular physiology and provide insight into how structure could be perturbed in disease. © 2015 WILEY Periodicals, Inc.

  1. Higher order structure in the 3'-minor domain of small subunit ribosomal RNAs from a gram negative bacterium, a gram positive bacterium and a eukaryote

    DEFF Research Database (Denmark)

    Douthwaite, S; Christensen, A; Garrett, R A

    1983-01-01

    An experimental approach was used to determine and compare the highest order structure within the 150 to 200 nucleotides at the 3'-ends of the RNAs from the small ribosomal subunits of Escherichia coli, Bacillus stearothermophilus and Saccharomyces cerevisiae. Chemical reagents were employed......, T2 and S1. The data enabled the various minimal secondary structural models, proposed for the 3'-regions of the E. coli and S. cerevisiae RNAs, to be critically examined, and to demonstrate that the main common features of these models are correct. The results also reveal the presence and position...... regions of the RNAs are particularly important for the functioning of the ribosome. They are involved in mRNA, tRNA and ribosomal factor binding. The results reveal that while the functionally important RNA sequences tend to be conserved, they are not always accessible in the free RNA; the pyrimidine...

  2. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

    Science.gov (United States)

    Du, Jiang; Bjornson, Robert D; Zhang, Zhengdong D; Kong, Yong; Snyder, Michael; Gerstein, Mark B

    2009-07-01

    The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at

  3. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

    Directory of Open Access Journals (Sweden)

    Jiang Du

    2009-07-01

    The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of

  4. From structure prediction to genomic screens for novel non-coding RNAs.

    Directory of Open Access Journals (Sweden)

    Jan Gorodkin

    2011-08-01

    Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.
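
    The idea of folding a single sequence by dynamic programming can be shown with the classic Nussinov base-pair maximization recursion, allowing Watson-Crick and GU wobble pairs. This is a deliberately simplified stand-in for the energy-directed folding mentioned in the abstract: it counts pairs instead of minimizing free energy, and the minimum hairpin loop length of 3 is an assumption made for the example.

```python
# Allowed pairs: Watson-Crick plus GU wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov recursion).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] inclusive.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, splitting the interval
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))
```

    For the toy hairpin above, the optimum stacks two G-C pairs and one G-U wobble pair around an AAA loop. Comparative methods extend this kind of recursion by scoring alignment columns for compensatory base-pair changes instead of scoring a single sequence.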

  5. Genetic diversity of the obligate intracellular bacterium Chlamydophila pneumoniae by genome-wide analysis of single nucleotide polymorphisms: evidence for highly clonal population structure

    Directory of Open Access Journals (Sweden)

    Solbach Werner

    2007-10-01

    Full Text Available Abstract Background Chlamydophila pneumoniae is an obligate intracellular bacterium that replicates in a biphasic life cycle within eukaryotic host cells. Four published genomes revealed an identity of > 99%. This remarkable finding raised questions about the existence of distinguishable genotypes in correlation with geographical and anatomical origin. Results We studied the genetic diversity of C. pneumoniae by analysing synonymous single nucleotide polymorphisms (sSNPs) that are under reduced selection pressure. We conducted an in silico analysis of the four sequenced genomes, chose 232 representative sSNPs and analysed the loci in 38 C. pneumoniae isolates. We identified 15 different genotypes that were separated in four major clusters. Clusters were not associated with anatomical or geographical origin. However, animal lineages are basal on the C. pneumoniae phylogeny, suggesting a recent transmission to humans through successive bottlenecks some 150,000 years ago. A lack of detectable variation in 17 isolates emphasizes the extraordinary genetic conservation of this species and the high clonality of the population. Moreover, the largest cluster, which encompasses 80% of all analysed strains, is an extremely young clade that went through an important population expansion some 3,300 years ago. Conclusion sSNPs have proven useful as a sensitive marker to gain new insights into genetic diversity, population structure and evolutionary history of C. pneumoniae.

  6. Phylogenetic analysis of ferlin genes reveals ancient eukaryotic origins

    Directory of Open Access Journals (Sweden)

    Lek Monkol

    2010-07-01

    Full Text Available Abstract Background The ferlin gene family possesses a rare and identifying feature consisting of multiple tandem C2 domains and a C-terminal transmembrane domain. Much currently remains unknown about the fundamental function of this gene family; however, mutations in its two most well-characterised members, dysferlin and otoferlin, have been implicated in human disease. The availability of genome sequences from a wide range of species makes it possible to explore the evolution of the ferlin family, providing contextual insight into characteristic features that define the ferlin gene family in its present form in humans. Results Ferlin genes were detected from all species of representative phyla, with two ferlin subgroups partitioned within the ferlin phylogenetic tree based on the presence or absence of a DysF domain. Invertebrates generally possessed two ferlin genes (one with DysF and one without), with six ferlin genes in most vertebrates (three DysF, three non-DysF). Expansion of the ferlin gene family is evident between the divergence of lamprey (jawless vertebrates) and shark (cartilaginous fish). Common to almost all ferlins is an N-terminal C2-FerI-C2 sandwich, a FerB motif, and two C-terminal C2 domains (C2E and C2F) adjacent to the transmembrane domain. Preservation of these structural elements throughout eukaryotic evolution suggests a fundamental role of these motifs for ferlin function. In contrast, DysF, C2DE, and FerA are optional, giving rise to subtle differences in domain topologies of ferlin genes. Despite conservation of multiple C2 domains in all ferlins, the C-terminal C2 domains (C2E and C2F) displayed higher sequence conservation and greater conservation of putative calcium binding residues across paralogs and orthologs. Interestingly, the two most studied non-mammalian ferlins (Fer-1 and Misfire) in model organisms C. elegans and D. melanogaster, present as outgroups in the phylogenetic analysis, with results suggesting

  7. Population Structure Analysis of Bull Genomes of European and Western Ancestry

    DEFF Research Database (Denmark)

    Chung, Neo Christopher; Szyda, Joanna; Frąszczak, Magdalena

    2017-01-01

    Since domestication, population bottlenecks, breed formation, and selective breeding have radically shaped the genealogy and genetics of Bos taurus. In turn, characterization of population structure among diverse bull (males of Bos taurus) genomes enables detailed assessment of genetic resources...... and origins. By analyzing 432 unrelated bull genomes from 13 breeds and 16 countries, we demonstrate genetic diversity and structural complexity among the European/Western cattle population. Importantly, we relaxed a strong assumption of discrete or admixed population, by adapting latent variable models...... for individual-specific allele frequencies that directly capture a wide range of complex structure from genome-wide genotypes. As measured by magnitude of differentiation, selection pressure on SNPs within genes is substantially greater than that on intergenic regions. Additionally, broad regions of chromosome 6...

  8. Fundamental differences in diversity and genomic population structure between Atlantic and Pacific Prochlorococcus.

    Science.gov (United States)

    Kashtan, Nadav; Roggensack, Sara E; Berta-Thompson, Jessie W; Grinberg, Maor; Stepanauskas, Ramunas; Chisholm, Sallie W

    2017-09-01

    The Atlantic and Pacific Oceans represent different biogeochemical regimes in which the abundant marine cyanobacterium Prochlorococcus thrives. We have shown that Prochlorococcus populations in the Atlantic are composed of hundreds of genomically, and likely ecologically, distinct coexisting subpopulations with distinct genomic backbones. Here we ask if differences in the ecology and selection pressures between the Atlantic and Pacific are reflected in the diversity and genomic composition of their indigenous Prochlorococcus populations. We applied large-scale single-cell genomics and compared the cell-by-cell genomic composition of wild populations of co-occurring cells from samples from Station ALOHA off Hawaii, and from Bermuda Atlantic Time Series Station off Bermuda. We reveal fundamental differences in diversity and genomic structure of populations between the sites. The Pacific populations are more diverse than those in the Atlantic, composed of significantly more coexisting subpopulations and lacking dominant subpopulations. Prochlorococcus from the two sites seem to be composed of mostly non-overlapping distinct sets of subpopulations with different genomic backbones, likely reflecting different sets of ocean-specific micro-niches. Furthermore, phylogenetically closely related strains carry ocean-associated nutrient acquisition genes likely reflecting differences in major selection pressures between the oceans. This differential selection, along with geographic separation, clearly has a significant role in shaping these populations.

  9. Comparative genome structure, secondary metabolite, and effector coding capacity across Cochliobolus pathogens.

    Directory of Open Access Journals (Sweden)

    Bradford J Condon

    Full Text Available The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP-encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence.

  10. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    Energy Technology Data Exchange (ETDEWEB)

    Condon, Bradford J.; Leng, Yueqiang; Wu, Dongliang; Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinlzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

    2013-01-24

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP-encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence.

  11. Genomic structural variation contributes to phenotypic change of industrial bioethanol yeast Saccharomyces cerevisiae.

    Science.gov (United States)

    Zhang, Ke; Zhang, Li-Jie; Fang, Ya-Hong; Jin, Xin-Na; Qi, Lei; Wu, Xue-Chang; Zheng, Dao-Qiong

    2016-03-01

    Genomic structural variation (GSV) is a ubiquitous phenomenon observed in the genomes of Saccharomyces cerevisiae strains with different genetic backgrounds; however, the physiological and phenotypic effects of GSV are not well understood. Here, we first revealed the genetic characteristics of a widely used industrial S. cerevisiae strain, ZTW1, by whole genome sequencing. ZTW1 was identified as an aneuploidy strain and a large-scale GSV was observed in the ZTW1 genome compared with the genome of a diploid strain YJS329. These GSV events led to copy number variations (CNVs) in many chromosomal segments as well as one whole chromosome in the ZTW1 genome. Changes in the DNA dosage of certain functional genes directly affected their expression levels and the resultant ZTW1 phenotypes. Moreover, CNVs of large chromosomal regions triggered an aneuploidy stress in ZTW1. This stress decreased the proliferation ability and tolerance of ZTW1 to various stresses, while aneuploidy response stress may also provide some benefits to the fermentation performance of the yeast, including increased fermentation rates and decreased byproduct generation. This work reveals genomic characters of the bioethanol S. cerevisiae strain ZTW1 and suggests that GSV is an important kind of mutation that changes the traits of industrial S. cerevisiae strains. © FEMS 2016.

  12. Network Structure and Dynamics, and Emergence of Robustness by Stabilizing Selection in an Artificial Genome

    CERN Document Server

    Rohlf, Thimo

    2008-01-01

    Genetic regulation is a key component in development, but a clear understanding of the structure and dynamics of genetic networks is not yet at hand. In this work we investigate these properties within an artificial genome model originally introduced by Reil. We analyze statistical properties of randomly generated genomes both on the sequence- and network level, and show that this model correctly predicts the frequency of genes in genomes as found in experimental data. Using an evolutionary algorithm based on stabilizing selection for a phenotype, we show that robustness against single base mutations, as well as against random changes in initial network states that mimic stochastic fluctuations in environmental conditions, can emerge in parallel. Evolved genomes exhibit characteristic patterns on both sequence and network level.

  13. Protein structure similarity clustering (PSSC) and natural product structure as inspiration sources for drug development and chemical genomics

    NARCIS (Netherlands)

    Dekker, Frank J; Koch, Marcus A; Waldmann, Herbert; Dekker, Frans

    Finding small molecules that modulate protein function is of primary importance in drug development and in the emerging field of chemical genomics. To facilitate the identification of such molecules, we developed a novel strategy making use of structural conservatism found in protein domain

  14. GeneViTo: Visualizing gene-product functional and structural features in genomic datasets

    Directory of Open Access Journals (Sweden)

    Promponas Vasilis J

    2003-10-01

    Full Text Available Abstract Background The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. Results GeneViTo is a JAVA-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources) and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI) allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to offer a supplement to the provided information, especially in cases of "poor" annotation, or an evaluation of available predictions. Moreover, desired information can be output in high quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. Conclusions GeneViTo offers an inspectional view of genomic functional elements, concerning data stemming both from database annotation and analysis tools for an overall analysis of existing genomes. The application is compatible with Linux or Windows ME-2000-XP operating

  15. Horizontal gene acquisitions by eukaryotes as drivers of adaptive evolution.

    Science.gov (United States)

    Schönknecht, Gerald; Weber, Andreas P M; Lercher, Martin J

    2014-01-01

    In contrast to vertical gene transfer from parent to offspring, horizontal (or lateral) gene transfer moves genetic information between different species. Bacteria and archaea often adapt through horizontal gene transfer. Recent analyses indicate that eukaryotic genomes, too, have acquired numerous genes via horizontal transfer from prokaryotes and other lineages. Based on this we raise the hypothesis that horizontally acquired genes may have contributed more to adaptive evolution of eukaryotes than previously assumed. Current candidate sets of horizontally acquired eukaryotic genes may just be the tip of an iceberg. We have recently shown that adaptation of the thermoacidophilic red alga Galdieria sulphuraria to its hot, acid, toxic-metal laden, volcanic environment was facilitated by the acquisition of numerous genes from extremophile bacteria and archaea. Other recently published examples of horizontal acquisitions involved in adaptation include ice-binding proteins in marine algae, enzymes for carotenoid biosynthesis in aphids, and genes involved in fungal metabolism. © 2014 WILEY Periodicals, Inc.

  16. Gene order data from a model amphibian (Ambystoma): new perspectives on vertebrate genome structure and evolution

    Directory of Open Access Journals (Sweden)

    Voss S Randal

    2006-08-01

    Full Text Available Abstract Background Because amphibians arise from a branch of the vertebrate evolutionary tree that is juxtaposed between fishes and amniotes, they provide important comparative perspective for reconstructing character changes that have occurred during vertebrate evolution. Here, we report the first comparative study of vertebrate genome structure that includes a representative amphibian. We used 491 transcribed sequences from a salamander (Ambystoma) genetic map and whole genome assemblies for human, mouse, rat, dog, chicken, zebrafish, and the freshwater pufferfish Tetraodon nigroviridis to compare gene orders and rearrangement rates. Results Ambystoma has experienced a rate of genome rearrangement that is substantially lower than mammalian species but similar to that of chicken and fish. Overall, we found greater conservation of genome structure between Ambystoma and tetrapod vertebrates; nevertheless, 57% of Ambystoma-fish orthologs are found in conserved syntenies of four or more genes. Comparisons between Ambystoma and amniotes reveal extensive conservation of segmental homology for 57% of the presumptive Ambystoma-amniote orthologs. Conclusion Our analyses suggest relatively constant interchromosomal rearrangement rates from the euteleost ancestor to the origin of mammals and illustrate the utility of amphibian mapping data in establishing ancestral amniote and tetrapod gene orders. Comparisons between Ambystoma and amniotes reveal some of the key events that have structured the human genome since diversification of the ancestral amniote lineage.

  17. Mitochondrial genome structure and evolution in the living fossil vampire squid, Vampyroteuthis infernalis, and extant cephalopods.

    Science.gov (United States)

    Yokobori, Shin-ichi; Lindsay, Dhugal J; Yoshida, Mari; Tsuchiya, Kotaro; Yamagishi, Akihiko; Maruyama, Tadashi; Oshima, Tairo

    2007-08-01

    Complete nucleotide sequences of mitochondrial (mt) genomes of the "living fossil" cephalopod Vampyroteuthis infernalis (Vampyromorpha) and the cuttlefish Sepia esculenta (Sepiida) were determined. The V. infernalis mt genome structure is identical to the incirrate octopod Octopus vulgaris mt genome structure, and is therefore more similar to that of the polyplacophoran Katharina tunicata than to that of the other "living fossil" cephalopod Nautilus macromphalus. The mt genome structure of S. esculenta is identical to that of Sepia officinalis. Molecular phylogenetic analyses based on the mt protein genes from the completely sequenced cephalopod mt genomes suggested the monophyletic relationship of two myopsid squids Loligo bleekeri and Sepioteuthis lessoniana, and the monophyletic relationship of two oegopsid squids Watasenia scintillans and Todarodes pacificus. Sepiida appeared as the sister group of Teuthida (Myopsida + Oegopsida). The phylogenetic position of Vampyromorpha appeared as the sister group of Octopoda, although the monophyly of Vampyromorpha and Decapodiformes cannot be rejected outright by our phylogenetic analyses. The hypothesis that Vampyromorpha is basal among the coleoid cephalopods can be rejected because of low statistical support. Therefore, it is reasonable to recognize three major groups in Coleoidea: Vampyromorpha, Octopoda, and Decapodiformes.

  18. The genomic structure of the human UFO receptor.

    Science.gov (United States)

    Schulz, A S; Schleithoff, L; Faust, M; Bartram, C R; Janssen, J W

    1993-02-01

    Using a DNA transfection-tumorigenicity assay we have recently identified the UFO oncogene. It encodes a tyrosine kinase receptor characterized by the juxtaposition of two immunoglobulin-like and two fibronectin type III repeats in its extracellular domain. Here we describe the genomic organization of the human UFO locus. The UFO receptor is encoded by 20 exons that are distributed over a region of 44 kb. Different isoforms of UFO mRNA are generated by alternative splicing of exon 10 and differential usage of two imperfect polyadenylation sites resulting in the presence or absence of 1.5-kb 3' untranslated sequences. Primer extension and S1 nuclease analyses revealed multiple transcriptional initiation sites including a major site 169 bp upstream of the translation start site. The promoter region is GC rich, lacks TATA and CAAT boxes, but contains potential recognition sites for a variety of trans-acting factors, including Sp1, AP-2 and the cyclic AMP response element-binding protein. Proto-UFO and its oncogenic counterpart exhibit identical cDNA and promoter region sequences. Possible modes of UFO activation are discussed.

  19. Genome structure and primitive sex chromosome revealed in Populus

    Energy Technology Data Exchange (ETDEWEB)

    Tuskan, Gerald A [ORNL; Yin, Tongming [ORNL; Gunter, Lee E [ORNL; Blaudez, D [UMR, France

    2008-01-01

    We constructed a comprehensive genetic map for Populus and ordered 332 Mb of sequence scaffolds along the 19 haploid chromosomes in order to compare chromosomal regions among diverse members of the genus. These efforts led us to conclude that chromosome XIX in Populus is evolving into a sex chromosome. Consistent segregation distortion in favor of the sub-genera Tacamahaca alleles provided evidence of divergent selection among species, particularly at the proximal end of chromosome XIX. A large microsatellite marker (SSR) cluster was detected in the distorted region even though the genome-wide distribution of SSR sites was uniform across the physical map. The differences between the genetic map and physical sequence data suggested recombination suppression was occurring in the distorted region. A gender-determination locus and an overabundance of NBS-LRR genes were also co-located to the distorted region and were put forth as the cause for divergent selection and recombination suppression. This hypothesis was verified by using fine-scale mapping of an integrated scaffold in the vicinity of the gender-determination locus. As such it appears that chromosome XIX in Populus is in the process of evolving from an autosome into a sex chromosome and that NBS-LRR genes may play an important role in the chromosomal diversification process in Populus.

  20. 3D-GNOME: an integrated web service for structural modeling of the 3D genome.

    Science.gov (United States)

    Szalaj, Przemyslaw; Michalski, Paul J; Wróblewski, Przemysław; Tang, Zhonghui; Kadlof, Michal; Mazzocco, Giovanni; Ruan, Yijun; Plewczynski, Dariusz

    2016-07-08

    Recent advances in high-throughput chromosome conformation capture (3C) technology, such as Hi-C and ChIA-PET, have demonstrated the importance of 3D genome organization in development, cell differentiation and transcriptional regulation. There is now a widespread need for computational tools to generate and analyze 3D structural models from 3C data. Here we introduce our 3D GeNOme Modeling Engine (3D-GNOME), a web service which generates 3D structures from 3C data and provides tools to visually inspect and annotate the resulting structures, in addition to a variety of statistical plots and heatmaps which characterize the selected genomic region. Users submit a bedpe (paired-end BED format) file containing the locations and strengths of long range contact points, and 3D-GNOME simulates the structure and provides a convenient user interface for further analysis. Alternatively, a user may generate structures using published ChIA-PET data for the GM12878 cell line by simply specifying a genomic region of interest. 3D-GNOME is freely available at http://3dgnome.cent.uw.edu.pl/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Evolution of DNA replication protein complexes in eukaryotes and Archaea.

    Directory of Open Access Journals (Sweden)

    Nicholas Chia

    Full Text Available BACKGROUND: The replication of DNA in Archaea and eukaryotes requires several ancillary complexes, including proliferating cell nuclear antigen (PCNA), replication factor C (RFC), and the minichromosome maintenance (MCM) complex. Bacterial DNA replication utilizes comparable proteins, but these are distantly related phylogenetically to their archaeal and eukaryotic counterparts at best. METHODOLOGY/PRINCIPAL FINDINGS: While the structures of each of the complexes do not differ significantly between the archaeal and eukaryotic versions thereof, the evolutionary dynamic in the two cases does. The number of subunits in each complex is constant across all taxa. However, they vary subtly with regard to composition. In some taxa the subunits are all identical in sequence, while in others some are homologous rather than identical. In the case of eukaryotes, there is no phylogenetic variation in the makeup of each complex; all appear to derive from a common eukaryotic ancestor. This is not the case in Archaea, where the relationship between the subunits within each complex varies taxon-to-taxon. We have performed a detailed phylogenetic analysis of these relationships in order to better understand the gene duplications and divergences that gave rise to the homologous subunits in Archaea. CONCLUSION/SIGNIFICANCE: This domain level difference in evolution suggests that different forces have driven the evolution of DNA replication proteins in each of these two domains. In addition, the phylogenies of all three gene families support the distinctiveness of the proposed archaeal phylum Thaumarchaeota.

  2. Detection of Genomic Structural Variants from Next-Generation Sequencing Data

    Directory of Open Access Journals (Sweden)

    Lorenzo Tattini

    2015-06-01

    Full Text Available Structural variants are genomic rearrangements larger than 50 bp accounting for around 1% of the variation among human genomes. They impact on phenotypic diversity and play a role in various diseases including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges and a number of approaches have been proposed in the literature. In this mini review we describe and summarise the latest tools – and their underlying algorithms – designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures and amplicon sequencing data, pointing out the major advantages/drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides a guided indication – with particular emphasis on human genetics and copy number variants – for researchers involved in the investigation of these genomic events.

  3. Morphology, genome sequence, and structural proteome of type phage P335 from Lactococcus lactis

    DEFF Research Database (Denmark)

    Labrie, Simon J.; Josephsen, Jytte; Neve, Horst

    2008-01-01

    for a shorter tail and a different collar/whisker structure. Its 33,613-bp double-stranded DNA genome had 50 open reading frames. Putative functions were assigned to 29 of them. Unlike other sequenced genomes from lactococcal phages belonging to this species, P335 did not have a lysogeny module. However, it did...... carry a dUTPase gene, the most conserved gene among this phage species. Comparative genomic analyses revealed a high level of identity between the morphogenesis modules of the phages P335, ul36, TP901-1, and Tuc2009 and two putative prophages of L. lactis SK11. Differences were noted in genes coding...... for receptor-binding proteins, in agreement with their distinct host ranges. Sixteen structural proteins of phage P335 were identified by liquid chromatography-tandem mass spectrometry. A 2.8-kb insertion was recognized between the putative genes coding for the activator of late transcription (Alt...

  4. Comparative Annotation of Viral Genomes with Non-Conserved Gene Structure

    DEFF Research Database (Denmark)

    de Groot, Saskia; Mailund, Thomas; Hein, Jotun

    2007-01-01

    allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences. Results...... that conservation of gene structure on top of nucleotide sequence is a valuable source of information, especially in distantly related genomes.......Motivation: Detecting genes in viral genomes is a complex task. Due to the biological necessity of them being constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleic acids, up to three genes may be coded...

  5. Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq.

    Science.gov (United States)

    Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G

    2014-11-29

    Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation

  6. Structural and functional-annotation of an equine whole genome oligoarray

    Directory of Open Access Journals (Sweden)

    Chowdhary Bhanu

    2009-10-01

    Background: The horse genome is sequenced, allowing equine researchers to use high-throughput functional genomics platforms such as microarrays, next-generation sequencing for gene expression, and proteomics. However, for researchers to derive value from these functional genomics datasets, they must be able to model this data in biologically relevant ways; to do so requires that the equine genome be more fully annotated. There are two interrelated types of genomic annotation: structural and functional. Structural annotation is delineating and demarcating the genomic elements (such as genes, promoters, and regulatory elements). Functional annotation is assigning function to structural elements. The Gene Ontology (GO) is the de facto standard for functional annotation and is routinely used as a basis for modelling and hypothesis testing of large functional genomics datasets. Results: An Equine Whole Genome Oligonucleotide (EWGO) array with 21,351 elements was developed at Texas A&M University. This 70-mer oligoarray was designed using the approximately 7× assembled and annotated sequence of the equine genome to be one of the most comprehensive arrays available for expressed equine sequences. To assist researchers in determining the biological meaning of data derived from this array, we have structurally annotated it by mapping the elements to multiple database accessions, including UniProtKB, Entrez Gene, NRPD (Non-Redundant Protein Database) and UniGene. We next provided GO functional annotations for the gene transcripts represented on this array. Overall, we GO annotated 14,531 gene products (68.1% of the gene products represented on the EWGO array) with 57,912 annotations. GAQ (GO Annotation Quality) scores were calculated for this array both before and after we added GO annotation. The additional annotations improved the meanGAQ score 16-fold. This data is publicly available at AgBase http://www.agbase.msstate.edu/. Conclusion: Providing
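
    The coverage figure quoted in this abstract can be checked with simple arithmetic. The sketch below uses only the three counts given in the text; it assumes (this is not stated explicitly in the abstract) that the denominator for the 68.1% figure is the 21,351 array elements. The variable and function names are illustrative only.

```python
# Counts quoted in the abstract above.
ARRAY_ELEMENTS = 21_351        # elements on the EWGO array
ANNOTATED_PRODUCTS = 14_531    # gene products that received GO annotation
GO_ANNOTATIONS = 57_912        # total GO annotations assigned


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumption: the 68.1% in the abstract is measured against all array elements.
coverage_pct = percent(ANNOTATED_PRODUCTS, ARRAY_ELEMENTS)

# Average number of GO annotations per annotated gene product.
annotations_per_product = GO_ANNOTATIONS / ANNOTATED_PRODUCTS

print(f"GO coverage: {coverage_pct:.1f}%")
print(f"Annotations per annotated product: {annotations_per_product:.2f}")
```

    Under that assumption the computed coverage rounds to the 68.1% reported, and each annotated gene product carries roughly four GO annotations on average.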

  7. Structural and sequence diversity of the transposon Galileo in the Drosophila willistoni genome.

    Science.gov (United States)

    Gonçalves, Juliana W; Valiati, Victor Hugo; Delprat, Alejandra; Valente, Vera L S; Ruiz, Alfredo

    2014-09-13

    Galileo is one of three members of the P superfamily of DNA transposons. It was originally discovered in Drosophila buzzatii, in which three segregating chromosomal inversions were shown to have been generated by ectopic recombination between Galileo copies. Subsequently, Galileo was identified in six of 12 sequenced Drosophila genomes, indicating its widespread distribution within this genus. Galileo is strikingly abundant in Drosophila willistoni, a neotropical species that is highly polymorphic for chromosomal inversions, suggesting a role for this transposon in the evolution of its genome. We carried out a detailed characterization of all Galileo copies present in the D. willistoni genome. A total of 191 copies, including 133 with two terminal inverted repeats (TIRs), were classified according to structure in six groups. The TIRs exhibited remarkable variation in their length and structure compared to the most complete copy. Three copies showed extended TIRs due to internal tandem repeats, the insertion of other transposable elements (TEs), or the incorporation of non-TIR sequences into the TIRs. Phylogenetic analyses of the transposase (TPase)-encoding and TIR segments yielded two divergent clades, which we termed Galileo subfamilies V and W. Target-site duplications (TSDs) in D. willistoni Galileo copies were 7- or 8-bp in length, with the consensus sequence GTATTAC. Analysis of the region around the TSDs revealed a target site motif (TSM) with a 15-bp palindrome that may give rise to a stem-loop secondary structure. There is a remarkable abundance and diversity of Galileo copies in the D. willistoni genome, although no functional copies were found. The TIRs in particular have a dynamic structure and extend in different ways, but their ends (required for transposition) are more conserved than the rest of the element. The D. willistoni genome harbors two Galileo subfamilies (V and W) that diverged ~9 million years ago and may have descended from an ancestral
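
    To illustrate what "palindrome" means for a DNA motif such as the one described in this abstract, here is a minimal sketch of a reverse-complement palindrome check. It is not the authors' analysis method. The 15-bp TSM sequence is not given in this record, so the only sequence taken from the text is the 7-bp TSD consensus GTATTAC; the function names and the `allow_center` parameter are hypothetical. An odd-length motif cannot be a perfect palindrome because its central base cannot pair with itself, so the sketch optionally ignores that base.

```python
# Watson-Crick complements for the four unambiguous DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_dna_palindrome(seq: str, allow_center: bool = True) -> bool:
    """Check whether every base pairs with its mirror-image partner.

    For odd-length sequences the central base has no partner; it is
    ignored when allow_center is True, otherwise the check fails.
    """
    seq = seq.upper()
    n = len(seq)
    if n % 2 == 1 and not allow_center:
        return False
    # Compare base i with the base at the mirrored position n - 1 - i.
    return all(COMPLEMENT[seq[i]] == seq[n - 1 - i] for i in range(n // 2))


tsd_consensus = "GTATTAC"  # TSD consensus quoted in the abstract
print(reverse_complement(tsd_consensus))
print(is_dna_palindrome(tsd_consensus))
```

    With this check, the TSD consensus GTATTAC is itself palindromic around its central base: its reverse complement, GTAATAC, differs from it only at the middle position.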

  8. Recognition of extremophilic archaeal viruses by eukaryotic cells

    DEFF Research Database (Denmark)

    Uldahl, Kristine Buch; Wu, Linping; Hall, Arnaldur

    2016-01-01

    Viruses from the third domain of life, Archaea, exhibit unusual features including extreme stability that allow their survival in harsh environments. In addition, these species have never been reported to integrate into human or any other eukaryotic genomes, and could thus serve for exploration...... of novel medical nanoplatforms. Here, we selected two archaeal viruses Sulfolobus monocaudavirus 1 (SMV1) and Sulfolobus spindle shaped virus 2 (SSV2) owing to their unique spindle shape, hyperthermostable and acid-resistant nature and studied their interaction with mammalian cells. Accordingly, we...... for selective cell targeting. On internalization, both viruses localize to the lysosomal compartments. Neither SMV1 nor SSV2 induced any detrimental effect on cell morphology, plasma membrane and mitochondrial functionality. This is the first study demonstrating recognition of archaeal viruses by eukaryotic...