WorldWideScience

Sample records for genomic organization protein

  1. Architectural protein subclasses shape 3-D organization of genomes during lineage commitment

    Science.gov (United States)

    Phillips-Cremins, Jennifer E.; Sauria, Michael E. G.; Sanyal, Amartya; Gerasimova, Tatiana I.; Lajoie, Bryan R.; Bell, Joshua S. K.; Ong, Chin-Tong; Hookway, Tracy A.; Guo, Changying; Sun, Yuhua; Bland, Michael J.; Wagstaff, William; Dalton, Stephen; McDevitt, Todd C.; Sen, Ranjan; Dekker, Job; Taylor, James; Corces, Victor G.

    2013-01-01

    Summary: Understanding the topological configurations of chromatin may reveal valuable insights into how the genome and epigenome act in concert to control cell fate during development. Here we generate high-resolution architecture maps across seven genomic loci in embryonic stem cells and neural progenitor cells. We observe a hierarchy of 3-D interactions that undergo marked reorganization at the sub-Mb scale during differentiation. Distinct combinations of CTCF, Mediator, and cohesin show widespread enrichment in looping interactions at different length scales. CTCF/cohesin anchor long-range constitutive interactions that form the topological basis for invariant sub-domains. Conversely, Mediator/cohesin together with pioneer factors bridge short-range enhancer-promoter interactions within and between larger sub-domains. Knockdown of Smc1 or Med12 in ES cells results in disruption of spatial architecture and down-regulation of genes found in cohesin-mediated interactions. We conclude that cell type-specific chromatin organization occurs at the sub-Mb scale and that architectural proteins shape the genome in hierarchical length scales. PMID:23706625

  2. A survey of HK, HPt, and RR domains and their organization in two-component systems and phosphorelay proteins of organisms with fully sequenced genomes

    Directory of Open Access Journals (Sweden)

    Baldiri Salvado

    2015-08-01

    Two Component Systems and Phosphorelays (TCS/PR) are environmental signal transduction cascades in prokaryotes and, less frequently, in eukaryotes. The internal domain organization of proteins and the topology of TCS/PR cascades play an important role in shaping the responses of the circuits. It is thus important to maintain updated censuses of TCS/PR proteins in order to identify the various topologies used by nature and enable a systematic study of the dynamics associated with those topologies. To create such a census, we analyzed the proteomes of 7,609 organisms from all domains of life with fully sequenced and annotated genomes. To begin, we survey each proteome searching for proteins containing domains that are associated with internal signal transmission within TCS/PR: Histidine Kinase (HK), Response Regulator (RR), and Histidine Phosphotransfer (HPt) domains, and analyze how these domains are arranged in the individual proteins. Then, we find all types of operon organization and calculate how much more likely proteins that contain TCS/PR domains are to be coded by neighboring genes than one would expect from the genome background of each organism. Finally, we analyze whether the fusion of domains into single TCS/PR proteins is more frequently observed than one might expect from the background of each proteome. We find 50 alternative ways in which the HK, HPt, and RR domains are observed to organize into single proteins. In prokaryotes, TCS/PR coding genes tend to be clustered in operons. 90% of all proteins identified in this study contain just one of the three domains, while 8% of the remaining proteins combine one copy of an HK, a RR, and/or an HPt domain. In eukaryotes, 25% of all TCS/PR proteins have more than one domain. These results might have implications for how signals are internally transmitted within TCS/PR cascades. These implications could explain the selection of the various designs in alternative circumstances.

  3. MIPS: analysis and annotation of proteins from whole genomes.

    Science.gov (United States)

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  4. Computational Analysis of Uncharacterized Proteins of Environmental Bacterial Genome

    Science.gov (United States)

    Coxe, K. J.; Kumar, M.

    2017-12-01

    Betaproteobacteria strain CB is a gram-negative bacterium in the phylum Proteobacteria and is found naturally in soil and water. In this complex environment, bacteria play a key role in efficiently eliminating organic material and other pollutants from wastewater. To investigate the process of pollutant removal from wastewater using bacteria, it is important to characterize the proteins encoded by the bacterial genome. Our study combines a number of bioinformatics tools to predict the function of unassigned proteins in the bacterial genome. The genome of Betaproteobacteria strain CB encodes 2,112 proteins, of which 508 have unknown function and are termed uncharacterized proteins (UPs). The localization of the UPs within the cell was determined and the structure of 38 UPs was accurately predicted. These UPs were predicted to belong to various classes of proteins such as enzymes, transporters, binding proteins, signal peptides, transmembrane proteins and other proteins. The outcome of this work will help to better understand the mechanisms of wastewater treatment.

  5. Looping and clustering model for the organization of protein-DNA complexes on the bacterial genome

    Science.gov (United States)

    Walter, Jean-Charles; Walliser, Nils-Ole; David, Gabriel; Dorignac, Jérôme; Geniet, Frédéric; Palmeri, John; Parmeggiani, Andrea; Wingreen, Ned S.; Broedersz, Chase P.

    2018-03-01

    The bacterial genome is organized by a variety of associated proteins inside a structure called the nucleoid. These proteins can form complexes on DNA that play a central role in various biological processes, including chromosome segregation. A prominent example is the large ParB-DNA complex, which forms an essential component of the segregation machinery in many bacteria. ChIP-Seq experiments show that ParB proteins localize around centromere-like parS sites on the DNA to which ParB binds specifically, and spreads from there over large sections of the chromosome. Recent theoretical and experimental studies suggest that DNA-bound ParB proteins can interact with each other to condense into a coherent 3D complex on the DNA. However, the structural organization of this protein-DNA complex remains unclear, and a predictive quantitative theory for the distribution of ParB proteins on DNA is lacking. Here, we propose the looping and clustering model, which employs a statistical physics approach to describe protein-DNA complexes. The looping and clustering model accounts for the extrusion of DNA loops from a cluster of interacting DNA-bound proteins that is organized around a single high-affinity binding site. Conceptually, the structure of the protein-DNA complex is determined by a competition between attractive protein interactions and loop closure entropy of this protein-DNA cluster on the one hand, and the positional entropy for placing loops within the cluster on the other. Indeed, we show that the protein interaction strength determines the ‘tightness’ of the loopy protein-DNA complex. Thus, our model provides a theoretical framework for quantitatively computing the binding profiles of ParB-like proteins around a cognate (parS) binding site.

  6. Exploring Protein Function Using the Saccharomyces Genome Database.

    Science.gov (United States)

    Wong, Edith D

    2017-01-01

    Elucidating the function of individual proteins will help to create a comprehensive picture of cell biology, as well as shed light on human disease mechanisms, possible treatments, and cures. Due to its compact genome, and extensive history of experimentation and annotation, the budding yeast Saccharomyces cerevisiae is an ideal model organism in which to determine protein function. This information can then be leveraged to infer functions of human homologs. Despite the large amount of research and biological data about S. cerevisiae, many proteins' functions remain unknown. Here, we explore ways to use the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org ) to predict the function of proteins and gain insight into their roles in various cellular processes.

  7. Structural Genomics of Minimal Organisms: Pipeline and Results

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up, running from target selection through cloning, expression, and purification to structure determination. At the time of this writing, structural information for more than 93% of all soluble proteins of M. genitalium is available. This chapter summarizes the approaches taken by the authors' center.

  8. Rodent malaria parasites : genome organization & comparative genomics

    NARCIS (Netherlands)

    Kooij, Taco W.A.

    2006-01-01

    The aim of the studies described in this thesis was to investigate the genome organization of rodent malaria parasites (RMPs) and compare the organization and gene content of the genomes of RMPs and the human malaria parasite P. falciparum. The release of the complete genome sequence of P.

  9. A scored human protein-protein interaction network to catalyze genomic interpretation

    DEFF Research Database (Denmark)

    Li, Taibo; Wernersson, Rasmus; Hansen, Rasmus B

    2017-01-01

    Genome-scale human protein-protein interaction networks are critical to understanding cell biology and interpreting genomic data, but challenging to produce experimentally. Through data integration and quality control, we provide a scored human protein-protein interaction network (InWeb_InBioMap,...

  10. Avian papillomaviruses: the parrot Psittacus erithacus papillomavirus (PePV) genome has a unique organization of the early protein region and is phylogenetically related to the chaffinch papillomavirus

    Directory of Open Access Journals (Sweden)

    Jenson A Bennett

    2002-07-01

    Background: An avian papillomavirus genome has been cloned from a cutaneous exophytic papilloma from an African grey parrot (Psittacus erithacus). The nucleotide sequence, genome organization, and phylogenetic position of the Psittacus erithacus papillomavirus (PePV) were determined. This PePV sequence represents the first complete avian papillomavirus genome defined. Results: The PePV genome (7304 basepairs) differs from other papillomaviruses, in that it has a unique organization of the early protein region lacking classical E6 and E7 open reading frames. Phylogenetic comparison of the PePV sequence with partial E1 and L1 sequences of the chaffinch (Fringilla coelebs) papillomavirus (FPV) reveals that these two avian papillomaviruses form a monophyletic cluster with a common branch that originates near the unresolved center of the papillomavirus evolutionary tree. Conclusions: The PePV genome has a unique layout of the early protein region which represents a novel prototypic genomic organization for avian papillomaviruses. The close relationship between PePV and FPV, and between their Psittaciformes and Passeriformes hosts, supports the hypothesis that papillomaviruses have co-evolved and speciated together with their host species throughout evolution.

  11. Uncovering the functional constraints underlying the genomic organization of the odorant-binding protein genes.

    Science.gov (United States)

    Librado, Pablo; Rozas, Julio

    2013-01-01

    Animal olfactory systems have a critical role for the survival and reproduction of individuals. In insects, the odorant-binding proteins (OBPs) are encoded by a moderately sized gene family, and mediate the first steps of the olfactory processing. Most OBPs are organized in clusters of a few paralogs, which are conserved over time. Currently, the biological mechanism explaining the close physical proximity among OBPs is not yet established. Here, we conducted a comprehensive study aiming to gain insights into the mechanisms underlying the OBP genomic organization. We found that the OBP clusters are embedded within large conserved arrangements. These organizations also include other non-OBP genes, which often encode proteins integral to plasma membrane. Moreover, the conservation degree of such large clusters is related to the following: 1) the promoter architecture of the confined genes, 2) a characteristic transcriptional environment, and 3) the chromatin conformation of the chromosomal region. Our results suggest that chromatin domains may restrict the location of OBP genes to regions having the appropriate transcriptional environment, leading to the OBP cluster structure. However, the appropriate transcriptional environment for OBP and the other neighbor genes is not dominated by reduced levels of expression noise. Indeed, the stochastic fluctuations in the OBP transcript abundance may have a critical role in the combinatorial nature of the olfactory coding process.

  12. Hepatitis A Virus Genome Organization and Replication Strategy.

    Science.gov (United States)

    McKnight, Kevin L; Lemon, Stanley M

    2018-04-02

    Hepatitis A virus (HAV) is a positive-strand RNA virus classified in the genus Hepatovirus of the family Picornaviridae. It is an ancient virus with a long evolutionary history and multiple features of its capsid structure, genome organization, and replication cycle that distinguish it from other mammalian picornaviruses. HAV proteins are produced by cap-independent translation of a single, long open reading frame under direction of an inefficient, upstream internal ribosome entry site (IRES). Genome replication occurs slowly and is noncytopathic, with transcription likely primed by a uridylated protein primer as in other picornaviruses. Newly produced quasi-enveloped virions (eHAV) are released from cells in a nonlytic fashion in a unique process mediated by interactions of capsid proteins with components of the host cell endosomal sorting complexes required for transport (ESCRT) system. Copyright © 2018 Cold Spring Harbor Laboratory Press; all rights reserved.

  13. Structural organization of poliovirus RNA replication is mediated by viral proteins of the P2 genomic region

    International Nuclear Information System (INIS)

    Bienz, K.; Egger, D.; Troxler, M.; Pasamontes, L.

    1990-01-01

    Transcriptionally active replication complexes bound to smooth membrane vesicles were isolated from poliovirus-infected cells. In electron microscopic, negatively stained preparations, the replication complex appeared as an irregularly shaped, oblong structure attached to several virus-induced vesicles of a rosettelike arrangement. Electron microscopic immunocytochemistry of such preparations demonstrated that the poliovirus replication complex contains the proteins coded by the P2 genomic region (P2 proteins) in a membrane-associated form. In addition, the P2 proteins are also associated with viral RNA, and they can be cross-linked to viral RNA by UV irradiation. Guanidine hydrochloride prevented the P2 proteins from becoming membrane bound but did not change their association with viral RNA. The findings allow the conclusion that the protein 2C or 2C-containing precursor(s) is responsible for the attachment of the viral RNA to the vesicular membrane and for the spatial organization of the replication complex necessary for its proper functioning in viral transcription. A model for the structure of the viral replication complex and for the function of the 2C-containing P2 protein(s) and the vesicular membranes is proposed.

  14. Genome Defense Mechanisms in Neurospora and Associated Specialized Proteins

    Directory of Open Access Journals (Sweden)

    Ranjan Tamuli

    2010-06-01

    Neurospora crassa, a filamentous fungus, possesses the widest array of genome defense mechanisms known in any eukaryotic organism, including a process called repeat-induced point mutation (RIP). RIP is a genome defense mechanism that hypermutates repetitive DNA sequences, analogous to genomic imprinting in mammals. As a consequence of RIP, Neurospora possesses many fewer genes in multigene families than expected. A DNA methyltransferase homologue, RID, was shown to be essential for RIP. Recently, a variant catalytic subunit of translesion DNA polymerase zeta (Pol zeta) has been found to be essential for the dominant RIP suppressor phenotype. Meiotic silencing and quelling are two other genome defense mechanisms in Neurospora, and proteins required for these two processes have been identified through genetic screens.

  15. Experimental-confirmation and functional-annotation of predicted proteins in the chicken genome

    Directory of Open Access Journals (Sweden)

    McCarthy Fiona M

    2007-11-01

    Background: The chicken genome was sequenced because of its phylogenetic position as a non-mammalian vertebrate, its use as a biomedical model especially to study embryology and development, its role as a source of human disease organisms and its importance as the major source of animal derived food protein. However, genomic sequence data is, in itself, of limited value; generally it is not equivalent to understanding biological function. The benefit of having a genome sequence is that it provides a basis for functional genomics. However, the sequence data currently available is poorly structurally and functionally annotated and many genes do not have standard nomenclature assigned. Results: We analysed eight chicken tissues and improved the chicken genome structural annotation by providing experimental support for the in vivo expression of 7,809 computationally predicted proteins, including 30 chicken proteins that were only electronically predicted or hypothetical translations in human. To improve functional annotation (based on Gene Ontology), we mapped these identified proteins to their human and mouse orthologs and used this orthology to transfer Gene Ontology (GO) functional annotations to the chicken proteins. The 8,213 orthology-based GO annotations that we produced represent an 8% increase in currently available chicken GO annotations. Orthologous chicken products were also assigned standardized nomenclature based on current chicken nomenclature guidelines. Conclusion: We demonstrate the utility of high-throughput expression proteomics for rapid experimental structural annotation of a newly sequenced eukaryote genome. These experimentally-supported predicted proteins were further annotated by assigning the proteins with standardized nomenclature and functional annotation. This method is widely applicable to a diverse range of species. Moreover, information from one genome can be used to improve the annotation of other genomes and

  16. Enhanced heterologous protein productivity by genome reduction in Lactococcus lactis NZ9000.

    Science.gov (United States)

    Zhu, Duolong; Fu, Yuxin; Liu, Fulu; Xu, Haijin; Saris, Per Erik Joakim; Qiao, Mingqiang

    2017-01-03

    The implementation of novel chassis organisms to be used as microbial cell factories in industrial applications is an intensive research field. Lactococcus lactis, which is one of the most extensively studied model organisms, exhibits superior ability to be used as an engineered host for fermentation of desirable products. However, few studies have reported on genome reduction of L. lactis as a clean background for functional genomic studies and a model chassis for desirable product fermentation. Four large nonessential DNA regions accounting for 2.83% of the L. lactis NZ9000 (L. lactis 9 k) genome (2,530,294 bp) were deleted using the Cre-loxP deletion system as the first steps toward a minimized genome in this study. The mutants were compared with the parental strain in several physiological traits and evaluated as microbial cell factories for heterologous protein production (intracellular and secretory expression) with the red fluorescent protein (RFP) and the bacteriocin leucocin C (LecC) as reporters. The four mutants grew faster, yielded enhanced biomass, achieved increased adenosine triphosphate content, and showed diminished maintenance demands compared with the wild strain in the two media tested. In particular, L. lactis 9 k-4 with the largest deletion was identified as the optimum candidate host for recombinant protein production. With nisin induction, not only the transcriptional efficiency but also the production levels of the expressed reporters were approximately three- to fourfold improved compared with the wild strain. The expression of the lecC gene controlled with strong constitutive promoters P5 and P8 in L. lactis 9 k-4 was also improved significantly. The genome-streamlined L. lactis 9 k-4 outcompeted the parental strain in several physiological traits assessed. Moreover, L. lactis 9 k-4 exhibited good properties as a platform organism for protein production. In future works, the genome of L. lactis will be maximally reduced by using our specific design

  17. The Population Genomics of Sunflowers and Genomic Determinants of Protein Evolution Revealed by RNAseq

    Directory of Open Access Journals (Sweden)

    Loren H. Rieseberg

    2012-10-01

    Few studies have investigated the causes of evolutionary rate variation among plant nuclear genes, especially in recently diverged species still capable of hybridizing in the wild. The recent advent of Next Generation Sequencing (NGS) permits investigation of genome wide rates of protein evolution and the role of selection in generating and maintaining divergence. Here, we use individual whole-transcriptome sequencing (RNAseq) to refine our understanding of the population genomics of wild species of sunflowers (Helianthus spp.) and the factors that affect rates of protein evolution. We aligned 35 GB of transcriptome sequencing data and identified 433,257 polymorphic sites (SNPs) in a reference transcriptome comprising 16,312 genes. Using SNP markers, we identified strong population clustering largely corresponding to the three species analyzed here (Helianthus annuus, H. petiolaris, H. debilis), with one distinct early generation hybrid. Then, we calculated the proportions of adaptive substitution fixed by selection (alpha) and identified gene ontology categories with elevated values of alpha. The “response to biotic stimulus” category had the highest mean alpha across the three interspecific comparisons, implying that natural selection imposed by other organisms plays an important role in driving protein evolution in wild sunflowers. Finally, we examined the relationship between protein evolution (dN/dS ratio) and several genomic factors predicted to co-vary with protein evolution (gene expression level, divergence and specificity, genetic divergence [FST], and nucleotide diversity pi). We find that variation in rates of protein divergence was correlated with gene expression level and specificity, consistent with results from a broad range of taxa and timescales. This would in turn imply that these factors govern protein evolution both at a microevolutionary and macroevolutionary timescale. Our results contribute to a general understanding of the

  18. Genes encoding calmodulin-binding proteins in the Arabidopsis genome

    Science.gov (United States)

    Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

    2002-01-01

    Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.

  19. MIPS: a database for genomes and protein sequences.

    Science.gov (United States)

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  20. The insulator protein SU(HW) fine-tunes nuclear lamina interactions of the Drosophila genome.

    Directory of Open Access Journals (Sweden)

    Joke G van Bemmel

    Specific interactions of the genome with the nuclear lamina (NL) are thought to assist chromosome folding inside the nucleus and to contribute to the regulation of gene expression. High-resolution mapping has recently identified hundreds of large, sharply defined lamina-associated domains (LADs) in the human genome, and suggested that the insulator protein CTCF may help to demarcate these domains. Here, we report the detailed structure of LADs in Drosophila cells, and investigate the putative roles of five insulator proteins in LAD organization. We found that the Drosophila genome is also organized in discrete LADs, which are about five times smaller than human LADs but contain on average a similar number of genes. Systematic comparison to new and published insulator binding maps shows that only SU(HW) binds preferentially at LAD borders and at specific positions inside LADs, while GAF, CTCF, BEAF-32 and DWG are mostly absent from these regions. By knockdown and overexpression studies we demonstrate that SU(HW) weakens genome-NL interactions through a local antagonistic effect, but we did not obtain evidence that it is essential for border formation. Our results provide insights into the evolution of LAD organization and identify SU(HW) as a fine-tuner of genome-NL interactions.

  1. Protein analysis in dissolved organic matter: What proteins from organic debris, soil leachate and surface water can tell us - a perspective

    Directory of Open Access Journals (Sweden)

    W. X. Schulze

    2005-01-01

    Mass spectrometry based analysis of proteins is widely used to study cellular processes in model organisms. However, it has not yet routinely been applied in environmental research. Based on observations that protein can readily be detected as a component of dissolved organic matter (DOM), this article gives an example of the possible use of protein analysis in ecology and environmental sciences, focusing on different terrestrial ecosystems. At this stage, there are two areas of interest: (1) the identification of phylogenetic groups contributing to the environmental protein pool, and (2) identification of the organismic origin of specific enzymes that are important for ecosystem processes. In this paper, mass spectrometric protein analysis was applied to identify proteins from decomposing plant material and DOM of soil leachates and surface water samples derived from different environments. It is concluded that mass spectrometric protein analysis is capable of distinguishing the phylogenetic origin of proteins from litter protein extracts, leachates of different soil horizons, and from various sources of terrestrial surface water. A current limitation is the limited knowledge of complete genomes of soil organisms. The protein analysis makes it possible to relate protein presence to biogeochemical processes, and to identify the source organisms for specific active enzymes. Further applications, such as in pollution research, are conceivable. In summary, the analysis of proteins opens a new area of research between the fields of microbiology and biogeochemistry.

  2. Genomes2Drugs: identifies target proteins and lead drugs from proteome data.

    LENUS (Irish Health Repository)

    Toomey, David

    2009-01-01

    BACKGROUND: Genome sequencing and bioinformatics have provided the full hypothetical proteome of many pathogenic organisms. Advances in microarray and mass spectrometry have also yielded large output datasets of possible target proteins/genes. However, the challenge remains to identify new targets for drug discovery from this wealth of information. Further analysis includes bioinformatics and/or molecular biology tools to validate the findings. This is time consuming and expensive, and could fail to yield novel drugs if protein purification and crystallography is impossible. To pre-empt this, a researcher may want to rapidly filter the output datasets for proteins that show good homology to proteins that have already been structurally characterised or proteins that are already targets for known drugs. Critically, those researchers developing novel antibiotics need to select out the proteins that show close homology to any human proteins, as future inhibitors are likely to cross-react with the host protein, causing off-target toxicity effects later in clinical trials. METHODOLOGY/PRINCIPAL FINDINGS: To solve many of these issues, we have developed a free online resource called Genomes2Drugs which ranks sequences to identify proteins that are (i) homologous to previously crystallized proteins or (ii) targets of known drugs, but are (iii) not homologous to human proteins. When tested using the Plasmodium falciparum malarial genome the program correctly enriched the ranked list of proteins with known drug target proteins. CONCLUSIONS/SIGNIFICANCE: Genomes2Drugs rapidly identifies proteins that are likely to succeed in drug discovery pipelines. This free online resource helps in the identification of potential drug targets. Importantly, the program further highlights proteins that are likely to be inhibited by FDA-approved drugs. These drugs can then be rapidly moved into Phase IV clinical studies under 'change-of-application' patents.
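
    The three selection criteria in this abstract can be illustrated with a small filter-and-rank sketch. This is not the Genomes2Drugs implementation, which the abstract does not describe. The `Candidate` fields, the percent-identity scores, and both thresholds are hypothetical and stand in for whatever homology measure a real pipeline would use.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One pathogen protein with hypothetical best-hit identities (percent)."""
    name: str
    crystal_identity: float      # best hit against crystallized proteins
    drug_target_identity: float  # best hit against known drug targets
    human_identity: float        # best hit against human proteins


def rank_candidates(candidates, min_hit=40.0, max_human=30.0):
    """Keep proteins meeting (i) or (ii) but not (iii), best hit first.

    (i)   homologous to a crystallized protein, or
    (ii)  homologous to a known drug target, but
    (iii) not homologous to any human protein.
    The thresholds are illustrative assumptions, not published values.
    """
    kept = [
        c for c in candidates
        if (c.crystal_identity >= min_hit or c.drug_target_identity >= min_hit)
        and c.human_identity < max_human
    ]
    return sorted(
        kept,
        key=lambda c: max(c.crystal_identity, c.drug_target_identity),
        reverse=True,
    )


if __name__ == "__main__":
    demo = [
        Candidate("protA", 80.0, 10.0, 10.0),
        Candidate("protB", 20.0, 90.0, 5.0),
        Candidate("protC", 90.0, 90.0, 70.0),  # too close to a human protein
        Candidate("protD", 10.0, 10.0, 5.0),   # no structural or drug-target hit
    ]
    for candidate in rank_candidates(demo):
        print(candidate.name)
```

    In this sketch a protein that resembles a human protein is dropped outright rather than down-weighted, which mirrors the abstract's concern about cross-reaction with the host protein.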

  3. Genomes2Drugs: identifies target proteins and lead drugs from proteome data.

    Directory of Open Access Journals (Sweden)

    David Toomey

    Full Text Available BACKGROUND: Genome sequencing and bioinformatics have provided the full hypothetical proteome of many pathogenic organisms. Advances in microarray and mass spectrometry have also yielded large output datasets of possible target proteins/genes. However, the challenge remains to identify new targets for drug discovery from this wealth of information. Further analysis includes bioinformatics and/or molecular biology tools to validate the findings. This is time consuming and expensive, and could fail to yield novel drugs if protein purification and crystallography is impossible. To pre-empt this, a researcher may want to rapidly filter the output datasets for proteins that show good homology to proteins that have already been structurally characterised or proteins that are already targets for known drugs. Critically, those researchers developing novel antibiotics need to select out the proteins that show close homology to any human proteins, as future inhibitors are likely to cross-react with the host protein, causing off-target toxicity effects later in clinical trials. METHODOLOGY/PRINCIPAL FINDINGS: To solve many of these issues, we have developed a free online resource called Genomes2Drugs which ranks sequences to identify proteins that are (i) homologous to previously crystallized proteins or (ii) targets of known drugs, but are (iii) not homologous to human proteins. When tested using the Plasmodium falciparum malarial genome the program correctly enriched the ranked list of proteins with known drug target proteins. CONCLUSIONS/SIGNIFICANCE: Genomes2Drugs rapidly identifies proteins that are likely to succeed in drug discovery pipelines. This free online resource helps in the identification of potential drug targets. Importantly, the program further highlights proteins that are likely to be inhibited by FDA-approved drugs. These drugs can then be rapidly moved into Phase IV clinical studies under 'change-of-application' patents.

  4. Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs).

    Science.gov (United States)

    Natale, D A; Shankavaram, U T; Galperin, M Y; Wolf, Y I; Aravind, L; Koonin, E V

    2000-01-01

    Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi. A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 proteins from A. pernix, which could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase compared to the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix. Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and

  5. Integrated genomics and proteomics of the Torpedo californica electric organ: concordance with the mammalian neuromuscular junction

    Directory of Open Access Journals (Sweden)

    Mate Suzanne E

    2011-05-01

    Full Text Available Background: During development, the branchial mesoderm of Torpedo californica transdifferentiates into an electric organ capable of generating high voltage discharges to stun fish. The organ contains a high density of cholinergic synapses and has served as a biochemical model for the membrane specialization of myofibers, the neuromuscular junction (NMJ). We studied the genome and proteome of the electric organ to gain insight into its composition, to determine if there is concordance with skeletal muscle and the NMJ, and to identify novel synaptic proteins. Results: Of 435 proteins identified, 300 mapped to Torpedo cDNA sequences with ≥2 peptides. We identified 14 uncharacterized proteins in the electric organ that are known to play a role in acetylcholine receptor clustering or signal transduction. In addition, two human open reading frames, C1orf123 and C6orf130, showed high sequence similarity to electric organ proteins. Our profile lists several proteins that are highly expressed in skeletal muscle or are muscle specific. Synaptic proteins such as acetylcholinesterase, acetylcholine receptor subunits, and rapsyn were present in the electric organ proteome but absent in the skeletal muscle proteome. Conclusions: Our integrated genomic and proteomic analysis supports research describing a muscle-like profile of the organ. We show that it is a repository of NMJ proteins but we present limitations on its use as a comprehensive model of the NMJ. Finally, we identified several proteins that may become candidates for signaling proteins not previously characterized as components of the NMJ.

  6. The Proteins API: accessing key integrated protein and genome information.

    Science.gov (United States)

    Nightingale, Andrew; Antunes, Ricardo; Alpi, Emanuele; Bursteinas, Borisas; Gonzales, Leonardo; Liu, Wudong; Luo, Jie; Qi, Guoying; Turner, Edd; Martin, Maria

    2017-07-03

    The Proteins API provides searching and programmatic access to protein and associated genomics data such as curated protein sequence positional annotations from UniProtKB, as well as mapped variation and proteomics data from large scale data sources (LSS). Using the coordinates service, researchers are able to retrieve the genomic sequence coordinates for proteins in UniProtKB. This information and the LSS genomics and proteomics data for UniProt proteins are programmatically available only through this service. A Swagger UI has been implemented to provide documentation and an interface that lets users with little or no programming experience 'talk' to the services, quickly and easily formulate queries, and obtain dynamically generated source code for popular programming languages such as Java, Perl, Python and Ruby. Search results are returned as standard JSON, XML or GFF data objects. The Proteins API is a set of scalable, reliable, fast, easy-to-use RESTful services that provides a broad protein information resource, allowing users to ask questions based upon their field of expertise and to gain an integrated overview of the protein annotations available to aid their understanding of proteins in biological processes. The Proteins API is available at (http://www.ebi.ac.uk/proteins/api/doc). © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
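
    As a rough illustration of the coordinates service mentioned in this abstract, the sketch below builds a request URL and fetches JSON using only the Python standard library. The base URL is taken from the abstract's documentation link. The `/coordinates/{accession}` path and the `Accept: application/json` header are assumptions that should be checked against the Swagger documentation, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base URL inferred from the documentation link in the abstract.
BASE_URL = "https://www.ebi.ac.uk/proteins/api"


def coordinates_url(accession: str) -> str:
    """Build the (assumed) coordinates endpoint URL for a UniProtKB accession."""
    return f"{BASE_URL}/coordinates/{urllib.parse.quote(accession, safe='')}"


def fetch_coordinates(accession: str, timeout: float = 30.0):
    """Request genomic coordinates as JSON; performs a network call."""
    request = urllib.request.Request(
        coordinates_url(accession),
        headers={"Accept": "application/json"},  # assumed content negotiation
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_coordinates() to query the live service.
    print(coordinates_url("P05067"))
```

    Keeping URL construction separate from the network call makes the request easy to inspect before anything is sent to the service.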

  7. An overview on genome organization of marine organisms.

    Science.gov (United States)

    Costantini, Maria

    2015-12-01

    In this review we will concentrate on some general genome features of marine organisms and their evolution, ranging from vertebrates and invertebrates to unicellular organisms. Before genome sequencing, ultracentrifugation in CsCl led to high resolution of mammalian DNA (without looking at the sequence). The analytical profile of human DNA showed that the vertebrate genome is a mosaic of isochores, typically megabase-size DNA segments that belong to a small number of families characterized by different GC levels. The recent availability of a number of fully sequenced genomes allowed the isochores to be mapped very precisely, based on DNA sequences. Since isochores are tightly linked to biological properties such as gene density, replication timing and recombination, the new level of detail provided by the isochore map helped the understanding of genome structure, function and evolution. This led to the current level of knowledge and to further insights. Copyright © 2015. Published by Elsevier B.V.

  8. Genomic organization and evolution of the Atlantic salmon hemoglobin repertoire

    Directory of Open Access Journals (Sweden)

    Phillips Ruth B

    2010-10-01

    Full Text Available Background: The genomes of salmonids are considered pseudo-tetraploid undergoing reversion to a stable diploid state. Given the genome duplication and extensive biological data available for salmonids, they are excellent model organisms for studying comparative genomics, evolutionary processes, fates of duplicated genes and the genetic and physiological processes associated with complex behavioral phenotypes. The evolution of the tetrapod hemoglobin genes is well studied; however, little is known about the genomic organization and evolution of teleost hemoglobin genes, particularly those of salmonids. The Atlantic salmon serves as a representative salmonid species for genomics studies. Given the well documented role of hemoglobin in adaptation to varied environmental conditions as well as its use as a model protein for evolutionary analyses, an understanding of the genomic structure and organization of the Atlantic salmon α and β hemoglobin genes is of great interest. Results: We identified four bacterial artificial chromosomes (BACs) comprising two hemoglobin gene clusters spanning the entire α and β hemoglobin gene repertoire of the Atlantic salmon genome. Their chromosomal locations were established using fluorescence in situ hybridization (FISH) analysis and linkage mapping, demonstrating that the two clusters are located on separate chromosomes. The BACs were sequenced and assembled into scaffolds, which were annotated for putatively functional and pseudogenized hemoglobin-like genes. This revealed that the tail-to-tail organization and alternating pattern of the α and β hemoglobin genes are well conserved in both clusters, as well as that the Atlantic salmon genome houses substantially more hemoglobin genes, including non-Bohr β globin genes, than the genomes of other teleosts that have been sequenced. Conclusions: We suggest that the most parsimonious evolutionary path leading to the present organization of the Atlantic salmon

  9. Genomic organization and evolution of the Atlantic salmon hemoglobin repertoire

    Science.gov (United States)

    2010-01-01

    Background: The genomes of salmonids are considered pseudo-tetraploid undergoing reversion to a stable diploid state. Given the genome duplication and extensive biological data available for salmonids, they are excellent model organisms for studying comparative genomics, evolutionary processes, fates of duplicated genes and the genetic and physiological processes associated with complex behavioral phenotypes. The evolution of the tetrapod hemoglobin genes is well studied; however, little is known about the genomic organization and evolution of teleost hemoglobin genes, particularly those of salmonids. The Atlantic salmon serves as a representative salmonid species for genomics studies. Given the well documented role of hemoglobin in adaptation to varied environmental conditions as well as its use as a model protein for evolutionary analyses, an understanding of the genomic structure and organization of the Atlantic salmon α and β hemoglobin genes is of great interest. Results: We identified four bacterial artificial chromosomes (BACs) comprising two hemoglobin gene clusters spanning the entire α and β hemoglobin gene repertoire of the Atlantic salmon genome. Their chromosomal locations were established using fluorescence in situ hybridization (FISH) analysis and linkage mapping, demonstrating that the two clusters are located on separate chromosomes. The BACs were sequenced and assembled into scaffolds, which were annotated for putatively functional and pseudogenized hemoglobin-like genes. This revealed that the tail-to-tail organization and alternating pattern of the α and β hemoglobin genes are well conserved in both clusters, as well as that the Atlantic salmon genome houses substantially more hemoglobin genes, including non-Bohr β globin genes, than the genomes of other teleosts that have been sequenced. Conclusions: We suggest that the most parsimonious evolutionary path leading to the present organization of the Atlantic salmon hemoglobin genes involves

  10. Multiple roles of genome-attached bacteriophage terminal proteins

    International Nuclear Information System (INIS)

    Redrejo-Rodríguez, Modesto; Salas, Margarita

    2014-01-01

    Protein-primed replication constitutes a generalized mechanism to initiate DNA or RNA synthesis in linear genomes, including viruses, gram-positive bacteria, linear plasmids and mobile elements. By this mechanism a specific amino acid primes replication and becomes covalently linked to the genome ends. Despite the fact that TPs lack sequence homology, they share a similar structural arrangement, with the priming residue in the C-terminal half of the protein and an accumulation of positively charged residues at the N-terminal end. In addition, various bacteriophage TPs have been shown to have DNA-binding capacity that targets TPs and their attached genomes to the host nucleoid. Furthermore, a number of bacteriophage TPs from different viral families and with diverse hosts also contain putative nuclear localization signals and localize in the eukaryotic nucleus, which could lead to the transport of the attached DNA. This suggests a possible role of bacteriophage TPs in prokaryote-to-eukaryote horizontal gene transfer. - Highlights: • Protein-primed genome replication constitutes a strategy to initiate DNA or RNA synthesis in linear genomes. • Bacteriophage terminal proteins (TPs) are covalently attached to viral genomes by their primary function priming DNA replication. • TPs are also DNA-binding proteins and target phage genomes to the host nucleoid. • TPs can also localize in the eukaryotic nucleus and may have a role in phage-mediated interkingdom gene transfer

  11. Multiple roles of genome-attached bacteriophage terminal proteins

    Energy Technology Data Exchange (ETDEWEB)

    Redrejo-Rodríguez, Modesto; Salas, Margarita, E-mail: msalas@cbm.csic.es

    2014-11-15

    Protein-primed replication constitutes a generalized mechanism to initiate DNA or RNA synthesis in linear genomes, including viruses, gram-positive bacteria, linear plasmids and mobile elements. By this mechanism a specific amino acid primes replication and becomes covalently linked to the genome ends. Despite the fact that TPs lack sequence homology, they share a similar structural arrangement, with the priming residue in the C-terminal half of the protein and an accumulation of positively charged residues at the N-terminal end. In addition, various bacteriophage TPs have been shown to have DNA-binding capacity that targets TPs and their attached genomes to the host nucleoid. Furthermore, a number of bacteriophage TPs from different viral families and with diverse hosts also contain putative nuclear localization signals and localize in the eukaryotic nucleus, which could lead to the transport of the attached DNA. This suggests a possible role of bacteriophage TPs in prokaryote-to-eukaryote horizontal gene transfer. - Highlights: • Protein-primed genome replication constitutes a strategy to initiate DNA or RNA synthesis in linear genomes. • Bacteriophage terminal proteins (TPs) are covalently attached to viral genomes by their primary function priming DNA replication. • TPs are also DNA-binding proteins and target phage genomes to the host nucleoid. • TPs can also localize in the eukaryotic nucleus and may have a role in phage-mediated interkingdom gene transfer.

  12. Pdsg1 and Pdsg2, novel proteins involved in developmental genome remodelling in Paramecium.

    Directory of Open Access Journals (Sweden)

    Miroslav Arambasic

    Full Text Available The epigenetic influence of maternal cells on the development of their progeny has long been studied in various eukaryotes. Multicellular organisms usually provide their zygotes not only with nutrients but also with functional elements required for proper development, such as coding and non-coding RNAs. These maternally deposited RNAs exhibit a variety of functions, from regulating gene expression to assuring genome integrity. In ciliates such as Paramecium, these RNAs participate in the programming of large-scale genome reorganization during development, distinguishing germline-limited DNA, which is excised, from somatic-destined DNA. Only a handful of proteins playing roles in this process have been identified so far, including typical RNAi-derived factors such as Dicer-like and Piwi proteins. Here we report and characterize two novel proteins, Pdsg1 and Pdsg2 (Paramecium protein involved in Development of the Somatic Genome 1 and 2), involved in Paramecium genome reorganization. We show that these proteins are necessary for the excision of germline-limited DNA during development and the survival of sexual progeny. Knockdown of PDSG1 and PDSG2 genes affects the populations of small RNAs known to be involved in the programming of DNA elimination (scanRNAs and iesRNAs) and chromatin modification patterns during development. Our results suggest an association between RNA-mediated trans-generational epigenetic signal and chromatin modifications in the process of Paramecium genome reorganization.

  13. Pdsg1 and Pdsg2, novel proteins involved in developmental genome remodelling in Paramecium.

    Science.gov (United States)

    Arambasic, Miroslav; Sandoval, Pamela Y; Hoehener, Cristina; Singh, Aditi; Swart, Estienne C; Nowacki, Mariusz

    2014-01-01

    The epigenetic influence of maternal cells on the development of their progeny has long been studied in various eukaryotes. Multicellular organisms usually provide their zygotes not only with nutrients but also with functional elements required for proper development, such as coding and non-coding RNAs. These maternally deposited RNAs exhibit a variety of functions, from regulating gene expression to assuring genome integrity. In ciliates such as Paramecium, these RNAs participate in the programming of large-scale genome reorganization during development, distinguishing germline-limited DNA, which is excised, from somatic-destined DNA. Only a handful of proteins playing roles in this process have been identified so far, including typical RNAi-derived factors such as Dicer-like and Piwi proteins. Here we report and characterize two novel proteins, Pdsg1 and Pdsg2 (Paramecium protein involved in Development of the Somatic Genome 1 and 2), involved in Paramecium genome reorganization. We show that these proteins are necessary for the excision of germline-limited DNA during development and the survival of sexual progeny. Knockdown of PDSG1 and PDSG2 genes affects the populations of small RNAs known to be involved in the programming of DNA elimination (scanRNAs and iesRNAs) and chromatin modification patterns during development. Our results suggest an association between RNA-mediated trans-generational epigenetic signal and chromatin modifications in the process of Paramecium genome reorganization.

  14. Genome organization, instabilities, stem cells, and cancer

    Directory of Open Access Journals (Sweden)

    Senthil Kumar Pazhanisamy

    2009-01-01

    Full Text Available It is now widely recognized that advances in exploring genome organization provide remarkable insights into the induction and progression of chromosome abnormalities. Much of what we know about how mutations evolve and consequently transform into genome instabilities has been characterized in the context of the spatial organization of chromatin. Nevertheless, many underlying concepts concerning the impact of chromatin organization on the perpetuation of multiple mutations and on the propagation of chromosomal aberrations remain to be investigated in detail. The genesis of genome instabilities from the accumulation of multiple mutations that drive tumorigenesis is increasingly becoming a focal theme in cancer studies. This review focuses on how structural alterations evolve to give rise to a variety of genome instabilities that are manifested at the nucleotide, gene or sub-chromosomal, and whole-chromosome levels of the genome. Here we explore an underlying connection between genome instability and cancer in the light of genome architecture. This review is limited to studies directed towards spatial organizational aspects of the origin and propagation of aberrations into genetically unstable tumors.

  15. MIPS: analysis and annotation of proteins from whole genomes in 2005.

    Science.gov (United States)

    Mewes, H W; Frishman, D; Mayer, K F X; Münsterkötter, M; Noubibou, O; Pagel, P; Rattei, T; Oesterheld, M; Ruepp, A; Stümpflen, V

    2006-01-01

    The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system is maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines: (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks (Genome Research Environment System), (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).

  16. Primary structure of the human follistatin precursor and its genomic organization

    International Nuclear Information System (INIS)

    Shimasaki, Shunichi; Koga, Makoto; Esch, F.

    1988-01-01

    Follistatin is a single-chain gonadal protein that specifically inhibits follicle-stimulating hormone release. By use of the recently characterized porcine follistatin cDNA as a probe to screen a human testis cDNA library and a genomic library, the structure of the complete human follistatin precursor as well as its genomic organization have been determined. Three of eight cDNA clones that were sequenced predicted a precursor with 344 amino acids, whereas the remaining five cDNA clones encoded a 317 amino acid precursor, resulting from alternative splicing of the precursor mRNA. Mature follistatins contain four contiguous domains that are encoded by precisely separated exons; three of the domains are highly similar to each other, as well as to human epidermal growth factor and human pancreatic secretory trypsin inhibitor. The genomic organization of the human follistatin gene is similar to that of the human epidermal growth factor gene and thus supports the notion of exon shuffling during evolution.

  17. The candidate phylum Poribacteria by single-cell genomics: new insights into phylogeny, cell-compartmentation, eukaryote-like repeat proteins, and other genomic features.

    Directory of Open Access Journals (Sweden)

    Janine Kamke

    Full Text Available The candidate phylum Poribacteria is one of the most dominant and widespread members of the microbial communities residing within marine sponges. Cell compartmentalization had been postulated along with their discovery about a decade ago, and their phylogenetic association with the Planctomycetes-Verrucomicrobia-Chlamydiae (PVC) superphylum was proposed soon thereafter. In the present study we revised these features based on genomic data obtained from six poribacterial single cells. We propose that Poribacteria form a distinct monophyletic phylum contiguous to the PVC superphylum together with other candidate phyla. Our genomic analyses supported the possibility of cell compartmentalization in the form of bacterial microcompartments. Further analyses of eukaryote-like protein domains stressed the importance of such proteins, with features including tetratricopeptide repeats, leucine-rich repeats and low-density lipoprotein receptor repeats, the latter of which are reported here for the first time from a sponge symbiont. Finally, examining the most abundant protein domain family in poribacterial genomes revealed diverse phyH family proteins, some of which may be related to dissolved organic phosphorus uptake.

  18. Protein annotation in the era of personal genomics

    DEFF Research Database (Denmark)

    Holberg Blicher, Thomas; Gupta, Ramneek; Wesolowska, Agata

    2010-01-01

    Protein annotation provides a condensed and systematic view on the function of individual proteins. It has traditionally dealt with sorting proteins into functional categories, which for example has proven to be successful for the comparison of different species. However, if we are to understand the differences between many individuals of the same species (humans in particular), the focus needs to be on the functional impact of individual residue variation. To fulfil the promises of personal genomics, we need to start asking not only what is in a genome but also how millions of small differences between individual genomes affect protein function and in turn human health. Copyright © 2010 Elsevier Ltd. All rights reserved.

  19. MIPS: a database for protein sequences and complete genomes.

    Science.gov (United States)

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  20. Genomic organization of the rat alpha 2u-globulin gene cluster.

    Science.gov (United States)

    McFadyen, D A; Addison, W; Locke, J

    1999-05-01

    The alpha 2u-globulins are a group of similar proteins, belonging to the lipocalin superfamily of proteins, that are synthesized in a subset of secretory tissues in rats. The many alpha 2u-globulin isoforms are encoded by a multigene family that exhibits extensive homology. Despite a high degree of sequence identity, individual family members show diverse expression patterns involving complex hormonal, tissue-specific, and developmental regulation. Analysis suggests that there are approximately 20 alpha 2u-globulin genes in the rat genome. We have used fluorescence in situ hybridization (FISH) to show that the alpha 2u-globulin genes are clustered at a single site on rat Chromosome (Chr) 5 (5q22-24). Southern blots of rat genomic DNA separated by pulsed field gel electrophoresis indicated that the alpha 2u-globulin genes are contained on two NruI fragments with a total size of 880 kbp. Analysis of three P1 clones containing alpha 2u-globulin genes indicated that the alpha 2u-globulin genes are tandemly arranged in a head-to-tail fashion. The organization of the alpha 2u-globulin genes in the rat as a tandem array of single genes differs from the homologous major urinary protein genes in the mouse, which are organized as tandem arrays of divergently oriented gene pairs. The structure of these gene clusters may have consequences for the proposed function, as a pheromone transporter, of the protein products encoded by these genes.

  1. The DNA-encoded nucleosome organization of a eukaryotic genome.

    Science.gov (United States)

    Kaplan, Noam; Moore, Irene K; Fondufe-Mittendorf, Yvonne; Gossett, Andrea J; Tillo, Desiree; Field, Yair; LeProust, Emily M; Hughes, Timothy R; Lieb, Jason D; Widom, Jonathan; Segal, Eran

    2009-03-19

    Nucleosome organization is critical for gene regulation. In living cells this organization is determined by multiple factors, including the action of chromatin remodellers, competition with site-specific DNA-binding proteins, and the DNA sequence preferences of the nucleosomes themselves. However, it has been difficult to estimate the relative importance of each of these mechanisms in vivo, because in vivo nucleosome maps reflect the combined action of all influencing factors. Here we determine the importance of nucleosome DNA sequence preferences experimentally by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map, in which nucleosome occupancy is governed only by the intrinsic sequence preferences of nucleosomes, is similar to in vivo nucleosome maps generated in three different growth conditions. In vitro, nucleosome depletion is evident at many transcription factor binding sites and around gene start and end sites, indicating that nucleosome depletion at these sites in vivo is partly encoded in the genome. We confirm these results with a micrococcal nuclease-independent experiment that measures the relative affinity of nucleosomes for approximately 40,000 double-stranded 150-base-pair oligonucleotides. Using our in vitro data, we devise a computational model of nucleosome sequence preferences that is significantly correlated with in vivo nucleosome occupancy in Caenorhabditis elegans. Our results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.

  2. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.

    Science.gov (United States)

    Meinicke, Peter

    2009-09-02

    Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  3. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences

    Directory of Open Access Journals (Sweden)

    Meinicke Peter

    2009-09-01

    Full Text Available Abstract Background Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Description Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. Conclusion For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  4. Complete nucleotide sequence and genome organization of Olive latent virus 3, a new putative member of the family Tymoviridae.

    Science.gov (United States)

    Alabdullah, Abdulkader; Minafra, Angelantonio; Elbeaino, Toufic; Saponari, Maria; Savino, Vito; Martelli, Giovanni P

    2010-09-01

    The complete nucleotide sequence and the genome organization were determined of a putative new member of the family Tymoviridae, tentatively named Olive latent virus 3 (OLV-3), recovered in southern Italy from a symptomless olive tree. The sequenced ssRNA genome comprises 7148 nucleotides excluding the poly(A) tail and contains four open reading frames (ORFs). ORF1 encodes a polyprotein of 221.6 kDa, containing the conserved signatures of the methyltransferase (MTR), papain-like protease (PRO), helicase (HEL) and RNA-dependent RNA polymerase (RdRp) domains of the replication-associated proteins of positive-strand RNA viruses. ORF2 completely overlaps ORF1 and encodes a putative protein of 43.33 kDa showing limited sequence similarity with the putative movement protein of Maize rayado fino virus (MRFV). ORF3 codes for a protein with a predicted molecular mass of 28.46 kDa, identified as the coat protein (CP), whereas ORF4 overlaps ORF3 and encodes a putative protein of 16 kDa with sequence similarity to the p16 and p31 proteins of Citrus sudden death-associated virus (CSDaV) and Grapevine fleck virus (GFkV), respectively. Within the family Tymoviridae, the OLV-3 genome has the closest identity level (49-52%) with members of the genus Marafivirus, from which, however, it differs because of the diverse genome organization and the presence of a single type of CP subunit. Copyright (c) 2010 Elsevier B.V. All rights reserved.

  5. Honey bee protein atlas at organ-level resolution.

    Science.gov (United States)

    Chan, Queenie W T; Chan, Man Yi; Logan, Michelle; Fang, Yuan; Higo, Heather; Foster, Leonard J

    2013-11-01

    Genome sequencing has provided us with gene lists but cannot tell us where and how their encoded products work together to support life. Complex organisms rely on differential expression of subsets of genes/proteins in organs and tissues, and, in concert, evolved to their present state as they function together to improve an organism's overall reproductive fitness. Proteomics studies of individual organs help us understand their basic functions, but this reductionist approach misses the larger context of the whole organism. This problem could be circumvented if all the organs in an organism were comprehensively studied by the same methodology and analyzed together. Using honey bees (Apis mellifera L.) as a model system, we report here an initial whole proteome of a complex organism, measuring 29 different organ/tissue types among the three honey bee castes: queen, drone, and worker. The data reveal that, e.g., workers have a heightened capacity to deal with environmental toxins and queens have a far more robust pheromone detection system than their nestmates. The data also suggest that workers altruistically sacrifice not only their own reproductive capacity but also their immune potential in favor of their queen. Finally, organ-level resolution of protein expression offers a systematic insight into how organs may have developed.

  6. Complete genome sequence of Klebsiella pneumoniae J1, a protein-based microbial flocculant-producing bacterium.

    Science.gov (United States)

    Pang, Changlong; Li, Ang; Cui, Di; Yang, Jixian; Ma, Fang; Guo, Haijuan

    2016-02-20

    Klebsiella pneumoniae J1 is a Gram-negative bacterium that produces a protein-based microbial flocculant. However, little genetic information is known about this strain. Here we carried out a whole-genome sequence analysis of this strain and report the complete genome sequence of this organism and the genetic basis for its carbohydrate metabolism, capsule biosynthesis and transport system. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. Occupancy of chromatin organizers in the Epstein-Barr virus genome.

    Science.gov (United States)

    Holdorf, Meghan M; Cooper, Samantha B; Yamamoto, Keith R; Miranda, J J L

    2011-06-20

    The human CCCTC-binding factor, CTCF, regulates transcription of the double-stranded DNA genomes of herpesviruses. The architectural complex cohesin and RNA Polymerase II also contribute to this organization. We profiled the occupancy of CTCF, cohesin, and RNA Polymerase II on the episomal genome of the Epstein-Barr virus in a cell culture model of latent infection. CTCF colocalizes with cohesin but not RNA Polymerase II. CTCF and cohesin bind specific sequences throughout the genome that are found not just proximal to the regulatory elements of latent genes, but also near lytic genes. In addition to tracking with known transcripts, RNA Polymerase II appears at two unannotated positions, one of which lies within the latent origin of replication. The widespread occupancy profile of each protein reveals binding near or at a myriad of regulatory elements and suggests context-dependent functions. Copyright © 2011 Elsevier Inc. All rights reserved.

  8. Preserving genome integrity: the DdrA protein of Deinococcus radiodurans R1.

    Science.gov (United States)

    Harris, Dennis R; Tanaka, Masashi; Saveliev, Sergei V; Jolivet, Edmond; Earl, Ashlee M; Cox, Michael M; Battista, John R

    2004-10-01

    The bacterium Deinococcus radiodurans can withstand extraordinary levels of ionizing radiation, reflecting an equally extraordinary capacity for DNA repair. The hypothetical gene product DR0423 has been implicated in the recovery of this organism from DNA damage, indicating that this protein is a novel component of the D. radiodurans DNA repair system. DR0423 is a homologue of the eukaryotic Rad52 protein. Following exposure to ionizing radiation, DR0423 expression is induced relative to an untreated control, and strains carrying a deletion of the DR0423 gene exhibit increased sensitivity to ionizing radiation. When recovering from ionizing-radiation-induced DNA damage in the absence of nutrients, wild-type D. radiodurans reassembles its genome while the mutant lacking DR0423 function does not. In vitro, the purified DR0423 protein binds to single-stranded DNA with an apparent affinity for 3' ends, and protects those ends from nuclease degradation. We propose that DR0423 is part of a DNA end-protection system that helps to preserve genome integrity following exposure to ionizing radiation. We designate the DR0423 protein as DNA damage response A protein.

  9. Preserving genome integrity: the DdrA protein of Deinococcus radiodurans R1.

    Directory of Open Access Journals (Sweden)

    Dennis R Harris

    2004-10-01

    Full Text Available The bacterium Deinococcus radiodurans can withstand extraordinary levels of ionizing radiation, reflecting an equally extraordinary capacity for DNA repair. The hypothetical gene product DR0423 has been implicated in the recovery of this organism from DNA damage, indicating that this protein is a novel component of the D. radiodurans DNA repair system. DR0423 is a homologue of the eukaryotic Rad52 protein. Following exposure to ionizing radiation, DR0423 expression is induced relative to an untreated control, and strains carrying a deletion of the DR0423 gene exhibit increased sensitivity to ionizing radiation. When recovering from ionizing-radiation-induced DNA damage in the absence of nutrients, wild-type D. radiodurans reassembles its genome while the mutant lacking DR0423 function does not. In vitro, the purified DR0423 protein binds to single-stranded DNA with an apparent affinity for 3' ends, and protects those ends from nuclease degradation. We propose that DR0423 is part of a DNA end-protection system that helps to preserve genome integrity following exposure to ionizing radiation. We designate the DR0423 protein as DNA damage response A protein.

  10. Lentiviral Delivery of Proteins for Genome Engineering.

    Science.gov (United States)

    Cai, Yujia; Mikkelsen, Jacob Giehm

    2016-01-01

    Viruses have evolved to traverse cellular barriers and travel to the nucleus by mechanisms that involve active transport through the cytoplasm and viral quirks to resist cellular restriction factors and innate immune responses. Virus-derived vector systems exploit the capacity of viruses to ferry genetic information into cells, and now - more than three decades after the discovery of HIV - lentiviral vectors based on HIV-1 have become instrumental in biomedical research and gene therapies that require genomic insertion of transgenes. By now, the efficacy of lentiviral gene delivery to stem cells, cells of the immune system including T cells, hepatic cells, and many other therapeutically relevant cell types is well established. Along with nucleic acids, HIV-1 virions carry the enzymatic tools that are essential for early steps of infection. Such capacity to package enzymes, even proteins of nonviral origin, has unveiled new ways of exploiting cellular intrusion of HIV-1. Based on early findings demonstrating the packaging of heterologous proteins into virus particles as part of the Gag and GagPol polypeptides, we have established lentiviral protein transduction for delivery of DNA transposases and designer nucleases. This strategy for delivering genome-engineering proteins facilitates high enzymatic activity within a short time frame and may potentially improve the safety of genome editing. Exploiting the full potential of lentiviral vectors, incorporation of foreign protein can be combined with the delivery of DNA transposons or a donor sequence for homology-directed repair in so-called 'all-in-one' lentiviral vectors. Here, we briefly describe intracellular restrictions that may affect lentiviral gene and protein delivery and review the current status of lentiviral particles as carriers of tool kits for genome engineering.

  11. Structure and expression strategy of the genome of Culex pipiens densovirus, a mosquito densovirus with an ambisense organization.

    Science.gov (United States)

    Baquerizo-Audiot, Elizabeth; Abd-Alla, Adly; Jousset, Françoise-Xavière; Cousserans, François; Tijssen, Peter; Bergoin, Max

    2009-07-01

    The genome of all densoviruses (DNVs) so far isolated from mosquitoes or mosquito cell lines consists of a 4-kb single-stranded DNA molecule with a monosense organization (genus Brevidensovirus, subfamily Densovirinae). We previously reported the isolation of a Culex pipiens DNV (CpDNV) that differs significantly from brevidensoviruses by (i) having an approximately 6-kb genome, (ii) lacking sequence homology, and (iii) lacking antigenic cross-reactivity with Brevidensovirus capsid polypeptides. We report here the sequence organization and transcription map of this virus. The cloned genome of CpDNV is 5,759 nucleotides (nt) long, and it possesses an inverted terminal repeat (ITR) of 285 nt and an ambisense organization of its genes. The nonstructural (NS) proteins NS-1, NS-2, and NS-3 are located in the 5' half of one strand and are organized into five open reading frames (ORFs) due to the split of both NS-1 and NS-2 into two ORFs. The ORF encoding capsid polypeptides is located in the 5' half of the complementary strand. The expression of NS proteins is controlled by two promoters, P7 and P17, driving the transcription of a 2.4-kb mRNA encoding NS-3 and of a 1.8-kb mRNA encoding NS-1 and NS-2, respectively. The two NS mRNAs species are spliced off a 53-nt sequence. Capsid proteins are translated from an unspliced 2.3-kb mRNA driven by the P88 promoter. CpDNV thus appears as a new type of mosquito DNV, and based on the overall organization and expression modalities of its genome, it may represent the prototype of a new genus of DNV.

  12. Genomic analysis of Xenopus organizer function

    Directory of Open Access Journals (Sweden)

    Suhai Sándor

    2006-06-01

    Full Text Available Abstract Background Studies of the Xenopus organizer have laid the foundation for our understanding of the conserved signaling pathways that pattern vertebrate embryos during gastrulation. The two primary activities of the organizer, BMP and Wnt inhibition, can regulate a spectrum of genes that pattern essentially all aspects of the embryo during gastrulation. As our knowledge of organizer signaling grows, it is imperative that we begin knitting together our gene-level knowledge into genome-level signaling models. The goal of this paper was to identify complete lists of genes regulated by different aspects of organizer signaling, thereby providing a deeper understanding of the genomic mechanisms that underlie these complex and fundamental signaling events. Results To this end, we ectopically overexpress Noggin and Dkk-1, inhibitors of the BMP and Wnt pathways, respectively, within ventral tissues. After isolating embryonic ventral halves at early and late gastrulation, we analyze the transcriptional response to these molecules within the generated ectopic organizers using oligonucleotide microarrays. An efficient statistical analysis scheme, combined with a new Gene Ontology biological process annotation of the Xenopus genome, allows reliable and faithful clustering of molecules based upon their roles during gastrulation. From this data, we identify new organizer-related expression patterns for 19 genes. Moreover, our data sub-divides organizer genes into separate head and trunk organizing groups, which each show distinct responses to Noggin and Dkk-1 activity during gastrulation. Conclusion Our data provides a genomic view of the cohorts of genes that respond to Noggin and Dkk-1 activity, allowing us to separate the role of each in organizer function. These patterns demonstrate a model where BMP inhibition plays a largely inductive role during early developmental stages, thereby initiating the suites of genes needed to pattern dorsal tissues

  13. The complete genome of Zunongwangia profunda SM-A87 reveals its adaptation to the deep-sea environment and ecological role in sedimentary organic nitrogen degradation

    Directory of Open Access Journals (Sweden)

    Zhou Bai-Cheng

    2010-04-01

    Full Text Available Abstract Background Zunongwangia profunda SM-A87, which was isolated from deep-sea sediment, is an aerobic, gram-negative bacterium that represents a new genus of Flavobacteriaceae. This is the first sequenced genome of a deep-sea bacterium from the phylum Bacteroidetes. Results The Z. profunda SM-A87 genome has a single 5 128 187-bp circular chromosome with no extrachromosomal elements and harbors 4 653 predicted protein-coding genes. SM-A87 produces a large amount of capsular polysaccharides and possesses two polysaccharide biosynthesis gene clusters. It has a total of 130 peptidases, 61 of which have signal peptides. In addition to extracellular peptidases, SM-A87 also has various extracellular enzymes for carbohydrate, lipid and DNA degradation. These extracellular enzymes suggest that the bacterium is able to hydrolyze organic materials in the sediment, especially carbohydrates and proteinaceous organic nitrogen. There are two clustered regularly interspaced short palindromic repeats in the genome, but their spacers do not match any sequences in the public sequence databases. SM-A87 is a moderate halophile. Our protein isoelectric point analysis indicates that extracellular proteins have lower predicted isoelectric points than intracellular proteins. SM-A87 accumulates organic osmolytes in the cell, so its extracellular proteins are more halophilic than its intracellular proteins. Conclusion Here, we present the first complete genome of a deep-sea sedimentary bacterium from the phylum Bacteroidetes. The genome analysis shows that SM-A87 has some common features of deep-sea bacteria, as well as an important capacity to hydrolyze sedimentary organic nitrogen.

  14. Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms.

    Science.gov (United States)

    Mattick, John S

    2003-10-01

    The central dogma of biology holds that genetic information normally flows from DNA to RNA to protein. As a consequence it has been generally assumed that genes generally code for proteins, and that proteins fulfil not only most structural and catalytic but also most regulatory functions, in all cells, from microbes to mammals. However, the latter may not be the case in complex organisms. A number of startling observations about the extent of non-protein-coding RNA (ncRNA) transcription in the higher eukaryotes and the range of genetic and epigenetic phenomena that are RNA-directed suggests that the traditional view of the structure of genetic regulatory systems in animals and plants may be incorrect. ncRNA dominates the genomic output of the higher organisms and has been shown to control chromosome architecture, mRNA turnover and the developmental timing of protein expression, and may also regulate transcription and alternative splicing. This paper re-examines the available evidence and suggests a new framework for considering and understanding the genomic programming of biological complexity, autopoietic development and phenotypic variation. Copyright 2003 Wiley Periodicals, Inc.

  15. Genomes to Proteomes

    Energy Technology Data Exchange (ETDEWEB)

    Panisko, Ellen A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Grigoriev, Igor [USDOE Joint Genome Inst., Walnut Creek, CA (United States); Daly, Don S. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Webb-Robertson, Bobbie-Jo [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Baker, Scott E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2009-03-01

    Biologists are awash with genomic sequence data. In large part, this is due to the rapid acceleration in the generation of DNA sequence that occurred as public and private research institutes raced to sequence the human genome. In parallel with the large human genome effort, mostly smaller genomes of other important model organisms were sequenced. Projects following on these initial efforts have made use of technological advances and the DNA sequencing infrastructure that was built for the human and other organism genome projects. As a result, the genome sequences of many organisms are available in high quality draft form. While in many ways this is good news, there are limitations to the biological insights that can be gleaned from DNA sequences alone; genome sequences offer only a bird's eye view of the biological processes endemic to an organism or community. Fortunately, the genome sequences now being produced at such a high rate can serve as the foundation for other global experimental platforms such as proteomics. Proteomic methods offer a snapshot of the proteins present at a point in time for a given biological sample. Current global proteomics methods combine enzymatic digestion, separations, mass spectrometry and database searching for peptide identification. One key aspect of proteomics is the prediction of peptide sequences from mass spectrometry data. Global proteomic analysis uses computational matching of experimental mass spectra with predicted spectra based on databases of gene models that are often generated computationally. Thus, the quality of gene models predicted from a genome sequence is crucial in the generation of high quality peptide identifications. Once peptides are identified they can be assigned to their parent protein. Proteins identified as expressed in a given experiment are most useful when compared to other expressed proteins in a larger biological context or biochemical pathway. In this chapter we will discuss the automatic

  16. Archaeal Genome Guardians Give Insights into Eukaryotic DNA Replication and Damage Response Proteins

    Directory of Open Access Journals (Sweden)

    David S. Shin

    2014-01-01

    Full Text Available As the third domain of life, archaea, like the eukarya and bacteria, must have robust DNA replication and repair complexes to ensure genome fidelity. Archaea moreover display a breadth of unique habitats and characteristics, and structural biologists increasingly appreciate these features. As archaea include extremophiles that can withstand diverse environmental stresses, they provide fundamental systems for understanding enzymes and pathways critical to genome integrity and stress responses. Such archaeal extremophiles provide critical data on the periodic table for life as well as on the biochemical, geochemical, and physical limitations to adaptive strategies allowing organisms to thrive under environmental stress relevant to determining the boundaries for life as we know it. Specifically, archaeal enzyme structures have informed the architecture and mechanisms of key DNA repair proteins and complexes. With added abilities to temperature-trap flexible complexes and reveal core domains of transient and dynamic complexes, these structures provide insights into mechanisms of maintaining genome integrity despite extreme environmental stress. The DNA damage response protein structures noted in this review therefore inform the basis for genome integrity in the face of environmental stress, with implications for all domains of life as well as for biomanufacturing, astrobiology, and medicine.

  17. ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

    Science.gov (United States)

    Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

    2010-03-01

    Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org
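
    The dynamic programming approach named in this abstract can be illustrated with a minimal Smith-Waterman local alignment. This is only a sketch under stated assumptions: the function name `smith_waterman`, the simple match/mismatch scores and the linear gap penalty are illustrative choices and are not taken from the ProteinWorldDB implementation, which would more plausibly use a protein substitution matrix. The function returns the score together with start and end coordinates, mirroring the kind of data (alignment coordinates and scores) that the abstract says the database stores.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of strings a and b (illustrative scoring, linear gaps).

    Returns (score, aligned_a, aligned_b, start, end), where start and end are
    (i, j) offsets into a and b delimiting the aligned region (end exclusive).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,                           # start a new local alignment
                H[i - 1][j - 1] + sub(i, j), # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,           # gap in b
                H[i][j - 1] + gap,           # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b)), (i, j), best_pos


if __name__ == "__main__":
    print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

    Because every cell of the score matrix is filled, the cost grows with the product of the two sequence lengths, which is why an all-against-all comparison of millions of sequences calls for grid-scale computing.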

  18. Sampling the genomic pool of protein tyrosine kinase genes using the polymerase chain reaction with genomic DNA.

    Science.gov (United States)

    Oates, A C; Wollberg, P; Achen, M G; Wilks, A F

    1998-08-28

    The polymerase chain reaction (PCR), with cDNA as template, has been widely used to identify members of protein families from many species. A major limitation of using cDNA in PCR is that detection of a family member is dependent on temporal and spatial patterns of gene expression. To circumvent this restriction, and in order to develop a technique that is broadly applicable, we have tested the use of genomic DNA as PCR template to identify members of protein families in an expression-independent manner. This test involved amplification of DNA encoding protein tyrosine kinase (PTK) genes from the genomes of three animal species that are well-known developmental models; namely, the mouse Mus musculus, the fruit fly Drosophila melanogaster, and the nematode worm Caenorhabditis elegans. Ten PTK genes were identified from the mouse, 13 from the fruit fly, and 13 from the nematode worm. Among these kinases were 13 members of the PTK family that had not been reported previously. Selected PTKs from this screen were shown to be expressed during development, demonstrating that the amplified fragments did not arise from pseudogenes. This approach will be useful for the identification of many novel members of gene families in organisms of agricultural, medical, developmental and evolutionary significance and for analysis of gene families from any species, or biological sample whose habitat precludes the isolation of mRNA. Furthermore, as a tool to hasten the discovery of members of gene families that are of particular interest, this method offers an opportunity to sample the genome for new members irrespective of their expression pattern.

  19. Universal features in the genome-level evolution of protein domains.

    Science.gov (United States)

    Cosentino Lagomarsino, Marco; Sellerio, Alessandro L; Heijning, Philip D; Bassetti, Bruno

    2009-01-01

    Protein domains can be used to study proteome evolution at a coarse scale. In particular, they are found on genomes with notable statistical distributions. It is known that the distribution of domains with a given topology follows a power law. We focus on a further aspect: these distributions, and the number of distinct topologies, follow collective trends, or scaling laws, depending on the total number of domains only, and not on genome-specific features. We present a stochastic duplication/innovation model, in the class of the so-called 'Chinese restaurant processes', that explains this observation with two universal parameters, representing a minimal number of domains and the relative weight of innovation to duplication. Furthermore, we study a model variant where new topologies are related to occurrence in genomic data, accounting for fold specificity. Both models have general quantitative agreement with data from hundreds of genomes, which indicates that the domains of a genome are built with a combination of specificity and robust self-organizing phenomena. The latter are related to the basic evolutionary 'moves' of duplication and innovation, and give rise to the observed scaling laws, a priori of the specific evolutionary history of a genome. We interpret this as the concurrent effect of neutral and selective drives, which increase duplication and decrease innovation in larger and more complex genomes. The validity of our model would imply that the empirical observation of a small number of folds in nature may be a consequence of their evolution.
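
    A duplication/innovation process of the Chinese restaurant type can be simulated in a few lines. The sketch below uses the standard two-parameter form of the process; the parameter names `alpha` and `theta`, the function name `simulate_crp` and the default values are assumptions made for illustration, and whether this matches the exact parameterization used by the authors is not confirmed by the abstract. Each added domain either founds a new class (innovation) or joins an existing class with a weight that grows with class size (duplication).

```python
import random


def simulate_crp(n_domains, alpha=0.3, theta=1.0, seed=0):
    """Simulate class sizes under a two-parameter Chinese restaurant process.

    Assumed (hypothetical) parameterization, with 0 <= alpha < 1 and theta > 0:
      P(new class)       = (theta + alpha * k) / (n + theta)
      P(join class i)    = (size_i - alpha)    / (n + theta)
    where n is the current number of domains and k the number of classes.
    """
    rng = random.Random(seed)
    class_sizes = []
    for n in range(n_domains):
        k = len(class_sizes)
        p_new = (theta + alpha * k) / (n + theta)
        if rng.random() < p_new:
            class_sizes.append(1)  # innovation: a new domain class appears
        else:
            # duplication: pick an existing class, favouring larger ones
            weights = [size - alpha for size in class_sizes]
            idx = rng.choices(range(k), weights=weights)[0]
            class_sizes[idx] += 1
    return class_sizes


if __name__ == "__main__":
    sizes = simulate_crp(1000)
    print(len(sizes), "classes; largest has", max(sizes), "domains")
```

    Running the simulation for increasing `n_domains` and recording the number of classes gives a curve that depends only on the total number of domains, which is the kind of collective trend the abstract describes.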

  20. Viral Genome DataBase: storing and analyzing genes and proteins from complete viral genomes.

    Science.gov (United States)

    Hiscock, D; Upton, C

    2000-05-01

    The Viral Genome DataBase (VGDB) contains detailed information on the genes and predicted protein sequences from 15 completely sequenced genomes of large (>100 kb) viruses (2847 genes). The stored data include DNA sequence, protein sequence, GenBank and user-entered notes, molecular weight (MW), isoelectric point (pI), amino acid content, A + T%, nucleotide frequency, dinucleotide frequency and codon use. The VGDB is a MySQL database with a user-friendly Java GUI. Results of queries can be easily sorted by any of the individual parameters. The software and additional figures and information are available at http://athena.bioc.uvic.ca/genomes/index.html .

  1. Building a complete image of genome regulation in the model organism Escherichia coli.

    Science.gov (United States)

    Ishihama, Akira

    2018-01-15

    The model organism Escherichia coli contains a total of more than 4,500 genes, but the total number of RNA polymerase (RNAP) core enzyme, or the transcriptase, is only about 2,000 molecules per genome. The regulatory targets of RNAP are, however, modulated by changing its promoter selectivity through two steps of protein-protein interplay: with 7 species of the sigma factor in the first step, and then with 300 species of the transcription factor (TF) in the second step. Scientists working in the field of prokaryotic transcription in Japan have made considerable contributions to the elucidation of genetic frameworks and regulatory modes of genome transcription in E. coli K-12. This review summarizes the findings by this group, first focusing on three sigma factors, the stationary-phase sigma RpoS, the heat-shock sigma RpoH, and the flagellar-chemotaxis sigma RpoF, as examples. It also presents an overview of the current state of the systematic research being carried out to identify the regulatory functions of all TFs from one and the same bacterium, E. coli K-12, using the genomic SELEX and PS-TF screening systems. All these studies have been undertaken with the aim of understanding the genome regulation in E. coli K-12 as a whole.

  2. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information.

    Science.gov (United States)

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization, and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have attracted much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteinopathies etc. Early identification of proteins in the whole human genome that retain a tendency to domain swap will enable many aspects of disease control management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% of sensitivity, 87.5% of specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from the human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to infer a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids.

  3. Genome, secretome and glucose transport highlight unique features of the protein production host Pichia pastoris

    Directory of Open Access Journals (Sweden)

    Mattanovich Diethard

    2009-06-01

    Background: Pichia pastoris is widely used as a production platform for heterologous proteins and as a model organism for organelle proliferation. Without a published genome sequence available, strain and process development relied mainly on analogies to other, well-studied yeasts like Saccharomyces cerevisiae. Results: To investigate specific features of growth and protein secretion, we have sequenced the 9.4 Mb genome of the type strain DSMZ 70382 and analyzed the secretome and the sugar transporters. The computationally predicted secretome consists of 88 ORFs. When grown on glucose, only 20 proteins were actually secreted at detectable levels. These data highlight one major feature of P. pastoris, namely the low contamination of heterologous proteins with host cell protein when applying glucose-based expression systems. Putative sugar transporters were identified and compared to those of related yeast species. The genome comprises 2 homologs to S. cerevisiae low affinity transporters and 2 to high affinity transporters of other Crabtree negative yeasts. Contrary to other yeasts, P. pastoris possesses 4 H+/glycerol transporters. Conclusion: This work highlights significant advantages of using the P. pastoris system with glucose-based expression and fermentation strategies. As only a few proteins and no proteases are actually secreted on glucose, it becomes evident that cell lysis is the relevant cause of proteolytic degradation of secreted proteins. The endowment with hexose transporters, dominantly of the high affinity type, limits glucose uptake rates and thus overflow metabolism as observed in S. cerevisiae. The presence of 4 genes for glycerol transporters explains the high specific growth rates on this substrate and underlines the suitability of a glycerol/glucose-based fermentation strategy. Furthermore, we present an open access web-based genome browser http://www.pichiagenome.org.

  4. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    Science.gov (United States)

    Fu, Jianmin; Liu, Huimin; Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng

    2016-01-01

    Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.

  5. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    Directory of Open Access Journals (Sweden)

    Jianmin Fu

    Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.

  6. HIV Genome-Wide Protein Associations: a Review of 30 Years of Research

    Science.gov (United States)

    2016-01-01

    SUMMARY The HIV genome encodes a small number of viral proteins (i.e., 16), invariably establishing cooperative associations among HIV proteins and between HIV and host proteins, to invade host cells and hijack their internal machineries. As a known example, the HIV envelope glycoprotein GP120 is closely associated with GP41 for viral entry. From a genome-wide perspective, a hypothesis can be worked out to determine whether 16 HIV proteins could develop 120 possible pairwise associations either by physical interactions or by functional associations mediated via HIV or host molecules. Here, we present the first systematic review of experimental evidence on HIV genome-wide protein associations using a large body of publications accumulated over the past 3 decades. Of 120 possible pairwise associations between 16 HIV proteins, at least 34 physical interactions and 17 functional associations have been identified. To achieve efficient viral replication and infection, HIV protein associations play essential roles (e.g., cleavage, inhibition, and activation) during the HIV life cycle. In either a dispensable or an indispensable manner, each HIV protein collaborates with another viral protein to accomplish specific activities that precisely take place at the proper stages of the HIV life cycle. In addition, HIV genome-wide protein associations have an impact on anti-HIV inhibitors due to the extensive cross talk between drug-inhibited proteins and other HIV proteins. Overall, this study presents for the first time a comprehensive overview of HIV genome-wide protein associations, highlighting meticulous collaborations between all viral proteins during the HIV life cycle. PMID:27357278
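
    The figure of 120 possible pairwise associations follows from counting unordered pairs among 16 proteins, C(16, 2) = 16 × 15 / 2 = 120. The short sketch below reproduces that count and the share of pairs with documented evidence. It uses placeholder indices rather than protein names, and it assumes, as a simplification not stated in the abstract, that the 34 physical interactions and 17 functional associations refer to distinct pairs.

```python
from itertools import combinations
from math import comb

n_proteins = 16  # number of HIV proteins stated in the abstract

# Enumerate every unordered pair of proteins (placeholder indices 0..15).
pairs = list(combinations(range(n_proteins), 2))
possible = len(pairs)

# Counts reported in the abstract ("at least" 34 and 17).
physical = 34
functional = 17

# Assumption: the two categories do not overlap, so this is a rough lower bound.
documented = physical + functional
fraction = documented / possible

print(possible, comb(n_proteins, 2), documented, round(fraction, 3))
```

    Under that assumption, the documented associations cover 51 of the 120 possible pairs.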

  7. Filtering high-throughput protein-protein interaction data using a combination of genomic features

    Directory of Open Access Journals (Sweden)

    Patil Ashwini

    2005-04-01

    Background: Protein-protein interaction data used in the creation or prediction of molecular networks is usually obtained from large scale or high-throughput experiments. This experimental data is liable to contain a large number of spurious interactions. Hence, there is a need to validate the interactions and filter out the incorrect data before using them in prediction studies. Results: In this study, we use a combination of 3 genomic features (structurally known interacting Pfam domains, Gene Ontology annotations and sequence homology) as a means to assign reliability to the protein-protein interactions in Saccharomyces cerevisiae determined by high-throughput experiments. Using Bayesian network approaches, we show that protein-protein interactions from high-throughput data supported by one or more genomic features have a higher likelihood ratio and hence are more likely to be real interactions. Our method has a high sensitivity (90%) and good specificity (63%). We show that 56% of the interactions from high-throughput experiments in Saccharomyces cerevisiae have high reliability. We use the method to estimate the number of true interactions in the high-throughput protein-protein interaction data sets in Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens to be 27%, 18% and 68% respectively. Our results are available for searching and downloading at http://helix.protein.osaka-u.ac.jp/htp/. Conclusion: A combination of genomic features that include sequence, structure and annotation information is a good predictor of true interactions in large and noisy high-throughput data sets. The method has a very high sensitivity and good specificity and can be used to assign a likelihood ratio, corresponding to the reliability, to each interaction.

  8. Deorphanizing the human transmembrane genome: A landscape of uncharacterized membrane proteins.

    Science.gov (United States)

    Babcock, Joseph J; Li, Min

    2014-01-01

    The sequencing of the human genome has fueled the last decade of work to functionally characterize genome content. An important subset of genes encodes membrane proteins, which are the targets of many drugs. They reside in lipid bilayers, restricting their endogenous activity to a relatively specialized biochemical environment. Without a reference phenotype, the application of systematic screens to profile candidate membrane proteins is not immediately possible. Bioinformatics has begun to show its effectiveness in focusing the functional characterization of orphan proteins of a particular functional class, such as channels or receptors. Here we discuss integration of experimental and bioinformatics approaches for characterizing the orphan membrane proteome. By analyzing the human genome, a landscape reference for the human transmembrane genome is provided.

  9. Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence-Function Space and Genome Context to Discover Novel Functions.

    Science.gov (United States)

    Gerlt, John A

    2017-08-22

    The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.

  10. Comparative Genomics and Disorder Prediction Identify Biologically Relevant SH3 Protein Interactions.

    Directory of Open Access Journals (Sweden)

    2005-08-01

    Protein interaction networks are an important part of the post-genomic effort to integrate a part-list view of the cell into system-level understanding. Using a set of 11 yeast genomes we show that combining comparative genomics and secondary structure information greatly increases consensus-based prediction of SH3 targets. Benchmarking of our method against positive and negative standards gave 83% accuracy with 26% coverage. The concept of an optimal divergence time for effective comparative genomics studies was analyzed, demonstrating that genomes of species that diverged very recently from Saccharomyces cerevisiae (S. mikatae, S. bayanus, and S. paradoxus), or a long time ago (Neurospora crassa and Schizosaccharomyces pombe), contain less information for accurate prediction of SH3 targets than species within the optimal divergence time proposed. We also show here that intrinsically disordered SH3 domain targets are more probable sites of interaction than equivalent sites within ordered regions. Our findings highlight several novel S. cerevisiae SH3 protein interactions, the value of selection of optimal divergence times in comparative genomics studies, and the importance of intrinsic disorder for protein interactions. Based on our results we propose novel roles for the S. cerevisiae proteins Abp1p in endocytosis and Hse1p in endosome protein sorting.

  11. Comparative genomics and disorder prediction identify biologically relevant SH3 protein interactions.

    Directory of Open Access Journals (Sweden)

    Pedro Beltrao

    2005-08-01

    Protein interaction networks are an important part of the post-genomic effort to integrate a part-list view of the cell into system-level understanding. Using a set of 11 yeast genomes we show that combining comparative genomics and secondary structure information greatly increases consensus-based prediction of SH3 targets. Benchmarking of our method against positive and negative standards gave 83% accuracy with 26% coverage. The concept of an optimal divergence time for effective comparative genomics studies was analyzed, demonstrating that genomes of species that diverged very recently from Saccharomyces cerevisiae (S. mikatae, S. bayanus, and S. paradoxus), or a long time ago (Neurospora crassa and Schizosaccharomyces pombe), contain less information for accurate prediction of SH3 targets than species within the optimal divergence time proposed. We also show here that intrinsically disordered SH3 domain targets are more probable sites of interaction than equivalent sites within ordered regions. Our findings highlight several novel S. cerevisiae SH3 protein interactions, the value of selection of optimal divergence times in comparative genomics studies, and the importance of intrinsic disorder for protein interactions. Based on our results we propose novel roles for the S. cerevisiae proteins Abp1p in endocytosis and Hse1p in endosome protein sorting.

  12. Impact of nuclear organization and chromatin structure on DNA repair and genome stability

    International Nuclear Information System (INIS)

    Batte, Amandine

    2016-01-01

    The non-random organization of the eukaryotic cell nucleus and the folding of the genome into more or less condensed chromatin can influence many functions related to DNA metabolism, including genome stability. Double-strand breaks (DSBs) are the most deleterious form of DNA damage for cells. To preserve genome integrity, eukaryotic cells have thus developed DSB repair mechanisms conserved from yeast to human, among them homologous recombination (HR), which uses an intact homologous sequence to repair a broken chromosome. HR can be separated into two sub-pathways: Gene Conversion (GC) transfers genetic information from one molecule to its homolog, and Break Induced Replication (BIR) establishes a replication fork that can proceed until the chromosome end. My doctoral work focused on the contribution of the chromatin context and 3D genome organization to DSB repair. In S. cerevisiae, nuclear organization and heterochromatin spreading at sub-telomeres can be modified through the overexpression of the Sir3 or sir3A2Q mutant proteins. We demonstrated that reducing the physical distance between homologous sequences increased GC rates, reinforcing the notion that homology search is a limiting step for recombination. We also showed that heterochromatinization of the DSB site fine-tunes DSB resection, limiting the loss of the DSB ends required to perform homology search and complete HR. Finally, we noticed that the presence of heterochromatin at the donor locus decreased both GC and BIR efficiencies, probably by affecting strand invasion. This work highlights new regulatory pathways of DNA repair. (author)

  13. Organization of plastid genomes in the freshwater red algal order Batrachospermales (Rhodophyta).

    Science.gov (United States)

    Paiano, Monica Orlandi; Del Cortona, Andrea; Costa, Joana F; Liu, Shao-Lun; Verbruggen, Heroen; De Clerck, Olivier; Necchi, Orlando

    2018-02-01

    Little is known about genome organization in members of the order Batrachospermales, and the infra-ordinal relationship remains unresolved. Plastid (cp) genomes of seven members of the freshwater red algal order Batrachospermales were sequenced, with the following aims: (i) to describe the characteristics of cp genomes and compare these with other red algal groups; (ii) to infer the phylogenetic relationships among these members to better understand the infra-ordinal classification. Cp genomes of Batrachospermales are large, with several cases of gene loss; they are gene-dense (high gene content for the genome size and short intergenic regions) and have highly conserved gene order. Phylogenetic analyses based on concatenated nucleotide genome data roughly support the current taxonomic system for the order. Comparative analyses confirm data for members of the class Florideophyceae that cp genomes in Batrachospermales are highly conserved, with little variation in gene composition. However, relevant new features were revealed in our study: genome sizes in members of Batrachospermales are close to the lowest values reported for Florideophyceae; differences in cp genome size within the order are large in comparison with other orders (Ceramiales, Gelidiales, Gracilariales, Hildenbrandiales, and Nemaliales); and members of Batrachospermales have the lowest number of protein-coding genes among the Florideophyceae. In terms of gene loss, apcF, which encodes the allophycocyanin beta subunit, is absent in all sequenced taxa of Batrachospermales. We reinforce that the interordinal relationships between the freshwater orders Batrachospermales and Thoreales within the Nemaliophycidae are not well resolved due to limited taxon sampling.

  14. Human Ro60 (SSA2) genomic organization and sequence alterations, examined in cutaneous lupus erythematosus.

    Science.gov (United States)

    Millard, T P; Ashton, G H S; Kondeatis, E; Vaughan, R W; Hughes, G R V; Khamashta, M A; Hawk, J L M; McGregor, J M; McGrath, J A

    2002-02-01

    The Ro 60 kDa protein (Ro60 or SSA2) is the major component of the Ro ribonucleoprotein (Ro RNP) complex, to which an immune response is a specific feature of several autoimmune diseases. The genomic organization and any sequence variation within the DNA encoding Ro60 are unknown. The aims of this study were to characterize the Ro60 gene structure and to assess whether any sequence alterations might be associated with serum anti-Ro antibody in subacute cutaneous lupus erythematosus (SCLE), thus potentially providing new insight into disease pathogenesis. The cDNA sequence for Ro60 was obtained from the NCBI database and used for a BLAST search for a clone containing the entire genomic sequence. The intron-exon borders were confirmed by designing intronic primer pairs to flank each exon, which were then used to amplify genomic DNA for automated sequencing from 36 Caucasian patients with SCLE (anti-Ro positive) and 49 with discoid LE (DLE, anti-Ro negative), in addition to 36 healthy Caucasian controls. Heteroduplex analysis of polymerase chain reaction (PCR) products from patients and controls spanning all Ro60 exons (1-8) revealed a common bandshift in the PCR products spanning exon 7. Sequencing of the corresponding PCR products demonstrated an A > G substitution at nucleotide position 1318-7, within the consensus acceptor splice site of exon 7 (GenBank XM001901). The allele frequencies were major allele A (0.71) and minor allele G (0.29) in 72 control chromosomes, with no significant differences found between SCLE patients, DLE patients and controls. The genomic organization of the DNA encoding the Ro60 protein is described, including a common polymorphism within the consensus acceptor splice site of exon 7. Our delineation of a strategy for the genomic amplification of Ro60 forms a basis for further examination of the pathological functions of the Ro RNP in autoimmune disease.

  15. The three-dimensional genome organization of Drosophila melanogaster through data integration.

    Science.gov (United States)

    Li, Qingjiao; Tjong, Harianto; Li, Xiao; Gong, Ke; Zhou, Xianghong Jasmine; Chiolo, Irene; Alber, Frank

    2017-07-31

    Genome structures are dynamic and non-randomly organized in the nucleus of higher eukaryotes. To maximize the accuracy and coverage of three-dimensional genome structural models, it is important to integrate all available sources of experimental information about a genome's organization. It remains a major challenge to integrate such data from various complementary experimental methods. Here, we present an approach for data integration to determine a population of complete three-dimensional genome structures that are statistically consistent with data from both genome-wide chromosome conformation capture (Hi-C) and lamina-DamID experiments. Our structures resolve the genome at the resolution of topological domains, and reproduce simultaneously both sets of experimental data. Importantly, this data deconvolution framework allows for structural heterogeneity between cells, and hence accounts for the expected plasticity of genome structures. As a case study we choose Drosophila melanogaster embryonic cells, for which both data types are available. Our three-dimensional genome structures have strong predictive power for structural features not directly visible in the initial data sets, and reproduce experimental hallmarks of the D. melanogaster genome organization from independent and our own imaging experiments. Also they reveal a number of new insights about genome organization and its functional relevance, including the preferred locations of heterochromatic satellites of different chromosomes, and observations about homologous pairing that cannot be directly observed in the original Hi-C or lamina-DamID data. Our approach allows systematic integration of Hi-C and lamina-DamID data for complete three-dimensional genome structure calculation, while also explicitly considering genome structural variability.

  16. Genome wide binding (ChIP-Seq) of murine Bapx1 and Sox9 proteins in vivo and in vitro

    Directory of Open Access Journals (Sweden)

    Sumantra Chatterjee

    2016-12-01

    This work pertains to GEO submission GSE36672, in vivo and in vitro genome wide binding (ChIP-Seq) of Bapx1/Nkx3.2 and Sox9 proteins. We have previously shown that data from a genome wide binding assay combined with transcriptional profiling is an insightful means to divulge the mechanisms directing cell type specification and the generation of tissues and subsequent organs [1]. Our earlier work identified the role of the DNA-binding homeodomain containing protein Bapx1/Nkx3.2 in midgestation murine embryos. Microarray analysis of EGFP-tagged cells (both wildtype and null) was integrated using ChIP-Seq analysis of Bapx1/Nkx3.2 and Sox9 DNA-binding proteins in living tissue.

  17. Bioinformatics analysis of disordered proteins in prokaryotes

    Directory of Open Access Journals (Sweden)

    Malkov Saša N

    2011-03-01

    Background: A significant number of proteins have been shown to be intrinsically disordered, meaning that they lack a fixed 3D structure or contain regions that do not possess a well-defined 3D structure. It has also been proven that a protein's disorder content is related to its function. We have performed an exhaustive analysis and comparison of the disorder content of proteins from prokaryotic organisms (i.e., superkingdoms Archaea and Bacteria) with respect to the functional categories they belong to, i.e., Clusters of Orthologous Groups of proteins (COGs) and groups of COGs: Cellular processes (Cp), Information storage and processing (Isp), Metabolism (Me) and Poorly characterized (Pc). We also analyzed the disorder content of proteins with respect to various genomic, metabolic and ecological characteristics of the organism they belong to. We used correlations and association rule mining in order to identify the most confident associations between specific modalities of the characteristics considered and disorder content. Results: Bacteria are shown to have a somewhat higher level of protein disorder than archaea, except for proteins in the Me functional group. It is demonstrated that the Isp and Cp functional groups in particular (L-repair function and N-cell motility and secretion COGs of proteins in specific) possess the highest disorder content, while Me proteins, in general, possess the lowest. Disorder fractions have been confirmed to have the lowest level for the so-called order-promoting amino acids and the highest level for the so-called disorder promoters. For each pair of organism characteristics, specific modalities are identified with the maximum disorder proteins in the corresponding organisms, e.g., high genome size-high GC content organisms, facultative anaerobic-low GC content organisms, aerobic-high genome size organisms, etc.
Maximum disorder in archaea is observed for high GC content-low genome size organisms, high GC content

  18. Coordination of genomic structure and transcription by the main bacterial nucleoid-associated protein HU

    Science.gov (United States)

    Berger, Michael; Farcas, Anca; Geertz, Marcel; Zhelyazkova, Petya; Brix, Klaudia; Travers, Andrew; Muskhelishvili, Georgi

    2010-01-01

    The histone-like protein HU is a highly abundant DNA architectural protein that is involved in compacting the DNA of the bacterial nucleoid and in regulating the main DNA transactions, including gene transcription. However, the coordination of the genomic structure and function by HU is poorly understood. Here, we address this question by comparing transcript patterns and spatial distributions of RNA polymerase in Escherichia coli wild-type and hupA/B mutant cells. We demonstrate that, in mutant cells, upregulated genes are preferentially clustered in a large chromosomal domain comprising the ribosomal RNA operons organized on both sides of OriC. Furthermore, we show that, in parallel to this transcription asymmetry, mutant cells are also impaired in forming the transcription foci—spatially confined aggregations of RNA polymerase molecules transcribing strong ribosomal RNA operons. Our data thus implicate HU in coordinating the global genomic structure and function by regulating the spatial distribution of RNA polymerase in the nucleoid. PMID:20010798

  19. High throughput platforms for structural genomics of integral membrane proteins.

    Science.gov (United States)

    Mancia, Filippo; Love, James

    2011-08-01

    Structural genomics approaches for integral membrane proteins have been postulated for over a decade, yet specific efforts are lagging years behind those for their soluble counterparts. Indeed, high throughput methodologies for production and characterization of prokaryotic integral membrane proteins are only now emerging, while large-scale efforts for eukaryotic ones are still in their infancy. Presented here is a review of recent literature on ongoing structural genomics initiatives for membrane proteins, with a focus on those implementing interesting techniques aimed at increasing our rate of success for this class of macromolecules. Copyright © 2011 Elsevier Ltd. All rights reserved.

  20. MIPS plant genome information resources.

    Science.gov (United States)

    Spannagl, Manuel; Haberer, Georg; Ernst, Rebecca; Schoof, Heiko; Mayer, Klaus F X

    2007-01-01

    The Munich Institute for Protein Sequences (MIPS) has been involved in maintaining plant genome databases since the Arabidopsis thaliana genome project. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable data sets for model plant genomes as a backbone against which experimental data, for example from high-throughput functional genomics, can be organized and evaluated. In addition, model genomes also form a scaffold for comparative genomics, and much can be learned from genome-wide evolutionary studies.

  1. The complete nucleotide sequence, genome organization, and origin of human adenovirus type 11

    International Nuclear Information System (INIS)

    Stone, Daniel; Furthmann, Anne; Sandig, Volker; Lieber, Andre

    2003-01-01

    The complete DNA sequence and transcription map of human adenovirus type 11 are reported here. This is the first published sequence for a subgenus B human adenovirus and demonstrates a genome organization highly similar to those of other human adenoviruses. All of the genes from the early, intermediate, and late regions are present in the expected locations of the genome for a human adenovirus. The genome is 34,794 bp in length and has a GC content of 48.9%. Sequence alignment with genomes of groups A (Ad12), C (Ad5), D (Ad17), E (Simian adenovirus 25), and F (Ad40) revealed homologies of 64, 54, 68, 75, and 52%, respectively. Detailed genomic analysis demonstrated that Ads 11 and 35 are highly conserved in all areas except the hexon hypervariable regions and fiber. Similarly, comparison of Ad11 with subgroup E SAV25 revealed poor homology between fibers but high homology in proteins encoded by all other areas of the genome. We propose an evolutionary model in which functional viruses can be reconstituted following fiber substitution from one serotype to another. According to this model, either the Ad11 genome is a derivative of Ad35, from which the fiber was substituted with Ad7, or the Ad35 genome is the product of a fiber substitution from Ad21 into the Ad11 genome. This model also provides a possible explanation for the origin of group E Ads, which are evolutionarily derived from a group C fiber substitution into a group B genome.

  2. Novel rod-shaped viruses isolated from garlic, Allium sativum, possessing a unique genome organization.

    Science.gov (United States)

    Sumi, S; Tsuneyoshi, T; Furutani, H

    1993-09-01

    Rod-shaped flexuous viruses were partially purified from garlic plants (Allium sativum) showing typical mosaic symptoms. The genome was shown to be composed of polyadenylated RNA with an estimated size of 10 kb, as determined by denaturing agarose gel electrophoresis. We constructed cDNA libraries and screened four independent clones, which were designated GV-A, GV-B, GV-C and GV-D, using Northern and Southern blot hybridization. Nucleotide sequence determination of the cDNAs, two of which correspond to nearly one-third of the virus genomic RNA, shows that all of these viruses possess an identical genomic structure and that at least four proteins are encoded in the viral cDNA, with estimated M(r)s of 15K, 27K, 40K and 11K. The 15K open reading frame (ORF) encodes the core-like sequence of a zinc finger protein preceded by a cluster of basic amino acid residues. The 27K ORF probably encodes the viral coat protein (CP), based on both the existence of some conserved sequences observed in many other rod-shaped or flexuous virus CPs and an overall amino acid sequence similarity to potexvirus and carlavirus CPs. The 11K ORF shows significant amino acid sequence similarities to the corresponding 12K proteins of the potexviruses and carlaviruses. On the other hand, the 40K ORF product does not resemble any other plant virus gene products reported so far. The genomic organization in the 3' region of the garlic viruses resembles, but clearly differs from, that of carlaviruses. Phylogenetic analysis based upon the amino acid sequence of the viral capsid protein also indicates that the garlic viruses have a unique and distinct domain different from those of the potexvirus and carlavirus groups. The results suggest that the garlic viruses described here belong to an unclassified and new virus group closely related to the carlaviruses.

  3. Genome-wide analysis of protein-protein interactions and involvement of viral proteins in SARS-CoV replication.

    Directory of Open Access Journals (Sweden)

    Ji'an Pan

    Full Text Available Analyses of viral protein-protein interactions are an important step in understanding viral protein functions and their underlying molecular mechanisms. In this study, we adopted a mammalian two-hybrid system to screen the genome-wide intraviral protein-protein interactions of SARS coronavirus (SARS-CoV) and thereby revealed a number of novel interactions which could be partly confirmed by in vitro biochemical assays. Three pairs of the interactions identified were detected in both directions: non-structural protein (nsp) 10 and nsp14, nsp10 and nsp16, and nsp7 and nsp8. The interactions between the multifunctional nsp10 and nsp14 or nsp16, which are unique proteins found in members of Nidovirales with large RNA genomes, including coronaviruses and toroviruses, may have important implications for the mechanisms of replication/transcription complex assembly and functions of these viruses. Using a SARS-CoV replicon expressing a luciferase reporter under the control of a transcription regulating sequence, it has been shown that several viral proteins (N, X and SUD domains of nsp3, and nsp12) provided in trans stimulated the replicon reporter activity, indicating that these proteins may regulate coronavirus replication and transcription. Collectively, our findings provide a basis and platform for further characterization of the functions and mechanisms of coronavirus proteins.

  4. PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide applications.

    Science.gov (United States)

    Pasquier, C; Promponas, V J; Hamodrakas, S J

    2001-08-15

    A cascading system of hierarchical, artificial neural networks (named PRED-CLASS) is presented for the generalized classification of proteins into four distinct classes (transmembrane, fibrous, globular, and mixed) from information solely encoded in their amino acid sequences. The architecture of the individual component networks is kept very simple, reducing the number of free parameters (network synaptic weights) for faster training, improved generalization, and the avoidance of data overfitting. Capturing information from as few as 50 protein sequences spread among the four target classes (6 transmembrane, 10 fibrous, 13 globular, and 17 mixed), PRED-CLASS was able to obtain 371 correct predictions out of a set of 387 proteins (success rate approximately 96%) unambiguously assigned into one of the target classes. The application of PRED-CLASS to several test sets and complete proteomes of several organisms demonstrates that such a method could serve as a valuable tool in the annotation of genomic open reading frames with no functional assignment or as a preliminary step in fold recognition and ab initio structure prediction methods. Detailed results obtained for various data sets and completed genomes, along with a web server running the PRED-CLASS algorithm, can be accessed over the World Wide Web at http://o2.biol.uoa.gr/PRED-CLASS.

  5. ProteinSplit: splitting of multi-domain proteins using prediction of ordered and disordered regions in protein sequences for virtual structural genomics

    International Nuclear Information System (INIS)

    Wyrwicz, Lucjan S; Koczyk, Grzegorz; Rychlewski, Leszek; Plewczynski, Dariusz

    2007-01-01

    The annotation of protein folds within newly sequenced genomes is the main target for semi-automated protein structure prediction (virtual structural genomics). A large number of automated methods have been developed recently with very good results in the case of single-domain proteins. Unfortunately, these automated methods often fail to properly predict the distant homology between a given multi-domain protein query and structural templates. Therefore a multi-domain protein should be split into domains in order to overcome this limitation. ProteinSplit is designed to identify protein domain boundaries using a novel algorithm that predicts disordered regions in protein sequences. The software utilizes various sequence characteristics to assess the local propensity of a protein to be disordered or ordered in terms of local structure stability. These disordered parts of a protein are likely to create interdomain spacers. Because of its speed and portability, the method was successfully applied to several genome-wide fold annotation experiments. The user can run an automated analysis of sets of proteins or perform semi-automated multiple user projects (saving the results on the server). Additionally, the sequences of predicted domains can be sent to the Bioinfo.PL Protein Structure Prediction Meta-Server for further protein three-dimensional structure and function prediction. The program is freely accessible as a web service at http://lucjan.bioinfo.pl/proteinsplit together with detailed benchmark results on the critical assessment of a fully automated structure prediction (CAFASP) set of sequences. The source code of the local version of protein domain boundary prediction is available upon request from the authors.
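
    The idea of cutting a multi-domain protein at predicted disordered spacers can be illustrated with a minimal sketch. It is not the ProteinSplit algorithm, whose details the abstract does not give. It assumes that per-residue disorder scores are already available from some predictor; the function name split_at_disorder, the threshold of 0.5 and the minimum spacer length of 3 are hypothetical choices made for illustration.

```python
def split_at_disorder(sequence, scores, threshold=0.5, min_linker=3):
    """Cut a sequence at runs of at least min_linker residues scoring above threshold.

    Hypothetical helper: the scores are assumed to come from an external
    disorder predictor; shorter disordered runs are left inside their domain.
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")

    domains, start, i = [], 0, 0
    n = len(sequence)
    while i < n:
        if scores[i] > threshold:
            # Extend j to the end of the current disordered run.
            j = i
            while j < n and scores[j] > threshold:
                j += 1
            if j - i >= min_linker:
                # Long enough to be treated as an interdomain spacer.
                if i > start:
                    domains.append(sequence[start:i])
                start = j
            i = j
        else:
            i += 1
    if start < n:
        domains.append(sequence[start:])
    return domains


# Toy input, invented for illustration: two ordered blocks and one spacer.
toy_sequence = "AAAAGSGSGBBBB"
toy_scores = [0.1] * 4 + [0.9] * 5 + [0.2] * 4
toy_domains = split_at_disorder(toy_sequence, toy_scores)
```

    In this toy case the five-residue spacer is removed and the two flanking segments are returned as candidate domains, each of which could then be submitted separately to a fold-recognition method.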

  6. Use of the Operon Structure of the C. elegans Genome as a Tool to Identify Functionally Related Proteins

    Directory of Open Access Journals (Sweden)

    Silvia Dossena

    2013-12-01

    Full Text Available One of the most pressing challenges in the post genomic era is the identification and characterization of protein-protein interactions (PPIs), as these are essential in understanding the cellular physiology of health and disease. Experimental techniques suitable for characterizing PPIs (X-ray crystallography or nuclear magnetic resonance spectroscopy, among others) are usually laborious, time-consuming and often difficult to apply to membrane proteins, and therefore require accurate prediction of the candidate interacting partners. High-throughput experimental methods (yeast two-hybrid and affinity purification) succumb to the same shortcomings, and can also lead to high rates of false positive and negative results. Therefore, reliable tools for predicting PPIs are needed. The use of the operon structure in the eukaryote Caenorhabditis elegans genome is a valuable, though underserved, tool for identifying physically or functionally interacting proteins. Based on the concept that genes organized in the same operon may encode physically or functionally related proteins, this approach is easy to apply and, importantly, gives a limited number of candidate partners of a given protein, allowing for focused experimental verification. Moreover, this approach can be successfully used to predict PPIs in the human system, including those of membrane proteins.

  7. Detailed analysis of putative genes encoding small proteins in legume genomes

    Directory of Open Access Journals (Sweden)

    Gabriel Guillén

    2013-06-01

    Full Text Available Diverse plant genome sequencing projects coupled with powerful bioinformatics tools have facilitated massive data analysis to construct specialized databases classified according to cellular function. However, there are still a considerable number of genes encoding proteins whose function has not yet been characterized. Included in this category are small proteins (SPs, 30-150 amino acids) encoded by short open reading frames (sORFs). SPs play important roles in plant physiology, growth, and development. Unfortunately, protocols focused on the genome-wide identification and characterization of sORFs are scarce or remain poorly implemented. As a result, these genes are underrepresented in many genome annotations. In this work, we exploited publicly available genome sequences of Phaseolus vulgaris, Medicago truncatula, Glycine max and Lotus japonicus to analyze the abundance of annotated SPs in plant legumes. Our strategy to uncover bona fide sORFs at the genome level was centered on bioinformatics analysis of characteristics such as evidence of expression (transcription), presence of known protein regions or domains, and identification of orthologous genes in the genomes explored. We collected 6170, 10461, 30521, and 23599 putative sORFs from P. vulgaris, G. max, M. truncatula, and L. japonicus genomes, respectively. Expressed sequence tags (ESTs) available in the DFCI Gene Index database provided evidence that approximately one-third of the predicted legume sORFs are expressed. Most potential SPs have a counterpart in a different plant species and counterpart regions or domains in larger proteins. Potential functional sORFs were also classified according to a reduced set of GO categories, and the expression of 13 of them during P. vulgaris nodule ontogeny was confirmed by qPCR. This analysis provides a collection of sORFs that potentially encode meaningful SPs, and offers the possibility of their further functional evaluation.

  8. MIPS: a database for protein sequences, homology data and yeast genome information.

    Science.gov (United States)

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database. MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome, the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser'. A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  9. Long-Range Order and Fractality in the Structure and Organization of Eukaryotic Genomes

    Science.gov (United States)

    Polychronopoulos, Dimitris; Tsiagkas, Giannis; Athanasopoulou, Labrini; Sellis, Diamantis; Almirantis, Yannis

    2014-12-01

    The late Professor J.S. Nicolis always emphasized, both in his writings and in presentations and discussions with students and friends, the relevance of a dynamical systems approach to biology. In particular, viewing the genome as a "biological text" captures the dynamical character of both the evolution and function of the organisms in the form of correlations indicating the presence of a long-range order. This genomic structure can be expressed in forms reminiscent of natural languages and several temporal and spatial traces left by the functioning of dynamical systems: Zipf laws, self-similarity and fractality. Here we review several works of our group and recent unpublished results, focusing on the chromosomal distribution of biologically active genomic components: genes and protein-coding segments, CpG islands, transposable elements belonging to all major classes and several types of conserved non-coding genomic elements. We report the systematic appearance of power-laws in the size distribution of the distances between elements belonging to each of these types of functional genomic elements. Moreover, fractality is also found in several cases, using box-counting and entropic scaling. We present here, for the first time in a unified way, an aggregative model of the genomic dynamics which can explain the observed patterns on the grounds of known phenomena accompanying genome evolution. Our results comply with recent findings about a "fractal globule" geometry of chromatin in the eukaryotic nucleus.

  10. SECOM: A novel hash seed and community detection based-approach for genome-scale protein domain identification

    KAUST Repository

    Fan, Ming

    2012-06-28

    With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatases (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx. © 2012 Fan et al.

  11. SECOM: A novel hash seed and community detection based-approach for genome-scale protein domain identification

    KAUST Repository

    Fan, Ming; Wong, Ka-Chun; Ryu, Tae Woo; Ravasi, Timothy; Gao, Xin

    2012-01-01

    With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatases (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx. © 2012 Fan et al.
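
    The indexing and graph-building steps described in the SECOM abstract can be sketched roughly as follows. This is a simplified illustration, not SECOM's code: plain contiguous k-mers stand in for the hash seed function, whose actual definition is not given here, and the names kmer_seeds and build_seed_graph, the value k=3 and the toy sequences are all assumptions. The community-finding and backward percolation steps are not shown.

```python
from collections import defaultdict
from itertools import combinations


def kmer_seeds(sequence, k=3):
    """Return the set of contiguous k-mers in a sequence (a stand-in for hash seeds)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_seed_graph(proteins, k=3):
    """Index proteins by seed, then weight each protein pair by its shared seed count."""
    index = defaultdict(set)              # seed -> names of proteins containing it
    for name, seq in proteins.items():
        for seed in kmer_seeds(seq, k):
            index[seed].add(name)

    edges = defaultdict(int)              # (name_a, name_b) -> number of shared seeds
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Toy sequences, invented for illustration only.
toy_proteins = {
    "p1": "MKTAYIAKQR",
    "p2": "GGTAYIAKLL",
    "p3": "WWWWPPPP",
}
toy_graph = build_seed_graph(toy_proteins, k=3)
```

    Because only proteins that share a seed are ever paired, the index avoids an explicit all-against-all comparison; in the toy data p1 and p2 share four 3-mers and p3 remains unconnected.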

  12. The genomic organization of plant pathogenicity in Fusarium species

    NARCIS (Netherlands)

    Rep, M.; Kistler, H.C.

    2010-01-01

    Comparative genomics is a powerful tool to infer the molecular basis of fungal pathogenicity and its evolution by identifying differences in gene content and genomic organization between fungi with different hosts or modes of infection. Through comparative analysis, pathogenicity-related chromosomes

  13. Azolla--a model organism for plant genomic studies.

    Science.gov (United States)

    Qiu, Yin-Long; Yu, Jun

    2003-02-01

    The aquatic ferns of the genus Azolla are nitrogen-fixing plants that have great potential in agricultural production and environmental conservation. Azolla is in many respects qualified to serve as a model organism for genomic studies because of its importance in agriculture, its unique position in plant evolution, its symbiotic relationship with the N2-fixing cyanobacterium, Anabaena azollae, and its moderate-sized genome. The goals of this genome project are not only to understand the biology of the Azolla genome to promote its applications in biological research and agricultural practice but also to gain critical insights about the evolution of plant genomes. Together with the strategic and technical improvement as well as cost reduction of DNA sequencing, the deciphering of their genetic code is imminent.

  14. Mycoplasma hyopneumoniae Transcription Unit Organization: Genome Survey and Prediction

    Science.gov (United States)

    Siqueira, Franciele Maboni; Schrank, Augusto; Schrank, Irene Silveira

    2011-01-01

    Mycoplasma hyopneumoniae is associated with swine respiratory diseases. Although gene organization and regulation are well known in many prokaryotic organisms, knowledge of mycoplasma is limited. This study performed a comparative analysis of three strains of M. hyopneumoniae (7448, J and 232), with a focus on genome organization and gene comparison for open reading frame (ORF) cluster (OC) identification. An in silico analysis of gene organization demonstrated 117 OCs and 34 single ORFs in M. hyopneumoniae 7448 and J, while 116 OCs and 36 single ORFs were identified in M. hyopneumoniae 232. Genomic comparison revealed high synteny and conservation of gene order between the OCs defined for 7448 and J strains as well as for 7448 and 232 strains. Twenty-one OCs were chosen and experimentally confirmed by reverse transcription–PCR from M. hyopneumoniae 7448 genome, validating our prediction. A subset of the ORFs within an OC could be independently transcribed due to the presence of internal promoters. Our results suggest that transcription occurs in ‘run-on’ from an upstream promoter in M. hyopneumoniae, thus forming large ORF clusters (from 2 to 29 ORFs in the same orientation) and indicating a complex transcriptional organization. PMID:22086999
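
    A minimal sketch of how runs of ORFs in the same orientation might be grouped into clusters is given below. It assumes that orientation alone defines a cluster; the study may have applied further criteria (for example intergenic distance) that the abstract does not state. The function name orf_clusters and the ORF records are hypothetical and are not real M. hyopneumoniae annotations.

```python
from itertools import groupby


def orf_clusters(orfs):
    """Group consecutive ORFs (ordered by start coordinate) that share a strand.

    Each ORF is a (name, start, strand) tuple. Runs of two or more ORFs are
    returned as clusters; isolated ORFs are returned as singles.
    """
    ordered = sorted(orfs, key=lambda orf: orf[1])
    runs = [[name for name, _, _ in run]
            for _, run in groupby(ordered, key=lambda orf: orf[2])]
    clusters = [run for run in runs if len(run) >= 2]
    singles = [run[0] for run in runs if len(run) == 1]
    return clusters, singles


# Hypothetical ORFs, invented for illustration only.
toy_orfs = [
    ("orfA", 100, "+"), ("orfB", 900, "+"), ("orfC", 1800, "+"),
    ("orfD", 2700, "-"),
    ("orfE", 3500, "+"), ("orfF", 4200, "+"),
]
toy_clusters, toy_singles = orf_clusters(toy_orfs)
```

    Here the six toy ORFs yield two clusters and one single ORF, the same kind of tally the abstract reports per strain.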

  15. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    Directory of Open Access Journals (Sweden)

    James J Davis

    2016-02-01

    Full Text Available The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.

  16. [Compartmentalization of the cell nucleus and spatial organization of the genome].

    Science.gov (United States)

    Gavrilov, A A; Razin, S V

    2015-01-01

    The eukaryotic cell nucleus is one of the most complex cell organelles. Despite the absence of membranes, the nuclear space is divided into numerous compartments where different processes involved in the genome activity take place. The most important nuclear compartments include nucleoli, nuclear speckles, PML bodies, Cajal bodies, histone locus bodies, Polycomb bodies, insulator bodies, transcription and replication factories. The structural basis for the nuclear compartmentalization is provided by genomic DNA that occupies most of the nuclear volume. Nuclear compartments, in turn, guide the chromosome folding by providing a platform for the spatial interaction of individual genomic loci. In this review, we discuss fundamental principles of higher order genome organization with a focus on chromosome territories and chromosome domains, as well as consider the structure and function of the key nuclear compartments. We show that the functional compartmentalization of the cell nucleus and genome spatial organization are tightly interconnected, and that this form of organization is highly dynamic and is based on stochastic processes.

  17. Approaching the Sequential and Three-Dimensional Organization of Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2006-01-01

    Genomes are one of the major foundations of life due to their role in information storage, process regulation and evolution. To achieve a deeper understanding of the human genome the three-dimensional organization of the human cell nucleus, the structural-, scaling- and dynamic

  18. Genome organization in the nucleus: From dynamic measurements to a functional model.

    Science.gov (United States)

    Vivante, Anat; Brozgol, Eugene; Bronshtein, Irena; Garini, Yuval

    2017-07-01

    A biological system is by definition a dynamic environment encompassing kinetic processes that occur at different length scales and time ranges. To explore this type of system, spatial information needs to be acquired at different time scales. This means overcoming significant hurdles, including the need for stable and precise labeling of the required probes and the use of state-of-the-art optical methods. However, to interpret the acquired data, biophysical models that can account for these biological mechanisms need to be developed. The structure and function of a biological system are closely related to its dynamic properties, thus further emphasizing the importance of identifying the rules governing the dynamics that cannot be directly deduced from information on the structure itself. In eukaryotic cells, tens of thousands of genes are packed in the small volume of the nucleus. The genome itself is organized in chromosomes that occupy specific volumes referred to as chromosome territories. This organization is preserved throughout the cell cycle, even though there are no sub-compartments in the nucleus itself. This organization, which is still not fully understood, is crucial for a large number of cellular functions such as gene regulation, DNA breakage repair and error-free cell division. Various techniques are in use today, including imaging, live cell imaging and molecular methods such as chromosome conformation capture (3C) methods to better understand these mechanisms. Live cell imaging methods are becoming well established. These include methods such as Single Particle Tracking (SPT), Continuous Photobleaching (CP), Fluorescence Recovery After Photobleaching (FRAP) and Fluorescence Correlation Spectroscopy (FCS) that are currently used for studying proteins, RNA, DNA, gene loci and nuclear bodies. They provide crucial information on their mobility, reorganization, interactions and binding properties.
Here we describe how these dynamic methods can be used to

  19. Genome-wide comparative analysis of ABC systems in the Bdellovibrio-and-like organisms.

    Science.gov (United States)

    Li, Nan; Chen, Huan; Williams, Henry N

    2015-05-10

    Bdellovibrio-and-like organisms (BALOs) are gram-negative, predatory bacteria with wide variations in genome size, GC content and ecological habitat. The ATP-binding cassette (ABC) systems have been identified in several prokaryotes, fungi and plants and have a role in transport of materials in and out of cells and in cellular processes. However, knowledge of the ABC systems of BALOs remains obscure. A total of 269 putative ABC proteins were identified in BALOs. The genes encoding these ABC systems occupy nearly 1.3% of the gene content in freshwater Bdellovibrio strains and about 0.7% in their saltwater counterparts. The proteins found belong to 25 ABC system families based on their structural characteristics and functions. Among these, 16 families function as importers, 6 as exporters and 3 are involved in various cellular processes. Eight of these 25 ABC system families were deduced to be the core set of ABC systems conserved in all BALOs. All Bacteriovorax strains have 28 or fewer ABC systems. In contrast, the freshwater Bdellovibrio strains have more ABC systems, typically around 51. In the genome of Bdellovibrio exovorus JSS (CP003537.1), 53 putative ABC systems were detected, representing the highest number among all the BALO genomes examined in this study. Unexpectedly high numbers of ABC systems involved in cellular processes were found in all BALOs. Phylogenetic analysis suggests that the majority of ABC proteins can be assigned into many separate families with high bootstrap supports (>50%). In this study, a general framework of sequence-structure-function connections for the ABC systems in BALOs was revealed, providing novel insights for future investigations. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Protein identification from two-dimensional gel electrophoresis analysis of Klebsiella pneumoniae by combined use of mass spectrometry data and raw genome sequences

    Directory of Open Access Journals (Sweden)

    Zeng An-Ping

    2003-12-01

    Full Text Available Separation of proteins by two-dimensional gel electrophoresis (2-DE) coupled with identification of proteins through peptide mass fingerprinting (PMF) by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) is a widely used technique for proteomic analysis. This approach relies, however, on the presence of the proteins studied in publicly accessible protein databases or the availability of annotated genome sequences of an organism. In this work, we investigated the reliability of using raw genome sequences for identifying proteins by PMF without the need for additional information such as amino acid sequences. The method is demonstrated for proteomic analysis of Klebsiella pneumoniae grown anaerobically on glycerol. Of 197 spots excised from 2-DE gels and submitted for mass spectrometric analysis, 164 spots were clearly identified as 122 individual proteins. 95% of the 164 spots can be successfully identified merely by using peptide mass fingerprints and a strain-specific protein database (ProtKpn) constructed from the raw genome sequences of K. pneumoniae. Cross-species protein searching in the public databases mainly resulted in the identification of 57% of the 66 highly expressed protein spots, compared with 97% using the ProtKpn database. Ten dha regulon related proteins that are essential for the initial enzymatic steps of anaerobic glycerol metabolism were successfully identified using the ProtKpn database, whereas none of them could be identified by cross-species searching. In conclusion, the use of a strain-specific protein database constructed from raw genome sequences makes it possible to reliably identify most of the proteins from 2-DE analysis simply through peptide mass fingerprinting.
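
    The matching step behind peptide mass fingerprinting can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes candidate protein sequences have already been derived from the genome, uses the common trypsin rule (cleave after K or R unless followed by P) with no missed cleavages, and the function names and the 0.2 Da tolerance are hypothetical choices for illustration.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(protein):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, protein, tol=0.2):
    """Count observed masses within tol Da of any theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)
```

    Ranking every database entry by `count_matches` against the masses measured for a spot gives a crude fingerprint score; real search engines also weigh mass accuracy, missed cleavages and modifications.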

  1. Ectopic Expression of Testis Germ Cell Proteins in Cancer and Its Potential Role in Genomic Instability

    Directory of Open Access Journals (Sweden)

    Aaraby Yoheswaran Nielsen

    2016-06-01

    Full Text Available Genomic instability is a hallmark of human cancer and an enabling factor for the genetic alterations that drive cancer development. The processes involved in genomic instability resemble those of meiosis, where genetic material is interchanged between homologous chromosomes. In most types of human cancer, epigenetic changes, including hypomethylation of gene promoters, lead to the ectopic expression of a large number of proteins normally restricted to the germ cells of the testis. Due to the similarities between meiosis and genomic instability, it has been proposed that activation of meiotic programs may drive genomic instability in cancer cells. Some germ cell proteins with ectopic expression in cancer cells indeed seem to promote genomic instability, while others reduce polyploidy and maintain mitotic fidelity. Furthermore, oncogenic germ cell proteins may indirectly contribute to genomic instability through induction of replication stress, similar to classic oncogenes. Thus, current evidence suggests that testis germ cell proteins are implicated in cancer development by regulating genomic instability during tumorigenesis, and these proteins therefore represent promising targets for novel therapeutic strategies.

  2. Protein Dynamics in Mammalian Genome Maintenance

    NARCIS (Netherlands)

    A. Zotter (Angelika)

    2008-01-01

    The integrity of the genome, carrier of the blueprint for each organism, is under constant attack from environmental as well as endogenous DNA damaging agents. An agent with substantial impact on our DNA is the UV-fraction of sunlight. It inflicts bulky DNA lesions, which can interfere

  3. Analysis of Genome-Scale Data

    NARCIS (Netherlands)

    Kemmeren, P.P.C.W.

    2005-01-01

    The genetic material of every cell in an organism is stored inside DNA in the form of genes, which together form the genome. The information stored in the DNA is translated to RNA and subsequently to proteins, which form complex biological systems. The availability of whole genome sequences has

  4. Bacillus subtilis genome diversity.

    Science.gov (United States)

    Earl, Ashlee M; Losick, Richard; Kolter, Roberto

    2007-02-01

    Microarray-based comparative genomic hybridization (M-CGH) is a powerful method for rapidly identifying regions of genome diversity among closely related organisms. We used M-CGH to examine the genome diversity of 17 strains belonging to the nonpathogenic species Bacillus subtilis. Our M-CGH results indicate that there is considerable genetic heterogeneity among members of this species; nearly one-third of Bsu168-specific genes exhibited variability, as measured by the microarray hybridization intensities. The variable loci include those encoding proteins involved in antibiotic production, cell wall synthesis, sporulation, and germination. The diversity in these genes may reflect this organism's ability to survive in diverse natural settings.

  5. Analysis of Genome-Scale Data

    OpenAIRE

    Kemmeren, P.P.C.W.

    2005-01-01

    The genetic material of every cell in an organism is stored inside DNA in the form of genes, which together form the genome. The information stored in the DNA is translated to RNA and subsequently to proteins, which form complex biological systems. The availability of whole genome sequences has given rise to the parallel development of other high-throughput approaches such as determining mRNA expression level changes, gene-deletion phenotypes, chromosomal location of DNA binding proteins, cel...

  6. Mass spectrometry allows direct identification of proteins in large genomes

    DEFF Research Database (Denmark)

    Küster, B; Mortensen, Peter V.; Andersen, Jens S.

    2001-01-01

    Proteome projects seek to provide systematic functional analysis of the genes uncovered by genome sequencing initiatives. Mass spectrometric protein identification is a key requirement in these studies but to date, database searching tools rely on the availability of protein sequences derived fro...

  7. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    Science.gov (United States)

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-12-11

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.

  8. Identification of proteins likely to be involved in morphogenesis, cell division, and signal transduction in Planctomycetes by comparative genomics.

    Science.gov (United States)

    Jogler, Christian; Waldmann, Jost; Huang, Xiaoluo; Jogler, Mareike; Glöckner, Frank Oliver; Mascher, Thorsten; Kolter, Roberto

    2012-12-01

    Members of the Planctomycetes clade share many unusual features for bacteria. Their cytoplasm contains membrane-bound compartments, they lack peptidoglycan and FtsZ, they divide by polar budding, and they are capable of endocytosis. Planctomycete genomes have remained enigmatic, generally being quite large (up to 9 Mb), and on average, 55% of their predicted proteins are of unknown function. Importantly, proteins related to the unusual traits of Planctomycetes remain largely unknown. Thus, we embarked on bioinformatic analyses of these genomes in an effort to predict proteins that are likely to be involved in compartmentalization, cell division, and signal transduction. We used three complementary strategies. First, we defined the Planctomycetes core genome and subtracted genes of well-studied model organisms. Second, we analyzed the gene content and synteny of morphogenesis and cell division genes and combined both methods using a "guilt-by-association" approach. Third, we identified signal transduction systems as well as sigma factors. These analyses provide a manageable list of candidate genes for future genetic studies and provide evidence for complex signaling in the Planctomycetes akin to that observed for bacteria with complex life-styles, such as Myxococcus xanthus.

  9. Predicting co-complexed protein pairs using genomic and proteomic data integration

    Directory of Open Access Journals (Sweden)

    King Oliver D

    2004-04-01

    Full Text Available Background: Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. Therefore it is desirable to integrate multiple datasets and utilize their different predictive value for more accurate prediction of co-complexed relationships. Results: Using a supervised machine learning approach, a probabilistic decision tree, we integrated high-throughput protein interaction datasets and other gene- and protein-pair characteristics to predict co-complexed pairs (CCP) of proteins. Our predictions proved more sensitive and specific than predictions based on Y2H or APMS methods alone or in combination. Among the top predictions not annotated as CCPs in our reference set (obtained from the MIPS complex catalogue), a significant fraction was found to physically interact according to a separate database (YPD, Yeast Proteome Database), and the remaining predictions may potentially represent unknown CCPs. Conclusions: We demonstrated that the probabilistic decision tree approach can be successfully used to predict co-complexed protein (CCP) pairs from other characteristics. Our top-scoring CCP predictions provide testable hypotheses for experimental validation.

  10. G2S: A web-service for annotating genomic variants on 3D protein structures.

    Science.gov (United States)

    Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

    2018-01-27

    Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. The G2S API uses a REST-inspired design and can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source code are freely available at https://g2s.genomenexus.org. Contact: g2s@genomenexus.org. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  11. The Number, Organization, and Size of Polymorphic Membrane Protein Coding Sequences as well as the Most Conserved Pmp Protein Differ within and across Chlamydia Species.

    Science.gov (United States)

    Van Lent, Sarah; Creasy, Heather Huot; Myers, Garry S A; Vanrompay, Daisy

    2016-01-01

    Variation is a central trait of the polymorphic membrane protein (Pmp) family. The number of pmp coding sequences differs between Chlamydia species, but it is unknown whether the number of pmp coding sequences is constant within a Chlamydia species. The level of conservation of the Pmp proteins has previously only been determined for Chlamydia trachomatis. As different Pmp proteins might be indispensable for the pathogenesis of different Chlamydia species, this study investigated the conservation of Pmp proteins both within and across C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci. The pmp coding sequences were annotated in 16 C. trachomatis, 6 C. pneumoniae, 2 C. abortus, and 16 C. psittaci genomes. The number and organization of polymorphic membrane coding sequences differed within and across the analyzed Chlamydia species. The length of coding sequences of pmpA, pmpB, and pmpH was conserved among all analyzed genomes, while the length of pmpE/F and pmpG, and remarkably also of the subtype pmpD, differed among the analyzed genomes. PmpD, PmpA, PmpH, and PmpA were the most conserved Pmp in C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci, respectively. PmpB was the most conserved Pmp across the 4 analyzed Chlamydia species. © 2016 S. Karger AG, Basel.

  12. Sifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource.

    Science.gov (United States)

    Sharpton, Thomas J; Jospin, Guillaume; Wu, Dongying; Langille, Morgan G I; Pollard, Katherine S; Eisen, Jonathan A

    2012-10-13

    New computational resources are needed to manage the increasing volume of biological data from genome sequencing projects. One fundamental challenge is the ability to maintain a complete and current catalog of protein diversity. We developed a new approach for the identification of protein families that focuses on the rapid discovery of homologous protein sequences. We implemented fully automated and high-throughput procedures to de novo cluster proteins into families based upon global alignment similarity. Our approach employs an iterative clustering strategy in which homologs of known families are sifted out of the search for new families. The resulting reduction in computational complexity enables us to rapidly identify novel protein families found in new genomes and to perform efficient, automated updates that keep pace with genome sequencing. We refer to protein families identified through this approach as "Sifting Families," or SFams. Our analysis of ~10.5 million protein sequences from 2,928 genomes identified 436,360 SFams, many of which are not represented in other protein family databases. We validated the quality of SFam clustering through statistical as well as network topology-based analyses. We describe the rapid identification of SFams and demonstrate how they can be used to annotate genomes and metagenomes. The SFam database catalogs protein-family quality metrics, multiple sequence alignments, hidden Markov models, and phylogenetic trees. Our source code and database are publicly available and will be subject to frequent updates (http://edhar.genomecenter.ucdavis.edu/sifting_families/).

  13. A genome-wide association study of seed protein and oil content in soybean.

    Science.gov (United States)

    Hwang, Eun-Young; Song, Qijian; Jia, Gaofeng; Specht, James E; Hyten, David L; Costa, Jose; Cregan, Perry B

    2014-01-02

    Association analysis is an alternative to conventional family-based methods to detect the location of gene(s) or quantitative trait loci (QTL) and provides relatively high resolution in terms of defining the genome position of a gene or QTL. Seed protein and oil concentration are quantitative traits which are determined by the interaction among many genes with small to moderate genetic effects and their interaction with the environment. In this study, a genome-wide association study (GWAS) was performed to identify quantitative trait loci (QTL) controlling seed protein and oil concentration in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content. A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using various methods including Illumina Infinium and GoldenGate assays and 31,954 markers with minor allele frequency >0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) rapidly declined to 0.2 within 360 Kbp, whereas the mean LD declined to 0.2 at 9,600 Kbp in heterochromatic regions. The GWAS results identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the highest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20. A major seed protein QTL has been previously mapped to the same location and potential candidate genes have recently been identified in this region. The GWAS results also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs had a significant association with both protein and oil. This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also resulted in narrower genomic regions than the regions reported as containing these QTL. The narrower GWAS-defined genome regions will allow more precise
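
    The two statistics named in this abstract, minor allele frequency and the LD measure r-squared, can be written down in a few lines. The sketch below uses the standard textbook definition for two biallelic loci, r² = D² / (pA(1-pA)pB(1-pB)) with D = pAB - pA·pB, and assumes phased haplotypes coded 0/1; the function names and the coding are illustrative assumptions, not the software used in the study.

```python
def minor_allele_freq(alleles):
    """Frequency of the rarer allele at one biallelic locus (alleles coded 0/1)."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def r_squared(locus_a, locus_b):
    """LD r^2 between two loci observed on the same phased haplotypes."""
    if len(locus_a) != len(locus_b):
        raise ValueError("both loci must cover the same haplotypes")
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    # Frequency of haplotypes carrying allele 1 at both loci.
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return d * d / denom
```

    Filtering markers with `minor_allele_freq(...) > 0.10` and then averaging `r_squared` over marker pairs binned by physical distance yields an LD decay curve of the kind summarized above.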

  14. Discovery and annotation of small proteins using genomics, proteomics and computational approaches

    Energy Technology Data Exchange (ETDEWEB)

    Yang, Xiaohan; Tschaplinski, Timothy J.; Hurst, Gregory B.; Jawdy, Sara; Abraham, Paul E.; Lankford, Patricia K.; Adams, Rachel M.; Shah, Manesh B.; Hettich, Robert L.; Lindquist, Erika; Kalluri, Udaya C.; Gunter, Lee E.; Pennacchio, Christa; Tuskan, Gerald A.

    2011-03-02

    Small proteins (10-200 amino acids [aa] in length) encoded by short open reading frames (sORF) play important regulatory roles in various biological processes, including tumor progression, stress response, flowering, and hormone signaling. However, ab initio discovery of small proteins has been relatively overlooked. Recent advances in deep transcriptome sequencing make it possible to efficiently identify sORFs at the genome level. In this study, we obtained 2.6 million expressed sequence tag (EST) reads from Populus deltoides leaf transcriptome and reconstructed full-length transcripts from the EST sequences. We identified an initial set of 12,852 sORFs encoding proteins of 10-200 aa in length. Three computational approaches were then used to enrich for bona fide protein-coding sORFs from the initial sORF set: (1) coding potential prediction, (2) evolutionary conservation between P. deltoides and other plant species, and (3) gene family clustering within P. deltoides. As a result, a high-confidence sORF candidate set containing 1469 genes was obtained. Analysis of the protein domains, non-protein-coding RNA motifs, sequence length distribution, and protein mass spectrometry data supported this high-confidence sORF set. In the high-confidence sORF candidate set, known protein domains were identified in 1282 genes (higher-confidence sORF candidate set), out of which 611 genes, designated as highest-confidence candidate sORF set, were supported by proteomics data. Of the 611 highest-confidence candidate sORF genes, 56 were new to the current Populus genome annotation. This study not only demonstrates that there are potential sORF candidates to be annotated in sequenced genomes, but also presents an efficient strategy for discovery of sORFs in species with no genome annotation yet available.
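
    The initial sORF scan can be pictured with a short sketch. It is only an illustration of the length criterion stated in the abstract (encoded protein of 10-200 aa), not the authors' procedure: it assumes an ATG start codon, the standard stop codons, and the forward strand only, and the function name `find_sorfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(transcript, min_aa=10, max_aa=200):
    """Return (start, end, frame) for forward-strand ORFs encoding min_aa..max_aa residues.

    start is the 0-based index of the ATG; end is the index just past the stop codon.
    """
    seq = transcript.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # residues, stop codon excluded
                if min_aa <= n_aa <= max_aa:
                    hits.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return hits
```

    Candidates found this way would still need the enrichment steps listed in the abstract (coding potential, conservation, gene family clustering) before being treated as protein-coding.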

  15. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung-Hou; Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-02

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  16. Heat Shock Protein Genes Undergo Dynamic Alteration in Their Three-Dimensional Structure and Genome Organization in Response to Thermal Stress.

    Science.gov (United States)

    Chowdhary, Surabhi; Kainth, Amoldeep S; Gross, David S

    2017-12-15

    Three-dimensional (3D) chromatin organization is important for proper gene regulation, yet how the genome is remodeled in response to stress is largely unknown. Here, we use a highly sensitive version of chromosome conformation capture in combination with fluorescence microscopy to investigate Heat Shock Protein (HSP) gene conformation and 3D nuclear organization in budding yeast. In response to acute thermal stress, HSP genes undergo intense intragenic folding interactions that go well beyond 5'-3' gene looping previously described for RNA polymerase II genes. These interactions include looping between upstream activation sequence (UAS) and promoter elements, promoter and terminator regions, and regulatory and coding regions (gene "crumpling"). They are also dynamic, being prominent within 60 s, peaking within 2.5 min, and attenuating within 30 min, and correlate with HSP gene transcriptional activity. With similarly striking kinetics, activated HSP genes, both chromosomally linked and unlinked, coalesce into discrete intranuclear foci. Constitutively transcribed genes also loop and crumple yet fail to coalesce. Notably, a missense mutation in transcription factor TFIIB suppresses gene looping, yet neither crumpling nor HSP gene coalescence is affected. An inactivating promoter mutation, in contrast, obviates all three. Our results provide evidence for widespread, transcription-associated gene crumpling and demonstrate the de novo assembly and disassembly of HSP gene foci. Copyright © 2017 American Society for Microbiology.

  17. Incoming human papillomavirus 16 genome is lost in PML protein-deficient HaCaT keratinocytes.

    Science.gov (United States)

    Bienkowska-Haba, Malgorzata; Luszczek, Wioleta; Keiffer, Timothy R; Guion, Lucile G M; DiGiuseppe, Stephen; Scott, Rona S; Sapp, Martin

    2017-05-01

    Human papillomaviruses (HPVs) target promyelocytic leukemia (PML) nuclear bodies (NBs) during infectious entry, and PML protein is important for efficient transcription of the incoming viral genome. However, the transcriptional downregulation was shown to be promoter-independent in that heterologous promoters delivered by papillomavirus particles were also affected. To further investigate the role of PML protein in HPV entry, we used small hairpin RNA to knock down PML protein in HaCaT keratinocytes. Confirming previous findings, PML knockdown in HaCaT cells reduced HPV16 transcript levels significantly following infectious entry without impairing binding and trafficking. However, when we quantified steady-state levels of pseudogenomes in interphase cells, we found strongly reduced genome levels compared with parental HaCaT cells. Because nuclear delivery was comparable in both cell lines, we conclude that viral pseudogenome must be removed after successful nuclear delivery. Transcriptome analysis by gene array revealed that PML knockdown in clonal HaCaT cells was associated with a constitutive interferon response. Abrogation of JAK1/2 signaling prevented genome loss but did not restore viral transcription. In contrast, knockdown of PML protein in HeLa cells did not affect HPV genome delivery and transcription. HeLa cells are transformed by HPV18 oncogenes E6 and E7, which have been shown to interfere with the JAK/Stat signaling pathway. Our data imply that PML NBs protect incoming HPV genomes. Furthermore, they provide evidence that PML NBs are key regulators of the innate immune response in keratinocytes. Promyelocytic leukemia nuclear bodies (PML NBs) are important for antiviral defense. Many DNA viruses target these subnuclear structures and reorganize them. Reorganization of PML NBs by viral proteins is important for establishment of infection. In contrast, HPVs require the presence of PML protein for efficient transcription of the incoming viral genome. Our

  18. Evolution and structural organization of the C proteins of paramyxovirinae.

    Directory of Open Access Journals (Sweden)

    Michael K Lo

    Full Text Available The phosphoprotein (P) gene of most Paramyxovirinae encodes several proteins in overlapping frames: P and V, which share a common N-terminus (PNT), and C, which overlaps PNT. Overlapping genes are of particular interest because they encode proteins originated de novo, some of which have unknown structural folds, challenging the notion that nature utilizes only a limited, well-mapped area of fold space. The C proteins cluster in three groups, comprising measles, Nipah, and Sendai virus. We predicted that all C proteins have a similar organization: a variable, disordered N-terminus and a conserved, α-helical C-terminus. We confirmed this predicted organization by biophysically characterizing recombinant C proteins from Tupaia paramyxovirus (measles group) and human parainfluenza virus 1 (Sendai group). We also found that the C of the measles and Nipah groups have statistically significant sequence similarity, indicating a common origin. Although the C of the Sendai group lack sequence similarity with them, we speculate that they also have a common origin, given their similar genomic location and structural organization. Since C is dispensable for viral replication, unlike PNT, we hypothesize that C may have originated de novo by overprinting PNT in the ancestor of Paramyxovirinae. Intriguingly, in measles virus and Nipah virus, PNT encodes STAT1-binding sites that overlap different regions of the C-terminus of C, indicating they have probably originated independently. This arrangement, in which the same genetic region encodes simultaneously a crucial functional motif (a STAT1-binding site) and a highly constrained region (the C-terminus of C), seems paradoxical, since it should severely reduce the ability of the virus to adapt. The fact that it originated twice suggests that it must be balanced by an evolutionary advantage, perhaps from reducing the size of the genetic region vulnerable to mutations.

  19. The genome BLASTatlas - a GeneWiz extension for visualization of whole-genome homology

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Binnewies, Tim Terence; Ussery, David

    2008-01-01

    ://www.cbs.dtu.dk/ws/BLASTatlas), where programming examples are available in Perl. By providing an interoperable method to carry out whole genome visualization of homology, this service offers bioinformaticians as well as biologists an easy-to-adopt workflow that can be directly called from the programming language of the user, hence......The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes being sequenced of the same species or closely related organisms. Thus, there is a need for visualization methods that will allow easy comparison of many sequenced...... genomes to a defined reference strain. The BLASTatlas is one such tool that is useful for mapping and visualizing whole genome homology of genes and proteins within a reference strain compared to other strains or species of one or more prokaryotic organisms. We provide examples of BLASTatlases, including...

  20. Genome-scale prediction of proteins with long intrinsically disordered regions.

    Science.gov (United States)

    Peng, Zhenling; Mizianty, Marcin J; Kurgan, Lukasz

    2014-01-01

    Proteins with long disordered regions (LDRs), defined as having 30 or more consecutive disordered residues, are abundant in eukaryotes, and these regions are recognized as a distinct class of biologically functional domains. LDRs facilitate various cellular functions and are important for target selection in structural genomics. Motivated by the lack of methods that directly predict proteins with LDRs, we designed Super-fast predictor of proteins with Long Intrinsically DisordERed regions (SLIDER). SLIDER utilizes logistic regression that takes an empirically chosen set of numerical features, which consider selected physicochemical properties of amino acids, sequence complexity, and amino acid composition, as its inputs. Empirical tests show that SLIDER offers competitive predictive performance combined with low computational cost. It outperforms, by at least a modest margin, a comprehensive set of modern disorder predictors (that can indirectly predict LDRs) and is 16 times faster compared to the best currently available disorder predictor. Utilizing our time-efficient predictor, we characterized abundance and functional roles of proteins with LDRs over 110 eukaryotic proteomes. Similar to related studies, we found that eukaryotes have many (on average 30.3%) proteins with LDRs, with the majority of proteomes having between 25 and 40%, where higher abundance is characteristic of proteomes that have larger proteins. Our first-of-its-kind large-scale functional analysis shows that these proteins are enriched in a number of cellular functions and processes including certain binding events, regulation of catalytic activities, cellular component organization, biogenesis, biological regulation, and some metabolic and developmental processes. A webserver that implements SLIDER is available at http://biomine.ece.ualberta.ca/SLIDER/. Copyright © 2013 Wiley Periodicals, Inc.
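
    The LDR definition quoted above (30 or more consecutive disordered residues) is easy to state in code. The sketch below is not SLIDER, which predicts LDR-containing proteins directly from sequence features with logistic regression; it only applies the definition to per-residue disorder annotations that are assumed to come from some other predictor. The "D"/"O" string encoding and the function names are illustrative assumptions.

```python
def has_ldr(disorder, min_len=30):
    """True if the annotation contains a run of at least min_len 'D' (disordered) residues."""
    run = 0
    for state in disorder:
        run = run + 1 if state == "D" else 0  # any other symbol breaks the run
        if run >= min_len:
            return True
    return False


def ldr_fraction(proteome):
    """Fraction of proteins in a list of annotations that contain an LDR."""
    if not proteome:
        raise ValueError("proteome must contain at least one protein")
    return sum(has_ldr(p) for p in proteome) / len(proteome)
```

    Applying `ldr_fraction` to each proteome gives the kind of per-proteome abundance figure reported in the abstract, with the caveat that the result depends entirely on the upstream per-residue predictions.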

  1. Sifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource

    Directory of Open Access Journals (Sweden)

    Sharpton Thomas J

    2012-10-01

    Full Text Available. Abstract. Background: New computational resources are needed to manage the increasing volume of biological data from genome sequencing projects. One fundamental challenge is the ability to maintain a complete and current catalog of protein diversity. We developed a new approach for the identification of protein families that focuses on the rapid discovery of homologous protein sequences. Results: We implemented fully automated and high-throughput procedures to de novo cluster proteins into families based upon global alignment similarity. Our approach employs an iterative clustering strategy in which homologs of known families are sifted out of the search for new families. The resulting reduction in computational complexity enables us to rapidly identify novel protein families found in new genomes and to perform efficient, automated updates that keep pace with genome sequencing. We refer to protein families identified through this approach as “Sifting Families,” or SFams. Our analysis of ~10.5 million protein sequences from 2,928 genomes identified 436,360 SFams, many of which are not represented in other protein family databases. We validated the quality of SFam clustering through statistical as well as network topology–based analyses. Conclusions: We describe the rapid identification of SFams and demonstrate how they can be used to annotate genomes and metagenomes. The SFam database catalogs protein-family quality metrics, multiple sequence alignments, hidden Markov models, and phylogenetic trees. Our source code and database are publicly available and will be subject to frequent updates (http://edhar.genomecenter.ucdavis.edu/sifting_families/).
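
    The iterative "sifting" idea described above can be sketched roughly as follows. This is a toy illustration, not the authors' pipeline: the function names are hypothetical, families are compared through their first member only, and `toy_similar` (positional identity on equal-length strings) is a stand-in assumption for the global alignment similarity the abstract mentions.

```python
from typing import Callable, List


def toy_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Stand-in similarity test (assumption): fraction of identical positions."""
    if not a or len(a) != len(b):
        return False
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a) >= threshold


def sift_and_cluster(
    new_seqs: List[str],
    families: List[List[str]],
    similar: Callable[[str, str], bool] = toy_similar,
) -> List[List[str]]:
    """Sift homologs of known families out first, then cluster the leftovers de novo.

    `families` is a list of member lists; the first member acts as representative.
    Returns the newly created families (also appended to `families`).
    """
    leftovers = []
    for seq in new_seqs:
        for members in families:
            if similar(seq, members[0]):
                members.append(seq)  # recruited into a known family
                break
        else:
            leftovers.append(seq)  # no known family matched

    # Only the leftovers enter the more expensive de novo step.
    new_families: List[List[str]] = []
    for seq in leftovers:
        for members in new_families:
            if similar(seq, members[0]):
                members.append(seq)
                break
        else:
            new_families.append([seq])

    families.extend(new_families)
    return new_families
```

    Because sequences recruited into known families never reach the second loop, the de novo step sees a smaller input on each update, which is the source of the complexity reduction the abstract describes.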

  2. The actinome of Dictyostelium discoideum in comparison to actins and actin-related proteins from other organisms.

    Directory of Open Access Journals (Sweden)

    Jayabalan M Joseph

    Full Text Available. Actin is among the most abundant proteins in eukaryotic cells, which usually harbor many conventional actin isoforms as well as actin-related proteins (Arps). To get an overview of the sometimes confusing multitude of actins and Arps, we analyzed the Dictyostelium discoideum actinome in detail and compared it with the genomes from other model organisms. The D. discoideum actinome comprises 41 actins and actin-related proteins. The genome contains 17 actin genes which most likely arose from consecutive gene duplications, are all active, in some cases developmentally regulated and coding for identical proteins (Act8-group). According to published data, the actin fraction in a D. discoideum cell consists of more than 95% of these Act8-type proteins. The other 16 actin isoforms contain a conventional actin motif profile as well but differ in their protein sequences. Seven actin genes are potential pseudogenes. A homology search of the human genome using the most typical D. discoideum actin (Act8) as query sequence finds the major actin isoforms such as cytoplasmic beta-actin as best hit. This suggests that the Act8-group represents a nearly perfect actin throughout evolution. Interestingly, limited data from D. fasciculatum, a more ancient member among the social amoebae, show different relationships between conventional actins. The Act8-type isoform is most conserved throughout evolution. Modeling of the putative structures suggests that the majority of the actin-related proteins is functionally unrelated to canonical actin. The data suggest that the other actin variants are not necessary for the cytoskeleton itself but are rather regulators of its dynamical features or subunits in larger protein complexes.

  3. Trichomonas vaginalis surface proteins: a view from the genome

    DEFF Research Database (Denmark)

    Hirt, R. P.; Noel, C. J.; Sicheritz-Pontén, Thomas

    2007-01-01

    Surface proteins of mucosal microbial pathogens play multiple and essential roles in initiating and sustaining the colonization of the heavily defended mucosa. The protist Trichomonas vaginalis is one of the most common human sexually transmitted pathogens that colonize the urogenital mucosa....... However, little is known about its surface proteins. The recently completed draft genome sequence of T. vaginalis provides an invaluable resource to guide molecular and cellular characterization of surface proteins and to investigate their role in pathogenicity. Here, we review the existing data on T...

  4. ATGC database and ATGC-COGs: an updated resource for micro- and macro-evolutionary studies of prokaryotic genomes and protein family annotation.

    Science.gov (United States)

    Kristensen, David M; Wolf, Yuri I; Koonin, Eugene V

    2017-01-04

    The Alignable Tight Genomic Clusters (ATGCs) database is a collection of closely related bacterial and archaeal genomes that provides several tools to aid research into evolutionary processes in the microbial world. Each ATGC is a taxonomy-independent cluster of 2 or more completely sequenced genomes that meet the objective criteria of a high degree of local gene order (synteny) and a small number of synonymous substitutions in the protein-coding genes. As such, each ATGC is suited for analysis of microevolutionary variations within a cohesive group of organisms (e.g. species), whereas the entire collection of ATGCs is useful for macroevolutionary studies. The ATGC database includes many forms of pre-computed data, in particular ATGC-COGs (Clusters of Orthologous Genes), multiple sequence alignments, a set of 'index' orthologs representing the most well-conserved members of each ATGC-COG, the phylogenetic tree of the organisms within each ATGC, etc. Although the ATGC database contains several million proteins from thousands of genomes organized into hundreds of clusters (roughly a 4-fold increase since the last version of the ATGC database), it is now built with completely automated methods and will be regularly updated following new releases of the NCBI RefSeq database. The ATGC database is hosted jointly at the University of Iowa at dmk-brain.ecn.uiowa.edu/ATGC/ and the NCBI at ftp.ncbi.nlm.nih.gov/pub/kristensen/ATGC/atgc_home.html. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  5. Molecular mapping and genomics of soybean seed protein: a review and perspective for the future.

    Science.gov (United States)

    Patil, Gunvant; Mian, Rouf; Vuong, Tri; Pantalone, Vince; Song, Qijian; Chen, Pengyin; Shannon, Grover J; Carter, Tommy C; Nguyen, Henry T

    2017-10-01

    Genetic improvement of soybean protein meal is a complex process because of negative correlation with oil, yield, and temperature. This review describes the progress in mapping and genomics, identifies knowledge gaps, and highlights the need for integrated approaches. Meal protein derived from soybean [Glycine max (L.) Merr.] seed is the primary source of protein in poultry and livestock feed. Protein is a key factor that determines the nutritional and economical value of soybean. Genetic improvement of soybean seed protein content is highly desirable, and major quantitative trait loci (QTL) for soybean protein have been detected and repeatedly mapped on chromosomes (Chr.) 20 (LG-I) and 15 (LG-E). However, practical breeding progress is challenging because of seed protein content's negative genetic correlation with seed yield, other seed components such as oil and sucrose, and interaction with environmental effects such as temperature during seed development. In this review, we discuss rate-limiting factors related to soybean protein content and nutritional quality, and potential control factors regulating seed storage protein. In addition, we describe advances in next-generation sequencing technologies for precise detection of natural variants and their integration with conventional and high-throughput genotyping technologies. A syntenic analysis of QTL on Chr. 15 and 20 was performed. Finally, we discuss comprehensive approaches for integrating protein and amino acid QTL, genome-wide association studies, whole-genome resequencing, and transcriptome data to accelerate identification of genomic hot spots for allele introgression and soybean meal protein improvement.

  6. Annotation of the protein coding regions of the equine genome

    DEFF Research Database (Denmark)

    Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.

    2015-01-01

    Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced m...... and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross...

  7. cDNA structure, genomic organization and expression patterns of ...

    African Journals Online (AJOL)

    Visfatin is a newly identified adipocytokine that is involved in various physiological and pathological processes of organisms. The cDNA structure, genomic organization and expression patterns of silver Prussian carp visfatin are described in this report. The silver Prussian carp visfatin cDNA cloned from the liver was ...

  8. Determining and comparing protein function in Bacterial genome sequences

    DEFF Research Database (Denmark)

    Vesth, Tammi Camilla

    of this class have very little homology to other known genomes making functional annotation based on sequence similarity very difficult. Inspired in part by this analysis, an approach for comparative functional annotation was created based on public sequenced genomes, CMGfunc. Functionally related groups...... In November 2013, there were around 21,000 different prokaryotic genomes sequenced and publicly available, and the number is growing daily with another 20,000 or more genomes expected to be sequenced and deposited by the end of 2014. An important part of the analysis of this data is the functional...... annotation of genes – the descriptions assigned to genes that describe the likely function of the encoded proteins. This process is limited by several factors, including the definition of a function which can be more or less specific as well as how many genes can actually be assigned a function based...

  9. Genomic organization of plant aminopropyl transferases.

    Science.gov (United States)

    Rodríguez-Kessler, Margarita; Delgado-Sánchez, Pablo; Rodríguez-Kessler, Gabriela Theresia; Moriguchi, Takaya; Jiménez-Bremont, Juan Francisco

    2010-07-01

    Aminopropyl transferases like spermidine synthase (SPDS; EC 2.5.1.16), spermine synthase and thermospermine synthase (SPMS, tSPMS; EC 2.5.1.22) belong to a class of widely distributed enzymes that use decarboxylated S-adenosylmethionine as an aminopropyl donor and putrescine or spermidine as an amino acceptor to form in that order spermidine, spermine or thermospermine. We describe the analysis of plant genomic sequences encoding SPDS, SPMS, tSPMS and PMT (putrescine N-methyltransferase; EC 2.1.1.53). Genome organization (including exon size, gain and loss, as well as intron number, size, loss, retention, placement and phase, and the presence of transposons) of plant aminopropyl transferase genes were compared between the genomic sequences of SPDS, SPMS and tSPMS from Zea mays, Oryza sativa, Malus x domestica, Populus trichocarpa, Arabidopsis thaliana and Physcomitrella patens. In addition, the genomic organization of plant PMT genes, proposed to be derived from SPDS during the evolution of alkaloid metabolism, is illustrated. Herein, a particular conservation and arrangement of exon and intron sequences between plant SPDS, SPMS and PMT genes that clearly differs with that of ACL5 genes, is shown. The possible acquisition of the plant SPMS exon II and, in particular exon XI in the monocot SPMS genes, is a remarkable feature that allows their differentiation from SPDS genes. In accordance with our in silico analysis, functional complementation experiments of the maize ZmSPMS1 enzyme (previously considered to be SPDS) in yeast demonstrated its spermine synthase activity. Another significant aspect is the conservation of intron sequences among SPDS and PMT paralogs. In addition the existence of microsynteny among some SPDS paralogs, especially in P. trichocarpa and A. thaliana, supports duplication events of plant SPDS genes. 
Based on our analysis, we hypothesize that SPMS genes appeared with the divergence of vascular plants by a process of gene duplication and the

  10. Genome-wide identification, sequence characterization, and protein-protein interaction properties of DDB1 (damaged DNA binding protein-1)-binding WD40-repeat family members in Solanum lycopersicum.

    Science.gov (United States)

    Zhu, Yunye; Huang, Shengxiong; Miao, Min; Tang, Xiaofeng; Yue, Junyang; Wang, Wenjie; Liu, Yongsheng

    2015-06-01

    One hundred DDB1 (damaged DNA binding protein-1)-binding WD40-repeat domain (DWD) family genes were identified in the S. lycopersicum genome. The DWD genes encode proteins presumably functioning as the substrate recognition subunits of the cullin4-ring ubiquitin E3 ligase complex. These findings provide candidate genes and a research platform for further gene functionality and molecular breeding study. A subclass of DDB1 (damaged DNA binding protein-1)-binding WD40-repeat domain (DWD) family proteins has been demonstrated to function as the substrate recognition subunits of the cullin4-ring ubiquitin E3 ligase complex. However, little information is available about the cognate subfamily genes in tomato (S. lycopersicum). In this study, based on the recently released tomato genome sequences, 100 tomato genes encoding DWD proteins that potentially interact with DDB1 were identified and characterized, including analyses of the detailed annotations, chromosome locations and compositions of conserved amino acid domains. In addition, a phylogenetic tree of the subfamily genes, which comprises three main groups, was constructed. The physical interaction between tomato DDB1 and 14 representative DWD proteins was determined by yeast two-hybrid and co-immunoprecipitation assays. The subcellular localization of these 14 representative DWD proteins was determined. Six of them were localized in both nucleus and cytoplasm, seven proteins exclusively in cytoplasm, and one protein either in nucleus and cytoplasm, or exclusively in cytoplasm. Comparative genomic analysis demonstrated that the expansion of these subfamily members in tomato predominantly resulted from two whole-genome triplication events in their evolutionary history.

  11. Proteomic strategy for the identification of critical actors in reorganization of the post-meiotic male genome.

    Science.gov (United States)

    Govin, Jerome; Gaucher, Jonathan; Ferro, Myriam; Debernardi, Alexandra; Garin, Jerome; Khochbin, Saadi; Rousseaux, Sophie

    2012-01-01

    After meiosis, during the final stages of spermatogenesis, the haploid male genome undergoes major structural changes, resulting in a shift from a nucleosome-based genome organization to the sperm-specific, highly compacted nucleoprotamine structure. Recent data support the idea that region-specific programming of the haploid male genome is of high importance for the post-fertilization events and for successful embryo development. Although these events constitute a unique and essential step in reproduction, the mechanisms by which they occur have remained completely obscure and the factors involved have mostly remained uncharacterized. Here, we sought a strategy to significantly increase our understanding of proteins controlling the haploid male genome reprogramming, based on the identification of proteins in two specific pools: those with the potential to bind nucleic acids (basic proteins) and proteins capable of binding basic proteins (acidic proteins). For the identification of acidic proteins, we developed an approach involving a transition-protein (TP)-based chromatography, which has the advantage of retaining not only acidic proteins due to the charge interactions, but also potential TP-interacting factors. A second strategy, based on an in-depth bioinformatic analysis of the identified proteins, was then applied to pinpoint within the lists obtained, male germ cells expressed factors relevant to the post-meiotic genome organization. This approach reveals a functional network of DNA-packaging proteins and their putative chaperones and sheds a new light on the way the critical transitions in genome organizations could take place. This work also points to a new area of research in male infertility and sperm quality assessments.

  12. Normalization of Complete Genome Characteristics: Application to Evolution from Primitive Organisms to Homo sapiens.

    Science.gov (United States)

    Sorimachi, Kenji; Okayasu, Teiji; Ohhira, Shuji

    2015-04-01

    Normalized nucleotide and amino acid contents of complete genome sequences can be visualized as radar charts. The shapes of these charts depict the characteristics of an organism's genome. The normalized values calculated from the genome sequence theoretically exclude experimental errors. Further, because normalization is independent of both target size and kind, this procedure is applicable not only to single genes but also to whole genomes, which consist of a huge number of different genes. In this review, we discuss the applications of the normalization of the nucleotide and predicted amino acid contents of complete genomes to the investigation of genome structure and to evolutionary research from primitive organisms to Homo sapiens. Some of the results could never have been obtained from the analysis of individual nucleotide or amino acid sequences but were revealed only after the normalization of nucleotide and amino acid contents was applied to genome research. The discovery that genome structure was homogeneous was obtained only after normalization methods were applied to the nucleotide or predicted amino acid contents of genome sequences. Normalization procedures are also applicable to evolutionary research. Thus, normalization of the contents of whole genomes is a useful procedure that can help to characterize organisms.
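
    As a rough illustration of why normalized contents are independent of target size, the following Python sketch computes each nucleotide's share of the total. It assumes that "normalization" means dividing each count by the total count, which the abstract does not spell out; the function name `normalized_content` is hypothetical.

```python
from collections import Counter
from typing import Dict


def normalized_content(seq: str, alphabet: str = "ACGT") -> Dict[str, float]:
    """Return each symbol's fraction of the total, counting only symbols in `alphabet`.

    Assumption: normalization = count / total, so the values sum to 1.
    """
    counts = Counter(seq.upper())
    total = sum(counts[s] for s in alphabet)
    if total == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    return {s: counts[s] / total for s in alphabet}
```

    Under this assumption a single gene and a whole genome with the same composition yield the same values, for example `normalized_content("AACG")` and `normalized_content("AACG" * 1000)`. The same function could be applied to predicted amino acid sequences by passing the 20-letter amino acid alphabet.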

  13. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    Energy Technology Data Exchange (ETDEWEB)

    Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Jando, Marlen [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Detter, J C [U.S. Department of Energy, Joint Genome Institute; Brettin, Thomas S [ORNL; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  14. Polytene Chromosomes - A Portrait of Functional Organization of the Drosophila Genome.

    Science.gov (United States)

    Zykova, Tatyana Yu; Levitsky, Victor G; Belyaeva, Elena S; Zhimulev, Igor F

    2018-04-01

    This mini-review is devoted to the problem of the genetic meaning of the main polytene chromosome structures - bands and interbands. Generally, densely packed chromatin forms black bands, moderately condensed regions form grey loose bands, whereas decondensed regions of the genome appear as interbands. Recent progress in the annotation of the Drosophila genome and epigenome has made it possible to compare the banding pattern and the structural organization of genes, as well as their activity. This was greatly aided by our ability to establish the borders of bands and interbands on the physical map, which allowed us to perform comprehensive side-by-side comparisons of cytology, genetic and epigenetic maps and to uncover the association between the morphological structures and the functional domains of the genome. These studies largely conclude that interbands contain the 5'-ends of housekeeping genes that are active across all cell types. Interbands are enriched with proteins involved in transcription and nucleosome remodeling, as well as with active histone modifications. Notably, most of the replication origins map to interband regions. As for grey loose bands adjacent to interbands, they typically host the bodies of housekeeping genes. Thus, the bipartite structure composed of an interband and an adjacent grey band functions as a standalone genetic unit. Finally, black bands harbor tissue-specific genes with narrow temporal and tissue expression profiles. Thus, the uniform and permanent activity of interbands combined with the inactivity of genes in bands forms the basis of the universal banding pattern observed in various Drosophila tissues.

  15. Exploiting genomic data to identify proteins involved in abalone reproduction.

    Science.gov (United States)

    Mendoza-Porras, Omar; Botwright, Natasha A; McWilliam, Sean M; Cook, Mathew T; Harris, James O; Wijffels, Gene; Colgrave, Michelle L

    2014-08-28

    Aside from their critical role in reproduction, abalone gonads serve as an indicator of sexual maturity and energy balance, two key considerations for effective abalone culture. Temperate abalone farmers face issues with tank restocking with highly marketable abalone owing to inefficient spawning induction methods. The identification of key proteins in sexually mature abalone will serve as the foundation for a greater understanding of reproductive biology. Addressing this knowledge gap is the first step towards improving abalone aquaculture methods. Proteomic profiling of female and male gonads of greenlip abalone, Haliotis laevigata, was undertaken using liquid chromatography-mass spectrometry. Owing to the incomplete nature of abalone protein databases, in addition to searching against two publicly available databases, a custom database comprising genomic data was used. Overall, 162 and 110 proteins were identified in females and males respectively with 40 proteins common to both sexes. For proteins involved in sexual maturation, sperm and egg structure, motility, acrosomal reaction and fertilization, 23 were identified only in females, 18 only in males and 6 were common. Gene ontology analysis revealed clear differences between the female and male protein profiles reflecting a higher rate of protein synthesis in the ovary and higher metabolic activity in the testis. A comprehensive mass spectrometry-based analysis was performed to profile the abalone gonad proteome providing the foundation for future studies of reproduction in abalone. Key proteins involved in both reproduction and energy balance were identified. Genomic resources were utilised to build a database of molluscan proteins yielding >60% more protein identifications than in a standard workflow employing public protein databases. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Microbial genome analysis: the COG approach.

    Science.gov (United States)

    Galperin, Michael Y; Kristensen, David M; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

    2017-09-14

    For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COGs have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.

  17. Protein domain analysis of genomic sequence data reveals regulation of LRR related domains in plant transpiration in Ficus.

    Science.gov (United States)

    Lang, Tiange; Yin, Kangquan; Liu, Jinyu; Cao, Kunfang; Cannon, Charles H; Du, Fang K

    2014-01-01

    Predicting protein domains is essential for understanding a protein's function at the molecular level. However, up till now, there has been no direct and straightforward method for predicting protein domains in species without a reference genome sequence. In this study, we developed a functionality with a set of programs that can predict protein domains directly from genomic sequence data without a reference genome. Using whole genome sequence data, the programming functionality mainly comprised DNA assembly in combination with next-generation sequencing (NGS) assembly methods and traditional methods, peptide prediction and protein domain prediction. The proposed new functionality avoids problems associated with de novo assembly due to micro reads and small single repeats. Furthermore, we applied our functionality for the prediction of leucine rich repeat (LRR) domains in four species of Ficus with no reference genome, based on NGS genomic data. We found that the LRRNT_2 and LRR_8 domains are related to plant transpiration efficiency, as indicated by the stomata index, in the four species of Ficus. The programming functionality established in this study provides new insights for protein domain prediction, which is particularly timely in the current age of NGS data expansion.

  18. Acetone utilization by sulfate-reducing bacteria: draft genome sequence of Desulfococcus biacutus and a proteomic survey of acetone-inducible proteins.

    Science.gov (United States)

    Gutiérrez Acosta, Olga B; Schleheck, David; Schink, Bernhard

    2014-07-11

    The sulfate-reducing bacterium Desulfococcus biacutus is able to utilize acetone for growth by an inducible degradation pathway that involves a novel activation reaction for acetone with CO as a co-substrate. The mechanism, enzyme(s) and gene(s) involved in this acetone activation reaction are of great interest because they represent a novel and yet undefined type of activation reaction under strictly anoxic conditions. In this study, a draft genome sequence of D. biacutus was established. Sequencing, assembly and annotation resulted in 159 contigs with 5,242,029 base pairs and 4773 predicted genes; 4708 were predicted protein-encoding genes, and 3520 of these had a functional prediction. Proteins and genes were identified that are specifically induced during growth with acetone. A thiamine diphosphate-requiring enzyme appeared to be highly induced during growth with acetone and is probably involved in the activation reaction. Moreover, a coenzyme B12-dependent enzyme and proteins that are involved in redox reactions were also induced during growth with acetone. We present for the first time the genome of a sulfate reducer that is able to grow with acetone. The genome information of this organism represents an important tool for the elucidation of a novel reaction mechanism that is employed by a sulfate reducer in acetone activation.

  19. COGNAT: a web server for comparative analysis of genomic neighborhoods.

    Science.gov (United States)

    Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y

    2017-11-22

    In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionary related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.
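
    The idea of comparing genomic neighborhoods of related genes can be sketched in a few lines of Python. This is not COGNAT's algorithm: the representation (each genome as an ordered list of COG identifiers), the Jaccard measure, and all function names are assumptions made for illustration. COG1009 and COG3002 are the identifiers mentioned in the abstract; the other labels in the usage note are placeholders.

```python
from typing import List, Set


def neighborhood(genome: List[str], index: int, radius: int = 2) -> Set[str]:
    """COG ids within `radius` genes of position `index`, excluding the focal gene.

    Simplification: the genome is treated as linear, not circular.
    """
    start = max(0, index - radius)
    end = min(len(genome), index + radius + 1)
    return {cog for i, cog in enumerate(genome[start:end], start) if i != index}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared fraction of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighborhood_similarity(genome_a: List[str], genome_b: List[str],
                            focal_cog: str, radius: int = 2) -> float:
    """Compare the neighborhoods of the first occurrence of `focal_cog` in each genome."""
    na = neighborhood(genome_a, genome_a.index(focal_cog), radius)
    nb = neighborhood(genome_b, genome_b.index(focal_cog), radius)
    return jaccard(na, nb)
```

    For instance, with `g1 = ["COG_A", "COG3002", "COG1009", "COG_B", "COG_C"]` and `g2 = ["COG_D", "COG1009", "COG3002", "COG_B"]`, calling `neighborhood_similarity(g1, g2, "COG1009", radius=1)` compares `{"COG3002", "COG_B"}` with `{"COG_D", "COG3002"}`. In the abstract's terms, a high score would be more typical of orthologs and a low score of paralogs in dissimilar contexts.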

  20. Toxicogenomics: Applications of new functional genomics technologies in toxicology

    NARCIS (Netherlands)

    Heijne, W.H.M.

    2004-01-01

    Toxicogenomics studies toxic effects of substances on organisms in relation to the composition of the genome. It applies the functional genomics technologies transcriptomics, proteomics and metabolomics that determine expression of the genes, proteins and metabolites in a sample. These methods could

  1. A decade of human genome project conclusion: Scientific diffusion about our genome knowledge.

    Science.gov (United States)

    Moraes, Fernanda; Góes, Andréa

    2016-05-06

    The Human Genome Project (HGP) was initiated in 1990 and completed in 2003. It aimed to sequence the whole human genome. Although it represented an advance in understanding the human genome and its complexity, many questions remained unanswered. Other projects were launched in order to unravel the mysteries of our genome, including the ENCyclopedia of DNA Elements (ENCODE). This review aims to analyze the evolution of scientific knowledge related to both the HGP and ENCODE projects. Data were retrieved from scientific articles published in 1990-2014, a period comprising the development and the 10 years following the HGP completion. The fact that only 20,000 genes are protein and RNA-coding is one of the most striking HGP results. A new concept about the organization of genome arose. The ENCODE project was initiated in 2003 and targeted to map the functional elements of the human genome. This project revealed that the human genome is pervasively transcribed. Therefore, it was determined that a large part of the non-protein coding regions are functional. Finally, a more sophisticated view of chromatin structure emerged. The mechanistic functioning of the genome has been redrafted, revealing a much more complex picture. Besides, a gene-centric conception of the organism has to be reviewed. A number of criticisms have emerged against the ENCODE project approaches, raising the question of whether non-conserved but biochemically active regions are truly functional. Thus, HGP and ENCODE projects accomplished a great map of the human genome, but the data generated still requires further in depth analysis. © 2016 by The International Union of Biochemistry and Molecular Biology, 44:215-223, 2016.

  2. Exploring the function of protein kinases in schistosomes: perspectives from the laboratory and from comparative genomics

    Directory of Open Access Journals (Sweden)

    Anthony John Walker

    2014-07-01

    Full Text Available Eukaryotic protein kinases are well conserved through evolution. The genome of Schistosoma mansoni, which causes intestinal schistosomiasis, encodes over 250 putative protein kinases with all of the main eukaryotic groups represented. However, unraveling functional roles for these kinases is a considerable endeavour, particularly as protein kinases regulate multiple and sometimes overlapping cell and tissue functions in organisms. In this article, elucidating protein kinase signal transduction and function in schistosomes is considered from the perspective of the state-of-the-art methodologies used and comparative organismal biology, with a focus on current advances and future directions. Using the free-living nematode Caenorhabditis elegans as a comparator we predict roles for various schistosome protein kinases in processes vital for host invasion and successful parasitism such as sensory behaviour, growth and development. It is anticipated that the characterization of schistosome protein kinases in the context of parasite function will catalyze cutting edge research into host-parasite interactions and will reveal new targets for developing drug interventions against human schistosomiasis.

  3. Genomic organization, evolution, and expression of photoprotein and opsin genes in Mnemiopsis leidyi: a new view of ctenophore photocytes

    Directory of Open Access Journals (Sweden)

    Schnitzler Christine E

    2012-12-01

    Full Text Available Abstract Background: Calcium-activated photoproteins are luciferase variants found in photocyte cells of bioluminescent jellyfish (Phylum Cnidaria) and comb jellies (Phylum Ctenophora). The complete genomic sequence from the ctenophore Mnemiopsis leidyi, a representative of the earliest branch of animals that emit light, provided an opportunity to examine the genome of an organism that uses this class of luciferase for bioluminescence and to look for genes involved in light reception. To determine when photoprotein genes first arose, we examined the genomic sequence from other early-branching taxa. We combined our genomic survey with gene trees, developmental expression patterns, and functional protein assays of photoproteins and opsins to provide a comprehensive view of light production and light reception in Mnemiopsis. Results: The Mnemiopsis genome has 10 full-length photoprotein genes situated within two genomic clusters with high sequence conservation that are maintained due to strong purifying selection and concerted evolution. Photoprotein-like genes were also identified in the genomes of the non-luminescent sponge Amphimedon queenslandica and the non-luminescent cnidarian Nematostella vectensis, and phylogenomic analysis demonstrated that photoprotein genes arose at the base of all animals. Photoprotein gene expression in Mnemiopsis embryos begins during gastrulation in migrating precursors to photocytes and persists throughout development in the canals where photocytes reside. We identified three putative opsin genes in the Mnemiopsis genome and show that they do not group with well-known bilaterian opsin subfamilies. Interestingly, photoprotein transcripts are co-expressed with two of the putative opsins in developing photocytes. Opsin expression is also seen in the apical sensory organ. We present evidence that one opsin functions as a photopigment in vitro, absorbing light at wavelengths that overlap with peak photoprotein light

  4. Cloning, production, and purification of proteins for a medium-scale structural genomics project.

    Science.gov (United States)

    Quevillon-Cheruel, Sophie; Collinet, Bruno; Trésaugues, Lionel; Minard, Philippe; Henckes, Gilles; Aufrère, Robert; Blondeau, Karine; Zhou, Cong-Zhao; Liger, Dominique; Bettache, Nabila; Poupon, Anne; Aboulfath, Ilham; Leulliot, Nicolas; Janin, Joël; van Tilbeurgh, Herman

    2007-01-01

    The South-Paris Yeast Structural Genomics Pilot Project (http://www.genomics.eu.org) aims at systematically expressing, purifying, and determining the three-dimensional structures of Saccharomyces cerevisiae proteins. We have already cloned 240 yeast open reading frames in the Escherichia coli pET system. Eighty-two percent of the targets can be expressed in E. coli, and 61% yield soluble protein. We have currently purified 58 proteins. Twelve X-ray structures have been solved, six are in progress, and six other proteins gave crystals. In this chapter, we present the general experimental flowchart applied for this project. One of the main difficulties encountered in this pilot project was the low solubility of a great number of target proteins. We have developed parallel strategies to recover these proteins from inclusion bodies, including refolding, coexpression with chaperones, and an in vitro expression system. A limited proteolysis protocol, developed to localize flexible regions in proteins that could hinder crystallization, is also described.

  5. Integrating the genomic architecture of human nucleolar organizer regions with the biophysical properties of nucleoli.

    Science.gov (United States)

    Mangan, Hazel; Gailín, Michael Ó; McStay, Brian

    2017-12-01

    Nucleoli are the sites of ribosome biogenesis and the largest membraneless subnuclear structures. They are intimately linked with growth and proliferation control and function as sensors of cellular stress. Nucleoli form around arrays of ribosomal gene (rDNA) repeats also called nucleolar organizer regions (NORs). In humans, NORs are located on the short arms of all five human acrocentric chromosomes. Multiple NORs contribute to the formation of large heterochromatin-surrounded nucleoli observed in most human cells. Here we will review recent findings about their genomic architecture. The dynamic nature of nucleoli began to be appreciated with the advent of photodynamic experiments using fluorescent protein fusions. We review more recent data on nucleoli in Xenopus germinal vesicles (GVs) which has revealed a liquid droplet-like behavior that facilitates nucleolar fusion. Further analysis in both Xenopus GVs and Drosophila embryos indicates that the internal organization of nucleoli is generated by a combination of liquid-liquid phase separation and active processes involving rDNA. We will attempt to integrate these recent findings with the genomic architecture of human NORs to advance our understanding of how nucleoli form and respond to stress in human cells. © 2017 Federation of European Biochemical Societies.

  6. Chloroplast Genome Sequence of pigeonpea (Cajanus cajan (L.) Millspaugh) and Cajanus scarabaeoides: Genome organization and Comparison with other legumes

    Directory of Open Access Journals (Sweden)

    Tanvi Kaila

    2016-12-01

    Full Text Available Pigeonpea (Cajanus cajan (L.) Millspaugh), a diploid (2n = 22) legume crop with a genome size of 852 Mbp, serves as an important source of human dietary protein especially in South East Asian and African regions. In this study, the draft chloroplast genomes of Cajanus cajan and Cajanus scarabaeoides were sequenced. Cajanus scarabaeoides is an important species of the Cajanus gene pool and has also been used for developing a promising CMS system by different groups. A male sterile genotype harbouring the Cajanus scarabaeoides cytoplasm was used for sequencing the plastid genome. The cp genome of Cajanus cajan is 152,242 bp long, having a quadripartite structure with LSC of 83,455 bp and SSC of 17,871 bp separated by IRs of 25,398 bp. Similarly, the cp genome of Cajanus scarabaeoides is 152,201 bp long, having a quadripartite structure in which IRs of 25,402 bp length separate 83,423 bp of LSC and 17,854 bp of SSC. The pigeonpea cp genome contains 116 unique genes, including 30 tRNA, 4 rRNA, 78 predicted protein coding genes and 5 pseudogenes. A 50 kb inversion was observed in the LSC region of pigeonpea cp genome, consistent with other legumes. Comparison of cp genome with other legumes revealed the contraction of IR boundaries due to the absence of rps19 gene in the IR region. Chloroplast SSRs were mined and a total of 280 and 292 cpSSRs were identified in Cajanus scarabaeoides and Cajanus cajan respectively. RNA editing was observed at 37 sites in both Cajanus scarabaeoides and Cajanus cajan, with maximum occurrence in the ndh genes. The pigeonpea cp genome sequence would be beneficial in providing informative molecular markers which can be utilized for genetic diversity analysis and aid in understanding the plant systematics studies among major grain legumes.

  7. CAGO: a software tool for dynamic visual comparison and correlation measurement of genome organization.

    Directory of Open Access Journals (Sweden)

    Yi-Feng Chang

    Full Text Available CAGO (Comparative Analysis of Genome Organization) is developed to address two critical shortcomings of conventional genome atlas plotters: lack of dynamic exploratory functions and absence of signal analysis for genomic properties. With dynamic exploratory functions, users can directly manipulate chromosome tracks of a genome atlas and intuitively identify distinct genomic signals by visual comparison. Signal analysis of genomic properties can further detect inconspicuous patterns from noisy genomic properties and calculate correlations between genomic properties across various genomes. To implement dynamic exploratory functions, CAGO presents each genome atlas in Scalable Vector Graphics (SVG) format and allows users to interact with it using a SVG viewer through JavaScript. Signal analysis functions are implemented using R statistical software and a discrete wavelet transformation package waveslim. CAGO is not only a plotter for generating complex genome atlases, but also a platform for exploring genome atlases with dynamic exploratory functions for visual comparison and with signal analysis for comparing genomic properties across multiple organisms. The web-based application of CAGO, its source code, user guides, video demos, and live examples are publicly available and can be accessed at http://cbs.ym.edu.tw/cago.
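    The signal-analysis step described above (wavelet smoothing of a noisy genomic property followed by correlation across genomes) can be sketched roughly as follows. This is an illustration only, not CAGO's code: CAGO uses R with the waveslim package, whereas this sketch swaps in a single level of the simple Haar transform in plain Python, and the windowed G+C values for the two genomes are invented toy numbers. The names `haar_step` and `pearson` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail) coefficients. The sign convention of the
    detail coefficients differs between libraries; here it is first minus second.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy windowed G+C fractions along two hypothetical genomes (invented values).
gc_a = [0.40, 0.42, 0.55, 0.57, 0.41, 0.43, 0.60, 0.62]
gc_b = [0.35, 0.37, 0.50, 0.52, 0.36, 0.38, 0.56, 0.58]

smooth_a, _ = haar_step(gc_a)  # keep the coarse trend, discard fine detail
smooth_b, _ = haar_step(gc_b)
r = pearson(smooth_a, smooth_b)
print(round(r, 3))
```

    Correlating the approximation coefficients rather than the raw windows is one way to compare broad compositional trends while ignoring window-to-window noise; a real analysis would likely use several decomposition levels and a smoother wavelet.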

  8. Marine Genomics: A clearing-house for genomic and transcriptomic data of marine organisms

    Directory of Open Access Journals (Sweden)

    Trent Harold F

    2005-03-01

    Full Text Available Abstract Background: The Marine Genomics project is a functional genomics initiative developed to provide a pipeline for the curation of Expressed Sequence Tags (ESTs) and gene expression microarray data for marine organisms. It provides a unique clearing-house for marine specific EST and microarray data and is currently available at http://www.marinegenomics.org. Description: The Marine Genomics pipeline automates the processing, maintenance, storage and analysis of EST and microarray data for an increasing number of marine species. It currently contains 19 species databases (over 46,000 EST sequences) that are maintained by registered users from local and remote locations in Europe and South America in addition to the USA. A collection of analysis tools are implemented. These include a pipeline upload tool for EST FASTA file, sequence trace file and microarray data, an annotative text search, automated sequence trimming, sequence quality control (QA/QC) editing, sequence BLAST capabilities and a tool for interactive submission to GenBank. Another feature of this resource is the integration with a scientific computing analysis environment implemented by MATLAB. Conclusion: The conglomeration of multiple marine organisms with integrated analysis tools enables users to focus on the comprehensive descriptions of transcriptomic responses to typical marine stresses. This cross species data comparison and integration enables users to contain their research within a marine-oriented data management and analysis environment.

  9. Conservation and divergence of ADAM family proteins in the Xenopus genome

    Directory of Open Access Journals (Sweden)

    Shah Anoop

    2010-07-01

    Full Text Available Abstract Background: Members of the disintegrin metalloproteinase (ADAM) family play important roles in cellular and developmental processes through their functions as proteases and/or binding partners for other proteins. The amphibian Xenopus has long been used as a model for early vertebrate development, but genome-wide analyses for large gene families were not possible until the recent completion of the X. tropicalis genome sequence and the availability of large scale expression sequence tag (EST) databases. In this study we carried out a systematic analysis of the X. tropicalis genome and uncovered several interesting features of ADAM genes in this species. Results: Based on the X. tropicalis genome sequence and EST databases, we identified Xenopus orthologues of mammalian ADAMs and obtained full-length cDNA clones for these genes. The deduced protein sequences, synteny and exon-intron boundaries are conserved between most human and X. tropicalis orthologues. The alternative splicing patterns of certain Xenopus ADAM genes, such as adams 22 and 28, are similar to those of their mammalian orthologues. However, we were unable to identify an orthologue for ADAM7 or 8. The Xenopus orthologue of ADAM15, an active metalloproteinase in mammals, does not contain the conserved zinc-binding motif and is hence considered proteolytically inactive. We also found evidence for gain of ADAM genes in Xenopus as compared to other species. There is a homologue of ADAM10 in Xenopus that is missing in most mammals. Furthermore, a single scaffold of X. tropicalis genome contains four genes encoding ADAM28 homologues, suggesting genome duplication in this region. Conclusions: Our genome-wide analysis of ADAM genes in X. tropicalis revealed both conservation and evolutionary divergence of these genes in this amphibian species. On the one hand, all ADAMs implicated in normal development and health in other species are conserved in X. tropicalis. On the other hand, some

  10. Comparative and functional genomics of Legionella identified eukaryotic like proteins as key players in host-pathogen interactions

    Directory of Open Access Journals (Sweden)

    Laura Gomez-Valero

    2011-10-01

    Full Text Available Although best known for its ability to cause severe pneumonia in people whose immune defenses are weakened, Legionella pneumophila and Legionella longbeachae are two species of a large genus of bacteria that are ubiquitous in nature, where they parasitize protozoa. Adaptation to the host environment and exploitation of host cell functions are critical for the success of these intracellular pathogens. The establishment and publication of the complete genome sequences of L. pneumophila and L. longbeachae isolates paved the way for major breakthroughs in understanding the biology of these organisms. In this review we present the knowledge gained from the analyses and comparison of the complete genome sequences of different L. pneumophila and L. longbeachae strains. Emphasis is given on putative virulence and Legionella life cycle related functions, such as the identification of an extended array of eukaryotic-like proteins, many of which have been shown to modulate host cell functions to the pathogen's advantage. Surprisingly, many of the eukaryotic domain proteins identified in L. pneumophila as well as many substrates of the Dot/Icm type IV secretion system essential for intracellular replication are different between these two species, although they cause the same disease. Finally, evolutionary aspects regarding the eukaryotic like proteins in Legionella are discussed.

  11. Modeling heterogeneous (co)variances from adjacent-SNP groups improves genomic prediction for milk protein composition traits

    DEFF Research Database (Denmark)

    Gebreyesus, Grum; Lund, Mogens Sandø; Buitenhuis, Albert Johannes

    2017-01-01

    Accurate genomic prediction requires a large reference population, which is problematic for traits that are expensive to measure. Traits related to milk protein composition are not routinely recorded due to costly procedures and are considered to be controlled by a few quantitative trait loci...... of large effect. The amount of variation explained may vary between regions leading to heterogeneous (co)variance patterns across the genome. Genomic prediction models that can efficiently take such heterogeneity of (co)variances into account can result in improved prediction reliability. In this study, we...... developed and implemented novel univariate and bivariate Bayesian prediction models, based on estimates of heterogeneous (co)variances for genome segments (BayesAS). Available data consisted of milk protein composition traits measured on cows and de-regressed proofs of total protein yield derived for bulls...

  12. Complete genome sequence of Halorhodospira halophila SL1

    Energy Technology Data Exchange (ETDEWEB)

    Challacombe, Jean F [ORNL; Majid, Sophia [University of Chicago; Deole, Ratnakar [Oklahoma State University; Brettin, Thomas S. [Argonne National Laboratory (ANL); Bruce, David [Los Alamos National Laboratory (LANL); Delano, Susana [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Gleasner, Cheryl D. [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Misra, Monica [Los Alamos National Laboratory (LANL); Reitenga, Krista K. [Los Alamos National Laboratory (LANL); Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Hoff, Wouter D. [Oklahoma State University

    2013-01-01

    Halorhodospira halophila is among the most halophilic organisms known. It is an obligately photosynthetic and anaerobic purple sulfur bacterium that exhibits autotrophic growth up to saturated NaCl concentrations. The type strain H. halophila SL1 was isolated from a hypersaline lake in Oregon. Here we report the determination of its entire genome in a single contig. This is the first genome of a phototrophic extreme halophile. The genome consists of 2,678,452 bp, encoding 2493 predicted genes as determined by automated genome annotation. Of the 2407 predicted proteins, 1905 were assigned to a putative function. Future detailed analysis of this genome promises to yield insights into the halophilic adaptations of this organism, its ability for photoautotrophic growth under extreme conditions, and its characteristic sulfur metabolism.

  13. Protein linguistics - a grammar for modular protein assembly?

    Science.gov (United States)

    Gimona, Mario

    2006-01-01

    The correspondence between biology and linguistics at the level of sequence and lexical inventories, and of structure and syntax, has fuelled attempts to describe genome structure by the rules of formal linguistics. But how can we define protein linguistic rules? And how could compositional semantics improve our understanding of protein organization and functional plasticity?

  14. PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

    Science.gov (United States)

    Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

    2015-12-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.
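    The pan- and core-genome notions used above reduce to simple set operations once genes have been grouped into families. The sketch below is not PanCoreGen's algorithm (which the abstract does not detail); it assumes each isolate is already represented as a set of gene-family labels, and the isolate and gene names are hypothetical.

```python
from typing import Dict, Set, Tuple


def pan_and_core(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, accessory) gene sets for a collection of isolates.

    pan       = genes present in at least one isolate (union)
    core      = genes present in every isolate (intersection)
    accessory = pan genes that are missing from at least one isolate
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set(), set()
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core, pan - core


# Hypothetical isolates and gene-family labels, for illustration only.
isolates = {
    "isolate_1": {"geneA", "geneB", "geneC"},
    "isolate_2": {"geneA", "geneB", "geneD"},
    "isolate_3": {"geneA", "geneB", "geneC", "geneE"},
}

pan, core, accessory = pan_and_core(isolates)
print(sorted(pan), sorted(core), sorted(accessory))
```

    In practice the hard part is deciding which genes in different isolates count as the same family (for example by sequence similarity thresholds), which this sketch takes as given.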

  15. Identification, characterization and metagenome analysis of oocyte-specific genes organized in clusters in the mouse genome

    Directory of Open Access Journals (Sweden)

    Vaiman Daniel

    2005-05-01

    Full Text Available Abstract Background: Genes specifically expressed in the oocyte play key roles in oogenesis, ovarian folliculogenesis, fertilization and/or early embryonic development. In an attempt to identify novel oocyte-specific genes in the mouse, we have used an in silico subtraction methodology, and we have focused our attention on genes that are organized in genomic clusters. Results: In the present work, five clusters have been studied: a cluster of thirteen genes characterized by an F-box domain localized on chromosome 9, a cluster of six genes related to T-cell leukaemia/lymphoma protein 1 (Tcl1) on chromosome 12, a cluster composed of a SPErm-associated glutamate (E)-Rich (Speer) protein expressed in the oocyte in the vicinity of four unknown genes specifically expressed in the testis on chromosome 14, a cluster composed of the oocyte secreted protein-1 (Oosp-1) gene and two Oosp-related genes on chromosome 19, all three being characterized by a partial N-terminal zona pellucida-like domain, and another small cluster of two genes on chromosome 19 as well, composed of a TWIK-Related spinal cord K+ channel encoding-gene, and an unknown gene predicted in silico to be testis-specific. The specificity of expression was confirmed by RT-PCR and in situ hybridization for eight and five of them, respectively. Finally, we showed by comparing all of the isolated and clustered oocyte-specific genes identified so far in the mouse genome, that the oocyte-specific clusters are significantly closer to telomeres than isolated oocyte-specific genes are. Conclusion: We have studied five clusters of genes specifically expressed in female, some of them being also expressed in male germ-cells. Moreover, contrarily to non-clustered oocyte-specific genes, those that are organized in clusters tend to map near chromosome ends, suggesting that this specific near-telomere position of oocyte-clusters in rodents could constitute an evolutionary advantage.
Understanding the biological

  16. Genome-scale modeling of the protein secretory machinery in yeast

    DEFF Research Database (Denmark)

    Feizi, Amir; Österlund, Tobias; Petranovic, Dina

    2013-01-01

    The protein secretory machinery in Eukarya is involved in post-translational modifications (PTMs) and sorting of the secretory and many transmembrane proteins. While the secretory machinery has been well-studied using classic reductionist approaches, a holistic view of its complex nature is lacking....... Here, we present the first genome-scale model for the yeast secretory machinery which captures the knowledge generated through more than 50 years of research. The model is based on the concept of a Protein Specific Information Matrix (PSIM: characterized by seven PTM features). An algorithm...

  17. Split photosystem protein, linear-mapping topology, and growth of structural complexity in the plastid genome of Chromera velia

    KAUST Repository

    Janouškovec, Jan

    2013-08-22

    The canonical photosynthetic plastid genomes consist of a single circular-mapping chromosome that encodes a highly conserved protein core, involved in photosynthesis and ATP generation. Here, we demonstrate that the plastid genome of the photosynthetic relative of apicomplexans, Chromera velia, departs from this view in several unique ways. Core photosynthesis proteins PsaA and AtpB have been broken into two fragments, which we show are independently transcribed, oligoU-tailed, translated, and assembled into functional photosystem I and ATP synthase complexes. Genome-wide transcription profiles support expression of many other highly modified proteins, including several that contain extensions amounting to hundreds of amino acids in length. Canonical gene clusters and operons have been fragmented and reshuffled into novel putative transcriptional units. Massive genomic coverage by paired-end reads, coupled with pulsed-field gel electrophoresis and polymerase chain reaction, consistently indicate that the C. velia plastid genome is linear-mapping, a unique state among all plastids. Abundant intragenomic duplication probably mediated by recombination can explain protein splits, extensions, and genome linearization and is perhaps the key driving force behind the many features that defy the conventional ways of plastid genome architecture and function. © The Author 2013.

  18. Expanded microbial genome coverage and improved protein family annotation in the COG database.

    Science.gov (United States)

    Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. 
The new version of the

  19. Searching for genomic constraints

    Energy Technology Data Exchange (ETDEWEB)

    Lio', P [Cambridge, Univ. (United Kingdom). Genetics Dept.]; Ruffo, S [Florence, Univ. (Italy). Fac. di Ingegneria. Dipt. di Energetica 'S. Stecco']

    1998-01-01

    The authors have analyzed general properties of very long DNA sequences belonging to simple and complex organisms, by using different correlation methods. They have distinguished those base compositional rules that concern the entire genome, which they call 'genomic constraints', from the rules that depend on the 'external natural selection' acting on single genes, i.e. protein-centered constraints. They show that G + C content, purine/pyrimidine distributions and biological complexity of the organism are the most important factors which determine base compositional rules and genome complexity. Three main facts are here reported: bacteria with high G + C content have more restrictions on base composition than those with low G + C content; at constant G + C content more complex organisms, ranging from prokaryotes to higher eukaryotes (e.g. human) display an increase of repeats 10-20 nucleotides long, which are also partly responsible for long-range correlations; word selection of length 3 to 10 is stronger in human and in bacteria for two distinct reasons. With respect to previous studies, they have also compared the genomic sequence of the archaeon Methanococcus jannaschii with those of bacteria and eukaryotes: it shows sometimes an intermediate statistical behaviour.
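    The two basic quantities this abstract relies on, G + C content and the frequency of short nucleotide "words", are straightforward to compute. The sketch below is only an illustration of those quantities, not the authors' correlation methods; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of `seq`."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / total


def word_counts(seq: str, k: int) -> Counter:
    """Count overlapping words of length `k`, skipping any with ambiguous bases."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    valid = set("ACGT")
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )


toy = "ATGCGCGCATATGC"  # invented sequence, for illustration only
gc = gc_content(toy)
words = word_counts(toy, 3)
print(round(gc, 3), words.most_common(4))
```

    Comparing observed word counts with those expected from base composition alone is one common way to quantify over- or under-represented words, which is the kind of selection on words of length 3 to 10 the abstract refers to.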

  20. Searching for genomic constraints

    International Nuclear Information System (INIS)

    Lio', P.; Ruffo, S.

    1998-01-01

    The authors have analyzed general properties of very long DNA sequences belonging to simple and complex organisms, using different correlation methods. They have distinguished the base compositional rules that concern the entire genome, which they call 'genomic constraints', from the rules that depend on the 'external natural selection' acting on single genes, i.e. protein-centered constraints. They show that G + C content, purine/pyrimidine distributions and the biological complexity of the organism are the most important factors determining base compositional rules and genome complexity. Three main facts are reported here: bacteria with high G + C content have more restrictions on base composition than those with low G + C content; at constant G + C content, more complex organisms, ranging from prokaryotes to higher eukaryotes (e.g. human), display an increase of repeats 10-20 nucleotides long, which are also partly responsible for long-range correlations; word selection of length 3 to 10 is stronger in human and in bacteria, for two distinct reasons. With respect to previous studies, they have also compared the genomic sequence of the archaeon Methanococcus jannaschii with those of bacteria and eukaryotes: it sometimes shows an intermediate statistical behaviour.

  1. The genome and transcriptome of Phalaenopsis yield insights into floral organ development and flowering regulation

    Directory of Open Access Journals (Sweden)

    Jian-Zhi Huang

    2016-05-01

    Full Text Available The Phalaenopsis orchid is an important potted flower of high economic value around the world. We report the 3.1 Gb draft genome assembly of an important winter flowering Phalaenopsis ‘KHM190’ cultivar. We generated 89.5 Gb of RNA-seq data and 113 million sRNA-seq reads, and used these data to identify 41,153 protein-coding genes and 188 miRNA families. We also generated a draft genome for Phalaenopsis pulcherrima ‘B8802,’ a summer flowering species, via resequencing. Comparison of genome data between the two Phalaenopsis cultivars allowed the identification of 691,532 single-nucleotide polymorphisms. In this study, we reveal that the key role of PhAGL6b in the regulation of labellum organ development involves alternative splicing in the big lip mutant. Overexpression of PhAGL6b in petals or sepals leads to their conversion into a lip-like structure. We also discovered that the gibberellin pathway that regulates the expression of flowering time genes during the reproductive phase change is induced by cool temperature. Our work thus provides a valuable resource for flowering control, flower architecture development, and breeding of Phalaenopsis orchids.

  2. Structural analysis of a set of proteins resulting from a bacterial genomics project.

    Science.gov (United States)

    Badger, J; Sauder, J M; Adams, J M; Antonysamy, S; Bain, K; Bergseid, M G; Buchanan, S G; Buchanan, M D; Batiyenko, Y; Christopher, J A; Emtage, S; Eroshkina, A; Feil, I; Furlong, E B; Gajiwala, K S; Gao, X; He, D; Hendle, J; Huber, A; Hoda, K; Kearins, P; Kissinger, C; Laubert, B; Lewis, H A; Lin, J; Loomis, K; Lorimer, D; Louie, G; Maletic, M; Marsh, C D; Miller, I; Molinari, J; Muller-Dieckmann, H J; Newman, J M; Noland, B W; Pagarigan, B; Park, F; Peat, T S; Post, K W; Radojicic, S; Ramos, A; Romero, R; Rutter, M E; Sanderson, W E; Schwinn, K D; Tresser, J; Winhoven, J; Wright, T A; Wu, L; Xu, J; Harris, T J R

    2005-09-01

    The targets of the Structural GenomiX (SGX) bacterial genomics project were proteins conserved in multiple prokaryotic organisms with no obvious sequence homolog in the Protein Data Bank of known structures. The outcome of this work was 80 structures, covering 60 unique sequences and 49 different genes. Experimental phase determination from proteins incorporating Se-Met was carried out for 45 structures with most of the remainder solved by molecular replacement using members of the experimentally phased set as search models. An automated tool was developed to deposit these structures in the Protein Data Bank, along with the associated X-ray diffraction data (including refined experimental phases) and experimentally confirmed sequences. BLAST comparisons of the SGX structures with structures that had appeared in the Protein Data Bank over the intervening 3.5 years since the SGX target list had been compiled identified homologs for 49 of the 60 unique sequences represented by the SGX structures. This result indicates that, for bacterial structures that are relatively easy to express, purify, and crystallize, the structural coverage of gene space is proceeding rapidly. More distant sequence-structure relationships between the SGX and PDB structures were investigated using PDB-BLAST and Combinatorial Extension (CE). Only one structure, SufD, has a truly unique topology compared to all folds in the PDB. Copyright 2005 Wiley-Liss, Inc.

  3. Rubella virus capsid protein modulation of viral genomic and subgenomic RNA synthesis

    International Nuclear Information System (INIS)

    Tzeng, W.-P.; Frey, Teryl K.

    2005-01-01

    The ratio of the subgenomic (SG) to genome RNA synthesized by rubella virus (RUB) replicons expressing the green fluorescent protein reporter gene (RUBrep/GFP) is substantially higher than the ratio of these species synthesized by RUB (4.3 for RUBrep/GFP vs. 1.3-1.4 for RUB). It was hypothesized that this modulation of the viral RNA synthesis was by one of the virus structural protein genes, and it was found that introduction of the capsid (C) protein gene into the replicons as an in-frame fusion with GFP resulted in an increase of genomic RNA production (reducing the SG/genome RNA ratio), confirming the hypothesis and showing that the C gene was the moiety responsible for the modulation effect. The N-terminal one-third of the C gene was required for the effect to be exhibited. A similar phenomenon was not observed with the replicons of Sindbis virus, a related Alphavirus. Interestingly, modulation was not observed when RUBrep/GFP was co-transfected with either other RUBrep or plasmid constructs expressing the C gene, demonstrating that modulation could occur only when the C gene was provided in cis. Mutations that prevented translation of the C protein failed to modulate RNA synthesis, indicating that the C protein was the moiety responsible for modulation; consistent with this conclusion, modulation of RNA synthesis was maintained when synonymous codon mutations were introduced at the 5' end of the C gene that changed the C gene sequence without altering the amino acid sequence of the C protein. These results indicate that C protein translated in proximity to viral replication complexes, possibly from newly synthesized SG RNA, participates in regulating the replication of viral RNA.

  4. Molecular basis for the genome engagement by Sox proteins.

    Science.gov (United States)

    Hou, Linlin; Srivastava, Yogesh; Jauch, Ralf

    2017-03-01

    The Sox transcription factor family consists of 20 members in the human genome. Many of them are key determinants of cellular identities and possess the capacity to reprogram cell fates by pioneering the epigenetic remodeling of the genome. This activity is intimately tied to their ability to specifically bind and bend DNA alone or with other proteins. Here we discuss our current knowledge on how Sox transcription factors such as Sox2, Sox17, Sox18 and Sox9 'read' the genome to find and regulate their target genes and highlight the roles of partner factors including Pax6, Nanog, Oct4 and Brn2. We integrate insights from structural and biochemical studies as well as high-throughput assays to probe DNA specificity in vitro as well as in cells and tissues. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.

  5. Comparison of Various Nuclear Localization Signal-Fused Cas9 Proteins and Cas9 mRNA for Genome Editing in Zebrafish.

    Science.gov (United States)

    Hu, Peinan; Zhao, Xueying; Zhang, Qinghua; Li, Weiming; Zu, Yao

    2018-03-02

    The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system has been proven to be an efficient and precise genome editing technology in various organisms. However, the gene editing efficiencies of Cas9 proteins with a nuclear localization signal (NLS) fused to different termini and Cas9 mRNA have not been systematically compared. Here, we compared the ability of Cas9 proteins with NLS fused to the N-, C-, or both the N- and C-termini and N-NLS-Cas9-NLS-C mRNA to target two sites in the tyr gene and two sites in the gol gene related to pigmentation in zebrafish. Phenotypic analysis revealed that all types of Cas9 led to hypopigmentation in similar proportions of injected embryos. Genome analysis by T7 Endonuclease I (T7E1) assays demonstrated that all types of Cas9 similarly induced mutagenesis in four target sites. Sequencing results further confirmed that a high frequency of indels occurred in the target sites (tyr1 > 66%, tyr2 > 73%, gol1 > 50%, and gol2 > 35%), and that various types (more than six) of indel mutations were observed in all four types of Cas9-injected embryos. Furthermore, all types of Cas9 showed efficient targeted mutagenesis on multiplex genome editing, resulting in multiple phenotypes simultaneously. Collectively, we conclude that various NLS-fused Cas9 proteins and Cas9 mRNAs have similar genome editing efficiencies on targeting single or multiple genes, suggesting that the efficiency of CRISPR/Cas9 genome editing is highly dependent on guide RNAs (gRNAs) and gene loci. These findings may help to simplify the selection of Cas9 for gene editing using the CRISPR/Cas9 system. Copyright © 2018 Hu et al.

  6. The Genomic Code: Genome Evolution and Potential Applications

    KAUST Repository

    Bernardi, Giorgio

    2016-01-25

    The genome of metazoans is organized according to a genomic code which comprises three laws: 1) Compositional correlations hold between contiguous coding and non-coding sequences, as well as among the three codon positions of protein-coding genes; these correlations are the consequence of the fact that the genomes under consideration consist of fairly homogeneous, long (≥200Kb) sequences, the isochores; 2) Although isochores are defined on the basis of purely compositional properties, GC levels of isochores are correlated with all tested structural and functional properties of the genome; 3) GC levels of isochores are correlated with chromosome architecture from interphase to metaphase; in the case of interphase the correlation concerns isochores and the three-dimensional “topological associated domains” (TADs); in the case of mitotic chromosomes, the correlation concerns isochores and chromosomal bands. Finally, the genomic code is the fourth and last pillar of molecular biology, the first three pillars being 1) the double helix structure of DNA; 2) the regulation of gene expression in prokaryotes; and 3) the genetic code.

  7. Genomic definition of species. Revision 1

    Energy Technology Data Exchange (ETDEWEB)

    Crkvenjakov, R.; Drmanac, R.

    1992-06-01

    A genome is the sum total of the DNA sequences in the cells of an individual organism. The common usage that species possess genomes comes naturally to biochemists, who have shown that all protein and nucleic acid molecules are at the same time species- and individual-specific, with minor individual variations being superimposed on a consensus sequence that is constant for a species. By extension, this property is attributed to the common features of DNA in the chromosomes of members of a given species and is called (species) genome. The definition of species based on chromosomes, genes, or genome common to its member organisms has been implied or mentioned in passing numerous times. Some population biologists think that members of species have similar "homeostatic genotypes," which are to a degree resistant to mutation or environmental change in the production of a basic phenotype.

  8. Comprehensive analysis of LANA interacting proteins essential for viral genome tethering and persistence.

    Directory of Open Access Journals (Sweden)

    Subhash C Verma

    Full Text Available Kaposi's sarcoma associated herpesvirus (KSHV) is tightly linked to multiple human malignancies including Kaposi's sarcoma (KS), Primary Effusion Lymphoma (PEL) and Multicentric Castleman's Disease (MCD). KSHV, like other herpesviruses, establishes life-long latency in the infected host by persisting as chromatin and tethering to host chromatin through the virally encoded protein Latency Associated Nuclear Antigen (LANA). LANA, a multifunctional protein, is capable of binding to a large number of cellular proteins responsible for transcriptional regulation of various cellular and viral pathways involved in blocking cell death and promoting cell proliferation. This leads to enhanced cell division and replication of the viral genome, which segregates faithfully in the dividing tumor cells. The mechanism of genome segregation is well known and the binding of LANA to nucleosomal proteins, throughout the cell cycle, suggests that these interactions play an important role in efficient segregation. Various biochemical methods have identified a large number of LANA binding proteins, including histone H2A/H2B, histone H1, MeCP2, DEK, CENP-F, NuMA, Bub1, HP-1, and Brd4. These nucleosomal proteins may have various functions in tethering of the viral genome during specific phases of the viral life cycle. Therefore, we performed a comprehensive analysis of their interaction with LANA using a number of different assays. We show that LANA binds to core nucleosomal histones and also associates with other host chromatin proteins including histone H1 and high mobility group proteins (HMGs). We used various biochemical assays including co-immunoprecipitation and in-vivo localization by split GFP and fluorescence resonance energy transfer (FRET) to demonstrate their association.

  9. Proteins Encoded in Genomic Regions Associated with Immune-Mediated Disease Physically Interact and Suggest Underlying Biology

    DEFF Research Database (Denmark)

    Rossin, Elizabeth J.; Hansen, Kasper Lage; Raychaudhuri, Soumya

    2011-01-01

    Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed ... in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein-protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more ... that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method in 3 non...

  10. Were protein internal repeats formed by "bricolage"?

    Science.gov (United States)

    Lavorgna, G; Patthy, L; Boncinelli, E

    2001-03-01

    Is evolution an engineer, or is it a tinkerer--a "bricoleur"--building up complex molecules in organisms by increasing and adapting the materials at hand? An analysis of completely sequenced genomes suggests the latter, showing that increasing repetition of modules within the proteins encoded by these genomes is correlated with increasing complexity of the organism.

  11. Chromatin structure and dynamics in hot environments: architectural proteins and DNA topoisomerases of thermophilic archaea.

    Science.gov (United States)

    Visone, Valeria; Vettone, Antonella; Serpe, Mario; Valenti, Anna; Perugino, Giuseppe; Rossi, Mosè; Ciaramella, Maria

    2014-09-25

    In all organisms of the three living domains (Bacteria, Archaea, Eucarya) chromosome-associated proteins play a key role in genome functional organization. They not only compact and shape the genome structure, but also regulate its dynamics, which is essential to allow complex genome functions. Elucidation of chromatin composition and regulation is a critical issue in biology, because of the intimate connection of chromatin with all the essential information processes (transcription, replication, recombination, and repair). Chromatin proteins include architectural proteins and DNA topoisomerases, which regulate genome structure and remodelling at two hierarchical levels. This review is focussed on architectural proteins and topoisomerases from hyperthermophilic Archaea. In these organisms, which live at high environmental temperature (>80 °C <113 °C), chromatin proteins and modulation of the DNA secondary structure are concerned with the problem of DNA stabilization against heat denaturation while maintaining its metabolic activity.

  12. Chromatin Structure and Dynamics in Hot Environments: Architectural Proteins and DNA Topoisomerases of Thermophilic Archaea

    Directory of Open Access Journals (Sweden)

    Valeria Visone

    2014-09-01

    Full Text Available In all organisms of the three living domains (Bacteria, Archaea, Eucarya) chromosome-associated proteins play a key role in genome functional organization. They not only compact and shape the genome structure, but also regulate its dynamics, which is essential to allow complex genome functions. Elucidation of chromatin composition and regulation is a critical issue in biology, because of the intimate connection of chromatin with all the essential information processes (transcription, replication, recombination, and repair). Chromatin proteins include architectural proteins and DNA topoisomerases, which regulate genome structure and remodelling at two hierarchical levels. This review is focussed on architectural proteins and topoisomerases from hyperthermophilic Archaea. In these organisms, which live at high environmental temperature (>80 °C <113 °C), chromatin proteins and modulation of the DNA secondary structure are concerned with the problem of DNA stabilization against heat denaturation while maintaining its metabolic activity.

  13. Genomic definition of species. Revision 2

    Energy Technology Data Exchange (ETDEWEB)

    Crkvenjakov, R.; Drmanac, R.

    1993-03-01

    A genome is the sum total of the DNA sequences in the cells of an individual organism. The common usage that species possess genomes comes naturally to biochemists, who have shown that all protein and nucleic acid molecules are at the same time species- and individual-specific, with minor individual variations being superimposed on a consensus sequence that is constant for a species. By extension, this property is attributed to the common features of DNA in the chromosomes of members of a given species and is called species genome. Our proposal for the definition of a biological species is as follows: A species comprises a group of actual and potential biological organisms built according to a unique genome program that is recorded, and at least in part expressed, in the structures of their genomic nucleic acid molecule(s), having intragroup sequence differences which can be fully interconverted in the process of organismal reproduction.

  14. Protein Charge and Mass Contribute to the Spatio-temporal Dynamics of Protein-Protein Interactions in a Minimal Proteome

    Science.gov (United States)

    Xu, Yu; Wang, Hong; Nussinov, Ruth; Ma, Buyong

    2013-01-01

    We constructed and simulated a ‘minimal proteome’ model using Langevin dynamics. It contains 206 essential protein types which were compiled from the literature. For comparison, we generated six proteomes with randomized concentrations. We found that the net charges and molecular weights of the proteins in the minimal genome are not random. The net charge of a protein decreases linearly with molecular weight, with small proteins being mostly positively charged and large proteins negatively charged. The protein copy numbers in the minimal genome have the tendency to maximize the number of protein-protein interactions in the network. Negatively charged proteins, which tend to have larger sizes, can provide a large collision cross-section allowing them to interact with other proteins; on the other hand, the smaller positively charged proteins could have higher diffusion speed and are more likely to collide with other proteins. Proteomes with random charge/mass populations form less stable clusters than those with experimental protein copy numbers. Our study suggests that ‘proper’ populations of negatively and positively charged proteins are important for maintaining a protein-protein interaction network in a proteome. It is interesting to note that the minimal genome model based on the charge and mass of E. coli may have a larger protein-protein interaction network than that based on the lower organism M. pneumoniae. PMID:23420643

  15. Genomic minimalism in the early diverging intestinal parasite Giardia lamblia.

    Science.gov (United States)

    Morrison, Hilary G; McArthur, Andrew G; Gillin, Frances D; Aley, Stephen B; Adam, Rodney D; Olsen, Gary J; Best, Aaron A; Cande, W Zacheus; Chen, Feng; Cipriano, Michael J; Davids, Barbara J; Dawson, Scott C; Elmendorf, Heidi G; Hehl, Adrian B; Holder, Michael E; Huse, Susan M; Kim, Ulandt U; Lasek-Nesselquist, Erica; Manning, Gerard; Nigam, Anuranjini; Nixon, Julie E J; Palm, Daniel; Passamaneck, Nora E; Prabhu, Anjali; Reich, Claudia I; Reiner, David S; Samuelson, John; Svard, Staffan G; Sogin, Mitchell L

    2007-09-28

    The genome of the eukaryotic protist Giardia lamblia, an important human intestinal parasite, is compact in structure and content, contains few introns or mitochondrial relics, and has simplified machinery for DNA replication, transcription, RNA processing, and most metabolic pathways. Protein kinases comprise the single largest protein class and reflect Giardia's requirement for a complex signal transduction network for coordinating differentiation. Lateral gene transfer from bacterial and archaeal donors has shaped Giardia's genome, and previously unknown gene families, for example, cysteine-rich structural proteins, have been discovered. Unexpectedly, the genome shows little evidence of heterozygosity, supporting recent speculations that this organism is sexual. This genome sequence will not only be valuable for investigating the evolution of eukaryotes, but will also be applied to the search for new therapeutics for this parasite.

  16. Short Toxin-like Proteins Abound in Cnidaria Genomes

    Directory of Open Access Journals (Sweden)

    Michal Linial

    2012-11-01

    Full Text Available Cnidaria is a rich phylum that includes thousands of marine species. In this study, we focused on Anthozoa and Hydrozoa that are represented by the Nematostella vectensis (sea anemone) and Hydra magnipapillata genomes. We present a method for ranking the toxin-like candidates from complete proteomes of Cnidaria. Toxin-like functions were revealed using ClanTox, a statistical machine-learning predictor trained on ion channel inhibitors from venomous animals. Fundamental features that were emphasized in training ClanTox include cysteines and their spacing along the sequences. Among the 83,000 proteins derived from Cnidaria representatives, we found 170 candidates that fulfill the properties of toxin-like proteins, the vast majority of which were previously unrecognized as toxins. An additional 394 short proteins exhibit characteristics of toxin-like proteins at a moderate degree of confidence. Remarkably, only 11% of the predicted toxin-like proteins were previously classified as toxins. Based on our prediction methodology and manual annotation, we inferred functions for over 400 of these proteins. Such functions include protease inhibitors, membrane pore formation, ion channel blockers and metal binding proteins. Many of the proteins belong to small families of paralogs. We conclude that the evolutionary expansion of toxin-like proteins in Cnidaria contributes to their fitness in the complex environment of the aquatic ecosystem.

  17. Coevolution analysis of Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins

    Science.gov (United States)

    Champeimont, Raphaël; Laine, Elodie; Hu, Shuang-Wei; Penin, Francois; Carbone, Alessandra

    2016-05-01

    A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks.

  18. Structural Genomics and Drug Discovery for Infectious Diseases

    International Nuclear Information System (INIS)

    Anderson, W.F.

    2009-01-01

    The application of structural genomics methods and approaches to proteins from organisms causing infectious diseases is making available the three dimensional structures of many proteins that are potential drug targets and laying the groundwork for structure aided drug discovery efforts. There are a number of structural genomics projects with a focus on pathogens that have been initiated worldwide. The Center for Structural Genomics of Infectious Diseases (CSGID) was recently established to apply state-of-the-art high throughput structural biology technologies to the characterization of proteins from the National Institute of Allergy and Infectious Diseases (NIAID) category A-C pathogens and organisms causing emerging or re-emerging infectious diseases. The target selection process emphasizes potential biomedical benefits. Selected proteins include known drug targets and their homologs, essential enzymes, virulence factors and vaccine candidates. The Center also provides a structure determination service for the infectious disease scientific community. The ultimate goal is to generate a library of structures that are available to the scientific community and can serve as a starting point for further research and structure aided drug discovery for infectious diseases. To achieve this goal, the CSGID will determine protein crystal structures of 400 proteins and protein-ligand complexes using proven, rapid, highly integrated, and cost-effective methods for such determination, primarily by X-ray crystallography. High throughput crystallographic structure determination is greatly aided by frequent, convenient access to high-performance beamlines at third-generation synchrotron X-ray sources.

  19. The footprint of metabolism in the organization of mammalian genomes

    Directory of Open Access Journals (Sweden)

    Berná Luisa

    2012-05-01

    Full Text Available Abstract Background: At present five evolutionary hypotheses have been proposed to explain the great variability of the genomic GC content among and within genomes: the mutational bias, the biased gene conversion, the DNA breakpoints distribution, the thermal stability and the metabolic rate. Several studies carried out on bacteria and teleostean fish pointed towards the critical role played by the environment on the metabolic rate in shaping the base composition of genomes. In mammals the debate is still open, and evidence has been produced in favor of each evolutionary hypothesis. Human genes were assigned to three large functional categories (as well as to the corresponding functional classes) according to the KOG database: (i) information storage and processing, (ii) cellular processes and signaling, and (iii) metabolism. The classification was extended to the organisms so far analyzed by performing a reciprocal Blastp and selecting the best reciprocal hit. The base composition was calculated for each sequence of the whole CDS dataset. Results: The GC3 level of the above functional categories increased from (i) to (iii). This specific compositional pattern was found, as a footprint, in all mammalian genomes, but not in frog and lizard ones. Comparative analysis of human versus both frog and lizard functional categories showed that genes involved in the metabolic processes underwent the highest GC3 increment. Analyzing the KOG functional classes of genes, again a well defined intra-genomic pattern was found in all mammals. Not only genes of metabolic pathways, but also genes involved in chromatin structure and dynamics, transcription, signal transduction mechanisms and cytoskeleton, showed an average GC3 level higher than that of the whole genome. In the case of the human genome, the genes of the aforementioned functional categories showed a high probability to be associated with the chromosomal bands. Conclusions: In the light of different

  20. From the genome sequence to the protein inventory of Bacillus subtilis.

    Science.gov (United States)

    Becher, Dörte; Büttner, Knut; Moche, Martin; Hessling, Bernd; Hecker, Michael

    2011-08-01

    Owing to the low number of proteins necessary to render a bacterial cell viable, bacteria are extremely attractive model systems to understand how the genome sequence is translated into actual life processes. One of the most intensively investigated model organisms is Bacillus subtilis. It has attracted world-wide research interest, addressing cell differentiation and adaptation on a molecular scale as well as biotechnological production processes. Meanwhile, we are looking back on more than 25 years of B. subtilis proteomics. A wide range of methods have been developed during this period for the large-scale qualitative and quantitative proteome analysis. Currently, it is possible to identify and quantify more than 50% of the predicted proteome in different cellular subfractions. In this review, we summarize the development of B. subtilis proteomics during the past 25 years. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. A Web-Based Comparative Genomics Tutorial for Investigating Microbial Genomes

    Directory of Open Access Journals (Sweden)

    Michael Strong

    2009-12-01

    Full Text Available As the number of completely sequenced microbial genomes continues to rise at an impressive rate, it is important to prepare students with the skills necessary to investigate microorganisms at the genomic level. As a part of the core curriculum for first-year graduate students in the biological sciences, we have implemented a web-based tutorial to introduce students to the fields of comparative and functional genomics. The tutorial focuses on recent computational methods for identifying functionally linked genes and proteins on a genome-wide scale and was used to introduce students to the Rosetta Stone, Phylogenetic Profile, conserved Gene Neighbor, and Operon computational methods. Students learned to use a number of publicly available web servers and databases to identify functionally linked genes in the Escherichia coli genome, with emphasis on genome organization and operon structure. The overall effectiveness of the tutorial was assessed based on student evaluations and homework assignments. The tutorial is available to other educators at http://www.doe-mbi.ucla.edu/~strong/m253.php.

  2. Comparative genome analysis of entomopathogenic fungi reveals a complex set of secreted proteins.

    Science.gov (United States)

    Staats, Charley Christian; Junges, Angela; Guedes, Rafael Lucas Muniz; Thompson, Claudia Elizabeth; de Morais, Guilherme Loss; Boldo, Juliano Tomazzoni; de Almeida, Luiz Gonzaga Paula; Andreis, Fábio Carrer; Gerber, Alexandra Lehmkuhl; Sbaraini, Nicolau; da Paixão, Rana Louise de Andrade; Broetto, Leonardo; Landell, Melissa; Santi, Lucélia; Beys-da-Silva, Walter Orlando; Silveira, Carolina Pereira; Serrano, Thaiane Rispoli; de Oliveira, Eder Silva; Kmetzsch, Lívia; Vainstein, Marilene Henning; de Vasconcelos, Ana Tereza Ribeiro; Schrank, Augusto

    2014-09-29

    Metarhizium anisopliae is an entomopathogenic fungus used in the biological control of some agricultural insect pests, and efforts are underway to use this fungus in the control of insect-borne human diseases. A large repertoire of proteins must be secreted by M. anisopliae to cope with the various available nutrients as this fungus switches through different lifestyles, i.e., from a saprophytic, to an infectious, to a plant endophytic stage. To further evaluate the predicted secretome of M. anisopliae, we employed genomic and transcriptomic analyses, coupled with phylogenomic analysis, focusing on the identification and characterization of secreted proteins. We determined the M. anisopliae E6 genome sequence and compared this sequence to other entomopathogenic fungi genomes. A robust pipeline was generated to evaluate the predicted secretomes of M. anisopliae and 15 other filamentous fungi, leading to the identification of a core of secreted proteins. Transcriptomic analysis using the tick Rhipicephalus microplus cuticle as an infection model during two periods of infection (48 and 144 h) allowed the identification of several differentially expressed genes. This analysis concluded that a large proportion of the predicted secretome coding genes contained altered transcript levels in the conditions analyzed in this study. In addition, some specific secreted proteins from Metarhizium have an evolutionary history similar to orthologs found in Beauveria/Cordyceps. This similarity suggests that a set of secreted proteins has evolved to participate in entomopathogenicity. The data presented represents an important step to the characterization of the role of secreted proteins in the virulence and pathogenicity of M. anisopliae.

  3. PanCoreGen – profiling, detecting, annotating protein-coding genes in microbial genomes

    Science.gov (United States)

    Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V.

    2015-01-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen – a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars – Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. PMID:26456591
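
    The pan-genome and core-genome notions used in this record can be sketched with plain set operations. This is only an illustration of the definitions (pan-genome as the union of all gene repertoires, core genome as the genes shared by every isolate); it is not PanCoreGen's algorithm, and the isolate and gene names are hypothetical.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene sets for a mapping of isolate name -> iterable of gene IDs."""
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)          # every gene seen in any isolate
    core = set.intersection(*gene_sets)    # genes present in all isolates
    return pan, core


if __name__ == "__main__":
    # Hypothetical gene content of three isolates of one species.
    isolates = {
        "isolate1": {"gyrA", "recA", "fimH"},
        "isolate2": {"gyrA", "recA", "viaB"},
        "isolate3": {"gyrA", "fimH", "viaB"},
    }
    pan, core = pan_and_core(isolates)
    print(sorted(pan), sorted(core), sorted(pan - core))  # pan, core, accessory genes
```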

  4. Protein interactions in genome maintenance as novel antibacterial targets.

    Directory of Open Access Journals (Sweden)

    Aimee H Marceau

    Antibacterial compounds typically act by directly inhibiting essential bacterial enzyme activities. Although this general mechanism of action has fueled traditional antibiotic discovery efforts for decades, new antibiotic development has not kept pace with the emergence of drug resistant bacterial strains. These limitations have severely restricted the therapeutic tools available for treating bacterial infections. Here we test an alternative antibacterial lead-compound identification strategy in which essential protein-protein interactions are targeted rather than enzymatic activities. Bacterial single-stranded DNA-binding proteins (SSBs) form conserved protein interaction "hubs" that are essential for recruiting many DNA replication, recombination, and repair proteins to SSB/DNA nucleoprotein substrates. Three small molecules that block SSB/protein interactions are shown to have antibacterial activity against diverse bacterial species. Consistent with a model in which the compounds target multiple SSB/protein interactions, treatment of Bacillus subtilis cultures with the compounds leads to rapid inhibition of DNA replication and recombination, and ultimately to cell death. The compounds also have unanticipated effects on protein synthesis that could be due to a previously unknown role for SSB/protein interactions in translation or to off-target effects. Our results highlight the potential of targeting protein-protein interactions, particularly those that mediate genome maintenance, as a powerful approach for identifying new antibacterial compounds.

  5. A genome-wide screen identifies conserved protein hubs required for cadherin-mediated cell–cell adhesion

    Science.gov (United States)

    Toret, Christopher P.; D’Ambrosio, Michael V.; Vale, Ronald D.; Simon, Michael A.

    2014-01-01

    Cadherins and associated catenins provide an important structural interface between neighboring cells, the actin cytoskeleton, and intracellular signaling pathways in a variety of cell types throughout the Metazoa. However, the full inventory of the proteins and pathways required for cadherin-mediated adhesion has not been established. To this end, we completed a genome-wide (∼14,000 genes) ribonucleic acid interference (RNAi) screen that targeted Ca2+-dependent adhesion in DE-cadherin–expressing Drosophila melanogaster S2 cells in suspension culture. This novel screen eliminated Ca2+-independent cell–cell adhesion, integrin-based adhesion, cell spreading, and cell migration. We identified 17 interconnected regulatory hubs, based on protein functions and protein–protein interactions that regulate the levels of the core cadherin–catenin complex and coordinate cadherin-mediated cell–cell adhesion. Representative proteins from these hubs were analyzed further in Drosophila oogenesis, using targeted germline RNAi, and adhesion was analyzed in Madin–Darby canine kidney mammalian epithelial cell–cell adhesion. These experiments reveal roles for a diversity of cellular pathways that are required for cadherin function in Metazoa, including cytoskeleton organization, cell–substrate interactions, and nuclear and cytoplasmic signaling. PMID:24446484

  6. Approaching the sequential and three-dimensional organization of Archaea, Bacteria and Eukarya genomes. Dynamic Organization of Nuclear Function

    NARCIS (Netherlands)

    T.A. Knoch (Tobias); M. Göker (Markus); R. Lohner (Rudolf); J. Langowski (Jörg)

    2002-01-01

    The largely unresolved sequential organization, i.e. the relations within DNA sequences, and its connection to the three-dimensional organization of genomes was investigated by correlation analyses of completely sequenced chromosomes from Viroids, Archaea, Bacteria, Arabidopsis

  7. MP3: a software tool for the prediction of pathogenic proteins in genomic and metagenomic data.

    Science.gov (United States)

    Gupta, Ankit; Kapil, Rohan; Dhakan, Darshan B; Sharma, Vineet K

    2014-01-01

    The identification of virulent proteins in any de-novo sequenced genome is useful in estimating its pathogenic ability and understanding the mechanism of pathogenesis. Similarly, the identification of such proteins could be valuable in comparing the metagenomes of healthy and diseased individuals and estimating the proportion of pathogenic species. However, the common challenge in both of the above tasks is the identification of virulent proteins, since a significant proportion of genomic and metagenomic proteins are novel and as yet unannotated. The currently available tools for identifying virulent proteins provide limited accuracy and cannot be used on large datasets. Therefore, we have developed MP3, a standalone tool and web server for the prediction of pathogenic proteins in both genomic and metagenomic datasets. MP3 is developed using an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out fast, sensitive and accurate prediction of pathogenic proteins. It displayed Sensitivity, Specificity, MCC and accuracy values of 92%, 100%, 0.92 and 96%, respectively, on a blind dataset constructed using complete proteins. On the two metagenomic blind datasets (Blind A: 51-100 amino acids and Blind B: 30-50 amino acids), it displayed Sensitivity, Specificity, MCC and accuracy values of 82.39%, 97.86%, 0.80 and 89.32% for Blind A and 71.60%, 94.48%, 0.67 and 81.86% for Blind B, respectively. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets. To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100-150 bp) metagenomic reads and also performs exceptionally well on complete protein sequences. MP3 is publicly available at http://metagenomics.iiserb.ac.in/mp3/index.php.
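
    The four performance measures quoted in this record follow standard definitions, sketched below from a binary confusion matrix. The counts in the usage example are hypothetical: they assume a balanced blind set of 100 pathogenic and 100 non-pathogenic proteins, which the abstract does not state, and were chosen only because they reproduce the reported 92%, 100%, 96% and 0.92.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy, MCC) for a binary classifier."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


if __name__ == "__main__":
    # Hypothetical counts for a balanced set of 100 positives and 100 negatives.
    sens, spec, acc, mcc = classification_metrics(tp=92, fn=8, tn=100, fp=0)
    print(f"{sens:.2%} {spec:.2%} {acc:.2%} {mcc:.2f}")
```

    The Blind A and Blind B figures are not consistent with a balanced set (their accuracy is not the mean of sensitivity and specificity), so their class sizes would be needed to reproduce them.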

  8. Complete mitochondrial genome of Cynopterus sphinx (Pteropodidae: Cynopterus).

    Science.gov (United States)

    Li, Linmiao; Li, Min; Wu, Zhengjun; Chen, Jinping

    2015-01-01

    We have characterized the complete mitochondrial genome of Cynopterus sphinx (Pteropodidae: Cynopterus) and described its organization in this study. The total length of the C. sphinx complete mitochondrial genome was 16,895 bp, with a base composition of 32.54% A, 14.05% G, 25.82% T and 27.59% C. The complete mitochondrial genome included 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes (12S rRNA and 16S rRNA) and 1 control region (D-loop). The control region was 1435 bp long, with the sequence CATACG repeated 64 times. Three protein-coding genes (ND1, COI and ND4) ended with an incomplete stop codon (TA or T).
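
    The base-composition figures in this record can be checked with simple arithmetic, and the same calculation can be run on any sequence. The sketch below is illustrative only: the function name `base_composition` and the toy sequence are hypothetical, and the reported percentages are copied from the abstract.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a DNA sequence (other symbols ignored)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Values reported in the abstract.
REPORTED = {"A": 32.54, "G": 14.05, "T": 25.82, "C": 27.59}
REPEAT_UNIT, REPEAT_COUNT, CONTROL_REGION_BP = "CATACG", 64, 1435

if __name__ == "__main__":
    print(base_composition("CATACGCATACG"))             # toy sequence, not the real genome
    print(round(sum(REPORTED.values()), 2))             # reported percentages sum to 100
    repeat_bp = len(REPEAT_UNIT) * REPEAT_COUNT
    print(repeat_bp, round(100 * repeat_bp / CONTROL_REGION_BP, 1))  # bp and % of D-loop in repeats
```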

  9. Sequence analysis of the PIP5K locus in Eimeria maxima provides further evidence for eimerian genome plasticity and segmental organization.

    Science.gov (United States)

    Song, B K; Pan, M Z; Lau, Y L; Wan, K L

    2014-07-29

    Commercial flocks infected by Eimeria species parasites, including Eimeria maxima, have an increased risk of developing clinical or subclinical coccidiosis, an intestinal enteritis associated with increased mortality rates in poultry. Currently, infection control is largely based on chemotherapy or live vaccines; however, drug resistance is common and vaccines are relatively expensive. The development of new cost-effective intervention measures will benefit from unraveling the complex genetic mechanisms that underlie host-parasite interactions, including the identification and characterization of genes encoding proteins such as phosphatidylinositol 4-phosphate 5-kinase (PIP5K). We previously identified a PIP5K coding sequence within the E. maxima genome. In this study, we analyzed two bacterial artificial chromosome clones presenting a ~145-kb E. maxima (Weybridge strain) genomic region spanning the PIP5K gene locus. Sequence analysis revealed that ~95% of the simple sequence repeats detected were located within regions comparable to the previously described feature-rich segments of the Eimeria tenella genome. Comparative sequence analysis with the orthologous E. maxima (Houghton strain) region revealed a moderate level of conserved synteny. Unique segmental organizations and telomere-like repeats were also observed in both genomes. A number of incomplete transposable elements were detected and further scrutiny of these elements in both orthologous segments revealed interesting nesting events, which may play a role in facilitating genome plasticity in E. maxima. The current analysis provides more detailed information about the genome organization of E. maxima and may help to reveal genotypic differences that are important for expression of traits related to pathogenicity and virulence.

  10. DNA Extraction Protocols for Whole-Genome Sequencing in Marine Organisms.

    Science.gov (United States)

    Panova, Marina; Aronsson, Henrik; Cameron, R Andrew; Dahl, Peter; Godhe, Anna; Lind, Ulrika; Ortega-Martinez, Olga; Pereyra, Ricardo; Tesson, Sylvie V M; Wrange, Anna-Lisa; Blomberg, Anders; Johannesson, Kerstin

    2016-01-01

    The marine environment harbors a large proportion of the total biodiversity on this planet, including the majority of the Earth's different phyla and classes. Studying the genomes of marine organisms can bring interesting insights into genome evolution. Today, almost all marine organismal groups are understudied with respect to their genomes. One potential reason is that extraction of high-quality DNA in sufficient amounts is challenging for many marine species. This is due to high polysaccharide content, polyphenols and other secondary metabolites that will inhibit downstream DNA library preparations. Consequently, protocols developed for vertebrates and plants do not always perform well for invertebrates and algae. In addition, many marine species have large population sizes and, as a consequence, highly variable genomes. Thus, to facilitate the sequence read assembly process during genome sequencing, it is desirable to obtain enough DNA from a single individual, which is a challenge in many species of invertebrates and algae. Here, we present DNA extraction protocols for seven marine species (four invertebrates, two algae, and a marine yeast), optimized to provide sufficient DNA quality and yield for de novo genome sequencing projects.

  11. An Integrative Bioinformatics Framework for Genome-scale Multiple Level Network Reconstruction of Rice

    Directory of Open Access Journals (Sweden)

    Liu Lili

    2013-06-01

    Understanding how metabolic reactions translate the genome of an organism into its phenotype is a grand challenge in biology. Genome-wide association studies (GWAS) statistically connect genotypes to phenotypes, without any recourse to known molecular interactions, whereas a molecular mechanistic description ties gene function to phenotype through gene regulatory networks (GRNs), protein-protein interactions (PPIs) and molecular pathways. Integration of different regulatory information levels of an organism is expected to provide a good way for mapping genotypes to phenotypes. However, the lack of a curated metabolic model of rice is blocking the exploration of genome-scale multi-level network reconstruction. Here, we have merged GRNs, PPIs and genome-scale metabolic networks (GSMNs) approaches into a single framework for rice via omics’ regulatory information reconstruction and integration. Firstly, we reconstructed a genome-scale metabolic model, containing 4,462 function genes, 2,986 metabolites involved in 3,316 reactions, and compartmentalized into ten subcellular locations. Furthermore, 90,358 pairs of protein-protein interactions, 662,936 pairs of gene regulations and 1,763 microRNA-target interactions were integrated into the metabolic model. Eventually, a database was developed for systematically storing and retrieving the genome-scale multi-level network of rice. This provides a reference for understanding the genotype-phenotype relationship of rice, and for analysis of its molecular regulatory network.

  12. Identification and characterization of insect-specific proteins by genome data analysis

    DEFF Research Database (Denmark)

    Zhang, Guojie; Wang, Hongsheng; Shi, Junjie

    2007-01-01

    melanogaster, Anopheles gambiae, Bombyx mori, Tribolium castaneum, and Apis mellifera were compared to the complete genomes of three non-insect eukaryotes (opisthokonts) Homo sapiens, Caenorhabditis elegans and Saccharomyces cerevisiae. This operation yielded 154 groups of orthologous proteins in Drosophila...

  13. Rapid Identification of Genetic Modifications in Bacillus anthracis Using Whole Genome Draft Sequences Generated by 454 Pyrosequencing

    Science.gov (United States)

    2010-08-25

    ...detection assays utilize known, organism-specific proteins or genomic DNA signatures respectively. Hence, these assays lack the ability to detect novel natural variations...

  14. Functional and genomic analyses of alpha-solenoid proteins.

    Science.gov (United States)

    Fournier, David; Palidwor, Gareth A; Shcherbinin, Sergey; Szengel, Angelika; Schaefer, Martin H; Perez-Iratxeta, Carol; Andrade-Navarro, Miguel A

    2013-01-01

    Alpha-solenoids are flexible protein structural domains formed by ensembles of alpha-helical repeats (Armadillo and HEAT repeats among others). While homology can be used to detect many of these repeats, some alpha-solenoids have very little sequence homology to proteins of known structure and we expect that many remain undetected. We previously developed a method for detection of alpha-helical repeats based on a neural network trained on a dataset of protein structures. Here we improved the detection algorithm and updated the training dataset using recently solved structures of alpha-solenoids. Unexpectedly, we identified occurrences of alpha-solenoids in solved protein structures that escaped attention, for example within the core of the catalytic subunit of PI3KC. Our results expand the current set of known alpha-solenoids. Application of our tool to the protein universe allowed us to detect their significant enrichment in proteins interacting with many proteins, confirming that alpha-solenoids are generally involved in protein-protein interactions. We then studied the taxonomic distribution of alpha-solenoids to discuss an evolutionary scenario for the emergence of this type of domain, speculating that alpha-solenoids have emerged in multiple taxa in independent events by convergent evolution. We observe a higher rate of alpha-solenoids in eukaryotic genomes and in some prokaryotic families, such as Cyanobacteria and Planctomycetes, which could be associated with increased cellular complexity. The method is available at http://cbdm.mdc-berlin.de/~ard2/.

  15. DMS-Seq for In Vivo Genome-wide Mapping of Protein-DNA Interactions and Nucleosome Centers.

    Science.gov (United States)

    Umeyama, Taichi; Ito, Takashi

    2017-10-03

    Protein-DNA interactions provide the basis for chromatin structure and gene regulation. Comprehensive identification of protein-occupied sites is thus vital to an in-depth understanding of genome function. Dimethyl sulfate (DMS) is a chemical probe that has long been used to detect footprints of DNA-bound proteins in vitro and in vivo. Here, we describe a genomic footprinting method, dimethyl sulfate sequencing (DMS-seq), which exploits the cell-permeable nature of DMS to obviate the need for nuclear isolation. This feature makes DMS-seq simple in practice and removes the potential risk of protein re-localization during nuclear isolation. DMS-seq successfully detects transcription factors bound to cis-regulatory elements and non-canonical chromatin particles in nucleosome-free regions. Furthermore, an unexpected preference of DMS confers on DMS-seq a unique potential to directly detect nucleosome centers without using genetic manipulation. We expect that DMS-seq will serve as a characteristic method for genome-wide interrogation of in vivo protein-DNA interactions. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

  16. Intrinsic disorder in Viral Proteins Genome-Linked: experimental and predictive analyses

    Directory of Open Access Journals (Sweden)

    Van Dorsselaer Alain

    2009-02-01

    Background: VPgs are viral proteins linked to the 5' end of some viral genomes. Interactions between several VPgs and eukaryotic translation initiation factors eIF4Es are critical for plant infection. However, VPgs are not restricted to phytoviruses, being also involved in genome replication and protein translation of several animal viruses. To date, structural data are still limited to small picornaviral VPgs. Recently three phytoviral VPgs were shown to be natively unfolded proteins. Results: In this paper, we report the bacterial expression, purification and biochemical characterization of two phytoviral VPgs, namely the VPgs of Rice yellow mottle virus (RYMV, genus Sobemovirus) and Lettuce mosaic virus (LMV, genus Potyvirus). Using far-UV circular dichroism and size exclusion chromatography, we show that RYMV and LMV VPgs are predominantly or partly unstructured in solution, respectively. Using several disorder predictors, we show that both proteins are predicted to possess disordered regions. We next extend these results to 14 VPgs representative of the viral diversity. Disordered regions were predicted in all VPg sequences whatever the genus and the family. Conclusion: Based on these results, we propose that intrinsic disorder is a common feature of VPgs. The functional role of intrinsic disorder is discussed in light of the biological roles of VPgs.

  17. Comparative genome analysis reveals a conserved family of actin-like proteins in apicomplexan parasites

    Directory of Open Access Journals (Sweden)

    Sibley L David

    2005-12-01

    Background: The phylum Apicomplexa is an early-branching eukaryotic lineage that contains a number of important human and animal pathogens. Their complex life cycles and unique cytoskeletal features distinguish them from other model eukaryotes. Apicomplexans rely on actin-based motility for cell invasion, yet the regulation of this system remains largely unknown. Consequently, we focused our efforts on identifying actin-related proteins in the recently completed genomes of Toxoplasma gondii, Plasmodium spp., Cryptosporidium spp., and Theileria spp. Results: Comparative genomic and phylogenetic studies of apicomplexan genomes reveal that most contain only a single conventional actin and yet they each have 8–10 additional actin-related proteins. Among these are a highly conserved Arp1 protein (likely part of a conserved dynactin complex), and Arp4 and Arp6 homologues (subunits of the chromatin-remodeling machinery). In contrast, apicomplexans lack canonical Arp2 or Arp3 proteins, suggesting they lost the Arp2/3 actin polymerization complex on their evolutionary path towards intracellular parasitism. Seven of these actin-like proteins (ALPs) are novel to apicomplexans. They show no phylogenetic associations to the known Arp groups and likely serve functions specific to this important group of intracellular parasites. Conclusion: The large diversity of actin-like proteins in apicomplexans suggests that the actin protein family has diverged to fulfill various roles in the unique biology of intracellular parasites. Conserved Arps likely participate in vesicular transport and gene expression, while apicomplexan-specific ALPs may control unique biological traits such as actin-based gliding motility.

  18. Mitochondrial genome evolution in Alismatales: Size reduction and extensive loss of ribosomal protein genes

    DEFF Research Database (Denmark)

    Petersen, Gitte; Cuenca, Argelia; Zervas, Athanasios

    2017-01-01

    The order Alismatales is a hotspot for evolution of plant mitochondrial genomes characterized by remarkable differences in genome size, substitution rates, RNA editing, retrotranscription, gene loss and intron loss. Here we have sequenced the complete mitogenomes of Zostera marina and Stratiotes aloides, which together with previously sequenced mitogenomes from Butomus and Spirodela, provide new evolutionary evidence of genome size reduction, gene loss and transfer to the nucleus. The Zostera mitogenome includes a large portion of DNA transferred from the plastome, yet it is the smallest known mitogenome from a non-parasitic plant. Using a broad sample of the Alismatales, the evolutionary history of ribosomal protein gene loss is analyzed. In Zostera almost all ribosomal protein genes are lost from the mitogenome, but only some can be found in the nucleus.

  19. Bacterial Genome Editing Strategy for Control of Transcription and Protein Stability

    DEFF Research Database (Denmark)

    Lauritsen, Ida; Martinez, Virginia; Ronda, Carlotta

    2018-01-01

    In molecular biology and cell factory engineering, tools that enable control of protein production and stability are highly important. Here, we describe protocols for tagging genes in Escherichia coli allowing for inducible degradation and transcriptional control of any soluble protein of interest. The underlying molecular biology is based on the two cross-kingdom tools CRISPRi and the N-end rule for protein degradation. Genome editing is performed with the CRMAGE technology and randomization of the translational initiation region minimizes the polar effects of tag insertion. The approach has previously been applied for targeting proteins originating from essential operon-located genes and has potential to serve as a universal synthetic biology tool.

  20. Evolutionary gradient of predicted nuclear localization signals (NLS)-bearing proteins in genomes of family Planctomycetaceae.

    Science.gov (United States)

    Guo, Min; Yang, Ruifu; Huang, Chen; Liao, Qiwen; Fan, Guangyi; Sun, Chenghang; Lee, Simon Ming-Yuen

    2017-04-04

    The nuclear envelope is considered a key classification marker that distinguishes prokaryotes from eukaryotes. However, this marker does not apply to the family Planctomycetaceae, which has intracellular spaces divided by lipidic intracytoplasmic membranes (ICMs). The nuclear localization signal (NLS), a short stretch of amino acid sequence, directs the transport of proteins from the cytoplasm into the nucleus, and is also associated with the development of the nuclear envelope. We attempted to investigate the NLS motifs in Planctomycetaceae genomes to demonstrate the potential molecular transition in the development of the intracellular membrane system. In this study, we identified NLS-like motifs that have the same amino acid compositions as experimentally identified NLSs in genomes of 11 representative species of family Planctomycetaceae. A total of 15 NLS types and 170 NLS-bearing proteins were detected in the 11 strains. To determine the molecular transformation, we compared NLS-bearing protein abundances in the 11 representative Planctomycetaceae genomes with those in genomes of 16 taxonomically varied microorganisms: nine bacteria, two archaea and five fungi. In the 27 strains, 29 NLS types and 1101 NLS-bearing proteins were identified; principal component analysis showed a significant transitional gradient from bacteria to Planctomycetaceae to fungi on their NLS-bearing protein abundance profiles. Then, we clustered the 993 non-redundant NLS-bearing proteins into 181 families and annotated their involved metabolic pathways. Afterwards, we aligned the ten types of NLS motifs from the 13 families containing NLS-bearing proteins among bacteria, Planctomycetaceae or fungi, considering their diversity, length and origin. A transition towards increased complexity from non-planctomycete bacteria to Planctomycetaceae to archaea and fungi was detected based on the complexity of the 10 types of NLS-like motifs in the 13 NLS-bearing protein families. The results of this study reveal that

  1. A sequence-based survey of the complex structural organization of tumor genomes

    Energy Technology Data Exchange (ETDEWEB)

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  2. The genome editing revolution

    DEFF Research Database (Denmark)

    Stella, Stefano; Montoya, Guillermo

    2016-01-01

    -Cas system has become the main tool for genome editing in many laboratories. Currently the targeted genome editing technology has been used in many fields and may be a possible approach for human gene therapy. Furthermore, it can also be used to modifying the genomes of model organisms for studying human......In the last 10 years, we have witnessed a blooming of targeted genome editing systems and applications. The area was revolutionized by the discovery and characterization of the transcription activator-like effector proteins, which are easier to engineer to target new DNA sequences than...... sequence). This ribonucleoprotein complex protects bacteria from invading DNAs, and it was adapted to be used in genome editing. The CRISPR ribonucleic acid (RNA) molecule guides to the specific DNA site the Cas9 nuclease to cleave the DNA target. Two years and more than 1000 publications later, the CRISPR...

  3. Genome analysis of Excretory/Secretory proteins in Taenia solium reveals their Abundance of Antigenic Regions (AAR).

    Science.gov (United States)

    Gomez, Sandra; Adalid-Peralta, Laura; Palafox-Fonseca, Hector; Cantu-Robles, Vito Adrian; Soberón, Xavier; Sciutto, Edda; Fragoso, Gladis; Bobes, Raúl J; Laclette, Juan P; Yauner, Luis del Pozo; Ochoa-Leyva, Adrián

    2015-05-19

    Excretory/Secretory (ES) proteins play an important role in the host-parasite interactions. Experimental identification of ES proteins is time-consuming and expensive. Alternative bioinformatics approaches are cost-effective and can be used to prioritize the experimental analysis of therapeutic targets for parasitic diseases. Here we predicted and functionally annotated the ES proteins in T. solium genome using an integration of bioinformatics tools. Additionally, we developed a novel measurement to evaluate the potential antigenicity of T. solium secretome using sequence length and number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. AAR value for secretome showed a similar value to that obtained for a set of experimentally determined antigenic proteins and was different to the calculated value for the non-ES proteins of T. solium genome. Furthermore, we calculated the AAR values for known helminth secretomes and they were similar to that obtained for T. solium. The results reveal the utility of AAR value as a novel genomic measurement to evaluate the potential antigenicity of secretomes. This comprehensive analysis of T. solium secretome provides functional information for future experimental studies, including the identification of novel ES proteins of therapeutic, diagnosis and immunological interest.
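    The abstract above says the AAR value is built from sequence length and the number of antigenic regions, but it does not give the formula. The sketch below assumes, purely for illustration, that AAR is the ratio of sequence length to the number of predicted antigenic regions, and that a secretome-level value is the mean over its proteins. The function names and the example numbers are hypothetical and do not come from the study.

```python
from statistics import mean


def aar(sequence_length: int, antigenic_regions: int) -> float:
    """Assumed AAR: residues per predicted antigenic region (lower = denser)."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    if antigenic_regions <= 0:
        raise ValueError("antigenic_regions must be positive")
    return sequence_length / antigenic_regions


def mean_aar(proteins) -> float:
    """Average the assumed AAR over (length, region_count) pairs."""
    return mean(aar(length, regions) for length, regions in proteins)


# Hypothetical proteins given as (sequence length, number of antigenic regions).
example_secretome = [(300, 6), (200, 10)]
example_value = mean_aar(example_secretome)
```

    Under this assumed definition, two protein sets could be compared simply by computing `mean_aar` for each; whether the authors aggregated the values this way is not stated in the abstract.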

  4. A New Approach to Dissect Nuclear Organization: TALE-Mediated Genome Visualization (TGV).

    Science.gov (United States)

    Miyanari, Yusuke

    2016-01-01

    Spatiotemporal organization of chromatin within the nucleus has so far remained elusive. Live visualization of nuclear remodeling could be a promising approach to understand its functional relevance in genome functions and mechanisms regulating genome architecture. Recent technological advances in live imaging of chromosomes have begun to explore the biological roles of the movement of the chromatin within the nucleus. Here I describe a new technique, called TALE-mediated genome visualization (TGV), which allows us to visualize endogenous repetitive sequences including centromeric, pericentromeric, and telomeric repeats in living cells.

  5. Annotation of the Domestic Pig Genome by Quantitative Proteogenomics.

    Science.gov (United States)

    Marx, Harald; Hahne, Hannes; Ulbrich, Susanne E; Schnieke, Angelika; Rottmann, Oswald; Frishman, Dmitrij; Kuster, Bernhard

    2017-08-04

    The pig is one of the earliest domesticated animals in the history of human civilization and represents one of the most important livestock animals. The recent sequencing of the Sus scrofa genome was a major step toward the comprehensive understanding of porcine biology, evolution, and its utility as a promising large animal model for biomedical and xenotransplantation research. However, the functional and structural annotation of the Sus scrofa genome is far from complete. Here, we present mass spectrometry-based quantitative proteomics data of nine juvenile organs and six embryonic stages between 18 and 39 days after gestation. We found that the data provide evidence for and improve the annotation of 8176 protein-coding genes including 588 novel and 321 refined gene models. The analysis of tissue-specific proteins and the temporal expression profiles of embryonic proteins provides an initial functional characterization of expressed protein interaction networks and modules including as yet uncharacterized proteins. Comparative transcript and protein expression analysis to human organs reveal a moderate conservation of protein translation across species. We anticipate that this resource will facilitate basic and applied research on Sus scrofa as well as its porcine relatives.

  6. Gene organization inside replication domains in mammalian genomes

    Science.gov (United States)

    Zaghloul, Lamia; Baker, Antoine; Audit, Benjamin; Arneodo, Alain

    2012-11-01

    We investigate the large-scale organization of human genes with respect to "master" replication origins that were previously identified as bordering nucleotide compositional skew domains. We separate genes in two categories depending on their CpG enrichment at the promoter which can be considered as a marker of germline DNA methylation. Using expression data in mouse, we confirm that CpG-rich genes are highly expressed in germline whereas CpG-poor genes are in a silent state. We further show that, whether tissue-specific or broadly expressed (housekeeping genes), the CpG-rich genes are over-represented close to the replication skew domain borders suggesting some coordination of replication and transcription. We also reveal that the transcription of the longest CpG-rich genes is co-oriented with replication fork progression so that the promoter of these transcriptionally active genes be located into the accessible open chromatin environment surrounding the master replication origins that border the replication skew domains. The observation of a similar gene organization in the mouse genome confirms the interplay of replication, transcription and chromatin structure as the cornerstone of mammalian genome architecture.

  7. A genome-wide association study identifies protein quantitative trait loci (pQTLs).

    Directory of Open Access Journals (Sweden)

    David Melzer

    2008-05-01

    Full Text Available There is considerable evidence that human genetic variation influences gene expression. Genome-wide studies have revealed that mRNA levels are associated with genetic variation in or close to the gene coding for those mRNA transcripts (cis effects) and elsewhere in the genome (trans effects). The role of genetic variation in determining protein levels has not been systematically assessed. Using a genome-wide association approach we show that common genetic variation influences levels of clinically relevant proteins in human serum and plasma. We evaluated the role of 496,032 polymorphisms on levels of 42 proteins measured in 1200 fasting individuals from the population-based InCHIANTI study. Proteins included insulin, several interleukins, adipokines, chemokines, and liver function markers that are implicated in many common diseases including metabolic, inflammatory, and infectious conditions. We identified eight cis effects, including variants in or near the IL6R (p = 1.8 × 10^-57), CCL4L1 (p = 3.9 × 10^-21), IL18 (p = 6.8 × 10^-13), LPA (p = 4.4 × 10^-10), GGT1 (p = 1.5 × 10^-7), SHBG (p = 3.1 × 10^-7), CRP (p = 6.4 × 10^-6) and IL1RN (p = 7.3 × 10^-6) genes, all associated with their respective protein products with effect sizes ranging from 0.19 to 0.69 standard deviations per allele. Mechanisms implicated include altered rates of cleavage of bound to unbound soluble receptor (IL6R), altered secretion rates of different sized proteins (LPA), variation in gene copy number (CCL4L1) and altered transcription (GGT1). We identified one novel trans effect that was an association between ABO blood group and tumour necrosis factor alpha (TNF-alpha) levels (p = 6.8 × 10^-40), but this finding was not present when TNF-alpha was measured using a different assay, or in a second study, suggesting an assay-specific association. Our results show that protein levels share some of the features of the genetics of gene expression. These include the presence of strong genetic effects in cis
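    Effect sizes in this record are reported in standard deviations per allele. The abstract does not describe the statistical model, so the sketch below assumes a simple additive model: the protein level is standardized and regressed on the allele count (0, 1 or 2) by ordinary least squares, and the slope is the per-allele effect. The function name and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def per_allele_effect_sd(genotypes, levels) -> float:
    """OLS slope of the standardized protein level on allele count."""
    if len(genotypes) != len(levels) or len(genotypes) < 2:
        raise ValueError("need two or more paired observations")
    level_mean = mean(levels)
    level_sd = pstdev(levels)  # population SD, an arbitrary choice here
    if level_sd == 0:
        raise ValueError("protein levels do not vary")
    z = [(y - level_mean) / level_sd for y in levels]
    g_mean = mean(genotypes)
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotypes do not vary")
    sxy = sum((g - g_mean) * zi for g, zi in zip(genotypes, z))
    return sxy / sxx


# Toy data: one individual per genotype class with a perfectly linear trend.
toy_effect = per_allele_effect_sd([0, 1, 2], [1.0, 2.0, 3.0])
```

    A real analysis would typically adjust for covariates such as age and sex and would test each polymorphism against each protein, which this sketch leaves out.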

  8. Genome projects and the functional-genomic era.

    Science.gov (United States)

    Sauer, Sascha; Konthur, Zoltán; Lehrach, Hans

    2005-12-01

    The problems we face today in public health as a result of the -- fortunately -- increasing age of people and the requirements of developing countries create an urgent need for new and innovative approaches in medicine and in agronomics. Genomic and functional genomic approaches have a great potential to at least partially solve these problems in the future. Important progress has been made by procedures to decode genomic information of humans, but also of other key organisms. The basic comprehension of genomic information (and its transfer) should now give us the possibility to pursue the next important step in life science eventually leading to a basic understanding of biological information flow; the elucidation of the function of all genes and correlative products encoded in the genome, as well as the discovery of their interactions in a molecular context and the response to environmental factors. As a result of the sequencing projects, we are now able to ask important questions about sequence variation and can start to comprehensively study the function of expressed genes on different levels such as RNA, protein or the cell in a systematic context including underlying networks. In this article we review and comment on current trends in large-scale systematic biological research. A particular emphasis is put on technology developments that can provide means to accomplish the tasks of future lines of functional genomics.

  9. Identification and characterization of insect-specific proteins by genome data analysis

    Directory of Open Access Journals (Sweden)

    Clark Terry

    2007-04-01

    Full Text Available Background: Insects constitute the vast majority of known species, and their importance spans biodiversity, agriculture, and human health. It is likely that the successful adaptation of the Insecta clade depends on specific components in its proteome that give rise to specialized features. However, proteome determination is an intensive undertaking. Here we present results from a computational method that uses genome analysis to characterize insect and eukaryote proteomes as an approximation complementary to experimental approaches. Results: Homologs in common to Drosophila melanogaster, Anopheles gambiae, Bombyx mori, Tribolium castaneum, and Apis mellifera were compared to the complete genomes of three non-insect eukaryotes (the opisthokonts Homo sapiens, Caenorhabditis elegans and Saccharomyces cerevisiae). This operation identified 154 groups of orthologous proteins in Drosophila as insect-specific homologs; 466 groups were determined to be common to eukaryotes (represented by the three opisthokonts). ESTs from the hemimetabolous insect Locusta migratoria were also considered in order to approximate their corresponding genes in the insect-specific homologs. Stress and stimulus response proteins were found to constitute a higher fraction in the insect-specific homologs than in the homologs common to eukaryotes. Conclusion: The significant representation of stress response and stimulus response proteins among the proteins determined to be insect-specific, along with specific cuticle and pheromone/odorant binding proteins, suggests that communication and adaptation to environments may distinguish insect evolution relative to other eukaryotes. The tendency for low Ka/Ks ratios in the insect-specific protein set suggests purifying selection pressure. The generally larger number of paralogs in the insect-specific proteins may indicate adaptation to environment changes. Instances in our insect-specific protein set have been arrived at through

  10. Vast diversity of prokaryotic virus genomes encoding double jelly-roll major capsid proteins uncovered by genomic and metagenomic sequence analysis.

    Science.gov (United States)

    Yutin, Natalya; Bäckström, Disa; Ettema, Thijs J G; Krupovic, Mart; Koonin, Eugene V

    2018-04-10

    Analysis of metagenomic sequences has become the principal approach for the study of the diversity of viruses. Many recent, extensive metagenomic studies on several classes of viruses have dramatically expanded the visible part of the virosphere, showing that previously undetected viruses, or those that have been considered rare, actually are important components of the global virome. We investigated the provenance of viruses related to tail-less bacteriophages of the family Tectiviridae by searching genomic and metagenomic sequence databases for distant homologs of the tectivirus-like Double Jelly-Roll major capsid proteins (DJR MCP). These searches resulted in the identification of numerous genomes of virus-like elements that are similar in size to tectiviruses (10-15 kilobases) and have diverse gene compositions. By comparison of the gene repertoires, the DJR MCP-encoding genomes were classified into 6 distinct groups that can be predicted to differ in reproduction strategies and host ranges. Only the DJR MCP gene that is present by design is shared by all these genomes, and most also encode a predicted DNA-packaging ATPase; the rest of the genes are present only in subgroups of this unexpectedly diverse collection of DJR MCP-encoding genomes. Only a minority encode a DNA polymerase which is a hallmark of the family Tectiviridae and the putative family "Autolykiviridae". Notably, one of the identified putative DJR MCP viruses encodes a homolog of Cas1 endonuclease, the integrase involved in CRISPR-Cas adaptation and integration of transposon-like elements called casposons. This is the first detected occurrence of Cas1 in a virus. Many of the identified elements are individual contigs flanked by inverted or direct repeats and appear to represent complete, extrachromosomal viral genomes, whereas others are flanked by bacterial genes and thus can be considered as proviruses. These contigs come from metagenomes of widely different environments, some dominated by

  11. The elevation of radiation load on ecosystems and genome instability of organisms

    International Nuclear Information System (INIS)

    Gaziyev, A. I.; Bezlepkin, V.Q.

    2002-01-01

    prophylaxis of human disorders. Thus, it was found that the action of low-dose ionizing radiation on living organisms might induce an adaptive repair response in them aimed at decreasing the genetic consequences of the exposure. However, the potentialities of defense and repair systems of an organism are limited, so an increase in genome lesions may cause inheritable mutations, cancer and other pathologies, and death. DNA lesions caused by ionizing radiation in small and sublethal doses can essentially be repaired, whereas unrepaired lesions and errors of repair, replication, and recombination systems lead to formation of mutational changes in DNA sequences. These changes may be transmitted to daughter cells and induce genome instability in the progeny. Induced genome instability in surviving somatic cells is characterized by persistence of a high level of acquired variability in many generations of these cells. Genome instability manifests itself as an increased frequency of karyotypic anomalies, chromosome and gene mutations, clonal heterogeneity, and malignant transformation in the progeny of cells exposed to DNA-damaging agents. Besides, cells with genome instability show increased amplification of genes and changes in their expression, as well as disturbances in their differentiation, delays in reproductive death and other phenotypic characters of abnormal development. Whereas some progress has been made towards knowledge of genome instability in the somatic cells of mammals, the radiation-induced genome instability in germ cells transmitted to individuals of the next generation is still not clearly understood. At the same time, evidence has been obtained which suggests that the transmission of genome instability to the somatic cells of the progeny from the germ cells of gamma-radiation-exposed parents is possible. This conclusion is based on the data on mutation frequency in the progeny of parents exposed to DNA-damaging agents. For instance, a significant increase in

  12. PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.

    Science.gov (United States)

    Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.

  13. Identification of a new genomic hot spot of evolutionary diversification of protein function.

    Directory of Open Access Journals (Sweden)

    Aline Winkelmann

    Full Text Available Establishment of phylogenetic relationships remains a challenging task because it is based on computational analysis of genomic hot spots that display species-specific sequence variations. Here, we identify a species-specific thymine-to-guanine sequence variation in the Glrb gene which gives rise to species-specific splice donor sites in the Glrb genes of mouse and bushbaby. The resulting splice insert in the receptor for the inhibitory neurotransmitter glycine (GlyR conveys synaptic receptor clustering and specific association with a particular synaptic plasticity-related splice variant of the postsynaptic scaffold protein gephyrin. This study identifies a new genomic hot spot which contributes to phylogenetic diversification of protein function and advances our understanding of phylogenetic relationships.

  14. Genomic clustering and homology between HET-S and the NWD2 STAND protein in various fungal genomes.

    Directory of Open Access Journals (Sweden)

    Asen Daskalov

    Full Text Available BACKGROUND: Prions are infectious proteins propagating as self-perpetuating amyloid polymers. The [Het-s] prion of Podospora anserina is involved in a cell death process associated with non-self recognition. The prion forming domain (PFD of HET-s adopts a β-solenoid amyloid structure characterized by the two fold repetition of an elementary triangular motif. [Het-s] induces cell death when interacting with HET-S, an allelic variant of HET-s. When templated by [Het-s], HET-S undergoes a trans-conformation, relocates to the cell membrane and induces toxicity. METHODOLOGY/PRINCIPAL FINDINGS: Here, comparing HET-s homologs from different species, we devise a consensus for the HET-s elementary triangular motif. We use this motif to screen genomic databases and find a match to the N-terminus of NWD2, a STAND protein, encoded by the gene immediately adjacent to het-S. STAND proteins are signal transducing ATPases which undergo ligand-induced oligomerisation. Homology modelling predicts that the NWD2 N-terminal region adopts a HET-s-like fold. We propose that upon NWD2 oligomerisation, these N-terminal extensions adopt the β-solenoid fold and template HET-S to adopt the amyloid fold and trigger toxicity. We extend this model to a putative prion, the σ infectious element in Nectria haematococca, because the s locus controlling propagation of σ also encodes a STAND protein and displays analogous features. Comparative genomic analyses indicate evolutionary conservation of these STAND/prion-like gene pairs, identify a number of novel prion candidates and define, in addition to the HET-s PFD motif, two distinct, novel putative PFD-like motifs. CONCLUSIONS/SIGNIFICANCE: We suggest the existence, in the fungal kingdom, of a widespread and evolutionarily conserved mode of signal transduction based on the transmission of an amyloid-fold from a NOD-like STAND receptor protein to an effector protein.

  15. Analysis of secreted proteins from Aspergillus flavus.

    Science.gov (United States)

    Medina, Martha L; Haynes, Paul A; Breci, Linda; Francisco, Wilson A

    2005-08-01

    MS/MS techniques in proteomics make possible the identification of proteins from organisms with little or no genome sequence information available. Peptide sequences are obtained from tandem mass spectra by matching peptide mass and fragmentation information to protein sequence information from related organisms, including unannotated genome sequence data. This peptide identification data can then be grouped and reconstructed into protein data. In this study, we have used this approach to study protein secretion by Aspergillus flavus, a filamentous fungus for which very little genome sequence information is available. A. flavus is capable of degrading the flavonoid rutin (quercetin 3-O-glycoside), as the only source of carbon via an extracellular enzyme system. In this continuing study, a proteomic analysis was used to identify secreted proteins from A. flavus when grown on rutin. The growth media glucose and potato dextrose were used to identify differentially expressed secreted proteins. The secreted proteins were analyzed by 1- and 2-DE and MS/MS. A total of 51 unique A. flavus secreted proteins were identified from the three growth conditions. Ten proteins were unique to rutin-, five to glucose- and one to potato dextrose-grown A. flavus. Sixteen secreted proteins were common to all three media. Fourteen identifications were of hypothetical proteins or proteins of unknown functions. To our knowledge, this is the first extensive proteomic study conducted to identify the secreted proteins from a filamentous fungus.

  16. Genomic analysis of murine DNA-dependent protein kinase

    International Nuclear Information System (INIS)

    Fujimori, A.; Abe, M.

    2003-01-01

    Full text: The gene encoding the catalytic subunit of DNA-dependent protein kinase (DNA-PKcs) is the gene responsible for the SCID phenotype in mice. The molecule plays a critical role in non-homologous end joining, including V(D)J recombination. A contribution of the molecule to differences in radiosensitivity and in susceptibility to cancer has been suggested. Here we show the entire nucleotide sequence of approximately 193 kbp and 84 kbp genomic regions encoding the entire DNA-PKcs gene in the mouse and chicken, respectively. A retroposon was found in intron 51 of the mouse genomic DNA-PKcs gene but not in human and chicken. Comparative analysis of these two species strongly suggested that only two genes, DNA-PKcs and MCM4, exist in the region of both species. Several conserved sequences and cis elements, however, were predicted. Recently, the orthologous region for the human DNA-PKcs locus was completed. The results of further comparative study will be discussed.

  17. PLAZA 3.0: an access point for plant comparative genomics

    Science.gov (United States)

    Proost, Sebastian; Van Bel, Michiel; Vaneechoutte, Dries; Van de Peer, Yves; Inzé, Dirk; Mueller-Roeber, Bernd; Vandepoele, Klaas

    2015-01-01

    Comparative sequence analysis has significantly altered our view on the complexity of genome organization and gene functions in different kingdoms. PLAZA 3.0 is designed to make comparative genomics data for plants available through a user-friendly web interface. Structural and functional annotation, gene families, protein domains, phylogenetic trees and detailed information about genome organization can easily be queried and visualized. Compared with the first version released in 2009, which featured nine organisms, the number of integrated genomes is more than four times higher, and now covers 37 plant species. The new species provide a wider phylogenetic range as well as a more in-depth sampling of specific clades, and genomes of additional crop species are present. The functional annotation has been expanded and now comprises data from Gene Ontology, MapMan, UniProtKB/Swiss-Prot, PlnTFDB and PlantTFDB. Furthermore, we improved the algorithms to transfer functional annotation from well-characterized plant genomes to other species. The additional data and new features make PLAZA 3.0 (http://bioinformatics.psb.ugent.be/plaza/) a versatile and comprehensible resource for users wanting to explore genome information to study different aspects of plant biology, both in model and non-model organisms. PMID:25324309

  18. Promoter characterization and genomic organization of the human X11β gene APBA2.

    LENUS (Irish Health Repository)

    Hao, Yan

    2012-02-15

    Overexpression of neuronal adaptor protein X11β has been shown to decrease the production of amyloid-β, a toxic peptide deposited in Alzheimer's disease brains. Therefore, manipulation of the X11β level may represent a potential therapeutic strategy for Alzheimer's disease. As X11β expression can be regulated at the transcription level, we determined the genomic organization and the promoter of the human X11β gene, amyloid β A4 precursor protein-binding family A member 2 (APBA2). By RNA ligase-mediated rapid amplification of cDNA ends, a single APBA2 transcription start site and the complete sequence of exon 1 were identified. The APBA2 promoter was located upstream of exon 1 and was more active in neurons. The core promoter contains several CpG dinucleotides, and was strongly suppressed by DNA methylation. In addition, mutagenesis analysis revealed a putative Pax5-binding site within the promoter. Together, APBA2 contains a potent neuronal promoter whose activity may be regulated by DNA methylation and Pax5.

  19. Genome-scale metabolic model of Pichia pastoris with native and humanized glycosylation of recombinant proteins.

    Science.gov (United States)

    Irani, Zahra Azimzadeh; Kerkhoven, Eduard J; Shojaosadati, Seyed Abbas; Nielsen, Jens

    2016-05-01

    Pichia pastoris is used for commercial production of human therapeutic proteins, and genome-scale models of P. pastoris metabolism have been generated in the past to study the metabolism and associated protein production by this yeast. A major challenge with clinical usage of recombinant proteins produced by P. pastoris is the difference in N-glycosylation of proteins produced by humans and this yeast. However, through metabolic engineering, a P. pastoris strain capable of producing humanized N-glycosylated proteins was constructed. The current genome-scale models of P. pastoris address neither native nor humanized N-glycosylation, and we therefore developed ihGlycopastoris, an extension to the iLC915 model with both native and humanized N-glycosylation for recombinant protein production, as well as an estimation of N-glycosylation of P. pastoris native proteins. This new model gives a better prediction of protein yield, demonstrates the effect of the different types of N-glycosylation on protein yield, and can be used to predict potential targets for strain improvement. The model represents a step towards a more complete description of protein production in P. pastoris, which is required for using these models to understand and optimize protein production processes. © 2015 Wiley Periodicals, Inc.

  20. Broad genomic and transcriptional analysis reveals a highly derived genome in dinoflagellate mitochondria

    Directory of Open Access Journals (Sweden)

    Keeling Patrick J

    2007-09-01

    Full Text Available Background: Dinoflagellates comprise an ecologically significant and diverse eukaryotic phylum that is sister to the phylum containing apicomplexan endoparasites. The mitochondrial genome of apicomplexans is uniquely reduced in gene content and size, encoding only three proteins and two ribosomal RNAs (rRNAs) within a highly compacted 6 kb DNA. Dinoflagellate mitochondrial genomes have been comparatively poorly studied: limited available data suggest some similarities with apicomplexan mitochondrial genomes but an even more radical type of genomic organization. Here, we investigate structure, content and expression of dinoflagellate mitochondrial genomes. Results: From two dinoflagellates, Crypthecodinium cohnii and Karlodinium micrum, we generated over 42 kb of mitochondrial genomic data that indicate a reduced gene content paralleling that of mitochondrial genomes in apicomplexans, i.e., only three protein-encoding genes and at least eight conserved components of the highly fragmented large and small subunit rRNAs. Unlike in apicomplexans, dinoflagellate mitochondrial genes occur in multiple copies, often as gene fragments, and in numerous genomic contexts. Analysis of cDNAs suggests several novel aspects of dinoflagellate mitochondrial gene expression. Polycistronic transcripts were found, standard start codons are absent, and oligoadenylation occurs upstream of stop codons, resulting in the absence of termination codons. Transcripts of at least one gene, cox3, are apparently trans-spliced to generate full-length mRNAs. RNA substitutional editing, a process previously identified for mRNAs in dinoflagellate mitochondria, is also implicated in rRNA expression. Conclusion: The dinoflagellate mitochondrial genome shares the same gene complement and fragmentation of rRNA genes with its apicomplexan counterpart. However, it also exhibits several unique characteristics. Most notable are the expansion of gene copy numbers and their arrangements

  1. Roles of Werner syndrome protein in protection of genome integrity

    DEFF Research Database (Denmark)

    Rossi, Marie L; Ghosh, Avik K; Bohr, Vilhelm A

    2010-01-01

    Werner syndrome protein (WRN) is one of a family of five human RecQ helicases implicated in the maintenance of genome stability. The conserved RecQ family also includes RecQ1, Bloom syndrome protein (BLM), RecQ4, and RecQ5 in humans, as well as Sgs1 in Saccharomyces cerevisiae, Rqh1...... in Schizosaccharomyces pombe, and homologs in Caenorhabditis elegans, Xenopus laevis, and Drosophila melanogaster. Defects in three of the RecQ helicases, RecQ4, BLM, and WRN, cause human pathologies linked with cancer predisposition and premature aging. Mutations in the WRN gene are the causative factor of Werner...

  2. Mutational analysis of the genome-linked protein of cowpea mosaic virus

    NARCIS (Netherlands)

    Carette, J.E.; Kujawa, A.; Gühl, K.; Verver, J.; Wellink, J.; Kammen, van A.

    2001-01-01

    In this study we have performed a mutational analysis of the cowpea mosaic comovirus (CPMV) genome-linked protein VPg to discern the structural requirements necessary for proper functioning of VPg. Either changing the serine residue linking VPg to RNA at a tyrosine or a threonine or changing the

  3. GAAP: Genome-organization-framework-Assisted Assembly Pipeline for prokaryotic genomes.

    Science.gov (United States)

    Yuan, Lina; Yu, Yang; Zhu, Yanmin; Li, Yulai; Li, Changqing; Li, Rujiao; Ma, Qin; Siu, Gilman Kit-Hang; Yu, Jun; Jiang, Taijiao; Xiao, Jingfa; Kang, Yu

    2017-01-25

    Next-generation sequencing (NGS) technologies have greatly promoted the genomic study of prokaryotes. However, highly fragmented assemblies due to short reads from NGS are still a limiting factor in gaining insights into the genome biology. Reference-assisted tools are promising in genome assembly, but tend to result in false assembly when the assigned reference has extensive rearrangements. Herein, we present GAAP, a genome assembly pipeline for scaffolding based on core-gene-defined Genome Organizational Framework (cGOF) described in our previous study. Instead of assigning references, we use the multiple-reference-derived cGOFs as indexes to assist in ordering and orienting the scaffolds and build a skeleton structure, and then use read pairs to extend scaffolds, called local scaffolding, and distinguish between true and chimeric adjacencies in the scaffolds. In our performance tests using both empirical and simulated data of 15 genomes in six species with diverse genome size, complexity, and all three categories of cGOFs, GAAP outcompetes or achieves comparable results when compared to three other reference-assisted programs, AlignGraph, Ragout and MeDuSa. GAAP uses both cGOF and paired-end reads to create assemblies in genomic scale, and performs better than the currently available reference-assisted assembly tools as it recovers more assemblies and makes fewer false locations, especially for species with extensively rearranged genomes. Our method is a promising solution for reconstruction of genome sequence from short reads of NGS.

  4. Comparative genomic analysis identified a mutation related to enhanced heterologous protein production in the filamentous fungus Aspergillus oryzae.

    Science.gov (United States)

    Jin, Feng-Jie; Katayama, Takuya; Maruyama, Jun-Ichi; Kitamoto, Katsuhiko

    2016-11-01

    Genomic mapping of mutations using next-generation sequencing technologies has facilitated the identification of genes contributing to fundamental biological processes, including human diseases. However, few studies have used this approach to identify mutations contributing to heterologous protein production in industrial strains of filamentous fungi, such as Aspergillus oryzae. In a screening of A. oryzae strains that hyper-produce human lysozyme (HLY), we previously isolated an AUT1 mutant that showed higher production of various heterologous proteins; however, the underlying factors contributing to the increased heterologous protein production remained unclear. Here, using a comparative genomic approach performed with whole-genome sequences, we attempted to identify the genes responsible for the high-level production of heterologous proteins in the AUT1 mutant. The comparative sequence analysis led to the detection of a gene (AO090120000003), designated autA, which was predicted to encode an unknown cytoplasmic protein containing an alpha/beta-hydrolase fold domain. Mutation or deletion of autA was associated with higher production levels of HLY. Specifically, the HLY yields of the autA mutant and deletion strains were twofold higher than that of the control strain during the early stages of cultivation. Taken together, these results indicate that combining classical mutagenesis approaches with comparative genomic analysis facilitates the identification of novel genes involved in heterologous protein production in filamentous fungi.

  5. Comparative analysis of prophages in Streptococcus mutans genomes

    Science.gov (United States)

    Fu, Tiwei; Fan, Xiangyu; Long, Quanxin; Deng, Wanyan; Song, Jinlin

    2017-01-01

    Prophages have been considered genetic units that have an intimate association with novel phenotypic properties of bacterial hosts, such as pathogenicity and genomic variation. Little is known about the genetic information of prophages in the genome of Streptococcus mutans, a major pathogen of human dental caries. In this study, we identified 35 prophage-like elements in S. mutans genomes and performed a comparative genomic analysis. Comparative genomic and phylogenetic analyses of prophage sequences revealed that the prophages could be classified into three main large clusters: Cluster A, Cluster B, and Cluster C. The S. mutans prophages in each cluster were compared. The genomic sequences of phismuN66-1, phismuNLML9-1, and phismu24-1 all shared similarities with the previously reported S. mutans phages M102, M102AD, and ϕAPCM01. The genomes were organized into seven major gene clusters according to the putative functions of the predicted open reading frames: packaging and structural modules, integrase, host lysis modules, DNA replication/recombination modules, transcriptional regulatory modules, other protein modules, and hypothetical protein modules. Moreover, an integrase gene was only identified in phismuNLML9-1 prophages. PMID:29158986

  6. The draft genome of a termite illuminates alternative social organization

    Science.gov (United States)

    Termites have substantial economic and ecological impact worldwide. They are also the oldest organisms living in complex societies, having evolved a caste system independent of that of eusocial Hymenoptera (ants, bees and wasps). Here we provide the first genome sequence for a termite, Zootermopsis ...

  7. Putative drug and vaccine target protein identification using comparative genomic analysis of KEGG annotated metabolic pathways of Mycoplasma hyopneumoniae.

    Science.gov (United States)

    Damte, Dereje; Suh, Joo-Won; Lee, Seung-Jin; Yohannes, Sileshi Belew; Hossain, Md Akil; Park, Seung-Chun

    2013-07-01

    In the present study, a computational comparative and subtractive genomic/proteomic analysis aimed at the identification of putative therapeutic target and vaccine candidate proteins from Kyoto Encyclopedia of Genes and Genomes (KEGG) annotated metabolic pathways of Mycoplasma hyopneumoniae was performed for drug design and vaccine production pipelines against M. hyopneumoniae. The employed comparative genomic and metabolic pathway analysis with a predefined computational systemic workflow extracted a total of 41 annotated metabolic pathways from KEGG, among which five were unique to M. hyopneumoniae. A total of 234 proteins were identified to be involved in these metabolic pathways. Although 125 non-homologous and predicted essential proteins were found from the total that could serve as potential drug targets and vaccine candidates, additional prioritizing parameters characterized 21 proteins as vaccine candidates, while druggability of each of the identified proteins evaluated by the DrugBank database prioritized 42 proteins suitable as drug targets. Copyright © 2013 Elsevier Inc. All rights reserved.

  8. Comparative genomics evidence that only protein toxins are tagging bad bugs

    Directory of Open Access Journals (Sweden)

    Kalliopi Georgiades

    2011-10-01

    Full Text Available The term toxin was introduced by Roux and Yersin and describes macromolecular substances that, when produced during infection or when introduced parenterally or orally, cause an impairment of physiological functions that lead to disease or to the death of the infected organism. Long after the discovery of toxins, early genetic studies on bacterial virulence demonstrated that removing a certain number of genes from pathogenic bacteria decreases their capacity to infect hosts. Each of the removed factors was therefore referred to as a virulence factor, and it was speculated that non-pathogenic bacteria lack such supplementary factors. However, many recent comparative studies demonstrate that the specialization of bacteria to eukaryotic hosts is associated with massive gene loss. We recently demonstrated that the only features that seem to characterize 12 epidemic bacteria are toxin-antitoxin (TA) modules, which are addiction molecules in host bacteria. In this study, we investigated if protein toxins are indeed the only molecules specific to pathogenic bacteria by comparing 14 epidemic bacterial killers (bad bugs) with their 14 closest non-epidemic relatives (controls). We found protein toxins in significantly higher numbers in all of the bad bugs. For the first time, statistical principal components analysis, including genome size, GC%, TA modules, restriction enzymes and toxins, revealed that toxins are the only proteins other than TA modules that are correlated with the pathogenic character of bacteria. Moreover, intracellular toxins appear to be more correlated with the pathogenic character of bacteria than secreted toxins.
In conclusion, we hypothesize that the only truly identifiable phenomena, witnessing the convergent evolution of the most pathogenic bacteria for humans are the loss of metabolic activities, i.e., the outcome of the loss of regulatory and transcription factors and the presence of protein toxins, alone or coupled as TA

  9. High throughput sequencing and proteomics to identify immunogenic proteins of a new pathogen: the dirty genome approach.

    Science.gov (United States)

    Greub, Gilbert; Kebbi-Beghdadi, Carole; Bertelli, Claire; Collyn, François; Riederer, Beat M; Yersin, Camille; Croxatto, Antony; Raoult, Didier

    2009-12-23

    With the availability of new generation sequencing technologies, bacterial genome projects have undergone a major boost. Still, chromosome completion needs a costly and time-consuming gap closure, especially when containing highly repetitive elements. However, incomplete genome data may be sufficiently informative to derive the pursued information. For emerging pathogens, i.e. newly identified pathogens, lack of release of genome data during gap closure stage is clearly medically counterproductive. We thus investigated the feasibility of a dirty genome approach, i.e. the release of unfinished genome sequences to develop serological diagnostic tools. We showed that almost the whole genome sequence of the emerging pathogen Parachlamydia acanthamoebae was retrieved even with relatively short reads from Genome Sequencer 20 and Solexa. The bacterial proteome was analyzed to select immunogenic proteins, which were then expressed and used to elaborate the first steps of an ELISA. This work constitutes the proof of principle for a dirty genome approach, i.e. the use of unfinished genome sequences of pathogenic bacteria, coupled with proteomics to rapidly identify new immunogenic proteins useful to develop in the future specific diagnostic tests such as ELISA, immunohistochemistry and direct antigen detection. Although applied here to an emerging pathogen, this combined dirty genome sequencing/proteomic approach may be used for any pathogen for which better diagnostics are needed. These genome sequences may also be very useful to develop DNA based diagnostic tests. All these diagnostic tools will allow further evaluations of the pathogenic potential of this obligate intracellular bacterium.

  10. Expression and genomic organization of zonadhesin-like genes in three species of fish give insight into the evolutionary history of a mosaic protein

    Directory of Open Access Journals (Sweden)

    Davidson William S

    2005-11-01

    Full Text Available Abstract Background The mosaic sperm protein zonadhesin (ZAN) has been characterized in mammals and is implicated in species-specific egg-sperm binding interactions. The genomic structure and testes-specific expression of zonadhesin are known for many mammalian species. All zonadhesin genes characterized to date consist of meprin A5 antigen receptor tyrosine phosphatase mu (MAM) domains, mucin tandem repeats, and von Willebrand (VWD) adhesion domains. Here we investigate the genomic structure and expression of zonadhesin-like genes in three species of fish. Results The cDNA and corresponding genomic locus of a zonadhesin-like gene (zlg) in Atlantic salmon (Salmo salar) were sequenced. Zlg is similar in adhesion domain content to mammalian zonadhesin; however, the domain order is altered. Analysis of puffer fish (Takifugu rubripes) and zebrafish (Danio rerio) sequence data identified zonadhesin (zan) genes that share the same domain order, content, and a conserved syntenic relationship with mammalian zonadhesin. A zonadhesin-like gene in D. rerio was also identified. Unlike mammalian zonadhesin, D. rerio zan and S. salar zlg were expressed in the gut and not in the testes. Conclusion We characterized likely orthologs of zonadhesin in both T. rubripes and D. rerio and uncovered zonadhesin-like genes in S. salar and D. rerio. Each of these genes contains MAM, mucin, and VWD domains. While these domains are associated with several proteins that show prominent gut expression, their combination is unique to zonadhesin and zonadhesin-like genes in vertebrates. The expression patterns of fish zonadhesin and zonadhesin-like genes suggest that the reproductive role of zonadhesin evolved later in the mammalian lineage.

  11. LocateP: Genome-scale subcellular-location predictor for bacterial proteins

    Directory of Open Access Journals (Sweden)

    Zhou Miaomiao

    2008-03-01

    Full Text Available Abstract Background In the past decades, various protein subcellular-location (SCL) predictors have been developed. Most of these predictors, like TMHMM 2.0, SignalP 3.0, PrediSi and Phobius, aim at the identification of one or a few SCLs, whereas others such as CELLO and Psortb v.2.0 aim at a broader classification. Although these tools and pipelines can achieve a high precision in the prediction of signal peptides and transmembrane helices, they have a much lower accuracy when other sequence characteristics are concerned. For instance, it proved notoriously difficult to identify the fate of proteins carrying a putative type I signal peptidase (SPIase) cleavage site, as many of those proteins are retained in the cell membrane as N-terminally anchored membrane proteins. Moreover, most of the SCL classifiers are based on the classification of the Swiss-Prot database and consequently inherited the inconsistency of that SCL classification. As accurate and detailed SCL prediction on a genome scale is highly desired by experimental researchers, we decided to construct a new SCL prediction pipeline: LocateP. Results LocateP combines many of the existing high-precision SCL identifiers with our own newly developed identifiers for specific SCLs. The LocateP pipeline was designed such that it mimics protein targeting and secretion processes. It distinguishes 7 different SCLs within Gram-positive bacteria: intracellular, multi-transmembrane, N-terminally membrane anchored, C-terminally membrane anchored, lipid-anchored, LPxTG-type cell-wall anchored, and secreted/released proteins. Moreover, it distinguishes pathways for Sec- or Tat-dependent secretion and alternative secretion of bacteriocin-like proteins. The pipeline was tested on data sets extracted from literature, including experimental proteomics studies. The tests showed that LocateP performs as well as, or even slightly better than other SCL predictors for some locations and outperforms

  12. Biology, genome organization and evolution of parvoviruses in marine shrimp

    Science.gov (United States)

    A number of parvoviruses are now known to infect marine shrimp, and these viruses alone or in combination with other viruses have the potential to cause major losses in shrimp aquaculture globally. This review provides a comprehensive overview of the biology, genome organization, gene expression, and...

  13. Detecting Protein-Protein Interactions in the Intact Cell of Bacillus subtilis (ATCC 6633)

    OpenAIRE

    Winters, Michael S.; Day, R. A.

    2003-01-01

    The salt bridge, paired group-specific reagent cyanogen (ethanedinitrile; C2N2) converts naturally occurring pairs of functional groups into covalently linked products. Cyanogen readily permeates cell walls and membranes. When the paired groups are shared between associated proteins, isolation of the covalently linked proteins allows their identity to be assigned. Examination of organisms of known genome sequence permits identification of the linked proteins by mass spectrometric techniques a...

  14. Specialized microbial databases for inductive exploration of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Cabau Cédric

    2005-02-01

    Full Text Available Abstract Background The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore (http://bioinfo.hku.hk/genochore.html), a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya), has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList, Staphylococcus epidermidis) associated with related organisms for comparison.

  15. Agricultural genomics and sustainable development: perspectives ...

    African Journals Online (AJOL)

    Administrator

    era is to establish how genes and proteins function to bring about changes in phenotype. Some of ... within the context of sustainable development of African economies. The greatest .... these strategies, the genomes of many organisms have now been ... gene structure and order, e.g. between rice, wheat, corn, millets and ...

  16. Genome-wide scans for delineation of candidate genes regulating seed-protein content in chickpea

    Directory of Open Access Journals (Sweden)

    Hari Deo Upadhyaya

    2016-03-01

    Full Text Available Identification of potential genes/alleles governing complex seed-protein content (SPC) trait is essential in marker-assisted breeding for quality trait improvement of chickpea. Hence, the present study utilized an integrated genomics-assisted breeding strategy encompassing trait association analysis, selective genotyping in traditional bi-parental mapping population and differential expression profiling for the first time to understand the complex genetic architecture of quantitative SPC trait in chickpea. For GWAS (genome-wide association study), high-throughput genotyping information of 16376 genome-based SNPs (single nucleotide polymorphisms) discovered from a structured population of 336 sequenced desi and kabuli accessions [with 150-200 kb LD (linkage disequilibrium) decay] was utilized. This led to identification of seven most effective genomic loci (genes) associated [10 to 20% with 41% combined PVE (phenotypic variation explained)] with SPC trait in chickpea. Regardless of the diverse desi and kabuli genetic backgrounds, a comparable level of association potential of the identified seven genomic loci with SPC trait was observed. Five SPC-associated genes were validated successfully in parental accessions and homozygous individuals of an intra-specific desi RIL (recombinant inbred line) mapping population (ICC 12299 x ICC 4958) by selective genotyping. The seed-specific expression, including differential up-regulation (> 4-fold) of six SPC-associated genes particularly in accessions, parents and homozygous individuals of the aforementioned mapping population with high level of contrasting seed-protein content (21-22%) was evident. Collectively, the integrated genomic approach delineated diverse naturally occurring novel functional SNP allelic variants in six potential candidate genes regulating SPC trait in chickpea.
Of these, a non-synonymous SNP allele-carrying zinc finger transcription factor gene exhibiting strong association with SPC trait

  17. Modeling structure of G protein-coupled receptors in human genome

    KAUST Repository

    Zhang, Yang

    2016-01-26

    G protein-coupled receptors (or GPCRs) are integral transmembrane proteins responsible for various cellular signal transductions. Human GPCR proteins are encoded by 5% of human genes but account for the targets of 40% of the FDA approved drugs. Due to difficulties in crystallization, experimental structure determination remains extremely difficult for human GPCRs, which has been a major barrier in modern structure-based drug discovery. We proposed a new hybrid protocol, GPCR-I-TASSER, to construct GPCR structure models by integrating experimental mutagenesis data with ab initio transmembrane-helix assembly simulations, assisted by the predicted transmembrane-helix interaction networks. The method was tested in recent community-wide GPCRDock experiments and constructed models with a root mean square deviation of 1.26 Å for Dopamine-3 and 2.08 Å for Chemokine-4 receptors in the transmembrane domain regions, which were significantly closer to the native than the best templates available in the PDB. GPCR-I-TASSER has been applied to model all 1,026 putative GPCRs in the human genome, where 923 are found to have correct folds based on the confidence score analysis and mutagenesis data comparison. The successfully modeled GPCRs contain many pharmaceutically important families that do not have previously solved structures, including Trace amine, Prostanoids, Releasing hormones, Melanocortins, Vasopressin and Neuropeptide Y receptors. All the human GPCR models have been made publicly available through the GPCR-HGmod database at http://zhanglab.ccmb.med.umich.edu/GPCR-HGmod/. The results demonstrate new progress on genome-wide structure modeling of transmembrane proteins which should bring useful impact on the effort of GPCR-targeted drug discovery.
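
    The accuracy figures in this abstract are root mean square deviations (RMSD) between a model and the native structure. As a rough illustration of what that number measures, here is a minimal Python sketch. It assumes the two structures have already been superposed and that atoms are paired by index; the function name `rmsd` and the sample coordinates are illustrative and are not taken from GPCR-I-TASSER.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superposed and atoms are paired by index.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates in angstroms, for illustration only.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(model, native))
```

    In practice an optimal superposition is computed first, so a value such as 1.26 Å reflects the residual deviation after the best rigid-body fit.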

  18. The copepod Tigriopus: A promising marine model organism for ecotoxicology and environmental genomics

    Energy Technology Data Exchange (ETDEWEB)

    Raisuddin, Sheikh [Department of Chemistry and the National Research Lab of Marine Molecular and Environmental Bioscience, College of Natural Sciences, Hanyang University, Seoul 133-791 (Korea, Republic of); Kwok, Kevin W.H. [Swire Institute of Marine Science, Department of Ecology and Biodiversity, University of Hong Kong, Pokfulam, Hong Kong (China); Leung, Kenneth M.Y. [Swire Institute of Marine Science, Department of Ecology and Biodiversity, University of Hong Kong, Pokfulam, Hong Kong (China); Schlenk, Daniel [Department of Environmental Sciences, University of California, Riverside, CA 92521 (United States); Lee, Jae-Seong [Department of Chemistry and the National Research Lab of Marine Molecular and Environmental Bioscience, College of Natural Sciences, Hanyang University, Seoul 133-791 (Korea, Republic of)]. E-mail: jslee2@hanyang.ac.kr

    2007-07-20

    There is an increasing body of evidence to support the significant role of invertebrates in assessing impacts of environmental contaminants on marine ecosystems. Therefore, in recent years massive efforts have been directed to identify viable and ecologically relevant invertebrate toxicity testing models. Tigriopus, a harpacticoid copepod, has a number of promising characteristics which make it a candidate worth consideration in such efforts. Tigriopus and other copepods are widely distributed and ecologically important organisms. Their position in marine food chains is very prominent, especially with regard to the transfer of energy. Copepods also play an important role in the transportation of aquatic pollutants across the food chains. In recent years there has been a phenomenal increase in the knowledge base of Tigriopus spp., particularly in the areas of their ecology, geophylogeny, genomics and their behavioural, biochemical and molecular responses following exposure to environmental stressors and chemicals. Sequences of a number of important marker genes have been studied in various Tigriopus spp., notably T. californicus and T. japonicus. These genes belong to normal biophysiological functions (e.g. electron transport system enzymes) as well as stress and toxic chemical exposure responses (heat shock protein 20, glutathione reductase, glutathione S-transferase). Recently, 40,740 expressed sequence tags (ESTs) from T. japonicus have been sequenced and of them, 5673 ESTs showed significant hits (E-value, >1.0E-05) to the red flour beetle Tribolium genome database. Metals and organic pollutants such as antifouling agents, pesticides, polycyclic aromatic hydrocarbons (PAH) and polychlorinated biphenyls (PCB) have shown reproducible biological responses when tested in Tigriopus spp. Promising results have been obtained when Tigriopus was used for assessment of risk associated with exposure to endocrine-disrupting chemicals (EDCs).
Application of environmental

  19. The copepod Tigriopus: A promising marine model organism for ecotoxicology and environmental genomics

    International Nuclear Information System (INIS)

    Raisuddin, Sheikh; Kwok, Kevin W.H.; Leung, Kenneth M.Y.; Schlenk, Daniel; Lee, Jae-Seong

    2007-01-01

    There is an increasing body of evidence to support the significant role of invertebrates in assessing impacts of environmental contaminants on marine ecosystems. Therefore, in recent years massive efforts have been directed to identify viable and ecologically relevant invertebrate toxicity testing models. Tigriopus, a harpacticoid copepod, has a number of promising characteristics which make it a candidate worth consideration in such efforts. Tigriopus and other copepods are widely distributed and ecologically important organisms. Their position in marine food chains is very prominent, especially with regard to the transfer of energy. Copepods also play an important role in the transportation of aquatic pollutants across the food chains. In recent years there has been a phenomenal increase in the knowledge base of Tigriopus spp., particularly in the areas of their ecology, geophylogeny, genomics and their behavioural, biochemical and molecular responses following exposure to environmental stressors and chemicals. Sequences of a number of important marker genes have been studied in various Tigriopus spp., notably T. californicus and T. japonicus. These genes belong to normal biophysiological functions (e.g. electron transport system enzymes) as well as stress and toxic chemical exposure responses (heat shock protein 20, glutathione reductase, glutathione S-transferase). Recently, 40,740 expressed sequence tags (ESTs) from T. japonicus have been sequenced and of them, 5673 ESTs showed significant hits (E-value, >1.0E-05) to the red flour beetle Tribolium genome database. Metals and organic pollutants such as antifouling agents, pesticides, polycyclic aromatic hydrocarbons (PAH) and polychlorinated biphenyls (PCB) have shown reproducible biological responses when tested in Tigriopus spp. Promising results have been obtained when Tigriopus was used for assessment of risk associated with exposure to endocrine-disrupting chemicals (EDCs).
Application of environmental

  20. ProFITS of maize: a database of protein families involved in the transduction of signalling in the maize genome

    Directory of Open Access Journals (Sweden)

    Zhang Zhenhai

    2010-10-01

    Full Text Available Abstract Background Maize (Zea mays ssp. mays L.) is an important model for plant basic and applied research. In 2009, the B73 maize genome sequencing made a great step forward, using clone by clone strategy; however, functional annotation and gene classification of the maize genome are still limited. Thus, well-annotated datasets and an informative database will be important for further research discoveries. Signal transduction is a fundamental biological process in living cells, and many protein families participate in this process in sensing, amplifying and responding to various extracellular or internal stimuli. Therefore, it is a good starting point to integrate information on the maize functional genes involved in signal transduction. Results Here we introduce a comprehensive database 'ProFITS' (Protein Families Involved in the Transduction of Signalling), which endeavours to identify and classify protein kinases/phosphatases, transcription factors and ubiquitin-proteasome-system related genes in the B73 maize genome. Users can explore gene models, corresponding transcripts and FLcDNAs using the three abovementioned protein hierarchical categories, and visualize them using an AJAX-based genome browser (JBrowse) or Generic Genome Browser (GBrowse). Functional annotations such as GO annotation, protein signatures, protein best-hits in the Arabidopsis and rice genome are provided. In addition, pre-calculated transcription factor binding sites of each gene are generated and mutant information is incorporated into ProFITS. In short, ProFITS provides a user-friendly web interface for studies in signal transduction process in maize. Conclusion ProFITS, which utilizes both the B73 maize genome and full length cDNA (FLcDNA) datasets, provides users a comprehensive platform of maize annotation with specific focus on the categorization of families involved in the signal transduction process. ProFITS is designed as a user-friendly web interface and it is

  1. Genome-based identification of spliceosomal proteins in the silk moth Bombyx mori.

    Science.gov (United States)

    Somarelli, Jason A; Mesa, Annia; Fuller, Myron E; Torres, Jacqueline O; Rodriguez, Carol E; Ferrer, Christina M; Herrera, Rene J

    2010-12-01

    Pre-messenger RNA splicing is a highly conserved eukaryotic cellular function that takes place by way of a large, RNA-protein assembly known as the spliceosome. In the mammalian system, nearly 300 proteins associate with uridine-rich small nuclear (sn)RNAs to form this complex. Some of these splicing factors are ubiquitously present in the spliceosome, whereas others are involved only in the processing of specific transcripts. Several proteomics analyses have delineated the proteins of the spliceosome in several species. In this study, we mine multiple sequence data sets of the silk moth Bombyx mori in an attempt to identify the entire set of known spliceosomal proteins. Five data sets were utilized, including the 3X, 6X, and Build 2.0 genomic contigs as well as the expressed sequence tag and protein libraries. While homologs for 88% of vertebrate splicing factors were delineated in the Bombyx mori genome, there appear to be several spliceosomal polypeptides absent in Bombyx mori and seven additional insect species. This apparent increase in spliceosomal complexity in vertebrates may reflect the tissue-specific and developmental stage-specific alternative pre-mRNA splicing requirements in vertebrates. Phylogenetic analyses of 15 eukaryotic taxa using the core splicing factors suggest that the essential functional units of the pre-mRNA processing machinery have remained highly conserved from yeast to humans. The Sm and LSm proteins are the most conserved, whereas proteins of the U1 small nuclear ribonucleoprotein particle are the most divergent. These data highlight both the differential conservation and relative phylogenetic signals of the essential spliceosomal components throughout evolution. © 2010 Wiley Periodicals, Inc.

  2. Whole genome phylogenies for multiple Drosophila species

    Directory of Open Access Journals (Sweden)

    Seetharam Arun

    2012-12-01

    Full Text Available Abstract Background Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees. Results An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed. Conclusions These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between
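
    The vector steps described in this abstract (peptide frequency vectors per protein, a summed vector per species, and the angle between species vectors as a distance) can be illustrated with a small Python sketch. This is a simplified illustration only: it deliberately leaves out the SVD dimensionality reduction that the authors apply before summing, and the function names, the peptide length `k=3`, and the toy sequences are all assumptions made for this example.

```python
import math
from collections import Counter


def peptide_counts(protein, k=3):
    """Count overlapping k-residue peptides in one protein sequence."""
    return Counter(protein[i:i + k] for i in range(len(protein) - k + 1))


def species_vector(proteins, k=3):
    """Sum the per-protein peptide counts into one vector per species."""
    total = Counter()
    for protein in proteins:
        total.update(peptide_counts(protein, k))
    return total


def angular_distance(u, v):
    """Angle in radians between two sparse count vectors."""
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cannot compare an empty vector")
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Toy proteomes (made-up sequences, for illustration only).
species_a = species_vector(["MKVLAAG", "MKVLSSG"])
species_b = species_vector(["MKVLAAG", "MKILSSG"])
species_c = species_vector(["WWPPWWPP"])
print(angular_distance(species_a, species_b))
print(angular_distance(species_a, species_c))
```

    A matrix of such pairwise angles could then be passed to any distance-based tree-building method; which method the authors used is not stated in this excerpt.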

  3. PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Wasnick Michael

    2008-03-01

    Full Text Available Abstract Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. 
A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any

  4. Origins of Programmable Nucleases for Genome Engineering.

    Science.gov (United States)

    Chandrasegaran, Srinivasan; Carroll, Dana

    2016-02-27

    Genome engineering with programmable nucleases depends on cellular responses to a targeted double-strand break (DSB). The first truly targetable reagents were the zinc finger nucleases (ZFNs), which showed that arbitrary DNA sequences could be addressed for cleavage by protein engineering, ushering in the breakthrough in genome manipulation. ZFNs resulted from basic research on zinc finger proteins and the FokI restriction enzyme (which revealed a bipartite structure with a separable DNA-binding domain and a non-specific cleavage domain). Studies on the mechanism of cleavage by 3-finger ZFNs established that the preferred substrates were paired binding sites, which doubled the size of the target sequence recognition from 9 to 18 bp, long enough to specify a unique genomic locus in plant and mammalian cells. Soon afterwards, a ZFN-induced DSB was shown to stimulate homologous recombination in cells. Transcription activator-like effector nucleases (TALENs) that are based on bacterial TALEs fused to the FokI cleavage domain expanded this capability. The fact that ZFNs and TALENs have been used for genome modification of more than 40 different organisms and cell types attests to the success of protein engineering. The most recent technology platform for delivering a targeted DSB to cellular genomes is that of the RNA-guided nucleases, which are based on the naturally occurring Type II prokaryotic CRISPR-Cas9 system. Unlike ZFNs and TALENs that use protein motifs for DNA sequence recognition, CRISPR-Cas9 depends on RNA-DNA recognition. The advantages of the CRISPR-Cas9 system (the ease of RNA design for new targets and the dependence on a single, constant Cas9 protein) have led to its wide adoption by research laboratories around the world. These technology platforms have equipped scientists with an unprecedented ability to modify cells and organisms almost at will, with wide-ranging implications across biology and medicine.
However, these nucleases have also been shown to cut
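    The abstract's statement that an 18 bp site is long enough to specify a unique locus, whereas a 9 bp site is not, can be checked with back-of-the-envelope arithmetic. The sketch below assumes a genome of independent, uniformly random bases and a genome size of 3 × 10^9 bp (roughly a mammalian haploid genome); both are simplifying assumptions, and the function name is hypothetical.

```python
def expected_random_hits(site_length_bp, genome_size_bp, both_strands=True):
    """Expected number of chance occurrences of one specific site.

    Assumes independent, equiprobable bases, which real genomes only
    approximate (repeats and base-composition bias are ignored).
    """
    strands = 2 if both_strands else 1
    return strands * genome_size_bp / 4 ** site_length_bp


GENOME_SIZE_BP = 3_000_000_000  # assumed, order-of-magnitude value

for length in (9, 18):
    hits = expected_random_hits(length, GENOME_SIZE_BP)
    print(f"{length} bp site: {4 ** length:,} possible sequences, "
          f"about {hits:.3g} expected chance matches")
```

    Under these assumptions a 9 bp site is expected to occur by chance many thousands of times, while an 18 bp site is expected fewer than once, which is consistent with the abstract's argument.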

  5. Structural organization of the genes for rat von Ebner's gland proteins 1 and 2 reveals their close relationship to lipocalins.

    Science.gov (United States)

    Kock, K; Ahlers, C; Schmale, H

    1994-05-01

    The rat von Ebner's gland protein 1 (VEGP 1) is a secretory protein, which is abundantly expressed in the small acinar von Ebner's salivary glands of the tongue. Based on the primary structure of this protein we have previously suggested that it is a member of the lipocalin superfamily of lipophilic-ligand carrier proteins. Although the physiological role of VEGP 1 is not clear, it might be involved in sensory or protective functions in the taste epithelium. Here, we report the purification of VEGP 1 and of a closely related secretory polypeptide, VEGP 2, the isolation of a cDNA clone encoding VEGP 2, and the isolation and structural characterization of the genes for both proteins. Protein purification by gel-filtration and anion-exchange chromatography using Mono Q revealed the presence of two different immunoreactive VEGP species. N-terminal sequence determination of peptide fragments isolated after protease Asp-N digestion allowed the identification of a new VEGP, named VEGP 2, in addition to the previously characterized VEGP 1. The complete VEGP 2 sequence was deduced from a cDNA clone isolated from a von Ebner's gland cDNA library. The VEGP 2 cDNA encodes a protein of 177 amino acids and is 94% identical to VEGP 1. DNA sequence analysis of the rat VEGP 1 and 2 genes isolated from rat genomic libraries revealed that both span about 4.5 kb and contain seven exons. The VEGP 1 and 2 genes are non-allelic distinct genes in the rat genome and probably arose by gene duplication. The high degree of nucleotide sequence identity in introns A-C (94-100%) points to a recent gene conversion event that included the 5' part of the genes. The genomic organization of the rat VEGP genes closely resembles that found in other lipocalins such as beta-lactoglobulin, mouse urinary proteins (MUPs) and prostaglandin D synthase, and therefore provides clear evidence that VEGPs belong to this superfamily of proteins.

  6. Construction of a mutagenesis cartridge for poliovirus genome-linked viral protein: isolation and characterization of viable and nonviable mutants

    International Nuclear Information System (INIS)

    Kuhn, R.J.; Tada, H.; Ypma-Wong, M.F.; Dunn, J.J.; Semler, B.L.; Wimmer, E.

    1988-01-01

    By following a strategy of genetic analysis of poliovirus, the authors have constructed a synthetic mutagenesis cartridge spanning the genome-linked viral protein coding region and flanking cleavage sites in an infectious cDNA clone of the type I (Mahoney) genome. The insertion of new restriction sites within the infectious clone has allowed them to replace the wild-type sequences with short complementary pairs of synthetic oligonucleotides containing various mutations. A set of mutations has been made that creates methionine codons within the genome-linked viral protein region. The resulting viruses have growth characteristics similar to wild type. Experiments that led to an alteration of the tyrosine residue responsible for the linkage to RNA have resulted in nonviable virus. In one mutant, proteolytic processing assayed in vitro appeared unimpaired by the mutation. They suggest that the position of the tyrosine residue is important for genome-linked viral protein function(s).

  7. GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.

    Science.gov (United States)

    Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de

    2006-03-31

    Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.

  8. PLAZA 3.0: an access point for plant comparative genomics.

    Science.gov (United States)

    Proost, Sebastian; Van Bel, Michiel; Vaneechoutte, Dries; Van de Peer, Yves; Inzé, Dirk; Mueller-Roeber, Bernd; Vandepoele, Klaas

    2015-01-01

    Comparative sequence analysis has significantly altered our view on the complexity of genome organization and gene functions in different kingdoms. PLAZA 3.0 is designed to make comparative genomics data for plants available through a user-friendly web interface. Structural and functional annotation, gene families, protein domains, phylogenetic trees and detailed information about genome organization can easily be queried and visualized. Compared with the first version released in 2009, which featured nine organisms, the number of integrated genomes is more than four times higher, and now covers 37 plant species. The new species provide a wider phylogenetic range as well as a more in-depth sampling of specific clades, and genomes of additional crop species are present. The functional annotation has been expanded and now comprises data from Gene Ontology, MapMan, UniProtKB/Swiss-Prot, PlnTFDB and PlantTFDB. Furthermore, we improved the algorithms to transfer functional annotation from well-characterized plant genomes to other species. The additional data and new features make PLAZA 3.0 (http://bioinformatics.psb.ugent.be/plaza/) a versatile and comprehensible resource for users wanting to explore genome information to study different aspects of plant biology, both in model and non-model organisms. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Translation elicits a growth rate-dependent, genome-wide, differential protein production in Bacillus subtilis.

    Science.gov (United States)

    Borkowski, Olivier; Goelzer, Anne; Schaffer, Marc; Calabre, Magali; Mäder, Ulrike; Aymerich, Stéphane; Jules, Matthieu; Fromion, Vincent

    2016-05-17

    Complex regulatory programs control cell adaptation to environmental changes by setting condition-specific proteomes. In balanced growth, bacterial protein abundances depend on the dilution rate, transcript abundances and transcript-specific translation efficiencies. We revisited the current theory claiming the invariance of bacterial translation efficiency. By integrating genome-wide transcriptome datasets and datasets from a library of synthetic gfp-reporter fusions, we demonstrated that translation efficiencies in Bacillus subtilis decreased up to fourfold from slow to fast growth. The translation initiation regions elicited a growth rate-dependent, differential production of proteins without regulators, hence revealing a unique, hard-coded, growth rate-dependent mode of regulation. We combined model-based data analyses of transcript and protein abundances genome-wide and revealed that this global regulation is extensively used in B. subtilis. We eventually developed a knowledge-based, three-step translation initiation model, experimentally challenged the model predictions and proposed that a growth rate-dependent drop in free ribosome abundance accounted for the differential protein production. © 2016 The Authors. Published under the terms of the CC BY 4.0 license.
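    The dependence named in the second sentence of this abstract can be written as a simple balance. The form below is a common textbook simplification, not a formula taken from the paper: it assumes first-order synthesis, neglects active protein degradation, and uses symbols chosen here for illustration ($P_i$ protein abundance, $m_i$ transcript abundance, $k_i$ translation efficiency of transcript $i$, $\mu$ growth or dilution rate).

```latex
\frac{dP_i}{dt} = k_i\, m_i - \mu\, P_i
\quad\Longrightarrow\quad
P_i^{*} = \frac{k_i\, m_i}{\mu}
\quad \text{(balanced growth, } dP_i/dt = 0\text{)}
```

    Read this way, a growth rate-dependent $k_i(\mu)$, such as the up to fourfold decrease reported above, changes $P_i^{*}$ beyond what dilution and transcript abundance alone would predict.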

  10. The genome of the social amoeba Dictyostelium discoideum

    DEFF Research Database (Denmark)

    Eichinger, L; Pachebat, J A; Glöckner, G

    2005-01-01

    The social amoebae are exceptional in their ability to alternate between unicellular and multicellular forms. Here we describe the genome of the best-studied member of this group, Dictyostelium discoideum. The gene-dense chromosomes of this organism encode approximately 12,500 predicted proteins,...

  11. LEMONS - A Tool for the Identification of Splice Junctions in Transcriptomes of Organisms Lacking Reference Genomes.

    Directory of Open Access Journals (Sweden)

    Liron Levin

    Full Text Available RNA-seq is becoming a preferred tool for genomics studies of model and non-model organisms. However, DNA-based analysis of organisms lacking sequenced genomes cannot rely on RNA-seq data alone to isolate most genes of interest, as DNA codes both exons and introns. With this in mind, we designed a novel tool, LEMONS, that exploits the evolutionary conservation of both exon/intron boundary positions and splice junction recognition signals to produce high throughput splice-junction predictions in the absence of a reference genome. When tested on multiple annotated vertebrate mRNA data, LEMONS accurately identified 87% (average) of the splice-junctions. LEMONS was then applied to our updated Mediterranean chameleon transcriptome, which lacks a reference genome, and predicted a total of 90,820 exon-exon junctions. We experimentally verified these splice-junction predictions by amplifying and sequencing twenty randomly selected genes from chameleon DNA templates. Exons and introns were detected in 19 of 20 of the positions predicted by LEMONS. To the best of our knowledge, LEMONS is currently the only experimentally verified tool that can accurately predict splice-junctions in organisms that lack a reference genome.

  12. High throughput sequencing and proteomics to identify immunogenic proteins of a new pathogen: the dirty genome approach.

    Directory of Open Access Journals (Sweden)

    Gilbert Greub

    Full Text Available BACKGROUND: With the availability of new generation sequencing technologies, bacterial genome projects have undergone a major boost. Still, chromosome completion needs a costly and time-consuming gap closure, especially when containing highly repetitive elements. However, incomplete genome data may be sufficiently informative to derive the pursued information. For emerging pathogens, i.e. newly identified pathogens, lack of release of genome data during gap closure stage is clearly medically counterproductive. METHODS/PRINCIPAL FINDINGS: We thus investigated the feasibility of a dirty genome approach, i.e. the release of unfinished genome sequences to develop serological diagnostic tools. We showed that almost the whole genome sequence of the emerging pathogen Parachlamydia acanthamoebae was retrieved even with relatively short reads from Genome Sequencer 20 and Solexa. The bacterial proteome was analyzed to select immunogenic proteins, which were then expressed and used to elaborate the first steps of an ELISA. CONCLUSIONS/SIGNIFICANCE: This work constitutes the proof of principle for a dirty genome approach, i.e. the use of unfinished genome sequences of pathogenic bacteria, coupled with proteomics to rapidly identify new immunogenic proteins useful to develop in the future specific diagnostic tests such as ELISA, immunohistochemistry and direct antigen detection. Although applied here to an emerging pathogen, this combined dirty genome sequencing/proteomic approach may be used for any pathogen for which better diagnostics are needed. These genome sequences may also be very useful to develop DNA based diagnostic tests. All these diagnostic tools will allow further evaluations of the pathogenic potential of this obligate intracellular bacterium.

  13. The Princeton Protein Orthology Database (P-POD): a comparative genomics analysis tool for biologists.

    OpenAIRE

    Sven Heinicke; Michael S Livstone; Charles Lu; Rose Oughtred; Fan Kang; Samuel V Angiuoli; Owen White; David Botstein; Kara Dolinski

    2007-01-01

    Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic r...

  14. Complete genome sequence of Coraliomargarita akajimensis type strain (04OKA010-24T)

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, Konstantinos; Abt, Birte; Brambilla, Evelyne; Lapidus, Alla; Copeland, Alex; Desphande, Shweta; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Detter, John C.; Woyke, Tanja; Goodwin, Lynne; Pitluck, Sam; Held, Brittany; Brettin, Thomas; Tapia, Roxanne; Ivanova, Natalia; Mikhailova, Natalia; Pati, Amrita; Liolios, Konstantinos; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Göker, Markus; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

    2010-06-25

    Coraliomargarita akajimensis Yoon et al. 2007 is the type species of the genus Coraliomargarita. C. akajimensis is an obligately aerobic, Gram-negative, non-spore-forming, non-motile, spherical bacterium which was isolated from seawater surrounding the hard coral Galaxea fascicularis. C. akajimensis is of special interest because of its phylogenetic position in a genomically poorly studied area of bacterial diversity. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the family Puniceicoccaceae. The 3,750,771 bp long genome with its 3,137 protein-coding and 55 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  15. Complete genome sequence of the myxobacterium Sorangium cellulosum

    DEFF Research Database (Denmark)

    Schneiker, S; Perlova, O; Kaiser, O

    2007-01-01

    The genus Sorangium synthesizes approximately half of the secondary metabolites isolated from myxobacteria, including the anti-cancer metabolite epothilone. We report the complete genome sequence of the model Sorangium strain S. cellulosum Soce56, which produces several natural products and has morphological and physiological properties typical of the genus. The circular genome, comprising 13,033,779 base pairs, is the largest bacterial genome sequenced to date. No global synteny with the genome of Myxococcus xanthus is apparent, revealing an unanticipated level of divergence between these myxobacteria. A large percentage of the genome is devoted to regulation, particularly post-translational phosphorylation, which probably supports the strain's complex, social lifestyle. This regulatory network includes the highest number of eukaryotic protein kinase-like kinases discovered in any organism...

  16. Structural genomics of infectious disease drug targets: the SSGCID

    International Nuclear Information System (INIS)

    Stacy, Robin; Begley, Darren W.; Phan, Isabelle; Staker, Bart L.; Van Voorhis, Wesley C.; Varani, Gabriele; Buchko, Garry W.; Stewart, Lance J.; Myler, Peter J.

    2011-01-01

    An introduction and overview of the focus, goals and overall mission of the Seattle Structural Genomics Center for Infectious Disease (SSGCID) is given. The Seattle Structural Genomics Center for Infectious Disease (SSGCID) is a consortium of researchers at Seattle BioMed, Emerald BioStructures, the University of Washington and Pacific Northwest National Laboratory that was established to apply structural genomics approaches to drug targets from infectious disease organisms. The SSGCID is currently funded over a five-year period by the National Institute of Allergy and Infectious Diseases (NIAID) to determine the three-dimensional structures of 400 proteins from a variety of Category A, B and C pathogens. Target selection engages the infectious disease research and drug-therapy communities to identify drug targets, essential enzymes, virulence factors and vaccine candidates of biomedical relevance to combat infectious diseases. The protein-expression systems, purified proteins, ligand screens and three-dimensional structures produced by SSGCID constitute a valuable resource for drug-discovery research, all of which is made freely available to the greater scientific community. This issue of Acta Crystallographica Section F, entirely devoted to the work of the SSGCID, covers the details of the high-throughput pipeline and presents a series of structures from a broad array of pathogenic organisms. Here, a background is provided on the structural genomics of infectious disease, the essential components of the SSGCID pipeline are discussed and a survey of progress to date is presented

  17. Complete genome sequence of Haliangium ochraceum type strain (SMP-2T)

    Energy Technology Data Exchange (ETDEWEB)

    Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Daum, Chris [U.S. Department of Energy, Joint Genome Institute; Lang, Elke [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Abt, Birte [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kopitz, Marcus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Brettin, Thomas S [ORNL; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

    2010-01-01

    Haliangium ochraceum Fudou et al. 2002 is the type species of the genus Haliangium in the myxococcal family Haliangiaceae. Members of the genus Haliangium are the first halophilic myxobacterial taxa described. The cells of the species follow a multicellular lifestyle in highly organized biofilms, called swarms; they decompose bacterial and yeast cells as most myxobacteria do. The fruiting bodies contain particularly small coccoid myxospores. H. ochraceum encodes the first actin homologue identified in a bacterial genome. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the myxococcal suborder Nannocystineae, and the 9,446,314 bp long single replicon genome with its 6,898 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. Avian reovirus L2 genome segment sequences and predicted structure/function of the encoded RNA-dependent RNA polymerase protein

    Directory of Open Access Journals (Sweden)

    Xu Wanhong

    2008-12-01

    Full Text Available Abstract Background The orthoreoviruses are infectious agents that possess a genome comprised of 10 double-stranded RNA segments encased in two concentric protein capsids. Like virtually all RNA viruses, an RNA-dependent RNA polymerase (RdRp) enzyme is required for viral propagation. RdRp sequences have been determined for the prototype mammalian orthoreoviruses and for several other closely-related reoviruses, including aquareoviruses, but have not yet been reported for any avian orthoreoviruses. Results We determined the L2 genome segment nucleotide sequences, which encode the RdRp proteins, of two different avian reoviruses, strains ARV138 and ARV176 in order to define conserved and variable regions within reovirus RdRp proteins and to better delineate structure/function of this important enzyme. The ARV138 L2 genome segment was 3829 base pairs long, whereas the ARV176 L2 segment was 3830 nucleotides long. Both segments were predicted to encode λB RdRp proteins 1259 amino acids in length. Alignments of these newly-determined ARV genome segments, and their corresponding proteins, were performed with all currently available homologous mammalian reovirus (MRV) and aquareovirus (AqRV) genome segment and protein sequences. There was ~55% amino acid identity between ARV λB and MRV λ3 proteins, making the RdRp protein the most highly conserved of currently known orthoreovirus proteins, and there was ~28% identity between ARV λB and homologous MRV and AqRV RdRp proteins. Predictive structure/function mapping of identical and conserved residues within the known MRV λ3 atomic structure indicated most identical amino acids and conservative substitutions were located near and within predicted catalytic domains and lining RdRp channels, whereas non-identical amino acids were generally located on the molecule's surfaces. Conclusion The ARV λB and MRV λ3 proteins showed the highest ARV:MRV identity values (~55%) amongst all currently known ARV and MRV
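    Identity values such as the ~55% and ~28% quoted above are computed from pairwise alignments. The sketch below shows one common convention (identical residues divided by aligned columns in which neither sequence has a gap). The abstract does not say which convention the authors used, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps as '-').
    Other conventions divide by alignment length or by the shorter
    sequence and will give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments, not real reovirus sequences.
print(percent_identity("MKV-LA", "MRVQLA"))
```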

  19. Use of Modern Chemical Protein Synthesis and Advanced Fluorescent Assay Techniques to Experimentally Validate the Functional Annotation of Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kent, Stephen [University of Chicago

    2012-07-20

    The objective of this research program was to prototype methods for the chemical synthesis of predicted protein molecules in annotated microbial genomes. High throughput chemical methods were to be used to make large numbers of predicted proteins and protein domains, based on microbial genome sequences. Microscale chemical synthesis methods for the parallel preparation of peptide-thioester building blocks were developed; these peptide segments are used for the parallel chemical synthesis of proteins and protein domains. Ultimately, it is envisaged that these synthetic molecules would be ‘printed’ in spatially addressable arrays. The unique ability of total synthesis to precision label protein molecules with dyes and with chemical or biochemical ‘tags’ can be used to facilitate novel assay technologies adapted from state-of-the-art single molecule fluorescence detection techniques. In the future, in conjunction with modern laboratory automation, this integrated set of techniques will enable high throughput experimental validation of the functional annotation of microbial genomes.

  20. Integration of Structural Dynamics and Molecular Evolution via Protein Interaction Networks: A New Era in Genomic Medicine

    Science.gov (United States)

    Kumar, Avishek; Butler, Brandon M.; Kumar, Sudhir; Ozkan, S. Banu

    2016-01-01

    Summary Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict if they are benign or not. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nsSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine. PMID:26684487

  1. 2004 Structural, Function and Evolutionary Genomics

    Energy Technology Data Exchange (ETDEWEB)

    Douglas L. Brutlag; Nancy Ryan Gray

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  2. Portal protein functions akin to a DNA-sensor that couples genome-packaging to icosahedral capsid maturation

    OpenAIRE

    Lokareddy, Ravi K.; Sankhala, Rajeshwer S.; Roy, Ankoor; Afonine, Pavel V.; Motwani, Tina; Teschke, Carolyn M.; Parent, Kristin N.; Cingolani, Gino

    2017-01-01

    Tailed bacteriophages and herpesviruses assemble infectious particles via an empty precursor capsid (or 'procapsid') built by multiple copies of coat and scaffolding protein and by one dodecameric portal protein. Genome packaging triggers rearrangement of the coat protein and release of scaffolding protein, resulting in dramatic procapsid lattice expansion. Here, we provide structural evidence that the portal protein of the bacteriophage P22 exists in two distinct dodecameric conformations: a...

  3. STRING 8--a global view on proteins and their functional interactions in 630 organisms

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Kuhn, Michael; Stark, Manuel

    2008-01-01

    Functional partnerships between proteins are at the core of complex cellular phenotypes, and the networks formed by interacting proteins provide researchers with crucial scaffolds for modeling, data reduction and annotation. STRING is a database and web resource dedicated to protein-protein interactions, including both physical and functional interactions. It weights and integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections, thus acting as a meta-database that maps all interaction evidence onto a common set of genomes and proteins. The most important new developments in STRING 8 over previous releases include a URL-based programming interface, which can be used to query STRING from other resources, improved interaction prediction via genomic neighborhood in prokaryotes, and the inclusion of protein structures...

  4. Complete mitochondrial genome of the blacknose shark Carcharhinus acronotus (Elasmobranchii: Carcharhinidae).

    Science.gov (United States)

    Yang, Lei; Matthes-Rosana, Kerri A; Naylor, Gavin J P

    2016-01-01

    The complete mitochondrial genome of the blacknose shark Carcharhinus acronotus has been determined in this work. It has a length of 16,719 bp and consists of 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 control region. The gene composition and genome organization are similar to those of other vertebrates. This study represents part of an ongoing effort to obtain mitochondrial genome sequences for chondrichthyan species in order to better estimate their phylogenetic relationships.

  5. C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families

    Directory of Open Access Journals (Sweden)

    Cutler Sean R

    2007-06-01

    Background: The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio. Results: We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied to the successful reduction of background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. This approach has been able to predict, to our knowledge, all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists
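
    The abstract above mentions "a simple z-statistic" for C-terminal tripeptide over-representation but does not give its form. The following is a minimal sketch of one way such a score could be computed, not the authors' method. It assumes a binomial null model in which the expected tripeptide probability is the product of background residue frequencies; the function name `cterm_tripeptide_z` and the toy sequences are hypothetical, and the randomization, masking, and gene-family clustering steps described in the abstract are omitted.

```python
from collections import Counter
from math import sqrt


def cterm_tripeptide_z(proteins):
    """Return {tripeptide: z-score} for the last three residues of each protein.

    Assumed null model (not taken from the paper): each C-terminal tripeptide
    occurs with probability equal to the product of the background
    frequencies of its three residues, and counts are binomial.
    """
    seqs = [p for p in proteins if len(p) >= 3]
    n = len(seqs)

    # Background residue frequencies estimated from all input sequences.
    residue_counts = Counter("".join(seqs))
    total = sum(residue_counts.values())
    freq = {aa: count / total for aa, count in residue_counts.items()}

    # Observed counts of each C-terminal tripeptide.
    observed = Counter(s[-3:] for s in seqs)

    scores = {}
    for tri, obs in observed.items():
        p = freq[tri[0]] * freq[tri[1]] * freq[tri[2]]
        sd = sqrt(n * p * (1 - p))
        if sd == 0:
            continue  # degenerate case, e.g. a single-letter alphabet
        scores[tri] = (obs - n * p) / sd
    return scores


# Toy input: three sequences end in SKL, one ends in GTA.
toy = ["MAAAASKL", "MGGGGSKL", "MTTTTSKL", "MAGTAGTA"]
scores = cterm_tripeptide_z(toy)
```

    On this toy input the recurring SKL ending scores higher than the single GTA ending. A real analysis would need a far larger proteome and a multiple-testing correction.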

  6. Genome-wide screens for expressed hypothetical proteins

    DEFF Research Database (Denmark)

    Madsen, Claus Desler; Durhuus, Jon Ambæk; Rasmussen, Lene Juel

    2012-01-01

    A hypothetical protein (HP) is defined as a protein that is predicted to be expressed from an open reading frame, but for which there is no experimental evidence of translation. HPs constitute a substantial fraction of proteomes of human as well as of other organisms. With the general belief that the majority of HPs are the product of pseudogenes, it is essential to have a tool with the ability of pinpointing the minority of HPs with a high probability of being expressed....

  7. The Ever-Evolving Concept of the Gene: The Use of RNA/Protein Experimental Techniques to Understand Genome Functions

    Directory of Open Access Journals (Sweden)

    Andrea Cipriano

    2018-03-01

    The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years.

  8. Genome-wide characterization, evolution, and expression analysis of the leucine-rich repeat receptor-like protein kinase (LRR-RLK) gene family in Rosaceae genomes.

    Science.gov (United States)

    Sun, Jiangmei; Li, Leiting; Wang, Peng; Zhang, Shaoling; Wu, Juyou

    2017-10-10

    Leucine-rich repeat receptor-like protein kinase (LRR-RLK) is the largest gene family of receptor-like protein kinases (RLKs) and actively participates in regulating the growth, development, signal transduction, immunity, and stress responses of plants. However, the patterns of LRR-RLK gene family evolution in the five main Rosaceae species for which genome sequences are available have not yet been reported. In this study, we performed a comprehensive analysis of LRR-RLK genes for five Rosaceae species: Fragaria vesca (strawberry), Malus domestica (apple), Pyrus bretschneideri (Chinese white pear), Prunus mume (mei), and Prunus persica (peach), which contained 201, 244, 427, 267, and 258 LRR-RLK genes, respectively. All LRR-RLK genes were further grouped into 23 subfamilies based on the hidden Markov models approach. RLK-Pelle_LRR-XII-1, RLK-Pelle_LRR-XI-1, and RLK-Pelle_LRR-III were the three largest subfamilies. Synteny analysis indicated that there were 236 tandem duplicated genes in the five Rosaceae species, among which subfamilies XII-1 (82 genes) and XI-1 (80 genes) comprised 68.6%. Our results indicate that tandem duplication made a large contribution to the expansion of the subfamilies. The gene expression, tissue-specific expression, and subcellular localization data revealed that LRR-RLK genes were differentially expressed in various organs and tissues, and the largest subfamily XI-1 was highly expressed in all five Rosaceae species, suggesting that LRR-RLKs play important roles in each stage of plant growth and development. Taken together, our results provide an overview of the LRR-RLK family in Rosaceae genomes and the basis for further functional studies.

  9. Expressed Peptide Tags: An additional layer of data for genome annotation

    Energy Technology Data Exchange (ETDEWEB)

    Savidor, Alon [ORNL; Donahoo, Ryan S [ORNL; Hurtado-Gonzales, Oscar [University of Tennessee, Knoxville (UTK); Verberkmoes, Nathan C [ORNL; Shah, Manesh B [ORNL; Lamour, Kurt H [ORNL; McDonald, W Hayes [ORNL

    2006-01-01

    While genome sequencing is becoming ever more routine, genome annotation remains a challenging process. Identification of the coding sequences within the genomic milieu presents a tremendous challenge, especially for eukaryotes with their complex gene architectures. Here we present a method to assist the annotation process through the use of proteomic data and bioinformatics. Mass spectra of digested protein preparations of the organism of interest were acquired and searched against a protein database created by a six-frame translation of the genome. The identified peptides were mapped back to the genome, compared to the current annotation, and then categorized as supporting or extending the current genome annotation. We named the classified peptides Expressed Peptide Tags (EPTs). The well-annotated bacterium Rhodopseudomonas palustris was used as a control for the method and showed a high degree of correlation between EPT mapping and the current annotation, with 86% of the EPTs confirming existing gene calls and less than 1% of the EPTs expanding on the current annotation. The eukaryotic plant pathogens Phytophthora ramorum and Phytophthora sojae, whose genomes have been recently sequenced and are much less well annotated, were also subjected to this method. A series of algorithmic steps were taken to increase the confidence of EPT identification for these organisms, including generation of smaller sub-databases to be searched against, and definition of EPT criteria that accommodate the more complex eukaryotic gene architecture. As expected, the analysis of the Phytophthora species showed less correlation between EPT mapping and their current annotation. While ~77% of Phytophthora EPTs supported the current annotation, a portion of them (7.2% and 12.6% for P. ramorum and P. sojae, respectively) suggested modification to current gene calls or identified novel genes that were missed by the current genome annotation of these organisms.
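
    The six-frame translation step described above can be illustrated with a short sketch. This is not the authors' pipeline: it assumes the standard genetic code, ignores introns and the mass-spectrometry search itself, and the function names (`six_frame_translate`, `find_peptide`) are hypothetical. It translates a DNA string in all three forward and three reverse-complement frames and reports which frames contain a given peptide.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case ACGT string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna):
    """Return {frame label: protein string} for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_peptide(peptide, dna):
    """List the frames whose translation contains the peptide."""
    return [f for f, prot in six_frame_translate(dna).items() if peptide in prot]


frames = six_frame_translate("ATGGCC")
hits = find_peptide("MA", "ATGGCC")
```

    In a workflow like the one described, peptides identified from mass spectra would be looked up in such translations and their frame and position compared with existing gene calls. Stop codons appear as `*` here and are not used to split open reading frames.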

  10. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    Energy Technology Data Exchange (ETDEWEB)

    Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2009-05-20

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidimicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the order Acidimicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  11. Unique genome organization of non-mammalian papillomaviruses provides insights into the evolution of viral early proteins.

    Science.gov (United States)

    Van Doorslaer, Koenraad; Ruoppolo, Valeria; Schmidt, Annie; Lescroël, Amelie; Jongsomjit, Dennis; Elrod, Megan; Kraberger, Simona; Stainton, Daisy; Dugger, Katie M; Ballard, Grant; Ainley, David G; Varsani, Arvind

    2017-07-01

    The family Papillomaviridae contains more than 320 papillomavirus types, with most having been identified as infecting skin and mucosal epithelium in mammalian hosts. To date, only nine non-mammalian papillomaviruses have been described from birds (n = 5), a fish (n = 1), a snake (n = 1), and turtles (n = 2). The identification of papillomaviruses in sauropsids and a sparid fish suggests that early ancestors of papillomaviruses were already infecting the earliest Euteleostomi. The Euteleostomi clade includes more than 90 per cent of the living vertebrate species, and progeny virus could have been passed on to all members of this clade, inhabiting virtually every habitat on the planet. As part of this study, we isolated a novel papillomavirus from a 16-year-old female Adélie penguin (Pygoscelis adeliae) from Cape Crozier, Ross Island (Antarctica). The new papillomavirus shares ∼64 per cent genome-wide identity to a previously described Adélie penguin papillomavirus. Phylogenetic analyses show that the non-mammalian viruses (except the python, Morelia spilota, associated papillomavirus) cluster near the base of the papillomavirus evolutionary tree. A papillomavirus isolated from an avian host (Northern fulmar; Fulmarus glacialis), like the two turtle papillomaviruses, lacks a putative E9 protein that is found in all other avian papillomaviruses. Furthermore, the Northern fulmar papillomavirus has an E7 more similar to the mammalian viruses than the other avian papillomaviruses. Typical E6 proteins of mammalian papillomaviruses have two Zinc finger motifs, whereas the sauropsid papillomaviruses only have one such motif. Furthermore, this motif is absent in the fish papillomavirus. Thus, it is highly likely that the most recent common ancestor of the mammalian and sauropsid papillomaviruses had a single motif E6. It appears that a motif duplication resulted in mammalian papillomaviruses having a double Zinc finger motif in E6. We

  12. Unique genome organization of non-mammalian papillomaviruses provides insights into the evolution of viral early proteins

    Science.gov (United States)

    Van Doorslaer, Koenraad; Ruoppolo, Valeria; Schmidt, Annie; Lescroël, Amelie; Jongsomjit, Dennis; Elrod, Megan; Kraberger, Simona; Stainton, Daisy; Dugger, Katie M.; Ballard, Grant; Ainley, David G.; Varsani, Arvind

    2017-01-01

    The family Papillomaviridae contains more than 320 papillomavirus types, with most having been identified as infecting skin and mucosal epithelium in mammalian hosts. To date, only nine non-mammalian papillomaviruses have been described from birds (n = 5), a fish (n = 1), a snake (n = 1), and turtles (n = 2). The identification of papillomaviruses in sauropsids and a sparid fish suggests that early ancestors of papillomaviruses were already infecting the earliest Euteleostomi. The Euteleostomi clade includes more than 90 per cent of the living vertebrate species, and progeny virus could have been passed on to all members of this clade, inhabiting virtually every habitat on the planet. As part of this study, we isolated a novel papillomavirus from a 16-year-old female Adélie penguin (Pygoscelis adeliae) from Cape Crozier, Ross Island (Antarctica). The new papillomavirus shares ∼64 per cent genome-wide identity to a previously described Adélie penguin papillomavirus. Phylogenetic analyses show that the non-mammalian viruses (except the python, Morelia spilota, associated papillomavirus) cluster near the base of the papillomavirus evolutionary tree. A papillomavirus isolated from an avian host (Northern fulmar; Fulmarus glacialis), like the two turtle papillomaviruses, lacks a putative E9 protein that is found in all other avian papillomaviruses. Furthermore, the Northern fulmar papillomavirus has an E7 more similar to the mammalian viruses than the other avian papillomaviruses. Typical E6 proteins of mammalian papillomaviruses have two Zinc finger motifs, whereas the sauropsid papillomaviruses only have one such motif. Furthermore, this motif is absent in the fish papillomavirus. Thus, it is highly likely that the most recent common ancestor of the mammalian and sauropsid papillomaviruses had a single motif E6. It appears that a motif duplication resulted in mammalian papillomaviruses having a double Zinc finger motif in E6. We estimated the

  13. The number of genes encoding repeat domain-containing proteins positively correlates with genome size in amoebal giant viruses

    Science.gov (United States)

    Shukla, Avi; Chatterjee, Anirvan

    2018-01-01

    Curiously, in viruses, the virion volume appears to be predominantly driven by genome length rather than the number of proteins it encodes or geometric constraints. With their large genome and giant particle size, amoebal viruses (AVs) are ideally suited to study the relationship between genome and virion size and explore the role of genome plasticity in their evolutionary success. Different genomic regions of AVs exhibit distinct genealogies. Although the vertically transferred core genes and their functions are universally conserved across the nucleocytoplasmic large DNA virus (NCLDV) families and are essential for their replication, the horizontally acquired genes are variable across families and are lineage-specific. When compared with other giant virus families, we observed a near-linear increase in the number of genes encoding repeat domain-containing proteins (RDCPs) with the increase in the genome size of AVs. From what is known about the functions of RDCPs in bacteria and eukaryotes and their prevalence in the AV genomes, we envisage important roles for RDCPs in the life cycle of AVs, their genome expansion, and plasticity. This observation also supports the evolution of AVs from a smaller viral ancestor by the acquisition of diverse gene families from the environment including RDCPs that might have helped in host adaptation. PMID:29308275

  14. Structural genomics: keeping up with expanding knowledge of the protein universe

    Science.gov (United States)

    Grabowski, Marek; Joachimiak, Andrzej; Otwinowski, Zbyszek; Minor, Wladek

    2010-01-01

    Structural characterization of the protein universe is the main mission of Structural Genomics (SG) programs. However, progress in gene sequencing technology, set in motion in the 1990s, has resulted in rapid expansion of protein sequence space — a twelvefold increase in the past seven years. For the SG field, this creates new challenges and necessitates a reassessment of its strategies. Nevertheless, despite the growth of sequence space, at present nearly half of the content of the Swiss-Prot database and over 40% of Pfam protein families can be structurally modeled based on structures determined so far, with SG projects making an increasingly significant contribution. The SG contribution of new Pfam structures nearly doubled from 27.2% in 2003 to 51.6% in 2006. PMID:17587562

  15. Structural genomics: keeping up with expanding knowledge of the protein universe.

    Science.gov (United States)

    Grabowski, Marek; Joachimiak, Andrzej; Otwinowski, Zbyszek; Minor, Wladek

    2007-06-01

    Structural characterization of the protein universe is the main mission of Structural Genomics (SG) programs. However, progress in gene sequencing technology, set in motion in the 1990s, has resulted in rapid expansion of protein sequence space--a twelvefold increase in the past seven years. For the SG field, this creates new challenges and necessitates a re-assessment of its strategies. Nevertheless, despite the growth of sequence space, at present nearly half of the content of the Swiss-Prot database and over 40% of Pfam protein families can be structurally modeled based on structures determined so far, with SG projects making an increasingly significant contribution. The SG contribution of new Pfam structures nearly doubled from 27.2% in 2003 to 51.6% in 2006.

  16. Improvisation in evolution of genes and genomes: whose structure is it anyway?

    Science.gov (United States)

    Shakhnovich, Boris E; Shakhnovich, Eugene I

    2008-06-01

    Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales--from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides first robust answers to these questions.

  17. The genome editing revolution: A CRISPR-Cas TALE off-target story.

    Science.gov (United States)

    Stella, Stefano; Montoya, Guillermo

    2016-07-01

    In the last 10 years, we have witnessed a blooming of targeted genome editing systems and applications. The area was revolutionized by the discovery and characterization of the transcription activator-like effector proteins, which are easier to engineer to target new DNA sequences than the previously available DNA binding templates, zinc fingers and meganucleases. Recently, the area experienced a quantum leap because of the introduction of the clustered regularly interspaced short palindromic repeats (CRISPR)-associated protein (Cas) system. This ribonucleoprotein complex protects bacteria from invading DNAs, and it was adapted to be used in genome editing. The CRISPR ribonucleic acid (RNA) molecule guides the Cas9 nuclease to the specific DNA site, where it cleaves the DNA target. Two years and more than 1000 publications later, the CRISPR-Cas system has become the main tool for genome editing in many laboratories. Currently the targeted genome editing technology has been used in many fields and may be a possible approach for human gene therapy. Furthermore, it can also be used to modify the genomes of model organisms for studying human pathways or to improve key organisms for biotechnological applications, such as plants and livestock as well as yeasts and bacterial strains. © 2016 The Authors. BioEssays published by WILEY Periodicals, Inc.

  18. A systematic genome-wide analysis of zebrafish protein-coding gene function

    NARCIS (Netherlands)

    Kettleborough, R.N.; Busch-Nentwich, E.M.; Harvey, S.A.; Dooley, C.M.; de Bruijn, E.; van Eeden, F.; Sealy, I.; White, R.J.; Herd, C.; Nijman, I.J.; Fenyes, F.; Mehroke, S.; Scahill, C.; Gibbons, R.; Wali, N.; Carruthers, S.; Hall, A.; Yen, J.; Cuppen, E.; Stemple, D.L.

    2013-01-01

    Since the publication of the human reference genome, the identities of specific genes associated with human diseases are being discovered at a rapid rate. A central problem is that the biological activity of these genes is often unclear. Detailed investigations in model vertebrate organisms,

  19. Integration of structural dynamics and molecular evolution via protein interaction networks: a new era in genomic medicine.

    Science.gov (United States)

    Kumar, Avishek; Butler, Brandon M; Kumar, Sudhir; Ozkan, S Banu

    2015-12-01

    Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict if they are benign or not. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nsSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. Analysis of phage Mu DNA transposition by whole-genome Escherichia coli tiling arrays reveals a complex relationship to distribution of target selection protein B, transcription and chromosome architectural elements.

    Science.gov (United States)

    Ge, Jun; Lou, Zheng; Cui, Hong; Shang, Lei; Harshey, Rasika M

    2011-09-01

    Of all known transposable elements, phage Mu exhibits the highest transposition efficiency and the lowest target specificity. In vitro, MuB protein is responsible for target choice. In this work, we provide a comprehensive assessment of the genome-wide distribution of MuB and its relationship to Mu target selection using high-resolution Escherichia coli tiling DNA arrays. We have also assessed how MuB binding and Mu transposition are influenced by chromosome-organizing elements such as AT-rich DNA signatures, or the binding of the nucleoid-associated protein Fis, or processes such as transcription. The results confirm and extend previous biochemical and lower resolution in vivo data. Despite the generally random nature of Mu transposition and MuB binding, there were hot and cold insertion sites and MuB binding sites in the genome, and differences between the hottest and coldest sites were large. The new data also suggest that MuB distribution and subsequent Mu integration are responsive to DNA sequences that contribute to the structural organization of the chromosome.

  1. Functional genomics for food microbiology: Molecular mechanisms of weak organic acid preservative adaptation in yeast

    NARCIS (Netherlands)

    Brul, S.; Kallemeijn, W.; Smits, G.

    2008-01-01

    The recent era of genomics has offered tremendous possibilities to biology. This concise review describes the possibilities of applying (functional) genomics studies to the field of microbial food stability. In doing so, the studies on weak-organic-acid stress response in yeast are discussed by way

  2. A nutrient-driven tRNA modification alters translational fidelity and genome-wide protein coding across an animal genus.

    Science.gov (United States)

    Zaborske, John M; DuMont, Vanessa L Bauer; Wallace, Edward W J; Pan, Tao; Aquadro, Charles F; Drummond, D Allan

    2014-12-01

    Natural selection favors efficient expression of encoded proteins, but the causes, mechanisms, and fitness consequences of evolved coding changes remain an area of aggressive inquiry. We report a large-scale reversal in the relative translational accuracy of codons across 12 fly species in the Drosophila/Sophophora genus. Because the reversal involves pairs of codons that are read by the same genomically encoded tRNAs, we hypothesize, and show by direct measurement, that a tRNA anticodon modification from guanosine to queuosine has coevolved with these genomic changes. Queuosine modification is present in most organisms but its function remains unclear. Modification levels vary across developmental stages in D. melanogaster, and, consistent with a causal effect, genes maximally expressed at each stage display selection for codons that are most accurate given stage-specific queuosine modification levels. In a kinetic model, the known increased affinity of queuosine-modified tRNA for ribosomes increases the accuracy of cognate codons while reducing the accuracy of near-cognate codons. Levels of queuosine modification in D. melanogaster reflect bioavailability of the precursor queuine, which eukaryotes scavenge from the tRNAs of bacteria and absorb in the gut. These results reveal a strikingly direct mechanism by which recoding of entire genomes results from changes in utilization of a nutrient.

  3. A nutrient-driven tRNA modification alters translational fidelity and genome-wide protein coding across an animal genus.

    Directory of Open Access Journals (Sweden)

    John M Zaborske

    2014-12-01

    Natural selection favors efficient expression of encoded proteins, but the causes, mechanisms, and fitness consequences of evolved coding changes remain an area of aggressive inquiry. We report a large-scale reversal in the relative translational accuracy of codons across 12 fly species in the Drosophila/Sophophora genus. Because the reversal involves pairs of codons that are read by the same genomically encoded tRNAs, we hypothesize, and show by direct measurement, that a tRNA anticodon modification from guanosine to queuosine has coevolved with these genomic changes. Queuosine modification is present in most organisms but its function remains unclear. Modification levels vary across developmental stages in D. melanogaster, and, consistent with a causal effect, genes maximally expressed at each stage display selection for codons that are most accurate given stage-specific queuosine modification levels. In a kinetic model, the known increased affinity of queuosine-modified tRNA for ribosomes increases the accuracy of cognate codons while reducing the accuracy of near-cognate codons. Levels of queuosine modification in D. melanogaster reflect bioavailability of the precursor queuine, which eukaryotes scavenge from the tRNAs of bacteria and absorb in the gut. These results reveal a strikingly direct mechanism by which recoding of entire genomes results from changes in utilization of a nutrient.

  4. VaProS: a database-integration approach for protein/genome information retrieval

    KAUST Repository

    Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R.; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

    2016-01-01

    Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein–protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that helps life science researchers retrieve experts’ knowledge stored in the databases and build new hypotheses about their research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effect of a gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in a quest for the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.

  5. VaProS: a database-integration approach for protein/genome information retrieval

    KAUST Repository

    Gojobori, Takashi

    2016-12-24

    Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein–protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that facilitates life science researchers for retrieving experts’ knowledge stored in the databases and for building a new hypothesis of the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as structural effect of the gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in quest of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.

  6. The Burmese python genome reveals the molecular basis for extreme adaptation in snakes.

    Science.gov (United States)

    Castoe, Todd A; de Koning, A P Jason; Hall, Kathryn T; Card, Daren C; Schield, Drew R; Fujita, Matthew K; Ruggiero, Robert P; Degner, Jack F; Daza, Juan M; Gu, Wanjun; Reyes-Velasco, Jacobo; Shaney, Kyle J; Castoe, Jill M; Fox, Samuel E; Poole, Alex W; Polanco, Daniel; Dobry, Jason; Vandewege, Michael W; Li, Qing; Schott, Ryan K; Kapusta, Aurélie; Minx, Patrick; Feschotte, Cédric; Uetz, Peter; Ray, David A; Hoffmann, Federico G; Bogden, Robert; Smith, Eric N; Chang, Belinda S W; Vonk, Freek J; Casewell, Nicholas R; Henkel, Christiaan V; Richardson, Michael K; Mackessy, Stephen P; Bronikowski, Anne M; Yandell, Mark; Warren, Wesley C; Secor, Stephen M; Pollock, David D

    2013-12-17

    Snakes possess many extreme morphological and physiological adaptations. Identification of the molecular basis of these traits can provide novel understanding for vertebrate biology and medicine. Here, we study snake biology using the genome sequence of the Burmese python (Python molurus bivittatus), a model of extreme physiological and metabolic adaptation. We compare the python and king cobra genomes along with genomic samples from other snakes and perform transcriptome analysis to gain insights into the extreme phenotypes of the python. We discovered rapid and massive transcriptional responses in multiple organ systems that occur on feeding and coordinate major changes in organ size and function. Intriguingly, the homologs of these genes in humans are associated with metabolism, development, and pathology. We also found that many snake metabolic genes have undergone positive selection, which together with the rapid evolution of mitochondrial proteins, provides evidence for extensive adaptive redesign of snake metabolic pathways. Additional evidence for molecular adaptation and gene family expansions and contractions is associated with major physiological and phenotypic adaptations in snakes; genes involved are related to cell cycle, development, lungs, eyes, heart, intestine, and skeletal structure, including GRB2-associated binding protein 1, SSH, WNT16, and bone morphogenetic protein 7. Finally, changes in repetitive DNA content, guanine-cytosine isochore structure, and nucleotide substitution rates indicate major shifts in the structure and evolution of snake genomes compared with other amniotes. Phenotypic and physiological novelty in snakes seems to be driven by system-wide coordination of protein adaptation, gene expression, and changes in the structure of the genome.

  7. In-depth comparative analysis of malaria parasite genomes reveals protein-coding genes linked to human disease in Plasmodium falciparum genome.

    Science.gov (United States)

    Liu, Xuewu; Wang, Yuanyuan; Liang, Jiao; Wang, Luojun; Qin, Na; Zhao, Ya; Zhao, Gang

    2018-05-02

    Plasmodium falciparum is the most virulent malaria parasite capable of parasitizing human erythrocytes. The identification of genes related to this capability can enhance our understanding of the molecular mechanisms underlying human malaria and lead to the development of new therapeutic strategies for malaria control. With the availability of several malaria parasite genome sequences, performing computational analysis is now a practical strategy to identify genes contributing to this disease. Here, we developed and used a virtual genome method to assign 33,314 genes from three human malaria parasites, namely, P. falciparum, P. knowlesi and P. vivax, and three rodent malaria parasites, namely, P. berghei, P. chabaudi and P. yoelii, to 4605 clusters. Each cluster consisted of genes whose protein sequences were significantly similar and was considered as a virtual gene. Comparing the enriched values of all clusters in human malaria parasites with those in rodent malaria parasites revealed 115 P. falciparum genes putatively responsible for parasitizing human erythrocytes. These genes are mainly located in the chromosome internal regions and participate in many biological processes, including membrane protein trafficking and thiamine biosynthesis. Meanwhile, 289 P. berghei genes were included in the rodent parasite-enriched clusters. Most are located in subtelomeric regions and encode erythrocyte surface proteins. Comparing cluster values in P. falciparum with those in P. vivax and P. knowlesi revealed 493 candidate genes linked to virulence. Some of them encode proteins present on the erythrocyte surface and participate in cytoadhesion, virulence factor trafficking, or erythrocyte invasion, but many genes with unknown function were also identified. Cerebral malaria is characterized by accumulation of infected erythrocytes at trophozoite stage in brain microvascular. To discover cerebral malaria-related genes, fast Fourier transformation (FFT) was introduced to extract

  8. VISTA - computational tools for comparative genomics

    Energy Technology Data Exchange (ETDEWEB)

    Frazer, Kelly A.; Pachter, Lior; Poliakov, Alexander; Rubin, Edward M.; Dubchak, Inna

    2004-01-01

    Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here we describe the VISTA family of tools created to assist biologists in carrying out this task. Our first VISTA server at http://www-gsd.lbl.gov/VISTA/ was launched in the summer of 2000 and was designed to align long genomic sequences and visualize these alignments with associated functional annotations. Currently the VISTA site includes multiple comparative genomics tools and provides users with rich capabilities to browse pre-computed whole-genome alignments of large vertebrate genomes and other groups of organisms with VISTA Browser, submit their own sequences of interest to several VISTA servers for various types of comparative analysis, and obtain detailed comparative analysis results for a set of cardiovascular genes. We illustrate capabilities of the VISTA site by the analysis of a 180 kilobase (kb) interval on human chromosome 5 that encodes the kinesin family member 3A (KIF3A) protein.

  9. Elucidation of Operon Structures across Closely Related Bacterial Genomes

    Science.gov (United States)

    Li, Guojun

    2014-01-01

    About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. With the evolution of genomes, operon structures are undergoing changes which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG) and a pair of nodes will be connected if any two genes, from the corresponding two OGGs respectively, are located in the same operon as immediate neighbors in any of the considered genomes. Through identifying the connected components in the above graph, we found that genes in a connected component are likely to be functionally related and these identified components tend to form treelike topology, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation as follows. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional roles of the central node of this component, such as the ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components. PMID:24959722
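    The graph construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy operons, and the simple degree-based rules for calling a component a "path" or a "star" are all assumptions made for illustration. Each node is an orthologous gene group (OGG), and two OGGs are linked when genes from them are immediate neighbors in the same operon in any genome.

```python
from collections import defaultdict


def build_ogg_graph(operons, gene_to_ogg):
    """Link two OGGs when their genes are immediate neighbors in an operon.

    operons: ordered gene-ID lists pooled from all genomes (hypothetical input).
    gene_to_ogg: mapping from gene ID to OGG ID (hypothetical input).
    """
    adj = defaultdict(set)
    for operon in operons:
        for gene in operon:
            adj[gene_to_ogg[gene]]  # register the node even if it stays isolated
        for left, right in zip(operon, operon[1:]):
            a, b = gene_to_ogg[left], gene_to_ogg[right]
            if a != b:  # ignore self-loops from adjacent paralogs
                adj[a].add(b)
                adj[b].add(a)
    return dict(adj)


def connected_components(adj):
    """Return the connected components of the graph as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps


def classify(comp, adj):
    """Label a component by shape using node degrees (illustrative rules only)."""
    n = len(comp)
    if n == 1:
        return "singleton"
    degrees = [len(adj[v]) for v in comp]
    if sum(degrees) // 2 != n - 1:
        return "other"  # more edges than a tree, so it contains a cycle
    if max(degrees) <= 2:
        return "path"
    if max(degrees) == n - 1:
        return "star"
    return "tree"


# Toy data: one ribosome-like chain and one transporter-like hub.
operons = [["a1", "a2", "a3"], ["b2", "b3"], ["s1", "t1", "s2"], ["t2", "s3"]]
gene_to_ogg = {
    "a1": "R1", "a2": "R2", "a3": "R3", "b2": "R2", "b3": "R3",
    "s1": "S1", "t1": "T", "s2": "S2", "t2": "T", "s3": "S3",
}
adj = build_ogg_graph(operons, gene_to_ogg)
for comp in connected_components(adj):
    print(sorted(comp), classify(comp, adj))
```

    In this toy input, the chain R1-R2-R3 plays the role of a path-structure component, and the hub T with neighbors S1, S2, and S3 plays the role of a star-structure component such as a transporter permease surrounded by substrate-binding proteins.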

  10. A mutation in the centriole-associated protein centrin causes genomic instability via increased chromosome loss in Chlamydomonas reinhardtii

    Directory of Open Access Journals (Sweden)

    Marshall Wallace F

    2005-05-01

    Full Text Available Abstract. Background: The role of centrioles in mitotic spindle function remains unclear. One approach to investigate mitotic centriole function is to ask whether mutation of centriole-associated proteins can cause genomic instability. Results: We addressed the role of the centriole-associated EF-hand protein centrin in genomic stability using a Chlamydomonas reinhardtii centrin mutant that forms acentriolar bipolar spindles and lacks the centrin-based rhizoplast structures that join centrioles to the nucleus. Using a genetic assay for loss of heterozygosity, we found that this centrin mutant showed increased genomic instability compared to wild-type cells, and we determined that the increase in genomic instability was due to a 100-fold increase in chromosome loss rates compared to wild type. Live cell imaging reveals an increased rate in cell death during G1 in haploid cells that is consistent with an elevated rate of chromosome loss, and analysis of cell death versus centriole copy number argues against a role for multipolar spindles in this process. Conclusion: The increased chromosome loss rates observed in a centrin mutant that forms acentriolar spindles suggest a role for centrin protein, and possibly centrioles, in mitotic fidelity.

  11. Pipeline to upgrade the genome annotations

    Directory of Open Access Journals (Sweden)

    Lijin K. Gopi

    2017-12-01

    Full Text Available The current era of functional genomics is enriched with good-quality draft genomes and annotations for many thousands of species and varieties, supported by advances in next-generation sequencing (NGS) technologies. Around 25,250 genomes of organisms from various kingdoms have been submitted to the NCBI genome resource to date. Each of these genomes was annotated using the tools and knowledge-bases that were available at the time of annotation. These annotations can evidently be improved if the same genome is re-annotated using improved tools and knowledge-bases. Here we present a new genome annotation pipeline, strengthened with various tools and knowledge-bases, that is capable of producing better-quality annotations from the consensus of the predictions from different tools. This resource also performs various additional annotations, apart from the usual gene predictions and functional annotations, which involve SSRs, novel repeats, paralogs, proteins with transmembrane helices, signal peptides, etc. This new annotation resource is trained to evaluate and integrate all the predictions together to resolve the overlaps and ambiguities of the boundaries. One of the important highlights of this resource is the capability of predicting the phylogenetic relations of the repeats using evolutionary trace analysis and orthologous gene clusters. We also present a case study of the pipeline, in which we upgrade the genome annotation of Nelumbo nucifera (sacred lotus). It is demonstrated that this resource is capable of producing an improved annotation for a better understanding of the biology of various organisms.

  12. Genome Compositional Organization in Gars Shows More Similarities to Mammals than to Other Ray-Finned Fish.

    Science.gov (United States)

    Symonová, Radka; Majtánová, Zuzana; Arias-Rodriguez, Lenin; Mořkovský, Libor; Kořínková, Tereza; Cavin, Lionel; Pokorná, Martina Johnson; Doležálková, Marie; Flajšhans, Martin; Normandeau, Eric; Ráb, Petr; Meyer, Axel; Bernatchez, Louis

    2017-11-01

    Genomic GC content can vary locally, and GC-rich regions are usually associated with increased DNA thermostability in thermophilic prokaryotes and warm-blooded eukaryotes. Among vertebrates, fish and amphibians appeared to possess a distinctly less heterogeneous AT/GC organization in their genomes, whereas cytogenetically detectable GC heterogeneity has so far only been documented in mammals and birds. The subject of our study is the gar, an ancient "living fossil" of a basal ray-finned fish lineage, known from the Cretaceous period. We carried out cytogenomic analysis in two gar genera (Atractosteus and Lepisosteus) uncovering a GC chromosomal pattern uncharacteristic for fish. Bioinformatic analysis of the spotted gar (Lepisosteus oculatus) confirmed a GC compartmentalization on GC profiles of linkage groups. This indicates a rather mammalian mode of compositional organization on gar chromosomes. Gars are thus the only analyzed extant ray-finned fishes with a GC compartmentalized genome. Since gars are cold-blooded anamniotes, our results contradict the generally accepted hypothesis that the phylogenomic onset of GC compartmentalization occurred near the origin of amniotes. Ecophysiological findings of other authors indicate a metabolic similarity of gars with mammals. We hypothesize that gars might have undergone convergent evolution with the tetrapod lineages leading to mammals on both metabolic and genomic levels. Their metabolic adaptations might have left footprints in their compositional genome evolution, as proposed by the metabolic rate hypothesis. The genome organization described here in gars sheds new light on the compositional genome evolution in vertebrates generally and contributes to better understanding of the complexities of the mechanisms involved in this process. © 2016 Wiley Periodicals, Inc.

  13. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Directory of Open Access Journals (Sweden)

    Can Alkan

    2007-09-01

    Full Text Available The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  14. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Science.gov (United States)

    Alkan, Can; Ventura, Mario; Archidiacono, Nicoletta; Rocchi, Mariano; Sahinalp, S Cenk; Eichler, Evan E

    2007-09-01

    The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  15. Complete genome sequence of Nakamurella multipartita type strain (Y-104).

    Science.gov (United States)

    Tice, Hope; Mayilraj, Shanmugam; Sims, David; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Copeland, Alex; Cheng, Jan-Fang; Meincke, Linda; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Detter, John C; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng

    2010-03-30

    Nakamurella multipartita (Yoshimi et al. 1996) Tao et al. 2004 is the type species of the monospecific genus Nakamurella in the actinobacterial suborder Frankineae. The nonmotile, coccus-shaped strain was isolated from activated sludge acclimated with sugar-containing synthetic wastewater, and is capable of accumulating large amounts of polysaccharides in its cells. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Nakamurellaceae. The 6,060,298 bp long single replicon genome with its 5415 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  16. Integration of multi-omics data of a genome-reduced bacterium: Prevalence of post-transcriptional regulation and its correlation with protein abundances

    Science.gov (United States)

    Chen, Wei-Hua; van Noort, Vera; Lluch-Senar, Maria; Hennrich, Marco L.; Wodke, Judith A. H.; Yus, Eva; Alibés, Andreu; Roma, Guglielmo; Mende, Daniel R.; Pesavento, Christina; Typas, Athanasios; Gavin, Anne-Claude; Serrano, Luis; Bork, Peer

    2016-01-01

    We developed a comprehensive resource for the genome-reduced bacterium Mycoplasma pneumoniae comprising 1748 consistently generated ‘-omics’ data sets, and used it to quantify the power of antisense non-coding RNAs (ncRNAs), lysine acetylation, and protein phosphorylation in predicting protein abundance (11%, 24% and 8%, respectively). These factors taken together are four times more predictive of the proteome abundance than of mRNA abundance. In bacteria, post-translational modifications (PTMs) and ncRNA transcription were both found to increase with decreasing genomic GC-content and genome size. Thus, the evolutionary forces constraining genome size and GC-content modify the relative contributions of the different regulatory layers to proteome homeostasis, and impact more genomic and genetic features than previously appreciated. Indeed, these scaling principles will enable us to develop more informed approaches when engineering minimal synthetic genomes. PMID:26773059

  17. Mycobacteriophage genome database.

    Science.gov (United States)

    Joseph, Jerrine; Rajendran, Vasanthi; Hassan, Sameer; Kumar, Vanaja

    2011-01-01

    The Mycobacteriophage genome database (MGDB) is an exclusive repository of the 64 completely sequenced mycobacteriophages with annotated information. It is a comprehensive compilation of the various gene parameters captured from several databases pooled together to empower mycobacteriophage researchers. The MGDB (Version No. 1.0) comprises 6086 genes from 64 mycobacteriophages classified into 72 families based on the ACLAME database. Manual curation was aided by information available from public databases, which was enriched further by analysis. Its web interface allows browsing as well as querying the classification. The main objective is to collect and organize the complexity inherent to mycobacteriophage protein classification in a rational way. The other objective is to browse the existing and new genomes and describe their functional annotation. The database is available for free at http://mpgdb.ibioinformatics.org/mpgdb.php.

  18. Sequencing and comparative genome analysis of two pathogenic Streptococcus gallolyticus subspecies: genome plasticity, adaptation and virulence.

    Directory of Open Access Journals (Sweden)

    I-Hsuan Lin

    Full Text Available Streptococcus gallolyticus infections in humans are often associated with bacteremia, infective endocarditis and colon cancers. The disease manifestations differ depending on the subspecies of S. gallolyticus causing the infection. Here, we present the complete genomes of S. gallolyticus ATCC 43143 (biotype I) and S. pasteurianus ATCC 43144 (biotype II.2). The genomic differences between the two biotypes were characterized with comparative genomic analyses. The chromosomes of ATCC 43143 and ATCC 43144 are 2.36 and 2.10 Mb in length and encode 2246 and 1869 CDS, respectively. The organization and genomic contents of both genomes were most similar to the recently published S. gallolyticus UCN34, where 2073 (92%) and 1607 (86%) of the ATCC 43143 and ATCC 43144 CDS were conserved in UCN34, respectively. There are around 600 CDS conserved in all Streptococcus genomes, indicating that the Streptococcus genus has a small core genome (constituting around 30% of total CDS) and substantial evolutionary plasticity. We identified eight and five regions of genome plasticity in ATCC 43143 and ATCC 43144, respectively. Within these regions, several proteins were recognized to contribute to the fitness and virulence of each of the two subspecies. We have also predicted putative cell-surface associated proteins that could play a role in adherence to host tissues, leading to persistent infections causing sub-acute and chronic diseases in humans. This study showed evidence that S. gallolyticus still possesses genes making it suitable for a rumen environment, whereas the ability of S. pasteurianus to live in the rumen is reduced. The genome heterogeneity and genetic diversity among the two biotypes, especially in membrane proteins and lipoproteins, most likely contribute to the differences in the pathogenesis of the two S. gallolyticus biotypes and the type of disease an infected patient eventually develops.

  19. Genomic analysis of the aconidial and high-performance protein producer, industrially relevant Aspergillus niger SH2 strain.

    Science.gov (United States)

    Yin, Chao; Wang, Bin; He, Pan; Lin, Ying; Pan, Li

    2014-05-15

    Aspergillus niger is usually regarded as a beneficial species widely used in the biotechnological industry. Obtaining the genome sequence of the widely used aconidial A. niger SH2 strain is of great importance for understanding its unusual production capability. In this study we assembled a high-quality genome sequence of A. niger SH2 with approximately 11,517 ORFs. A relatively high proportion of genes enriched for protein-expression-related FunCat items is consistent with its efficient capacity for protein production. Furthermore, genome-wide comparative analysis between A. niger SH2 and CBS513.88 reveals insights into unique properties of A. niger SH2. A. niger SH2 lacks the gene related to the initiation of asexual sporulation (PrpA), leading to its distinct aconidial phenotype. Frameshift mutations and non-synonymous SNPs in genes of cell wall integrity signaling, β-1,3-glucan synthesis and chitin synthesis influence its cell wall development, which is important for its hyphal fragmentation during industrial high-efficiency protein production. Copyright © 2014 Elsevier B.V. All rights reserved.

  20. Complete genome sequence of Hydrogenobacter thermophilus type strain (TK-6T)

    Energy Technology Data Exchange (ETDEWEB)

    Zeytun, Ahmet [Los Alamos National Laboratory (LANL)]; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Nolan, Matt [Joint Genome Institute, Walnut Creek, California]; Lapidus, Alla L. [Joint Genome Institute, Walnut Creek, California]; Lucas, Susan [Joint Genome Institute, Walnut Creek, California]; Han, James [Joint Genome Institute]; Tice, Hope [Joint Genome Institute, Walnut Creek, California]; Cheng, Jan-Fang [Joint Genome Institute, Walnut Creek, California]; Tapia, Roxanne [Los Alamos National Laboratory (LANL)]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Pitluck, Sam [Joint Genome Institute, Walnut Creek, California]; Liolios, Konstantinos [Joint Genome Institute, Walnut Creek, California]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute]; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute]; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Chen, Amy [Joint Genome Institute, Walnut Creek, California]; Palaniappan, Krishna [Joint Genome Institute, Walnut Creek, California]; Ngatchou, Olivier Duplex [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Land, Miriam L [ORNL]; Hauser, Loren John [ORNL]; Chang, Yun-Juan [ORNL]; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL)]; Han, Cliff [Los Alamos National Laboratory (LANL)]; Detter, J. Chris [Joint Genome Institute, Walnut Creek, California]; Ubler, Susanne [Universitat Regensburg, Regensburg, Germany]; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Tindall, Brian [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Wirth, Reinhard [Universitat Regensburg, Regensburg, Germany]; Woyke, Tanja [Joint Genome Institute, Walnut Creek, California]; Bristow, James [Joint Genome Institute, Walnut Creek, California]; Eisen, Jonathan [Joint Genome Institute, Walnut Creek, California]; Markowitz, Victor [Joint Genome Institute, Walnut Creek, California]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Kyrpides, Nikos C [Joint Genome Institute, Walnut Creek, California]

    2011-01-01

    Hydrogenobacter thermophilus Kawasumi et al. 1984 is the type species of the genus Hydrogenobacter. H. thermophilus was the first obligate autotrophic organism reported among aerobic hydrogen-oxidizing bacteria. Strain TK-6T is of interest because of the unusually efficient hydrogen-oxidizing ability of this strain, which results in a faster generation time compared to other autotrophs. It is also able to grow anaerobically using nitrate as an electron acceptor when molecular hydrogen is used as the energy source, and able to aerobically fix CO2 via the reductive tricarboxylic acid cycle. This is the fifth completed genome sequence in the family Aquificaceae, and the second genome sequence determined from a strain derived from the original isolate. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 1,742,932 bp long genome with its 1,899 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. Self-Organization of Genome Expression from Embryo to Terminal Cell Fate: Single-Cell Statistical Mechanics of Biological Regulation

    Directory of Open Access Journals (Sweden)

    Alessandro Giuliani

    2017-12-01

    Full Text Available A statistical mechanical mean-field approach to the temporal development of biological regulation provides a phenomenological, but basic description of the dynamical behavior of genome expression in terms of autonomous self-organization with a critical transition (Self-Organized Criticality: SOC). This approach reveals the basis of self-regulation/organization of genome expression, where the extreme complexity of living matter precludes any strict mechanistic approach. The self-organization in SOC involves two critical behaviors: scaling-divergent behavior (genome avalanche) and sandpile-type critical behavior. Genome avalanche patterns, i.e., competition between order (scaling) and disorder (divergence), reflect the opposite sequence of events characterizing the self-organization process in embryo development and helper T17 terminal cell differentiation, respectively. On the other hand, the temporal development of sandpile-type criticality (the degree of SOC control) in mouse embryo suggests the existence of an SOC control landscape with a critical transition state (i.e., the erasure of zygote-state criticality). This indicates that a phase transition of the mouse genome before and after reprogramming (immediately after the late 2-cell state) occurs through a dynamical change in a control parameter. This result provides a quantitative open-thermodynamic appreciation of the still largely qualitative notion of the epigenetic landscape. Our results suggest: (i) the existence of coherent waves of condensation/de-condensation in chromatin, which are transmitted across regions of different gene-expression levels along the genome; and (ii) essentially the same critical dynamics we observed for cell-differentiation processes exist in overall RNA expression during embryo development, which is particularly relevant because it gives further proof of SOC control of overall expression as a universal feature.

  2. Genome-Wide Characterization of Heat-Shock Protein 70s from Chenopodium quinoa and Expression Analyses of Cqhsp70s in Response to Drought Stress.

    Science.gov (United States)

    Liu, Jianxia; Wang, Runmei; Liu, Wenying; Zhang, Hongli; Guo, Yaodong; Wen, Riyu

    2018-01-23

    Heat-shock proteins (HSPs) are ubiquitous proteins with important roles in response to biotic and abiotic stress. The 70-kDa heat-shock genes (Hsp70s) encode a group of conserved chaperone proteins that play central roles in cellular networks of molecular chaperones and folding catalysts across all the studied organisms including bacteria, plants and animals. Several Hsp70s involved in drought tolerance have been well characterized in various plants, whereas no research on Chenopodium quinoa HSPs has been completed. Here, we analyzed the genome of C. quinoa and identified sixteen Hsp70 members in the quinoa genome. Phylogenetic analysis revealed the independent origination of those Hsp70 members, with eight paralogous pairs comprising the Hsp70 family in quinoa. While the gene structure and motif analysis showed high conservation of those paralogous pairs, the synteny analysis of those paralogous pairs provided evidence for expansion coming from the polyploidy event. With several subcellular localization signals detected in CqHSP70 protein paralogous pairs, some of the paralogous proteins lost the localization information, indicating the diversity of both subcellular localizations and potential functionalities of those HSP70s. Further gene expression analyses by quantitative polymerase chain reaction (qPCR) illustrated the significant variations of Cqhsp70s in response to drought stress. In conclusion, the sixteen Cqhsp70s undergo lineage-specific expansions and might play important and varied roles in response to drought stress.

  3. Complete genome sequence of Oceanithermus profundus type strain (506T)

    Energy Technology Data Exchange (ETDEWEB)

    Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Zhang, Xiaojing [Los Alamos National Laboratory (LANL)]; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute]; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute]; Tice, Hope [U.S. Department of Energy, Joint Genome Institute]; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute]; Tapia, Roxanne [Los Alamos National Laboratory (LANL)]; Han, Cliff [Los Alamos National Laboratory (LANL)]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute]; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute]; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Chen, Amy [U.S. Department of Energy, Joint Genome Institute]; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute]; Hauser, Loren John [ORNL]; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL)]; Brambilla, Evelyne-Marie [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Ruhl, Alina [U.S. Department of Energy, Joint Genome Institute]; Mwirichia, Romano [University of Munster, Germany]; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Tindall, Brian [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Wirth, Reinhard [Universitat Regensburg, Regensburg, Germany]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute]; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute]; Bristow, James [U.S. Department of Energy, Joint Genome Institute]; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute]; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Land, Miriam L [ORNL]

    2011-01-01

    Oceanithermus profundus Miroshnichenko et al. 2003 is the type species of the genus Oceanithermus, which belongs to the family Thermaceae. The genus currently comprises two species whose members are thermophilic and are able to reduce sulfur compounds and nitrite. The organism is adapted to the salinity of sea water, is able to utilize a broad range of carbohydrates, some proteinaceous substrates, organic acids and alcohols. This is the first completed genome sequence of a member of the genus Oceanithermus and the fourth sequence from the family Thermaceae. The 2,439,291 bp long genome with its 2,391 protein-coding and 54 RNA genes consists of one chromosome and a 135,351 bp long plasmid, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  4. Gramene 2018: unifying comparative genomics and pathway resources for plant research

    OpenAIRE

    Tello-Ruiz, Marcela K; Naithani, Sushma; Stein, Joshua C; Gupta, Parul; Campbell, Michael; Olson, Andrew; Wei, Sharon; Preece, Justin; Geniza, Matthew J; Jiao, Yinping; Lee, Young Koung; Wang, Bo; Mulvaney, Joseph; Chougule, Kapeel; Elser, Justin

    2017-01-01

    Abstract Gramene (http://www.gramene.org) is a knowledgebase for comparative functional analysis in major crops and model plant species. The current release, #54, includes over 1.7 million genes from 44 reference genomes, most of which were organized into 62,367 gene families through orthologous and paralogous gene classification, whole-genome alignments, and synteny. Additional gene annotations include ontology-based protein structure and function; genetic, epigenetic, and phenotypic diversi...

  5. Protein Design Using Unnatural Amino Acids

    Science.gov (United States)

    Bilgiçer, Basar; Kumar, Krishna

    2003-11-01

    With the increasing availability of whole organism genome sequences, understanding protein structure and function is of capital importance. Recent developments in the methodology of incorporation of unnatural amino acids into proteins allow the exploration of proteins at a very detailed level. Furthermore, de novo design of novel protein structures and function is feasible with unprecedented sophistication. Using examples from the literature, this article describes the available methods for unnatural amino acid incorporation and highlights some recent applications including the design of hyperstable protein folds.

  6. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    Energy Technology Data Exchange (ETDEWEB)

    Baliga, Nitin S

    2011-05-26

    Final report on MAGGIE. We set ambitious goals to model the functions of individual organisms and their community from molecular to systems scale. These scientific goals are driving the development of sophisticated algorithms to analyze large amounts of experimental measurements made using high throughput technologies to explain and predict how the environment influences biological function at multiple scales and how the microbial systems in turn modify the environment. By experimentally evaluating predictions made using these models we will test the degree to which our quantitative multiscale understanding will help to rationally steer individual microbes and their communities towards specific tasks. Towards this end we have made substantial progress towards understanding evolution of gene families, transcriptional structures, detailed structures of keystone molecular assemblies (proteins and complexes), protein interactions, biological networks, microbial interactions, and community structure. Using comparative analysis we have tracked the evolutionary history of gene functions to understand how novel functions evolve. One level up, we have used proteomics data, high-resolution genome tiling microarrays, and 5' RNA sequencing to revise genome annotations, discover new genes including ncRNAs, and map dynamically changing operon structures of five model organisms: Desulfovibrio vulgaris Hildenborough, Pyrococcus furiosus, Sulfolobus solfataricus, Methanococcus maripaludis and Halobacterium salinarum NRC-1. We have developed machine learning algorithms to accurately identify protein interactions at a near-zero false positive rate from noisy data generated using tagless complex purification, TAP purification, and analysis of membrane complexes. Combining other genome-scale datasets produced by ENIGMA (in particular, microarray data) and available from literature we have been able to achieve a true positive rate as high as 65% at almost zero false positives.

  7. Protein folding and the organization of the protein topology universe

    DEFF Research Database (Denmark)

    Lindorff-Larsen, Kresten; Røgen, Peter; Paci, Emanuele

    2005-01-01

    residues and, in addition, that the topology of the transition state is closer to that of the native state than to that of any other fold in the protein universe. Here, we review the evidence for these conclusions and suggest a molecular mechanism that rationalizes these findings by presenting a view...... of protein folds that is based on the topological features of the polypeptide backbone, rather than the conventional view that depends on the arrangement of different types of secondary-structure elements. By linking the folding process to the organization of the protein structure universe, we propose...

  8. Eukaryotic evolutionary transitions are associated with extreme codon bias in functionally-related proteins.

    Directory of Open Access Journals (Sweden)

    Nicholas J Hudson

    Full Text Available Codon bias in the genome of an organism influences its phenome by changing the speed and efficiency of mRNA translation and hence protein abundance. We hypothesized that differences in codon bias, either between-species differences in orthologous genes, or within-species differences between genes, may play an evolutionary role. To explore this hypothesis, we compared the genome-wide codon bias in six species that occupy vital positions in the Eukaryotic Tree of Life. We acquired the entire protein coding sequences for these organisms, computed the codon bias for all genes in each organism and explored the output for relationships between codon bias and protein function, both within- and between-lineages. We discovered five notable coordinated patterns, with extreme codon bias most pronounced in traits considered highly characteristic of a given lineage. Firstly, the Homo sapiens genome had stronger codon bias for DNA-binding transcription factors than the Saccharomyces cerevisiae genome, whereas the opposite was true for ribosomal proteins--perhaps underscoring transcriptional regulation in the origin of complexity. Secondly, both mammalian species examined possessed extreme codon bias in genes relating to hair--a tissue unique to mammals. Thirdly, Arabidopsis thaliana showed extreme codon bias in genes implicated in cell wall formation and chloroplast function--which are unique to plants. Fourthly, Gallus gallus possessed strong codon bias in a subset of genes encoding mitochondrial proteins--perhaps reflecting the enhanced bioenergetic efficiency in birds that co-evolved with flight. And lastly, the G. gallus genome had extreme codon bias for the Ciliary Neurotrophic Factor--which may help to explain their spontaneous recovery from deafness. We propose that extreme codon bias in groups of genes that encode functionally related proteins has a pathway-level energetic explanation.
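
    The abstract above does not say which codon-bias statistic was computed. As a minimal sketch of one common choice, the relative synonymous codon usage (RSCU), the snippet below counts codons in a coding sequence and scales each count by the expectation under equal use of all synonymous codons. The function name rscu and the example sequence are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG-ordered string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Return RSCU per codon for one coding sequence (hypothetical helper).

    RSCU = observed count * (number of synonymous codons) / (total count
    for that amino acid). A value of 1.0 means no bias; stop codons and
    single-codon amino acids (Met, Trp) are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or len(family) == 1 or total == 0:
            continue
        for codon in family:
            result[codon] = counts[codon] * len(family) / total
    return result


if __name__ == "__main__":
    # Four alanine codons: GCT used three times, GCC once.
    values = rscu("GCTGCTGCTGCC")
    print(values["GCT"], values["GCC"], values["GCA"])
```

    Averaging such values over the genes of a functional group, and comparing those averages between species, is one way the kind of between-lineage comparison described above could be approached.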

  9. Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance

    KAUST Repository

    Phelan, Jody

    2016-03-23

    Background Combating the spread of drug resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure and interaction modelling are used to understand the functional effects of putative mutations and provide insight into the molecular mechanisms leading to resistance. Methods To investigate the potential utility of these approaches, we analysed the genomes of 144 Mycobacterium tuberculosis clinical isolates from The Special Programme for Research and Training in Tropical Diseases (TDR) collection sourced from 20 countries in four continents. A genome-wide approach was applied to 127 isolates to identify polymorphisms associated with minimum inhibitory concentrations for first-line anti-tuberculosis drugs. In addition, the effect of identified candidate mutations on protein stability and interactions was assessed quantitatively with well-established computational methods. Results The analysis revealed that mutations in the genes rpoB (rifampicin), katG (isoniazid), inhA-promoter (isoniazid), rpsL (streptomycin) and embB (ethambutol) were responsible for the majority of resistance observed. A subset of the mutations identified in rpoB and katG were predicted to affect protein stability. Further, a strong direct correlation was observed between the minimum inhibitory concentration values and the distance of the mutated residues in the three-dimensional structures of rpoB and katG to their respective drug-binding sites. Conclusions Using the TDR resource, we demonstrate the usefulness of whole genome association and convergent evolution approaches to detect known and potentially novel mutations associated with drug resistance. Further, protein structural modelling could provide a means of predicting the impact of polymorphisms on drug efficacy in the absence of phenotypic data. These approaches could ultimately lead to novel resistance

  10. The master two-dimensional gel database of human AMA cell proteins: towards linking protein and genome sequence and mapping information (update 1991)

    DEFF Research Database (Denmark)

    Celis, J E; Leffers, H; Rasmussen, H H

    1991-01-01

    autoantigens" and "cDNAs". For convenience we have included an alphabetical list of all known proteins recorded in this database. In the long run, the main goal of this database is to link protein and DNA sequencing and mapping information (Human Genome Program) and to provide an integrated picture......The master two-dimensional gel database of human AMA cells currently lists 3801 cellular and secreted proteins, of which 371 cellular polypeptides (306 IEF; 65 NEPHGE) were added to the master images during the last 10 months. These include: (i) very basic and acidic proteins that do not focus...

  11. Using context to improve protein domain identification

    Directory of Open Access Journals (Sweden)

    Llinás Manuel

    2011-03-01

    Full Text Available Abstract Background Identifying domains in protein sequences is an important step in protein structural and functional annotation. Existing domain recognition methods typically evaluate each domain prediction independently of the rest. However, the majority of proteins are multidomain, and pairwise domain co-occurrences are highly specific and non-transitive. Results Here, we demonstrate how to exploit domain co-occurrence to boost weak domain predictions that appear in previously observed combinations, while penalizing higher confidence domains if such combinations have never been observed. Our framework, Domain Prediction Using Context (dPUC), incorporates pairwise "context" scores between domains, along with traditional domain scores and thresholds, and improves domain prediction across a variety of organisms from bacteria to protozoa and metazoa. Among the genomes we tested, dPUC is most successful at improving predictions for the poorly-annotated malaria parasite Plasmodium falciparum, for which over 38% of the genome is currently unannotated. Our approach enables high-confidence annotations in this organism and the identification of orthologs to many core machinery proteins conserved in all eukaryotes, including those involved in ribosomal assembly and other RNA processing events, which surprisingly had not been previously known. Conclusions Overall, our results demonstrate that this new context-based approach will provide significant improvements in domain and function prediction, especially for poorly understood genomes for which the need for additional annotations is greatest. Source code for the algorithm is available under a GPL open source license at http://compbio.cs.princeton.edu/dpuc/. Pre-computed results for our test organisms and a web server are also available at that location.
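
    The idea of combining per-domain scores with pairwise context scores can be illustrated with a toy example. The sketch below is not the published dPUC implementation; it assumes a simple objective (each selected domain contributes its score minus a threshold, and each selected pair contributes a context score) and searches all subsets exhaustively, which is only practical for a handful of candidates. The function name, the scores and the threshold are all hypothetical.

```python
from itertools import combinations


def best_domain_set(domain_scores, context_scores, threshold):
    """Pick the subset of candidate domains with the highest combined score.

    domain_scores:  {domain name: raw score} for one protein's candidates
    context_scores: {frozenset({a, b}): pairwise context score}; pairs that
                    are absent from the dict contribute 0
    threshold:      value subtracted from each raw domain score
    Returns (set of chosen domains, total score); the empty set scores 0.
    """
    names = sorted(domain_scores)
    best, best_total = (), 0.0
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            total = sum(domain_scores[d] - threshold for d in subset)
            total += sum(
                context_scores.get(frozenset(pair), 0.0)
                for pair in combinations(subset, 2)
            )
            if total > best_total:
                best, best_total = subset, total
    return set(best), best_total


if __name__ == "__main__":
    # Hypothetical numbers: B is below threshold on its own, C is above it.
    scores = {"A": 30.0, "B": 18.0, "C": 26.0}
    context = {
        frozenset({"A", "B"}): 5.0,    # combination seen before: reward
        frozenset({"A", "C"}): -12.0,  # combination never seen: penalty
    }
    print(best_domain_set(scores, context, threshold=20.0))
```

    In this toy case the weak prediction B is kept because it co-occurs favourably with A, while the individually stronger C is dropped because its pairing with A is penalized, which mirrors the boosting and penalizing behaviour the abstract describes.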

  12. The genome of Paenibacillus sabinae T27 provides insight into evolution, organization and functional elucidation of nif and nif-like genes

    OpenAIRE

    Li, Xinxin; Deng, Zhiping; Liu, Zhanzhi; Yan, Yongliang; Wang, Tianshu; Xie, Jianbo; Lin, Min; Cheng, Qi; Chen, Sanfeng

    2014-01-01

    Background Most biological nitrogen fixation is catalyzed by the molybdenum nitrogenase. This enzyme is a complex which contains the MoFe protein encoded by nifDK and the Fe protein encoded by nifH. In addition to nifHDK, nifHDK-like genes were found in some Archaea and Firmicutes, but their function is unclear. Results We sequenced the genome of Paenibacillus sabinae T27. A total of 4,793 open reading frames were predicted from its 5.27 Mb genome. The genome of P. sabinae T27 contains fiftee...

  13. Complete genome sequence of Calditerrivibrio nitroreducens type strain (Yu37-1T)

    Energy Technology Data Exchange (ETDEWEB)

    Pitluck, Sam [Joint Genome Institute, Walnut Creek, California]; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Zeytun, Ahmet [Los Alamos National Laboratory (LANL)]; Lapidus, Alla L. [Joint Genome Institute, Walnut Creek, California]; Nolan, Matt [Joint Genome Institute, Walnut Creek, California]; Lucas, Susan [Joint Genome Institute, Walnut Creek, California]; Hammon, Nancy [Joint Genome Institute, Walnut Creek, California]; Deshpande, Shweta [Joint Genome Institute, Walnut Creek, California]; Cheng, Jan-Fang [Joint Genome Institute, Walnut Creek, California]; Tapia, Roxanne [Los Alamos National Laboratory (LANL)]; Han, Cliff [Los Alamos National Laboratory (LANL)]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Liolios, Konstantinos [Joint Genome Institute, Walnut Creek, California]; Pagani, Ioanna [Joint Genome Institute, Walnut Creek, California]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Chen, Amy [Joint Genome Institute, Walnut Creek, California]; Palaniappan, Krishna [Joint Genome Institute, Walnut Creek, California]; Hauser, Loren John [ORNL]; Chang, Yun-Juan [ORNL]; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL)]; Detter, J. Chris [Joint Genome Institute, Walnut Creek, California]; Brambilla, Evelyne-Marie [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Ngatchou, Olivier Duplex [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Woyke, Tanja [Joint Genome Institute, Walnut Creek, California]; Bristow, James [Joint Genome Institute, Walnut Creek, California]; Eisen, Jonathan [Joint Genome Institute, Walnut Creek, California]; Markowitz, Victor [Joint Genome Institute, Walnut Creek, California]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [Joint Genome Institute, Walnut Creek, California]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Land, Miriam L [ORNL]

    2011-01-01

    Calditerrivibrio nitroreducens Iino et al. 2008 is the type species of the genus Calditerrivibrio. The species is of interest because of its important role in the nitrate cycle as nitrate reducer and for its isolated phylogenetic position in the Tree of Life. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the third complete genome sequence of a member of the family Deferribacteraceae. The 2,216,552 bp long genome with its 2,128 protein-coding and 50 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  14. Self-organized critical model for protein folding

    Science.gov (United States)

    Moret, M. A.

    2011-09-01

    The major factor that drives a protein toward collapse and folding is the hydrophobic effect. During the folding process, a hydrophobic core is shielded by the solvent-accessible surface area of the protein. We study the fractal behavior of 5526 protein structures present in the Brookhaven Protein Data Bank. Power laws of protein mass, volume and solvent-accessible surface area are measured independently. The present findings indicate that self-organized criticality is an alternative explanation for protein folding. We also note that protein packing is an independent and constant value because the volumes and protein masses show self-similar behavior with the same fractal dimension. This power law guarantees that a protein is a complex system. From the analyzed data, q-Gaussian distributions seem to fit this class of systems well.

  15. Annotation and Curation of Uncharacterized proteins- Challenges

    Directory of Open Access Journals (Sweden)

    Johny Ijaq

    2015-03-01

    Full Text Available Hypothetical Proteins are the proteins that are predicted to be expressed from an open reading frame (ORF), constituting a substantial fraction of proteomes in both prokaryotes and eukaryotes. Genome projects have led to the identification of many therapeutic targets, the putative function of the protein and their interactions. In this review we have listed various methods. Annotation linked to structural and functional prediction of hypothetical proteins assists in the discovery of new structures and functions serving as markers and pharmacological targets for drug designing, discovery and screening. Mass spectrometry is an analytical technique for validating protein characterisation. Matrix-assisted laser desorption ionization–mass spectrometry (MALDI-MS) is an efficient analytical method. Microarrays and protein expression profiles help in understanding biological systems through a systems-wide study of proteins and their interactions with other proteins and non-proteinaceous molecules to control complex processes in cells, tissues and even whole organisms. Next generation sequencing technology accelerates multiple areas of genomics research.

  16. The eukaryotic genome is structurally and functionally more like a social insect colony than a book.

    Science.gov (United States)

    Qiu, Guo-Hua; Yang, Xiaoyan; Zheng, Xintian; Huang, Cuiqin

    2017-11-01

    Traditionally, the genome has been described as the 'book of life'. However, the metaphor of a book may not reflect the dynamic nature of the structure and function of the genome. In the eukaryotic genome, the number of centrally located protein-coding sequences is relatively constant across species, but the amount of noncoding DNA increases considerably with the increase of organismal evolutional complexity. Therefore, it has been hypothesized that the abundant peripheral noncoding DNA protects the genome and the central protein-coding sequences in the eukaryotic genome. Upon comparison with the habitation, sociality and defense mechanisms of a social insect colony, it is found that the genome is similar to a social insect colony in various aspects. A social insect colony may thus be a better metaphor than a book to describe the spatial organization and physical functions of the genome. The potential implications of the metaphor are also discussed.

  17. Genome-wide organization and expression profiling of the R2R3-MYB transcription factor family in pineapple (Ananas comosus).

    Science.gov (United States)

    Liu, Chaoyang; Xie, Tao; Chen, Chenjie; Luan, Aiping; Long, Jianmei; Li, Chuhao; Ding, Yaqi; He, Yehua

    2017-07-01

    The MYB proteins comprise one of the largest families of plant transcription factors, which are involved in various plant physiological and biochemical processes. Pineapple (Ananas comosus) is one of the three most important tropical fruits worldwide. The completion of pineapple genome sequencing provides a great opportunity to investigate the organization and evolutionary traits of pineapple MYB genes at the genome-wide level. In the present study, a total of 94 pineapple R2R3-MYB genes were identified and further phylogenetically classified into 26 subfamilies, as supported by the conserved gene structures and motif composition. Collinearity analysis indicated that segmental duplication events played a crucial role in the expansion of the pineapple MYB gene family. Further comparative phylogenetic analysis suggested that there have been functional divergences of the MYB gene family during plant evolution. RNA-seq data from different tissues and developmental stages revealed distinct temporal and spatial expression profiles of the AcMYB genes. Further quantitative expression analysis showed the specific expression patterns of the selected putative stress-related AcMYB genes in response to distinct abiotic stress and hormonal treatments. The comprehensive expression analysis of the pineapple MYB genes, especially the tissue-preferential and stress-responsive genes, could provide valuable clues for further functional characterization. In this work, we systematically identified AcMYB genes by analyzing the pineapple genome sequence using a set of bioinformatics approaches. Our findings provide a global insight into the organization, phylogeny and expression patterns of the pineapple R2R3-MYB genes, and hence contribute to the greater understanding of their biological roles in pineapple.

  18. Genome-Wide Characterization and Expression Analysis of Major Intrinsic Proteins during Abiotic and Biotic Stresses in Sweet Orange (Citrus sinensis L. Osb.).

    Science.gov (United States)

    Martins, Cristina de Paula Santos; Pedrosa, Andresa Muniz; Du, Dongliang; Gonçalves, Luana Pereira; Yu, Qibin; Gmitter, Frederick G; Costa, Marcio Gilberto Cardoso

    2015-01-01

    The family of aquaporins (AQPs), or major intrinsic proteins (MIPs), includes integral membrane proteins that function as transmembrane channels for water and other small molecules of physiological significance. MIPs are classified into five subfamilies in higher plants, including plasma membrane (PIPs), tonoplast (TIPs), NOD26-like (NIPs), small basic (SIPs) and unclassified X (XIPs) intrinsic proteins. This study reports a genome-wide survey of MIP encoding genes in sweet orange (Citrus sinensis L. Osb.), the most widely cultivated Citrus spp. A total of 34 different genes encoding C. sinensis MIPs (CsMIPs) were identified and assigned into five subfamilies (CsPIPs, CsTIPs, CsNIPs, CsSIPs and CsXIPs) based on sequence analysis and also on their phylogenetic relationships with clearly classified MIPs of Arabidopsis thaliana. Analysis of key amino acid residues allowed the assessment of the substrate specificity of each CsMIP. Gene structure analysis revealed that the CsMIPs possess an exon-intron organization that is highly conserved within each subfamily. CsMIP loci were precisely mapped on every sweet orange chromosome, indicating a wide distribution of the gene family in the sweet orange genome. Investigation of their expression patterns in different tissues and upon drought and salt stress treatments, as well as with 'Candidatus Liberibacter asiaticus' infection, revealed a tissue-specific and coordinated regulation of the different CsMIP isoforms, consistent with the organization of the stress-responsive cis-acting regulatory elements observed in their promoter regions. A special role in regulating the flow of water and nutrients is proposed for CsTIPs and CsXIPs during drought stress, and for most CsMIPs during salt stress and the development of HLB disease. These results provide a valuable reference for further exploration of the CsMIPs functions and applications to the genetic improvement of both abiotic and biotic stress tolerance in citrus.

  19. Genome-Wide Characterization and Expression Analysis of Major Intrinsic Proteins during Abiotic and Biotic Stresses in Sweet Orange (Citrus sinensis L. Osb.).

    Directory of Open Access Journals (Sweden)

    Cristina de Paula Santos Martins

    Full Text Available The family of aquaporins (AQPs), or major intrinsic proteins (MIPs), includes integral membrane proteins that function as transmembrane channels for water and other small molecules of physiological significance. MIPs are classified into five subfamilies in higher plants, including plasma membrane (PIPs), tonoplast (TIPs), NOD26-like (NIPs), small basic (SIPs) and unclassified X (XIPs) intrinsic proteins. This study reports a genome-wide survey of MIP encoding genes in sweet orange (Citrus sinensis L. Osb.), the most widely cultivated Citrus spp. A total of 34 different genes encoding C. sinensis MIPs (CsMIPs) were identified and assigned into five subfamilies (CsPIPs, CsTIPs, CsNIPs, CsSIPs and CsXIPs) based on sequence analysis and also on their phylogenetic relationships with clearly classified MIPs of Arabidopsis thaliana. Analysis of key amino acid residues allowed the assessment of the substrate specificity of each CsMIP. Gene structure analysis revealed that the CsMIPs possess an exon-intron organization that is highly conserved within each subfamily. CsMIP loci were precisely mapped on every sweet orange chromosome, indicating a wide distribution of the gene family in the sweet orange genome. Investigation of their expression patterns in different tissues and upon drought and salt stress treatments, as well as with 'Candidatus Liberibacter asiaticus' infection, revealed a tissue-specific and coordinated regulation of the different CsMIP isoforms, consistent with the organization of the stress-responsive cis-acting regulatory elements observed in their promoter regions. A special role in regulating the flow of water and nutrients is proposed for CsTIPs and CsXIPs during drought stress, and for most CsMIPs during salt stress and the development of HLB disease. These results provide a valuable reference for further exploration of the CsMIPs functions and applications to the genetic improvement of both abiotic and biotic stress tolerance in citrus.

  20. Plant Proteins Are Smaller Because They Are Encoded by Fewer Exons than Animal Proteins.

    Science.gov (United States)

    Ramírez-Sánchez, Obed; Pérez-Rodríguez, Paulino; Delaye, Luis; Tiessen, Axel

    2016-12-01

    Protein size is an important biochemical feature since longer proteins can harbor more domains and therefore can display more biological functionalities than shorter proteins. We found remarkable differences in protein length, exon structure, and domain count among different phylogenetic lineages. While eukaryotic proteins have an average size of 472 amino acid residues (aa), average protein sizes in plant genomes are smaller than those of animals and fungi. Proteins unique to plants are ∼81 aa shorter than plant proteins conserved among other eukaryotic lineages. The smaller average size of plant proteins could neither be explained by endosymbiosis nor subcellular compartmentation nor exon size, but rather due to exon number. Metazoan proteins are encoded on average by ∼10 exons of small size [∼176 nucleotides (nt)]. Streptophyta have on average only ∼5.7 exons of medium size (∼230 nt). Multicellular species code for large proteins by increasing the exon number, while most unicellular organisms employ rather larger exons (>400 nt). Among subcellular compartments, membrane proteins are the largest (∼520 aa), whereas the smallest proteins correspond to the gene ontology group of ribosome (∼240 aa). Plant genes are encoded by half the number of exons and also contain fewer domains than animal proteins on average. Interestingly, endosymbiotic proteins that migrated to the plant nucleus became larger than their cyanobacterial orthologs. We thus conclude that plants have proteins larger than bacteria but smaller than animals or fungi. Compared to the average of eukaryotic species, plants have ∼34% more but ∼20% smaller proteins. This suggests that photosynthetic organisms are unique and deserve therefore special attention with regard to the evolutionary forces acting on their genomes and proteomes.
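
    The exon figures quoted above can be turned into a rough back-of-the-envelope check. The sketch below multiplies average exon count by average exon size and divides by three nucleotides per codon. It assumes, purely for illustration, that every exonic nucleotide is coding (untranslated regions and the stop codon are ignored), so the results are upper bounds rather than the averages reported in the paper; the function name is hypothetical.

```python
def max_protein_length(exon_count, exon_size_nt):
    """Upper bound on protein length (aa) if all exon sequence were coding."""
    return exon_count * exon_size_nt / 3


if __name__ == "__main__":
    # Averages quoted in the abstract.
    metazoa = max_protein_length(10, 176)        # about 587 aa
    streptophyta = max_protein_length(5.7, 230)  # about 437 aa
    print(round(metazoa), round(streptophyta))
    # Plant exons are longer on average, yet the product is smaller because
    # there are far fewer of them.
    print(round(streptophyta / metazoa, 2))
```

    Under this simplification the larger plant exons do not compensate for the lower exon count, which is consistent with the abstract's statement that exon number rather than exon size explains the smaller plant proteins.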

  1. Plant Proteins Are Smaller Because They Are Encoded by Fewer Exons than Animal Proteins

    Directory of Open Access Journals (Sweden)

    Obed Ramírez-Sánchez

    2016-12-01

    Full Text Available Protein size is an important biochemical feature since longer proteins can harbor more domains and therefore can display more biological functionalities than shorter proteins. We found remarkable differences in protein length, exon structure, and domain count among different phylogenetic lineages. While eukaryotic proteins have an average size of 472 amino acid residues (aa), average protein sizes in plant genomes are smaller than those of animals and fungi. Proteins unique to plants are ∼81 aa shorter than plant proteins conserved among other eukaryotic lineages. The smaller average size of plant proteins could neither be explained by endosymbiosis nor subcellular compartmentation nor exon size, but rather due to exon number. Metazoan proteins are encoded on average by ∼10 exons of small size [∼176 nucleotides (nt)]. Streptophyta have on average only ∼5.7 exons of medium size (∼230 nt). Multicellular species code for large proteins by increasing the exon number, while most unicellular organisms employ rather larger exons (>400 nt). Among subcellular compartments, membrane proteins are the largest (∼520 aa), whereas the smallest proteins correspond to the gene ontology group of ribosome (∼240 aa). Plant genes are encoded by half the number of exons and also contain fewer domains than animal proteins on average. Interestingly, endosymbiotic proteins that migrated to the plant nucleus became larger than their cyanobacterial orthologs. We thus conclude that plants have proteins larger than bacteria but smaller than animals or fungi. Compared to the average of eukaryotic species, plants have ∼34% more but ∼20% smaller proteins. This suggests that photosynthetic organisms are unique and deserve therefore special attention with regard to the evolutionary forces acting on their genomes and proteomes.

  2. Simultaneous improvement of grain yield and protein content in durum wheat by different phenotypic indices and genomic selection.

    Science.gov (United States)

    Rapp, M; Lein, V; Lacoudre, F; Lafferty, J; Müller, E; Vida, G; Bozhanova, V; Ibraliu, A; Thorwarth, P; Piepho, H P; Leiser, W L; Würschum, T; Longin, C F H

    2018-06-01

    Simultaneous improvement of protein content and grain yield by index selection is possible but its efficiency largely depends on the weighting of the single traits. The genetic architecture of these indices is similar to that of the primary traits. Grain yield and protein content are of major importance in durum wheat breeding, but their negative correlation has hampered their simultaneous improvement. To account for this in wheat breeding, the grain protein deviation (GPD) and the protein yield were proposed as targets for selection. The aim of this work was to investigate the potential of different indices to simultaneously improve grain yield and protein content in durum wheat and to evaluate their genetic architecture towards genomics-assisted breeding. To this end, we investigated two different durum wheat panels comprising 159 and 189 genotypes, which were tested in multiple field locations across Europe and genotyped by a genotyping-by-sequencing approach. The phenotypic analyses revealed significant genetic variances for all traits and heritabilities of the phenotypic indices that were in a similar range as those of grain yield and protein content. The GPD showed a high and positive correlation with protein content, whereas protein yield was highly and positively correlated with grain yield. Thus, selecting for a high GPD would mainly increase the protein content whereas a selection based on protein yield would mainly improve grain yield, but a combination of both indices allows this selection to be balanced. The genome-wide association mapping revealed a complex genetic architecture for all traits with most QTL having small effects and being detected only in one germplasm set, thus limiting the potential of marker-assisted selection for trait improvement. By contrast, genome-wide prediction appeared promising but its performance strongly depends on the relatedness between training and prediction sets.

  3. The Role of Hexon Protein as a Molecular Mold in Patterning the Protein IX Organization in Human Adenoviruses.

    Science.gov (United States)

    Reddy, Vijay S

    2017-09-01

    Adenoviruses are respiratory, ocular and enteric pathogens that form complex capsids, which are assembled from seven different structural proteins and composed of several core proteins that closely interact with the packaged dsDNA genome. The recent near-atomic resolution structures revealed that the interlacing continuous hexagonal network formed by the protein IX molecules is conserved among different human adenoviruses (HAdVs), but not in non-HAdVs. In this report, we propose a distinct role for the hexon protein as a "molecular mold" in enabling the formation of such hexagonal protein IX network that has been shown to preserve the stability and infectivity of HAdVs. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. The complete mitochondrial genome sequence of Eimeria innocua (Eimeriidae, Coccidia, Apicomplexa).

    Science.gov (United States)

    Hafeez, Mian Abdul; Vrba, Vladimir; Barta, John Robert

    2016-07-01

    The complete mitochondrial genome of Eimeria innocua KR strain (Eimeriidae, Coccidia, Apicomplexa) was sequenced. This coccidium infects turkeys (Meleagris gallopavo), Bobwhite quails (Colinus virginianus), and Grey partridges (Perdix perdix). Genome organization and gene contents were comparable with other Eimeria spp. infecting galliform birds. The circular-mapping mt genome of E. innocua is 6247 bp in length with three protein-coding genes (cox1, cox3, and cytb), 19 gene fragments encoding large subunit (LSU) rRNA and 14 gene fragments encoding small subunit (SSU) rRNA. Like other Apicomplexa, no tRNA was encoded. The mitochondrial genome of E. innocua confirms its close phylogenetic affinities to Eimeria dispersa.

  5. Complete genome sequence of Sanguibacter keddieii type strain (ST-74T)

    Energy Technology Data Exchange (ETDEWEB)

    Ivanova, Natalia; Sikorski, Johannes; Sims, David; Brettin, Thomas; Detter, John C.; Han, Cliff; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Chen, Feng; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Pati, Amrita; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; D'haeseleer, Patrik; Chain, Patrick; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Goker, Markus; Pukall, Rudiger; Klenk, Hans-Peter; Kyrpides, Nikos

    2009-05-20

    Sanguibacter keddieii is the type species of the genus Sanguibacter, the only described genus within the family of Sanguibacteraceae. Phylogenetically, this family is located in the neighbourhood of the genus Oerskovia and the family Cellulomonadaceae within the actinobacterial suborder Micrococcineae. The strain described in this report was isolated from blood of apparently healthy cows. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the family Sanguibacteraceae, and the 4,253,413 bp long single replicon genome with its 3735 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  6. Complete genome sequence of Parvibaculum lavamentivorans type strain (DS-1(T)).

    Science.gov (United States)

    Schleheck, David; Weiss, Michael; Pitluck, Sam; Bruce, David; Land, Miriam L; Han, Shunsheng; Saunders, Elizabeth; Tapia, Roxanne; Detter, Chris; Brettin, Thomas; Han, James; Woyke, Tanja; Goodwin, Lynne; Pennacchio, Len; Nolan, Matt; Cook, Alasdair M; Kjelleberg, Staffan; Thomas, Torsten

    2011-12-31

    Parvibaculum lavamentivorans DS-1(T) is the type species of the novel genus Parvibaculum in the novel family Rhodobiaceae (formerly Phyllobacteriaceae) of the order Rhizobiales of Alphaproteobacteria. Strain DS-1(T) is a non-pigmented, aerobic, heterotrophic bacterium and represents the first tier member of environmentally important bacterial communities that catalyze the complete degradation of synthetic laundry surfactants. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,914,745 bp long genome with its predicted 3,654 protein coding genes is the first completed genome sequence of the genus Parvibaculum, and the first genome sequence of a representative of the family Rhodobiaceae.

  7. The phosphatomes of the multicellular myxobacteria Myxococcus xanthus and Sorangium cellulosum in comparison with other prokaryotic genomes.

    Directory of Open Access Journals (Sweden)

    Anke Treuner-Lange

    BACKGROUND: Analysis of the complete genomes from the multicellular myxobacteria Myxococcus xanthus and Sorangium cellulosum identified the highest number of eukaryotic-like protein kinases (ELKs) compared to all other genomes analyzed. High numbers of protein phosphatases (PPs) could therefore be anticipated, as reversible protein phosphorylation is a major regulation mechanism of fundamental biological processes. METHODOLOGY: Here we report an intensive analysis of the phosphatomes of M. xanthus and S. cellulosum in which we constructed phylogenetic trees to position these sequences relative to PPs from other prokaryotic organisms. PRINCIPAL FINDINGS: Predominant observations were: (i) M. xanthus and S. cellulosum possess predominantly Ser/Thr PPs; (ii) S. cellulosum encodes the highest number of PP2c-type phosphatases so far reported for a prokaryotic organism; (iii) in contrast to M. xanthus, only S. cellulosum encodes high numbers of SpoIIE-like PPs; (iv) there is a significant lack of synteny among M. xanthus and S. cellulosum; and (v) the degree of co-organization between kinase and phosphatase genes is extremely low in these myxobacterial genomes. CONCLUSIONS: We conclude that there has been a greater expansion of ELKs than PPs in multicellular myxobacteria.

  8. Complete genome sequence of Desulfohalobium retbaense type strain (HR100T)

    Energy Technology Data Exchange (ETDEWEB)

    Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute]; Copeland, A [U.S. Department of Energy, Joint Genome Institute]; Tice, Hope [U.S. Department of Energy, Joint Genome Institute]; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute]; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute]; Land, Miriam L [ORNL]; Chen, Feng [U.S. Department of Energy, Joint Genome Institute]; Bruce, David [Los Alamos National Laboratory (LANL)]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute]; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Chen, Amy [U.S. Department of Energy, Joint Genome Institute]; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute]; Hauser, Loren John [ORNL]; Chang, Yun-Juan [ORNL]; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL)]; Munk, Christine [U.S. Department of Energy, Joint Genome Institute]; Kiss, Hajnalka [Los Alamos National Laboratory (LANL)]; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL)]; Han, Cliff [Los Alamos National Laboratory (LANL)]; Brettin, Thomas S [ORNL]; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute]; Schuler, Esther [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Bristow, James [U.S. Department of Energy, Joint Genome Institute]; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute]; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]

    2010-01-01

    Desulfohalobium retbaense (Ollivier et al. 1991) is the type species of the polyphyletic genus Desulfohalobium, which comprises, at the time of writing, two species and represents the family Desulfohalobiaceae within the Deltaproteobacteria. D. retbaense is a moderately halophilic sulfate-reducing bacterium, which can utilize H2 and a limited range of organic substrates, which are incompletely oxidized to acetate and CO2, for growth. The type strain HR100T was isolated from sediments of the hypersaline Retba Lake in Senegal. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the family Desulfohalobiaceae. The 2,909,567 bp genome (one chromosome and a 45,263 bp plasmid) with its 2,552 protein-coding and 57 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  9. Electrospray droplet exposure to organic vapors: metal ion removal from proteins and protein complexes.

    Science.gov (United States)

    DeMuth, J Corinne; McLuckey, Scott A

    2015-01-20

    The exposure of aqueous nanoelectrospray droplets to various organic vapors can dramatically reduce sodium adduction on protein ions in positive ion mass spectra. Volatile alcohols, such as methanol, ethanol, and isopropanol lead to a significant reduction in sodium ion adduction but are not as effective as acetonitrile, acetone, and ethyl acetate. Organic vapor exposure in the negative ion mode, on the other hand, has essentially no effect on alkali ion adduction. Evidence is presented to suggest that the mechanism by which organic vapor exposure reduces alkali ion adduction in the positive mode involves the depletion of alkali metal ions via ion evaporation of metal ions solvated with organic molecules. The early generation of metal/organic cluster ions during the droplet desolvation process results in fewer metal ions available to condense on the protein ions formed via the charged residue mechanism. These effects are demonstrated with holomyoglobin ions to illustrate that the metal ion reduction takes place without detectable protein denaturation, which might be revealed by heme loss or an increase in charge state distribution. No evidence is observed for denaturation with exposure to any of the organic vapors evaluated in this work.

  10. Genome-wide identification and characterization of odorant-binding protein (OBP) genes in the malaria vector Anopheles sinensis (Diptera: Culicidae).

    Science.gov (United States)

    He, Xiu; He, Zheng-Bo; Zhang, Yu-Juan; Zhou, Yong; Xian, Peng-Jie; Qiao, Liang; Chen, Bin

    2016-06-01

    Anopheles sinensis is a major malaria vector. Insect odorant-binding proteins (OBPs) may function in the reception of odorants in the olfactory system. The classification and characterization of the An. sinensis OBP genes have not been systematically studied. In this study, 64 putative OBP genes were identified at the whole-genome level of An. sinensis based on the comparison between OBP conserved motifs, PBP_GOBP, and phylogenetic analysis with An. gambiae OBPs. The characterization of An. sinensis OBPs, including the motif's conservation, gene structure, genomic organization and classification, were investigated. A new gene, AsOBP73, belonging to the Plus-C subfamily, was identified with the support of transcript and conservative motifs. These An. sinensis OBP genes were classified into three subfamilies with 37, 15 and 12 genes in the subfamily Classic, Atypical and Plus-C, respectively. The genomic organization of An. sinensis OBPs suggests a clustered distribution across nine different scaffolds. Eight genes (OBP23-28, OBP63-64) might originate from a single gene through a series of historic duplication events at least before divergence of Anopheles, Culex and Aedes. The microsynteny analyses indicate a very high synteny between An. sinensis and An. gambiae OBPs. OBP70 and OBP71 earlier classified under Plus-C in An. gambiae are recognized as belonging to the group Obp59a of the Classic subfamily, and OBP69 earlier classified under Plus-C has been moved to the Atypical subfamily in this study. The study established a basic information frame for further study of the OBP genes in insects as well as in An. sinensis. © 2016 Institute of Zoology, Chinese Academy of Sciences.

  11. Repeat-containing protein effectors of plant-associated organisms

    Directory of Open Access Journals (Sweden)

    Carl H. Mesarich

    2015-10-01

    Many plant-associated organisms, including microbes, nematodes, and insects, deliver effector proteins into the apoplast, vascular tissue, or cell cytoplasm of their prospective hosts. These effectors function to promote colonization, typically by altering host physiology or by modulating host immune responses. The same effectors, however, can also trigger host immunity in the presence of cognate host immune receptor proteins, and thus prevent colonization. To circumvent effector-triggered immunity, or to further enhance host colonization, plant-associated organisms often rely on adaptive effector evolution. In recent years, it has become increasingly apparent that several effectors of plant-associated organisms are repeat-containing proteins (RCPs) that carry tandem or non-tandem arrays of an amino acid sequence or structural motif. In this review, we highlight the diverse roles that these repeat domains play in RCP effector function. We also draw attention to the potential role of these repeat domains in adaptive evolution with regard to RCP effector function and the evasion of effector-triggered immunity. The aim of this review is to increase the profile of RCP effectors from plant-associated organisms.

  12. Genomic organization and dynamics of repetitive DNA sequences in representatives of three Fagaceae genera.

    Science.gov (United States)

    Alves, Sofia; Ribeiro, Teresa; Inácio, Vera; Rocheta, Margarida; Morais-Cecílio, Leonor

    2012-05-01

    Oaks, chestnuts, and beeches are economically important species of the Fagaceae. To understand the relationship between these members of this family, a deep knowledge of their genome composition and organization is needed. In this work, we have isolated and characterized several AFLP fragments obtained from Quercus rotundifolia Lam. through homology searches in available databases. Genomic polymorphisms involving some of these sequences were evaluated in two species of Quercus, one of Castanea, and one of Fagus with specific primers. Comparative FISH analysis with generated sequences was performed in interphase nuclei of the four species, and the co-immunolocalization of 5-methylcytosine was also studied. Some of the sequences isolated proved to be genus-specific, while others were present in all the genera. Retroelements, either gypsy-like of the Tat/Athila clade or copia-like, are well represented, and most are dispersed in euchromatic regions of these species with no DNA methylation associated, pointing to an interspersed arrangement of these retroelements with potential gene-rich regions. A particular gypsy-sequence is dispersed in oaks and chestnut nuclei, but its confinement to chromocenters in beech evidences genome restructuring events during evolution of Fagaceae. Several sequences generated in this study proved to be good tools to comparatively study Fagaceae genome organization.

  13. In Silico Post Genome-Wide Association Studies Analysis of C-Reactive Protein Loci Suggests an Important Role for Interferons

    NARCIS (Netherlands)

    Vaez, Ahmad; Jansen, Rick; Prins, Bram P.; Hottenga, Jouke-Jan; de Geus, Eco J. C.; Boomsma, Dorret I.; Penninx, Brenda W. J. H.; Nolte, Ilja M.; Snieder, Harold; Alizadeh, Behrooz Z.

    Background Genome-wide association studies (GWASs) have successfully identified several single nucleotide polymorphisms (SNPs) associated with serum levels of C-reactive protein (CRP). An important limitation of GWASs is that the identified variants merely flag the nearby genomic region and do not

  14. In Silico Post Genome-Wide Association Studies Analysis of C-Reactive Protein Loci Suggests an Important Role for Interferons

    NARCIS (Netherlands)

    Vaez, A.; Jansen, R.; Prins, B.P.; Hottenga, J.J.; de Geus, E.J.C.; Boomsma, D.I.; Penninx, B.W.J.H.; Nolte, I.M.; Snieder, H.; Alizadeh, BZ

    2015-01-01

    Background - Genome-wide association studies (GWASs) have successfully identified several single nucleotide polymorphisms (SNPs) associated with serum levels of C-reactive protein (CRP). An important limitation of GWASs is that the identified variants merely flag the nearby genomic region and do not

  15. Bioinformatic analysis of microRNA biogenesis and function related proteins in eleven animal genomes.

    Science.gov (United States)

    Liu, Xiuying; Luo, GuanZheng; Bai, Xiujuan; Wang, Xiu-Jie

    2009-10-01

    MicroRNAs are small non-coding RNAs, approximately 22 nt long, that play important regulatory roles in eukaryotes. The biogenesis and functional processes of microRNAs require the participation of many proteins, of which the best studied are Dicer, Drosha, Argonaute and Exportin 5. To systematically study these four protein families, we screened 11 animal genomes for genes encoding the above-mentioned proteins, and identified some new members for each family. Domain analysis results revealed that most proteins within the same family share identical or similar domains. Alternatively spliced transcript variants were found for some proteins. We also examined the expression patterns of these proteins in different human tissues and identified other proteins that could potentially interact with these proteins. These findings provide systematic information on the four key proteins involved in microRNA biogenesis and functional pathways in animals, and will shed light on further functional studies of these proteins.

  16. Short and long-term genome stability analysis of prokaryotic genomes.

    Science.gov (United States)

    Brilli, Matteo; Liò, Pietro; Lacroix, Vincent; Sagot, Marie-France

    2013-05-08

    Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often enables the characterization of pathogens. There is therefore a strong interest in understanding the variability of this trait and the possible correlations with life-style. Two kinds of events affect genome organization: on one hand, translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but alter the gene neighborhoods by breaking the syntenic regions. A complete picture of genome organization evolution therefore requires accounting for both kinds of events. We developed an approach where we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In the first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of the same species allowed us to identify genomes with diverging gene order with respect to their conspecifics. The inter-species analysis indicates that pathogens are more often unstable than non-pathogens. In the second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. By studying species with multiple sequenced genomes available, we were

  17. Identification of an Arabidopsis thaliana protein that binds to tomato mosaic virus genomic RNA and inhibits its multiplication

    International Nuclear Information System (INIS)

    Fujisaki, Koki; Ishikawa, Masayuki

    2008-01-01

    The genomic RNAs of positive-strand RNA viruses carry RNA elements that play positive, or in some cases, negative roles in virus multiplication by interacting with viral and cellular proteins. In this study, we purified Arabidopsis thaliana proteins that specifically bind to 5' or 3' terminal regions of tomato mosaic virus (ToMV) genomic RNA, which contain important regulatory elements for translation and RNA replication, and identified these proteins by mass spectrometry analyses. One of these host proteins, named BTR1, harbored three heterogeneous nuclear ribonucleoprotein K-homology RNA-binding domains and preferentially bound to RNA fragments that contained a sequence around the initiation codon of the 130K and 180K replication protein genes. The knockout and overexpression of BTR1 specifically enhanced and inhibited, respectively, ToMV multiplication in inoculated A. thaliana leaves, while such effect was hardly detectable in protoplasts. These results suggest that BTR1 negatively regulates the local spread of ToMV

  18. Structurally Complex Organization of Repetitive DNAs in the Genome of Cobia (Rachycentron canadum).

    Science.gov (United States)

    Costa, Gideão W W F; Cioffi, Marcelo de B; Bertollo, Luiz A C; Molina, Wagner F

    2015-06-01

    Repetitive DNAs comprise the largest fraction of the eukaryotic genome. They include microsatellites or simple sequence repeats (SSRs), which play an important role in the chromosome differentiation among fishes. Rachycentron canadum is the only representative of the family Rachycentridae. This species has been the focus of several multidisciplinary studies in view of its important potential for marine fish farming. In the present study, distinct classes of repetitive DNAs, with emphasis on SSRs, were mapped in the chromosomes of this species to improve the knowledge of its genome organization. Microsatellites exhibited a diversified distribution, both dispersed in euchromatin and clustered in the heterochromatin. The multilocus location of SSRs strengthened the heterochromatin heterogeneity in this species, as suggested by some previous studies. The colocalization of SSRs with retrotransposons and transposons pointed to a close evolutionary relationship between these repetitive sequences. A number of heterochromatic regions showed a more complex organization than previously supposed, harboring a diversity of repetitive elements. In this sense, there was also evidence of colocalization of active genetic regions and different classes of repetitive DNAs in a common heterochromatic region, which offers a potential opportunity for further research regarding the interaction of these distinct fractions in fish genomes.

  19. Genomic sequence and organization of two members of a human lectin gene family

    International Nuclear Information System (INIS)

    Gitt, M.A.; Barondes, S.H.

    1991-01-01

    The authors have isolated and sequenced the genomic DNA encoding a human dimeric soluble lactose-binding lectin. The gene has four exons, and its upstream region contains sequences that suggest control by glucocorticoids, heat (environmental) shock, metals, and other factors. They have also isolated and sequenced three exons of the gene encoding another human putative lectin, the existence of which was first indicated by isolation of its cDNA. Comparisons suggest a general pattern of genomic organization of members of this lectin gene family

  20. Transcription factor IID in the Archaea: sequences in the Thermococcus celer genome would encode a product closely related to the TATA-binding protein of eukaryotes

    Science.gov (United States)

    Marsh, T. L.; Reich, C. I.; Whitelock, R. B.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

    1994-01-01

    The first step in transcription initiation in eukaryotes is mediated by the TATA-binding protein, a subunit of the transcription factor IID complex. We have cloned and sequenced the gene for a presumptive homolog of this eukaryotic protein from Thermococcus celer, a member of the Archaea (formerly archaebacteria). The protein encoded by the archaeal gene is a tandem repeat of a conserved domain, corresponding to the repeated domain in its eukaryotic counterparts. Molecular phylogenetic analyses of the two halves of the repeat are consistent with the duplication occurring before the divergence of the archaeal and eukaryotic domains. In conjunction with previous observations of similarity in RNA polymerase subunit composition and sequences and the finding of a transcription factor IIB-like sequence in Pyrococcus woesei (a relative of T. celer), it appears that major features of the eukaryotic transcription apparatus were well-established before the origin of eukaryotic cellular organization. The divergence between the two halves of the archaeal protein is less than that between the halves of the individual eukaryotic sequences, indicating that the average rate of sequence change in the archaeal protein has been less than in its eukaryotic counterparts. To the extent that this lower rate applies to the genome as a whole, a clearer picture of the early genes (and gene families) that gave rise to present-day genomes is more apt to emerge from the study of sequences from the Archaea than from the corresponding sequences from eukaryotes.

  1. Synaptogenic proteins and synaptic organizers: "many hands make light work".

    Science.gov (United States)

    Brose, Nils

    2009-03-12

    Synaptogenesis is thought to be mediated by cell adhesion proteins, which induce the initial contact between an axon and its target cell and subsequently recruit and organize the presynaptic and postsynaptic protein machinery required for synaptic transmission. A new study by Linhoff and colleagues in this issue of Neuron identifies adhesion proteins of the LRRTM family as novel synaptic organizers.

  2. Evolution of plant virus movement proteins from the 30K superfamily and of their homologs integrated in plant genomes

    Energy Technology Data Exchange (ETDEWEB)

    Mushegian, Arcady R., E-mail: mushegian2@gmail.com [Division of Molecular and Cellular Biosciences, National Science Foundation, 4201 Wilson Boulevard, Arlington, VA 22230 (United States); Elena, Santiago F., E-mail: sfelena@ibmcp.upv.es [Instituto de Biología Molecular y Celular de Plantas, CSIC-UPV, 46022 València (Spain); The Santa Fe Institute, Santa Fe, NM 87501 (United States)

    2015-02-15

    Homologs of Tobacco mosaic virus 30K cell-to-cell movement protein are encoded by diverse plant viruses. Mechanisms of action and evolutionary origins of these proteins remain obscure. We expand the picture of conservation and evolution of the 30K proteins, producing sequence alignment of the 30K superfamily with the broadest phylogenetic coverage thus far and illuminating structural features of the core all-beta fold of these proteins. Integrated copies of pararetrovirus 30K movement genes are prevalent in euphyllophytes, with at least one copy intact in nearly every examined species, and mRNAs detected for most of them. Sequence analysis suggests repeated integrations, pseudogenizations, and positive selection in those provirus genes. An unannotated 30K-superfamily gene in Arabidopsis thaliana genome is likely expressed as a fusion with the At1g37113 transcript. This molecular background of endopararetrovirus gene products in plants may change our view of virus infection and pathogenesis, and perhaps of cellular homeostasis in the hosts. - Highlights: • Sequence region shared by plant virus “30K” movement proteins has an all-beta fold. • Most euphyllophyte genomes contain integrated copies of pararetroviruses. • These integrated virus genomes often include intact movement protein genes. • Molecular evidence suggests that these “30K” genes may be selected for function.

  3. Comparative genome analysis of Bacillus cereus group genomes with Bacillus subtilis

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain; Sorokin, Alexei; Kapatral, Vinayak; Reznik, Gary; Bhattacharya, Anamitra; Mikhailova, Natalia; Burd, Henry; Joukov, Victor; Kaznadzey, Denis; Walunas, Theresa; D'Souza, Mark; Larsen, Niels; Pusch, Gordon; Liolios, Konstantinos; Grechkin, Yuri; Lapidus, Alla; Goltsman, Eugene; Chu, Lien; Fonstein, Michael; Ehrlich, S. Dusko; Overbeek, Ross; Kyrpides, Nikos; Ivanova, Natalia

    2005-09-14

    Genome features of the Bacillus cereus group genomes (representative strains of Bacillus cereus, Bacillus anthracis and Bacillus thuringiensis subsp. israelensis) were analyzed and compared with the Bacillus subtilis genome. A core set of 1,381 protein families among the four Bacillus genomes, with an additional set of 933 families common to the B. cereus group, was identified. Differences in signal transduction pathways, membrane transporters, cell surface structures, cell wall, and S-layer proteins, suggesting differences in their phenotype, were identified. The B. cereus group has signal transduction systems including a tyrosine kinase related to two-component system histidine kinases from B. subtilis. A model for regulation of the stress-responsive sigma factor sigmaB in the B. cereus group, different from the well-studied regulation in B. subtilis, has been proposed. Despite a high degree of chromosomal synteny among these genomes, significant differences in cell wall and spore coat proteins that contribute to survival and adaptation in specific hosts have been identified.

  4. Identifying neuropeptide and protein hormone receptors in Drosophila melanogaster by exploiting genomic data

    DEFF Research Database (Denmark)

    Hauser, Frank; Williamson, Michael; Cazzamali, Giuseppe

    2006-01-01

    insect genome, that of the fruitfly Drosophila melanogaster, was sequenced in 2000, and about 200 GPCRs have been annotated in this model insect. About 50 of these receptors were predicted to have neuropeptides or protein hormones as their ligands. Since 2000, the cDNAs of most of these candidate...... receptors have been cloned and for many receptors the endogenous ligand has been identified. In this review, we will give an update about the current knowledge of all Drosophila neuropeptide and protein hormone receptors, and discuss their phylogenetic relationships. Publication date: 2006-Feb...

  5. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate for the limitations for accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.

  6. A Network of Multi-Tasking Proteins at the DNA Replication Fork Preserves Genome Stability.

    Directory of Open Access Journals (Sweden)

    2005-12-01

    Full Text Available To elucidate the network that maintains high fidelity genome replication, we have introduced two conditional mutant alleles of DNA2, an essential DNA replication gene, into each of the approximately 4,700 viable yeast deletion mutants and determined the fitness of the double mutants. Fifty-six DNA2-interacting genes were identified. Clustering analysis of genomic synthetic lethality profiles of each of 43 of the DNA2-interacting genes defines a network (consisting of 322 genes and 876 interactions) whose topology provides clues as to how replication proteins coordinate regulation and repair to protect genome integrity. The results also shed new light on the functions of the query gene DNA2, which, despite many years of study, remain controversial, especially its proposed role in Okazaki fragment processing and the nature of its in vivo substrates. Because of the multifunctional nature of virtually all proteins at the replication fork, the meaning of any single genetic interaction is inherently ambiguous. The multiplexing nature of the current studies, however, combined with follow-up supporting experiments, reveals most if not all of the unique pathways requiring Dna2p. These include not only Okazaki fragment processing and DNA repair but also chromatin dynamics.

  7. Lactobacillus paracasei comparative genomics: towards species pan-genome definition and exploitation of diversity.

    Directory of Open Access Journals (Sweden)

    Tamara Smokvina

    Full Text Available Lactobacillus paracasei is a member of the normal human and animal gut microbiota and is used extensively in the food industry in starter cultures for dairy products or as probiotics. With the development of low-cost, high-throughput sequencing techniques it has become feasible to sequence many different strains of one species and to determine its "pan-genome". We have sequenced the genomes of 34 different L. paracasei strains, and performed a comparative genomics analysis. We analysed genome synteny and content, focussing on the pan-genome, core genome and variable genome. Each genome was shown to contain around 2800-3100 protein-coding genes, and comparative analysis identified over 4200 ortholog groups that comprise the pan-genome of this species, of which about 1800 ortholog groups make up the conserved core. Several factors previously associated with host-microbe interactions such as pili, cell-envelope proteinase, hydrolases p40 and p75 or the capacity to produce short branched-chain fatty acids (bkd operon) are part of the L. paracasei core genome present in all analysed strains. The variome consists mainly of hypothetical proteins, phages, plasmids, transposon/conjugative elements, and known functions such as sugar metabolism, cell-surface proteins, transporters, CRISPR-associated proteins, and EPS biosynthesis proteins. An enormous variety and variability of sugar utilization gene cassettes were identified, with each strain harbouring between 25 and 53 cassettes, reflecting the high adaptability of L. paracasei to different niches. A phylogenomic tree was constructed based on total genome contents, and together with an analysis of horizontal gene transfer events we conclude that evolution of these L. paracasei strains is complex and not always related to niche adaptation. The results of this genome content comparison were used, together with high-throughput growth experiments on various carbohydrates, to perform gene-trait matching analysis

  8. Multiple Whole Genome Alignments Without a Reference Organism

    Energy Technology Data Exchange (ETDEWEB)

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  9. Genome-wide analyses and functional classification of proline repeat-rich proteins: potential role of eIF5A in eukaryotic evolution.

    Directory of Open Access Journals (Sweden)

    Ajeet Mandal

    Full Text Available The eukaryotic translation factor eIF5A has recently been reported as a sequence-specific elongation factor that facilitates peptide bond formation at consecutive prolines in Saccharomyces cerevisiae, as its ortholog elongation factor P (EF-P) does in bacteria. We have searched the genome databases of 35 representative organisms from six kingdoms of life for PPP (Pro-Pro-Pro) and/or PPG (Pro-Pro-Gly)-encoding genes whose expression is expected to depend on eIF5A. We have made detailed analyses of proteome data of 5 selected species, Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster, Mus musculus and Homo sapiens. The PPP and PPG motifs are low in the prokaryotic proteomes. However, their frequencies markedly increase with the biological complexity of eukaryotic organisms, and are higher in newly derived proteins than in those orthologous proteins commonly shared in all species. Ontology classifications of S. cerevisiae and human genes encoding the highest level of polyprolines reveal their strong association with several specific biological processes, including actin/cytoskeletal associated functions, RNA splicing/turnover, DNA binding/transcription and cell signaling. Previously reported phenotypic defects in actin polarity and mRNA decay of eIF5A mutant strains are consistent with the proposed role for eIF5A in the translation of the polyproline-containing proteins. Of all the amino acid tandem repeats (≥3 amino acids), only the proline repeat frequency correlates with functional complexity of the five organisms examined. Taken together, these findings suggest the importance of proline repeat-rich proteins and a potential role for eIF5A and its hypusine modification pathway in the course of eukaryotic evolution.

  10. In vitro evolution of terminal protein-containing genomes

    Science.gov (United States)

    Esteban, José A.; Blanco, Luis; Villar, Laurentino; Salas, Margarita

    1997-01-01

    A new self-sustained terminal protein-primed DNA amplification system has been used to describe in vitro evolutionary changes affecting maintenance of the genome size of bacteriophage φ29. These changes involve generation and efficient amplification of short palindromic molecules containing an inverted duplication of one of the original DNA ends. A template-switching mechanism is proposed to account for the appearance of these molecules. After their formation, they would replicate by means of hairpin intermediates. Relevant kinetic information about this DNA replication system has been obtained from the competition between the input full-length φ29 DNA and its derived truncated versions. The physiological relevance of these molecules and the mechanisms to control their formation are discussed. PMID:9096322

  11. Complete genome sequence of Halanaerobium praevalens type strain (GSLT)

    Energy Technology Data Exchange (ETDEWEB)

    Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Chertkov, Olga [Los Alamos National Laboratory (LANL)]; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute]; Hammon, Nancy [U.S. Department of Energy, Joint Genome Institute]; Deshpande, Shweta [U.S. Department of Energy, Joint Genome Institute]; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute]; Tapia, Roxanne [Los Alamos National Laboratory (LANL)]; Han, Cliff [Los Alamos National Laboratory (LANL)]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute]; Huntemann, Marcel [U.S. Department of Energy, Joint Genome Institute]; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute]; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute]; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Chen, Amy [U.S. Department of Energy, Joint Genome Institute]; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute]; Land, Miriam L [ORNL]; Hauser, Loren John [ORNL]; Brambilla, Evelyne-Marie [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Kannan, K. Palani [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Tindall, Brian [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute]; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute]; Bristow, James [U.S. Department of Energy, Joint Genome Institute]; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute]; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]

    2011-01-01

    Halanaerobium praevalens Zeikus et al. 1984 is the type species of the genus Halanaerobium, which in turn is the type genus of the family Halanaerobiaceae. The species is of interest because it is able to reduce a variety of nitro-substituted aromatic compounds at a high rate, and because of its ability to degrade organic pollutants. The strain is also of interest because it functions as a hydrolytic bacterium, fermenting complex organic matter and producing intermediary metabolites for other trophic groups such as sulfate-reducing and methanogenic bacteria. It is further reported as being involved in carbon removal in the Great Salt Lake, its source of isolation. This is the first completed genome sequence of a representative of the genus Halanaerobium and the second genome sequence from a type strain of the family Halanaerobiaceae. The 2,309,262 bp long genome with its 2,110 protein-coding and 70 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  12. Discovery of global genomic re-organization based on comparison of two newly sequenced rice mitochondrial genomes with cytoplasmic male sterility-related genes

    Directory of Open Access Journals (Sweden)

    Yamada Mari

    2010-03-01

    Full Text Available Abstract Background Plant mitochondrial genomes are known for their complexity, and there is abundant evidence demonstrating that this organelle is important for plant sexual reproduction. Cytoplasmic male sterility (CMS) is a phenomenon caused by incompatibility between the nucleus and mitochondria that has been discovered in various plant species. As the exact sequence of steps leading to CMS has not yet been revealed, efforts should be made to elucidate the factors underlying the mechanism of this important trait for crop breeding. Results Two CMS mitochondrial genomes, LD-CMS, derived from Oryza sativa L. ssp. indica (434,735 bp), and CW-CMS, derived from Oryza rufipogon Griff. (559,045 bp), were newly sequenced in this study. Compared to the previously sequenced Nipponbare (Oryza sativa L. ssp. japonica) mitochondrial genome, the presence of 54 out of 56 protein-encoding genes (including pseudo-genes), 22 tRNA genes (including pseudo-tRNAs), and three rRNA genes was conserved. Two other genes were not present in the CW-CMS mitochondrial genome, and one of them was present as part of the newly identified chimeric ORF, CW-orf307. At least 12 genomic recombination events were predicted between the LD-CMS mitochondrial genome and Nipponbare, and 15 between the CW-CMS genome and Nipponbare, and novel genetic structures were formed by these genomic rearrangements in the two CMS lines. At least one of the genomic rearrangements was completely unique to each CMS line and not present in 69 rice cultivars or 9 accessions of O. rufipogon. Conclusion Our results demonstrate novel mitochondrial genomic rearrangements that are unique in CMS cytoplasm, and one of the genes that is unique in the CW mitochondrial genome, CW-orf307, appeared to be the candidate most likely responsible for the CW-CMS event. Genomic rearrangements were dynamic in the CMS lines in comparison with those of rice cultivars, suggesting that 'death' and possible 'birth' processes of the

  13. CRISPR/Cas9 Based Genome Editing of Penicillium chrysogenum.

    Science.gov (United States)

    Pohl, C; Kiel, J A K W; Driessen, A J M; Bovenberg, R A L; Nygård, Y

    2016-07-15

    CRISPR/Cas9 based systems have emerged as versatile platforms for precision genome editing in a wide range of organisms. Here we have developed powerful CRISPR/Cas9 tools for marker-based and marker-free genome modifications in Penicillium chrysogenum, a model filamentous fungus and industrially relevant cell factory. The developed CRISPR/Cas9 toolbox is highly flexible and allows editing of new targets with minimal cloning efforts. The Cas9 protein and the sgRNA can be either delivered during transformation, as preassembled CRISPR-Cas9 ribonucleoproteins (RNPs) or expressed from an AMA1 based plasmid within the cell. The direct delivery of the Cas9 protein with in vitro synthesized sgRNA to the cells allows for a transient method for genome engineering that may rapidly be applicable for other filamentous fungi. The expression of Cas9 from an AMA1 based vector was shown to be highly efficient for marker-free gene deletions.

  14. Apophysomyces variabilis: draft genome sequence and comparison of predictive virulence determinants with other medically important Mucorales.

    Science.gov (United States)

    Prakash, Hariprasath; Rudramurthy, Shivaprakash Mandya; Gandham, Prasad S; Ghosh, Anup Kumar; Kumar, Milner M; Badapanda, Chandan; Chakrabarti, Arunaloke

    2017-09-18

    Apophysomyces species are prevalent in tropical countries and A. variabilis is the second most frequent agent causing mucormycosis in India. Among Apophysomyces species, A. elegans, A. trapeziformis and A. variabilis are commonly incriminated in human infections. The genome sequences of A. elegans and A. trapeziformis are available in public databases, but that of A. variabilis is not. We therefore performed whole-genome sequencing of A. variabilis to explore its genomic structure and possible genes determining the virulence of the organism. The whole genome of A. variabilis NCCPF 102052 was sequenced and the genomic structure of A. variabilis was compared with already available genome structures of A. elegans, A. trapeziformis and other medically important Mucorales. The total size of the genome assembly of A. variabilis was 39.38 Mb with 12,764 protein-coding genes. Transposable elements (TEs) were scarce in the Apophysomyces genome and the retrotransposon Ty3-gypsy was the most common TE. Phylogenetically, Apophysomyces species were grouped closely with Phycomyces blakesleeanus. OrthoMCL analysis revealed 3025 orthologous proteins, which were common to those three pathogenic Apophysomyces species. Expansion of multiple gene families/duplication was observed in Apophysomyces genomes. Approximately 6% of Apophysomyces genes were predicted to be associated with virulence on PHIbase analysis. The virulence determinants included the protein families of CotH proteins (invasins), proteases, iron utilisation pathways, siderophores and signal transduction pathways. Serine proteases were the major group of proteases found in all Apophysomyces genomes. The carbohydrate active enzymes (CAZymes) constitute the majority of the secretory proteins. The present study is the maiden attempt to sequence and analyze the genomic structure of A. variabilis. Together with available genome sequence of A. elegans and A. trapeziformis, the study helped to indicate the possible virulence determinants of

  15. Protein domain organisation: adding order.

    Science.gov (United States)

    Kummerfeld, Sarah K; Teichmann, Sarah A

    2009-01-29

    Domains are the building blocks of proteins. During evolution, they have been duplicated, fused and recombined, to produce proteins with novel structures and functions. Structural and genome-scale studies have shown that pairs or groups of domains observed together in a protein are almost always found in only one N to C terminal order and are the result of a single recombination event that has been propagated by duplication of the multi-domain unit. Previous studies of domain organisation have used graph theory to represent the co-occurrence of domains within proteins. We build on this approach by adding directionality to the graphs and connecting nodes based on their relative order in the protein. Most of the time, the linear order of domains is conserved. However, using the directed graph representation we have identified non-linear features of domain organization that are over-represented in genomes. Recognising these patterns and unravelling how they have arisen may allow us to understand the functional relationships between domains and understand how the protein repertoire has evolved. We identify groups of domains that are not linearly conserved, but instead have been shuffled during evolution so that they occur in multiple different orders. We consider 192 genomes across all three kingdoms of life and use domain and protein annotation to understand their functional significance. To identify these features and assess their statistical significance, we represent the linear order of domains in proteins as a directed graph and apply graph theoretical methods. We describe two higher-order patterns of domain organisation: clusters and bi-directionally associated domain pairs and explore their functional importance and phylogenetic conservation. Taking into account the order of domains, we have derived a novel picture of global protein organization. We found that all genomes have a higher than expected degree of clustering and more domain pairs in forward and

  16. Protein domain organisation: adding order

    Directory of Open Access Journals (Sweden)

    Kummerfeld Sarah K

    2009-01-01

    Full Text Available Abstract Background Domains are the building blocks of proteins. During evolution, they have been duplicated, fused and recombined, to produce proteins with novel structures and functions. Structural and genome-scale studies have shown that pairs or groups of domains observed together in a protein are almost always found in only one N to C terminal order and are the result of a single recombination event that has been propagated by duplication of the multi-domain unit. Previous studies of domain organisation have used graph theory to represent the co-occurrence of domains within proteins. We build on this approach by adding directionality to the graphs and connecting nodes based on their relative order in the protein. Most of the time, the linear order of domains is conserved. However, using the directed graph representation we have identified non-linear features of domain organization that are over-represented in genomes. Recognising these patterns and unravelling how they have arisen may allow us to understand the functional relationships between domains and understand how the protein repertoire has evolved. Results We identify groups of domains that are not linearly conserved, but instead have been shuffled during evolution so that they occur in multiple different orders. We consider 192 genomes across all three kingdoms of life and use domain and protein annotation to understand their functional significance. To identify these features and assess their statistical significance, we represent the linear order of domains in proteins as a directed graph and apply graph theoretical methods. We describe two higher-order patterns of domain organisation: clusters and bi-directionally associated domain pairs and explore their functional importance and phylogenetic conservation. Conclusion Taking into account the order of domains, we have derived a novel picture of global protein organization. We found that all genomes have a higher than expected

  17. Protein structure similarity clustering (PSSC) and natural product structure as inspiration sources for drug development and chemical genomics

    NARCIS (Netherlands)

    Dekker, Frank J; Koch, Marcus A; Waldmann, Herbert; Dekker, Frans

    Finding small molecules that modulate protein function is of primary importance in drug development and in the emerging field of chemical genomics. To facilitate the identification of such molecules, we developed a novel strategy making use of structural conservatism found in protein domain

  18. Cloud prediction of protein structure and function with PredictProtein for Debian.

    Science.gov (United States)

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.

  19. Enabling systematic interrogation of protein-protein interactions in live cells with a versatile ultra-high-throughput biosensor platform | Office of Cancer Genomics

    Science.gov (United States)

    The vast datasets generated by next generation gene sequencing and expression profiling have transformed biological and translational research. However, technologies to produce large-scale functional genomics datasets, such as high-throughput detection of protein-protein interactions (PPIs), are still in early development. While a number of powerful technologies have been employed to detect PPIs, a single PPI biosensor platform featuring both high sensitivity and robustness in a mammalian cell environment remains to be established.

  20. Surface antigens and potential virulence factors from parasites detected by comparative genomics of perfect amino acid repeats

    Directory of Open Access Journals (Sweden)

    Adler Joël

    2007-12-01

    Full Text Available Abstract Background Many parasitic organisms, eukaryotes as well as bacteria, possess surface antigens with amino acid repeats. Making up the interface between host and pathogen, such repetitive proteins may be virulence factors involved in immune evasion or cytoadherence. They find immunological applications in serodiagnostics and vaccine development. Here we use proteins which contain perfect repeats as a basis for comparative genomics between parasitic and free-living organisms. Results We have developed Reptile (http://reptile.unibe.ch), a program for proteome-wide probabilistic description of perfect repeats in proteins. Parasite proteomes exhibited a large variance regarding the proportion of repeat-containing proteins. Interestingly, there was a good correlation between the percentage of highly repetitive proteins and mean protein length in parasite proteomes, but not at all in the proteomes of free-living eukaryotes. Reptile combined with programs for the prediction of transmembrane domains and GPI-anchoring resulted in an effective tool for in silico identification of potential surface antigens and virulence factors from parasites. Conclusion Systemic surveys for perfect amino acid repeats allowed basic comparisons between free-living and parasitic organisms that were directly applicable to predict proteins of serological and parasitological importance. An on-line tool is available at http://genomics.unibe.ch/dora.

  1. Chromosome-wise Protein Interaction Patterns and Their Impact on Functional Implications of Large-Scale Genomic Aberrations

    DEFF Research Database (Denmark)

    Kirk, Isa Kristina; Weinhold, Nils; Belling, Kirstine González-Izarzugaza

    2017-01-01

    Gene copy-number changes influence phenotypes through gene-dosage alteration and subsequent changes of protein complex stoichiometry. Human trisomies where gene copy numbers are increased uniformly over entire chromosomes provide generic cases for studying these relationships. In most trisomies......, gene and protein level alterations have fatal consequences. We used genome-wide protein-protein interaction data to identify chromosome-specific patterns of protein interactions. We found that some chromosomes encode proteins that interact infrequently with each other, chromosome 21 in particular. We...... combined the protein interaction data with transcriptome data from human brain tissue to investigate how this pattern of global interactions may affect cellular function. We identified highly connected proteins that also had coordinated gene expression. These proteins were associated with important...

  2. Intrinsically Disordered Proteins and the Origins of Multicellular Organisms

    Science.gov (United States)

    Dunker, A. Keith

    In simple multicellular organisms all of the cells are in direct contact with the surrounding milieu, whereas in complex multicellular organisms some cells are completely surrounded by other cells. Current phylogenetic trees indicate that complex multicellular organisms evolved independently from unicellular ancestors about 10 times, and only among the eukaryotes, including once for animals, twice each for green, red, and brown algae, and thrice for fungi. Given these multiple independent evolutionary lineages, we asked two questions: 1. Which molecular functions underpinned the evolution of multicellular organisms?; and, 2. Which of these molecular functions depend on intrinsically disordered proteins (IDPs)? Compared to unicellularity, multicellularity requires the advent of molecules for cellular adhesion, for cell-cell communication and for developmental programs. In addition, the developmental programs need to be regulated over space and time. Finally, each multicellular organism has cell-specific biochemistry and physiology. Thus, the evolution of complex multicellular organisms from unicellular ancestors required five new classes of functions. To answer the second question we used Key-words in Swiss Protein ranked for associations with predictions of protein structure or disorder. With a Z-score of 18.8 compared to random-function proteins, differentiation was the biological process most strongly associated with IDPs. As expected from this result, large numbers of individual proteins associated with differentiation exhibit substantial regions of predicted disorder. For the animals for which there is the most readily available data, all five of the underpinning molecular functions for multicellularity were found to depend critically on IDP-based mechanisms, and other evidence supports these ideas. While the data are more sparse, IDPs seem to similarly underlie the five new classes of functions for plants and fungi as well, suggesting that IDPs were indeed

  3. Role of Shwachman-Bodian-Diamond syndrome protein in translation machinery and cell chemotaxis: a comparative genomics approach

    Directory of Open Access Journals (Sweden)

    Vasieva O

    2011-09-01

    Full Text Available Olga Vasieva, Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom; Fellowship for the Interpretation of Genomes, Burr Ridge, IL, USA. Abstract: Shwachman-Bodian-Diamond syndrome (SBDS) is linked to a mutation in a single gene. The SBDS protein is involved in RNA metabolism and ribosome-associated functions, but SBDS mutation is primarily linked to a defect in polymorphonuclear leukocytes, which are unable to orient correctly in a spatial gradient of chemoattractants. Results of data mining and comparative genomic approaches undertaken in this study suggest that SBDS protein is also linked to tRNA metabolism and translation initiation. Analysis of crosstalk between translation machinery and cytoskeletal dynamics provides new insights into the cellular chemotactic defects caused by SBDS protein malfunction. The proposed functional interactions provide a new approach to exploit potential targets in the treatment and monitoring of this disease. Keywords: Shwachman-Bodian-Diamond syndrome, wybutosine, tRNA, chemotaxis, translation, genomics, gene proximity

  4. Complete genome sequence of Saccharomonospora viridis type strain (P101T)

    Energy Technology Data Exchange (ETDEWEB)

    Pati, Amrita; Sikorski, Johannes; Nolan, Matt; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Lucas, Susan; Chen, Feng; Tice, Hope; Pitluck, Sam; Cheng, Jan-Fang; Chertkov, Olga; Brettin, Thomas; Han, Cliff; Detter, John C.; Kuske, Cheryl; Bruce, David; Goodwin, Lynne; Chain, Patrick; D'haeseleer, Patrik; Chen, Amy; Palaniappan, Krishna; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Rohde, Manfred; Tindall, Brian J.; Goker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2009-05-20

    Saccharomonospora viridis (Schuurmans et al. 1956) Nonomura and Ohara 1971 is the type species of the genus Saccharomonospora, which belongs to the family Pseudonocardiaceae. S. viridis is of interest because it is a Gram-negative organism classified amongst the usually Gram-positive actinomycetes. Members of the species are frequently found in hot compost and hay, and its spores can cause farmer's lung disease, bagassosis, and humidifier fever. Strains of the species S. viridis have been found to metabolize the xenobiotic pentachlorophenol (PCP). The strain described in this study was isolated from peat-bog in Ireland. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the family Pseudonocardiaceae, and the 4,308,349 bp long single replicon genome with its 3906 protein-coding and 64 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  5. Complete genome sequence of Capnocytophaga ochracea type strain (VPI 2845T)

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, Konstantinos; Gronow, Sabine; Saunders, Elizabeth; Land, Miriam; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Pati, Amrita; Ivanova, Natalia; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Brettin, Thomas; Detter, John C.; Han, Cliff; Bristow, James; Goker, Markus; Rohde, Manfred; Eisen, Jonathan A.; Markowitz, Victor; Kyrpides, Nikos C.; Klenk, Hans-Peter; Hugenholtz, Philip

    2009-05-20

    Capnocytophaga ochracea (Prevot et al. 1956) Leadbetter et al. 1982 is the type species of the genus Capnocytophaga. It is of interest because of its location in the Flavobacteriaceae, a genomically yet uncharted family within the order Flavobacteriales. The species grows as fusiform to rod shaped cells which tend to form clumps and are able to move by gliding. C. ochracea is known as a capnophilic organism with the ability to grow under anaerobic as well as under aerobic conditions (oxygen concentration larger than 15%), here only in the presence of 5% CO2. Strain VPI 2845T, the type strain of the species, is portrayed in this report as a gliding, Gram-negative bacterium, originally isolated from a human oral cavity. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first completed genome sequence from the flavobacterial genus Capnocytophaga, and the 2,612,925 bp long single replicon genome with its 2193 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  6. Genomic organization of the canine herpesvirus US region.

    Science.gov (United States)

    Haanes, E J; Tomlinson, C C

    1998-02-01

    Canine herpesvirus (CHV) is an alpha-herpesvirus of limited pathogenicity in healthy adult dogs and infectivity of the virus appears to be largely limited to cells of canine origin. CHV's low virulence and species specificity make it an attractive candidate for a recombinant vaccine vector to protect dogs against a variety of pathogens. As part of the analysis of the CHV genome, the authors determined the complete nucleotide sequence of the CHV US region as well as portions of the flanking inverted repeats. Seven full open reading frames (ORFs) encoding proteins larger than 100 amino acids were identified within, or partially within the CHV US: cUS2, cUS3, cUS4, cUS6, cUS7, cUS8 and cUS9; which are homologs of the herpes simplex virus type-1 US2; protein kinase; gG, gD, gI, gE; and US9 genes, respectively. An eighth ORF was identified in the inverted repeat region, cIR6, a homolog of the equine herpesvirus type-1 IR6 gene. The authors identified and mapped most of the major transcripts for the predicted CHV US ORFs by Northern analysis.
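    The abstract above reports open reading frames encoding proteins larger than 100 amino acids. As a rough illustration of what such a size filter involves, the sketch below scans all six reading frames of a DNA string for ATG-to-stop ORFs. It is not the authors' procedure: the function names, the use of the standard stop codons, and the rule that an ORF begins at the first ATG after the previous stop are all assumptions made for this example.

```python
# Illustrative six-frame ORF scan; an assumption-laden sketch, not the
# method used in the study summarized above.
STOP_CODONS = {"TAA", "TAG", "TGA"}          # standard genetic code stops
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=100):
    """Return (strand, frame, start, stop, n_codons) for each ORF.

    An ORF here runs from the first ATG after the previous stop codon to
    the next in-frame stop codon. `start` and `stop` are 0-based offsets on
    the scanned strand; `stop` is the offset of the stop codon itself.
    Only ORFs with more than `min_aa` codons (stop excluded) are kept.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons > min_aa:
                        orfs.append((strand, frame, start, i, n_codons))
                    start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence (hypothetical): one 121-codon ORF on the forward strand.
    toy = "CC" + "ATG" + "GCT" * 120 + "TAA" + "GG"
    for orf in find_orfs(toy):
        print(orf)
```

    Real annotation pipelines also consider alternative start codons, overlapping ORFs, and transcript evidence such as the Northern analysis mentioned in the abstract; this sketch covers only the length criterion.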

  7. Computational prediction of cAMP receptor protein (CRP) binding sites in cyanobacterial genomes

    Directory of Open Access Journals (Sweden)

    Su Zhengchang

    2009-01-01

    Full Text Available Abstract Background Cyclic AMP receptor protein (CRP), also known as catabolite gene activator protein (CAP), is an important transcriptional regulator widely distributed in many bacteria. The biological processes under the regulation of CRP are highly diverse among different groups of bacterial species. Elucidation of CRP regulons in cyanobacteria will further our understanding of the physiology and ecology of this important group of microorganisms. Previously, CRP has been experimentally studied in only two cyanobacterial strains: Synechocystis sp. PCC 6803 and Anabaena sp. PCC 7120; therefore, a systematic genome-scale study of the potential CRP target genes and binding sites in cyanobacterial genomes is urgently needed. Results We have predicted and analyzed the CRP binding sites and regulons in 12 sequenced cyanobacterial genomes using a highly effective cis-regulatory binding site scanning algorithm. Our results show that cyanobacterial CRP binding sites are very similar to those in E. coli; however, the regulons are very different from that of E. coli. Furthermore, CRP regulons in different cyanobacterial species/ecotypes are also highly diversified, ranging from photosynthesis, carbon fixation and nitrogen assimilation, to chemotaxis and signal transduction. In addition, our prediction indicates that crp genes in modern cyanobacteria are likely inherited from a common ancestral gene in their last common ancestor, and have adapted various cellular functions in different environments, while some cyanobacteria lost their crp genes as well as CRP binding sites during the course of evolution. Conclusion The CRP regulons in cyanobacteria are highly diversified, probably as a result of divergent evolution to adapt to various ecological niches. Cyanobacterial CRPs may function as lineage-specific regulators participating in various cellular processes, and are important in some lineages. However, they are dispensable in some other lineages. The

  8. Biocuration at the Saccharomyces genome database.

    Science.gov (United States)

    Skrzypek, Marek S; Nash, Robert S

    2015-08-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. © 2015 Wiley Periodicals, Inc.

  9. Complete genome sequence of Cryptobacterium curtum type strain (12-3T)

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, Konstantinos; Pukall, Rudiger; Rohde, Christine; Sims, David; Brettin, Thomas; Kuske, Cheryl; Detter, John C.; Han, Cliff; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, Natalia; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; D'haeseleer, Patrik; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Rohde, Manfred; Klenk, Hans-Peter; Kyrpides, Nikos C.

    2009-05-20

    Cryptobacterium curtum Nakazawa et al. 1999 is the type species of the genus, and is of phylogenetic interest because of its very distant and isolated position within the family Coriobacteriaceae. C. curtum is an asaccharolytic, opportunistic pathogen with a typical occurrence in the oral cavity, involved in dental and oral infections like periodontitis, inflammations and abscesses. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the actinobacterial family Coriobacteriaceae, and this 1,617,804 bp long single replicon genome with its 1364 protein-coding and 58 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  10. Non-Protein Coding RNAs

    CERN Document Server

    Walter, Nils G; Batey, Robert T

    2009-01-01

    This book assembles chapters from experts in the Biophysics of RNA to provide a broadly accessible snapshot of the current status of this rapidly expanding field. The 2006 Nobel Prize in Physiology or Medicine was awarded to the discoverers of RNA interference, highlighting just one example of a large number of non-protein coding RNAs. Because non-protein coding RNAs outnumber protein coding genes in mammals and other higher eukaryotes, it is now thought that the complexity of organisms is correlated with the fraction of their genome that encodes non-protein coding RNAs. Essential biological processes as diverse as cell differentiation, suppression of infecting viruses and parasitic transposons, higher-level organization of eukaryotic chromosomes, and gene expression itself are found to largely be directed by non-protein coding RNAs. The biophysical study of these RNAs employs X-ray crystallography, NMR, ensemble and single molecule fluorescence spectroscopy, optical tweezers, cryo-electron microscopy, and ot...

  11. Inference of expanded Lrp-like feast/famine transcription factor targets in a non-model organism using protein structure-based prediction.

    Science.gov (United States)

    Ashworth, Justin; Plaisier, Christopher L; Lo, Fang Yin; Reiss, David J; Baliga, Nitin S

    2014-01-01

    Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer.

  12. Platform comparison for evaluation of ALK protein immunohistochemical expression, genomic copy number and hotspot mutation status in neuroblastomas.

    Directory of Open Access Journals (Sweden)

    Benedict Yan

    Full Text Available ALK is an established causative oncogenic driver in neuroblastoma, and is likely to emerge as a routine biomarker in neuroblastoma diagnostics. At present, the optimal strategy for clinical diagnostic evaluation of ALK protein, genomic and hotspot mutation status is not well-studied. We evaluated ALK immunohistochemical (IHC) protein expression using three different antibodies (ALK1, 5A4 and D5F3 clones), ALK genomic status using single-color chromogenic in situ hybridization (CISH), and ALK hotspot mutation status using conventional Sanger sequencing and a next-generation sequencing platform (Ion Torrent Personal Genome Machine, IT-PGM), in archival formalin-fixed, paraffin-embedded neuroblastoma samples. We found a significant difference in IHC results using the three different antibodies, with the highest percentage of positive cases seen on D5F3 immunohistochemistry. Correlation with ALK genomic and hotspot mutational status revealed that the majority of D5F3 ALK-positive cases did not possess either ALK genomic amplification or hotspot mutations. Comparison of sequencing platforms showed a perfect correlation between conventional Sanger and IT-PGM sequencing. Our findings suggest that D5F3 immunohistochemistry, single-color CISH and IT-PGM sequencing are suitable assays for evaluation of ALK status in future neuroblastoma clinical trials.

  13. Genomic sequence of 'Candidatus Liberibacter solanacearum' haplotype C and its comparison with haplotype A and B genomes.

    Directory of Open Access Journals (Sweden)

    Jinhui Wang

    Full Text Available Haplotypes A and B of 'Candidatus Liberibacter solanacearum' (CLso) are associated with diseases of solanaceous plants, especially Zebra chip disease of potato, and haplotypes C, D and E are associated with symptoms on apiaceous plants. To date, one complete genome of haplotype B and two high quality draft genomes of haplotype A have been obtained for these unculturable bacteria using metagenomics from the psyllid vector Bactericera cockerelli. Here, we present the first genomic sequences obtained for the carrot-associated CLso. These two genomic sequences of haplotype C, FIN114 (1.24 Mbp) and FIN111 (1.20 Mbp), were obtained from carrot psyllids (Trioza apicalis) harboring CLso. Genomic comparisons between the haplotypes A, B and C revealed that the genome organization differs between these haplotypes, due to large inversions and other recombinations. Comparison of protein-coding genes indicated that the core genome of CLso consists of 885 ortholog groups, with the pan-genome consisting of 1327 ortholog groups. Twenty-seven ortholog groups are unique to CLso haplotype C, whilst 11 ortholog groups shared by haplotypes A and B are not found in haplotype C. Some of these ortholog groups that are not part of the core genome may encode functions related to interactions with the different host plant and psyllid species.
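    The core-genome and pan-genome counts in the abstract above are, in essence, set operations over ortholog groups. The sketch below shows that bookkeeping on invented toy data; the function names and ortholog-group identifiers are hypothetical, and it assumes ortholog groups have already been assigned to each genome by some upstream clustering step that is not shown.

```python
# Toy core/pan-genome bookkeeping with Python sets. All identifiers and
# data are hypothetical; real ortholog clustering is not shown.

def core_and_pan(groups_by_genome):
    """Return (core, pan) ortholog-group sets.

    `groups_by_genome` maps a genome name to an iterable of ortholog-group
    IDs and must contain at least one genome. The core genome is the
    intersection of all sets; the pan-genome is their union.
    """
    sets = [set(groups) for groups in groups_by_genome.values()]
    return set.intersection(*sets), set.union(*sets)


def unique_to(groups_by_genome, name):
    """Ortholog groups found in genome `name` and in no other genome."""
    others = set()
    for other_name, groups in groups_by_genome.items():
        if other_name != name:
            others |= set(groups)
    return set(groups_by_genome[name]) - others


if __name__ == "__main__":
    toy = {
        "haplotype_A": {"og1", "og2", "og3"},
        "haplotype_B": {"og1", "og2", "og4"},
        "haplotype_C": {"og1", "og5"},
    }
    core, pan = core_and_pan(toy)
    print("core:", sorted(core))
    print("pan:", sorted(pan))
    print("unique to C:", sorted(unique_to(toy, "haplotype_C")))
    # Groups shared by A and B but absent from C:
    shared_ab = (toy["haplotype_A"] & toy["haplotype_B"]) - toy["haplotype_C"]
    print("A and B only:", sorted(shared_ab))
```

    With several genomes per haplotype, as in the study, a choice has to be made about whether a group must be present in every genome of a haplotype or in at least one; the sketch sidesteps this by using one set per haplotype.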

  14. The mitochondrial gene encoding ribosomal protein S12 has been translocated to the nuclear genome in Oenothera.

    Science.gov (United States)

    Grohmann, L; Brennicke, A; Schuster, W

    1992-01-01

    The Oenothera mitochondrial genome contains only a gene fragment for ribosomal protein S12 (rps12), while other plants encode a functional gene in the mitochondrion. The complete Oenothera rps12 gene is located in the nucleus. The transit sequence necessary to target this protein to the mitochondrion is encoded by a 5'-extension of the open reading frame. Comparison of the amino acid sequence encoded by the nuclear gene with the polypeptides encoded by edited mitochondrial cDNA and genomic sequences of other plants suggests that gene transfer between mitochondrion and nucleus started from edited mitochondrial RNA molecules. Mechanisms and requirements of gene transfer and activation are discussed. PMID:1454526

  15. Genome-wide profiling of DNA-binding proteins using barcode-based multiplex Solexa sequencing.

    Science.gov (United States)

    Raghav, Sunil Kumar; Deplancke, Bart

    2012-01-01

    Chromatin immunoprecipitation (ChIP) is a commonly used technique to detect the in vivo binding of proteins to DNA. ChIP is now routinely paired to microarray analysis (ChIP-chip) or next-generation sequencing (ChIP-Seq) to profile the DNA occupancy of proteins of interest on a genome-wide level. Because ChIP-chip introduces several biases, most notably due to the use of a fixed number of probes, ChIP-Seq has quickly become the method of choice as, depending on the sequencing depth, it is more sensitive, quantitative, and provides a greater binding site location resolution. With the ever increasing number of reads that can be generated per sequencing run, it has now become possible to analyze several samples simultaneously while maintaining sufficient sequence coverage, thus significantly reducing the cost per ChIP-Seq experiment. In this chapter, we provide a step-by-step guide on how to perform multiplexed ChIP-Seq analyses. As a proof-of-concept, we focus on the genome-wide profiling of RNA Polymerase II as measuring its DNA occupancy at different stages of any biological process can provide insights into the gene regulatory mechanisms involved. However, the protocol can also be used to perform multiplexed ChIP-Seq analyses of other DNA-binding proteins such as chromatin modifiers and transcription factors.
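    Multiplexing several ChIP-Seq samples in one sequencing run means reads must later be sorted back to their samples by barcode. The sketch below illustrates that demultiplexing step only. It is not the chapter's protocol: it assumes, for illustration, that each read begins with its sample barcode, that all barcodes have the same length, and that only exact matches are assigned. The sample names and barcode sequences are invented.

```python
# Minimal barcode demultiplexing sketch. Assumptions (not from the source):
# barcodes sit at the start of each read, share one length, and must match
# exactly. Sample names and barcodes below are hypothetical.
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by sample and trim the barcode from assigned reads.

    `reads` is an iterable of sequence strings; `barcodes` maps a sample
    name to its barcode. Reads whose prefix matches no barcode are kept
    untrimmed under the key "unassigned".
    """
    sample_by_barcode = {bc: sample for sample, bc in barcodes.items()}
    lengths = {len(bc) for bc in sample_by_barcode}
    if len(lengths) != 1:
        raise ValueError("expected at least one barcode, all of equal length")
    n = lengths.pop()

    bins = defaultdict(list)
    for read in reads:
        sample = sample_by_barcode.get(read[:n])
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[n:])
    return dict(bins)


if __name__ == "__main__":
    barcodes = {"stage1": "ACGT", "stage2": "TGCA"}
    reads = ["ACGTGGGG", "TGCATTTT", "NNNNCCCC", "ACGTAAAA"]
    for sample, seqs in demultiplex(reads, barcodes).items():
        print(sample, seqs)
```

    Production demultiplexers usually tolerate one or more barcode mismatches and work on FASTQ records with quality scores; both are omitted here to keep the example short.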

  16. Genome sequence and description of Anaerosalibacter massiliensis sp. nov.

    Directory of Open Access Journals (Sweden)

    N. Dione

    2016-03-01

    Full Text Available Anaerosalibacter massiliensis sp. nov. strain ND1T (= CSUR P762 = DSM 27308) is the type strain of A. massiliensis sp. nov., a new species within the genus Anaerosalibacter. This strain, the genome of which is described here, was isolated from the faecal flora of a 49-year-old healthy Brazilian man. Anaerosalibacter massiliensis is a Gram-positive, obligate anaerobic rod and member of the family Clostridiaceae. With the complete genome sequence and annotation, we describe here the features of this organism. The 3 197 911 bp long genome (one chromosome but no plasmid) contains 3271 protein-coding and 62 RNA genes, including six rRNA genes.

  17. Molecular characterization of genome segments 1 and 3 encoding two capsid proteins of Antheraea mylitta cytoplasmic polyhedrosis virus

    Directory of Open Access Journals (Sweden)

    Chakrabarti Mrinmay

    2010-08-01

    Full Text Available Abstract Background Antheraea mylitta cytoplasmic polyhedrosis virus (AmCPV), a cypovirus of the family Reoviridae, infects the Indian non-mulberry silkworm, Antheraea mylitta, and contains 11 segmented double stranded RNA (S1-S11) in its genome. Some of its genome segments (S2 and S6-S11) have been previously characterized but genome segments encoding viral capsid have not been characterized. Results In this study genome segments 1 (S1) and 3 (S3) of AmCPV were converted to cDNA, cloned and sequenced. S1 consisted of 3852 nucleotides, with one long ORF of 3735 nucleotides and could encode a protein of 1245 amino acids with molecular mass of ~141 kDa. Similarly, S3 consisted of 3784 nucleotides having a long ORF of 3630 nucleotides and could encode a protein of 1210 amino acids with molecular mass of ~137 kDa. BLAST analysis showed 20-22% homology of S1 and S3 sequence with spike and capsid proteins, respectively, of other closely related cypoviruses like Bombyx mori CPV (BmCPV), Lymantria dispar CPV (LdCPV), and Dendrolimus punctatus CPV (DpCPV). The ORFs of S1 and S3 were expressed as 141 kDa and 137 kDa insoluble His-tagged fusion proteins, respectively, in Escherichia coli M15 cells via pQE-30 vector, purified through Ni-NTA chromatography and polyclonal antibodies were raised. Immunoblot analysis of purified polyhedra, virion particles and virus infected mid-gut cells with the raised anti-p137 and anti-p141 antibodies showed specific immunoreactive bands and suggested that S1 and S3 may code for viral structural proteins. Expression of S1 and S3 ORFs in insect cells via baculovirus recombinants was shown by transmission electron microscopy to produce virus-like particles (VLPs). Immunogold staining showed that S3 encoded proteins self assembled to form viral outer capsid and VLPs maintained their stability at different pH in presence of S1 encoded protein. Conclusion Our results of cloning, sequencing and functional analysis of AmCPV S1 and S3 indicate that S3

  18. Reading the maps: Organization and function of chromatin types in Drosophila

    NARCIS (Netherlands)

    Braunschweig, U.

    2010-01-01

    The work presented in this thesis shows that the Drosophila genome is organized in chromatin domains with many implications for gene regulation, nuclear organization, and evolution. Furthermore it provides examples of how maps of chromatin protein binding, combined with computational approaches, can

  19. CGUG: in silico proteome and genome parsing tool for the determination of "core" and unique genes in the analysis of genomes up to ca. 1.9 Mb

    Directory of Open Access Journals (Sweden)

    Mahadevan Padmanabhan

    2009-08-01

    Full Text Available Abstract Background Viruses and small-genome bacteria (~2 megabases and smaller) comprise a considerable population in the biosphere and are of interest to many researchers. These genomes are now sequenced at an unprecedented rate and require complementary computational tools to analyze. "CoreGenesUniqueGenes" (CGUG) is an in silico genome data mining tool that determines a "core" set of genes from two to five organisms with genomes in this size range. Core and unique genes may reflect similar niches and needs, and may be used in classifying organisms. Findings CGUG is available at http://binf.gmu.edu/geneorder.html as a web-based on-the-fly tool that performs iterative BLASTP analyses using a reference genome and up to four query genomes to provide a table of genes common to these genomes. The result is an in silico display of genomes and their proteomes, allowing for further analysis. CGUG can be used for "genome annotation by homology", as demonstrated with Chlamydophila and Francisella genomes. Conclusion CGUG is used to reanalyze the ICTV-based classifications of bacteriophages, to reconfirm long-standing relationships and to explore new classifications. These genomes have been problematic in the past, due largely to horizontal gene transfers. CGUG is validated as a tool for reannotating small genome bacteria using more up-to-date annotations by similarity or homology. These serve as an entry point for wet-bench experiments to confirm the functions of these "hypothetical" and "unknown" proteins.

  20. The genomic view of genes responsive to the antagonistic phytohormones, abscisic acid, and gibberellin.

    Science.gov (United States)

    Yazaki, Junshi; Kikuchi, Shoshi

    2005-01-01

    Various genomics tools are now available for a monocot (Oryza sativa) and a dicot (Arabidopsis thaliana) plant. Plants are not only a very important agricultural resource but also model organisms for biological research. It is important to use genomics tools to investigate how the interaction between ABA and GA controls the transition from embryogenesis to germination in seeds. Studies of this kind have investigated the relationship between dormancy and germination. Genomics tools identified genes that had never before been annotated as ABA- or GA-responsive genes in plants, detected new interactions between genes responsive to the two hormones, comprehensively characterized cis-elements of hormone-responsive genes, and characterized cis-elements of rice and Arabidopsis. In this research, ABA- and GA-regulated genes have been classified as functional proteins (proteins that probably function in stress or PR tolerance) and regulatory proteins (protein factors involved in further regulation of signal transduction). Comparison between ABA- and/or GA-responsive genes in rice and those in Arabidopsis has shown that the cis-elements are specific to each species. cis-Elements for the dehydration-stress response have been specified in Arabidopsis but not in rice. cis-Elements for protein storage are remarkably richer in the upstream regions of the rice genes than in those of Arabidopsis.

  1. Analyses of charophyte chloroplast genomes help characterize the ancestral chloroplast genome of land plants.

    Science.gov (United States)

    Civaň, Peter; Foster, Peter G; Embley, Martin T; Séneca, Ana; Cox, Cymon J

    2014-04-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes.

  2. Genome organization of the SARS-CoV

    DEFF Research Database (Denmark)

    Xu, Jing; Hu, Jianfei; Wang, Jing

    2003-01-01

    Annotation of the genome sequence of the SARS-CoV (severe acute respiratory syndrome-associated coronavirus) is indispensable to understand its evolution and pathogenesis. We have performed a full annotation of the SARS-CoV genome sequences by using annotation programs publicly available or devel...

  3. The family Rhabdoviridae: Mono- and bipartite negative-sense RNA viruses with diverse genome organization and common evolutionary origins

    Science.gov (United States)

    Dietzgen, Ralf G.; Kondo, Hideki; Goodin, Michael M.; Kurath, Gael; Vasilakis, Nikos

    2017-01-01

    The family Rhabdoviridae consists of mostly enveloped, bullet-shaped or bacilliform viruses with a negative-sense, single-stranded RNA genome that infect vertebrates, invertebrates or plants. This ecological diversity is reflected by the diversity and complexity of their genomes. Five canonical structural protein genes are conserved in all rhabdoviruses, but may be overprinted, overlapped or interspersed with several novel and diverse accessory genes. This review gives an overview of the characteristics and diversity of rhabdoviruses, their taxonomic classification, replication mechanism, properties of classical rhabdoviruses such as rabies virus and rhabdoviruses with complex genomes, rhabdoviruses infecting aquatic species, and plant rhabdoviruses with both mono- and bipartite genomes.

  4. COMe: the ontology of bioinorganic proteins

    Directory of Open Access Journals (Sweden)

    Contrino Sergio

    2004-02-01

    Full Text Available Abstract Background Many characterised proteins contain metal ions, small organic molecules or modified residues. In contrast, the huge amount of data generated by genome projects consists exclusively of sequences with almost no annotation. One of the goals of the structural genomics initiative is to provide representative three-dimensional (3-D structures for as many protein/domain folds as possible to allow successful homology modelling. However, important functional features such as metal co-ordination or a type of prosthetic group are not always conserved in homologous proteins. So far, the problem of correct annotation of bioinorganic proteins has been largely ignored by the bioinformatics community and information on bioinorganic centres obtained by methods other than crystallography or NMR is only available in literature databases. Results COMe (Co-Ordination of Metals represents the ontology for bioinorganic and other small molecule centres in complex proteins. COMe consists of three types of entities: 'bioinorganic motif' (BIM, 'molecule' (MOL, and 'complex proteins' (PRX, with each entity being assigned a unique identifier. A BIM consists of at least one centre (metal atom, inorganic cluster, organic molecule and two or more endogenous and/or exogenous ligands. BIMs are represented as one-dimensional (1-D strings and 2-D diagrams. A MOL entity represents a 'small molecule' which, when in complex with one or more polypeptides, forms a functional protein. The PRX entities refer to the functional proteins as well as to separate protein domains and subunits. The complex proteins in COMe are subdivided into three categories: (i metalloproteins, (ii organic prosthetic group proteins and (iii modified amino acid proteins. The data are currently stored in both XML format and a relational database and are available at http://www.ebi.ac.uk/come/. 
Conclusion COMe provides the classification of proteins according to their 'bioinorganic' features
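
    The three entity types described in this record can be pictured as a small data model. The following is only a minimal sketch under stated assumptions: the class names, field names, and identifier strings are hypothetical and are not taken from the COMe schema. The one rule it encodes comes from the abstract: a BIM has at least one centre and two or more ligands. The zinc example is illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class BIM:
    """Bioinorganic motif: at least one centre and two or more ligands."""
    identifier: str   # hypothetical identifier format
    centres: list     # e.g. metal atoms, inorganic clusters, organic molecules
    ligands: list     # endogenous and/or exogenous ligands

    def __post_init__(self):
        if len(self.centres) < 1:
            raise ValueError("a BIM needs at least one centre")
        if len(self.ligands) < 2:
            raise ValueError("a BIM needs two or more ligands")


@dataclass
class MOL:
    """Small molecule that forms a functional protein when bound to polypeptides."""
    identifier: str
    name: str


@dataclass
class PRX:
    """Complex protein (or a separate domain or subunit) with its motifs and molecules."""
    identifier: str
    category: str     # "metalloprotein", "organic prosthetic group", or "modified amino acid"
    motifs: list = field(default_factory=list)
    molecules: list = field(default_factory=list)


# Illustrative entry: a zinc centre with four protein ligands.
zinc_site = BIM("BIM-example-1", centres=["Zn"], ligands=["Cys", "Cys", "His", "His"])
protein = PRX("PRX-example-1", category="metalloprotein", motifs=[zinc_site])
```

    Putting the validation in the constructor means a motif with too few ligands is rejected when it is created rather than when it is later queried.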

  5. Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets, and Homology Models.

    Directory of Open Access Journals (Sweden)

    2005-08-01

    Full Text Available The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.

  6. Comparative genome analysis to identify SNPs associated with high oleic acid and elevated protein content in soybean.

    Science.gov (United States)

    Kulkarni, Krishnanand P; Patil, Gunvant; Valliyodan, Babu; Vuong, Tri D; Shannon, J Grover; Nguyen, Henry T; Lee, Jeong-Dong

    2018-03-01

    The objective of this study was to determine the genetic relationship between the oleic acid and protein content. The genotypes having high oleic acid and elevated protein (HOEP) content were crossed with five elite lines having normal oleic acid and average protein (NOAP) content. The selected accessions were grown at six environments in three different locations and phenotyped for protein, oil, and fatty acid components. The mean protein content of parents, HOEP, and NOAP lines was 34.6%, 38%, and 34.9%, respectively. The oleic acid concentration of parents, HOEP, and NOAP lines was 21.7%, 80.5%, and 20.8%, respectively. The HOEP plants carried both FAD2-1A (S117N) and FAD2-1B (P137R) mutant alleles contributing to the high oleic acid phenotype. Comparative genome analysis using whole-genome resequencing data identified six genes having single nucleotide polymorphism (SNP) significantly associated with the traits analyzed. A single SNP in the putative gene Glyma.10G275800 was associated with the elevated protein content, and palmitic, oleic, and linoleic acids. The genes from the marker intervals of previously identified QTL did not carry SNPs associated with protein content and fatty acid composition in the lines used in this study, indicating that all the genes except Glyma.10G278000 may be the new genes associated with the respective traits.

  7. Genomic Organization and Expression of Iron Metabolism Genes in the Emerging Pathogenic Mold Scedosporium apiospermum

    Directory of Open Access Journals (Sweden)

    Yohann Le Govic

    2018-04-01

    Full Text Available The ubiquitous mold Scedosporium apiospermum is increasingly recognized as an emerging pathogen, especially among patients with underlying disorders such as immunodeficiency or cystic fibrosis (CF). Indeed, it ranks second among the filamentous fungi colonizing the respiratory tract of CF patients. However, our knowledge about virulence factors of this fungus is still limited. The role of iron-uptake systems may be critical for establishment of Scedosporium infections, notably in the iron-rich environment of the CF lung. Two main strategies are employed by fungi to efficiently acquire iron from their host or from their ecological niche: siderophore production and reductive iron assimilation (RIA) systems. The aim of this study was to assess the existence of orthologous genes involved in iron metabolism in the recently sequenced genome of S. apiospermum. First, a tBLASTn analysis using A. fumigatus iron-related proteins as query revealed orthologs of almost all relevant loci in the S. apiospermum genome. Whereas the genes putatively involved in RIA were randomly distributed, siderophore biosynthesis and transport genes were organized in two clusters, each containing a non-ribosomal peptide synthetase (NRPS) whose orthologs in A. fumigatus have been described to catalyze hydroxamate siderophore synthesis. Nevertheless, comparative genomic analysis of siderophore-related clusters showed greater similarity between S. apiospermum and phylogenetically close molds than with Aspergillus species. The expression level of these genes was then evaluated by exposing conidia to iron starvation and iron excess. The expression of several orthologs of A. fumigatus genes involved in siderophore-based iron uptake or RIA was significantly induced during iron starvation, and conversely repressed in iron excess conditions. Altogether, these results indicate that S. apiospermum possesses the genetic information required for efficient and competitive iron uptake

  8. Gene design, cloning and protein-expression methods for high-value targets at the Seattle Structural Genomics Center for Infectious Disease

    International Nuclear Information System (INIS)

    Raymond, Amy; Haffner, Taryn; Ng, Nathan; Lorimer, Don; Staker, Bart; Stewart, Lance

    2011-01-01

    An overview of one salvage strategy for high-value SSGCID targets is given. Any structural genomics endeavor, particularly an ambitious one such as the NIAID-funded Seattle Structural Genomics Center for Infectious Disease (SSGCID) or the Center for Structural Genomics of Infectious Disease (CSGID), faces technical challenges at all points of the production pipeline. One salvage strategy employed by SSGCID is combined gene engineering and structure-guided construct design to overcome challenges at the levels of protein expression and protein crystallization. Multiple constructs of each target are cloned in parallel using Polymerase Incomplete Primer Extension cloning, and small-scale expressions of these are rapidly analyzed by capillary electrophoresis. Using the methods reported here, which have proven particularly useful for high-value targets, otherwise intractable targets can be resolved

  9. Culture independent genomic comparisons reveal environmental adaptations for Altiarchaeales

    Directory of Open Access Journals (Sweden)

    Jordan T Bird

    2016-08-01

    Full Text Available The recently proposed candidatus order Altiarchaeales remains an uncultured archaeal lineage composed of genetically diverse, globally widespread organisms frequently observed in anoxic subsurface environments. In spite of 15 years of studies on the psychrophilic biofilm-producing Candidatus (Ca.) Altiarchaeum hamiconexum and its close relatives, very little is known about the phylogenetic and functional diversity of the widespread free-living marine members of this taxon. From methanogenic sediments in the White Oak River Estuary, NC, we sequenced a single cell amplified genome (SAG), WOR_SCG_SM1, and used it to identify and refine two high-quality genomes from metagenomes, WOR_79 and WOR_86-2, from the same site in a different year. These three genomic reconstructions form a monophyletic group which also includes three previously published genomes from metagenomes from terrestrial springs and a SAG from Sakinaw Lake in a group previously designated as pMC2A384. A synapomorphic mutation in the Altiarchaeales tRNA synthetase β subunit, pheT, causes the protein to be encoded as two subunits at distant loci. Consistent with the terrestrial spring clades, our estuarine genomes contain a near-complete autotrophic metabolism, H2 or CO as potential electron donors, a reductive acetyl-CoA pathway for carbon fixation, and methylotroph-like NADP(H)-dependent dehydrogenase. Phylogenies based on 16S rRNA genes and concatenated conserved proteins identify two distinct sub-clades of Altiarchaeales, Alti-1 populated by organisms from actively flowing springs, and Alti-2 which is more widespread, diverse, and not associated with visible mats. The core Alti-1 genome supports Alti-1 as adapted for the stream environment, with lipopolysaccharide production capacity and extracellular hami structures. The core Alti-2 genome suggests that members of this clade are free-living, with distinct mechanisms for energy maintenance, motility, osmoregulation, and sulfur redox reactions.
These

  10. Insulator function and topological domain border strength scale with architectural protein occupancy

    Science.gov (United States)

    2014-01-01

    Background Chromosome conformation capture studies suggest that eukaryotic genomes are organized into structures called topologically associating domains. The borders of these domains are highly enriched for architectural proteins with characterized roles in insulator function. However, a majority of architectural protein binding sites localize within topological domains, suggesting sites associated with domain borders represent a functionally different subclass of these regulatory elements. How topologically associating domains are established and what differentiates border-associated from non-border architectural protein binding sites remain unanswered questions. Results By mapping the genome-wide target sites for several Drosophila architectural proteins, including previously uncharacterized profiles for TFIIIC and SMC-containing condensin complexes, we uncover an extensive pattern of colocalization in which architectural proteins establish dense clusters at the borders of topological domains. Reporter-based enhancer-blocking insulator activity as well as endogenous domain border strength scale with the occupancy level of architectural protein binding sites, suggesting co-binding by architectural proteins underlies the functional potential of these loci. Analyses in mouse and human stem cells suggest that clustering of architectural proteins is a general feature of genome organization, and conserved architectural protein binding sites may underlie the tissue-invariant nature of topologically associating domains observed in mammals. Conclusions We identify a spectrum of architectural protein occupancy that scales with the topological structure of chromosomes and the regulatory potential of these elements. Whereas high occupancy architectural protein binding sites associate with robust partitioning of topologically associating domains and robust insulator function, low occupancy sites appear reserved for gene-specific regulation within topological domains. PMID

  11. The zebrafish reference genome sequence and its relationship to the human genome.

    Science.gov (United States)

    Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; 
Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L

    2013-04-25

    Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.

  12. CTCF Mediates the Cell-Type Specific Spatial Organization of the Kcnq5 Locus and the Local Gene Regulation

    OpenAIRE

    Ren, Licheng; Wang, Yang; Shi, Minglei; Wang, Xiaoning; Yang, Zhong; Zhao, Zhihu

    2012-01-01

    Chromatin loops play important roles in the dynamic spatial organization of genes in the nucleus. Growing evidence has revealed that the multivalent functional zinc finger protein CCCTC-binding factor (CTCF) is a master regulator of genome spatial organization, and mediates the ubiquitous chromatin loops within the genome. Using circular chromosome conformation capture (4C) methodology, we discovered that CTCF may be a master organizer in mediating the spatial organization of the kcnq5 gene l...

  13. Exploiting a Reference Genome in Terms of Duplications: The Network of Paralogs and Single Copy Genes in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Mara Sangiovanni

    2013-12-01

    Full Text Available Arabidopsis thaliana became the model organism for plant studies because of its small diploid genome, rapid lifecycle and short adult size. Its genome was the first among plants to be sequenced, becoming the reference in plant genomics. However, the Arabidopsis genome is characterized by an inherently complex organization, since it has undergone ancient whole genome duplications, followed by gene reduction, diploidization events and extended rearrangements, which relocated and split up the retained portions. These events, together with probable chromosome reductions, dramatically increased the genome complexity, limiting its role as a reference. The identification of paralogs and single copy genes within a highly duplicated genome is a prerequisite to understand its organization and evolution and to improve its exploitation in comparative genomics. This is still controversial, even in the widely studied Arabidopsis genome. This is also due to the lack of a reference bioinformatics pipeline that could exhaustively identify paralogs and singleton genes. We describe here a complete computational strategy to detect both duplicated and single copy genes in a genome, discussing all the methodological issues that may strongly affect the results, their quality and their reliability. This approach was used to analyze the organization of Arabidopsis nuclear protein coding genes, and besides classifying computationally defined paralogs into networks and single copy genes into different classes, it unraveled further intriguing aspects concerning the genome annotation and the gene relationships in this reference plant species. Since our results may be useful for comparative genomics and genome functional analyses, we organized a dedicated web interface to make them accessible to the scientific community.
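
    The idea of sorting genes into paralog networks and single copy genes can be illustrated with a toy example. The sketch below is not the authors' pipeline. It assumes that pairwise similarity hits have already been computed and filtered by some threshold (a step not shown here), and it treats a "network" simply as a connected component of the resulting gene-to-gene graph. The function name and gene labels are hypothetical.

```python
from collections import defaultdict


def paralog_networks(genes, hits):
    """Split genes into connected components of the similarity graph.

    genes: list of gene identifiers.
    hits:  (gene_a, gene_b) pairs that passed a similarity filter;
           both members of each pair must appear in `genes`.
    Returns (networks, singletons): components with more than one gene,
    and genes with no retained hit to any other gene.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the component root, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = defaultdict(set)
    for g in parent:
        groups[find(g)].add(g)

    networks = sorted(sorted(members) for members in groups.values() if len(members) > 1)
    singletons = sorted(next(iter(members)) for members in groups.values() if len(members) == 1)
    return networks, singletons


genes = ["g1", "g2", "g3", "g4", "g5"]
hits = [("g1", "g2"), ("g2", "g3")]
networks, singletons = paralog_networks(genes, hits)
print(networks)    # one network containing g1, g2, g3
print(singletons)  # g4 and g5 have no retained hits
```

    In this toy model g1 and g3 land in the same network through g2 even though they share no direct hit, which is one reason the choice of similarity threshold can strongly affect the result.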

  14. Chloroplast Genome of the Folk Medicine and Vegetable Plant Talinum paniculatum (Jacq.) Gaertn.: Gene Organization, Comparative and Phylogenetic Analysis.

    Science.gov (United States)

    Liu, Xia; Li, Yuan; Yang, Hongyuan; Zhou, Boyang

    2018-04-09

    The complete chloroplast (cp) genome of Talinum paniculatum (Caryophyllales), a plant with pharmaceutical efficacy similar to that of ginseng and a widely distributed and planted edible vegetable, was sequenced and analyzed. The cp genome size of T. paniculatum is 156,929 bp, with a pair of inverted repeats (IRs) of 25,751 bp separated by a large single copy (LSC) region of 86,898 bp and a small single copy (SSC) region of 18,529 bp. The genome contains 83 protein-coding genes, 37 transfer RNA (tRNA) genes, eight ribosomal RNA (rRNA) genes and four pseudogenes. Fifty-one (51) repeat units and ninety-two (92) simple sequence repeats (SSRs) were found in the genome. Sequence alignment showed that the pseudogene rpl23 (ribosomal protein L23), which is located in the IR region, carries an AATT insertion relative to other Caryophyllales species. The trnK-UUU (tRNA-Lys) and rpl16 (ribosomal protein L16) genes have larger introns in T. paniculatum, and the matK (maturase K) gene, which is usually located in the intron of trnK-UUU, shows rich sequence divergence in Caryophyllales. Complete cp genome comparison with eight other Caryophyllales species indicated that the differences between T. paniculatum and P. oleracea were very slight, and the most highly divergent regions occurred in intergenic spacers. Comparisons of IR boundaries among nine Caryophyllales species showed that T. paniculatum has a larger IR region and that the contraction is relatively slight. The phylogenetic analysis among 35 Caryophyllales species and two outgroup species revealed that T. paniculatum and P. oleracea do not belong to the same family. All these results provide good opportunities for future identification and barcoding of Talinum species, for understanding the evolutionary mode of the Caryophyllales cp genome, and for molecular breeding of T. paniculatum with high pharmaceutical efficacy.
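
    The region lengths reported in this record can be checked against the stated genome size. The short sketch below assumes the usual quadripartite layout in which the total is LSC + SSC + 2 × IR; the variable and function names are illustrative only.

```python
# Region lengths in bp, as reported in the abstract for T. paniculatum.
LSC = 86_898            # large single copy region
SSC = 18_529            # small single copy region
IR = 25_751             # one inverted repeat; the genome carries a pair
REPORTED_TOTAL = 156_929


def quadripartite_length(lsc, ssc, ir):
    """Total length of a cp genome made of LSC, SSC, and two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
ir_fraction = 2 * IR / total  # share of the genome occupied by both IR copies

print(total == REPORTED_TOTAL)  # True: 86,898 + 18,529 + 2 * 25,751 = 156,929
print(f"{ir_fraction:.1%}")
```

    The component lengths add up exactly to the reported 156,929 bp, so the figures in the abstract are internally consistent.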

  15. Genome packaging in viruses

    OpenAIRE

    Sun, Siyang; Rao, Venigalla B.; Rossmann, Michael G.

    2010-01-01

    Genome packaging is a fundamental process in a viral life cycle. Many viruses assemble preformed capsids into which the genomic material is subsequently packaged. These viruses use a packaging motor protein that is driven by the hydrolysis of ATP to condense the nucleic acids into a confined space. How these motor proteins package viral genomes had been poorly understood until recently, when a few X-ray crystal structures and cryo-electron microscopy structures became available. Here we discu...

  16. Comparative Genomics Identifies Epidermal Proteins Associated with the Evolution of the Turtle Shell.

    Science.gov (United States)

    Holthaus, Karin Brigit; Strasser, Bettina; Sipos, Wolfgang; Schmidt, Heiko A; Mlitz, Veronika; Sukseree, Supawadee; Weissenbacher, Anton; Tschachler, Erwin; Alibardi, Lorenzo; Eckhart, Leopold

    2016-03-01

    The evolution of reptiles, birds, and mammals was associated with the origin of unique integumentary structures. Studies on lizards, chicken, and humans have suggested that the evolution of major structural proteins of the outermost, cornified layers of the epidermis was driven by the diversification of a gene cluster called Epidermal Differentiation Complex (EDC). Turtles have evolved unique defense mechanisms that depend on mechanically resilient modifications of the epidermis. To investigate whether the evolution of the integument in these reptiles was associated with specific adaptations of the sequences and expression patterns of EDC-related genes, we utilized newly available genome sequences to determine the epidermal differentiation gene complement of turtles. The EDC of the western painted turtle (Chrysemys picta bellii) comprises more than 100 genes, including at least 48 genes that encode proteins referred to as beta-keratins or corneous beta-proteins. Several EDC proteins have evolved cysteine/proline contents beyond 50% of total amino acid residues. Comparative genomics suggests that distinct subfamilies of EDC genes have been expanded and partly translocated to loci outside of the EDC in turtles. Gene expression analysis in the European pond turtle (Emys orbicularis) showed that EDC genes are differentially expressed in the skin of the various body sites and that a subset of beta-keratin genes within the EDC as well as those located outside of the EDC are expressed predominantly in the shell. Our findings give strong support to the hypothesis that the evolutionary innovation of the turtle shell involved specific molecular adaptations of epidermal differentiation. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  17. The family Rhabdoviridae: mono- and bipartite negative-sense RNA viruses with diverse genome organization and common evolutionary origins.

    Science.gov (United States)

    Dietzgen, Ralf G; Kondo, Hideki; Goodin, Michael M; Kurath, Gael; Vasilakis, Nikos

    2017-01-02

    The family Rhabdoviridae consists of mostly enveloped, bullet-shaped or bacilliform viruses with a negative-sense, single-stranded RNA genome that infect vertebrates, invertebrates or plants. This ecological diversity is reflected by the diversity and complexity of their genomes. Five canonical structural protein genes are conserved in all rhabdoviruses, but may be overprinted, overlapped or interspersed with several novel and diverse accessory genes. This review gives an overview of the characteristics and diversity of rhabdoviruses, their taxonomic classification, replication mechanism, properties of classical rhabdoviruses such as rabies virus and rhabdoviruses with complex genomes, rhabdoviruses infecting aquatic species, and plant rhabdoviruses with both mono- and bipartite genomes. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. Complete genome sequence of Enterobacter sp. IIT-BT 08: A potential microbial strain for high rate hydrogen production.

    Science.gov (United States)

    Khanna, Namita; Ghosh, Ananta Kumar; Huntemann, Marcel; Deshpande, Shweta; Han, James; Chen, Amy; Kyrpides, Nikos; Mavrommatis, Kostas; Szeto, Ernest; Markowitz, Victor; Ivanova, Natalia; Pagani, Ioanna; Pati, Amrita; Pitluck, Sam; Nolan, Matt; Woyke, Tanja; Teshima, Hazuki; Chertkov, Olga; Daligault, Hajnalka; Davenport, Karen; Gu, Wei; Munk, Christine; Zhang, Xiaojing; Bruce, David; Detter, Chris; Xu, Yan; Quintana, Beverly; Reitenga, Krista; Kunde, Yulia; Green, Lance; Erkkila, Tracy; Han, Cliff; Brambilla, Evelyne-Marie; Lang, Elke; Klenk, Hans-Peter; Goodwin, Lynne; Chain, Patrick; Das, Debabrata

    2013-12-20

    Enterobacter sp. IIT-BT 08 belongs to Phylum: Proteobacteria, Class: Gammaproteobacteria, Order: Enterobacteriales, Family: Enterobacteriaceae. The organism was isolated from the leaves of a local plant near the Kharagpur railway station, Kharagpur, West Bengal, India. It has been extensively studied for fermentative hydrogen production because of its high hydrogen yield. To support further enhancement of hydrogen production through strain development, complete genome sequence analysis was carried out. Sequence analysis revealed that the genome was linear, 4.67 Mbp long and had a GC content of 56.01%. The genome encodes 4,393 protein-coding and 179 RNA genes. Additionally, a putative pathway of hydrogen production was suggested based on the presence of the formate hydrogen lyase complex and other related genes identified in the genome. Thus, in the present study we describe the specific properties of the organism and the generation, annotation and analysis of its genome sequence, and discuss the putative pathway of hydrogen production by this organism.
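
    GC content, quoted above as 56.01%, is the percentage of G and C bases in a sequence. The following is a minimal sketch of that calculation on a short made-up sequence; it is not the authors' annotation workflow, and the choice to skip ambiguous bases such as N is an assumption of this example.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


toy = "ATGCGCGGTTAACCGG"  # made-up 16-base sequence with 10 G/C bases
print(gc_content(toy))    # 62.5
```

    Applied to the reported figures, a 4.67 Mbp genome at 56.01% GC corresponds to roughly 2.6 million G or C bases.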

  19. Integrating genomic information with protein sequence and 3D atomic level structure at the RCSB protein data bank.

    Science.gov (United States)

    Prlic, Andreas; Kalro, Tara; Bhattacharya, Roshni; Christie, Cole; Burley, Stephen K; Rose, Peter W

    2016-12-15

    The Protein Data Bank (PDB) now contains more than 120,000 three-dimensional (3D) structures of biological macromolecules. To allow an interpretation of how PDB data relates to other publicly available annotations, we developed a novel data integration platform that maps 3D structural information across various datasets. This integration bridges from the human genome across protein sequence to 3D structure space. We developed novel software solutions for data management and visualization, while incorporating new libraries for web-based visualization using SVG graphics. The new views are available from http://www.rcsb.org and software is available from https://github.com/rcsb/. Contact: andreas.prlic@rcsb.org. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  20. Human-specific protein isoforms produced by novel splice sites in the human genome after the human-chimpanzee divergence

    Directory of Open Access Journals (Sweden)

    Kim Dong Seon

    2012-11-01

    Full Text Available Abstract Background Evolution of splice sites is a well-known phenomenon that results in transcript diversity during human evolution. Many novel splice sites are derived from repetitive elements and may not contribute to protein products. Here, we analyzed annotated human protein-coding exons and identified human-specific splice sites that arose after the human-chimpanzee divergence. Results We analyzed multiple alignments of the annotated human protein-coding exons and their respective orthologous mammalian genome sequences to identify 85 novel splice sites (50 splice acceptors and 35 donors) in the human genome. The novel protein-coding exons, which are expressed either constitutively or alternatively, produce novel protein isoforms by insertion, deletion, or frameshift. We found three cases in which the human-specific isoform conferred novel molecular function in the human cells: the human-specific IMUP protein isoform induces apoptosis of the trophoblast and is implicated in pre-eclampsia; the intronization of a part of SMOX gene exon produces inactive spermine oxidase; the human-specific NUB1 isoform shows reduced interaction with ubiquitin-like proteins, possibly affecting ubiquitin pathways. Conclusions Although the generation of novel protein isoforms does not equate to adaptive evolution, we propose that these cases are useful candidates for a molecular functional study to identify proteomic changes that might bring about novel phenotypes during human evolution.

  1. Chitinase family GH18: evolutionary insights from the genomic history of a diverse protein family

    Directory of Open Access Journals (Sweden)

    Aronson Nathan N

    2007-06-01

    Full Text Available Abstract Background Chitinases (EC 3.2.1.14) hydrolyze the β-1,4-linkages in chitin, an abundant N-acetyl-β-D-glucosamine polysaccharide that is a structural component of protective biological matrices such as insect exoskeletons and fungal cell walls. The glycoside hydrolase 18 (GH18) family of chitinases is an ancient gene family widely expressed in archaea, prokaryotes and eukaryotes. Mammals are not known to synthesize chitin or metabolize it as a nutrient, yet the human genome encodes eight GH18 family members. Some GH18 proteins lack an essential catalytic glutamic acid and are likely to act as lectins rather than as enzymes. This study used comparative genomic analysis to address the evolutionary history of the GH18 multiprotein family, from early eukaryotes to mammals, in an effort to understand the forces that shaped the human genome content of chitinase-related proteins. Results Gene duplication and loss according to a birth-and-death model of evolution is a feature of the evolutionary history of the GH18 family. The current human family likely originated from ancient genes present at the time of the bilaterian expansion (approx. 550 mya). The family expanded in the chitinous protostomes C. elegans and D. melanogaster, declined in early deuterostomes as chitin synthesis disappeared, and expanded again in late deuterostomes with a significant increase in gene number after the avian/mammalian split. Conclusion This comprehensive genomic study of animal GH18 proteins reveals three major phylogenetic groups in the family: chitobiases, chitinases/chitolectins, and stabilin-1 interacting chitolectins. Only the chitinase/chitolectin group is associated with expansion in late deuterostomes. Finding that the human GH18 gene family is closely linked to the human major histocompatibility complex paralogon on chromosome 1, together with the recent association of GH18 chitinase activity with Th2 cell inflammation, suggests that its late expansion

  2. Growth-Phase-Specific Modulation of Cell Morphology and Gene Expression by an Archaeal Histone Protein.

    Science.gov (United States)

    Dulmage, Keely A; Todor, Horia; Schmid, Amy K

    2015-09-08

    In all three domains of life, organisms use nonspecific DNA-binding proteins to compact and organize the genome as well as to regulate transcription on a global scale. Histone is the primary eukaryotic nucleoprotein, and its evolutionary roots can be traced to the archaea. However, not all archaea use this protein as the primary DNA-packaging component, raising questions regarding the role of histones in archaeal chromatin function. Here, quantitative phenotyping, transcriptomic, and proteomic assays were performed on deletion and overexpression mutants of the sole histone protein of the hypersaline-adapted haloarchaeal model organism Halobacterium salinarum. This protein is highly conserved among all sequenced haloarchaeal species and maintains hallmark residues required for eukaryotic histone functions. Surprisingly, despite this conservation at the sequence level, unlike in other archaea or eukaryotes, H. salinarum histone is required to regulate cell shape but is not necessary for survival. Genome-wide expression changes in histone deletion strains were global, significant but subtle in terms of fold change, bidirectional, and growth phase dependent. Mass spectrometric proteomic identification of proteins from chromatin enrichments yielded levels of histone and putative nucleoid-associated proteins similar to those of transcription factors, consistent with an open and transcriptionally active genome. Taken together, these data suggest that histone in H. salinarum plays a minor role in DNA compaction but important roles in growth-phase-dependent gene expression and regulation of cell shape. Histone function in haloarchaea more closely resembles a regulator of gene expression than a chromatin-organizing protein like canonical eukaryotic histone. Histones comprise the major protein component of eukaryotic chromatin and are required for both genome packaging and global regulation of expression. The current paradigm maintains that archaea whose genes encode

  3. Complete genome sequence of Marivirga tractuosa type strain (H-43).

    Science.gov (United States)

    Pagani, Ioanna; Chertkov, Olga; Lapidus, Alla; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Nolan, Matt; Saunders, Elizabeth; Pitluck, Sam; Held, Brittany; Goodwin, Lynne; Liolios, Konstantinos; Ovchinikova, Galina; Ivanova, Natalia; Mavromatis, Konstantinos; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Jeffries, Cynthia D; Detter, John C; Han, Cliff; Tapia, Roxanne; Ngatchou-Djao, Olivier D; Rohde, Manfred; Göker, Markus; Spring, Stefan; Sikorski, Johannes; Woyke, Tanja; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2011-04-29

    Marivirga tractuosa (Lewin 1969) Nedashkovskaya et al. 2010 is the type species of the genus Marivirga, which belongs to the family Flammeovirgaceae. Members of this genus are of interest because of their gliding motility. The species is of interest because representative strains show resistance to several antibiotics, including gentamicin, kanamycin, neomycin, polymyxin and streptomycin. This is the first complete genome sequence of a member of the family Flammeovirgaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,511,574 bp long chromosome and the 4,916 bp plasmid with their 3,808 protein-coding and 49 RNA genes are a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  4. Refining low protein modular feeds for children on low protein tube feeds with organic acidaemias.

    Science.gov (United States)

    Daly, A; Evans, S; Ashmore, C; Chahal, S; Santra, S; MacDonald, A

    2017-12-01

    Children with inherited metabolic disorders (IMD) who are dependent on tube feeding and require a protein restriction are commonly fed by 'modular tube feeds' consisting of several ingredients. A longitudinal, prospective two-phase study, conducted over 18 months, assessed the long-term efficacy of a pre-measured protein-free composite feed. This was specifically designed to meet the non-protein nutritional requirements of children (aged over 1 year) with organic acidaemias on low protein enteral feeds and to be used as a supplement with an enteral feeding protein source. All non-protein individual feed ingredients were replaced with one protein-free composite feed supplying fat, carbohydrate, and micronutrients. Thirteen subjects, median age 7.4y (3-15.5y), all nutritionally tube dependent (supplying nutritional intake: ≥ 90%, n = 12; 75%, n = 1), and diagnosed with organic acidaemias (Propionic acidaemia, n = 6; Vitamin B12 non-responsive methyl malonic acidaemia, n = 4; Isovaleric acidaemia, n = 2; Glutaric aciduria type 1, n = 1) were studied. Nutritional intake, biochemistry and anthropometry were monitored at weeks -8, 0, 12, 26 and 79. Energy intake remained unchanged, providing 76% of estimated energy requirements. Dietary intakes of vitamins, minerals and essential fatty acids significantly increased from week 0 to week 79, but sodium, potassium, magnesium, docosahexaenoic acid and fibre did not meet suggested requirements. Plasma zinc, selenium, haemoglobin and MCV significantly improved, and growth remained satisfactory. Natural protein intake met WHO/FAO/UNU 2007 recommendations. A protein-free composite feed formulated to meet the non-protein nutritional requirements of children aged over 1 year improved nutritional intake, biochemical nutritional status, and simplified enteral tube feeding regimens in children with organic acidaemias.

  5. The genomes and comparative genomics of Lactobacillus delbrueckii phages.

    Science.gov (United States)

    Riipinen, Katja-Anneli; Forsman, Päivi; Alatossava, Tapani

    2011-07-01

    Lactobacillus delbrueckii phages are a great source of genetic diversity. Here, the genome sequences of Lb. delbrueckii phages LL-Ku, c5 and JCL1032 were analyzed in detail, and the genetic diversity of Lb. delbrueckii phages belonging to different taxonomic groups was explored. The lytic isometric group b phages LL-Ku (31,080 bp) and c5 (31,841 bp) showed a minimum nucleotide sequence identity of 90% over about three-fourths of their genomes. The genomic locations of their lysis modules were unique, and the genomes featured several putative overlapping transcription units of genes. LL-Ku and c5 virions displayed peptidoglycan hydrolytic activity associated with a ~36-kDa protein similar in size to the endolysin. Unexpectedly, the 49,433-bp genome of the prolate phage JCL1032 (temperate, group c) revealed a conserved gene order within its structural genes. Lb. delbrueckii phages representing groups a (a phage LL-H), b and c possessed only limited protein sequence homology. Genomic comparison of LL-Ku and c5 suggested that diversification of Lb. delbrueckii phages is mainly due to insertions, deletions and recombination. For the first time, the complete genome sequences of group b and c Lb. delbrueckii phages are reported.

  6. Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote.

    Directory of Open Access Journals (Sweden)

    Jonathan A Eisen

    2006-09-01

    Full Text Available The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from

  7. Genomic Organization of Zebrafish microRNAs

    Directory of Open Access Journals (Sweden)

    Paydar Ima

    2008-05-01

    Full Text Available Abstract Background: microRNAs (miRNAs) are small (~22 nt) non-coding RNAs that regulate cell movement, specification, and development. Expression of miRNAs is highly regulated, both spatially and temporally. Based on direct cloning, sequence conservation, and predicted secondary structures, a large number of miRNAs have been identified in higher eukaryotic genomes but whether these RNAs are simply a subset of a much larger number of noncoding RNA families is unknown. This is especially true in zebrafish where genome sequencing and annotation is not yet complete. Results: We analyzed the zebrafish genome to identify the number and location of proven and predicted miRNAs resulting in the identification of 35 new miRNAs. We then grouped all 415 zebrafish miRNAs into families based on seed sequence identity as a means to identify possible functional redundancy. Based on genomic location and expression analysis, we also identified those miRNAs that are likely to be encoded as part of polycistronic transcripts. Lastly, as a resource, we compiled existing zebrafish miRNA expression data and, where possible, listed all experimentally proven mRNA targets. Conclusion: Current analysis indicates the zebrafish genome encodes 415 miRNAs which can be grouped into 44 families. The largest of these families (the miR-430 family) contains 72 members largely clustered in two main locations along chromosome 4. Thus far, most zebrafish miRNAs exhibit tissue specific patterns of expression.
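
    The family grouping described in this record can be illustrated with a minimal sketch. It assumes the seed is nucleotides 2-8 of the mature miRNA, a common convention that the abstract itself does not specify, and it uses made-up toy sequences and names (`toy-miR-a` etc.) rather than real zebrafish miRNAs. The function names are hypothetical and this is not the authors' pipeline.

```python
from collections import defaultdict


def seed(mature_seq, start=2, end=8):
    """Return nucleotides start..end (1-based, inclusive) of a mature miRNA.

    Assumption: the seed spans positions 2-8; the abstract does not say
    which positions the authors used.
    """
    rna = mature_seq.upper().replace("T", "U")
    return rna[start - 1:end]


def group_by_seed(mirnas):
    """Group miRNA names into families that share an identical seed."""
    families = defaultdict(list)
    for name, seq in mirnas.items():
        families[seed(seq)].append(name)
    return {s: sorted(names) for s, names in families.items()}


# Toy sequences invented for illustration only (not real miRNAs).
toy = {
    "toy-miR-a": "UACGUACGUACGUACGUACGUA",
    "toy-miR-b": "GACGUACGCCCCCCCCCCCCCC",
    "toy-miR-c": "UGGGGGGGAAAAAAAAAAAAAA",
}
families = group_by_seed(toy)
```

    Here `toy-miR-a` and `toy-miR-b` fall into one family because they share the seed `ACGUACG`, even though the rest of their sequences differ.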

  8. Spatial organization of the budding yeast genome in the cell nucleus and identification of specific chromatin interactions from multi-chromosome constrained chromatin model.

    Science.gov (United States)

    Gürsoy, Gamze; Xu, Yun; Liang, Jie

    2017-07-01

    Nuclear landmarks and biochemical factors play important roles in the organization of the yeast genome. The interaction patterns of budding yeast as measured from genome-wide 3C studies are largely recapitulated by model polymer genomes subject to landmark constraints. However, the origin of inter-chromosomal interactions, specific roles of individual landmarks, and the roles of biochemical factors in yeast genome organization remain unclear. Here we describe a multi-chromosome constrained self-avoiding chromatin model (mC-SAC) to gain understanding of the budding yeast genome organization. With significantly improved sampling of genome structures, both intra- and inter-chromosomal interaction patterns from genome-wide 3C studies are accurately captured in our model at higher resolution than previous studies. We show that nuclear confinement is a key determinant of the intra-chromosomal interactions, and centromere tethering is responsible for the inter-chromosomal interactions. In addition, important genomic elements such as fragile sites and tRNA genes are found to be clustered spatially, largely due to centromere tethering. We uncovered previously unknown interactions that were not captured by genome-wide 3C studies, which are found to be enriched with tRNA genes, RNAPIII and TFIIS binding. Moreover, we identified specific high-frequency genome-wide 3C interactions that are unaccounted for by polymer effects under landmark constraints. These interactions are enriched with important genes and likely play biological roles.

  9. Comparative genome analysis of Basidiomycete fungi

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert; Salamov, Asaf; Henrissat, Bernard; Nagy, Laszlo; Brown, Daren; Held, Benjamin; Baker, Scott; Blanchette, Robert; Boussau, Bastien; Doty, Sharon L.; Fagnan, Kirsten; Floudas, Dimitris; Levasseur, Anthony; Manning, Gerard; Martin, Francis; Morin, Emmanuelle; Otillar, Robert; Pisabarro, Antonio; Walton, Jonathan; Wolfe, Ken; Hibbett, David; Grigoriev, Igor

    2013-08-07

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37% of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs including the majority of wood decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum we compared the genomes of 35 basidiomycetes including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48% of basidiomycete proteins are unique to the phylum with nearly half of those (22%) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically-informed PCA analysis of wood decay genes, we predict that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.

  10. Plant STAND P-loop NTPases: a current perspective of genome distribution, evolution, and function : Plant STAND P-loop NTPases: genomic organization, evolution, and molecular mechanism models contribute broadly to plant pathogen defense.

    Science.gov (United States)

    Arya, Preeti; Acharya, Vishal

    2018-02-01

    STAND P-loop NTPases are a common weapon used by plants and other organisms from all three kingdoms of life to defend themselves against pathogen invasion. The purpose of this study is to review comprehensively the latest findings on plant STAND P-loop NTPases with regard to their genomic distribution, evolution, and mechanism of action. Earlier, plant STAND P-loop NTPases were known to comprise only NBS-LRRs/AP-ATPase/NB-ARC ATPase. However, recent findings suggest that the genomes of early green plants contained two types of STAND P-loop NTPases: (1) mammalian NACHT NTPases and (2) NBS-LRRs. Moreover, the YchF subfamily (unconventional G proteins and members of the P-loop NTPases) has been reported to be exceptionally involved in biotic stress (in the case of Oryza sativa), making it a novel member of the STAND P-loop NTPases in green plants. Lineage-specific expansion and genome duplication events are responsible for the abundance of plant STAND P-loop NTPases; a "moderate tandem and low segmental duplication" trajectory is followed in the majority of plant species, with a few exceptions (equal contribution of tandem and segmental duplication). Over the past decades, systematic research into NBS-LRR function has supported the direct recognition of pathogens or pathogen effectors, as described by the latest proposed models, the 'integrated decoy' or 'sensor domains' model. Here, we integrate the recently published findings together with the previous literature on the genomic distribution, evolution, and distinct models proposed for functional molecular mechanism of plant STAND P-loop NTPases.

  11. Genome-wide evolutionary characterization and expression analyses of major latex protein (MLP) family genes in Vitis vinifera.

    Science.gov (United States)

    Zhang, Ningbo; Li, Ruimin; Shen, Wei; Jiao, Shuzhen; Zhang, Junxiang; Xu, Weirong

    2018-04-27

    The major latex protein/ripening-related protein (MLP/RRP) subfamily is known to be involved in a wide range of biological processes of plant development and various stress responses. However, the biological function of MLP/RRP proteins is still far from being clear and identification of them may provide important clues for understanding their roles. Here, we report a genome-wide evolutionary characterization and gene expression analysis of the MLP family in European Vitis species. A total of 14 members were found in the grape genome, all of which are located on chromosome 1, where they are predominantly arranged in tandem clusters. We have noticed, most surprisingly, promoter-sharing by several non-identical but highly similar gene members to a greater extent than expected by chance. Synteny analysis between the grape and Arabidopsis thaliana genomes suggested that 3 grape MLP genes arose before the divergence of the two species. Phylogenetic analysis provided further insights into the evolutionary relationship between the genes, as well as their putative functions, and tissue-specific expression analysis suggested distinct biological roles for different members. Our expression data suggested a couple of candidate genes involved in abiotic stresses and phytohormone responses. The present work provides new insight into the evolution and regulation of Vitis MLP genes, which represent targets for future studies and inclusion in tolerance-related molecular breeding programs.

  12. The nucleoid protein Dps binds genomic DNA of Escherichia coli in a non-random manner

    Science.gov (United States)

    Kondrashov, F. A.; Toshchakov, S. V.; Dominova, I.; Shvyreva, U. S.; Vrublevskaya, V. V.; Morenkov, O. S.; Panyukov, V. V.

    2017-01-01

    Dps is a multifunctional homododecameric protein that oxidizes Fe2+ ions, accumulating them in the form of Fe2O3 within its protein cavity, interacts with DNA, tightly condensing the bacterial nucleoid upon starvation, and performs some other functions. In the two decades since the discovery of this protein, its ferroxidase activity has become rather well studied, but the mechanism of Dps interaction with DNA still remains enigmatic. The crucial role of lysine residues in the unstructured N-terminal tails led to the conventional point of view that Dps binds DNA without sequence or structural specificity. However, deletion of dps changed the profile of proteins in starved cells, a SELEX screen revealed genomic regions preferentially bound in vitro, and a certain affinity of Dps for artificial branched molecules was detected by atomic force microscopy. Here we report a non-random distribution of Dps binding sites across the bacterial chromosome in exponentially growing cells and show their enrichment with inverted repeats prone to form secondary structures. We found that the Dps-bound regions overlap with sites occupied by other nucleoid proteins, and contain overrepresented motifs typical for their consensus sequences. Of the two types of genomic domains with extensive protein occupancy, which can be highly expressed or transcriptionally silent, only those that are enriched with RNA polymerase molecules were preferentially occupied by Dps. In the dps-null mutant we, therefore, observed a differentially altered expression of several targeted genes and found suppressed transcription from the dps promoter. In most cases this can be explained by the relieved interference with Dps for nucleoid proteins exploiting sequence-specific modes of DNA binding. Thus, protecting bacterial cells from different stresses during exponential growth, Dps can modulate transcriptional integrity of the bacterial chromosome hampering RNA biosynthesis from some genes via competition with RNA polymerase

  13. Genomics and physiology of a marine flavobacterium encoding a proteorhodopsin and a xanthorhodopsin-like protein.

    Directory of Open Access Journals (Sweden)

    Thomas Riedel

    Full Text Available Proteorhodopsin (PR) photoheterotrophy in the marine flavobacterium Dokdonia sp. PRO95 has previously been investigated, showing no growth stimulation in the light at intermediate carbon concentrations. Here we report the genome sequence of strain PRO95 and compare it to two other PR encoding Dokdonia genomes: that of strain 4H-3-7-5 which shows the most similar genome, and that of strain MED134 which grows better in the light under oligotrophic conditions. Our genome analysis revealed that the PRO95 genome as well as the 4H-3-7-5 genome encode a protein related to xanthorhodopsins. The genomic environment and phylogenetic distribution of this gene suggest that it may have frequently been recruited by lateral gene transfer. Expression analyses by RT-PCR and direct mRNA-sequencing showed that both rhodopsins and the complete β-carotene pathway necessary for retinal production are transcribed in PRO95. Proton translocation measurements showed enhanced proton pump activity in response to light, supporting that one or both rhodopsins are functional. Genomic information and carbon source respiration data were used to develop a defined cultivation medium for PRO95, but reproducible growth always required small amounts of yeast extract. Although PRO95 contains and expresses two rhodopsin genes, light did not stimulate its growth as determined by cell numbers in a nutrient poor seawater medium that mimics its natural environment, confirming previous experiments at intermediate carbon concentrations. Starvation or stress conditions might be needed to observe the physiological effect of light induced energy acquisition.

  14. Molecular analysis and genomic organization of major DNA satellites in banana (Musa spp.).

    Science.gov (United States)

    Čížková, Jana; Hřibová, Eva; Humplíková, Lenka; Christelová, Pavla; Suchánková, Pavla; Doležel, Jaroslav

    2013-01-01

    Satellite DNA sequences consist of tandemly arranged repetitive units up to thousands of nucleotides long in head-to-tail orientation. The evolutionary processes by which satellites arise and evolve include unequal crossing over, gene conversion, transposition and extrachromosomal circular DNA formation. Large blocks of satellite DNA are often observed in heterochromatic regions of chromosomes and are a typical component of centromeric and telomeric regions. Satellite-rich loci may show specific banding patterns and facilitate chromosome identification and analysis of structural chromosome changes. Unlike many other genomes, nuclear genomes of banana (Musa spp.) are poor in satellite DNA and the information on this class of DNA remains limited. The banana cultivars are seed sterile clones originating mostly from natural intra-specific crosses within M. acuminata (A genome) and inter-specific crosses between M. acuminata and M. balbisiana (B genome). Previous studies revealed the closely related nature of the A and B genomes, including similarities in repetitive DNA. In this study we focused on two main banana DNA satellites, which were previously identified in silico. Their genomic organization and molecular diversity were analyzed in a set of nineteen Musa accessions, including representatives of A, B and S (M. schizocarpa) genomes and their inter-specific hybrids. The two DNA satellites showed a high level of sequence conservation within, and a high homology between, Musa species. FISH with probes for the satellite DNA sequences, rRNA genes and a single-copy BAC clone 2G17 resulted in characteristic chromosome banding patterns in M. acuminata and M. balbisiana which may aid in determining genomic constitution in interspecific hybrids. In addition to improving the knowledge on Musa satellite DNA, our study increases the number of cytogenetic markers and the number of individual chromosomes which can be identified in Musa.

  15. CrisprGE: a central hub of CRISPR/Cas-based genome editing.

    Science.gov (United States)

    Kaur, Karambir; Tandon, Himani; Gupta, Amit Kumar; Kumar, Manoj

    2015-01-01

    The CRISPR system is a powerful defense mechanism in bacteria and archaea that provides immunity against viruses. Recently, this process found a new application in intended targeting of genomes. CRISPR-mediated genome editing is performed by two main components, namely single guide RNA and Cas9 protein. Despite the enormous data generated in this area, there is a dearth of high throughput resources. Therefore, we have developed CrisprGE, a central hub of CRISPR/Cas-based genome editing. Presently, this database holds a total of 4680 entries of 223 unique genes from 32 model and other organisms. It encompasses information about the organism, gene, target gene sequences, genetic modification, modification length, genome editing efficiency, cell line, assay, etc. This depository is developed using the open source LAMP (Linux Apache MySQL PHP) server. A user-friendly browsing and searching facility is integrated for easy data retrieval. It also includes useful tools like BLAST CrisprGE, BLAST NTdb and CRISPR Mapper. Considering potential utilities of CRISPR in the vast area of biology and therapeutics, we foresee this platform as an assistance to accelerate research in the burgeoning field of genome engineering. © The Author(s) 2015. Published by Oxford University Press.

  16. Complete genome sequence of Capnocytophaga ochracea type strain (VPI 2845T)

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Gronow, Sabine [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL)]; Land, Miriam L [ORNL]; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]; Copeland, A [U.S. Department of Energy, Joint Genome Institute]; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute]; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute]; Chen, Feng [U.S. Department of Energy, Joint Genome Institute]; Bruce, David [Los Alamos National Laboratory (LANL)]; Tice, Hope [U.S. Department of Energy, Joint Genome Institute]; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute]; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Chen, Amy [U.S. Department of Energy, Joint Genome Institute]; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute]; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL)]; Hauser, Loren John [ORNL]; Chang, Yun-Juan [ORNL]; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL)]; Brettin, Thomas S [ORNL]; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute]; Han, Cliff [Los Alamos National Laboratory (LANL)]; Bristow, James [U.S. Department of Energy, Joint Genome Institute]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute]; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]

    2009-01-01

    Capnocytophaga ochracea (Prévot et al. 1956) Leadbetter et al. 1982 is the type species of the genus Capnocytophaga. It is of interest because of its location in the Flavobacteriaceae, a genomically not yet charted family within the order Flavobacteriales. The species grows as fusiform to rod shaped cells which tend to form clumps and are able to move by gliding. C. ochracea is known as a capnophilic (CO2-requiring) organism with the ability to grow under anaerobic as well as aerobic conditions (oxygen concentration larger than 15%), here only in the presence of 5% CO2. Strain VPI 2845T, the type strain of the species, is portrayed in this report as a gliding, Gram-negative bacterium, originally isolated from a human oral cavity. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first completed genome sequence from the flavobacterial genus Capnocytophaga, and the 2,612,925 bp long single replicon genome with its 2193 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  17. Genome sequence of a cluster A13 mycobacteriophage detected in Mycobacterium phlei over a half century ago.

    Science.gov (United States)

    Marton, Szilvia; Fehér, Enikő; Horváth, Balázs; Háber, Katalin; Somogyi, Pál; Minárovits, János; Bányai, Krisztián

    2016-01-01

    A phage infecting Mycobacterium phlei was isolated in 1958 from a soil sample in Hungary. Some physicochemical and biological properties of the virus were described in independent studies over the years. Here, we report the genome sequence of this early mycobacteriophage isolate. The Phlei phage genome measured 50,418 bp, had a GC content of 60.1 % and was predicted to encode 81 proteins and three tRNAs. Phylogeny of the tape measure protein revealed genetic relatedness to other early isolates of mycobacteriophages within subcluster A2. The genomic organization and genetic relationships to other strains showed that the Phlei phage belongs to a novel genetic cluster, designated A13.

  18. Mutations in Encephalomyocarditis Virus 3A Protein Uncouple the Dependency of Genome Replication on Host Factors Phosphatidylinositol 4-Kinase IIIα and Oxysterol-Binding Protein

    NARCIS (Netherlands)

    Dorobantu, Cristina M; Albulescu, Lucian; Lyoo, Heyrhyoung; van Kampen, Mirjam; De Francesco, Raffaele; Lohmann, Volker; Harak, Christian; van der Schaar, Hilde M; Strating, Jeroen R P M; Gorbalenya, Alexander E; van Kuppeveld, Frank J M

    2016-01-01

    Positive-strand RNA [(+)RNA] viruses are true masters of reprogramming host lipid trafficking and synthesis to support virus genome replication. Via their membrane-associated 3A protein, picornaviruses of the genus Enterovirus (e.g., poliovirus, coxsackievirus, and rhinovirus) subvert Golgi

  19. Application of CRISPR/Cas9 Genome Editing to Improve Recombinant Protein Production in CHO Cells

    DEFF Research Database (Denmark)

    Grav, Lise Marie; Julie la Cour Karottki, Karen; Lee, Jae Seong

    2017-01-01

    and yields. In this chapter, we present our protocol on how to use the genome editing tool Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) to knockout engineering target genes in CHO cells. As an example, we refer to the glutamine synthetase (GS...

  20. The reduced genomes of Parcubacteria (OD1) contain signatures of a symbiotic lifestyle

    Energy Technology Data Exchange (ETDEWEB)

    Nelson, William C.; Stegen, James C.

    2015-07-21

    Candidate phylum OD1 bacteria (also referred to as Parcubacteria) have been identified in a broad range of anoxic environments through community survey analysis. Although none of these species have been isolated in the laboratory, several genome sequences have been reconstructed from metagenomic sequence data and single-cell sequencing. The organisms have small (generally <1 Mb) genomes with severely reduced metabolic capabilities. We have reconstructed 8 partial to near-complete OD1 genomes from oxic groundwater samples, and compared them against existing genomic data. The conserved core gene set comprises 202 genes, or ~28% of the genomic complement. ‘Housekeeping’ genes and genes for biosynthesis of peptidoglycan and Type IV pilus production are conserved. Gene sets for biosynthesis of cofactors, amino acids, nucleotides and fatty acids are absent entirely or greatly reduced. The only aspects of energy metabolism conserved are the non-oxidative branch of the pentose-phosphate shunt and central glycolysis. These organisms also lack some activities conserved in almost all other known bacterial genomes, including signal recognition particle, pseudouridine synthase A, and FAD synthase. Pan-genome analysis indicates a broad genotypic diversity and perhaps a highly fluid gene complement, indicating historical adaptation to a wide range of growth environments and a high degree of specialization. The genomes were examined for signatures suggesting either a free-living, streamlined lifestyle or a symbiotic lifestyle. The lack of biosynthetic capabilities and DNA repair, along with the presence of potential attachment and adhesion proteins, suggests that the Parcubacteria are ectosymbionts or parasites of other organisms. The wide diversity of genes that potentially mediate cell-cell contact suggests a broad range of partner/prey organisms across the phylum.

  1. TAD-free analysis of architectural proteins and insulators.

    Science.gov (United States)

    Mourad, Raphaël; Cuvier, Olivier

    2018-03-16

    The three-dimensional (3D) organization of the genome is intimately related to numerous key biological functions, including the regulation of gene expression and DNA replication. The mechanisms by which molecular drivers functionally organize the 3D genome, such as topologically associating domains (TADs), remain to be explored. Current approaches consist of assessing the enrichments or influences of proteins at TAD borders. Here, we propose a TAD-free model to directly estimate the blocking effects of architectural proteins, insulators and DNA motifs on long-range contacts, making the model intuitive and biologically meaningful. In addition, the model allows analyzing the whole Hi-C information content (2D information) instead of only focusing on TAD borders (1D information). The model outperforms multiple logistic regression at TAD borders in terms of parameter estimation accuracy and is validated by enhancer-blocking assays. In Drosophila, the results support the insulating role of simple sequence repeats and suggest that the blocking effects depend on the number of repeats. Motif analysis uncovered the roles of the transcription factors pannier and tramtrack in blocking long-range contacts. In humans, the results suggest that the blocking effects of the well-known architectural proteins CTCF, cohesin and ZNF143 depend on the distance between loci, where each protein may participate at different scales of the 3D chromatin organization.

  2. Information assessment on predicting protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Gerstein Mark

    2004-10-01

    Full Text Available Abstract Background: Identifying protein-protein interactions is fundamental for understanding the molecular machinery of the cell. Proteome-wide studies of protein-protein interactions are of significant value, but the high-throughput experimental technologies suffer from high rates of both false positive and false negative predictions. In addition to high-throughput experimental data, many diverse types of genomic data can help predict protein-protein interactions, such as mRNA expression, localization, essentiality, and functional annotation. Evaluating the information contributed by different sources of evidence helps to establish more parsimonious models with comparable or better prediction accuracy, and to obtain biological insights into the relationships between protein-protein interactions and other genomic information. Results: Our assessment is based on the genomic features used in a Bayesian network approach to predict protein-protein interactions genome-wide in yeast. In the special case in which no information is missing for any of the features, our analysis shows that functional classification contributes more information than expression correlations or essentiality. We also show that in this case alternative models, such as logistic regression and random forest, may be more effective than Bayesian networks for predicting interactions. Conclusions: In the restricted problem posed by the complete-information subset, we identified the MIPS and Gene Ontology (GO) functional similarity datasets as the dominant information contributors for predicting protein-protein interactions under the framework proposed by Jansen et al. Random forests based on the MIPS and GO information alone can give highly accurate classifications. In this particular subset of complete information, adding other genomic data does little to improve predictions. We also found that the data discretizations used in the

  3. A framework for classification of prokaryotic protein kinases.

    Directory of Open Access Journals (Sweden)

    Nidhi Tyagi

    Full Text Available BACKGROUND: An overwhelming majority of the Serine/Threonine protein kinases identified by gleaning archaeal and eubacterial genomes could not be classified into any of the well-known Hanks and Hunter subfamilies of protein kinases. This is because the Hanks and Hunter classification scheme was developed from eukaryotic protein kinases, which are highly divergent from their prokaryotic homologues. A large dataset of prokaryotic Serine/Threonine protein kinases recognized from genomes of prokaryotes has been used to develop a classification framework for prokaryotic Ser/Thr protein kinases. METHODOLOGY/PRINCIPAL FINDINGS: We have used traditional sequence alignment and phylogenetic approaches and clustered the prokaryotic kinases, which represent 72 subfamilies with at least 4 members in each. Such a clustering enables classification of prokaryotic Ser/Thr kinases and can be used as a framework to classify newly identified prokaryotic Ser/Thr kinases. After a series of searches in a comprehensive sequence database, we recognized that 38 subfamilies of prokaryotic protein kinases are associated with a specific taxonomic level. For example, 4, 6 and 3 subfamilies have been identified that are currently specific to the phyla proteobacteria, cyanobacteria and actinobacteria, respectively. Similarly, subfamilies that are specific to an order, sub-order, class, family and genus have also been identified. In addition to these, we also identify organism-diverse subfamilies. Members of these clusters are from organisms of different taxonomic levels, such as archaea, bacteria, eukaryotes and viruses. CONCLUSION/SIGNIFICANCE: Interestingly, the occurrence of several taxonomic level-specific subfamilies of prokaryotic kinases contrasts with the classification of eukaryotic protein kinases, in which most of the popular subfamilies occur diversely in several eukaryotes. Many prokaryotic Ser/Thr kinases exhibit a wide variety of modular

  4. Genomic organization, annotation, and ligand-receptor inferences of chicken chemokines and chemokine receptor genes based on comparative genomics

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2005-03-01

    Full Text Available Abstract Background Chemokines and their receptors play important roles in host defense, organogenesis, hematopoiesis, and neuronal communication. Forty-two chemokines and 19 cognate receptors have been found in the human genome. Prior to this report, only 11 chicken chemokines and 7 receptors had been reported. The objectives of this study were to systematically identify chicken chemokines and their cognate receptor genes in the chicken genome and to annotate these genes and ligand-receptor binding by a comparative genomics approach. Results Twenty-three chemokine and 14 chemokine receptor genes were identified in the chicken genome. All of the chicken chemokines contained a conserved CC, CXC, CX3C, or XC motif, whereas all the chemokine receptors had seven conserved transmembrane helices, four extracellular domains with a conserved cysteine, and a conserved DRYLAIV sequence in the second intracellular domain. The number of coding exons in these genes and the syntenies are highly conserved between human, mouse, and chicken although the amino acid sequence homologies are generally low between mammalian and chicken chemokines. Chicken genes were named with the systematic nomenclature used in humans and mice based on phylogeny, synteny, and sequence homology. Conclusion The independent nomenclature of chicken chemokines and chemokine receptors suggests that the chicken may have ligand-receptor pairings similar to mammals. All identified chicken chemokines and their cognate receptors were identified in the chicken genome except CCR9, whose ligand was not identified in this study. The organization of these genes suggests that there were a substantial number of these genes present before divergence between aves and mammals and more gene duplications of CC, CXC, CCR, and CXCR subfamilies in mammals than in aves after the divergence.

  5. GPI-anchored protein organization and dynamics at the cell surface.

    Science.gov (United States)

    Saha, Suvrajit; Anilkumar, Anupama Ambika; Mayor, Satyajit

    2016-02-01

    The surface of eukaryotic cells is a multi-component fluid bilayer in which glycosylphosphatidylinositol (GPI)-anchored proteins are an abundant constituent. In this review, we discuss the complex nature of the organization and dynamics of GPI-anchored proteins at multiple spatial and temporal scales. Different biophysical techniques have been utilized for understanding this organization, including fluorescence correlation spectroscopy, fluorescence recovery after photobleaching, single particle tracking, and a number of super resolution methods. Major insights into the organization and dynamics have also come from exploring the short-range interactions of GPI-anchored proteins by fluorescence (or Förster) resonance energy transfer microscopy. Based on the nanometer to micron scale organization, at the microsecond to the second time scale dynamics, a picture of the membrane bilayer emerges where the lipid bilayer appears inextricably intertwined with the underlying dynamic cytoskeleton. These observations have prompted a revision of the current models of plasma membrane organization, and suggest an active actin-membrane composite. Copyright © 2016 by the American Society for Biochemistry and Molecular Biology, Inc.

  6. Prediction of arsenic and antimony transporter major intrinsic proteins from the genomes of crop plants.

    Science.gov (United States)

    Azad, Abul Kalam; Ahmed, Jahed; Alum, Md Asraful; Hasan, Md Mahbub; Ishikawa, Takahiro; Sawa, Yoshihiro

    2018-02-01

    Major intrinsic proteins (MIPs), commonly known as aquaporins, transport water and non-polar small solutes. By comparing the 3D models and the primary selectivity-related motifs (two Asn-Pro-Ala (NPA) regions, the aromatic/arginine (ar/R) selectivity filter, and Froger's positions (FPs)) of all plant MIPs that have been experimentally proven to transport arsenic (As) and antimony (Sb), some substrate-specific signature sequences (SSSS) or specificity determining sites (SDPs) have been predicted. These SSSS or SDPs were determined in 543 MIPs found in the genomes of 12 crop plants; the As and Sb transporters were predicted to be distributed in nodulin-26-like intrinsic proteins (NIPs), and every plant had one or several As and Sb transporter NIPs. Phylogenetic grouping of the NIP subfamily based on the ar/R selectivity filter and FPs was linked to As and Sb transport. We further determined the group-wise substrate selectivity profiles of the NIPs in the 12 crop plants. In addition to two NPA regions, the ar/R filter, and FPs, certain amino acids, especially in the pore line, loop D, and termini, contribute to the functional distinctiveness of the NIP groups. Expression analysis of transcripts in different organs indicated that most of the As and Sb transporter NIPs were expressed in roots. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Genomic and proteomic analyses of Prdm5 reveal interactions with insulator binding proteins in embryonic stem cells

    DEFF Research Database (Denmark)

    Galli, Giorgio Giacomo; Carrara, Matteo; Francavilla, Chiara

    2013-01-01

    PRDM proteins belong to the SET-domain protein family involved in the regulation of gene expression. Although few PRDM members possess histone methyltransferase activity, the molecular mechanisms by which the other members exert transcriptional regulation remain to be delineated. In this study, we find that Prdm5 is highly expressed in mouse embryonic stem cells (mES) and exploit this cellular system to characterize molecular functions of Prdm5. By combining proteomics and next generation sequencing technologies we identify Prdm5 interaction partners and genomic occupancy. We demonstrate that, although Prdm5 is dispensable for mES cell maintenance, it directly targets genomic regions involved in early embryonic development and affects the expression of a subset of developmental regulators during cell differentiation. Importantly, Prdm5 interacts with Ctcf, Cohesin and TFIIIC and co...

  8. The family Rhabdoviridae: mono- and bipartite negative-sense RNA viruses with diverse genome organization and common evolutionary origins

    OpenAIRE

    Dietzgen, Ralf G.; Kondo, Hideki; Goodin, Michael M.; Kurath, Gael; Vasilakis, Nikos

    2016-01-01

    The family Rhabdoviridae consists of mostly enveloped, bullet-shaped or bacilliform viruses with a negative-sense, single-stranded RNA genome that infect vertebrates, invertebrates or plants. This ecological diversity is reflected by the diversity and complexity of their genomes. Five canonical structural protein genes are conserved in all rhabdoviruses, but may be overprinted, overlapped or interspersed with several novel and diverse accessory genes. This review gives an overview of the char...

  9. Identification of novel type 1 diabetes candidate genes by integrating genome-wide association data, protein-protein interactions, and human pancreatic islet gene expression

    DEFF Research Database (Denmark)

    Bergholdt, Regine; Brorsson, Caroline; Palleja, Albert

    2012-01-01

    Genome-wide association studies (GWAS) have heralded a new era in susceptibility locus discovery in complex diseases. For type 1 diabetes, >40 susceptibility loci have been discovered. However, GWAS do not inevitably lead to identification of the gene or genes in a given locus associated with disease, and they do not typically inform the broader context in which the disease genes operate. Here, we integrated type 1 diabetes GWAS data with protein-protein interactions to construct biological networks of relevance for disease. A total of 17 networks were identified. To prioritize... ...-cells. Our results provide novel insight to the mechanisms behind type 1 diabetes pathogenesis and, thus, may provide the basis for the design of novel treatment strategies.

  10. Genomics of an extreme psychrophile, Psychromonas ingrahamii

    Directory of Open Access Journals (Sweden)

    Hauser Loren J

    2008-05-01

    Full Text Available Abstract Background The genome sequence of the sea-ice bacterium Psychromonas ingrahamii 37, which grows exponentially at -12 °C, may reveal features that help to explain how this extreme psychrophile is able to grow at such low temperatures. Determination of the whole genome sequence allows comparison with genes of other psychrophiles and mesophiles. Results Correspondence analysis of the composition of all P. ingrahamii proteins showed that (1) there are 6 classes of proteins, at least one more than other bacteria, (2) integral inner membrane proteins are not sharply separated from bulk proteins, suggesting that, overall, they may have a lower hydrophobic character, (3) there is strong opposition between asparagine and the oxygen-sensitive amino acids methionine, arginine, cysteine and histidine, and (4) one of the previously unseen clusters of proteins has a high proportion of "orphan" hypothetical proteins, raising the possibility these are cold-specific proteins. Based on annotation of proteins by sequence similarity, (1) P. ingrahamii has a large number (61) of regulators of cyclic GDP, suggesting tha