WorldWideScience

Sample records for genome encodes multiple

  1. ENCODE whole-genome data in the UCSC genome browser (2011 update).

    Science.gov (United States)

    Raney, Brian J; Cline, Melissa S; Rosenbloom, Kate R; Dreszer, Timothy R; Learned, Katrina; Barber, Galt P; Meyer, Laurence R; Sloan, Cricket A; Malladi, Venkat S; Roskin, Krishna M; Suh, Bernard B; Hinrichs, Angie S; Clawson, Hiram; Zweig, Ann S; Kirkup, Vanessa; Fujita, Pauline A; Rhead, Brooke; Smith, Kayla E; Pohl, Andy; Kuhn, Robert M; Karolchik, Donna; Haussler, David; Kent, W James

    2011-01-01

    The ENCODE project is an international consortium with a goal of cataloguing all the functional elements in the human genome. The ENCODE Data Coordination Center (DCC) at the University of California, Santa Cruz serves as the central repository for ENCODE data. In this role, the DCC offers a collection of high-throughput, genome-wide data generated with technologies such as ChIP-Seq, RNA-Seq, DNA digestion and others. This data helps illuminate transcription factor-binding sites, histone marks, chromatin accessibility, DNA methylation, RNA expression, RNA binding and other cell-state indicators. It includes sequences with quality scores, alignments, signals calculated from the alignments, and in most cases, element or peak calls calculated from the signal data. Each data set is available for visualization and download via the UCSC Genome Browser (http://genome.ucsc.edu/). ENCODE data can also be retrieved using a metadata system that captures the experimental parameters of each assay. The ENCODE web portal at UCSC (http://encodeproject.org/) provides information about the ENCODE data and links for access.

  2. Molecular evolution of the Paramyxoviridae and Rhabdoviridae multiple-protein-encoding P gene.

    Science.gov (United States)

    Jordan, I K; Sutter, B A; McClure, M A

    2000-01-01

    Presented here is an analysis of the molecular evolutionary dynamics of the P gene among 76 representative sequences of the Paramyxoviridae and Rhabdoviridae RNA virus families. In a number of Paramyxoviridae taxa, as well as in vesicular stomatitis viruses of the Rhabdoviridae, the P gene encodes multiple proteins from a single genomic RNA sequence. These products include the phosphoprotein (P), as well as the C and V proteins. The complexity of the P gene makes it an intriguing locus to study from an evolutionary perspective. Amino acid sequence alignments of the proteins encoded at the P and N loci were used in independent phylogenetic reconstructions of the Paramyxoviridae and Rhabdoviridae families. P-gene-coding capacities were mapped onto the Paramyxoviridae phylogeny, and the most parsimonious path of multiple-coding-capacity evolution was determined. Levels of amino acid variation for Paramyxoviridae and Rhabdoviridae P-gene-encoded products were also analyzed. Proteins encoded in overlapping reading frames from the same nucleotides have different levels of amino acid variation. The nucleotide architecture that underlies the amino acid variation was determined in order to evaluate the role of selection in the evolution of the P gene overlapping reading frames. In every case, the evolution of one of the proteins encoded in the overlapping reading frames has been constrained by negative selection while the other has evolved more rapidly. The integrity of the overlapping reading frame that represents a derived state is generally maintained at the expense of the ancestral reading frame encoded by the same nucleotides. The evolution of such multicoding sequences is likely a response by RNA viruses to selective pressure to maximize genomic information content while maintaining small genome size. The ability to evolve such a complex genomic strategy is intimately related to the dynamics of the viral quasispecies, which allow enhanced exploration of the adaptive

  3. Genome-wide comparative analysis of NBS-encoding genes between Brassica species and Arabidopsis thaliana.

    Science.gov (United States)

    Yu, Jingyin; Tehrim, Sadia; Zhang, Fengqi; Tong, Chaobo; Huang, Junyan; Cheng, Xiaohui; Dong, Caihua; Zhou, Yanqiu; Qin, Rui; Hua, Wei; Liu, Shengyi

    2014-01-03

    Plant disease resistance (R) genes with the nucleotide binding site (NBS) play an important role in offering resistance to pathogens. The availability of complete genome sequences of Brassica oleracea and Brassica rapa provides an important opportunity for researchers to identify and characterize NBS-encoding R genes in Brassica species and to compare them with analogues in Arabidopsis thaliana using a comparative genomics approach. However, little is known about the evolutionary fate of NBS-encoding genes in the Brassica lineage after the split from A. thaliana. Here we present a genome-wide analysis of NBS-encoding genes in B. oleracea, B. rapa and A. thaliana. Using HMM search and manual curation, we identified 157, 206 and 167 NBS-encoding genes in the B. oleracea, B. rapa and A. thaliana genomes, respectively. Phylogenetic analysis among the 3 species classified NBS-encoding genes into 6 subgroups. Tandem duplication and whole genome triplication (WGT) analyses revealed that after WGT of the Brassica ancestor, NBS-encoding homologous gene pairs on triplicated regions in the Brassica ancestor were deleted or lost quickly, but NBS-encoding genes in Brassica species experienced species-specific gene amplification by tandem duplication after divergence of B. rapa and B. oleracea. Expression profiling of NBS-encoding orthologous gene pairs indicated differential expression patterns of retained orthologous gene copies in B. oleracea and B. rapa. Furthermore, evolutionary analysis of CNL type NBS-encoding orthologous gene pairs among the 3 species suggested that orthologous genes in B. rapa have undergone stronger negative selection than those in B. oleracea. For the TNL type, however, there are no significant differences in the orthologous gene pairs between the two species. This study is the first identification and characterization of NBS-encoding genes in B. rapa and B. oleracea based on whole genome sequences. Through tandem duplication and whole genome

  4. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

    DEFF Research Database (Denmark)

    Birney, Ewan; Stamatoyannopoulos, John A; Dutta, Anindya

    2007-01-01

    We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses...

  5. Genome-wide identification of structural variants in genes encoding drug targets

    DEFF Research Database (Denmark)

    Rasmussen, Henrik Berg; Dahmcke, Christina Mackeprang

    2012-01-01

    The objective of the present study was to identify structural variants of drug target-encoding genes on a genome-wide scale. We also aimed at identifying drugs that are potentially amenable for individualization of treatments based on knowledge about structural variation in the genes encoding...

  6. The human homolog of S. cerevisiae CDC27, CDC27 Hs, is encoded by a highly conserved intronless gene present in multiple copies in the human genome

    Energy Technology Data Exchange (ETDEWEB)

    Devor, E.J.; Dill-Devor, R.M. [Univ. of Iowa College of Medicine, Iowa City (United States)]

    1994-09-01

    We have obtained a number of unique sequences via PCR amplification of human genomic DNA using degenerate primers under low stringency (42 °C). One of these, an 853 bp product, has been identified as a partial genomic sequence of the human homolog of the S. cerevisiae CDC27 gene, CDC27Hs (GenBank No. U00001). This gene, reported by Tugendreich et al., is also designated EST00556 from Adams et al. We have undertaken a more detailed examination of our sequence, MCP34N, and have found that: 1. the genomic sequence is nearly identical to CDC27Hs over its entire 853 bp length; 2. an MCP34N-specific PCR assay of several non-human primate species reveals amplification products in chimpanzee and gorilla genomes having greater than 90% sequence identity with CDC27Hs; and 3. an MCP34N-specific PCR assay of the BIOS hybrid cell line panel gives a discordancy pattern suggesting multiple loci. Based upon these data, we present the following initial characterization: 1. the complete MCP34N sequence identity with CDC27Hs indicates that the latter is encoded by an intronless gene; 2. CDC27Hs is highly conserved among higher primates; and 3. CDC27Hs is present in multiple copies in the human genome. These characteristics, taken together with those initially reported for CDC27Hs, suggest that this is an old gene that carries out an important but, as yet, unknown function in the human brain.

  7. Multiple-stage pure phase encoding with biometric information

    Science.gov (United States)

    Chen, Wen

    2018-01-01

    In recent years, many optical systems have been developed for securing information, and optical encryption/encoding has attracted more and more attention due to the marked advantages, such as parallel processing and multiple-dimensional characteristics. In this paper, an optical security method is presented based on pure phase encoding with biometric information. Biometric information (such as fingerprint) is employed as security keys rather than plaintext used in conventional optical security systems, and multiple-stage phase-encoding-based optical systems are designed for generating several phase-only masks with biometric information. Subsequently, the extracted phase-only masks are further used in an optical setup for encoding an input image (i.e., plaintext). Numerical simulations are conducted to illustrate the validity, and the results demonstrate that high flexibility and high security can be achieved.

  8. Vast diversity of prokaryotic virus genomes encoding double jelly-roll major capsid proteins uncovered by genomic and metagenomic sequence analysis.

    Science.gov (United States)

    Yutin, Natalya; Bäckström, Disa; Ettema, Thijs J G; Krupovic, Mart; Koonin, Eugene V

    2018-04-10

    Analysis of metagenomic sequences has become the principal approach for the study of the diversity of viruses. Many recent, extensive metagenomic studies on several classes of viruses have dramatically expanded the visible part of the virosphere, showing that previously undetected viruses, or those that have been considered rare, actually are important components of the global virome. We investigated the provenance of viruses related to tail-less bacteriophages of the family Tectiviridae by searching genomic and metagenomic sequence databases for distant homologs of the tectivirus-like Double Jelly-Roll major capsid proteins (DJR MCP). These searches resulted in the identification of numerous genomes of virus-like elements that are similar in size to tectiviruses (10-15 kilobases) and have diverse gene compositions. By comparison of the gene repertoires, the DJR MCP-encoding genomes were classified into 6 distinct groups that can be predicted to differ in reproduction strategies and host ranges. Only the DJR MCP gene that is present by design is shared by all these genomes, and most also encode a predicted DNA-packaging ATPase; the rest of the genes are present only in subgroups of this unexpectedly diverse collection of DJR MCP-encoding genomes. Only a minority encode a DNA polymerase, which is a hallmark of the family Tectiviridae and the putative family "Autolykiviridae". Notably, one of the identified putative DJR MCP viruses encodes a homolog of Cas1 endonuclease, the integrase involved in CRISPR-Cas adaptation and integration of transposon-like elements called casposons. This is the first detected occurrence of Cas1 in a virus. Many of the identified elements are individual contigs flanked by inverted or direct repeats and appear to represent complete, extrachromosomal viral genomes, whereas others are flanked by bacterial genes and thus can be considered as proviruses. These contigs come from metagenomes of widely different environments, some dominated by

  9. The DNA-encoded nucleosome organization of a eukaryotic genome.

    Science.gov (United States)

    Kaplan, Noam; Moore, Irene K; Fondufe-Mittendorf, Yvonne; Gossett, Andrea J; Tillo, Desiree; Field, Yair; LeProust, Emily M; Hughes, Timothy R; Lieb, Jason D; Widom, Jonathan; Segal, Eran

    2009-03-19

    Nucleosome organization is critical for gene regulation. In living cells this organization is determined by multiple factors, including the action of chromatin remodellers, competition with site-specific DNA-binding proteins, and the DNA sequence preferences of the nucleosomes themselves. However, it has been difficult to estimate the relative importance of each of these mechanisms in vivo, because in vivo nucleosome maps reflect the combined action of all influencing factors. Here we determine the importance of nucleosome DNA sequence preferences experimentally by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map, in which nucleosome occupancy is governed only by the intrinsic sequence preferences of nucleosomes, is similar to in vivo nucleosome maps generated in three different growth conditions. In vitro, nucleosome depletion is evident at many transcription factor binding sites and around gene start and end sites, indicating that nucleosome depletion at these sites in vivo is partly encoded in the genome. We confirm these results with a micrococcal nuclease-independent experiment that measures the relative affinity of nucleosomes for approximately 40,000 double-stranded 150-base-pair oligonucleotides. Using our in vitro data, we devise a computational model of nucleosome sequence preferences that is significantly correlated with in vivo nucleosome occupancy in Caenorhabditis elegans. Our results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.

  10. Detailed analysis of putative genes encoding small proteins in legume genomes

    Directory of Open Access Journals (Sweden)

    Gabriel Guillén

    2013-06-01

    Diverse plant genome sequencing projects coupled with powerful bioinformatics tools have facilitated massive data analysis to construct specialized databases classified according to cellular function. However, there are still a considerable number of genes encoding proteins whose function has not yet been characterized. Included in this category are small proteins (SPs, 30-150 amino acids) encoded by short open reading frames (sORFs). SPs play important roles in plant physiology, growth, and development. Unfortunately, protocols focused on the genome-wide identification and characterization of sORFs are scarce or remain poorly implemented. As a result, these genes are underrepresented in many genome annotations. In this work, we exploited publicly available genome sequences of Phaseolus vulgaris, Medicago truncatula, Glycine max and Lotus japonicus to analyze the abundance of annotated SPs in plant legumes. Our strategy to uncover bona fide sORFs at the genome level was centered on bioinformatics analysis of characteristics such as evidence of expression (transcription), presence of known protein regions or domains, and identification of orthologous genes in the genomes explored. We collected 6170, 10461, 30521, and 23599 putative sORFs from P. vulgaris, G. max, M. truncatula, and L. japonicus genomes, respectively. Expressed sequence tags (ESTs) available in the DFCI Gene Index database provided evidence that approximately one-third of the predicted legume sORFs are expressed. Most potential SPs have a counterpart in a different plant species and counterpart regions or domains in larger proteins. Potential functional sORFs were also classified according to a reduced set of GO categories, and the expression of 13 of them during P. vulgaris nodule ontogeny was confirmed by qPCR. This analysis provides a collection of sORFs that potentially encode meaningful SPs, and offers the possibility of their further functional evaluation.
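The size filter described in the abstract above (small proteins of 30-150 amino acids encoded by sORFs) can be illustrated with a minimal sketch. This is not the authors' pipeline, which also relied on expression evidence, known domains and orthology. The function name `find_sorfs`, the ATG start codon, the standard stop codons and the forward-strand-only scan are all assumptions made for illustration.

```python
# Hypothetical helper: scan the forward strand of a DNA sequence for short ORFs.
# Assumptions (not from the abstract): ORFs start at ATG, end at TAA/TAG/TGA,
# and only the three forward reading frames are examined.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, min_aa=30, max_aa=150):
    """Return sorted (start, end, aa_length) tuples for forward-strand sORFs.

    start is the 0-based index of the ATG, end is exclusive and includes the
    stop codon, and aa_length counts codons from the ATG up to (not including)
    the stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                aa_length = (i - start) // 3
                if min_aa <= aa_length <= max_aa:
                    hits.append((start, i + 3, aa_length))
                start = None  # close the ORF and look for the next ATG
    return sorted(hits)
```

An ORF that runs off the end of the sequence without a stop codon is ignored in this sketch; a real annotation workflow would also scan the reverse complement.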

  11. Proteins Encoded in Genomic Regions Associated with Immune-Mediated Disease Physically Interact and Suggest Underlying Biology

    DEFF Research Database (Denmark)

    Rossin, Elizabeth J.; Hansen, Kasper Lage; Raychaudhuri, Soumya

    2011-01-01

    Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed ... in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein-protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more ... that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method in 3 non...

  12. Nucleotide sequences of two genomic DNAs encoding peroxidase of Arabidopsis thaliana.

    Science.gov (United States)

    Intapruk, C; Higashimura, N; Yamamoto, K; Okada, N; Shinmyo, A; Takano, M

    1991-02-15

    The peroxidase (EC 1.11.1.7)-encoding gene of Arabidopsis thaliana was screened from a genomic library using a cDNA encoding a neutral isozyme of horseradish, Armoracia rusticana, peroxidase (HRP) as a probe, and two positive clones were isolated. From the comparison with the sequences of the HRP-encoding genes, we concluded that two clones contained peroxidase-encoding genes, and they were named prxCa and prxEa. Both genes consisted of four exons and three introns; the introns had consensus nucleotides, GT and AG, at the 5' and 3' ends, respectively. The lengths of each putative exon of the prxEa gene were the same as those of the HRP-basic-isozyme-encoding gene, prxC3, and coded for 349 amino acids (aa) with a sequence homology of 89% to that encoded by prxC3. The prxCa gene was very close to the HRP-neutral-isozyme-encoding gene, prxC1b, and coded for 354 aa with 91% homology to that encoded by prxC1b. The aa sequence homology was 64% between the two peroxidases encoded by prxCa and prxEa.

  13. Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes

    Science.gov (United States)

    Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat

    2016-11-01

    In this study, bioinformatic analysis of the genome sequence of E. aerogenes was performed to identify genes encoding gelatinase. Enterobacter aerogenes was isolated from hot spring water and is a bacterium whose gelatinase is species-specific to porcine and fish gelatin. This bacterium offers the possibility of producing enzymes that are specific to the gelatin of each of these two species. The Enterobacter aerogenes genome was partially sequenced, resulting in a total sequence size of 5.0 mega base pairs (Mbp). The pre-processing pipeline yielded 87.6 Mbp of total reads, 68.8 Mbp of high-quality reads and a high-quality percentage of 78.58 percent. Genome assembly produced 120 contigs, with 67.5% of contigs over 1 kilo base pair (kbp), an N50 contig length of 124856 bp and a GC content of 55.17%. About 4705 protein-coding genes were identified by protein prediction analysis. The two candidate genes selected had the highest percentage identity to gelatinase enzymes available in the Swiss-Prot and NCBI online databases. They were NODE_9_length_26866_cov_148.013245_12, containing a 1029 base pair (bp) sequence with a 342 amino acid sequence, and NODE_24_length_155103_cov_177.082458_62, containing a 717 bp sequence with a 238 amino acid sequence. Two pairs of primers (forward and reverse) were therefore designed based on the open reading frames (ORFs) of the selected genes. Genome analysis of E. aerogenes thus identified genes encoding gelatinase.
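The assembly statistics quoted in the abstract above (N50 contig length, GC content) follow standard definitions, which a short sketch can make concrete. The function names `n50` and `gc_percent` are hypothetical, and nothing here reproduces the tools the authors actually used. N50 is taken, as is conventional, to be the length of the contig at which the running total of contig lengths, sorted longest first, reaches half of the assembly size.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")


def gc_percent(seq):
    """Return GC content as a percentage of unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt
```

Excluding ambiguous bases such as N from the denominator is a design choice of this sketch; some tools divide by the full sequence length instead, which gives slightly lower values for assemblies with gaps.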

  14. Multiple Whole Genome Alignments Without a Reference Organism

    Energy Technology Data Exchange (ETDEWEB)

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  15. SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata.

    Directory of Open Access Journals (Sweden)

    Benjamin C Hitz

    The Encyclopedia of DNA Elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements, initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source; code and installation instructions can be found at http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ (to store genomic data in the manner of ENCODE). The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data), has been released as a separate Python package.

  16. The mitochondrial gene encoding ribosomal protein S12 has been translocated to the nuclear genome in Oenothera.

    Science.gov (United States)

    Grohmann, L; Brennicke, A; Schuster, W

    1992-01-01

    The Oenothera mitochondrial genome contains only a gene fragment for ribosomal protein S12 (rps12), while other plants encode a functional gene in the mitochondrion. The complete Oenothera rps12 gene is located in the nucleus. The transit sequence necessary to target this protein to the mitochondrion is encoded by a 5'-extension of the open reading frame. Comparison of the amino acid sequence encoded by the nuclear gene with the polypeptides encoded by edited mitochondrial cDNA and genomic sequences of other plants suggests that gene transfer between mitochondrion and nucleus started from edited mitochondrial RNA molecules. Mechanisms and requirements of gene transfer and activation are discussed. PMID:1454526

  17. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

    Science.gov (United States)

    Birney, Ewan; Stamatoyannopoulos, John A; Dutta, Anindya; Guigó, Roderic; Gingeras, Thomas R; Margulies, Elliott H; Weng, Zhiping; Snyder, Michael; Dermitzakis, Emmanouil T; Thurman, Robert E; Kuehn, Michael S; Taylor, Christopher M; Neph, Shane; Koch, Christoph M; Asthana, Saurabh; Malhotra, Ankit; Adzhubei, Ivan; Greenbaum, Jason A; Andrews, Robert M; Flicek, Paul; Boyle, Patrick J; Cao, Hua; Carter, Nigel P; Clelland, Gayle K; Davis, Sean; Day, Nathan; Dhami, Pawandeep; Dillon, Shane C; Dorschner, Michael O; Fiegler, Heike; Giresi, Paul G; Goldy, Jeff; Hawrylycz, Michael; Haydock, Andrew; Humbert, Richard; James, Keith D; Johnson, Brett E; Johnson, Ericka M; Frum, Tristan T; Rosenzweig, Elizabeth R; Karnani, Neerja; Lee, Kirsten; Lefebvre, Gregory C; Navas, Patrick A; Neri, Fidencio; Parker, Stephen C J; Sabo, Peter J; Sandstrom, Richard; Shafer, Anthony; Vetrie, David; Weaver, Molly; Wilcox, Sarah; Yu, Man; Collins, Francis S; Dekker, Job; Lieb, Jason D; Tullius, Thomas D; Crawford, Gregory E; Sunyaev, Shamil; Noble, William S; Dunham, Ian; Denoeud, France; Reymond, Alexandre; Kapranov, Philipp; Rozowsky, Joel; Zheng, Deyou; Castelo, Robert; Frankish, Adam; Harrow, Jennifer; Ghosh, Srinka; Sandelin, Albin; Hofacker, Ivo L; Baertsch, Robert; Keefe, Damian; Dike, Sujit; Cheng, Jill; Hirsch, Heather A; Sekinger, Edward A; Lagarde, Julien; Abril, Josep F; Shahab, Atif; Flamm, Christoph; Fried, Claudia; Hackermüller, Jörg; Hertel, Jana; Lindemeyer, Manja; Missal, Kristin; Tanzer, Andrea; Washietl, Stefan; Korbel, Jan; Emanuelsson, Olof; Pedersen, Jakob S; Holroyd, Nancy; Taylor, Ruth; Swarbreck, David; Matthews, Nicholas; Dickson, Mark C; Thomas, Daryl J; Weirauch, Matthew T; Gilbert, James; Drenkow, Jorg; Bell, Ian; Zhao, XiaoDong; Srinivasan, K G; Sung, Wing-Kin; Ooi, Hong Sain; Chiu, Kuo Ping; Foissac, Sylvain; Alioto, Tyler; Brent, Michael; Pachter, Lior; Tress, Michael L; Valencia, Alfonso; Choo, Siew Woh; Choo, Chiou Yu; Ucla, Catherine; Manzano, Caroline; Wyss, Carine; Cheung, Evelyn; Clark, Taane G; Brown, James B; Ganesh, Madhavan; Patel, Sandeep; Tammana, Hari; Chrast, Jacqueline; Henrichsen, Charlotte N; Kai, Chikatoshi; Kawai, Jun; Nagalakshmi, Ugrappa; Wu, Jiaqian; Lian, Zheng; Lian, Jin; Newburger, Peter; Zhang, Xueqing; Bickel, Peter; Mattick, John S; Carninci, Piero; Hayashizaki, Yoshihide; Weissman, Sherman; Hubbard, Tim; Myers, Richard M; Rogers, Jane; Stadler, Peter F; Lowe, Todd M; Wei, Chia-Lin; Ruan, Yijun; Struhl, Kevin; Gerstein, Mark; Antonarakis, Stylianos E; Fu, Yutao; Green, Eric D; Karaöz, Ulaş; Siepel, Adam; Taylor, James; Liefer, Laura A; Wetterstrand, Kris A; Good, Peter J; Feingold, Elise A; Guyer, Mark S; Cooper, Gregory M; Asimenos, George; Dewey, Colin N; Hou, Minmei; Nikolaev, Sergey; Montoya-Burgos, Juan I; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Huang, Haiyan; Zhang, Nancy R; Holmes, Ian; Mullikin, James C; Ureta-Vidal, Abel; Paten, Benedict; Seringhaus, Michael; Church, Deanna; Rosenbloom, Kate; Kent, W James; Stone, Eric A; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross C; Haussler, David; Miller, Webb; Sidow, Arend; Trinklein, Nathan D; Zhang, Zhengdong D; Barrera, Leah; Stuart, Rhona; King, David C; Ameur, Adam; Enroth, Stefan; Bieda, Mark C; Kim, Jonghwan; Bhinge, Akshay A; Jiang, Nan; Liu, Jun; Yao, Fei; Vega, Vinsensius B; Lee, Charlie W H; Ng, Patrick; Shahab, Atif; Yang, Annie; Moqtaderi, Zarmik; Zhu, Zhou; Xu, Xiaoqin; Squazzo, Sharon; Oberley, Matthew J; Inman, David; Singer, Michael A; Richmond, Todd A; Munn, Kyle J; Rada-Iglesias, Alvaro; Wallerman, Ola; Komorowski, Jan; Fowler, Joanna C; Couttet, Phillippe; Bruce, Alexander W; Dovey, Oliver M; Ellis, Peter D; Langford, Cordelia F; Nix, David A; Euskirchen, Ghia; Hartman, Stephen; Urban, Alexander E; Kraus, Peter; Van Calcar, Sara; Heintzman, Nate; Kim, Tae Hoon; Wang, Kun; Qu, Chunxu; Hon, Gary; Luna, Rosa; Glass, Christopher K; Rosenfeld, M Geoff; Aldred, Shelley Force; Cooper, Sara J; Halees, Anason; Lin, Jane M; Shulha, Hennady P; Zhang, Xiaoling; Xu, Mousheng; Haidar, Jaafar N S; Yu, Yong; Ruan, Yijun; Iyer, Vishwanath R; Green, Roland D; Wadelius, Claes; Farnham, Peggy J; Ren, Bing; Harte, Rachel A; Hinrichs, Angie S; Trumbower, Heather; Clawson, Hiram; Hillman-Jackson, Jennifer; Zweig, Ann S; Smith, Kayla; Thakkapallayil, Archana; Barber, Galt; Kuhn, Robert M; Karolchik, Donna; Armengol, Lluis; Bird, Christine P; de Bakker, Paul I W; Kern, Andrew D; Lopez-Bigas, Nuria; Martin, Joel D; Stranger, Barbara E; Woodroffe, Abigail; Davydov, Eugene; Dimas, Antigone; Eyras, Eduardo; Hallgrímsdóttir, Ingileif B; Huppert, Julian; Zody, Michael C; Abecasis, Gonçalo R; Estivill, Xavier; Bouffard, Gerard G; Guan, Xiaobin; Hansen, Nancy F; Idol, Jacquelyn R; Maduro, Valerie V B; Maskeri, Baishali; McDowell, Jennifer C; Park, Morgan; Thomas, Pamela J; Young, Alice C; Blakesley, Robert W; Muzny, Donna M; Sodergren, Erica; Wheeler, David A; Worley, Kim C; Jiang, Huaiyang; Weinstock, George M; Gibbs, Richard A; Graves, Tina; Fulton, Robert; Mardis, Elaine R; Wilson, Richard K; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B; Chang, Jean L; Lindblad-Toh, Kerstin; Lander, Eric S; Koriabine, Maxim; Nefedov, Mikhail; Osoegawa, Kazutoyo; Yoshinaga, Yuko; Zhu, Baoli; de Jong, Pieter J

    2007-06-14

    We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

  18. Exploratory re-encoding of Yellow Fever Virus genome: new insights for the design of live-attenuated viruses

    OpenAIRE

    Klitting, Raphaelle; Riziki, Toilhata; Moureau, Gregory; De Lamballerie, Xavier; Piorkowski, Geraldine

    2018-01-01

    Virus attenuation by genome re-encoding is a pioneering approach for generating live-attenuated vaccine candidates. Its core principle is to introduce a large number of slightly deleterious synonymous mutations into the viral genome to produce a stable attenuation of the targeted virus. The large number of mutations introduced is supposed to guarantee the stability of the attenuated phenotype by lowering the risks of reversion and recombination for re-encoded sequences. In this prospect, iden...

  19. EasyCloneMulti: A Set of Vectors for Simultaneous and Multiple Genomic Integrations in Saccharomyces cerevisiae

    DEFF Research Database (Denmark)

    Maury, Jerome; Germann, Susanne Manuela; Jacobsen, Simo Abdessamad

    2016-01-01

    Saccharomyces cerevisiae is widely used in the biotechnology industry for production of ethanol, recombinant proteins, food ingredients and other chemicals. In order to generate highly producing and stable strains, genome integration of genes encoding metabolic pathway enzymes is the preferred...... of integrative vectors, EasyCloneMulti, that enables multiple and simultaneous integration of genes in S. cerevisiae. By creating vector backbones that combine consensus sequences that aim at targeting subsets of Ty sequences and a quickly degrading selective marker, integrations at multiple genomic loci...... and a range of expression levels were obtained, as assessed with the green fluorescent protein (GFP) reporter system. The EasyCloneMulti vector set was applied to balance the expression of the rate-controlling step in the β-alanine pathway for biosynthesis of 3-hydroxypropionic acid (3HP). The best 3HP...

  20. Structured RNAs in the ENCODE selected regions of the human genome

    DEFF Research Database (Denmark)

    Washietl, Stefan; Pedersen, Jakob Skou; Korbel, Jan O

    2007-01-01

    Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack...... with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70% many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz...

  1. Current View on Phytoplasma Genomes and Encoded Metabolism

    Directory of Open Access Journals (Sweden)

    Michael Kube

    2012-01-01

    Phytoplasmas are specialised bacteria that are obligate parasites of plant phloem tissue and insects. These bacteria have resisted all attempts at cell-free cultivation. Genome research is of particular importance to analyse the genetic endowment of such bacteria. Here we review the gene content of the four completely sequenced ‘Candidatus Phytoplasma’ genomes that include those of ‘Ca. P. asteris’ strains OY-M and AY-WB, ‘Ca. P. australiense,’ and ‘Ca. P. mali’. These genomes are characterized by chromosome condensation resulting in sizes below 900 kb and a G + C content of less than 28%. Evolutionary adaptation of the phytoplasmas to nutrient-rich environments resulted in losses of genetic modules and increased host dependency, highlighted by the transport systems and limited metabolic repertoire. On the other hand, duplication and integration events enlarged the chromosomes and contribute to genome instability. Present differences in the content of membrane and secreted proteins reflect the host adaptation in the phytoplasma strains. General differences are obvious between different phylogenetic subgroups. ‘Ca. P. mali’ is separated from the other strains by its deviating chromosome organization, the genetic repertoire for recombination and excision repair of nucleotides, and the loss of the complete energy-yielding part of glycolysis. Apart from these differences, comparative analysis indicated that all four phytoplasmas are likely to encode an alternative pathway to generate pyruvate and ATP.

  2. Genomics and physiology of a marine flavobacterium encoding a proteorhodopsin and a xanthorhodopsin-like protein.

    Directory of Open Access Journals (Sweden)

    Thomas Riedel

    Proteorhodopsin (PR) photoheterotrophy in the marine flavobacterium Dokdonia sp. PRO95 has previously been investigated, showing no growth stimulation in the light at intermediate carbon concentrations. Here we report the genome sequence of strain PRO95 and compare it to two other PR-encoding Dokdonia genomes: that of strain 4H-3-7-5, which shows the most similar genome, and that of strain MED134, which grows better in the light under oligotrophic conditions. Our genome analysis revealed that the PRO95 genome as well as the 4H-3-7-5 genome encode a protein related to xanthorhodopsins. The genomic environment and phylogenetic distribution of this gene suggest that it may have frequently been recruited by lateral gene transfer. Expression analyses by RT-PCR and direct mRNA-sequencing showed that both rhodopsins and the complete β-carotene pathway necessary for retinal production are transcribed in PRO95. Proton translocation measurements showed enhanced proton pump activity in response to light, supporting that one or both rhodopsins are functional. Genomic information and carbon source respiration data were used to develop a defined cultivation medium for PRO95, but reproducible growth always required small amounts of yeast extract. Although PRO95 contains and expresses two rhodopsin genes, light did not stimulate its growth as determined by cell numbers in a nutrient-poor seawater medium that mimics its natural environment, confirming previous experiments at intermediate carbon concentrations. Starvation or stress conditions might be needed to observe the physiological effect of light-induced energy acquisition.

  3. Murasaki: a fast, parallelizable algorithm to find anchors from multiple genomes.

    Directory of Open Access Journals (Sweden)

    Kris Popendorf

    BACKGROUND: With the number of available genome sequences increasing rapidly, the magnitude of sequence data required for multiple-genome analyses is a challenging problem. When large-scale rearrangements break the collinearity of gene orders among genomes, genome comparison algorithms must first identify sets of short well-conserved sequences present in each genome, termed anchors. Previously, anchor identification among multiple genomes has been achieved using pairwise alignment tools like BLASTZ through progressive alignment tools like TBA, but the computational requirements for sequence comparisons of multiple genomes quickly become a limiting factor as the number and scale of genomes grows. METHODOLOGY/PRINCIPAL FINDINGS: Our algorithm, named Murasaki, makes it possible to identify anchors within multiple large sequences on the scale of several hundred megabases in a few minutes using a single CPU. Two advanced features of Murasaki are (1) adaptive hash function generation, which enables efficient use of arbitrary mismatch patterns (spaced seeds) and therefore the comparison of multiple mammalian genomes in a practical amount of computation time, and (2) parallelizable execution that decreases the required wall-clock and CPU times. Murasaki can perform a sensitive anchoring of eight mammalian genomes (human, chimp, rhesus, orangutan, mouse, rat, dog, and cow) in 21 hours CPU time (42 minutes wall time). This is the first single-pass in-core anchoring of multiple mammalian genomes. We evaluated Murasaki by comparing it with the genome alignment programs BLASTZ and TBA. We show that Murasaki can anchor multiple genomes in near linear time, compared to the quadratic time requirements of BLASTZ and TBA, while improving overall accuracy. CONCLUSIONS/SIGNIFICANCE: Murasaki provides an open source platform to take advantage of long patterns, cluster computing, and novel hash algorithms to produce accurate anchors across multiple genomes with

  4. GenPlay Multi-Genome, a tool to compare and analyze multiple human genomes in a graphical interface.

    Science.gov (United States)

    Lajugie, Julien; Fourel, Nicolas; Bouhassira, Eric E

    2015-01-01

    Parallel visualization of multiple individual human genomes is a complex endeavor that is rapidly gaining importance with the increasing number of personal, phased and cancer genomes that are being generated. It requires the display of variants such as SNPs, indels and structural variants that are unique to specific genomes and the introduction of multiple overlapping gaps in the reference sequence. Here, we describe GenPlay Multi-Genome, an application specifically written to visualize and analyze multiple human genomes in parallel. GenPlay Multi-Genome is ideally suited for the comparison of allele-specific expression and functional genomic data obtained from multiple phased genomes in a graphical interface with access to multiple-track operation. It also allows the analysis of data that have been aligned to custom genomes rather than to a standard reference and can be used as a variant calling format file browser and as a tool to compare different genome assemblies, such as hg19 and hg38. GenPlay is available under the GNU public license (GPL-3) from http://genplay.einstein.yu.edu. The source code is available at https://github.com/JulienLajugie/GenPlay. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  5. Comparative genomics of multidrug resistance-encoding IncA/C plasmids from commensal and pathogenic Escherichia coli from multiple animal sources.

    Science.gov (United States)

    Fernández-Alarcón, Claudia; Singer, Randall S; Johnson, Timothy J

    2011-01-01

    Incompatibility group A/C (IncA/C) plasmids have received recent attention for their broad host range and ability to confer resistance to multiple antimicrobial agents. Due to the potential spread of multidrug resistance (MDR) phenotypes from foodborne pathogens to human pathogens, the dissemination of these plasmids represents a public health risk. In this study, four animal-source IncA/C plasmids isolated from Escherichia coli were sequenced and analyzed, including isolates from commercial dairy cows, pigs and turkeys in the U.S. and Chile. These plasmids were initially selected because they either contained the floR and tetA genes encoding for florfenicol and tetracycline resistance, respectively, and/or the bla(CMY-2) gene encoding for extended spectrum β-lactamase resistance. Overall, sequence analysis revealed that each of the four plasmids retained a remarkably stable and conserved backbone sequence, with differences observed primarily within their accessory regions, which presumably have evolved via horizontal gene transfer events involving multiple modules. Comparison of these plasmids with other available IncA/C plasmid sequences further defined the core and accessory elements of these plasmids in E. coli and Salmonella. Our results suggest that the bla(CMY-2) plasmid lineage appears to have derived from an ancestral IncA/C plasmid type harboring floR-tetAR-strAB and Tn21-like accessory modules. Evidence is mounting that IncA/C plasmids are widespread among enteric bacteria of production animals and these emergent plasmids have flexibility in their acquisition of MDR-encoding modules, necessitating further study to understand the evolutionary mechanisms involved in their dissemination and stability in bacterial populations.

  6. Comparative genomics of multidrug resistance-encoding IncA/C plasmids from commensal and pathogenic Escherichia coli from multiple animal sources.

    Directory of Open Access Journals (Sweden)

    Claudia Fernández-Alarcón

    Incompatibility group A/C (IncA/C) plasmids have received recent attention for their broad host range and ability to confer resistance to multiple antimicrobial agents. Due to the potential spread of multidrug resistance (MDR) phenotypes from foodborne pathogens to human pathogens, the dissemination of these plasmids represents a public health risk. In this study, four animal-source IncA/C plasmids isolated from Escherichia coli were sequenced and analyzed, including isolates from commercial dairy cows, pigs and turkeys in the U.S. and Chile. These plasmids were initially selected because they either contained the floR and tetA genes encoding for florfenicol and tetracycline resistance, respectively, and/or the bla(CMY-2) gene encoding for extended spectrum β-lactamase resistance. Overall, sequence analysis revealed that each of the four plasmids retained a remarkably stable and conserved backbone sequence, with differences observed primarily within their accessory regions, which presumably have evolved via horizontal gene transfer events involving multiple modules. Comparison of these plasmids with other available IncA/C plasmid sequences further defined the core and accessory elements of these plasmids in E. coli and Salmonella. Our results suggest that the bla(CMY-2) plasmid lineage appears to have derived from an ancestral IncA/C plasmid type harboring floR-tetAR-strAB and Tn21-like accessory modules. Evidence is mounting that IncA/C plasmids are widespread among enteric bacteria of production animals and these emergent plasmids have flexibility in their acquisition of MDR-encoding modules, necessitating further study to understand the evolutionary mechanisms involved in their dissemination and stability in bacterial populations.

  7. Genomes of ubiquitous marine and hypersaline Hydrogenovibrio, Thiomicrorhabdus and Thiomicrospira spp. encode a diversity of mechanisms to sustain chemolithoautotrophy in heterogeneous environments.

    Science.gov (United States)

    Scott, Kathleen M; Williams, John; Porter, Cody M B; Russel, Sydney; Harmer, Tara L; Paul, John H; Antonen, Kirsten M; Bridges, Megan K; Camper, Gary J; Campla, Christie K; Casella, Leila G; Chase, Eva; Conrad, James W; Cruz, Mercedez C; Dunlap, Darren S; Duran, Laura; Fahsbender, Elizabeth M; Goldsmith, Dawn B; Keeley, Ryan F; Kondoff, Matthew R; Kussy, Breanna I; Lane, Marannda K; Lawler, Stephanie; Leigh, Brittany A; Lewis, Courtney; Lostal, Lygia M; Marking, Devon; Mancera, Paola A; McClenthan, Evan C; McIntyre, Emily A; Mine, Jessica A; Modi, Swapnil; Moore, Brittney D; Morgan, William A; Nelson, Kaleigh M; Nguyen, Kimmy N; Ogburn, Nicholas; Parrino, David G; Pedapudi, Anangamanjari D; Pelham, Rebecca P; Preece, Amanda M; Rampersad, Elizabeth A; Richardson, Jason C; Rodgers, Christina M; Schaffer, Brent L; Sheridan, Nancy E; Solone, Michael R; Staley, Zachery R; Tabuchi, Maki; Waide, Ramond J; Wanjugi, Pauline W; Young, Suzanne; Clum, Alicia; Daum, Chris; Huntemann, Marcel; Ivanova, Natalia; Kyrpides, Nikos; Mikhailova, Natalia; Palaniappan, Krishnaveni; Pillay, Manoj; Reddy, T B K; Shapiro, Nicole; Stamatis, Dimitrios; Varghese, Neha; Woyke, Tanja; Boden, Rich; Freyermuth, Sharyn K; Kerfeld, Cheryl A

    2018-03-09

    Chemolithoautotrophic bacteria from the genera Hydrogenovibrio, Thiomicrorhabdus and Thiomicrospira are common, sometimes dominant, isolates from sulfidic habitats including hydrothermal vents, soda and salt lakes and marine sediments. Their genome sequences confirm their membership in a deeply branching clade of the Gammaproteobacteria. Several adaptations to heterogeneous habitats are apparent. Their genomes include large numbers of genes for sensing and responding to their environment (EAL- and GGDEF-domain proteins and methyl-accepting chemotaxis proteins) despite their small sizes (2.1-3.1 Mbp). An array of sulfur-oxidizing complexes is encoded, likely to facilitate these organisms' use of multiple forms of reduced sulfur as electron donors. Hydrogenase genes are present in some taxa, including group 1d and 2b hydrogenases in Hydrogenovibrio marinus and H. thermophilus MA2-6, acquired via horizontal gene transfer. In addition to high-affinity cbb3 cytochrome c oxidase, some also encode cytochrome bd-type quinol oxidase or ba3-type cytochrome c oxidase, which could facilitate growth under different oxygen tensions, or maintain redox balance. Carboxysome operons are present in most, with genes downstream encoding transporters from four evolutionarily distinct families, which may act with the carboxysomes to form CO2-concentrating mechanisms. These adaptations to habitat variability likely contribute to the cosmopolitan distribution of these organisms. © 2018 Society for Applied Microbiology and John Wiley & Sons Ltd.

  8. On the Immortality of Television Sets: "Function" in the Human Genome According to the Evolution-Free Gospel of ENCODE

    OpenAIRE

    Graur, Dan; Zheng, Yichen; Price, Nicholas; Azevedo, Ricardo B.R.; Zufall, Rebecca A.; Elhaik, Eran

    2013-01-01

    A recent slew of ENCyclopedia Of DNA Elements (ENCODE) Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is less than 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selec...

  9. Designs of Optoelectronic Trinary Signed-Digit Multiplication by use of Joint Spatial Encodings and Optical Correlation

    Science.gov (United States)

    Cherri, Abdallah K.

    1999-02-01

    Trinary signed-digit (TSD) symbolic-substitution-based (SS-based) optical adders, which were recently proposed, are used as the basic modules for designing highly parallel optical multiplications by use of cascaded optical correlators. The proposed multiplications perform carry-free generation of the multiplication partial products of two words in constant time. Also, three different multiplication designs are presented, and new joint spatial encodings for the TSD numbers are introduced. The proposed joint spatial encodings allow one to reduce the SS computation rules involved in optical multiplication. In addition, the proposed joint spatial encodings increase the space bandwidth product of the spatial light modulators of the optical system. This increase is achieved by reduction of the numbers of pixels in the joint spatial encodings for the input TSD operands as well as reduction of the number of pixels used in the proposed matched spatial filters for the optical multipliers.

  10. Toward a Better Compression for DNA Sequences Using Huffman Encoding.

    Science.gov (United States)

    Al-Okaily, Anas; Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

    2017-04-01

    Due to the significant amount of DNA data being generated by next-generation sequencing machines for genomes of lengths ranging from megabases to gigabases, there is an increasing need to compress such data so that they occupy less space and can be transmitted faster. Different implementations of Huffman encoding that incorporate the characteristics of DNA sequences prove to compress DNA data better. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as the construction of multiple Huffman trees when encoding. The implementations demonstrate improvements in the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all DNA sequence compression algorithms that use the conventional Huffman encoding. Accompanying software is publicly available (AL-Okaily, 2016).
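
    For orientation, the "standard Huffman tree algorithm" that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration only, not the authors' software: the function names (huffman_code, encode, decode) and the toy sequence are assumptions, and the repeat-selection and multiple-tree variants described in the abstract are not implemented here.

```python
import heapq
from collections import Counter


def huffman_code(seq):
    """Build a standard Huffman code {symbol: bitstring} for the symbols in seq."""
    freq = Counter(seq)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreaker, {symbol: partial code}).
    # The unique integer tiebreaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # lightest subtree
        w2, _, c2 = heapq.heappop(heap)  # second lightest subtree
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(seq, code):
    """Concatenate the codewords of every symbol in seq."""
    return "".join(code[s] for s in seq)


def decode(bits, code):
    """Invert encode(); valid because Huffman codes are prefix-free."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    dna = "AAAAAAAACCCCGGT"  # hypothetical, deliberately skewed toy sequence
    code = huffman_code(dna)
    bits = encode(dna, code)
    print(code, len(bits), "bits versus", 2 * len(dna), "for a fixed 2-bit code")
```

    On a sequence with a skewed base composition the frequent symbol receives a short codeword, which is where the saving over a fixed 2-bit-per-base code comes from; with near-uniform base frequencies the gain largely disappears.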

  11. The number of genes encoding repeat domain-containing proteins positively correlates with genome size in amoebal giant viruses

    Science.gov (United States)

    Shukla, Avi; Chatterjee, Anirvan

    2018-01-01

    Curiously, in viruses, the virion volume appears to be predominantly driven by genome length rather than the number of proteins it encodes or geometric constraints. With their large genome and giant particle size, amoebal viruses (AVs) are ideally suited to study the relationship between genome and virion size and explore the role of genome plasticity in their evolutionary success. Different genomic regions of AVs exhibit distinct genealogies. Although the vertically transferred core genes and their functions are universally conserved across the nucleocytoplasmic large DNA virus (NCLDV) families and are essential for their replication, the horizontally acquired genes are variable across families and are lineage-specific. When compared with other giant virus families, we observed a near-linear increase in the number of genes encoding repeat domain-containing proteins (RDCPs) with the increase in the genome size of AVs. From what is known about the functions of RDCPs in bacteria and eukaryotes and their prevalence in the AV genomes, we envisage important roles for RDCPs in the life cycle of AVs, their genome expansion, and plasticity. This observation also supports the evolution of AVs from a smaller viral ancestor by the acquisition of diverse gene families from the environment including RDCPs that might have helped in host adaptation. PMID:29308275

  12. Comparative genomic analysis uncovers 3 novel loci encoding type six secretion systems differentially distributed in Salmonella serotypes

    Directory of Open Access Journals (Sweden)

    Santiviago Carlos A

    2009-08-01

    Background: The recently described Type VI Secretion System (T6SS) represents a new paradigm of protein secretion in bacteria. A number of bioinformatic studies have been conducted to identify T6SS gene clusters in the available bacterial genome sequences. According to these studies, Salmonella harbors a unique T6SS encoded in the Salmonella Pathogenicity Island 6 (SPI-6). Since these studies only considered a few Salmonella genomes, the present work aimed to identify novel T6SS loci by in silico analysis of every genome sequence of Salmonella available. Results: The analysis of sequencing data from 44 completed or in progress Salmonella genome projects allowed the identification of 3 novel T6SS loci. These clusters are located in differentially-distributed genomic islands we designated SPI-19, SPI-20 and SPI-21, respectively. SPI-19 was identified in a subset of S. enterica serotypes including Dublin, Weltevreden, Agona, Gallinarum and Enteritidis. In the latter, an internal deletion eliminated most of the island. On the other hand, SPI-20 and SPI-21 were restricted to S. enterica subspecies arizonae (IIIa) serotype 62:z4,z23:-. Remarkably, SPI-21 encodes a VgrG protein containing a C-terminal extension similar to S-type pyocins of Pseudomonas aeruginosa. This is not only the first evolved VgrG described in Salmonella, but also the first evolved VgrG including a pyocin domain described so far in the literature. In addition, the data indicate that SPI-6 T6SS is widely distributed in S. enterica and absent in serotypes Enteritidis, Gallinarum, Agona, Javiana, Paratyphi B, Virchow, IIIa 62:z4,z23:- and IIIb 61:1,v:1,5,(7). Interestingly, while some serotypes harbor multiple T6SS (Dublin, Weltevreden and IIIa 62:z4,z23:-), others do not encode any (Enteritidis, Paratyphi B, Javiana, Virchow and IIIb 61:1,v:1,5,(7)). Comparative and phylogenetic analyses indicate that the 4 T6SS loci in Salmonella have a distinct evolutionary history. Finally, we

  13. PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Wasnick Michael

    2008-03-01

    Background: The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results: PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion: PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any

  14. Molecular characterization of genome segments 1 and 3 encoding two capsid proteins of Antheraea mylitta cytoplasmic polyhedrosis virus

    Directory of Open Access Journals (Sweden)

    Chakrabarti Mrinmay

    2010-08-01

    Background: Antheraea mylitta cytoplasmic polyhedrosis virus (AmCPV), a cypovirus of the Reoviridae family, infects the Indian non-mulberry silkworm, Antheraea mylitta, and contains 11 segmented double-stranded RNAs (S1-S11) in its genome. Some of its genome segments (S2 and S6-S11) have been previously characterized, but genome segments encoding the viral capsid have not been characterized. Results: In this study, genome segments 1 (S1) and 3 (S3) of AmCPV were converted to cDNA, cloned and sequenced. S1 consisted of 3852 nucleotides, with one long ORF of 3735 nucleotides, and could encode a protein of 1245 amino acids with a molecular mass of ~141 kDa. Similarly, S3 consisted of 3784 nucleotides, with a long ORF of 3630 nucleotides, and could encode a protein of 1210 amino acids with a molecular mass of ~137 kDa. BLAST analysis showed 20-22% homology of the S1 and S3 sequences with spike and capsid proteins, respectively, of other closely related cypoviruses such as Bombyx mori CPV (BmCPV), Lymantria dispar CPV (LdCPV), and Dendrolimus punctatus CPV (DpCPV). The ORFs of S1 and S3 were expressed as 141 kDa and 137 kDa insoluble His-tagged fusion proteins, respectively, in Escherichia coli M15 cells via the pQE-30 vector and purified through Ni-NTA chromatography, and polyclonal antibodies were raised. Immunoblot analysis of purified polyhedra, virion particles and virus-infected mid-gut cells with the raised anti-p137 and anti-p141 antibodies showed specific immunoreactive bands and suggests that S1 and S3 may code for viral structural proteins. Expression of the S1 and S3 ORFs in insect cells via baculovirus recombinants was shown by transmission electron microscopy to produce virus-like particles (VLPs). Immunogold staining showed that S3-encoded proteins self-assembled to form the viral outer capsid and that VLPs maintained their stability at different pH in the presence of the S1-encoded protein. Conclusion: Our results of cloning, sequencing and functional analysis of AmCPV S1 and S3 indicate that S3

  15. Origin of multiple periodicities in the Fourier power spectra of the Plasmodium falciparum genome

    Directory of Open Access Journals (Sweden)

    Nunes Miriam CS

    2011-12-01

    Background: Fourier transforms and their associated power spectra are used for detecting periodicities and protein-coding genes, and the approach is generally regarded as a well-established technique. Many of the periodicities that have been found with this method are quite well understood, such as the periodicity of 3 nt, which is associated with codon usage. But what is the origin of the peculiar frequency multiples k/21 that were reported for a tiny section of chromosome 2 in P. falciparum? Are these present in other chromosomes and perhaps in related organisms? And how should we interpret fractional periodicities in genomes? Results: We applied the binary indicator power spectrum to all chromosomes of P. falciparum and found that the frequency overtones k/21 are present only in non-coding sections. We did not find such frequency overtones in any other related genomes. Furthermore, the frequency overtones were identified as artifacts of the way the genome is encoded into a numerical sequence; that is, they are frequency aliases. When a different way of encoding the sequence is chosen, the overtones do not appear. In view of these results, we revisited early applications of this technique to proteins where frequency overtones were reported. Conclusions: Some authors hinted recently at the possibility of mapping artifacts and frequency aliases in power spectra. However, in the case of P. falciparum the frequency aliases are particularly strong and can mask the 1/3 frequency that is used for gene detection. This shows that, although this is a well-known technique with a long history of application in proteins, few researchers seem to be aware of the problems represented by frequency aliases.
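
    As a rough illustration of the binary indicator power spectrum named in this abstract, the sketch below maps a DNA string to four 0/1 indicator sequences (one per base) and sums their Fourier power at a chosen frequency. It assumes the common textbook form of the method; the function name indicator_power and the toy sequence are hypothetical, and nothing here reproduces the authors' analysis or the k/21 aliases they describe.

```python
import cmath


def indicator_power(seq, freq):
    """Summed Fourier power of the four binary indicator sequences of seq.

    freq is in cycles per nucleotide, so freq = 1/3 probes the 3-nt
    periodicity usually associated with codon usage.
    """
    total = 0.0
    for base in "ACGT":
        # The indicator is 1 where seq has `base` and 0 elsewhere, so only
        # the positions holding that base contribute to the transform.
        amplitude = sum(
            cmath.exp(-2j * cmath.pi * freq * n)
            for n, symbol in enumerate(seq)
            if symbol == base
        )
        total += abs(amplitude) ** 2
    return total


if __name__ == "__main__":
    toy = "ATG" * 50  # hypothetical, perfectly 3-periodic sequence
    print(indicator_power(toy, 1 / 3))  # strong peak at the period-3 frequency
    print(indicator_power(toy, 1 / 5))  # essentially no power off-peak
```

    Because the numerical mapping of bases is part of the estimator, a different encoding of the same sequence can yield a different spectrum, which is the sensitivity the abstract identifies as the source of the aliases.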

  16. De novo prediction of human chromosome structures: Epigenetic marking patterns encode genome architecture.

    Science.gov (United States)

    Di Pierro, Michele; Cheng, Ryan R; Lieberman Aiden, Erez; Wolynes, Peter G; Onuchic, José N

    2017-11-14

    Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible. Copyright © 2017 the Author(s). Published by PNAS.

  17. Proteins Encoded in Genomic Regions Associated with Immune-Mediated Disease Physically Interact and Suggest Underlying Biology

    Science.gov (United States)

    Rossin, Elizabeth J.; Lage, Kasper; Raychaudhuri, Soumya; Xavier, Ramnik J.; Tatar, Diana; Benita, Yair

    2011-01-01

    Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed by these risk variants. It has previously been observed that different genes harboring causal mutations for the same Mendelian disease often physically interact. We sought to evaluate the degree to which this is true of genes within strongly associated loci in complex disease. Using sets of loci defined in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein–protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more densely connected than chance expectation. To confirm biological relevance, we show that the components of the networks tend to be expressed in similar tissues relevant to the phenotypes in question, suggesting the network indicates common underlying processes perturbed by risk loci. Furthermore, we show that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method in 3 non-immune traits to assess its applicability to complex traits in general. We find that genes in loci associated to height and lipid levels assemble into significantly connected networks but did not detect excess connectivity among Type 2 Diabetes (T2D) loci beyond chance. Taken together, our results constitute evidence that, for many of the complex diseases studied here, common genetic associations implicate regions encoding proteins that physically interact in a preferential manner, in

  18. Molecular evolution of avian reovirus: evidence for genetic diversity and reassortment of the S-class genome segments and multiple cocirculating lineages

    International Nuclear Information System (INIS)

    Liu, Hung J.; Lee, Long H.; Hsu, Hsiao W.; Kuo, Liam C.; Liao, Ming H.

    2003-01-01

    Nucleotide sequences of the S-class genome segments of 17 field-isolates and vaccine strains of avian reovirus (ARV) isolated over a 23-year period from different hosts, pathotypes, and geographic locations were examined and analyzed to define phylogenetic profiles and evolutionary mechanisms. The S1 genome segment showed noticeably higher divergence than the other S-class genes. The σC-encoding gene has evolved into six distinct lineages. In contrast, the other S-class genes showed less divergence than the σC-encoding gene and have each evolved into two to three major distinct lineages. Comparative sequence analysis provided evidence indicating extensive sequence divergence between ARV and other orthoreoviruses. The evolutionary trees of each gene were distinct, suggesting that these genes evolve in an independent manner. Furthermore, variable topologies were the result of frequent genetic reassortment among multiple cocirculating lineages. Results showed genetic diversity correlated more closely with date of isolation and geographic sites than with host species and pathotypes. This is the first evidence demonstrating genetic variability among circulating ARVs through a combination of evolutionary mechanisms involving multiple cocirculating lineages and genetic reassortment. The evolutionary rates and patterns of base substitutions were examined. The evolutionary rate for the σC-encoding gene and σC protein was higher than for the other S-class genes and other families of viruses. With the exception of the σC-encoding gene, in which nonsynonymous substitutions predominate over synonymous ones, the evolutionary process of the other S-class genes can be explained by the neutral theory of molecular evolution. Results revealed that synonymous substitutions predominate over nonsynonymous in the S-class genes, even though genetic diversity and substitution rates vary among the viruses

  19. Coevolution between Nuclear-Encoded DNA Replication, Recombination, and Repair Genes and Plastid Genome Complexity.

    Science.gov (United States)

    Zhang, Jin; Ruhlman, Tracey A; Sabir, Jamal S M; Blazier, John Chris; Weng, Mao-Lun; Park, Seongjun; Jansen, Robert K

    2016-02-17

    Disruption of DNA replication, recombination, and repair (DNA-RRR) systems has been hypothesized to cause highly elevated nucleotide substitution rates and genome rearrangements in the plastids of angiosperms, but this theory remains untested. To investigate nuclear-plastid genome (plastome) coevolution in Geraniaceae, four different measures of plastome complexity (rearrangements, repeats, nucleotide insertions/deletions, and substitution rates) were evaluated along with substitution rates of 12 nuclear-encoded, plastid-targeted DNA-RRR genes from 27 Geraniales species. Significant correlations were detected for nonsynonymous (dN) but not synonymous (dS) substitution rates for three DNA-RRR genes (uvrB/C, why1, and gyrA) supporting a role for these genes in accelerated plastid genome evolution in Geraniaceae. Furthermore, correlation between dN of uvrB/C and plastome complexity suggests the presence of nucleotide excision repair system in plastids. Significant correlations were also detected between plastome complexity and 13 of the 90 nuclear-encoded organelle-targeted genes investigated. Comparisons revealed significant acceleration of dN in plastid-targeted genes of Geraniales relative to Brassicales suggesting this correlation may be an artifact of elevated rates in this gene set in Geraniaceae. Correlation between dN of plastid-targeted DNA-RRR genes and plastome complexity supports the hypothesis that the aberrant patterns in angiosperm plastome evolution could be caused by dysfunction in DNA-RRR systems. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  20. Multiple Regions of a Cortical Network Commonly Encode the Meaning of Words in Multiple Grammatical Positions of Read Sentences.

    Science.gov (United States)

    Anderson, Andrew James; Lalor, Edmund C; Lin, Feng; Binder, Jeffrey R; Fernandino, Leonardo; Humphries, Colin J; Conant, Lisa L; Raizada, Rajeev D S; Grimm, Scott; Wang, Xixi

    2018-05-16

    Deciphering how sentence meaning is represented in the brain remains a major challenge to science. Semantically related neural activity has recently been shown to arise concurrently in distributed brain regions as successive words in a sentence are read. However, what semantic content is represented by different regions, what is common across them, and how this relates to words in different grammatical positions of sentences is weakly understood. To address these questions, we apply a semantic model of word meaning to interpret brain activation patterns elicited in sentence reading. The model is based on human ratings of 65 sensory/motor/emotional and cognitive features of experience with words (and their referents). Through a process of mapping functional Magnetic Resonance Imaging activation back into model space we test: which brain regions semantically encode content words in different grammatical positions (e.g., subject/verb/object); and what semantic features are encoded by different regions. In left temporal, inferior parietal, and inferior/superior frontal regions we detect the semantic encoding of words in all grammatical positions tested and reveal multiple common components of semantic representation. This suggests that sentence comprehension involves a common core representation of multiple words' meaning being encoded in a network of regions distributed across the brain.

  1. Multiple-Trait Genomic Selection Methods Increase Genetic Value Prediction Accuracy

    Science.gov (United States)

    Jia, Yi; Jannink, Jean-Luc

    2012-01-01

    Genetic correlations between quantitative traits measured in many breeding programs are pervasive. These correlations indicate that measurements of one trait carry information on other traits. Current single-trait (univariate) genomic selection does not take advantage of this information. Multivariate genomic selection on multiple traits could accomplish this but has been little explored and tested in practical breeding programs. In this study, three multivariate linear models (i.e., GBLUP, BayesA, and BayesCπ) were presented and compared to univariate models using simulated and real quantitative traits controlled by different genetic architectures. We also extended BayesA with fixed hyperparameters to a full hierarchical model that estimated hyperparameters and BayesCπ to impute missing phenotypes. We found that optimal marker-effect variance priors depended on the genetic architecture of the trait so that estimating them was beneficial. We showed that the prediction accuracy for a low-heritability trait could be significantly increased by multivariate genomic selection when a correlated high-heritability trait was available. Further, multiple-trait genomic selection had higher prediction accuracy than single-trait genomic selection when phenotypes are not available on all individuals and traits. Additional factors affecting the performance of multiple-trait genomic selection were explored. PMID:23086217

  2. Simultaneous gene finding in multiple genomes.

    Science.gov (United States)

    König, Stefanie; Romoth, Lars W; Gerischer, Lizzy; Stanke, Mario

    2016-11-15

    As the tree of life is populated with sequenced genomes ever more densely, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other or, if not, where the exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP-hard, but can be efficiently approximated using a subgradient-based dual decomposition approach. The proposed method was tested on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that our method is well-suited for annotation of (a large number of) genomes of closely related species within a clade, in particular, when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances. The method is implemented in C++ as part of Augustus and available open source at http://bioinf.uni-greifswald.de/augustus/. Contact: stefaniekoenig@ymail.com or mario.stanke@uni-greifswald.de. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  3. The Nostoc punctiforme Genome

    Energy Technology Data Exchange (ETDEWEB)

    John C. Meeks

    2001-12-31

    Nostoc punctiforme is a filamentous cyanobacterium with extensive phenotypic characteristics and a relatively large genome, approaching 10 Mb. The phenotypic characteristics include a photoautotrophic, diazotrophic mode of growth, but N. punctiforme is also facultatively heterotrophic; its vegetative cells have multiple development alternatives, including terminal differentiation into nitrogen-fixing heterocysts and transient differentiation into spore-like akinetes or motile filaments called hormogonia; and N. punctiforme has broad symbiotic competence with fungi and terrestrial plants, including bryophytes, gymnosperms and an angiosperm. The shotgun-sequencing phase of the N. punctiforme strain ATCC 29133 genome has been completed by the Joint Genome Institute. Annotation of an 8.9 Mb database yielded 7432 open reading frames, 45% of which encode proteins with known or probable known function and 29% of which are unique to N. punctiforme. Comparative analysis of the sequence indicates a genome that is highly plastic and in a state of flux, with numerous insertion sequences and multilocus repeats, as well as genes encoding transposases and DNA modification enzymes. The sequence also reveals the presence of genes encoding putative proteins that collectively define almost all characteristics of cyanobacteria as a group. N. punctiforme has an extensive potential to sense and respond to environmental signals as reflected by the presence of more than 400 genes encoding sensor protein kinases, response regulators and other transcriptional factors. The signal transduction systems and any of the large number of unique genes may play essential roles in the cell differentiation and symbiotic interaction properties of N. punctiforme.

  4. The Mimivirus Genome Encodes a Mitochondrial Carrier That Transports dATP and dTTP

    Science.gov (United States)

    Monné, Magnus; Robinson, Alan J.; Boes, Christoph; Harbour, Michael E.; Fearnley, Ian M.; Kunji, Edmund R. S.

    2007-01-01

    Members of the mitochondrial carrier family have been reported in eukaryotes only, where they transport metabolites and cofactors across the mitochondrial inner membrane to link the metabolic pathways of the cytosol and the matrix. The genome of the giant virus Mimiviridae mimivirus encodes a member of the mitochondrial carrier family of transport proteins. This viral protein has been expressed in Lactococcus lactis and is shown to transport dATP and dTTP. As the 1.2-Mb double-stranded DNA mimivirus genome is rich in A and T residues, we speculate that the virus is using this protein to target the host mitochondria as a source of deoxynucleotides for its replication. PMID:17229695

  5. Automated whole-genome multiple alignment of rat, mouse, and human

    Energy Technology Data Exchange (ETDEWEB)

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  6. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE.

    Science.gov (United States)

    Graur, Dan; Zheng, Yichen; Price, Nicholas; Azevedo, Ricardo B R; Zufall, Rebecca A; Elhaik, Eran

    2013-01-01

    A recent slew of ENCyclopedia Of DNA Elements (ENCODE) Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is less than 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 - 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these "functional" regions or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly by employing the seldom used "causal role" definition of biological function and then applying it inconsistently to different biochemical properties, by committing a logical fallacy known as "affirming the consequent," by failing to appreciate the crucial difference between "junk DNA" and "garbage DNA," by using analytical methods that yield biased errors and inflate estimates of functionality, by favoring statistical sensitivity over specificity, and by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.

  7. Genomic multiple sequence alignments: refinement using a genetic algorithm

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2005-08-01

    Background: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics, the practice of comparing genomic sequences from different species, plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To offset the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. Results: We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as to refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only

  8. Divergence of RNA polymerase α subunits in angiosperm plastid genomes is mediated by genomic rearrangement.

    Science.gov (United States)

    Blazier, J Chris; Ruhlman, Tracey A; Weng, Mao-Lun; Rehman, Sumaiyah K; Sabir, Jamal S M; Jansen, Robert K

    2016-04-18

    Genes for the plastid-encoded RNA polymerase (PEP) persist in the plastid genomes of all photosynthetic angiosperms. However, three unrelated lineages (Annonaceae, Passifloraceae and Geraniaceae) have been identified with unusually divergent open reading frames (ORFs) in the conserved region of rpoA, the gene encoding the PEP α subunit. We used sequence-based approaches to evaluate whether these genes retain function. Both gene sequences and complete plastid genome sequences were assembled and analyzed from each of the three angiosperm families. Multiple lines of evidence indicated that the rpoA sequences are likely functional despite retaining as low as 30% nucleotide sequence identity with rpoA genes from outgroups in the same angiosperm order. The ratio of non-synonymous to synonymous substitutions indicated that these genes are under purifying selection, and bioinformatic prediction of conserved domains indicated that functional domains are preserved. One of the lineages (Pelargonium, Geraniaceae) contains species with multiple rpoA-like ORFs that show evidence of ongoing inter-paralog gene conversion. The plastid genomes containing these divergent rpoA genes have experienced extensive structural rearrangement, including large expansions of the inverted repeat. We propose that illegitimate recombination, not positive selection, has driven the divergence of rpoA.

  9. Genome-wide association study identifies multiple susceptibility loci for multiple myeloma

    DEFF Research Database (Denmark)

    Mitchell, Jonathan S; Li, Ni; Weinhold, Niels

    2016-01-01

    Multiple myeloma (MM) is a plasma cell malignancy with a significant heritable basis. Genome-wide association studies have transformed our understanding of MM predisposition, but individual studies have had limited power to discover risk loci. Here we perform a meta-analysis of these GWAS, add a ...

  10. Identification of functional elements and regulatory circuits by Drosophila modENCODE

    Energy Technology Data Exchange (ETDEWEB)

    Roy, Sushmita; Ernst, Jason; Kharchenko, Peter V.; Kheradpour, Pouya; Negre, Nicolas; Eaton, Matthew L.; Landolin, Jane M.; Bristow, Christopher A.; Ma, Lijia; Lin, Michael F.; Washietl, Stefan; Arshinoff, Bradley I.; Ay, Ferhat; Meyer, Patrick E.; Robine, Nicolas; Washington, Nicole L.; Stefano, Luisa Di; Berezikov, Eugene; Brown, Christopher D.; Candeias, Rogerio; Carlson, Joseph W.; Carr, Adrian; Jungreis, Irwin; Marbach, Daniel; Sealfon, Rachel; Tolstorukov, Michael Y.; Will, Sebastian; Alekseyenko, Artyom A.; Artieri, Carlo; Booth, Benjamin W.; Brooks, Angela N.; Dai, Qi; Davis, Carrie A.; Duff, Michael O.; Feng, Xin; Gorchakov, Andrey A.; Gu, Tingting; Henikoff, Jorja G.; Kapranov, Philipp; Li, Renhua; MacAlpine, Heather K.; Malone, John; Minoda, Aki; Nordman, Jared; Okamura, Katsutomo; Perry, Marc; Powell, Sara K.; Riddle, Nicole C.; Sakai, Akiko; Samsonova, Anastasia; Sandler, Jeremy E.; Schwartz, Yuri B.; Sher, Noa; Spokony, Rebecca; Sturgill, David; van Baren, Marijke; Wan, Kenneth H.; Yang, Li; Yu, Charles; Feingold, Elise; Good, Peter; Guyer, Mark; Lowdon, Rebecca; Ahmad, Kami; Andrews, Justen; Berger, Bonnie; Brenner, Steven E.; Brent, Michael R.; Cherbas, Lucy; Elgin, Sarah C. R.; Gingeras, Thomas R.; Grossman, Robert; Hoskins, Roger A.; Kaufman, Thomas C.; Kent, William; Kuroda, Mitzi I.; Orr-Weaver, Terry; Perrimon, Norbert; Pirrotta, Vincenzo; Posakony, James W.; Ren, Bing; Russell, Steven; Cherbas, Peter; Graveley, Brenton R.; Lewis, Suzanna; Micklem, Gos; Oliver, Brian; Park, Peter J.; Celniker, Susan E.; Henikoff, Steven; Karpen, Gary H.; Lai, Eric C.; MacAlpine, David M.; Stein, Lincoln D.; White, Kevin P.; Kellis, Manolis

    2010-12-22

    To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation. Several years after the complete genetic sequencing of many species, it is still unclear how to translate genomic information into a functional map of cellular and developmental programs. The Encyclopedia of DNA Elements (ENCODE) (1) and model organism ENCODE (modENCODE) (2) projects use diverse genomic assays to comprehensively annotate the Homo sapiens (human), Drosophila melanogaster (fruit fly), and Caenorhabditis elegans (worm) genomes, through systematic generation and computational integration of functional genomic data sets. Previous genomic studies in flies have made seminal contributions to our understanding of basic biological mechanisms and genome functions, facilitated by genetic, experimental, computational, and manual annotation of the euchromatic and heterochromatic genome (3), small genome size, short life cycle, and a deep knowledge of development, gene function, and chromosome biology. 
The functions

  11. Genome sequence of Shigella flexneri strain SP1, a diarrheal isolate that encodes an extended-spectrum β-lactamase (ESBL).

    Science.gov (United States)

    Shen, Ping; Fan, Jianzhong; Guo, Lihua; Li, Jiahua; Li, Ang; Zhang, Jing; Ying, Chaoqun; Ji, Jinru; Xu, Hao; Zheng, Beiwen; Xiao, Yonghong

    2017-05-12

    Shigellosis is the most common cause of gastrointestinal infections in developing countries. In China, the species most frequently responsible for shigellosis is Shigella flexneri. S. flexneri remains largely unexplored from a genomic standpoint and is still described using a vocabulary based on biochemical and serological properties. Moreover, increasing numbers of ESBL-producing Shigella strains have been isolated from clinical samples. Despite this, only a few cases of ESBL-producing Shigella have been described in China. Therefore, a better understanding of ESBL-producing Shigella from a genomic standpoint is required. In this study, an S. flexneri type 1a isolate, SP1, harboring blaCTX-M-14, which was recovered from a patient with diarrhea, was subjected to whole genome sequencing. The draft genome assembly of S. flexneri strain SP1 consisted of 4,592,345 bp with a G+C content of 50.46%. RAST analysis revealed the genome contained 4798 coding sequences (CDSs) and 100 RNA-encoding genes. We detected one incomplete prophage and six candidate CRISPR loci in the genome. In vitro antimicrobial susceptibility testing demonstrated that strain SP1 is resistant to ampicillin, amoxicillin/clavulanic acid, cefazolin, ceftriaxone and trimethoprim. In silico analysis detected genes mediating resistance to aminoglycosides, β-lactams, phenicol, tetracycline, sulphonamides, and trimethoprim. The blaCTX-M-14 gene was located on an IncFII2 plasmid. A series of virulence factors were identified in the genome. In this study, we report the whole genome sequence of a blaCTX-M-14-encoding S. flexneri strain SP1. Dozens of resistance determinants were detected in the genome and may be responsible for the multidrug-resistance of this strain, although further confirmation studies are warranted. Numerous virulence factors identified in the strain suggest that isolate SP1 is potentially pathogenic. The availability of the genome sequence and comparative analysis with other S
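
    The G+C content quoted in the abstract above is a simple proportion of bases. As an illustration only, the following sketch shows one common way to compute it for a short sequence. The function name `gc_content` and the choice to ignore ambiguous symbols such as N are assumptions made for this example, not details taken from the study.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Assumption for this sketch: symbols other than A, C, G and T
    (for example N) are ignored rather than counted in the denominator.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence (hypothetical, not from the SP1 assembly).
example = "ATGCGCNNTA"
print(f"G+C content: {gc_content(example):.2f}%")
```

    Applied to a full assembly, the same calculation would be run over the concatenated contigs; assembly tools may treat ambiguous bases differently, so reported values can differ slightly.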

  12. Wavelength-encoding/temporal-spreading optical code division multiple-access system with in-fiber chirped moiré gratings.

    Science.gov (United States)

    Chen, L R; Smith, P W; de Sterke, C M

    1999-07-20

    We propose an optical code division multiple-access (OCDMA) system that uses in-fiber chirped moiré gratings (CMG's) for encoding and decoding of broadband pulses. In reflection the wavelength-selective and dispersive nature of CMG's can be used to implement wavelength-encoding/temporal-spreading OCDMA. We give examples of codes designed around the constraints imposed by the encoding devices and present numerical simulations that demonstrate the proposed concept.

  13. On the Immortality of Television Sets: “Function” in the Human Genome According to the Evolution-Free Gospel of ENCODE

    Science.gov (United States)

    Graur, Dan; Zheng, Yichen; Price, Nicholas; Azevedo, Ricardo B.R.; Zufall, Rebecca A.; Elhaik, Eran

    2013-01-01

    A recent slew of ENCyclopedia Of DNA Elements (ENCODE) Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is less than 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 − 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these “functional” regions or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly by employing the seldom used “causal role” definition of biological function and then applying it inconsistently to different biochemical properties, by committing a logical fallacy known as “affirming the consequent,” by failing to appreciate the crucial difference between “junk DNA” and “garbage DNA,” by using analytical methods that yield biased errors and inflate estimates of functionality, by favoring statistical sensitivity over specificity, and by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten. PMID:23431001

  14. Note: high precision angle generator using multiple ultrasonic motors and a self-calibratable encoder.

    Science.gov (United States)

    Kim, Jong-Ahn; Kim, Jae Wan; Kang, Chu-Shik; Jin, Jonghan; Eom, Tae Bong

    2011-11-01

    We present an angle generator with high resolution and accuracy, which uses multiple ultrasonic motors and a self-calibratable encoder. A cylindrical air bearing guides a rotational motion, and the ultrasonic motors achieve high resolution over the full circle range with a simple configuration. The self-calibratable encoder can compensate the scale error of a divided circle (signal period: 20") effectively by applying the equal-division-averaged method. The angle generator configures a position feedback control loop using the readout of the encoder. By combining the ac and dc operation mode, the angle generator produced stepwise angular motion with 0.005" resolution. We also evaluated the performance of the angle generator using a precision angle encoder and an autocollimator. The expanded uncertainty (k = 2) in the angle generation was estimated less than 0.03", which included the calibrated scale error and the nonlinearity error. © 2011 American Institute of Physics

  15. Estimating Accurate Target Coordinates with Magnetic Resonance Images by Using Multiple Phase-Encoding Directions during Acquisition.

    Science.gov (United States)

    Kim, Minsoo; Jung, Na Young; Park, Chang Kyu; Chang, Won Seok; Jung, Hyun Ho; Chang, Jin Woo

    2018-06-01

    Stereotactic procedures are image guided, often using magnetic resonance (MR) images limited by image distortion, which may influence targets for stereotactic procedures. The aim of this work was to assess methods of identifying target coordinates for stereotactic procedures with MR in multiple phase-encoding directions. In 30 patients undergoing deep brain stimulation, we acquired 5 image sets: stereotactic brain computed tomography (CT), T2-weighted images (T2WI), and T1WI in both right-to-left (RL) and anterior-to-posterior (AP) phase-encoding directions. Using CT coordinates as a reference, we analyzed anterior commissure and posterior commissure coordinates to identify any distortion relating to phase-encoding direction. Compared with CT coordinates, RL-directed images had more positive x-axis values (0.51 mm in T1WI, 0.58 mm in T2WI). AP-directed images had more negative y-axis values (0.44 mm in T1WI, 0.59 mm in T2WI). We adopted 2 methods to predict CT coordinates with MR image sets: parallel translation and selective choice of axes according to phase-encoding direction. Both were equally effective at predicting CT coordinates using only MR; however, the latter may be easier to use in clinical settings. Acquiring MR in multiple phase-encoding directions and selecting axes according to the phase-encoding direction allows identification of more accurate coordinates for stereotactic procedures. © 2018 S. Karger AG, Basel.

  16. Influence of Feature Encoding and Choice of Classifier on Disease Risk Prediction in Genome-Wide Association Studies.

    Directory of Open Access Journals (Sweden)

    Florian Mittag

    Various attempts have been made to predict the individual disease risk based on genotype data from genome-wide association studies (GWAS). However, most studies only investigated one or two classification algorithms and feature encoding schemes. In this study, we applied seven different classification algorithms on GWAS case-control data sets for seven different diseases to create models for disease risk prediction. Further, we used three different encoding schemes for the genotypes of single nucleotide polymorphisms (SNPs) and investigated their influence on the predictive performance of these models. Our study suggests that an additive encoding of the SNP data should be the preferred encoding scheme, as it proved to yield the best predictive performances for all algorithms and data sets. Furthermore, our results showed that the differences between most state-of-the-art classification algorithms are not statistically significant. Consequently, we recommend preferring algorithms with simple models, like the linear support vector machine (SVM), as they allow for better subsequent interpretation without significant loss of accuracy.
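
    The additive encoding favored in the abstract above is commonly understood as counting the copies of one designated allele in each genotype, giving a single feature with values 0, 1, or 2 per SNP. The following is a minimal sketch under that assumption; the function names, the two-character genotype strings, and the choice of the minor allele as the counted allele are illustrative and are not taken from the study.

```python
from typing import List


def additive_encode(genotype: str, minor_allele: str) -> int:
    """Count copies of the minor allele in a two-allele genotype (0, 1, or 2)."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return sum(1 for allele in genotype if allele == minor_allele)


def encode_samples(genotypes: List[str], minor_allele: str) -> List[int]:
    """Apply the additive encoding to one SNP across all samples."""
    return [additive_encode(g, minor_allele) for g in genotypes]


if __name__ == "__main__":
    # Hypothetical genotypes for one SNP in four samples; "G" is assumed minor.
    print(encode_samples(["AA", "AG", "GA", "GG"], "G"))
```

    Under this scheme the two heterozygous genotypes map to the same value, so each SNP contributes one numeric column to the feature matrix passed to a classifier.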

  17. Fiber Bragg grating for spectral phase optical code-division multiple-access encoding and decoding

    Science.gov (United States)

    Fang, Xiaohui; Wang, Dong-Ning; Li, Shichen

    2003-08-01

    A new method for realizing spectral phase optical code-division multiple-access (OCDMA) coding based on step chirped fiber Bragg gratings (SCFBGs) is proposed and the corresponding encoder/decoder is presented. With this method, a mapping code is introduced for the m-sequence address code and the phase shift can be inserted into the subgratings of the SCFBG according to the mapping code. The transfer matrix method together with Fourier transform is used to investigate the characteristics of the encoder/decoder. The factors that influence the correlation property of the encoder/decoder, including index modulation and bandwidth of the subgrating, are identified. The system structure is simple and good correlation output can be obtained. The performance of the OCDMA system based on SCFBGs has been analyzed.

  18. The Drosophila melanogaster DmCK2beta transcription unit encodes for functionally non-redundant protein isoforms.

    Science.gov (United States)

    Jauch, Eike; Wecklein, Heike; Stark, Felix; Jauch, Mandy; Raabe, Thomas

    2006-06-07

    Genes encoding for the two evolutionary highly conserved subunits of a heterotetrameric protein kinase CK2 holoenzyme are present in all examined eukaryotic genomes. Depending on the organism, multiple transcription units encoding for a catalytically active CK2alpha subunit and/or a regulatory CK2beta subunit may exist. The phosphotransferase activity of members of the protein kinase CK2alpha family is thought to be independent of second messengers but is modulated by interaction with CK2beta-like proteins. In the genome of Drosophila melanogaster, one gene encoding for a CK2alpha subunit and three genes encoding for CK2beta-like proteins are present. The X-linked DmCK2beta transcription unit encodes for several CK2beta protein isoforms due to alternative splicing of its primary transcript. We addressed the question whether CK2beta-like proteins are redundant in function. Our in vivo experiments show that variations of the very C-terminal tail of CK2beta isoforms encoded by the X-linked DmCK2beta transcription unit influence their functional properties. In addition, we find that CK2beta-like proteins encoded by the autosomal D. melanogaster genes CK2betates and CK2beta' cannot fully substitute for a loss of CK2beta isoforms encoded by DmCK2beta.

  19. Cloud-based uniform ChIP-Seq processing tools for modENCODE and ENCODE.

    Science.gov (United States)

    Trinh, Quang M; Jen, Fei-Yang Arthur; Zhou, Ziru; Chu, Kar Ming; Perry, Marc D; Kephart, Ellen T; Contrino, Sergio; Ruzanov, Peter; Stein, Lincoln D

    2013-07-22

    Funded by the National Institutes of Health (NIH), the aim of the Model Organism ENCyclopedia of DNA Elements (modENCODE) project is to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for both model organisms C. elegans (worm) and D. melanogaster (fly). With a total size of just under 10 terabytes of data collected and released to the public, one of the challenges faced by researchers is to extract biologically meaningful knowledge from this large data set. While the basic quality control, pre-processing, and analysis of the data has already been performed by members of the modENCODE consortium, many researchers will wish to reinterpret the data set using modifications and enhancements of the original protocols, or combine modENCODE data with other data sets. Unfortunately this can be a time consuming and logistically challenging proposition. In recognition of this challenge, the modENCODE DCC has released uniform computing resources for analyzing modENCODE data on Galaxy (https://github.com/modENCODE-DCC/Galaxy), on the public Amazon Cloud (http://aws.amazon.com), and on the private Bionimbus Cloud for genomic research (http://www.bionimbus.org). In particular, we have released Galaxy workflows for interpreting ChIP-seq data which use the same quality control (QC) and peak calling standards adopted by the modENCODE and ENCODE communities. For convenience of use, we have created Amazon and Bionimbus Cloud machine images containing Galaxy along with all the modENCODE data, software and other dependencies. Using these resources provides a framework for running consistent and reproducible analyses on modENCODE data, ultimately allowing researchers to use more of their time using modENCODE data, and less time moving it around.

  20. Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms

    Directory of Open Access Journals (Sweden)

    Meller Jaroslaw

    2007-03-01

    Background: Identifying syntenic regions, i.e., blocks of genes or other markers with evolutionary conserved order, and quantifying evolutionary relatedness between genomes in terms of chromosomal rearrangements is one of the central goals in comparative genomics. However, the analysis of synteny and the resulting assessment of genome rearrangements are sensitive to the choice of a number of arbitrary parameters that affect the detection of synteny blocks. In particular, the choice of a set of markers and the effect of different aggregation strategies, which enable coarse graining of synteny blocks and exclusion of micro-rearrangements, need to be assessed. Therefore, existing tools and resources that facilitate identification, visualization and analysis of synteny need to be further improved to provide a flexible platform for such analysis, especially in the context of multiple genomes. Results: We present a new tool, Cinteny, for fast identification and analysis of synteny with different sets of markers and various levels of coarse graining of syntenic blocks. Using the Hannenhalli-Pevzner approach and its extensions, Cinteny also enables interactive determination of evolutionary relationships between genomes in terms of the number of rearrangements (the reversal distance). In particular, Cinteny provides: (i) integration of synteny browsing with assessment of evolutionary distances for multiple genomes; (ii) flexibility to adjust the parameters and re-compute the results on-the-fly; (iii) ability to work with user-provided data, such as orthologous genes, sequence tags or other conserved markers. In addition, Cinteny provides many annotated mammalian, invertebrate and fungal genomes that are pre-loaded and available for analysis at http://cinteny.cchmc.org. Conclusion: Cinteny allows one to automatically compare multiple genomes and perform sensitivity analysis for synteny block detection and for the subsequent computation of reversal distances

  1. Broad genomic and transcriptional analysis reveals a highly derived genome in dinoflagellate mitochondria

    Directory of Open Access Journals (Sweden)

    Keeling Patrick J

    2007-09-01

    Background: Dinoflagellates comprise an ecologically significant and diverse eukaryotic phylum that is sister to the phylum containing apicomplexan endoparasites. The mitochondrial genome of apicomplexans is uniquely reduced in gene content and size, encoding only three proteins and two ribosomal RNAs (rRNAs) within a highly compacted 6 kb DNA. Dinoflagellate mitochondrial genomes have been comparatively poorly studied: limited available data suggest some similarities with apicomplexan mitochondrial genomes but an even more radical type of genomic organization. Here, we investigate structure, content and expression of dinoflagellate mitochondrial genomes. Results: From two dinoflagellates, Crypthecodinium cohnii and Karlodinium micrum, we generated over 42 kb of mitochondrial genomic data that indicate a reduced gene content paralleling that of mitochondrial genomes in apicomplexans, i.e., only three protein-encoding genes and at least eight conserved components of the highly fragmented large and small subunit rRNAs. Unlike in apicomplexans, dinoflagellate mitochondrial genes occur in multiple copies, often as gene fragments, and in numerous genomic contexts. Analysis of cDNAs suggests several novel aspects of dinoflagellate mitochondrial gene expression. Polycistronic transcripts were found, standard start codons are absent, and oligoadenylation occurs upstream of stop codons, resulting in the absence of termination codons. Transcripts of at least one gene, cox3, are apparently trans-spliced to generate full-length mRNAs. RNA substitutional editing, a process previously identified for mRNAs in dinoflagellate mitochondria, is also implicated in rRNA expression. Conclusion: The dinoflagellate mitochondrial genome shares the same gene complement and fragmentation of rRNA genes with its apicomplexan counterpart. However, it also exhibits several unique characteristics. Most notable are the expansion of gene copy numbers and their arrangements

  2. M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species

    Directory of Open Access Journals (Sweden)

    Messeguer Xavier

    2006-10-01

    Background: Due to recent advances in whole genome shotgun sequencing and assembly technologies, the financial cost of decoding an organism's DNA has been drastically reduced, resulting in a recent explosion of genomic sequencing projects. This increase in related genomic data will allow for in-depth studies of evolution in closely related species through multiple whole genome comparisons. Results: To facilitate such comparisons, we present an interactive multiple genome comparison and alignment tool, M-GCAT, that can efficiently construct multiple genome comparison frameworks in closely related species. M-GCAT is able to compare and identify highly conserved regions in up to 20 closely related bacterial species in minutes on a standard computer, and as many as 90 (containing 75 cloned genomes from a set of 15 published enterobacterial genomes) in an hour. M-GCAT also incorporates a novel comparative genomics data visualization interface allowing the user to globally and locally examine and inspect the conserved regions and gene annotations. Conclusion: M-GCAT is an interactive comparative genomics tool well suited for quickly generating multiple genome comparison frameworks and alignments among closely related species. M-GCAT is freely available for download for academic and non-commercial use at: http://alggen.lsi.upc.es/recerca/align/mgcat/intro-mgcat.html.

  3. Simultaneous Structural Variation Discovery in Multiple Paired-End Sequenced Genomes

    Science.gov (United States)

    Hormozdiari, Fereydoun; Hajirasouliha, Iman; McPherson, Andrew; Eichler, Evan E.; Sahinalp, S. Cenk

    Next generation sequencing technologies have been decreasing the costs and increasing the world-wide capacity for sequence production at an unprecedented rate, making possible the initiation of large-scale projects aiming to sequence almost 2000 genomes [1]. Structural variation detection promises to be one of the key diagnostic tools for cancer and other diseases with genomic origin. In this paper, we study the problem of detecting structural variation events in two or more sequenced genomes through high-throughput sequencing. We propose to move from the current model of (1) detecting genomic variations in single next generation sequenced (NGS) donor genomes independently, and (2) checking whether two or more donor genomes indeed agree or disagree on the variations (in this paper we name this framework Independent Structural Variation Discovery and Merging - ISV&M), to a new model in which we detect structural variation events among multiple genomes simultaneously.

  4. The genome of the pear (Pyrus bretschneideri Rehd.).

    Science.gov (United States)

    Wu, Jun; Wang, Zhiwen; Shi, Zebin; Zhang, Shu; Ming, Ray; Zhu, Shilin; Khan, M Awais; Tao, Shutian; Korban, Schuyler S; Wang, Hao; Chen, Nancy J; Nishio, Takeshi; Xu, Xun; Cong, Lin; Qi, Kaijie; Huang, Xiaosan; Wang, Yingtao; Zhao, Xiang; Wu, Juyou; Deng, Cao; Gou, Caiyun; Zhou, Weili; Yin, Hao; Qin, Gaihua; Sha, Yuhui; Tao, Ye; Chen, Hui; Yang, Yanan; Song, Yue; Zhan, Dongliang; Wang, Juan; Li, Leiting; Dai, Meisong; Gu, Chao; Wang, Yuezhi; Shi, Daihu; Wang, Xiaowei; Zhang, Huping; Zeng, Liang; Zheng, Danman; Wang, Chunlei; Chen, Maoshan; Wang, Guangbiao; Xie, Lin; Sovero, Valpuri; Sha, Shoufeng; Huang, Wenjiang; Zhang, Shujun; Zhang, Mingyue; Sun, Jiangmei; Xu, Linlin; Li, Yuan; Liu, Xing; Li, Qingsong; Shen, Jiahui; Wang, Junyi; Paull, Robert E; Bennetzen, Jeffrey L; Wang, Jun; Zhang, Shaoling

    2013-02-01

    The draft genome of the pear (Pyrus bretschneideri) using a combination of BAC-by-BAC and next-generation sequencing is reported. A 512.0-Mb sequence corresponding to 97.1% of the estimated genome size of this highly heterozygous species is assembled with 194× coverage. High-density genetic maps comprising 2005 SNP markers anchored 75.5% of the sequence to all 17 chromosomes. The pear genome encodes 42,812 protein-coding genes, and of these, ~28.5% encode multiple isoforms. Repetitive sequences of 271.9 Mb in length, accounting for 53.1% of the pear genome, are identified. Simulation of eudicots to the ancestor of Rosaceae has reconstructed nine ancestral chromosomes. Pear and apple diverged from each other ~5.4-21.5 million years ago, and a recent whole-genome duplication (WGD) event must have occurred 30-45 MYA prior to their divergence, but following divergence from strawberry. When compared with the apple genome sequence, size differences between the apple and pear genomes are confirmed mainly due to the presence of repetitive sequences predominantly contributed by transposable elements (TEs), while genic regions are similar in both species. Genes critical for self-incompatibility, lignified stone cells (a unique feature of pear fruit), sorbitol metabolism, and volatile compounds of fruit have also been identified. Multiple candidate SFB genes appear as tandem repeats in the S-locus region of pear; while lignin synthesis-related gene family expansion and highly expressed gene families of HCT, C3'H, and CCOMT contribute to high accumulation of both G-lignin and S-lignin. Moreover, alpha-linolenic acid metabolism is a key pathway for aroma in pear fruit.

  5. Ebolavirus comparative genomics

    Science.gov (United States)

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; Uberbacher, Edward C.; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S.; Pedersen, Thomas D.; Wassenaar, Trudy M.; Ussery, David W.

    2015-01-01

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). PMID:26175035

  6. Natural biased coin encoded in the genome determines cell strategy.

    Directory of Open Access Journals (Sweden)

    Faezeh Dorri

    Decision making at a cellular level determines different fates for isogenic cells. However, it is not yet clear how rational decisions are encoded in the genome, how they are transmitted to their offspring, and whether they evolve and become optimized throughout generations. In this paper, we use a game theoretic approach to explain how rational decisions are made in the presence of cooperators and competitors. Our results suggest the existence of an internal switch that operates as a biased coin. The biased coin is, in fact, a biochemical bistable network of interacting genes that can flip to one of its stable states in response to different environmental stimuli. We present a framework to describe how the positions of attractors in such a gene regulatory network correspond to the behavior of a rational player in a competing environment. We evaluate our model by considering lysis/lysogeny decision making of bacteriophage lambda in E. coli.

  7. Genome sequence of the tsetse fly (Glossina morsitans): Vector of African trypanosomiasis

    KAUST Repository

    Watanabe, Junichi

    2014-04-24

    Tsetse flies are the sole vectors of human African trypanosomiasis throughout sub-Saharan Africa. Both sexes of adult tsetse feed exclusively on blood and contribute to disease transmission. Notable differences between tsetse and other disease vectors include obligate microbial symbioses, viviparous reproduction, and lactation. Here, we describe the sequence and annotation of the 366-megabase Glossina morsitans morsitans genome. Analysis of the genome and the 12,308 predicted protein-encoding genes led to multiple discoveries, including chromosomal integrations of bacterial (Wolbachia) genome sequences, a family of lactation-specific proteins, reduced complement of host pathogen recognition proteins, and reduced olfaction/chemosensory associated genes. These genome data provide a foundation for research into trypanosomiasis prevention and yield important insights with broad implications for multiple aspects of tsetse biology.

  8. Genome sequence of the tsetse fly (Glossina morsitans): Vector of African trypanosomiasis

    KAUST Repository

    Watanabe, Junichi; Hattori, Masahira; Berriman, Matthew; Lehane, Michael J.; Hall, Neil; Solano, Philippe; Aksoy, Serap; Hide, Winston; Touré, Yéya Tiémoko; Attardo, Geoffrey M.; Darby, Alistair Charles; Toyoda, Atsushi; Hertz-Fowler, Christiane; Larkin, Denis M.; Cotton, James A.; Sanders, Mandy J.; Swain, Martin T.; Quail, Michael A.; Inoue, Noboru; Ravel, Sophie; Taylor, Todd Duane; Srivastava, Tulika P.; Sharma, Vineet Kumar; Warren, Wesley C.; Wilson, Richard K.; Suzuki, Yutaka; Lawson, Daniel; Hughes, Daniel Seth Toney; Megy, Karyn; Masiga, Daniel K.; Mireji, Paul Odhiambo; Hansen, Immo Alex; Van Den Abbeele, Jan; Benoit, Joshua B.; Bourtzis, Kostas; Obiero, George F O; Robertson, Hugh M.; Jones, Jeffery W.; Zhou, Jingjiang; Field, Linda M.; Friedrich, Markus; Nyanjom, Steven R G; Telleria, Erich Loza; Caljon, Guy; Ribeiro, José M. C.; Acosta-Serrano, Alvaro; Ooi, Cherpheng; Rose, Clair; Price, David P.; Haines, Lee Rafuse; Christoffels, Alan G.; Sim, Cheolho; Pham, Daphne Q D; Denlinger, David L.; Geiser, Dawn L.; Omedo, Irene A.; Winzerling, Joy J.; Peyton, Justin T.; Marucha, Kevin K.; Jonas, Mario; Meuti, Megan E.; Rawlings, Neil David; Zhang, Qirui; Macharia, Rosaline Wanjiru; Michalkova, Veronika; Dashti, Zahra Jalali Sefid; Baumann, Aaron A.; Gäde, Gerd; Marco, Heather G.; Caers, Jelle; Schoofs, Liliane; Riehle, Michael A.; Hu, Wanqi; Tu, Zhijian; Tarone, Aaron M.; Malacrida, Anna Rodolfa; Kibet, Caleb K.; Scolari, Francesca; Koekemoer, J. J. O.; Willis, Judith H.; Gomulski, Ludvik M.; Falchetto, Marco; Scott, Maxwell J.; Fu, Shuhua; Sze, Singhoi; Luiz, Thiago; Weiss, Brian L.; Walshe, Deirdre P.; Wang, Jingwen; Wamalwa, Mark; Mwangi, Sarah; Ramphul, Urvashi N.; Snyder, Anna K.; Brelsfoard, Corey L.; Thomas, Gavin H.; Tsiamis, George; Arensburger, Peter; Rio, Rita V M; Macdonald, Sandy J.; Panji, Sumir; Kruger, Adele F.; Benkahla, Alia; Balyeidhusa, Apollo Simon Peter; Msangi, Atway R.; Okoro, Chinyere K.; Stephens, Dawn; Stanley, Eleanor J.; Mpondo, Feziwe; Wamwiri, Florence N.; Mramba, Furaha; Siwo, Geoffrey H.; Githinji, George; Harkins, Gordon William; Murilla, Grace Adira; Lehväslaiho, Heikki; Malele, Imna I.; Auma, Joanna Eseri; Kinyua, Johnson K.; Ouma, Johnson O.; Okedi, Loyce M A; Manga, Lucien; Aslett, Martin A.; Koffi, Mathurin; Gaunt, Michael W.; Makgamathe, Mmule; Mulder, Nicola Jane; Manangwa, Oliver; Abila, Patrick P.; Wincker, Patrick; Gregory, Richard I.; Bateta, Rosemary; Sakate, Ryuichi; Ommeh, Sheila; Lehane, Stella M.; Imanishi, Tadashi; Osamor, Victor Chukwudi; Kawahara, Yoshihiro

    2014-01-01

    Tsetse flies are the sole vectors of human African trypanosomiasis throughout sub-Saharan Africa. Both sexes of adult tsetse feed exclusively on blood and contribute to disease transmission. Notable differences between tsetse and other disease vectors include obligate microbial symbioses, viviparous reproduction, and lactation. Here, we describe the sequence and annotation of the 366-megabase Glossina morsitans morsitans genome. Analysis of the genome and the 12,308 predicted protein-encoding genes led to multiple discoveries, including chromosomal integrations of bacterial (Wolbachia) genome sequences, a family of lactation-specific proteins, reduced complement of host pathogen recognition proteins, and reduced olfaction/chemosensory associated genes. These genome data provide a foundation for research into trypanosomiasis prevention and yield important insights with broad implications for multiple aspects of tsetse biology.

  9. The Saccharomyces Genome Database Variant Viewer.

    Science.gov (United States)

    Sheppard, Travis K; Hitz, Benjamin C; Engel, Stacia R; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla S; Demeter, Janos; Hellerstedt, Sage T; Karra, Kalpana; Nash, Robert S; Paskov, Kelley M; Skrzypek, Marek S; Weng, Shuai; Wong, Edith D; Cherry, J Michael

    2016-01-04

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Imputation and quality control steps for combining multiple genome-wide datasets

    Directory of Open Access Journals (Sweden)

    Shefali S Verma

    2014-12-01

    The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 52,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R2 (estimated correlation between the imputed and true genotypes), and the relationship between allelic R2 and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
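
    The abstract above describes allelic R2 as an estimated correlation between imputed and true genotypes. As a rough illustration only, the sketch below computes a squared Pearson correlation between imputed allele dosages and known genotypes for one SNP. It assumes the true genotypes are available (for example, genotypes that were masked before imputation); the function name and the example values are hypothetical, and the sketch does not reproduce how BEAGLE or IMPUTE2 compute their own reported metrics.

```python
import math
from typing import Sequence


def allelic_r2(imputed_dosages: Sequence[float],
               true_genotypes: Sequence[float]) -> float:
    """Squared Pearson correlation between imputed dosages and true genotypes."""
    n = len(imputed_dosages)
    if n != len(true_genotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(imputed_dosages) / n
    mean_y = sum(true_genotypes) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(imputed_dosages, true_genotypes))
    var_x = sum((x - mean_x) ** 2 for x in imputed_dosages)
    var_y = sum((y - mean_y) ** 2 for y in true_genotypes)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when either vector is constant.
        return math.nan
    return (cov * cov) / (var_x * var_y)


if __name__ == "__main__":
    # Hypothetical dosages (expected minor-allele counts) versus true genotypes.
    print(allelic_r2([0.1, 0.9, 1.8, 0.2], [0, 1, 2, 0]))
```

    A value near 1 indicates that the imputed dosages track the true genotypes closely; monomorphic SNPs yield an undefined value, which is one reason such metrics are usually examined alongside minor allele frequency.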

  11. Genome sequence of an enhancin gene-rich nucleopolyhedrovirus (NPV) from Agrotis segetum: collinearity with Spodoptera exigua multiple NPV

    NARCIS (Netherlands)

    Jakubowska, A.K.; Peters, S.A.; Ziemnicka, J.; Vlak, J.M.; Oers, van M.M.

    2006-01-01

    The genome sequence of a Polish isolate of Agrotis segetum nucleopolyhedrovirus (AgseNPV-A) was determined and analysed. The circular genome is composed of 147 544 bp and has a G+C content of 45.7 mol%. It contains 153 putative, non-overlapping open reading frames (ORFs) encoding predicted proteins

  12. ENCODE: A Sourcebook of Epigenomes and Chromatin Language

    Directory of Open Access Journals (Sweden)

    Maryam Yavartanoo

    2013-03-01

    Until recently, since the Human Genome Project, the general view has been that the majority of the human genome is composed of junk DNA and has little or no selective advantage to the organism. Now we know that this conclusion is an oversimplification. In April 2003, the National Human Genome Research Institute (NHGRI) launched an international research consortium called Encyclopedia of DNA Elements (ENCODE) to uncover non-coding functional elements in the human genome. This project has identified a set of new DNA regulatory elements, based on novel relationships among chromatin accessibility, histone modifications, nucleosome positioning, DNA methylation, transcription, and the occupancy of sequence-specific factors. The project gives us new insights into the organization and regulation of the human genome and epigenome. Here, we sought to summarize particular aspects of the ENCODE project and highlight the features and data that have recently been released. At the end of this review, we have summarized a case study we conducted using the ENCODE epigenome data.

  13. A novel gene encoding a TIG multiple domain protein is a positional candidate for autosomal recessive polycystic kidney disease.

    Science.gov (United States)

    Xiong, Huaqi; Chen, Yongxiong; Yi, Yajun; Tsuchiya, Karen; Moeckel, Gilbert; Cheung, Joseph; Liang, Dan; Tham, Kyi; Xu, Xiaohu; Chen, Xing-Zhen; Pei, York; Zhao, Zhizhuang Jeo; Wu, Guanqing

    2002-07-01

    Autosomal recessive polycystic kidney disease (ARPKD) is a common hereditary renal cystic disease in infants and children. By genetic linkage analyses, the gene responsible for this disease, termed polycystic kidney and hepatic disease 1 (PKHD1), was mapped on human chromosome 6p21.1-p12, and has been further localized to a 1-cM genetic interval flanked by the D6S1714/D6S243 (telomeric) and D6S1024 (centromeric) markers. We recently identified a novel gene in this genetic interval from kidney cDNA, using cloning strategies. The gene PKHD1 (PKHD1-tentative) encodes a novel 3396-amino-acid protein with no apparent homology with any known proteins. We named its gene product "tigmin" because it contains multiple TIG domains, which usually are seen in proteins containing immunoglobulin-like folds. PKHD1 encodes an 11.6-kb transcript and is composed of 61 exons spanning an approximately 365-kb genomic region on chromosome 6p12-p11.2 adjacent to the marker D6S1714. Northern blot analyses demonstrated that the gene has discrete bands with one peak signal at approximately 11 kb, indicating that PKHD1 is likely to have multiple alternative transcripts. PKHD1 is highly expressed in adult and infant kidneys and weakly expressed in liver in northern blot analysis. This expression pattern parallels the tissue involvement observed in ARPKD. In situ hybridization analysis further revealed that the expression of PKHD1 in the kidney is mainly localized to the epithelial cells of the collecting duct, the specific tubular segment involved in cyst formation in ARPKD. These features of PKHD1 make it a strong positional candidate gene for ARPKD.

  14. Comparative genomics and evolution of eukaryotic phospholipid biosynthesis

    Energy Technology Data Exchange (ETDEWEB)

    Lykidis, Athanasios

    2006-12-01

    Phospholipid biosynthetic enzymes produce diverse molecular structures and are often present in multiple forms encoded by different genes. This work utilizes comparative genomics and phylogenetics for exploring the distribution, structure and evolution of phospholipid biosynthetic genes and pathways in 26 eukaryotic genomes. Although the basic structure of the pathways was formed early in eukaryotic evolution, the emerging picture indicates that individual enzyme families followed unique evolutionary courses. For example, choline and ethanolamine kinases and cytidylyltransferases emerged in ancestral eukaryotes, whereas multiple forms of the corresponding phosphatidyltransferases evolved mainly in a lineage-specific manner. Furthermore, several unicellular eukaryotes maintain bacterial-type enzymes and reactions for the synthesis of phosphatidylglycerol and cardiolipin. Also, base-exchange phosphatidylserine synthases are widespread and ancestral enzymes. The multiplicity of phospholipid biosynthetic enzymes has been largely generated by gene expansion in a lineage-specific manner. Thus, these observations suggest that phospholipid biosynthesis has been an actively evolving system. Finally, comparative genomic analysis indicates the existence of novel phosphatidyltransferases and provides a candidate for the uncharacterized eukaryotic phosphatidylglycerol phosphate phosphatase.

  15. Identification of functional elements and regulatory circuits by Drosophila modENCODE.

    Science.gov (United States)

    Roy, Sushmita; Ernst, Jason; Kharchenko, Peter V; Kheradpour, Pouya; Negre, Nicolas; Eaton, Matthew L; Landolin, Jane M; Bristow, Christopher A; Ma, Lijia; Lin, Michael F; Washietl, Stefan; Arshinoff, Bradley I; Ay, Ferhat; Meyer, Patrick E; Robine, Nicolas; Washington, Nicole L; Di Stefano, Luisa; Berezikov, Eugene; Brown, Christopher D; Candeias, Rogerio; Carlson, Joseph W; Carr, Adrian; Jungreis, Irwin; Marbach, Daniel; Sealfon, Rachel; Tolstorukov, Michael Y; Will, Sebastian; Alekseyenko, Artyom A; Artieri, Carlo; Booth, Benjamin W; Brooks, Angela N; Dai, Qi; Davis, Carrie A; Duff, Michael O; Feng, Xin; Gorchakov, Andrey A; Gu, Tingting; Henikoff, Jorja G; Kapranov, Philipp; Li, Renhua; MacAlpine, Heather K; Malone, John; Minoda, Aki; Nordman, Jared; Okamura, Katsutomo; Perry, Marc; Powell, Sara K; Riddle, Nicole C; Sakai, Akiko; Samsonova, Anastasia; Sandler, Jeremy E; Schwartz, Yuri B; Sher, Noa; Spokony, Rebecca; Sturgill, David; van Baren, Marijke; Wan, Kenneth H; Yang, Li; Yu, Charles; Feingold, Elise; Good, Peter; Guyer, Mark; Lowdon, Rebecca; Ahmad, Kami; Andrews, Justen; Berger, Bonnie; Brenner, Steven E; Brent, Michael R; Cherbas, Lucy; Elgin, Sarah C R; Gingeras, Thomas R; Grossman, Robert; Hoskins, Roger A; Kaufman, Thomas C; Kent, William; Kuroda, Mitzi I; Orr-Weaver, Terry; Perrimon, Norbert; Pirrotta, Vincenzo; Posakony, James W; Ren, Bing; Russell, Steven; Cherbas, Peter; Graveley, Brenton R; Lewis, Suzanna; Micklem, Gos; Oliver, Brian; Park, Peter J; Celniker, Susan E; Henikoff, Steven; Karpen, Gary H; Lai, Eric C; MacAlpine, David M; Stein, Lincoln D; White, Kevin P; Kellis, Manolis

    2010-12-24

    To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.

  16. HAL: a hierarchical format for storing and analyzing multiple genome alignments.

    Science.gov (United States)

    Hickey, Glenn; Paten, Benedict; Earl, Dent; Zerbino, Daniel; Haussler, David

    2013-05-15

    Large multiple genome alignments and inferred ancestral genomes are ideal resources for comparative studies of molecular evolution, and advances in sequencing and computing technology are making them increasingly obtainable. These structures can provide a rich understanding of the genetic relationships between all subsets of species they contain. Current formats for storing genomic alignments, such as XMFA and MAF, are all indexed or ordered using a single reference genome, however, which limits the information that can be queried with respect to other species and clades. This loss of information grows with the number of species under comparison, as well as their phylogenetic distance. We present HAL, a compressed, graph-based hierarchical alignment format for storing multiple genome alignments and ancestral reconstructions. HAL graphs are indexed on all genomes they contain. Furthermore, they are organized phylogenetically, which allows for modular and parallel access to arbitrary subclades without fragmentation because of rearrangements that have occurred in other lineages. HAL graphs can be created or read with a comprehensive C++ API. A set of tools is also provided to perform basic operations, such as importing and exporting data, identifying mutations and coordinate mapping (liftover). All documentation and source code for the HAL API and tools are freely available at http://github.com/glennhickey/hal. Contact: hickey@soe.ucsc.edu or haussler@soe.ucsc.edu. Supplementary data are available at Bioinformatics online.

  17. [Investigation of RNA viral genome amplification by multiple displacement amplification technique].

    Science.gov (United States)

    Pang, Zheng; Li, Jian-Dong; Li, Chuan; Liang, Mi-Fang; Li, De-Xin

    2013-06-01

    To facilitate the detection of newly emerging or rare viral infectious diseases, a negative-strand RNA virus (severe fever with thrombocytopenia syndrome bunyavirus) and a positive-strand RNA virus (dengue virus) were used to investigate nonspecific amplification of RNA viral genomes from clinical samples by the multiple displacement amplification technique. Serial 10-fold dilutions of purified viral RNA were used as simulated samples with different pathogen loads. After a series of sequential reactions, single-strand cDNA, double-strand cDNA, and double-strand cDNA treated with ligation, without or with supplemental RNA, were generated. Phi29 DNA polymerase-dependent isothermal amplification was then performed, and the target gene copies were finally detected by real-time PCR assays to evaluate the amplification efficiencies of the various methods. The results showed that multiple displacement amplification of single-strand or double-strand cDNA templates was limited, whereas the fold increase for double-strand cDNA templates treated with ligation could reach 6 × 10^3, and even 2 × 10^5 when supplemental RNA was present; better results were obtained when viral RNA loads were lower. An RNA viral genome amplification system using the multiple displacement amplification technique was established in this study, and effective amplification of RNA viral genomes at low load was achieved, which could provide a tool for synthesizing adequate viral genome for multiplex pathogen detection.

  18. Comparative genomic analysis of multiple strains of two unusual plant pathogens: Pseudomonas corrugata and Pseudomonas mediterranea

    Science.gov (United States)

    Trantas, Emmanouil A.; Licciardello, Grazia; Almeida, Nalvo F.; Witek, Kamil; Strano, Cinzia P.; Duxbury, Zane; Ververidis, Filippos; Goumas, Dimitrios E.; Jones, Jonathan D. G.; Guttman, David S.; Catara, Vittoria; Sarris, Panagiotis F.

    2015-01-01

    The non-fluorescent pseudomonads, Pseudomonas corrugata (Pcor) and P. mediterranea (Pmed), are closely related species that cause pith necrosis, a disease of tomato that causes severe crop losses. However, they also show strong antagonistic effects against economically important pathogens, demonstrating their potential for utilization as biological control agents. In addition, their metabolic versatility makes them attractive for the production of commercial biomolecules and bioremediation. An extensive comparative genomics study is required to dissect the mechanisms that Pcor and Pmed employ to cause disease, prevent disease caused by other pathogens, and to mine their genomes for genes that encode proteins involved in commercially important chemical pathways. Here, we present the draft genomes of nine Pcor and Pmed strains from different geographical locations. This analysis covered significant genetic heterogeneity and allowed in-depth genomic comparison. All examined strains were able to trigger symptoms in tomato plants but not all induced a hypersensitive-like response in Nicotiana benthamiana. Genome-mining revealed the absence of type III secretion system and known type III effector-encoding genes from all examined Pcor and Pmed strains. The lack of a type III secretion system appears to be unique among the plant pathogenic pseudomonads. Several gene clusters coding for type VI secretion system were detected in all genomes. Genome-mining also revealed the presence of gene clusters for biosynthesis of siderophores, polyketides, non-ribosomal peptides, and hydrogen cyanide. A highly conserved quorum sensing system was detected in all strains, although species specific differences were observed. Our study provides the basis for in-depth investigations regarding the molecular mechanisms underlying virulence strategies in the battle between plants and microbes. PMID:26300874

  19. Immunoglobulin superfamily members encoded by viruses and their multiple roles in immune evasion.

    Science.gov (United States)

    Farré, Domènec; Martínez-Vicente, Pablo; Engel, Pablo; Angulo, Ana

    2017-05-01

    Pathogens have developed a plethora of strategies to undermine host immune defenses in order to guarantee their survival. For large DNA viruses, these immune evasion mechanisms frequently rely on the expression of genes acquired from host genomes. Horizontally transferred genes include members of the immunoglobulin superfamily, whose products constitute the most diverse group of proteins of vertebrate genomes. Their promiscuous immunoglobulin domains, which comprise the building blocks of these molecules, are involved in a large variety of functions mediated by ligand-binding interactions. The flexible structural nature of the immunoglobulin domains makes them appealing targets for viral capture due to their capacity to generate high functional diversity. Here, we present an up-to-date review of immunoglobulin superfamily gene homologs encoded by herpesviruses, poxviruses, and adenoviruses, that include CD200, CD47, Fc receptors, interleukin-1 receptor 2, interleukin-18 binding protein, CD80, carcinoembryonic antigen-related cell adhesion molecules, and signaling lymphocyte activation molecules. We discuss their distinct structural attributes, binding properties, and functions, shaped by evolutionary pressures to disarm specific immune pathways. We include several novel genes identified from extensive genome database surveys. An understanding of the properties and modes of action of these viral proteins may guide the development of novel immune-modulatory therapeutic tools. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Genome Context Viewer: visual exploration of multiple annotated genomes using microsynteny.

    Science.gov (United States)

    Cleary, Alan; Farmer, Andrew

    2018-05-01

    The Genome Context Viewer is a visual data-mining tool that allows users to search across multiple providers of genome data for regions with similarly annotated content that may be aligned and visualized at the level of their shared functional elements. By handling ordered sequences of gene family memberships as a unit of search and comparison, the user interface enables quick and intuitive assessment of the degree of gene content divergence and the presence of various types of structural events within syntenic contexts. Insights into functionally significant differences seen at this level of abstraction can then serve to direct the user to more detailed explorations of the underlying data in other interconnected, provider-specific tools. GCV is provided under the GNU General Public License version 3 (GPL-3.0). Source code is available at https://github.com/legumeinfo/lis_context_viewer. Contact: adf@ncgr.org. Supplementary data are available at Bioinformatics online.

  1. The Genome of Deep-Sea Vent Chemolithoautotroph Thiomicrospira crunogena XCL-2

    Energy Technology Data Exchange (ETDEWEB)

    Scott, Kathleen M.; Sievert, Stefan M.; Abril, Fereniki N.; Ball, Lois A.; Barrett, Chantell J.; Blake, Rodrigo A.; Boller, Amanda J.; Chain, Patrick S.G.; Clark, Justine A.; Davis, Carisa R.; Detter, Chris; Do, Kimberly F.; Dobrinski, Kimberly P.; Faza, Brandon I.; Fitzpatrick, Kelly A.; Freyermuth, Sharyn K.; Harmer, Tara L.; Hauser, Loren J.; Hugler, Michael; Kerfeld, Cheryl A.; Klotz, Martin G.; Kong, William W.; Land, Miriam; Lapidus, Alla; Larimer, Frank W.; Longo, Dana L.; Lucas, Susan; Malfatti, Stephanie A.; Massey, Steven E.; Martin, Darlene D.; McCuddin, Zoe; Meyer, Folker; Moore, Jessica L.; Ocampo, Luis H.; Paul, John H.; Paulsen, Ian T.; Reep, Douglas K.; Ren, Qinghu; Ross, Rachel L.; Sato, Priscila Y.; Thomas, Phaedra; Tinkham, Lance E.; Zeruth, Gary T.

    2006-08-23

    Presented here is the complete genome sequence of Thiomicrospira crunogena XCL-2, representative of ubiquitous chemolithoautotrophic sulfur-oxidizing bacteria isolated from deep-sea hydrothermal vents. This gammaproteobacterium has a single chromosome (2,427,734 bp), and its genome illustrates many of the adaptations that have enabled it to thrive at vents globally. It has 14 methyl-accepting chemotaxis protein genes, including four that may assist in positioning it in the redoxcline. A relative abundance of CDSs encoding regulatory proteins likely control the expression of genes encoding carboxysomes, multiple dissolved inorganic nitrogen and phosphate transporters, as well as a phosphonate operon, which provide this species with a variety of options for acquiring these substrates from the environment. T. crunogena XCL-2 is unusual among obligate sulfur oxidizing bacteria in relying on the Sox system for the oxidation of reduced sulfur compounds. A 38 kb prophage is present, and a high level of prophage induction was observed, which may play a role in keeping competing populations of close relatives in check. The genome has characteristics consistent with an obligately chemolithoautotrophic lifestyle, including few transporters predicted to have organic allocrits, and Calvin-Benson-Bassham cycle CDSs scattered throughout the genome.

  2. A decade of human genome project conclusion: Scientific diffusion about our genome knowledge.

    Science.gov (United States)

    Moraes, Fernanda; Góes, Andréa

    2016-05-06

    The Human Genome Project (HGP) was initiated in 1990 and completed in 2003. It aimed to sequence the whole human genome. Although it represented an advance in understanding the human genome and its complexity, many questions remained unanswered. Other projects were launched in order to unravel the mysteries of our genome, including the ENCyclopedia of DNA Elements (ENCODE). This review aims to analyze the evolution of scientific knowledge related to both the HGP and ENCODE projects. Data were retrieved from scientific articles published in 1990-2014, a period comprising the development of the HGP and the 10 years following its completion. One of the most striking HGP results is that only 20,000 genes are protein- and RNA-coding. A new concept of the organization of the genome arose. The ENCODE project was initiated in 2003 and aimed to map the functional elements of the human genome. This project revealed that the human genome is pervasively transcribed. Therefore, it was determined that a large part of the non-protein-coding regions are functional. Finally, a more sophisticated view of chromatin structure emerged. The mechanistic functioning of the genome has been redrafted, revealing a much more complex picture. In addition, the gene-centric conception of the organism has to be reviewed. A number of criticisms have emerged against the ENCODE project approaches, raising the question of whether non-conserved but biochemically active regions are truly functional. Thus, the HGP and ENCODE projects accomplished a great map of the human genome, but the data generated still require further in-depth analysis. © 2016 by The International Union of Biochemistry and Molecular Biology, 44:215-223, 2016.

  3. Analysis of the genetic variation in Mycobacterium tuberculosis strains by multiple genome alignments

    Directory of Open Access Journals (Sweden)

    Morales Juan

    2008-11-01

    Background: The recent determination of the complete nucleotide sequence of several Mycobacterium tuberculosis (MTB) genomes allows the use of comparative genomics as a tool for dissecting the nature and consequence of genetic variability within this species. The multiple alignment of the genomes of clinical strains (CDC1551, F11, Haarlem and C), along with the genomes of laboratory strains (H37Rv and H37Ra), provides new insights on the mechanisms of adaptation of this bacterium to the human host. Findings: The genetic variation found in six M. tuberculosis strains does not involve significant genomic rearrangements. Most of the variation results from deletion and transposition events preferentially associated with insertion sequences and genes of the PE/PPE family but not with genes implicated in virulence. Using the Perl-based software islandsanalyser, which creates a representation of the genetic variation in the genome, we identified differences in the patterns of distribution and frequency of the polymorphisms across the genome. The identification of genes displaying strain-specific polymorphisms and the extrapolation of the number of strain-specific polymorphisms to an unlimited number of genomes indicate that the different strains contain a limited number of unique polymorphisms. Conclusion: The comparison of multiple genomes demonstrates that the M. tuberculosis genome is currently undergoing an active process of gene decay, analogous to the adaptation process of obligate bacterial symbionts. This observation opens new perspectives into the evolution and the understanding of the pathogenesis of this bacterium.

  4. The complete sequence of the first Spodoptera frugiperda Betabaculovirus genome: a natural multiple recombinant virus.

    Science.gov (United States)

    Cuartas, Paola E; Barrera, Gloria P; Belaich, Mariano N; Barreto, Emiliano; Ghiringhelli, Pablo D; Villamizar, Laura F

    2015-01-20

    Spodoptera frugiperda (Lepidoptera: Noctuidae) is a major pest in maize crops in Colombia, and affects several regions in America. A granulovirus isolated from S. frugiperda (SfGV VG008) has potential as an enhancer of the insecticidal activity of a previously described nucleopolyhedrovirus from the same insect species (SfMNPV). The SfGV VG008 genome was sequenced and analyzed, showing circular double-stranded DNA of 140,913 bp encoding 146 putative ORFs that include 37 Baculoviridae core genes, 88 shared with betabaculoviruses, two shared only with betabaculoviruses from Noctuidae insects, two shared with alphabaculoviruses, three copies of its own genes (paralogs) and the other 14 corresponding to unique genes without representation in the other baculovirus species. In particular, the genome encodes important virulence factors such as 4 chitinases and 2 enhancins. The sequence analysis revealed the existence of eight homologous regions (hrs) and also suggests processes of gene acquisition by horizontal transfer, including the SfGV VG008 ORFs 046/047 (paralogs), 059, 089 and 099. The bioinformatics evidence indicates that the genome donors of the mentioned genes could be alpha- and/or betabaculovirus species. The previously reported ability of SfGV VG008 to naturally co-infect the same host with other viruses shows a possible mechanism to capture genes and thus improve its fitness.

  5. VISTA - computational tools for comparative genomics

    Energy Technology Data Exchange (ETDEWEB)

    Frazer, Kelly A.; Pachter, Lior; Poliakov, Alexander; Rubin, Edward M.; Dubchak, Inna

    2004-01-01

    Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here we describe the VISTA family of tools created to assist biologists in carrying out this task. Our first VISTA server at http://www-gsd.lbl.gov/VISTA/ was launched in the summer of 2000 and was designed to align long genomic sequences and visualize these alignments with associated functional annotations. Currently the VISTA site includes multiple comparative genomics tools and provides users with rich capabilities to browse pre-computed whole-genome alignments of large vertebrate genomes and other groups of organisms with VISTA Browser, submit their own sequences of interest to several VISTA servers for various types of comparative analysis, and obtain detailed comparative analysis results for a set of cardiovascular genes. We illustrate capabilities of the VISTA site by the analysis of a 180 kilobase (kb) interval on human chromosome 5 that encodes the kinesin family member 3A (KIF3A) protein.

  6. Visual Comparison of Multiple Gene Expression Datasets in a Genomic Context

    Directory of Open Access Journals (Sweden)

    Borowski Krzysztof

    2008-06-01

    The need for novel methods of visualizing microarray data is growing. New perspectives are beneficial to finding patterns in expression data. The Bluejay genome browser provides an integrative way of visualizing gene expression datasets in a genomic context. We have now developed the functionality to display multiple microarray datasets simultaneously in Bluejay, in order to provide researchers with a comprehensive view of their datasets linked to a graphical representation of gene function. This will enable biologists to obtain valuable insights on expression patterns, by allowing them to analyze the expression values in relation to the gene locations as well as to compare expression profiles of related genomes or of different experiments for the same genome.

  7. Single-cell genomics reveals pyrrolysine-encoding potential in members of uncultivated archaeal candidate division MSBL1

    KAUST Repository

    Guan, Yue

    2017-05-11

    Pyrrolysine (Pyl), the 22nd canonical amino acid, is only decoded and synthesized by a limited number of organisms in the domains Archaea and Bacteria. Pyl is encoded by the amber codon UAG, typically a stop codon. To date, all known Pyl-decoding archaea are able to carry out methylotrophic methanogenesis. The functionality of methylamine methyltransferases, an important component of corrinoid-dependent methyltransfer reactions, depends on the presence of Pyl. Here, we present a putative pyl gene cluster obtained from single-cell genomes of the archaeal Mediterranean Sea Brine Lakes group 1 (MSBL1) from the Red Sea. Functional annotation of the MSBL1 single cell amplified genomes (SAGs) also revealed a complete corrinoid-dependent methyl-transfer pathway suggesting that members of MSBL1 may possibly be capable of synthesizing Pyl and metabolizing methylated amines.

  8. Single-cell genomics reveals pyrrolysine-encoding potential in members of uncultivated archaeal candidate division MSBL1

    KAUST Repository

    Guan, Yue; Haroon, Mohamed; Alam, Intikhab; Ferry, James G.; Stingl, Ulrich

    2017-01-01

    Pyrrolysine (Pyl), the 22nd canonical amino acid, is only decoded and synthesized by a limited number of organisms in the domains Archaea and Bacteria. Pyl is encoded by the amber codon UAG, typically a stop codon. To date, all known Pyl-decoding archaea are able to carry out methylotrophic methanogenesis. The functionality of methylamine methyltransferases, an important component of corrinoid-dependent methyltransfer reactions, depends on the presence of Pyl. Here, we present a putative pyl gene cluster obtained from single-cell genomes of the archaeal Mediterranean Sea Brine Lakes group 1 (MSBL1) from the Red Sea. Functional annotation of the MSBL1 single cell amplified genomes (SAGs) also revealed a complete corrinoid-dependent methyl-transfer pathway suggesting that members of MSBL1 may possibly be capable of synthesizing Pyl and metabolizing methylated amines.
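
    The amber-codon recoding mentioned in this record can be illustrated with a small sketch. It translates an RNA reading frame and, optionally, reads UAG as pyrrolysine (one-letter code O) instead of as a stop. The function name, the toy sequence, and the deliberately tiny codon table are hypothetical and for illustration only; real Pyl insertion depends on cellular machinery that this sketch does not model.

```python
# Toy codon table: only the codons used in the example sequence (an assumption,
# not a full genetic code). "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M", "GGU": "G", "AAA": "K",
    "UAG": "*", "UAA": "*", "UGA": "*",
}

def translate(rna, amber_as_pyl=False):
    """Translate an RNA reading frame; optionally decode UAG as Pyl ('O')."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        residue = CODON_TABLE[codon]
        if codon == "UAG" and amber_as_pyl:
            residue = "O"  # amber codon read through as pyrrolysine
        if residue == "*":
            break  # stop codon ends translation
        protein.append(residue)
    return "".join(protein)

# Hypothetical frame: AUG GGU UAG AAA UAA
standard = translate("AUGGGUUAGAAAUAA")
recoded = translate("AUGGGUUAGAAAUAA", amber_as_pyl=True)
```

    With standard decoding the frame terminates at UAG; with the recoding switch the same frame reads through to the next stop.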

  9. A Legionella pneumophila effector protein encoded in a region of genomic plasticity binds to Dot/Icm-modified vacuoles.

    Directory of Open Access Journals (Sweden)

    Shira Ninio

    2009-01-01

    Legionella pneumophila is an opportunistic pathogen that can cause a severe pneumonia called Legionnaires' disease. In the environment, L. pneumophila is found in fresh water reservoirs in a large spectrum of environmental conditions, where the bacteria are able to replicate within a variety of protozoan hosts. To survive within eukaryotic cells, L. pneumophila requires a type IV secretion system, designated Dot/Icm, that delivers bacterial effector proteins into the host cell cytoplasm. In recent years, a number of Dot/Icm substrate proteins have been identified; however, the function of most of these proteins remains unknown, and it is unclear why the bacterium maintains such a large repertoire of effectors to promote its survival. Here we investigate a region of the L. pneumophila chromosome that displays a high degree of plasticity among four sequenced L. pneumophila strains. Analysis of GC content suggests that several genes encoded in this region were acquired through horizontal gene transfer. Protein translocation studies establish that this region of genomic plasticity encodes multiple Dot/Icm effectors. Ectopic expression studies in mammalian cells indicate that one of these substrates, a protein called PieA, has unique effector activities. PieA is an effector that can alter lysosome morphology and associates specifically with vacuoles that support L. pneumophila replication. It was determined that the association of PieA with vacuoles containing L. pneumophila requires modifications to the vacuole mediated by other Dot/Icm effectors. Thus, the localization properties of PieA reveal that the Dot/Icm system has the ability to spatially and temporally control the association of an effector with vacuoles containing L. pneumophila through activities mediated by other effector proteins.
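
    GC-content analysis of the kind mentioned in this record is often done by comparing local windows with the sequence-wide average. The sketch below is a generic illustration of that idea, not the authors' procedure; the function names, window size, and threshold are assumptions chosen for the example.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def windowed_gc(seq, window, step):
    """GC fraction for each full window, as (start, gc) pairs."""
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]

def atypical_windows(seq, window, step, threshold):
    """Start positions of windows whose GC deviates from the overall GC."""
    overall = gc_fraction(seq)
    return [start for start, gc in windowed_gc(seq, window, step)
            if abs(gc - overall) > threshold]

# Hypothetical toy sequence: an AT-rich background with a GC-rich insert.
toy = "AT" * 10 + "GC" * 5 + "AT" * 10
flagged = atypical_windows(toy, window=10, step=10, threshold=0.5)
```

    Windows flagged this way are only candidates for foreign origin; compositional deviation alone does not establish horizontal transfer.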

  10. Genomic polymorphism, recombination, and linkage disequilibrium in human major histocompatibility complex-encoded antigen-processing genes.

    Science.gov (United States)

    van Endert, P M; Lopez, M T; Patel, S D; Monaco, J J; McDevitt, H O

    1992-01-01

    Recently, two subunits of a large cytosolic protease and two putative peptide transporter proteins were found to be encoded by genes within the class II region of the major histocompatibility complex (MHC). These genes have been suggested to be involved in the processing of antigenic proteins for presentation by MHC class I molecules. Because of the high degree of polymorphism in MHC genes, and previous evidence for both functional and polypeptide sequence polymorphism in the proteins encoded by the antigen-processing genes, we tested DNA from 27 consanguineous human cell lines for genomic polymorphism by restriction fragment length polymorphism (RFLP) analysis. These studies demonstrate a strong linkage disequilibrium between TAP1 and LMP2 RFLPs. Moreover, RFLPs, as well as a polymorphic stop codon in the telomeric TAP2 gene, appear to be in linkage disequilibrium with HLA-DR alleles and RFLPs in the HLA-DO gene. A high rate of recombination, however, seems to occur in the center of the complex, between the TAP1 and TAP2 genes. PMID:1360671
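
    Linkage disequilibrium between two biallelic markers is commonly summarized by the textbook coefficient D = p_AB - p_A p_B and the squared correlation r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)). The sketch below computes both from phased haplotypes. It is a generic illustration under the assumption that haplotypes are known; it is not the statistic reported in this record, and the function name and toy data are hypothetical.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of (a, b) pairs, where 1 marks the allele of interest
    at each locus and 0 marks the alternative allele.
    """
    n = len(haplotypes)
    p_a = sum(1 for a, _ in haplotypes if a == 1) / n
    p_b = sum(1 for _, b in haplotypes if b == 1) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # undefined if a locus is fixed
    return d, r2

# Hypothetical data: alleles always co-occur (complete LD) vs. no association.
linked = linkage_disequilibrium([(1, 1)] * 5 + [(0, 0)] * 5)
unlinked = linkage_disequilibrium([(1, 1), (1, 0), (0, 1), (0, 0)])
```

    An r^2 near 1 indicates that the two markers are almost always inherited together, whereas a value near 0 is what frequent recombination between them would tend to produce.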

  11. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    Science.gov (United States)

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including

  12. Assembly, Annotation, and Analysis of Multiple Mycorrhizal Fungal Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Mycorrhizal Genomics Initiative Consortium; Kuo, Alan; Grigoriev, Igor; Kohler, Annegret; Martin, Francis

    2013-03-08

    Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze dozens of mycorrhizal genomes of all Basidiomycota and Ascomycota orders and multiple ecological types (ericoid, orchid, and ectomycorrhizal). JGI has developed and deployed high-throughput sequencing techniques, and Assembly, RNASeq, and Annotation Pipelines. In 2012 alone we sequenced, assembled, and annotated 12 draft or improved genomes of mycorrhizae, and predicted ~232831 genes and ~15011 multigene families. All of this data is publicly available on JGI MycoCosm (http://jgi.doe.gov/fungi/), which provides access to both the genome data and tools with which to analyze the data. Preliminary comparisons of the current total of 14 public mycorrhizal genomes suggest that 1) short secreted proteins potentially involved in symbiosis are more enriched in some orders than in others amongst the mycorrhizal Agaricomycetes, 2) there are wide ranges of numbers of genes involved in certain functional categories, such as signal transduction and post-translational modification, and 3) novel gene families are specific to some ecological types.

  13. Versatile protein recognition by the encoded display of multiple chemical elements on a constant macrocyclic scaffold

    Science.gov (United States)

    Li, Yizhou; De Luca, Roberto; Cazzamalli, Samuele; Pretto, Francesca; Bajic, Davor; Scheuermann, Jörg; Neri, Dario

    2018-03-01

    In nature, specific antibodies can be generated as a result of an adaptive selection and expansion of lymphocytes with suitable protein binding properties. We attempted to mimic antibody-antigen recognition by displaying multiple chemical diversity elements on a defined macrocyclic scaffold. Encoding of the displayed combinations was achieved using distinctive DNA tags, resulting in a library size of 35,393,112. Specific binders could be isolated against a variety of proteins, including carbonic anhydrase IX, horseradish peroxidase, tankyrase 1, human serum albumin, alpha-1 acid glycoprotein, calmodulin, prostate-specific antigen and tumour necrosis factor. Similar to antibodies, the encoded display of multiple chemical elements on a constant scaffold enabled practical applications, such as fluorescence microscopy procedures or the selective in vivo delivery of payloads to tumours. Furthermore, the versatile structure of the scaffold facilitated the generation of protein-specific chemical probes, as illustrated by photo-crosslinking.

  14. Endogenous viral elements in animal genomes.

    Directory of Open Access Journals (Sweden)

    Aris Katzourakis

    2010-11-01

    Integration into the nuclear genome of germ line cells can lead to vertical inheritance of retroviral genes as host alleles. For other viruses, germ line integration has only rarely been documented. Nonetheless, we identified endogenous viral elements (EVEs) derived from ten non-retroviral families by systematic in silico screening of animal genomes, including the first endogenous representatives of double-stranded RNA, reverse-transcribing DNA, and segmented RNA viruses, and the first endogenous DNA viruses in mammalian genomes. Phylogenetic and genomic analysis of EVEs across multiple host species revealed novel information about the origin and evolution of diverse virus groups. Furthermore, several of the elements identified here encode intact open reading frames or are expressed as mRNA. For one element in the primate lineage, we provide statistically robust evidence for exaptation. Our findings establish that genetic material derived from all known viral genome types and replication strategies can enter the animal germ line, greatly broadening the scope of paleovirological studies and indicating a more significant evolutionary role for gene flow from virus to animal genomes than has previously been recognized.

  15. Whole genome phylogenies for multiple Drosophila species

    Directory of Open Access Journals (Sweden)

    Seetharam Arun

    2012-12-01

    Background: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees. Results: An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed. Conclusions: These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between
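
    The pipeline described in this abstract (peptide frequency vectors, SVD, per-species vector sums, angles as distances) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the use of raw dipeptide counts (k=2), the default of three retained dimensions, and the toy sequences.

```python
import itertools
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def peptide_vector(seq, k=2):
    """Count overlapping k-peptides in a protein sequence (assumed k=2)."""
    index = {"".join(p): i
             for i, p in enumerate(itertools.product(AMINO_ACIDS, repeat=k))}
    vec = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:          # skip peptides with non-standard residues
            vec[j] += 1
    return vec

def species_distances(proteomes, k=2, dims=3):
    """Return species names and a matrix of angles (radians) between them.

    proteomes: dict mapping species name -> list of protein sequences.
    """
    names = list(proteomes)
    rows, owner = [], []
    for s_idx, name in enumerate(names):
        for seq in proteomes[name]:
            rows.append(peptide_vector(seq, k))
            owner.append(s_idx)
    matrix = np.array(rows)                       # proteins x peptides
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    dims = min(dims, len(s))
    reduced = u[:, :dims] * s[:dims]              # protein vectors, reduced space
    owner = np.array(owner)
    # Sum the reduced protein vectors of each species into one species vector.
    species_vecs = np.array([reduced[owner == i].sum(axis=0)
                             for i in range(len(names))])
    unit = species_vecs / np.linalg.norm(species_vecs, axis=1, keepdims=True)
    cosines = np.clip(unit @ unit.T, -1.0, 1.0)
    return names, np.arccos(cosines)

# Toy input: species A and B are near-identical, C is unrelated.
toy = {
    "A": ["MKTAYIAKQR", "GAVLIGAVLI"],
    "B": ["MKTAYIAKQK", "GAVLIGAVLV"],
    "C": ["WWWWHHHHCC", "PPPPDDDDEE"],
}
names, dist = species_distances(toy)
```

    The resulting angle matrix could then be passed to any distance-based tree-building method; that step is left out of the sketch.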

  16. Heterodyne detection using spectral line pairing for spectral phase encoding optical code division multiple access and dynamic dispersion compensation.

    Science.gov (United States)

    Yang, Yi; Foster, Mark; Khurgin, Jacob B; Cooper, A Brinton

    2012-07-30

    A novel coherent optical code-division multiple access (OCDMA) scheme is proposed that uses spectral line pairing to generate signals suitable for heterodyne decoding. Both signal and local reference are transmitted via a single optical fiber and a simple balanced receiver performs sourceless heterodyne detection, canceling speckle noise and multiple-access interference (MAI). To validate the idea, a 16 user fully loaded phase encoded system is simulated. Effects of fiber dispersion on system performance are studied as well. Both second and third order dispersion management is achieved by using a spectral phase encoder to adjust phase shifts of spectral components at the optical network unit (ONU).

  17. Intervene: a tool for intersection and visualization of multiple gene or genomic region sets.

    Science.gov (United States)

    Khan, Aziz; Mathelier, Anthony

    2017-05-31

    A common task for scientists relies on comparing lists of genes or genomic regions derived from high-throughput sequencing experiments. While several tools exist to intersect and visualize sets of genes, similar tools dedicated to the visualization of genomic region sets are currently limited. To address this gap, we have developed the Intervene tool, which provides an easy and automated interface for the effective intersection and visualization of genomic region or list sets, thus facilitating their analysis and interpretation. Intervene contains three modules: venn to generate Venn diagrams of up to six sets, upset to generate UpSet plots of multiple sets, and pairwise to compute and visualize intersections of multiple sets as clustered heat maps. Intervene, and its interactive web ShinyApp companion, generate publication-quality figures for the interpretation of genomic region and list sets. Intervene and its web application companion provide an easy command line and an interactive web interface to compute intersections of multiple genomic and list sets. They have the capacity to plot intersections using easy-to-interpret visual approaches. Intervene is developed and designed to meet the needs of both computer scientists and biologists. The source code is freely available at https://bitbucket.org/CBGR/intervene , with the web application available at https://asntech.shinyapps.io/intervene .
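
    The idea behind the pairwise module, counting the overlap between every pair of sets, can be sketched in a few lines of plain Python. This is not Intervene's code or command-line interface; the function name and the example gene lists are hypothetical, and only name-based gene lists are handled (genomic regions would need coordinate overlap instead of exact matching).

```python
from itertools import combinations

def pairwise_intersections(named_sets):
    """Return a nested dict with the intersection size of every pair of sets.

    named_sets: dict mapping a label -> iterable of gene names.
    The diagonal holds each set's own size.
    """
    names = list(named_sets)
    sets = {name: set(named_sets[name]) for name in names}
    matrix = {a: {b: 0 for b in names} for a in names}
    for name in names:
        matrix[name][name] = len(sets[name])
    for a, b in combinations(names, 2):
        shared = len(sets[a] & sets[b])
        matrix[a][b] = shared
        matrix[b][a] = shared
    return matrix

# Hypothetical gene lists from three experiments.
gene_lists = {
    "exp1": ["TP53", "MYC", "EGFR", "KRAS"],
    "exp2": ["MYC", "KRAS", "BRCA1"],
    "exp3": ["PTEN", "TP53"],
}
overlap = pairwise_intersections(gene_lists)
```

    A matrix like this is the kind of input that a clustered heat map would be drawn from.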

  18. Comparison of closely related, uncultivated Coxiella tick endosymbiont population genomes reveals clues about the mechanisms of symbiosis.

    Science.gov (United States)

    Tsementzi, Despina; Castro Gordillo, Juan; Mahagna, Mustafa; Gottlieb, Yuval; Konstantinidis, Konstantinos T

    2018-05-01

    Understanding the symbiotic interaction between Coxiella-like endosymbionts (CLE) and their tick hosts is challenging due to lack of isolates and difficulties in tick functional assays. Here we sequenced the metagenome of a CLE population from wild Rhipicephalus sanguineus ticks (CRs) and compared it to the previously published genome of its close relative, CLE of R. turanicus (CRt). The tick hosts are closely related sympatric species, and their two endosymbiont genomes are highly similar with only minor differences in gene content. Both genomes encode numerous pseudogenes, consistent with an ongoing genome reduction process. In silico flux balance metabolic analysis (FBA) revealed the excess production of L-proline for both genomes, indicating a possible proline transport from Coxiella to the tick. Additionally, both CR genomes encode multiple copies of the proline/betaine transporter, proP gene. Modelling additional Coxiellaceae members including other tick CLE, did not identify proline as an excreted metabolite. Although both CRs and CRt genomes encode intact B vitamin synthesis pathway genes, which are presumed to underlie the mechanism of CLE-tick symbiosis, the FBA analysis indicated no changes for their products. Therefore, this study provides new testable hypotheses for the symbiosis mechanism and a better understanding of CLE genome evolution and diversity. © 2018 Society for Applied Microbiology and John Wiley & Sons Ltd.

  19. Prediction of Multiple-Trait and Multiple-Environment Genomic Data Using Recommender Systems

    Science.gov (United States)

    Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José C.; Mota-Sanchez, David; Estrada-González, Fermín; Gillberg, Jussi; Singh, Ravi; Mondal, Suchismita; Juliana, Philomin

    2018-01-01

    In genomic-enabled prediction, the task of improving the accuracy of the prediction of lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, there is still limited computing efficiency to do so. Although some statistical models are usually mathematically elegant, many of them are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from huge normal multivariate distributions. For these reasons, this study explores two recommender systems: item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF) in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results of the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment–trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets. PMID:29097376
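
    Item-based collaborative filtering, as named in this abstract, can be sketched as follows: each column is an item (an environment-trait combination), each row is a line, and a missing value is predicted from the line's observed items weighted by item-item similarity. The choice of cosine similarity on co-observed rows, the function names, and the toy matrix are assumptions for illustration; they are not taken from the paper.

```python
import numpy as np

def item_similarity(ratings):
    """Cosine similarity between columns, using only co-observed rows.

    ratings: 2-D array (lines x items) with np.nan for missing values.
    Values are assumed to be standardized per item beforehand.
    """
    n_items = ratings.shape[1]
    sim = np.zeros((n_items, n_items))
    for i in range(n_items):
        for j in range(n_items):
            both = ~np.isnan(ratings[:, i]) & ~np.isnan(ratings[:, j])
            if both.sum() < 2:
                continue                      # too little overlap to compare
            a, b = ratings[both, i], ratings[both, j]
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            sim[i, j] = (a @ b) / denom if denom > 0 else 0.0
    return sim

def predict(ratings, line, item, sim=None):
    """Predict ratings[line, item] as a similarity-weighted average."""
    if sim is None:
        sim = item_similarity(ratings)
    observed = ~np.isnan(ratings[line])
    observed[item] = False                    # never use the target itself
    weights = sim[item, observed]
    denom = np.abs(weights).sum()
    if denom == 0:
        return float("nan")
    return float(weights @ ratings[line, observed] / denom)

# Toy data: 4 lines x 3 environment-trait items; one value is missing.
toy_ratings = np.array([
    [1.0, 0.9, -1.0],
    [-1.0, -1.1, 1.0],
    [0.5, 0.4, -0.5],
    [2.0, np.nan, -2.0],
])
estimate = predict(toy_ratings, line=3, item=1)
```

    Only an item-item similarity matrix and one weighted average per prediction are needed, which is one way to read the abstract's point that the approach is computationally feasible for large data sets.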

  20. Prediction of Multiple-Trait and Multiple-Environment Genomic Data Using Recommender Systems.

    Science.gov (United States)

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José C; Mota-Sanchez, David; Estrada-González, Fermín; Gillberg, Jussi; Singh, Ravi; Mondal, Suchismita; Juliana, Philomin

    2018-01-04

    In genomic-enabled prediction, the task of improving the accuracy of the prediction of lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, there is still limited computing efficiency to do so. Although some statistical models are usually mathematically elegant, many of them are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from huge normal multivariate distributions. For these reasons, this study explores two recommender systems: item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF) in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results of the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment-trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets. Copyright © 2018 Montesinos-Lopez et al.

  1. Prediction of Multiple-Trait and Multiple-Environment Genomic Data Using Recommender Systems

    Directory of Open Access Journals (Sweden)

    Osval A. Montesinos-López

    2018-01-01

    In genomic-enabled prediction, the task of improving the accuracy of the prediction of lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, there is still limited computing efficiency to do so. Although some statistical models are usually mathematically elegant, many of them are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from huge normal multivariate distributions. For these reasons, this study explores two recommender systems: item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF) in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results of the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment–trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets.

  2. The Complete Sequence of the First Spodoptera frugiperda Betabaculovirus Genome: A Natural Multiple Recombinant Virus

    Directory of Open Access Journals (Sweden)

    Paola E. Cuartas

    2015-01-01

    Spodoptera frugiperda (Lepidoptera: Noctuidae) is a major pest in maize crops in Colombia, and affects several regions in America. A granulovirus isolated from S. frugiperda (SfGV VG008) has potential as an enhancer of insecticidal activity of previously described nucleopolyhedrovirus from the same insect species (SfMNPV). The SfGV VG008 genome was sequenced and analyzed, showing circular double-stranded DNA of 140,913 bp encoding 146 putative ORFs that include 37 Baculoviridae core genes, 88 shared with betabaculoviruses, two shared only with betabaculoviruses from Noctuidae insects, two shared with alphabaculoviruses, three copies of own genes (paralogs) and the other 14 corresponding to unique genes without representation in the other baculovirus species. In particular, the genome encodes important virulence factors such as 4 chitinases and 2 enhancins. The sequence analysis revealed the existence of eight homologous regions (hrs) and also suggests processes of gene acquisition by horizontal transfer including the SfGV VG008 ORFs 046/047 (paralogs), 059, 089 and 099. The bioinformatics evidence indicates that the genome donors of the mentioned genes could be alpha- and/or betabaculovirus species. The previously reported ability of SfGV VG008 to naturally co-infect the same host with other viruses shows a possible mechanism to capture genes and thus improve its fitness.

  3. Genome complexity in the coelacanth is reflected in its adaptive immune system

    Science.gov (United States)

    Saha, Nil Ratan; Ota, Tatsuya; Litman, Gary W.; Hansen, John; Parra, Zuly; Hsu, Ellen; Buonocore, Francesco; Canapa, Adriana; Cheng, Jan-Fang; Amemiya, Chris T.

    2014-01-01

    We have analyzed the available genome and transcriptome resources from the coelacanth in order to characterize genes involved in adaptive immunity. Two highly distinctive IgW-encoding loci have been identified that exhibit a unique genomic organization, including a multiplicity of tandemly repeated constant region exons. The overall organization of the IgW loci precludes typical heavy chain class switching. A locus encoding IgM could not be identified either computationally or by using several different experimental strategies. Four distinct sets of genes encoding Ig light chains were identified. This includes a variant sigma-type Ig light chain previously identified only in cartilaginous fishes and which is now provisionally denoted sigma-2. Genes encoding α/β and γ/δ T-cell receptors, and CD3, CD4, and CD8 co-receptors also were characterized. Ig heavy chain variable region genes and TCR components are interspersed within the TCR α/δ locus; this organization previously was reported only in tetrapods and raises questions regarding evolution and functional cooption of genes encoding variable regions. The composition, organization and syntenic conservation of the major histocompatibility complex locus have been characterized. We also identified large numbers of genes encoding cytokines and their receptors, and other genes associated with adaptive immunity. In terms of sequence identity and organization, the adaptive immune genes of the coelacanth more closely resemble orthologous genes in tetrapods than those in teleost fishes, consistent with current phylogenomic interpretations. Overall, the work described herein highlights the complexity inherent in the coelacanth genome and provides a rich catalog of immune genes for future investigations.

  4. Chicken genome analysis reveals novel genes encoding biotin-binding proteins related to avidin family

    Directory of Open Access Journals (Sweden)

    Nordlund Henri R

    2005-03-01

    Background: A chicken egg contains several biotin-binding proteins (BBPs), whose complete DNA and amino acid sequences are not known. In order to identify and characterise these genes and proteins we studied chicken cDNAs and genes available in the NCBI database and chicken genome database using the reported N-terminal amino acid sequences of chicken egg-yolk BBPs as search strings. Results: Two separate hits showing significant homology for these N-terminal sequences were discovered. For one of these hits, the chromosomal location in the immediate proximity of the avidin gene family was found. Both of these hits encode proteins having high sequence similarity with avidin suggesting that chicken BBPs are paralogous to avidin family. In particular, almost all residues corresponding to biotin binding in avidin are conserved in these putative BBP proteins. One of the found DNA sequences, however, seems to encode a carboxy-terminal extension not present in avidin. Conclusion: We describe here the predicted properties of the putative BBP genes and proteins. Our present observations link BBP genes together with avidin gene family and shed more light on the genetic arrangement and variability of this family. In addition, comparative modelling revealed the potential structural elements important for the functional and structural properties of the putative BBP proteins.

  5. Multiple genes encode the major surface glycoprotein of Pneumocystis carinii

    DEFF Research Database (Denmark)

    Kovacs, J A; Powell, F; Edman, J C

    1993-01-01

    hydrophobic region at the carboxyl terminus. The presence of multiple related msg genes encoding the major surface glycoprotein of P. carinii suggests that antigenic variation is a possible mechanism for evading host defenses. Further characterization of this family of genes should allow the development......The major surface antigen of Pneumocystis carinii, a life-threatening opportunistic pathogen in human immunodeficiency virus-infected patients, is an abundant glycoprotein that functions in host-organism interactions. A monoclonal antibody to this antigen is protective in animals, and thus...... blot studies using chromosomal or restricted DNA, the major surface glycoproteins are the products of a multicopy family of genes. The predicted protein has an M(r) of approximately 123,000, is relatively rich in cysteine residues (5.5%) that are very strongly conserved, and contains a well conserved...

  6. Genome-wide analysis reveals loci encoding anti-macrophage factors in the human pathogen Burkholderia pseudomallei K96243.

    Directory of Open Access Journals (Sweden)

    Andrea J Dowling

    2010-12-01

    Burkholderia pseudomallei is an important human pathogen whose infection biology is still poorly understood. The bacterium is endemic to tropical regions, including South East Asia and Northern Australia, where it causes melioidosis, a serious disease associated with both high mortality and antibiotic resistance. B. pseudomallei is a Gram-negative facultative intracellular pathogen that is able to replicate in macrophages. However, despite the critical nature of its interaction with macrophages, few anti-macrophage factors have been characterized to date. Here we perform a genome-wide gain of function screen of B. pseudomallei strain K96243 to identify loci encoding factors with anti-macrophage activity. We identify a total of 113 such loci scattered across both chromosomes, with positive gene clusters encoding transporters and secretion systems, enzymes/toxins, secondary metabolite, biofilm, adhesion and signal response related factors. Further phenotypic analysis of four of these regions shows that the encoded factors cause striking cellular phenotypes relevant to infection biology, including apoptosis, formation of actin 'tails' and multi-nucleation within treated macrophages. The detailed analysis of the remaining host of loci will facilitate genetic dissection of the interaction of this important pathogen with host macrophages and thus further elucidate this critical part of its infection cycle.

  7. Lactobacillus plantarum gene clusters encoding putative cell-surface protein complexes for carbohydrate utilization are conserved in specific gram-positive bacteria

    Directory of Open Access Journals (Sweden)

    Muscariello Lidia

    2006-05-01

    Background: Genomes of gram-positive bacteria encode many putative cell-surface proteins, of which the majority has no known function. From the rapidly increasing number of available genome sequences it has become apparent that many cell-surface proteins are conserved, and frequently encoded in gene clusters or operons, suggesting common functions, and interactions of multiple components. Results: A novel gene cluster encoding exclusively cell-surface proteins was identified, which is conserved in a subgroup of gram-positive bacteria. Each gene cluster generally has one copy of four new gene families called cscA, cscB, cscC and cscD. Clusters encoding these cell-surface proteins were found only in complete genomes of Lactobacillus plantarum, Lactobacillus sakei, Enterococcus faecalis, Listeria innocua, Listeria monocytogenes, Lactococcus lactis ssp lactis and Bacillus cereus and in incomplete genomes of L. lactis ssp cremoris, Lactobacillus casei, Enterococcus faecium, Pediococcus pentosaceus, Lactobacillus brevis, Oenococcus oeni, Leuconostoc mesenteroides, and Bacillus thuringiensis. These genes are neither present in the genomes of streptococci, staphylococci and clostridia, nor in the Lactobacillus acidophilus group, suggesting a niche-specific distribution, possibly relating to association with plants. All encoded proteins have a signal peptide for secretion by the Sec-dependent pathway, while some have cell-surface anchors, novel WxL domains, and putative domains for sugar binding and degradation. Transcriptome analysis in L. plantarum shows that the cscA-D genes are co-expressed, supporting their operon organization. Many gene clusters are significantly up-regulated in a glucose-grown, ccpA-mutant derivative of L. plantarum, suggesting catabolite control. This is supported by the presence of predicted CRE-sites upstream or inside the up-regulated cscA-D gene clusters. Conclusion: We propose that the CscA, CscB, CscC and Csc

  8. Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies

    Science.gov (United States)

    Zhang, Shujun

    2018-01-01

    Genome-wide association studies (GWASs) have identified many disease associated loci, the majority of which have unknown biological functions. Understanding the mechanism underlying trait associations requires identifying trait-relevant tissues and investigating associations in a trait-specific fashion. Here, we extend the widely used linear mixed model to incorporate multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues, with which to further construct powerful association tests. Specifically, we rely on a generalized estimating equation based algorithm for parameter inference, a mixture modeling framework for trait-tissue relevance classification, and a weighted sequence kernel association test constructed based on the identified trait-relevant tissues for powerful association analysis. We refer to our analytic procedure as the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage (SMART). With extensive simulations, we show how our method can make use of multiple complementary annotations to improve the accuracy for identifying trait-relevant tissues. In addition, our procedure allows us to make use of the inferred trait-relevant tissues, for the first time, to construct more powerful SNP set tests. We apply our method for an in-depth analysis of 43 traits from 28 GWASs using tissue-specific annotations in 105 tissues derived from ENCODE and Roadmap. Our results reveal new trait-tissue relevance, pinpoint important annotations that are informative of trait-tissue relationship, and illustrate how we can use the inferred trait-relevant tissues to construct more powerful association tests in the Wellcome trust case control consortium study. PMID:29377896

  9. Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Xingjie Hao

    2018-01-01

    Genome-wide association studies (GWASs) have identified many disease associated loci, the majority of which have unknown biological functions. Understanding the mechanism underlying trait associations requires identifying trait-relevant tissues and investigating associations in a trait-specific fashion. Here, we extend the widely used linear mixed model to incorporate multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues, with which to further construct powerful association tests. Specifically, we rely on a generalized estimating equation based algorithm for parameter inference, a mixture modeling framework for trait-tissue relevance classification, and a weighted sequence kernel association test constructed based on the identified trait-relevant tissues for powerful association analysis. We refer to our analytic procedure as the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage (SMART). With extensive simulations, we show how our method can make use of multiple complementary annotations to improve the accuracy for identifying trait-relevant tissues. In addition, our procedure allows us to make use of the inferred trait-relevant tissues, for the first time, to construct more powerful SNP set tests. We apply our method for an in-depth analysis of 43 traits from 28 GWASs using tissue-specific annotations in 105 tissues derived from ENCODE and Roadmap. Our results reveal new trait-tissue relevance, pinpoint important annotations that are informative of trait-tissue relationship, and illustrate how we can use the inferred trait-relevant tissues to construct more powerful association tests in the Wellcome trust case control consortium study.

  10. Avian reovirus L2 genome segment sequences and predicted structure/function of the encoded RNA-dependent RNA polymerase protein

    Directory of Open Access Journals (Sweden)

    Xu Wanhong

    2008-12-01

    Background: The orthoreoviruses are infectious agents that possess a genome comprised of 10 double-stranded RNA segments encased in two concentric protein capsids. Like virtually all RNA viruses, an RNA-dependent RNA polymerase (RdRp) enzyme is required for viral propagation. RdRp sequences have been determined for the prototype mammalian orthoreoviruses and for several other closely-related reoviruses, including aquareoviruses, but have not yet been reported for any avian orthoreoviruses. Results: We determined the L2 genome segment nucleotide sequences, which encode the RdRp proteins, of two different avian reoviruses, strains ARV138 and ARV176 in order to define conserved and variable regions within reovirus RdRp proteins and to better delineate structure/function of this important enzyme. The ARV138 L2 genome segment was 3829 base pairs long, whereas the ARV176 L2 segment was 3830 nucleotides long. Both segments were predicted to encode λB RdRp proteins 1259 amino acids in length. Alignments of these newly-determined ARV genome segments, and their corresponding proteins, were performed with all currently available homologous mammalian reovirus (MRV) and aquareovirus (AqRV) genome segment and protein sequences. There was ~55% amino acid identity between ARV λB and MRV λ3 proteins, making the RdRp protein the most highly conserved of currently known orthoreovirus proteins, and there was ~28% identity between ARV λB and homologous MRV and AqRV RdRp proteins. Predictive structure/function mapping of identical and conserved residues within the known MRV λ3 atomic structure indicated most identical amino acids and conservative substitutions were located near and within predicted catalytic domains and lining RdRp channels, whereas non-identical amino acids were generally located on the molecule's surfaces. Conclusion: The ARV λB and MRV λ3 proteins showed the highest ARV:MRV identity values (~55%) amongst all currently known ARV and MRV

  11. The Genome of Deep-Sea Vent Chemolithoautotroph Thiomicrospira crunogena XCL-2

    Energy Technology Data Exchange (ETDEWEB)

    Scott, K M; Sievert, S M; Abril, F N; Ball, L A; Barrett, C J; Blake, R A; Boller, A J; Chain, P G; Clark, J A; Davis, C R; Detter, C; Do, K F; Dobrinski, K P; Faza, B I; Fitzpatrick, K A; Freyermuth, S K; Harmer, T L; Hauser, L J; Hugler, M; Kerfeld, C A; Klotz, M G; Kong, W W; Land, M; Lapidus, A; Larimer, F W; Longo, D L; Lucas, S; Malfatti, S A; Massey, S E; Martin, D D; McCuddin, Z; Meyer, F; Moore, J L; Ocampo Jr., L H; Paul, J H; Paulsen, I T; Reep, D K; Ren, Q; Ross, R L; Sato, P Y; Thomas, P; Tinkham, L E; Zerugh, G T

    2007-01-10

    Presented here is the complete genome sequence of Thiomicrospira crunogena XCL-2, representative of ubiquitous chemolithoautotrophic sulfur-oxidizing bacteria isolated from deep-sea hydrothermal vents. This gammaproteobacterium has a single chromosome (2,427,734 bp), and its genome illustrates many of the adaptations that have enabled it to thrive at vents globally. It has 14 methyl-accepting chemotaxis protein genes, including four that may assist in positioning it in the redoxcline. A relative abundance of CDSs encoding regulatory proteins likely controls the expression of genes encoding carboxysomes, multiple dissolved inorganic nitrogen and phosphate transporters, as well as a phosphonate operon, which provide this species with a variety of options for acquiring these substrates from the environment. T. crunogena XCL-2 is unusual among obligate sulfur oxidizing bacteria in relying on the Sox system for the oxidation of reduced sulfur compounds. A 38 kb prophage is present, and a high level of prophage induction was observed, which may play a role in keeping competing populations of close relatives in check. The genome has characteristics consistent with an obligately chemolithoautotrophic lifestyle, including few transporters predicted to have organic allocrits, and Calvin-Benson-Bassham cycle CDSs scattered throughout the genome.

  12. Extreme expansion of NBS-encoding genes in Rosaceae.

    Science.gov (United States)

    Jia, YanXiao; Yuan, Yang; Zhang, Yanchun; Yang, Sihai; Zhang, Xiaohui

    2015-05-03

    Nucleotide binding site leucine-rich repeats (NBS-LRR) genes encode a large class of disease resistance (R) proteins in plants. Extensive studies have been carried out to identify and investigate NBS-encoding gene families in many important plant species. However, no comprehensive research into NBS-encoding genes in the Rosaceae has been performed. In this study, five whole-genome sequenced Rosaceae species, including apple, pear, peach, mei, and strawberry, were analyzed to investigate the evolutionary pattern of NBS-encoding genes and to compare them to those of three Cucurbitaceae species, cucumber, melon, and watermelon. Considerable differences in the copy number of NBS-encoding genes were observed between Cucurbitaceae and Rosaceae species. In Rosaceae species, a large number and a high proportion of NBS-encoding genes were observed in peach (437, 1.52%), mei (475, 1.51%), strawberry (346, 1.05%) and pear (617, 1.44%), and apple contained a whopping 1303 (2.05%) NBS-encoding genes, which might be the highest number of R-genes among all reported diploid plants. However, no more than 100 NBS-encoding genes were identified in Cucurbitaceae. Many more species-specific gene families were classified and detected with the signature of positive selection in Rosaceae species, especially in the apple genome. Taken together, our findings indicate that NBS-encoding genes in Rosaceae, especially in apple, have undergone extreme expansion and rapid adaptive evolution. Useful information was provided for further research on the evolutionary mode of disease resistance genes in Rosaceae crops.

  13. Whole genome sequence analysis of Geitlerinema sp. FC II unveils competitive edge of the strain in marine cultivation system for biofuel production.

    Science.gov (United States)

    Batchu, Navish Kumar; Khater, Shradha; Patil, Sonal; Nagle, Vinod; Das, Gautam; Bhadra, Bhaskar; Sapre, Ajit; Dasgupta, Santanu

    2018-03-05

    A filamentous cyanobacterium, Geitlerinema sp. FC II, was isolated from a marine algae culture pond at Reliance Industries Limited (RIL), India. The 6.7 Mb draft genome of FC II encodes 6697 protein-coding genes. Analysis of the whole genome sequence revealed the presence of a nif gene cluster, supporting its capability to fix atmospheric nitrogen. The FC II genome contains two variants of sulfide:quinone oxidoreductases (SQR), which is a crucial electron donor in cyanobacterial metabolic processes. FC II is characterized by the presence of multiple CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats - CRISPR-associated proteins) clusters, multiple variants of genes encoding photosystem reaction centres, and biosynthetic gene clusters of alkane, polyketides and non-ribosomal peptides. The presence of these pathways will help FC II in gaining an ecological advantage over other strains for biomass production in large-scale cultivation systems. Hence, FC II may be used for production of biofuel and other industrially important metabolites. Copyright © 2018 Elsevier Inc. All rights reserved.

  14. Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines

    NARCIS (Netherlands)

    Ellrott, Kyle; Bailey, Matthew H.; Saksena, Gordon; Covington, Kyle R.; Kandoth, Cyriac; Stewart, Chip; Hess, Julian; Ma, Singer; Chiotti, Kami E.; McLellan, Michael; Sofia, Heidi J.; Hutter, Carolyn M.; Getz, Gad; Wheeler, David A.; Ding, Li; Caesar-Johnson, Samantha J.; Demchok, John A.; Felau, Ina; Kasapi, Melpomeni; Ferguson, Martin L.; Hutter, Carolyn M.; Sofia, Heidi J.; Tarnuzzer, Roy; Wang, Zhining; Yang, Liming; Zenklusen, Jean C.; Zhang, Jiashan (Julia); Chudamani, Sudha; Liu, Jia; Lolla, Laxmi; Naresh, Rashi; Pihl, Todd; Sun, Qiang; Wan, Yunhu; Wu, Ye; Cho, Juok; DeFreitas, Timothy; Frazer, Scott; Gehlenborg, Nils; Getz, Gad; Heiman, David I.; Kim, Jaegil; Lawrence, Michael S.; Lin, Pei; Meier, Sam; Noble, Michael S.; Saksena, Gordon; Voet, Doug; Zhang, Hailei; Bernard, Brady; Chambwe, Nyasha; Dhankani, Varsha; Knijnenburg, Theo; Kramer, Roger; Leinonen, Kalle; Liu, Yuexin; Miller, Michael; Reynolds, Sheila; Shmulevich, Ilya; Thorsson, Vesteinn; Zhang, Wei; Akbani, Rehan; Broom, Bradley M.; Hegde, Apurva M.; Ju, Zhenlin; Kanchi, Rupa S.; Korkut, Anil; Li, Jun; Liang, Han; Ling, Shiyun; Liu, Wenbin; Lu, Yiling; Mills, Gordon B.; Ng, Kwok Shing; Rao, Arvind; Ryan, Michael; Wang, Jing; Weinstein, John N.; Zhang, Jiexin; Abeshouse, Adam; Armenia, Joshua; Chakravarty, Debyani; Chatila, Walid K.; de Bruijn, Ino; Gao, Jianjiong; Gross, Benjamin E.; Heins, Zachary J.; Kundra, Ritika; La, Konnor; Ladanyi, Marc; Luna, Augustin; Nissan, Moriah G.; Ochoa, Angelica; Phillips, Sarah M.; Reznik, Ed; Sanchez-Vega, Francisco; Sander, Chris; Schultz, Nikolaus; Sheridan, Robert; Sumer, S. Onur; Sun, Yichao; Taylor, Barry S.; Wang, Jioajiao; Zhang, Hongxin; Anur, Pavana; Peto, Myron; Spellman, Paul; Benz, Christopher; Stuart, Joshua M.; Wong, Christopher K.; Yau, Christina; Hayes, D. 
Neil; Wilkerson, Matthew D.; Ally, Adrian; Balasundaram, Miruna; Bowlby, Reanne; Brooks, Denise; Carlsen, Rebecca; Chuah, Eric; Dhalla, Noreen; Holt, Robert; Jones, Steven J.M.; Kasaian, Katayoon; Lee, Darlene; Ma, Yussanne; Marra, Marco A.; Mayo, Michael; Moore, Richard A.; Mungall, Andrew J.; Mungall, Karen; Robertson, A. Gordon; Sadeghi, Sara; Schein, Jacqueline E.; Sipahimalani, Payal; Tam, Angela; Thiessen, Nina; Tse, Kane; Wong, Tina; Berger, Ashton C.; Beroukhim, Rameen; Cherniack, Andrew D.; Cibulskis, Carrie; Gabriel, Stacey B.; Gao, Galen F.; Ha, Gavin; Meyerson, Matthew; Schumacher, Steven E.; Shih, Juliann; Kucherlapati, Melanie H.; Kucherlapati, Raju S.; Baylin, Stephen; Cope, Leslie; Danilova, Ludmila; Bootwalla, Moiz S.; Lai, Phillip H.; Maglinte, Dennis T.; Van Den Berg, David J.; Weisenberger, Daniel J.; Auman, J. Todd; Balu, Saianand; Bodenheimer, Tom; Fan, Cheng; Hoadley, Katherine A.; Hoyle, Alan P.; Jefferys, Stuart R.; Jones, Corbin D.; Meng, Shaowu; Mieczkowski, Piotr A.; Mose, Lisle E.; Perou, Amy H.; Perou, Charles M.; Roach, Jeffrey; Shi, Yan; Simons, Janae V.; Skelly, Tara; Soloway, Matthew G.; Tan, Donghui; Veluvolu, Umadevi; Fan, Huihui; Hinoue, Toshinori; Laird, Peter W.; Shen, Hui; Zhou, Wanding; Bellair, Michelle; Chang, Kyle; Covington, Kyle; Creighton, Chad J.; Dinh, Huyen; Doddapaneni, Harsha Vardhan; Donehower, Lawrence A.; Drummond, Jennifer; Gibbs, Richard A.; Glenn, Robert; Hale, Walker; Han, Yi; Hu, Jianhong; Korchina, Viktoriya; Lee, Sandra; Lewis, Lora; Li, Wei; Liu, Xiuping; Morgan, Margaret; Morton, Donna; Muzny, Donna; Santibanez, Jireh; Sheth, Margi; Shinbrot, Eve; Wang, Linghua; Wang, Min; Wheeler, David A.; Xi, Liu; Zhao, Fengmei; Hess, Julian; Appelbaum, Elizabeth L.; Bailey, Matthew; Cordes, Matthew G.; Ding, Li; Fronick, Catrina C.; Fulton, Lucinda A.; Fulton, Robert S.; Kandoth, Cyriac; Mardis, Elaine R.; McLellan, Michael D.; Miller, Christopher A.; Schmidt, Heather K.; Wilson, Richard K.; Crain, Daniel; Curley, 
Erin; Gardner, Johanna; Lau, Kevin; Mallery, David; Morris, Scott; Paulauskis, Joseph; Penny, Robert; Shelton, Candace; Shelton, Troy; Sherman, Mark; Thompson, Eric; Yena, Peggy; Bowen, Jay; Gastier-Foster, Julie M.; Gerken, Mark; Leraas, Kristen M.; Lichtenberg, Tara M.; Ramirez, Nilsa C.; Wise, Lisa; Zmuda, Erik; Corcoran, Niall; Costello, Tony; Hovens, Christopher; Carvalho, Andre L.; de Carvalho, Ana C.; Fregnani, José H.; Longatto-Filho, Adhemar; Reis, Rui M.; Scapulatempo-Neto, Cristovam; Silveira, Henrique C.S.; Vidal, Daniel O.; Burnette, Andrew; Eschbacher, Jennifer; Hermes, Beth; Noss, Ardene; Singh, Rosy; Anderson, Matthew L.; Castro, Patricia D.; Ittmann, Michael; Huntsman, David; Kohl, Bernard; Le, Xuan; Thorp, Richard; Andry, Chris; Duffy, Elizabeth R.; Lyadov, Vladimir; Paklina, Oxana; Setdikova, Galiya; Shabunin, Alexey; Tavobilov, Mikhail; McPherson, Christopher; Warnick, Ronald; Berkowitz, Ross; Cramer, Daniel; Feltmate, Colleen; Horowitz, Neil; Kibel, Adam; Muto, Michael; Raut, Chandrajit P.; Malykh, Andrei; Barnholtz-Sloan, Jill S.; Barrett, Wendi; Devine, Karen; Fulop, Jordonna; Ostrom, Quinn T.; Shimmel, Kristen; Wolinsky, Yingli; Sloan, Andrew E.; De Rose, Agostino; Giuliante, Felice; Goodman, Marc; Karlan, Beth Y.; Hagedorn, Curt H.; Eckman, John; Harr, Jodi; Myers, Jerome; Tucker, Kelinda; Zach, Leigh Anne; Deyarmin, Brenda; Hu, Hai; Kvecher, Leonid; Larson, Caroline; Mural, Richard J.; Somiari, Stella; Vicha, Ales; Zelinka, Tomas; Bennett, Joseph; Iacocca, Mary; Rabeno, Brenda; Swanson, Patricia; Latour, Mathieu; Lacombe, Louis; Têtu, Bernard; Bergeron, Alain; McGraw, Mary; Staugaitis, Susan M.; Chabot, John; Hibshoosh, Hanina; Sepulveda, Antonia; Su, Tao; Wang, Timothy; Potapova, Olga; Voronina, Olga; Desjardins, Laurence; Mariani, Odette; Roman-Roman, Sergio; Sastre, Xavier; Stern, Marc Henri; Cheng, Feixiong; Signoretti, Sabina; Berchuck, Andrew; Bigner, Darell; Lipp, Eric; Marks, Jeffrey; McCall, Shannon; McLendon, Roger; Secord, 
Angeles; Sharp, Alexis; Behera, Madhusmita; Brat, Daniel J.; Chen, Amy; Delman, Keith; Force, Seth; Khuri, Fadlo; Magliocca, Kelly; Maithel, Shishir; Olson, Jeffrey J.; Owonikoko, Taofeek; Pickens, Alan; Ramalingam, Suresh; Shin, Dong M.; Sica, Gabriel; Van Meir, Erwin G.; Zhang, Hongzheng; Eijckenboom, Wil; Gillis, Ad; Korpershoek, Esther; Looijenga, Leendert; Oosterhuis, Wolter; Stoop, Hans; van Kessel, Kim E.; Zwarthoff, Ellen C.; Calatozzolo, Chiara; Cuppini, Lucia; Cuzzubbo, Stefania; DiMeco, Francesco; Finocchiaro, Gaetano; Mattei, Luca; Perin, Alessandro; Pollo, Bianca; Chen, Chu; Houck, John; Lohavanichbutr, Pawadee; Hartmann, Arndt; Stoehr, Christine; Stoehr, Robert; Taubert, Helge; Wach, Sven; Wullich, Bernd; Kycler, Witold; Murawa, Dawid; Wiznerowicz, Maciej; Chung, Ki; Edenfield, W. Jeffrey; Martin, Julie; Baudin, Eric; Bubley, Glenn; Bueno, Raphael; De Rienzo, Assunta; Richards, William G.; Kalkanis, Steven; Mikkelsen, Tom; Noushmehr, Houtan; Scarpace, Lisa; Girard, Nicolas; Aymerich, Marta; Campo, Elias; Giné, Eva; Guillermo, Armando López; Van Bang, Nguyen; Hanh, Phan Thi; Phu, Bui Duc; Tang, Yufang; Colman, Howard; Evason, Kimberley; Dottino, Peter R.; Martignetti, John A.; Gabra, Hani; Juhl, Hartmut; Akeredolu, Teniola; Stepa, Serghei; Hoon, Dave; Ahn, Keunsoo; Kang, Koo Jeong; Beuschlein, Felix; Breggia, Anne; Birrer, Michael; Bell, Debra; Borad, Mitesh; Bryce, Alan H.; Castle, Erik; Chandan, Vishal; Cheville, John; Copland, John A.; Farnell, Michael; Flotte, Thomas; Giama, Nasra; Ho, Thai; Kendrick, Michael; Kocher, Jean Pierre; Kopp, Karla; Moser, Catherine; Nagorney, David; O'Brien, Daniel; O'Neill, Brian Patrick; Patel, Tushar; Petersen, Gloria; Que, Florencia; Rivera, Michael; Roberts, Lewis; Smallridge, Robert; Smyrk, Thomas; Stanton, Melissa; Thompson, R. 
Houston; Torbenson, Michael; Yang, Ju Dong; Zhang, Lizhi; Brimo, Fadi; Ajani, Jaffer A.; Angulo Gonzalez, Ana Maria; Behrens, Carmen; Bondaruk, Jolanta; Broaddus, Russell; Czerniak, Bogdan; Esmaeli, Bita; Fujimoto, Junya; Gershenwald, Jeffrey; Guo, Charles; Lazar, Alexander J.; Logothetis, Christopher; Meric-Bernstam, Funda; Moran, Cesar; Ramondetta, Lois; Rice, David; Sood, Anil; Tamboli, Pheroze; Thompson, Timothy; Troncoso, Patricia; Tsao, Anne; Wistuba, Ignacio; Carter, Candace; Haydu, Lauren; Hersey, Peter; Jakrot, Valerie; Kakavand, Hojabr; Kefford, Richard; Lee, Kenneth; Long, Georgina; Mann, Graham; Quinn, Michael; Saw, Robyn; Scolyer, Richard; Shannon, Kerwin; Spillane, Andrew; Stretch, Jonathan; Synott, Maria; Thompson, John; Wilmott, James; Al-Ahmadie, Hikmat; Chan, Timothy A.; Ghossein, Ronald; Gopalan, Anuradha; Levine, Douglas A.; Reuter, Victor; Singer, Samuel; Singh, Bhuvanesh; Tien, Nguyen Viet; Broudy, Thomas; Mirsaidi, Cyrus; Nair, Praveen; Drwiega, Paul; Miller, Judy; Smith, Jennifer; Zaren, Howard; Park, Joong Won; Hung, Nguyen Phi; Kebebew, Electron; Linehan, W. 
Marston; Metwalli, Adam R.; Pacak, Karel; Pinto, Peter A.; Schiffman, Mark; Schmidt, Laura S.; Vocke, Cathy D.; Wentzensen, Nicolas; Worrell, Robert; Yang, Hannah; Moncrieff, Marc; Goparaju, Chandra; Melamed, Jonathan; Pass, Harvey; Botnariuc, Natalia; Caraman, Irina; Cernat, Mircea; Chemencedji, Inga; Clipca, Adrian; Doruc, Serghei; Gorincioi, Ghenadie; Mura, Sergiu; Pirtac, Maria; Stancul, Irina; Tcaciuc, Diana; Albert, Monique; Alexopoulou, Iakovina; Arnaout, Angel; Bartlett, John; Engel, Jay; Gilbert, Sebastien; Parfitt, Jeremy; Sekhon, Harman; Thomas, George; Rassl, Doris M.; Rintoul, Robert C.; Bifulco, Carlo; Tamakawa, Raina; Urba, Walter; Hayward, Nicholas; Timmers, Henri; Antenucci, Anna; Facciolo, Francesco; Grazi, Gianluca; Marino, Mirella; Merola, Roberta; de Krijger, Ronald; Gimenez-Roqueplo, Anne Paule; Piché, Alain; Chevalier, Simone; McKercher, Ginette; Birsoy, Kivanc; Barnett, Gene; Brewer, Cathy; Farver, Carol; Naska, Theresa; Pennell, Nathan A.; Raymond, Daniel; Schilero, Cathy; Smolenski, Kathy; Williams, Felicia; Morrison, Carl; Borgia, Jeffrey A.; Liptay, Michael J.; Pool, Mark; Seder, Christopher W.; Junker, Kerstin; Omberg, Larsson; Dinkin, Mikhail; Manikhas, George; Alvaro, Domenico; Bragazzi, Maria Consiglia; Cardinale, Vincenzo; Carpino, Guido; Gaudio, Eugenio; Chesla, David; Cottingham, Sandra; Dubina, Michael; Moiseenko, Fedor; Dhanasekaran, Renumathy; Becker, Karl Friedrich; Janssen, Klaus Peter; Slotta-Huspenina, Julia; Abdel-Rahman, Mohamed H.; Aziz, Dina; Bell, Sue; Cebulla, Colleen M.; Davis, Amy; Duell, Rebecca; Elder, J. 
Bradley; Hilty, Joe; Kumar, Bahavna; Lang, James; Lehman, Norman L.; Mandt, Randy; Nguyen, Phuong; Pilarski, Robert; Rai, Karan; Schoenfield, Lynn; Senecal, Kelly; Wakely, Paul; Hansen, Paul; Lechan, Ronald; Powers, James; Tischler, Arthur; Grizzle, William E.; Sexton, Katherine C.; Kastl, Alison; Henderson, Joel; Porten, Sima; Waldmann, Jens; Fassnacht, Martin; Asa, Sylvia L.; Schadendorf, Dirk; Couce, Marta; Graefen, Markus; Huland, Hartwig; Sauter, Guido; Schlomm, Thorsten; Simon, Ronald; Tennstedt, Pierre; Olabode, Oluwole; Nelson, Mark; Bathe, Oliver; Carroll, Peter R.; Chan, June M.; Disaia, Philip; Glenn, Pat; Kelley, Robin K.; Landen, Charles N.; Phillips, Joanna; Prados, Michael; Simko, Jeffry; Smith-McCune, Karen; VandenBerg, Scott; Roggin, Kevin; Fehrenbach, Ashley; Kendler, Ady; Sifri, Suzanne; Steele, Ruth; Jimeno, Antonio; Carey, Francis; Forgie, Ian; Mannelli, Massimo; Carney, Michael; Hernandez, Brenda; Campos, Benito; Herold-Mende, Christel; Jungk, Christin; Unterberg, Andreas; von Deimling, Andreas; Bossler, Aaron; Galbraith, Joseph; Jacobus, Laura; Knudson, Michael; Knutson, Tina; Ma, Deqin; Milhem, Mohammed; Sigmund, Rita; Godwin, Andrew K.; Madan, Rashna; Rosenthal, Howard G.; Adebamowo, Clement; Adebamowo, Sally N.; Boussioutas, Alex; Beer, David; Giordano, Thomas; Mes-Masson, Anne Marie; Saad, Fred; Bocklage, Therese; Landrum, Lisa; Mannel, Robert; Moore, Kathleen; Moxley, Katherine; Postier, Russel; Walker, Joan; Zuna, Rosemary; Feldman, Michael; Valdivieso, Federico; Dhir, Rajiv; Luketich, James; Mora Pinero, Edna M.; Quintero-Aguilo, Mario; Carlotti, Carlos Gilberto; Dos Santos, Jose Sebastião; Kemp, Rafael; Sankarankuty, Ajith; Tirapelli, Daniela; Catto, James; Agnew, Kathy; Swisher, Elizabeth; Creaney, Jenette; Robinson, Bruce; Shelley, Carl Simon; Godwin, Eryn M.; Kendall, Sara; Shipman, Cassaundra; Bradford, Carol; Carey, Thomas; Haddad, Andrea; Moyer, Jeffey; Peterson, Lisa; Prince, Mark; Rozek, Laura; Wolf, Gregory; Bowman, Rayleen; 
Fong, Kwun M.; Yang, Ian; Korst, Robert; Rathmell, W. Kimryn; Fantacone-Campbell, J. Leigh; Hooke, Jeffrey A.; Kovatich, Albert J.; Shriver, Craig D.; DiPersio, John; Drake, Bettina; Govindan, Ramaswamy; Heath, Sharon; Ley, Timothy; Van Tine, Brian; Westervelt, Peter; Rubin, Mark A.; Lee, Jung Il; Aredes, Natália D.; Mariamidze, Armaz

    2018-01-01

    The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a

  15. Syntenic block overlap multiplicities with a panel of reference genomes provide a signature of ancient polyploidization events.

    Science.gov (United States)

    Zheng, Chunfang; Santos Muñoz, Daniella; Albert, Victor A; Sankoff, David

    2015-01-01

    Following whole genome duplication (WGD), there is a compact distribution of gene similarities within the genome reflecting duplicate pairs of all the genes in the genome. With time, the distribution broadens and loses volume due to variable decay of duplicate gene similarity and to the process of duplicate gene loss. If there are two WGD, the older one becomes so reduced and broad that it merges with the tail of the distributions resulting from more recent events, and it becomes difficult to distinguish them. The goal of this paper is to advance statistical methods of identifying, or at least counting, the WGD events in the lineage of a given genome. For a set of 15 angiosperm genomes, we analyze all 15 × 14 = 210 ordered pairs of target genome versus reference genome, using SynMap to find syntenic blocks. We consider all sets of B ≥ 2 syntenic blocks in the target genome that overlap in the reference genome as evidence of WGD activity in the target, whether it be one event or several. We hypothesize that in fitting an exponential function to the tail of the empirical distribution f(B) of block multiplicities, the size of the exponent will reflect the amount of WGD in the history of the target genome. By amalgamating the results from all reference genomes, a range of values of SynMap parameters, and alternative cutoff points for the tail, we find a clear pattern whereby multiple-WGD core eudicots have the smallest (negative) exponents, followed by core eudicots with only the single "γ" triplication in their history, followed by a non-core eudicot with a single WGD, followed by the monocots, with a basal angiosperm, the WGD-free Amborella having the largest exponent. The hypothesis that the exponent of the fit to the tail of the multiplicity distribution is a signature of the amount of WGD is verified, but there is also a clear complicating factor in the monocot clade, where a history of multiple WGD is not reflected in a small exponent.
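
    The tail-fitting idea in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: the multiplicity counts are made up, the names `fit_tail_exponent` and `b_min` are hypothetical, and the exponential is fitted by ordinary least squares on log f(B), which is one common way to fit an exponential but is not confirmed by the abstract.

```python
import math

def fit_tail_exponent(counts, b_min=2):
    """Fit f(B) ~ a * exp(k * B) to the tail B >= b_min and return k.

    counts maps a block multiplicity B to how often it was observed.
    The fit is a least-squares line through the points (B, log f(B)).
    """
    total = sum(counts.values())
    # Keep only tail points with a positive count (log is undefined at 0).
    pts = [(b, math.log(c / total)) for b, c in sorted(counts.items())
           if b >= b_min and c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two tail points to fit a slope")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

# Hypothetical counts: the count halves with every step in B.
example_counts = {2: 800, 3: 400, 4: 200, 5: 100, 6: 50}
exponent = fit_tail_exponent(example_counts)
print(exponent)
```

    Because the toy counts halve at each step, the fitted exponent equals -ln 2. In the study, exponents obtained this way would then be compared across target genomes, reference genomes and tail cutoffs.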

  16. In silico pattern-based analysis of the human cytomegalovirus genome.

    Science.gov (United States)

    Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas

    2003-04-01

    More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).

  17. Genomics and the making of yeast biodiversity.

    Science.gov (United States)

    Hittinger, Chris Todd; Rokas, Antonis; Bai, Feng-Yan; Boekhout, Teun; Gonçalves, Paula; Jeffries, Thomas W; Kominek, Jacek; Lachance, Marc-André; Libkind, Diego; Rosa, Carlos A; Sampaio, José Paulo; Kurtzman, Cletus P

    2015-12-01

    Yeasts are unicellular fungi that do not form fruiting bodies. Although the yeast lifestyle has evolved multiple times, most known species belong to the subphylum Saccharomycotina (syn. Hemiascomycota, hereafter yeasts). This diverse group includes the premier eukaryotic model system, Saccharomyces cerevisiae; the common human commensal and opportunistic pathogen, Candida albicans; and over 1000 other known species (with more continuing to be discovered). Yeasts are found in every biome and continent and are more genetically diverse than angiosperms or chordates. Ease of culture, simple life cycles, and small genomes (∼10-20Mbp) have made yeasts exceptional models for molecular genetics, biotechnology, and evolutionary genomics. Here we discuss recent developments in understanding the genomic underpinnings of the making of yeast biodiversity, comparing and contrasting natural and human-associated evolutionary processes. Only a tiny fraction of yeast biodiversity and metabolic capabilities has been tapped by industry and science. Expanding the taxonomic breadth of deep genomic investigations will further illuminate how genome function evolves to encode their diverse metabolisms and ecologies. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. A deep auto-encoder model for gene expression prediction.

    Science.gov (United States)

    Xie, Rui; Wen, Jia; Quitadamo, Andrew; Cheng, Jianlin; Shi, Xinghua

    2017-11-17

    Gene expression is a key intermediate level through which genotypes lead to a particular trait. Gene expression is affected by various factors including genotypes of genetic variants. With an aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely from genotypes align well with true gene expression patterns. We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes' contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.
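
    The abstract above credits dropout with reducing overfitting. As a small, hedged illustration of what a dropout step does, the sketch below applies "inverted" dropout to a list of activations. The abstract does not say which dropout variant or rate MLP-SAE uses, so the variant, the rate of 0.5, the function name `dropout` and the activation values are all assumptions made for illustration.

```python
import random

def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and
    scale the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)          # fixed seed so the run is repeatable
hidden = [0.5, 1.0, 1.5, 2.0]   # made-up hidden-layer activations
dropped = dropout(hidden, rate=0.5, rng=rng)
print(dropped)
```

    Each output is either zero or the original activation divided by the keep probability. Dropout is applied only during training; at prediction time the activations are passed through unchanged.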

  19. Divergence of RNA polymerase α subunits in angiosperm plastid genomes is mediated by genomic rearrangement

    OpenAIRE

    Blazier, J. Chris; Ruhlman, Tracey A.; Weng, Mao-Lun; Rehman, Sumaiyah K.; Sabir, Jamal S. M.; Jansen, Robert K.

    2016-01-01

    Genes for the plastid-encoded RNA polymerase (PEP) persist in the plastid genomes of all photosynthetic angiosperms. However, three unrelated lineages (Annonaceae, Passifloraceae and Geraniaceae) have been identified with unusually divergent open reading frames (ORFs) in the conserved region of rpoA, the gene encoding the PEP α subunit. We used sequence-based approaches to evaluate whether these genes retain function. Both gene sequences and complete plastid genome sequences were assembled an...

  20. Parallel encoders for pixel detectors

    International Nuclear Information System (INIS)

    Nikityuk, N.M.

    1991-01-01

    A new method of fast encoding and determining the multiplicity and coordinates of fired pixels is described. A specific example of the construction of parallel encoders and MCC for n=49 and t=2 is given. 16 refs.; 6 figs.; 2 tabs.

  1. Genomic analyses of the Chlamydia trachomatis core genome show an association between chromosomal genome, plasmid type and disease

    NARCIS (Netherlands)

    Versteeg, Bart; Bruisten, Sylvia M.; Pannekoek, Yvonne; Jolley, Keith A.; Maiden, Martin C. J.; van der Ende, Arie; Harrison, Odile B.

    2018-01-01

    Background: Chlamydia trachomatis (Ct) plasmid has been shown to encode genes essential for infection. We evaluated the population structure of Ct using whole-genome sequence data (WGS). In particular, the relationship between the Ct genome, plasmid and disease was investigated. Results: WGS data

  2. Genome-Wide Detection and Analysis of Multifunctional Genes

    Science.gov (United States)

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655
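
    The selection step described in the abstract above (keeping genes that play a role in at least two distinct biological processes) can be sketched as follows. This is a simplified illustration, not the authors' method: the annotation pairs are invented, the function name `multifunctional_genes` is hypothetical, and two processes are treated as distinct whenever their labels differ, whereas the abstract does not state the actual criterion for distinctness.

```python
from collections import defaultdict

def multifunctional_genes(annotations, min_processes=2):
    """Return genes annotated with at least `min_processes` different processes.

    annotations is an iterable of (gene, process) pairs.
    """
    processes_by_gene = defaultdict(set)
    for gene, process in annotations:
        processes_by_gene[gene].add(process)
    return sorted(gene for gene, procs in processes_by_gene.items()
                  if len(procs) >= min_processes)

# Made-up annotations for three hypothetical genes.
toy_annotations = [
    ("geneA", "DNA repair"),
    ("geneA", "cell cycle"),
    ("geneB", "translation"),
    ("geneB", "translation"),   # a repeated annotation counts once
    ("geneC", "transport"),
    ("geneC", "signaling"),
    ("geneC", "DNA repair"),
]
selected = multifunctional_genes(toy_annotations)
print(selected)
```

    A realistic version would also need to decide when two annotated processes are related enough to count as one, for example because one term is an ancestor of the other in the annotation hierarchy.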

  3. Verbal episodic memory in 426 multiple sclerosis patients: impairment in encoding, retrieval or both?

    Science.gov (United States)

    Brissart, H; Morele, E; Baumann, C; Debouverie, M

    2012-10-01

    Episodic memory is frequently impaired in multiple sclerosis (MS) patients but the exact nature of the disorder is controversial. It was initially thought to be due to a retrieval deficit but some studies have demonstrated an encoding deficit, which could be linked to a slowing of information processing speed or to a deficit in elaboration of strategies. The main objective of this study is to assess the prevalence and the nature of verbal episodic memory (VEM) impairment in MS patients. We retrieved memory performances of 426 patients [314 F-112 M; mean age: 46.1 years; median Expanded Disability Status Scale (EDSS) score: 3.1] from a neuropsychological database. VEM was assessed using the 16-word RL-RI 16 test. 66% of MS patients showed impairment on at least one recall measure (37.2% on 2 to 5 recalls). 14.2% of MS patients showed an impairment in the encoding phase. We observed that 5% of patients presented recognition difficulties. Correlations were observed between VEM performance and both EDSS and disease duration, but no group effect (ANOVA) was observed between form of MS and VEM performance. These results confirm the high prevalence of VEM impairment in MS patients. Deficits affect mainly information retrieval in early stage MS patients and are then linked to encoding as disability increases. Storage disorders are infrequent, so cognitive rehabilitation with mental imaging could be effective in MS patients.

  4. Localized Plasticity in the Streamlined Genomes of Vinyl Chloride Respiring Dehalococcoides

    Energy Technology Data Exchange (ETDEWEB)

    McMurdie, Paul J.; Behrens, Sebastien F.; Muller, Jochen A.; Goke, Jonathan; Ritalahti, Kirsti M.; Wagner, Ryan; Goltsman, Eugene; Lapidus, Alla; Holmes, Susan; Loffler, Frank E.; Spormann, Alfred M.

    2009-06-30

    Vinyl chloride (VC) is a human carcinogen and widespread priority pollutant. Here we report the first, to our knowledge, complete genome sequences of microorganisms able to respire VC, Dehalococcoides sp. strains VS and BAV1. Notably, the respective VC reductase encoding genes, vcrAB and bvcAB, were found embedded in distinct genomic islands (GEIs) with different predicted integration sites, suggesting that these genes were acquired horizontally and independently by distinct mechanisms. A comparative analysis that included two previously sequenced Dehalococcoides genomes revealed a contextually conserved core that is interrupted by two high plasticity regions (HPRs) near the Ori. These HPRs contain the majority of GEIs and strain-specific genes identified in the four Dehalococcoides genomes, an elevated number of repeated elements including insertion sequences (IS), as well as 91 of 96 rdhAB, genes that putatively encode terminal reductases in organohalide respiration. Only three core rdhA orthologous groups were identified, and only one of these groups is supported by synteny. The low number of core rdhAB, contrasted with the high rdhAB numbers per genome (up to 36 in strain VS), as well as their colocalization with GEIs and other signatures for horizontal transfer, suggests that niche adaptation via organohalide respiration is a fundamental ecological strategy in Dehalococcoides. This adaptation has been exacted through multiple mechanisms of recombination that are mainly confined within HPRs of an otherwise remarkably stable, syntenic, streamlined genome among the smallest of any free-living microorganism.

  5. Linking disease associations with regulatory information in the human genome

    KAUST Repository

    Schaub, M. A.; Boyle, A. P.; Kundaje, A.; Batzoglou, S.; Snyder, M.

    2012-01-01

    Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.

  6. Linking disease associations with regulatory information in the human genome

    KAUST Repository

    Schaub, M. A.

    2012-09-01

    Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.

  7. Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data.

    Science.gov (United States)

    Dunn, Joshua G; Weissman, Jonathan S

    2016-11-22

    Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments - for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length - analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows. Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid's data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort. Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily
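
    The conversion between feature-centric and genomic coordinates mentioned above can be illustrated with a minimal sketch. This is not Plastid's API: the function name `transcript_to_genome`, the exon layout, and the 0-based half-open interval convention are all assumptions made for illustration only.

```python
def transcript_to_genome(exons, strand, pos):
    """Map a 0-based transcript position to a 0-based genomic position.

    exons  : list of (start, end) half-open genomic intervals, sorted by start
             (hypothetical representation of a spliced transcript)
    strand : '+' or '-'
    pos    : 0-based position along the spliced transcript (5' to 3')
    """
    total = sum(end - start for start, end in exons)
    if not 0 <= pos < total:
        raise ValueError("position lies outside the transcript")
    # On the minus strand, transcript position 0 is the right-most genomic base.
    offset = pos if strand == "+" else total - 1 - pos
    # Walk the exons left to right, skipping introns implicitly.
    for start, end in exons:
        length = end - start
        if offset < length:
            return start + offset
        offset -= length
    raise AssertionError("unreachable: offset was validated above")


# Two exons separated by an intron (illustrative coordinates).
example_exons = [(100, 110), (200, 205)]
print(transcript_to_genome(example_exons, "+", 10))
print(transcript_to_genome(example_exons, "-", 0))
```

    Splicing and strand are handled in one place, so downstream code can work purely in transcript coordinates.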

  8. Nanoliter reactors improve multiple displacement amplification of genomes from single cells.

    Directory of Open Access Journals (Sweden)

    Yann Marcy

    2007-09-01

    Since only a small fraction of environmental bacteria are amenable to laboratory culture, there is great interest in genomic sequencing directly from single cells. Sufficient DNA for sequencing can be obtained from one cell by the Multiple Displacement Amplification (MDA) method, thereby eliminating the need to develop culture methods. Here we used a microfluidic device to isolate individual Escherichia coli and amplify genomic DNA by MDA in 60-nl reactions. Our results confirm a report that reduced MDA reaction volume lowers nonspecific synthesis that can result from contaminant DNA templates and unfavourable interaction between primers. The quality of the genome amplification was assessed by qPCR and compared favourably to single-cell amplifications performed in standard 50-µl volumes. Amplification bias was greatly reduced in nanoliter volumes, thereby providing a more even representation of all sequences. Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells.

  9. Insights on Genomic and Molecular Alterations in Multiple Myeloma and Their Incorporation towards Risk-Adapted Treatment Strategy: Concise Clinical Review

    Directory of Open Access Journals (Sweden)

    Taiga Nishihori

    2017-01-01

    Although recent advances in novel treatment approaches and therapeutics have shifted the treatment landscape of multiple myeloma, it remains an incurable plasma cell malignancy. Growing knowledge of the genome and expressed genomic information characterizing the biologic behavior of multiple myeloma continues to accumulate. However, translation and incorporation of vast molecular understanding of complex tumor biology to deliver personalized and precision treatment to cure multiple myeloma have not been successful to date. Our review focuses on current evidence and understanding of myeloma biology with characterization in the context of genomic and molecular alterations. We also discuss future clinical application of the genomic and molecular knowledge, and more translational research is needed to benefit our myeloma patients.

  10. Cloning of an epoxide hydrolase encoding gene from Rhodotorula mucilaginosa and functional expression in Yarrowia lipolytica

    CSIR Research Space (South Africa)

    Labuschagne, M

    2007-01-01

    , were used to amplify the genomic EH-encoding gene from Rhodotorula mucilaginosa. The 2347 bp genomic sequence revealed a 1979 bp ORF containing nine introns. The cDNA sequence revealed an 1185 bp EH-encoding gene that translates into a 394 amino acid...

  11. GOBASE: an organelle genome database

    OpenAIRE

    O'Brien, Emmet A.; Zhang, Yue; Wang, Eric; Marie, Veronique; Badejoko, Wole; Lang, B. Franz; Burger, Gertraud

    2008-01-01

    The organelle genome database GOBASE, now in its 21st release (June 2008), contains all published mitochondrion-encoded sequences (~913 000) and chloroplast-encoded sequences (~250 000) from a wide range of eukaryotic taxa. For all sequences, information on related genes, exons, introns, gene products and taxonomy is available, as well as selected genome maps and RNA secondary structures. Recent major enhancements to database functionality include: (i) addition of an interface for RNA editing...

  12. Neuropeptides encoded by the genomes of the Akoya pearl oyster Pinctata fucata and Pacific oyster Crassostrea gigas: a bioinformatic and peptidomic survey.

    Science.gov (United States)

    Stewart, Michael J; Favrel, Pascal; Rotgans, Bronwyn A; Wang, Tianfang; Zhao, Min; Sohail, Manzar; O'Connor, Wayne A; Elizur, Abigail; Henry, Joel; Cummins, Scott F

    2014-10-02

    Oysters impart significant socio-ecological benefits from primary production of food supply, to estuarine ecosystems via reduction of water column nutrients, plankton and seston biomass. Little though is known at the molecular level of what genes are responsible for how oysters reproduce, filter nutrients, survive stressful physiological events and form reef communities. Neuropeptides represent a diverse class of chemical messengers, instrumental in orchestrating these complex physiological events in other species. By a combination of in silico data mining and peptide analysis of ganglia, 74 putative neuropeptide genes were identified from genome and transcriptome databases of the Akoya pearl oyster, Pinctata fucata and the Pacific oyster, Crassostrea gigas, encoding precursors for over 300 predicted bioactive peptide products, including three newly identified neuropeptide precursors PFGx8amide, RxIamide and Wx3Yamide. Our findings also include a gene for the gonadotropin-releasing hormone (GnRH) and two egg-laying hormones (ELH) which were identified from both oysters. Multiple sequence alignments and phylogenetic analysis supports similar global organization of these mature peptides. Computer-based peptide modeling of the molecular tertiary structures of ELH highlights the structural homologies within ELH family, which may facilitate ELH activity leading to the release of gametes. Our analysis demonstrates that oysters possess conserved molluscan neuropeptide domains and overall precursor organization whilst highlighting many previously unrecognized bivalve idiosyncrasies. This genomic analysis provides a solid foundation from which further studies aimed at the functional characterization of these molluscan neuropeptides can be conducted to further stimulate advances in understanding the ecology and cultivation of oysters.

  13. FSPP: A Tool for Genome-Wide Prediction of smORF-Encoded Peptides and Their Functions

    Directory of Open Access Journals (Sweden)

    Hui Li

    2018-04-01

    smORFs are small open reading frames of fewer than 100 codons. Recent low-throughput experiments showed that many smORF-encoded peptides (SEPs) play crucial roles in processes such as regulation of transcription or translation, transport through membranes, and antimicrobial activity. To identify more functional SEPs, genome-wide prediction tools are needed to guide low-throughput experiments. In this study, we put forward a functional smORF-encoded peptide predictor (FSPP), which aims to predict authentic SEPs and their functions in a high-throughput manner. FSPP used the overlap of SEPs detected by Ribo-seq and mass spectrometry as target objects. With expression data at the transcription and translation levels, FSPP built two co-expression networks. Combining these with co-location relations, FSPP constructed a compound network and then annotated SEPs with the functions of adjacent nodes. Tested on 38 sequenced samples of 5 human cell lines, FSPP successfully predicted 856 out of 960 annotated proteins. Interestingly, FSPP also highlighted 568 functional SEPs from these samples. After comparison, the roles predicted by FSPP were consistent with known functions. These results suggest that FSPP is a reliable tool for the identification of functional small peptides. FSPP source code can be acquired at https://www.bioinfo.org/FSPP.
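
    To make the smORF definition above concrete, the following sketch scans the forward strand of a DNA sequence for open reading frames shorter than 100 codons. It is not FSPP's method (which relies on Ribo-seq, mass spectrometry and co-expression networks); the function name `find_smorfs`, the ATG-only start codon, the standard stop codons, the `min_codons` parameter, and the choice to count the start codon but not the stop codon are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def find_smorfs(seq, max_codons=100, min_codons=2):
    """Return (start, end, n_codons) for forward-strand ORFs below max_codons.

    start/end are 0-based half-open coordinates; end includes the stop codon.
    n_codons counts the start codon but not the stop codon (an assumption).
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i  # first in-frame start codon opens the ORF
            elif orf_start is not None and codon in STOP_CODONS:
                n_codons = (i - orf_start) // 3
                if min_codons <= n_codons < max_codons:
                    hits.append((orf_start, i + 3, n_codons))
                orf_start = None  # look for the next ORF in this frame
    return sorted(hits)


print(find_smorfs("CCATGAAATTTTAGGG"))
```

    A real pipeline would also scan the reverse complement and consider alternative start codons; those are omitted here to keep the sketch short.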

  14. Genome-wide characterization of centromeric satellites from multiple mammalian genomes.

    Science.gov (United States)

    Alkan, Can; Cardone, Maria Francesca; Catacchio, Claudia Rita; Antonacci, Francesca; O'Brien, Stephen J; Ryder, Oliver A; Purgato, Stefania; Zoli, Monica; Della Valle, Giuliano; Eichler, Evan E; Ventura, Mario

    2011-01-01

    Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog in contrast to elephant and armadillo, which showed high-centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.

  15. Whole-genome sequencing of multiple myeloma from diagnosis to plasma cell leukemia reveals genomic initiating events, evolution, and clonal tides.

    Science.gov (United States)

    Egan, Jan B; Shi, Chang-Xin; Tembe, Waibhav; Christoforides, Alexis; Kurdoglu, Ahmet; Sinari, Shripad; Middha, Sumit; Asmann, Yan; Schmidt, Jessica; Braggio, Esteban; Keats, Jonathan J; Fonseca, Rafael; Bergsagel, P Leif; Craig, David W; Carpten, John D; Stewart, A Keith

    2012-08-02

    The longitudinal evolution of a myeloma genome from diagnosis to plasma cell leukemia has not previously been reported. We used whole-genome sequencing (WGS) on 4 purified tumor samples and patient germline DNA drawn over a 5-year period in a t(4;14) multiple myeloma patient. Tumor samples were acquired at diagnosis, first relapse, second relapse, and end-stage secondary plasma cell leukemia (sPCL). In addition to the t(4;14), all tumor time points also shared 10 common single-nucleotide variants (SNVs) on WGS comprising shared initiating events. Interestingly, we observed genomic sequence variants that waxed and waned with time in progressive tumors, suggesting the presence of multiple independent, yet related, clones at diagnosis that rose and fell in dominance. Five newly acquired SNVs, including truncating mutations of RB1 and ZKSCAN3, were observed only in the final sPCL sample suggesting leukemic transformation events. This longitudinal WGS characterization of the natural history of a high-risk myeloma patient demonstrated tumor heterogeneity at diagnosis with shifting dominance of tumor clones over time and has also identified potential mutations contributing to myelomagenesis as well as transformation from myeloma to overt extramedullary disease such as sPCL.

  16. Illuminating the Druggable Genome (IDG)

    Data.gov (United States)

    Federal Laboratory Consortium — Results from the Human Genome Project revealed that the human genome contains 20,000 to 25,000 genes. A gene contains (encodes) the information that each cell uses...

  17. Genomic Analysis of Caldithrix abyssi, the Thermophilic Anaerobic Bacterium of the Novel Bacterial Phylum Calditrichaeota.

    Science.gov (United States)

    Kublanov, Ilya V; Sigalova, Olga M; Gavrilov, Sergey N; Lebedinsky, Alexander V; Rinke, Christian; Kovaleva, Olga; Chernyh, Nikolai A; Ivanova, Natalia; Daum, Chris; Reddy, T B K; Klenk, Hans-Peter; Spring, Stefan; Göker, Markus; Reva, Oleg N; Miroshnichenko, Margarita L; Kyrpides, Nikos C; Woyke, Tanja; Gelfand, Mikhail S; Bonch-Osmolovskaya, Elizaveta A

    2017-01-01

    The genome of Caldithrix abyssi, the first cultivated representative of a phylum-level bacterial lineage, was sequenced within the framework of Genomic Encyclopedia of Bacteria and Archaea (GEBA) project. The genomic analysis revealed mechanisms allowing this anaerobic bacterium to ferment peptides or to implement nitrate reduction with acetate or molecular hydrogen as electron donors. The genome encoded five different [NiFe]- and [FeFe]-hydrogenases, one of which, group 1 [NiFe]-hydrogenase, is presumably involved in lithoheterotrophic growth, three others produce H2 during fermentation, and one is apparently bidirectional. The ability to reduce nitrate is determined by a nitrate reductase of the Nap family, while nitrite reduction to ammonia is presumably catalyzed by an octaheme cytochrome c nitrite reductase εHao. The genome contained genes of respiratory polysulfide/thiosulfate reductase, however, elemental sulfur and thiosulfate were not used as the electron acceptors for anaerobic respiration with acetate or H2, probably due to the lack of the gene of the maturation protein. Nevertheless, elemental sulfur and thiosulfate stimulated growth on fermentable substrates (peptides), being reduced to sulfide, most probably through the action of the cytoplasmic sulfide dehydrogenase and/or NAD(P)-dependent [NiFe]-hydrogenase (sulfhydrogenase) encoded by the genome. Surprisingly, the genome of this anaerobic microorganism encoded all genes for cytochrome c oxidase, however, its maturation machinery seems to be non-operational due to genomic rearrangements of supplementary genes. Despite the fact that sugars were not among the substrates reported when C. abyssi was first described, our genomic analysis revealed multiple genes of glycoside hydrolases, and some of them were predicted to be secreted. This finding aided in bringing out four carbohydrates that supported the growth of C. abyssi: starch, cellobiose, glucomannan and xyloglucan. The genomic analysis

  18. Extensive diversification of IgD-, IgY-, and truncated IgY(ΔFc)-encoding genes in the red-eared turtle (Trachemys scripta elegans).

    Science.gov (United States)

    Li, Lingxiao; Wang, Tao; Sun, Yi; Cheng, Gang; Yang, Hui; Wei, Zhiguo; Wang, Ping; Hu, Xiaoxiang; Ren, Liming; Meng, Qingyong; Zhang, Ran; Guo, Ying; Hammarström, Lennart; Li, Ning; Zhao, Yaofeng

    2012-10-15

    IgY(ΔFc), containing only CH1 and CH2 domains, is expressed in the serum of some birds and reptiles, such as ducks and turtles. The duck IgY(ΔFc) is produced by the same υ gene that expresses the intact IgY form (CH1-4) using different transcriptional termination sites. In this study, we show that intact IgY and IgY(ΔFc) are encoded by distinct genes in the red-eared turtle (Trachemys scripta elegans). At least eight IgY and five IgY(ΔFc) transcripts were found in a single turtle. Together with Southern blotting, our data suggest that multiple genes encoding both IgY forms are present in the turtle genome. Both of the IgY forms were detected in the serum using rabbit polyclonal Abs. In addition, we show that multiple copies of the turtle δ gene are present in the genome and that alternative splicing is extensively involved in the generation of both the secretory and membrane-bound forms of the IgD H chain transcripts. Although a single μ gene was identified, the α gene was not identified in this species.

  19. Genetics and Molecular Biology of Epstein-Barr Virus-Encoded BART MicroRNA: A Paradigm for Viral Modulation of Host Immune Response Genes and Genome Stability

    Directory of Open Access Journals (Sweden)

    David H. Dreyfus

    2017-01-01

    Epstein-Barr virus, a ubiquitous human herpesvirus, is associated through epidemiologic evidence with common autoimmune syndromes and cancers. However, specific genetic mechanisms of pathogenesis have been difficult to identify. In this review, the author summarizes evidence that recently discovered noncoding RNAs termed microRNA encoded by Epstein-Barr virus BARF (BamHI A right frame), termed BART (BamHI A right transcripts), are modulators of human immune response genes and genome stability in infected and bystander cells. BART expression is apparently regulated by complex feedback loops with the host immune response regulatory NF-κB transcription factors. EBV-encoded BZLF-1 (ZEBRA) protein could also regulate BART since ZEBRA contains a terminal region similar to ankyrin proteins such as IκBα that regulate host NF-κB. BALF-2 (BamHI A left frame transcript), a viral homologue of the immunoglobulin and T cell receptor gene recombinase RAG-1 (recombination-activating gene-1), may also be coregulated with BART since BALF-2 regulatory sequences are located near the BART locus. Viral-encoded microRNA and viral mRNA transferred to bystander cells through vesicles, defective viral particles, or other mechanisms suggest a new paradigm in which bystander or hit-and-run mechanisms enable the virus to transiently or chronically alter human immune response genes as well as the stability of the human genome.

  20. Rapid genome reshaping by multiple-gene loss after whole-genome duplication in teleost fish suggested by mathematical modeling

    Science.gov (United States)

    Sato, Yukuto; Tsukamoto, Katsumi; Nishida, Mutsumi

    2015-01-01

    Whole-genome duplication (WGD) is believed to be a significant source of major evolutionary innovation. Redundant genes resulting from WGD are thought to be lost or acquire new functions. However, the rates of gene loss and thus temporal process of genome reshaping after WGD remain unclear. The WGD shared by all teleost fish, one-half of all jawed vertebrates, was more recent than the two ancient WGDs that occurred before the origin of jawed vertebrates, and thus lends itself to analysis of gene loss and genome reshaping. Using a newly developed orthology identification pipeline, we inferred the post–teleost-specific WGD evolutionary histories of 6,892 protein-coding genes from nine phylogenetically representative teleost genomes on a time-calibrated tree. We found that rapid gene loss did occur in the first 60 My, with a loss of more than 70–80% of duplicated genes, and produced similar genomic gene arrangements within teleosts in that relatively short time. Mathematical modeling suggests that rapid gene loss occurred mainly by events involving simultaneous loss of multiple genes. We found that the subsequent 250 My were characterized by slow and steady loss of individual genes. Our pipeline also identified about 1,100 shared single-copy genes that are inferred to have become singletons before the divergence of clupeocephalan teleosts. Therefore, our comparative genome analysis suggests that rapid gene loss just after the WGD reshaped teleost genomes before the major divergence, and provides a useful set of marker genes for future phylogenetic analysis. PMID:26578810

  1. Draft genome sequence of Actinotignum schaalii DSM 15541T: Genetic insights into the lifestyle, cell fitness and virulence.

    Directory of Open Access Journals (Sweden)

    Atteyet F Yassin

    The permanent draft genome sequence of Actinotignum schaalii DSM 15541T is presented. The annotated genome includes 2,130,987 bp, with 1777 protein-coding and 58 rRNA-coding genes. Genome sequence analysis revealed the absence of genes encoding components of the PTS systems and enzymes of the TCA cycle, glyoxylate shunt and gluconeogenesis. Genomic data revealed that A. schaalii is able to oxidize carbohydrates via glycolysis, the nonoxidative pentose phosphate and the Entner-Doudoroff pathways. Besides, the genome harbors genes encoding enzymes involved in the conversion of pyruvate to lactate, acetate and ethanol, which are found to be the end products of carbohydrate fermentation. The genome contained the gene encoding Type I fatty acid synthase required for de novo FAS biosynthesis. The plsY and plsX genes encoding the acyltransferases necessary for phosphatidic acid biosynthesis were absent from the genome. The genome harbors genes encoding enzymes responsible for isoprene biosynthesis via the mevalonate (MVA) pathway. Genes encoding enzymes that confer resistance to reactive oxygen species (ROS) were identified. In addition, A. schaalii harbors genes that protect the genome against viral infections. These include restriction-modification (RM) systems, type II toxin-antitoxin (TA), CRISPR-Cas and abortive infection systems. The A. schaalii genome also encodes several virulence factors that contribute to adhesion and internalization of this pathogen, such as the tad genes encoding proteins required for pili assembly, the nanI gene encoding exo-alpha-sialidase, genes encoding heat shock proteins and genes encoding the type VII secretion system. These features are consistent with anaerobic and pathogenic lifestyles. Finally, resistance to ciprofloxacin occurs by mutation in chromosomal genes that encode the subunits of DNA gyrase (GyrA) and topoisomerase IV (ParC) enzymes, while resistance to metronidazole was due to the frxA gene, which encodes NADPH

  2. The agents of natural genome editing.

    Science.gov (United States)

    Witzany, Guenther

    2011-06-01

    The DNA serves as a stable information storage medium and every protein which is needed by the cell is produced from this blueprint via an RNA intermediate code. More recently it was found that an abundance of various RNA elements cooperate in a variety of steps and substeps as regulatory and catalytic units with multiple competencies to act on RNA transcripts. Natural genome editing on one side is the competent agent-driven generation and integration of meaningful DNA nucleotide sequences into pre-existing genomic content arrangements, and the ability to (re-)combine and (re-)regulate them according to context-dependent (i.e. adaptational) purposes of the host organism. Natural genome editing on the other side designates the integration of all RNA activities acting on RNA transcripts without altering DNA-encoded genes. If we take the genetic code seriously as a natural code, there must be agents that are competent to act on this code because no natural code codes itself as no natural language speaks itself. As code editing agents, viral and subviral agents have been suggested because there are several indicators that demonstrate viruses competent in both RNA and DNA natural genome editing.

  3. Multiplicity of genome equivalents in the radiation-resistant bacterium Micrococcus radiodurans.

    Science.gov (United States)

    Hansen, M T

    1978-01-01

    The complexity of the genome of Micrococcus radiodurans was determined to be (2.0 ± 0.3) × 10^9 daltons by DNA renaturation kinetics. The number of genome equivalents of DNA per cell was calculated from the complexity and the content of DNA. A lower limit of four genome equivalents per cell was approached with decreasing growth rate. Thus, no haploid stage appeared to be realized in this organism. The replication time was estimated from the kinetics and amount of residual DNA synthesis after inhibiting initiation of new rounds of replication. From this, the redundancy of terminal genetic markers was calculated to vary with growth rate from four to approximately eight copies per cell. All genetic material, including the least abundant, is thus multiply represented in each cell. The potential significance of the maintenance in each cell of multiple gene copies is discussed in relation to the extreme radiation resistance of M. radiodurans. PMID:649572
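
    The calculation of genome equivalents from genome complexity and DNA content can be sketched as follows. The genome complexity of 2.0 × 10^9 daltons comes from the abstract; the DNA content per cell used in the example is a made-up placeholder, not a value from the study, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # daltons per gram (numerically Avogadro's number)


def genome_mass_grams(genome_daltons):
    """Convert a genome mass from daltons to grams."""
    return genome_daltons / AVOGADRO


def genome_equivalents(dna_per_cell_g, genome_daltons):
    """Number of genome copies implied by the DNA content of one cell."""
    return dna_per_cell_g / genome_mass_grams(genome_daltons)


GENOME_DALTONS = 2.0e9            # complexity reported in the abstract
PLACEHOLDER_DNA_PER_CELL = 1.33e-14  # grams; illustrative value only

print(genome_mass_grams(GENOME_DALTONS))
print(genome_equivalents(PLACEHOLDER_DNA_PER_CELL, GENOME_DALTONS))
```

    With a genome of 2.0 × 10^9 daltons, one genome copy weighs roughly 3.3 × 10^-15 g, so the measured DNA content per cell divided by that mass gives the copy number.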

  4. A Relational Encoding of a Conceptual Model with Multiple Temporal Dimensions

    Science.gov (United States)

    Gubiani, Donatella; Montanari, Angelo

    The theoretical interest and the practical relevance of a systematic treatment of multiple temporal dimensions is widely recognized in the database and information system communities. Nevertheless, most relational databases have no temporal support at all. A few of them provide limited support, in terms of temporal data types and predicates, constructors, and functions for the management of time values (borrowed from the SQL standard). One (resp., two) temporal dimensions are supported by historical and transaction-time (resp., bitemporal) databases only. In this paper, we provide a relational encoding of a conceptual model featuring four temporal dimensions, namely, the classical valid and transaction times, plus the event and availability times. We focus our attention on the distinctive technical features of the proposed temporal extension of the relational model. In the last part of the paper, we briefly show how to implement it in a standard DBMS.

  5. Multiple origins of interdependent endosymbiotic complexes in a genus of cicadas.

    Science.gov (United States)

    Łukasik, Piotr; Nazario, Katherine; Van Leuven, James T; Campbell, Matthew A; Meyer, Mariah; Michalik, Anna; Pessacq, Pablo; Simon, Chris; Veloso, Claudio; McCutcheon, John P

    2018-01-09

    Bacterial endosymbionts that provide nutrients to hosts often have genomes that are extremely stable in structure and gene content. In contrast, the genome of the endosymbiont Hodgkinia cicadicola has fractured into multiple distinct lineages in some species of the cicada genus Tettigades. To better understand the frequency, timing, and outcomes of Hodgkinia lineage splitting throughout this cicada genus, we sampled cicadas over three field seasons in Chile and performed genomics and microscopy on representative samples. We found that a single ancestral Hodgkinia lineage has split at least six independent times in Tettigades over the last 4 million years, resulting in complexes of between two and six distinct Hodgkinia lineages per host. Individual genomes in these symbiotic complexes differ dramatically in relative abundance, genome size, organization, and gene content. Each Hodgkinia lineage retains a small set of core genes involved in genetic information processing, but the high level of gene loss experienced by all genomes suggests that extensive sharing of gene products among symbiont cells must occur. In total, Hodgkinia complexes that consist of multiple lineages encode nearly complete sets of genes present on the ancestral single lineage and presumably perform the same functions as symbionts that have not undergone splitting. However, differences in the timing of the splits, along with dissimilar gene loss patterns on the resulting genomes, have led to very different outcomes of lineage splitting in extant cicadas.

  6. Comparative genomic characterization of three Streptococcus parauberis strains in fish pathogen, as assessed by wide-genome analyses.

    Directory of Open Access Journals (Sweden)

    Seong-Won Nho

    Streptococcus parauberis, which is the main causative agent of streptococcosis among olive flounder (Paralichthys olivaceus) in northeast Asia, can be distinctly divided into two groups (type I and type II) by an agglutination test. Here, the whole genome sequences of two Japanese strains (KRS-02083 and KRS-02109) were determined and compared with the previously determined genome of a Korean strain (KCTC 11537). The genomes of S. parauberis are intermediate in size and have lower GC contents than those of other streptococci. We annotated 2,236 and 2,048 genes in KRS-02083 and KRS-02109, respectively. Our results revealed that the three S. parauberis strains contain different genomic insertions and deletions. In particular, the genomes of Korean and Japanese strains encode different factors for sugar utilization; the former encodes the phosphotransferase system (PTS) for sorbose, whereas the latter encodes proteins for lactose hydrolysis. In addition, KRS-02109, the type II strain, was found to be able to resist phage infection through the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas system, which might contribute to its serological distribution. Thus, our genome-wide association study shows that polymorphisms can affect pathogen responses, providing insight into biological/biochemical pathways and phylogenetic diversity.

  7. Complete Genome Sequence of Bradyrhizobium sp. S23321: Insights into Symbiosis Evolution in Soil Oligotrophs

    Science.gov (United States)

    Okubo, Takashi; Tsukui, Takahiro; Maita, Hiroko; Okamoto, Shinobu; Oshima, Kenshiro; Fujisawa, Takatomo; Saito, Akihiro; Futamata, Hiroyuki; Hattori, Reiko; Shimomura, Yumi; Haruta, Shin; Morimoto, Sho; Wang, Yong; Sakai, Yoriko; Hattori, Masahira; Aizawa, Shin-ichi; Nagashima, Kenji V. P.; Masuda, Sachiko; Hattori, Tsutomu; Yamashita, Akifumi; Bao, Zhihua; Hayatsu, Masahito; Kajiya-Kanegae, Hiromi; Yoshinaga, Ikuo; Sakamoto, Kazunori; Toyota, Koki; Nakao, Mitsuteru; Kohara, Mitsuyo; Anda, Mizue; Niwa, Rieko; Jung-Hwan, Park; Sameshima-Saito, Reiko; Tokuda, Shin-ichi; Yamamoto, Sumiko; Yamamoto, Syuji; Yokoyama, Tadashi; Akutsu, Tomoko; Nakamura, Yasukazu; Nakahira-Yanaka, Yuka; Hoshino, Yuko Takada; Hirakawa, Hideki; Mitsui, Hisayuki; Terasawa, Kimihiro; Itakura, Manabu; Sato, Shusei; Ikeda-Ohtsubo, Wakako; Sakakura, Natsuko; Kaminuma, Eli; Minamisawa, Kiwamu

    2012-01-01

    Bradyrhizobium sp. S23321 is an oligotrophic bacterium isolated from paddy field soil. Although S23321 is phylogenetically close to Bradyrhizobium japonicum USDA110, a legume symbiont, it is unable to induce root nodules in siratro, a legume often used for testing Nod factor-dependent nodulation. The genome of S23321 is a single circular chromosome, 7,231,841 bp in length, with an average GC content of 64.3%. The genome contains 6,898 potential protein-encoding genes, one set of rRNA genes, and 45 tRNA genes. Comparison of the genome structure between S23321 and USDA110 showed strong colinearity; however, the symbiosis islands present in USDA110 were absent in S23321, whose genome lacked a chaperonin gene cluster (groELS3) for symbiosis regulation found in USDA110. A comparison of sequences around the tRNA-Val gene strongly suggested that S23321 contains an ancestral-type genome that precedes the acquisition of a symbiosis island by horizontal gene transfer. Although S23321 contains a nif (nitrogen fixation) gene cluster, the organization, homology, and phylogeny of the genes in this cluster were more similar to those of photosynthetic bradyrhizobia ORS278 and BTAi1 than to those on the symbiosis island of USDA110. In addition, we found genes encoding a complete photosynthetic system, many ABC transporters for amino acids and oligopeptides, two types (polar and lateral) of flagella, multiple respiratory chains, and a system for lignin monomer catabolism in the S23321 genome. These features suggest that S23321 is able to adapt to a wide range of environments, probably including low-nutrient conditions, with multiple survival strategies in soil and rhizosphere. PMID:22452844
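
    The chromosome length and average GC content quoted in this record are simple summary statistics over the assembled sequence. As a minimal sketch (not the authors' pipeline), GC content can be taken as the fraction of G and C bases among the unambiguous A/C/G/T bases. The function name `gc_content` and the toy sequence are illustrative assumptions, not taken from the paper.

```python
def gc_content(seq: str) -> float:
    """Return the GC fraction (0..1) of a DNA sequence.

    Symbols other than A, C, G and T (for example N) are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / acgt


# Hypothetical toy sequence, used only to exercise the function.
toy = "GGCCATGCNNGC"
print(f"length = {len(toy)} bp, GC = {gc_content(toy):.1%}")
```

    For a real genome the sequence would be read from a FASTA file first; that step is left out here to keep the sketch self-contained.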

  8. Functional diversification upon leader protease domain duplication in the Citrus tristeza virus genome: Role of RNA sequences and the encoded proteins.

    Science.gov (United States)

    Kang, Sung-Hwan; Atallah, Osama O; Sun, Yong-Duo; Folimonova, Svetlana Y

    2018-01-15

    Viruses from the family Closteroviridae show an example of intra-genome duplications of more than one gene. In addition to the hallmark coat protein gene duplication, several members possess a tandem duplication of papain-like leader proteases. In this study, we demonstrate that domains encoding the L1 and L2 proteases in the Citrus tristeza virus genome underwent a significant functional divergence at the RNA and protein levels. We show that the L1 protease is crucial for viral accumulation and establishment of initial infection, whereas its coding region is vital for virus transport. On the other hand, the second protease is indispensable for virus infection of its natural citrus host, suggesting that L2 has evolved an important adaptive function that mediates virus interaction with the woody host. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. The complete mitochondrial genome sequence of Eimeria innocua (Eimeriidae, Coccidia, Apicomplexa).

    Science.gov (United States)

    Hafeez, Mian Abdul; Vrba, Vladimir; Barta, John Robert

    2016-07-01

    The complete mitochondrial genome of Eimeria innocua KR strain (Eimeriidae, Coccidia, Apicomplexa) was sequenced. This coccidium infects turkeys (Meleagris gallopavo), Bobwhite quails (Colinus virginianus), and Grey partridges (Perdix perdix). Genome organization and gene contents were comparable with other Eimeria spp. infecting galliform birds. The circular-mapping mt genome of E. innocua is 6247 bp in length with three protein-coding genes (cox1, cox3, and cytb), 19 gene fragments encoding large subunit (LSU) rRNA and 14 gene fragments encoding small subunit (SSU) rRNA. Like other Apicomplexa, no tRNA was encoded. The mitochondrial genome of E. innocua confirms its close phylogenetic affinities to Eimeria dispersa.

  10. OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species

    Science.gov (United States)

    Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that i...

  11. Inter- and intra-specific pan-genomes of Borrelia burgdorferi sensu lato: genome stability and adaptive radiation

    Science.gov (United States)

    2013-01-01

    Background: Lyme disease is caused by spirochete bacteria from the Borrelia burgdorferi sensu lato (B. burgdorferi s.l.) species complex. To reconstruct the evolution of B. burgdorferi s.l. and identify the genomic basis of its human virulence, we compared the genomes of 23 B. burgdorferi s.l. isolates from Europe and the United States, including B. burgdorferi sensu stricto (B. burgdorferi s.s., 14 isolates), B. afzelii (2), B. garinii (2), B. “bavariensis” (1), B. spielmanii (1), B. valaisiana (1), B. bissettii (1), and B. “finlandensis” (1). Results: Robust B. burgdorferi s.s. and B. burgdorferi s.l. phylogenies were obtained using genome-wide single-nucleotide polymorphisms, despite recombination. Phylogeny-based pan-genome analysis showed that the rate of gene acquisition was higher between species than within species, suggesting adaptive speciation. Strong positive natural selection drives the sequence evolution of lipoproteins, including chromosomally-encoded genes 0102 and 0404, cp26-encoded ospC and b08, and lp54-encoded dbpA, a07, a22, a33, a53, a65. Computer simulations predicted rapid adaptive radiation of genomic groups as population size increases. Conclusions: Intra- and inter-specific pan-genome sizes of B. burgdorferi s.l. expand linearly with phylogenetic diversity. Yet gene-acquisition rates in B. burgdorferi s.l. are among the lowest in bacterial pathogens, resulting in high genome stability and few lineage-specific genes. Genome adaptation of B. burgdorferi s.l. is driven predominantly by copy-number and sequence variations of lipoprotein genes. New genomic groups are likely to emerge if the current trend of B. burgdorferi s.l. population expansion continues. PMID:24112474

  12. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Science.gov (United States)

    Singh, Param Priya; Arora, Jatin; Isambert, Hervé

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.

  13. Relaxation of selective constraints causes independent selenoprotein extinction in insect genomes.

    Directory of Open Access Journals (Sweden)

    Charles E Chapple

    BACKGROUND: Selenoproteins are a diverse family of proteins notable for the presence of the 21st amino acid, selenocysteine. Until very recently, all metazoan genomes investigated encoded selenoproteins, and these proteins had therefore been believed to be essential for animal life. Challenging this assumption, recent comparative analyses of insect genomes have revealed that some insect genomes appear to have lost selenoprotein genes. METHODOLOGY/PRINCIPAL FINDINGS: In this paper we investigate in detail the fate of selenoproteins, and that of selenoprotein factors, in all available arthropod genomes. We use a variety of in silico comparative genomics approaches to look for known selenoprotein genes and factors involved in selenoprotein biosynthesis. We have found that five insect species have completely lost the ability to encode selenoproteins and that selenoprotein loss in these species, although so far confined to the Endopterygota infraclass, cannot be attributed to a single evolutionary event, but rather to multiple, independent events. Loss of selenoproteins and selenoprotein factors is usually coupled to the deletion of the entire no-longer functional genomic region, rather than to sequence degradation and consequent pseudogenisation. Such dynamics of gene extinction are consistent with the high rate of genome rearrangements observed in Drosophila. We have also found that, while many selenoprotein factors are concomitantly lost with the selenoproteins, others are present and conserved in all investigated genomes, irrespective of whether they code for selenoproteins or not, suggesting that they are involved in additional, non-selenoprotein related functions. CONCLUSIONS/SIGNIFICANCE: Selenoproteins have been independently lost in several insect species, possibly as a consequence of the relaxation in insects of the selective constraints acting across metazoans to maintain selenoproteins. The dispensability of selenoproteins in insects may ...

  14. Diverse Lifestyles and Strategies of Plant Pathogenesis Encoded in the Genomes of Eighteen Dothideomycetes Fungi

    Energy Technology Data Exchange (ETDEWEB)

    Ohm, Robin A.; Feau, Nicolas; Henrissat, Bernard; Schoch, Conrad L.; Horwitz, Benjamin A.; Barry, Kerrie W.; Condon, Bradford J.; Copeland, Alex C.; Dhillon, Braham; Glaser, Fabian; Hesse, Cedar N.; Kosti, Idit; LaButti, Kurt; Lindquist, Erika A.; Lucas, Susan; Salamov, Asaf A.; Bradshaw, Rosie E.; Ciuffetti, Lynda; Hamelin, Richard C.; Kema, Gert H. J.; Lawrence, Christopher; Scott, James A.; Spatafora, Joseph W.; Turgeon, B. Gillian; Wit, Pierre J. G. M. de; Zhong, Shaobin; Goodwin, Stephen B.; Grigoriev, Igor V.

    2012-02-29

    The class Dothideomycetes is one of the largest groups of fungi with a high level of ecological diversity including many plant pathogens infecting a broad range of hosts. Here, we compare genome features of 18 members of this class, including 6 necrotrophs, 9 (hemi)biotrophs and 3 saprotrophs, to analyze genome structure, evolution, and the diverse strategies of pathogenesis. The Dothideomycetes most likely evolved from a common ancestor more than 280 million years ago. The 18 genome sequences differ dramatically in size due to variation in repetitive content, but show much less variation in number of (core) genes. Gene order appears to have been rearranged mostly within chromosomal boundaries by multiple inversions, in extant genomes frequently demarcated by adjacent simple repeats. Several Dothideomycetes contain one or more gene-poor, transposable element (TE)-rich putatively dispensable chromosomes of unknown function. The 18 Dothideomycetes offer an extensive catalogue of genes involved in cellulose degradation, proteolysis, secondary metabolism, and cysteine-rich small secreted proteins. Ancestors of the two major orders of plant pathogens in the Dothideomycetes, the Capnodiales and Pleosporales, may have had different modes of pathogenesis, with the former having fewer of these genes than the latter. Many of these genes are enriched in proximity to transposable elements, suggesting faster evolution because of the effects of repeat induced point (RIP) mutations. A syntenic block of genes, including oxidoreductases, is conserved in most Dothideomycetes and upregulated during infection in L. maculans, suggesting a possible function in response to oxidative stress.

  15. Characterization of the legumains encoded by the genome of Theobroma cacao L.

    Science.gov (United States)

    Santana, Juliano Oliveira; Freire, Laís; de Sousa, Aurizangela Oliveira; Fontes Soares, Virgínia Lúcia; Gramacho, Karina Peres; Pirovani, Carlos Priminho

    2016-01-01

    Legumains are cysteine proteases related to plant development, protein degradation, programmed cell death, and defense against pathogens. In this study, we have identified and characterized three legumains encoded by the Theobroma cacao genome through in silico analyses, three-dimensional modeling, and analysis of gene expression patterns in different tissues and in response to inoculation with the fungus Moniliophthora perniciosa. The three proteins were named TcLEG3, TcLEG6, and TcLEG9. The histidine and cysteine residues that are part of the catalytic site were conserved among the proteins, and they remained parallel in the loop region in the 3D modeling. Three-dimensional modeling showed that the propeptide, which is located in the C-terminal region of legumains, blocks the catalytic cleft. Comparison of dendrogram data with the relative expression analysis indicated that TcLEG3 is related to the seed legumain group, TcLEG6 to the group of embryogenesis activities, and TcLEG9 to processes of the vegetative group. Furthermore, the expression analyses suggest a significant role for the three legumains during the development of Theobroma cacao and in its interaction with M. perniciosa. Copyright © 2015 Universidade Estadual de Santa Cruz, CNPJ: 40738999/0001-95. Published by Elsevier Masson SAS. All rights reserved.

  16. Identification of endogenous retroviral reading frames in the human genome

    Directory of Open Access Journals (Sweden)

    Wiuf Carsten

    2004-10-01

    Background: Human endogenous retroviruses (HERVs) comprise a large class of repetitive retroelements. Most HERVs are ancient and invaded our genome at least 25 million years ago, except for the evolutionarily young HERV-K group. The vast majority of the encoded genes are degenerate due to mutational decay and only a few non-HERV-K loci are known to retain intact reading frames. Additional intact HERV genes may exist, since retroviral reading frames have not been systematically annotated on a genome-wide scale. Results: By clustering of hits from multiple BLAST searches using known retroviral sequences we have mapped 1.1% of the human genome as retrovirus related. The coding potential of all identified HERV regions was analyzed by annotating viral open reading frames (vORFs) and we report 7836 loci as verified by protein homology criteria. Among 59 intact or almost-intact viral polyproteins scattered around the human genome we have found 29 envelope genes including two novel gammaretroviral types. One encodes a protein similar to a recently discovered zebrafish retrovirus (ZFERV) while another shows partial, C-terminal, homology to Syncytin (HERV-W/FRD). Conclusions: This compilation of HERV sequences and their coding potential provides a useful tool for pursuing functional analysis such as RNA expression profiling and effects of viral proteins, which may, in turn, reveal a role for HERVs in human health and disease. All data are publicly available through a database at http://www.retrosearch.dk.

  17. Multiple hybrid de novo genome assembly of finger millet, an orphan allotetraploid crop.

    Science.gov (United States)

    Hatakeyama, Masaomi; Aluri, Sirisha; Balachadran, Mathi Thumilan; Sivarajan, Sajeevan Radha; Patrignani, Andrea; Grüter, Simon; Poveda, Lucy; Shimizu-Inatsugi, Rie; Baeten, John; Francoijs, Kees-Jan; Nataraja, Karaba N; Reddy, Yellodu A Nanja; Phadnis, Shamprasad; Ravikumar, Ramapura L; Schlapbach, Ralph; Sreeman, Sheshshayee M; Shimizu, Kentaro K

    2017-09-05

    Finger millet (Eleusine coracana (L.) Gaertn) is an important crop for food security because of its tolerance to drought, which is expected to be exacerbated by global climate changes. Nevertheless, it is often classified as an orphan/underutilized crop because of the paucity of scientific attention. Among several small millets, finger millet is considered as an excellent source of essential nutrient elements, such as iron and zinc; hence, it has potential as an alternate coarse cereal. However, high-quality genome sequence data of finger millet are currently not available. One of the major problems encountered in the genome assembly of this species was its polyploidy, which hampers genome assembly compared with a diploid genome. To overcome this problem, we sequenced its genome using diverse technologies with sufficient coverage and assembled it via a novel multiple hybrid assembly workflow that combines next-generation with single-molecule sequencing, followed by whole-genome optical mapping using the Bionano Irys® system. The total number of scaffolds was 1,897 with an N50 length >2.6 Mb and detection of 96% of the universal single-copy orthologs. The majority of the homeologs were assembled separately. This indicates that the proposed workflow is applicable to the assembly of other allotetraploid genomes. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
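
    The N50 figure reported in this record is a standard contiguity statistic: the largest length L such that scaffolds of length at least L together cover at least half of the total assembly length. A minimal sketch of that calculation follows; the function name `n50` and the toy scaffold lengths are illustrative assumptions and do not reproduce the authors' assembly workflow or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are visited from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half of the assembly.
    """
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))
```

    Because N50 depends only on the length distribution, it says nothing about correctness of the assembly, which is why the abstract pairs it with the recovery rate of universal single-copy orthologs.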

  18. Prioritizing multiple therapeutic targets in parallel using automated DNA-encoded library screening

    Science.gov (United States)

    Machutta, Carl A.; Kollmann, Christopher S.; Lind, Kenneth E.; Bai, Xiaopeng; Chan, Pan F.; Huang, Jianzhong; Ballell, Lluis; Belyanskaya, Svetlana; Besra, Gurdyal S.; Barros-Aguirre, David; Bates, Robert H.; Centrella, Paolo A.; Chang, Sandy S.; Chai, Jing; Choudhry, Anthony E.; Coffin, Aaron; Davie, Christopher P.; Deng, Hongfeng; Deng, Jianghe; Ding, Yun; Dodson, Jason W.; Fosbenner, David T.; Gao, Enoch N.; Graham, Taylor L.; Graybill, Todd L.; Ingraham, Karen; Johnson, Walter P.; King, Bryan W.; Kwiatkowski, Christopher R.; Lelièvre, Joël; Li, Yue; Liu, Xiaorong; Lu, Quinn; Lehr, Ruth; Mendoza-Losana, Alfonso; Martin, John; McCloskey, Lynn; McCormick, Patti; O'Keefe, Heather P.; O'Keeffe, Thomas; Pao, Christina; Phelps, Christopher B.; Qi, Hongwei; Rafferty, Keith; Scavello, Genaro S.; Steiginga, Matt S.; Sundersingh, Flora S.; Sweitzer, Sharon M.; Szewczuk, Lawrence M.; Taylor, Amy; Toh, May Fern; Wang, Juan; Wang, Minghui; Wilkins, Devan J.; Xia, Bing; Yao, Gang; Zhang, Jean; Zhou, Jingye; Donahue, Christine P.; Messer, Jeffrey A.; Holmes, David; Arico-Muendel, Christopher C.; Pope, Andrew J.; Gross, Jeffrey W.; Evindar, Ghotas

    2017-07-01

    The identification and prioritization of chemically tractable therapeutic targets is a significant challenge in the discovery of new medicines. We have developed a novel method that rapidly screens multiple proteins in parallel using DNA-encoded library technology (ELT). Initial efforts were focused on the efficient discovery of antibacterial leads against 119 targets from Acinetobacter baumannii and Staphylococcus aureus. The success of this effort led to the hypothesis that the relative number of ELT binders alone could be used to assess the ligandability of large sets of proteins. This concept was further explored by screening 42 targets from Mycobacterium tuberculosis. Active chemical series for six targets from our initial effort as well as three chemotypes for DHFR from M. tuberculosis are reported. The findings demonstrate that parallel ELT selections can be used to assess ligandability and highlight opportunities for successful lead and tool discovery.

  19. Genome-to-genome analysis highlights the effect of the human innate and adaptive immune systems on the hepatitis C virus.

    Science.gov (United States)

    Ansari, M Azim; Pedergnana, Vincent; L C Ip, Camilla; Magri, Andrea; Von Delft, Annette; Bonsall, David; Chaturvedi, Nimisha; Bartha, Istvan; Smith, David; Nicholson, George; McVean, Gilean; Trebes, Amy; Piazza, Paolo; Fellay, Jacques; Cooke, Graham; Foster, Graham R; Hudson, Emma; McLauchlan, John; Simmonds, Peter; Bowden, Rory; Klenerman, Paul; Barnes, Eleanor; Spencer, Chris C A

    2017-05-01

    Outcomes of hepatitis C virus (HCV) infection and treatment depend on viral and host genetic factors. Here we use human genome-wide genotyping arrays and new whole-genome HCV viral sequencing technologies to perform a systematic genome-to-genome study of 542 individuals who were chronically infected with HCV, predominantly genotype 3. We show that both alleles of genes encoding human leukocyte antigen molecules and genes encoding components of the interferon lambda innate immune system drive viral polymorphism. Additionally, we show that IFNL4 genotypes determine HCV viral load through a mechanism dependent on a specific amino acid residue in the HCV NS5A protein. These findings highlight the interplay between the innate immune system and the viral genome in HCV control.

  20. A model for visual memory encoding.

    Directory of Open Access Journals (Sweden)

    Rodolphe Nenert

    Memory encoding engages multiple concurrent and sequential processes. While the individual processes involved in successful encoding have been examined in many studies, a sequence of events and the importance of modules associated with memory encoding has not been established. For this reason, we sought to perform a comprehensive examination of the network for memory encoding using data driven methods and to determine the directionality of the information flow in order to build a viable model of visual memory encoding. Forty healthy controls ages 19-59 performed a visual scene encoding task. FMRI data were preprocessed using SPM8 and then processed using independent component analysis (ICA) with the reliability of the identified components confirmed using ICASSO as implemented in GIFT. The directionality of the information flow was examined using Granger causality analyses (GCA). All participants performed the fMRI task well above the chance level (>90% correct on both active and control conditions) and the post-fMRI testing recall revealed correct memory encoding at 86.33 ± 5.83%. ICA identified involvement of components of five different networks in the process of memory encoding, and the GCA allowed for the directionality of the information flow to be assessed, from visual cortex via ventral stream to the attention network and then to the default mode network (DMN). Two additional networks involved in this process were the cerebellar and the auditory-insular network. This study provides evidence that successful visual memory encoding is dependent on multiple modules that are part of other networks that are only indirectly related to the main process. This model may help to identify the node(s) of the network that are affected by a specific disease process and explain the presence of memory encoding difficulties in patients in whom focal or global network dysfunction exists.

  1. A model for visual memory encoding.

    Science.gov (United States)

    Nenert, Rodolphe; Allendorfer, Jane B; Szaflarski, Jerzy P

    2014-01-01

    Memory encoding engages multiple concurrent and sequential processes. While the individual processes involved in successful encoding have been examined in many studies, a sequence of events and the importance of modules associated with memory encoding has not been established. For this reason, we sought to perform a comprehensive examination of the network for memory encoding using data driven methods and to determine the directionality of the information flow in order to build a viable model of visual memory encoding. Forty healthy controls ages 19-59 performed a visual scene encoding task. FMRI data were preprocessed using SPM8 and then processed using independent component analysis (ICA) with the reliability of the identified components confirmed using ICASSO as implemented in GIFT. The directionality of the information flow was examined using Granger causality analyses (GCA). All participants performed the fMRI task well above the chance level (>90% correct on both active and control conditions) and the post-fMRI testing recall revealed correct memory encoding at 86.33 ± 5.83%. ICA identified involvement of components of five different networks in the process of memory encoding, and the GCA allowed for the directionality of the information flow to be assessed, from visual cortex via ventral stream to the attention network and then to the default mode network (DMN). Two additional networks involved in this process were the cerebellar and the auditory-insular network. This study provides evidence that successful visual memory encoding is dependent on multiple modules that are part of other networks that are only indirectly related to the main process. This model may help to identify the node(s) of the network that are affected by a specific disease process and explain the presence of memory encoding difficulties in patients in whom focal or global network dysfunction exists.

  2. PSP: rapid identification of orthologous coding genes under positive selection across multiple closely related prokaryotic genomes.

    Science.gov (United States)

    Su, Fei; Ou, Hong-Yu; Tao, Fei; Tang, Hongzhi; Xu, Ping

    2013-12-27

    With genomic sequences of many closely related bacterial strains made available by deep sequencing, it is now possible to investigate trends in prokaryotic microevolution. Positive selection is a sub-process of microevolution, in which a particular mutation is favored, causing the allele frequency to continuously shift in one direction. Wide scanning of prokaryotic genomes has shown that positive selection at the molecular level is much more frequent than expected. Genes with significant positive selection may play key roles in bacterial adaptation to different environmental pressures. However, selection pressure analyses are computationally intensive and awkward to configure. Here we describe an open access web server, designated PSP (Positive Selection analysis for Prokaryotic genomes), for performing evolutionary analysis on orthologous coding genes, specially designed for rapid comparison of dozens of closely related prokaryotic genomes. Remarkably, PSP facilitates functional exploration at multiple levels by assignments and enrichments of KO, GO or COG terms. To illustrate this user-friendly tool, we analyzed Escherichia coli and Bacillus cereus genomes and found that several genes, which play key roles in human infection and antibiotic resistance, show significant evidence of positive selection. PSP is freely available to all users without any login requirement at: http://db-mml.sjtu.edu.cn/PSP/. PSP ultimately allows researchers to do genome-scale analysis for evolutionary selection across multiple prokaryotic genomes rapidly and easily, and identify the genes undergoing positive selection, which may play key roles in host-pathogen interactions and/or environmental adaptation.

  3. Characterization of Urtica dioica agglutinin isolectins and the encoding gene family.

    Science.gov (United States)

    Does, M P; Ng, D K; Dekker, H L; Peumans, W J; Houterman, P M; Van Damme, E J; Cornelissen, B J

    1999-01-01

    Urtica dioica agglutinin (UDA) has previously been found in roots and rhizomes of stinging nettles as a mixture of UDA-isolectins. Protein and cDNA sequencing have shown that mature UDA is composed of two hevein domains and is processed from a precursor protein. The precursor contains a signal peptide, two in-tandem hevein domains, a hinge region and a carboxyl-terminal chitinase domain. Genomic fragments encoding precursors for UDA-isolectins have been amplified by five independent polymerase chain reactions on genomic DNA from stinging nettle ecotype Weerselo. One amplified gene was completely sequenced. As compared to the published cDNA sequence, the genomic sequence contains, besides two basepair substitutions, two introns located at the same positions as in other plant chitinases. By partial sequence analysis of 40 amplified genes, 16 different genes were identified which encode seven putative UDA-isolectins. The deduced amino acid sequences share 78.9-98.9% identity. In extracts of roots and rhizomes of stinging nettle ecotype Weerselo six out of these seven isolectins were detected by mass spectrometry. One of them is an acidic form, which has not been identified before. Our results demonstrate that UDA is encoded by a large gene family.

  4. Multiple recent horizontal transfers of a large genomic region in cheese making fungi.

    Science.gov (United States)

    Cheeseman, Kevin; Ropars, Jeanne; Renault, Pierre; Dupont, Joëlle; Gouzy, Jérôme; Branca, Antoine; Abraham, Anne-Laure; Ceppi, Maurizio; Conseiller, Emmanuel; Debuchy, Robert; Malagnac, Fabienne; Goarin, Anne; Silar, Philippe; Lacoste, Sandrine; Sallet, Erika; Bensimon, Aaron; Giraud, Tatiana; Brygoo, Yves

    2014-01-01

    While the extent and impact of horizontal transfers in prokaryotes are widely acknowledged, their importance to the eukaryotic kingdom is unclear and thought by many to be anecdotal. Here we report multiple recent transfers of a huge genomic island between Penicillium spp. found in the food environment. Sequencing of the two leading filamentous fungi used in cheese making, P. roqueforti and P. camemberti, and comparison with the penicillin producer P. rubens reveals a 575 kb long genomic island in P. roqueforti--called Wallaby--present as identical fragments at non-homologous loci in P. camemberti and P. rubens. Wallaby is detected in Penicillium collections exclusively in strains from food environments. Wallaby encompasses about 250 predicted genes, some of which are probably involved in competition with microorganisms. The occurrence of multiple recent eukaryotic transfers in the food environment provides strong evidence for the importance of this understudied and probably underestimated phenomenon in eukaryotes.

  5. Cloning, expression and characterisation of a novel gene encoding ...

    African Journals Online (AJOL)

    Microsoft User

    2012-01-12

    Jan 12, 2012 ... ... characterisation of a novel gene encoding a chemosensory protein from Bemisia ... The genomic DNA sequence comparisons revealed a 1490 bp intron ... have several conserved sequence motifs, including the. N-terminal ...

  6. Diverse lifestyles and strategies of plant pathogenesis encoded in the genomes of eighteen Dothideomycetes fungi.

    Directory of Open Access Journals (Sweden)

    Robin A Ohm

    Full Text Available The class Dothideomycetes is one of the largest groups of fungi with a high level of ecological diversity including many plant pathogens infecting a broad range of hosts. Here, we compare genome features of 18 members of this class, including 6 necrotrophs, 9 (hemi)biotrophs and 3 saprotrophs, to analyze genome structure, evolution, and the diverse strategies of pathogenesis. The Dothideomycetes most likely evolved from a common ancestor more than 280 million years ago. The 18 genome sequences differ dramatically in size due to variation in repetitive content, but show much less variation in number of (core) genes. Gene order appears to have been rearranged mostly within chromosomal boundaries by multiple inversions, in extant genomes frequently demarcated by adjacent simple repeats. Several Dothideomycetes contain one or more gene-poor, transposable element (TE)-rich putatively dispensable chromosomes of unknown function. The 18 Dothideomycetes offer an extensive catalogue of genes involved in cellulose degradation, proteolysis, secondary metabolism, and cysteine-rich small secreted proteins. Ancestors of the two major orders of plant pathogens in the Dothideomycetes, the Capnodiales and Pleosporales, may have had different modes of pathogenesis, with the former having fewer of these genes than the latter. Many of these genes are enriched in proximity to transposable elements, suggesting faster evolution because of the effects of repeat-induced point (RIP) mutations. A syntenic block of genes, including oxidoreductases, is conserved in most Dothideomycetes and upregulated during infection in L. maculans, suggesting a possible function in response to oxidative stress.

  7. Advances in Genomics of Entomopathogenic Fungi.

    Science.gov (United States)

    Wang, J B; St Leger, R J; Wang, C

    2016-01-01

    Fungi are the commonest pathogens of insects and crucial regulators of insect populations. The rapid advance of genome technologies has revolutionized our understanding of entomopathogenic fungi with multiple Metarhizium spp. sequenced, as well as Beauveria bassiana, Cordyceps militaris, and Ophiocordyceps sinensis among others. Phylogenomic analysis suggests that the ancestors of many of these fungi were plant endophytes or pathogens, with entomopathogenicity being an acquired characteristic. These fungi now occupy a wide range of habitats and hosts, and their genomes have provided a wealth of information on the evolution of virulence-related characteristics, as well as the protein families and genomic structure associated with ecological and econutritional heterogeneity, genome evolution, and host range diversification. In particular, their evolutionary transition from plant pathogens or endophytes to insect pathogens provides a novel perspective on how new functional mechanisms important for host switching and virulence are acquired. Importantly, genomic resources have helped make entomopathogenic fungi ideal model systems for answering basic questions in parasitology, entomology, and speciation. At the same time, identifying the selective forces that act upon entomopathogen fitness traits could underpin both the development of new mycoinsecticides and further our understanding of the roles of these fungi in nature. These roles frequently include mutualistic relationships with plants. Genomics has also facilitated the rapid identification of genes encoding biologically useful molecules, with implications for the development of pharmaceuticals and the use of these fungi as bioreactors.

  8. Asymmetrical distribution of non-conserved regulatory sequences at PHOX2B is reflected at the ENCODE loci and illuminates a possible genome-wide trend

    Directory of Open Access Journals (Sweden)

    McCallion Andrew S

    2009-01-01

    Full Text Available Abstract Background: Transcriptional regulatory elements are central to development and interspecific phenotypic variation. Current regulatory element prediction tools rely heavily upon conservation for prediction of putative elements. Recent in vitro observations from the ENCODE project combined with in vivo analyses at the zebrafish phox2b locus suggest that a significant fraction of regulatory elements may fall below commonly applied metrics of conservation. We propose to explore these observations in vivo at the human PHOX2B locus, and also evaluate the potential evidence for genome-wide applicability of these observations through a novel analysis of extant data. Results: Transposon-based transgenic analysis utilizing a tiling path proximal to human PHOX2B in zebrafish recapitulates the observations at the zebrafish phox2b locus of both conserved and non-conserved regulatory elements. Analysis of human sequences conserved with previously identified zebrafish phox2b regulatory elements demonstrates that the orthologous sequences exhibit overlapping regulatory control. Additionally, analysis of non-conserved sequences scattered over 135 kb 5' to PHOX2B provides evidence of non-conserved regulatory elements positively biased with close proximity to the gene. Furthermore, we provide a novel analysis of data from the ENCODE project, finding a non-uniform distribution of regulatory elements consistent with our in vivo observations at PHOX2B. These observations remain largely unchanged when one accounts for the sequence repeat content of the assayed intervals, when the intervals are sub-classified by biological role (developmental versus non-developmental), or by gene density (gene desert versus non-gene desert). Conclusion: While regulatory elements frequently display evidence of evolutionary conservation, a fraction appears to be undetected by current metrics of conservation. In vivo observations at the PHOX2B locus, supported by our analyses of in

  9. Analysis of Genome-Wide Association Studies with Multiple Outcomes Using Penalization

    Science.gov (United States)

    Liu, Jin; Huang, Jian; Ma, Shuangge

    2012-01-01

    Genome-wide association studies have been extensively conducted, searching for markers for biologically meaningful outcomes and phenotypes. Penalization methods have been adopted in the analysis of the joint effects of a large number of SNPs (single nucleotide polymorphisms) and marker identification. This study is partly motivated by the analysis of a heterogeneous stock mice dataset, in which multiple correlated phenotypes and a large number of SNPs are available. Existing penalization methods designed to analyze a single response variable cannot accommodate the correlation among multiple response variables. With multiple response variables sharing the same set of markers, joint modeling is first employed to accommodate the correlation. The group Lasso approach is adopted to select markers associated with all the outcome variables. An efficient computational algorithm is developed. A simulation study and analysis of the heterogeneous stock mice dataset show that the proposed method can outperform existing penalization methods. PMID:23272092
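
    The abstract above does not spell out its algorithm, so the following is only a minimal sketch of the textbook group soft-thresholding step that underlies group Lasso selection, not the authors' method. It assumes each marker's coefficients across all outcomes form one group; the function names and the penalty level `lam` are hypothetical.

```python
import math


def group_soft_threshold(row, lam):
    """Shrink one marker's coefficient vector (one entry per outcome).

    If the Euclidean norm of the group is at most lam, the whole group is
    set to zero, so the marker is dropped for every outcome at once.
    """
    norm = math.sqrt(sum(b * b for b in row))
    if norm <= lam:
        return [0.0] * len(row)
    scale = 1.0 - lam / norm
    return [scale * b for b in row]


def select_markers(coef_rows, lam):
    """Apply the group shrinkage to every marker and list the survivors."""
    shrunk = [group_soft_threshold(row, lam) for row in coef_rows]
    selected = [j for j, row in enumerate(shrunk) if any(b != 0.0 for b in row)]
    return shrunk, selected


if __name__ == "__main__":
    # Two hypothetical markers, each with coefficients for two outcomes.
    rows = [[3.0, 4.0], [0.3, 0.4]]
    shrunk, selected = select_markers(rows, lam=1.0)
    print(shrunk, selected)
```

    Because the penalty acts on the norm of the whole group, a marker is either kept for all outcomes or removed for all of them, which is the sense in which the approach selects markers associated with all the outcome variables.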

  10. Differential distribution of a SINE element in the Entamoeba histolytica and Entamoeba dispar genomes: Role of the LINE-encoded endonuclease

    Directory of Open Access Journals (Sweden)

    Gupta Abhishek K

    2011-05-01

    Full Text Available Abstract Background: Entamoeba histolytica and Entamoeba dispar are closely related protistan parasites, but while E. histolytica can be invasive, E. dispar is completely nonpathogenic. Transposable elements constitute a significant portion of the genome in these species, which contain three families of LINEs and SINEs. These elements can profoundly influence the expression of neighboring genes. Thus their genomic location can have important phenotypic consequences. A genome-wide comparison of the location of these elements in the E. histolytica and E. dispar genomes has not been carried out. It is also not known whether the retrotransposition machinery works similarly in both species. The present study was undertaken to address these issues. Results: Here we extracted all genomic occurrences of full-length copies of EhSINE1 in the E. histolytica genome and matched them with the homologous regions in E. dispar, and vice versa, wherever it was possible to establish synteny. We found that only about 20% of syntenic sites were occupied by SINE1 in both species. We checked whether the different genomic location in the two species was due to differences in the activity of the LINE-encoded endonuclease which is required for nicking the target site. We found that the endonucleases of both species were very similar, both in their kinetic properties and in their substrate sequence specificity. Hence the differential distribution of SINEs in these species is not likely to be influenced by the endonuclease. Further we found that the physical properties of the DNA sequences adjoining the insertion sites were similar in both species. Conclusions: Our data show that the basic retrotransposition machinery is conserved in these sibling species. SINEs may indeed have occupied all of the insertion sites in the genome of the common ancestor of E. histolytica and E. dispar but these may have been subsequently lost from some locations. Alternatively, SINE

  11. Genome-wide associations of gene expression variation in humans.

    Directory of Open Access Journals (Sweden)

    Barbara E Stranger

    2005-12-01

    Full Text Available The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
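
    As an illustration of two of the multiple test corrections named in the abstract above, here is a minimal sketch of the Bonferroni rule and the Benjamini-Hochberg false discovery rate procedure. The abstract does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption; the p-values and the level `alpha` are made up for the example, and the permutation approach is not shown.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / (number of tests)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate.

    Sort the p-values, find the largest rank k whose p-value is at most
    (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical association p-values for five SNP-expression tests.
    pvals = [0.001, 0.008, 0.025, 0.041, 0.6]
    print(bonferroni(pvals))
    print(benjamini_hochberg(pvals))
```

    On these made-up values Bonferroni keeps two associations and Benjamini-Hochberg keeps three, which reflects the general pattern that controlling the false discovery rate is less conservative than controlling the family-wise error rate.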

  12. Genome-Wide Associations of Gene Expression Variation in Humans.

    Directory of Open Access Journals (Sweden)

    2005-12-01

    Full Text Available The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.

  13. Genetic variants in nuclear-encoded mitochondrial genes influence AIDS progression.

    Directory of Open Access Journals (Sweden)

    Sher L Hendrickson

    2010-09-01

    Full Text Available The human mitochondrial genome includes only 13 coding genes while nuclear-encoded genes account for 99% of proteins responsible for mitochondrial morphology, redox regulation, and energetics. Mitochondrial pathogenesis occurs in HIV patients and genetically, mitochondrial DNA haplogroups with presumed functional differences have been associated with differential AIDS progression. Here we explore whether single nucleotide polymorphisms (SNPs) within 904 of the estimated 1,500 genes that specify nuclear-encoded mitochondrial proteins (NEMPs) influence AIDS progression among HIV-1 infected patients. We examined NEMPs for association with the rate of AIDS progression using genotypes generated by an Affymetrix 6.0 genotyping array of 1,455 European American patients from five US AIDS cohorts. Successfully genotyped SNPs gave 50% or better haplotype coverage for 679 of known NEMP genes. With a Bonferroni adjustment for the number of genes and tests examined, multiple SNPs within two NEMP genes showed significant association with AIDS progression: acyl-CoA synthetase medium-chain family member 4 (ACSM4) on chromosome 12 and peroxisomal D3,D2-enoyl-CoA isomerase (PECI) on chromosome 6. Our previous studies on mitochondrial DNA showed that European haplogroups with presumed functional differences were associated with AIDS progression and HAART mediated adverse events. The modest influences of nuclear-encoded mitochondrial genes found in the current study add support to the idea that mitochondrial function plays a role in AIDS pathogenesis.

  14. Heterogeneic dynamics of the structures of multiple gene clusters in two pathogenetically different lines originating from the same phytoplasma.

    Science.gov (United States)

    Arashida, Ryo; Kakizawa, Shigeyuki; Hoshi, Ayaka; Ishii, Yoshiko; Jung, Hee-Young; Kagiwada, Satoshi; Yamaji, Yasuyuki; Oshima, Kenro; Namba, Shigetou

    2008-04-01

    Phytoplasmas are phloem-limited plant pathogens that are transmitted by insect vectors and are associated with diseases in hundreds of plant species. Despite their small sizes, phytoplasma genomes have repeat-rich sequences, which are due to several genes that are encoded as multiple copies. These multiple genes exist in a gene cluster, the potential mobile unit (PMU). PMUs are present at several distinct regions in the phytoplasma genome. The multicopy genes encoded by PMUs (herein named mobile unit genes [MUGs]) and similar genes elsewhere in the genome (herein named fundamental genes [FUGs]) are likely to have the same function based on their annotations. In this manuscript we show evidence that MUGs and FUGs do not cluster together within the same clade. Each MUG is in a cluster with a short branch length, suggesting that MUGs are recently diverged paralogs, whereas the origin of FUGs is different from that of MUGs. We also compared the genome structures around the lplA gene in two derivative lines of the 'Candidatus Phytoplasma asteris' OY strain, the severe-symptom line W (OY-W) and the mild-symptom line M (OY-M). The gene organizations of the nucleotide sequences upstream of the lplA genes of OY-W and OY-M were dramatically different. The tra5 insertion sequence, an element of PMUs, was found only in this region in OY-W. These results suggest that transposition of entire PMUs and PMU sections has occurred frequently in the OY phytoplasma genome. The difference in the pathogenicities of OY-W and OY-M might be caused by the duplication and transposition of PMUs, followed by genome rearrangement.

  15. Genomic instability--an evolving hallmark of cancer.

    Science.gov (United States)

    Negrini, Simona; Gorgoulis, Vassilis G; Halazonetis, Thanos D

    2010-03-01

    Genomic instability is a characteristic of most cancers. In hereditary cancers, genomic instability results from mutations in DNA repair genes and drives cancer development, as predicted by the mutator hypothesis. In sporadic (non-hereditary) cancers the molecular basis of genomic instability remains unclear, but recent high-throughput sequencing studies suggest that mutations in DNA repair genes are infrequent before therapy, arguing against the mutator hypothesis for these cancers. Instead, the mutation patterns of the tumour suppressor TP53 (which encodes p53), ataxia telangiectasia mutated (ATM) and cyclin-dependent kinase inhibitor 2A (CDKN2A; which encodes p16INK4A and p14ARF) support the oncogene-induced DNA replication stress model, which attributes genomic instability and TP53 and ATM mutations to oncogene-induced DNA damage.

  16. An evaluation of multiple annealing and looping based genome amplification using a synthetic bacterial community

    KAUST Repository

    Wang, Yong; Gao, Zhaoming; Xu, Ying; Li, Guangyu; He, Lisheng; Qian, Peiyuan

    2016-01-01

    -generation-sequencing technology. Using a synthetic bacterial community, the amplification efficiency of the Multiple Annealing and Looping Based Amplification Cycles (MALBAC) kit that is originally developed to amplify the single-cell genomic DNA of mammalian organisms

  17. Hepatitis A virus-encoded miRNAs attenuate the accumulation of viral genomic RNAs in infected cells.

    Science.gov (United States)

    Shi, Jiandong; Sun, Jing; Wu, Meini; Hu, Ningzhu; Hu, Yunzhang

    2016-06-01

    The establishment of persistent infection with hepatitis A virus (HAV) is the common result of most HAV/cell culture systems. Previous observations show that the synthesis of viral RNAs is reduced during infection. However, the underlying mechanism is poorly understood. We characterized three HAV-encoded miRNAs in our previous study. In this study, we aim to investigate the impact of these miRNAs on the accumulation of viral RNAs. The results indicated that the synthesis of viral genomic RNAs was dramatically reduced (more than 75 % reduction, P ...) ... viral miRNA mimics. Conversely, they were significantly increased (more than 3.3-fold addition, P ...) ... viral miRNA inhibitors. The luciferase reporter assay of miRNA targets showed that viral miRNAs were fully complementary to specific sites of the viral plus or minus strand RNA and strongly inhibited their expressions. Further data showed that the relative abundance of viral genomic RNA fragments that contain miRNA targets was also dramatically reduced (more than 80 % reduction, P ...) ... viral miRNAs were overexpressed with miRNA mimics. In contrast, they were significantly increased (approximately 2-fold addition, P ...) ... viral miRNAs were inhibited with miRNA inhibitors. In conclusion, these data suggest a possible mechanism for the reduction of viral RNA synthesis during HAV infection. Thus, we propose that it is likely that RNA virus-derived miRNA could serve as a self-mediated feedback regulator during infection.

  18. The dnd operon for DNA phosphorothioation modification system in Escherichia coli is located in diverse genomic islands.

    Science.gov (United States)

    Ho, Wing Sze; Ou, Hong-Yu; Yeo, Chew Chieng; Thong, Kwai Lin

    2015-03-17

    Strains of Escherichia coli that are non-typeable by pulsed-field gel electrophoresis (PFGE) due to in-gel degradation can influence their molecular epidemiological data. The DNA degradation phenotype (Dnd(+)) is mediated by the dnd operon that encodes enzymes catalyzing the phosphorothioation of DNA, rendering the modified DNA susceptible to oxidative cleavage during a PFGE run. In this study, a PCR assay was developed to detect the presence of the dnd operon in Dnd(+) E. coli strains and to improve their typeability. Investigations into the genetic environments of the dnd operon in various E. coli strains led to the discovery that the dnd operon is harboured in diverse genomic islands. The dndBCDE genes (dnd operon) were detected in all Dnd(+) E. coli strains by PCR. The addition of thiourea improved the typeability of Dnd(+) E. coli strains to 100% using PFGE and the Dnd(+) phenotype can be observed in both clonal and genetically diverse E. coli strains. Genomic analysis of 101 dnd operons from genome sequences of Enterobacteriaceae revealed that the dnd operons of the same bacterial species were generally clustered together in the phylogenetic tree. Further analysis of dnd operons of 52 E. coli genomes together with their respective immediate genetic environments revealed a total of 7 types of genetic organizations, all of which were found to be associated with genomic islands designated dnd-encoding GIs. The dnd-encoding GIs displayed mosaic structure and the genomic context of the 7 islands (with 1 representative genome from each type of genetic organization) were also highly variable, suggesting multiple recombination events. This is also the first report where two dnd operons were found within a strain although the biological implication is unknown. Surprisingly, dnd operons were frequently found in pathogenic E. coli although their link with virulence has not been explored. Genomic islands likely play an important role in facilitating the horizontal

  19. Negative base encoding in optical linear algebra processors

    Science.gov (United States)

    Perlee, C.; Casasent, D.

    1986-01-01

    In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.
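
    The idea in the abstract above can be sketched in software as follows. This is an illustrative sketch only, not the optical processor described in the record: the function names are hypothetical, and the convolution is done with plain loops. Digits are stored least significant first.

```python
def to_negabinary(n):
    """Return the digits (0 or 1) of integer n in base -2, least significant first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:
            # Python's remainder takes the divisor's sign; fold it back into {0, 1}.
            n, r = n + 1, r + 2
        digits.append(r)
    return digits


def convolve(a, b):
    """Discrete convolution of two digit sequences.

    The result is a mixed representation: position k still has weight
    (-2)**k, but its entry may be larger than 1.
    """
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def from_mixed(digits, base=-2):
    """Evaluate a (possibly mixed) digit sequence as an ordinary integer."""
    return sum(d * base ** k for k, d in enumerate(digits))


if __name__ == "__main__":
    a, b = to_negabinary(6), to_negabinary(-3)
    mixed = convolve(a, b)
    print(a, b, mixed, from_mixed(mixed))
```

    Because base -2 represents positive and negative integers with the same digit set, no separate sign handling is needed before the convolution, which is the kind of saving over sign-magnitude and 2's complement encoding that the abstract refers to.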

  20. Insights into archaeal evolution and symbiosis from the genomes of a nanoarchaeon and its inferred crenarchaeal host from Obsidian Pool, Yellowstone National Park.

    Science.gov (United States)

    Podar, Mircea; Makarova, Kira S; Graham, David E; Wolf, Yuri I; Koonin, Eugene V; Reysenbach, Anna-Louise

    2013-04-22

    A single cultured marine organism, Nanoarchaeum equitans, represents the Nanoarchaeota branch of symbiotic Archaea, with a highly reduced genome and unusual features such as multiple split genes. The first terrestrial hyperthermophilic member of the Nanoarchaeota was collected from Obsidian Pool, a thermal feature in Yellowstone National Park, separated by single cell isolation, and sequenced together with its putative host, a Sulfolobales archaeon. Both the new Nanoarchaeota (Nst1) and N. equitans lack most biosynthetic capabilities, and phylogenetic analysis of ribosomal RNA and protein sequences indicates that the two form a deep-branching archaeal lineage. However, the Nst1 genome is more than 20% larger, and encodes a complete gluconeogenesis pathway as well as the full complement of archaeal flagellum proteins. With a larger genome, a smaller repertoire of split protein encoding genes and no split non-contiguous tRNAs, Nst1 appears to have experienced less severe genome reduction than N. equitans. These findings imply that, rather than representing ancestral characters, the extremely compact genomes and multiple split genes of Nanoarchaeota are derived characters associated with their symbiotic or parasitic lifestyle. The inferred host of Nst1 is potentially autotrophic, with a streamlined genome and simplified central and energetic metabolism as compared to other Sulfolobales. Comparison of the N. equitans and Nst1 genomes suggests that the marine and terrestrial lineages of Nanoarchaeota share a common ancestor that was already a symbiont of another archaeon. The two distinct Nanoarchaeota-host genomic data sets offer novel insights into the evolution of archaeal symbiosis and parasitism, enabling further studies of the cellular and molecular mechanisms of these relationships. This article was reviewed by Patrick Forterre, Bettina Siebers (nominated by Michael Galperin) and Purification Lopez-Garcia.

  1. Genome and Transcriptome of Clostridium phytofermentans, Catalyst for the Direct Conversion of Plant Feedstocks to Fuels.

    Directory of Open Access Journals (Sweden)

    Elsa Petit

    Full Text Available Clostridium phytofermentans was isolated from forest soil and is distinguished by its capacity to directly ferment plant cell wall polysaccharides into ethanol as the primary product, suggesting that it possesses unusual catabolic pathways. The objective of the present study was to understand the molecular mechanisms of biomass conversion to ethanol in a single organism, Clostridium phytofermentans, by analyzing its complete genome and transcriptome during growth on plant carbohydrates. The saccharolytic versatility of C. phytofermentans is reflected in a diversity of genes encoding ATP-binding cassette sugar transporters and glycoside hydrolases, many of which may have been acquired through horizontal gene transfer. These genes are frequently organized as operons that may be controlled individually by the many transcriptional regulators identified in the genome. Preferential ethanol production may be due to high levels of expression of multiple ethanol dehydrogenases and additional pathways maximizing ethanol yield. The genome also encodes three different proteinaceous bacterial microcompartments with the capacity to compartmentalize pathways that divert fermentation intermediates to various products. These characteristics make C. phytofermentans an attractive resource for improving the efficiency and speed of biomass conversion to biofuels.

  2. A highly divergent gene cluster in honey bees encodes a novel silk family.

    Science.gov (United States)

    Sutherland, Tara D; Campbell, Peter M; Weisman, Sarah; Trueman, Holly E; Sriskantha, Alagacone; Wanjura, Wolfgang J; Haritos, Victoria S

    2006-11-01

    The pupal cocoon of the domesticated silk moth Bombyx mori is the best known and most extensively studied insect silk. It is not widely known that Apis mellifera larvae also produce silk. We have used a combination of genomic and proteomic techniques to identify four honey bee fiber genes (AmelFibroin1-4) and two silk-associated genes (AmelSA1 and 2). The four fiber genes are small, comprise a single exon each, and are clustered on a short genomic region where the open reading frames are GC-rich amid low GC intergenic regions. The genes encode similar proteins that are highly helical and predicted to form unusually tight coiled coils. Despite the similarity in size, structure, and composition of the encoded proteins, the genes have low primary sequence identity. We propose that the four fiber genes have arisen from gene duplication events but have subsequently diverged significantly. The silk-associated genes encode proteins likely to act as a glue (AmelSA1) and involved in silk processing (AmelSA2). Although the silks of honey bees and silkmoths both originate in larval labial glands, the silk proteins are completely different in their primary, secondary, and tertiary structures as well as the genomic arrangement of the genes encoding them. This implies independent evolutionary origins for these functionally related proteins.

  3. An evolvable oestrogen receptor activity sensor: development of a modular system for integrating multiple genes into the yeast genome

    NARCIS (Netherlands)

    Fox, J.E.; Bridgham, J.T.; Bovee, T.F.H.; Thornton, J.W.

    2007-01-01

    To study a gene interaction network, we developed a gene-targeting strategy that allows efficient and stable genomic integration of multiple genetic constructs at distinct target loci in the yeast genome. This gene-targeting strategy uses a modular plasmid with a recyclable selectable marker and a

  4. Identification of the major structural and nonstructural proteins encoded by human parvovirus B19 and mapping of their genes by procaryotic expression of isolated genomic fragments

    Energy Technology Data Exchange (ETDEWEB)

    Cotmore, S.F.; McKie, V.C.; Anderson, L.J.; Astell, C.R.; Tattersall, P.

    1986-11-01

    Plasma from a child with homozygous sickle-cell disease, sampled during the early phase of an aplastic crisis, contained human parvovirus B19 virions. Plasma taken 10 days later (during the convalescent phase) contained both immunoglobulin M and immunoglobulin G antibodies directed against two viral polypeptides with apparent molecular weights of 83,000 and 58,000 which were present exclusively in the particulate fraction of the plasma taken during the acute phase. These two protein species comigrated at 110S on neutral sucrose velocity gradients with the B19 viral DNA and thus appear to constitute the viral capsid polypeptides. The B19 genome was molecularly cloned into a bacterial plasmid vector. Two expression constructs containing B19 sequences from different halves of the viral genome were obtained, which directed the synthesis, in bacteria, of segments of virally encoded protein. These polypeptide fragments were then purified and used to immunize rabbits. Antibodies against a protein sequence specified between nucleotides 2897 and 3749 recognized both the 83- and 58-kilodalton capsid polypeptides in aplastic plasma taken during the acute phase and detected similar proteins in the tissues of a stillborn fetus which had been infected transplacentally with B19. Antibodies against a protein sequence encoded in the other half of the B19 genome (nucleotides 1072 through 2044) did not react specifically with any protein in plasma taken during the acute phase but recognized three nonstructural polypeptides of 71, 63, and 52 kilodaltons present in the liver and, at lower levels, in some other tissues of the transplacentally infected fetus.

  5. Identification of the major structural and nonstructural proteins encoded by human parvovirus B19 and mapping of their genes by procaryotic expression of isolated genomic fragments

    International Nuclear Information System (INIS)

    Cotmore, S.F.; McKie, V.C.; Anderson, L.J.; Astell, C.R.; Tattersall, P.

    1986-01-01

    Plasma from a child with homozygous sickle-cell disease, sampled during the early phase of an aplastic crisis, contained human parvovirus B19 virions. Plasma taken 10 days later (during the convalescent phase) contained both immunoglobulin M and immunoglobulin G antibodies directed against two viral polypeptides with apparent molecular weights of 83,000 and 58,000 which were present exclusively in the particulate fraction of the plasma taken during the acute phase. These two protein species comigrated at 110S on neutral sucrose velocity gradients with the B19 viral DNA and thus appear to constitute the viral capsid polypeptides. The B19 genome was molecularly cloned into a bacterial plasmid vector. Two expression constructs containing B19 sequences from different halves of the viral genome were obtained, which directed the synthesis, in bacteria, of segments of virally encoded protein. These polypeptide fragments were then purified and used to immunize rabbits. Antibodies against a protein sequence specified between nucleotides 2897 and 3749 recognized both the 83- and 58-kilodalton capsid polypeptides in aplastic plasma taken during the acute phase and detected similar proteins in the tissues of a stillborn fetus which had been infected transplacentally with B19. Antibodies against a protein sequence encoded in the other half of the B19 genome (nucleotides 1072 through 2044) did not react specifically with any protein in plasma taken during the acute phase but recognized three nonstructural polypeptides of 71, 63, and 52 kilodaltons present in the liver and, at lower levels, in some other tissues of the transplacentally infected fetus.

  6. The genome of Aeromonas salmonicida subsp. salmonicida A449: insights into the evolution of a fish pathogen

    Directory of Open Access Journals (Sweden)

    Murphy Colleen

    2008-09-01

    Background: Aeromonas salmonicida subsp. salmonicida is a Gram-negative bacterium that is the causative agent of furunculosis, a bacterial septicaemia of salmonid fish. While other species of Aeromonas are opportunistic pathogens or are found in commensal or symbiotic relationships with animal hosts, A. salmonicida subsp. salmonicida causes disease in healthy fish. The genome sequence of A. salmonicida was determined to provide a better understanding of the virulence factors used by this pathogen to infect fish. Results: The nucleotide sequences of the A. salmonicida subsp. salmonicida A449 chromosome and two large plasmids are characterized. The chromosome is 4,702,402 bp and encodes 4388 genes, while the two large plasmids are 166,749 and 155,098 bp with 178 and 164 genes, respectively. Notable features are a large inversion in the chromosome and, in one of the large plasmids, the presence of a Tn21 composite transposon containing mercury resistance genes and an In2 integron encoding genes for resistance to streptomycin/spectinomycin, quaternary ammonia compounds, sulphonamides and chloramphenicol. A large number of genes encoding potential virulence factors were identified; however, many appear to be pseudogenes since they contain insertion sequences, frameshifts or in-frame stop codons. A total of 170 pseudogenes and 88 insertion sequences (of ten different types) are found in the A. salmonicida genome. Comparison with the A. hydrophila ATCC 7966T genome reveals multiple large inversions in the chromosome as well as an approximately 9% difference in gene content indicating instances of single gene or operon loss or gain. A limited number of the pseudogenes found in A. salmonicida A449 were investigated in other Aeromonas strains and species. While nearly all the pseudogenes tested are present in A. salmonicida subsp. salmonicida strains, only about 25% were found in other A. salmonicida subspecies and none were detected in other

  7. Genome network medicine: innovation to overcome huge challenges in cancer therapy.

    Science.gov (United States)

    Roukos, Dimitrios H

    2014-01-01

    The post-ENCODE era shapes now a new biomedical research direction for understanding transcriptional and signaling networks driving gene expression and core cellular processes such as cell fate, survival, and apoptosis. Over the past half century, the Francis Crick 'central dogma' of single n gene/protein-phenotype (trait/disease) has defined biology, human physiology, disease, diagnostics, and drugs discovery. However, the ENCODE project and several other genomic studies using high-throughput sequencing technologies, computational strategies, and imaging techniques to visualize regulatory networks, provide evidence that transcriptional process and gene expression are regulated by highly complex dynamic molecular and signaling networks. This Focus article describes the linear experimentation-based limitations of diagnostics and therapeutics to cure advanced cancer and the need to move on from reductionist to network-based approaches. With evident a wide genomic heterogeneity, the power and challenges of next-generation sequencing (NGS) technologies to identify a patient's personal mutational landscape for tailoring the best target drugs in the individual patient are discussed. However, the available drugs are not capable of targeting aberrant signaling networks and research on functional transcriptional heterogeneity and functional genome organization is poorly understood. Therefore, the future clinical genome network medicine aiming at overcoming multiple problems in the new fields of regulatory DNA mapping, noncoding RNA, enhancer RNAs, and dynamic complexity of transcriptional circuitry are also discussed expecting in new innovation technology and strong appreciation of clinical data and evidence-based medicine. The problematic and potential solutions in the discovery of next-generation, molecular, and signaling circuitry-based biomarkers and drugs are explored. © 2013 Wiley Periodicals, Inc.

  8. How to kill the honey bee larva: genomic potential and virulence mechanisms of Paenibacillus larvae.

    Directory of Open Access Journals (Sweden)

    Marvin Djukic

    Paenibacillus larvae, a Gram positive bacterial pathogen, causes American Foulbrood (AFB), which is the most serious infectious disease of honey bees. In order to investigate the genomic potential of P. larvae, two strains belonging to two different genotypes were sequenced and used for comparative genome analysis. The complete genome sequence of P. larvae strain DSM 25430 (genotype ERIC II) consisted of 4,056,006 bp and harbored 3,928 predicted protein-encoding genes. The draft genome sequence of P. larvae strain DSM 25719 (genotype ERIC I) comprised 4,579,589 bp and contained 4,868 protein-encoding genes. Both strains harbored a 9.7 kb plasmid and encoded a large number of virulence-associated proteins such as toxins and collagenases. In addition, genes encoding large multimodular enzymes producing nonribosomally peptides or polyketides were identified. In the genome of strain DSM 25719 seven toxin associated loci were identified and analyzed. Five of them encoded putatively functional toxins. The genome of strain DSM 25430 harbored several toxin loci that showed similarity to corresponding loci in the genome of strain DSM 25719, but were non-functional due to point mutations or disruption by transposases. Although both strains cause AFB, significant differences between the genomes were observed including genome size, number and composition of transposases, insertion elements, predicted phage regions, and strain-specific island-like regions. Transposases, integrases and recombinases are important drivers for genome plasticity. A total of 390 and 273 mobile elements were found in strain DSM 25430 and strain DSM 25719, respectively. Comparative genomics of both strains revealed acquisition of virulence factors by horizontal gene transfer and provided insights into evolution and pathogenicity.

  9. How to kill the honey bee larva: genomic potential and virulence mechanisms of Paenibacillus larvae.

    Science.gov (United States)

    Djukic, Marvin; Brzuszkiewicz, Elzbieta; Fünfhaus, Anne; Voss, Jörn; Gollnow, Kathleen; Poppinga, Lena; Liesegang, Heiko; Garcia-Gonzalez, Eva; Genersch, Elke; Daniel, Rolf

    2014-01-01

    Paenibacillus larvae, a Gram positive bacterial pathogen, causes American Foulbrood (AFB), which is the most serious infectious disease of honey bees. In order to investigate the genomic potential of P. larvae, two strains belonging to two different genotypes were sequenced and used for comparative genome analysis. The complete genome sequence of P. larvae strain DSM 25430 (genotype ERIC II) consisted of 4,056,006 bp and harbored 3,928 predicted protein-encoding genes. The draft genome sequence of P. larvae strain DSM 25719 (genotype ERIC I) comprised 4,579,589 bp and contained 4,868 protein-encoding genes. Both strains harbored a 9.7 kb plasmid and encoded a large number of virulence-associated proteins such as toxins and collagenases. In addition, genes encoding large multimodular enzymes producing nonribosomally peptides or polyketides were identified. In the genome of strain DSM 25719 seven toxin associated loci were identified and analyzed. Five of them encoded putatively functional toxins. The genome of strain DSM 25430 harbored several toxin loci that showed similarity to corresponding loci in the genome of strain DSM 25719, but were non-functional due to point mutations or disruption by transposases. Although both strains cause AFB, significant differences between the genomes were observed including genome size, number and composition of transposases, insertion elements, predicted phage regions, and strain-specific island-like regions. Transposases, integrases and recombinases are important drivers for genome plasticity. A total of 390 and 273 mobile elements were found in strain DSM 25430 and strain DSM 25719, respectively. Comparative genomics of both strains revealed acquisition of virulence factors by horizontal gene transfer and provided insights into evolution and pathogenicity.

  10. Genome sequence of the acid-tolerant Desulfovibrio sp. DV isolated from the sediments of a Pb-Zn mine tailings dam in the Chita region, Russia

    Directory of Open Access Journals (Sweden)

    Anastasiia Kovaliova

    2017-03-01

    Here we report the draft genome sequence of the acid-tolerant Desulfovibrio sp. DV isolated from the sediments of a Pb-Zn mine tailings dam in the Chita region, Russia. The draft genome has a size of 4.9 Mb and encodes multiple K+-transporters and proton-consuming decarboxylases. The phylogenetic analysis based on concatenated ribosomal proteins revealed that strain DV clusters together with the acid-tolerant Desulfovibrio sp. TomC and Desulfovibrio magneticus. The draft genome sequence and annotation have been deposited at GenBank under the accession number MLBG00000000.

  11. Bacillus halodurans Strain C125 Encodes and Synthesizes Enzymes from Both Known Pathways To Form dUMP Directly from Cytosine Deoxyribonucleotides

    DEFF Research Database (Denmark)

    Oehlenschlæger, Christian Berg; Løvgreen, Monika Nøhr; Reinauer, Eva

    2015-01-01

    Analysis of the genome of Bacillus halodurans strain C125 indicated that two pathways leading from a cytosine deoxyribonucleotide to dUMP, used for dTMP synthesis, were encoded by the genome of the bacterium. The genes that were responsible, the comEB gene and the dcdB gene, encoding dCMP deaminase...

  12. Genome sequence of Salinisphaera shabanensis, a gammaproteobacterium from the harsh, variable environment of the brine-seawater interface of the Shaban Deep in the Red Sea.

    KAUST Repository

    Antunes, Andre

    2011-09-01

    We present the genome of Salinisphaera shabanensis, isolated from a brine-seawater interface and representing a new order within the Gammaproteobacteria. Its adaptations to physicochemical and nutrient availability fluctuations include six genes encoding heavy metal-translocating P-type ATPases and multiple genes involved in iron uptake, siderophore production, and poly-β-hydroxybutyrate synthesis.

  13. Genome sequence of Salinisphaera shabanensis, a gammaproteobacterium from the harsh, variable environment of the brine-seawater interface of the Shaban Deep in the Red Sea.

    KAUST Repository

    Antunes, Andre; Alam, Intikhab; Bajic, Vladimir B.; Stingl, Ulrich

    2011-01-01

    We present the genome of Salinisphaera shabanensis, isolated from a brine-seawater interface and representing a new order within the Gammaproteobacteria. Its adaptations to physicochemical and nutrient availability fluctuations include six genes encoding heavy metal-translocating P-type ATPases and multiple genes involved in iron uptake, siderophore production, and poly-β-hydroxybutyrate synthesis.

  14. Genes encoding calmodulin-binding proteins in the Arabidopsis genome

    Science.gov (United States)

    Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

    2002-01-01

    Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.

  15. Evolution in quantum leaps: multiple combinatorial transfers of HPI and other genetic modules in Enterobacteriaceae.

    Directory of Open Access Journals (Sweden)

    Armand Paauw

    Horizontal gene transfer is a key step in the evolution of Enterobacteriaceae. By acquiring virulence determinants of foreign origin, commensals can evolve into pathogens. In Enterobacteriaceae, horizontal transfer of these virulence determinants is largely dependent on transfer by plasmids, phages, genomic islands (GIs) and genomic modules (GMs). The High Pathogenicity Island (HPI) is a GI encoding virulence genes that can be transferred between different Enterobacteriaceae. We investigated the HPI because it was present in an Enterobacter hormaechei outbreak strain (EHOS). Genome sequence analysis showed that the EHOS contained an integration site for mobile elements and harbored two GIs and three putative GMs, including a new variant of the HPI (HPI-ICEEh1). We demonstrate, for the first time, that combinatorial transfers of GIs and GMs between Enterobacter cloacae complex isolates must have occurred. Furthermore, the excision and circularization of several combinations of the GIs and GMs was demonstrated. Because of its flexibility, the multiple integration site of mobile DNA can be considered an integration hotspot (IHS) that increases the genomic plasticity of the bacterium. Multiple combinatorial transfers of diverse combinations of the HPI and other genomic elements among Enterobacteriaceae may accelerate the generation of new pathogenic strains.

  16. Genome Sequence of the Thermophilic Cyanobacterium Thermosynechococcus sp. Strain NK55a.

    Energy Technology Data Exchange (ETDEWEB)

    Stolyar, Sergey; Liu, Zhenfeng; Thiel, Vera; Tomsho, Lynn P.; Pinel, Nicolas; Nelson, William C.; Lindemann, Stephen R.; Romine, Margaret F.; Haruta, Shin; Schuster, Stephan C.; Bryant, Donald A.; Fredrickson, Jim K.

    2014-01-02

    The genome of the unicellular cyanobacterium, Thermosynechococcus sp. strain NK55a, isolated from Nakabusa hot spring, comprises a single, circular, 2.5-Mb chromosome. The genome is predicted to encode 2358 protein coding genes, including genes for all typical cyanobacterial photosynthetic and metabolic functions. No genes encoding hydrogenases or nitrogenase were identified.

  17. Experimental Induction of Genome Chaos.

    Science.gov (United States)

    Ye, Christine J; Liu, Guo; Heng, Henry H

    2018-01-01

    Genome chaos, or karyotype chaos, represents a powerful survival strategy for somatic cells under high levels of stress/selection. Since the genome context, not the gene content, encodes the genomic blueprint of the cell, stress-induced rapid and massive reorganization of genome topology functions as a very important mechanism for genome (karyotype) evolution. In recent years, the phenomenon of genome chaos has been confirmed by various sequencing efforts, and many different terms have been coined to describe different subtypes of the chaotic genome including "chromothripsis," "chromoplexy," and "structural mutations." To advance this exciting field, we need an effective experimental system to induce and characterize the karyotype reorganization process. In this chapter, an experimental protocol to induce chaotic genomes is described, following a brief discussion of the mechanism and implication of genome chaos in cancer evolution.

  18. Suppression of cotton leaf curl disease symptoms in Gossypium hirsutum through over expression of host-encoded miRNAs.

    Science.gov (United States)

    Akmal, Mohd; Baig, Mirza S; Khan, Jawaid A

    2017-12-10

    Cotton leaf curl disease (CLCuD), a major factor resulting in the enormous yield losses in cotton crop, is caused by a distinct monopartite begomovirus in association with Cotton leaf curl Multan betasatellite (CLCuMB). Micro(mi)RNAs are known to regulate gene expression in eukaryotes, including antiviral defense in plants. In a previous study, we had computationally identified a set of cotton miRNAs, which were shown to have potential targets in the genomes of Cotton leaf curl Multan virus (CLCuMuV) and CLCuMB at multiple loci. In the current study, the effect of Gossypium arboreum-encoded miRNAs on the genome of CLCuMuV and CLCuMB was investigated in planta. Two computationally predicted cotton-encoded miRNAs (miR398 and miR2950) that showed potential to bind multiple Open Reading Frames (ORFs; C1, C4, V1, and non-coding intergenic region) of CLCuMuV, and (βC1) of CLCuMB were selected. Functional validation of miR398 and miR2950 was done by overexpression approach in G. hirsutum var. HS6. A total of ten in vitro cotton plants were generated from independent events and subjected to biological and molecular analyses. Presence of the respective Precursor (pre)-miRNA was confirmed through PCR and Southern blotting, and their expression level was assessed by semi-quantitative RT-PCR, Real Time quantitative PCR and northern hybridization in the PCR-positive lines. Southern hybridization revealed 2-4 copy integration of T-DNA in the genome of the transformed lines. Remarkably, expression of pre-miRNAs was shown up to 5.8-fold higher in the transgenic (T0) lines as revealed by Real Time PCR. The virus resistance was monitored following inoculation of the transgenic cotton lines with viruliferous whitefly (Bemisia tabaci) insect vector. After inoculation, four of the transgenic lines remained apparently symptom free. While a very low titre of viral DNA could be detected by Rolling circle amplification, betasatellite responsible for symptom induction could not be detected.

  19. RESEARCH ARTICLE Ne2 encodes protein(s) and the altered ...

    Indian Academy of Sciences (India)

    friendly method for increasing sustainable global food security. .... This qualitative difference suggests that Ne2 could encode one or two or three of ... things, common wheat must have at least three types of chloroplast-genomes (A, D, and B).

  20. Quantitative genome re-sequencing defines multiple mutations conferring chloroquine resistance in rodent malaria

    Science.gov (United States)

    2012-01-01

    Background: Drug resistance in the malaria parasite Plasmodium falciparum severely compromises the treatment and control of malaria. A knowledge of the critical mutations conferring resistance to particular drugs is important in understanding modes of drug action and mechanisms of resistances. They are required to design better therapies and limit drug resistance. A mutation in the gene (pfcrt) encoding a membrane transporter has been identified as a principal determinant of chloroquine resistance in P. falciparum, but we lack a full account of higher level chloroquine resistance. Furthermore, the determinants of resistance in the other major human malaria parasite, P. vivax, are not known. To address these questions, we investigated the genetic basis of chloroquine resistance in an isogenic lineage of rodent malaria parasite P. chabaudi in which high level resistance to chloroquine has been progressively selected under laboratory conditions. Results: Loci containing the critical genes were mapped by Linkage Group Selection, using a genetic cross between the high-level chloroquine-resistant mutant and a genetically distinct sensitive strain. A novel high-resolution quantitative whole-genome re-sequencing approach was used to reveal three regions of selection on chr11, chr03 and chr02 that appear progressively at increasing drug doses on three chromosomes. Whole-genome sequencing of the chloroquine-resistant parent identified just four point mutations in different genes on these chromosomes. Three mutations are located at the foci of the selection valleys and are therefore predicted to confer different levels of chloroquine resistance. The critical mutation conferring the first level of chloroquine resistance is found in aat1, a putative amino acid transporter. Conclusions: Quantitative trait loci conferring selectable phenotypes, such as drug resistance, can be mapped directly using progressive genome-wide linkage group selection. Quantitative genome-wide short

  1. Partial replicas of uv-irradiated bacteriophage T4 genomes and their role in multiplicity reactivation

    International Nuclear Information System (INIS)

    Rayssiguier, C.; Kozinski, A.W.; Doermann, A.H.

    1980-01-01

    A physicochemical study was made of the replication and transmission of uv-irradiated T4 genomes. The data presented in this paper justify the following conclusions. (i) For both low and high multiplicity of infection there was abundant replication from uv-irradiated parental templates. It exceeded by far the efficiency predicted by the hypothesis that a single lethal hit completely prevents replication of the killed phage DNA: i.e., some dead phage particles must replicate parts of their DNA. (ii) Replication of the uv-irradiated DNA was repetitive as shown by density reversal experiments. (iii) Newly synthesized progeny DNA originating from uv-irradiated templates appeared as significantly shorter segments of the genomes than progeny DNA produced from non-uv-irradiated templates. A good correlation existed between the number of uv hits and the number of random cuts that would be needed to reduce replication fragments to the length observed. (iv) The contribution of uv-irradiated parental DNA among progeny phage in multiplicity reactivation was disposed in shorter subunits than was the DNA from unirradiated parental phage. It is important to emphasize that it was mainly in the form of replicative hybrid. These conclusions appear to justify excluding interparental recombination as a prerequisite for multiplicity reactivation. They lead directly to some form of partial replica hypothesis for multiplicity reactivation

  2. Unexpected inheritance: multiple integrations of ancient bornavirus and ebolavirus/marburgvirus sequences in vertebrate genomes.

    Science.gov (United States)

    Belyi, Vladimir A; Levine, Arnold J; Skalka, Anna Marie

    2010-07-29

    Vertebrate genomes contain numerous copies of retroviral sequences, acquired over the course of evolution. Until recently they were thought to be the only type of RNA viruses to be so represented, because integration of a DNA copy of their genome is required for their replication. In this study, an extensive sequence comparison was conducted in which 5,666 viral genes from all known non-retroviral families with single-stranded RNA genomes were matched against the germline genomes of 48 vertebrate species, to determine if such viruses could also contribute to the vertebrate genetic heritage. In 19 of the tested vertebrate species, we discovered as many as 80 high-confidence examples of genomic DNA sequences that appear to be derived, as long ago as 40 million years, from ancestral members of 4 currently circulating virus families with single strand RNA genomes. Surprisingly, almost all of the sequences are related to only two families in the Order Mononegavirales: the Bornaviruses and the Filoviruses, which cause lethal neurological disease and hemorrhagic fevers, respectively. Based on signature landmarks some, and perhaps all, of the endogenous virus-like DNA sequences appear to be LINE element-facilitated integrations derived from viral mRNAs. The integrations represent genes that encode viral nucleocapsid, RNA-dependent-RNA-polymerase, matrix and, possibly, glycoproteins. Integrations are generally limited to one or very few copies of a related viral gene per species, suggesting that once the initial germline integration was obtained (or selected), later integrations failed or provided little advantage to the host. The conservation of relatively long open reading frames for several of the endogenous sequences, the virus-like protein regions represented, and a potential correlation between their presence and a species' resistance to the diseases caused by these pathogens, are consistent with the notion that their products provide some important biological

  3. Unexpected inheritance: multiple integrations of ancient bornavirus and ebolavirus/marburgvirus sequences in vertebrate genomes.

    Directory of Open Access Journals (Sweden)

    Vladimir A Belyi

    2010-07-01

    Vertebrate genomes contain numerous copies of retroviral sequences, acquired over the course of evolution. Until recently they were thought to be the only type of RNA viruses to be so represented, because integration of a DNA copy of their genome is required for their replication. In this study, an extensive sequence comparison was conducted in which 5,666 viral genes from all known non-retroviral families with single-stranded RNA genomes were matched against the germline genomes of 48 vertebrate species, to determine if such viruses could also contribute to the vertebrate genetic heritage. In 19 of the tested vertebrate species, we discovered as many as 80 high-confidence examples of genomic DNA sequences that appear to be derived, as long ago as 40 million years, from ancestral members of 4 currently circulating virus families with single strand RNA genomes. Surprisingly, almost all of the sequences are related to only two families in the Order Mononegavirales: the Bornaviruses and the Filoviruses, which cause lethal neurological disease and hemorrhagic fevers, respectively. Based on signature landmarks some, and perhaps all, of the endogenous virus-like DNA sequences appear to be LINE element-facilitated integrations derived from viral mRNAs. The integrations represent genes that encode viral nucleocapsid, RNA-dependent-RNA-polymerase, matrix and, possibly, glycoproteins. Integrations are generally limited to one or very few copies of a related viral gene per species, suggesting that once the initial germline integration was obtained (or selected), later integrations failed or provided little advantage to the host. The conservation of relatively long open reading frames for several of the endogenous sequences, the virus-like protein regions represented, and a potential correlation between their presence and a species' resistance to the diseases caused by these pathogens, are consistent with the notion that their products provide some important

  4. The genome of Pelobacter carbinolicus reveals surprising metabolic capabilities and physiological features

    Energy Technology Data Exchange (ETDEWEB)

    Aklujkar, Muktak [University of Massachusetts, Amherst; Haveman, Shelley [University of Massachusetts, Amherst; DiDonatoJr, Raymond [University of Massachusetts, Amherst; Chertkov, Olga [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Land, Miriam L [ORNL; Brown, Peter [University of Massachusetts, Amherst; Lovley, Derek [University of Massachusetts, Amherst

    2012-01-01

    Background: The bacterium Pelobacter carbinolicus is able to grow by fermentation, syntrophic hydrogen/formate transfer, or electron transfer to sulfur from short-chain alcohols, hydrogen or formate; it does not oxidize acetate and is not known to ferment any sugars or grow autotrophically. The genome of P. carbinolicus was sequenced in order to understand its metabolic capabilities and physiological features in comparison with its relatives, acetate-oxidizing Geobacter species. Results: Pathways were predicted for catabolism of known substrates: 2,3-butanediol, acetoin, glycerol, 1,2-ethanediol, ethanolamine, choline and ethanol. Multiple isozymes of 2,3-butanediol dehydrogenase, ATP synthase and [FeFe]-hydrogenase were differentiated and assigned roles according to their structural properties and genomic contexts. The absence of asparagine synthetase and the presence of a mutant tRNA for asparagine encoded among RNA-active enzymes suggest that P. carbinolicus may make asparaginyl-tRNA in a novel way. Catabolic glutamate dehydrogenases were discovered, implying that the tricarboxylic acid (TCA) cycle can function catabolically. A phosphotransferase system for uptake of sugars was discovered, along with enzymes that function in 2,3-butanediol production. Pyruvate: ferredoxin/flavodoxin oxidoreductase was identified as a potential bottleneck in both the supply of oxaloacetate for oxidation of acetate by the TCA cycle and the connection of glycolysis to production of ethanol. The P. carbinolicus genome was found to encode autotransporters and various appendages, including three proteins with similarity to the geopilin of electroconductive nanowires. Conclusions: Several surprising metabolic capabilities and physiological features were predicted from the genome of P. carbinolicus, suggesting that it is more versatile than anticipated.

  5. Systematic Dissection of Sequence Elements Controlling σ70 Promoters Using a Genomically-Encoded Multiplexed Reporter Assay in E. coli.

    Science.gov (United States)

    Urtecho, Guillaume; Tripp, Arielle D; Insigne, Kimberly; Kim, Hwangbeom; Kosuri, Sriram

    2018-02-01

    Promoters are the key drivers of gene expression and are largely responsible for the regulation of cellular responses to time and environment. In E. coli, decades of studies have revealed most, if not all, of the sequence elements necessary to encode promoter function. Despite our knowledge of these motifs, it is still not possible to predict the strength and regulation of a promoter from primary sequence alone. Here we develop a novel multiplexed assay to study promoter function in E. coli by building a site-specific genomic recombination-mediated cassette exchange (RMCE) system that allows for the facile construction and testing of large libraries of genetic designs integrated into precise genomic locations. We build and test a library of 10,898 σ70 promoter variants consisting of all combinations of a set of eight -35 elements, eight -10 elements, three UP elements, eight spacers, and eight backgrounds. We find that the -35 and -10 sequence elements can explain approximately 74% of the variance in promoter strength within our dataset using a simple log-linear statistical model. Neural network models can explain greater than 95% of the variance in our dataset, and show the increased power is due to nonlinear interactions of other elements such as the spacer, background, and UP elements.
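
    As a rough illustration of the combinatorial design described in this abstract, the sketch below enumerates the full cross product of the five element classes using only the counts given above. The dictionary keys and the helper name are hypothetical labels, not identifiers from the study; the abstract supplies no sequences. The full cross product comes to 12,288 designs, whereas the abstract reports 10,898 tested variants; the abstract does not state the reason for the difference, so the code only computes the gap.

```python
from itertools import product

# Element counts as stated in the abstract; the key names are
# hypothetical labels chosen for this sketch.
ELEMENT_COUNTS = {
    "minus35": 8,
    "minus10": 8,
    "up": 3,
    "spacer": 8,
    "background": 8,
}


def enumerate_designs(counts):
    """Return an iterator of index tuples, one per combinatorial design."""
    ranges = [range(n) for n in counts.values()]
    return product(*ranges)


# Size of the full cross product of all element classes.
full_design_space = sum(1 for _ in enumerate_designs(ELEMENT_COUNTS))

# Library size reported in the abstract.
reported_library_size = 10_898

# Designs in the full cross product that are not accounted for
# by the reported library size.
unaccounted = full_design_space - reported_library_size

print(full_design_space, reported_library_size, unaccounted)
```

    Enumerating index tuples rather than sequences keeps the sketch independent of any assumed motif content; real element sequences could be substituted for the ranges without changing the counting.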

  6. Genome-wide identification and characterization of stress-associated protein (SAP) gene family encoding A20/AN1 zinc-finger proteins in Medicago truncatula

    Directory of Open Access Journals (Sweden)

    Zhou Yong

    2018-01-01

    Stress associated proteins (SAPs) play important roles in developmental processes, responses to various stresses and hormone stimulation in plants. However, little is known about the SAP gene family in Medicago truncatula. In this study, a total of 17 MtSAP genes encoding A20/AN1 zinc-finger proteins were characterized. Out of these 17 genes, 15 were distributed over all 8 chromosomes at different densities, and two segmental duplication events were detected. The phylogenetic analysis of these proteins and their orthologs from Arabidopsis and rice suggested that they could be classified into five out of the seven groups of SAP family genes, with genes in the same group showing similar structures and conserved domains. The cis-elements of the MtSAP promoters were studied, and many cis-elements related to stress and plant hormone responses were identified. We also investigated the stress-responsive expression patterns of the MtSAP genes under various stresses, including drought, exposure to NaCl and cold. The qRT-PCR results showed that numerous MtSAP genes exhibited transcriptional responses to multiple abiotic stresses. These results lay the foundation for further functional characterization of SAP genes. To the best of our knowledge, this is the first report of a genome-wide analysis of the SAP gene family in M. truncatula.

  7. Multiple models for Rosaceae genomics.

    Science.gov (United States)

    Shulaev, Vladimir; Korban, Schuyler S; Sosinski, Bryon; Abbott, Albert G; Aldwinckle, Herb S; Folta, Kevin M; Iezzoni, Amy; Main, Dorrie; Arús, Pere; Dandekar, Abhaya M; Lewers, Kim; Brown, Susan K; Davis, Thomas M; Gardiner, Susan E; Potter, Daniel; Veilleux, Richard E

    2008-07-01

    The plant family Rosaceae consists of over 100 genera and 3,000 species that include many important fruit, nut, ornamental, and wood crops. Members of this family provide high-value nutritional foods and contribute desirable aesthetic and industrial products. Most rosaceous crops have been enhanced by human intervention through sexual hybridization, asexual propagation, and genetic improvement since ancient times, 4,000 to 5,000 B.C. Modern breeding programs have contributed to the selection and release of numerous cultivars having significant economic impact on the U.S. and world markets. In recent years, the Rosaceae community, both in the United States and internationally, has benefited from newfound organization and collaboration that have hastened progress in developing genetic and genomic resources for representative crops such as apple (Malus spp.), peach (Prunus spp.), and strawberry (Fragaria spp.). These resources, including expressed sequence tags, bacterial artificial chromosome libraries, physical and genetic maps, and molecular markers, combined with genetic transformation protocols and bioinformatics tools, have rendered various rosaceous crops highly amenable to comparative and functional genomics studies. This report serves as a synopsis of the resources and initiatives of the Rosaceae community, recent developments in Rosaceae genomics, and plans to apply newly accumulated knowledge and resources toward breeding and crop improvement.

  8. Unprecedented loss of ammonia assimilation capability in a urease-encoding bacterial mutualist

    Directory of Open Access Journals (Sweden)

    Wernegreen Jennifer J

    2010-12-01

    Background: Blochmannia are obligately intracellular bacterial mutualists of ants of the tribe Camponotini. Blochmannia perform key nutritional functions for the host, including synthesis of several essential amino acids. We used Illumina technology to sequence the genome of Blochmannia associated with Camponotus vafer. Results: Although Blochmannia vafer retains many nutritional functions, it is missing glutamine synthetase (glnA), a component of the nitrogen recycling pathway encoded by the previously sequenced B. floridanus and B. pennsylvanicus. With the exception of Ureaplasma, B. vafer is the only sequenced bacterium to date that encodes urease but lacks the ability to assimilate ammonia into glutamine or glutamate. Loss of glnA occurred in a deletion hotspot near the putative replication origin. Overall, compared to the likely gene set of their common ancestor, 31 genes are missing or eroded in B. vafer, compared to 28 in B. floridanus and four in B. pennsylvanicus. Three genes (queA, visC and yggS) show convergent loss or erosion, suggesting relaxed selection for their functions. Eight B. vafer genes contain frameshifts in homopolymeric tracts that may be corrected by transcriptional slippage. Two of these encode DNA replication proteins: dnaX, which we infer is also frameshifted in B. floridanus, and dnaG. Conclusions: Comparing the B. vafer genome with B. pennsylvanicus and B. floridanus refines the core genes shared within the mutualist group, thereby clarifying functions required across ant host species. This third genome also allows us to track gene loss and erosion in a phylogenetic context to more fully understand processes of genome reduction.

  9. Genome of the opportunistic pathogen Streptococcus sanguinis.

    Science.gov (United States)

    Xu, Ping; Alves, Joao M; Kitten, Todd; Brown, Arunsri; Chen, Zhenming; Ozaki, Luiz S; Manque, Patricio; Ge, Xiuchun; Serrano, Myrna G; Puiu, Daniela; Hendricks, Stephanie; Wang, Yingping; Chaplin, Michael D; Akan, Doruk; Paik, Sehmi; Peterson, Darrell L; Macrina, Francis L; Buck, Gregory A

    2007-04-01

    The genome of Streptococcus sanguinis is a circular DNA molecule consisting of 2,388,435 bp and is 177 to 590 kb larger than the other 21 streptococcal genomes that have been sequenced. The G+C content of the S. sanguinis genome is 43.4%, which is considerably higher than the G+C contents of other streptococci. The genome encodes 2,274 predicted proteins, 61 tRNAs, and four rRNA operons. A 70-kb region encoding pathways for vitamin B(12) biosynthesis and degradation of ethanolamine and propanediol was apparently acquired by horizontal gene transfer. The gene complement suggests new hypotheses for the pathogenesis and virulence of S. sanguinis and differs from the gene complements of other pathogenic and nonpathogenic streptococci. In particular, S. sanguinis possesses a remarkable abundance of putative surface proteins, which may permit it to be a primary colonizer of the oral cavity and agent of streptococcal endocarditis and infection in neutropenic patients.
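
    The G+C content quoted above is the percentage of bases in the sequence that are G or C. A minimal sketch of that calculation follows; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for this example, not details taken from the paper.

```python
def gc_content(seq: str) -> float:
    """Return G+C content as a percentage of unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    # Toy sequence for illustration only; not S. sanguinis data.
    print(gc_content("ATGCGCTTAACG"))
```

    For a complete genome, the same function would be applied to the concatenated chromosome sequence, for example one read from a FASTA file.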

  10. Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea

    DEFF Research Database (Denmark)

    Treusch, Alexander H; Kletzin, Arnulf; Raddatz, Guenter

    2004-01-01

    Complex genomic libraries are increasingly being used to retrieve complete genes, operons or large genomic fragments directly from environmental samples, without the need to cultivate the respective microorganisms. We report on the construction of three large-insert fosmid libraries in total...... (approximately 1% each) have been captured in our libraries. The diversity of putative protein-encoding genes, as reflected by their distribution into different COG clusters, was comparable to that encoded in complete genomes of cultivated microorganisms. A huge variety of genomic fragments has been captured...

  11. GeNemo: a search engine for web-based functional genomic data.

    Science.gov (United States)

    Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng

    2016-07-08

    A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundred bases to hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. A vaccine encoding conserved promiscuous HIV CD4 epitopes induces broad T cell responses in mice transgenic to multiple common HLA class II molecules.

    Directory of Open Access Journals (Sweden)

    Susan Pereira Ribeiro

    Current HIV vaccine approaches are focused on immunogens encoding whole HIV antigenic proteins that mainly elicit cytotoxic CD8+ responses. Mounting evidence points toward a critical role for CD4+ T cells in the control of immunodeficiency virus replication, probably due to cognate help. Vaccine-induced CD4+ T cell responses might, therefore, have a protective effect on HIV replication. In addition, successful vaccines may have to elicit responses to multiple epitopes in a high proportion of vaccinees, to match the highly variable circulating strains of HIV. Using rational vaccine design, we developed a DNA vaccine encoding 18 algorithm-selected conserved, "promiscuous" (multiple HLA-DR-binding) B-subtype HIV CD4 epitopes previously found to be frequently recognized by HIV-infected patients. We assessed the ability of the vaccine to induce broad T cell responses in the context of multiple HLA class II molecules using different strains of HLA class II-transgenic mice (-DR2, -DR4, -DQ6 and -DQ8). Mice displayed CD4+ and CD8+ T cell responses of significant breadth and magnitude, and 16 out of the 18 encoded epitopes were recognized. By virtue of inducing broad responses against conserved CD4+ T cell epitopes that can be recognized in the context of widely diverse, common HLA class II alleles, this vaccine concept may cope with HIV genetic variability while also increasing population coverage. The vaccine may thus be a source of cognate help for HIV-specific CD8+ T cells elicited by conventional immunogens, in a wide proportion of vaccinees.

  13. A Bayesian method and its variational approximation for prediction of genomic breeding values in multiple traits

    Directory of Open Access Journals (Sweden)

    Hayashi Takeshi

    2013-01-01

    Background: Genomic selection is an effective tool for animal and plant breeding, allowing effective individual selection without phenotypic records through the prediction of genomic breeding value (GBV). To date, genomic selection has focused on a single trait. However, actual breeding often targets multiple correlated traits, and, therefore, joint analysis taking into consideration the correlation between traits, which might result in more accurate GBV prediction than analyzing each trait separately, is suitable for multi-trait genomic selection. This would require an extension of the prediction model for single-trait GBV to the multi-trait case. As the computational burden of multi-trait analysis is even higher than that of single-trait analysis, an effective computational method for constructing a multi-trait prediction model is also needed. Results: We described a Bayesian regression model incorporating variable selection for jointly predicting GBVs of multiple traits and devised both an MCMC iteration and variational approximation for Bayesian estimation of parameters in this multi-trait model. The proposed Bayesian procedures with MCMC iteration and variational approximation were referred to as MCBayes and varBayes, respectively. Using simulated datasets of SNP genotypes and phenotypes for three traits with high and low heritabilities, we compared the accuracy in predicting GBVs between multi-trait and single-trait analyses as well as between MCBayes and varBayes. The results showed that, compared to single-trait analysis, multi-trait analysis enabled much more accurate GBV prediction for low-heritability traits correlated with high-heritability traits, by utilizing the correlation structure between traits, while the prediction accuracy for uncorrelated low-heritability traits was comparable or less with multi-trait analysis in comparison with single-trait analysis depending on the setting for prior probability that a SNP has zero

  14. Performance analysis of spectral-phase-encoded optical code-division multiple-access system regarding the incorrectly decoded signal as a nonstationary random process

    Science.gov (United States)

    Yan, Meng; Yao, Minyu; Zhang, Hongming

    2005-11-01

    The performance of a spectral-phase-encoded (SPE) optical code-division multiple-access (OCDMA) system is analyzed. Regarding the incorrectly decoded signal (IDS) as a nonstationary random process, we derive a novel probability distribution for it. The probability distribution of the IDS is considered a chi-squared distribution with degrees of freedom r=1, which is more reasonable and accurate than in previous work. The bit error rate (BER) of an SPE OCDMA system under multiple-access interference is evaluated. Numerical results show that the system can sustain very low BER even when there are multiple simultaneous users, and as the code length becomes longer or the initial pulse becomes shorter, the system performs better.

  15. A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes.

    Directory of Open Access Journals (Sweden)

    Daniel H Haft

    2005-11-01

    Clustered regularly interspaced short palindromic repeats (CRISPRs) are a family of DNA direct repeats found in many prokaryotic genomes. Repeats of 21-37 bp typically show weak dyad symmetry and are separated by regularly sized, nonrepetitive spacer sequences. Four CRISPR-associated (Cas) protein families, designated Cas1 to Cas4, are strictly associated with CRISPR elements and always occur near a repeat cluster. Some spacers originate from mobile genetic elements and are thought to confer "immunity" against the elements that harbor these sequences. In the present study, we have systematically investigated uncharacterized proteins encoded in the vicinity of these CRISPRs and found many additional protein families that are strictly associated with CRISPR loci across multiple prokaryotic species. Multiple sequence alignments and hidden Markov models have been built for 45 Cas protein families. These models identify family members with high sensitivity and selectivity and classify key regulators of development, DevR and DevS, in Myxococcus xanthus as Cas proteins. These identifications show that CRISPR/cas gene regions can be quite large, with up to 20 different, tandem-arranged cas genes next to a repeat cluster or filling the region between two repeat clusters. Distinctive subsets of the collection of Cas proteins recur in phylogenetically distant species and correlate with characteristic repeat periodicity. The analyses presented here support initial proposals of mobility of these units, along with the likelihood that loci of different subtypes interact with one another as well as with host cell defensive, replicative, and regulatory systems. It is evident from this analysis that CRISPR/cas loci are larger, more complex, and more heterogeneous than previously appreciated.
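
    The repeat-spacer layout described above (identical direct repeats separated by regularly sized, nonrepetitive spacers) can be illustrated with a toy detector. This is a sketch under strong assumptions, not the method used in the study, which relied on multiple sequence alignments and hidden Markov models of Cas proteins. The function name `find_repeat_arrays` and its thresholds are hypothetical, and only exact repeats of one fixed length are found, whereas real CRISPR finders tolerate mismatches and variable repeat lengths.

```python
from collections import defaultdict


def find_repeat_arrays(seq, repeat_len=21, min_copies=3,
                       min_spacer=20, max_spacer=50):
    """Find exact k-mers that recur with spacer-sized gaps between copies.

    Returns a list of (repeat, start_positions, spacer_lengths) tuples.
    """
    # Index every k-mer by the positions at which it starts.
    positions = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)

    arrays = []
    for kmer, starts in positions.items():
        if len(starts) < min_copies:
            continue
        # Spacer length = distance between the end of one copy and the
        # start of the next.
        gaps = [b - a - repeat_len for a, b in zip(starts, starts[1:])]
        if all(min_spacer <= g <= max_spacer for g in gaps):
            arrays.append((kmer, starts, gaps))
    return arrays


if __name__ == "__main__":
    # Synthetic sequence for illustration only.
    repeat = "GTTTTAGAGCTATGCTGTTTT"
    spacer1 = "ACGT" * 7 + "AC"
    spacer2 = "TGCA" * 7 + "TG"
    toy = "C" * 10 + repeat + spacer1 + repeat + spacer2 + repeat + "C" * 10
    for kmer, starts, gaps in find_repeat_arrays(toy):
        print(kmer, starts, gaps)
```

    Because windows that overlap a repeat boundary can also recur, a practical tool would merge overlapping hits and score candidate arrays rather than report every qualifying k-mer.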

  16. Horizontal antimicrobial resistance transfer drives epidemics of multiple Shigella species.

    Science.gov (United States)

    Baker, Kate S; Dallman, Timothy J; Field, Nigel; Childs, Tristan; Mitchell, Holly; Day, Martin; Weill, François-Xavier; Lefèvre, Sophie; Tourdjman, Mathieu; Hughes, Gwenda; Jenkins, Claire; Thomson, Nicholas

    2018-04-13

    Horizontal gene transfer has played a role in developing the global public health crisis of antimicrobial resistance (AMR). However, the dynamics of AMR transfer through bacterial populations and its direct impact on human disease are poorly elucidated. Here, we study parallel epidemic emergences of multiple Shigella species, a priority AMR organism, in men who have sex with men to gain insight into AMR emergence and spread. Using genomic epidemiology, we show that repeated horizontal transfer of a single AMR plasmid among Shigella enhanced existing and facilitated new epidemics. These epidemic patterns contrasted with slighter, slower increases in disease caused by organisms with vertically inherited (chromosomally encoded) AMR. This demonstrates that horizontal transfer of AMR directly affects epidemiological outcomes of globally important AMR pathogens and highlights the need for integration of genomic analyses into all areas of AMR research, surveillance and management.

  17. Rumen microbial genomics

    International Nuclear Information System (INIS)

    Morrison, M.; Nelson, K.E.

    2005-01-01

    Improving microbial degradation of plant cell wall polysaccharides remains one of the highest priority goals for all livestock enterprises, including the cattle herds and draught animals of developing countries. The North American Consortium for Genomics of Fibrolytic Ruminal Bacteria was created to promote the sequencing and comparative analysis of rumen microbial genomes, offering the potential to fully assess the genetic potential in a functional and comparative fashion. It has been found that the Fibrobacter succinogenes genome encodes many more endoglucanases and cellodextrinases than previously isolated, and several new processive endoglucanases have been identified by genome and proteomic analysis of Ruminococcus albus, in addition to a variety of strategies for its adhesion to fibre. The ramifications of acquiring genome sequence data for rumen microorganisms are profound, including the potential to elucidate and overcome the biochemical, ecological or physiological processes that are rate limiting for ruminal fibre degradation. (author)

  18. Complete Mitochondrial Genome of the Medicinal Mushroom Ganoderma lucidum

    Science.gov (United States)

    Chen, Haimei; Chen, Xiangdong; Lan, Jin; Liu, Chang

    2013-01-01

    Ganoderma lucidum is one of the well-known medicinal basidiomycetes worldwide. The mitochondrion, referred to as the second genome, is an organelle found in most eukaryotic cells and participates in critical cellular functions. Elucidating the structure and function of this genome is important to understand completely the genetic contents of G. lucidum. In this study, we assembled the mitochondrial genome of G. lucidum and analyzed the differential expressions of its encoded genes across three developmental stages. The mitochondrial genome is a typical circular DNA molecule of 60,630 bp with a GC content of 26.67%. Genome annotation identified genes that encode 15 conserved proteins, 27 tRNAs, small and large rRNAs, four homing endonucleases, and two hypothetical proteins. Except for genes encoding trnW and two hypothetical proteins, all genes were located on the positive strand. For the repeat structure analysis, eight forward, two inverted, and three tandem repeats were detected. A pair of fragments with a total length around 5.5 kb was found in both the nuclear and mitochondrial genomes, which suggests the possible transfer of DNA sequences between two genomes. RNA-Seq data for samples derived from three stages, namely, mycelia, primordia, and fruiting bodies, were mapped to the mitochondrial genome and qualified. The protein-coding genes were expressed higher in mycelia or primordial stages compared with those in the fruiting bodies. The rRNA abundances were significantly higher in all three stages. Two regions were transcribed but did not contain any identified protein or tRNA genes. Furthermore, three RNA-editing sites were detected. Genome synteny analysis showed that significant genome rearrangements occurred in the mitochondrial genomes. This study provides valuable information on the gene contents of the mitochondrial genome and their differential expressions at various developmental stages of G. lucidum. 
The results contribute to the understanding of the

  19. mEBT: multiple-matching Evidence-based Translator of Murine Genomic Responses for Human Immunity Studies.

    Science.gov (United States)

    Tae, Donghyun; Seok, Junhee

    2018-05-29

    In this paper, we introduce the multiple-matching Evidence-based Translator (mEBT), which discovers, from murine expression data, genomic responses that are significant in a given condition in mice and are likely to show similar responses in the corresponding condition in humans, for use in human immunity studies. mEBT is evaluated over multiple data sets and shows improved inter-species agreement. mEBT is expected to be useful for research groups who use murine models to study human immunity. Availability: http://cdal.korea.ac.kr/mebt/. Contact: jseok14@korea.ac.kr. Supplementary data are available at Bioinformatics online.

  20. Online unsupervised formation of cell assemblies for the encoding of multiple cognitive maps.

    Science.gov (United States)

    Salihoglu, Utku; Bersini, Hugues; Yamaguchi, Yoko; Molter, Colin

    2009-01-01

    Since their introduction sixty years ago, cell assemblies have proved to be a powerful paradigm for brain information processing. After their introduction in artificial intelligence, cell assemblies became commonly used in computational neuroscience as a neural substrate for content-addressable memories. However, the mechanisms underlying their formation are poorly understood and, so far, there are no biologically plausible algorithms that can explain how external stimuli can be stored online in cell assemblies. We addressed this question in a previous paper [Salihoglu, U., Bersini, H., Yamaguchi, Y., Molter, C., (2009). A model for the cognitive map formation: Application of the retroaxonal theory. In Proc. IEEE international joint conference on neural networks], where, based on biologically plausible mechanisms, a novel unsupervised algorithm for the online creation of cell assemblies was developed. The procedure involved, simultaneously, a fast Hebbian/anti-Hebbian learning of the network's recurrent connections for the creation of new cell assemblies, and a slower feedback signal which stabilized the cell assemblies by learning the feedforward input connections. Here, we first quantify the role played by the retroaxonal feedback mechanism. Then, we show how multiple cognitive maps, each composed of a set of orthogonal input stimuli, can be encoded in the network. As a result, when facing a previously learned input, the system is able to retrieve the cognitive map it belongs to. As a consequence, ambiguous inputs which could belong to multiple cognitive maps can be disambiguated by the knowledge of the context, i.e. the cognitive map.

  1. Comparative genomics of toxigenic and non-toxigenic Staphylococcus hyicus

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Pamp, Sünje Johanna; Andresen, Lars Ole

    2016-01-01

    The most common causative agent of exudative epidermitis (EE) in pigs is Staphylococcus hyicus. S. hyicus can be grouped into toxigenic and non-toxigenic strains based on their ability to cause EE in pigs and specific virulence genes have been identified. A genome wide comparison between non......-toxigenic and toxigenic strains has never been performed. In this study, we sequenced eleven toxigenic and six non-toxigenic S. hyicus strains and performed comparative genomic and phylogenetic analysis. Our analyses revealed two genomic regions encoding genes that were predominantly found in toxigenic strains...... (polymorphic toxin) and was associated with the gene encoding ExhA. A clear differentiation between toxigenic and non-toxigenic strains based on genomic and phylogenetic analyses was not apparent. The results of this study support the observation that exfoliative toxins of S. hyicus and S. aureus are located...

  2. Multiple Genome Sequences of Lactobacillus plantarum Strains

    OpenAIRE

    Kafka, Thomas A.; Geissler, Andreas J.; Vogel, Rudi F.

    2017-01-01

    We report here the genome sequences of four Lactobacillus plantarum strains which vary in surface hydrophobicity. Bioinformatic analysis, using additional genomes of Lactobacillus plantarum strains, revealed a possible correlation between the cell wall teichoic acid type and cell surface hydrophobicity and provides the basis for subsequent analyses.

  3. Genome-Wide Identification and Expression Analysis of WRKY Transcription Factors under Multiple Stresses in Brassica napus.

    Science.gov (United States)

    He, Yajun; Mao, Shaoshuai; Gao, Yulong; Zhu, Liying; Wu, Daoming; Cui, Yixin; Li, Jiana; Qian, Wei

    2016-01-01

    WRKY transcription factors play important roles in responses to environmental stress stimuli. Using a genome-wide domain analysis, we identified 287 WRKY genes with 343 WRKY domains in the sequenced genome of Brassica napus, 139 in the A sub-genome and 148 in the C sub-genome. These genes were classified into eight groups based on phylogenetic analysis. In the 343 WRKY domains, a total of 26 members showed divergence in the WRKY domain, and 21 belonged to group I. This finding suggested that WRKY genes in group I are more active and variable compared with genes in other groups. Using genome-wide identification and analysis of the WRKY gene family in Brassica napus, we observed genome duplication, chromosomal/segmental duplications and tandem duplication. All of these duplications contributed to the expansion of the WRKY gene family. The duplicate segments that were detected indicated that genome duplication events occurred in the two diploid progenitors B. rapa and B. oleracea before they combined to form B. napus. Analysis of the public microarray database and EST database for B. napus indicated that 74 WRKY genes were induced or preferentially expressed under stress conditions. According to the public QTL data, we identified 77 WRKY genes in 31 QTL regions related to tolerance of various stresses. We further evaluated the expression of 26 BnaWRKY genes under multiple stresses by qRT-PCR. Most of the genes were induced by low temperature, salinity and drought stress, indicating that the WRKYs play important roles in B. napus stress responses. Further, three BnaWRKY genes were strongly responsive to all three stresses, which suggests that these three WRKY genes may have multi-functional roles in stress tolerance and can potentially be used in breeding new rapeseed cultivars. We also found six tandem repeat pairs exhibiting similar expression profiles under the various stress conditions, and three pairs were mapped in the stress related QTL regions

  4. Genome-Wide Identification and Expression Analysis of WRKY Transcription Factors under Multiple Stresses in Brassica napus.

    Directory of Open Access Journals (Sweden)

    Yajun He

    WRKY transcription factors play important roles in responses to environmental stress stimuli. Using a genome-wide domain analysis, we identified 287 WRKY genes with 343 WRKY domains in the sequenced genome of Brassica napus, 139 in the A sub-genome and 148 in the C sub-genome. These genes were classified into eight groups based on phylogenetic analysis. In the 343 WRKY domains, a total of 26 members showed divergence in the WRKY domain, and 21 belonged to group I. This finding suggested that WRKY genes in group I are more active and variable compared with genes in other groups. Using genome-wide identification and analysis of the WRKY gene family in Brassica napus, we observed genome duplication, chromosomal/segmental duplications and tandem duplication. All of these duplications contributed to the expansion of the WRKY gene family. The duplicate segments that were detected indicated that genome duplication events occurred in the two diploid progenitors B. rapa and B. oleracea before they combined to form B. napus. Analysis of the public microarray database and EST database for B. napus indicated that 74 WRKY genes were induced or preferentially expressed under stress conditions. According to the public QTL data, we identified 77 WRKY genes in 31 QTL regions related to tolerance of various stresses. We further evaluated the expression of 26 BnaWRKY genes under multiple stresses by qRT-PCR. Most of the genes were induced by low temperature, salinity and drought stress, indicating that the WRKYs play important roles in B. napus stress responses. Further, three BnaWRKY genes were strongly responsive to all three stresses, which suggests that these three WRKY genes may have multi-functional roles in stress tolerance and can potentially be used in breeding new rapeseed cultivars. We also found six tandem repeat pairs exhibiting similar expression profiles under the various stress conditions, and three pairs were mapped in the stress related

  5. G-InforBIO: integrated system for microbial genomics

    Directory of Open Access Journals (Sweden)

    Abe Takashi

    2006-08-01

    Background: Genome databases contain diverse kinds of information, including gene annotations and nucleotide and amino acid sequences. It is not easy to integrate such information for genomic study. There are few tools for integrated analyses of genomic data; therefore, we developed software that enables users to handle, manipulate, and analyze genome data with a variety of sequence analysis programs. Results: The G-InforBIO system is a novel tool for genome data management and sequence analysis. The system can import genome data encoded as eXtensible Markup Language documents as formatted text documents, including annotations and sequences, from DNA Data Bank of Japan and GenBank encoded as flat files. The genome database is constructed automatically after importing, and the database can be exported as documents formatted with eXtensible Markup Language or tab-delimited text. Users can retrieve data from the database by keyword searches, edit annotation data of genes, and process data with G-InforBIO. In addition, information in the G-InforBIO database can be analyzed seamlessly with nine different software programs, including programs for clustering and homology analyses. Conclusion: The G-InforBIO system simplifies genome analyses by integrating several available software programs to allow efficient handling and manipulation of genome data. G-InforBIO is freely available from the download site.

  6. Ion torrent personal genome machine sequencing for genomic typing of Neisseria meningitidis for rapid determination of multiple layers of typing information.

    Science.gov (United States)

    Vogel, Ulrich; Szczepanowski, Rafael; Claus, Heike; Jünemann, Sebastian; Prior, Karola; Harmsen, Dag

    2012-06-01

    Neisseria meningitidis causes invasive meningococcal disease in infants, toddlers, and adolescents worldwide. DNA sequence-based typing, including multilocus sequence typing, analysis of genetic determinants of antibiotic resistance, and sequence typing of vaccine antigens, has become the standard for molecular epidemiology of the organism. However, PCR of multiple targets and consecutive Sanger sequencing provide logistic constraints to reference laboratories. Taking advantage of the recent development of benchtop next-generation sequencers (NGSs) and of BIGSdb, a database accommodating and analyzing genome sequence data, we therefore explored the feasibility and accuracy of Ion Torrent Personal Genome Machine (PGM) sequencing for genomic typing of meningococci. Three strains from a previous meningococcus serogroup B community outbreak were selected to compare conventional typing results with data generated by semiconductor chip-based sequencing. In addition, sequencing of the meningococcal type strain MC58 provided information about the general performance of the technology. The PGM technology generated sequence information for all target genes addressed. The results were 100% concordant with conventional typing results, with no further editing being necessary. In addition, the amount of typing information, i.e., nucleotides and target genes analyzed, could be substantially increased by the combined use of genome sequencing and BIGSdb compared to conventional methods. In the near future, affordable and fast benchtop NGS machines like the PGM might enable reference laboratories to switch to genomic typing on a routine basis. This will reduce workloads and rapidly provide information for laboratory surveillance, outbreak investigation, assessment of vaccine preventability, and antibiotic resistance gene monitoring.

  7. Multiple roles of genome-attached bacteriophage terminal proteins

    International Nuclear Information System (INIS)

    Redrejo-Rodríguez, Modesto; Salas, Margarita

    2014-01-01

    Protein-primed replication constitutes a generalized mechanism to initiate DNA or RNA synthesis in linear genomes, including viruses, gram-positive bacteria, linear plasmids and mobile elements. By this mechanism a specific amino acid primes replication and becomes covalently linked to the genome ends. Despite the fact that TPs lack sequence homology, they share a similar structural arrangement, with the priming residue in the C-terminal half of the protein and an accumulation of positively charged residues at the N-terminal end. In addition, various bacteriophage TPs have been shown to have DNA-binding capacity that targets TPs and their attached genomes to the host nucleoid. Furthermore, a number of bacteriophage TPs from different viral families and with diverse hosts also contain putative nuclear localization signals and localize in the eukaryotic nucleus, which could lead to the transport of the attached DNA. This suggests a possible role of bacteriophage TPs in prokaryote-to-eukaryote horizontal gene transfer. - Highlights: • Protein-primed genome replication constitutes a strategy to initiate DNA or RNA synthesis in linear genomes. • Bacteriophage terminal proteins (TPs) are covalently attached to viral genomes by their primary function priming DNA replication. • TPs are also DNA-binding proteins and target phage genomes to the host nucleoid. • TPs can also localize in the eukaryotic nucleus and may have a role in phage-mediated interkingdom gene transfer

  8. Multiple roles of genome-attached bacteriophage terminal proteins

    Energy Technology Data Exchange (ETDEWEB)

    Redrejo-Rodríguez, Modesto; Salas, Margarita, E-mail: msalas@cbm.csic.es

    2014-11-15

    Protein-primed replication constitutes a generalized mechanism to initiate DNA or RNA synthesis in linear genomes, including viruses, gram-positive bacteria, linear plasmids and mobile elements. By this mechanism a specific amino acid primes replication and becomes covalently linked to the genome ends. Despite the fact that TPs lack sequence homology, they share a similar structural arrangement, with the priming residue in the C-terminal half of the protein and an accumulation of positively charged residues at the N-terminal end. In addition, various bacteriophage TPs have been shown to have DNA-binding capacity that targets TPs and their attached genomes to the host nucleoid. Furthermore, a number of bacteriophage TPs from different viral families and with diverse hosts also contain putative nuclear localization signals and localize in the eukaryotic nucleus, which could lead to the transport of the attached DNA. This suggests a possible role of bacteriophage TPs in prokaryote-to-eukaryote horizontal gene transfer. - Highlights: • Protein-primed genome replication constitutes a strategy to initiate DNA or RNA synthesis in linear genomes. • Bacteriophage terminal proteins (TPs) are covalently attached to viral genomes by their primary function priming DNA replication. • TPs are also DNA-binding proteins and target phage genomes to the host nucleoid. • TPs can also localize in the eukaryotic nucleus and may have a role in phage-mediated interkingdom gene transfer.

  9. The complete genome sequence of Bacillus velezensis strain GH1-13 reveals agriculturally beneficial properties and a unique plasmid.

    Science.gov (United States)

    Kim, Sang Yoon; Song, Hajin; Sang, Mee Kyung; Weon, Hang-Yeon; Song, Jaekyeong

    2017-10-10

    The bacterial strain Bacillus velezensis GH1-13, isolated from rice paddy soil in Korea, has been shown to promote plant growth and have strong antagonistic activities against pathogens. Here, we report the complete genome sequence of GH1-13, revealing that it possesses a single 4,071,980-bp circular chromosome with 46.2% GC-content. The chromosome encodes 3,930 genes, and we have also identified a unique plasmid in the strain that encodes a further 104 genes (71,628 bp and 31.7% GC-content). The genome was found to contain various enzyme-encoding operons, including indole-3-acetic acid (IAA) biosynthesis proteins, 2,3-butanediol dehydrogenase, various non-ribosomal peptide synthetases, and several polyketide synthases. These properties are responsible for the promotion of plant growth and the biosynthesis of secondary metabolites. They therefore have multiple beneficial effects that could be applied to agriculture. Through curing, we found that the unique plasmid of GH1-13 has important roles in the production of phytohormones, such as IAA, and in shaping phenotypic and physiological characteristics. The plasmid therefore likely influences the biological activities of GH1-13. The complete genome sequence of B. velezensis GH1-13 contributes to our understanding of this beneficial strain and will encourage research into its development for agricultural or biotechnological applications, enhancing productivity and crop quality. Copyright © 2017 Elsevier B.V. All rights reserved.
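
    The GC-content figures quoted in this record can be illustrated with a short calculation. The sketch below is not taken from the paper: the function names are hypothetical, and the length-weighted average for chromosome plus plasmid is an illustrative derivation from the reported sizes and percentages only.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence (0.0 for an empty one)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence if base in "GC")
    return gc / len(sequence)


def combined_gc(replicons):
    """Length-weighted GC fraction over (length_bp, gc_fraction) pairs."""
    total_length = sum(length for length, _ in replicons)
    weighted = sum(length * gc for length, gc in replicons)
    return weighted / total_length


# Sizes and GC fractions as reported in the abstract above.
chromosome = (4_071_980, 0.462)
plasmid = (71_628, 0.317)
overall = combined_gc([chromosome, plasmid])
print(f"combined GC fraction: {overall:.4f}")
```

    Because the plasmid is small relative to the chromosome, its lower GC-content shifts the combined value only slightly.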

  10. Bioinformatics decoding the genome

    CERN Multimedia

    CERN. Geneva; Deutsch, Sam; Michielin, Olivier; Thomas, Arthur; Descombes, Patrick

    2006-01-01

    Extracting the fundamental genomic sequence from the DNA. From Genome to Sequence: Biology in the early 21st century has been radically transformed by the availability of the full genome sequences of an ever increasing number of life forms, from bacteria to major crop plants and to humans. The lecture will concentrate on the computational challenges associated with the production, storage and analysis of genome sequence data, with an emphasis on mammalian genomes. The quality and usability of genome sequences is increasingly conditioned by the careful integration of strategies for data collection and computational analysis, from the construction of maps and libraries to the assembly of raw data into sequence contigs and chromosome-sized scaffolds. Once the sequence is assembled, a major challenge is the mapping of biologically relevant information onto this sequence: promoters, introns and exons of protein-encoding genes, regulatory elements, functional RNAs, pseudogenes, transposons, etc. The methodological ...

  11. Genomic diversity of Lactobacillus salivarius

    OpenAIRE

    Raftis, Emma J.

    2015-01-01

    Lactobacillus salivarius is unusual among the lactobacilli due to its multireplicon genome architecture. The circular megaplasmids harboured by L. salivarius strains encode strain-specific traits for intestinal survival and probiotic activity. L. salivarius strains are increasingly being exploited for their probiotic properties in humans and animals. In terms of probiotic strain selection, it is important to have an understanding of the level of genomic diversity present in this species. Comp...

  12. Staphylococcal SCCmec elements encode an active MCM-like helicase and thus may be replicative

    Energy Technology Data Exchange (ETDEWEB)

    Mir-Sanchis, Ignacio; Roman, Christina A.; Misiura, Agnieszka; Pigli, Ying Z.; Boyle-Vavra, Susan; Rice, Phoebe A. (UC)

    2016-08-29

    Methicillin-resistant Staphylococcus aureus (MRSA) is a public-health threat worldwide. Although the mobile genomic island responsible for this phenotype, staphylococcal cassette chromosome (SCC), has been thought to be nonreplicative, we predicted DNA-replication-related functions for some of the conserved proteins encoded by SCC. We show that one of these, Cch, is homologous to the self-loading initiator helicases of an unrelated family of genomic islands, that it is an active 3'-to-5' helicase and that the adjacent ORF encodes a single-stranded DNA–binding protein. Our 2.9-Å crystal structure of intact Cch shows that it forms a hexameric ring. Cch, like the archaeal and eukaryotic MCM-family replicative helicases, belongs to the pre–sensor II insert clade of AAA+ ATPases. Additionally, we found that SCC elements are part of a broader family of mobile elements, all of which encode a replication initiator upstream of their recombinases. Replication after excision would enhance the efficiency of horizontal gene transfer.

  13. The genomic structure of the human UFO receptor.

    Science.gov (United States)

    Schulz, A S; Schleithoff, L; Faust, M; Bartram, C R; Janssen, J W

    1993-02-01

    Using a DNA transfection-tumorigenicity assay we have recently identified the UFO oncogene. It encodes a tyrosine kinase receptor characterized by the juxtaposition of two immunoglobulin-like and two fibronectin type III repeats in its extracellular domain. Here we describe the genomic organization of the human UFO locus. The UFO receptor is encoded by 20 exons that are distributed over a region of 44 kb. Different isoforms of UFO mRNA are generated by alternative splicing of exon 10 and differential usage of two imperfect polyadenylation sites resulting in the presence or absence of 1.5-kb 3' untranslated sequences. Primer extension and S1 nuclease analyses revealed multiple transcriptional initiation sites including a major site 169 bp upstream of the translation start site. The promoter region is GC rich, lacks TATA and CAAT boxes, but contains potential recognition sites for a variety of trans-acting factors, including Sp1, AP-2 and the cyclic AMP response element-binding protein. Proto-UFO and its oncogenic counterpart exhibit identical cDNA and promoter region sequences. Possible modes of UFO activation are discussed.

  14. Complete genome sequence of pronghorn virus, a pestivirus

    Science.gov (United States)

    The complete genome sequence of Pronghorn virus, a member of the Pestivirus genus of the Flaviviridae, was determined. The virus, originally isolated from a pronghorn antelope, had a genome of 12,287 nucleotides with a single open reading frame of 11,694 bases encoding 3898 amino acids....

  15. Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.

    Science.gov (United States)

    Salzberg, Steven L; Dunning Hotopp, Julie C; Delcher, Arthur L; Pop, Mihai; Smith, Douglas R; Eisen, Michael B; Nelson, William C

    2005-01-01

    The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of discovering additional genomic sequences beyond those originally sequenced. In particular, if the source DNA for a sequencing project came from a species that was colonized by another organism, then the project may yield substantial amounts of genomic DNA, including near-complete genomes, from the symbiotic or parasitic organism. By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences with partial matches to a previously sequenced Wolbachia strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome. The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new Wolbachia genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank.

  16. Data Encoding using Periodic Nano-Optical Features

    Science.gov (United States)

    Vosoogh-Grayli, Siamack

    Successful trials have been made through a designed algorithm to quantize, compress and optically encode unsigned 8-bit integer values in the form of images using nano-optical features. The periodicity of the nano-scale features (nano-gratings) has been designed and investigated both theoretically and experimentally to create distinct states of variation (three on states and one off state). The use of easy-to-manufacture and machine-readable encoded data in secured authentication media has been employed previously in barcodes for bi-state (binary) models and in color barcodes for multiple-state models. This work has focused on implementing 4 states of variation for unit information through periodic nano-optical structures that separate an incident wavelength into distinct colors (variation states) in order to create an encoding system. Compared to barcodes and magnetic stripes in secured finite-length storage media, the proposed system encodes and stores more data. The benefits of multiple states of variation in an encoding unit are 1) increased numerically representable range, 2) increased storage density and 3) decreased number of typical set elements for any ergodic or semi-ergodic source that emits these encoding units. A thorough investigation has targeted the effects of the use of multi-varied state nano-optical features on data storage density and consequent data transmission rates. The results show that use of nano-optical features for encoding data yields a data storage density of circa 800 Kbits/in² via the implementation of commercially available high-resolution flatbed scanner systems for readout. Such storage density is far greater than that of commercial finite-length secured storage media such as the barcode family, with a maximum practical density of 1 Kbits/in², and the highest-density magnetic stripe cards, with a maximum density of circa 3 Kbits/in². The numerically representable range of the proposed encoding unit for 4 states of variation is [0, 255]. The number of
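
    One way to picture a four-state encoding of unsigned 8-bit values is as base-4 digits: each feature carries 2 bits, so four features cover the range [0, 255]. The sketch below is only an illustration under that assumption. The function names and the mapping of digit 0 to the off state are hypothetical, and the abstract does not state that the author's algorithm works this way.

```python
def encode_byte(value):
    """Split an unsigned 8-bit integer into four base-4 symbols, most significant first.

    Assumption: symbol 0 stands for the 'off' state and 1-3 for the three 'on' states.
    """
    if not 0 <= value <= 255:
        raise ValueError("value must fit in an unsigned 8-bit integer")
    return [(value >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def decode_symbols(symbols):
    """Rebuild the integer from four base-4 symbols produced by encode_byte."""
    if len(symbols) != 4 or any(not 0 <= s <= 3 for s in symbols):
        raise ValueError("expected four symbols, each in the range 0-3")
    value = 0
    for symbol in symbols:
        value = (value << 2) | symbol
    return value


symbols = encode_byte(201)
print(symbols, decode_symbols(symbols))
```

    A binary (two-state) feature would need eight positions for the same range, which is the sense in which more states per unit raise storage density.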

  17. Identification of the gene encoding the 65-kilodalton DNA-binding protein of herpes simplex virus type 1

    International Nuclear Information System (INIS)

    Parris, D.S.; Cross, A.; Orr, A.; Frame, M.C.; Murphy, M.; McGeoch, D.J.; Marsden, H.S.; Haarr, L.

    1988-01-01

    Hybrid arrest of in vitro translation was used to localize the region of the herpes simplex virus type 1 genome encoding the 65-kilodalton DNA-binding protein (65K DBP) to between genome coordinates 0.592 and 0.649. Knowledge of the DNA sequence of this region allowed us to identify three open reading frames as likely candidates for the gene encoding 65K DBP. Two independent approaches were used to determine which of these three open reading frames encoded the protein. For the first approach a monoclonal antibody, MAb 6898, which reacted specifically with 65K DBP, was isolated. This antibody was used, with the techniques of hybrid arrest of in vitro translation and in vitro translation of selected mRNA, to identify the gene encoding 65K DBP. The second approach involved preparation of antisera directed against oligopeptides corresponding to regions of the predicted amino acid sequence of this gene. These antisera reacted specifically with 65K DBP, thus confirming the gene assignment.

  18. Comparative genome-based identification of a cell wall-anchored protein from Lactobacillus plantarum increases adhesion of Lactococcus lactis to human epithelial cells.

    Science.gov (United States)

    Zhang, Bo; Zuo, Fanglei; Yu, Rui; Zeng, Zhu; Ma, Huiqin; Chen, Shangwu

    2015-09-15

    Adhesion to host cells is considered important for Lactobacillus plantarum as well as other lactic acid bacteria (LAB) to persist in human gut and thus exert probiotic effects. Here, we sequenced the genome of Lt. plantarum strain NL42 originating from a traditional Chinese dairy product, performed comparative genomic analysis and characterized a novel adhesion factor. The genome of NL42 was highly divergent from its closest neighbors, especially in six large genomic regions. NL42 harbors a total of 42 genes encoding adhesion-associated proteins; among them, cwaA encodes a protein containing multiple domains, including five cell wall surface anchor repeat domains and an LPxTG-like cell wall anchor motif. Expression of cwaA in Lactococcus lactis significantly increased its autoaggregation and hydrophobicity, and conferred the new ability to adhere to human colonic epithelial HT-29 cells by targeting cellular surface proteins, and not carbohydrate moieties, for CwaA adhesion. In addition, the recombinant Lc. lactis inhibited adhesion of Staphylococcus aureus and Escherichia coli to HT-29 cells, mainly by exclusion. We conclude that CwaA is a novel adhesion factor in Lt. plantarum and a potential candidate for improving the adhesion ability of probiotics or other bacteria of interest.

  19. Interdependence of bacterial cell division and genome segregation and its potential in drug development.

    Science.gov (United States)

    Misra, Hari S; Maurya, Ganesh K; Chaudhary, Reema; Misra, Chitra S

    2018-03-01

    Cell division and genome segregation are mutually interdependent processes, which are tightly linked with bacterial multiplication. Mechanisms underlying cell division and the cellular machinery involved are largely conserved across bacteria. Segregation of genome elements, on the other hand, follows different pathways depending upon its type and the functional components encoded on these elements. Small molecules that are known to inhibit cell division and/or resolution of intertwined circular chromosomes and maintenance of DNA topology have earlier been tested as antibacterial agents. The utility of such drugs in controlling bacterial infections has witnessed only partial success, possibly due to functional redundancy associated with targeted components. However, in due course, literature has grown with newer information. This review has brought forth some recent findings on bacterial cell division with special emphasis on crosstalk between cell division and genome segregation that could be explored as novel targets in drug development. Copyright © 2018 Elsevier GmbH. All rights reserved.

  20. The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes.

    Directory of Open Access Journals (Sweden)

    Adam Alexander Thil Smith

    2012-05-01

    Of all biochemically characterized metabolic reactions formalized by the IUBMB, over one out of four have yet to be associated with a nucleic or protein sequence, i.e. are sequence-orphan enzymatic activities. Few bioinformatics annotation tools are able to propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible and propose result integration across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short). The first step locates "genomic metabolons", i.e. groups of co-localized genes coding proteins catalyzing reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful for aiding bioanalysts to visualize relevant metabolic data. In the second step, they are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations by several scores. In the final step, these scores are used to rank members of gene families which are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.
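
    The first step described above, grouping co-localized genes whose reactions share metabolites, can be sketched as a small clustering problem. The code below is a toy illustration, not the CanOE implementation: the input format, the `max_gap` parameter and the function name are assumptions made for the example.

```python
from itertools import combinations


def find_metabolons(genes, max_gap=1):
    """Group genes that lie close together and whose reactions share a metabolite.

    genes: list of (name, position_index, set_of_metabolites) tuples (hypothetical format).
    max_gap: largest difference in gene order still counted as co-localized (assumed).
    """
    parent = {name: name for name, _, _ in genes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (n1, i1, m1), (n2, i2, m2) in combinations(genes, 2):
        if abs(i1 - i2) <= max_gap and m1 & m2:
            parent[find(n1)] = find(n2)

    groups = {}
    for name, _, _ in genes:
        groups.setdefault(find(name), set()).add(name)
    # Keep only groups with at least two genes.
    return [group for group in groups.values() if len(group) > 1]


example = [
    ("geneA", 1, {"allantoin", "ureidoglycine"}),
    ("geneB", 2, {"ureidoglycine", "ureidoglycolate"}),
    ("geneC", 3, {"ureidoglycolate"}),
    ("geneD", 10, {"ureidoglycolate"}),  # shares a metabolite but is far away
    ("geneE", 4, {"pyruvate"}),          # adjacent but metabolically unlinked
]
print(find_metabolons(example))
```

    In this toy input only geneA, geneB and geneC form a group; the later steps of the strategy (candidate associations, cross-genome integration, ranking) are not modeled here.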

  1. Bacillus subtilis genome diversity.

    Science.gov (United States)

    Earl, Ashlee M; Losick, Richard; Kolter, Roberto

    2007-02-01

    Microarray-based comparative genomic hybridization (M-CGH) is a powerful method for rapidly identifying regions of genome diversity among closely related organisms. We used M-CGH to examine the genome diversity of 17 strains belonging to the nonpathogenic species Bacillus subtilis. Our M-CGH results indicate that there is considerable genetic heterogeneity among members of this species; nearly one-third of Bsu168-specific genes exhibited variability, as measured by the microarray hybridization intensities. The variable loci include those encoding proteins involved in antibiotic production, cell wall synthesis, sporulation, and germination. The diversity in these genes may reflect this organism's ability to survive in diverse natural settings.

  2. Comparative genomics and stx phage characterization of LEE-negative Shiga toxin-producing Escherichia coli

    Directory of Open Access Journals (Sweden)

    Susan Renee Steyert

    2012-11-01

    Infections by Escherichia coli and Shigella species are among the leading causes of death due to diarrheal disease in the world. Shiga toxin-producing Escherichia coli (STEC) that do not encode the locus of enterocyte effacement (LEE-negative STEC) often possess Shiga toxin gene variants and have been isolated from humans and a variety of animal sources. In this study, we compare the genomes of nine LEE-negative STEC harboring various stx alleles with four complete reference LEE-positive STEC isolates. Compared to a representative collection of prototype E. coli and Shigella isolates representing each of the pathotypes, the whole genome phylogeny demonstrated that these isolates are diverse. Whole genome comparative analysis of the 13 genomes revealed that in addition to the absence of the LEE pathogenicity island, phage-encoded genes, including non-LEE encoded effectors, were absent from all nine LEE-negative STEC genomes. Several plasmid-encoded virulence factors reportedly identified in LEE-negative STEC isolates were identified in only a subset of the nine LEE-negative isolates, further confirming the diversity of this group. In combination with whole genome analysis, we characterized the lambdoid phages harboring the various stx alleles and determined their genomic insertion sites. Although the integrase gene sequence corresponded with genomic location, it was not correlated with stx variant, further highlighting the mosaic nature of these phages. The transcription of these phages in different genomic backgrounds was examined. Expression of the Shiga toxin genes, stx1 and/or stx2, as well as the Q genes, was examined with quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) assays. A wide range of basal and induced toxin induction was observed. Overall, this is a first significant foray into the genome space of this unexplored group of emerging and divergent pathogens.

  3. Identification of human microRNA-like sequences embedded within the protein-encoding genes of the human immunodeficiency virus.

    Directory of Open Access Journals (Sweden)

    Bryan Holland

    BACKGROUND: MicroRNAs (miRNAs) are highly conserved, short (18-22 nts), non-coding RNA molecules that regulate gene expression by binding to the 3' untranslated regions (3'UTRs) of mRNAs. While numerous cellular microRNAs have been associated with the progression of various diseases including cancer, miRNAs associated with retroviruses have not been well characterized. Herein we report identification of microRNA-like sequences in coding regions of several HIV-1 genomes. RESULTS: Based on our earlier proteomics and bioinformatics studies, we have identified 8 cellular miRNAs that are predicted to bind to the mRNAs of multiple proteins that are dysregulated during HIV-infection of CD4+ T-cells in vitro. In silico analysis of the full length and mature sequences of these 8 miRNAs and comparisons with all the genomic and subgenomic sequences of HIV-1 strains in global databases revealed that the first 18/18 sequences of the mature hsa-miR-195 sequence (including the short seed sequence) matched perfectly (100%), or with one nucleotide mismatch, within the envelope (env) genes of five HIV-1 genomes from Africa. In addition, we have identified 4 other miRNA-like sequences (hsa-miR-30d, hsa-miR-30e, hsa-miR-374a and hsa-miR-424) within the env and the gag-pol encoding regions of several HIV-1 strains, albeit with reduced homology. Mapping of the miRNA-homologues of env within HIV-1 genomes localized these sequences to the functionally significant variable regions of the env glycoprotein gp120 designated V1, V2, V4 and V5. CONCLUSIONS: We conclude that microRNA-like sequences are embedded within the protein-encoding regions of several HIV-1 genomes. Given that the V1 to V5 regions of HIV-1 envelopes contain specific, well-characterized domains that are critical for immune responses, virus neutralization and disease progression, we propose that the newly discovered miRNA-like sequences within the HIV-1 genomes may have evolved to self-regulate survival of the

  4. The Complete Genome and Phenome of a Community-Acquired Acinetobacter baumannii

    Science.gov (United States)

    Farrugia, Daniel N.; Elbourne, Liam D. H.; Hassan, Karl A.; Eijkelkamp, Bart A.; Tetu, Sasha G.; Brown, Melissa H.; Shah, Bhumika S.; Peleg, Anton Y.; Mabbutt, Bridget C.; Paulsen, Ian T.

    2013-01-01

    Many sequenced strains of Acinetobacter baumannii are established nosocomial pathogens capable of resistance to multiple antimicrobials. Community-acquired A. baumannii, in contrast, comprise a minor proportion of all A. baumannii infections and are highly susceptible to antimicrobial treatment. However, these infections also present acute clinical manifestations associated with high reported rates of mortality. We report the complete 3.70 Mbp genome of A. baumannii D1279779, previously isolated from the bacteraemic infection of an Indigenous Australian; this strain represents the first community-acquired A. baumannii to be sequenced. Comparative analysis of currently published A. baumannii genomes identified twenty-four accessory gene clusters present in D1279779. These accessory elements were predicted to encode a range of functions including polysaccharide biosynthesis, type I DNA restriction-modification, and the metabolism of novel carbonaceous and nitrogenous compounds. Conversely, twenty genomic regions present in previously sequenced A. baumannii strains were absent in D1279779, including gene clusters involved in the catabolism of 4-hydroxybenzoate and glucarate, and the A. baumannii antibiotic resistance island, known to bestow resistance to multiple antimicrobials in nosocomial strains. Phenomic analysis utilising the Biolog Phenotype Microarray system indicated that A. baumannii D1279779 can utilise a broader range of carbon and nitrogen sources than international clone I and clone II nosocomial isolates. However, D1279779 was more sensitive to antimicrobial compounds, particularly beta-lactams, tetracyclines and sulphonamides. The combined genomic and phenomic analyses have provided insight into the features distinguishing A. baumannii isolated from community-acquired and nosocomial infections. PMID:23527001

  5. A universal genomic coordinate translator for comparative genomics.

    Science.gov (United States)

    Zamani, Neda; Sundström, Görel; Meadows, Jennifer R S; Höppner, Marc P; Dainat, Jacques; Lantz, Henrik; Haas, Brian J; Grabherr, Manfred G

    2014-06-30

    Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, which increases at N² with the number of available genomes, N. Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across
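
    The idea of replacing all-to-all alignments with a traversal of pairwise alignments can be sketched as a graph search. The code below is a simplified illustration and not Kraken's actual algorithm or interface: the block format, the genome names, the coordinates and the function names are hypothetical, and real synteny maps also handle strand, gaps and ambiguity.

```python
from collections import deque

# Hypothetical pairwise synteny maps: (source, target) -> list of blocks
# (source_start, source_end, target_start); coordinates are 0-based, half-open.
PAIRWISE = {
    ("human", "mouse"): [(100, 200, 1100)],
    ("mouse", "rat"): [(1000, 1300, 50)],
}


def translate_once(blocks, position):
    """Map a position through one pairwise map, or return None if it is not covered."""
    for source_start, source_end, target_start in blocks:
        if source_start <= position < source_end:
            return target_start + (position - source_start)
    return None


def translate(pairwise, source, target, position):
    """Breadth-first search over pairwise maps to carry a coordinate from source to target."""
    queue = deque([(source, position)])
    seen = {source}
    while queue:
        genome, current = queue.popleft()
        if genome == target:
            return current
        for (a, b), blocks in pairwise.items():
            if a == genome and b not in seen:
                mapped = translate_once(blocks, current)
                if mapped is not None:
                    seen.add(b)
                    queue.append((b, mapped))
    return None


def all_to_all_pairs(n):
    """Number of unordered genome pairs an all-to-all comparison would need."""
    return n * (n - 1) // 2


print(translate(PAIRWISE, "human", "rat", 150))
print(all_to_all_pairs(12), "pairs all-to-all versus", 12 - 1, "maps in a chain")
```

    Here no direct human-to-rat map exists, yet the coordinate is still translated by passing through mouse; a chain or tree of N genomes needs only N - 1 maps, while all-to-all comparison grows quadratically.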

  6. The Genome of the Basidiomycetous Yeast and Human Pathogen Cryptococcus neoformans

    Science.gov (United States)

    Loftus, Brendan J.; Fung, Eula; Roncaglia, Paola; Rowley, Don; Amedeo, Paolo; Bruno, Dan; Vamathevan, Jessica; Miranda, Molly; Anderson, Iain J.; Fraser, James A.; Allen, Jonathan E.; Bosdet, Ian E.; Brent, Michael R.; Chiu, Readman; Doering, Tamara L.; Donlin, Maureen J.; D’Souza, Cletus A.; Fox, Deborah S.; Grinberg, Viktoriya; Fu, Jianmin; Fukushima, Marilyn; Haas, Brian J.; Huang, James C.; Janbon, Guilhem; Jones, Steven J. M.; Koo, Hean L.; Krzywinski, Martin I.; Kwon-Chung, June K.; Lengeler, Klaus B.; Maiti, Rama; Marra, Marco A.; Marra, Robert E.; Mathewson, Carrie A.; Mitchell, Thomas G.; Pertea, Mihaela; Riggs, Florenta R.; Salzberg, Steven L.; Schein, Jacqueline E.; Shvartsbeyn, Alla; Shin, Heesun; Shumway, Martin; Specht, Charles A.; Suh, Bernard B.; Tenney, Aaron; Utterback, Terry R.; Wickes, Brian L.; Wortman, Jennifer R.; Wye, Natasja H.; Kronstad, James W.; Lodge, Jennifer K.; Heitman, Joseph; Davis, Ronald W.; Fraser, Claire M.; Hyman, Richard W.

    2012-01-01

    Cryptococcus neoformans is a basidiomycetous yeast ubiquitous in the environment, a model for fungal pathogenesis, and an opportunistic human pathogen of global importance. We have sequenced its ~20-megabase genome, which contains ~6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. The genome is rich in transposons, many of which cluster at candidate centromeric regions. The presence of these transposons may drive karyotype instability and phenotypic variation. C. neoformans encodes unique genes that may contribute to its unusual virulence properties, and comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes. PMID:15653466

  7. Complete Genome Sequence of Staphylococcus epidermidis 1457.

    Science.gov (United States)

    Galac, Madeline R; Stam, Jason; Maybank, Rosslyn; Hinkle, Mary; Mack, Dietrich; Rohde, Holger; Roth, Amanda L; Fey, Paul D

    2017-06-01

    Staphylococcus epidermidis 1457 is a frequently utilized strain that is amenable to genetic manipulation and has been widely used for biofilm-related research. We report here the whole-genome sequence of this strain, which encodes 2,277 protein-coding genes and 81 RNAs within its 2.4-Mb genome and plasmid. Copyright © 2017 Galac et al.

  8. Genome projects and the functional-genomic era.

    Science.gov (United States)

    Sauer, Sascha; Konthur, Zoltán; Lehrach, Hans

    2005-12-01

    The problems we face today in public health as a result of the -- fortunately -- increasing age of people and the requirements of developing countries create an urgent need for new and innovative approaches in medicine and in agronomics. Genomic and functional genomic approaches have a great potential to at least partially solve these problems in the future. Important progress has been made by procedures to decode genomic information of humans, but also of other key organisms. The basic comprehension of genomic information (and its transfer) should now give us the possibility to pursue the next important step in life science eventually leading to a basic understanding of biological information flow; the elucidation of the function of all genes and correlative products encoded in the genome, as well as the discovery of their interactions in a molecular context and the response to environmental factors. As a result of the sequencing projects, we are now able to ask important questions about sequence variation and can start to comprehensively study the function of expressed genes on different levels such as RNA, protein or the cell in a systematic context including underlying networks. In this article we review and comment on current trends in large-scale systematic biological research. A particular emphasis is put on technology developments that can provide means to accomplish the tasks of future lines of functional genomics.

  9. Comparative Genomic Analysis of Neutrophilic Iron(II) Oxidizer Genomes for Candidate Genes in Extracellular Electron Transfer

    Directory of Open Access Journals (Sweden)

    Shaomei He

    2017-08-01

    Extracellular electron transfer (EET) is recognized as a key biochemical process in circumneutral pH Fe(II)-oxidizing bacteria (FeOB). In this study, we searched for candidate EET genes in 73 neutrophilic FeOB genomes, among which 43 genomes are complete or close-to-complete and the rest have estimated genome completeness ranging from 5 to 91%. These neutrophilic FeOB span members of the microaerophilic, anaerobic phototrophic, and anaerobic nitrate-reducing FeOB groups. We found that many microaerophilic and several anaerobic FeOB possess homologs of Cyc2, an outer membrane cytochrome c originally identified in Acidithiobacillus ferrooxidans. The “porin-cytochrome c complex” (PCC) gene clusters homologous to MtoAB/PioAB are present in eight FeOB, accounting for 19% of complete and close-to-complete genomes examined, whereas PCC genes homologous to OmbB-OmaB-OmcB in Geobacter sulfurreducens are absent. Further, we discovered gene clusters that may potentially encode two novel PCC types. First, a cluster (tentatively named “PCC3”) encodes a porin, an extracellular and a periplasmic cytochrome c with remarkably large numbers of heme-binding motifs. Second, a cluster (tentatively named “PCC4”) encodes a porin and three periplasmic multiheme cytochromes c. A conserved inner membrane protein (IMP) encoded in PCC3 and PCC4 gene clusters might be responsible for translocating electrons across the inner membrane. Other bacteria possessing PCC3 and PCC4 are mostly Proteobacteria isolated from environments with a potential niche for Fe(II) oxidation. In addition to cytochrome c, multicopper oxidase (MCO) genes potentially involved in Fe(II) oxidation were also identified. Notably, candidate EET genes were not found in some FeOB, especially the anaerobic ones, probably suggesting EET genes or Fe(II) oxidation mechanisms are different from the searched models. Overall, based on current EET models, the search extends our understanding of bacterial EET and

  10. SignalSpider: Probabilistic pattern discovery on multiple normalized ChIP-Seq signal profiles

    KAUST Repository

    Wong, Kachun

    2014-09-05

    Motivation: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-Seq) measures the genome-wide occupancy of transcription factors in vivo. Different combinations of DNA-binding protein occupancies may result in a gene being expressed in different tissues or at different developmental stages. To fully understand the functions of genes, it is essential to develop probabilistic models on multiple ChIP-Seq profiles to decipher the combinatorial regulatory mechanisms by multiple transcription factors. Results: In this work, we describe a probabilistic model (SignalSpider) to decipher the combinatorial binding events of multiple transcription factors. Compared with similar existing methods, we found SignalSpider performs better in clustering promoter and enhancer regions. Notably, SignalSpider can learn higher-order combinatorial patterns from multiple ChIP-Seq profiles. We have applied SignalSpider to the normalized ChIP-Seq profiles from the ENCODE consortium and learned model instances. We observed different higher-order enrichment and depletion patterns across sets of proteins. Those clustering patterns are supported by Gene Ontology (GO) enrichment, evolutionary conservation and chromatin interaction enrichment, offering biological insights for further focused studies. We also proposed a specific enrichment map visualization method to reveal the genome-wide transcription factor combinatorial patterns from the models built, which extend our existing fine-scale knowledge on gene regulation to a genome-wide level. Availability and implementation: The matrix-algebra-optimized executables and source codes are available at the authors' websites: http://www.cs.toronto.edu/~wkc/SignalSpider. Contact: Supplementary information: Supplementary data are available at Bioinformatics online.

  11. Wavelength-encoded OCDMA system using opto-VLSI processors.

    Science.gov (United States)

    Aljada, Muhsen; Alameh, Kamal

    2007-07-01

    We propose and experimentally demonstrate a 2.5 Gbits/s per user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.
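
    The abstract evaluates codewords by their autocorrelation and cross-correlation functions. The following is a minimal sketch of that kind of measurement, not the authors' setup: it assumes hypothetical length-7 bipolar codewords (`code_a`, `code_b`) and a purely digital periodic correlation, whereas the paper works with optical signals.

```python
def periodic_correlation(a, b):
    """Periodic (cyclic) correlation of two equal-length sequences.

    Entry `shift` is the sum over i of a[i] * b[(i + shift) % n].
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("codewords must have the same length")
    return [sum(a[i] * b[(i + shift) % n] for i in range(n))
            for shift in range(n)]


# Hypothetical codewords for illustration only (not taken from the paper).
# code_a is the binary m-sequence 1110100 mapped to +1/-1;
# code_b is the same sequence read in reverse.
code_a = [1, 1, 1, -1, 1, -1, -1]
code_b = list(reversed(code_a))

auto = periodic_correlation(code_a, code_a)
cross = periodic_correlation(code_a, code_b)

print("autocorrelation: ", auto)
print("cross-correlation:", cross)
```

    A sharp autocorrelation peak at zero shift with low values elsewhere, together with a cross-correlation that stays well below that peak, is what lets a decoder pick out its own user in a shared channel.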

  13. MicroRNA-encoding long non-coding RNAs

    Directory of Open Access Journals (Sweden)

    Zhu Xiaopeng

    2008-05-01

    Background: Recent analysis of the mouse transcriptional data has revealed the existence of ~34,000 messenger-like non-coding RNAs (ml-ncRNAs). Whereas the functional properties of these ml-ncRNAs are beginning to be unravelled, no functional information is available for the large majority of these transcripts. Results: A few ml-ncRNAs have been shown to have genomic loci that overlap with microRNA loci, leading us to suspect that a fraction of ml-ncRNAs may encode microRNAs. We therefore developed an algorithm (PriMir) for specifically detecting potential microRNA-encoding transcripts in the entire set of 34,030 mouse full-length ml-ncRNAs. In combination with mouse-rat sequence conservation, this algorithm detected 97 strong miRNA-encoding candidates (80 of them novel), and for 52 of these we obtained experimental evidence for the existence of their corresponding mature microRNA by microarray and stem-loop RT-PCR. Sequence analysis of the microRNA-encoding RNAs revealed an internal motif, whose presence correlates strongly (R² = 0.9, P-value = 2.2 × 10⁻¹⁶) with the occurrence of stem-loops with characteristics of known pre-miRNAs, indicating the presence of a larger number of microRNA-encoding RNAs (from 300 up to 800) in the ml-ncRNA population. Conclusion: Our work highlights a unique group of ml-ncRNAs and offers clues to their functions.

  14. Novel multiple sclerosis susceptibility loci implicated in epigenetic regulation

    Science.gov (United States)

    Andlauer, Till F. M.; Buck, Dorothea; Antony, Gisela; Bayas, Antonios; Bechmann, Lukas; Berthele, Achim; Chan, Andrew; Gasperi, Christiane; Gold, Ralf; Graetz, Christiane; Haas, Jürgen; Hecker, Michael; Infante-Duarte, Carmen; Knop, Matthias; Kümpfel, Tania; Limmroth, Volker; Linker, Ralf A.; Loleit, Verena; Luessi, Felix; Meuth, Sven G.; Mühlau, Mark; Nischwitz, Sandra; Paul, Friedemann; Pütz, Michael; Ruck, Tobias; Salmen, Anke; Stangel, Martin; Stellmann, Jan-Patrick; Stürner, Klarissa H.; Tackenberg, Björn; Then Bergh, Florian; Tumani, Hayrettin; Warnke, Clemens; Weber, Frank; Wiendl, Heinz; Wildemann, Brigitte; Zettl, Uwe K.; Ziemann, Ulf; Zipp, Frauke; Arloth, Janine; Weber, Peter; Radivojkov-Blagojevic, Milena; Scheinhardt, Markus O.; Dankowski, Theresa; Bettecken, Thomas; Lichtner, Peter; Czamara, Darina; Carrillo-Roa, Tania; Binder, Elisabeth B.; Berger, Klaus; Bertram, Lars; Franke, Andre; Gieger, Christian; Herms, Stefan; Homuth, Georg; Ising, Marcus; Jöckel, Karl-Heinz; Kacprowski, Tim; Kloiber, Stefan; Laudes, Matthias; Lieb, Wolfgang; Lill, Christina M.; Lucae, Susanne; Meitinger, Thomas; Moebus, Susanne; Müller-Nurasyid, Martina; Nöthen, Markus M.; Petersmann, Astrid; Rawal, Rajesh; Schminke, Ulf; Strauch, Konstantin; Völzke, Henry; Waldenberger, Melanie; Wellmann, Jürgen; Porcu, Eleonora; Mulas, Antonella; Pitzalis, Maristella; Sidore, Carlo; Zara, Ilenia; Cucca, Francesco; Zoledziewska, Magdalena; Ziegler, Andreas; Hemmer, Bernhard; Müller-Myhsok, Bertram

    2016-01-01

    We conducted a genome-wide association study (GWAS) on multiple sclerosis (MS) susceptibility in German cohorts with 4888 cases and 10,395 controls. In addition to associations within the major histocompatibility complex (MHC) region, 15 non-MHC loci reached genome-wide significance. Four of these loci are novel MS susceptibility loci. They map to the genes L3MBTL3, MAZ, ERG, and SHMT1. The lead variant at SHMT1 was replicated in an independent Sardinian cohort. Products of the genes L3MBTL3, MAZ, and ERG play important roles in immune cell regulation. SHMT1 encodes a serine hydroxymethyltransferase catalyzing the transfer of a carbon unit to the folate cycle. This reaction is required for regulation of methylation homeostasis, which is important for establishment and maintenance of epigenetic signatures. Our GWAS approach in a defined population with limited genetic substructure detected associations not found in larger, more heterogeneous cohorts, thus providing new clues regarding MS pathogenesis. PMID:27386562

  15. Novel multiple sclerosis susceptibility loci implicated in epigenetic regulation.

    Science.gov (United States)

    Andlauer, Till F M; Buck, Dorothea; Antony, Gisela; Bayas, Antonios; Bechmann, Lukas; Berthele, Achim; Chan, Andrew; Gasperi, Christiane; Gold, Ralf; Graetz, Christiane; Haas, Jürgen; Hecker, Michael; Infante-Duarte, Carmen; Knop, Matthias; Kümpfel, Tania; Limmroth, Volker; Linker, Ralf A; Loleit, Verena; Luessi, Felix; Meuth, Sven G; Mühlau, Mark; Nischwitz, Sandra; Paul, Friedemann; Pütz, Michael; Ruck, Tobias; Salmen, Anke; Stangel, Martin; Stellmann, Jan-Patrick; Stürner, Klarissa H; Tackenberg, Björn; Then Bergh, Florian; Tumani, Hayrettin; Warnke, Clemens; Weber, Frank; Wiendl, Heinz; Wildemann, Brigitte; Zettl, Uwe K; Ziemann, Ulf; Zipp, Frauke; Arloth, Janine; Weber, Peter; Radivojkov-Blagojevic, Milena; Scheinhardt, Markus O; Dankowski, Theresa; Bettecken, Thomas; Lichtner, Peter; Czamara, Darina; Carrillo-Roa, Tania; Binder, Elisabeth B; Berger, Klaus; Bertram, Lars; Franke, Andre; Gieger, Christian; Herms, Stefan; Homuth, Georg; Ising, Marcus; Jöckel, Karl-Heinz; Kacprowski, Tim; Kloiber, Stefan; Laudes, Matthias; Lieb, Wolfgang; Lill, Christina M; Lucae, Susanne; Meitinger, Thomas; Moebus, Susanne; Müller-Nurasyid, Martina; Nöthen, Markus M; Petersmann, Astrid; Rawal, Rajesh; Schminke, Ulf; Strauch, Konstantin; Völzke, Henry; Waldenberger, Melanie; Wellmann, Jürgen; Porcu, Eleonora; Mulas, Antonella; Pitzalis, Maristella; Sidore, Carlo; Zara, Ilenia; Cucca, Francesco; Zoledziewska, Magdalena; Ziegler, Andreas; Hemmer, Bernhard; Müller-Myhsok, Bertram

    2016-06-01

    We conducted a genome-wide association study (GWAS) on multiple sclerosis (MS) susceptibility in German cohorts with 4888 cases and 10,395 controls. In addition to associations within the major histocompatibility complex (MHC) region, 15 non-MHC loci reached genome-wide significance. Four of these loci are novel MS susceptibility loci. They map to the genes L3MBTL3, MAZ, ERG, and SHMT1. The lead variant at SHMT1 was replicated in an independent Sardinian cohort. Products of the genes L3MBTL3, MAZ, and ERG play important roles in immune cell regulation. SHMT1 encodes a serine hydroxymethyltransferase catalyzing the transfer of a carbon unit to the folate cycle. This reaction is required for regulation of methylation homeostasis, which is important for establishment and maintenance of epigenetic signatures. Our GWAS approach in a defined population with limited genetic substructure detected associations not found in larger, more heterogeneous cohorts, thus providing new clues regarding MS pathogenesis.
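
    To make the notion of a variant "reaching genome-wide significance" concrete, here is a minimal sketch of a single-variant allelic association test. It is an illustration under stated assumptions, not the study's pipeline: the allele counts are invented, the test is a plain 2x2 chi-square with one degree of freedom, and the 5e-8 threshold is the conventional GWAS cut-off rather than a value quoted in the abstract. Only the cohort sizes (4888 cases, 10,395 controls) come from the text.

```python
import math


def allelic_chi_square(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square (1 df) and p-value for a 2x2 allele-count table."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("table has an empty row or column")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


GENOME_WIDE_ALPHA = 5e-8  # conventional threshold, assumed here

# Each person carries two alleles; minor-allele counts below are hypothetical.
case_alleles = 2 * 4888
ctrl_alleles = 2 * 10395
case_minor = 2600
ctrl_minor = 4800

chi2, p = allelic_chi_square(case_minor, case_alleles - case_minor,
                             ctrl_minor, ctrl_alleles - ctrl_minor)
print(f"chi2 = {chi2:.2f}, p = {p:.3g}, "
      f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

    Real analyses typically use regression models with covariates for ancestry and other confounders; the contingency-table test above only shows where a per-variant p-value comes from.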

  16. Evidence of ancient genome reduction in red algae (Rhodophyta).

    Science.gov (United States)

    Qiu, Huan; Price, Dana C; Yang, Eun Chan; Yoon, Hwan Su; Bhattacharya, Debashish

    2015-08-01

    Red algae (Rhodophyta) comprise a monophyletic eukaryotic lineage of ~6,500 species with a fossil record that extends back 1.2 billion years. A surprising aspect of red algal evolution is that sequenced genomes encode a relatively limited gene inventory (~5-10 thousand genes) when compared with other free-living algae or to other eukaryotes. This suggests that the common ancestor of red algae may have undergone extensive genome reduction, which can result from lineage specialization to a symbiotic or parasitic lifestyle or adaptation to an extreme or oligotrophic environment. We gathered genome and transcriptome data from a total of 14 red algal genera that represent the major branches of this phylum to study genome evolution in Rhodophyta. Analysis of orthologous gene gains and losses identifies two putative major phases of genome reduction: (i) in the stem lineage leading to all red algae resulting in the loss of major functions such as flagellae and basal bodies, the glycosyl-phosphatidylinositol anchor biosynthesis pathway, and the autophagy regulation pathway; and (ii) in the common ancestor of the extremophilic Cyanidiophytina. Red algal genomes are also characterized by the recruitment of hundreds of bacterial genes through horizontal gene transfer that have taken on multiple functions in shared pathways and have replaced eukaryotic gene homologs. Our results suggest that Rhodophyta may trace their origin to a gene depauperate ancestor. Unlike plants, it appears that a limited gene inventory is sufficient to support the diversification of a major eukaryote lineage that possesses sophisticated multicellular reproductive structures and an elaborate triphasic sexual cycle. © 2015 Phycological Society of America.

  17. A "candidate-interactome" aggregate analysis of genome-wide association data in multiple sclerosis

    DEFF Research Database (Denmark)

    Mechelli, Rosella; Umeton, Renato; Policano, Claudia

    2013-01-01

    of genes whose products are known to physically interact with environmental factors that may be relevant for disease pathogenesis) analysis of genome-wide association data in multiple sclerosis. We looked for statistical enrichment of associations among interactomes that, at the current state of knowledge......, may be representative of gene-environment interactions of potential, uncertain or unlikely relevance for multiple sclerosis pathogenesis: Epstein-Barr virus, human immunodeficiency virus, hepatitis B virus, hepatitis C virus, cytomegalovirus, HHV8-Kaposi sarcoma, H1N1-influenza, JC virus, human innate...... immunity interactome for type I interferon, autoimmune regulator, vitamin D receptor, aryl hydrocarbon receptor and a panel of proteins targeted by 70 innate immune-modulating viral open reading frames from 30 viral species. Interactomes were either obtained from the literature or were manually curated...

  18. Dynamic evolution of Geranium mitochondrial genomes through multiple horizontal and intracellular gene transfers.

    Science.gov (United States)

    Park, Seongjun; Grewe, Felix; Zhu, Andan; Ruhlman, Tracey A; Sabir, Jamal; Mower, Jeffrey P; Jansen, Robert K

    2015-10-01

    The exchange of genetic material between cellular organelles through intracellular gene transfer (IGT) or between species by horizontal gene transfer (HGT) has played an important role in plant mitochondrial genome evolution. The mitochondrial genomes of Geraniaceae display a number of unusual phenomena including highly accelerated rates of synonymous substitutions, extensive gene loss and reduction in RNA editing. Mitochondrial DNA sequences assembled for 17 species of Geranium revealed substantial reduction in gene and intron content relative to the ancestor of the Geranium lineage. Comparative analyses of nuclear transcriptome data suggest that a number of these sequences have been functionally relocated to the nucleus via IGT. Evidence for rampant HGT was detected in several Geranium species containing foreign organellar DNA from diverse eudicots, including many transfers from parasitic plants. One lineage has experienced multiple, independent HGT episodes, many of which occurred within the past 5.5 Myr. Both duplicative and recapture HGT were documented in Geranium lineages. The mitochondrial genome of Geranium brycei contains at least four independent HGT tracts that are absent in its nearest relative. Furthermore, G. brycei mitochondria carry two copies of the cox1 gene that differ in intron content, providing insight into contrasting hypotheses on cox1 intron evolution. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  19. Statistical Methods in Integrative Genomics

    Science.gov (United States)

    Richardson, Sylvia; Tseng, George C.; Sun, Wei

    2016-01-01

    Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531
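
    Horizontal integration, as described above, aggregates the same type of data across studies. One classical way to do this is to combine per-study p-values for the same gene; the sketch below uses Fisher's combined probability test as an illustrative choice. The abstract does not name this method, and the p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null; for even degrees of freedom
    the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)
    term = 1.0
    total = 1.0
    for i in range(1, len(p_values)):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical p-values for one gene from three independent studies.
study_p = [0.04, 0.03, 0.06]
print(f"combined p = {fisher_combined_p(study_p):.4g}")
```

    Fisher's method assumes the studies are independent; when they share samples or controls, the combined p-value from this sketch would be too optimistic.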

  20. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea

    Directory of Open Access Journals (Sweden)

    Wolf Yuri I

    2007-11-01

    Background: An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. Results: New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover ~88% of the genes in a genome compared to a ~76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; ~40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. It is inferred that LACA was a chemoautotrophic hyperthermophile

  1. Whole Genome Sequences of Three Treponema pallidum ssp. pertenue Strains: Yaws and Syphilis Treponemes Differ in Less than 0.2% of the Genome Sequence

    Science.gov (United States)

    Chen, Lei; Pospíšilová, Petra; Strouhal, Michal; Qin, Xiang; Mikalová, Lenka; Norris, Steven J.; Muzny, Donna M.; Gibbs, Richard A.; Fulton, Lucinda L.; Sodergren, Erica; Weinstock, George M.; Šmajs, David

    2012-01-01

    Background: The yaws treponemes, Treponema pallidum ssp. pertenue (TPE) strains, are closely related to syphilis-causing strains of Treponema pallidum ssp. pallidum (TPA). Both yaws and syphilis are distinguished on the basis of epidemiological characteristics, clinical symptoms, and several genetic signatures of the corresponding causative agents. Methodology/Principal Findings: To precisely define genetic differences between TPA and TPE, high-quality whole genome sequences of three TPE strains (Samoa D, CDC-2, Gauthier) were determined using next-generation sequencing techniques. TPE genome sequences were compared to four genomes of TPA strains (Nichols, DAL-1, SS14, Chicago). The genome structure was identical in all three TPE strains with similar length ranging between 1,139,330 bp and 1,139,744 bp. No major genome rearrangements were found when compared to the four TPA genomes. The whole genome nucleotide divergence (dA) between TPA and TPE subspecies was 4.7 and 4.8 times higher than the observed nucleotide diversity (π) among TPA and TPE strains, respectively, corresponding to 99.8% identity between TPA and TPE genomes. A set of 97 (9.9%) TPE genes encoded proteins containing two or more amino acid replacements or other major sequence changes. The TPE divergent genes were mostly from the group encoding potential virulence factors and genes encoding proteins with unknown function. Conclusions/Significance: Hypothetical genes, with genetic differences, consistently found between TPE and TPA strains are candidates for syphilitic treponemes virulence factors. Seventeen TPE genes were predicted under positive selection, and eleven of them coded either for predicted exported proteins or membrane proteins suggesting their possible association with the cell surface. Sequence changes between TPE and TPA strains and changes specific to individual strains represent suitable targets for subspecies- and strain-specific molecular diagnostics. PMID:22292095
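
    The comparison of between-subspecies divergence with within-subspecies nucleotide diversity (π) can be sketched on toy data. The example below is a simplified illustration, not the paper's calculation: the aligned sequences are invented, gapped columns are simply skipped, and the between-group measure is a plain mean of pairwise differences, which may differ from the dA statistic the authors report.

```python
from itertools import combinations, product


def pairwise_difference(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue  # skip alignment gaps
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return diffs / compared


def nucleotide_diversity(seqs):
    """Mean pairwise difference within one group (pi)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


def mean_between_group_difference(group_a, group_b):
    """Mean pairwise difference over all cross-group sequence pairs."""
    pairs = list(product(group_a, group_b))
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


# Invented 20-nt aligned fragments standing in for two subspecies.
group_1 = ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA", "ACGTACGTACGTACGTACGT"]
group_2 = ["ACGTTCGTACGAACGTACGT", "ACGTTCGTACGAACGTACGT", "ACGTTCGTACGAACGTCCGT"]

pi_1 = nucleotide_diversity(group_1)
between = mean_between_group_difference(group_1, group_2)
print(f"pi (group 1) = {pi_1:.4f}")
print(f"between-group difference = {between:.4f}")
print(f"mean identity between groups = {100 * (1 - between):.1f}%")
```

    When the between-group value is several times larger than π within each group, as the abstract reports for TPA and TPE, the two groups form distinct clusters even if overall identity remains very high.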

  2. Investigating the Relatedness of Enteroinvasive Escherichia coli to Other E. coli and Shigella Isolates by Using Comparative Genomics.

    Science.gov (United States)

    Hazen, Tracy H; Leonard, Susan R; Lampel, Keith A; Lacher, David W; Maurelli, Anthony T; Rasko, David A

    2016-08-01

    Enteroinvasive Escherichia coli (EIEC) is a unique pathovar that has a pathogenic mechanism nearly indistinguishable from that of Shigella species. In contrast to isolates of the four Shigella species, which are widespread and can be frequent causes of human illness, EIEC causes far fewer reported illnesses each year. In this study, we analyzed the genome sequences of 20 EIEC isolates, including 14 first described in this study. Phylogenomic analysis of the EIEC genomes demonstrated that 17 of the isolates are present in three distinct lineages that contained only EIEC genomes, compared to reference genomes from each of the E. coli pathovars and Shigella species. Comparative genomic analysis identified genes that were unique to each of the three identified EIEC lineages. While many of the EIEC lineage-specific genes have unknown functions, those with predicted functions included a colicin and putative proteins involved in transcriptional regulation or carbohydrate metabolism. In silico detection of the Shigella virulence plasmid (pINV), which is essential for the invasion of host cells, demonstrated that a form of pINV was present in nearly all EIEC genomes, but the Mxi-Spa-Ipa region of the plasmid that encodes the invasion-associated proteins was absent from several of the EIEC isolates. The comparative genomic findings in this study support the hypothesis that multiple EIEC lineages have evolved independently from multiple distinct lineages of E. coli via the acquisition of the Shigella virulence plasmid and, in some cases, the Shigella pathogenicity islands. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  3. Analysis of Multiple Genomic Sequence Alignments: A Web Resource, Online Tools, and Lessons Learned From Analysis of Mammalian SCL Loci

    Science.gov (United States)

    Chapman, Michael A.; Donaldson, Ian J.; Gilbert, James; Grafham, Darren; Rogers, Jane; Green, Anthony R.; Göttgens, Berthold

    2004-01-01

    Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments. PMID:14718377

  4. Survey of the rubber tree genome reveals a high number of cysteine protease-encoding genes homologous to Arabidopsis SAG12.

    Science.gov (United States)

    Zou, Zhi; Liu, Jianting; Yang, Lifu; Xie, Guishui

    2017-01-01

    Arabidopsis thaliana SAG12, a senescence-specific gene encoding a cysteine protease, is widely used as a molecular marker for the study of leaf senescence. To date, its potential orthologues have been isolated from several plant species such as Brassica napus and Nicotiana tabacum. However, little information is available in rubber tree (Hevea brasiliensis), a rubber-producing plant of the Euphorbiaceae family. This study presents the identification of SAG12-like genes from the rubber tree genome. Results showed that an unexpectedly high number of 17 rubber orthologues with a single intron were found, in contrast to the single copy with two introns in Arabidopsis. The gene expansion was also observed in two other Euphorbiaceae plants, castor bean (Ricinus communis) and physic nut (Jatropha curcas), both of which contain 8 orthologues. Consistent with the absence of recent whole-genome duplication (WGD) events, most duplicates in castor and physic nut resulted from tandem duplications. In contrast, the duplicated HbSAG12H genes were derived from tandem duplications as well as the recent WGD. Expression analysis showed that most HbSAG12H genes were expressed at low levels in examined tissues except for root and male flower. Furthermore, HbSAG12H1 exhibits a strictly senescence-associated expression pattern in rubber tree leaves, and thus can be used as a marker gene for the study of senescence mechanism in Hevea.

  5. The Calyptogena magnifica chemoautotrophic symbiont genome

    Energy Technology Data Exchange (ETDEWEB)

    Newton, I.L.; Woyke, T.; Auchtung, T.A.; Dilly, G.F.; Dutton,R.J.; Fisher, M.C.; Fontanez, K.M.; Lau, E.; Stewart, F.J.; Richardson,P.M.; Barry, K.W.; Saunders, E.; Detter, J.C.; Wu, D.; Eisen, J.A.; Cavanaugh, C.M.

    2007-03-01

    Chemoautotrophic endosymbionts are the metabolic cornerstone of hydrothermal vent communities, providing invertebrate hosts with nearly all of their nutrition. The Calyptogena magnifica (Bivalvia: Vesicomyidae) symbiont, Candidatus Ruthia magnifica, is the first intracellular sulfur-oxidizing endosymbiont to have its genome sequenced, revealing a suite of metabolic capabilities. The genome encodes major chemoautotrophic pathways as well as pathways for biosynthesis of vitamins, cofactors, and all 20 amino acids required by the clam.

  6. Evolution and Morphogenesis of Simulated Modular Robots: A Comparison Between a Direct and Generative Encoding

    DEFF Research Database (Denmark)

    Veenstra, Frank; Faina, Andres; Risi, Sebastian

    2017-01-01

    Modular robots offer an important benefit in evolutionary robotics, which is to quickly evaluate evolved morphologies and control systems in reality. However, artificial evolution of simulated modular robotics is a difficult and time consuming task requiring significant computational power. While artificial...... evolution in virtual creatures has made use of powerful generative encodings, here we investigate how a generative encoding and direct encoding compare for the evolution of locomotion in modular robots when the number of robotic modules changes. Simulating less modules would decrease the size of the genome...

  7. Multiple Models for Rosaceae Genomics[OA]

    Science.gov (United States)

    Shulaev, Vladimir; Korban, Schuyler S.; Sosinski, Bryon; Abbott, Albert G.; Aldwinckle, Herb S.; Folta, Kevin M.; Iezzoni, Amy; Main, Dorrie; Arús, Pere; Dandekar, Abhaya M.; Lewers, Kim; Brown, Susan K.; Davis, Thomas M.; Gardiner, Susan E.; Potter, Daniel; Veilleux, Richard E.

    2008-01-01

    The plant family Rosaceae consists of over 100 genera and 3,000 species that include many important fruit, nut, ornamental, and wood crops. Members of this family provide high-value nutritional foods and contribute desirable aesthetic and industrial products. Most rosaceous crops have been enhanced by human intervention through sexual hybridization, asexual propagation, and genetic improvement since ancient times, 4,000 to 5,000 B.C. Modern breeding programs have contributed to the selection and release of numerous cultivars having significant economic impact on the U.S. and world markets. In recent years, the Rosaceae community, both in the United States and internationally, has benefited from newfound organization and collaboration that have hastened progress in developing genetic and genomic resources for representative crops such as apple (Malus spp.), peach (Prunus spp.), and strawberry (Fragaria spp.). These resources, including expressed sequence tags, bacterial artificial chromosome libraries, physical and genetic maps, and molecular markers, combined with genetic transformation protocols and bioinformatics tools, have rendered various rosaceous crops highly amenable to comparative and functional genomics studies. This report serves as a synopsis of the resources and initiatives of the Rosaceae community, recent developments in Rosaceae genomics, and plans to apply newly accumulated knowledge and resources toward breeding and crop improvement. PMID:18487361

  8. Long- and short-term selective forces on malaria parasite genomes

    DEFF Research Database (Denmark)

    Nygaard, Sanne; Braunstein, Alexander; Malsen, Gareth

    2010-01-01

    Plasmodium parasites, the causal agents of malaria, result in more than 1 million deaths annually. Plasmodium are unicellular eukaryotes with small ~23 Mb genomes encoding ~5200 protein-coding genes. The protein-coding genes comprise about half of these genomes. Although evolutionary processes ha...

  9. The Carcinogenic Liver Fluke, Clonorchis sinensis: New Assembly, Reannotation and Analysis of the Genome and Characterization of Tissue Transcriptomes

    Science.gov (United States)

    Wang, Xiaoyun; Liu, Hailiang; Chen, Yangyi; Guo, Lei; Luo, Fang; Sun, Jiufeng; Mao, Qiang; Liang, Pei; Xie, Zhizhi; Zhou, Chenhui; Tian, Yanli; Lv, Xiaoli; Huang, Lisi; Zhou, Juanjuan; Hu, Yue; Li, Ran; Zhang, Fan; Lei, Huali; Li, Wenfang; Hu, Xuchu; Liang, Chi; Xu, Jin; Li, Xuerong; Yu, Xinbing

    2013-01-01

    Clonorchis sinensis (C. sinensis), an important food-borne parasite that inhabits the intrahepatic bile duct and causes clonorchiasis, is of interest to both the public health field and the scientific research community. To learn more about the migration, parasitism and pathogenesis of C. sinensis at the molecular level, the present study developed an upgraded genomic assembly and annotation by sequencing paired-end and mate-paired libraries. We also performed transcriptome sequence analyses on multiple C. sinensis tissues (sucker, muscle, ovary and testis). Genes encoding molecules involved in responses to stimuli and muscle-related development were abundantly expressed in the oral sucker. Compared with other species, genes encoding molecules that facilitate the recognition and transport of cholesterol were observed in high copy numbers in the genome and were highly expressed in the oral sucker. Genes encoding transporters for fatty acids, glucose, amino acids and oxygen were also highly expressed, along with other molecules involved in metabolizing these substrates. All genes involved in energy metabolism pathways, including the β-oxidation of fatty acids, the citrate cycle, oxidative phosphorylation, and fumarate reduction, were expressed in the adults. Finally, we also provide valuable insights into the mechanism underlying the process of pathogenesis by characterizing the secretome of C. sinensis. The characterization and elaborate analysis of the upgraded genome and the tissue transcriptomes not only form a detailed and fundamental C. sinensis resource but also provide novel insights into the physiology and pathogenesis of C. sinensis. We anticipate that this work will aid the development of innovative strategies for the prevention and control of clonorchiasis. PMID:23382950

  10. The carcinogenic liver fluke, Clonorchis sinensis: new assembly, reannotation and analysis of the genome and characterization of tissue transcriptomes.

    Directory of Open Access Journals (Sweden)

    Yan Huang

    Full Text Available Clonorchis sinensis (C. sinensis), an important food-borne parasite that inhabits the intrahepatic bile duct and causes clonorchiasis, is of interest to both the public health field and the scientific research community. To learn more about the migration, parasitism and pathogenesis of C. sinensis at the molecular level, the present study developed an upgraded genomic assembly and annotation by sequencing paired-end and mate-paired libraries. We also performed transcriptome sequence analyses on multiple C. sinensis tissues (sucker, muscle, ovary and testis). Genes encoding molecules involved in responses to stimuli and muscle-related development were abundantly expressed in the oral sucker. Compared with other species, genes encoding molecules that facilitate the recognition and transport of cholesterol were observed in high copy numbers in the genome and were highly expressed in the oral sucker. Genes encoding transporters for fatty acids, glucose, amino acids and oxygen were also highly expressed, along with other molecules involved in metabolizing these substrates. All genes involved in energy metabolism pathways, including the β-oxidation of fatty acids, the citrate cycle, oxidative phosphorylation, and fumarate reduction, were expressed in the adults. Finally, we also provide valuable insights into the mechanism underlying the process of pathogenesis by characterizing the secretome of C. sinensis. The characterization and elaborate analysis of the upgraded genome and the tissue transcriptomes not only form a detailed and fundamental C. sinensis resource but also provide novel insights into the physiology and pathogenesis of C. sinensis. We anticipate that this work will aid the development of innovative strategies for the prevention and control of clonorchiasis.

  11. WormBase: Annotating many nematode genomes.

    Science.gov (United States)

    Howe, Kevin; Davis, Paul; Paulini, Michael; Tuli, Mary Ann; Williams, Gary; Yook, Karen; Durbin, Richard; Kersey, Paul; Sternberg, Paul W

    2012-01-01

    WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.

  12. An evaluation of multiple annealing and looping based genome amplification using a synthetic bacterial community

    KAUST Repository

    Wang, Yong

    2016-02-23

    The low biomass in environmental samples is a major challenge for microbial metagenomic studies. Amplification of genomic DNA is frequently applied to meet the minimum DNA requirement of high-throughput next-generation sequencing technologies. Using a synthetic bacterial community, the amplification efficiency of the Multiple Annealing and Looping Based Amplification Cycles (MALBAC) kit, which was originally developed to amplify the single-cell genomic DNA of mammalian organisms, is examined. A DNA template of 10 pg in each MALBAC reaction may generate enough DNA for Illumina sequencing. Using 10 pg and 100 pg templates for each reaction set, the MALBAC kit shows stable and homogeneous amplification, as indicated by the highly consistent coverage of the reads from the two amplified samples on the contigs assembled from the original unamplified sample. Although the GenomePlex whole genome amplification kit allows one to generate enough DNA using 100 pg of template in each reaction, the minority members of the mixed bacterial species are not linearly amplified. For both kits, the GC-rich regions of the genomic DNA are not efficiently amplified, as suggested by the low coverage of contigs with high GC content. The results support the high efficiency of the MALBAC kit for amplifying environmental microbial DNA samples, while raising concerns about its application to bacterial species with high GC content.

  13. Evidence for site-specific occupancy of the mitochondrial genome by nuclear transcription factors.

    Directory of Open Access Journals (Sweden)

    Georgi K Marinov

    Full Text Available Mitochondria contain their own circular genome, with mitochondria-specific transcription and replication systems and corresponding regulatory proteins. All of these proteins are encoded in the nuclear genome and are post-translationally imported into mitochondria. In addition, several nuclear transcription factors have been reported to act in mitochondria, but there has been no comprehensive mapping of their occupancy patterns and it is not clear how many other factors may also be found in mitochondria. Here we address these questions by using ChIP-seq data from the ENCODE, mouseENCODE and modENCODE consortia for 151 human, 31 mouse and 35 C. elegans factors. We identified 8 human and 3 mouse transcription factors with strong localized enrichment over the mitochondrial genome that was usually associated with the corresponding recognition sequence motif. Notably, these sites of occupancy are often the sites with highest ChIP-seq signal intensity within both the nuclear and mitochondrial genomes and are thus best explained as true binding events to mitochondrial DNA, which exists in high copy number in each cell. We corroborated these findings by immunocytochemical staining evidence for mitochondrial localization. However, we were unable to find clear evidence for mitochondrial binding in ENCODE and other publicly available ChIP-seq data for most factors previously reported to localize there. As the first global analysis of nuclear transcription factors binding in mitochondria, this work opens the door to future studies that probe the functional significance of the phenomenon.

  14. Klebsiella pneumoniae asparagine tDNAs are integration hotspots for different genomic islands encoding microcin E492 production determinants and other putative virulence factors present in hypervirulent strains

    Directory of Open Access Journals (Sweden)

    Andrés Esteban Marcoleta

    2016-06-01

    Full Text Available Due to the development of multi-resistant and invasive hypervirulent strains, Klebsiella pneumoniae has become one of the most urgent bacterial pathogen threats in recent years. Genomic comparison of a growing number of sequenced isolates has allowed the identification of putative virulence factors, proposed to be acquirable mainly through horizontal gene transfer. In particular, those related to the synthesis of the antibacterial peptide microcin E492 (MccE492) and salmochelin siderophores were found to be highly prevalent among hypervirulent strains. The determinants for the production of both molecules were first reported as part of a 13-kbp segment of the K. pneumoniae RYC492 chromosome, and were cloned and characterized in E. coli. However, the genomic context of this segment in K. pneumoniae remained uncharacterized. In this work we provide experimental and bioinformatics evidence indicating that the MccE492 cluster is part of a highly conserved 23-kbp genomic island (GI) named GIE492, which was integrated in a specific asparagine-tRNA gene (asn-tDNA) and was found in a high proportion of isolates from liver abscesses sampled around the world. This element proved to be unstable, and its excision frequency increased after treating bacteria with mitomycin C and upon overexpression of the island-encoded integrase. Besides the MccE492 genetic cluster, it invariably included an integrase-coding gene, at least 7 protein-coding genes of unknown function, and a putative transfer origin that possibly allows this GI to be mobilized through conjugation. In addition, we analyzed the asn-tDNA loci of all the available K. pneumoniae assembled chromosomes to evaluate them as GI-integration sites. Remarkably, 73% of the strains harbored at least one GI integrated in one of the four asn-tDNAs present in this species, confirming them as integration hotspots. Each of these tDNAs was occupied with a different frequency, although they were 100% identical. 
Also, we

  15. Genome Sequence of Azospirillum brasilense CBG497 and Comparative Analyses of Azospirillum Core and Accessory Genomes provide Insight into Niche Adaptation

    Science.gov (United States)

    Wisniewski-Dyé, Florence; Lozano, Luis; Acosta-Cruz, Erika; Borland, Stéphanie; Drogue, Benoît; Prigent-Combaret, Claire; Rouy, Zoé; Barbe, Valérie; Mendoza Herrera, Alberto; González, Victor; Mavingui, Patrick

    2012-01-01

    Bacteria of the genus Azospirillum colonize roots of important cereals and grasses, and promote plant growth by several mechanisms, notably phytohormone synthesis. The genomes of several Azospirillum strains belonging to different species, isolated from various host plants and locations, were recently sequenced and published. In this study, an additional genome of an A. brasilense strain, isolated from maize grown on an alkaline soil in the northeast of Mexico, strain CBG497, was obtained. Comparative genomic analyses were performed on this new genome and three other genomes (A. brasilense Sp245, A. lipoferum 4B and Azospirillum sp. B510). The Azospirillum core genome was established and consists of 2,328 proteins, representing between 30% and 38% of the total encoded proteins within a genome. It is mainly chromosomally encoded and contains 74% of genes of ancestral origin shared with some aquatic relatives. The non-ancestral part of the core genome is enriched in genes involved in signal transduction, in transport and metabolism of carbohydrates and amino acids, and in surface properties linked to adaptation to fluctuating environments, such as soil and rhizosphere. Many genes involved in colonization of plant roots, plant-growth promotion (such as those involved in phytohormone biosynthesis), and properties involved in rhizosphere adaptation (such as catabolism of phenolic compounds, uptake of iron) are restricted to a particular strain and/or species, strongly suggesting niche-specific adaptation. PMID:24705077
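
    The proportions quoted in this abstract allow a rough back-of-envelope check. The sketch below is not part of the study; it only rearranges the two reported figures (2,328 core proteins, a 30% to 38% share) to estimate the proteome sizes they imply. The function name is hypothetical, and the resulting totals are derived estimates, not values reported by the authors.

```python
# Figures taken from the abstract above.
CORE_PROTEINS = 2328
CORE_SHARE_RANGE = (0.30, 0.38)  # core genome as a fraction of each proteome


def implied_proteome_size(core_count: int, core_share: float) -> float:
    """Total protein count implied by a core size and the share it represents."""
    if not 0 < core_share <= 1:
        raise ValueError("core_share must be in the interval (0, 1]")
    return core_count / core_share


# A smaller share means a larger proteome, and vice versa.
largest = implied_proteome_size(CORE_PROTEINS, CORE_SHARE_RANGE[0])
smallest = implied_proteome_size(CORE_PROTEINS, CORE_SHARE_RANGE[1])
print(f"implied proteome sizes: about {round(smallest)} to {round(largest)} proteins")
```

    Under these assumptions the compared genomes would encode roughly 6,100 to 7,800 proteins each.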

  16. Genome Sequence of Azospirillum brasilense CBG497 and Comparative Analyses of Azospirillum Core and Accessory Genomes provide Insight into Niche Adaptation

    Directory of Open Access Journals (Sweden)

    Victor González

    2012-09-01

    Full Text Available Bacteria of the genus Azospirillum colonize roots of important cereals and grasses, and promote plant growth by several mechanisms, notably phytohormone synthesis. The genomes of several Azospirillum strains belonging to different species, isolated from various host plants and locations, were recently sequenced and published. In this study, an additional genome of an A. brasilense strain, isolated from maize grown on an alkaline soil in the northeast of Mexico, strain CBG497, was obtained. Comparative genomic analyses were performed on this new genome and three other genomes (A. brasilense Sp245, A. lipoferum 4B and Azospirillum sp. B510). The Azospirillum core genome was established and consists of 2,328 proteins, representing between 30% and 38% of the total encoded proteins within a genome. It is mainly chromosomally encoded and contains 74% of genes of ancestral origin shared with some aquatic relatives. The non-ancestral part of the core genome is enriched in genes involved in signal transduction, in transport and metabolism of carbohydrates and amino acids, and in surface properties linked to adaptation to fluctuating environments, such as soil and rhizosphere. Many genes involved in colonization of plant roots, plant-growth promotion (such as those involved in phytohormone biosynthesis), and properties involved in rhizosphere adaptation (such as catabolism of phenolic compounds, uptake of iron) are restricted to a particular strain and/or species, strongly suggesting niche-specific adaptation.

  17. Deep sequencing of foot-and-mouth disease virus reveals RNA sequences involved in genome packaging.

    Science.gov (United States)

    Logan, Grace; Newman, Joseph; Wright, Caroline F; Lasecka-Dykes, Lidia; Haydon, Daniel T; Cottam, Eleanor M; Tuthill, Tobias J

    2017-10-18

    Non-enveloped viruses protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. Packaging and capsid assembly in RNA viruses can involve interactions between capsid proteins and secondary structures in the viral genome, as exemplified by the RNA bacteriophage MS2 and as proposed for other RNA viruses of plants, animals and humans. In the picornavirus family of non-enveloped RNA viruses, the requirements for genome packaging remain poorly understood. Here we show a novel and simple approach to identify predicted RNA secondary structures involved in genome packaging in the picornavirus foot-and-mouth disease virus (FMDV). By interrogating deep sequencing data generated from both packaged and unpackaged populations of RNA, we have determined multiple regions of the genome with constrained variation in the packaged population. Predicted secondary structures of these regions revealed stem loops with conservation of structure and a common motif at the loop. Disruption of these features resulted in attenuation of virus growth in cell culture due to a reduction in assembly of mature virions. This study provides evidence for the involvement of predicted RNA structures in picornavirus packaging and offers a readily transferable methodology for identifying packaging requirements in many other viruses. Importance: In order to transmit their genetic material to a new host, non-enveloped viruses must protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. For many non-enveloped RNA viruses the requirements for this critical part of the viral life cycle remain poorly understood. We have identified RNA sequences involved in genome packaging of the picornavirus foot-and-mouth disease virus. This virus causes an economically devastating disease of livestock affecting both the developed and developing world. The experimental methods developed to carry out this work are novel, simple and transferable to the

  18. Prostate cancer risk locus at 8q24 as a regulatory hub by physical interactions with multiple genomic loci across the genome.

    Science.gov (United States)

    Du, Meijun; Yuan, Tiezheng; Schilter, Kala F; Dittmar, Rachel L; Mackinnon, Alexander; Huang, Xiaoyi; Tschannen, Michael; Worthey, Elizabeth; Jacob, Howard; Xia, Shu; Gao, Jianzhong; Tillmans, Lori; Lu, Yan; Liu, Pengyuan; Thibodeau, Stephen N; Wang, Liang

    2015-01-01

    The chromosome 8q24 locus contains regulatory variants that modulate genetic risk for various cancers, including prostate cancer (PC). However, the biological mechanism underlying this regulation is not well understood. Here, we developed a chromosome conformation capture (3C)-based multi-target sequencing technology and systematically examined three PC risk regions at the 8q24 locus and their potential regulatory targets across the human genome in six cell lines. We observed frequent physical contacts of this risk locus with multiple genomic regions, in particular, inter-chromosomal interaction with CD96 at 3q13 and intra-chromosomal interaction with MYC at 8q24. We identified at least five interaction hot spots within the predicted functional regulatory elements at the 8q24 risk locus. We also found intra-chromosomal interaction genes PVT1, FAM84B and GSDMC and inter-chromosomal interaction gene CXorf36 in most of the six cell lines. Other gene regions appeared to be cell line-specific, such as RRP12 in LNCaP, USP14 in DU-145 and SMIN3 in a lymphoblastoid cell line. We further found that the 8q24 functional domains were more likely to interact with genomic regions containing genes enriched in critical pathways such as Wnt signaling and promoter motifs such as E2F1 and TCF3. This result suggests that the risk locus may function as a regulatory hub through physical interactions with multiple genes important for prostate carcinogenesis. Further understanding of the genetic effects and biological mechanisms of these chromatin interactions will shed light on the newly discovered regulatory role of the risk locus in PC etiology and progression. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  19. Draft Genome Sequence of Ezakiella peruensis Strain M6.X2, a Human Gut Gram-Positive Anaerobic Coccus.

    Science.gov (United States)

    Diop, Awa; Diop, Khoudia; Tomei, Enora; Raoult, Didier; Fenollar, Florence; Fournier, Pierre-Edouard

    2018-03-01

    We report here the draft genome sequence of Ezakiella peruensis strain M6.X2T. The draft genome is 1,672,788 bp long and harbors 1,589 predicted protein-encoding genes, including 26 antibiotic resistance genes with 1 gene encoding vancomycin resistance. The genome also exhibits 1 clustered regularly interspaced short palindromic repeat region and 333 genes acquired by horizontal gene transfer. Copyright © 2018 Diop et al.

  20. Evaluation of multiple approaches to identify genome-wide polymorphisms in closely related genotypes of sweet cherry (Prunus avium L.)

    Directory of Open Access Journals (Sweden)

    Seanna Hewitt

    Full Text Available Identification of genetic polymorphisms and subsequent development of molecular markers is important for marker assisted breeding of superior cultivars of economically important species. Sweet cherry (Prunus avium L.) is an economically important non-climacteric tree fruit crop in the Rosaceae family and has undergone a genetic bottleneck due to breeding, resulting in limited genetic diversity in the germplasm that is utilized for breeding new cultivars. Therefore, it is critical to recognize the best platforms for identifying genome-wide polymorphisms that can help identify, and consequently preserve, the diversity in a genetically constrained species. For the identification of polymorphisms in five closely related genotypes of sweet cherry, a gel-based approach (TRAP), reduced representation sequencing (TRAPseq), a 6k cherry SNParray, and whole genome sequencing (WGS) approaches were evaluated in the identification of genome-wide polymorphisms in sweet cherry cultivars. All platforms facilitated detection of polymorphisms among the genotypes with variable efficiency. In assessing multiple SNP detection platforms, this study has demonstrated that a combination of appropriate approaches is necessary for efficient polymorphism identification, especially between closely related cultivars of a species. The information generated in this study provides a valuable resource for future genetic and genomic studies in sweet cherry, and the insights gained from the evaluation of multiple approaches can be utilized for other closely related species with limited genetic diversity in the breeding germplasm. Keywords: Polymorphisms, Prunus avium, Next-generation sequencing, Target region amplification polymorphism (TRAP), Genetic diversity, SNParray, Reduced representation sequencing, Whole genome sequencing (WGS)

  1. Reduced representation approaches to interrogate genome diversity in large repetitive plant genomes.

    Science.gov (United States)

    Hirsch, Cory D; Evans, Joseph; Buell, C Robin; Hirsch, Candice N

    2014-07-01

    Technology and software improvements in the last decade now provide methodologies to access the genome sequence of not only a single accession, but also multiple accessions of plant species. This provides a means to interrogate species diversity at the genome level. Ample diversity among accessions in a collection of species can be found, including single-nucleotide polymorphisms, insertions and deletions, copy number variation and presence/absence variation. For species with small, non-repetitive rich genomes, re-sequencing of query accessions is robust, highly informative, and economically feasible. However, for species with moderate to large sized repetitive-rich genomes, technical and economic barriers prevent en masse genome re-sequencing of accessions. Multiple approaches to access a focused subset of loci in species with larger genomes have been developed, including reduced representation sequencing, exome capture and transcriptome sequencing. Collectively, these approaches have enabled interrogation of diversity on a genome scale for large plant genomes, including crop species important to worldwide food security. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  2. Widespread occurrence of organelle genome-encoded 5S rRNAs including permuted molecules.

    Science.gov (United States)

    Valach, Matus; Burger, Gertraud; Gray, Michael W; Lang, B Franz

    2014-12-16

    5S Ribosomal RNA (5S rRNA) is a universal component of ribosomes, and the corresponding gene is easily identified in archaeal, bacterial and nuclear genome sequences. However, organelle gene homologs (rrn5) appear to be absent from most mitochondrial and several chloroplast genomes. Here, we re-examine the distribution of organelle rrn5 by building mitochondrion- and plastid-specific covariance models (CMs) with which we screened organelle genome sequences. We not only recover all organelle rrn5 genes annotated in GenBank records, but also identify more than 50 previously unrecognized homologs in mitochondrial genomes of various stramenopiles, red algae, cryptomonads, malawimonads and apusozoans, and surprisingly, in the apicoplast (highly derived plastid) genomes of the coccidian pathogens Toxoplasma gondii and Eimeria tenella. Comparative modeling of RNA secondary structure reveals that mitochondrial 5S rRNAs from brown algae adopt a permuted triskelion shape that has not been seen elsewhere. Expression of the newly predicted rrn5 genes is confirmed experimentally in 10 instances, based on our own and published RNA-Seq data. This study establishes that particularly mitochondrial 5S rRNA has a much broader taxonomic distribution and a much larger structural variability than previously thought. The newly developed CMs will be made available via the Rfam database and the MFannot organelle genome annotator. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Genome-based insights into the resistome and mobilome of multidrug-resistant Aeromonas sp. ARM81 isolated from wastewater.

    Science.gov (United States)

    Adamczuk, Marcin; Dziewit, Lukasz

    2017-01-01

    The draft genome of multidrug-resistant Aeromonas sp. ARM81 isolated from a wastewater treatment plant in Warsaw (Poland) was obtained. Sequence analysis revealed multiple genes conferring resistance to aminoglycosides, β-lactams or tetracycline. Three different β-lactamase genes were identified, including an extended-spectrum β-lactamase gene blaPER-1. The antibiotic susceptibility was experimentally tested. Genome sequencing also allowed us to investigate the plasmidome and transposable mobilome of ARM81. Four plasmids, of which two carry phenotypic modules (i.e., genes encoding a zinc transporter ZitB and a putative glucosyltransferase), and 28 putative transposase genes were identified. The mobility of three insertion sequences (isoforms of previously identified elements ISAs12, ISKpn9 and ISAs26) was confirmed using trap plasmids.

  4. An Emerging Tick-Borne Disease of Humans Is Caused by a Subset of Strains with Conserved Genome Structure

    Science.gov (United States)

    Barbet, Anthony F.; Al-Khedery, Basima; Stuen, Snorre; Granquist, Erik G.; Felsheim, Roderick F.; Munderloh, Ulrike G.

    2013-01-01

    The prevalence of tick-borne diseases is increasing worldwide. One such emerging disease is human anaplasmosis. The causative organism, Anaplasma phagocytophilum, is known to infect multiple animal species and cause human fatalities in the U.S., Europe and Asia. Although long known to infect ruminants, it is unclear why there are increasing numbers of human infections. We analyzed the genome sequences of strains infecting humans, animals and ticks from diverse geographic locations. Despite extensive variability amongst these strains, those infecting humans had conserved genome structure including the pfam01617 superfamily that encodes the major, neutralization-sensitive, surface antigen. These data provide potential targets to identify human-infective strains and have significance for understanding the selective pressures that lead to emergence of disease in new species. PMID:25437207

  5. Diversity of 23S rRNA genes within individual prokaryotic genomes.

    Directory of Open Access Journals (Sweden)

    Anna Pei

    Full Text Available BACKGROUND: The concept of ribosomal constraints on rRNA genes is deduced primarily based on the comparison of consensus rRNA sequences between closely related species, but recent advances in whole-genome sequencing allow evaluation of this concept within organisms with multiple rRNA operons. METHODOLOGY/PRINCIPAL FINDINGS: Using the 23S rRNA gene as an example, we analyzed the diversity among individual rRNA genes within a genome. Of 184 prokaryotic species containing multiple 23S rRNA genes, diversity was observed in 113 (61.4%) genomes (mean 0.40%, range 0.01%-4.04%). Significant (1.17%-4.04%) intragenomic variation was found in 8 species. In 5 of the 8 species, the diversity in the primary structure had only minimal effect on the secondary structure (stem versus loop transition). In the remaining 3 species, the diversity significantly altered local secondary structure, but the alteration appears minimized through complex rearrangement. Intervening sequences (IVS), ranging between 9 and 1471 nt in size, were found in 7 species. IVS in Deinococcus radiodurans and Nostoc sp. encode transposases. T. tengcongensis was the only species in which intragenomic diversity >3% was observed among 4 paralogous 23S rRNA genes. CONCLUSIONS/SIGNIFICANCE: These findings indicate tight ribosomal constraints on individual 23S rRNA genes within a genome. Although classification using primary 23S rRNA sequences could be erroneous, significant diversity among paralogous 23S rRNA genes was observed only once in the 184 species analyzed, indicating little overall impact on the mainstream of 23S rRNA gene-based prokaryotic taxonomy.
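
    To make the notion of intragenomic diversity concrete, here is a minimal sketch of one way to summarize divergence among paralogous gene copies: the percentage of aligned positions that differ, averaged over all pairs of copies. This is an illustrative assumption, not the authors' pipeline; the abstract does not state how diversity was computed. The function names and the toy sequences are hypothetical, and the copies are assumed to be pre-aligned to equal length.

```python
from itertools import combinations


def percent_difference(a: str, b: str) -> float:
    """Percentage of aligned columns at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # ignore columns that are gaps in both copies
        compared += 1
        if x != y:
            mismatches += 1
    return 100.0 * mismatches / compared if compared else 0.0


def intragenomic_diversity(copies):
    """Return (mean, max) pairwise percent difference among gene copies."""
    values = [percent_difference(a, b) for a, b in combinations(copies, 2)]
    if not values:
        return 0.0, 0.0
    return sum(values) / len(values), max(values)


# Hypothetical toy alignment of three gene copies from one genome.
copies = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
mean_diff, max_diff = intragenomic_diversity(copies)

# The share quoted in the abstract: 113 of 184 species showed some diversity.
share_diverse = 100.0 * 113 / 184
print(f"mean {mean_diff:.2f}%, max {max_diff:.2f}%, share {share_diverse:.1f}%")
```

    Real 23S rRNA genes are roughly 3 kb long, so the toy percentages here are far larger than the 0.01% to 4.04% range reported above.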

  6. Genome wide identification of cotton (Gossypium hirsutum)-encoded microRNA targets against Cotton leaf curl Burewala virus.

    Science.gov (United States)

    Shweta; Akhter, Yusuf; Khan, Jawaid Ahmad

    2018-01-05

    Cotton leaf curl Burewala virus (CLCuBV, genus Begomovirus) causes devastating cotton leaf curl disease. Among the various known virus control strategies, the RNAi-mediated one has shown potential to protect host crop plants. MicroRNAs (miRNAs) are endogenous small RNAs that play a key role in plant development and stress resistance. In the present study we have identified cotton (Gossypium hirsutum)-encoded miRNAs targeting CLCuBV. Based on threshold free energy and maximum complementarity scores of host miRNA-viral mRNA target pairs, a number of potential miRNAs were annotated. Among them, ghr-miR168 was selected as the most potent candidate, capable of targeting several vital genes of the CLCuBV genome, namely C1, C3, C4, V1 and V2. In addition, ghr-miR395a and ghr-miR395d were observed to target the overlapping transcripts of the C1 and C4 genes. We have verified the efficacy of these miRNA targets against CLCuBV following suppression of RNAi-mediated virus control through translational inhibition or cleavage of viral mRNA. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Draft Genome Sequence of Bacillus velezensis B6, a Rhizobacterium That Can Control Plant Diseases.

    Science.gov (United States)

    Gao, Yu-Han; Guo, Rong-Jun; Li, Shi-Dong

    2018-03-22

    The draft genome of Bacillus velezensis strain B6, a rhizobacterium with good biocontrol performance isolated from soil in China, was sequenced. The assembly comprises 32 scaffolds with a total size of 3.88 Mb. Gene clusters coding either ribosomally encoded bacteriocins or nonribosomally encoded antimicrobial polyketides and lipopeptides in the genome may contribute to plant disease control. Copyright © 2018 Gao et al.

  8. Comparative Genomic Analysis Reveals Ecological Differentiation in the Genus Carnobacterium.

    Science.gov (United States)

    Iskandar, Christelle F; Borges, Frédéric; Taminiau, Bernard; Daube, Georges; Zagorec, Monique; Remenant, Benoît; Leisner, Jørgen J; Hansen, Martin A; Sørensen, Søren J; Mangavel, Cécile; Cailliez-Grimal, Catherine; Revol-Junelles, Anne-Marie

    2017-01-01

    Lactic acid bacteria (LAB) differ in their ability to colonize food and animal-associated habitats: while some species are specialized and colonize a limited number of habitats, others are generalists and are able to colonize multiple animal-linked habitats. In the current study, Carnobacterium was used as a model genus to elucidate the genetic basis of these colonization differences. Analyses of 16S rRNA gene meta-barcoding data showed that C. maltaromaticum followed by C. divergens are the most prevalent species in foods derived from animals (meat, fish, dairy products), and in the gut. According to phylogenetic analyses, these two animal-adapted species belong to one of two deeply branched lineages. The second lineage contains species isolated from habitats where contact with animals is rare. Genome analyses revealed that members of the animal-adapted lineage harbor a larger secretome than members of the other lineage. The predicted cell-surface proteome is highly diversified in C. maltaromaticum and C. divergens with genes involved in adaptation to the animal milieu such as those encoding biopolymer hydrolytic enzymes, a heme uptake system, and biopolymer-binding adhesins. These species also exhibit genes for gut adaptation and respiration. In contrast, Carnobacterium species belonging to the second lineage encode a poorly diversified cell-surface proteome, lack genes for gut adaptation and are unable to respire. These results shed light on the important genomic traits required for adaptation to animal-linked habitats in generalist Carnobacterium.

  9. The Caulobacter crescentus phage phiCbK: genomics of a canonical phage

    Directory of Open Access Journals (Sweden)

    Gill Jason J

    2012-10-01

    Full Text Available Abstract Background: The bacterium Caulobacter crescentus is a popular model for the study of cell cycle regulation and senescence. The large prolate siphophage phiCbK has been an important tool in C. crescentus biology, and has been studied in its own right as a model for viral morphogenesis. Although a system of some interest, to date little genomic information is available on phiCbK or its relatives. Results: Five novel phiCbK-like C. crescentus bacteriophages, CcrMagneto, CcrSwift, CcrKarma, CcrRogue and CcrColossus, were isolated from the environment. The genomes of phage phiCbK and these five environmental phage isolates were obtained by 454 pyrosequencing. The phiCbK-like phage genomes range in size from 205 kb encoding 318 proteins (phiCbK) to 280 kb encoding 448 proteins (CcrColossus), and were found to contain nonpermuted terminal redundancies of 10 to 17 kb. A novel method of terminal ligation was developed to map genomic termini, which confirmed termini predicted by coverage analysis. This suggests that sequence coverage discontinuities may be usable as predictors of genomic termini in phage genomes. Genomic modules encoding virion morphogenesis, lysis and DNA replication proteins were identified. The phiCbK-like phages were also found to encode a number of intriguing proteins; all contain a clearly T7-like DNA polymerase, and five of the six encode a possible homolog of the C. crescentus cell cycle regulator GcrA, which may allow the phage to alter the host cell’s replicative state. The structural proteome of phage phiCbK was determined, identifying the portal, major and minor capsid proteins, the tail tape measure and possible tail fiber proteins. All six phage genomes are clearly related; phiCbK, CcrMagneto, CcrSwift, CcrKarma and CcrRogue form a group related at the DNA level, while CcrColossus is more diverged but retains significant similarity at the protein level. Conclusions: Due to their lack of any apparent relationship to

  10. Simple genomes, complex interactions: Epistasis in RNA virus

    Science.gov (United States)

    Elena, Santiago F.; Solé, Ricard V.; Sardanyés, Josep

    2010-06-01

    Owing to their reduced size and the low number of proteins they encode, RNA viruses and other subviral pathogens are often considered genetically too simple. However, this structural simplicity also creates the necessity for viral RNA sequences to encode more than one protein and for proteins to carry out multiple functions, altogether resulting in complex patterns of genetic interactions. In this work we will first review the experimental studies revealing that the architecture of viral genomes is dominated by antagonistic interactions among loci. Second, we will also review mathematical models and provide a description of computational tools for the study of RNA virus dynamics and evolution. As an application of these tools, we will finish this review article by analyzing a stochastic bit-string model of in silico virus replication. This model analyzes the interplay between epistasis and the mode of replication in determining the population load of deleterious mutations. The model suggests that, for a given mutation rate, the deleterious mutational load is always larger when epistasis is predominantly antagonistic than when synergism is the rule. However, the magnitude of this effect is larger if replication occurs geometrically than if it proceeds linearly.
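
    The record gives no equations for its bit-string model, so the following is not the authors' model. It is a minimal sketch of the two ingredients the abstract names, under loudly stated assumptions: a genome is a list of bits in which 1 marks a deleterious mutation, and fitness follows the commonly used form w(k) = exp(-s * k^epsilon), where epsilon < 1 gives antagonistic epistasis and epsilon > 1 gives synergistic epistasis. All names and parameter values are hypothetical.

```python
import math
import random


def fitness(k: int, s: float = 0.1, epsilon: float = 1.0) -> float:
    """Fitness of a genome carrying k deleterious mutations.

    Assumed form: w(k) = exp(-s * k**epsilon).
    epsilon == 1 is the multiplicative (no epistasis) case,
    epsilon < 1 is antagonistic, epsilon > 1 is synergistic.
    """
    return math.exp(-s * k ** epsilon)


def mutate(genome: list[int], mu: float, rng: random.Random) -> list[int]:
    """Copy a bit-string genome, flipping each bit with probability mu."""
    return [bit ^ 1 if rng.random() < mu else bit for bit in genome]


def mutation_count(genome: list[int]) -> int:
    """Number of deleterious mutations (bits set to 1)."""
    return sum(genome)


# One replication step of a mutation-free genome with a seeded generator.
rng = random.Random(0)
parent = [0] * 20
child = mutate(parent, mu=0.05, rng=rng)
child_fitness = fitness(mutation_count(child), s=0.1, epsilon=0.5)
```

    With several mutations present, an antagonistic genotype in this sketch keeps a higher fitness than a synergistic one, so selection purges its mutations less efficiently; that is the kind of difference in mutational load the abstract discusses.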

  11. Quantum Darwinism Requires an Extra-Theoretical Assumption of Encoding Redundancy

    Science.gov (United States)

    Fields, Chris

    2010-10-01

    Observers restricted to the observation of pointer states of apparatus cannot conclusively demonstrate that the pointer of an apparatus A registers the state of a system of interest S without perturbing S. Observers cannot, therefore, conclusively demonstrate that the states of a system S are redundantly encoded by pointer states of multiple independent apparatus without destroying the redundancy of encoding. The redundancy of encoding required by quantum Darwinism must, therefore, be assumed from outside the quantum-mechanical formalism and without the possibility of experimental demonstration.

  12. A gene encoding maize caffeoyl-CoA O-methyltransferase confers quantitative resistance to multiple pathogens.

    Science.gov (United States)

    Yang, Qin; He, Yijian; Kabahuma, Mercy; Chaya, Timothy; Kelly, Amy; Borrego, Eli; Bian, Yang; El Kasmi, Farid; Yang, Li; Teixeira, Paulo; Kolkman, Judith; Nelson, Rebecca; Kolomiets, Michael; L Dangl, Jeffery; Wisser, Randall; Caplan, Jeffrey; Li, Xu; Lauter, Nick; Balint-Kurti, Peter

    2017-09-01

    Alleles that confer multiple disease resistance (MDR) are valuable in crop improvement, although the molecular mechanisms underlying their functions remain largely unknown. A quantitative trait locus, qMdr9.02, associated with resistance to three important foliar maize diseases (southern leaf blight, gray leaf spot and northern leaf blight) has been identified on maize chromosome 9. Through fine-mapping, association analysis, expression analysis, insertional mutagenesis and transgenic validation, we demonstrate that ZmCCoAOMT2, which encodes a caffeoyl-CoA O-methyltransferase associated with the phenylpropanoid pathway and lignin production, is the gene within qMdr9.02 conferring quantitative resistance to both southern leaf blight and gray leaf spot. We suggest that resistance might be caused by allelic variation at the level of both gene expression and amino acid sequence, thus resulting in differences in levels of lignin and other metabolites of the phenylpropanoid pathway and regulation of programmed cell death.

  13. Genomic Organization of Zebrafish microRNAs

    Directory of Open Access Journals (Sweden)

    Paydar Ima

    2008-05-01

    Full Text Available Abstract Background: microRNAs (miRNAs) are small (~22 nt) non-coding RNAs that regulate cell movement, specification, and development. Expression of miRNAs is highly regulated, both spatially and temporally. Based on direct cloning, sequence conservation, and predicted secondary structures, a large number of miRNAs have been identified in higher eukaryotic genomes but whether these RNAs are simply a subset of a much larger number of noncoding RNA families is unknown. This is especially true in zebrafish where genome sequencing and annotation is not yet complete. Results: We analyzed the zebrafish genome to identify the number and location of proven and predicted miRNAs resulting in the identification of 35 new miRNAs. We then grouped all 415 zebrafish miRNAs into families based on seed sequence identity as a means to identify possible functional redundancy. Based on genomic location and expression analysis, we also identified those miRNAs that are likely to be encoded as part of polycistronic transcripts. Lastly, as a resource, we compiled existing zebrafish miRNA expression data and, where possible, listed all experimentally proven mRNA targets. Conclusion: Current analysis indicates the zebrafish genome encodes 415 miRNAs which can be grouped into 44 families. The largest of these families (the miR-430 family) contains 72 members largely clustered in two main locations along chromosome 4. Thus far, most zebrafish miRNAs exhibit tissue specific patterns of expression.
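
    Grouping miRNAs into families by seed sequence identity, as this record describes, can be sketched in a few lines. The abstract does not define the seed, so this sketch assumes the common convention of nucleotides 2 to 8 of the mature miRNA; the names and sequences below are toy values, not zebrafish data.

```python
from collections import defaultdict


def seed(mature_seq: str, start: int = 2, end: int = 8) -> str:
    """Return the seed region of a mature miRNA (1-based, inclusive).

    Assumption: the seed spans nucleotides 2-8; DNA-style T is read as U.
    """
    return mature_seq[start - 1:end].upper().replace("T", "U")


def group_by_seed(mirnas: dict[str, str]) -> dict[str, list[str]]:
    """Map each seed sequence to the sorted names of miRNAs sharing it."""
    families: dict[str, list[str]] = defaultdict(list)
    for name, sequence in mirnas.items():
        families[seed(sequence)].append(name)
    return {s: sorted(names) for s, names in families.items()}


# Hypothetical mature sequences: the first two share a seed, the third does not.
toy_mirnas = {
    "mir-a": "UAAGUGCUAAAA",
    "mir-b": "AAAGUGCUCCCC",
    "mir-c": "UGGAAUGUAAAG",
}
toy_families = group_by_seed(toy_mirnas)
```

    Here `toy_families` holds two families, which illustrates how miRNAs that differ outside the seed can still be candidates for functional redundancy.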

  14. webMGR: an online tool for the multiple genome rearrangement problem.

    Science.gov (United States)

    Lin, Chi Ho; Zhao, Hao; Lowcay, Sean Harry; Shahab, Atif; Bourque, Guillaume

    2010-02-01

    The algorithm MGR enables the reconstruction of rearrangement phylogenies based on gene or synteny block order in multiple genomes. Although MGR has been successfully applied to study the evolution of different sets of species, its utilization has been hampered by the prohibitive running time for some applications. In the current work, we have designed new heuristics that significantly speed up the tool without compromising its accuracy. Moreover, we have developed a web server (webMGR) that includes elaborate web output to facilitate navigation through the results. webMGR can be accessed via http://www.gis.a-star.edu.sg/~bourque. The source code of the improved standalone version of MGR is also freely available from the web site. Supplementary data are available at Bioinformatics online.

  15. An encoding device and a method of encoding

    DEFF Research Database (Denmark)

    2012-01-01

    The present invention relates to an encoding device, such as an optical position encoder, for encoding input from an object, and a method for encoding input from an object, for determining a position of an object that interferes with light of the device. The encoding device comprises a light source...... in the area in the space and may interfere with the light, which interference may be encoded into a position or activation....

  16. Bioinformatics analysis and detection of gelatinase encoded gene in Lysinibacillus sphaericus

    Science.gov (United States)

    Repin, Rul Aisyah Mat; Mutalib, Sahilah Abdul; Shahimi, Safiyyah; Khalid, Rozida Mohd.; Ayob, Mohd. Khan; Bakar, Mohd. Faizal Abu; Isa, Mohd Noor Mat

    2016-11-01

    In this study, we performed bioinformatics analysis of the genome sequence of Lysinibacillus sphaericus (L. sphaericus) to identify genes encoding gelatinase. L. sphaericus was isolated from soil, and its gelatinase is species-specific to porcine and bovine gelatin. This bacterium therefore offers the possibility of producing enzymes specific to each of these two meat species. The main focus of this research is to identify the gelatinase-encoding gene of L. sphaericus using bioinformatics analysis of the partially sequenced genome. Three candidate genes were identified: first, gelatinase candidate gene 1 (P1), NODE_71_length_93919_cov_158.931839_21, which is 1563 base pairs (bp) in size with a 520-amino-acid sequence; second, gelatinase candidate gene 2 (P2), NODE_23_length_52851_cov_190.061386_17, which is 1776 bp in size with a 591-amino-acid sequence; and third, gelatinase candidate gene 3 (P3), NODE_106_length_32943_cov_169.147919_8, which is 1701 bp in size with a 566-amino-acid sequence. Three pairs of oligonucleotide primers, named F1, R1, F2, R2, F3 and R3, were designed to target short sequences of cDNA by PCR. The amplicons were reliably 1563 bp in size for candidate gene P1 and 1701 bp in size for candidate gene P3. Thus, bioinformatics analysis of L. sphaericus identified genes encoding gelatinase.
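
    The gene and protein sizes quoted in this record are mutually consistent if each base-pair count includes a terminal stop codon, which is an assumption made here rather than something the abstract states. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Amino-acid count encoded by an open reading frame of orf_bp base pairs.

    Assumption: the reading frame is a whole number of codons and, when
    includes_stop is True, the last codon is a stop codon that adds no residue.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# Base-pair sizes of the three candidate genes as given in the record.
candidate_bp = {"P1": 1563, "P2": 1776, "P3": 1701}
candidate_aa = {name: protein_length(bp) for name, bp in candidate_bp.items()}
```

    Under that assumption the results are 520, 591 and 566 residues for P1, P2 and P3, matching the amino-acid counts in the abstract.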

  17. Multiple displacement amplification of whole genomic DNA from urediospores of Puccinia striiformis f. sp. tritici.

    Science.gov (United States)

    Zhang, R; Ma, Z H; Wu, B M

    2015-05-01

    Because biotrophic fungi such as Puccinia striiformis f. sp. tritici cannot be cultured on nutrient media, they are usually propagated on living hosts (wheat plants in the case of P. striiformis f. sp. tritici) to obtain an adequate quantity of DNA for molecular genetic analysis. The propagation process is time-, space- and labor-consuming and has been a bottleneck to molecular genetic analysis of this pathogen. In this study we evaluated multiple displacement amplification (MDA) of pathogen genomic DNA from urediospores as an alternative approach to traditional propagation of urediospores followed by DNA extraction. The quantities of pathogen genomic DNA in the products were further determined via real-time PCR with a pair of primers specific for the β-tubulin gene of P. striiformis f. sp. tritici. The amplified fragment length polymorphism (AFLP) fingerprints were also compared between the DNA products. The results demonstrated that adequate genomic DNA with fragment sizes larger than 23 kb could be amplified from 20 to 30 urediospores via the MDA method. The real-time PCR results suggested that although fresh urediospores collected from diseased leaves were the best, spores picked from diseased leaves stored for a prolonged period could also be used for amplification. AFLP fingerprints exhibited no significant differences between amplified DNA and DNA extracted with the CTAB method, suggesting amplified DNA can represent the pathogen's genomic DNA very well. Therefore, MDA could be used to obtain genomic DNA from small precious samples (dozens of spores) for molecular genetic analysis of the wheat stripe rust pathogen and other fungi that are difficult to propagate.

  18. Discovery of new enzymes and metabolic pathways using structure and genome context

    Science.gov (United States)

    Zhao, Suwen; Kumar, Ritesh; Sakai, Ayano; Vetting, Matthew W.; Wood, B. McKay; Brown, Shoshana; Bonanno, Jeffery B.; Hillerich, Brandan S.; Seidel, Ronald D.; Babbitt, Patricia C.; Almo, Steven C.; Sweedler, Jonathan V.; Gerlt, John A.; Cronan, John E.; Jacobson, Matthew P.

    2014-01-01

    Assigning valid functions to proteins identified in genome projects is challenging, with over-prediction and database annotation errors major concerns [1]. We, and others [2], are developing computation-guided strategies for functional discovery using “metabolite docking” to experimentally derived [3] or homology-based [4] three-dimensional structures. Bacterial metabolic pathways often are encoded by “genome neighborhoods” (gene clusters and/or operons), which can provide important clues for functional assignment. We recently demonstrated the synergy of docking and pathway context by “predicting” the intermediates in the glycolytic pathway in E. coli [5]. Metabolite docking to multiple binding proteins/enzymes in the same pathway increases the reliability of in silico predictions of substrate specificities because the pathway intermediates are structurally similar. We report that structure-guided approaches for predicting the substrate specificities of several enzymes encoded by a bacterial gene cluster allowed i) the correct prediction of the in vitro activity of a structurally characterized enzyme of unknown function (PDB 2PMQ), 2-epimerization of trans-4-hydroxy-L-proline betaine (tHyp-B) and cis-4-hydroxy-D-proline betaine (cHyp-B), and ii) the correct identification of the catabolic pathway in which Hyp-B 2-epimerase participates. The substrate-liganded pose predicted by virtual library screening (docking) was confirmed experimentally. The enzymatic activities in the predicted pathway were confirmed by in vitro assays and genetic analyses; the intermediates were identified by metabolomics; and repression of the genes encoding the pathway by high salt was established by transcriptomics, confirming the osmolyte role of tHyp-B. This study establishes the utility of structure-guided functional predictions to enable the discovery of new metabolic pathways. PMID:24056934

  19. The apicoplast genomes of two taxonomic units of Babesia from sheep.

    Science.gov (United States)

    Wang, Tao; Guan, Guiquan; Korhonen, Pasi K; Koehler, Anson V; Hall, Ross S; Young, Neil D; Yin, Hong; Gasser, Robin B

    2017-01-15

    The apicoplast (ap) is a unique, non-photosynthetic organelle found in most apicomplexan parasites. Due to the essential roles that this organelle has, it has been widely considered as a target for drugs against diseases caused by apicomplexans. Exploring the ap genomes of such parasites would provide a better understanding of their systematics and their basic molecular biology for therapeutics. However, there is limited information available on the ap genomes of apicomplexan parasites. In the present study, the ap genomes of two operational taxonomic units of Babesia (known as Babesia sp. Lintan [Bl] and Babesia sp. Xinjiang [Bx]) from sheep were sequenced, assembled and annotated using a massively parallel sequencing-based approach. Then, the gene content and gene order in these ap genomes (~30.7 kb in size) were defined and compared, and the genetic differences were assessed. In addition, a phylogenetic analysis of ap genomic data sets was carried out to assess the relationships of these taxonomic units with other apicomplexan parasites for which complete ap genomic data sets were publicly available. The results showed that the ap genomes of Bl and Bx encode 59 and 57 genes, respectively, including 2 ribosomal RNA genes, 25 transfer RNA genes and 30-32 protein-encoding genes, being similar in content to those of Babesia bovis and B. orientalis. Ap gene regions that might serve as markers for future epidemiological and population genetic studies of Babesia species were identified. Using sequence data for a subset of six protein-encoding genes, a close relationship of Bl and Bx with Babesia bovis from cattle and B. orientalis from water buffalo was inferred. Although the focus of the present study was on Babesia, we propose that the present sequencing-bioinformatic approach should be applicable to organellar genomes of a wide range of apicomplexans of veterinary importance. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. REDIdb 3.0: A Comprehensive Collection of RNA Editing Events in Plant Organellar Genomes.

    Science.gov (United States)

    Lo Giudice, Claudio; Pesole, Graziano; Picardi, Ernesto

    2018-01-01

    RNA editing is an important epigenetic mechanism by which genome-encoded transcripts are modified by substitutions, insertions and/or deletions. It was first discovered in kinetoplastid protozoa and has since been reported in a wide range of organisms. In plants, RNA editing occurs mostly by cytidine (C) to uridine (U) conversion in translated regions of organelle mRNAs and tends to modify affected codons, restoring evolutionarily conserved amino acid residues. RNA editing has also been described in non-protein coding regions such as group II introns and structural RNAs. Despite its impact on organellar transcriptome and proteome complexity, current primary databases still do not provide a specific field for RNA editing events. To overcome these limitations, we developed REDIdb, a specialized database for RNA editing modifications in plant organelles. Hereafter we describe its third release containing more than 26,000 events in a completely novel web interface to accommodate RNA editing in its genomics, biological and evolutionary context through whole genome maps and multiple sequence alignments. REDIdb is freely available at http://srv00.recas.ba.infn.it/redidb/index.html.

  1. REDIdb 3.0: A Comprehensive Collection of RNA Editing Events in Plant Organellar Genomes

    Directory of Open Access Journals (Sweden)

    Claudio Lo Giudice

    2018-04-01

    Full Text Available RNA editing is an important epigenetic mechanism by which genome-encoded transcripts are modified by substitutions, insertions and/or deletions. It was first discovered in kinetoplastid protozoa and has since been reported in a wide range of organisms. In plants, RNA editing occurs mostly by cytidine (C) to uridine (U) conversion in translated regions of organelle mRNAs and tends to modify affected codons, restoring evolutionarily conserved amino acid residues. RNA editing has also been described in non-protein coding regions such as group II introns and structural RNAs. Despite its impact on organellar transcriptome and proteome complexity, current primary databases still do not provide a specific field for RNA editing events. To overcome these limitations, we developed REDIdb, a specialized database for RNA editing modifications in plant organelles. Hereafter we describe its third release containing more than 26,000 events in a completely novel web interface to accommodate RNA editing in its genomics, biological and evolutionary context through whole genome maps and multiple sequence alignments. REDIdb is freely available at http://srv00.recas.ba.infn.it/redidb/index.html

  2. Identification of an Arabidopsis thaliana protein that binds to tomato mosaic virus genomic RNA and inhibits its multiplication

    International Nuclear Information System (INIS)

    Fujisaki, Koki; Ishikawa, Masayuki

    2008-01-01

    The genomic RNAs of positive-strand RNA viruses carry RNA elements that play positive, or in some cases, negative roles in virus multiplication by interacting with viral and cellular proteins. In this study, we purified Arabidopsis thaliana proteins that specifically bind to 5' or 3' terminal regions of tomato mosaic virus (ToMV) genomic RNA, which contain important regulatory elements for translation and RNA replication, and identified these proteins by mass spectrometry analyses. One of these host proteins, named BTR1, harbored three heterogeneous nuclear ribonucleoprotein K-homology RNA-binding domains and preferentially bound to RNA fragments that contained a sequence around the initiation codon of the 130K and 180K replication protein genes. The knockout and overexpression of BTR1 specifically enhanced and inhibited, respectively, ToMV multiplication in inoculated A. thaliana leaves, while such an effect was hardly detectable in protoplasts. These results suggest that BTR1 negatively regulates the local spread of ToMV.

  3. Evolutionary Genomics of an Ancient Prophage of the Order Sphingomonadales

    Science.gov (United States)

    Viswanathan, Vandana; Narjala, Anushree; Ravichandran, Aravind; Jayaprasad, Suvratha

    2017-01-01

    The order Sphingomonadales, containing the families Erythrobacteraceae and Sphingomonadaceae, is a relatively less well-studied phylogenetic branch within the class Alphaproteobacteria. Prophage elements are present in most bacterial genomes and are important determinants of adaptive evolution. An “intact” prophage was predicted within the genome of Sphingomonas hengshuiensis strain WHSC-8 and was designated Prophage IWHSC-8. Loci homologous to the region containing the first 22 open reading frames (ORFs) of Prophage IWHSC-8 were discovered among the genomes of numerous Sphingomonadales. In 17 genomes, the homologous loci were co-located with an ORF encoding a putative superoxide dismutase. Several other lines of molecular evidence implied that these homologous loci represent an ancient temperate bacteriophage integration, and this horizontal transfer event pre-dated niche-based speciation within the order Sphingomonadales. The “stabilization” of prophages in the genomes of their hosts is an indicator of “fitness” conferred by these elements and natural selection. Among the various ORFs predicted within the conserved prophages, an ORF encoding a putative proline-rich outer membrane protein A was consistently present among the genomes of many Sphingomonadales. Furthermore, the conserved prophages in six Sphingomonas sp. contained an ORF encoding a putative spermidine synthase. It is possible that one or more of these ORFs bestow selective fitness, and thus the prophages continue to be vertically transferred within the host strains. Although conserved prophages have been identified previously among closely related genera and species, this is the first systematic and detailed description of orthologous prophages at the level of an order that contains two diverse families and many pigmented species. PMID:28201618

  4. Characterization of the dsDNA prophage sequences in the genome of Neisseria gonorrhoeae and visualization of productive bacteriophage

    Directory of Open Access Journals (Sweden)

    Maugel Timothy K

    2007-07-01

    Full Text Available Abstract Background: Bioinformatic analysis of the genome sequence of Neisseria gonorrhoeae revealed the presence of nine probable prophage islands. The distribution, conservation and function of many of these sequences, and their ability to produce bacteriophage particles are unknown. Results: Our analysis of the genomic sequence of FA1090 identified five genomic regions (NgoΦ1 – 5) that are related to dsDNA lysogenic phage. The genetic content of the dsDNA prophage sequences was examined in detail and found to contain blocks of genes encoding proteins homologous to proteins responsible for phage DNA replication, structural proteins and proteins responsible for phage assembly. The DNA sequences from NgoΦ1, NgoΦ2 and NgoΦ3 contain some significant regions of identity. A unique region of NgoΦ2 showed very high similarity with the Pseudomonas aeruginosa generalized transducing phage F116. Comparative analysis at the nucleotide and protein levels suggests that the sequences of NgoΦ1 and NgoΦ2 encode functionally active phages, while NgoΦ3, NgoΦ4 and NgoΦ5 encode incomplete genomes. Expression of the NgoΦ1 and NgoΦ2 repressors in Escherichia coli inhibited the growth of E. coli and the propagation of phage λ. The NgoΦ2 repressor was able to inhibit transcription of N. gonorrhoeae genes and Haemophilus influenzae HP1 phage promoters. The holin gene of NgoΦ1 (identical to that encoded by NgoΦ2), when expressed in E. coli, could serve as a substitute for the phage λ s gene. We were able to detect the presence of the DNA derived from NgoΦ1 in the cultures of N. gonorrhoeae. Electron microscopy analysis of culture supernatants revealed the presence of multiple forms of bacteriophage particles. Conclusion: These data suggest that the genes similar to dsDNA lysogenic phage present in the gonococcus are generally conserved in this pathogen and that they are able to regulate the expression of other neisserial genes. Since phage particles were

  5. Selective Memories: Infants' Encoding Is Enhanced in Selection via Suppression

    Science.gov (United States)

    Markant, Julie; Amso, Dima

    2013-01-01

    The present study examined the hypothesis that inhibitory visual selection mechanisms play a vital role in memory by limiting distractor interference during item encoding. In Experiment 1a we used a modified spatial cueing task in which 9-month-old infants encoded multiple category exemplars in the contexts of an attention orienting mechanism…

  6. Comparative genomic and plasmid analysis of beer-spoiling and non-beer-spoiling Lactobacillus brevis isolates.

    Science.gov (United States)

    Bergsveinson, Jordyn; Ziola, Barry

    2017-12-01

    Beer-spoilage-related lactic acid bacteria (BSR LAB) belong to multiple genera and species; however, beer-spoilage capacity is isolate-specific and partially acquired via horizontal gene transfer within the brewing environment. Thus, the extent to which genus-, species-, or environment- (i.e., brewery-) level genetic variability influences beer-spoilage phenotype is unknown. Publicly available Lactobacillus brevis genomes were analyzed via BlAst Diagnostic Gene findEr (BADGE) for BSR genes and assessed for pangenomic relationships. Also analyzed were functional coding capacities of plasmids of LAB inhabiting extreme niche environments. Considerable genetic variation was observed in L. brevis isolated from clinical samples, whereas 16 candidate genes distinguish BSR and non-BSR L. brevis genomes. These genes are related to nutrient scavenging of gluconate or pentoses, mannose, and metabolism of pectin. BSR L. brevis isolates also have higher average nucleotide identity and stronger pangenome association with one another, though isolation source (i.e., specific brewery) also appears to influence the plasmid coding capacity of BSR LAB. Finally, it is shown that niche-specific adaptation and phenotype are plasmid-encoded for both BSR and non-BSR LAB. The ultimate combination of plasmid-encoded genes dictates the ability of L. brevis to survive in the most extreme beer environment, namely, gassed (i.e., pressurized) beer.

  7. The KL24 gene cluster and a genomic island encoding a Wzy polymerase contribute genes needed for synthesis of the K24 capsular polysaccharide by the multiply antibiotic resistant Acinetobacter baumannii isolate RCH51.

    Science.gov (United States)

    Kenyon, Johanna J; Kasimova, Anastasiya A; Shneider, Mikhail M; Shashkov, Alexander S; Arbatsky, Nikolay P; Popova, Anastasiya V; Miroshnikov, Konstantin A; Hall, Ruth M; Knirel, Yuriy A

    2017-03-01

    The whole-genome sequence of the multiply antibiotic resistant Acinetobacter baumannii isolate RCH51 belonging to sequence type ST103 (Institut Pasteur scheme) revealed that the set of genes at the capsule locus, KL24, includes four genes predicted to direct the synthesis of 3-acetamido-3,6-dideoxy-d-galactose (d-Fuc3NAc), and this sugar was found in the capsular polysaccharide (CPS). One of these genes, fdtE, encodes a novel bifunctional protein with an N-terminal FdtA 3,4-ketoisomerase domain and a C-terminal acetyltransferase domain. KL24 lacks a gene encoding a Wzy polymerase to link the oligosaccharide K units to form the CPS found associated with isolate RCH51, and a wzy gene was found in a small genomic island (GI) near the cpn60 gene. This GI is in precisely the same location as another GI carrying wzy and atr genes recently found in several A. baumannii isolates, but it does not otherwise resemble it. The CPS isolated from RCH51, studied by sugar analysis and 1D and 2D 1H and 13C NMR spectroscopy, revealed that the K unit has a branched pentasaccharide structure made up of Gal, GalNAc and GlcNAc residues with d-Fuc3NAc as a side branch, and the K units are linked via a β-d-GlcpNAc-(1→3)-β-d-Galp linkage formed by the Wzy encoded by the GI. The functions of the glycosyltransferases encoded by KL24 were assigned to formation of specific bonds. A correspondence between the order of the genes in KL24 and other KL and the order of the linkages they form was noted, and this may be useful in future predictions of glycosyltransferase specificities.

  8. Identification of Genes Encoding the Folate- and Thiamine-Binding Membrane Proteins in Firmicutes

    NARCIS (Netherlands)

    Eudes, Aymerick; Erkens, Guus B.; Slotboom, Dirk J.; Rodionov, Dmitry A.; Naponelli, Valeria; Hanson, Andrew D.

    Genes encoding high-affinity folate- and thiamine-binding proteins (FolT, ThiT) were identified in the Lactobacillus casei genome, expressed in Lactococcus lactis, and functionally characterized. Similar genes occur in many Firmicutes, sometimes next to folate or thiamine salvage genes. Most thiT

  9. GIGGLE: a search engine for large-scale integrated genome analysis.

    Science.gov (United States)

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-02-01

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
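
    As a rough illustration of the kind of query described above (which interval files share loci with a set of query features), the sketch below counts overlaps per file and ranks the files by that count. It is a naive stand-in, not GIGGLE's index or its significance ranking: the function names, the half-open interval convention, and the single-chromosome simplification are all assumptions made for this example.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

# Hypothetical representation: half-open [start, end) intervals on one chromosome.
Interval = Tuple[int, int]


def count_overlaps(query: List[Interval], target: List[Interval]) -> int:
    """Count (query, target) pairs whose intervals overlap."""
    starts = sorted(s for s, _ in target)
    ends = sorted(e for _, e in target)
    total = 0
    for q_start, q_end in query:
        # Targets that begin before the query ends...
        started = bisect_left(starts, q_end)
        # ...minus targets that already ended at or before the query start.
        finished = bisect_right(ends, q_start)
        total += started - finished
    return total


def rank_files(query: List[Interval],
               files: Dict[str, List[Interval]]) -> List[Tuple[str, int]]:
    """Rank interval files by raw overlap count (a simplification of ranking by significance)."""
    counts = [(name, count_overlaps(query, ivs)) for name, ivs in files.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = [(100, 200), (500, 600)]
    files = {
        "a.bed": [(150, 160), (190, 250), (600, 700)],
        "b.bed": [(0, 100), (700, 800)],
    }
    print(rank_files(query, files))
```

    Sorting start and end coordinates separately keeps each lookup logarithmic in the file size, but every file is still scanned per query; a purpose-built index is what would let this scale to thousands of files.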

  10. Gene expansion shapes genome architecture in the human pathogen Lichtheimia corymbifera: an evolutionary genomics analysis in the ancient terrestrial mucorales (Mucoromycotina).

    Science.gov (United States)

    Schwartze, Volker U; Winter, Sascha; Shelest, Ekaterina; Marcet-Houben, Marina; Horn, Fabian; Wehner, Stefanie; Linde, Jörg; Valiante, Vito; Sammeth, Michael; Riege, Konstantin; Nowrousian, Minou; Kaerger, Kerstin; Jacobsen, Ilse D; Marz, Manja; Brakhage, Axel A; Gabaldón, Toni; Böcker, Sebastian; Voigt, Kerstin

    2014-08-01

    Lichtheimia species are the second most important cause of mucormycosis in Europe. To provide broader insights into the molecular basis of the pathogenicity-associated traits of the basal Mucorales, we report the full genome sequence of L. corymbifera and compare it to the genome of Rhizopus oryzae, the most common cause of mucormycosis worldwide. The genome assembly encompasses 33.6 Mb and 12,379 protein-coding genes. This study reveals four major differences between the L. corymbifera genome and that of R. oryzae: (i) a highly elevated number of gene duplications which, unlike in R. oryzae, are not due to whole genome duplication (WGD); (ii) despite the relatively high incidence of introns, alternative splicing (AS) is not frequently observed for the generation of paralogs and in response to stress; (iii) the content of repetitive elements is strikingly low (<5%); and (iv) L. corymbifera is typically haploid. Novel virulence factors were identified which may be involved in the regulation of the adaptation to iron limitation, e.g. LCor01340.1, encoding a putative siderophore transporter, and LCor00410.1, involved in siderophore metabolism. Also identified were genes encoding the transcription factors LCor08192.1 and LCor01236.1, which are similar to GATA type regulators and to calcineurin-regulated CRZ1, respectively, indicating an involvement of the calcineurin pathway in the adaptation to iron limitation. Genes encoding MADS-box transcription factors are elevated up to 11 copies compared to the 1-4 copies usually found in other fungi. Further findings are: (i) a lower content of tRNAs, but unique codons, in L. corymbifera; (ii) over 25% of the proteins are apparently specific to L. corymbifera; (iii) L. corymbifera contains only 2/3 of the proteases (known to be essential virulence factors) found in R. oryzae. The number of secreted proteases, however, is roughly twice as high as in R. oryzae.

  11. Whole-genome analysis of the methyl tert-butyl ether-degrading beta-proteobacterium Methylibium petroleiphilum PM1.

    Science.gov (United States)

    Kane, Staci R; Chakicherla, Anu Y; Chain, Patrick S G; Schmidt, Radomir; Shin, Maria W; Legler, Tina C; Scow, Kate M; Larimer, Frank W; Lucas, Susan M; Richardson, Paul M; Hristova, Krassimira R

    2007-03-01

    Methylibium petroleiphilum PM1 is a methylotroph distinguished by its ability to completely metabolize the fuel oxygenate methyl tert-butyl ether (MTBE). Strain PM1 also degrades aromatic (benzene, toluene, and xylene) and straight-chain (C(5) to C(12)) hydrocarbons present in petroleum products. Whole-genome analysis of PM1 revealed an approximately 4-Mb circular chromosome and an approximately 600-kb megaplasmid, containing 3,831 and 646 genes, respectively. Aromatic hydrocarbon and alkane degradation, metal resistance, and methylotrophy are encoded on the chromosome. The megaplasmid contains an unusual t-RNA island, numerous insertion sequences, and large repeated elements, including a 40-kb region also present on the chromosome and a 29-kb tandem repeat encoding phosphonate transport and cobalamin biosynthesis. The megaplasmid also codes for alkane degradation and was shown to play an essential role in MTBE degradation through plasmid-curing experiments. Discrepancies between the insertion sequence element distribution patterns, the distributions of best BLASTP hits among major phylogenetic groups, and the G+C contents of the chromosome (69.2%) and plasmid (66%), together with comparative genome hybridization experiments, suggest that the plasmid was recently acquired and apparently carries the genetic information responsible for PM1's ability to degrade MTBE. Comparative genomic hybridization analysis with two PM1-like MTBE-degrading environmental isolates (approximately 99% identical 16S rRNA gene sequences) showed that the plasmid was highly conserved (ca. 99% identical), whereas the chromosomes were too diverse to conduct resequencing analysis. PM1's genome sequence provides a foundation for investigating MTBE biodegradation and exploring the genetic regulation of multiple biodegradation pathways in M. petroleiphilum and other MTBE-degrading beta-proteobacteria.

  12. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome.

    Science.gov (United States)

    Lohmueller, Kirk E; Albrechtsen, Anders; Li, Yingrui; Kim, Su Yeon; Korneliussen, Thorfinn; Vinckenbosch, Nicolas; Tian, Geng; Huerta-Sanchez, Emilia; Feder, Alison F; Grarup, Niels; Jørgensen, Torben; Jiang, Tao; Witte, Daniel R; Sandbæk, Annelli; Hellmann, Ines; Lauritzen, Torsten; Hansen, Torben; Pedersen, Oluf; Wang, Jun; Nielsen, Rasmus

    2011-10-01

    A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries of genetic variation, like allele frequencies, are also correlated with recombination rate and whether these correlations can be explained solely by negative selection against deleterious mutations or whether positive selection acting on favorable alleles is also required. Here we attempt to address these questions by analyzing three different genome-wide resequencing datasets from European individuals. We document several significant correlations between different genomic features. In particular, we find that average minor allele frequency and diversity are reduced in regions of low recombination and that human diversity, human-chimp divergence, and average minor allele frequency are reduced near genes. Population genetic simulations show that either positive natural selection acting on favorable mutations or negative natural selection acting against deleterious mutations can explain these correlations. However, models with strong positive selection on nonsynonymous mutations and little negative selection predict a stronger negative correlation between neutral diversity and nonsynonymous divergence than observed in the actual data, supporting the importance of negative, rather than positive, selection throughout the genome. Further, we show that the widespread presence of weakly deleterious alleles, rather than a small number of strongly positively selected mutations, is responsible for the correlation between neutral genetic diversity and recombination rate. This work suggests that natural selection has affected multiple aspects of linked neutral variation throughout the human genome and that positive selection is not required to explain these observations.

  13. Genomic island excisions in Bordetella petrii

    Directory of Open Access Journals (Sweden)

    Levillain Erwan

    2009-07-01

    Full Text Available Abstract Background: Among the members of the genus Bordetella, B. petrii is unique, since it is the only species isolated from the environment, while the pathogenic Bordetellae are obligately associated with host organisms. Another feature distinguishing B. petrii from the other sequenced Bordetellae is the presence of a large number of mobile genetic elements, including several large genomic regions with typical characteristics of genomic islands collectively known as integrative and conjugative elements (ICEs). These elements mainly encode accessory metabolic factors enabling this bacterium to grow on a large repertoire of aromatic compounds. Results: During in vitro culture of Bordetella petrii, colony variants appear frequently. We show that this variability can be attributed to the presence of a large number of metastable mobile genetic elements on its chromosome. In fact, the genome sequence of B. petrii revealed the presence of at least seven large genomic islands mostly encoding accessory metabolic functions involved in the degradation of aromatic compounds and detoxification of heavy metals. Four of these islands (termed GI1 to GI3 and GI6) are highly related to ICEclc of Pseudomonas knackmussii sp. strain B13. Here we present first data about the molecular characterization of these islands. We defined the exact borders of each island and we show that during standard culture of the bacteria these islands get excised from the chromosome. For all but one of these islands (GI5) we could detect circular intermediates. For the clc-like elements GI1 to GI3 of B. petrii we provide evidence that tandem insertion of these islands, which all encode highly related integrases and attachment sites, may also lead to incorporation of genomic DNA which originally was not part of the island and to the formation of huge composite islands. By integration of a tetracycline resistance cassette into GI3 we found this island to be rather unstable and to be lost from

  14. Comparative genomics of an IncA/C multidrug resistance plasmid from Escherichia coli and Klebsiella isolates from intensive care unit patients and the utility of whole-genome sequencing in health care settings.

    Science.gov (United States)

    Hazen, Tracy H; Zhao, LiCheng; Boutin, Mallory A; Stancil, Angela; Robinson, Gwen; Harris, Anthony D; Rasko, David A; Johnson, J Kristie

    2014-08-01

    The IncA/C plasmids have been implicated for their role in the dissemination of β-lactamases, including gene variants that confer resistance to expanded-spectrum cephalosporins, which are often the treatment of last resort against multidrug-resistant, hospital-associated pathogens. A bla(FOX-5) gene was detected in 14 Escherichia coli and 16 Klebsiella isolates that were cultured from perianal swabs of patients admitted to an intensive care unit (ICU) of the University of Maryland Medical Center (UMMC) in Baltimore, MD, over a span of 3 years. Four of the FOX-encoding isolates were obtained from subsequent samples of patients that were initially negative for an AmpC β-lactamase upon admission to the ICU, suggesting that the AmpC β-lactamase-encoding plasmid was acquired while the patient was in the ICU. The genomes of five E. coli isolates and six Klebsiella isolates containing bla(FOX-5) were selected for sequencing based on their plasmid profiles. An ∼ 167-kb IncA/C plasmid encoding the FOX-5 β-lactamase, a CARB-2 β-lactamase, additional antimicrobial resistance genes, and heavy metal resistance genes was identified. Another FOX-5-encoding IncA/C plasmid that was nearly identical except for a variable region associated with the resistance genes was also identified. To our knowledge, these plasmids represent the first FOX-5-encoding plasmids sequenced. We used comparative genomics to describe the genetic diversity of a plasmid encoding a FOX-5 β-lactamase relative to the whole-genome diversity of 11 E. coli and Klebsiella isolates that carry this plasmid. Our findings demonstrate the utility of whole-genome sequencing for tracking of plasmid and antibiotic resistance gene distribution in health care settings.

  15. Exploring the mycobacteriophage metaproteome: phage genomics as an educational platform.

    Directory of Open Access Journals (Sweden)

    Graham F Hatfull

    2006-06-01

    Full Text Available Bacteriophages are the most abundant forms of life in the biosphere and carry genomes characterized by high genetic diversity and mosaic architectures. The complete sequences of 30 mycobacteriophage genomes show them collectively to encode 101 tRNAs, three tmRNAs, and 3,357 proteins belonging to 1,536 "phamilies" of related sequences, and a statistical analysis predicts that these represent approximately 50% of the total number of phamilies in the mycobacteriophage population. These phamilies contain 2.19 proteins on average; more than half (774) of them contain just a single protein sequence. Only six phamilies have representatives in more than half of the 30 genomes, and only three (encoding tape-measure proteins, lysins, and minor tail proteins) are present in all 30 phages, although these phamilies are themselves highly modular, such that no single amino acid sequence element is present in all 30 mycobacteriophage genomes. Of the 1,536 phamilies, only 230 (15%) have amino acid sequence similarity to previously reported proteins, reflecting the enormous genetic diversity of the entire phage population. The abundance and diversity of phages, the simplicity of phage isolation, and the relatively small size of phage genomes support bacteriophage isolation and comparative genomic analysis as a highly suitable platform for discovery-based education.
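
    The summary figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only recomputes the ratios from the counts quoted above; the function and variable names are invented for illustration.

```python
def phamily_summary(proteins: int, phamilies: int,
                    singletons: int, with_known_similarity: int) -> dict:
    """Recompute the ratios quoted in the abstract from its raw counts."""
    return {
        # Mean number of proteins per phamily.
        "mean_size": proteins / phamilies,
        # Fraction of phamilies holding a single protein sequence.
        "singleton_fraction": singletons / phamilies,
        # Fraction of phamilies similar to previously reported proteins.
        "known_fraction": with_known_similarity / phamilies,
    }


if __name__ == "__main__":
    summary = phamily_summary(proteins=3357, phamilies=1536,
                              singletons=774, with_known_similarity=230)
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```

    With the quoted counts, the mean rounds to 2.19 proteins per phamily, the singleton share is just over one half, and the share with known similarity rounds to 15%, consistent with the text.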

  16. Nuclear-Cytoplasmic Conflict in Pea (Pisum sativum L.) Is Associated with Nuclear and Plastidic Candidate Genes Encoding Acetyl-CoA Carboxylase Subunits

    Science.gov (United States)

    Bogdanova, Vera S.; Zaytseva, Olga O.; Mglinets, Anatoliy V.; Shatskaya, Natalia V.; Kosterin, Oleg E.; Vasiliev, Gennadiy V.

    2015-01-01

    In crosses of wild and cultivated peas (Pisum sativum L.), nuclear-cytoplasmic incompatibility frequently occurs manifested as decreased pollen fertility, male gametophyte lethality, sporophyte lethality. High-throughput sequencing of plastid genomes of one cultivated and four wild pea accessions differing in cross-compatibility was performed. Candidate genes for involvement in the nuclear-plastid conflict were searched in the reconstructed plastid genomes. In the annotated Medicago truncatula genome, nuclear candidate genes were searched in the portion syntenic to the pea chromosome region known to harbor a locus involved in the conflict. In the plastid genomes, a substantial variability of the accD locus represented by nucleotide substitutions and indels was found to correspond to the pattern of cross-compatibility among the accessions analyzed. Amino acid substitutions in the polypeptides encoded by the alleles of a nuclear locus, designated as Bccp3, with a complementary function to accD, fitted the compatibility pattern. The accD locus in the plastid genome encoding beta subunit of the carboxyltransferase of acetyl-coA carboxylase and the nuclear locus Bccp3 encoding biotin carboxyl carrier protein of the same multi-subunit enzyme were nominated as candidate genes for main contribution to nuclear-cytoplasmic incompatibility in peas. Existence of another nuclear locus involved in the accD-mediated conflict is hypothesized. PMID:25789472

  17. Nuclear-cytoplasmic conflict in pea (Pisum sativum L.) is associated with nuclear and plastidic candidate genes encoding acetyl-CoA carboxylase subunits.

    Directory of Open Access Journals (Sweden)

    Vera S Bogdanova

    Full Text Available In crosses of wild and cultivated peas (Pisum sativum L.), nuclear-cytoplasmic incompatibility frequently occurs manifested as decreased pollen fertility, male gametophyte lethality, sporophyte lethality. High-throughput sequencing of plastid genomes of one cultivated and four wild pea accessions differing in cross-compatibility was performed. Candidate genes for involvement in the nuclear-plastid conflict were searched in the reconstructed plastid genomes. In the annotated Medicago truncatula genome, nuclear candidate genes were searched in the portion syntenic to the pea chromosome region known to harbor a locus involved in the conflict. In the plastid genomes, a substantial variability of the accD locus represented by nucleotide substitutions and indels was found to correspond to the pattern of cross-compatibility among the accessions analyzed. Amino acid substitutions in the polypeptides encoded by the alleles of a nuclear locus, designated as Bccp3, with a complementary function to accD, fitted the compatibility pattern. The accD locus in the plastid genome encoding beta subunit of the carboxyltransferase of acetyl-coA carboxylase and the nuclear locus Bccp3 encoding biotin carboxyl carrier protein of the same multi-subunit enzyme were nominated as candidate genes for main contribution to nuclear-cytoplasmic incompatibility in peas. Existence of another nuclear locus involved in the accD-mediated conflict is hypothesized.

  18. Genomic sequence and organization of two members of a human lectin gene family

    International Nuclear Information System (INIS)

    Gitt, M.A.; Barondes, S.H.

    1991-01-01

    The authors have isolated and sequenced the genomic DNA encoding a human dimeric soluble lactose-binding lectin. The gene has four exons, and its upstream region contains sequences that suggest control by glucocorticoids, heat (environmental) shock, metals, and other factors. They have also isolated and sequenced three exons of the gene encoding another human putative lectin, the existence of which was first indicated by isolation of its cDNA. Comparisons suggest a general pattern of genomic organization of members of this lectin gene family

  19. Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas.

    Science.gov (United States)

    Calin, George A; Liu, Chang-gong; Ferracin, Manuela; Hyslop, Terry; Spizzo, Riccardo; Sevignani, Cinzia; Fabbri, Muller; Cimmino, Amelia; Lee, Eun Joo; Wojcik, Sylwia E; Shimizu, Masayoshi; Tili, Esmerina; Rossi, Simona; Taccioli, Cristian; Pichiorri, Flavia; Liu, Xiuping; Zupo, Simona; Herlea, Vlad; Gramantieri, Laura; Lanza, Giovanni; Alder, Hansjuerg; Rassenti, Laura; Volinia, Stefano; Schmittgen, Thomas D; Kipps, Thomas J; Negrini, Massimo; Croce, Carlo M

    2007-09-01

    Noncoding RNA (ncRNA) transcripts are thought to be involved in human tumorigenesis. We report that a large fraction of genomic ultraconserved regions (UCRs) encode a particular set of ncRNAs whose expression is altered in human cancers. Genome-wide profiling revealed that UCRs have distinct signatures in human leukemias and carcinomas. UCRs are frequently located at fragile sites and genomic regions involved in cancers. We identified certain UCRs whose expression may be regulated by microRNAs abnormally expressed in human chronic lymphocytic leukemia, and we proved that the inhibition of an overexpressed UCR induces apoptosis in colon cancer cells. Our findings argue that ncRNAs and interaction between noncoding genes are involved in tumorigenesis to a greater extent than previously thought.

  20. GIGGLE: a search engine for large-scale integrated genome analysis

    Science.gov (United States)

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-01-01

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation. PMID:29309061

  1. Whole genome analysis of a livestock-associated methicillin-resistant Staphylococcus aureus ST398 isolate from a case of human endocarditis

    Directory of Open Access Journals (Sweden)

    van Strijp Jos AG

    2010-06-01

    Full Text Available Abstract Background: Recently, a new livestock-associated methicillin-resistant Staphylococcus aureus (MRSA) Sequence Type 398 (ST398) isolate has emerged worldwide. Although there have been reports of invasive disease in humans, MRSA ST398 colonization is much more common in livestock and demonstrates especially high prevalence rates in pigs and calves. The aim of this study was to compare the genome sequence of an ST398 MRSA isolate with other S. aureus genomes in order to identify genetic traits that may explain the success of this particular lineage. Therefore, we determined the whole genome sequence of S0385, an MRSA ST398 isolate from a human case of endocarditis. Results: The entire genome sequence of S0385 demonstrated considerable accessory genome content differences relative to other S. aureus genomes. Several mobile genetic elements that confer antibiotic resistance were identified, including a novel composite of a type V (5C2&5) Staphylococcal Chromosome Cassette mec (SCCmec) with distinct joining (J) regions. The presence of multiple integrative conjugative elements, combined with the absence of a type I restriction and modification system on one of the two νSa islands, could enhance horizontal gene transfer in this strain. The ST398 MRSA isolate carries a unique pathogenicity island which encodes homologues of two excreted virulence factors: staphylococcal complement inhibitor (SCIN) and von Willebrand factor-binding protein (vWbp). However, several virulence factors such as enterotoxins and phage encoded toxins, including Panton-Valentine leukocidin (PVL), were not identified in this isolate. Conclusions: Until now MRSA ST398 isolates did not cause frequent invasive disease in humans, which may be due to the absence of several common virulence factors. However, the proposed enhanced ability of these isolates to acquire mobile elements may lead to the rapid acquisition of determinants which contribute to virulence in human infections.

  2. IQCJ-SCHIP1, a novel fusion transcript encoding a calmodulin-binding IQ motif protein

    International Nuclear Information System (INIS)

    Kwasnicka-Crawford, Dorota A.; Carson, Andrew R.; Scherer, Stephen W.

    2006-01-01

    The existence of transcripts that span two adjacent, independent genes is considered rare in the human genome. This study characterizes a novel human fusion gene named IQCJ-SCHIP1. IQCJ-SCHIP1 is the longest isoform of a complex transcriptional unit that bridges two separate genes that encode distinct proteins: IQCJ, a novel IQ motif-containing protein, and SCHIP1, a schwannomin-interacting protein that has been previously shown to interact with the Neurofibromatosis type 2 (NF2) protein. IQCJ-SCHIP1 is located on chromosome 3q25 and comprises a 1692-bp transcript encompassing 11 exons spanning 828 kb of the genomic DNA. We show that IQCJ-SCHIP1 mRNA is highly expressed in the brain. Protein encoded by the IQCJ-SCHIP1 gene was localized to cytoplasm and actin-rich regions and, in differentiated PC12 cells, was also seen in neurite extensions.

  3. Role of Virus-Encoded microRNAs in Avian Viral Diseases

    Directory of Open Access Journals (Sweden)

    Yongxiu Yao

    2014-03-01

    Full Text Available With total dependence on the host cell, several viruses have adopted strategies to modulate the host cellular environment, including the modulation of the microRNA (miRNA) pathway through virus-encoded miRNAs. Several avian viruses, mostly herpesviruses, have been shown to encode a number of novel miRNAs. These include the highly oncogenic Marek’s disease virus-1 (26 miRNAs), avirulent Marek’s disease virus-2 (36 miRNAs), herpesvirus of turkeys (28 miRNAs), infectious laryngotracheitis virus (10 miRNAs), duck enteritis virus (33 miRNAs) and avian leukosis virus (2 miRNAs). Despite the closer antigenic and phylogenetic relationship among some of the herpesviruses, miRNAs encoded by different viruses showed no sequence conservation, although locations of some of the miRNAs were conserved within the repeat regions of the genomes. However, some of the virus-encoded miRNAs showed significant sequence homology with host miRNAs, demonstrating their ability to serve as functional orthologs. For example, mdv1-miR-M4-5p, a functional ortholog of gga-miR-155, is critical for the oncogenicity of Marek’s disease virus. Additionally, we also describe the potential association of the recently described avian leukosis virus subgroup J encoded E (XSR) miRNA in the induction of myeloid tumors in certain genetically-distinct chicken lines. In this review, we describe the advances in our understanding of the role of virus-encoded miRNAs in avian diseases.

  4. Non-functional plastid ndh gene fragments are present in the nuclear genome of Norway spruce (Picea abies L. Karsch): insights from in silico analysis of nuclear and organellar genomes.

    Science.gov (United States)

    Ranade, Sonali Sachin; García-Gil, María Rosario; Rosselló, Josep A

    2016-04-01

    Many genes have been lost from the prokaryote plastidial genome during the early events of endosymbiosis in eukaryotes. Some of them were definitively lost, but others were relocated and functionally integrated to the host nuclear genomes through serial events of gene transfer during plant evolution. In gymnosperms, plastid genome sequencing has revealed the loss of ndh genes from several species of Gnetales and Pinaceae, including Norway spruce (Picea abies). This study aims to trace the ndh genes in the nuclear and organellar Norway spruce genomes. The plastid genomes of higher plants contain 11 ndh genes which are homologues of mitochondrial genes encoding subunits of the proton-pumping NADH-dehydrogenase (nicotinamide adenine dinucleotide dehydrogenase) or complex I (electron transport chain). Ndh genes encode 11 NDH polypeptides forming the Ndh complex (analogous to complex I) which seems to be primarily involved in chloro-respiration processes. We considered ndh genes from the plastidial genome of four gymnosperms (Cryptomeria japonica, Cycas revoluta, Ginkgo biloba, Podocarpus totara) and a single angiosperm species (Arabidopsis thaliana) to trace putative homologs in the nuclear and organellar Norway spruce genomes using tBLASTn to assess the evolutionary fate of ndh genes in Norway spruce and to address their genomic location(s), structure, integrity and functionality. The results obtained from tBLASTn were subsequently analyzed by performing homology search for finding ndh specific conserved domains using conserved domain search. We report the presence of non-functional plastid ndh gene fragments, excepting ndhE and ndhG genes, in the nuclear genome of Norway spruce. Regulatory transcriptional elements like promoters, TATA boxes and enhancers were detected in the upstream regions of some ndh fragments. We also found transposable elements in the flanking regions of few ndh fragments suggesting nuclear rearrangements in those regions. These evidences

  5. GenHtr: a tool for comparative assessment of genetic heterogeneity in microbial genomes generated by massive short-read sequencing

    Directory of Open Access Journals (Sweden)

    Yu GongXin

    2010-10-01

    Full Text Available Abstract Background: Microevolution is the study of short-term changes of alleles within a population and their effects on the phenotype of organisms. The result of the below-species-level evolution is heterogeneity, where populations consist of subpopulations with a large number of structural variations. Heterogeneity analysis is thus essential to our understanding of how selective and neutral forces shape bacterial populations over a short period of time. The Solexa Genome Analyzer, a next-generation sequencing platform, allows millions of short sequencing reads to be obtained with great accuracy, making it possible to study the dynamics of the bacterial population at the whole genome level. The tool referred to as GenHtr was developed for genome-wide heterogeneity analysis. Results: For particular bacterial strains, GenHtr relies on a set of Solexa short reads on given bacteria pathogens and their isogenic reference genome to identify heterogeneity sites, the chromosomal positions with multiple variants of genes in the bacterial population, and variations that occur in large gene families. GenHtr accomplishes this by building and comparatively analyzing genome-wide heterogeneity genotypes for both the newly sequenced genomes (using massive short-read sequencing) and their isogenic reference (using simulated data). As proof of the concept, this approach was applied to SRX007711, the Solexa sequencing data for a newly sequenced Staphylococcus aureus subsp. USA300 cell line, and demonstrated that it could predict such multiple variants. They include multiple variants of genes critical in pathogenesis, e.g. genes encoding a LysR family transcriptional regulator, 23S ribosomal RNA, and DNA mismatch repair protein MutS. The heterogeneity results in non-synonymous and nonsense mutations, leading to truncated proteins for both LysR and MutS. Conclusion: GenHtr was developed for genome-wide heterogeneity analysis. Although it is much more time

  6. Distributed-phase OCDMA encoder-decoders based on fiber Bragg gratings

    OpenAIRE

    Zhang, Zhaowei; Tian, C.; Petropoulos, P.; Richardson, D.J.; Ibsen, M.

    2007-01-01

    We propose and demonstrate new optical code-division multiple-access (OCDMA) encoder-decoders having a continuous phase-distribution. With the same spatial refractive index distribution as the reconfigurable optical phase encoder-decoders, they are inherently suitable for the application in reconfigurable OCDMA systems. Furthermore, compared with conventional discrete-phase devices, they also have additional advantages of being more tolerant to input pulse width and, therefore, have the poten...

  7. Relaxation of Selective Constraints Causes Independent Selenoprotein Extinction in Insect Genomes

    OpenAIRE

    Chapple, Charles E.; Guigó, Roderic

    2008-01-01

    BACKGROUND: Selenoproteins are a diverse family of proteins notable for the presence of the 21st amino acid, selenocysteine. Until very recently, all metazoan genomes investigated encoded selenoproteins, and these proteins had therefore been believed to be essential for animal life. Challenging this assumption, recent comparative analyses of insect genomes have revealed that some insect genomes appear to have lost selenoprotein genes. METHODOLOGY/PRINCIPAL FINDINGS: In this paper we investiga...

  8. Genome scale metabolic modeling of cancer

    DEFF Research Database (Denmark)

    Nilsson, Avlant; Nielsen, Jens

    2017-01-01

    Cancer cells reprogram metabolism to support rapid proliferation and survival. Energy metabolism is particularly important for growth and genes encoding enzymes involved in energy metabolism are frequently altered in cancer cells. A genome scale metabolic model (GEM) is a mathematical formalization of metabolism which allows simulation and hypotheses testing of metabolic strategies. It has successfully been applied to many microorganisms and is now used to study cancer metabolism. Generic models of human metabolism have been reconstructed based on the existence of metabolic genes in the human genome...

  9. Encoding asymmetry of the N-glycosylation motif facilitates glycoprotein evolution.

    Directory of Open Access Journals (Sweden)

    Ryan Williams

    Full Text Available Protein N-glycosylation is found in all domains of life and has a conserved role in glycoprotein folding and stability. In animals, glycoproteins transit through the Golgi where the N-glycans are trimmed and rebuilt with sequences that bind lectins, an innovation that greatly increases structural diversity and redundancy of glycoprotein-lectin interaction at the cell surface. Here we ask whether the natural tension between increasing diversity (glycan-protein interactions) and site multiplicity (backup and status quo) might be revealed by a phylogenic examination of glycoproteins and NXS/T (X ≠ P) N-glycosylation sites. Site loss is more likely by mutation at Asn encoded by two adenosine (A)-rich codons, while site gain is more probable by generating Ser or Thr downstream of an existing Asn. Thus mutations produce sites at novel positions more frequently than the reversal of recently lost sites, and therefore more paths through sequence space are made available to natural selection. An intra-species comparison of secretory and cytosolic proteins revealed a departure from equilibrium in sequences one-mutation-away from NXS/T and in (A) content, indicating strong selective pressures and exploration of N-glycosylation positions during vertebrate evolution. Furthermore, secretory proteins have evolved at rates proportional to N-glycosylation site number, indicating adaptive interactions between the N-glycans and underlying protein. Given the topology of the genetic code, mutation of (A) is more often nonsynonymous, and Lys, another target of many PTMs, is also encoded by two (A)-rich codons. An examination of acetyl-Lys sites in proteins indicated similar evolutionary dynamics, consistent with asymmetry of the target and recognition portions of modified sites. Our results suggest that encoding asymmetry is an ancient mechanism of evolvability that increases diversity and experimentation with PTM site positions. Strong selective pressures on PTMs may have
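
    The NXS/T (X ≠ P) motif described in this record can be located in a protein sequence with a short pattern match. The following is a minimal sketch, not the authors' pipeline: the function name `find_sequons` and the toy sequence are hypothetical, and a lookahead is used so that overlapping motifs are all reported.

```python
import re

# N-X-S/T sequon where X is any residue except proline (P).
# The lookahead keeps the match one residue wide, so overlapping
# sites such as "NNTS" are both found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 0-based positions of the Asn in each N-X-S/T (X != P) motif."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "MNGSANPTNAT"  # hypothetical sequence, for illustration only
    print(find_sequons(toy))
```

    In the toy sequence, the Asn followed by Pro (NPT) is skipped, which is the X ≠ P constraint in action.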

  10. Analysis of the genome-wide variations among multiple strains of the plant pathogenic bacterium Xylella fastidiosa

    Directory of Open Access Journals (Sweden)

    Walker M Andrew

    2006-09-01

    Full Text Available Abstract Background: The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results: There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76.2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10⁻² per base pair of DNA and the average INDEL frequency was 2.06 × 10⁻² per base pair of DNA. On average, 60.33% of the SNPs were synonymous while 39.67% were non-synonymous. The mutation frequency, primarily in the form of external INDELs, was the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The numbers of genes unique to each strain were 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A subset of the strain-specific genes showed significant differences in terms of their codon usage and GC composition from the native genes, suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage-related functions. Conclusion: INDELs and strain specific genes

  11. Genomics and fish adaptation

    Directory of Open Access Journals (Sweden)

    Agostinho Antunes

    2015-12-01

    Full Text Available The completion of the human genome sequencing in 2003 opened a new perspective into the importance of whole genome sequencing projects, and currently multiple species are having their genomes completely sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. This voluminous sequencing data generated across multiple organisms also provides the framework to better understand the genetic makeup of such species and related ones, allowing exploration of the genetic changes underlying the evolution of diverse phenotypic traits. Here, recent results from our group retrieved from comparative evolutionary genomic analyses of varied fish species will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been determinant in the success of adaptive radiations into diverse habitats and lifestyles.

  12. Unexplored therapeutic opportunities in the human genome

    DEFF Research Database (Denmark)

    Oprea, Tudor I; Bologa, Cristian G; Brunak, Søren

    2018-01-01

    A large proportion of biomedical research and the development of therapeutics is focused on a small fraction of the human genome. In a strategic effort to map the knowledge gaps around proteins encoded by the human genome and to promote the exploration of currently understudied, but potentially d...... as well as key drug target classes, including G protein-coupled receptors, protein kinases and ion channels, which illustrate the nature of the unexplored opportunities for biomedical research and therapeutic development....

  13. Complete genome sequence of Halorhodospira halophila SL1

    Energy Technology Data Exchange (ETDEWEB)

    Challacombe, Jean F [ORNL; Majid, Sophia [University of Chicago; Deole, Ratnakar [Oklahoma State University; Brettin, Thomas S. [Argonne National Laboratory (ANL); Bruce, David [Los Alamos National Laboratory (LANL); Delano, Susana [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Gleasner, Cheryl D. [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Misra, Monica [Los Alamos National Laboratory (LANL); Reitenga, Krista K. [Los Alamos National Laboratory (LANL); Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Hoff, Wouter D. [Oklahoma State University

    2013-01-01

    Halorhodospira halophila is among the most halophilic organisms known. It is an obligately photosynthetic and anaerobic purple sulfur bacterium that exhibits autotrophic growth up to saturated NaCl concentrations. The type strain H. halophila SL1 was isolated from a hypersaline lake in Oregon. Here we report the determination of its entire genome in a single contig. This is the first genome of a phototrophic extreme halophile. The genome consists of 2,678,452 bp, encoding 2493 predicted genes as determined by automated genome annotation. Of the 2407 predicted proteins, 1905 were assigned to a putative function. Future detailed analysis of this genome promises to yield insights into the halophilic adaptations of this organism, its ability for photoautotrophic growth under extreme conditions, and its characteristic sulfur metabolism.
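
    The figures quoted in this record can be turned into a few summary statistics. This is only a back-of-envelope sketch using the numbers stated in the abstract; the variable names are illustrative, and the abstract does not say what accounts for the difference between predicted genes and predicted proteins.

```python
# Values taken directly from the abstract above.
GENOME_BP = 2_678_452
PREDICTED_GENES = 2493
PREDICTED_PROTEINS = 2407
WITH_PUTATIVE_FUNCTION = 1905

# Gene density: predicted genes per kilobase of genome.
genes_per_kb = PREDICTED_GENES / (GENOME_BP / 1000)

# Average genome span per predicted gene (includes intergenic DNA,
# so this is a spacing, not a mean gene length).
bp_per_gene = GENOME_BP / PREDICTED_GENES

# Share of predicted proteins that received a putative function.
fraction_annotated = WITH_PUTATIVE_FUNCTION / PREDICTED_PROTEINS

# Predicted genes not counted among the predicted proteins; the
# abstract does not state what these are.
non_protein_genes = PREDICTED_GENES - PREDICTED_PROTEINS

if __name__ == "__main__":
    print(f"{genes_per_kb:.2f} genes per kb")
    print(f"{bp_per_gene:.0f} bp of genome per gene")
    print(f"{fraction_annotated:.1%} of proteins with a putative function")
    print(f"{non_protein_genes} genes without a predicted protein")
```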

  14. Concerted evolution of sea anemone neurotoxin genes is revealed through analysis of the Nematostella vectensis genome.

    Science.gov (United States)

    Moran, Yehu; Weinberger, Hagar; Sullivan, James C; Reitzel, Adam M; Finnerty, John R; Gurevitz, Michael

    2008-04-01

    Gene families, which encode toxins, are found in many poisonous animals, yet there is limited understanding of their evolution at the nucleotide level. The release of the genome draft sequence for the sea anemone Nematostella vectensis enabled a comprehensive study of a gene family whose neurotoxin products affect voltage-gated sodium channels. All gene family members are clustered in a highly repetitive approximately 30-kb genomic region and encode a single toxin, Nv1. These genes exhibit extreme conservation at the nucleotide level which cannot be explained by purifying selection. This conservation greatly differs from the toxin gene families of other animals (e.g., snakes, scorpions, and cone snails), whose evolution was driven by diversifying selection, thereby generating a high degree of genetic diversity. The low nucleotide diversity at the Nv1 genes is reminiscent of that reported for DNA encoding ribosomal RNA (rDNA) and 2 hsp70 genes from Drosophila, which have evolved via concerted evolution. This evolutionary pattern was experimentally demonstrated in yeast rDNA and was shown to involve unequal crossing-over. Through sequence analysis of toxin genes from multiple N. vectensis populations and 2 other anemone species, Anemonia viridis and Actinia equina, we observed that the toxin genes for each sea anemone species are more similar to one another than to those of other species, suggesting they evolved by manner of concerted evolution. Furthermore, in 2 of the species (A. viridis and A. equina) we found genes that evolved under diversifying selection, suggesting that concerted evolution and accelerated evolution may occur simultaneously.
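
    The "low nucleotide diversity" mentioned in this record is commonly summarized as the average number of pairwise differences per site among aligned sequences. The sketch below shows that standard calculation under simplifying assumptions (equal-length, pre-aligned sequences, every character compared as-is); it is not taken from the paper, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average pairwise differences per site across aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(combinations(seqs, 2))
    # Count mismatching columns for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    return total_diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

    Gene copies homogenized by concerted evolution would give values near zero, whereas families under diversifying selection would give markedly higher values.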

  15. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae : Implications for the microbial "pan-genome"

    NARCIS (Netherlands)

    Tettelin, H; Masignani, V; Cieslewicz, MJ; Donati, C; Medini, D; Ward, NL; Angiuoli, SV; Crabtree, J; Jones, AL; Durkin, AS; DeBoy, RT; Davidsen, TM; Mora, M; Scarselli, M; Ros, IMY; Peterson, JD; Hauser, CR; Sundaram, JP; Nelson, WC; Madupu, R; Brinkac, LM; Dodson, RJ; Rosovitz, MJ; Sullivan, SA; Daugherty, SC; Haft, DH; Selengut, J; Gwinn, ML; Zhou, LW; Zafar, N; Khouri, H; Radune, D; Dimitrov, G; Watkins, K; O'Connor, KJB; Smith, S; Utterback, TR; White, O; Rubens, CE; Grandi, G; Madoff, LC; Kasper, DL; Telford, JL; Wessels, MR; Rappuoli, R; Fraser, CM

    2005-01-01

    The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and

  16. Properties of virion transactivator proteins encoded by primate cytomegaloviruses

    Directory of Open Access Journals (Sweden)

    Barry Peter A

    2009-05-01

    Full Text Available Abstract Background: Human cytomegalovirus (HCMV) is a betaherpesvirus that causes severe disease in situations where the immune system is immature or compromised. HCMV immediate early (IE) gene expression is stimulated by the virion phosphoprotein pp71, encoded by open reading frame (ORF) UL82, and this transactivation activity is important for the efficient initiation of viral replication. It is currently recognized that pp71 acts to overcome cellular intrinsic defences that otherwise block viral IE gene expression, and that interactions of pp71 with the cell proteins Daxx and ATRX are important for this function. A further property of pp71 is the ability to enable prolonged gene expression from quiescent herpes simplex virus type 1 (HSV-1) genomes. Non-human primate cytomegaloviruses encode homologs of pp71, but there is currently no published information that addresses their effects on gene expression and modes of action. Results: The UL82 homolog encoded by simian cytomegalovirus (SCMV), strain Colburn, was identified and cloned. This ORF, named S82, was cloned into an HSV-1 vector, as were those from baboon, rhesus monkey and chimpanzee cytomegaloviruses. The use of an HSV-1 vector enabled expression of the UL82 homologs in a range of cell types, and permitted investigation of their abilities to direct prolonged gene expression from quiescent genomes. The results show that all UL82 homologs activate gene expression, and that neither host cell type nor promoter target sequence has major effects on these activities. Surprisingly, the UL82 proteins specified by non-human primate cytomegaloviruses, unlike pp71, did not direct long term expression from quiescent HSV-1 genomes. In addition, significant differences were observed in the intranuclear localization of the UL82 homologs, and in their effects on Daxx. Strikingly, S82 mediated the release of Daxx from nuclear domain 10 substructures much more rapidly than pp71 or the other proteins tested.
All

  17. A system-level model for the microbial regulatory genome.

    Science.gov (United States)

    Brooks, Aaron N; Reiss, David J; Allard, Antoine; Wu, Wei-Ju; Salvanha, Diego M; Plaisier, Christopher L; Chandrasekaran, Sriram; Pan, Min; Kaur, Amardeep; Baliga, Nitin S

    2014-07-15

    Microbes can tailor transcriptional responses to diverse environmental challenges despite having streamlined genomes and a limited number of regulators. Here, we present data-driven models that capture the dynamic interplay of the environment and genome-encoded regulatory programs of two types of prokaryotes: Escherichia coli (a bacterium) and Halobacterium salinarum (an archaeon). The models reveal how the genome-wide distributions of cis-acting gene regulatory elements and the conditional influences of transcription factors at each of those elements encode programs for eliciting a wide array of environment-specific responses. We demonstrate how these programs partition transcriptional regulation of genes within regulons and operons to re-organize gene-gene functional associations in each environment. The models capture fitness-relevant co-regulation by different transcriptional control mechanisms acting across the entire genome, to define a generalized, system-level organizing principle for prokaryotic gene regulatory networks that goes well beyond existing paradigms of gene regulation. An online resource (http://egrin2.systemsbiology.net) has been developed to facilitate multiscale exploration of conditional gene regulation in the two prokaryotes. © 2014 The Authors. Published under the terms of the CC BY 4.0 license.

  18. The complete genome sequence of a south Indian isolate of Rice tungro spherical virus reveals evidence of genetic recombination between distinct isolates.

    Science.gov (United States)

    Sailaja, B; Anjum, Najreen; Patil, Yogesh K; Agarwal, Surekha; Malathi, P; Krishnaveni, D; Balachandran, S M; Viraktamath, B C; Mangrauthia, Satendra K

    2013-12-01

    In this study, complete genome of a south Indian isolate of Rice tungro spherical virus (RTSV) from Andhra Pradesh (AP) was sequenced, and the predicted amino acid sequence was analysed. The RTSV RNA genome consists of 12,171 nt without the poly(A) tail, encoding a putative typical polyprotein of 3,470 amino acids. Furthermore, cleavage sites and sequence motifs of the polyprotein were predicted. Multiple alignment with other RTSV isolates showed a nucleotide sequence identity of 95% to east Indian isolates and 90% to Philippines isolates. A phylogenetic tree based on complete genome sequence showed that Indian isolates clustered together, while Vt6 and PhilA isolates of Philippines formed two separate clusters. Twelve recombination events were detected in RNA genome of RTSV using the Recombination Detection Program version 3. Recombination analysis suggested significant role of 5' end and central region of genome in virus evolution. Further, AP and Odisha isolates appeared as important RTSV isolates involved in diversification of this virus in India through recombination phenomenon. The new addition of complete genome of first south Indian isolate provided an opportunity to establish the molecular evolution of RTSV through recombination analysis and phylogenetic relationship.

  19. Complete genome sequences and comparative genome analysis of Lactobacillus plantarum strain 5-2 isolated from fermented soybean.

    Science.gov (United States)

    Liu, Chen-Jian; Wang, Rui; Gong, Fu-Ming; Liu, Xiao-Feng; Zheng, Hua-Jun; Luo, Yi-Yong; Li, Xiao-Ran

    2015-12-01

    Lactobacillus plantarum is an important probiotic and is mostly isolated from fermented foods. We sequenced the genome of L. plantarum strain 5-2, which was derived from fermented soybean isolated from Yunnan province, China. The strain was determined to contain 3114 genes. Fourteen complete insertion sequence (IS) elements were found in 5-2 chromosome. There were 24 DNA replication proteins and 76 DNA repair proteins in the 5-2 genome. Consistent with the classification of L. plantarum as a facultative heterofermentative lactobacillus, the 5-2 genome encodes key enzymes required for the EMP (Embden-Meyerhof-Parnas) and phosphoketolase (PK) pathways. Several components of the secretion machinery are found in the 5-2 genome, which was compared with L. plantarum ST-III, JDM1 and WCFS1. Most of the specific proteins in the four genomes appeared to be related to their prophage elements. Copyright © 2015 Elsevier Inc. All rights reserved.

  20. Comparative genome analysis of the high pathogenicity Salmonella Typhimurium strain UK-1.

    Directory of Open Access Journals (Sweden)

    Yingqin Luo

    Full Text Available Salmonella enterica serovar Typhimurium, a gram-negative facultative rod-shaped bacterium causing salmonellosis and foodborne disease, is one of the most commonly isolated Salmonella serovars in both developed and developing nations. Several S. Typhimurium genomes have been completed and many more genome-sequencing projects are underway. Comparative genome analysis of the multiple strains leads to a better understanding of the evolution of S. Typhimurium and its pathogenesis. S. Typhimurium strain UK-1 (which belongs to phage type 1) is highly virulent when orally administered to mice and chickens and efficiently colonizes lymphoid tissues of these species. These characteristics make this strain a good choice for use in vaccine development. In fact, UK-1 has been used as the parent strain for a number of nonrecombinant and recombinant vaccine strains, including several commercial vaccines for poultry. In this study, we conducted a thorough comparative genome analysis of the UK-1 strain with other S. Typhimurium strains and examined the phenotypic impact of several genomic differences. Whole genomic comparison highlights an extremely close relationship between the UK-1 strain and other S. Typhimurium strains; however, many interesting genetic and genomic variations specific to UK-1 were explored. In particular, the deletion of a UK-1-specific gene that is highly similar to the gene encoding the T3SS effector protein NleC exhibited a significant decrease in oral virulence in BALB/c mice. The complete genetic complements in UK-1, especially those elements that contribute to virulence or aid in determining the diversity within bacterial species, provide key information in evaluating the functional characterization of important genetic determinants and for development of vaccines.

  1. Digital Droplet Multiple Displacement Amplification (ddMDA) for Whole Genome Sequencing of Limited DNA Samples.

    Directory of Open Access Journals (Sweden)

    Minsoung Rhee

    Full Text Available Multiple displacement amplification (MDA) is a widely used technique for amplification of DNA from samples containing limited amounts of DNA (e.g., uncultivable microbes or clinical samples) before whole genome sequencing. Despite its advantages of high yield and fidelity, it suffers from high amplification bias and non-specific amplification when amplifying sub-nanogram amounts of template DNA. Here, we present a microfluidic digital droplet MDA (ddMDA) technique where partitioning of the template DNA into thousands of sub-nanoliter droplets, each containing a small number of DNA fragments, greatly reduces the competition among DNA fragments for primers and polymerase, thereby greatly reducing amplification bias. Consequently, the ddMDA approach enabled a more uniform coverage of amplification over the entire length of the genome, with significantly lower bias and non-specific amplification than conventional MDA. For a sample containing 0.1 pg/μL of E. coli DNA (equivalent of ~3/1000 of an E. coli genome per droplet), ddMDA achieves a 65-fold increase in coverage in de novo assembly, and more than 20-fold increase in specificity (percentage of reads mapping to E. coli) compared to the conventional tube MDA. ddMDA offers a powerful method useful for many applications including medical diagnostics, forensics, and environmental microbiology.
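
    The quoted loading of roughly 3/1000 of an E. coli genome per droplet at 0.1 pg/μL can be sanity-checked with a short calculation. This sketch rests on two assumptions that are not in the abstract: an E. coli genome of about 4.64 Mb and an average mass of about 650 g/mol per base pair of double-stranded DNA. The droplet volume it returns is therefore an inferred estimate, not a reported value.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
BP_MASS_G_PER_MOL = 650.0      # assumption: average mass of one dsDNA base pair
ECOLI_GENOME_BP = 4.64e6       # assumption: approximate E. coli genome size


def genome_mass_fg(genome_bp: float) -> float:
    """Mass of one genome copy in femtograms."""
    grams = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e15


def genomes_per_droplet(conc_pg_per_ul: float, droplet_nl: float,
                        genome_bp: float = ECOLI_GENOME_BP) -> float:
    """Genome equivalents in one droplet (1 pg/uL equals 1 fg/nL)."""
    dna_fg = conc_pg_per_ul * droplet_nl
    return dna_fg / genome_mass_fg(genome_bp)


def implied_droplet_nl(conc_pg_per_ul: float, genomes: float,
                       genome_bp: float = ECOLI_GENOME_BP) -> float:
    """Droplet volume (nL) that would hold the given genome equivalents."""
    return genomes * genome_mass_fg(genome_bp) / conc_pg_per_ul


if __name__ == "__main__":
    print(f"one genome is about {genome_mass_fg(ECOLI_GENOME_BP):.2f} fg")
    print(f"implied droplet volume: {implied_droplet_nl(0.1, 3 / 1000):.3f} nL")
```

    Under these assumptions the implied droplet volume comes out below one nanoliter, which is consistent with the "sub-nanoliter droplets" described in the abstract.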

  2. Resolution of habitat-associated ecogenomic signatures in bacteriophage genomes and application to microbial source tracking.

    Science.gov (United States)

    Ogilvie, Lesley A; Nzakizwanayo, Jonathan; Guppy, Fergus M; Dedi, Cinzia; Diston, David; Taylor, Huw; Ebdon, James; Jones, Brian V

    2018-04-01

    Just as the expansion in genome sequencing has revealed and permitted the exploitation of phylogenetic signals embedded in bacterial genomes, the application of metagenomics has begun to provide similar insights at the ecosystem level for microbial communities. However, little is known regarding this aspect of bacteriophage associated with microbial ecosystems, and if phage encode discernible habitat-associated signals diagnostic of underlying microbiomes. Here we demonstrate that individual phage can encode clear habitat-related 'ecogenomic signatures', based on relative representation of phage-encoded gene homologues in metagenomic data sets. Furthermore, we show the ecogenomic signature encoded by the gut-associated ɸB124-14 can be used to segregate metagenomes according to environmental origin, and distinguish 'contaminated' environmental metagenomes (subject to simulated in silico human faecal pollution) from uncontaminated data sets. This indicates phage-encoded ecological signals likely possess sufficient discriminatory power for use in biotechnological applications, such as development of microbial source tracking tools for monitoring water quality.

  3. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    Energy Technology Data Exchange (ETDEWEB)

    Condon, Bradford J.; Leng, Yueqiang; Wu, Dongliang; Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinlzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

    2013-01-24

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence.

  4. Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote.

    Directory of Open Access Journals (Sweden)

    Jonathan A Eisen

    2006-09-01

    Full Text Available The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense.
The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from

  5. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

    Science.gov (United States)

    Sun, Yan-Bo; Xiong, Zi-Jun; Xiang, Xue-Yan; Liu, Shi-Ping; Zhou, Wei-Wei; Tu, Xiao-Long; Zhong, Li; Wang, Lu; Wu, Dong-Dong; Zhang, Bao-Lin; Zhu, Chun-Ling; Yang, Min-Min; Chen, Hong-Man; Li, Fang; Zhou, Long; Feng, Shao-Hong; Huang, Chao; Zhang, Guo-Jie; Irwin, David; Hillis, David M; Murphy, Robert W; Yang, Huan-Ming; Che, Jing; Wang, Jun; Zhang, Ya-Ping

    2015-03-17

    The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related to the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

  6. The porcine lymphotropic herpesvirus 1 encodes functional regulators of gene expression

    International Nuclear Information System (INIS)

    Lindner, I.; Ehlers, B.; Noack, S.; Dural, G.; Yasmum, N.; Bauer, C.; Goltz, M.

    2007-01-01

    The porcine lymphotropic herpesviruses (PLHV) are discussed as possible risk factors in xenotransplantation because of the high prevalence of PLHV-1, PLHV-2 and PLHV-3 in pig populations world-wide and the fact that PLHV-1 has been found to be associated with porcine post-transplant lymphoproliferative disease. To provide structural and functional knowledge on the PLHV immediate-early (IE) transactivator genes, the central regions of the PLHV genomes were characterized by genome walking, sequence and splicing analysis. Three spliced genes were identified (ORF50, ORFA6/BZLF1h, ORF57) encoding putative IE transactivators, homologous to (i) ORF50 and BRLF1/Rta, (ii) K8/K-bZIP and BZLF1/Zta and (iii) ORF57 and BMLF1 of HHV-8 and EBV, respectively. Expressed as myc-tag or HA-tag fusion proteins, they were localized to the cellular nucleus. In reporter gene assays, several PLHV promoters were mainly activated by PLHV-1 ORF50, to a lower level by PLHV-1 ORFA6/BZLF1h and not by PLHV-1 ORF57. However, the ORF57-encoded protein acted synergistically on ORF50-mediated activation

  7. A 10 Gbit/s OCDMA system based on electric encoding and optical transmission

    Science.gov (United States)

    Li, Chuan-qi; Hu, Jin-lin; He, Dong-dong; Chen, Mei-juan; Wang, Da-chi; Chen, Yan

    2013-11-01

    An electrically encoded, optically transmitted code division multiple access (CDMA) system is proposed. It encodes the user signal in the electrical domain and transmits the different code-slice signals on different wavelengths of light. This electrical-domain encoder/decoder is compared with conventional encoders/decoders. A four-user optical CDMA (OCDMA) modulation/demodulation system with a rate of 2.5 Gbit/s per user is simulated, based on the optical orthogonal code (OCC) designed in our laboratory. The results show that the electric-encoding/optical-transmission structure encodes and decodes signals correctly and achieves a chip rate equal to the user data rate. It can overcome the rate limitation of the electronic bottleneck and has potential applications in electro-optical OCDMA systems.
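
    The encode/decode principle described in this record can be illustrated with a minimal sketch. It is not the authors' system: the two codewords below are a textbook-style (13,3,1) optical orthogonal code chosen for illustration, only two users are modelled rather than four, and all function names are hypothetical. Each 13-slot frame can be read as 13 wavelength channels sent in parallel during one bit period, which is one way to picture a chip rate equal to the data rate.

```python
# Illustrative assumption: two codewords of a (13,3,1) optical orthogonal code,
# written as the slot positions that carry a pulse when the data bit is 1.
CODE_LENGTH = 13
WEIGHT = 3
CODEWORDS = {"user_a": (0, 1, 4), "user_b": (0, 2, 7)}


def encode(bits, positions):
    """Spread each data bit over CODE_LENGTH slots using on-off keying."""
    slots = []
    for bit in bits:
        frame = [0] * CODE_LENGTH
        if bit:
            for p in positions:
                frame[p] = 1
        slots.extend(frame)
    return slots


def combine(*streams):
    """Superimpose the users' streams; each value is the pulse count in a slot."""
    return [sum(column) for column in zip(*streams)]


def decode(channel, positions):
    """Recover one user's bits: a 1 is declared only if all WEIGHT slots are lit."""
    bits = []
    for start in range(0, len(channel), CODE_LENGTH):
        frame = channel[start:start + CODE_LENGTH]
        hits = sum(1 for p in positions if frame[p] > 0)
        bits.append(1 if hits == WEIGHT else 0)
    return bits


data_a = [1, 0, 1, 1]
data_b = [1, 1, 0, 1]
channel = combine(encode(data_a, CODEWORDS["user_a"]),
                  encode(data_b, CODEWORDS["user_b"]))
print(decode(channel, CODEWORDS["user_a"]), decode(channel, CODEWORDS["user_b"]))
```

    Because the two codewords share only one slot, an interfering user can light at most one of the three slots a receiver checks, so the all-slots-lit rule separates the users in this synchronous toy setting.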

  8. The complete plastid genomes of the two 'dinotoms' Durinskia baltica and Kryptoperidinium foliaceum.

    Directory of Open Access Journals (Sweden)

    Behzad Imanian

    2010-05-01

    In one small group of dinoflagellates, photosynthesis is carried out by a tertiary endosymbiont derived from a diatom, giving rise to a complex cell that we collectively refer to as a 'dinotom'. The endosymbiont is separated from its host by a single membrane and retains plastids, mitochondria, a large nucleus, and many other eukaryotic organelles and structures, a level of complexity suggesting an early stage of integration. Although the evolution of these endosymbionts has attracted considerable interest, the plastid genome has not been examined in detail, and indeed no tertiary plastid genome has yet been sequenced. Here we describe the complete plastid genomes of two closely related dinotoms, Durinskia baltica and Kryptoperidinium foliaceum. The D. baltica (116470 bp) and K. foliaceum (140426 bp) plastid genomes map as circular molecules featuring two large inverted repeats that separate distinct single copy regions. The organization and gene content of the D. baltica plastid closely resemble those of the pennate diatom Phaeodactylum tricornutum. The K. foliaceum plastid genome is much larger, has undergone more reorganization, and encodes a putative tyrosine recombinase (tyrC) also found in the plastid genome of the heterokont Heterosigma akashiwo, and two putative serine recombinases (serC1 and serC2) homologous to recombinases encoded by plasmids pCf1 and pCf2 in another pennate diatom, Cylindrotheca fusiformis. The K. foliaceum plastid genome also contains an additional copy of serC1, two degenerate copies of another plasmid-encoded ORF, and two non-coding regions whose sequences closely resemble portions of the pCf1 and pCf2 plasmids. These results suggest that while the plastid genomes of two dinotoms share very similar gene content and genome organization with that of the free-living pennate diatom P. tricornutum, the K. foliaceum plastid genome has absorbed two exogenous plasmids. Whether this took place before or after the tertiary

  9. The Path to Enlightenment: Making Sense of Genomic and Proteomic Information

    OpenAIRE

    Maurer, Martin H.

    2016-01-01

    Whereas genomics describes the study of genome, mainly represented by its gene expression on the DNA or RNA level, the term proteomics denotes the study of the proteome, which is the protein complement encoded by the genome. In recent years, the number of proteomic experiments increased tremendously. While all fields of proteomics have made major technological advances, the biggest step was seen in bioinformatics. Biological information management relies on sequence and structure databases an...

  10. SIGMA: A System for Integrative Genomic Microarray Analysis of Cancer Genomes

    Directory of Open Access Journals (Sweden)

    Davies Jonathan J

    2006-12-01

    Background: The prevalence of high resolution profiling of genomes has created a need for the integrative analysis of information generated from multiple methodologies and platforms. Although the majority of data in the public domain are gene expression profiles, and expression analysis software is available, the increase of array CGH studies has enabled integration of high throughput genomic and gene expression datasets. However, tools for direct mining and analysis of array CGH data are limited. Hence, there is a great need for analytical and display software tailored to cross platform integrative analysis of cancer genomes. Results: We have created a user-friendly Java application to facilitate sophisticated visualization and analysis such as cross-tumor and cross-platform comparisons. To demonstrate the utility of this software, we assembled array CGH data representing Affymetrix SNP chip, Stanford cDNA arrays and whole genome tiling path array platforms for cross comparison. This cancer genome database contains 267 profiles from commonly used cancer cell lines representing 14 different tissue types. Conclusion: In this study we have developed an application for the visualization and analysis of data from high resolution array CGH platforms that can be adapted for analysis of multiple types of high throughput genomic datasets. Furthermore, we invite researchers using array CGH technology to deposit both their raw and processed data, as this will be a continually expanding database of cancer genomes. This publicly available resource, the System for Integrative Genomic Microarray Analysis (SIGMA) of cancer genomes, can be accessed at http://sigma.bccrc.ca.

  11. A Thousand Fly Genomes: An Expanded Drosophila Genome Nexus.

    Science.gov (United States)

    Lack, Justin B; Lange, Jeremy D; Tang, Alison D; Corbett-Detig, Russell B; Pool, John E

    2016-12-01

    The Drosophila Genome Nexus is a population genomic resource that provides D. melanogaster genomes from multiple sources. To facilitate comparisons across data sets, genomes are aligned using a common reference alignment pipeline which involves two rounds of mapping. Regions of residual heterozygosity, identity-by-descent, and recent population admixture are annotated to enable data filtering based on the user's needs. Here, we present a significant expansion of the Drosophila Genome Nexus, which brings the current data object to a total of 1,121 wild-derived genomes. New additions include 305 previously unpublished genomes from inbred lines representing six population samples in Egypt, Ethiopia, France, and South Africa, along with another 193 genomes added from recently published data sets. We also provide an aligned D. simulans genome to facilitate divergence comparisons. This improved resource will broaden the range of population genomic questions that can be addressed from multi-population allele frequencies and haplotypes in this model species. The larger set of genomes will also enhance the discovery of functionally relevant natural variation that exists within and between populations. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  12. Genome update: the 1000th genome - a cautionary tale

    DEFF Research Database (Denmark)

    Lagesen, Karin; Ussery, David; Wassenaar, Gertrude Maria

    2010-01-01

    There are now more than 1000 sequenced prokaryotic genomes deposited in public databases and available for analysis. Currently, although the sequence databases GenBank, DNA Database of Japan and EMBL are synchronized continually, there are slight differences in content at the genomes level for a variety of logistical reasons, including differences in format and loading errors, such as those caused by file transfer protocol interruptions. This means that the 1000th genome will be different in the various databases. Some of the data on the highly accessed web pages are inaccurate, leading to false conclusions, for example about the largest bacterial genome sequenced. Biological diversity is far greater than many have thought. For example, analysis of multiple Escherichia coli genomes has led to an estimate of around 45 000 gene families, more genes than are recognized in the human genome. Moreover...

  13. Whole-Genome Analysis of the Methyl tert-Butyl Ether-Degrading Beta-Proteobacterium Methylibium petroleiphilum PM1

    Science.gov (United States)

    Kane, Staci R.; Chakicherla, Anu Y.; Chain, Patrick S. G.; Schmidt, Radomir; Shin, Maria W.; Legler, Tina C.; Scow, Kate M.; Larimer, Frank W.; Lucas, Susan M.; Richardson, Paul M.; Hristova, Krassimira R.

    2007-01-01

    Methylibium petroleiphilum PM1 is a methylotroph distinguished by its ability to completely metabolize the fuel oxygenate methyl tert-butyl ether (MTBE). Strain PM1 also degrades aromatic (benzene, toluene, and xylene) and straight-chain (C5 to C12) hydrocarbons present in petroleum products. Whole-genome analysis of PM1 revealed an ∼4-Mb circular chromosome and an ∼600-kb megaplasmid, containing 3,831 and 646 genes, respectively. Aromatic hydrocarbon and alkane degradation, metal resistance, and methylotrophy are encoded on the chromosome. The megaplasmid contains an unusual t-RNA island, numerous insertion sequences, and large repeated elements, including a 40-kb region also present on the chromosome and a 29-kb tandem repeat encoding phosphonate transport and cobalamin biosynthesis. The megaplasmid also codes for alkane degradation and was shown to play an essential role in MTBE degradation through plasmid-curing experiments. Discrepancies between the insertion sequence element distribution patterns, the distributions of best BLASTP hits among major phylogenetic groups, and the G+C contents of the chromosome (69.2%) and plasmid (66%), together with comparative genome hybridization experiments, suggest that the plasmid was recently acquired and apparently carries the genetic information responsible for PM1's ability to degrade MTBE. Comparative genomic hybridization analysis with two PM1-like MTBE-degrading environmental isolates (∼99% identical 16S rRNA gene sequences) showed that the plasmid was highly conserved (ca. 99% identical), whereas the chromosomes were too diverse to conduct resequencing analysis. PM1's genome sequence provides a foundation for investigating MTBE biodegradation and exploring the genetic regulation of multiple biodegradation pathways in M. petroleiphilum and other MTBE-degrading beta-proteobacteria. PMID:17158667

  14. Nuclear scaffold attachment sites within ENCODE regions associate with actively transcribed genes.

    Directory of Open Access Journals (Sweden)

    Mignon A Keaton

    2011-03-01

    The human genome must be packaged and organized in a functional manner for the regulation of DNA replication and transcription. The nuclear scaffold/matrix, consisting of structural and functional nuclear proteins, remains after extraction of nuclei and anchors loops of DNA. In the search for cis-elements functioning as chromatin domain boundaries, we identified 453 nuclear scaffold attachment sites purified by lithium 3,5-diiodosalicylate extraction of HeLa nuclei across 30 Mb of the human genome studied by the ENCODE pilot project. The scaffold attachment sites mapped predominantly near expressed genes and localized near transcription start sites and the ends of genes but not to boundary elements. In addition, these regions were enriched for RNA polymerase II and transcription factor binding sites and were located in early replicating regions of the genome. We believe these sites correspond to genome interactions mediated by transcription factors and transcriptional machinery immobilized on a nuclear substructure.

  15. Comparative Genomic and Functional Analysis of 100 Lactobacillus rhamnosus Strains and Their Comparison with Strain GG

    Science.gov (United States)

    Pietilä, Taija E.; Järvinen, Hanna M.; Messing, Marcel; Randazzo, Cinzia L.; Paulin, Lars; Laine, Pia; Ritari, Jarmo; Caggia, Cinzia; Lähteinen, Tanja; Brouns, Stan J. J.; Satokari, Reetta; von Ossowski, Ingemar; Reunanen, Justus; Palva, Airi; de Vos, Willem M.

    2013-01-01

    Lactobacillus rhamnosus is a lactic acid bacterium that is found in a large variety of ecological habitats, including artisanal and industrial dairy products, the oral cavity, intestinal tract or vagina. To gain insights into the genetic complexity and ecological versatility of the species L. rhamnosus, we examined the genomes and phenotypes of 100 L. rhamnosus strains isolated from diverse sources. The genomes of 100 L. rhamnosus strains were mapped onto the L. rhamnosus GG reference genome. These strains were phenotypically characterized for a wide range of metabolic, antagonistic, signalling and functional properties. Phylogenomic analysis showed multiple groupings of the species that could partly be associated with their ecological niches. We identified 17 highly variable regions that encode functions related to lifestyle, i.e. carbohydrate transport and metabolism, production of mucus-binding pili, bile salt resistance, prophages and CRISPR adaptive immunity. Integration of the phenotypic and genomic data revealed that some L. rhamnosus strains possibly resided in multiple niches, illustrating the dynamics of bacterial habitats. The present study showed two distinctive geno-phenotypes in the L. rhamnosus species. The geno-phenotype A suggests an adaptation to stable nutrient-rich niches, i.e. milk-derivative products, reflected by the alteration or loss of biological functions associated with antimicrobial activity spectrum, stress resistance, adaptability and fitness to a distinctive range of habitats. In contrast, the geno-phenotype B displays adequate traits to a variable environment, such as the intestinal tract, in terms of nutrient resources, bacterial population density and host effects. PMID:23966868

  16. Comparative genomic and functional analysis of 100 Lactobacillus rhamnosus strains and their comparison with strain GG.

    Directory of Open Access Journals (Sweden)

    François P Douillard

    Lactobacillus rhamnosus is a lactic acid bacterium that is found in a large variety of ecological habitats, including artisanal and industrial dairy products, the oral cavity, intestinal tract or vagina. To gain insights into the genetic complexity and ecological versatility of the species L. rhamnosus, we examined the genomes and phenotypes of 100 L. rhamnosus strains isolated from diverse sources. The genomes of 100 L. rhamnosus strains were mapped onto the L. rhamnosus GG reference genome. These strains were phenotypically characterized for a wide range of metabolic, antagonistic, signalling and functional properties. Phylogenomic analysis showed multiple groupings of the species that could partly be associated with their ecological niches. We identified 17 highly variable regions that encode functions related to lifestyle, i.e. carbohydrate transport and metabolism, production of mucus-binding pili, bile salt resistance, prophages and CRISPR adaptive immunity. Integration of the phenotypic and genomic data revealed that some L. rhamnosus strains possibly resided in multiple niches, illustrating the dynamics of bacterial habitats. The present study showed two distinctive geno-phenotypes in the L. rhamnosus species. The geno-phenotype A suggests an adaptation to stable nutrient-rich niches, i.e. milk-derivative products, reflected by the alteration or loss of biological functions associated with antimicrobial activity spectrum, stress resistance, adaptability and fitness to a distinctive range of habitats. In contrast, the geno-phenotype B displays adequate traits to a variable environment, such as the intestinal tract, in terms of nutrient resources, bacterial population density and host effects.

  17. Multiple-trait genetic evaluation using genomic matrix

    African Journals Online (AJOL)

    Jane

    2011-07-06

    Jul 6, 2011 ... relationships was estimated through computer simulation and was compared with the accuracy of ... programs, detect animals with superior genetic and select ... genomic matrices in the mixed model equations of BLUP.

  18. An electrophysiological investigation of memory encoding, depth of processing, and word frequency in humans.

    Science.gov (United States)

    Guo, Chunyan; Zhu, Ying; Ding, Jinhong; Fan, Silu; Paller, Ken A

    2004-02-12

    Memory encoding can be studied by monitoring brain activity correlated with subsequent remembering. To understand brain potentials associated with encoding, we compared multiple factors known to affect encoding. Depth of processing was manipulated by requiring subjects to detect animal names (deep encoding) or boldface (shallow encoding) in a series of Chinese words. Recognition was more accurate with deep than shallow encoding, and for low- compared to high-frequency words. Potentials were generally more positive for subsequently recognized versus forgotten words; for deep compared to shallow processing; and, for remembered words only, for low- than for high-frequency words. Latency and topographic differences between these potentials suggested that several factors influence the effectiveness of encoding and can be distinguished using these methods, even with Chinese logographic symbols.

  19. Evolution of small prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    David José Martínez-Cano

    2015-01-01

    As revealed by genome sequencing, the biology of prokaryotes with reduced genomes is strikingly diverse. These include free-living prokaryotes with ~800 genes as well as endosymbiotic bacteria with as few as ~140 genes. Comparative genomics is revealing the evolutionary mechanisms that led to these small genomes. In the case of free-living prokaryotes, natural selection directly favored genome reduction, while in the case of endosymbiotic prokaryotes neutral processes played a more prominent role. However, new experimental data suggest that selective processes may be in operation as well for endosymbiotic prokaryotes, at least during the first stages of genome reduction. Endosymbiotic prokaryotes have evolved diverse strategies for living with reduced gene sets inside a host-defined medium. These include utilization of host-encoded functions (some of them coded by genes acquired by gene transfer from the endosymbiont and/or other bacteria); metabolic complementation between co-symbionts; and forming consortia with other bacteria within the host. Recent genome sequencing projects of intracellular mutualistic bacteria showed that previously believed universal evolutionary trends like reduced G+C content and conservation of genome synteny are not always present in highly reduced genomes. Finally, the simplified molecular machinery of some of these organisms with small genomes may be used to aid in the design of artificial minimal cells. Here we review recent genomic discoveries of the biology of prokaryotes endowed with small gene sets and discuss the evolutionary mechanisms that have been proposed to explain their peculiar nature.

  20. Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations.

    Science.gov (United States)

    Shi, Hongbo; Zhang, Guangde; Zhou, Meng; Cheng, Liang; Yang, Haixiu; Wang, Jing; Sun, Jie; Wang, Zhenzhen

    2016-01-01

    MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, individual sources of genomic data tend to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated on experimentally verified miRNA-disease associations, achieving an area under the ROC curve (AUC) of 0.834 in 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD.
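
    The evaluation metric quoted in this record (AUC under 5-fold cross-validation) can be sketched generically. The code below is not CHNmiRD and uses none of its data; the function names and the `fit_and_score` callback are hypothetical. It computes AUC through the rank-sum identity (the fraction of positive/negative pairs ranked correctly, ties counting one half) and pools held-out scores across folds.

```python
def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ordered pos/neg pairs."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(positives) * len(negatives))


def k_fold_splits(n_items, k=5):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    for fold in range(k):
        test = list(range(fold, n_items, k))
        held_out = set(test)
        train = [i for i in range(n_items) if i not in held_out]
        yield train, test


def cross_validated_auc(fit_and_score, labels, k=5):
    """Pool held-out scores over k folds, then compute a single AUC.

    fit_and_score(train, test) is a hypothetical callback that trains on the
    train indices and returns one score per test index.
    """
    scores = [0.0] * len(labels)
    for train, test in k_fold_splits(len(labels), k):
        for index, score in zip(test, fit_and_score(train, test)):
            scores[index] = score
    return auc(scores, labels)
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for reading a reported value such as 0.834.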

  1. Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations.

    Directory of Open Access Journals (Sweden)

    Hongbo Shi

    MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, individual sources of genomic data tend to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated on experimentally verified miRNA-disease associations, achieving an area under the ROC curve (AUC) of 0.834 in 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD.

  2. Molecular characterization of the genome of Maize rayado fino virus, the type member of the genus Marafivirus.

    Science.gov (United States)

    Hammond, R W; Ramirez, P

    2001-04-10

    The complete nucleotide sequence of the single-stranded RNA genome of Maize rayado fino virus (MRFV), the type member of the genus Marafivirus, is 6305 nucleotides (nts) in length and contains two putative open reading frames (ORFs). The largest ORF (nt 97-6180) encodes a polyprotein of 224 kDa with sequence similarities at its N-terminus to the replication-associated proteins of other viruses with positive-strand RNA genomes and to the papain-like protease domain found in tymoviruses. The C-terminus of the 224-kDa ORF also encodes the MRFV capsid protein. A smaller, overlapping ORF (nt 302-1561) encodes a putative protein of 43 kDa with unknown function but with limited sequence similarities to putative movement proteins of tymoviruses. The nucleotide sequence and proposed genome expression strategy of MRFV are most closely related to those of oat blue dwarf virus (OBDV). Unlike OBDV, MRFV RNA does not appear to contain a poly(A) tail, and it encodes a putative second overlapping open reading frame.
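
    The ORF coordinates in this record lend themselves to a quick arithmetic check. The sketch below makes two assumptions that the record does not state: the coordinates are 1-based and inclusive with the stop codon included, and an average residue mass of about 0.110 kDa is used as a rule of thumb. The function name is hypothetical.

```python
def orf_summary(start, end, avg_residue_kda=0.110):
    """Summarize an ORF given 1-based inclusive coordinates (stop codon assumed included)."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    residues = length_nt // 3 - 1  # drop the assumed stop codon
    return {
        "length_nt": length_nt,
        "residues": residues,
        "frame": (start - 1) % 3,  # reading frame relative to genome position 1
        "approx_kda": residues * avg_residue_kda,  # rough rule-of-thumb mass
    }


large = orf_summary(97, 6180)
small = orf_summary(302, 1561)
for name, orf in (("large ORF", large), ("small ORF", small)):
    print(name, orf["length_nt"], "nt,", orf["residues"], "aa, frame", orf["frame"],
          "about", round(orf["approx_kda"]), "kDa")
```

    Under these assumptions the large ORF comes out near the reported 224 kDa, and the two ORFs fall in different reading frames, as expected for overlapping genes. The rule of thumb gives a somewhat higher figure than the reported 43 kDa for the small ORF, which is unsurprising because the true mass depends on amino acid composition.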

  3. Genome diversity and divergence in Drosophila mauritiana: multiple signatures of faster X evolution.

    Science.gov (United States)

    Garrigan, Daniel; Kingan, Sarah B; Geneva, Anthony J; Vedanayagam, Jeffrey P; Presgraves, Daven C

    2014-09-04

    Drosophila mauritiana is an Indian Ocean island endemic species that diverged from its two sister species, Drosophila simulans and Drosophila sechellia, approximately 240,000 years ago. Multiple forms of incomplete reproductive isolation have evolved among these species, including sexual, gametic, ecological, and intrinsic postzygotic barriers, with crosses among all three species conforming to Haldane's rule: F(1) hybrid males are sterile and F(1) hybrid females are fertile. Extensive genetic resources and the fertility of hybrid females have made D. mauritiana, in particular, an important model for speciation genetics. Analyses between D. mauritiana and both of its siblings have shown that the X chromosome makes a disproportionate contribution to hybrid male sterility. But why the X plays a special role in the evolution of hybrid sterility in these, and other, species remains an unsolved problem. To complement functional genetic analyses, we have investigated the population genomics of D. mauritiana, giving special attention to differences between the X and the autosomes. We present a de novo genome assembly of D. mauritiana annotated with RNAseq data and a whole-genome analysis of polymorphism and divergence from ten individuals. Our analyses show that, relative to the autosomes, the X chromosome has reduced nucleotide diversity but elevated nucleotide divergence; an excess of recurrent adaptive evolution at its protein-coding genes; an excess of recent, strong selective sweeps; and a large excess of satellite DNA. Interestingly, one of two centimorgan-scale selective sweeps on the D. mauritiana X chromosome spans a region containing two sex-ratio meiotic drive elements and a high concentration of satellite DNA. Furthermore, genes with roles in reproduction and chromosome biology are enriched among genes that have histories of recurrent adaptive protein evolution. Together, these genome-wide analyses suggest that genetic conflict and frequent positive natural

  4. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea

    OpenAIRE

    Wolf Yuri I; Novichkov Pavel S; Sorokin Alexander V; Makarova Kira S; Koonin Eugene V

    2007-01-01

    Background: An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs ...

  5. On the representability of complete genomes by multiple competing finite-context (Markov) models.

    Directory of Open Access Journals (Sweden)

    Armando J Pinho

    A finite-context (Markov) model of order k yields the probability distribution of the next symbol in a sequence of symbols, given the recent past up to depth k. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been extensively used to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study about the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders, (ii) careful programming techniques that allow orders as large as sixteen, (iii) adequate inverted repeat handling, and (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence we use the negative logarithm of the probability estimate at that position. The measure yields information profiles of the sequence, which are of independent interest. The average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well or even better than state-of-the-art DNA compression methods, such as XM, which rely on very different statistical models. This is surprising, because Markov models are local (short-range), contrasting with the statistical models underlying other methods, where the extensive repetitions in DNA sequences are explored, and which therefore have a non-local character.
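
    The per-position measure described in this record (negative log probability under an order-k model, averaged into bits per base) can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the class and function names are hypothetical, the estimator is plain additive smoothing with an assumed parameter alpha, inverted repeats are ignored, and the best model at each position is picked with hindsight, which gives an optimistic figure because a real coder would have to signal its choice. Input is assumed to contain only A, C, G and T.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class FiniteContextModel:
    """Order-k model with counts learned adaptively along the sequence."""

    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha  # additive smoothing parameter (an assumption)
        self.counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))

    def probability(self, context, symbol):
        seen = self.counts[context]
        total = sum(seen.values())
        return (seen[symbol] + self.alpha) / (total + self.alpha * len(ALPHABET))

    def update(self, context, symbol):
        self.counts[context][symbol] += 1


def information_profile(sequence, orders=(1, 2, 4), alpha=1.0):
    """Bits needed at each position by the best of several competing models."""
    models = [FiniteContextModel(k, alpha) for k in orders]
    profile = []
    for i, symbol in enumerate(sequence):
        best_bits = None
        for model in models:
            context = sequence[max(0, i - model.order):i]
            bits = -math.log2(model.probability(context, symbol))
            if best_bits is None or bits < best_bits:
                best_bits = bits
            model.update(context, symbol)  # learn only after predicting
        profile.append(best_bits)
    return profile


def average_bits_per_base(sequence, orders=(1, 2, 4), alpha=1.0):
    profile = information_profile(sequence, orders, alpha)
    return sum(profile) / len(profile)
```

    With no prior counts every model assigns probability 1/4, so the first base costs exactly 2 bits; on a highly repetitive sequence the average then drops well below 2 bits per base as the counts accumulate.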

  6. Genomic sequences of murine gamma B- and gamma C-crystallin-encoding genes: promoter analysis and complete evolutionary pattern of mouse, rat and human gamma-crystallins.

    Science.gov (United States)

    Graw, J; Liebstein, A; Pietrowski, D; Schmitt-John, T; Werner, T

    1993-12-22

    The murine genes, gamma B-cry and gamma C-cry, encoding the gamma B- and gamma C-crystallins, were isolated from a genomic DNA library. The complete nucleotide (nt) sequences of both genes were determined from 661 and 711 bp, respectively, upstream from the first exon to the corresponding polyadenylation sites, comprising more than 2650 and 2890 bp, respectively. The new sequences were compared to the partial cDNA sequences available for the murine gamma B-cry and gamma C-cry, as well as to the corresponding genomic sequences from rat and man, at both the nt and predicted amino acid (aa) sequence levels. In the gamma B-cry promoter region, a canonical CCAAT-box, a TATA-box, putative NF-I and C/EBP sites were detected. An R-repeat is inserted 366 bp upstream from the transcription start point. In contrast, the gamma C-cry promoter does not contain a CCAAT-box, but some other putative binding sites for transcription factors (AP-2, UBP-1, LBP-1) were located by computer analysis. The promoter regions of all six gamma-cry from mouse, rat and human, except human psi gamma F-cry, were analyzed for common sequence elements. A complex sequence element of about 70-80 bp was found in the proximal promoter, which contains a gamma-cry-specific and almost invariant sequence (crygpel) of 14 nt, and ends with the also invariant TATA-box. Within the complex sequence element, a minimum of three further features specific for the gamma A-, gamma B- and gamma D/E/F-cry genes can be defined, at least two of which were recently shown to be functional. In addition to these four sequence elements, a subtype-specific structure of inverted repeats with different-sized spacers can be deduced from the multiple sequence alignment. 
    A phylogenetic analysis based on the promoter region, as well as the complete exon 3 of all gamma-cry from mouse, rat and man, suggests separation of only five gamma-cry subtypes (gamma A-, gamma B-, gamma C-, gamma D- and gamma E/F-cry) prior to species separation.

  7. Genome evolution in an ancient bacteria-ant symbiosis: parallel gene loss among Blochmannia spanning the origin of the ant tribe Camponotini

    Directory of Open Access Journals (Sweden)

    Laura E. Williams

    2015-04-01

    Stable associations between bacterial endosymbionts and insect hosts provide opportunities to explore genome evolution in the context of established mutualisms and assess the roles of selection and genetic drift across host lineages and habitats. Blochmannia, obligate endosymbionts of ants of the tribe Camponotini, have coevolved with their ant hosts for ∼40 MY. To investigate early events in Blochmannia genome evolution across this ant host tribe, we sequenced Blochmannia from two divergent host lineages, Colobopsis obliquus and Polyrhachis turneri, and compared them with four published genomes from Blochmannia of Camponotus sensu stricto. Reconstructed gene content of the last common ancestor (LCA) of these six Blochmannia genomes is reduced (690 protein-coding genes), consistent with rapid gene loss soon after establishment of the symbiosis. Differential gene loss among Blochmannia lineages has affected cellular functions and metabolic pathways, including DNA replication and repair, vitamin biosynthesis and membrane proteins. Blochmannia of P. turneri (i.e., B. turneri) encodes an intact DnaA chromosomal replication initiation protein, demonstrating that loss of dnaA was not essential for establishment of the symbiosis. Based on gene content, B. obliquus and B. turneri are unable to provision hosts with riboflavin. Of the six sequenced Blochmannia, B. obliquus is the earliest diverging lineage (i.e., the sister group of other Blochmannia sampled) and encodes the fewest protein-coding genes and the most pseudogenes. We identified 55 genes involved in parallel gene loss, including glutamine synthetase, which may participate in nitrogen recycling. Pathways for biosynthesis of coenzyme A, terpenoids and riboflavin were lost in multiple lineages, suggesting relaxed selection on the pathway after inactivation of one component. Analysis of Illumina read datasets did not detect evidence of plasmids encoding missing functions, nor the presence of

  8. The Sequenced Angiosperm Genomes and Genome Databases.

    Science.gov (United States)

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of humans, animals, and the planet Earth. Despite the numerous advances in genome reports or sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in database reconstruction in the last few years, here we provide a comprehensive review of three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular with specific groups of researchers, while a timely-updated comprehensive database is more powerful for addressing major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote collaborative studies of important questions in plant biology.

  9. Preparation of genomic DNA from a single species of uncultured magnetotactic bacterium by multiple-displacement amplification.

    Science.gov (United States)

    Arakaki, Atsushi; Shibusawa, Mie; Hosokawa, Masahito; Matsunaga, Tadashi

    2010-03-01

    Magnetotactic bacteria comprise a phylogenetically diverse group that is capable of synthesizing intracellular magnetic particles. Although various morphotypes of magnetotactic bacteria have been observed in the environment, bacterial strains available in pure culture are currently limited to a few genera due to difficulties in their enrichment and cultivation. In order to obtain genetic information from uncultured magnetotactic bacteria, a genome preparation method that involves magnetic separation of cells, flow cytometry, and multiple displacement amplification (MDA) using phi29 polymerase was used in this study. The conditions for the MDA reaction using samples containing 1 to 100 cells were evaluated using a pure-culture magnetotactic bacterium, "Magnetospirillum magneticum AMB-1," whose complete genome sequence is available. Uniform gene amplification was confirmed by quantitative PCR (Q-PCR) when 100 cells were used as a template. This method was then applied for genome preparation of uncultured magnetotactic bacteria from complex bacterial communities in an aquatic environment. A sample containing 100 cells of the uncultured magnetotactic coccus was prepared by magnetic cell separation and flow cytometry and used as an MDA template. 16S rRNA sequence analysis of the MDA product from these 100 cells revealed that the amplified genomic DNA was from a single species of magnetotactic bacterium that was phylogenetically affiliated with magnetotactic cocci in the Alphaproteobacteria. The combined use of magnetic separation, flow cytometry, and MDA provides a new strategy to access individual genetic information from magnetotactic bacteria in environmental samples.

  10. Complete mitochondrial genome of threatened mahseer Tor tor ...

    Indian Academy of Sciences (India)

    A.

    In the present study, complete mitochondrial genome of Tor tor has been sequenced .... Most of the genes were encoded on the heavy strand (H- strand), whereas only .... 4 bp in the DHU stem (figure 5 in electronic supplementary material).

  11. The FUN of identifying gene function in bacterial pathogens; insights from Salmonella functional genomics.

    Science.gov (United States)

    Hammarlöf, Disa L; Canals, Rocío; Hinton, Jay C D

    2013-10-01

    The availability of thousands of genome sequences of bacterial pathogens poses a particular challenge because each genome contains hundreds of genes of unknown function (FUN). How can we easily discover which FUN genes encode important virulence factors? One solution is to combine two different functional genomic approaches. First, transcriptomics identifies bacterial FUN genes that show differential expression during the process of mammalian infection. Second, global mutagenesis identifies individual FUN genes that the pathogen requires to cause disease. The intersection of these datasets can reveal a small set of candidate genes most likely to encode novel virulence attributes. We demonstrate this approach with the Salmonella infection model, and propose that a similar strategy could be used for other bacterial pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. Gene expansion shapes genome architecture in the human pathogen Lichtheimia corymbifera: an evolutionary genomics analysis in the ancient terrestrial mucorales (Mucoromycotina).

    Directory of Open Access Journals (Sweden)

    Volker U Schwartze

    2014-08-01

    Lichtheimia species are the second most important cause of mucormycosis in Europe. To provide broader insights into the molecular basis of the pathogenicity-associated traits of the basal Mucorales, we report the full genome sequence of L. corymbifera and compared it to the genome of Rhizopus oryzae, the most common cause of mucormycosis worldwide. The genome assembly encompasses 33.6 Mb and 12,379 protein-coding genes. This study reveals four major differences between the L. corymbifera genome and that of R. oryzae: (i) the presence of a highly elevated number of gene duplications which, unlike in R. oryzae, are not due to whole genome duplication (WGD), (ii) despite the relatively high incidence of introns, alternative splicing (AS) is not frequently observed for the generation of paralogs and in response to stress, (iii) the content of repetitive elements is strikingly low (<5%), (iv) L. corymbifera is typically haploid. Novel virulence factors were identified which may be involved in the regulation of the adaptation to iron-limitation, e.g. LCor01340.1 encoding a putative siderophore transporter and LCor00410.1 involved in the siderophore metabolism. Genes encoding the transcription factors LCor08192.1 and LCor01236.1, which are similar to GATA type regulators and to calcineurin regulated CRZ1, respectively, indicate an involvement of the calcineurin pathway in the adaptation to iron limitation. Genes encoding MADS-box transcription factors are elevated up to 11 copies compared to the 1-4 copies usually found in other fungi. Further findings are: (i) a lower content of tRNAs, but unique codons in L. corymbifera; (ii) over 25% of the proteins are apparently specific for L. corymbifera; (iii) L. corymbifera contains only 2/3 of the proteases (known to be essential virulence factors) in comparison to R. oryzae. On the other hand, the number of secreted proteases is roughly twice as high as in R. oryzae.

  13. Morphology and genome organization of the virus PSV of the hyperthermophilic archaeal genera Pyrobaculum and Thermoproteus: a novel virus family, the Globuloviridae.

    Science.gov (United States)

    Häring, Monika; Peng, Xu; Brügger, Kim; Rachel, Reinhard; Stetter, Karl O; Garrett, Roger A; Prangishvili, David

    2004-06-01

    A novel virus, termed Pyrobaculum spherical virus (PSV), is described that infects anaerobic hyperthermophilic archaea of the genera Pyrobaculum and Thermoproteus. Spherical enveloped virions, about 100 nm in diameter, contain a major multimeric 33-kDa protein and host-derived lipids. A viral envelope encases a superhelical nucleoprotein core containing linear double-stranded DNA. The PSV infection cycle does not cause lysis of host cells. The viral genome was sequenced and contains 28337 bp. The genome is unique for known archaeal viruses in that none of the genes, including that encoding the major structural protein, show any significant sequence matches to genes in public sequence databases. Exceptionally for an archaeal double-stranded DNA virus, almost all the recognizable genes are located on one DNA strand. The ends of the genome consist of 190-bp inverted repeats that contain multiple copies of short direct repeats. The two DNA strands are probably covalently linked at their termini. On the basis of the unusual morphological and genomic properties of this DNA virus, we propose to assign PSV to a new viral family, the Globuloviridae.

  14. The Genome of the Western Clawed Frog Xenopus tropicalis

    Energy Technology Data Exchange (ETDEWEB)

    Hellsten, Uffe; Harland, Richard M.; Gilchrist, Michael J.; Hendrix, David; Jurka, Jerzy; Kapitonov, Vladimir; Ovcharenko, Ivan; Putnam, Nicholas H.; Shu, Shengqiang; Taher, Leila; Blitz, Ira L.; Blumberg, Bruce; Dichmann, Darwin S.; Dubchak, Inna; Amaya, Enrique; Detter, John C.; Fletcher, Russell; Gerhard, Daniela S.; Goodstein, David; Graves, Tina; Grigoriev, Igor V.; Grimwood, Jane; Kawashima, Takeshi; Lindquist, Erika; Lucas, Susan M.; Mead, Paul E.; Mitros, Therese; Ogino, Hajime; Ohta, Yuko; Poliakov, Alexander V.; Pollet, Nicolas; Robert, Jacques; Salamov, Asaf; Sater, Amy K.; Schmutz, Jeremy; Terry, Astrid; Vize, Peter D.; Warren, Wesley C.; Wells, Dan; Wills, Andrea; Wilson, Richard K.; Zimmerman, Lyle B.; Zorn, Aaron M.; Grainger, Robert; Grammer, Timothy; Khokha, Mustafa K.; Richardson, Paul M.; Rokhsar, Daniel S.

    2009-10-01

    The western clawed frog Xenopus tropicalis is an important model for vertebrate development that combines experimental advantages of the African clawed frog Xenopus laevis with more tractable genetics. Here we present a draft genome sequence assembly of X. tropicalis. This genome encodes over 20,000 protein-coding genes, including orthologs of at least 1,700 human disease genes. Over a million expressed sequence tags validated the annotation. More than one-third of the genome consists of transposable elements, with unusually prevalent DNA transposons. Like other tetrapods, the genome contains gene deserts enriched for conserved non-coding elements. The genome exhibits remarkable shared synteny with human and chicken over major parts of large chromosomes, broken by lineage-specific chromosome fusions and fissions, mainly in the mammalian lineage.

  15. Security enhanced BioEncoding for protecting iris codes

    Science.gov (United States)

    Ouda, Osama; Tsumura, Norimichi; Nakaguchi, Toshiya

    2011-06-01

    Improving the security of biometric template protection techniques is a key prerequisite for the widespread deployment of biometric technologies. BioEncoding is a recently proposed template protection scheme, based on the concept of cancelable biometrics, for protecting biometric templates represented as binary strings such as iris codes. The main advantage of BioEncoding over other template protection schemes is that it does not require user-specific keys and/or tokens during verification. Besides, it satisfies all the requirements of the cancelable biometrics construct without deteriorating the matching accuracy. However, although it has been shown that BioEncoding is secure enough against simple brute-force search attacks, the security of BioEncoded templates against smarter attacks, such as record multiplicity attacks, has not been sufficiently investigated. In this paper, a rigorous security analysis of BioEncoding is presented. Firstly, resistance of BioEncoded templates against brute-force attacks is revisited thoroughly. Secondly, we show that although the cancelable transformation employed in BioEncoding might be non-invertible for a single protected template, the original iris code could be inverted by correlating several templates used in different applications but created from the same iris. Accordingly, we propose an important modification to the BioEncoding transformation process in order to hinder attackers from exploiting this type of attack. The effectiveness of adopting the suggested modification is validated and its impact on the matching accuracy is investigated empirically using the CASIA-IrisV3-Interval dataset. Experimental results confirm the efficacy of the proposed approach and show that it preserves the matching accuracy of the unprotected iris recognition system.

  16. Escherichia coli yjjPB genes encode a succinate transporter important for succinate production.

    Science.gov (United States)

    Fukui, Keita; Nanatani, Kei; Hara, Yoshihiko; Yamakami, Suguru; Yahagi, Daiki; Chinen, Akito; Tokura, Mitsunori; Abe, Keietsu

    2017-09-01

    Under anaerobic conditions, Escherichia coli produces succinate from glucose via the reductive tricarboxylic acid cycle. To date, however, no genes encoding succinate exporters have been established in E. coli. Therefore, we attempted to identify genes encoding succinate exporters by screening an E. coli MG1655 genome library. We identified the yjjPB genes as candidates encoding a succinate transporter, which enhanced succinate production in Pantoea ananatis under aerobic conditions. A complementation assay conducted in Corynebacterium glutamicum strain AJ110655ΔsucE1 demonstrated that both YjjP and YjjB are required for the restoration of succinate production. Furthermore, deletion of yjjPB decreased succinate production in E. coli by 70% under anaerobic conditions. Taken together, these results suggest that YjjPB constitutes a succinate transporter in E. coli and that the products of both genes are required for succinate export.

  17. Transposon domestication versus mutualism in ciliate genome rearrangements.

    Directory of Open Access Journals (Sweden)

    Alexander Vogt

    Ciliated protists rearrange their genomes dramatically during nuclear development via chromosome fragmentation and DNA deletion to produce a trimmer and highly reorganized somatic genome. The deleted portion of the genome includes potentially active transposons or transposon-like sequences that reside in the germline. Three independent studies recently showed that transposase proteins of the DDE/DDD superfamily are indispensable for DNA processing in three distantly related ciliates. In the spirotrich Oxytricha trifallax, high copy-number germline-limited transposons mediate their own excision from the somatic genome but also contribute to programmed genome rearrangement through a remarkable transposon mutualism with the host. By contrast, the genomes of two oligohymenophorean ciliates, Tetrahymena thermophila and Paramecium tetraurelia, encode homologous PiggyBac-like transposases as single-copy genes in both their germline and somatic genomes. These domesticated transposases are essential for deletion of thousands of different internal sequences in these species. This review contrasts the events underlying somatic genome reduction in three different ciliates and considers their evolutionary origins and the relationships among their distinct mechanisms for genome remodeling.

  18. mpscan: Fast Localisation of Multiple Reads in Genomes

    Science.gov (United States)

    Rivals, Eric; Salmela, Leena; Kiiskinen, Petteri; Kalsi, Petri; Tarhio, Jorma

    With Next Generation Sequencers, sequence based transcriptomic or epigenomic assays yield millions of short sequence reads that need to be mapped back on a reference genome. The upcoming versions of these sequencers promise even higher sequencing capacities; this may turn the read mapping task into a bottleneck for which alternative pattern matching approaches must be explored. We present an algorithm and its implementation, called mpscan, which uses a sophisticated filtration scheme to match a set of patterns/reads exactly on a sequence. mpscan can search for millions of reads in a single pass through the genome without indexing its sequence. Moreover, we show that mpscan offers an optimal average time complexity, which is sublinear in the text length, meaning that it does not need to examine all sequence positions. Comparisons with BLAT-like tools and with six specialised read mapping programs (like bowtie or zoom) demonstrate that mpscan is also the fastest algorithm in practice for exact matching. Our accuracy and scalability comparisons reveal that some tools are inappropriate for read mapping. Moreover, we provide evidence suggesting that exact matching may be a valuable solution in some read mapping applications. As most read mapping programs rely in some way on exact matching procedures to perform approximate pattern mapping, the filtration scheme we experimented with may prove useful in the design of future algorithms. The absence of a genome index gives mpscan its low memory requirement and a flexibility that lets it run on a desktop computer and avoids time-consuming genome preprocessing.
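    To make the task concrete, here is a minimal Python sketch of exact matching for a set of reads in a single pass over a genome that has not been indexed. It is not the mpscan algorithm: the abstract does not describe the filtration scheme, so this sketch simply hashes the reads and tests every window, which is linear rather than sublinear in the text length. The function name `find_reads` and the assumption that all reads share one length are illustrative only.

```python
from collections import defaultdict


def find_reads(genome: str, reads: list[str]) -> dict[str, list[int]]:
    """Return the 0-based start positions of every exact read occurrence.

    Assumption for this sketch: all reads have the same length, so a single
    sliding window over the genome is enough.
    """
    if not reads:
        return {}
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch assumes all reads have the same length")

    wanted = set(reads)          # hash the patterns, not the genome
    hits = defaultdict(list)
    # One pass over the genome; every window is checked (no filtration).
    for start in range(len(genome) - length + 1):
        window = genome[start:start + length]
        if window in wanted:
            hits[window].append(start)
    return dict(hits)


hits = find_reads("ACGTACGTTACG", ["ACGT", "TACG", "GGGG"])
print(hits)
```

    Only the read set is held in memory, which mirrors the low-memory, index-free setting the abstract describes; reads with no occurrence are simply absent from the result.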

  19. Complete genome sequence of thermophilic Bacillus smithii type strain DSM 4216T

    DEFF Research Database (Denmark)

    Bosma, Elleke Fenna; Koehorst, Jasper J.; van Hijum, Sacha A. F. T.

    2016-01-01

    determined the complete genomic sequence of the B. smithii type strain DSM 4216T, which consists of a 3,368,778 bp chromosome (GenBank accession number CP012024.1) and a 12,514 bp plasmid (GenBank accession number CP012025.1), together encoding 3880 genes. Genome annotation via RAST was complemented...

  20. One Year Genome Evolution of Lausannevirus in Allopatric versus Sympatric Conditions.

    Science.gov (United States)

    Mueller, Linda; Bertelli, Claire; Pillonel, Trestan; Salamin, Nicolas; Greub, Gilbert

    2017-06-01

    Amoeba-resisting microorganisms have raised great interest during the last decade. Among them, some large DNA viruses present huge genomes up to 2.5 Mb long, exceeding the size of small bacterial genomes. The rate of genome evolution in terms of mutation, deletion, and gene acquisition in these genomes is as yet unknown. Given the suspected high plasticity of viral genomes, the microevolution of the 346 kb genome of Lausannevirus, a member of Megavirales, was studied. Hence, Lausannevirus was co-cultured within the amoeba Acanthamoeba castellanii over one year. Despite a low number of mutations, the virus showed a genome reduction of 3.7% after 12 months. Lausannevirus genome evolution in sympatric conditions was investigated by its co-culture with Estrella lausannensis, an obligate intracellular bacterium, in the amoeba A. castellanii during one year. Cultures were split every 3 months. Genome sequencing revealed that in these conditions both Lausannevirus and E. lausannensis show stable genomes, presenting no major rearrangement. In fact, after one year they acquired from 2 to 7 and from 4 to 10 mutations per culture for Lausannevirus and E. lausannensis, respectively. Interestingly, different mutations in the endonuclease encoding genes of Lausannevirus were observed in different subcultures, highlighting the importance of this gene product in the replication of Lausannevirus. Conversely, mutations in E. lausannensis were mainly located in a gene encoding a phosphoenolpyruvate-protein phosphotransferase (PtsI), implicated in sugar metabolism. Moreover, in our conditions and with our analyses we detected no horizontal gene transfer during one year of co-culture. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  1. Revisiting the Phylogeny of the Animal Formins: Two New Subtypes, Relationships with Multiple Wing Hairs Proteins, and a Lost Human Formin.

    Science.gov (United States)

    Pruyne, David

    2016-01-01

    Formins are a widespread family of eukaryotic cytoskeleton-organizing proteins. Many species encode multiple formin isoforms, and for animals, much of this reflects the presence of multiple conserved subtypes. Earlier phylogenetic analyses identified seven major formin subtypes in animals (DAAM, DIAPH, FHOD, FMN, FMNL, INF, and GRID2IP/delphilin), but left a handful of formins, particularly from nematodes, unassigned. In this new analysis drawing from genomic data from a wider range of taxa, nine formin subtypes are identified that encompass all the animal formins analyzed here. Included in this analysis are Multiple Wing Hairs proteins (MWH), which bear homology to formin N-terminal domains. Originally identified in Drosophila melanogaster and other arthropods, MWH-related proteins are also identified here in some nematodes (including Caenorhabditis elegans), and are shown to be related to a novel MWH-related formin (MWHF) subtype. One surprising result of this work is the discovery that a family of pleckstrin homology domain-containing formins (PHCFs) is represented in many vertebrates, but is strikingly absent from placental mammals. Consistent with a relatively recent loss of this formin, the human genome retains fragments of a defunct homologous formin gene.

  2. High quality draft genome sequence of the moderately halophilic bacterium Pontibacillus yanchengensis Y32(T) and comparison among Pontibacillus genomes.

    Science.gov (United States)

    Huang, Jing; Qiao, Zi Xu; Tang, Jing Wei; Wang, Gejiao

    2015-01-01

    Pontibacillus yanchengensis Y32(T) is an aerobic, motile, Gram-positive, endospore-forming, and moderately halophilic bacterium isolated from a salt field. In this study, we describe the features of P. yanchengensis strain Y32(T) together with a comparison with four other Pontibacillus genomes. The 4,281,464 bp high-quality-draft genome of strain Y32(T) is arranged into 153 contigs containing 3,965 protein-coding genes and 77 RNA encoding genes. The genome of strain Y32(T) possesses many genes related to its halophilic character, flagellar assembly and chemotaxis to support its survival in a salt-rich environment.

  3. A survey of innovation through duplication in the reduced genomes of twelve parasites.

    Directory of Open Access Journals (Sweden)

    Jeremy D DeBarry

    We characterize the prevalence, distribution, divergence, and putative functions of detectable two-copy paralogs and segmental duplications in the Apicomplexa, a phylum of parasitic protists. Apicomplexans are mostly obligate intracellular parasites responsible for human and animal diseases (e.g. malaria and toxoplasmosis). Gene loss is a major force in the phylum. Genomes are small and protein-encoding gene repertoires are reduced. Despite this genomic streamlining, duplications and gene family amplifications are present. The potential for innovation introduced by duplications is of particular interest. We compared genomes of twelve apicomplexans across four lineages and used orthology and genome cartography to map distributions of duplications against genome architectures. Segmental duplications appear limited to five species. Where present, they correspond to regions enriched for multi-copy and species-specific genes, pointing toward roles in adaptation and innovation. We found a phylum-wide association of duplications with dynamic chromosome regions and syntenic breakpoints. Trends in the distribution of duplicated genes indicate that recent, species-specific duplicates are often tandem while most others have been dispersed by genome rearrangements. These trends show a relationship between genome architecture and gene duplication. Functional analysis reveals: proteases, which are vital to a parasitic lifecycle, to be prominent in putative recent duplications; a pair of paralogous genes in Toxoplasma gondii previously shown to produce the rate-limiting step in dopamine synthesis in mammalian cells, a possible link to the modification of host behavior; and phylum-wide differences in expression and subcellular localization, indicative of modes of divergence. We have uncovered trends in multiple modes of duplicate divergence including sequence, intron content, expression, subcellular localization, and functions of putative recent duplicates that

  4. Low-pass sequencing for microbial comparative genomics

    Directory of Open Access Journals (Sweden)

    Kennedy Sean

    2004-01-01

    Background: We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results: As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion: Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggests that genome structure and dynamic genome reorganization might be similar to that previously observed in the

  5. Short and long-term genome stability analysis of prokaryotic genomes.

    Science.gov (United States)

    Brilli, Matteo; Liò, Pietro; Lacroix, Vincent; Sagot, Marie-France

    2013-05-08

    Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often enables the characterization of pathogens. There is therefore a strong interest in understanding the variability of this trait and the possible correlations with life-style. Two kinds of events affect genome organization: on one hand translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but they alter the gene neighborhoods by breaking the syntenic regions. A complete picture of genome organization evolution therefore requires accounting for both kinds of events. We developed an approach where we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In the first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of the same species allowed us to identify genomes whose gene order diverges from that of their conspecifics. The inter-species analysis indicates that pathogens are more often unstable than non-pathogens. In the second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. By studying species with multiple sequenced genomes available, we were
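    The idea of conserved backbone gene order can be illustrated with a small Python sketch. The abstract does not define the authors' stability estimators, so this is not their method: it is a simple stand-in that restricts two gene orders to their shared genes and reports the fraction of neighbouring gene pairs that are preserved. The function names and the assumption of linear chromosomes with unique gene identifiers are hypothetical.

```python
def adjacencies(order):
    """Unordered pairs of neighbouring genes in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def adjacency_conservation(order_a, order_b):
    """Fraction of backbone adjacencies of genome A also found in genome B.

    Genes present in only one genome (insertions/deletions) are dropped
    first, so only rearrangements of the shared backbone lower the score.
    """
    shared = set(order_a) & set(order_b)
    backbone_a = [gene for gene in order_a if gene in shared]
    backbone_b = [gene for gene in order_b if gene in shared]
    adj_a = adjacencies(backbone_a)
    adj_b = adjacencies(backbone_b)
    if not adj_a:
        return 1.0  # nothing to compare; treat as fully conserved
    return len(adj_a & adj_b) / len(adj_a)


genome_a = ["a", "b", "c", "d", "e"]
genome_b = ["a", "b", "x", "d", "c", "e"]   # "x" inserted, c/d swapped
print(adjacency_conservation(genome_a, genome_b))  # 0.5
```

    In the example, the inserted gene "x" does not affect the score, while the swap of "c" and "d" breaks two of the four backbone adjacencies. Measuring the effect of insertions and deletions on gene neighborhoods, which the abstract treats as a separate kind of event, would need a second score computed on the unfiltered orders.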

  6. Type II heat-labile enterotoxins from 50 diverse Escherichia coli isolates belong almost exclusively to the LT-IIc family and may be prophage encoded.

    Directory of Open Access Journals (Sweden)

    Michael G Jobling

    Some enterotoxigenic Escherichia coli (ETEC) produce a type II heat-labile enterotoxin (LT-II) that activates adenylate cyclase in susceptible cells but is not neutralized by antisera against cholera toxin or type I heat-labile enterotoxin (LT-I). LT-I variants encoded by plasmids in ETEC from humans and pigs have amino acid sequences that are ≥ 95% identical. In contrast, LT-II toxins are chromosomally encoded and are much more diverse. Early studies characterized LT-IIa and LT-IIb variants, but a novel LT-IIc was reported recently. Here we characterized the LT-II encoding loci from 48 additional ETEC isolates. Two encoded LT-IIa, none encoded LT-IIb, and 46 encoded highly related variants of LT-IIc. Phylogenetic analysis indicated that the predicted LT-IIc toxins encoded by these loci could be assigned to 6 subgroups. The loci corresponding to individual toxins within each subgroup had DNA sequences that were more than 99% identical. The LT-IIc subgroups appear to have arisen by multiple recombinational events between progenitor loci encoding LT-IIc1- and LT-IIc3-like variants. All loci from representative isolates encoding the LT-IIa, LT-IIb, and each subgroup of LT-IIc enterotoxins are preceded by highly-related genes that are between 80 and 93% identical to predicted phage lysozyme genes. DNA sequences immediately following the B genes differ considerably between toxin subgroups, but all are most closely related to genomic sequences found in predicted prophages. Together these data suggest that the LT-II loci are inserted into lambdoid type prophages that may or may not be infectious. These findings raise the possibility that production of LT-II enterotoxins by ETEC may be determined by phage conversion and may be activated by induction of prophage, in a manner similar to control of production of Shiga-like toxins by converting phages in isolates of enterohemorrhagic E. coli.

  7. Human coronavirus 229E encodes a single ORF4 protein between the spike and the envelope genes

    Directory of Open Access Journals (Sweden)

    Berkhout Ben

    2006-12-01

    Full Text Available Abstract Background: The genome of coronaviruses contains structural and non-structural genes, including several so-called accessory genes. All group 1b coronaviruses encode a single accessory protein between the spike and envelope genes, except for human coronavirus (HCoV) 229E. The prototype virus has a split gene, encoding the putative ORF4a and ORF4b proteins. To determine whether primary HCoV-229E isolates exhibit this unusual genome organization, we analyzed the ORF4a/b region of five current clinical isolates from The Netherlands and three early isolates collected at the Common Cold Unit (CCU) in Salisbury, UK. Results: All Dutch isolates were identical in the ORF4a/b region at the amino acid level. All CCU isolates are only 98% identical to the Dutch isolates at the nucleotide level, but more closely related to the prototype HCoV-229E (>98%). Remarkably, our analyses revealed that the laboratory-adapted, prototype HCoV-229E has a 2-nucleotide deletion in the ORF4a/b region, whereas all clinical isolates carry a single ORF, 660 nt in size, encoding a single protein of 219 amino acids, which is a homologue of the ORF3 proteins encoded by HCoV-NL63 and PEDV. Conclusion: Thus, the genome organization of the group 1b coronaviruses HCoV-NL63, PEDV and HCoV-229E is identical. It is possible that extensive culturing of the HCoV-229E laboratory strain resulted in truncation of ORF4. This may indicate that the protein is not essential in cell culture, but the highly conserved amino acid sequence of the ORF4 protein among clinical isolates suggests that the protein plays an important role in vivo.
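
    The numbers in this abstract fit together arithmetically: a 660-nt ORF holds 220 codons, which is 219 amino acids plus one stop codon, and a 2-nucleotide deletion is not a multiple of three, so it shifts the reading frame. A minimal sketch of that check follows. The function names are hypothetical, and counting the stop codon within the 660 nt is an assumption that happens to match the reported protein length.

```python
def orf_protein_length(orf_nt: int, includes_stop: bool = True) -> int:
    """Amino acids encoded by an ORF of orf_nt nucleotides."""
    if orf_nt <= 0 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_nt // 3
    # The stop codon occupies one codon but adds no amino acid.
    return codons - 1 if includes_stop else codons


def shifts_frame(indel_nt: int) -> bool:
    """True if an insertion or deletion of this length changes the reading frame."""
    return indel_nt % 3 != 0


print(orf_protein_length(660))  # 219
print(shifts_frame(2))          # True
```

    A frameshift of this kind typically brings a premature stop codon into frame, which is consistent with a single ORF appearing as a split ORF4a/ORF4b in the laboratory strain.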

  8. The Human Genome Initiative of the Department of Energy

    Science.gov (United States)

    1988-01-01

    The structural characterization of genes and elucidation of their encoded functions have become a cornerstone of modern health research, biology and biotechnology. A genome program is an organized effort to locate and identify the functions of all the genes of an organism. Beginning with the DOE-sponsored, 1986 human genome workshop at Santa Fe, the value of broadly organized efforts supporting total genome characterization became a subject of intensive study. There is now national recognition that benefits will rapidly accrue from an effective scientific infrastructure for total genome research. In the US genome research is now receiving dedicated funds. Several other nations are implementing genome programs. Supportive infrastructure is being improved through both national and international cooperation. The Human Genome Initiative of the Department of Energy (DOE) is a focused program of Resource and Technology Development, with objectives of speeding and bringing economies to the national human genome effort. This report relates the origins and progress of the Initiative.

  9. The genome of the social amoeba Dictyostelium discoideum

    DEFF Research Database (Denmark)

    Eichinger, L; Pachebat, J A; Glöckner, G

    2005-01-01

    The social amoebae are exceptional in their ability to alternate between unicellular and multicellular forms. Here we describe the genome of the best-studied member of this group, Dictyostelium discoideum. The gene-dense chromosomes of this organism encode approximately 12,500 predicted proteins,...

  10. The Trichoplax Genome and the Nature of Placozoans

    Energy Technology Data Exchange (ETDEWEB)

    Srivastava, Mansi; Begovic, Emina; Chapman, Jarrod; Putnam, Nicholas H.; Hellsten, Uffe; Kawashima, Takeshi; Kuo, Alan; Mitros, Therese; Salamov, Asaf; Carpenter, Meredith L.; Signorovitch, Ana Y.; Moreno, Maria A.; Kamm, Kai; Grimwood, Jane; Schmutz, Jeremy; Shapiro, Harris; Grigoriev, Igor V.; Buss, Leo W.; Schierwater, Bernd; Dellaporta, Stephen L.; Rokhsar, Daniel S.

    2008-08-01

    Placozoans are arguably the simplest free-living animals, possibly evoking an early stage in metazoan evolution, yet their biology is poorly understood. Here we report the sequencing and analysis of the ~98 million base pair nuclear genome of the placozoan Trichoplax adhaerens. Whole genome phylogenetic analysis suggests that placozoans belong to a 'eumetazoan' clade that includes cnidarians and bilaterians, with sponges as the earliest diverging animals. The compact genome exhibits conserved gene content, gene structure, and synteny relative to the human and other complex eumetazoan genomes. Despite the apparent cellular and organismal simplicity of Trichoplax, its genome encodes a rich array of transcription factor and signaling pathway genes that are typically associated with diverse cell types and developmental processes in eumetazoans, motivating further searches for cryptic cellular complexity and/or as yet unobserved life history stages.

  11. Primary structure of the human follistatin precursor and its genomic organization

    International Nuclear Information System (INIS)

    Shimasaki, Shunichi; Koga, Makoto; Esch, F.

    1988-01-01

    Follistatin is a single-chain gonadal protein that specifically inhibits follicle-stimulating hormone release. By use of the recently characterized porcine follistatin cDNA as a probe to screen a human testis cDNA library and a genomic library, the structure of the complete human follistatin precursor as well as its genomic organization have been determined. Three of eight cDNA clones that were sequenced predicted a precursor with 344 amino acids, whereas the remaining five cDNA clones encoded a 317 amino acid precursor, resulting from alternative splicing of the precursor mRNA. Mature follistatins contain four contiguous domains that are encoded by precisely separated exons; three of the domains are highly similar to each other, as well as to human epidermal growth factor and human pancreatic secretory trypsin inhibitor. The genomic organization of the human follistatin gene is similar to that of the human epidermal growth factor gene and thus supports the notion of exon shuffling during evolution.

  12. Genome-Wide Identification and Expression Analysis of WRKY Gene Family in Capsicum annuum L.

    Science.gov (United States)

    Diao, Wei-Ping; Snyder, John C; Wang, Shu-Bin; Liu, Jin-Bing; Pan, Bao-Gui; Guo, Guang-Jun; Wei, Ge

    2016-01-01

    The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators, with members regulating multiple biological processes, especially defense against biotic and abiotic stresses. However, little information is available about WRKYs in pepper (Capsicum annuum L.). The recent release of completely assembled genome sequences of pepper allowed us to perform a genome-wide investigation for pepper WRKY proteins. In the present study, a total of 71 WRKY genes were identified in the pepper genome. According to structural features of their encoded proteins, the pepper WRKY genes (CaWRKY) were classified into three main groups, with the second group further divided into five subgroups. Genome mapping analysis revealed that CaWRKY genes were enriched on four chromosomes, especially on chromosome 1, and 15.5% of the family members were tandemly duplicated genes. A phylogenetic tree was constructed from WRKY domain sequences derived from pepper and Arabidopsis. The expression of 21 selected CaWRKY genes in response to seven different biotic and abiotic stresses (salt, heat shock, drought, Phytophthora capsici, SA, MeJA, and ABA) was evaluated by quantitative RT-PCR. Some CaWRKYs were highly expressed and up-regulated by stress treatment. Our results will provide a platform for functional identification and molecular breeding studies of WRKY genes in pepper.

  13. Harnessing Omics Big Data in Nine Vertebrate Species by Genome-Wide Prioritization of Sequence Variants with the Highest Predicted Deleterious Effect on Protein Function.

    Science.gov (United States)

    Rozman, Vita; Kunej, Tanja

    2018-05-10

    Harnessing the genomics big data requires innovation in how we extract and interpret biologically relevant variants. Currently, there is no established catalog of prioritized missense variants associated with deleterious protein function phenotypes. We report in this study, to the best of our knowledge, the first genome-wide prioritization of sequence variants with the most deleterious effect on protein function (potentially deleterious variants [pDelVars]) in nine vertebrate species: human, cattle, horse, sheep, pig, dog, rat, mouse, and zebrafish. The analysis was conducted using the Ensembl/BioMart tool. Genes comprising pDelVars in the highest number of examined species were identified using a Python script. Multiple genomic alignments of the selected genes were built to identify interspecies orthologous potentially deleterious variants, which we defined as the "ortho-pDelVars." Genome-wide prioritization revealed that in humans, 0.12% of the known variants are predicted to be deleterious. In seven out of nine examined vertebrate species, the genes encoding the multiple PDZ domain crumbs cell polarity complex component (MPDZ) and the transforming acidic coiled-coil containing protein 2 (TACC2) comprise pDelVars. Five interspecies ortho-pDelVars were identified in three genes. These findings offer new ways to harness genomics big data by facilitating the identification of functional polymorphisms in humans and animal models and thus provide a future basis for optimization of protocols for whole genome prioritization of pDelVars and screening of orthologous sequence variants. The approach presented here can inform various postgenomic applications such as personalized medicine and multiomics study of health interventions (iatromics).
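
    The abstract mentions that genes carrying pDelVars in the highest number of species were identified with a Python script, without describing it. The sketch below shows one straightforward way such a tally could be done. The record layout (one `(species, gene)` pair per predicted deleterious variant), the function names, and the toy gene labels are all assumptions for illustration, not the authors' script or data.

```python
from collections import defaultdict


def species_per_gene(records):
    """Map each gene to the set of species in which it carries at least one pDelVar.

    records: iterable of (species, gene) pairs, one per variant.
    """
    by_gene = defaultdict(set)
    for species, gene in records:
        by_gene[gene].add(species)
    return by_gene


def top_genes(records):
    """Return (species_count, sorted gene names) for the genes hit in the most species."""
    by_gene = species_per_gene(records)
    if not by_gene:
        return 0, []
    best = max(len(species) for species in by_gene.values())
    return best, sorted(g for g, s in by_gene.items() if len(s) == best)


# Toy input, made up for illustration only.
toy = [
    ("human", "GENE_A"), ("mouse", "GENE_A"), ("pig", "GENE_A"),
    ("human", "GENE_B"), ("human", "GENE_B"), ("dog", "GENE_B"),
    ("rat", "GENE_C"),
]
print(top_genes(toy))  # (3, ['GENE_A'])
```

    Using a set per gene means several variants in the same species are counted once, which matches the "number of examined species" criterion described in the abstract.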

  14. The evolution of genome mining in microbes – a review

    DEFF Research Database (Denmark)

    Ziemert, Nadine; Alanjary, Mohammad; Weber, Tilmann

    2016-01-01

    Covering: 2006 to 2016. The computational mining of genomes has become an important part in the discovery of novel natural products as drug leads. Thousands of bacterial genome sequences are publicly available these days containing an even larger number and diversity of secondary metabolite gene...... clusters that await linkage to their encoded natural products. With the development of high-throughput sequencing methods and the wealth of DNA data available, a variety of genome mining methods and tools have been developed to guide discovery and characterisation of these compounds. This article reviews...

  15. Genomic comparison of virulent and non-virulent Streptococcus agalactiae in fish.

    Science.gov (United States)

    Delannoy, C M J; Zadoks, R N; Crumlish, M; Rodgers, D; Lainson, F A; Ferguson, H W; Turnbull, J; Fontaine, M C

    2016-01-01

    Streptococcus agalactiae infections in fish are predominantly caused by beta-haemolytic strains of clonal complex (CC) 7, notably its namesake sequence type (ST) 7, or by non-haemolytic strains of CC552, including the globally distributed ST260. In contrast, CC23, including its namesake ST23, has been associated with a wide homeothermic and poikilothermic host range, but never with fish. The aim of this study was to determine whether ST23 is virulent in fish and to identify genomic markers of fish adaptation of S. agalactiae. Intraperitoneal challenge of Nile tilapia, Oreochromis niloticus (Linnaeus), showed that ST260 is lethal at doses down to 10^2 cfu per fish, whereas ST23 does not cause disease at 10^7 cfu per fish. Comparison of the genome sequence of ST260 and ST23 with those of strains derived from fish, cattle and humans revealed the presence of genomic elements that are unique to subpopulations of S. agalactiae that have the ability to infect fish (CC7 and CC552). These loci occurred in clusters exhibiting typical signatures of mobile genetic elements. PCR-based screening of a collection of isolates from multiple host species confirmed the association of selected genes with fish-derived strains. Several fish-associated genes encode proteins that potentially provide fitness in the aquatic environment. © 2014 John Wiley & Sons Ltd.

  16. Genome sequence of the button mushroom Agaricus bisporus reveals mechanisms governing adaptation to a humic-rich ecological niche

    Energy Technology Data Exchange (ETDEWEB)

    Morin, Emmanuelle; Kohler, Annegret; Baker, Adam R.; Foulongne-Oriol, Marie; Lombard, Vincent; Nagy, Laszlo G.; Ohm, Robin A.; Patyshakuliyeva, Aleksandrina; Brun, Annick; Aerts, Andrea L.; Bailey, Andrew M.; Billette, Christophe; Coutinho, Pedro M.; Deakin, Greg; Doddapaneni, Harshavardhan; Floudas, Dimitrios; Grimwood, Jane; Hilden, Kristiina; Kues, Ursula; LaButti, Kurt M.; Lapidus, Alla; Lindquist, Erika A.; Lucas, Susan M.; Murat, Claude; Riley, Robert W.; Salamov, Asaf A.; Schmutz, Jeremy; Subramanian, Venkataramanan; Wosten, Han A. B.; Xu, Jianping; Eastwood, Daniel C.; Foster, Gary D.; Sonnenberg, Anton S. M.; Cullen, Dan; de Vries, Ronald P.; Lundell, Taina; Hibbett, David S.; Henrissat, Bernard; Burton, Kerry S.; Kerrigan, Richard W.; Challen, Michael P.; Grigoriev, Igor V.; Martin, Francis

    2012-04-27

    Agaricus bisporus is the model fungus for the adaptation, persistence, and growth in the humic-rich leaf-litter environment. Aside from its ecological role, A. bisporus has been an important component of the human diet for over 200 y and worldwide cultivation of the button mushroom forms a multibillion dollar industry. We present two A. bisporus genomes, their gene repertoires and transcript profiles on compost and during mushroom formation. The genomes encode a full repertoire of polysaccharide-degrading enzymes similar to that of wood-decayers. Comparative transcriptomics of mycelium grown on defined medium, casing-soil, and compost revealed genes encoding enzymes involved in xylan, cellulose, pectin, and protein degradation are more highly expressed in compost. The striking expansion of heme-thiolate peroxidases and β-etherases is distinctive from Agaricomycotina wood-decayers and suggests a broad attack on decaying lignin and related metabolites found in humic acid-rich environment. Similarly, up-regulation of these genes together with a lignolytic manganese peroxidase, multiple copper radical oxidases, and cytochrome P450s is consistent with challenges posed by complex humic-rich substrates. The gene repertoire and expression of hydrolytic enzymes in A. bisporus is substantially different from the taxonomically related ectomycorrhizal symbiont Laccaria bicolor. A common promoter motif was also identified in genes very highly expressed in humic-rich substrates. These observations reveal genetic and enzymatic mechanisms governing adaptation to the humic-rich ecological niche formed during plant degradation, further defining the critical role such fungi contribute to soil structure and carbon sequestration in terrestrial ecosystems. Genome sequence will expedite mushroom breeding for improved agronomic characteristics.

  17. Genome sequence of the button mushroom Agaricus bisporus reveals mechanisms governing adaptation to a humic-rich ecological niche.

    Science.gov (United States)

    Morin, Emmanuelle; Kohler, Annegret; Baker, Adam R; Foulongne-Oriol, Marie; Lombard, Vincent; Nagy, Laszlo G; Ohm, Robin A; Patyshakuliyeva, Aleksandrina; Brun, Annick; Aerts, Andrea L; Bailey, Andrew M; Billette, Christophe; Coutinho, Pedro M; Deakin, Greg; Doddapaneni, Harshavardhan; Floudas, Dimitrios; Grimwood, Jane; Hildén, Kristiina; Kües, Ursula; Labutti, Kurt M; Lapidus, Alla; Lindquist, Erika A; Lucas, Susan M; Murat, Claude; Riley, Robert W; Salamov, Asaf A; Schmutz, Jeremy; Subramanian, Venkataramanan; Wösten, Han A B; Xu, Jianping; Eastwood, Daniel C; Foster, Gary D; Sonnenberg, Anton S M; Cullen, Dan; de Vries, Ronald P; Lundell, Taina; Hibbett, David S; Henrissat, Bernard; Burton, Kerry S; Kerrigan, Richard W; Challen, Michael P; Grigoriev, Igor V; Martin, Francis

    2012-10-23

    Agaricus bisporus is the model fungus for the adaptation, persistence, and growth in the humic-rich leaf-litter environment. Aside from its ecological role, A. bisporus has been an important component of the human diet for over 200 y and worldwide cultivation of the "button mushroom" forms a multibillion dollar industry. We present two A. bisporus genomes, their gene repertoires and transcript profiles on compost and during mushroom formation. The genomes encode a full repertoire of polysaccharide-degrading enzymes similar to that of wood-decayers. Comparative transcriptomics of mycelium grown on defined medium, casing-soil, and compost revealed genes encoding enzymes involved in xylan, cellulose, pectin, and protein degradation are more highly expressed in compost. The striking expansion of heme-thiolate peroxidases and β-etherases is distinctive from Agaricomycotina wood-decayers and suggests a broad attack on decaying lignin and related metabolites found in humic acid-rich environment. Similarly, up-regulation of these genes together with a lignolytic manganese peroxidase, multiple copper radical oxidases, and cytochrome P450s is consistent with challenges posed by complex humic-rich substrates. The gene repertoire and expression of hydrolytic enzymes in A. bisporus is substantially different from the taxonomically related ectomycorrhizal symbiont Laccaria bicolor. A common promoter motif was also identified in genes very highly expressed in humic-rich substrates. These observations reveal genetic and enzymatic mechanisms governing adaptation to the humic-rich ecological niche formed during plant degradation, further defining the critical role such fungi contribute to soil structure and carbon sequestration in terrestrial ecosystems. Genome sequence will expedite mushroom breeding for improved agronomic characteristics.

  18. Genome-wide identification of the regulatory targets of a transcription factor using biochemical characterization and computational genomic analysis

    Directory of Open Access Journals (Sweden)

    Jolly Emmitt R

    2005-11-01

    Full Text Available Abstract Background: A major challenge in computational genomics is the development of methodologies that allow accurate genome-wide prediction of the regulatory targets of a transcription factor. We present a method for target identification that combines experimental characterization of binding requirements with computational genomic analysis. Results: Our method identified potential target genes of the transcription factor Ndt80, a key transcriptional regulator involved in yeast sporulation, using the combined information of binding affinity, positional distribution, and conservation of the binding sites across multiple species. We have also developed a mathematical approach to compute the false positive rate and the total number of targets in the genome based on the multiple selection criteria. Conclusion: We have shown that combining biochemical characterization and computational genomic analysis leads to accurate identification of the genome-wide targets of a transcription factor. The method can be extended to other transcription factors and can complement other genomic approaches to transcriptional regulation.

  19. The genome of Paenibacillus sabinae T27 provides insight into evolution, organization and functional elucidation of nif and nif-like genes

    OpenAIRE

    Li, Xinxin; Deng, Zhiping; Liu, Zhanzhi; Yan, Yongliang; Wang, Tianshu; Xie, Jianbo; Lin, Min; Cheng, Qi; Chen, Sanfeng

    2014-01-01

    Background Most biological nitrogen fixation is catalyzed by the molybdenum nitrogenase. This enzyme is a complex which contains the MoFe protein encoded by nifDK and the Fe protein encoded by nifH. In addition to nifHDK, nifHDK-like genes were found in some Archaea and Firmicutes, but their function is unclear. Results We sequenced the genome of Paenibacillus sabinae T27. A total of 4,793 open reading frames were predicted from its 5.27 Mb genome. The genome of P. sabinae T27 contains fiftee...

  20. Genomic Evolution of the Ascomycete Yeasts

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert; Haridas, Sajeet; Salamov, Asaf; Boundy-Mills, Kyria; Goker, Markus; Hittinger, Chris; Klenk, Hans-Peter; Lopes, Mariana; Meir-Kolthoff, Jan P.; Rokas, Antonis; Rosa, Carlos; Scheuner, Carmen; Soares, Marco; Stielow, Benjamin; Wisecaver, Jennifer H.; Wolfe, Ken; Blackwell, Meredith; Kurtzman, Cletus; Grigoriev, Igor; Jeffries, Thomas

    2015-03-16

    Yeasts are important for industrial and biotechnological processes and show remarkable metabolic and phylogenetic diversity despite morphological similarities. We have sequenced the genomes of 16 ascomycete yeasts of taxonomic and industrial importance including members of Saccharomycotina and Taphrinomycotina. Phylogenetic analysis of these and previously published yeast genomes helped resolve the placement of species including Saitoella complicata, Babjeviella inositovora, Hyphopichia burtonii, and Metschnikowia bicuspidata. Moreover, we find that alternative nuclear codon usage, where CUG encodes serine instead of leucine, is monophyletic within the Saccharomycotina. Most of the yeasts have compact genomes with a large fraction of single exon genes, and a tendency towards more introns in early-diverging species. Analysis of enzyme phylogeny gives insights into the evolution of metabolic capabilities such as methanol utilization and assimilation of alternative carbon sources.
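
    The CUG reassignment mentioned in this abstract changes how a coding sequence should be translated. The sketch below translates the same toy sequence with the standard genetic code and with a variant in which CTG (CUG in RNA) encodes serine, which is what NCBI lists as translation table 12, the alternative yeast nuclear code. The table and function names are illustrative choices, and the example sequence is made up.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for the first, second and third base.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

STANDARD = dict(zip(CODONS, AMINO))
# Alternative yeast nuclear code: identical except that CTG encodes serine.
CUG_SER = dict(STANDARD, CTG="S")


def translate(seq: str, table: dict) -> str:
    """Translate a DNA or RNA coding sequence in frame 0; trailing partial codons are dropped."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(table[seq[i:i + 3]] for i in range(0, usable, 3))


toy = "ATGCTGGCTTAA"  # Met, CUG codon, Ala, stop
print(translate(toy, STANDARD))  # MLA*
print(translate(toy, CUG_SER))   # MSA*
```

    Annotating genomes from this clade with the standard table would silently replace serine with leucine at every CUG codon, which is why the translation table has to be chosen per lineage.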

  1. GWIS: Genome-Wide Inferred Statistics for Functions of Multiple Phenotypes

    NARCIS (Netherlands)

    Nieuwboer, H.A.; Pool, R.; Dolan, C.V.; Boomsma, D.I.; Nivard, M.G.

    2016-01-01

    Here we present a method of genome-wide inferred study (GWIS) that provides an approximation of genome-wide association study (GWAS) summary statistics for a variable that is a function of phenotypes for which GWAS summary statistics, phenotypic means, and covariances are available. A GWIS can be

  2. The Arabidopsis thaliana homolog of the helicase RTEL1 plays multiple roles in preserving genome stability.

    Science.gov (United States)

    Recker, Julia; Knoll, Alexander; Puchta, Holger

    2014-12-01

    In humans, mutations in the DNA helicase Regulator of Telomere Elongation Helicase1 (RTEL1) lead to Hoyeraal-Hreidarsson syndrome, a severe, multisystem disorder. Here, we demonstrate that the RTEL1 homolog in Arabidopsis thaliana plays multiple roles in preserving genome stability. RTEL1 suppresses homologous recombination in a pathway parallel to that of the DNA translocase FANCM. Cytological analyses of root meristems indicate that RTEL1 is involved in processing DNA replication intermediates independently from FANCM and the nuclease MUS81. Moreover, RTEL1 is involved in interstrand and intrastrand DNA cross-link repair independently from FANCM and (in intrastrand cross-link repair) parallel to MUS81. RTEL1 contributes to telomere homeostasis; the concurrent loss of RTEL1 and the telomerase TERT leads to rapid, severe telomere shortening, which occurs much more rapidly than it does in the single-mutant line tert, resulting in developmental arrest after four generations. The double mutant rtel1-1 recq4A-4 exhibits massive growth defects, indicating that this RecQ family helicase, which is also involved in the suppression of homologous recombination and the repair of DNA lesions, can partially replace RTEL1 in the processing of DNA intermediates. The requirement for RTEL1 in multiple pathways to preserve genome stability in plants can be explained by its putative role in the destabilization of DNA loop structures, such as D-loops and T-loops. © 2014 American Society of Plant Biologists. All rights reserved.

  3. JGI Plant Genomics Gene Annotation Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with a high amount of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying an organism, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, and their transcriptomes are being sequenced as well. To use these very large amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice, except for chlamy, which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.

  4. Polymorphisms in genes encoding leptin, ghrelin and their receptors in German multiple sclerosis patients.

    Science.gov (United States)

    Rey, Linda K; Wieczorek, Stefan; Akkad, Denis A; Linker, Ralf A; Chan, Andrew; Hoffjan, Sabine

    2011-01-01

    Multiple sclerosis (MS) is a neuro-inflammatory, autoimmune disease influenced by environmental and polygenic components. There is growing evidence that the peptide hormone leptin, known to regulate energy homeostasis, as well as its antagonist ghrelin play an important role in inflammatory processes in autoimmune diseases, including MS. Recently, single nucleotide polymorphisms (SNPs) in the genes encoding leptin, ghrelin and their receptors were evaluated, amongst others, in Wegener's granulomatosis and Churg-Strauss syndrome. The Lys656Asn SNP in the LEPR gene showed a significant but contrasting association with these vasculitides. We therefore aimed at investigating these polymorphisms in a German MS case-control cohort. Twelve SNPs in the LEP, LEPR, GHRL and GHSR genes were genotyped in 776 MS patients and 878 control subjects. We found an association of a haplotype in the GHSR gene with MS that could not be replicated in a second cohort. Otherwise, no significant differences in allele or genotype frequencies were observed between patients and controls in this particular cohort. Thus, the present results do not support the hypothesis that genetic variation in the leptin/ghrelin system contributes substantially to the pathogenesis of MS. However, a modest effect of GHSR variation cannot be ruled out and needs to be further evaluated in future studies. Copyright © 2011 Elsevier Ltd. All rights reserved.

  5. Genomic sequence of 'Candidatus Liberibacter solanacearum' haplotype C and its comparison with haplotype A and B genomes.

    Directory of Open Access Journals (Sweden)

    Jinhui Wang

    Full Text Available Haplotypes A and B of 'Candidatus Liberibacter solanacearum' (CLso) are associated with diseases of solanaceous plants, especially Zebra chip disease of potato, and haplotypes C, D and E are associated with symptoms on apiaceous plants. To date, one complete genome of haplotype B and two high quality draft genomes of haplotype A have been obtained for these unculturable bacteria using metagenomics from the psyllid vector Bactericera cockerelli. Here, we present the first genomic sequences obtained for the carrot-associated CLso. These two genomic sequences of haplotype C, FIN114 (1.24 Mbp) and FIN111 (1.20 Mbp), were obtained from carrot psyllids (Trioza apicalis) harboring CLso. Genomic comparisons between the haplotypes A, B and C revealed that the genome organization differs between these haplotypes, due to large inversions and other recombinations. Comparison of protein-coding genes indicated that the core genome of CLso consists of 885 ortholog groups, with the pan-genome consisting of 1327 ortholog groups. Twenty-seven ortholog groups are unique to CLso haplotype C, whilst 11 ortholog groups shared by the haplotypes A and B are not found in the haplotype C. Some of these ortholog groups that are not part of the core genome may encode functions related to interactions with the different host plant and psyllid species.
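
    The core-genome and pan-genome counts in this abstract are set operations on ortholog-group membership: the core is the intersection across genomes, the pan-genome is the union, and the haplotype-specific counts are set differences. The sketch below illustrates this on toy data. The function names and the ortholog-group labels are hypothetical, and the abstract does not describe the tool the authors actually used.

```python
def core_and_pan(groups_by_genome):
    """Return (core, pan): ortholog groups present in every genome, and in any genome."""
    sets = [set(groups) for groups in groups_by_genome.values()]
    if not sets:
        raise ValueError("at least one genome is required")
    return set.intersection(*sets), set.union(*sets)


def unique_to(name, groups_by_genome):
    """Ortholog groups found only in the named genome."""
    others = set()
    for other, groups in groups_by_genome.items():
        if other != name:
            others |= set(groups)
    return set(groups_by_genome[name]) - others


def shared_by_others_only(name, groups_by_genome):
    """Ortholog groups present in every other genome but absent from the named one."""
    others = [set(g) for n, g in groups_by_genome.items() if n != name]
    if not others:
        return set()
    return set.intersection(*others) - set(groups_by_genome[name])


# Toy membership table, made up for illustration only.
toy = {
    "A": {"og1", "og2", "og3", "og4"},
    "B": {"og1", "og2", "og3", "og5"},
    "C": {"og1", "og2", "og6"},
}
core, pan = core_and_pan(toy)
print(len(core), len(pan))              # 2 6
print(unique_to("C", toy))              # {'og6'}
print(shared_by_others_only("C", toy))  # {'og3'}
```

    With real data, the same three calls would yield counts analogous to the 885 core groups, 1327 pan-genome groups, 27 groups unique to haplotype C, and 11 groups shared only by A and B.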

  6. Identification and classification of conserved RNA secondary structures in the human genome

    DEFF Research Database (Denmark)

    Pedersen, Jakob Skou; Bejerano, Gill; Siepel, Adam

    2006-01-01

    The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars...... for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebra-fish, and puffer-fish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set......, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization....

  7. Genome organization, instabilities, stem cells, and cancer

    Directory of Open Access Journals (Sweden)

    Senthil Kumar Pazhanisamy

    2009-01-01

    Full Text Available It is now widely recognized that advances in exploring genome organization provide remarkable insights into the induction and progression of chromosome abnormalities. Much of what we know about how mutations evolve and consequently transform into genome instabilities has been characterized in the context of the spatial organization of chromatin. Nevertheless, many underlying concepts concerning the impact of chromatin organization on the perpetuation of multiple mutations and on the propagation of chromosomal aberrations remain to be investigated in detail. The genesis of genome instabilities from the accumulation of multiple mutations that drive tumorigenesis is increasingly becoming a focal theme in cancer studies. This review focuses on how structural alterations evolve to give rise to a variety of genome instabilities that are manifested at the nucleotide, gene or sub-chromosomal, and whole-chromosome levels of the genome. Here we explore an underlying connection between genome instability and cancer in the light of genome architecture. This review is limited to studies directed towards spatial organizational aspects of the origin and propagation of aberrations into genetically unstable tumors.

  8. Motif analysis unveils the possible co-regulation of chloroplast genes and nuclear genes encoding chloroplast proteins.

    Science.gov (United States)

    Wang, Ying; Ding, Jun; Daniell, Henry; Hu, Haiyan; Li, Xiaoman

    2012-09-01

    Chloroplasts play critical roles in land plant cells. Despite their importance and the availability of at least 200 sequenced chloroplast genomes, the number of known DNA regulatory sequences in chloroplast genomes is limited. In this paper, we designed computational methods to systematically study putative DNA regulatory sequences in intergenic regions near chloroplast genes in seven plant species and in promoter sequences of nuclear genes in Arabidopsis and rice. We found that -35/-10 elements alone cannot explain the transcriptional regulation of chloroplast genes. We also concluded that there are unlikely to be motifs shared by the intergenic sequences of most chloroplast genes, indicating that these genes are regulated differently. Finally and surprisingly, we found that five conserved motifs, each of which occurs in no more than six chloroplast intergenic sequences, are significantly shared by promoters of nuclear genes encoding chloroplast proteins. By integrating information from gene function annotation, protein subcellular localization analyses, protein-protein interaction data, and gene expression data, we further showed support for the functionality of these conserved motifs. Our study implies the existence of unknown nuclear-encoded transcription factors that regulate both chloroplast genes and nuclear genes encoding chloroplast proteins, which sheds light on the transcriptional regulation of chloroplast genes.

  9. Isolation and characterization of the gene encoding the starch debranching enzyme limit dextrinase from germinating barley

    DEFF Research Database (Denmark)

    Kristensen, Michael; Lok, Finn; Planchot, Véronique

    1999-01-01

    The gene encoding the starch debranching enzyme limit dextrinase, LD, from barley (Hordeum vulgare), was isolated from a genomic phage library using a barley cDNA clone as probe. The gene encodes a protein of 904 amino acid residues with a calculated molecular mass of 98.6 kDa. This is in agreement with a value of 105 kDa estimated by SDS-PAGE. The coding sequence is interrupted by 26 introns varying in length from 93 bp to 825 bp. The 27 exons vary in length from 53 bp to 197 bp. Southern blot analysis shows that the limit dextrinase gene is present as a single copy in the barley genome. Gene expression is high during germination and the steady state transcription level reaches a maximum at day 5 of germination. The deduced amino acid sequence corresponds to the protein sequence of limit dextrinase purified from germinating malt, as determined by automated N-terminal sequencing of tryptic...

  10. ARA-PEPs: a repository of putative sORF-encoded peptides in Arabidopsis thaliana.

    Science.gov (United States)

    Hazarika, Rashmi R; De Coninck, Barbara; Yamamoto, Lidia R; Martin, Laura R; Cammue, Bruno P A; van Noort, Vera

    2017-01-17

    Many eukaryotic RNAs have been considered non-coding as they only contain short open reading frames (sORFs). However, there is increasing evidence for the translation of these sORFs into bioactive peptides with potent signaling, antimicrobial, developmental, antioxidant roles etc. Yet only a few peptides encoded by sORFs are annotated in the model organism Arabidopsis thaliana. To aid the functional annotation of these peptides, we have developed ARA-PEPs (available at http://www.biw.kuleuven.be/CSB/ARA-PEPs ), a repository of putative peptides encoded by sORFs in the A. thaliana genome starting from in-house Tiling arrays, RNA-seq data and other publicly available datasets. ARA-PEPs currently lists 13,748 sORF-encoded peptides with transcriptional evidence. In addition to existing data, we have identified 100 novel transcriptionally active regions (TARs) that might encode 341 novel stress-induced peptides (SIPs). To aid in identification of bioactivity, we add functional annotation and sequence conservation to predicted peptides. To our knowledge, this is the largest repository of plant peptides encoded by sORFs with transcript evidence, publicly available and this resource will help scientists to effortlessly navigate the list of experimentally studied peptides, the experimental and computational evidence supporting the activity of these peptides and gain new perspectives for peptide discovery.

  11. A DNA vaccine encoding multiple HIV CD4 epitopes elicits vigorous polyfunctional, long-lived CD4+ and CD8+ T cell responses.

    Directory of Open Access Journals (Sweden)

    Daniela Santoro Rosa

    T-cell based vaccines against HIV have the goal of limiting both transmission and disease progression by inducing broad and functionally relevant T cell responses. Moreover, polyfunctional and long-lived specific memory T cells have been associated with vaccine-induced protection. CD4+ T cells are important for the generation and maintenance of functional CD8+ cytotoxic T cells. We have recently developed a DNA vaccine encoding 18 conserved multiple HLA-DR-binding HIV-1 CD4 epitopes (HIVBr18), capable of eliciting broad CD4+ T cell responses in multiple HLA class II transgenic mice. Here, we evaluated the breadth and functional profile of HIVBr18-induced immune responses in BALB/c mice. Immunized mice displayed high-magnitude, broad CD4+/CD8+ T cell responses, and 8/18 vaccine-encoded peptides were recognized. In addition, HIVBr18 immunization was able to induce polyfunctional CD4+ and CD8+ T cells that proliferate and produce any two cytokines (IFNγ/TNFα, IFNγ/IL-2 or TNFα/IL-2) simultaneously in response to HIV-1 peptides. For CD4+ T cells exclusively, we also detected cells that proliferate and produce all three tested cytokines simultaneously (IFNγ/TNFα/IL-2). The vaccine also generated long-lived central and effector memory CD4+ T cells, a desirable feature for T-cell based vaccines. By virtue of inducing broad, polyfunctional and long-lived T cell responses against conserved CD4+ T cell epitopes, combined administration of this vaccine concept may provide sustained help for CD8+ T cells and antibody responses elicited by other HIV immunogens.

  12. Human endogenous retroviruses and multiple sclerosis: innocent bystanders or disease determinants?

    Science.gov (United States)

    Antony, Joseph M; Deslauriers, Andre M; Bhat, Rakesh K; Ellestad, Kristofer K; Power, Christopher

    2011-02-01

    Human endogenous retroviruses (HERVs) constitute 5-8% of human genomic DNA and are replication incompetent despite expression of individual HERV genes from different chromosomal loci depending on the specific tissue. Several HERV genes have been detected as transcripts and proteins in the central nervous system, frequently in the context of neuroinflammation. The HERV-W family has received substantial attention in large part because of associations with diverse syndromes including multiple sclerosis (MS) and several psychiatric disorders. A HERV-W-related retroelement, multiple sclerosis retrovirus (MSRV), has been reported in MS patients to be both a biomarker and an effector of aberrant immune responses. HERV-H and HERV-K have also been implicated in MS and other neurological diseases but await delineation of their contributions to disease. The HERV-W envelope-encoded glycosylated protein, syncytin-1, is encoded by chromosome 7q21 and exhibits increased glial expression within MS lesions. Overexpression of syncytin-1 in glia induces endoplasmic reticulum stress leading to neuroinflammation and the induction of free radicals, which damage proximate cells. Syncytin-1's receptor, ASCT1, is a neutral amino acid transporter expressed on glia and is suppressed in white matter of MS patients. Of interest, antioxidants ameliorate syncytin-1's neuropathogenic effects, raising the possibility of using these agents as therapeutics for neuroinflammatory diseases. Given the multiple insertion sites of HERV genes as complete and incomplete open reading frames, together with their differing capacity to be expressed and the complexities of individual HERVs as both disease markers and bioactive effectors, HERV biology is a compelling area for understanding neuropathogenic mechanisms and developing new therapeutic strategies. 2010 Elsevier B.V. All rights reserved.

  13. Comparative genomic analysis of the multispecies probiotic-marketed product VSL#3.

    Directory of Open Access Journals (Sweden)

    François P Douillard

    Several probiotic-marketed formulations available to consumers contain live lactic acid bacteria and/or bifidobacteria. The multispecies product commercialized as VSL#3 has been used for treating various gastro-intestinal disorders. However, as with many other products, the bacterial strains present in VSL#3 have only been characterized to a limited extent and their efficacy as well as their predicted mode of action remain unclear, preventing further applications or comparative studies. In this work, the genomes of all eight bacterial strains present in VSL#3 were sequenced and characterized, to advance insights into the possible mode of action of this product and also to serve as a basis for future work and trials. Phylogenetic and genomic data analysis allowed us to identify the 7 species present in the VSL#3 product as specified by the manufacturer. The 8 strains present belong to the species Streptococcus thermophilus, Lactobacillus acidophilus, Lactobacillus paracasei, Lactobacillus plantarum, Lactobacillus helveticus, Bifidobacterium breve and B. animalis subsp. lactis (two distinct strains). Comparative genomics revealed that the draft genomes of the S. thermophilus and L. helveticus strains were predicted to encode most of the defence systems such as restriction modification and CRISPR-Cas systems. Genes associated with a variety of potential probiotic functions were also identified. Thus, in the three Bifidobacterium spp., gene clusters were predicted to encode tight adherence pili, known to promote bacteria-host interaction and intestinal barrier integrity, and to impact host cell development. Various repertoires of putative signalling proteins were predicted to be encoded by the genomes of the Lactobacillus spp., i.e. surface layer proteins, LPXTG-containing proteins, or sortase-dependent pili that may interact with the intestinal mucosa and dendritic cells. Taken altogether, the individual genomic characterization of the strains

  14. Genomic insights into a new acidophilic, copper-resistant Desulfosporosinus isolate from the oxidized tailings area of an abandoned gold mine.

    Science.gov (United States)

    Mardanov, Andrey V; Panova, Inna A; Beletsky, Alexey V; Avakyan, Marat R; Kadnikov, Vitaly V; Antsiferov, Dmitry V; Banks, David; Frank, Yulia A; Pimenov, Nikolay V; Ravin, Nikolai V; Karnachuk, Olga V

    2016-08-01

    Microbial sulfate reduction in acid mine drainage is still considered to be confined to anoxic conditions, although several reports have shown that sulfate-reducing bacteria occur under microaerophilic or aerobic conditions. We have measured sulfate reduction rates of up to 60 nmol S cm(-3) day(-1) in oxidized layers of gold mine tailings in Kuzbass (SW Siberia). A novel, acidophilic, copper-tolerant Desulfosporosinus sp. I2 was isolated from the same sample and its genome was sequenced. The genomic analysis and physiological data indicate the involvement of transporters and additional mechanisms to tolerate metals, such as sequestration by polyphosphates. Desulfosporosinus sp. I2 encodes systems for a metabolically versatile life style. The genome possessed a complete Embden-Meyerhof pathway for glycolysis and gluconeogenesis. Complete oxidation of organic substrates could be enabled by the complete TCA cycle. Genomic analysis found all major components of the electron transfer chain necessary for energy generation via oxidative phosphorylation. Autotrophic CO2 fixation could be performed through the Wood-Ljungdahl pathway. Multiple oxygen detoxification systems were identified in the genome. Taking into account the metabolic activity and genomic analysis, the traits of the novel isolate broaden our understanding of active sulfate reduction and associated metabolism beyond strictly anaerobic niches. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. Genome sequence of the Lotus spp. microsymbiont Mesorhizobium loti strain R7A.

    Science.gov (United States)

    Kelly, Simon; Sullivan, John; Ronson, Clive; Tian, Rui; Bräu, Lambert; Munk, Christine; Goodwin, Lynne; Han, Cliff; Woyke, Tanja; Reddy, Tatiparthi; Huntemann, Marcel; Pati, Amrita; Mavromatis, Konstantinos; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Reeve, Wayne

    2014-01-01

    Mesorhizobium loti strain R7A was isolated in 1993 in Lammermoor, Otago, New Zealand from a Lotus corniculatus root nodule and is a reisolate of the inoculant strain ICMP3153 (NZP2238) used at the site. R7A is an aerobic, Gram-negative, non-spore-forming rod. The symbiotic genes in the strain are carried on a 502-kb integrative and conjugative element known as the symbiosis island or ICEMlSym(R7A). M. loti is the microsymbiont of the model legume Lotus japonicus and strain R7A has been used extensively in studies of the plant-microbe interaction. This report reveals that the genome of M. loti strain R7A does not harbor any plasmids and contains a single scaffold of size 6,529,530 bp which encodes 6,323 protein-coding genes and 75 RNA-only encoding genes. This rhizobial genome is one of 100 sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.

  16. The Genome of the Epsilonproteobacterial Chemolithoautotroph Sulfurimonas denitrificans

    Energy Technology Data Exchange (ETDEWEB)

    USF Genomics Class; Sievert, Stefan M.; Scott, Kathleen M.; Klotz, Martin G.; Chain, Patrick S.G.; Hauser, Loren J.; Hemp, James; Hugler, Michael; Land, Miriam; Lapidus, Alla; Larimer, Frank W.; Lucas, Susan; Malfatti, Stephanie A.; Meyer, Folker; Paulsen, Ian T.; Ren, Qinghu; Simon, Jorg

    2007-08-08

    Sulfur-oxidizing epsilonproteobacteria are common in a variety of sulfidogenic environments. These autotrophic and mixotrophic sulfur-oxidizing bacteria are believed to contribute substantially to the oxidative portion of the global sulfur cycle. In order to better understand the ecology and roles of sulfur-oxidizing epsilonproteobacteria, in particular those of the widespread genus Sulfurimonas, in biogeochemical cycles, the genome of Sulfurimonas denitrificans DSM1251 was sequenced. This genome has many features, including a larger size (2.2 Mbp), that suggest a greater degree of metabolic versatility or responsiveness to the environment than seen for most of the other sequenced epsilonproteobacteria. A branched electron transport chain is apparent, with genes encoding complexes for the oxidation of hydrogen, reduced sulfur compounds, and formate and the reduction of nitrate and oxygen. Genes are present for a complete, autotrophic reductive citric acid cycle. Many genes are present that could facilitate growth in the spatially and temporally heterogeneous sediment habitat from where Sulfurimonas denitrificans was originally isolated. Many resistance-nodulation-development family transporter genes (10 total) are present; of these, several are predicted to encode heavy metal efflux transporters. An elaborate arsenal of sensory and regulatory protein-encoding genes is in place, as are genes necessary to prevent and respond to oxidative stress.

  17. Progress toward characterization of the group A Streptococcus metagenome: complete genome sequence of a macrolide-resistant serotype M6 strain.

    Science.gov (United States)

    Banks, David J; Porcella, Stephen F; Barbian, Kent D; Beres, Stephen B; Philips, Lauren E; Voyich, Jovanka M; DeLeo, Frank R; Martin, Judith M; Somerville, Greg A; Musser, James M

    2004-08-15

    We describe the genome sequence of a macrolide-resistant strain (MGAS10394) of serotype M6 group A Streptococcus (GAS). The genome is 1,900,156 bp in length, and 8 prophage-like elements or remnants compose 12.4% of the chromosome. An 8.3-kb prophage remnant encodes the SpeA4 variant of streptococcal pyrogenic exotoxin A. The genome of strain MGAS10394 contains a chimeric genetic element composed of prophage genes and a transposon encoding the mefA gene conferring macrolide resistance. This chimeric element also has a gene encoding a novel surface-exposed protein (designated "R6 protein"), with an LPKTG cell-anchor motif located at the carboxyterminus. Surface expression of this protein was confirmed by flow cytometry. Humans with GAS pharyngitis caused by serotype M6 strains had antibody against the R6 protein present in convalescent, but not acute, serum samples. Our studies add to the theme that GAS prophage-encoded extracellular proteins contribute to host-pathogen interactions in a strain-specific fashion.

  18. Geographic isolates of Lymantria dispar multiple nucleopolyhedrovirus: Genome sequence analysis and pathogenicity against European and Asian gypsy moth strains.

    Science.gov (United States)

    Harrison, Robert L; Rowley, Daniel L; Keena, Melody A

    2016-06-01

    Isolates of the baculovirus species Lymantria dispar multiple nucleopolyhedrovirus have been formulated and applied to suppress outbreaks of the gypsy moth, L. dispar. To evaluate the genetic diversity in this species at the genomic level, the genomes of three isolates from Massachusetts, USA (LdMNPV-Ab-a624), Spain (LdMNPV-3054), and Japan (LdMNPV-3041) were sequenced and compared with four previously determined LdMNPV genome sequences. The LdMNPV genome sequences were collinear and contained the same homologous repeats (hrs) and clusters of baculovirus repeat orf (bro) gene family members in the same relative positions in their genomes, although sequence identities in these regions were low. Of 146 non-bro ORFs annotated in the genome of the representative isolate LdMNPV 5-6, 135 ORFs were found in every other LdMNPV genome, including the 37 core genes of Baculoviridae and other genes conserved in genus Alphabaculovirus. Phylogenetic inference with an alignment of the core gene nucleotide sequences grouped isolates 3041 (Japan) and 2161 (Korea) separately from a cluster containing isolates from Europe, North America, and Russia. To examine phenotypic diversity, bioassays were carried out with a selection of isolates against neonate larvae from three European gypsy moth (Lymantria dispar dispar) and three Asian gypsy moth (Lymantria dispar asiatica and Lymantria dispar japonica) colonies. LdMNPV isolates 2161 (Korea), 3029 (Russia), and 3041 (Japan) exhibited a greater degree of pathogenicity against all L. dispar strains than LdMNPV from a sample of Gypchek. This study provides additional information on the genetic diversity of LdMNPV isolates and their activity against the Asian gypsy moth, a potential invasive pest of North American trees and forests. Published by Elsevier Inc.

  19. Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications.

    Science.gov (United States)

    Huang, Lei; Ma, Fei; Chapman, Alec; Lu, Sijia; Xie, Xiaoliang Sunney

    2015-01-01

    We present a survey of single-cell whole-genome amplification (WGA) methods, including degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC). The key parameters to characterize the performance of these methods are defined, including genome coverage, uniformity, reproducibility, unmappable rates, chimera rates, allele dropout rates, false positive rates for calling single-nucleotide variations, and ability to call copy-number variations. Using these parameters, we compare five commercial WGA kits by performing deep sequencing of multiple single cells. We also discuss several major applications of single-cell genomics, including studies of whole-genome de novo mutation rates, the early evolution of cancer genomes, circulating tumor cells (CTCs), meiotic recombination of germ cells, preimplantation genetic diagnosis (PGD), and preimplantation genomic screening (PGS) for in vitro-fertilized embryos.

  20. Patient-controlled encrypted genomic data: an approach to advance clinical genomics

    Directory of Open Access Journals (Sweden)

    Trakadis Yannis J

    2012-07-01

    Background: The revolution in DNA sequencing technologies over the past decade has made it feasible to sequence an individual’s whole genome at a relatively low cost. The potential value of the information generated by genomic technologies for medicine and society is enormous. However, in order for exome sequencing, and eventually whole genome sequencing, to be implemented clinically, a number of major challenges need to be overcome. For instance, obtaining meaningful informed consent, managing incidental findings and the great volume of data generated (including multiple findings with uncertain clinical significance), re-interpreting the genomic data and providing additional counselling to patients as genetic knowledge evolves are issues that need to be addressed. It appears that medical genetics is shifting from the present “phenotype-first” medical model to a “data-first” model, which leads to multiple complexities. Discussion: This manuscript discusses the different challenges associated with integrating genomic technologies into clinical practice and describes a “phenotype-first” approach, namely, “Individualized Mutation-weighed Phenotype Search”, and its benefits. The proposed approach allows for a more efficient prioritization of the genes to be tested in a clinical lab based on both the patient’s phenotype and his/her entire genomic data. It simplifies “informed-consent” for clinical use of genomic technologies and helps to protect the patient’s autonomy and privacy. Overall, this approach could potentially render widespread use of genomic technologies, in the immediate future, practical, ethical and clinically useful. Summary: The “Individualized Mutation-weighed Phenotype Search” approach allows for an incremental integration of genomic technologies into clinical practice. It ensures that we do not over-medicalize genomic data but, rather, continue our current medical model which is based on serving

  1. Diverse Lifestyles and Strategies of Plant Pathogenesis Encoded in the Genomes of Eighteen Dothideomycetes

    Energy Technology Data Exchange (ETDEWEB)

    Ohm, Robin A.; Feau, Nicolas; Henrissat, Bernard; Schoch, Conrad L.; Horwitz, Benjamin A.; Barry, Kerrie W.; Condon, Bradford J.; Copeland, Alex C.; Dhillon, Braham; Glaser, Fabian; Hesse, Cedar N.; Kosti, Idit; LaButti, Kurt; Lindquist, Erika A.; Lucas, Susan; Salamov, Asaf A.; Bradshaw, Rosie E.; Ciuffetti, Lynda; Hamelin, Richard C.; Kema, Gert H. J.; Lawrence, Christopher; Scott, James A.; Spatafora, Joseph W.; Turgeon, B. Gillian; de Wit, Pierre J. G. M.; Zhong, Shaobin; Goodwin, Stephen B.; Grigoriev, Igor V.

    2013-03-05

    The class of Dothideomycetes is one of the largest and most diverse groups of fungi. Many are plant pathogens and pose a serious threat to agricultural crops that are grown for biofuel, food or feed. Most Dothideomycetes have only a single host plant, and related species can have very diverse hosts. Eighteen genomes of Dothideomycetes have currently been sequenced by the Joint Genome Institute and other sequencing centers. Here we describe the results of comparative analyses of the fungi in this group.

  2. Genome Segregation and Packaging Machinery in Acanthamoeba polyphaga Mimivirus Is Reminiscent of Bacterial Apparatus

    Science.gov (United States)

    Chelikani, Venkata; Ranjan, Tushar; Zade, Amrutraj; Shukla, Avi

    2014-01-01

    ABSTRACT Genome packaging is a critical step in the virion assembly process. The putative ATP-driven genome packaging motor of Acanthamoeba polyphaga mimivirus (APMV) and other nucleocytoplasmic large DNA viruses (NCLDVs) is a distant ortholog of prokaryotic chromosome segregation motors, such as FtsK and HerA, rather than other viral packaging motors, such as large terminase. Intriguingly, APMV also encodes other components, i.e., three putative serine recombinases and a putative type II topoisomerase, all of which are essential for chromosome segregation in prokaryotes. Based on our analyses of these components and taking the limited available literature into account, here we propose for the first time a model for genome segregation and packaging in APMV that can possibly be extended to NCLDV subfamilies, except perhaps Poxviridae and Ascoviridae. This model might represent a unique variation of the prokaryotic system acquired and contrived by the large DNA viruses of eukaryotes. It is also consistent with previous observations that unicellular eukaryotes, such as amoebae, are melting pots for the advent of chimeric organisms with novel mechanisms. IMPORTANCE Extremely large viruses with DNA genomes infect a wide range of eukaryotes, from human beings to amoebae and from crocodiles to algae. These large DNA viruses, unlike their much smaller cousins, have the capability of making most of the protein components required for their multiplication. Once they infect the cell, these viruses set up viral replication centers, known as viral factories, to carry out their multiplication with very little help from the host. Our sequence analyses show that there is remarkable similarity between prokaryotes (bacteria and archaea) and large DNA viruses, such as mimivirus, vaccinia virus, and pandoravirus, in the way that they process their newly synthesized genetic material to make sure that only one copy of the complete genome is generated and is meticulously placed inside

  3. Genome Sequence of the Biocontrol Strain Pseudomonas fluorescens F113

    Science.gov (United States)

    Redondo-Nieto, Miguel; Barret, Matthieu; Morrisey, John P.; Germaine, Kieran; Martínez-Granero, Francisco; Barahona, Emma; Navazo, Ana; Sánchez-Contreras, María; Moynihan, Jennifer A.; Giddens, Stephen R.; Coppoolse, Eric R.; Muriel, Candela; Stiekema, Willem J.; Rainey, Paul B.; Dowling, David; O'Gara, Fergal; Martín, Marta

    2012-01-01

    Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) that has biocontrol activity against fungal plant pathogens and is a model for rhizosphere colonization. Here, we present its complete genome sequence, which shows that besides a core genome very similar to those of other strains sequenced within this species, F113 possesses a wide array of genes encoding specialized functions for thriving in the rhizosphere and interacting with eukaryotic organisms. PMID:22328765

  4. Megabase replication domains along the human genome: relation to chromatin structure and genome organisation.

    Science.gov (United States)

    Audit, Benjamin; Zaghloul, Lamia; Baker, Antoine; Arneodo, Alain; Chen, Chun-Long; d'Aubenton-Carafa, Yves; Thermes, Claude

    2013-01-01

    In higher eukaryotes, the absence of specific sequence motifs, marking the origins of replication has been a serious hindrance to the understanding of (i) the mechanisms that regulate the spatio-temporal replication program, and (ii) the links between origins activation, chromatin structure and transcription. In this chapter, we review the partitioning of the human genome into megabased-size replication domains delineated as N-shaped motifs in the strand compositional asymmetry profiles. They collectively span 28.3% of the genome and are bordered by more than 1,000 putative replication origins. We recapitulate the comparison of this partition of the human genome with high-resolution experimental data that confirms that replication domain borders are likely to be preferential replication initiation zones in the germline. In addition, we highlight the specific distribution of experimental and numerical chromatin marks along replication domains. Domain borders correspond to particular open chromatin regions, possibly encoded in the DNA sequence, and around which replication and transcription are highly coordinated. These regions also present a high evolutionary breakpoint density, suggesting that susceptibility to breakage might be linked to local open chromatin fiber state. Altogether, this chapter presents a compartmentalization of the human genome into replication domains that are landmarks of the human genome organization and are likely to play a key role in genome dynamics during evolution and in pathological situations.

  5. Comparison of 26 sphingomonad genomes reveals diverse environmental adaptations and biodegradative capabilities

    DEFF Research Database (Denmark)

    Aylward, Frank O.; McDonald, Bradon R.; Adams, Sandra M.

    2013-01-01

    to the genus Sphingobium. Our pan-genomic analysis of sphingomonads reveals numerous species-specific open reading frames (ORFs) but few signatures of genus-specific cores. The organization and coding potential of the sphingomonad genomes appear to be highly variable, and plasmid-mediated gene transfer...... and chromosome-plasmid recombination, together with prophage- and transposon-mediated rearrangements, appear to play prominent roles in the genome evolution of this group. We find that many of the sphingomonad genomes encode numerous oxygenases and glycoside hydrolases, which are likely responsible...... a basis for understanding the ecological strategies employed by sphingomonads and their role in environmental nutrient cycling....

  6. Break Breast Cancer Addiction by CRISPR/Cas9 Genome Editing.

    Science.gov (United States)

    Yang, Haitao; Jaeger, MariaLynn; Walker, Averi; Wei, Daniel; Leiker, Katie; Weitao, Tao

    2018-01-01

    Breast cancer is the most commonly diagnosed cancer in women globally. The evolution of breast cancer through tumorigenesis, metastasis and treatment resistance appears to be driven by aberrant gene expression and protein degradation encoded by the cancer genomes. Uncontrolled cancer growth relies on these cellular events, which thus constitute the cancerous programs to which the cancer becomes addicted. These programs are likely potential anticancer biomarkers for Personalized Medicine of breast cancer. This review intends to delineate the impact of CRISPR/Cas-mediated genome editing on the identification and validation of these anticancer biomarkers. It reviews the progress in three aspects of CRISPR/Cas9-mediated editing of breast cancer genomes: somatic genome editing, transcription addiction and protein degradation addiction.

  7. Formation of mushrooms and lignocellulose degradation encoded in the genome sequence of Schizophyllum commune

    Energy Technology Data Exchange (ETDEWEB)

    Ohm, Robin A.; de Jong, Jan F.; Lugones, Luis G.; Aerts, Andrea; Kothe, Erika; Stajich, Jason E.; de Vries, Ronald P.; Record, Eric; Levasseur, Anthony; Baker, Scott E.; Bartholomew, Kirk A.; Coutinho, Pedro M.; Erdmann, Susann; Fowler, Thomas J.; Gathman, Allen C.; Lombard, Vincent; Henrissat, Bernard; Knabe, Nicole; Kues, Ursula; Lilly, Walt W.; Lindquist, Erika; Lucas, Susan; Magnuson, Jon K.; Piumi, Francois; Raudaskoski, Marjatta; Salamov, Asaf; Schmutz, Jeremy; Schwarze, Francis W.M.R.; van Kuyk, Patricia A.; Horton, J. Stephen; Grigoriev, Igor V.; Wosten, Han A.B.

    2010-07-12

    The wood degrading fungus Schizophyllum commune is a model system for mushroom development. Here, we describe the 38.5 Mb assembled genome of this basidiomycete and application of whole genome expression analysis to study the 13,210 predicted genes. Comparative analyses of the S. commune genome revealed unique wood degrading machinery and mating type loci with the highest number of reported genes. Gene expression analyses revealed that one third of the 471 identified transcription factor genes were differentially expressed during sexual development. Two of these transcription factor genes were deleted. Inactivation of fst4 resulted in the inability to form mushrooms, whereas inactivation of fst3 resulted in more but smaller mushrooms than wild-type. These data illustrate that mechanisms underlying mushroom formation can be dissected using S. commune as a model. This will impact commercial production of mushrooms and the industrial use of these fruiting bodies to produce enzymes and pharmaceuticals.

  8. Comparative genome analysis of non-toxigenic non-O1 versus toxigenic O1 Vibrio cholerae

    Science.gov (United States)

    Mukherjee, Munmun; Kakarla, Prathusha; Kumar, Sanath; Gonzalez, Esmeralda; Floyd, Jared T.; Inupakutika, Madhuri; Devireddy, Amith Reddy; Tirrell, Selena R.; Bruns, Merissa; He, Guixin; Lindquist, Ingrid E.; Sundararajan, Anitha; Schilkey, Faye D.; Mudge, Joann; Varela, Manuel F.

    2015-01-01

    Pathogenic strains of Vibrio cholerae are responsible for endemic and pandemic outbreaks of the disease cholera. The complete toxigenic mechanisms underlying virulence in Vibrio strains are poorly understood. The hypothesis of this work was that virulent versus non-virulent strains of V. cholerae harbor distinctive genomic elements that encode virulence. The purpose of this study was to elucidate genomic differences between the O1 serotypes and non-O1 V. cholerae PS15, a non-toxigenic strain, in order to identify novel genes potentially responsible for virulence. In this study, we compared the whole genome of the non-O1 PS15 strain to the whole genomes of toxigenic serotypes at the phylogenetic level, and found that the PS15 genome was distantly related to those of toxigenic V. cholerae. Thus we focused on a detailed gene comparison between PS15 and the distantly related O1 V. cholerae N16961. Based on sequence alignment we tentatively assigned chromosome numbers 1 and 2 to elements within the genome of non-O1 V. cholerae PS15. Further, we found that PS15 and O1 V. cholerae N16961 shared 98% identity and 766 genes, but of the genes present in N16961 that were missing in the non-O1 V. cholerae PS15 genome, 56 were predicted to be not only virulence-related genes (colonization, antimicrobial resistance, and regulation of persister cells) but also genes involved in the metabolic biosynthesis of lipids, nucleosides and sulfur compounds. Additionally, we found 113 genes unique to PS15 that were predicted to encode other properties related to virulence, disease, defense, membrane transport, and DNA metabolism. Here, we identified distinctive and novel genomic elements between O1 and non-O1 V. cholerae genomes as potential virulence factors and, thus, targets for future therapeutics. Modulation of such novel targets may eventually enhance efforts to eradicate endemic and pandemic cholera in afflicted nations. PMID:25722857

  9. Genome sequence of the agar-degrading marine bacterium Alteromonadaceae sp. strain G7.

    Science.gov (United States)

    Kwak, Min-Jung; Song, Ju Yeon; Kim, Byung Kwon; Chi, Won-Jae; Kwon, Soon-Kyeong; Choi, Soobeom; Chang, Yong-Keun; Hong, Soon-Kwang; Kim, Jihyun F

    2012-12-01

    Here, we present the high-quality draft genome sequence of the agar-degrading marine gammaproteobacterium Alteromonadaceae sp. strain G7, which was isolated from coastal seawater to be utilized as a bioresource for production of agar-derived biofuels. The 3.91-Mb genome contains a number of genes encoding algal polysaccharide-degrading enzymes such as agarases and sulfatases.

  10. Genome Sequence of the Agar-Degrading Marine Bacterium Alteromonadaceae sp. Strain G7

    OpenAIRE

    Kwak, Min-Jung; Song, Ju Yeon; Kim, Byung Kwon; Chi, Won-Jae; Kwon, Soon-Kyeong; Choi, Soobeom; Chang, Yong-Keun; Hong, Soon-Kwang; Kim, Jihyun F.

    2012-01-01

    Here, we present the high-quality draft genome sequence of the agar-degrading marine gammaproteobacterium Alteromonadaceae sp. strain G7, which was isolated from coastal seawater to be utilized as a bioresource for production of agar-derived biofuels. The 3.91-Mb genome contains a number of genes encoding algal polysaccharide-degrading enzymes such as agarases and sulfatases.

  11. Composition and expression of genes encoding carbohydrate-active enzymes in the straw-degrading mushroom Volvariella volvacea.

    Directory of Open Access Journals (Sweden)

    Bingzhi Chen

    Volvariella volvacea is one of the few commercially cultivated mushrooms that mainly use straw as a carbon source. In this study, the genome of V. volvacea was sequenced and assembled. A total of 285 genes encoding carbohydrate-active enzymes (CAZymes) in V. volvacea were identified and annotated. Among 15 fungi with sequenced genomes, V. volvacea ranks seventh in the number of genes encoding CAZymes. In addition, the composition of glycoside hydrolases in V. volvacea is dramatically different from that of other basidiomycetes: it is particularly rich in members of the glycoside hydrolase families GH10 (hemicellulose degradation) and GH43 (hemicellulose and pectin degradation), and the lyase families PL1, PL3 and PL4 (pectin degradation), but lacks families GH5b, GH11, GH26, GH62, GH93, GH115, GH105, GH9, GH53, GH32, GH74 and CE12. Analysis of genome-wide gene expression profiles of 3 strains using 3'-tag digital gene expression (DGE) reveals that 239 CAZyme genes were expressed even in potato dextrose broth medium. Our data also showed that the formation of a heterokaryotic strain could dramatically increase the expression of a number of genes which were poorly expressed in its parental homokaryotic strains.

  12. Complete Genome Sequence of Zucchini Yellow Mosaic Virus Strain Kurdistan, Iran.

    Science.gov (United States)

    Maghamnia, Hamid Reza; Hajizadeh, Mohammad; Azizi, Abdolbaset

    2018-03-01

    The complete genome sequence of Zucchini yellow mosaic virus strain Kurdistan (ZYMV-Kurdistan) infecting squash from Iran was determined from 13 overlapping fragments. Excluding the poly(A) tail, the ZYMV-Kurdistan genome consisted of 9593 nucleotides (nt), with 138 and 211 nt at the 5' and 3' non-translated regions, respectively. It contained two open reading frames (ORFs), the large ORF encoding a polyprotein of 3080 amino acids (aa) and the small overlapping ORF encoding a P3N-PIPO protein of 74 aa. This isolate had six unique aa differences compared to other ZYMV isolates and shared 79.6-98.8% identities with other ZYMV genome sequences at the nt level and 90.1-99% identities at the aa level. A phylogenetic tree of ZYMV complete genomic sequences showed that Iranian and Central European isolates are closely related and form a phylogenetically homogenous group. All values in the ratio of substitution rates at non-synonymous and synonymous sites (dN/dS) were below 1, suggestive of strong negative selection forces during ZYMV protein history. This is the first report of complete genome sequence information of the most prevalent virus in the west of Iran. This study helps our understanding of the genetic diversity of ZYMV isolates infecting cucurbit plants in Iran, virus evolution and epidemiology and can assist in designing better diagnostic tools.

  13. Induction of the gap-pgk operon encoding glyceraldehyde-3-phosphate dehydrogenase and 3-phosphoglycerate kinase of Xanthobacter flavus requires the LysR-type transcriptional activator CbbR

    NARCIS (Netherlands)

    Meijer, W.G; van den Bergh, E.R E; Smith, L.M

    In a previous study, a gene (pgk) encoding phosphoglycerate kinase was isolated from a genomic library of Xanthobacter flavus. Although this gene is essential for autotrophic growth, it is not located within the cbb operon encoding other Calvin cycle enzymes. An analysis of the nucleotide sequence

  14. A Rickettsia Genome Overrun by Mobile Genetic Elements Provides Insight into the Acquisition of Genes Characteristic of an Obligate Intracellular Lifestyle

    Science.gov (United States)

    Joardar, Vinita; Williams, Kelly P.; Driscoll, Timothy; Hostetler, Jessica B.; Nordberg, Eric; Shukla, Maulik; Walenz, Brian; Hill, Catherine A.; Nene, Vishvanath M.; Aza