WorldWideScience

Sample records for mouse genome database

  1. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

    Science.gov (United States)

    Eppig, Janan T; Blake, Judith A; Bult, Carol J; Kadin, James A; Richardson, Joel E

    2015-01-01

    The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Mouse Genome Informatics (MGI)

    Data.gov (United States)

    U.S. Department of Health & Human Services — MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human...

  3. Mouse Phenome Database (MPD)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Mouse Phenome Database (MPD) has characterizations of hundreds of strains of laboratory mice to facilitate translational discoveries and to assist in selection...

  4. Rat Genome Database (RGD)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Rat Genome Database (RGD) is a collaborative effort between leading research institutions involved in rat genetic and genomic research to collect, consolidate,...

  5. 10. international mouse genome conference

    Energy Technology Data Exchange (ETDEWEB)

    Meisler, M.H.

    1996-12-31

    Ten years after hosting the First International Mammalian Genome Conference in Paris in 1986, Dr. Jean-Louis Guenet presided over the Tenth Conference at the Pasteur Institute, October 7-10, 1996. The 1986 conference was a satellite to the Human Gene Mapping Workshop and had approximately 50 attendees. The 1996 meeting was attended by 300 scientists from around the world. In the interim, the number of mapped loci in the mouse increased from 1,000 to over 20,000. This report contains a listing of the program and its participants, and two articles that review the meeting and the role of the laboratory mouse in the Human Genome project. More than 200 papers were presented at the conference covering the following topics: International mouse chromosome committee meetings; Mutant generation and identification; Physical and genetic maps; New technology and resources; Chromatin structure and gene regulation; Rat and hamster genetic maps; Informatics and databases; and Quantitative trait analysis.

  6. Mycobacteriophage genome database.

    Science.gov (United States)

    Joseph, Jerrine; Rajendran, Vasanthi; Hassan, Sameer; Kumar, Vanaja

    2011-01-01

    Mycobacteriophage genome database (MGDB) is an exclusive repository of the 64 completely sequenced mycobacteriophages with annotated information. It is a comprehensive compilation of the various gene parameters captured from several databases pooled together to empower mycobacteriophage researchers. The MGDB (Version No.1.0) comprises 6086 genes from 64 mycobacteriophages classified into 72 families based on the ACLAME database. Manual curation was aided by information available from public databases, which was enriched further by analysis. Its web interface allows browsing as well as querying the classification. The main objective is to collect and organize the complexity inherent to mycobacteriophage protein classification in a rational way. The other objective is to browse the existing and new genomes and describe their functional annotation. The database is available for free at http://mpgdb.ibioinformatics.org/mpgdb.php.

  7. Mouse Resource Browser-a database of mouse databases

    NARCIS (Netherlands)

    Zouberakis, Michael; Chandras, Christina; Swertz, Morris; Smedley, Damian; Gruenberger, Michael; Bard, Jonathan; Schughart, Klaus; Rosenthal, Nadia; Hancock, John M.; Schofield, Paul N.; Kollias, George; Aidinis, Vassilis

    2010-01-01

    The laboratory mouse has become the organism of choice for discovering gene function and unravelling pathogenetic mechanisms of human diseases through the application of various functional genomic approaches. The resulting deluge of data has led to the deployment of numerous online resources and the

  8. The Sequenced Angiosperm Genomes and Genome Databases.

    Science.gov (United States)

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of humans, animals, and the planet Earth. Despite the numerous advances in genome reports and sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in database reconstruction in the last few years, here we provide a comprehensive review of three major types of angiosperm genome databases: databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular among specific groups of researchers, while a promptly updated comprehensive database is more powerful for addressing major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote collaborative studies of important questions in plant biology.

  9. GOBASE: an organelle genome database

    OpenAIRE

    O'Brien, Emmet A.; Zhang, Yue; Wang, Eric; Marie, Veronique; Badejoko, Wole; Lang, B. Franz; Burger, Gertraud

    2008-01-01

    The organelle genome database GOBASE, now in its 21st release (June 2008), contains all published mitochondrion-encoded sequences (~913 000) and chloroplast-encoded sequences (~250 000) from a wide range of eukaryotic taxa. For all sequences, information on related genes, exons, introns, gene products and taxonomy is available, as well as selected genome maps and RNA secondary structures. Recent major enhancements to database functionality include: (i) addition of an interface for RNA editing...

  10. The Ensembl genome database project.

    Science.gov (United States)

    Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M

    2002-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

  11. The YH database: the first Asian diploid genome database

    DEFF Research Database (Denmark)

    Li, Guoqing; Ma, Lijia; Song, Chao

    2009-01-01

    genome consensus. The YH database is currently one of the three personal genome databases, organizing the original data and analysis results in a user-friendly interface, which is an endeavor to achieve fundamental goals for establishing personal medicine. The database is available at http://yh.genomics.org.cn....

  12. The Mouse SAGE Site: database of public mouse SAGE libraries

    Czech Academy of Sciences Publication Activity Database

    Divina, Petr; Forejt, Jiří

    2004-01-01

    Vol. 32 (2004), pp. D482-D483. ISSN 0305-1048. R&D Projects: GA MŠk LN00A079; GA ČR GV204/98/K015. Grant - others: HHMI(US) 555000306. Institutional research plan: CEZ:AV0Z5052915. Keywords: mouse SAGE libraries; web-based database. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 7.260, year: 2004

  13. The UCSC Genome Browser Database: 2008 update

    DEFF Research Database (Denmark)

    Karolchik, D; Kuhn, R M; Baertsch, R

    2007-01-01

    The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrat...

  14. The UCSC Genome Browser Database: update 2006

    DEFF Research Database (Denmark)

    Hinrichs, A S; Karolchik, D; Baertsch, R

    2006-01-01

    The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, ...

  15. The UCSC genome browser database: update 2007

    DEFF Research Database (Denmark)

    Kuhn, R M; Karolchik, D; Zweig, A S

    2006-01-01

    The University of California, Santa Cruz Genome Browser Database contains, as of September 2006, sequence and annotation data for the genomes of 13 vertebrate and 19 invertebrate species. The Genome Browser displays a wide variety of annotations at all scales from the single nucleotide level up t...

  16. Development of the mouse cochlea database (MCD).

    Science.gov (United States)

    Santi, Peter A; Rapson, Ian; Voie, Arne

    2008-09-01

    The mouse cochlea database (MCD) provides an interactive, image database of the mouse cochlea for learning its anatomy and data mining of its resources. The MCD website is hosted on a centrally maintained, high-speed server at the following URL: (http://mousecochlea.umn.edu). The MCD contains two types of image resources, serial 2D image stacks and 3D reconstructions of cochlear structures. Complete image stacks of the cochlea from two different mouse strains were obtained using orthogonal plane fluorescence optical microscopy (OPFOS). 2D images of the cochlea are presented on the MCD website as: viewable images within a stack, 2D atlas of the cochlea, orthogonal sections, and direct volume renderings combined with isosurface reconstructions. In order to assess cochlear structures quantitatively, "true" cross-sections of the scala media along the length of the basilar membrane were generated by virtual resectioning of a cochlea orthogonal to a cochlear structure, such as the centroid of the basilar membrane or the scala media. 3D images are presented on the MCD website as: direct volume renderings, movies, interactive QuickTime VRs, flythrough, and isosurface 3D reconstructions of different cochlear structures. 3D computer models can also be used for solid model fabrication by rapid prototyping and models from different cochleas can be combined to produce an average 3D model. The MCD is the first comprehensive image resource on the mouse cochlea and is a new paradigm for understanding the anatomy of the cochlea, and establishing morphometric parameters of cochlear structures in normal and mutant mice.

  17. BGD: a database of bat genomes.

    Science.gov (United States)

    Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

    2015-01-01

    Bats account for ~20% of mammalian species and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. Some whole genome sequences of bats have now been assembled and annotated; however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is provided to facilitate online phylogenetic study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and benefit the analysis of bat sequences. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.

  18. BGD: a database of bat genomes.

    Directory of Open Access Journals (Sweden)

    Jianfei Fang

    Bats account for ~20% of mammalian species and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. Some whole genome sequences of bats have now been assembled and annotated; however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is provided to facilitate online phylogenetic study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and benefit the analysis of bat sequences. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.

  19. Insights from Human/Mouse genome comparisons

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.

    2003-03-30

    Large-scale public genomic sequencing efforts have provided a wealth of vertebrate sequence data poised to provide insights into mammalian biology. These include deep genomic sequence coverage of human, mouse, rat, zebrafish, and two pufferfish (Fugu rubripes and Tetraodon nigroviridis) (Aparicio et al. 2002; Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002). In addition, a high priority has been placed on determining the genomic sequence of chimpanzee, dog, cow, frog, and chicken (Boguski 2002). While only recently available, whole genome sequence data have provided the unique opportunity to globally compare complete genome contents. Furthermore, the shared evolutionary ancestry of vertebrate species has allowed the development of comparative genomic approaches to identify ancient conserved sequences with functionality. Accordingly, this review focuses on the initial comparison of available mammalian genomes and describes various insights derived from such analysis.

  20. Mouse SNP Miner: an annotated database of mouse functional single nucleotide polymorphisms

    Directory of Open Access Journals (Sweden)

    Ramensky Vasily E

    2007-01-01

    Background: The mapping of quantitative trait loci in rat and mouse has been extremely successful in identifying chromosomal regions associated with human disease-related phenotypes. However, identifying the specific phenotype-causing DNA sequence variations within a quantitative trait locus has been much more difficult. The recent availability of genomic sequence from several mouse inbred strains (including C57BL/6J, 129X1/SvJ, 129S1/SvImJ, A/J, and DBA/2J) has made it possible to catalog DNA sequence differences within a quantitative trait locus derived from crosses between these strains. However, even for well-defined quantitative trait loci (... Description: To help identify functional DNA sequence variations within quantitative trait loci we have used the Ensembl annotated genome sequence to compile a database of mouse single nucleotide polymorphisms (SNPs) that are predicted to cause missense, nonsense, frameshift, or splice site mutations (available at http://bioinfo.embl.it/SnpApplet/). For missense mutations we have used the PolyPhen and PANTHER algorithms to predict whether amino acid changes are likely to disrupt protein function. Conclusion: We have developed a database of mouse SNPs predicted to cause missense, nonsense, frameshift, and splice-site mutations. Our analysis revealed that 20% and 14% of missense SNPs are likely to be deleterious according to PolyPhen and PANTHER, respectively, and 6% are considered deleterious by both algorithms. The database also provides gene expression and functional annotations from the Symatlas, Gene Ontology, and OMIM databases to further assess candidate phenotype-causing mutations. To demonstrate its utility, we show that Mouse SNP Miner successfully finds a previously identified candidate SNP in the taste receptor, Tas1r3, that underlies sucrose preference in the C57BL/6J strain. We also use Mouse SNP Miner to derive a list of candidate phenotype-causing mutations within a previously

  1. The Saccharomyces Genome Database Variant Viewer.

    Science.gov (United States)

    Sheppard, Travis K; Hitz, Benjamin C; Engel, Stacia R; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla S; Demeter, Janos; Hellerstedt, Sage T; Karra, Kalpana; Nash, Robert S; Paskov, Kelley M; Skrzypek, Marek S; Weng, Shuai; Wong, Edith D; Cherry, J Michael

    2016-01-04

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Saccharomyces genome database informs human biology

    OpenAIRE

    Skrzypek, Marek S; Nash, Robert S; Wong, Edith D; MacPherson, Kevin A; Hellerstedt, Sage T; Engel, Stacia R; Karra, Kalpana; Weng, Shuai; Sheppard, Travis K; Binkley, Gail; Simison, Matt; Miyasato, Stuart R; Cherry, J Michael

    2017-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and...

  3. Gramene database: Navigating plant comparative genomics resources

    Directory of Open Access Journals (Sweden)

    Parul Gupta

    2016-11-01

    Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationships to enrich the annotation of genomic data and provides tools to perform powerful comparative analyses across a wide spectrum of plant species. It consists of an integrated portal for querying, visualizing and analyzing data for 44 plant reference genomes, genetic variation data sets for 12 species, expression data for 16 species, curated rice pathways and orthology-based pathway projections for 66 plant species including various crops. Here we briefly describe the functions and uses of the Gramene database.

  4. Benchmarking database performance for genomic data.

    Science.gov (United States)

    Khushi, Matloob

    2015-06-01

    Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Genomic operations such as identifying overlapping/non-overlapping regions or nearest gene annotations are common research needs. The data can be saved in a database system for easy management; however, there is no comprehensive database built-in algorithm at present to identify overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although general searching capability of both databases was almost equivalent. In addition, using the algorithm, pair-wise overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were reported and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1). © 2015 Wiley Periodicals, Inc.
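    The overlap operation named in this abstract can be illustrated with a small sketch. It is not the RegMap algorithm, whose details the abstract does not give; it only shows the standard closed-interval overlap test, and the `Region` type, the inclusive coordinate convention and the function names are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """A genomic interval; inclusive start/end coordinates are an assumed convention."""
    chrom: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    # Two closed intervals on the same chromosome overlap
    # when each one starts no later than the other one ends.
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def overlapping_pairs(xs: List[Region], ys: List[Region]) -> List[Tuple[Region, Region]]:
    # Naive all-against-all scan, O(len(xs) * len(ys)); a database or an
    # interval index would avoid comparing every pair.
    return [(x, y) for x in xs for y in ys if overlaps(x, y)]
```

    In SQL the same predicate would typically appear as a join condition over the start and end columns, which is why the choice of database engine and indexing can matter for the kind of benchmark described here.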

  5. Requirements and standards for organelle genome databases

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.

    2006-01-09

    Mitochondria and plastids (collectively called organelles) descended from prokaryotes that adopted an intracellular, endosymbiotic lifestyle within early eukaryotes. Comparisons of their remnant genomes address a wide variety of biological questions, especially when including the genomes of their prokaryotic relatives and the many genes transferred to the eukaryotic nucleus during the transitions from endosymbiont to organelle. The pace of producing complete organellar genome sequences now makes it unfeasible to do broad comparisons using the primary literature and, even if it were feasible, it is now becoming uncommon for journals to accept detailed descriptions of genome-level features. Unfortunately no database is currently useful for this task, since they have little standardization and are riddled with error. Here I outline what is currently wrong and what must be done to make this data useful to the scientific community.

  6. Genomes of the Mouse Collaborative Cross.

    Science.gov (United States)

    Srivastava, Anuj; Morgan, Andrew P; Najarian, Maya L; Sarsani, Vishal Kumar; Sigmon, J Sebastian; Shorter, John R; Kashfeen, Anwica; McMullan, Rachel C; Williams, Lucy H; Giusti-Rodríguez, Paola; Ferris, Martin T; Sullivan, Patrick; Hock, Pablo; Miller, Darla R; Bell, Timothy A; McMillan, Leonard; Churchill, Gary A; de Villena, Fernando Pardo-Manuel

    2017-06-01

    The Collaborative Cross (CC) is a multiparent panel of recombinant inbred (RI) mouse strains derived from eight founder laboratory strains. RI panels are popular because of their long-term genetic stability, which enhances reproducibility and integration of data collected across time and conditions. Characterization of their genomes can be a community effort, reducing the burden on individual users. Here we present the genomes of the CC strains using two complementary approaches as a resource to improve power and interpretation of genetic experiments. Our study also provides a cautionary tale regarding the limitations imposed by such basic biological processes as mutation and selection. A distinct advantage of inbred panels is that genotyping only needs to be performed on the panel, not on each individual mouse. The initial CC genome data were haplotype reconstructions based on dense genotyping of the most recent common ancestors (MRCAs) of each strain followed by imputation from the genome sequence of the corresponding founder inbred strain. The MRCA resource captured segregating regions in strains that were not fully inbred, but it had limited resolution in the transition regions between founder haplotypes, and there was uncertainty about founder assignment in regions of limited diversity. Here we report the whole genome sequence of 69 CC strains generated by paired-end short reads at 30× coverage of a single male per strain. Sequencing leads to a substantial improvement in the fine structure and completeness of the genomes of the CC. Both MRCAs and sequenced samples show a significant reduction in the genome-wide haplotype frequencies from two wild-derived strains, CAST/EiJ and PWK/PhJ. In addition, analysis of the evolution of the patterns of heterozygosity indicates that selection against three wild-derived founder strains played a significant role in shaping the genomes of the CC. 
    The sequencing resource provides the first description of tens of thousands of

  7. A report from the Sixth International Mouse Genome Conference

    Energy Technology Data Exchange (ETDEWEB)

    Brown, S. [Saint Mary's Hospital Medical School, London (United Kingdom). Dept. of Biochemistry and Molecular Genetics

    1992-12-31

    The Sixth Annual Mouse Genome Conference was held in October, 1992 at Buffalo, USA. The mouse is one of the primary model organisms in the Human Genome Project. Through the use of gene targeting studies the mouse has become a powerful biological model for the study of gene function and, in addition, the comparison of the many homologous mutations identified in human and mouse have widened our understanding of the biology of these two organisms. A primary goal in the mouse genome program has been to create a genetic map of STSs of high resolution (<1cM) that would form the basis for the physical mapping of the whole mouse genome. Buffalo saw substantial new progress towards the goal of a very high density genetic map and the beginnings of substantive efforts towards physical mapping in chromosome regions with a high density of genetic markers.

  8. CyanoBase: the cyanobacteria genome database update 2010.

    Science.gov (United States)

    Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

    2010-01-01

    CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in various formats with other tools, seamlessly.

  9. CyanoBase: the cyanobacteria genome database update 2010

    OpenAIRE

    Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

    2009-01-01

    CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in var...

  10. Biocuration at the Saccharomyces genome database.

    Science.gov (United States)

    Skrzypek, Marek S; Nash, Robert S

    2015-08-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. © 2015 Wiley Periodicals, Inc.

  11. Private and Efficient Query Processing on Outsourced Genomic Databases.

    Science.gov (United States)

    Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-09-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology, such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic work. However, a number of obstacles make it hard to access and process a big genomic database for these applications. First, sequencing a genome is a time-consuming and expensive process. Second, processing genomic sequences requires large-scale computation and storage systems. Third, genomic databases are often owned by different organizations and thus are not available for public use. The cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases to a centralized cloud server to ease access to them. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider, which may lead to data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow the cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20 000 records take around 100 and 150 s, respectively.
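    To make the count query mentioned in this abstract concrete, here is a plaintext sketch of what such a query computes. It deliberately omits the privacy protection (permutation and fake records) that the paper proposes, since the abstract does not describe it in enough detail to reproduce; the record layout (a dict from SNP identifier to genotype string) and the function name `count_query` are hypothetical.

```python
from typing import Dict, List

# Hypothetical record layout: one dict per individual, SNP id -> genotype.
GenomicRecord = Dict[str, str]


def count_query(records: List[GenomicRecord], predicate: Dict[str, str]) -> int:
    """Count records whose genotype equals the requested one at every SNP in the predicate."""
    return sum(
        all(record.get(snp) == genotype for snp, genotype in predicate.items())
        for record in records
    )
```

    A privacy-preserving scheme would have to return the same count while the server sees only a transformed database, which is where the reported query times of roughly 100 s come from.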

  12. MIPS: a database for genomes and protein sequences.

    Science.gov (United States)

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project-specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  13. Database Description - TMBETA-GENOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    TMBETA-GENOME is a database for transmembrane β-barrel proteins in complete genomes. For each genome, calculations with machine learning algorithms and statistical methods have been performed and th...

  14. Meeting Report: The Twelfth International Mouse Genome Conference

    Energy Technology Data Exchange (ETDEWEB)

    Manolakou, Katerina; Cross, Sally H.; Simpson, Eleanor H.; Jackson, Ian J.

    1998-10-01

    The annual International Mouse Genome Conference (IMGC) is where, scientifically speaking, classical mouse genetics meets the relative newcomer of genomics. The 12th meeting took place last October in the delightful Bavarian village of Garmisch-Partenkirchen, and we were greeted by the sight on the mountains of the first snowfall of the season. However, the discussions left little time for exploration. Minds of participants in Garmisch were focused by a recent document produced by the NIH and by discussions within other funding agencies worldwide. If implemented, the proposals will further enhance the status of the mouse as the principal model for study of the function of the human genome.

  15. Nencki Genomics Database--Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs.

    Science.gov (United States)

    Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

    2013-01-01

    We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface.
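
    The intersection and gene-mapping step described above can be illustrated with a small in-memory sketch. This is not the database's own stored procedures or schema: the names `Interval`, `overlaps` and `map_features_to_genes`, the 1-based inclusive coordinate convention and the example coordinates are all assumptions made for illustration, and the nested loop is a naive stand-in for whatever indexing the real system uses.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A named genomic interval (hypothetical structure, 1-based inclusive)."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals lie on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def map_features_to_genes(features, genes) -> dict[str, list[str]]:
    """Map each regulatory feature to the names of the genes it intersects.

    Naive O(len(features) * len(genes)) scan; adequate for a small example.
    """
    return {f.name: [g.name for g in genes if overlaps(f, g)] for f in features}


# Invented example data, for illustration only.
features = [
    Interval("motif1", "chr1", 100, 120),
    Interval("motif2", "chr1", 500, 510),
    Interval("motif3", "chr2", 100, 120),
]
genes = [
    Interval("geneA", "chr1", 110, 400),
    Interval("geneB", "chr2", 200, 300),
]
mapping = map_features_to_genes(features, genes)
```

    Storing the resulting mapping, as the abstract describes for user-added data, would let later queries reuse it instead of recomputing the overlaps.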

  16. MIPS: a database for protein sequences and complete genomes.

    Science.gov (United States)

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  17. Human · mouse genome analysis and radiation biology. Proceedings

    International Nuclear Information System (INIS)

    Hori, Tada-aki

    1994-03-01

    This issue is a collection of the papers presented at the 25th NIRS symposium on Human, Mouse Genome Analysis and Radiation Biology. Fourteen of the presented papers are indexed individually. (J.P.N.)

  18. Recent updates and developments to plant genome size databases

    Science.gov (United States)

    Garcia, Sònia; Leitch, Ilia J.; Anadon-Rosell, Alba; Canela, Miguel Á.; Gálvez, Francisco; Garnatje, Teresa; Gras, Airy; Hidalgo, Oriane; Johnston, Emmeline; Mas de Xaxars, Gemma; Pellicer, Jaume; Siljak-Yakovlev, Sonja; Vallès, Joan; Vitales, Daniel; Bennett, Michael D.

    2014-01-01

    Two plant genome size databases have been recently updated and/or extended: the Plant DNA C-values database (http://data.kew.org/cvalues), and GSAD, the Genome Size in Asteraceae database (http://www.asteraceaegenomesize.com). While the first provides information on nuclear DNA contents across land plants and some algal groups, the second is focused on one of the largest and most economically important angiosperm families, Asteraceae. Genome size data have numerous applications: they can be used in comparative studies on genome evolution, or as a tool to appraise the cost of whole-genome sequencing programs. The growing interest in genome size and increasing rate of data accumulation has necessitated the continued update of these databases. Currently, the Plant DNA C-values database (Release 6.0, Dec. 2012) contains data for 8510 species, while GSAD has 1219 species (Release 2.0, June 2013), representing increases of 17 and 51%, respectively, in the number of species with genome size data, compared with previous releases. Here we provide overviews of the most recent releases of each database, and outline new features of GSAD. The latter include (i) a tool to visually compare genome size data between species, (ii) the option to export data and (iii) a webpage containing information about flow cytometry protocols. PMID:24288377

  19. Mouse Genome Informatics (MGI) Resource: Genetic, Genomic, and Biological Knowledgebase for the Laboratory Mouse.

    Science.gov (United States)

    Eppig, Janan T

    2017-07-01

    The Mouse Genome Informatics (MGI) Resource supports basic, translational, and computational research by providing high-quality, integrated data on the genetics, genomics, and biology of the laboratory mouse. MGI serves a strategic role for the scientific community in facilitating biomedical, experimental, and computational studies investigating the genetics and processes of diseases and enabling the development and testing of new disease models and therapeutic interventions. This review describes the nexus of the body of growing genetic and biological data and the advances in computer technology in the late 1980s, including the World Wide Web, that together launched the beginnings of MGI. MGI develops and maintains a gold-standard resource that reflects the current state of knowledge, provides semantic and contextual data integration that fosters hypothesis testing, continually develops new and improved tools for searching and analysis, and partners with the scientific community to assure research data needs are met. Here we describe one slice of MGI relating to the development of community-wide large-scale mutagenesis and phenotyping projects and introduce ways to access and use these MGI data. References and links to additional MGI aspects are provided. © The Author 2017. Published by Oxford University Press.

  20. GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data.

    Science.gov (United States)

    Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie

    2008-01-01

    The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org.

  1. Potential translational targets revealed by linking mouse grooming behavioral phenotypes to gene expression using public databases.

    Science.gov (United States)

    Roth, Andrew; Kyzar, Evan J; Cachat, Jonathan; Stewart, Adam Michael; Green, Jeremy; Gaikwad, Siddharth; O'Leary, Timothy P; Tabakoff, Boris; Brown, Richard E; Kalueff, Allan V

    2013-01-10

    Rodent self-grooming is an important, evolutionarily conserved behavior, highly sensitive to pharmacological and genetic manipulations. Mice with aberrant grooming phenotypes are currently used to model various human disorders. Therefore, it is critical to understand the biology of grooming behavior, and to assess its translational validity to humans. The present in-silico study used publicly available gene expression and behavioral data obtained from several inbred mouse strains in the open-field, light-dark box, elevated plus- and elevated zero-maze tests. As grooming duration differed between strains, our analysis revealed several candidate genes with significant correlations between gene expression in the brain and grooming duration. The Allen Brain Atlas, STRING, GoMiner and Mouse Genome Informatics databases were used to functionally map and analyze these candidate mouse genes against their human orthologs, assessing the strain ranking of their expression and the regional distribution of expression in the mouse brain. This allowed us to identify an interconnected network of candidate genes that have expression levels correlating with grooming behavior, display altered patterns of expression in key brain areas related to grooming, and underlie important functions in the brain. Collectively, our results demonstrate the utility of large-scale, high-throughput data-mining and in-silico modeling for linking genomic and behavioral data, as well as their potential to identify novel neural targets for complex neurobehavioral phenotypes, including grooming. Copyright © 2012 Elsevier Inc. All rights reserved.

  2. INE: a rice genome database with an integrated map view.

    Science.gov (United States)

    Sakata, K; Antonio, B A; Mukai, Y; Nagasaki, H; Sakai, Y; Makino, K; Sasaki, T

    2000-01-01

    The Rice Genome Research Program (RGP) launched a large-scale rice genome sequencing effort in 1998 aimed at decoding all genetic information in rice. A new genome database called INE (INtegrated rice genome Explorer) has been developed in order to integrate all the genomic information that has been accumulated so far and to correlate these data with the genome sequence. A web interface based on a Java applet provides a rapid viewing capability in the database. The first operational version of the database has been completed, which includes a genetic map and a physical map using YAC (Yeast Artificial Chromosome) clones and PAC (P1-derived Artificial Chromosome) contigs. These maps are displayed graphically so that the positional relationships among the mapped markers on each chromosome can be easily resolved. INE incorporates the sequences and annotations of the PAC contig. A site on low quality information ensures that all submitted sequence data comply with the standard for accuracy. As a repository of rice genome sequence, INE will also serve as a common database of all sequence data obtained by collaborating members of the International Rice Genome Sequencing Project (IRGSP). The database can be accessed at http://www.dna.affrc.go.jp:82/giot/INE.html or its mirror site at http://www.staff.or.jp/giot/INE.html

  3. Brassica ASTRA: an integrated database for Brassica genomic research.

    Science.gov (United States)

    Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David

    2005-01-01

    Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.

  4. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
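
    To put the two quoted error rates side by side, the short sketch below converts them into expected error counts. The 5 Mb genome size is a hypothetical value chosen only for illustration (it does not come from the abstract), the function name `expected_errors` is likewise an assumption, and errors are assumed to be spread uniformly along the consensus.

```python
def expected_errors(genome_size_bp: int, bp_per_error: int) -> float:
    """Expected number of consensus errors, given one error per `bp_per_error` bases."""
    return genome_size_bp / bp_per_error


GENOME_SIZE_BP = 5_000_000   # hypothetical microbial genome size (assumption)
DRAFT_BP_PER_ERROR = 2_000       # draft assembly: about 1 error per 2000 bp
FINISHED_BP_PER_ERROR = 10_000   # finished genome: about 1 error per 10,000 bp

draft_errors = expected_errors(GENOME_SIZE_BP, DRAFT_BP_PER_ERROR)
finished_errors = expected_errors(GENOME_SIZE_BP, FINISHED_BP_PER_ERROR)

print(f"draft:    {draft_errors:.0f} expected errors")
print(f"finished: {finished_errors:.0f} expected errors")
print(f"finishing reduces expected errors {draft_errors / finished_errors:.0f}-fold")
```

    Under these assumptions, finishing lowers the expected number of consensus errors fivefold, independent of the genome size chosen.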

  5. Mouse IDGenes: a reference database for genetic interactions in the developing mouse brain.

    Science.gov (United States)

    Matthes, Michaela; Preusse, Martin; Zhang, Jingzhong; Schechter, Julia; Mayer, Daniela; Lentes, Bernd; Theis, Fabian; Prakash, Nilima; Wurst, Wolfgang; Trümbach, Dietrich

    2014-01-01

    The study of developmental processes in the mouse and other vertebrates includes the understanding of patterning along the anterior-posterior, dorsal-ventral and medial-lateral axes. Specifically, neural development is also of great clinical relevance because several human neuropsychiatric disorders such as schizophrenia, autism disorders or drug addiction and also brain malformations are thought to have neurodevelopmental origins, i.e. pathogenesis initiates during childhood and adolescence. Impacts during early neurodevelopment might also predispose to late-onset neurodegenerative disorders, such as Parkinson's disease. The neural tube develops from its precursor tissue, the neural plate, in a patterning process that is determined by compartmentalization into morphogenetic units, the action of local signaling centers and a well-defined and locally restricted expression of genes and their interactions. While public databases provide gene expression data with spatio-temporal resolution, they usually neglect the genetic interactions that govern neural development. Here, we introduce Mouse IDGenes, a reference database for genetic interactions in the developing mouse brain. The database is highly curated and offers detailed information about gene expressions and the genetic interactions at the developing mid-/hindbrain boundary. To showcase the predictive power of interaction data, we infer new Wnt/β-catenin target genes by machine learning and validate one of them experimentally. The database is updated regularly. Moreover, it can easily be extended by the research community. Mouse IDGenes will contribute as an important resource to the research on mouse brain development, not exclusively by offering data retrieval, but also by allowing data input. http://mouseidgenes.helmholtz-muenchen.de. © The Author(s) 2014. Published by Oxford University Press.

  6. Uniform standards for genome databases in forest and fruit trees

    Science.gov (United States)

    TreeGenes and tfGDR serve the international forestry and fruit tree genomics research communities, respectively. These databases hold similar sequence data and provide resources for the submission and recovery of this information in order to enable comparative genomics research. Large-scale genotype...

  7. OryzaGenome: Genome Diversity Database of Wild Oryza Species

    KAUST Repository

    Ohyanagi, Hajime

    2015-11-18

    The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of next-generation sequencing (NGS) technology, a flood of Oryza species reference genomes and genomic variation information has become available in recent years. This genomic information, combined with the comprehensive phenotypic information that we are accumulating in our Oryzabase, can serve as an excellent genotype-phenotype association resource for analyzing rice functional and structural evolution, and the associated diversity of the Oryza genus. Here we integrate our previous and future phenotypic/habitat information and newly determined genotype information into a united repository, named OryzaGenome, providing the variant information with hyperlinks to Oryzabase. The current version of OryzaGenome includes genotype information of 446 O. rufipogon accessions derived by imputation and of 17 accessions derived by imputation-free deep sequencing. Two variant viewers are implemented: SNP Viewer as a conventional genome browser interface and Variant Table as a text-based browser for precise inspection of each variant one by one. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all of the closely and distantly related wild Oryza species from the NIG Wild Rice Collection will be available in future releases. All of the resources can be accessed through http://viewer.shigen.info/oryzagenome/.

  8. OryzaGenome: Genome Diversity Database of Wild Oryza Species

    KAUST Repository

    Ohyanagi, Hajime; Ebata, Toshinobu; Huang, Xuehui; Gong, Hao; Fujita, Masahiro; Mochizuki, Takako; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu; Feng, Qi; Wang, Zi Xuan; Han, Bin; Kurata, Nori

    2015-01-01

    ... Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all ...

  9. The Mouse Tumor Biology Database: A Comprehensive Resource for Mouse Models of Human Cancer.

    Science.gov (United States)

    Krupke, Debra M; Begley, Dale A; Sundberg, John P; Richardson, Joel E; Neuhauser, Steven B; Bult, Carol J

    2017-11-01

    Research using laboratory mice has led to fundamental insights into the molecular genetic processes that govern cancer initiation, progression, and treatment response. Although thousands of scientific articles have been published about mouse models of human cancer, collating information and data for a specific model is hampered by the fact that many authors do not adhere to existing annotation standards when describing models. The interpretation of experimental results in mouse models can also be confounded when researchers do not factor in the effect of genetic background on tumor biology. The Mouse Tumor Biology (MTB) database is an expertly curated, comprehensive compendium of mouse models of human cancer. Through the enforcement of nomenclature and related annotation standards, MTB supports aggregation of data about a cancer model from diverse sources and assessment of how genetic background of a mouse strain influences the biological properties of a specific tumor type and model utility. Cancer Res; 77(21); e67-70. ©2017 American Association for Cancer Research.

  10. Generation of Knock-in Mouse by Genome Editing.

    Science.gov (United States)

    Fujii, Wataru

    2017-01-01

    Knock-in mice are useful for evaluating endogenous gene expressions and functions in vivo. Instead of the conventional gene-targeting method using embryonic stem cells, an exogenous DNA sequence can be inserted into the target locus in the zygote using genome editing technology. In this chapter, I describe the generation of epitope-tagged mice using engineered endonuclease and single-stranded oligodeoxynucleotide through the mouse zygote as an example of how to generate a knock-in mouse by genome editing.

  11. GenColors-based comparative genome databases for small eukaryotic genomes.

    Science.gov (United States)

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  12. Specialized microbial databases for inductive exploration of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Cabau Cédric

    2005-02-01

    Background: The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods: The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subqueries, have been implemented. Results: Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore (http://bioinfo.hku.hk/genochore.html), a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya), has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition, they provide a weekly update of searches against the world-wide protein sequence data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion: This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList, Staphylococcus epidermidis), together with related organisms for comparison.

  13. Human Ageing Genomic Resources: new and updated databases

    Science.gov (United States)

    Tacutu, Robi; Thornton, Daniel; Johnson, Emily; Budovsky, Arie; Barardo, Diogo; Craig, Thomas; Diana, Eugene; Lehmann, Gilad; Toren, Dmitri; Wang, Jingwei; Fraifeld, Vadim E

    2018-01-01

    In spite of a growing body of research and data, human ageing remains a poorly understood process. Over 10 years ago we developed the Human Ageing Genomic Resources (HAGR), a collection of databases and tools for studying the biology and genetics of ageing. Here, we present HAGR’s main functionalities, highlighting new additions and improvements. HAGR consists of six core databases: (i) the GenAge database of ageing-related genes, in turn composed of a dataset of >300 human ageing-related genes and a dataset with >2000 genes associated with ageing or longevity in model organisms; (ii) the AnAge database of animal ageing and longevity, featuring >4000 species; (iii) the GenDR database with >200 genes associated with the life-extending effects of dietary restriction; (iv) the LongevityMap database of human genetic association studies of longevity with >500 entries; (v) the DrugAge database with >400 ageing or longevity-associated drugs or compounds; (vi) the CellAge database with >200 genes associated with cell senescence. All our databases are manually curated by experts and regularly updated to ensure high-quality data. Cross-links across our databases and to external resources help researchers locate and integrate relevant information. HAGR is freely available online (http://genomics.senescence.info/). PMID:29121237

  14. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants

    Science.gov (United States)

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-01-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561

  15. De-anonymizing Genomic Databases Using Phenotypic Traits

    Directory of Open Access Journals (Sweden)

    Humbert Mathias

    2015-06-01

    People increasingly have their genomes sequenced and some of them share their genomic data online. They do so for various purposes, including to find relatives and to help advance genomic research. An individual’s genome carries very sensitive, private information such as its owner’s susceptibility to diseases, which could be used for discrimination. Therefore, genomic databases are often anonymized. However, an individual’s genotype is also linked to visible phenotypic traits, such as eye or hair color, which can be used to re-identify users in anonymized public genomic databases, thus raising severe privacy issues. For instance, an adversary can identify a target’s genome using her known phenotypic traits and subsequently infer her susceptibility to Alzheimer’s disease. In this paper, we quantify, based on various phenotypic traits, the extent of this threat in several scenarios by implementing de-anonymization attacks on a genomic database of OpenSNP users sequenced by 23andMe. Our experimental results show that the proportion of correct matches reaches 23% with a supervised approach in a database of 50 participants. Our approach outperforms the baseline by a factor of four, in terms of the proportion of correct matches, in most scenarios. We also evaluate the adversary’s ability to predict individuals’ predisposition to Alzheimer’s disease, and we observe that the inference error can be halved compared to the baseline. We also analyze the effect of the number of known phenotypic traits on the success rate of the attack. As progress is made in genomic research, especially for genotype-phenotype associations, the threat presented in this paper will become more serious.

  16. i-Genome: A database to summarize oligonucleotide data in genomes

    Directory of Open Access Journals (Sweden)

    Chang Yu-Chung

    2004-10-01

    Full Text Available Abstract Background Information on the occurrence of sequence features in genomes is crucial to comparative genomics, evolutionary analysis, the analysis of regulatory sequences and the quantitative evaluation of sequences. Computing the frequencies and the occurrences of a pattern in complete genomes is time-consuming. Results The proposed database provides information about sequence features generated by exhaustively computing the sequences of the complete genome. The repetitive elements in the eukaryotic genomes, such as LINEs, SINEs, Alu and LTR, are obtained from Repbase. The database supports various complete genomes including human, yeast, worm, and 128 microbial genomes. Conclusions This investigation presents and implements a computationally efficient approach to accumulating the occurrences of oligonucleotides or patterns in complete genomes. A database is established to maintain information on the sequence features, including the distribution of oligonucleotides, the gene distribution, the distribution of repetitive elements in genomes and the occurrences of the oligonucleotides. The database provides a more effective and efficient way to access the repetitive features in genomes.
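
    The kind of computation the i-Genome abstract describes, tallying oligonucleotide frequencies and locating the occurrences of a pattern, can be illustrated with a minimal Python sketch. This is not the i-Genome implementation, which the abstract does not detail; the function names `count_oligonucleotides` and `find_occurrences` are hypothetical, and the sketch scans a single strand of an in-memory string rather than a complete genome.

```python
from collections import Counter


def count_oligonucleotides(sequence, k):
    """Count every overlapping k-mer on one strand of a DNA sequence.

    Hypothetical helper for illustration; windows containing symbols
    other than A, C, G, T (for example N) are skipped.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


def find_occurrences(sequence, pattern):
    """Return 0-based start positions of all (overlapping) matches."""
    seq, pat = sequence.upper(), pattern.upper()
    positions = []
    start = seq.find(pat)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping matches are also reported.
        start = seq.find(pat, start + 1)
    return positions


if __name__ == "__main__":
    toy = "ACGTACGTNACG"  # toy sequence, not real genomic data
    print(count_oligonucleotides(toy, 3).most_common(2))
    print(find_occurrences(toy, "ACG"))
```

    A linear scan like this takes time proportional to genome length for each query, which is presumably why a database of precomputed counts is attractive for repeated lookups.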

  17. KAIKObase: An integrated silkworm genome database and data mining tool

    Directory of Open Access Journals (Sweden)

    Nagaraju Javaregowda

    2009-10-01

    Full Text Available Abstract Background The silkworm, Bombyx mori, is one of the most economically important insects in many developing countries owing to its large-scale cultivation for silk production. With the development of genomic and biotechnological tools, B. mori has also become an important bioreactor for production of various recombinant proteins of biomedical interest. In 2004, two genome sequencing projects for B. mori were reported independently by Chinese and Japanese teams; however, the datasets were insufficient for building long genomic scaffolds which are essential for unambiguous annotation of the genome. Now, both the datasets have been merged and assembled through a joint collaboration between the two groups. Description Integration of the two data sets of silkworm whole-genome-shotgun sequencing by the Japanese and Chinese groups together with newly obtained fosmid- and BAC-end sequences produced the best continuity (~3.7 Mb in N50 scaffold size) among the sequenced insect genomes and provided a high degree of nucleotide coverage (88%) of all 28 chromosomes. In addition, a physical map of BAC contigs constructed by fingerprinting BAC clones and a SNP linkage map constructed using BAC-end sequences were available. In parallel, proteomic data from two-dimensional polyacrylamide gel electrophoresis in various tissues and developmental stages were compiled into a silkworm proteome database. Finally, a Bombyx trap database was constructed for documenting insertion positions and expression data of transposon insertion lines. Conclusion For efficient usage of genome information for functional studies, genomic sequences, physical and genetic map information and EST data were compiled into KAIKObase, an integrated silkworm genome database which consists of 4 map viewers, a gene viewer, and sequence, keyword and position search systems to display results and data at the level of nucleotide sequence, gene, scaffold and chromosome. Integration of the

  18. Ginseng Genome Database: an open-access platform for genomics of Panax ginseng.

    Science.gov (United States)

    Jayakodi, Murukarthick; Choi, Beom-Soon; Lee, Sang-Choon; Kim, Nam-Hoon; Park, Jee Young; Jang, Woojong; Lakshmanan, Meiyappan; Mohan, Shobhana V G; Lee, Dong-Yup; Yang, Tae-Jin

    2018-04-12

    Ginseng (Panax ginseng C.A. Meyer) is a perennial herbaceous plant that has been used in traditional oriental medicine for thousands of years. Ginsenosides, which have significant pharmacological effects on human health, are the foremost bioactive constituents in this plant. Given the importance of this plant to humans, an integrated omics resource is indispensable to facilitate genomic research, molecular breeding and pharmacological study of this herb. The first draft genome sequences of P. ginseng cultivar "Chunpoong" were reported recently. Here, using the draft genome, transcriptome, and functional annotation datasets of P. ginseng, we have constructed the Ginseng Genome Database (http://ginsengdb.snu.ac.kr/), the first open-access platform to provide comprehensive genomic resources of P. ginseng. The current version of this database provides the most up-to-date draft genome sequence (approximately 3000 Mbp of scaffold sequences) along with the structural and functional annotations for 59,352 genes and digital expression of genes based on transcriptome data from different tissues, growth stages and treatments. In addition, tools for visualization and the genomic data from various analyses are provided. All data in the database were manually curated and integrated within a user-friendly query page. This database provides valuable resources for a range of research fields related to P. ginseng and other species belonging to the Apiales order as well as for plant research communities in general. The Ginseng Genome Database can be accessed at http://ginsengdb.snu.ac.kr/.

  19. BRAD, the genetics and genomics database for Brassica plants

    Directory of Open Access Journals (Sweden)

    Li Pingxia

    2011-10-01

    Full Text Available Abstract Background Brassica species include both vegetable and oilseed crops, which are important in everyday human life. Brassica species also represent an excellent system for studying numerous aspects of plant biology, specifically genome evolution following polyploidy, so they are important for scientific research as well. Now that the genome of Brassica rapa has been assembled, it is time to mine the genome data in depth. Description BRAD, the Brassica database, is a web-based resource focusing on genome-scale genetic and genomic data for important Brassica crops. BRAD was built based on the first whole genome sequence and on further data analysis of the Brassica A genome species, Brassica rapa (Chiifu-401-42). It provides datasets such as the complete genome sequence of B. rapa, which was de novo assembled from Illumina GA II short reads and from BAC clone sequences, predicted genes and associated annotations, non-coding RNAs, transposable elements (TE), B. rapa genes orthologous to those in A. thaliana, as well as genetic markers and linkage maps. BRAD offers useful searching and data mining tools, including search across annotation datasets, search for syntenic or non-syntenic orthologs, and search of the flanking regions of a given target, as well as BLAST and GBrowse. BRAD allows users to enter almost any kind of information, such as a B. rapa or A. thaliana gene ID, physical position or genetic marker. Conclusion BRAD, a new database that focuses on the genetics and genomics of Brassica plants, has been developed; it aims to help scientists and breeders use the genome data of Brassica plants fully and efficiently. BRAD will be continuously updated and can be accessed through http://brassicadb.org.

  20. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    Science.gov (United States)

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. BBGD: an online database for blueberry genomic data

    Directory of Open Access Journals (Sweden)

    Matthews Benjamin F

    2007-01-01

    Full Text Available Abstract Background Blueberry is a member of the Ericaceae family, which also includes closely related cranberry and more distantly related rhododendron, azalea, and mountain laurel. Blueberry is a major berry crop in the United States, and one that has great nutritional and economic value. Extreme low temperatures, however, reduce crop yield and cause major losses to US farmers. A better understanding of the genes and biochemical pathways that are up- or down-regulated during cold acclimation is needed to produce blueberry cultivars with enhanced cold hardiness. To that end, the blueberry genomics database (BBGD) was developed. Along with the analysis tools and web-based query interfaces, the database serves both the broader Ericaceae research community and the blueberry research community specifically by making available ESTs and gene expression data in searchable formats and in elucidating the underlying mechanisms of cold acclimation and freeze tolerance in blueberry. Description BBGD is the world's first database for blueberry genomics. BBGD is both a sequence and gene expression database. It stores both EST and microarray data and allows scientists to correlate expression profiles with gene function. BBGD is a public online database. Presently, the main focus of the database is the identification of genes in blueberry that are significantly induced or suppressed after low temperature exposure. Conclusion By using the database, researchers have developed EST-based markers for mapping and have identified a number of "candidate" cold tolerance genes that are highly expressed in blueberry flower buds after exposure to low temperatures.

  2. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    Directory of Open Access Journals (Sweden)

    Qingyu Chen

    Full Text Available First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.

  3. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    Science.gov (United States)

    Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.
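
    As a rough illustration of the feature-extraction step that such a supervised duplicate detector relies on, the following Python sketch computes a few pairwise features for two database records. It is an assumption-laden sketch, not the authors' method: the abstract does not list the 22 features, so the four features here, the record field names (`sequence`, `description`, `organism`) and the function name `pair_features` are all hypothetical, and `difflib.SequenceMatcher` is used as a simple stand-in for a real sequence alignment.

```python
from difflib import SequenceMatcher


def pair_features(rec_a, rec_b):
    """Compute illustrative pairwise features for two sequence records.

    Each record is a dict with hypothetical keys 'sequence',
    'description' and 'organism'.
    """
    seq_a = rec_a["sequence"].upper()
    seq_b = rec_b["sequence"].upper()

    # Similarity ratio in [0, 1]; a stand-in for alignment-based identity.
    identity = SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()

    # Ratio of the shorter to the longer sequence length.
    longest = max(len(seq_a), len(seq_b))
    length_ratio = min(len(seq_a), len(seq_b)) / longest if longest else 1.0

    # Jaccard overlap of description words as a simple metadata feature.
    words_a = set(rec_a["description"].lower().split())
    words_b = set(rec_b["description"].lower().split())
    union = words_a | words_b
    desc_jaccard = len(words_a & words_b) / len(union) if union else 1.0

    return {
        "identity": identity,
        "length_ratio": length_ratio,
        "desc_jaccard": desc_jaccard,
        "same_organism": float(rec_a["organism"] == rec_b["organism"]),
    }


if __name__ == "__main__":
    # Toy records for illustration only; not taken from any database.
    a = {"sequence": "ACGTACGT", "description": "toy gene partial cds",
         "organism": "Mus musculus"}
    b = {"sequence": "ACGT", "description": "toy gene complete cds",
         "organism": "Mus musculus"}
    print(pair_features(a, b))
```

    Feature vectors of this kind, paired with expert-assigned duplicate labels, are what a binary or multi-class classifier would be trained on.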

  4. Building a genome database using an object-oriented approach.

    Science.gov (United States)

    Barbasiewicz, Anna; Liu, Lin; Lang, B Franz; Burger, Gertraud

    2002-01-01

    GOBASE is a relational database that integrates data associated with mitochondria and chloroplasts. The most important data in GOBASE, i.e., molecular sequences and taxonomic information, are obtained from the public sequence data repository at the National Center for Biotechnology Information (NCBI), and are validated by our experts. Maintaining a curated genomic database comes with a towering labor cost, due to the sheer volume of available genomic sequences and the plethora of annotation errors and omissions in records retrieved from public repositories. Here we describe our approach to increase automation of the database population process, thereby reducing manual intervention. As a first step, we used Unified Modeling Language (UML) to construct a list of potential errors. Each case was evaluated independently, and an expert solution was devised, and represented as a diagram. Subsequently, the UML diagrams were used as templates for writing object-oriented automation programs in the Java programming language.

  5. Mouse Genome Informatics (MGI) Is the International Resource for Information on the Laboratory Mouse.

    Science.gov (United States)

    Law, MeiYee; Shaw, David R

    2018-01-01

    Mouse Genome Informatics (MGI, http://www.informatics.jax.org/) web resources provide free access to meticulously curated information about the laboratory mouse. MGI's primary goal is to help researchers investigate the genetic foundations of human diseases by translating information from mouse phenotype and disease model studies to human systems. MGI provides comprehensive phenotypes for over 50,000 mutant alleles in mice and provides experimental model descriptions for over 1500 human diseases. Curated data from scientific publications are integrated with those from high-throughput phenotyping and gene expression centers. Data are standardized using defined, hierarchical vocabularies such as the Mammalian Phenotype (MP) Ontology, Mouse Developmental Anatomy and the Gene Ontologies (GO). This chapter introduces you to Gene and Allele Detail pages and provides step-by-step instructions for simple searches and those that take advantage of the breadth of MGI data integration.

  6. An Open Access Database of Genome-wide Association Results

    Directory of Open Access Journals (Sweden)

    Johnson Andrew D

    2009-01-01

    Full Text Available Abstract Background The number of genome-wide association studies (GWAS) is growing rapidly leading to the discovery and replication of many new disease loci. Combining results from multiple GWAS datasets may potentially strengthen previous conclusions and suggest new disease loci, pathways or pleiotropic genes. However, no database or centralized resource currently exists that contains anywhere near the full scope of GWAS results. Methods We collected available results from 118 GWAS articles into a database of 56,411 significant SNP-phenotype associations and accompanying information, making this database freely available here. In doing so, we met and describe here a number of challenges to creating an open access database of GWAS results. Through preliminary analyses and characterization of available GWAS, we demonstrate the potential to gain new insights by querying a database across GWAS. Results Using a genomic bin-based density analysis to search for highly associated regions of the genome, positive control loci (e.g., MHC loci) were detected with high sensitivity. Likewise, an analysis of highly repeated SNPs across GWAS identified replicated loci (e.g., APOE, LPL). At the same time we identified novel, highly suggestive loci for a variety of traits that did not meet genome-wide significant thresholds in prior analyses, in some cases with strong support from the primary medical genetics literature (SLC16A7, CSMD1, OAS1), suggesting these genes merit further study. Additional adjustment for linkage disequilibrium within most regions with a high density of GWAS associations did not materially alter our findings. Having a centralized database with standardized gene annotation also allowed us to examine the representation of functional gene categories (gene ontologies) containing one or more associations among top GWAS results. Genes relating to cell adhesion functions were highly over-represented among significant associations (p -14, a finding

  7. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  8. PReMod: a database of genome-wide mammalian cis-regulatory module predictions.

    Science.gov (United States)

    Ferretti, Vincent; Poitras, Christian; Bergeron, Dominique; Coulombe, Benoit; Robert, François; Blanchette, Mathieu

    2007-01-01

    We describe PReMod, a new database of genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes. The prediction algorithm, described previously in Blanchette et al. (2006) Genome Res., 16, 656-668, exploits the fact that many known CRMs are made of clusters of phylogenetically conserved and repeated transcription factor (TF) binding sites. Contrary to other existing databases, PReMod is not restricted to modules located proximal to genes, but in fact mostly contains distal predicted CRMs (pCRMs). Through its web interface, PReMod allows users to (i) identify pCRMs around a gene of interest; (ii) identify pCRMs that have binding sites for a given TF (or a set of TFs) or (iii) download the entire dataset for local analyses. Queries can also be refined by filtering for specific chromosomal regions, for specific regions relative to genes or for the presence of CpG islands. The output includes information about the binding sites predicted within the selected pCRMs, and a graphical display of their distribution within the pCRMs. It also provides a visual depiction of the chromosomal context of the selected pCRMs in terms of neighboring pCRMs and genes, all of which are linked to the UCSC Genome Browser and the NCBI. PReMod: http://genomequebec.mcgill.ca/PReMod.

  9. GEAR: A database of Genomic Elements Associated with drug Resistance

    Science.gov (United States)

    Wang, Yin-Ying; Chen, Wei-Hua; Xiao, Pei-Pei; Xie, Wen-Bin; Luo, Qibin; Bork, Peer; Zhao, Xing-Ming

    2017-01-01

    Drug resistance, which generally develops because of genetic mutations in certain molecules, is becoming a serious problem that leads to the failure of standard treatments. Here, we present GEAR (A database of Genomic Elements Associated with drug Resistance) that aims to provide comprehensive information about genomic elements (including genes, single-nucleotide polymorphisms and microRNAs) that are responsible for drug resistance. Currently, GEAR contains 1631 associations between 201 human drugs and 758 genes, 106 associations between 29 human drugs and 66 miRNAs, and 44 associations between 17 human drugs and 22 SNPs. These relationships are first extracted from primary literature with text mining and then manually curated. The drug resistome deposited in GEAR provides insights into the genetic factors underlying drug resistance. In addition, new indications and potential drug combinations can be identified based on the resistome. The GEAR database can be freely accessed through http://gear.comp-sysbio.org. PMID:28294141

  10. Retroviral insertions in the VISION database identify molecular pathways in mouse lymphoid leukemia and lymphoma.

    Science.gov (United States)

    Weiser, Keith C; Liu, Bin; Hansen, Gwenn M; Skapura, Darlene; Hentges, Kathryn E; Yarlagadda, Sujatha; Morse III, Herbert C; Justice, Monica J

    2007-10-01

    AKXD recombinant inbred (RI) strains develop a variety of leukemias and lymphomas due to somatically acquired insertions of retroviral DNA into the genome of hematopoietic cells that can mutate cellular proto-oncogenes and tumor suppressor genes. We generated a new set of tumors from nine AKXD RI strains selected for their propensity to develop B-cell tumors, the most common type of human hematopoietic cancers. We employed a PCR technique called viral insertion site amplification (VISA) to rapidly isolate genomic sequence at the site of provirus insertion. Here we describe 550 VISA sequence tags (VSTs) that identify 74 common insertion sites (CISs), of which 21 have not been identified previously. Several suspected proto-oncogenes and tumor suppressor genes lie near CISs, providing supportive evidence for their roles in cancer. Furthermore, numerous previously uncharacterized genes lie near CISs, providing a pool of candidate disease genes for future research. Pathway analysis of candidate genes identified several signaling pathways as common and powerful routes to blood cancer, including Notch, E-protein, NFkappaB, and Ras signaling. Misregulation of several Notch signaling genes was confirmed by quantitative RT-PCR. Our data suggest that analyses of insertional mutagenesis on a single genetic background are biased toward the identification of cooperating mutations. This tumor collection represents the most comprehensive study of the genetics of B-cell leukemia and lymphoma development in mice. We have deposited the VST sequences, CISs in a genome viewer, histopathology, and molecular tumor typing data in a public web database called VISION (Viral Insertion Sites Identifying Oncogenes), which is located at http://www.mouse-genome.bcm.tmc.edu/vision.

  11. Genomic Locus Modulating IOP in the BXD RI Mouse Strains

    Directory of Open Access Journals (Sweden)

    Rebecca King

    2018-05-01

    Full Text Available Intraocular pressure (IOP) is the primary risk factor for developing glaucoma, yet little is known about the contribution of genomic background to IOP regulation. The present study leverages an array of systems genetics tools to study genomic factors modulating normal IOP in the mouse. The BXD recombinant inbred (RI) strain set was used to identify genomic loci modulating IOP. We measured the IOP in a total of 506 eyes from 38 different strains. Strain averages were subjected to conventional quantitative trait analysis by means of composite interval mapping. Candidate genes were defined, and immunohistochemistry and quantitative PCR (qPCR) were used for validation. Of the 38 BXD strains examined the mean IOP ranged from a low of 13.2 mmHg to a high of 17.1 mmHg. The means for each strain were used to calculate a genome wide interval map. One significant quantitative trait locus (QTL) was found on Chr. 8 (96 to 103 Mb). Within this 7 Mb region only 4 annotated genes were found: Gm15679, Cdh8, Cdh11 and Gm8730. Only two genes (Cdh8 and Cdh11) were candidates for modulating IOP based on the presence of non-synonymous SNPs. Further examination using SIFT (Sorting Intolerant From Tolerant) analysis revealed that the SNPs in Cdh8 (Cadherin 8) were predicted to not change protein function, while the SNPs in Cdh11 (Cadherin 11) would not be tolerated, affecting protein function. Furthermore, immunohistochemistry demonstrated that CDH11 is expressed in the trabecular meshwork of the mouse. We have examined the genomic regulation of IOP in the BXD RI strain set and found one significant QTL on Chr. 8. Within this QTL, there is one good candidate gene, Cdh11.

  12. DRDB: An Online Date Palm Genomic Resource Database

    Directory of Open Access Journals (Sweden)

    Zilong He

    2017-11-01

    Full Text Available Background: Date palm (Phoenix dactylifera L.) is a cultivated woody plant with agricultural and economic importance in many countries around the world. With the advantages of next generation sequencing technologies, genome sequences for many date palm cultivars have been released recently. Short sequence repeat (SSR) and single nucleotide polymorphism (SNP) can be identified from these genomic data, and have been proven to be very useful biomarkers in plant genome analysis and breeding. Results: Here, we first improved the date palm genome assembly using 130X of HiSeq data generated in our lab. Then 246,445 SSRs (214,901 SSRs and 31,544 compound SSRs) were annotated in this genome assembly; among the SSRs, mononucleotide SSRs (58.92%) were the most abundant, followed by di- (29.92%), tri- (8.14%), tetra- (2.47%), penta- (0.36%), and hexa-nucleotide SSRs (0.19%). The high-quality PCR primer pairs were designed for most (174,497; 70.81% out of total) SSRs. We also annotated 6,375,806 SNPs with raw read depth≥3 in 90% cultivars. To further reduce false positive SNPs, we only kept 5,572,650 (87.40% out of total) SNPs with at least 20% cultivars support for downstream analyses. The high-quality PCR primer pairs were also obtained for 4,177,778 (65.53%) SNPs. We reconstructed the phylogenetic relationships among the 62 cultivars using these variants and found that they can be divided into three clusters, namely North Africa, Egypt – Sudan, and Middle East – South Asian, with Egypt – Sudan being the admixture of North Africa and Middle East – South Asian cultivars; we further confirmed these clusters using principal component analysis. Moreover, 34,346 SSRs and 4,177,778 SNPs with PCR primers were assigned to shared cultivars for cultivar classification and diversity analysis. All these SSRs, SNPs and their classification are available in our database, and can be used for cultivar identification, comparison, and molecular breeding. Conclusion: DRDB is a

  13. GDR (Genome Database for Rosaceae): integrated web resources for Rosaceae genomics and genetics research

    Directory of Open Access Journals (Sweden)

    Ficklin Stephen

    2004-09-01

    Full Text Available Abstract Background Peach is being developed as a model organism for Rosaceae, an economically important family that includes fruits and ornamental plants such as apple, pear, strawberry, cherry, almond and rose. The genomics and genetics data of peach can play a significant role in the gene discovery and the genetic understanding of related species. The effective utilization of these peach resources, however, requires the development of an integrated and centralized database with associated analysis tools. Description The Genome Database for Rosaceae (GDR is a curated and integrated web-based relational database. GDR contains comprehensive data of the genetically anchored peach physical map, an annotated peach EST database, Rosaceae maps and markers and all publicly available Rosaceae sequences. Annotations of ESTs include contig assembly, putative function, simple sequence repeats, and anchored position to the peach physical map where applicable. Our integrated map viewer provides graphical interface to the genetic, transcriptome and physical mapping information. ESTs, BACs and markers can be queried by various categories and the search result sites are linked to the integrated map viewer or to the WebFPC physical map sites. In addition to browsing and querying the database, users can compare their sequences with the annotated GDR sequences via a dedicated sequence similarity server running either the BLAST or FASTA algorithm. To demonstrate the utility of the integrated and fully annotated database and analysis tools, we describe a case study where we anchored Rosaceae sequences to the peach physical and genetic map by sequence similarity. Conclusions The GDR has been initiated to meet the major deficiency in Rosaceae genomics and genetics research, namely a centralized web database and bioinformatics tools for data storage, analysis and exchange. GDR can be accessed at http://www.genome.clemson.edu/gdr/.

  14. GDR (Genome Database for Rosaceae): integrated web resources for Rosaceae genomics and genetics research.

    Science.gov (United States)

    Jung, Sook; Jesudurai, Christopher; Staton, Margaret; Du, Zhidian; Ficklin, Stephen; Cho, Ilhyung; Abbott, Albert; Tomkins, Jeffrey; Main, Dorrie

    2004-09-09

    Peach is being developed as a model organism for Rosaceae, an economically important family that includes fruits and ornamental plants such as apple, pear, strawberry, cherry, almond and rose. The genomics and genetics data of peach can play a significant role in the gene discovery and the genetic understanding of related species. The effective utilization of these peach resources, however, requires the development of an integrated and centralized database with associated analysis tools. The Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database. GDR contains comprehensive data of the genetically anchored peach physical map, an annotated peach EST database, Rosaceae maps and markers and all publicly available Rosaceae sequences. Annotations of ESTs include contig assembly, putative function, simple sequence repeats, and anchored position to the peach physical map where applicable. Our integrated map viewer provides graphical interface to the genetic, transcriptome and physical mapping information. ESTs, BACs and markers can be queried by various categories and the search result sites are linked to the integrated map viewer or to the WebFPC physical map sites. In addition to browsing and querying the database, users can compare their sequences with the annotated GDR sequences via a dedicated sequence similarity server running either the BLAST or FASTA algorithm. To demonstrate the utility of the integrated and fully annotated database and analysis tools, we describe a case study where we anchored Rosaceae sequences to the peach physical and genetic map by sequence similarity. The GDR has been initiated to meet the major deficiency in Rosaceae genomics and genetics research, namely a centralized web database and bioinformatics tools for data storage, analysis and exchange. GDR can be accessed at http://www.genome.clemson.edu/gdr/.

  15. The catfish genome database cBARBEL: an informatic platform for genome biology of ictalurid catfish.

    Science.gov (United States)

    Lu, Jianguo; Peatman, Eric; Yang, Qing; Wang, Shaolin; Hu, Zhiliang; Reecy, James; Kucuktas, Huseyin; Liu, Zhanjiang

    2011-01-01

    The catfish genome database, cBARBEL (abbreviated from catfish Breeder And Researcher Bioinformatics Entry Location) is an online open-access database for genome biology of ictalurid catfish (Ictalurus spp.). It serves as a comprehensive, integrative platform for all aspects of catfish genetics, genomics and related data resources. cBARBEL provides BLAST-based, fuzzy and specific search functions, visualization of catfish linkage, physical and integrated maps, a catfish EST contig viewer with SNP information overlay, and GBrowse-based organization of catfish genomic data based on sequence similarity with zebrafish chromosomes. Subsections of the database are tightly related, allowing a user with a sequence or search string of interest to navigate seamlessly from one area to another. As catfish genome sequencing proceeds and ongoing quantitative trait loci (QTL) projects bear fruit, cBARBEL will allow rapid data integration and dissemination within the catfish research community and to interested stakeholders. cBARBEL can be accessed at http://catfishgenome.org.

  16. Invited review: Genetic and genomic mouse models for livestock research

    Directory of Open Access Journals (Sweden)

    D. Arends

    2018-02-01

    Knowledge about the function and functioning of single or multiple interacting genes is of the utmost significance for understanding the organism as a whole and for accurate livestock improvement through genomic selection. This includes, but is not limited to, understanding the ontogenetic and environmentally driven regulation of gene action contributing to simple and complex traits. Genetically modified mice, in which the functions of single genes are annotated; mice with reduced genetic complexity; and simplified structured populations are tools to gain fundamental knowledge of inheritance patterns and whole system genetics and genomics. In this review, we briefly describe existing mouse resources and discuss their value for fundamental and applied research in livestock.

  17. Exploring Protein Function Using the Saccharomyces Genome Database.

    Science.gov (United States)

    Wong, Edith D

    2017-01-01

    Elucidating the function of individual proteins will help to create a comprehensive picture of cell biology, as well as shed light on human disease mechanisms, possible treatments, and cures. Due to its compact genome, and extensive history of experimentation and annotation, the budding yeast Saccharomyces cerevisiae is an ideal model organism in which to determine protein function. This information can then be leveraged to infer functions of human homologs. Despite the large amount of research and biological data about S. cerevisiae, many proteins' functions remain unknown. Here, we explore ways to use the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org ) to predict the function of proteins and gain insight into their roles in various cellular processes.

  18. Viral Genome DataBase: storing and analyzing genes and proteins from complete viral genomes.

    Science.gov (United States)

    Hiscock, D; Upton, C

    2000-05-01

    The Viral Genome DataBase (VGDB) contains detailed information on the genes and predicted protein sequences from 15 completely sequenced genomes of large (>100 kb) viruses (2847 genes). The stored data include DNA sequence, protein sequence, GenBank and user-entered notes, molecular weight (MW), isoelectric point (pI), amino acid content, A + T%, nucleotide frequency, dinucleotide frequency and codon use. The VGDB is a MySQL database with a user-friendly Java GUI. Results of queries can be easily sorted by any of the individual parameters. The software and additional figures and information are available at http://athena.bioc.uvic.ca/genomes/index.html .
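
    Several of the per-gene statistics listed above (A + T%, nucleotide frequency, dinucleotide frequency and codon use) can be derived directly from a DNA sequence. The following is a minimal sketch in Python, not the VGDB implementation; the function name `composition_stats` and the returned dictionary keys are hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
from collections import Counter


def composition_stats(dna):
    """Return simple composition statistics for a DNA string.

    Hypothetical helper for illustration only; assumes `dna` is an
    in-frame coding sequence written with the letters A, C, G and T.
    """
    seq = dna.upper()
    n = len(seq)

    # Single-nucleotide counts, e.g. {"A": 6, "T": 5, "G": 1}.
    nucleotide_freq = Counter(seq)

    # A + T content as a percentage of the total length.
    at_count = nucleotide_freq["A"] + nucleotide_freq["T"]
    at_percent = 100.0 * at_count / n if n else 0.0

    # Overlapping dinucleotides: positions (0,1), (1,2), (2,3), ...
    dinucleotide_freq = Counter(seq[i:i + 2] for i in range(n - 1))

    # Non-overlapping codons in frame 0; a trailing partial codon is ignored.
    codon_use = Counter(seq[i:i + 3] for i in range(0, n - n % 3, 3))

    return {
        "at_percent": at_percent,
        "nucleotide_freq": nucleotide_freq,
        "dinucleotide_freq": dinucleotide_freq,
        "codon_use": codon_use,
    }
```

    Calling `composition_stats("ATGAAATTTTAA")` returns one `Counter` per statistic, which can then be sorted by any of the individual values, much as the abstract describes for query results.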

  19. MaizeGDB: The Maize Genetics and Genomics Database.

    Science.gov (United States)

    Harper, Lisa; Gardiner, Jack; Andorf, Carson; Lawrence, Carolyn J

    2016-01-01

    MaizeGDB is the community database for biological information about the crop plant Zea mays. Genomic, genetic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes stored at MaizeGDB. At the project's website ( http://www.maizegdb.org ) are custom interfaces enabling researchers to browse data and to seek out specific information matching explicit search criteria. In addition, pre-compiled reports are made available for particular types of data and bulletin boards are provided to facilitate communication and coordination among members of the community of maize geneticists.

  20. Xylella fastidiosa comparative genomic database is an information resource to explore the annotation, genomic features, and biology of different strains

    Directory of Open Access Journals (Sweden)

    Alessandro M. Varani

    2012-01-01

    The Xylella fastidiosa comparative genomic database is a scientific resource that aims to provide a user-friendly interface for accessing high-quality manually curated genomic annotation and comparative sequence analysis, as well as for identifying and mapping prophage-like elements, a marked feature of Xylella genomes. Here we describe a database and tools for exploring the biology of this important plant pathogen. The hallmarks of this database are the high-quality genomic annotation, the functional and comparative genomic analysis and the identification and mapping of prophage-like elements. It is available at http://www.xylella.lncc.br.

  1. MetReS, an Efficient Database for Genomic Applications.

    Science.gov (United States)

    Vilaplana, Jordi; Alves, Rui; Solsona, Francesc; Mateo, Jordi; Teixidó, Ivan; Pifarré, Marc

    2018-02-01

    MetReS (Metabolic Reconstruction Server) is a genomic database that is shared between two software applications that address important biological problems. Biblio-MetReS is a data-mining tool that enables the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the processes of interest and their function. The main goal of this work was to identify the areas where the performance of the MetReS database could be improved and to test whether this improvement would scale to larger datasets and more complex types of analysis. The study was started with a relational database, MySQL, which is the current database server used by the applications. We also tested the performance of an alternative data-handling framework, Apache Hadoop. Hadoop is currently used for large-scale data processing. We found that this data-handling framework is likely to greatly improve the efficiency of the MetReS applications as the dataset and the processing needs increase by several orders of magnitude, as expected to happen in the near future.

  2. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der [California Univ., San Francisco, CA (United States); Univ. of California, Berkeley, CA (United States)

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  3. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der (California Univ., San Francisco, CA (United States) Lawrence Berkeley Lab., CA (United States))

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  4. BarleyBase—an expression profiling database for plant genomics

    Science.gov (United States)

    Shen, Lishuang; Gong, Jian; Caldo, Rico A.; Nettleton, Dan; Cook, Dianne; Wise, Roger P.; Dickerson, Julie A.

    2005-01-01

    BarleyBase (BB) (www.barleybase.org) is an online database for plant microarrays with integrated tools for data visualization and statistical analysis. BB houses raw and normalized expression data from the two publicly available Affymetrix genome arrays, Barley1 and Arabidopsis ATH1 with plans to include the new Affymetrix 61K wheat, maize, soybean and rice arrays, as they become available. BB contains a broad set of query and display options at all data levels, ranging from experiments to individual hybridizations to probe sets down to individual probes. Users can perform cross-experiment queries on probe sets based on observed expression profiles and/or based on known biological information. Probe set queries are integrated with visualization and analysis tools such as the R statistical toolbox, data filters and a large variety of plot types. Controlled vocabularies for gene and plant ontologies, as well as interconnecting links to physical or genetic map and other genomic data in PlantGDB, Gramene and GrainGenes, allow users to perform EST alignments and gene function prediction using Barley1 exemplar sequences, thus, enhancing cross-species comparison. PMID:15608273

  5. FANTOM5 CAGE profiles of human and mouse reprocessed for GRCh38 and GRCm38 genome assemblies.

    Science.gov (United States)

    Abugessaisa, Imad; Noguchi, Shuhei; Hasegawa, Akira; Harshbarger, Jayson; Kondo, Atsushi; Lizio, Marina; Severin, Jessica; Carninci, Piero; Kawaji, Hideya; Kasukawa, Takeya

    2017-08-29

    The FANTOM5 consortium described the promoter-level expression atlas of human and mouse by using CAGE (Cap Analysis of Gene Expression) with single molecule sequencing. In the original publications, GRCh37/hg19 and NCBI37/mm9 assemblies were used as the reference genomes of human and mouse respectively; later, the Genome Reference Consortium released newer genome assemblies GRCh38/hg38 and GRCm38/mm10. To increase the utility of the atlas in future research, we reprocessed the data to make them available on the recent genome assemblies. The data include observed frequencies of transcription starting sites (TSSs) based on the realignment of CAGE reads, and TSS peaks that are converted from those based on the previous reference. Annotations of the peak names were also updated based on the latest public databases. The reprocessed results enable us to examine frequencies of transcription initiations on the recent genome assemblies and to refer to promoters with updated information across the genome assemblies consistently.

  6. Comparative analysis of genome maintenance genes in naked mole rat, mouse, and human

    NARCIS (Netherlands)

    S.L. Macrae (Sheila L.); Q. Zhang (Quanwei); C. Lemetre (Christophe); I. Seim (Inge); R.B. Calder (Robert B.); J.H.J. Hoeijmakers (Jan); Y. Suh (Yousin); V.N. Gladyshev (Vadim N.); A. Seluanov (Andrei); V. Gorbunova (Vera); J. Vijg (Jan); Z.D. Zhang (Zhengdong D.)

    2015-01-01

    Genome maintenance (GM) is an essential defense system against aging and cancer, as both are characterized by increased genome instability. Here, we compared the copy number variation and mutation rate of 518 GM-associated genes in the naked mole rat (NMR), mouse, and human genomes. GM

  7. The Princeton Protein Orthology Database (P-POD): a comparative genomics analysis tool for biologists.

    OpenAIRE

    Sven Heinicke; Michael S Livstone; Charles Lu; Rose Oughtred; Fan Kang; Samuel V Angiuoli; Owen White; David Botstein; Kara Dolinski

    2007-01-01

    Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic r...

  8. mouseTube – a database to collaboratively unravel mouse ultrasonic communication [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Nicolas Torquet

    2016-09-01

    Ultrasonic vocalisation is a broadly used proxy to evaluate social communication in mouse models of neuropsychiatric disorders. The efficacy and robustness of testing these models suffer from limited knowledge of the structure and functions of these vocalisations as well as of the way to analyse the data. We created mouseTube, an open database with a web interface, to facilitate sharing and comparison of ultrasonic vocalisation data and metadata attached to a recording file. Metadata describe (1) the acquisition procedure, e.g., hardware, software, sampling frequency, bit depth; (2) the biological protocol used to elicit ultrasonic vocalisations; (3) the characteristics of the individual emitting ultrasonic vocalisations (e.g., strain, sex, age). To promote open science and enable reproducibility, data are made freely available. The website provides searching functions to facilitate the retrieval of recording files of interest. It is designed to enable comparisons of ultrasonic vocalisation emission between strains, protocols or laboratories, as well as to test different analysis algorithms and to search for protocols established to elicit mouse ultrasonic vocalisations. Over the long term, users will be able to download and compare different analysis results for each data file. Such an application will boost the knowledge on mouse ultrasonic communication and stimulate sharing and comparison of automatic analysis methods to refine phenotyping techniques in mouse models of neuropsychiatric disorders.

  9. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data.

    Science.gov (United States)

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Genome-wide identification of estrogen receptor alpha-binding sites in mouse liver

    DEFF Research Database (Denmark)

    Gao, Hui; Fält, Susann; Sandelin, Albin

    2007-01-01

    We report the genome-wide identification of estrogen receptor alpha (ERalpha)-binding regions in mouse liver using a combination of chromatin immunoprecipitation and tiled microarrays that cover all nonrepetitive sequences in the mouse genome. This analysis identified 5568 ERalpha-binding regions...... genes. The majority of ERalpha-binding regions lie in regions that are evolutionarily conserved between human and mouse. Motif-finding algorithms identified the estrogen response element, and variants thereof, together with binding sites for activator protein 1, basic-helix-loop-helix proteins, ETS...... signaling in mouse liver, by characterizing the first step in this signaling cascade, the binding of ERalpha to DNA in intact chromatin....

  11. Dioxin induces genomic instability in mouse embryonic fibroblasts.

    Directory of Open Access Journals (Sweden)

    Merja Korkalainen

    Ionizing radiation and certain other exposures have been shown to induce genomic instability (GI), i.e., delayed genetic damage observed many cell generations later in the progeny of the exposed cells. The aim of this study was to investigate induction of GI by a nongenotoxic carcinogen, 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Mouse embryonic fibroblasts (C3H10T1/2) were exposed to 1, 10 or 100 nM TCDD for 2 days. Micronuclei (MN) and expression of selected cancer-related genes were assayed both immediately and at a delayed point in time (8 days). For comparison, similar experiments were done with cadmium, a known genotoxic agent. TCDD treatment induced an elevated frequency of MN at 8 days, but not directly after the exposure. TCDD-induced alterations in gene expression were also mostly delayed, with more changes observed at 8 days than at 2 days. Exposure to cadmium produced an opposite pattern of responses, with pronounced effects immediately after exposure but no increase in MN and few gene expression changes at 8 days. Although all responses to TCDD alone were delayed, menadione-induced DNA damage (measured by the Comet assay) was found to be increased directly after a 2-day TCDD exposure, indicating that the stability of the genome was compromised already at this time point. The results suggested a flat dose-response relationship consistent with dose-response data reported for radiation-induced GI. These findings indicate that TCDD, although not directly genotoxic, induces GI, which is associated with impaired DNA damage response.

  12. License - TMBETA-GENOME | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    TMBETA-GENOME License License to Use This Database Last updated : 2015/03/09 You may use this database... the license terms regarding the use of this database and the requirements you must follow in using this database.... The license for this database is specified in the Creative Commons Attribu...tion-Share Alike 2.1 Japan . If you use data from this database, please be sure attribute this database as f....1 Japan . The summary of the Creative Commons Attribution-Share Alike 2.1 Japan is found here . With regard to this database

  13. A genome browser database for rice (Oryza sativa) and Chinese ...

    African Journals Online (AJOL)

    STORAGESEVER

    2009-10-19

    Oct 19, 2009 ... sativa) and Chinese cabbage (Brassica rapa) genomes. The genome ... tant staple food for a large part of the world's human population. .... some banding region for selection and the overview panel shows the location of ...

  14. Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources.

    Science.gov (United States)

    Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu

    2015-01-01

    The Brassica database (BRAD) was built initially to help users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released after its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences, are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/. © The Author(s) 2015. Published by Oxford University Press.

  15. IMG: the integrated microbial genomes database and comparative analysis system

    Science.gov (United States)

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2012-01-01

    The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp). PMID:22194640

  16. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    Science.gov (United States)

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  17. Pathbase: A new reference resource and database for laboratory mouse pathology

    International Nuclear Information System (INIS)

    Schofield, P. N.; Bard, J. B. L.; Boniver, J.; Covelli, V.; Delvenne, P.; Ellender, M.; Engstrom, W.; Goessner, W.; Gruenberger, M.; Hoefler, H.; Hopewell, J. W.; Mancuso, M.; Mothersill, C.; Quintanilla-Martinez, L.; Rozell, B.; Sariola, H.; Sundberg, J. P.; Ward, A.

    2004-01-01

    Pathbase (http://www.pathbase.net) is a web-accessible database of histopathological images of laboratory mice, developed as a resource for the coding and archiving of data derived from the analysis of mutant or genetically engineered mice and their background strains. The metadata for the images, which allow retrieval and inter-operability with other databases, are derived from a series of orthogonal ontologies and controlled vocabularies. One of these controlled vocabularies, MPATH, was developed by the Pathbase Consortium as a formal description of the content of mouse histopathological images. The database currently has over 1000 images on-line with 2000 more under curation and presents a paradigm for the development of future databases dedicated to aspects of experimental biology. (authors)

  18. EuMicroSatdb: A database for microsatellites in the sequenced genomes of eukaryotes

    Directory of Open Access Journals (Sweden)

    Grover Atul

    2007-07-01

    Background: Microsatellites have immense utility as molecular markers in different fields like genome characterization and mapping, phylogeny and evolutionary biology. Existing microsatellite databases are of limited utility for experimental and computational biologists with regard to their content and information output. EuMicroSatdb (Eukaryotic MicroSatellite database, http://ipu.ac.in/usbt/EuMicroSatdb.htm) is a web-based relational database for easy and efficient positional mining of microsatellites from sequenced eukaryotic genomes. Description: A user-friendly web interface has been developed for microsatellite data retrieval using Active Server Pages (ASP). The backend database codes for data extraction and assembly have been written using Perl-based scripts and C++. Precise, need-based microsatellite data retrieval is possible using different input parameters like microsatellite type (simple perfect or compound perfect), repeat unit length (mono- to hexa-nucleotide), repeat number, microsatellite length and chromosomal location in the genome. Furthermore, information about clustering of different microsatellites in the genome can also be retrieved. Finally, to facilitate primer designing for PCR amplification of any desired microsatellite locus, 200 bp upstream and downstream sequences are provided. Conclusion: The database allows easy systematic retrieval of comprehensive information about simple and compound microsatellites, microsatellite clusters and their locus coordinates in 31 sequenced eukaryotic genomes. The information content of the database is useful in different areas of research like gene tagging, genome mapping, population genetics, germplasm characterization and in understanding microsatellite dynamics in eukaryotic genomes.
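
    The retrieval parameters described above (repeat unit length, repeat number, locus coordinates and flanking sequence for primer design) can be illustrated with a short sketch. This is not the EuMicroSatdb code, which the abstract says is written in Perl and C++; it is a Python approximation that handles only simple perfect microsatellites, and the function name `find_microsatellites` and its parameters are hypothetical.

```python
import re


def find_microsatellites(seq, min_repeats=4, max_unit=6, flank=200):
    """Locate simple perfect microsatellites in a DNA string.

    Illustrative sketch only: compound microsatellites and clustering
    are not handled. Coordinates are 0-based, end-exclusive.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    seq = seq.upper()

    # A lazily matched unit of 1..max_unit bases followed by at least
    # (min_repeats - 1) further copies of itself. The lazy quantifier
    # makes the regex prefer the shortest unit, so "AAAA" is reported
    # as unit "A" rather than unit "AA".
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )

    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        start, end = match.span()
        hits.append({
            "unit": unit,
            "unit_length": len(unit),
            "repeats": (end - start) // len(unit),
            "start": start,
            "end": end,
            # Flanking sequence, e.g. for primer design around the locus.
            "upstream": seq[max(0, start - flank):start],
            "downstream": seq[end:end + flank],
        })
    return hits
```

    The default `flank=200` mirrors the 200 bp upstream and downstream sequences mentioned in the abstract; the default `min_repeats=4` is an arbitrary choice for the example, not a value taken from the database.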

  19. Improving Microbial Genome Annotations in an Integrated Database Context

    Science.gov (United States)

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.

    2013-01-01

    Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/. PMID:23424620

  20. Improving microbial genome annotations in an integrated database context.

    Directory of Open Access Journals (Sweden)

    I-Min A Chen

    Full Text Available Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/.

  1. Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists.

    Science.gov (United States)

    Wiley, Laura K; Sivley, R Michael; Bush, William S

    2013-01-01

    Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist.
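
    The nested containment list idea named in the abstract above can be illustrated with a small in-memory sketch. This is not the MyNCList code and does not involve MySQL; the function names `build_nclist` and `find_overlaps` are hypothetical, and intervals are assumed to be half-open `(start, end)` pairs. Intervals contained in another interval are stored as its children, so within each sublist both starts and ends are increasing and a binary search can find the first candidate.

```python
def build_nclist(intervals):
    """Build a nested containment list from half-open (start, end) pairs.

    Each node is a tuple (start, end, children). An interval contained in
    an earlier one becomes that interval's child.
    """
    # Sort by start ascending, then end descending, so containers come first.
    items = sorted(set(intervals), key=lambda iv: (iv[0], -iv[1]))
    root = []
    stack = []  # chain of currently open containers
    for start, end in items:
        node = (start, end, [])
        # Drop containers that end before this interval does.
        while stack and stack[-1][1] < end:
            stack.pop()
        (stack[-1][2] if stack else root).append(node)
        stack.append(node)
    return root


def _first_ending_after(nodes, query_start):
    """Binary search: index of the first node whose end is > query_start."""
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid][1] > query_start:
            hi = mid
        else:
            lo = mid + 1
    return lo


def find_overlaps(nodes, query_start, query_end):
    """Return all stored intervals overlapping [query_start, query_end)."""
    hits = []
    i = _first_ending_after(nodes, query_start)
    while i < len(nodes) and nodes[i][0] < query_end:
        start, end, children = nodes[i]
        hits.append((start, end))
        # Only children of an overlapping parent can overlap the query.
        hits.extend(find_overlaps(children, query_start, query_end))
        i += 1
    return hits
```

    For example, after `nodes = build_nclist([(1, 10), (2, 5), (6, 9), (12, 20)])`, the call `find_overlaps(nodes, 4, 7)` visits only the first top-level interval and its two children.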

  2. Endonucleases : new tools to edit the mouse genome

    NARCIS (Netherlands)

    Wijshake, Tobias; Baker, Darren J.; van de Sluis, Bart

    2014-01-01

    Mouse transgenesis has been instrumental in determining the function of genes in the pathophysiology of human diseases and modification of genes by homologous recombination in mouse embryonic stem cells remains a widely used technology. However, this approach harbors a number of disadvantages, as it

  3. Fine-scale maps of recombination rates and hotspots in the mouse genome.

    Science.gov (United States)

    Brunschwig, Hadassa; Levi, Liat; Ben-David, Eyal; Williams, Robert W; Yakir, Benjamin; Shifman, Sagiv

    2012-07-01

    Recombination events are not uniformly distributed and often cluster in narrow regions known as recombination hotspots. Several studies using different approaches have dramatically advanced our understanding of recombination hotspot regulation. Population genetic data have been used to map and quantify hotspots in the human genome. Genetic variation in recombination rates and hotspot usage has been explored in human pedigrees, mouse intercrosses, and by sperm typing. These studies pointed to the central role of the PRDM9 gene in hotspot modulation. In this study, we used single nucleotide polymorphisms (SNPs) from whole-genome resequencing and genotyping studies of mouse inbred strains to estimate recombination rates across the mouse genome and identified 47,068 historical hotspots, an average of over 2477 per chromosome. We show by simulation that inbred mouse strains can be used to identify positions of historical hotspots. Recombination hotspots were found to be enriched for the predicted binding sequences for different alleles of the PRDM9 protein. Recombination rates were on average lower near transcription start sites (TSS). Comparing the inferred historical recombination hotspots with the recent genome-wide mapping of double-strand breaks (DSBs) in mouse sperm revealed a significant overlap, especially toward the telomeres. Our results suggest that inbred strains can be used to characterize and study the dynamics of historical recombination hotspots. They also strengthen previous findings on mouse recombination hotspots, and specifically the impact of sequence variants in Prdm9.

  4. KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation.

    Science.gov (United States)

    Wang, Dapeng; Xu, Jiayue; Yu, Jun

    2015-09-16

    The K-mer approach, which treats genomic sequences as simple character strings and counts the relative abundance of each string for a fixed K, has been extensively applied to phylogeny inference for genome assembly, annotation, and comparison. To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we develop a versatile database, KGCAK (http://kgcak.big.ac.cn/KGCAK/), containing ~8,000 genomes that include genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) and cellular organelles of eukaryotic lineages. It builds phylogeny based on genomic elements in an alignment-free fashion and provides in-depth data processing enabling users to compare the complexity of genome sequences based on K-mer distribution. We hope that KGCAK becomes a powerful tool for exploring relationships within and among groups of species in a tree of life based on genomic data.
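
    As a rough illustration of the K-mer counting described above, the sketch below computes the relative abundance of every K-mer in a sequence and compares two such profiles. The function names are hypothetical, and the Euclidean distance is only one possible alignment-free measure chosen here for illustration; the abstract does not state which measure KGCAK uses.

```python
import math
from collections import Counter


def kmer_frequencies(sequence, k):
    """Return the relative abundance of each overlapping k-mer in sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # Slide a window of width k across the sequence, one position at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(freqs_a, freqs_b):
    """Euclidean distance between two k-mer frequency profiles.

    This metric is an assumption made for illustration only.
    """
    kmers = set(freqs_a) | set(freqs_b)
    return math.sqrt(
        sum((freqs_a.get(m, 0.0) - freqs_b.get(m, 0.0)) ** 2 for m in kmers)
    )
```

    A pairwise matrix of such distances between genomes could then be passed to a standard tree-building method; that step is outside this sketch.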

  5. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    Science.gov (United States)

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.

  6. The Ruby UCSC API: accessing the UCSC genome database using Ruby.

    Science.gov (United States)

    Mishima, Hiroyuki; Aerts, Jan; Katayama, Toshiaki; Bonnal, Raoul J P; Yoshiura, Koh-ichiro

    2012-09-21

    The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was however not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast. The API uses the bin index, if available, when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will help biologists query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help are provided via the website at http://rubyucscapi.userecho.com/.
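
    The bin index mentioned above refers to the UCSC binning scheme, in which each feature is assigned to the smallest bin that fully contains it. The sketch below is written in Python rather than Ruby and is not taken from the Ruby UCSC API source; it follows the standard five-level scheme (128 kb smallest bins, each level 8 times larger) for 0-based half-open coordinates below 512 Mb. The function names are illustrative.

```python
# Offsets of the first bin at each level, smallest bins (128 kb) first.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # 2**17 = 128 kb, the size of the smallest bins
BIN_NEXT_SHIFT = 3    # each level up is 2**3 = 8 times larger
MAX_END = 1 << 29     # the standard scheme covers positions up to 512 Mb


def _check(start, end):
    if not 0 <= start < end <= MAX_END:
        raise ValueError("expected 0 <= start < end <= 2**29")


def bin_from_range(start, end):
    """Smallest bin that fully contains the half-open range [start, end)."""
    _check(start, end)
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range does not fit in the binning scheme")


def bins_overlapping(start, end):
    """All bins, at every level, that could hold features overlapping the range."""
    _check(start, end)
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    bins = []
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins
```

    An interval query would typically restrict the `bin` column to the values returned by `bins_overlapping` and then apply the exact start and end comparison; the start and end column names differ between UCSC tables.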

  7. The Ruby UCSC API: accessing the UCSC genome database using Ruby

    Science.gov (United States)

    2012-01-01

    Background The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was however not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. Results The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast. The API uses the bin index—if available—when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Conclusions Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will facilitate biologists to query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help is provided via the website at http://rubyucscapi.userecho.com/. PMID:22994508

  8. The Ruby UCSC API: accessing the UCSC genome database using Ruby

    Directory of Open Access Journals (Sweden)

    Mishima Hiroyuki

    2012-09-01

    Full Text Available Abstract Background The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was however not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. Results The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast. The API uses the bin index, if available, when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Conclusions Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will facilitate biologists to query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help is provided via the website at http://rubyucscapi.userecho.com/.

  9. Use of Genomic Databases for Inquiry-Based Learning about Influenza

    Science.gov (United States)

    Ledley, Fred; Ndung'u, Eric

    2011-01-01

    The genome projects of the past decades have created extensive databases of biological information with applications in both research and education. We describe an inquiry-based exercise that uses one such database, the National Center for Biotechnology Information Influenza Virus Resource, to advance learning about influenza. This database…

  10. Comparative analysis of genome maintenance genes in naked mole rat, mouse, and human.

    Science.gov (United States)

    MacRae, Sheila L; Zhang, Quanwei; Lemetre, Christophe; Seim, Inge; Calder, Robert B; Hoeijmakers, Jan; Suh, Yousin; Gladyshev, Vadim N; Seluanov, Andrei; Gorbunova, Vera; Vijg, Jan; Zhang, Zhengdong D

    2015-04-01

    Genome maintenance (GM) is an essential defense system against aging and cancer, as both are characterized by increased genome instability. Here, we compared the copy number variation and mutation rate of 518 GM-associated genes in the naked mole rat (NMR), mouse, and human genomes. GM genes appeared to be strongly conserved, with copy number variation in only four genes. Interestingly, we found NMR to have a higher copy number of CEBPG, a regulator of DNA repair, and TINF2, a protector of telomere integrity. NMR, as well as human, was also found to have a lower rate of germline nucleotide substitution than the mouse. Together, the data suggest that the long-lived NMR, as well as human, has more robust GM than mouse and identifies new targets for the analysis of the exceptional longevity of the NMR. © 2015 The Authors. Aging Cell published by the Anatomical Society and John Wiley & Sons Ltd.

  11. Characteristics of the mouse genomic histamine H1 receptor gene

    Energy Technology Data Exchange (ETDEWEB)

    Inoue, Isao; Taniuchi, Ichiro; Kitamura, Daisuke [Kyushu Univ., Fukuoka (Japan)] [and others]

    1996-08-15

    We report here the molecular cloning of a mouse histamine H1 receptor gene. The protein deduced from the nucleotide sequence is composed of 488 amino acid residues with characteristic properties of GTP binding protein-coupled receptors. Our results suggest that the mouse histamine H1 receptor gene is a single locus, and no related sequences were detected. Interspecific backcross analysis indicated that the mouse histamine H1 receptor gene (Hrh1) is located in the central region of mouse Chromosome 6 linked to microphthalmia (Mitfmi), ras-related fibrosarcoma oncogene 1 (Raf1), and ret proto-oncogene (Ret) in a region of homology with human chromosome 3p. 12 refs., 3 figs.

  12. Databases and web tools for cancer genomics study.

    Science.gov (United States)

    Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong

    2015-02-01

    Publicly-accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope this introduction will promote awareness and facilitate the usage of these resources in the cancer research community. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

  13. BioQ: tracing experimental origins in public genomic databases using a novel data provenance model.

    Science.gov (United States)

    Saccone, Scott F; Quan, Jiaxi; Jones, Peter L

    2012-04-15

    Public genomic databases, which are often used to guide genetic studies of human disease, are now being applied to genomic medicine through in silico integrative genomics. These databases, however, often lack tools for systematically determining the experimental origins of the data. We introduce a new data provenance model that we have implemented in a public web application, BioQ, for assessing the reliability of the data by systematically tracing its experimental origins to the original subjects and biologics. BioQ allows investigators to both visualize data provenance as well as explore individual elements of experimental process flow using precise tools for detailed data exploration and documentation. It includes a number of human genetic variation databases such as the HapMap and 1000 Genomes projects. BioQ is freely available to the public at http://bioq.saclab.net.

  14. Analysis of disease-associated objects at the Rat Genome Database

    Science.gov (United States)

    Wang, Shur-Jen; Laulederkind, Stanley J. F.; Hayman, G. T.; Smith, Jennifer R.; Petri, Victoria; Lowry, Timothy F.; Nigam, Rajni; Dwinell, Melinda R.; Worthey, Elizabeth A.; Munzenmaier, Diane H.; Shimoyama, Mary; Jacob, Howard J.

    2013-01-01

    The Rat Genome Database (RGD) is the premier resource for genetic, genomic and phenotype data for the laboratory rat, Rattus norvegicus. In addition to organizing biological data from rats, the RGD team focuses on manual curation of gene–disease associations for rat, human and mouse. In this work, we have analyzed disease-associated strains, quantitative trait loci (QTL) and genes from rats. These disease objects form the basis for seven disease portals. Among disease portals, the cardiovascular disease and obesity/metabolic syndrome portals have the highest number of rat strains and QTL. These two portals share 398 rat QTL, and these shared QTL are highly concentrated on rat chromosomes 1 and 2. For disease-associated genes, we performed gene ontology (GO) enrichment analysis across portals using RatMine enrichment widgets. Fifteen GO terms, five from each GO aspect, were selected to profile enrichment patterns of each portal. Of the selected biological process (BP) terms, ‘regulation of programmed cell death’ was the top enriched term across all disease portals except in the obesity/metabolic syndrome portal where ‘lipid metabolic process’ was the most enriched term. ‘Cytosol’ and ‘nucleus’ were common cellular component (CC) annotations for disease genes, but only the cancer portal genes were highly enriched with ‘nucleus’ annotations. Similar enrichment patterns were observed in a parallel analysis using the DAVID functional annotation tool. The relationship between the preselected 15 GO terms and disease terms was examined reciprocally by retrieving rat genes annotated with these preselected terms. The individual GO term–annotated gene list showed enrichment in physiologically related diseases. For example, the ‘regulation of blood pressure’ genes were enriched with cardiovascular disease annotations, and the ‘lipid metabolic process’ genes with obesity annotations. Furthermore, we were able to enhance enrichment of neurological

  15. Genome-wide identification of coding and non-coding conserved sequence tags in human and mouse genomes

    Directory of Open Access Journals (Sweden)

    Maggi Giorgio P

    2008-06-01

    Full Text Available Abstract Background The accurate detection of genes and the identification of functional regions is still an open issue in the annotation of genomic sequences. This problem affects new genomes but also those of very well studied organisms such as human and mouse where, despite the great efforts, the inventory of genes and regulatory regions is far from complete. Comparative genomics is an effective approach to address this problem. Unfortunately it is limited by the computational requirements needed to perform genome-wide comparisons and by the problem of discriminating between conserved coding and non-coding sequences. This discrimination is often based (thus dependent) on the availability of annotated proteins. Results In this paper we present the results of a comprehensive comparison of human and mouse genomes performed with a new high throughput grid-based system which allows the rapid detection of conserved sequences and accurate assessment of their coding potential. By detecting clusters of coding conserved sequences the system is also suitable to accurately identify potential gene loci. Following this analysis we created a collection of human-mouse conserved sequence tags and carefully compared our results to reliable annotations in order to benchmark the reliability of our classifications. Strikingly we were able to detect several potential gene loci supported by EST sequences but not corresponding to as yet annotated genes. Conclusion Here we present a new system which allows comprehensive comparison of genomes to detect conserved coding and non-coding sequences and the identification of potential gene loci. Our system does not require the availability of any annotated sequence thus is suitable for the analysis of new or poorly annotated genomes.

  16. Automated whole-genome multiple alignment of rat, mouse, and human

    Energy Technology Data Exchange (ETDEWEB)

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  17. MIPS PlantsDB: a database framework for comparative plant genome research.

    Science.gov (United States)

    Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB.

  18. MIPS: a database for protein sequences, homology data and yeast genome information.

    Science.gov (United States)

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database. MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome, the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser'. A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  19. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

    Directory of Open Access Journals (Sweden)

    Rodrigo Aniceto

    2015-01-01

    Full Text Available Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.

  20. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

    Science.gov (United States)

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254

  1. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.

    Science.gov (United States)

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.

  2. The European dimension for the mouse genome mutagenesis

    Czech Academy of Sciences Publication Activity Database

    Auwerx, J.; Avner, P.; Baldock, R.; Ballabio, A.; Balling, R.; Barbacid, M.; Berns, A.; Bradley, A.; Brown, S.; Carmeliet, P.; Chambon, P.; Cox, R.; Davidson, D.; Davies, K.; Duboule, D.; Forejt, Jiří; Granucci, F.; Hastie, N.; Angelis, M. H. de; Jackson, I.; Kioussis, D.; Kollias, G.; Lathrop, M.; Lendahl, U.; Malumbres, M.; von Melchner, H.; Müller, W.; Partanen, J.; Ricciardi-Castagnoli, P.; Rigby, P.; Rosen, B.; Rosenthal, N.; Skarnes, B.; Stewart, A. F.; Thornton, J.; Tocchini-Valentini, G.; Wagner, E.; Wahli, W.; Wurst, W.

    2004-01-01

    Vol. 16, - (2004), pp. 925-927 ISSN 1061-4036 R&D Projects: GA MŠk(CZ) LN00A079 Institutional research plan: CEZ:AV0Z5052915 Keywords : The European Mouse Mutagenesis Consortium Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 24.695, year: 2004

  3. Chromosome-wise dissection of the genome of the extremely big mouse line DU6i

    NARCIS (Netherlands)

    M.R. Bevova (Marianna); Y.S. Aulchenko (Yurii); G. Aksu (Guzide); U. Renne (Ulla); K. Brockmann

    2006-01-01

    The extreme high-body-weight-selected mouse line DU6i is a polygenic model for growth research, harboring many small-effect QTL. We dissected the genome of this line into 19 autosomes and the Y chromosome by the construction of a new panel of chromosome substitution strains (CSS). The

  4. Functional organization of the genome may shape the species boundary in the house mouse

    Czech Academy of Sciences Publication Activity Database

    Janoušek, Václav; Munclinger, P.; Wang, L.; Teeter, K. C.; Tucker, P. K.

    2015-01-01

    Vol. 32, No. 5 (2015), pp. 1208-1220 ISSN 0737-4038 R&D Projects: GA MŠk EE2.3.20.0303 Institutional support: RVO:68081766 Keywords : hybrid zone * mouse genome * speciation Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 13.649, year: 2015

  5. Towards precision medicine-based therapies for glioblastoma: interrogating human disease genomics and mouse phenotypes.

    Science.gov (United States)

    Chen, Yang; Gao, Zhen; Wang, Bingcheng; Xu, Rong

    2016-08-22

    Glioblastoma (GBM) is the most common and aggressive brain tumor. It has a poor prognosis even with optimal radio- and chemotherapies. Since GBM is highly heterogeneous, drugs that target specific molecular profiles of individual tumors may achieve maximized efficacy. Currently, The Cancer Genome Atlas (TCGA) projects have identified hundreds of GBM-associated genes. We develop a drug repositioning approach combining disease genomics and mouse phenotype data towards predicting targeted therapies for GBM. We first identified disease-specific mouse phenotypes using the most recently discovered GBM genes. Then we systematically searched all FDA-approved drugs for candidates that share similar mouse phenotype profiles with GBM. We evaluated the ranks for approved and novel GBM drugs, and compared them with an existing approach, which also uses the mouse phenotype data but not the disease genomics data. We achieved significantly higher ranks for the approved and novel GBM drugs than the earlier approach. For all positive examples of GBM drugs, we achieved a median rank of 9.2%. Of the top predictions, 45.6% have been demonstrated effective in inhibiting the growth of human GBM cells. We developed a computational drug repositioning approach based on both genomic and phenotypic data. Our approach prioritized existing GBM drugs and outperformed a recent approach. Overall, our approach shows potential in discovering new targeted therapies for GBM.

  6. Accessing the SEED genome databases via Web services API: tools for programmers.

    Science.gov (United States)

    Disz, Terry; Akhter, Sajia; Cuevas, Daniel; Olson, Robert; Overbeek, Ross; Vonstein, Veronika; Stevens, Rick; Edwards, Robert A

    2010-06-14

    The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides a Web services-based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform-independent service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrates that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including the new genomes as soon as they come online.

  7. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Background: Genes are not randomly distributed on a chromosome, as they were once thought to be, even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and was recently reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results: Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance even after removal of tandem repeats. Conclusion: Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.
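
    The comparison against chance described in the Results can be illustrated with a small permutation test. The Python sketch below is not the authors' pipeline: the toy gene order, the definition of a cluster as a run of at least two neighbouring testis-specific genes, and the function names are all assumptions made for illustration.

```python
import random


def count_clusters(flags, min_size=2):
    """Count runs of consecutive True flags whose length is >= min_size."""
    clusters, run = 0, 0
    for flag in flags:
        if flag:
            run += 1
        else:
            if run >= min_size:
                clusters += 1
            run = 0
    if run >= min_size:  # close a run that reaches the end of the chromosome
        clusters += 1
    return clusters


def permutation_p_value(flags, n_perm=1000, seed=0):
    """Estimate how often a shuffled gene order has at least as many clusters."""
    rng = random.Random(seed)
    observed = count_clusters(flags)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_clusters(shuffled) >= observed:
            hits += 1
    # The add-one correction avoids reporting an impossible p-value of 0.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical chromosome: True marks a testis-specific gene, in gene order.
chromosome = [True, True, False, False, True, True, True, False,
              False, False, True, True, False, False, False, False]
observed, p_value = permutation_p_value(chromosome)
print(observed, p_value)
```

    Shuffling the flags keeps the number of testis-specific genes fixed and randomizes only their positions, which is one simple null model; the abstract does not state which null model the study used.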

  8. MBGD update 2013: the microbial genome database for exploring the diversity of microbial world.

    Science.gov (United States)

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2013-01-01

    The microbial genome database for comparative analysis (MBGD, available at http://mbgd.genome.ad.jp/) is a platform for microbial genome comparison based on orthology analysis. As its unique feature, MBGD allows users to conduct orthology analysis among any specified set of organisms; this flexibility allows MBGD to adapt to a variety of microbial genomic studies. Reflecting the huge diversity of the microbial world, the number of microbial genome projects has now reached several thousand. To efficiently explore the diversity of the entire body of microbial genomic data, MBGD now provides summary pages for pre-calculated ortholog tables among various taxonomic groups. For some closely related taxa, MBGD also provides conserved synteny information (core genome alignment) pre-calculated using the CoreAligner program. In addition, an efficient incremental updating procedure can create an extended ortholog table by adding genomes to the default ortholog table generated from the representative set of genomes. Combined with the dynamic orthology calculation for any specified set of organisms, MBGD is an efficient and flexible tool for exploring microbial genome diversity.

  9. Investigation of mutations in the HBB gene using the 1,000 genomes database.

    Science.gov (United States)

    Carlice-Dos-Reis, Tânia; Viana, Jaime; Moreira, Fabiano Cordeiro; Cardoso, Greice de Lemos; Guerreiro, João; Santos, Sidney; Ribeiro-Dos-Santos, Ândrea

    2017-01-01

    Mutations in the HBB gene are responsible for several serious hemoglobinopathies, such as sickle cell anemia and β-thalassemia. Sickle cell anemia is one of the most common monogenic diseases worldwide. Due to its prevalence, diverse strategies have been developed for a better understanding of its molecular mechanisms. In silico analysis has been increasingly used to investigate the genotype-phenotype relationship of many diseases, and the sequences of healthy individuals deposited in the 1,000 Genomes database appear to be an excellent tool for such analysis. The objective of this study is to analyze the variations in the HBB gene in the 1,000 Genomes database, to describe the mutation frequencies in the different population groups, and to investigate the pattern of pathogenicity. The computational tool SNPEFF was used to align the data from 2,504 samples of the 1,000 Genomes database with the HG19 genome reference. The pathogenicity of each amino acid change was investigated using the databases CLINVAR, dbSNP and HbVar and five different predictors. Twenty different mutations were found in 209 healthy individuals. The African group had the highest number of individuals with mutations, and the European group had the lowest number. Thus, it is concluded that approximately 8.3% of phenotypically healthy individuals from the 1,000 Genomes database have some mutation in the HBB gene. The frequency of mutated genes was estimated at 0.042, so that the expected frequency of being homozygous or compound heterozygous for these variants in the next generation is approximately 0.002. In total, 193 subjects had a non-synonymous mutation, of which 186 (7.4%) had a deleterious mutation. Considering that the 1,000 Genomes database is representative of the world's population, it can be estimated that fourteen out of every 10,000 individuals in the world will have a hemoglobinopathy in the next generation.
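
    The frequencies quoted in this abstract can be reproduced with a short Hardy-Weinberg calculation. The Python sketch below is one reading that matches the reported figures, not the authors' stated method: it assumes that every carrier is heterozygous for a single variant allele and that mating is random, and the variable names are illustrative.

```python
N_SAMPLES = 2504    # individuals analysed (from the abstract)
CARRIERS = 209      # individuals with any HBB mutation (from the abstract)
DELETERIOUS = 186   # individuals with a deleterious mutation (from the abstract)

# Assumption: each carrier contributes exactly one variant allele,
# so the allele pool is twice the number of individuals.
n_alleles = 2 * N_SAMPLES

carrier_fraction = CARRIERS / N_SAMPLES        # share of individuals with a mutation
deleterious_fraction = DELETERIOUS / N_SAMPLES
q_any = CARRIERS / n_alleles                   # frequency of any variant allele
q_deleterious = DELETERIOUS / n_alleles        # frequency of deleterious alleles

# Under Hardy-Weinberg proportions, the chance of inheriting two variant
# alleles (homozygous or compound heterozygous) is q squared.
two_variants_any = q_any ** 2
two_variants_deleterious = q_deleterious ** 2

print(f"carriers: {carrier_fraction:.1%}")
print(f"deleterious carriers: {deleterious_fraction:.1%}")
print(f"variant allele frequency: {q_any:.3f}")
print(f"expected two-variant frequency: {two_variants_any:.3f}")
print(f"expected per 10,000 (deleterious): {two_variants_deleterious * 10_000:.0f}")
```

    Under these assumptions the script recovers the 8.3% and 7.4% carrier shares, the allele frequency of 0.042, the two-variant frequency of about 0.002, and the estimate of fourteen per 10,000.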

  10. Update History of This Database - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods ...B link & Genome analysis methods English archive site is opened. 2012/08/08 PGDBj... Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods is opened. About This...ate History of This Database - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive ...

  11. Genomic locus modulating corneal thickness in the mouse identifies POU6F2 as a potential risk of developing glaucoma.

    Directory of Open Access Journals (Sweden)

    Rebecca King

    2018-01-01

    Central corneal thickness (CCT) is one of the most heritable ocular traits and it is also a phenotypic risk factor for primary open angle glaucoma (POAG). The present study uses the BXD Recombinant Inbred (RI) strains to identify novel quantitative trait loci (QTLs) modulating CCT in the mouse with the potential of identifying a molecular link between CCT and risk of developing POAG. The BXD RI strain set was used to define mammalian genomic loci modulating CCT, with a total of 818 corneas measured from 61 BXD RI strains (between 60-100 days of age). The mice were anesthetized and the eyes were positioned in front of the lens of the Phoenix Micron IV Image-Guided OCT system or the Bioptigen OCT system. CCT data for each strain were averaged and used to identify QTLs modulating this phenotype using the bioinformatics tools on GeneNetwork (www.genenetwork.org). The candidate genes and genomic loci identified in the mouse were then directly compared with the summary data from a human POAG genome-wide association study (NEIGHBORHOOD) to determine if any genomic elements modulating mouse CCT are also risk factors for POAG. This analysis revealed one significant QTL on Chr 13 and a suggestive QTL on Chr 7. The significant locus on Chr 13 (13 to 19 Mb) was examined further to define candidate genes modulating this eye phenotype. For the Chr 13 QTL in the mouse, only one gene in the region (Pou6f2) contained nonsynonymous SNPs. Of these five nonsynonymous SNPs in Pou6f2, two resulted in changes in the amino acid proline which could result in altered secondary structure affecting protein function. The 7 Mb region under the mouse Chr 13 peak distributes over 2 chromosomes in the human: Chr 1 and Chr 7. These genomic loci were examined in the NEIGHBORHOOD database to determine if they are potential risk factors for human glaucoma identified using meta-data from human GWAS. The top 50 hits all resided within one gene (POU6F2), with the highest significance level of p = 10⁻⁶ for

  12. H2DB: a heritability database across multiple species by annotating trait-associated genomic loci.

    Science.gov (United States)

    Kaminuma, Eli; Fujisawa, Takatomo; Tanizawa, Yasuhiro; Sakamoto, Naoko; Kurata, Nori; Shimizu, Tokurou; Nakamura, Yasukazu

    2013-01-01

    H2DB (http://tga.nig.ac.jp/h2db/), an annotation database of genetic heritability estimates for humans and other species, has been developed as a knowledge database to connect trait-associated genomic loci. Heritability estimates have been investigated for individual species, particularly in human twin studies and plant/animal breeding studies. However, there appears to be no comprehensive heritability database for both humans and other species. Here, we introduce an annotation database for genetic heritabilities of various species that was annotated by manually curating online public resources in PUBMED abstracts and journal contents. The proposed heritability database contains attribute information for trait descriptions, experimental conditions, trait-associated genomic loci and broad- and narrow-sense heritability specifications. Annotated trait-associated genomic loci, most of which are single-nucleotide polymorphisms derived from genome-wide association studies, may be valuable resources for experimental scientists. In addition, we assigned phenotype ontologies to the annotated traits for the purposes of discussing heritability distributions based on phenotypic classifications.

  13. A Ruby API to query the Ensembl database for genomic features.

    Science.gov (United States)

    Strozzi, Francesco; Aerts, Jan

    2011-04-01

    The Ensembl database makes genomic features available via its Genome Browser. It is also possible to access the underlying data through a Perl API for advanced querying. We have developed a full-featured Ruby API to the Ensembl databases, providing the same functionality as the Perl interface with additional features. A single Ruby API is used to access different releases of the Ensembl databases and is also able to query multi-species databases. Most functionality of the API is provided using the ActiveRecord pattern. The library depends on introspection to make it release independent. The API is available through the Rubygem system and can be installed with the command gem install ruby-ensembl-api.

  14. A DNMT3A2-HDAC2 Complex Is Essential for Genomic Imprinting and Genome Integrity in Mouse Oocytes

    Directory of Open Access Journals (Sweden)

    Pengpeng Ma

    2015-11-01

    Maternal genomic imprints are established during oogenesis. Histone deacetylases (HDACs) 1 and 2 are required for oocyte development in mouse, but their role in genomic imprinting is unknown. We find that Hdac1:Hdac2−/− double-mutant growing oocytes exhibit global DNA hypomethylation and fail to establish imprinting marks for Igf2r, Peg3, and Snrpn. Global hypomethylation correlates with increased retrotransposon expression and double-strand DNA breaks. Nuclear-associated DNMT3A2 is reduced in double-mutant oocytes, and injecting these oocytes with Hdac2 partially restores DNMT3A2 nuclear staining. DNMT3A2 co-immunoprecipitates with HDAC2 in mouse embryonic stem cells. Partial loss of nuclear DNMT3A2 and HDAC2 occurs in Sin3a−/− oocytes, which exhibit decreased DNA methylation of imprinting control regions for Igf2r and Snrpn, but not Peg3. These results suggest seminal roles of HDAC1/2 in establishing maternal genomic imprints and maintaining genomic integrity in oocytes mediated in part through a SIN3A complex that interacts with DNMT3A2.

  15. CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics

    OpenAIRE

    Verma, Mohit; Kumar, Vinay; Patel, Ravi K.; Garg, Rohini; Jain, Mukesh

    2015-01-01

    Chickpea is an important grain legume used as a rich source of protein in human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides the comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database fea...

  16. Genome-Wide Expression Profiling of Five Mouse Models Identifies Similarities and Differences with Human Psoriasis

    Science.gov (United States)

    Swindell, William R.; Johnston, Andrew; Carbajal, Steve; Han, Gangwen; Wohn, Christian; Lu, Jun; Xing, Xianying; Nair, Rajan P.; Voorhees, John J.; Elder, James T.; Wang, Xiao-Jing; Sano, Shigetoshi; Prens, Errol P.; DiGiovanni, John; Pittelkow, Mark R.; Ward, Nicole L.; Gudjonsson, Johann E.

    2011-01-01

    Development of a suitable mouse model would facilitate the investigation of pathomechanisms underlying human psoriasis and would also assist in development of therapeutic treatments. However, while many psoriasis mouse models have been proposed, no single model recapitulates all features of the human disease, and standardized validation criteria for psoriasis mouse models have not been widely applied. In this study, whole-genome transcriptional profiling is used to compare gene expression patterns manifested by human psoriatic skin lesions with those that occur in five psoriasis mouse models (K5-Tie2, imiquimod, K14-AREG, K5-Stat3C and K5-TGFbeta1). While the cutaneous gene expression profiles associated with each mouse phenotype exhibited statistically significant similarity to the expression profile of psoriasis in humans, each model displayed distinctive sets of similarities and differences in comparison to human psoriasis. For all five models, correspondence to the human disease was strong with respect to genes involved in epidermal development and keratinization. Immune and inflammation-associated gene expression, in contrast, was more variable between models as compared to the human disease. These findings support the value of all five models as research tools, each with identifiable areas of convergence to and divergence from the human disease. Additionally, the approach used in this paper provides an objective and quantitative method for evaluation of proposed mouse models of psoriasis, which can be strategically applied in future studies to score strengths of mouse phenotypes relative to specific aspects of human psoriasis. PMID:21483750

  17. RICD: A rice indica cDNA database resource for rice functional genomics

    Directory of Open Access Journals (Sweden)

    Zhang Qifa

    2008-11-01

    Background: The Oryza sativa L. indica subspecies is the most widely cultivated rice. During the last few years, we have collected over 20,000 putative full-length cDNAs and over 40,000 ESTs isolated from various cDNA libraries of two indica varieties, Guangluai 4 and Minghui 63. A database of the rice indica cDNAs was therefore built to provide a comprehensive web data source for searching and retrieving the indica cDNA clones. Results: The Rice Indica cDNA Database (RICD) is an online MySQL-PHP driven database with a user-friendly web interface. It allows investigators to query the cDNA clones by keyword, genome position, nucleotide or protein sequence, and putative function. For each cDNA, it also provides a range of information, including sequences, protein domain annotations, similarity search results, SNP and InDel information, and hyperlinks to gene annotation in both The Rice Annotation Project Database (RAP-DB) and The TIGR Rice Genome Annotation Resource, the expression atlas in RiceGE and the variation report in Gramene. Conclusion: The online rice indica cDNA database provides a cDNA resource with comprehensive information to researchers for functional analysis of the indica subspecies and for comparative genomics. The RICD database is available through our website http://www.ncgr.ac.cn/ricd.

  18. Systematic discovery of unannotated genes in 11 yeast species using a database of orthologous genomic segments

    LENUS (Irish Health Repository)

    OhEigeartaigh, Sean S

    2011-07-26

    Background: In standard BLAST searches, no information other than the sequences of the query and the database entries is considered. However, in situations where two genes from different species have only borderline similarity in a BLAST search, the discovery that the genes are located within a region of conserved gene order (synteny) can provide additional evidence that they are orthologs. Thus, for interpreting borderline search results, it would be useful to know whether the syntenic context of a database hit is similar to that of the query. This principle has often been used in investigations of particular genes or genomic regions, but to our knowledge it has never been implemented systematically. Results: We made use of the synteny information contained in the Yeast Gene Order Browser database for 11 yeast species to carry out a systematic search for protein-coding genes that were overlooked in the original annotations of one or more yeast genomes but which are syntenic with their orthologs. Such genes tend to have been overlooked because they are short, highly divergent, or contain introns. The key features of our software - called SearchDOGS - are that the database entries are classified into sets of genomic segments that are already known to be orthologous, and that very weak BLAST hits are retained for further analysis if their genomic location is similar to that of the query. Using SearchDOGS we identified 595 additional protein-coding genes among the 11 yeast species, including two new genes in Saccharomyces cerevisiae. We found additional genes for the mating pheromone a-factor in six species including Kluyveromyces lactis. Conclusions: SearchDOGS has proven highly successful for identifying overlooked genes in the yeast genomes. We anticipate that our approach can be adapted for study of further groups of species, such as bacterial genomes. More generally, the concept of doing sequence similarity searches against databases to which external
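
    The idea of retaining very weak BLAST hits when their genomic location matches that of the query can be sketched in a few lines of Python. This is an illustration of the principle only, not the SearchDOGS implementation: the `BlastHit` record, the segment identifiers and both E-value cutoffs are hypothetical.

```python
from dataclasses import dataclass

STRONG_EVALUE = 1e-5  # hypothetical cutoff for a conventionally significant hit
WEAK_EVALUE = 10.0    # hypothetical ceiling for hits that synteny may rescue


@dataclass(frozen=True)
class BlastHit:
    query_gene: str
    subject_locus: str
    evalue: float
    query_segment: str    # orthologous segment set that contains the query
    subject_segment: str  # orthologous segment set that contains the hit


def keep_hit(hit: BlastHit) -> bool:
    """Keep strong hits outright; keep weak hits only in the query's segment set."""
    if hit.evalue <= STRONG_EVALUE:
        return True
    syntenic = hit.query_segment == hit.subject_segment
    return syntenic and hit.evalue <= WEAK_EVALUE


hits = [
    BlastHit("geneA", "sp2:chr3:1200", 1e-30, "seg17", "seg17"),  # strong hit
    BlastHit("geneB", "sp2:chr5:880", 0.5, "seg42", "seg42"),     # weak but syntenic
    BlastHit("geneC", "sp2:chr1:40", 0.5, "seg08", "seg99"),      # weak, not syntenic
]
kept = [hit.query_gene for hit in hits if keep_hit(hit)]
print(kept)
```

    Hits for geneB and geneC have the same E-value, so sequence similarity alone cannot separate them; only the shared segment set rescues geneB.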

  19. HpBase: A genome database of a sea urchin, Hemicentrotus pulcherrimus.

    Science.gov (United States)

    Kinjo, Sonoko; Kiyomoto, Masato; Yamamoto, Takashi; Ikeo, Kazuho; Yaguchi, Shunsuke

    2018-04-01

    To understand the mystery of life, it is important to accumulate genomic information for various organisms because the whole genome encodes the commands for all the genes. Since the genome of Strongylocentrotus purpuratus was sequenced in 2006 as the first sequenced genome in echinoderms, the genomic resources of other North American sea urchins have gradually been accumulated, but no sea urchin genomes are available in other areas, where many scientists have used the local species and reported important results. In this manuscript, we report a draft genome of the sea urchin Hemicentrotus pulcherrimus because this species has a long history as the target of developmental and cell biology in East Asia. The genome of H. pulcherrimus was assembled into 16,251 scaffold sequences with an N50 length of 143 kbp, and approximately 25,000 genes were identified in the genome. The size of the genome and the sequencing coverage were estimated to be approximately 800 Mbp and 100×, respectively. To provide these data and the annotation information, we constructed a database, HpBase (http://cell-innovation.nig.ac.jp/Hpul/). In HpBase, gene searches, genome browsing, and blast searches are available. In addition, HpBase includes the "recipes" for experiments from each lab using H. pulcherrimus. These recipes will continue to be updated according to the circumstances of individual scientists and can be powerful tools for experimental biologists and for the community. HpBase is a suitable dataset for evolutionary, developmental, and cell biologists to compare H. pulcherrimus genomic information with that of other species and to isolate gene information. © 2018 Japanese Society of Developmental Biologists.
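
    The N50 length quoted for this assembly is a standard contiguity statistic: the length L such that scaffolds of length L or longer together cover at least half of the assembled bases. A minimal Python sketch follows; the scaffold lengths are toy values, not data from HpBase.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no scaffold lengths given")


toy_scaffolds = [100, 80, 60, 40, 20]  # hypothetical lengths in kbp
print(n50(toy_scaffolds))
```

    In the toy assembly the two longest scaffolds (100 and 80) cover 180 of 300 kbp, so the N50 is the length of the second one.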

  20. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    Science.gov (United States)

    Liolios, Konstantinos; Chen, I-Min A.; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M.; Kyrpides, Nikos C.

    2010-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/ PMID:19914934

  1. A Utility Maximizing and Privacy Preserving Approach for Protecting Kinship in Genomic Databases.

    Science.gov (United States)

    Kale, Gulce; Ayday, Erman; Tastan, Oznur

    2017-09-12

    Rapid and low-cost sequencing of genomes has enabled widespread use of genomic data in research studies and personalized customer applications, where genomic data is shared in public databases. Although the identities of the participants are anonymized in these databases, sensitive information about individuals can still be inferred. One such piece of information is kinship. We define two routes by which kinship privacy can leak and propose a technique to protect kinship privacy against these risks while maximizing the utility of shared data. The method involves systematic identification of minimal portions of genomic data to mask as new participants are added to the database. Choosing the proper positions to hide is cast as an optimization problem in which the number of positions to mask is minimized subject to privacy constraints that ensure the familial relationships are not revealed. We evaluate the proposed technique on real genomic data. Results indicate that concurrent sharing of data pertaining to a parent and an offspring results in high risks to kinship privacy, whereas sharing data from more distant relatives together is often safer. We also show that the arrival order of family members has a high impact on the level of privacy risks and on the utility of sharing data. Available at: https://github.com/tastanlab/Kinship-Privacy. Contact: erman@cs.bilkent.edu.tr or oznur.tastan@cs.bilkent.edu.tr. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  2. A DATABASE FOR TRACKING TOXICOGENOMIC SAMPLES AND PROCEDURES WITH GENOMIC, PROTEOMIC AND METABONOMIC COMPONENTS

    Science.gov (United States)

    A Database for Tracking Toxicogenomic Samples and Procedures with Genomic, Proteomic and Metabonomic Components. Wenjun Bao (1), Jennifer Fostel (2), Michael D. Waters (2), B. Alex Merrick (2), Drew Ekman (3), Mitchell Kostich (4), Judith Schmid (1), David Dix (1). (1) Office of Research and Developmen...

  3. MutSpec: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes.

    Science.gov (United States)

    Ardin, Maude; Cahais, Vincent; Castells, Xavier; Bouaoun, Liacine; Byrnes, Graham; Herceg, Zdenko; Zavadil, Jiri; Olivier, Magali

    2016-04-18

    The nature of somatic mutations observed in human tumors at single gene or genome-wide levels can reveal information on past carcinogenic exposures and mutational processes contributing to tumor development. While large amounts of sequencing data are being generated, the associated analysis and interpretation of mutation patterns that may reveal clues about the natural history of cancer present complex and challenging tasks that require advanced bioinformatics skills. To make such analyses accessible to a wider community of researchers with no programming expertise, we have developed within the web-based user-friendly platform Galaxy a first-of-its-kind package called MutSpec. MutSpec includes a set of tools that perform variant annotation and use advanced statistics for the identification of mutation signatures present in cancer genomes and for comparing the obtained signatures with those published in the COSMIC database and other sources. MutSpec offers an accessible framework for building reproducible analysis pipelines, integrating existing methods and scripts developed in-house with publicly available R packages. MutSpec may be used to analyse data from whole-exome, whole-genome or targeted sequencing experiments performed on human or mouse genomes. Results are provided in various formats including rich graphical outputs. An example is presented to illustrate the package functionalities, the straightforward workflow analysis and the richness of the statistics and publication-grade graphics produced by the tool. MutSpec offers an easy-to-use graphical interface embedded in the popular Galaxy platform that can be used by researchers with limited programming or bioinformatics expertise to analyse mutation signatures present in cancer genomes. MutSpec can thus effectively assist in the discovery of complex mutational processes resulting from exogenous and endogenous carcinogenic insults.
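
    Comparing an extracted mutation signature with published reference signatures is commonly done with cosine similarity. The abstract does not say which measure MutSpec uses, so the Python sketch below assumes cosine similarity for illustration; the six-class profiles and the reference names `ref_A` and `ref_B` are invented toy data, not COSMIC signatures.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length mutation profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("profiles must not be all zeros")
    return dot / norm


def best_match(signature, references):
    """Return the (name, similarity) pair of the closest reference profile."""
    scored = ((name, cosine_similarity(signature, ref))
              for name, ref in references.items())
    return max(scored, key=lambda pair: pair[1])


# Toy profiles over six substitution classes: C>A, C>G, C>T, T>A, T>C, T>G.
observed = [0.05, 0.05, 0.70, 0.05, 0.10, 0.05]
references = {
    "ref_A": [0.10, 0.05, 0.60, 0.05, 0.15, 0.05],  # hypothetical C>T-rich profile
    "ref_B": [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],  # hypothetical C>A-rich profile
}
name, score = best_match(observed, references)
print(name, round(score, 3))
```

    Published signature catalogues usually resolve each substitution by its flanking bases, giving longer vectors than the six classes used here; the comparison itself works the same way.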

  4. CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Ussery, David

    2004-01-01

    ... and frequent addition of new models are factors that require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results amount to more than 220 pieces of information. The backbone of this solution consists of a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web-server via PHP4, to present dynamic web content for users outside the center. This solution is tightly fitted to existing server infrastructure and the solutions proposed here can perhaps serve as a template for other research groups to solve database issues....

  5. PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes.

    Science.gov (United States)

    Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A

    2011-01-01

    PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are tandem repeats of nucleotide motifs 1-6 bp in size and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin the phase variation and antigenic variation seen in some bacteria. Although SSR-mediated phase variation and antigenic variation have been well studied in some bacteria, many other species of prokaryotes have yet to be investigated for SSR-mediated adaptive and other evolutionary advantages. As a part of our ongoing studies on SSR polymorphism in prokaryotes, we compared the genome sequences of various strains and isolates available for 85 different species of prokaryotes, extracted a number of SSRs showing length variations and created a relational database called PSSRdb. This database gives useful information such as the location of PSSRs in genomes, length variation across genomes, the regions harboring PSSRs, etc. The information provided in this database is very useful for further research and analysis of SSRs in prokaryotes.
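
    The definition of an SSR given above, a tandem repeat of a 1-6 bp motif, maps directly onto a regular expression with a backreference. The Python sketch below is not the extraction method used to build PSSRdb: the minimum of three copies, the function name and the test sequence are assumptions made for illustration.

```python
import re


def find_ssrs(sequence, min_repeats=3, max_motif=6):
    """Return (start, motif, copies) for each tandem repeat in the sequence."""
    # The lazy quantifier tries the shortest motif first, so ATATAT is
    # reported as three copies of AT rather than as a longer motif.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    results = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        results.append((match.start(), motif, copies))
    return results


toy_sequence = "GGCATATATATCCAAAAAGT"  # hypothetical sequence
ssrs = find_ssrs(toy_sequence)
print(ssrs)
```

    Running the same function on the corresponding region of two strains and comparing the copy counts would give the length variation that makes a repeat polymorphic in the sense used by the database.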

  6. CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics.

    Directory of Open Access Journals (Sweden)

    Mohit Verma

    Chickpea is an important grain legume used as a rich source of protein in human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides a comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database features many tools for similarity search, functional annotation (putative function, PFAM domain and gene ontology) search and comparative gene expression analysis. The current release of CTDB (v2.0) hosts transcriptome datasets with high quality functional annotation from cultivated (desi and kabuli types) and wild chickpea. A catalog of transcription factor families and their expression profiles in chickpea are available in the database. The gene expression data have been integrated to study the expression profiles of chickpea transcripts in major tissues/organs and various stages of flower development. The utilities, such as similarity search, ortholog identification and comparative gene expression have also been implemented in the database to facilitate comparative genomic studies among different legumes and Arabidopsis. Furthermore, the CTDB represents a resource for the discovery of functional molecular markers (microsatellites and single nucleotide polymorphisms) between different chickpea types. We anticipate that integrated information content of this database will accelerate the functional and applied genomic research for improvement of chickpea. The CTDB web service is freely available at http://nipgr.res.in/ctdb.html.

  7. GenomeRNAi: a database for cell-based RNAi phenotypes.

    Science.gov (United States)

    Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael

    2007-01-01

    RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasing large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at http://rnai.dkfz.de.

  8. PairWise Neighbours database: overlaps and spacers among prokaryote genomes

    Directory of Open Access Journals (Sweden)

    Garcia-Vallvé Santiago

    2009-06-01

    Background: Although prokaryotes live in a variety of habitats and possess different metabolic and genomic complexity, they have several genomic architectural features in common. Overlapping genes are a common feature of prokaryote genomes. The overlaps tend to be short because longer overlaps carry a higher risk of deleterious mutations. The spacers between genes tend to be short too because of the tendency to reduce non-coding DNA among prokaryotes. However, they must be long enough to maintain essential regulatory signals such as the Shine-Dalgarno (SD) sequence, which is responsible for efficient translation. Description: PairWise Neighbours is an interactive and intuitive database used for retrieving information about the spacers and overlapping genes among bacterial and archaeal genomes. It contains 1,956,294 gene pairs from 678 fully sequenced prokaryote genomes and is freely available at the URL http://genomes.urv.cat/pwneigh. This database provides information about the overlaps and their conservation across species. Furthermore, it allows broad analysis of the intergenic regions, providing useful information such as the location and strength of the SD sequence. Conclusion: There are experiments and bioinformatic analyses that rely on correct annotation of the initiation site. Therefore, a database that studies the overlaps and spacers among prokaryotes appears to be desirable. The PairWise Neighbours database permits reliability analysis of the overlapping structures and the study of SD presence and location among adjacent genes, which may help to check the annotation of initiation sites.
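The distinction between an overlap and a spacer for a pair of neighbouring genes can be made concrete with a small Python sketch. It is an illustration only and not the PairWise Neighbours implementation; the `Gene` record, the function names, and the 1-based inclusive coordinate convention are all assumptions, and strand and fully nested genes are ignored.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int


def neighbour_gap(upstream, downstream):
    """Signed distance in bp between two neighbouring genes.

    Positive: length of the spacer between them.
    Negative: the genes overlap by that many bp.
    Zero: the genes are directly adjacent.
    """
    return downstream.start - upstream.end - 1


def classify_pairs(genes):
    """Classify each consecutive gene pair as 'overlap' or 'spacer'."""
    ordered = sorted(genes, key=lambda g: g.start)
    results = []
    for left, right in zip(ordered, ordered[1:]):
        gap = neighbour_gap(left, right)
        kind = "overlap" if gap < 0 else "spacer"
        results.append((left.name, right.name, kind, abs(gap)))
    return results


if __name__ == "__main__":
    example = [Gene("a", 1, 100), Gene("b", 97, 300), Gene("c", 350, 500)]
    for row in classify_pairs(example):
        print(row)
```

In this toy input, genes a and b share positions 97 to 100, so the pair is reported as a 4 bp overlap, while b and c are separated by a 49 bp spacer.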

  9. Genome-wide RNA-seq analysis of human and mouse platelet transcriptomes

    Science.gov (United States)

    Rowley, Jesse W.; Oler, Andrew J.; Tolley, Neal D.; Hunter, Benjamin N.; Low, Elizabeth N.; Nix, David A.; Yost, Christian C.; Zimmerman, Guy A.

    2011-01-01

    Inbred mice are a useful tool for studying the in vivo functions of platelets. Nonetheless, the mRNA signature of mouse platelets is not known. Here, we use paired-end next-generation RNA sequencing (RNA-seq) to characterize the polyadenylated transcriptomes of human and mouse platelets. We report that RNA-seq provides unprecedented resolution of mRNAs that are expressed across the entire human and mouse genomes. Transcript expression and abundance are often conserved between the 2 species. Several mRNAs, however, are differentially expressed in human and mouse platelets. Moreover, previously described functional disparities between mouse and human platelets are reflected in differences at the transcript level, including protease activated receptor-1, protease activated receptor-3, platelet activating factor receptor, and factor V. This suggests that RNA-seq is a useful tool for predicting differences in platelet function between mice and humans. Our next-generation sequencing analysis provides new insights into the human and murine platelet transcriptomes. The sequencing dataset will be useful in the design of mouse models of hemostasis and a catalyst for discovery of new functions of platelets. Access to the dataset is found in the “Introduction.” PMID:21596849

  10. VaProS: a database-integration approach for protein/genome information retrieval

    KAUST Repository

    Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R.; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

    2016-01-01

    Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein–protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that facilitates life science researchers for retrieving experts’ knowledge stored in the databases and for building a new hypothesis of the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as structural effect of the gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in quest of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.

  12. Extensive Mobilome-Driven Genome Diversification in Mouse Gut-Associated Bacteroides vulgatus mpk.

    Science.gov (United States)

    Lange, Anna; Beier, Sina; Steimle, Alex; Autenrieth, Ingo B; Huson, Daniel H; Frick, Julia-Stefanie

    2016-04-25

    Like many other Bacteroides species, Bacteroides vulgatus strain mpk, a mouse fecal isolate which was shown to promote intestinal homeostasis, utilizes a variety of mobile elements for genome evolution. Based on sequences collected by Pacific Biosciences SMRT sequencing technology, we discuss the challenges of assembling and studying a bacterial genome of high plasticity. Additionally, we conducted comparative genomics comparing this commensal strain with the B. vulgatus type strain ATCC 8482 as well as multiple other Bacteroides and Parabacteroides strains to reveal the most important differences and identify the unique features of B. vulgatus mpk. The genome of B. vulgatus mpk harbors a large and diverse set of mobile element proteins compared with other sequenced Bacteroides strains. We found evidence of a number of different horizontal gene transfer events and a genome landscape that has been extensively altered by different mobilization events. A CRISPR/Cas system could be identified that provides a possible mechanism for preventing the integration of invading external DNA. We propose that the high genome plasticity and the introduced genome instabilities of B. vulgatus mpk arising from the various mobilization events might play an important role not only in its adaptation to the challenging intestinal environment in general, but also in its ability to interact with the gut microbiota. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  13. PGSB/MIPS PlantsDB Database Framework for the Integration and Analysis of Plant Genome Data.

    Science.gov (United States)

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai; Gundlach, Heidrun; Mayer, Klaus F X

    2017-01-01

    Plant Genome and Systems Biology (PGSB), formerly Munich Institute for Protein Sequences (MIPS) PlantsDB, is a database framework for the integration and analysis of plant genome data, developed and maintained for more than a decade now. Major components of that framework are genome databases and analysis resources focusing on individual (reference) genomes providing flexible and intuitive access to data. Another main focus is the integration of genomes from both model and crop plants to form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny). Data exchange and integrated search functionality with/over many plant genome databases is provided within the transPLANT project.

  14. PGG.Population: a database for understanding the genomic diversity and genetic ancestry of human populations.

    Science.gov (United States)

    Zhang, Chao; Gao, Yang; Liu, Jiaojiao; Xue, Zhe; Lu, Yan; Deng, Lian; Tian, Lei; Feng, Qidi; Xu, Shuhua

    2018-01-04

    There are a growing number of studies focusing on delineating genetic variations that are associated with complex human traits and diseases due to recent advances in next-generation sequencing technologies. However, identifying and prioritizing disease-associated causal variants relies on understanding the distribution of genetic variations within and among populations. The PGG.Population database documents 7122 genomes representing 356 global populations from 107 countries and provides essential information for researchers to understand human genomic diversity and genetic ancestry. These data and information can facilitate the design of research studies and the interpretation of results of both evolutionary and medical studies involving human populations. The database is carefully maintained and constantly updated when new data are available. We included miscellaneous functions and a user-friendly graphical interface for visualization of genomic diversity, population relationships (genetic affinity), ancestral makeup, footprints of natural selection, and population history etc. Moreover, PGG.Population provides a useful feature for users to analyze data and visualize results in a dynamic style via online illustration. The long-term ambition of the PGG.Population, together with the joint efforts from other researchers who contribute their data to our database, is to create a comprehensive depository of geographic and ethnic variation of human genome, as well as a platform bringing influence on future practitioners of medicine and clinical investigators. PGG.Population is available at https://www.pggpopulation.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Integration of mouse and human genome-wide association data identifies KCNIP4 as an asthma gene

    NARCIS (Netherlands)

    Himes, Blanca E.; Sheppard, Keith; Berndt, Annerose; Leme, Adriana S.; Myers, Rachel A.; Gignoux, Christopher R.; Levin, Albert M.; Gauderman, W. James; Yang, James J.; Mathias, Rasika A.; Romieu, Isabelle; Torgerson, Dara G.; Roth, Lindsey A.; Huntsman, Scott; Eng, Celeste; Klanderman, Barbara; Ziniti, John; Senter-Sylvia, Jody; Szefler, Stanley J.; Lemanske, Robert F.; Zeiger, Robert S.; Strunk, Robert C.; Martinez, Fernando D.; Boushey, Homer; Chinchilli, Vernon M.; Israel, Elliot; Mauger, David; Koppelman, Gerard H.; Postma, Dirkje S.; Nieuwenhuis, Maartje A. E.; Vonk, Judith M.; Lima, John J.; Irvin, Charles G.; Peters, Stephen P.; Kubo, Michiaki; Tamari, Mayumi; Nakamura, Yusuke; Litonjua, Augusto A.; Tantisira, Kelan G.; Raby, Benjamin A.; Bleecker, Eugene R.; Meyers, Deborah A.; London, Stephanie J.; Barnes, Kathleen C.; Gilliland, Frank D.; Williams, L. Keoki; Burchard, Esteban G.; Nicolae, Dan L.; Ober, Carole; DeMeo, Dawn L.; Silverman, Edwin K.; Paigen, Beverly; Churchill, Gary; Shapiro, Steve D.; Weiss, Scott

    2013-01-01

    Asthma is a common chronic respiratory disease characterized by airway hyperresponsiveness (AHR). The genetics of asthma have been widely studied in mouse and human, and homologous genomic regions have been associated with mouse AHR and human asthma-related phenotypes. Our goal was to identify

  16. A SNP-centric database for the investigation of the human genome

    Directory of Open Access Journals (Sweden)

    Kohane Isaac S

    2004-03-01

    Background: Single Nucleotide Polymorphisms (SNPs) are an increasingly important tool for genetic and biomedical research. Although current genomic databases contain information on several million SNPs and are growing at a very fast rate, the true value of a SNP in this context is a function of the quality of the annotations that characterize it. Retrieving and analyzing such data for a large number of SNPs often represents a major bottleneck in the design of large-scale association studies. Description: SNPper is a web-based application designed to facilitate the retrieval and use of human SNPs for high-throughput research purposes. It provides a rich local database generated by combining SNP data with the Human Genome sequence and with several other data sources, and offers the user a variety of querying, visualization and data export tools. In this paper we describe the structure and organization of the SNPper database, we review the available data export and visualization options, and we describe how the architecture of SNPper and its specialized data structures support high-volume SNP analysis. Conclusions: The rich annotation database and the powerful data manipulation and presentation facilities it offers make SNPper a very useful online resource for SNP research. Its success proves the great need for integrated and interoperable resources in the field of computational biology, and shows how such systems may play a critical role in supporting the large-scale computational analysis of our genome.

  17. dbEM: A database of epigenetic modifiers curated from cancerous and normal genomes

    Science.gov (United States)

    Singh Nanda, Jagpreet; Kumar, Rahul; Raghava, Gajendra P. S.

    2016-01-01

    We have developed a database called dbEM (database of Epigenetic Modifiers) to maintain the genomic information of about 167 epigenetic modifiers/proteins, which are considered potential cancer targets. In dbEM, modifiers are classified on a functional basis and include 48 histone methyl transferases, 33 chromatin remodelers and 31 histone demethylases. dbEM maintains genomic information such as mutations, copy number variation and gene expression in thousands of tumor samples, cancer cell lines and healthy samples. This information is obtained from public resources, viz. COSMIC, CCLE and the 1000-genome project. Gene essentiality data retrieved from the COLT database further highlight the importance of various epigenetic proteins for cancer survival. We have also reported the sequence profiles, tertiary structures and post-translational modifications of these epigenetic proteins in cancer. The database also contains information on 54 drug molecules against different epigenetic proteins. A wide range of tools has been integrated in dbEM, e.g. Search, BLAST, Alignment and Profile based prediction. In our analysis, we found that the epigenetic proteins DNMT3A, HDAC2, KDM6A, and TET2 are highly mutated in a variety of cancers. We are confident that dbEM will be very useful in cancer research, particularly in the field of epigenetic protein based cancer therapeutics. This database is available to the public at URL: http://crdd.osdd.net/raghava/dbem.

  18. The genomic landscape shaped by selection on transposable elements across 18 mouse strains.

    Science.gov (United States)

    Nellåker, Christoffer; Keane, Thomas M; Yalcin, Binnaz; Wong, Kim; Agam, Avigail; Belgard, T Grant; Flint, Jonathan; Adams, David J; Frankel, Wayne N; Ponting, Chris P

    2012-06-15

    Transposable element (TE)-derived sequence dominates the landscape of mammalian genomes and can modulate gene function by dysregulating transcription and translation. Our current knowledge of TEs in laboratory mouse strains is limited primarily to those present in the C57BL/6J reference genome, with most mouse TEs being drawn from three distinct classes, namely short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and the endogenous retrovirus (ERV) superfamily. Despite their high prevalence, the different genomic and gene properties controlling whether TEs are preferentially purged from, or are retained by, genetic drift or positive selection in mammalian genomes remain poorly defined. Using whole genome sequencing data from 13 classical laboratory and 4 wild-derived mouse inbred strains, we developed a comprehensive catalogue of 103,798 polymorphic TE variants. We employ this extensive data set to characterize TE variants across the Mus lineage, and to infer neutral and selective processes that have acted over 2 million years. Our results indicate that the majority of TE variants are introduced through the male germline and that only a minority of TE variants exert detectable changes in gene expression. However, among genes with differential expression across the strains there are twice as many TE variants identified as being putative causal variants as expected. Most TE variants that cause gene expression changes appear to be purged rapidly by purifying selection. Our findings demonstrate that past TE insertions have often been highly deleterious, and help to prioritize TE variants according to their likely contribution to gene expression or phenotype variation.

  19. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes.

    Science.gov (United States)

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species. Database URL: http://geve.med.u-tokai.ac.jp. © The Author(s) 2016. Published by Oxford University Press.

  20. Unlimited Thirst for Genome Sequencing, Data Interpretation, and Database Usage in Genomic Era: The Road towards Fast-Track Crop Plant Improvement

    Directory of Open Access Journals (Sweden)

    Arun Prabhu Dhanapal

    2015-01-01

    The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next generation sequencing methods. Databases have become an integral part of all aspects of science research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, has kept expanding in recent years. The databases and associated web portals provide at a minimum a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in the utilization of crop plant databases in the advancing genomic era. The utilization of databases for variation analysis with other comparative genomics tools, and data interpretation platforms are well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement programs is still being achieved widely; otherwise, the end for sequencing is not far away.

  1. CpGislandEVO: A Database and Genome Browser for Comparative Evolutionary Genomics of CpG Islands

    Directory of Open Access Journals (Sweden)

    Guillermo Barturen

    2013-01-01

    Hypomethylated, CpG-rich DNA segments (CpG islands, CGIs) are epigenome markers involved in key biological processes. Aberrant methylation is implicated in the appearance of several disorders such as cancer, immunodeficiency, or centromere instability. Furthermore, methylation differences at promoter regions between human and chimpanzee strongly associate with genes involved in neurological/psychological disorders and cancers. Therefore, the evolutionary comparative analyses of CGIs can provide insights on the functional role of these epigenome markers in both health and disease. Given the lack of specific tools, we developed CpGislandEVO. Briefly, we first compile a database of statistically significant CGIs for the best assembled mammalian genome sequences available to date. Second, by means of a coupled browser front-end, we focus on the CGIs overlapping orthologous genes extracted from OrthoDB, thus ensuring the comparison between CGIs located on truly homologous genome segments. This allows comparing the main compositional features between homologous CGIs. Finally, to facilitate nucleotide comparisons, we lifted genome coordinates between assemblies from different species, which enables the analysis of sequence divergence by direct count of nucleotide substitutions and indels occurring between homologous CGIs. The resulting CpGislandEVO database, linking together CGIs and single-cytosine DNA methylation data from several mammalian species, is freely available at our website.
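As an illustration of what "compositional features" of a CpG-rich segment can look like, the Python sketch below computes two widely used descriptors: GC fraction and the observed/expected CpG ratio. The abstract does not say which statistics CpGislandEVO uses to call its statistically significant CGIs, so this is an assumption-laden example rather than the authors' method, and the function name `cpg_stats` is hypothetical.

```python
def cpg_stats(sequence):
    """Return (gc_fraction, observed_expected_cpg) for a DNA string.

    The observed/expected ratio is computed as
        (number of CG dinucleotides * length) / (count of C * count of G),
    a common descriptor of CpG richness. It is used here only as an
    example and is not claimed to be the CpGislandEVO criterion.
    """
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length
    expected = c_count * g_count / length
    obs_exp = cpg_count / expected if expected else 0.0
    return gc_fraction, obs_exp


if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))
    print(cpg_stats("ATATAT"))
```

Computing the same two numbers for a pair of homologous segments from different species gives a simple side-by-side compositional comparison of the kind the abstract alludes to.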

  2. GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.

    Science.gov (United States)

    Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de

    2006-03-31

    Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.
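Selecting potential homologs "by user-defined criteria, based on one or multiple sequence similarity parameters" amounts to filtering stored alignment records by thresholds. The Python sketch below shows the idea under stated assumptions: the `AlignedPair` fields, the threshold defaults, and the protein identifiers are all hypothetical and do not reflect the actual GenoMycDB schema or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """Hypothetical record for one pairwise protein alignment."""
    query: str
    subject: str
    identity: float   # percent identical residues
    coverage: float   # percent of the query covered by the alignment
    evalue: float


def select_homologs(pairs, min_identity=30.0, min_coverage=70.0, max_evalue=1e-5):
    """Keep pairs that satisfy every user-defined similarity threshold."""
    return [
        pair for pair in pairs
        if pair.identity >= min_identity
        and pair.coverage >= min_coverage
        and pair.evalue <= max_evalue
    ]


if __name__ == "__main__":
    # Placeholder identifiers, not real database accessions.
    stored = [
        AlignedPair("protA", "protB", 85.0, 98.0, 1e-120),
        AlignedPair("protA", "protC", 22.0, 40.0, 0.5),
    ]
    for pair in select_homologs(stored):
        print(pair.query, pair.subject)
```

Because the similarity parameters are stored once per aligned pair, tightening or loosening the thresholds only changes the filter, not the underlying alignments.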

  3. SoyTEdb: a comprehensive database of transposable elements in the soybean genome

    Directory of Open Access Journals (Sweden)

    Zhu Liucun

    2010-02-01

    Background: Transposable elements are the most abundant components of all characterized genomes of higher eukaryotes. It has been documented that these elements not only contribute to the shaping and reshaping of their host genomes, but also play significant roles in regulating gene expression, altering gene function, and creating new genes. Thus, complete identification of transposable elements in sequenced genomes and construction of comprehensive transposable element databases are essential for accurate annotation of genes and other genomic components, for investigation of potential functional interaction between transposable elements and genes, and for study of genome evolution. The recent availability of the soybean genome sequence has provided an unprecedented opportunity for discovery, and structural and functional characterization of transposable elements in this economically important legume crop. Description: Using a combination of structure-based and homology-based approaches, a total of 32,552 retrotransposons (Class I) and 6,029 DNA transposons (Class II) with clear boundaries and insertion sites were structurally annotated and clearly categorized, and a soybean transposable element database, SoyTEdb, was established. These transposable elements have been anchored in and integrated with the soybean physical map and genetic map, and are browsable and visualizable at any scale along the 20 soybean chromosomes, along with predicted genes and other sequence annotations. BLAST search and other infrastructure tools were implemented to facilitate annotation of transposable elements or fragments from soybean and other related legume species. The majority (> 95%) of these elements (particularly a few hundred low-copy-number families) are first described in this study. Conclusion: SoyTEdb provides resources and information related to transposable elements in the soybean genome, representing the most comprehensive and the largest manually

  4. Functional role of a highly repetitive DNA sequence in anchorage of the mouse genome.

    Science.gov (United States)

    Neuer-Nitsche, B; Lu, X N; Werner, D

    1988-09-12

    The major portion of the eukaryotic genome consists of various categories of repetitive DNA sequences which have been studied with respect to their base compositions, organizations, copy numbers, transcription and species specificities; their biological roles, however, are still unclear. A novel quality of a highly repetitive mouse DNA sequence is described which points to a functional role: All copies (approximately 50,000 per haploid genome) of this DNA sequence reside on genomic Alu I DNA fragments each associated with nuclear polypeptides that are not released from DNA by proteinase K, SDS and phenol extraction. By this quality the repetitive DNA sequence is classified as a member of the sub-set of DNA sequences involved in tight DNA-polypeptide complexes which have been previously shown to be components of the subnuclear structure termed 'nuclear matrix'. From these results it has to be concluded that the repetitive DNA sequence characterized in this report represents or comprises a signal for a large number of site specific attachment points of the mouse genome in the nuclear matrix.

  5. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata

    Science.gov (United States)

    Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.

    2012-01-01

    The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11 472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293

  6. Translating human genetics into mouse: the impact of ultra-rapid in vivo genome editing.

    Science.gov (United States)

    Aida, Tomomi; Imahashi, Risa; Tanaka, Kohichi

    2014-01-01

    Gene-targeted mutant animals, such as knockout or knockin mice, have dramatically improved our understanding of the functions of genes in vivo and the genetic diversity that characterizes health and disease. However, the generation of targeted mice relies on gene targeting in embryonic stem (ES) cells, which is a time-consuming, laborious, and expensive process. The recent groundbreaking development of several genome editing technologies has enabled the targeted alteration of almost any sequence in any cell or organism. These technologies have now been applied to mouse zygotes (in vivo genome editing), thereby providing new avenues for simple, convenient, and ultra-rapid production of knockout or knockin mice without the need for ES cells. Here, we review recent achievements in the production of gene-targeted mice by in vivo genome editing. © 2013 The Authors Development, Growth & Differentiation © 2013 Japanese Society of Developmental Biologists.

  7. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    Science.gov (United States)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Bioinformatics has recently become a field of science in its own right, indispensable for analysing the millions of nucleic acid sequences currently deposited in international databases (public or private). These databases contain information on genes, RNA, ORFs, proteins and intergenic regions, including entire genomes of some species. Analysing this information requires computer programs, which have been renewed through new mathematical methods and the introduction of artificial intelligence, together with the constant creation of supercomputing units built to withstand the heavy workload of sequence analysis. However, innovation is still needed in platforms that allow faster and more effective genomic analyses, with a technological understanding of all biological processes.

  8. EchoBASE: an integrated post-genomic database for Escherichia coli.

    Science.gov (United States)

    Misra, Raju V; Horler, Richard S P; Reindl, Wolfgang; Goryanin, Igor I; Thomas, Gavin H

    2005-01-01

    EchoBASE (http://www.ecoli-york.org) is a relational database designed to contain and manipulate information from post-genomic experiments using the model bacterium Escherichia coli K-12. Its aim is to collate information from a wide range of sources to provide clues to the functions of the approximately 1500 gene products that have no confirmed cellular function. The database is built on an enhanced annotation of the updated genome sequence of strain MG1655 and the association of experimental data with the E. coli genes and their products. Experiments that can be held within EchoBASE include proteomics studies, microarray data, protein-protein interaction data, structural data and bioinformatics studies. EchoBASE also contains annotated information on 'orphan' enzyme activities from this microbe to aid characterization of the proteins that catalyse these elusive biochemical reactions.

  9. The need for high-quality whole-genome sequence databases in microbial forensics.

    Science.gov (United States)

    Sjödin, Andreas; Broman, Tina; Melefors, Öjar; Andersson, Gunnar; Rasmusson, Birgitta; Knutsson, Rickard; Forsman, Mats

    2013-09-01

    Microbial forensics is an important part of a strengthened capability to respond to biocrime and bioterrorism incidents to aid in the complex task of distinguishing between natural outbreaks and deliberate acts. The goal of a microbial forensic investigation is to identify and criminally prosecute those responsible for a biological attack, and it involves a detailed analysis of the weapon--that is, the pathogen. The recent development of next-generation sequencing (NGS) technologies has greatly increased the resolution that can be achieved in microbial forensic analyses. It is now possible to identify, quickly and in an unbiased manner, previously undetectable genome differences between closely related isolates. This development is particularly relevant for the most deadly bacterial diseases that are caused by bacterial lineages with extremely low levels of genetic diversity. Whole-genome analysis of pathogens is envisaged to be increasingly essential for this purpose. In a microbial forensic context, whole-genome sequence analysis is the ultimate method for strain comparisons as it is informative during identification, characterization, and attribution--all 3 major stages of the investigation--and at all levels of microbial strain identity resolution (ie, it resolves the full spectrum from family to isolate). Given these capabilities, one bottleneck in microbial forensics investigations is the availability of high-quality reference databases of bacterial whole-genome sequences. To be of high quality, databases need to be curated and accurate in terms of sequences, metadata, and genetic diversity coverage. The development of whole-genome sequence databases will be instrumental in successfully tracing pathogens in the future.

  10. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    Directory of Open Access Journals (Sweden)

    James J Davis

    2016-02-01

    Full Text Available The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.

  11. The duplicated genes database: identification and functional annotation of co-localised duplicated genes across genomes.

    Directory of Open Access Journals (Sweden)

    Marion Ouedraogo

    Full Text Available BACKGROUND: There has been a surge in studies linking genome structure and gene expression, with special focus on duplicated genes. Although initially duplicated from the same sequence, duplicated genes can diverge strongly over evolution and take on different functions or regulated expression. However, information on the function and expression of duplicated genes remains sparse. Identifying groups of duplicated genes in different genomes and characterizing their expression and function would therefore be of great interest to the research community. The 'Duplicated Genes Database' (DGD) was developed for this purpose. METHODOLOGY: Nine species were included in the DGD. For each species, BLAST analyses were conducted on peptide sequences corresponding to the genes mapped on the same chromosome. Groups of duplicated genes were defined based on these pairwise BLAST comparisons and the genomic location of the genes. For each group, Pearson correlations between gene expression data and semantic similarities between functional GO annotations were also computed when the relevant information was available. CONCLUSIONS: The Duplicated Gene Database provides a list of co-localised and duplicated genes for several species with the available gene co-expression level and semantic similarity value of functional annotation. Adding these data to the groups of duplicated genes provides biological information that can prove useful to gene expression analyses. The Duplicated Gene Database can be freely accessed through the DGD website at http://dgd.genouest.org.
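
    The per-group co-expression step described in this abstract can be illustrated with a small sketch. It is not the DGD pipeline: the function names, the gene identifiers and the expression values are hypothetical, and each expression profile is assumed to be a list of measurements taken over the same samples in the same order.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles.

    Returns None when either profile is constant (correlation undefined).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def group_coexpression(expression):
    """Correlate every pair of genes in one group of duplicated genes.

    `expression` maps a gene identifier to its expression profile.
    The result maps each (gene_a, gene_b) pair to its Pearson correlation.
    """
    return {
        (g1, g2): pearson(expression[g1], expression[g2])
        for g1, g2 in combinations(sorted(expression), 2)
    }


# Hypothetical group of three co-localised duplicated genes.
example_group = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [2.0, 4.0, 6.0],
    "geneC": [3.0, 2.0, 1.0],
}
pairwise = group_coexpression(example_group)
```

    In this toy group, geneA and geneB rise together while geneC falls, so the first pair correlates positively and the pairs involving geneC correlate negatively.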

  12. Human Ageing Genomic Resources: Integrated databases and tools for the biology and genetics of ageing

    Science.gov (United States)

    Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E.; de Magalhães, João Pedro

    2013-01-01

    The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR now features several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while further continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology. PMID:23193293

  13. Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution.

    Science.gov (United States)

    Denas, Olgert; Sandstrom, Richard; Cheng, Yong; Beal, Kathryn; Herrero, Javier; Hardison, Ross C; Taylor, James

    2015-02-14

    Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, relationships among sequence, conservation, and function are still poorly understood. We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA). Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence on the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity on the other species for human and mouse respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of repurposed TFos, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest exaptation of some functional regulatory sequences into new function. Despite TFos repurposing, we did not find substantial changes in their predicted target genes, suggesting that CRMs buffer evolutionary events allowing little or no change in the TFos - target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions. We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse using WGA. A comparative analysis of this correspondence unveiled the

  14. Evaluation of relational and NoSQL database architectures to manage genomic annotations.

    Science.gov (United States)

    Schulz, Wade L; Nelson, Brent G; Felker, Donn K; Durant, Thomas J S; Torres, Richard

    2016-12-01

    While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. PIPEMicroDB: microsatellite database and primer generation tool for pigeonpea genome.

    Science.gov (United States)

    Sarika; Arora, Vasu; Iquebal, M A; Rai, Anil; Kumar, Dinesh

    2013-01-01

    Molecular markers play a significant role in improving crops for desirable characteristics, such as high yield, disease resistance and others that will benefit the crop in the long term. Pigeonpea (Cajanus cajan L.) is a legume recently sequenced by a global consortium led by ICRISAT (Hyderabad, India) and has been analysed for gene prediction, synteny maps, markers, etc. We present the PIgeonPEa Microsatellite DataBase (PIPEMicroDB) with an automated primer designing tool for the pigeonpea genome, based on chromosome-wise as well as location-wise search of primers. A total of 123 387 Short Tandem Repeats (STRs) were extracted from the publicly available pigeonpea genome using the MIcroSAtellite tool (MISA). The database is an online relational database based on a 'three-tier architecture' that catalogues microsatellite information in MySQL, with a user-friendly interface developed in PHP. Searches for STRs may be customized by limiting their location on a chromosome as well as the number of markers in that range. This is a novel approach that has not been implemented in any existing marker database. The database has been further coupled with Primer3 for designing primers for selected markers with left and right flanking regions of up to 500 bp. This will enable researchers to select markers of choice at a desired interval over the chromosome. Furthermore, one can use individual STRs of a targeted region of a chromosome to narrow down the location of a gene of interest or linked Quantitative Trait Loci (QTLs). Although it is an in silico approach, marker search based on the characteristics and location of STRs is expected to be beneficial for researchers. Database URL: http://cabindb.iasri.res.in/pigeonpea/
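
    To make the idea of STR extraction concrete, the following is a minimal sketch of a perfect-repeat scan using regular expression backreferences. It is not MISA and does not reproduce its output: the function names, the minimum repeat counts per motif length and the test sequence are all assumptions chosen for illustration, and compound or interrupted repeats are ignored.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_redundant(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    size = len(motif)
    return any(
        size % d == 0 and motif == motif[:d] * (size // d)
        for d in range(1, size)
    )


def find_strs(sequence, min_repeats=None):
    """Return perfect STRs as (start, end, motif, copies), 1-based inclusive."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    sequence = sequence.upper()
    hits = []
    for unit, copies in sorted(min_repeats.items()):
        # One captured motif of `unit` bases followed by at least copies-1 more.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, copies - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_redundant(motif):
                continue  # already reported under its shorter unit, if long enough
            n_copies = len(match.group(0)) // unit
            hits.append((match.start() + 1, match.end(), motif, n_copies))
    return sorted(hits)


# Hypothetical sequence: an (AT)6 dinucleotide repeat and an (A)10 run.
demo = "GGC" + "AT" * 6 + "CCG" + "A" * 10 + "T"
demo_hits = find_strs(demo)
```

    The coordinates returned by such a scan are what a location-based search would filter on, and the flanking sequence on either side of each hit is what a primer design tool would receive as input.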

  16. Updates to the Cool Season Food Legume Genome Database: Resources for pea, lentil, faba bean and chickpea genetics, genomics and breeding

    Science.gov (United States)

    The Cool Season Food Legume Genome database (CSFL, www.coolseasonfoodlegume.org) is an online resource for genomics, genetics, and breeding research for chickpea, lentil, pea, and faba bean. The user-friendly and curated website allows for all publicly available map, marker, trait, gene, transcript, ger...

  17. GATM, the human ortholog of the mouse imprinted Gatm gene, escapes genomic imprinting in placenta

    Directory of Open Access Journals (Sweden)

    Toshinobu Miyamoto

    2005-03-01

    Full Text Available The GATM gene encodes L-arginine:glycine amidinotransferase, which catalyzes the conversion of L-arginine into guanidinoacetate, the rate-limiting step in the synthesis of creatine. Since deficiencies in creatine synthesis and transport lead to certain forms of mental retardation in humans, the human GATM gene appears to be involved in brain development. Recently it has been demonstrated that the mouse Gatm is expressed during development and is imprinted with maternal expression in the placenta and yolk sac, but not in embryonic tissues. We investigated the imprinting status of the human GATM by analyzing its expression in four human placentas. GATM was biallelically expressed, suggesting that this gene escapes genomic imprinting in placentas, in contrast to what has been reported in mouse extra-embryonic tissues.

  18. ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes

    Energy Technology Data Exchange (ETDEWEB)

    Novichkov, Pavel S.; Ratnere, Igor; Wolf, Yuri I.; Koonin, Eugene V.; Dubchak, Inna

    2009-07-23

    The database of Alignable Tight Genomic Clusters (ATGCs) consists of closely related genomes of archaea and bacteria, and is a resource for research into prokaryotic microevolution. Construction of a data set with appropriate characteristics is a major hurdle for this type of study. With the current rate of genome sequencing, it is difficult to follow the progress of the field and to determine which of the available genome sets meet the requirements of a given research project, in particular, with respect to the minimum and maximum levels of similarity between the included genomes. Additionally, extraction of specific content, such as genomic alignments or families of orthologs, from a selected set of genomes is a complicated and time-consuming process. The database addresses these problems by providing an intuitive and efficient web interface to browse precomputed ATGCs, select appropriate ones and access ATGC-derived data such as multiple alignments of orthologous proteins, matrices of pairwise intergenomic distances based on genome-wide analysis of synonymous and nonsynonymous substitution rates and others. The ATGC database will be regularly updated following new releases of the NCBI RefSeq. The database is hosted by the Genomics Division at Lawrence Berkeley National Laboratory and is publicly available at http://atgc.lbl.gov.

  19. Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data.

    Science.gov (United States)

    Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo

    2015-01-01

    Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.

  20. Genomic targets of Brachyury (T) in differentiating mouse embryonic stem cells.

    Directory of Open Access Journals (Sweden)

    Amanda L Evans

    Full Text Available The T-box transcription factor Brachyury (T) is essential for formation of the posterior mesoderm and the notochord in vertebrate embryos. Work in the frog and the zebrafish has identified some direct genomic targets of Brachyury, but little is known about Brachyury targets in the mouse. Here we use chromatin immunoprecipitation and mouse promoter microarrays to identify targets of Brachyury in embryoid bodies formed from differentiating mouse ES cells. The targets we identify are enriched for sequence-specific DNA binding proteins and include components of signal transduction pathways that direct cell fate in the primitive streak and tailbud of the early embryo. Expression of some of these targets, such as Axin2, Fgf8 and Wnt3a, is down regulated in Brachyury mutant embryos and we demonstrate that they are also Brachyury targets in the human. Surprisingly, we do not observe enrichment of the canonical T-domain DNA binding sequence 5'-TCACACCT-3' in the vicinity of most Brachyury target genes. Rather, we have identified an (AC)n repeat sequence, which is conserved in the rat but not in human, zebrafish or Xenopus. We do not understand the significance of this sequence, but speculate that it enhances transcription factor binding in the regulatory regions of Brachyury target genes in rodents. Our work identifies the genomic targets of a key regulator of mesoderm formation in the early mouse embryo, thereby providing insights into the Brachyury-driven genetic regulatory network and allowing us to compare the function of Brachyury in different species.

  1. Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

    Science.gov (United States)

    Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M; Loveland, Jane E; Mudge, Jonathan M; Wallin, Craig; Girón, Carlos G; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; Martin, Fergal J; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Suner, Marie-Marthe; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bruford, Elspeth A; Bult, Carol J; Frankish, Adam; Murphy, Terence; Pruitt, Kim D

    2018-01-04

    The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.

  2. Genomic localization of the Z/EG transgene in the mouse genome.

    Science.gov (United States)

    Colombo, Sophie; Kumasaka, Mayuko; Lobe, Corrinne; Larue, Lionel

    2010-02-01

    The Z/EG transgenic mouse line, produced by Novak et al., displays tissue-specific EGFP expression after Cre-mediated recombination. The autofluorescence of EGFP allows the visualization of cells of interest displaying Cre recombination. The initial construct was designed such that cells without Cre recombination express the beta-galactosidase marker, facilitating counterselection. We used inverse PCR to identify the site of integration of the Z/EG transgene, to improve the efficiency of homozygous Z/EG mouse production. Recombined cells produced large amounts of EGFP protein, resulting in higher levels of fluorescence and therefore greater contrast with nonrecombined cells. We mapped the transgene to the G1 region of chromosome 5. This random insertion was found to have occurred 230-bp upstream from the start codon of the Rasa4 gene. The insertion of the Z/EG transgene in the C57BL/6 genetic background had no effect on Rasa4 expression. Homozygous Z/EG mice therefore had no obvious phenotype. (c) 2009 Wiley-Liss, Inc.

  3. Databases

    Digital Repository Service at National Institute of Oceanography (India)

    Kunte, P.D.

    Information on bibliographic as well as numeric/textual databases relevant to coastal geomorphology has been included in a tabular form. Databases cover a broad spectrum of related subjects like coastal environment and population aspects, coastline...

  4. Developing genomic knowledge bases and databases to support clinical management: current perspectives.

    Science.gov (United States)

    Huser, Vojtech; Sincan, Murat; Cimino, James J

    2014-01-01

    Personalized medicine, the ability to tailor diagnostic and treatment decisions for individual patients, is seen as the evolution of modern medicine. We characterize here the informatics resources available today or envisioned in the near future that can support clinical interpretation of genomic test results. We assume a clinical sequencing scenario (germline whole-exome sequencing) in which a clinical specialist, such as an endocrinologist, needs to tailor patient management decisions within his or her specialty (targeted findings) but relies on a genetic counselor to interpret off-target incidental findings. We characterize the genomic input data and list various types of knowledge bases that provide genomic knowledge for generating clinical decision support. We highlight the need for patient-level databases with detailed lifelong phenotype content in addition to genotype data and provide a list of recommendations for personalized medicine knowledge bases and databases. We conclude that no single knowledge base can currently support all aspects of personalized recommendations and that consolidation of several current resources into larger, more dynamic and collaborative knowledge bases may offer a future path forward.

  5. A database of PCR primers for the chloroplast genomes of higher plants

    Science.gov (United States)

    Heinze, Berthold

    2007-01-01

    Background Chloroplast genomes evolve slowly and many primers for PCR amplification and analysis of chloroplast sequences can be used across a wide array of genera. In some cases 'universal' primers have been designed for the purpose of working across species boundaries. However, the essential information on these primer sequences is scattered throughout the literature. Results A database is presented here which assembles published primer information for chloroplast DNA. Additional primers were designed to fill gaps where little or no primer information could be found. Amplicons are either the genes themselves (typically useful in studies of sequence variation in higher-order phylogeny) or they are spacers, introns, and intergenic regions (for studies of phylogeographic patterns within and among species). The current list of 'generic' primers consists of more than 700 sequences. Wherever possible, we give the locations of the primers in the thirteen fully sequenced chloroplast genomes (Nicotiana tabacum, Atropa belladonna, Spinacia oleracea, Arabidopsis thaliana, Populus trichocarpa, Oryza sativa, Pinus thunbergii, Marchantia polymorpha, Zea mays, Oenothera elata, Acorus calamus, Eucalyptus globulus, Medicago truncatula). Conclusion The database described here is designed to serve as a resource for researchers who are venturing into the study of poorly described chloroplast genomes, whether for large- or small-scale DNA sequencing projects, to study molecular variation or to investigate chloroplast evolution. PMID:17326828

  6. A database of PCR primers for the chloroplast genomes of higher plants

    Directory of Open Access Journals (Sweden)

    Heinze Berthold

    2007-02-01

    Full Text Available Abstract Background Chloroplast genomes evolve slowly and many primers for PCR amplification and analysis of chloroplast sequences can be used across a wide array of genera. In some cases 'universal' primers have been designed for the purpose of working across species boundaries. However, the essential information on these primer sequences is scattered throughout the literature. Results A database is presented here which assembles published primer information for chloroplast DNA. Additional primers were designed to fill gaps where little or no primer information could be found. Amplicons are either the genes themselves (typically useful in studies of sequence variation in higher-order phylogeny) or they are spacers, introns, and intergenic regions (for studies of phylogeographic patterns within and among species). The current list of 'generic' primers consists of more than 700 sequences. Wherever possible, we give the locations of the primers in the thirteen fully sequenced chloroplast genomes (Nicotiana tabacum, Atropa belladonna, Spinacia oleracea, Arabidopsis thaliana, Populus trichocarpa, Oryza sativa, Pinus thunbergii, Marchantia polymorpha, Zea mays, Oenothera elata, Acorus calamus, Eucalyptus globulus, Medicago truncatula). Conclusion The database described here is designed to serve as a resource for researchers who are venturing into the study of poorly described chloroplast genomes, whether for large- or small-scale DNA sequencing projects, to study molecular variation or to investigate chloroplast evolution.

  7. Visualizing information across multidimensional post-genomic structured and textual databases.

    Science.gov (United States)

    Tao, Ying; Friedman, Carol; Lussier, Yves A

    2005-04-15

    Visualizing relationships among biological information to facilitate understanding is crucial to biological research during the post-genomic era. Although different systems have been developed to view gene-phenotype relationships for specific databases, very few have been designed specifically as a general flexible tool for visualizing multidimensional genotypic and phenotypic information together. Our goal is to develop a method for visualizing multidimensional genotypic and phenotypic information and a model that unifies different biological databases in order to present the integrated knowledge using a uniform interface. We developed a novel, flexible and generalizable visualization tool, called PhenoGenesviewer (PGviewer), which in this paper was used to display gene-phenotype relationships from a human-curated database (OMIM) and from an automatic method using a Natural Language Processing tool called BioMedLEE. Data obtained from multiple databases were first integrated into a uniform structure and then organized by PGviewer. PGviewer provides a flexible query interface that allows dynamic selection and ordering of any desired dimension in the databases. Based on users' queries, results can be visualized using hierarchical expandable trees that present views specified by users according to their research interests. We believe that this method, which allows users to dynamically organize and visualize multiple dimensions, is a potentially powerful and promising tool that should substantially facilitate biological research. PhenogenesViewer as well as its support and tutorial are available at http://www.dbmi.columbia.edu/pgviewer/. Contact: Lussier@dbmi.columbia.edu.

  8. The phytophthora genome initiative database: informatics and analysis for distributed pathogenomic research.

    Science.gov (United States)

    Waugh, M; Hraber, P; Weller, J; Wu, Y; Chen, G; Inman, J; Kiphart, D; Sobral, B

    2000-01-01

    The Phytophthora Genome Initiative (PGI) is a distributed collaboration to study the genome and evolution of a particularly destructive group of plant pathogenic oomycetes, with the goal of understanding the mechanisms of infection and resistance. NCGR provides informatics support for the collaboration as well as a centralized data repository. In the pilot phase of the project, several investigators prepared Phytophthora infestans and Phytophthora sojae EST and Phytophthora sojae BAC libraries and sent them to another laboratory for sequencing. Data from sequencing reactions were transferred to NCGR for analysis and curation. An analysis pipeline transforms raw data by performing simple analyses (i.e., vector removal and similarity searching) that are stored and can be retrieved by investigators using a web browser. Here we describe the database and access tools, provide an overview of the data therein and outline future plans. This resource has provided a unique opportunity for the distributed, collaborative study of a genus from which relatively little sequence data are available. Results may lead to insight into how better to control these pathogens. The homepage of PGI can be accessed at http://www.ncgr.org/pgi, with database access through the database access hyperlink.

  9. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    Science.gov (United States)

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.
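
    The SEG definition above (a protein-coding gene with no introns in its CDS) can be illustrated with a minimal Python sketch. It assumes a simplified input of (gene_id, start, end) tuples, one per CDS segment, and ignores alternative transcripts and the nuclear/organellar distinction; the function name and sample records are hypothetical and do not describe how SinEx DB itself was built.

```python
from collections import defaultdict

def single_exon_genes(cds_records):
    """Return gene IDs whose coding sequence consists of exactly one CDS segment.

    cds_records: iterable of (gene_id, start, end) tuples, one per CDS segment.
    """
    segments = defaultdict(set)
    for gene_id, start, end in cds_records:
        # A set collapses exact duplicate segments listed more than once
        segments[gene_id].add((start, end))
    return sorted(gene for gene, segs in segments.items() if len(segs) == 1)

# Hypothetical annotation records (made-up gene IDs and coordinates)
records = [
    ("geneA", 100, 900),   # one uninterrupted CDS
    ("geneB", 100, 300),   # CDS split by an intron
    ("geneB", 500, 800),
    ("geneC", 50, 650),    # one uninterrupted CDS
]
seg_ids = single_exon_genes(records)
print(seg_ids)
```

    A real pipeline would also need to handle multiple transcripts per gene and to exclude organellar genes, which this sketch leaves out.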

  10. Genome characterization of the selected long- and short-sleep mouse lines.

    Science.gov (United States)

    Dowell, Robin; Odell, Aaron; Richmond, Phillip; Malmer, Daniel; Halper-Stromberg, Eitan; Bennett, Beth; Larson, Colin; Leach, Sonia; Radcliffe, Richard A

    2016-12-01

    The Inbred Long- and Short-Sleep (ILS, ISS) mouse lines were selected for differences in acute ethanol sensitivity using the loss of righting response (LORR) as the selection trait. The lines show an over tenfold difference in LORR and, along with a recombinant inbred panel derived from them (the LXS), have been widely used to dissect the genetic underpinnings of acute ethanol sensitivity. Here we have sequenced the genomes of the ILS and ISS to investigate the DNA variants that contribute to their sensitivity difference. We identified ~2.7 million high-confidence SNPs and small indels and ~7000 structural variants between the lines; variants were found to occur in 6382 annotated genes. Using a hidden Markov model, we were able to reconstruct the genome-wide ancestry patterns of the eight inbred progenitor strains from which the ILS and ISS were derived, and found that quantitative trait loci that have been mapped for LORR were slightly enriched for DNA variants. Finally, by mapping and quantifying RNA-seq reads from the ILS and ISS to their strain-specific genomes rather than to the reference genome, we found a substantial improvement in a differential expression analysis between the lines. This work will help in identifying and characterizing the DNA sequence variants that contribute to the difference in ethanol sensitivity between the ILS and ISS and will also aid in accurate quantification of RNA-seq data generated from the LXS RIs.

  11. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics

    Science.gov (United States)

    Schoof, Heiko; Ernst, Rebecca; Nazarov, Vladimir; Pfeifer, Lukas; Mewes, Hans-Werner; Mayer, Klaus F. X.

    2004-01-01

    Arabidopsis thaliana is the most widely studied model plant. Functional genomics is intensively underway in many laboratories worldwide. Beyond the basic annotation of the primary sequence data, the annotated genetic elements of Arabidopsis must be linked to diverse biological data and higher order information such as metabolic or regulatory pathways. The MIPS Arabidopsis thaliana database MAtDB aims to provide a comprehensive resource for Arabidopsis as a genome model that serves as a primary reference for research in plants and is suitable for transfer of knowledge to other plants, especially crops. The genome sequence as a common backbone serves as a scaffold for the integration of data, while, in a complementary effort, these data are enhanced through the application of state-of-the-art bioinformatics tools. This information is visualized on a genome-wide and a gene-by-gene basis with access both for web users and applications. This report updates the information given in a previous report and provides an outlook on further developments. The MAtDB web interface can be accessed at http://mips.gsf.de/proj/thal/db. PMID:14681437

  12. Novel mouse model recapitulates genome and transcriptome alterations in human colorectal carcinomas.

    Science.gov (United States)

    McNeil, Nicole E; Padilla-Nash, Hesed M; Buishand, Floryne O; Hue, Yue; Ried, Thomas

    2017-03-01

    Human colorectal carcinomas are defined by a nonrandom distribution of genomic imbalances that are characteristic for this disease. Often, these imbalances affect entire chromosomes. Understanding the role of these aneuploidies for carcinogenesis is of utmost importance. Currently, established transgenic mice do not recapitulate the pathognomonic genome aberration profile of human colorectal carcinomas. We have developed a novel model based on the spontaneous transformation of murine colon epithelial cells. During this process, cells progress through stages of pre-immortalization, immortalization and, finally, transformation, and result in tumors when injected into immunocompromised mice. We analyzed our model for genome and transcriptome alterations using ArrayCGH, spectral karyotyping (SKY), and array based gene expression profiling. ArrayCGH revealed a recurrent pattern of genomic imbalances. These results were confirmed by SKY. Comparing these imbalances with orthologous maps of human chromosomes revealed a remarkable overlap. We observed focal deletions of the tumor suppressor genes Trp53 and Cdkn2a/p16. High-level focal genomic amplification included the locus harboring the oncogene Mdm2, which was confirmed by FISH in the form of double minute chromosomes. Array-based global gene expression revealed distinct differences between the sequential steps of spontaneous transformation. Gene expression changes showed significant similarities with human colorectal carcinomas. Pathways most prominently affected included genes involved in chromosomal instability and in epithelial to mesenchymal transition. Our novel mouse model therefore recapitulates the most prominent genome and transcriptome alterations in human colorectal cancer, and might serve as a valuable tool for understanding the dynamic process of tumorigenesis, and for preclinical drug testing. © 2016 Wiley Periodicals, Inc.

  13. Cre Fused with RVG Peptide Mediates Targeted Genome Editing in Mouse Brain Cells In Vivo.

    Science.gov (United States)

    Zou, Zhiyuan; Sun, Zhaolin; Li, Pan; Feng, Tao; Wu, Sen

    2016-12-14

    Cell penetrating peptides (CPPs) are short peptides that can pass through cell membranes. CPPs can facilitate the cellular entry of proteins, macromolecules, nanoparticles and drugs. RVG peptide (RVG hereinafter) is a 29-amino-acid CPP derived from a rabies virus glycoprotein that can cross the blood-brain barrier (BBB) and enter brain cells. However, whether RVG can be used for genome editing in the brain has not been reported. In this work, we combined RVG with Cre recombinase for bacterial expression. The purified RVG-Cre protein cut plasmids in vitro and traversed cell membranes in cultured Neuro2a cells. By tail vein-injecting RVG-Cre into Cre reporter mouse lines mTmG and Rosa26 lacZ, we demonstrated that RVG-Cre could target brain cells and achieve targeted somatic genome editing in adult mice. This direct delivery of the gene-editing enzyme protein into mouse brains with RVG is much safer than plasmid- or viral-based methods, holding promise for further applications in the treatment of various brain diseases.

  14. Zebrafish syntenic relationship to human/mouse genomes revealed by radiation hybrid mapping

    International Nuclear Information System (INIS)

    Samonte, Irene E.

    2007-01-01

    Zebrafish (Danio rerio) is an excellent model system for vertebrate developmental analysis and a new model for human disorders. In this study, however, zebrafish was used to determine its syntenic relationship to human/mouse genomes using the zebrafish-hamster radiation hybrid panel. The focus was on genes residing on chromosomes 6 and 17 of human and mouse, respectively, and some other genes of either immunologic or evolutionary importance. Gene sequences of interest and zebrafish expressed sequence tags deposited in the GenBank were used in identifying zebrafish homologs. Polymerase chain reaction (PCR) amplification, cloning and subcloning, sequencing, and phylogenetic analysis were done to confirm the homology of the candidate genes in zebrafish. The promising markers were then tested in the 94 zebrafish-hamster radiation hybrid panel cell lines and submitted for logarithm of the odds (LOD) score analysis to position genes on the zebrafish map. A total of 19 loci were successfully mapped to zebrafish linkage groups 1, 14, 15, 19, and 20. Four of these loci were positioned in linkage group 20, whereas, 3 more loci were added in linkage group 19, thus increasing to 34 loci the number of human genes syntenic to the group. With the sequencing of the zebrafish genome, about 20 more MHC genes were reported linked on the same group. (Author)

  15. Cyclone: java-based querying and computing with Pathway/Genome databases.

    Science.gov (United States)

    Le Fèvre, François; Smidtas, Serge; Schächter, Vincent

    2007-05-15

    Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL edition, the Jung API for graph algorithms or Cytoscape for graph visualization. Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.

  16. The Eukaryotic Pathogen Databases: a functional genomic resource integrating data from human and veterinary parasites.

    Science.gov (United States)

    Harb, Omar S; Roos, David S

    2015-01-01

    Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets ("big data"). While such data are readily available for download and personal use and analysis from a variety of repositories, such analysis often requires computational skills that are seldom available. As a result, a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.

  17. Genome-wide screen for universal individual identification SNPs based on the HapMap and 1000 Genomes databases.

    Science.gov (United States)

    Huang, Erwen; Liu, Changhui; Zheng, Jingjing; Han, Xiaolong; Du, Weian; Huang, Yuanjian; Li, Chengshi; Wang, Xiaoguang; Tong, Dayue; Ou, Xueling; Sun, Hongyu; Zeng, Zhaoshu; Liu, Chao

    2018-04-03

    SNP panels for individual identification differ in SNP selection and in the populations studied, which has left few SNPs in common and compromised their universal applicability. To screen for universal SNPs, we performed genome-wide SNP mining in multiple populations based on the HapMap and 1000 Genomes databases. SNPs with high minor allele frequencies (MAF) in 37 populations were selected. As the MAF threshold rose from ≥0.35 to ≥0.43, the number of selected SNPs decreased from 2769 to 0. A total of 117 SNPs with MAF ≥0.39 have no linkage disequilibrium with each other in every population. For 116 of the 117 SNPs, cumulative match probability (CMP) ranged from 2.01 × 10^-48 to 1.93 × 10^-50 and cumulative exclusion probability (CEP) ranged from 0.9999999996653 to 0.9999999999945. In 134 tested Han samples, 110 of the 117 SNPs remained within high MAF and conformed to Hardy-Weinberg equilibrium, with CMP = 4.70 × 10^-47 and CEP = 0.999999999862. By analyzing the same number of autosomal SNPs as in the HID-Ion AmpliSeq Identity Panel, i.e. 90 randomized out of the 110 SNPs, our panel yielded preferable CMP and CEP. Taken together, the 110-SNP panel is advantageous for forensic testing, and this study provides plenty of highly informative SNPs for compiling final universal panels.
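
    To give a feel for how a cumulative match probability of this magnitude arises, the Python sketch below assumes independent biallelic SNPs in Hardy-Weinberg equilibrium, so that the per-locus match probability is the sum of the squared genotype frequencies. The function names and the uniform MAF of 0.39 across 116 loci are illustrative assumptions, not the authors' procedure or data.

```python
import math

def match_probability(maf: float) -> float:
    """Per-locus random match probability for a biallelic SNP under Hardy-Weinberg equilibrium."""
    p, q = maf, 1.0 - maf
    genotype_freqs = (p * p, 2.0 * p * q, q * q)  # AA, Aa, aa
    # Two random individuals match when they share the same genotype
    return sum(freq * freq for freq in genotype_freqs)

def cumulative_match_probability(mafs) -> float:
    """Product of per-locus match probabilities, assuming the loci are independent."""
    # Summing logarithms keeps the intermediate values well scaled
    log_cmp = sum(math.log10(match_probability(maf)) for maf in mafs)
    return 10.0 ** log_cmp

# Hypothetical panel: 116 SNPs, every one at MAF 0.39
panel = [0.39] * 116
cmp_value = cumulative_match_probability(panel)
print(f"CMP ~ {cmp_value:.2e}")
```

    With these assumed inputs the result is on the order of 10^-48. The per-locus match probability is lowest when the three genotypes are most evenly represented (MAF near 0.42 for a biallelic SNP), which is one reason high-MAF markers are favoured for identification panels.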

  18. Databases

    Directory of Open Access Journals (Sweden)

    Nick Ryan

    2004-01-01

    Full Text Available Databases are deeply embedded in archaeology, underpinning and supporting many aspects of the subject. However, as well as providing a means for storing, retrieving and modifying data, databases themselves must be a result of a detailed analysis and design process. This article looks at this process, and shows how the characteristics of data models affect the process of database design and implementation. The impact of the Internet on the development of databases is examined, and the article concludes with a discussion of a range of issues associated with the recording and management of archaeological data.

  19. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate the limitations for accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.
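
    The family-naming step described above (taking the most common molecular-function Gene Ontology node within each cluster) might be sketched in Python as follows. The input layout, protein IDs, tie-breaking rule and function name are assumptions made for illustration, not the database's actual implementation.

```python
from collections import Counter

def name_cluster(member_go_terms):
    """Return the most common molecular-function GO description among cluster members.

    member_go_terms: dict mapping a protein ID to a list of GO term descriptions.
    Returns None when no member carries an annotation.
    """
    counts = Counter()
    for terms in member_go_terms.values():
        counts.update(set(terms))  # count each term at most once per protein
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically so the result is deterministic
    best_term, _ = min(counts.items(), key=lambda item: (-item[1], item[0]))
    return best_term

# Hypothetical cluster with made-up protein IDs and annotations
cluster = {
    "protA": ["kinase activity", "ATP binding"],
    "protB": ["kinase activity"],
    "protC": ["ATP binding", "kinase activity"],
    "protD": [],
}
family_name = name_cluster(cluster)
print(family_name)
```

    Counting each term once per protein prevents a single heavily annotated member from dominating the family name.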

  20. BISQUE: locus- and variant-specific conversion of genomic, transcriptomic and proteomic database identifiers.

    Science.gov (United States)

    Meyer, Michael J; Geske, Philip; Yu, Haiyuan

    2016-05-15

    Biological sequence databases are integral to efforts to characterize and understand biological molecules and share biological data. However, when analyzing these data, scientists are often left holding disparate biological currency: molecular identifiers from different databases. For downstream applications that require converting the identifiers themselves, there are many resources available, but analyzing associated loci and variants can be cumbersome if data is not given in a form amenable to particular analyses. Here we present BISQUE, a web server and customizable command-line tool for converting molecular identifiers and their contained loci and variants between different database conventions. BISQUE uses a graph traversal algorithm to generalize the conversion process for residues in the human genome, genes, transcripts and proteins, allowing for conversion across classes of molecules and in all directions through an intuitive web interface and a URL-based web service. BISQUE is freely available via the web using any major web browser (http://bisque.yulab.org/). Source code is available in a public GitHub repository (https://github.com/hyulab/BISQUE). Contact: haiyuan.yu@cornell.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
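
    The abstract does not specify which graph traversal BISQUE uses, so the following Python sketch only illustrates the general idea with a breadth-first search: identifier types are nodes, an edge means a direct mapping exists, and the shortest path gives the chain of conversions. The graph contents and the function name are hypothetical.

```python
from collections import deque

# Hypothetical graph: an edge means a direct mapping exists between two identifier types
ID_GRAPH = {
    "ensembl_gene": ["ensembl_transcript"],
    "ensembl_transcript": ["ensembl_gene", "ensembl_protein", "refseq_transcript"],
    "ensembl_protein": ["ensembl_transcript", "uniprot"],
    "refseq_transcript": ["ensembl_transcript", "refseq_protein"],
    "refseq_protein": ["refseq_transcript"],
    "uniprot": ["ensembl_protein"],
}

def conversion_path(graph, source, target):
    """Breadth-first search for the shortest chain of identifier types from source to target."""
    if source not in graph or target not in graph:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of mappings connects the two types

path = conversion_path(ID_GRAPH, "ensembl_gene", "uniprot")
print(" -> ".join(path))
```

    Treating conversion as path-finding means a new identifier type only needs one edge to an existing node to become reachable from every other type.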

  1. SNPpy--database management for SNP data from genome wide association studies.

    Directory of Open Access Journals (Sweden)

    Faheem Mitha

    Full Text Available BACKGROUND: We describe SNPpy, a hybrid script database system using the Python SQLAlchemy library coupled with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). This system makes it possible to merge study data with HapMap data and merge across studies for meta-analyses, including data filtering based on the values of phenotype and Single-Nucleotide Polymorphism (SNP) data. SNPpy and its dependencies are open source software. RESULTS: The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS studies and the HapMap Project. We then export these individual datasets to standard data format files that can be imported into statistical software for downstream analyses. CONCLUSIONS: By leveraging the power of relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS studies. The analysis of these studies requires merging across GWAS datasets as well as patient and marker selection. To this end, SNPpy enables the user to filter the data and output the results as standardized GWAS file formats. It performs low-level, flexible data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who seek to deploy central management of their GWAS data.
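
    The kind of relational merge-and-filter step described here can be sketched with Python's standard-library sqlite3 module. This deliberately swaps in SQLite for the PostgreSQL/SQLAlchemy stack named in the abstract so that the example is self-contained; the two-table schema, the 'NN' missing-call code, the sample IDs and the function name are all assumptions and do not reflect SNPpy's actual schema or API.

```python
import sqlite3

# In-memory database with a hypothetical two-table layout
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phenotype (patient_id TEXT PRIMARY KEY, study TEXT, affected INTEGER);
    CREATE TABLE genotype  (patient_id TEXT, snp_id TEXT, allele_call TEXT);
""")
conn.executemany("INSERT INTO phenotype VALUES (?, ?, ?)", [
    ("P1", "study_a", 1), ("P2", "study_a", 0), ("P3", "study_b", 1),
])
conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)", [
    ("P1", "rs0001", "AG"), ("P1", "rs0002", "CC"),
    ("P2", "rs0001", "AA"), ("P3", "rs0001", "GG"), ("P3", "rs0002", "NN"),
])

def select_calls(connection, snp_ids, affected):
    """Merge genotypes across studies, keeping chosen SNPs, one phenotype group, and non-missing calls."""
    placeholders = ",".join("?" for _ in snp_ids)
    query = f"""
        SELECT p.study, g.patient_id, g.snp_id, g.allele_call
        FROM genotype AS g
        JOIN phenotype AS p ON p.patient_id = g.patient_id
        WHERE g.snp_id IN ({placeholders})
          AND p.affected = ?
          AND g.allele_call <> 'NN'   -- 'NN' is an assumed code for a missing call
        ORDER BY g.patient_id, g.snp_id
    """
    return connection.execute(query, [*snp_ids, affected]).fetchall()

rows = select_calls(conn, ["rs0001", "rs0002"], affected=1)
for row in rows:
    print(row)
```

    Keeping phenotype and genotype data in separate tables joined on a patient key lets patient selection and marker selection be expressed in a single query that spans studies.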

  2. A New Single Nucleotide Polymorphism Database for Rainbow Trout Generated Through Whole Genome Resequencing

    Directory of Open Access Journals (Sweden)

    Guangtu Gao

    2018-04-01

    heterozygosity within each population. We also provide functional annotation based on the genome position of each SNP and evaluate the use of clonal lines for filtering of PSVs and MSVs. These SNPs form a new database, which provides an important resource for a new high density SNP array design and for other SNP genotyping platforms used for genetic and genomics studies of this iconic salmonid fish species.

  3. Expanded microbial genome coverage and improved protein family annotation in the COG database.

    Science.gov (United States)

    Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. 
The new version of the

  4. KONAGAbase: a genomic and transcriptomic database for the diamondback moth, Plutella xylostella.

    Science.gov (United States)

    Jouraku, Akiya; Yamamoto, Kimiko; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Narukawa, Junko; Miyamoto, Kazuhisa; Kurita, Kanako; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Noda, Hiroaki

    2013-07-09

    The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests for crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides such as pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. Therefore, it is important to develop genomic and transcriptomic DBM resources for analysis of genes related to insecticide resistance, both to clarify the mechanism of resistance of DBM and to facilitate the development of insecticides with a novel mode of action for more effective and environmentally less harmful insecticide rotation. To contribute to this goal, we developed KONAGAbase, a genomic and transcriptomic database for DBM (KONAGA is the Japanese word for DBM). KONAGAbase provides (1) transcriptomic sequences of 37,340 ESTs/mRNAs and 147,370 RNA-seq contigs which were clustered and assembled into 84,570 unigenes (30,695 contigs, 50,548 pseudo singletons, and 3,327 singletons); and (2) genomic sequences of 88,530 WGS contigs with 246,244 degenerate contigs and 106,455 singletons from which 6,310 de novo identified repeat sequences and 34,890 predicted gene-coding sequences were extracted. The unigenes and predicted gene-coding sequences were clustered and 32,800 representative sequences were extracted as a comprehensive putative gene set. These sequences were annotated with BLAST descriptions, Gene Ontology (GO) terms, and Pfam descriptions, respectively. KONAGAbase contains rich graphical user interface (GUI)-based web interfaces for easy and efficient searching, browsing, and downloading sequences and annotation data. Five useful search interfaces consisting of BLAST search, keyword search, BLAST result-based search, GO tree-based search, and genome browser are provided. KONAGAbase is publicly available from our website (http://dbm.dna.affrc.go.jp/px/) through standard web browsers. 
KONAGAbase provides DBM comprehensive transcriptomic and draft genomic sequences with

  5. Simulated space radiation-induced mutants in the mouse kidney display widespread genomic change.

    Directory of Open Access Journals (Sweden)

    Mitchell S Turker

    Full Text Available Exposure to a small number of high-energy heavy charged particles (HZE ions), as found in the deep space environment, could significantly affect astronaut health following prolonged periods of space travel if these ions induce mutations and related cancers. In this study, we used an in vivo mutagenesis assay to define the mutagenic effects of accelerated 56Fe ions (1 GeV/amu, 151 keV/μm) in the mouse kidney epithelium exposed to doses ranging from 0.25 to 2.0 Gy. These doses represent fluences ranging from 1 to 8 particle traversals per cell nucleus. The Aprt locus, located on chromosome 8, was used to select induced and spontaneous mutants. To fully define the mutagenic effects, we used multiple endpoints including mutant frequencies, mutation spectrum for chromosome 8, translocations involving chromosome 8, and mutations affecting non-selected chromosomes. The results demonstrate mutagenic effects that often affect multiple chromosomes for all Fe ion doses tested. For comparison with the most abundant sparsely ionizing particle found in space, we also examined the mutagenic effects of high-energy protons (1 GeV, 0.24 keV/μm) at 0.5 and 1.0 Gy. Similar doses of protons were not as mutagenic as Fe ions for many assays, though genomic effects were detected in Aprt mutants at these doses. Considered as a whole, the data demonstrate that Fe ions are highly mutagenic at the low doses and fluences of relevance to human spaceflight, and that cells with considerable genomic mutations are readily induced by these exposures and persist in the kidney epithelium. The level of genomic change produced by low fluence exposure to heavy ions is reminiscent of the extensive rearrangements seen in tumor genomes suggesting a potential initiation step in radiation carcinogenesis.

  6. Simulated space radiation-induced mutants in the mouse kidney display widespread genomic change.

    Science.gov (United States)

    Turker, Mitchell S; Grygoryev, Dmytro; Lasarev, Michael; Ohlrich, Anna; Rwatambuga, Furaha A; Johnson, Sorrel; Dan, Cristian; Eckelmann, Bradley; Hryciw, Gwen; Mao, Jian-Hua; Snijders, Antoine M; Gauny, Stacey; Kronenberg, Amy

    2017-01-01

    Exposure to a small number of high-energy heavy charged particles (HZE ions), as found in the deep space environment, could significantly affect astronaut health following prolonged periods of space travel if these ions induce mutations and related cancers. In this study, we used an in vivo mutagenesis assay to define the mutagenic effects of accelerated 56Fe ions (1 GeV/amu, 151 keV/μm) in the mouse kidney epithelium exposed to doses ranging from 0.25 to 2.0 Gy. These doses represent fluences ranging from 1 to 8 particle traversals per cell nucleus. The Aprt locus, located on chromosome 8, was used to select induced and spontaneous mutants. To fully define the mutagenic effects, we used multiple endpoints including mutant frequencies, mutation spectrum for chromosome 8, translocations involving chromosome 8, and mutations affecting non-selected chromosomes. The results demonstrate mutagenic effects that often affect multiple chromosomes for all Fe ion doses tested. For comparison with the most abundant sparsely ionizing particle found in space, we also examined the mutagenic effects of high-energy protons (1 GeV, 0.24 keV/μm) at 0.5 and 1.0 Gy. Similar doses of protons were not as mutagenic as Fe ions for many assays, though genomic effects were detected in Aprt mutants at these doses. Considered as a whole, the data demonstrate that Fe ions are highly mutagenic at the low doses and fluences of relevance to human spaceflight, and that cells with considerable genomic mutations are readily induced by these exposures and persist in the kidney epithelium. The level of genomic change produced by low fluence exposure to heavy ions is reminiscent of the extensive rearrangements seen in tumor genomes suggesting a potential initiation step in radiation carcinogenesis.

  7. Isolation of three novel rat and mouse papillomaviruses and their genomic characterization.

    Directory of Open Access Journals (Sweden)

    Eric Schulz

    Despite growing knowledge about the biological diversity of papillomaviruses (PV), little is known about non-human PV in general and about PV mouse models in particular. We cloned and sequenced the complete genomes of two novel PV types from the Norway rat (Rattus norvegicus; RnPV2) and the wood mouse (Apodemus sylvaticus; AsPV1), as well as a novel variant of the recently described MmuPV1 (originally designated as MusPV) from a house mouse (Mus musculus; MmuPV1 variant). In addition, we conducted phylogenetic analyses using a systematically representative set of 79 PV types, including the novel sequences. As inferred from concatenated amino acid sequences of six proteins, MmuPV1 variant and AsPV1 nested within the Beta+Xi-PV super taxon as members of the Pi-PV. RnPV2 is a member of the Iota-PV, which has a distant phylogenetic position from Pi-PV. The phylogenetic results support a complex scenario of PV diversification driven by different evolutionary forces, including co-divergence with hosts and adaptive radiations to new environments. PV types isolated from mice and rats are the basis for new animal models, which are valuable for studying PV-induced tumors and new treatment options.

  8. SolCyc: a database hub at the Sol Genomics Network (SGN) for the manual curation of metabolic networks in Solanum and Nicotiana specific databases

    Science.gov (United States)

    Foerster, Hartmut; Bombarely, Aureliano; Battey, James N D; Sierro, Nicolas; Ivanov, Nikolai V; Mueller, Lukas A

    2018-01-01

    SolCyc is the entry portal to pathway/genome databases (PGDBs) for major species of the Solanaceae family hosted at the Sol Genomics Network. Currently, SolCyc comprises six organism-specific PGDBs for tomato, potato, pepper, petunia, tobacco and one Rubiaceae, coffee. The metabolic networks of those PGDBs have been computationally predicted by the PathoLogic component of the Pathway Tools software using the manually curated multi-domain database MetaCyc (http://www.metacyc.org/) as reference. SolCyc has been recently extended by taxon-specific databases, i.e. the family-specific SolanaCyc database, containing only curated data pertinent to species of the nightshade family, and NicotianaCyc, a genus-specific database that stores all relevant metabolic data of the Nicotiana genus. Through manual curation of the published literature, new metabolic pathways have been created in those databases, which are complemented by the continuously updated, relevant species-specific pathways from MetaCyc. At present, SolanaCyc comprises 199 pathways and 29 superpathways and NicotianaCyc accounts for 72 pathways and 13 superpathways. Curator-maintained, taxon-specific databases such as SolanaCyc and NicotianaCyc are characterized by an enrichment of data specific to these taxa and are free of falsely predicted pathways. Both databases have been used to update recently created Nicotiana-specific databases for Nicotiana tabacum, Nicotiana benthamiana, Nicotiana sylvestris and Nicotiana tomentosiformis by propagating verifiable data into those PGDBs. In addition, in-depth curation of the pathways in N. tabacum has been carried out, which resulted in the elimination of 156 pathways from the 569 pathways predicted by Pathway Tools. Together, in-depth curation of the predicted pathway network and the supplementation with curated data from taxon-specific databases have substantially improved the curation status of the species-specific N. tabacum PGDB. The implementation of this

  9. Genetic localization of Cd63, a member of the transmembrane 4 superfamily, reveals two distinct loci in the mouse genome

    Energy Technology Data Exchange (ETDEWEB)

    Gwynn, B.; Eicher, E.M.; Peters, L.L. [Jackson Lab., Bar Harbor, ME (United States)]

    1996-07-15

    The membrane protein CD63, a molecular marker for early stages of melanoma progression, has been associated with platelet storage pool deficiency disorders (SPD). CD63 localizes to the membranes of platelets, lysosomes, and melanosomes, all of which are affected in a specific subgroup of SPD. The cDNA encoding CD63 detects two closely related sequences that map to different regions of the mouse genome. One locus maps to mouse Chromosome (Chr) 10 in a region that shares linkage homology with the human chromosome encoding human CD63. The second locus maps to mouse Chr 18 in a region that bears no known human CD63-related genes. No SPD has been localized to these regions of either the mouse or the human chromosomes. 15 refs., 2 figs.

  10. Generation of Mouse Haploid Somatic Cells by Small Molecules for Genome-wide Genetic Screening

    Directory of Open Access Journals (Sweden)

    Zheng-Quan He

    2017-08-01

    The recent success in deriving mammalian haploid embryonic stem cells (haESCs) has provided a powerful tool for large-scale functional analysis of the mammalian genome. However, haESCs rapidly become diploidized after differentiation, posing challenges for genetic analysis. Here, we show that the spontaneous diploidization of haESCs happens in metaphase due to mitotic slippage. Diploidization can be suppressed by small-molecule-mediated inhibition of CDK1 and ROCK. Through ROCK inhibition, we can generate haploid somatic cells of all three germ layers from haESCs, including terminally differentiated neurons. Using piggyBac transposon-based insertional mutagenesis, we generated a haploid neural cell library harboring genome-wide mutations for genetic screening. As a proof of concept, we screened for Mn2+-mediated toxicity and identified the Park2 gene. Our findings expand the applications of mouse haploid cell technology to somatic cell types and may also shed light on the mechanisms of ploidy maintenance.

  11. Genome editing in mouse spermatogonial stem/progenitor cells using engineered nucleases.

    Directory of Open Access Journals (Sweden)

    Danielle A Fanslow

    Editing the genome to create specific sequence modifications is a powerful way to study gene function and promises future applicability to gene therapy. Creation of precise modifications requires homologous recombination, a very rare event in most cell types that can be stimulated by introducing a double-strand break near the target sequence. One method to create a double-strand break in a particular sequence is with a custom-designed nuclease. We used engineered nucleases to stimulate homologous recombination to correct a mutant gene in mouse "GS" (germline stem) cells, testicular-derived cell cultures containing spermatogonial stem cells and progenitor cells. We demonstrated that gene-corrected cells maintained several properties of spermatogonial stem/progenitor cells, including the ability to colonize following testicular transplantation. This proof of concept for genome editing in GS cells impacts both cell therapy and basic research, given the potential for GS cells to be propagated in vitro, contribute to the germline in vivo following testicular transplantation or become reprogrammed to pluripotency in vitro.

  12. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    Science.gov (United States)

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  13. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    Energy Technology Data Exchange (ETDEWEB)

    Reddy, Tatiparthi B. K. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Thomas, Alex D. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Stamatis, Dimitri [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Bertsch, Jon [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Isbandi, Michelle [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Jansson, Jakob [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Mallajosyula, Jyothi [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Pagani, Ioanna [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Lobos, Elizabeth A. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Kyrpides, Nikos C. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); King Abdulaziz Univ., Jeddah (Saudi Arabia)]

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  14. Using FlyBase, a Database of Drosophila Genes and Genomes.

    Science.gov (United States)

    Marygold, Steven J; Crosby, Madeline A; Goodman, Joshua L

    2016-01-01

    For nearly 25 years, FlyBase (flybase.org) has provided a freely available online database of biological information about Drosophila species, focusing on the model organism D. melanogaster. The need for a centralized, integrated view of Drosophila research has never been greater as advances in genomic, proteomic, and high-throughput technologies add to the quantity and diversity of available data and resources. FlyBase has taken several approaches to respond to these changes in the research landscape. Novel report pages have been generated for new reagent types and physical interaction data; Drosophila models of human disease are now represented and showcased in dedicated Human Disease Model Reports; other integrated reports have been established that bring together related genes, datasets, or reagents; Gene Reports have been revised to improve access to new data types and to highlight functional data; links to external sites have been organized and expanded; and new tools have been developed to display and interrogate all these data, including improved batch processing and bulk file availability. In addition, several new community initiatives have served to enhance interactions between researchers and FlyBase, resulting in direct user contributions and improved feedback. This chapter provides an overview of the data content, organization, and available tools within FlyBase, focusing on recent improvements. We hope it serves as a guide for our diverse user base, enabling efficient and effective exploration of the database and thereby accelerating research discoveries.

  15. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    Science.gov (United States)

    Caspi, Ron; Altman, Tomer; Dale, Joseph M.; Dreher, Kate; Fulcher, Carol A.; Gilham, Fred; Kaipa, Pallavi; Karthikeyan, Athikkattuvalasu S.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Paley, Suzanne; Popescu, Liviu; Pujar, Anuradha; Shearer, Alexander G.; Zhang, Peifen; Karp, Peter D.

    2010-01-01

    The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathway reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway hole-fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism. PMID:19850718

  16. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.

    Science.gov (United States)

    Holt, Carson; Yandell, Mark

    2011-12-22

    Second-generation sequencing technologies are precipitating major shifts with regard to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation because, unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality, and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.

  17. An analysis of possible off target effects following CAS9/CRISPR targeted deletions of neuropeptide gene enhancers from the mouse genome.

    Science.gov (United States)

    Hay, Elizabeth Anne; Khalaf, Abdulla Razak; Marini, Pietro; Brown, Andrew; Heath, Karyn; Sheppard, Darrin; MacKenzie, Alasdair

    2017-08-01

    We have successfully used comparative genomics to identify putative regulatory elements within the human genome that contribute to the tissue-specific expression of neuropeptides such as galanin and receptors such as CB1. However, a previous inability to rapidly delete these elements from the mouse genome has prevented optimal assessment of their function in vivo. This has been solved using CAS9/CRISPR genome editing technology, which uses a bacterial endonuclease called CAS9 that, in combination with specifically designed guide RNA (gRNA) molecules, cuts specific regions of the mouse genome. However, reports of "off target" effects, whereby the CAS9 endonuclease is able to cut sites other than those targeted, limit the appeal of this technology. We used cytoplasmic microinjection of gRNA and CAS9 mRNA into 1-cell mouse embryos to rapidly generate enhancer knockout mouse lines. The current study describes our analysis of the genomes of these enhancer knockout lines to detect possible off-target effects. Bioinformatic analysis was used to identify the most likely putative off-target sites and to design PCR primers that would amplify these sequences from genomic DNA of founder enhancer deletion mouse lines. Amplified DNA was then sequenced and blasted against the mouse genome sequence to detect off-target effects. Using this approach, we were unable to detect any evidence of off-target effects in the genomes of three founder lines using any of the four gRNAs used in the analysis. This study suggests that the problem of off-target effects in transgenic mice has been exaggerated and that CAS9/CRISPR represents a highly effective and accurate method of deleting putative neuropeptide gene enhancer sequences from the mouse genome. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  18. Genome wide analysis of inbred mouse lines identifies a locus containing Ppar-gamma as contributing to enhanced malaria survival.

    Directory of Open Access Journals (Sweden)

    Selina E R Bopp

    2010-05-01

    The genetic background of a patient determines in part if a person develops a mild form of malaria and recovers, or develops a severe form and dies. We have used a mouse model to detect genes involved in the resistance or susceptibility to Plasmodium berghei malaria infection. To this end we first characterized 32 different mouse strains infected with P. berghei and identified survival as the best trait to discriminate between the strains. We found a locus on chromosome 6 by linking the survival phenotypes of the mouse strains to their genetic variations using genome wide analyses such as haplotype associated mapping and the efficient mixed-model for association. This new locus involved in malaria resistance contains only two genes and confirms the importance of Ppar-gamma in malaria infection.

  19. Maternal-foetal genomic conflict and speciation: no evidence for hybrid placental dysplasia in crosses between two house mouse subspecies

    Czech Academy of Sciences Publication Activity Database

    Kropáčková, L.; Piálek, Jaroslav; Gergelits, Václav; Forejt, Jiří; Reifová, R.

    2015-01-01

    Vol. 28, No. 3 (2015), pp. 688-698. ISSN 1010-061X. R&D Projects: GA ČR GA13-08078S. Institutional support: RVO:68081766; RVO:68378050. Keywords: hybrid placental dysplasia * genomic conflicts * speciation * X chromosome * house mouse * Mus musculus musculus * Mus musculus domesticus. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 2.747, year: 2015

  20. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome

    Science.gov (United States)

    Schoof, Heiko; Zaccaria, Paolo; Gundlach, Heidrun; Lemcke, Kai; Rudd, Stephen; Kolesov, Grigory; Arnold, Roland; Mewes, H. W.; Mayer, Klaus F. X.

    2002-01-01

    Arabidopsis thaliana is the first plant for which the complete genome has been sequenced and published. Annotation of complex eukaryotic genomes requires more than the assignment of genetic elements to the sequence. Besides completing the list of genes, we need to discover their cellular roles, their regulation and their interactions in order to understand the workings of the whole plant. The MIPS Arabidopsis thaliana Database (MAtDB; http://mips.gsf.de/proj/thal/db) started out as a repository for genome sequence data in the European Scientists Sequencing Arabidopsis (ESSA) project and the Arabidopsis Genome Initiative. Our aim is to transform MAtDB into an integrated biological knowledge resource by integrating diverse data, tools, query and visualization capabilities and by creating a comprehensive resource for Arabidopsis as a reference model for other species, including crop plants. PMID:11752263

  1. Chromosome-wise dissection of the genome of the extremely big mouse line DU6i.

    Science.gov (United States)

    Bevova, Marianna R; Aulchenko, Yurii S; Aksu, Soner; Renne, Ulla; Brockmann, Gudrun A

    2006-01-01

    The extreme high-body-weight-selected mouse line DU6i is a polygenic model for growth research, harboring many small-effect QTL. We dissected the genome of this line into 19 autosomes and the Y chromosome by the construction of a new panel of chromosome substitution strains (CSS). The DU6i chromosomes were transferred to a DBA/2 mouse genetic background by marker-assisted recurrent backcrossing. Mitochondria and the X chromosome were of DBA/2 origin in the backcross. During the construction of these novel strains, >4000 animals were generated, phenotyped, and genotyped. Using these data, we studied the genetic control of variation in body weight and weight gain at 21, 42, and 63 days. The unique data set facilitated the analysis of chromosomal interaction with sex and parent-of-origin effects. All analyzed chromosomes affected body weight and weight gain either directly or in interaction with sex or parent of origin. The effects were age specific, with some chromosomes showing opposite effects at different stages of development.

  2. A geographically-diverse collection of 418 human gut microbiome pathway genome databases

    KAUST Repository

    Hahn, Aria S.

    2017-04-11

    Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a compendium of environmental pathway genome databases (ePGDBs) constructed from 418 assembled human microbiome datasets using MetaPathways, enabling reproducible functional metagenomic annotation. We also generated metabolic network reconstructions for each metagenome using the Pathway Tools software, empowering researchers and clinicians interested in visualizing and interpreting metabolic pathways encoded by the human gut microbiome. For the first time, GutCyc provides consistent annotations and metabolic pathway predictions, making possible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn’s disease, and type 2 diabetes. GutCyc data products are searchable online, or may be downloaded and explored locally using MetaPathways and Pathway Tools.

  3. FunCoup 3.0: database of genome-wide functional coupling networks.

    Science.gov (United States)

    Schmitt, Thomas; Ogris, Christoph; Sonnhammer, Erik L L

    2014-01-01

    We present an update of the FunCoup database (http://FunCoup.sbc.su.se) of functional couplings, or functional associations, between genes and gene products. Identifying these functional couplings is an important step in the understanding of higher level mechanisms performed by complex cellular processes. FunCoup distinguishes between four classes of couplings: participation in the same signaling cascade, participation in the same metabolic process, co-membership in a protein complex and physical interaction. For each of these four classes, several types of experimental and statistical evidence are combined by Bayesian integration to predict genome-wide functional coupling networks. The FunCoup framework has been completely re-implemented to allow for more frequent future updates. It contains many improvements, such as a regularization procedure to automatically downweight redundant evidence and a novel method to incorporate phylogenetic profile similarity. Several datasets have been updated and new data have been added in FunCoup 3.0. Furthermore, we have developed a new Web site, which provides powerful tools to explore the predicted networks and to retrieve detailed information about the data underlying each prediction.
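    The Bayesian integration of several evidence types mentioned above can be illustrated with a generic naive Bayes scheme, in which each evidence type contributes a log-likelihood ratio (LLR) and the contributions are summed. The sketch below is not FunCoup's actual implementation: the function names, the example evidence values, the prior, and the simple per-evidence weights (standing in for a redundancy down-weighting step) are all hypothetical.

```python
import math


def integrate_evidence(llrs, weights=None):
    """Sum natural-log likelihood ratios, optionally down-weighted.

    llrs    -- one LLR per evidence type, ln P(e | coupled) / P(e | not coupled)
    weights -- hypothetical factors in [0, 1] to discount redundant evidence
    """
    if weights is None:
        weights = [1.0] * len(llrs)
    return sum(w * s for w, s in zip(weights, llrs))


def posterior_probability(total_llr, prior):
    """Turn a summed LLR and a prior coupling probability into a posterior."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * math.exp(total_llr)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical LLRs for one gene pair in one coupling class.
    evidence = {
        "coexpression": 1.2,
        "protein_interaction": 2.0,
        "phylogenetic_profile": 0.5,
    }
    total = integrate_evidence(list(evidence.values()))
    print(f"summed LLR = {total:.2f}")
    print(f"posterior  = {posterior_probability(total, prior=0.01):.3f}")
```

    Summing LLRs assumes the evidence types are conditionally independent; down-weighting correlated evidence is one way to soften that assumption.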

  4. High-resolution comparative mapping among man, cattle and mouse suggests a role for repeat sequences in mammalian genome evolution

    Directory of Open Access Journals (Sweden)

    Rodolphe François

    2006-08-01

    Background: Comparative mapping provides new insights into the evolutionary history of genomes. In particular, recent studies in mammals have suggested a role for segmental duplication in genome evolution. In some species such as Drosophila or maize, transposable elements (TEs) have been shown to be involved in chromosomal rearrangements. In this work, we have explored the presence of interspersed repeats in regions of chromosomal rearrangements, using an updated high-resolution integrated comparative map among cattle, man and mouse. Results: The bovine, human and mouse comparative autosomal map has been constructed using data from bovine genetic and physical maps and from FISH-mapping studies. We confirm most previous results but also reveal some discrepancies. A total of 211 conserved segments have been identified between cattle and man, of which 33 are new segments and 72 correspond to extended, previously known segments. The resulting map covers 91% and 90% of the human and bovine genomes, respectively. Analysis of breakpoint regions revealed a high density of species-specific interspersed repeats in the human and mouse genomes. Conclusion: Analysis of the breakpoint regions has revealed specific repeat density patterns, suggesting that TEs may have played a significant role in chromosome evolution and genome plasticity. However, we cannot rule out that repeats and breakpoints accumulate independently in the same few regions where modifications are better tolerated. Likewise, we cannot ascertain whether increased TE density is the cause or the consequence of chromosome rearrangements. Nevertheless, the identification of high-density repeat clusters combined with a well-documented repeat phylogeny should highlight probable breakpoints and permit their precise dating. Combining new statistical models taking the present information into account should help reconstruct ancestral karyotypes.

  5. LDSplitDB: a database for studies of meiotic recombination hotspots in MHC using human genomic data.

    Science.gov (United States)

    Guo, Jing; Chen, Hao; Yang, Peng; Lee, Yew Ti; Wu, Min; Przytycka, Teresa M; Kwoh, Chee Keong; Zheng, Jie

    2018-04-20

    Meiotic recombination happens during the process of meiosis when chromosomes inherited from two parents exchange genetic material to generate chromosomes in the gamete cells. The recombination events tend to occur in narrow genomic regions called recombination hotspots. Dysregulation of recombination could lead to serious human diseases such as birth defects. Although the regulatory mechanism of recombination events is still unclear, DNA sequence polymorphisms have been found to play crucial roles in the regulation of recombination hotspots. To facilitate the studies of the underlying mechanism, we developed a database named LDSplitDB which provides an integrative and interactive data mining and visualization platform for the genome-wide association studies of recombination hotspots. It contains the pre-computed association maps of the major histocompatibility complex (MHC) region in the 1000 Genomes Project and the HapMap Phase III datasets, and a genome-scale study of the European population from the HapMap Phase II dataset. Besides the recombination profiles, related data of genes, SNPs and different types of epigenetic modifications, which could be associated with meiotic recombination, are provided for comprehensive analysis. To meet the computational requirement of the rapidly increasing population genomics data, we prepared a lookup table of 400 haplotypes for recombination rate estimation using the well-known LDhat algorithm which includes all possible two-locus haplotype configurations. To the best of our knowledge, LDSplitDB is the first large-scale database for the association analysis of human recombination hotspots with DNA sequence polymorphisms. It provides valuable resources for the discovery of the mechanism of meiotic recombination hotspots. The information about MHC in this database could help understand the roles of recombination in the human immune system. DATABASE URL: http://histone.scse.ntu.edu.sg/LDSplitDB.

  6. TcruziDB, an Integrated Database, and the WWW Information Server for the Trypanosoma cruzi Genome Project

    Directory of Open Access Journals (Sweden)

    Degrave Wim

    1997-01-01

    Data analysis, presentation and distribution are of utmost importance to a genome project. A public-domain software package, ACeDB, has been chosen as the common basis for parasite genome databases, and a first release of TcruziDB, the Trypanosoma cruzi genome database, is available by ftp from ftp://iris.dbbm.fiocruz.br/pub/genomedb/TcruziDB, as well as versions of the software for different operating systems (ftp://iris.dbbm.fiocruz.br/pub/unixsoft/). Moreover, data originating from the project are available from the WWW server at http://www.dbbm.fiocruz.br. It contains biological and parasitological data on CL Brener, its karyotype, all available T. cruzi sequences from GenBank, data on the EST-sequencing project and on available libraries, a T. cruzi codon table and a listing of activities and participating groups in the genome project, as well as meeting reports. T. cruzi discussion lists (tcruzi-l@iris.dbbm.fiocruz.br and tcgenics@iris.dbbm.fiocruz.br) are being maintained for communication and to promote collaboration in the genome project.

  7. KoVariome: Korean National Standard Reference Variome database of whole genomes with comprehensive SNV, indel, CNV, and SV analyses.

    Science.gov (United States)

    Kim, Jungeun; Weber, Jessica A; Jho, Sungwoong; Jang, Jinho; Jun, JeHoon; Cho, Yun Sung; Kim, Hak-Min; Kim, Hyunho; Kim, Yumi; Chung, OkSung; Kim, Chang Geun; Lee, HyeJin; Kim, Byung Chul; Han, Kyudong; Koh, InSong; Chae, Kyun Shik; Lee, Semin; Edwards, Jeremy S; Bhak, Jong

    2018-04-04

    High-coverage whole-genome sequencing data of a single ethnicity can provide a useful catalogue of population-specific genetic variations, and provides a critical resource that can be used to more accurately identify pathogenic genetic variants. We report a comprehensive analysis of the Korean population, and present the Korean National Standard Reference Variome (KoVariome). As a part of the Korean Personal Genome Project (KPGP), we constructed the KoVariome database using 5.5 terabases of whole genome sequence data from 50 healthy Korean individuals in order to characterize the benign ethnicity-relevant genetic variation present in the Korean population. In total, KoVariome includes 12.7M single-nucleotide variants (SNVs), 1.7M short insertions and deletions (indels), 4K structural variations (SVs), and 3.6K copy number variations (CNVs). Among them, 2.4M (19%) SNVs and 0.4M (24%) indels were identified as novel. We also discovered selective enrichment of 3.8M SNVs and 0.5M indels in Korean individuals, which were used to filter out 1,271 coding-SNVs not originally removed from the 1,000 Genomes Project when prioritizing disease-causing variants. KoVariome health records were used to identify novel disease-causing variants in the Korean population, demonstrating the value of high-quality ethnic variation databases for the accurate interpretation of individual genomes and the precise characterization of genetic variations.
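    The novel-variant percentages quoted in this abstract can be reproduced from the totals it reports. The short check below is only a sketch: the numbers are the rounded, in-millions figures from the abstract, and the variable names are illustrative rather than taken from KoVariome.

```python
# Counts reported in the abstract, in millions of variants (rounded values).
totals = {"SNV": 12.7, "indel": 1.7}
novel = {"SNV": 2.4, "indel": 0.4}


def novel_percent(kind: str) -> int:
    """Share of variants of this kind reported as novel, as a whole percent."""
    return round(100 * novel[kind] / totals[kind])


percentages = {kind: novel_percent(kind) for kind in totals}
print(percentages)  # {'SNV': 19, 'indel': 24}
```

    Both values agree with the 19% and 24% given in the abstract.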

  8. Absence of respiratory inflammatory reaction of elemental sulfur using the California Pesticide Illness Database and a mouse model.

    Science.gov (United States)

    Lee, Kiyoung; Smith, Jodi L; Last, Jerold A

    2005-01-01

    Elemental sulfur, a natural substance, is used as a fungicide. Elemental sulfur is the most heavily used agricultural chemical in California. In 2003, annual sulfur usage in California was about 34% of the total weight of pesticide active ingredient used in production agriculture. Even though sulfur is mostly used in dust form, the respiratory health effects of elemental sulfur are not well documented. The purpose of this paper is to address the possible respiratory effect of elemental sulfur using the California Pesticide Illness Database and laboratory experiments with mice. We analyzed the California Pesticide Illness Database between 1991 and 2001. Among 127 reports of definite, probable, and possible illness involving sulfur, 21 cases (16%) were identified as respiratory related. A mouse model was used to examine whether there was an inflammatory or fibrotic response to elemental sulfur. Dust solutions were injected intratracheally into ovalbumin sensitized mice and lung damage was evaluated. Lung inflammatory response was analyzed via total lavage cell counts and differentials, and airway collagen content was analyzed histologically and biochemically. No significant differences from controls were seen in animals exposed to sulfur particles. The findings suggest that acute exposure of elemental sulfur itself may not cause an inflammatory reaction. However, further studies are needed to understand the possible health effects of chronic sulfur exposure and environmental weathering of sulfur dust.

  9. Nuclear-like Seq in mt Genome - RMG | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available Data detail. Data name: Nuclear-like Seq in mt Genome. DOI: 10...

  10. A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm

    Directory of Open Access Journals (Sweden)

    Allen Eric E

    2008-10-01

    Full Text Available Abstract. Background: The process of horizontal gene transfer (HGT) is believed to be widespread in Bacteria and Archaea, but little comparative data is available addressing its occurrence in complete microbial genomes. Collection of high-quality, automated HGT prediction data based on phylogenetic evidence has previously been impractical for large numbers of genomes at once, due to prohibitive computational demands. DarkHorse, a recently described statistical method for discovering phylogenetically atypical genes on a genome-wide basis, provides a means to solve this problem through lineage probability index (LPI) ranking scores. LPI scores inversely reflect phylogenetic distance between a test amino acid sequence and its closest available database matches. Proteins with low LPI scores are good horizontal gene transfer candidates; those with high scores are not. Description: The DarkHorse algorithm has been applied to 955 microbial genome sequences, and the results organized into a web-searchable relational database, called the DarkHorse HGT Candidate Resource (http://darkhorse.ucsd.edu). Users can select individual genomes or groups of genomes to screen by LPI score, search for protein functions by descriptive annotation or amino acid sequence similarity, or select proteins with unusual G+C composition in their underlying coding sequences. The search engine reports LPI scores for match partners as well as query sequences, providing the opportunity to explore whether potential HGT donor sequences are phylogenetically typical or atypical within their own genomes. This information can be used to predict whether or not sufficient information is available to build a well-supported phylogenetic tree using the potential donor sequence. Conclusion: The DarkHorse HGT Candidate database provides a powerful, flexible set of tools for identifying phylogenetically atypical proteins, allowing researchers to explore both individual HGT events in single genomes, and
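    The screening rule stated in this abstract (low LPI scores mark good HGT candidates, high scores do not) can be sketched as a simple filter. Everything below is hypothetical: the record type, the protein identifiers, the assumption that LPI lies between 0 and 1, and the cutoff of 0.6 are illustrative choices, not values or interfaces taken from DarkHorse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinScore:
    protein_id: str  # hypothetical identifier
    lpi: float       # lineage probability index, assumed here to lie in [0, 1]


def hgt_candidates(scores, max_lpi=0.6):
    """Return proteins with LPI at or below max_lpi, lowest LPI first.

    The cutoff is an arbitrary illustration, not a recommended threshold.
    """
    hits = [s for s in scores if s.lpi <= max_lpi]
    return sorted(hits, key=lambda s: s.lpi)


example = [
    ProteinScore("protA", 0.95),
    ProteinScore("protB", 0.12),
    ProteinScore("protC", 0.48),
]
candidates = hgt_candidates(example)
print([c.protein_id for c in candidates])  # ['protB', 'protC']
```

    Sorting by ascending LPI puts the most phylogenetically atypical proteins first, which mirrors the ranking use of the score described in the abstract.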

  11. The development of large-scale de-identified biomedical databases in the age of genomics-principles and challenges.

    Science.gov (United States)

    Dankar, Fida K; Ptitsyn, Andrey; Dankar, Samar K

    2018-04-10

    Contemporary biomedical databases include a wide range of information types from various observational and instrumental sources. Among the most important features that unite biomedical databases across the field are high volume of information and high potential to cause damage through data corruption, loss of performance, and loss of patient privacy. Thus, issues of data governance and privacy protection are essential for the construction of data depositories for biomedical research and healthcare. In this paper, we discuss various challenges of data governance in the context of population genome projects. The various challenges along with best practices and current research efforts are discussed through the steps of data collection, storage, sharing, analysis, and knowledge dissemination.

  12. A Guide to the PLAZA 3.0 Plant Comparative Genomic Database.

    Science.gov (United States)

    Vandepoele, Klaas

    2017-01-01

    PLAZA 3.0 is an online resource for comparative genomics and offers a versatile platform to study gene functions and gene families or to analyze genome organization and evolution in the green plant lineage. Starting from genome sequence information for over 35 plant species, precomputed comparative genomic data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, and genomic colinearity information within and between species. Complementary functional data sets, a Workbench, and interactive visualization tools are available through a user-friendly web interface, making PLAZA an excellent starting point to translate sequence or omics data sets into biological knowledge. PLAZA is available at http://bioinformatics.psb.ugent.be/plaza/ .

  13. Download - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available Download page for the PGDBj data sets. Recoverable file entries: ...t_db_link_en.zip (36.3 KB); 6. Genome analysis methods: pgdbj_dna_marker_linkage_map_genome_analysis_methods_...

  14. Genomic organization and the tissue distribution of alternatively spliced isoforms of the mouse Spatial gene

    Directory of Open Access Journals (Sweden)

    Mattei Marie-Geneviève

    2004-07-01

    Full Text Available Abstract. Background: The stromal component of the thymic microenvironment is critical for T lymphocyte generation. Thymocyte differentiation involves a cascade of coordinated stromal genes controlling thymocyte survival, lineage commitment and selection. The "Stromal Protein Associated with Thymii And Lymph-node" (Spatial) gene encodes a putative transcription factor which may be involved in T-cell development. In the testis, the Spatial gene is also expressed by round spermatids during spermatogenesis. Results: The Spatial gene maps to the B3-B4 region of murine chromosome 10, corresponding to the human syntenic region 10q22.1. The mouse Spatial genomic DNA is organised into 10 exons and is alternatively spliced to generate two short isoforms (Spatial-α and -γ) and two other long isoforms (Spatial-δ and -ε) comprising 5 additional exons on the 3' site. Here, we report the cloning of a new short isoform, Spatial-β, which differs from other isoforms by an additional alternative exon of 69 bases. This new exon encodes an interesting proline-rich signature that could confer a particular function on the 34 kDa Spatial-β protein. By quantitative TaqMan RT-PCR, we have shown that the short isoforms are highly expressed in the thymus while the long isoforms are highly expressed in the testis. We further examined the inter-species conservation of Spatial between several mammals and found that the protein, which is rich in proline and positive amino acids, is highly conserved. Conclusions: The Spatial gene generates at least five alternative spliced variants: three short isoforms (Spatial-α, -β and -γ) highly expressed in the thymus and two long isoforms (Spatial-δ and -ε) highly expressed in the testis. These alternative spliced variants could have a tissue-specific function.

  15. Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits.

    Science.gov (United States)

    Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C J; Gonnet, Gaston H

    2006-01-01

    Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.
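    The abstract says only that the method works from pairwise distance estimates instead of trees; it does not state the actual criterion. The sketch below is therefore not the published algorithm. It illustrates one way a distance-only check could look, under the assumption that two genes x and y grouped as orthologs become suspect when a third genome holds two genes z1 and z2 such that x is clearly closer to one and y is clearly closer to the other. The function name and the margin value are hypothetical.

```python
def conflicting_third_genome_evidence(d_x_z1, d_x_z2, d_y_z1, d_y_z2, margin=0.1):
    """Flag a putative ortholog pair (x, y) using only pairwise distances.

    Returns True when x and y each sit clearly closer (by more than `margin`)
    to a different gene of a third genome, which is hard to reconcile with
    x and y being orthologs. Illustrative criterion only.
    """
    x_prefers_z1 = d_x_z1 + margin < d_x_z2
    x_prefers_z2 = d_x_z2 + margin < d_x_z1
    y_prefers_z1 = d_y_z1 + margin < d_y_z2
    y_prefers_z2 = d_y_z2 + margin < d_y_z1
    return (x_prefers_z1 and y_prefers_z2) or (x_prefers_z2 and y_prefers_z1)


# x is closest to z1 while y is closest to z2: flagged.
flagged = conflicting_third_genome_evidence(0.2, 0.8, 0.9, 0.3)
# x and y both prefer z1: no conflict under this criterion.
consistent = conflicting_third_genome_evidence(0.2, 0.8, 0.25, 0.85)
print(flagged, consistent)  # True False
```

    A check of this kind needs only four distance lookups per candidate, which is the practical appeal of distance-based approaches over building a tree for every group.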

  16. ChickVD: a sequence variation database for the chicken genome

    DEFF Research Database (Denmark)

    Wang, Jing; He, Ximiao; Ruan, Jue

    2005-01-01

    Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DN...... on quantitative trait loci using data from collaborating institutions and public resources. Our data can be queried by search engine and homology-based BLAST searches. ChickVD is publicly accessible at http://chicken.genomics.org.cn. Publication date: 2005-Jan-1...

  17. Pharmacokinetic and Genomic Effects of Arsenite in Drinking Water on Mouse Lung in a 30-Day Exposure

    Directory of Open Access Journals (Sweden)

    Jaya Chilakapati

    2015-06-01

    Full Text Available The two objectives of this subchronic study were to determine the arsenite drinking water exposure-dependent increases in female C3H mouse liver and lung tissue arsenicals and to characterize the dose response (to 0, 0.05, 0.25, 1, 10, and 85 ppm arsenite in drinking water for 30 days and a purified AIN-93M diet) for genomic mouse lung expression patterns. Mouse lungs were analyzed for inorganic arsenic, monomethylated, and dimethylated arsenicals by hydride generation atomic absorption spectroscopy. The total lung mean arsenical levels were 1.4, 22.5, 30.1, 50.9, 105.3, and 316.4 ng/g lung tissue after 0, 0.05, 0.25, 1, 10, and 85 ppm, respectively. At 85 ppm, the total mean lung arsenical levels increased 14-fold and 131-fold when compared to either the lowest noncontrol dose (0.05 ppm) or the control dose, respectively. We found that arsenic exposure elicited minimal numbers of differentially expressed genes (DEGs; 77, 38, 90, 87, and 87 DEGs after 0.05, 0.25, 1, 10, and 85 ppm, respectively), which were associated with cardiovascular disease, development, differentiation, apoptosis, proliferation, and stress response. After 30 days of arsenite exposure, this study showed monotonic increases in mouse lung arsenical (total arsenic and dimethylarsinic acid) concentrations but no clear dose-related increases in DEG numbers.
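    The contrast drawn in this abstract, a monotonic rise in lung arsenical levels versus no clear dose-related trend in DEG counts, can be checked directly from the numbers it lists. The snippet below is a minimal sketch using only those reported values; the variable and function names are illustrative.

```python
# Values as listed in the abstract.
doses_ppm = [0, 0.05, 0.25, 1, 10, 85]
lung_arsenicals_ng_per_g = [1.4, 22.5, 30.1, 50.9, 105.3, 316.4]
deg_counts = [77, 38, 90, 87, 87]  # reported for the five non-control doses


def strictly_increasing(values):
    """True when every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


arsenicals_monotonic = strictly_increasing(lung_arsenicals_ng_per_g)
degs_monotonic = strictly_increasing(deg_counts)

# Fold change at 85 ppm relative to the lowest non-control dose (0.05 ppm).
fold_vs_lowest_dose = lung_arsenicals_ng_per_g[-1] / lung_arsenicals_ng_per_g[1]

print(arsenicals_monotonic, degs_monotonic, round(fold_vs_lowest_dose))
# True False 14
```

    The tissue levels rise at every dose step and the 85 ppm level is about 14 times the 0.05 ppm level, while the DEG counts drop from 77 to 38 between the two lowest doses and then plateau.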

  18. FGF: A web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) progr...... is freely available on a web server at http://fgf.genomics.org.cn/...

  19. Histone variant H3.3-mediated chromatin remodeling is essential for paternal genome activation in mouse preimplantation embryos.

    Science.gov (United States)

    Kong, Qingran; Banaszynski, Laura A; Geng, Fuqiang; Zhang, Xiaolei; Zhang, Jiaming; Zhang, Heng; O'Neill, Claire L; Yan, Peidong; Liu, Zhonghua; Shido, Koji; Palermo, Gianpiero D; Allis, C David; Rafii, Shahin; Rosenwaks, Zev; Wen, Duancheng

    2018-03-09

    Derepression of chromatin-mediated transcriptional repression of paternal and maternal genomes is considered the first major step that initiates zygotic gene expression after fertilization. The histone variant H3.3 is present in both male and female gametes and is thought to be important for remodeling the paternal and maternal genomes for activation during both fertilization and embryogenesis. However, the underlying mechanisms remain poorly understood. Using our H3.3B-HA-tagged mouse model, engineered to report H3.3 expression in live animals and to distinguish different sources of H3.3 protein in embryos, we show here that sperm-derived H3.3 (sH3.3) protein is removed from the sperm genome shortly after fertilization and extruded from the zygotes via the second polar bodies (PBII) during embryogenesis. We also found that the maternal H3.3 (mH3.3) protein is incorporated into the paternal genome as early as 2 h postfertilization and is detectable in the paternal genome until the morula stage. Knockdown of maternal H3.3 resulted in compromised embryonic development both of fertilized embryos and of androgenetic haploid embryos. Furthermore, we report that mH3.3 depletion in oocytes impairs both activation of the Oct4 pluripotency marker gene and global de novo transcription from the paternal genome important for early embryonic development. Our results suggest that H3.3-mediated paternal chromatin remodeling is essential for the development of preimplantation embryos and the activation of the paternal genome during embryogenesis. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.

  20. Comparing genomes: databases and computational tools for comparative analysis of prokaryotic genomes - DOI: 10.3395/reciis.v1i2.Sup.105en

    Directory of Open Access Journals (Sweden)

    Marcos Catanho

    2007-12-01

    Full Text Available Since the 1990s, the complete genetic code of more than 600 living organisms has been deciphered, including bacteria, yeasts, protozoan parasites, invertebrates and vertebrates (Homo sapiens among them), and plants. More than 2,000 other genome projects representing medical, commercial, environmental and industrial interests, or comprising model organisms important for the development of scientific research, are currently in progress. The achievement of complete genome sequences of numerous species, combined with the tremendous progress in computation that has occurred in the last few decades, has allowed the use of new holistic approaches in the study of genome structure, organization and evolution, as well as in the field of gene prediction and functional classification. Numerous public or proprietary databases and computational tools have been created in an attempt to optimize access to this information through the web. In this review, we present the main resources available through the web for comparative analysis of prokaryotic genomes. We concentrated on the group of mycobacteria, which contains important human and animal pathogens. The birth of Bioinformatics and Computational Biology and the contributions of these disciplines to the scientific development of this field are also discussed.

  1. Genome Transfer Prevents Fragmentation and Restores Developmental Potential of Developmentally Compromised Postovulatory Aged Mouse Oocytes

    Directory of Open Access Journals (Sweden)

    Mitsutoshi Yamada

    2017-03-01

    Full Text Available Changes in oocyte quality can have great impact on the developmental potential of early embryos. Here we test whether nuclear genome transfer from a developmentally incompetent to a developmentally competent oocyte can restore developmental potential. Using in vitro oocyte aging as a model system we performed nuclear transfer in mouse oocytes at metaphase II or at the first interphase, and observed that development to the blastocyst stage and to term was as efficient as in control embryos. The increased developmental potential is explained primarily by correction of abnormal cytokinesis at anaphase of meiosis and mitosis, by a reduction in chromosome segregation errors, and by normalization of the localization of chromosome passenger complex components survivin and cyclin B1. These observations demonstrate that developmental decline is primarily due to abnormal function of cytoplasmic factors involved in cytokinesis, while the genome remains developmentally fully competent.

  2. Ensembl 2002: accommodating comparative genomics.

    Science.gov (United States)

    Clamp, M; Andrews, D; Barker, D; Bevan, P; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Hubbard, T; Kasprzyk, A; Keefe, D; Lehvaslaiho, H; Iyer, V; Melsopp, C; Mongin, E; Pettett, R; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Birney, E

    2003-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation, and installations exist around the world at both companies and academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focused on developing automatic comparative genome analysis and visualisation.

  3. ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets.

    Science.gov (United States)

    Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Brown, C Titus; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas; Lemaire, Patrick

    2018-01-04

    ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Practical Value of Food Pathogen Traceability through Building a Whole-Genome Sequencing Network and Database.

    Science.gov (United States)

    Allard, Marc W; Strain, Errol; Melka, David; Bunning, Kelly; Musser, Steven M; Brown, Eric W; Timme, Ruth

    2016-08-01

    The FDA has created a United States-based open-source whole-genome sequencing network of state, federal, international, and commercial partners. The GenomeTrakr network represents a first-of-its-kind distributed genomic food shield for characterizing and tracing foodborne outbreak pathogens back to their sources. The GenomeTrakr network is leading investigations of outbreaks of foodborne illnesses and compliance actions with more accurate and rapid recalls of contaminated foods as well as more effective monitoring of preventive controls for food manufacturing environments. An expanded network would serve to provide an international rapid surveillance system for pathogen traceback, which is critical to support an effective public health response to bacterial outbreaks. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  5. Genomic resources for wild populations of the house mouse, Mus musculus and its close relative Mus spretus

    Science.gov (United States)

    Harr, Bettina; Karakoc, Emre; Neme, Rafik; Teschke, Meike; Pfeifle, Christine; Pezer, Željka; Babiker, Hiba; Linnenbrink, Miriam; Montero, Inka; Scavetta, Rick; Abai, Mohammad Reza; Molins, Marta Puente; Schlegel, Mathias; Ulrich, Rainer G.; Altmüller, Janine; Franitza, Marek; Büntge, Anna; Künzel, Sven; Tautz, Diethard

    2016-01-01

    Wild populations of the house mouse (Mus musculus) represent the raw genetic material for the classical inbred strains in biomedical research and are a major model system for evolutionary biology. We provide whole genome sequencing data of individuals representing natural populations of M. m. domesticus (24 individuals from 3 populations), M. m. helgolandicus (3 individuals), M. m. musculus (22 individuals from 3 populations) and M. spretus (8 individuals from one population). We use a single pipeline to map and call variants for these individuals and also include 10 additional individuals of M. m. castaneus for which genomic data are publicly available. In addition, RNAseq data were obtained from 10 tissues of up to eight adult individuals from each of the three M. m. domesticus populations for which genomic data were collected. Data and analyses are presented via tracks viewable in the UCSC or IGV genome browsers. We also provide information on available outbred stocks and instructions on how to keep them in the laboratory. PMID:27622383

  6. ProFITS of maize: a database of protein families involved in the transduction of signalling in the maize genome

    Directory of Open Access Journals (Sweden)

    Zhang Zhenhai

    2010-10-01

    Full Text Available Abstract. Background: Maize (Zea mays ssp. mays L.) is an important model for plant basic and applied research. In 2009, the B73 maize genome sequencing made a great step forward, using a clone-by-clone strategy; however, functional annotation and gene classification of the maize genome are still limited. Thus, well-annotated datasets and an informative database will be important for further research discoveries. Signal transduction is a fundamental biological process in living cells, and many protein families participate in this process in sensing, amplifying and responding to various extracellular or internal stimuli. Therefore, it is a good starting point to integrate information on the maize functional genes involved in signal transduction. Results: Here we introduce a comprehensive database, 'ProFITS' (Protein Families Involved in the Transduction of Signalling), which endeavours to identify and classify protein kinases/phosphatases, transcription factors and ubiquitin-proteasome-system related genes in the B73 maize genome. Users can explore gene models, corresponding transcripts and FLcDNAs using the three abovementioned protein hierarchical categories, and visualize them using an AJAX-based genome browser (JBrowse) or the Generic Genome Browser (GBrowse). Functional annotations such as GO annotation, protein signatures, and protein best-hits in the Arabidopsis and rice genomes are provided. In addition, pre-calculated transcription factor binding sites of each gene are generated and mutant information is incorporated into ProFITS. In short, ProFITS provides a user-friendly web interface for studies of the signal transduction process in maize. Conclusion: ProFITS, which utilizes both the B73 maize genome and full length cDNA (FLcDNA) datasets, provides users a comprehensive platform of maize annotation with specific focus on the categorization of families involved in the signal transduction process. ProFITS is designed as a user-friendly web interface and it is

  7. Marker list - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)


  8. Transgenerational developmental effects and genomic instability after X-irradiation of preimplantation embryos: Studies on two mouse strains

    International Nuclear Information System (INIS)

    Jacquet, P.; Buset, J.; Neefs, M.; Vankerkom, J.; Benotmane, M.A.; Derradji, H.; Hildebrandt, G.; Baatout, S.

    2010-01-01

    Recent results have shown that irradiation of a single cell, the zygote or 1-cell embryo of various mouse strains, could lead to congenital anomalies in the fetuses. In the Heiligenberger strain, a link between the radiation-induced congenital anomalies and the development of a genomic instability was also suggested. Moreover, further studies showed that in that strain, both congenital anomalies and genomic instability could be transmitted to the next generation. The aim of the experiments described in this paper was to investigate whether such non-targeted transgenerational effects could also be observed in two other radiosensitive mouse strains (CF1 and ICR), using lower radiation doses. Irradiation of the CF1 and ICR female zygotes with 0.2 or 0.4 Gy did not result in a decrease of their fertility after birth, when they had reached sexual maturity. Moreover, females of both strains that had been X-irradiated with 0.2 Gy exhibited higher rates of pregnancy, fewer resorptions and more living fetuses. Additionally, the mean weight of living fetuses in these groups was significantly increased. Exencephaly and dwarfism were observed in CF1 fetuses from control and X-irradiated females. In the control group of that strain, polydactyly and limb deformity were also found. The yields of abnormal fetuses did not differ significantly between the control and X-irradiated groups. Polydactyly, exencephaly and dwarfism were observed in fetuses from ICR control females. In addition to these anomalies, gastroschisis, curly tail and open eye were observed at low frequencies in ICR fetuses from X-irradiated females. Again, the frequencies of abnormal fetuses found in the different groups did not differ significantly. In both CF1 and ICR mouse strains, irradiation of female zygotes did not result in the development of a genomic instability in the next generation embryos. Overall, our results suggest that, at the moderate doses used, developmental defects

  9. Transgenerational developmental effects and genomic instability after X-irradiation of preimplantation embryos: Studies on two mouse strains

    Energy Technology Data Exchange (ETDEWEB)

    Jacquet, P., E-mail: pjacquet@sckcen.be [Molecular and Cellular Biology, Institute for Environment, Health and Safety, SCK.CEN, Boeretang 200, B-2400 Mol (Belgium); Buset, J.; Neefs, M. [Molecular and Cellular Biology, Institute for Environment, Health and Safety, SCK.CEN, Boeretang 200, B-2400 Mol (Belgium); Vankerkom, J. [Division of Environmental Research, VITO, Boeretang 200, B-2400 Mol (Belgium); Benotmane, M.A.; Derradji, H. [Molecular and Cellular Biology, Institute for Environment, Health and Safety, SCK.CEN, Boeretang 200, B-2400 Mol (Belgium); Hildebrandt, G. [Department of Radiotherapy and Radiation Oncology, University of Leipzig, Stephanstrasse 9a, D-04103 Leipzig (Germany); Department of Radiotherapy, University of Rostock, Suedring 75, D-18059 Rostock (Germany); Baatout, S. [Molecular and Cellular Biology, Institute for Environment, Health and Safety, SCK.CEN, Boeretang 200, B-2400 Mol (Belgium)

    2010-05-01

    Recent results have shown that irradiation of a single cell, the zygote or 1-cell embryo of various mouse strains, could lead to congenital anomalies in the fetuses. In the Heiligenberger strain, a link between the radiation-induced congenital anomalies and the development of a genomic instability was also suggested. Moreover, further studies showed that in that strain, both congenital anomalies and genomic instability could be transmitted to the next generation. The aim of the experiments described in this paper was to investigate whether such non-targeted transgenerational effects could also be observed in two other radiosensitive mouse strains (CF1 and ICR), using lower radiation doses. Irradiation of the CF1 and ICR female zygotes with 0.2 or 0.4 Gy did not result in a decrease of their fertility after birth, when they had reached sexual maturity. Moreover, females of both strains that had been X-irradiated with 0.2 Gy exhibited higher rates of pregnancy, fewer resorptions and more living fetuses. Additionally, the mean weight of living fetuses in these groups had significantly increased. Exencephaly and dwarfism were observed in CF1 fetuses issued from control and X-irradiated females. In the control group of that strain, polydactyly and limb deformity were also found. The yields of abnormal fetuses did not differ significantly between the control and X-irradiated groups. Polydactyly, exencephaly and dwarfism were observed in fetuses issued from ICR control females. In addition to these anomalies, gastroschisis, curly tail and open eye were observed at low frequencies in ICR fetuses issued from X-irradiated females. Again, the frequencies of abnormal fetuses found in the different groups did not differ significantly. In both CF1 and ICR mouse strains, irradiation of female zygotes did not result in the development of a genomic instability in the next generation embryos. Overall, our results suggest that, at the moderate doses used, developmental defects

  10. Integration of mouse and human genome-wide association data identifies KCNIP4 as an asthma gene.

    Directory of Open Access Journals (Sweden)

    Blanca E Himes

    Asthma is a common chronic respiratory disease characterized by airway hyperresponsiveness (AHR). The genetics of asthma have been widely studied in mouse and human, and homologous genomic regions have been associated with mouse AHR and human asthma-related phenotypes. Our goal was to identify asthma-related genes by integrating AHR associations in mouse with human genome-wide association study (GWAS) data. We used Efficient Mixed Model Association (EMMA) analysis to conduct a GWAS of baseline AHR measures from males and females of 31 mouse strains. Genes near or containing SNPs with EMMA p-values <0.001 were selected for further study in human GWAS. The results of the previously reported EVE consortium asthma GWAS meta-analysis consisting of 12,958 diverse North American subjects from 9 study centers were used to select a subset of homologous genes with evidence of association with asthma in humans. Following validation attempts in three human asthma GWAS (i.e., Sepracor/LOCCS/LODO/Illumina, GABRIEL, DAG) and two human AHR GWAS (i.e., SHARP, DAG), the Kv channel interacting protein 4 (KCNIP4) gene was identified as nominally associated with both asthma and AHR at a gene- and SNP-level. In EVE, the smallest KCNIP4 association was at rs6833065 (P-value 2.9e-04), while the strongest associations for Sepracor/LOCCS/LODO/Illumina, GABRIEL, DAG were 1.5e-03, 1.0e-03, 3.1e-03 at rs7664617, rs4697177, rs4696975, respectively. At a SNP level, the strongest association across all asthma GWAS was at rs4697177 (P-value 1.1e-04). The smallest P-values for association with AHR were 2.3e-03 at rs11947661 in SHARP and 2.1e-03 at rs402802 in DAG. Functional studies are required to validate the potential involvement of KCNIP4 in modulating asthma susceptibility and/or AHR. Our results suggest that a useful approach to identify genes associated with human asthma is to leverage mouse AHR association data.

  11. PeroxisomeDB: a database for the peroxisomal proteome, functional genomics and disease

    NARCIS (Netherlands)

    Schlüter, Agatha; Fourcade, Stéphane; Domènech-Estévez, Enric; Gabaldón, Toni; Huerta-Cepas, Jaime; Berthommier, Guillaume; Ripp, Raymond; Wanders, Ronald J. A.; Poch, Olivier; Pujol, Aurora

    2007-01-01

    Peroxisomes are essential organelles of eukaryotic origin, ubiquitously distributed in cells and organisms, playing key roles in lipid and antioxidant metabolism. Loss or malfunction of peroxisomes causes more than 20 fatal inherited conditions. We have created a peroxisomal database

  12. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    Science.gov (United States)

    Wang, Jia; Chen, Dijun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling

    2014-01-01

    Citrus is one of the most important and widely grown fruit crops, with global production ranking first among all fruit crops in the world. Sweet orange accounts for more than half of the Citrus production both in fresh fruit and processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to the researchers of sweet orange and other woody plants, and is freely available at http://citrus.hzau.edu.cn/.

  13. A genome survey sequencing of the Java mouse deer (Tragulus javanicus) adds new aspects to the evolution of lineage specific retrotransposons in Ruminantia (Cetartiodactyla).

    Science.gov (United States)

    Gallus, S; Kumar, V; Bertelsen, M F; Janke, A; Nilsson, M A

    2015-10-25

    Ruminantia, the ruminating, hoofed mammals (cow, deer, giraffe and allies) are an unranked artiodactylan clade. Around 50-60 million years ago the BovB retrotransposon entered the ancestral ruminantian genome through horizontal gene transfer. A survey genome screen using 454-pyrosequencing of the Java mouse deer (Tragulus javanicus) and the lesser kudu (Tragelaphus imberbis) was done to investigate and to compare the landscape of transposable elements within Ruminantia. The family Tragulidae (mouse deer) is the only representative of Tragulina and phylogenetically important, because it represents the earliest divergence in Ruminantia. The data analyses show that, relative to other ruminantian species, the lesser kudu genome has seen an expansion of BovB Long INterspersed Elements (LINEs) and BovB related Short INterspersed Elements (SINEs) like BOVA2. In comparison the genome of Java mouse deer has fewer BovB elements than other ruminants, especially Bovinae, and has in addition a novel CHR-3 SINE most likely propagated by LINE-1. By contrast the other ruminants have low amounts of CHR SINEs but high numbers of actively propagating BovB-derived and BovB-propagated SINEs. The survey sequencing data suggest that the transposable element landscape in mouse deer (Tragulina) is unique among Ruminantia, suggesting a lineage specific evolutionary trajectory that does not involve BovB mediated retrotransposition. This shows that the genomic landscape of mobile genetic elements can rapidly change in any lineage. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. Genome patterns of selection and introgression of haplotypes in natural populations of the house mouse (Mus musculus).

    Directory of Open Access Journals (Sweden)

    Fabian Staubach

    General parameters of selection, such as the frequency and strength of positive selection in natural populations or the role of introgression, are still insufficiently understood. The house mouse (Mus musculus) is a particularly well-suited model system to approach such questions, since it has a defined history of splits into subspecies and populations and since extensive genome information is available. We have used high-density single-nucleotide polymorphism (SNP) typing arrays to assess genomic patterns of positive selection and introgression of alleles in two natural populations of each of the subspecies M. m. domesticus and M. m. musculus. Applying different statistical procedures, we find a large number of regions subject to apparent selective sweeps, indicating frequent positive selection on rare alleles or novel mutations. Genes in the regions include well-studied imprinted loci (e.g. Plagl1/Zac1), homologues of human genes involved in adaptations (e.g. alpha-amylase genes) or in genetic diseases (e.g. Huntingtin and Parkin). Haplotype matching between the two subspecies reveals a large number of haplotypes that show patterns of introgression from specific populations of the respective other subspecies, with at least 10% of the genome being affected by partial or full introgression. Using neutral simulations for comparison, we find that the size and the fraction of introgressed haplotypes are not compatible with a pure migration or incomplete lineage sorting model. Hence, it appears that introgressed haplotypes can rise in frequency due to positive selection and thus can contribute to the adaptive genomic landscape of natural populations. Our data support the notion that natural genomes are subject to complex adaptive processes, including the introgression of haplotypes from other differentiated populations or species at a larger scale than previously assumed for animals. This implies that some of the admixture found in inbred strains of mice

  15. Comparison of genomic-enhanced EPD systems using an external phenotypic database

    Science.gov (United States)

    The American Angus Association (AAA) is currently evaluating two methods to incorporate genomic information into their genetic evaluation program: 1) multi-trait incorporation of an externally produced molecular breeding value as an indicator trait (MT) and 2) single-step evaluation with an unweight...

  16. Molecular Genetics Information System (MOLGENIS) : alternatives in developing local experimental genomics databases

    NARCIS (Netherlands)

    Swertz, Morris A.; Brock, E.O. (Bert) de; Hijum, Sacha A.F.T. van; Jong, Anne de; Buist, Girbe; Baerends, Richard J.S.; Kok, Jan; Kuipers, Oscar P.; Jansen, Ritsert C.

    2004-01-01

    Motivation: Genomic research laboratories need adequate infrastructure to support management of their data production and research workflow. But what makes infrastructure adequate? A lack of appropriate criteria makes any decision on buying or developing a system difficult. Here, we report on the

  17. Bridging the gap between Big Genome Data Analysis and Database Management Systems

    NARCIS (Netherlands)

    C.P. Cijvat (Robin)

    2014-01-01

    The bioinformatics field has encountered a data deluge over the last years, due to increasing speed and decreasing cost of DNA sequencing technology. Today, sequencing the DNA of a single genome only takes about a week, and it can result in up to a terabyte of data. The sequencing

  18. Genomic organization of the mouse peroxisome proliferator-activated receptor beta/delta gene

    DEFF Research Database (Denmark)

    Larsen, Leif K; Amri, Ez-Zoubir; Mandrup, Susanne

    2002-01-01

    Peroxisome proliferator-activated receptor (PPAR) beta/delta is ubiquitously expressed, but the level of expression differs markedly between different cell types. In order to determine the molecular mechanisms governing PPARbeta/delta gene expression, we have isolated and characterized the mouse...

  19. KONAGAbase: a genomic and transcriptomic database for the diamondback moth, Plutella xylostella

    OpenAIRE

    Jouraku, Akiya; Yamamoto, Kimiko; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Narukawa, Junko; Miyamoto, Kazuhisa; Kurita, Kanako; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Noda, Hiroaki

    2013-01-01

    Background: The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests for crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides such as pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. Therefore, it is important to develop genomic and transcriptomic DBM resources for analysis of genes related to insecticide resistance, both to clarify the mechanism of resistance of DBM and to fa...

  20. Sequence relationships between the genome and the intracellular RNA species 1,3,6 and 7 of mouse hepatitis virus strain A59

    NARCIS (Netherlands)

    Horzinek, M.C.; Spaan, W.J.M.; Rottier, P.J.M.; Zeijst, B.A.M. van der

    1982-01-01

    We have shown by T1 oligonucleotide fingerprinting that the genome of mouse hepatitis virus strain A59 and its intracellular RNA 1 have identical fingerprints and that RNA 1 and the subgenomic RNAs 3, 6, and 7 contain common sequences. To localize the homologous region between the RNAs, we compared

  1. Genome Editing in Mouse Spermatogonial Stem Cell Lines Using TALEN and Double-Nicking CRISPR/Cas9

    Directory of Open Access Journals (Sweden)

    Takuya Sato

    2015-07-01

    Mouse spermatogonial stem cells (SSCs) can be cultured for multiplication and maintained for long periods while preserving their spermatogenic ability. Although the cultured SSCs, named germline stem (GS) cells, are targets of genome modification, this process remains technically difficult. In the present study, we tested TALEN and double-nicking CRISPR/Cas9 on GS cells, targeting Rosa26 and Stra8 loci as representative genes dispensable and indispensable in spermatogenesis, respectively. Harvested GS cell colonies showed a high targeting efficiency with both TALEN and CRISPR/Cas9. The Rosa26-targeted GS cells differentiated into fertility-competent sperm following transplantation. On the other hand, Stra8-targeted GS cells showed defective spermatogenesis following transplantation, confirming its prime role in the initiation of meiosis. TALEN and CRISPR/Cas9, when applied in GS cells, will be valuable tools in the study of spermatogenesis and for revealing the genetic mechanism of spermatogenic failure.

  2. Genome-wide mapping in a house mouse hybrid zone reveals hybrid sterility loci and Dobzhansky-Muller interactions.

    Science.gov (United States)

    Turner, Leslie M; Harr, Bettina

    2014-12-09

    Mapping hybrid defects in contact zones between incipient species can identify genomic regions contributing to reproductive isolation and reveal genetic mechanisms of speciation. The house mouse features a rare combination of sophisticated genetic tools and natural hybrid zones between subspecies. Male hybrids often show reduced fertility, a common reproductive barrier between incipient species. Laboratory crosses have identified sterility loci, but each encompasses hundreds of genes. We map genetic determinants of testis weight and testis gene expression using offspring of mice captured in a hybrid zone between M. musculus musculus and M. m. domesticus. Many generations of admixture enable high-resolution mapping of loci contributing to these sterility-related phenotypes. We identify complex interactions among sterility loci, suggesting multiple, non-independent genetic incompatibilities contribute to barriers to gene flow in the hybrid zone.

  3. Genomics Portals: integrative web-platform for mining genomics data.

    Science.gov (United States)

    Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario

    2010-01-13

    A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  4. Genomics Portals: integrative web-platform for mining genomics data

    Directory of Open Access Journals (Sweden)

    Ghosh Krishnendu

    2010-01-01

    Background: A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results: Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion: The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  5. GeneBins: a database for classifying gene expression data, with application to plant genome arrays

    Directory of Open Access Journals (Sweden)

    Weiller Georg

    2007-03-01

    Background: To interpret microarray experiments, several ontological analysis tools have been developed. However, current tools are limited to specific organisms. Results: We developed a bioinformatics system to assign the probe set sequences of any organism to a hierarchical functional classification modelled on KEGG ontology. The GeneBins database currently supports the functional classification of expression data from four Affymetrix arrays: Arabidopsis thaliana, Oryza sativa, Glycine max and Medicago truncatula. An online analysis tool to identify relevant functions is also provided. Conclusion: GeneBins provides resources to interpret gene expression results from microarray experiments. It is available at http://bioinfoserver.rsbs.anu.edu.au/utils/GeneBins/

  6. Orthologous microRNA genes are located in cancer-associated genomic regions in human and mouse.

    Directory of Open Access Journals (Sweden)

    Igor V Makunin

    BACKGROUND: MicroRNAs (miRNAs) are short non-coding RNAs that regulate differentiation and development in many organisms and play an important role in cancer. METHODOLOGY/PRINCIPAL FINDINGS: Using a public database of mapped retroviral insertion sites from various mouse models of cancer we demonstrate that MLV-derived retroviral inserts are enriched in close proximity to mouse miRNA loci. Clustered inserts from cancer-associated regions (Common Integration Sites, CIS) have a higher association with miRNAs than non-clustered inserts. Ten CIS-associated miRNA loci containing 22 miRNAs are located within 10 kb of known CIS insertions. Only one CIS-associated miRNA locus overlaps a RefSeq protein-coding gene and six loci are located more than 10 kb from any RefSeq gene. CIS-associated miRNAs on average are more conserved in vertebrates than miRNAs associated with non-CIS inserts and their human homologs are also located in regions perturbed in cancer. In addition we show that miRNA genes are enriched around promoter and/or terminator regions of RefSeq genes in both mouse and human. CONCLUSIONS/SIGNIFICANCE: We provide a list of ten miRNA loci potentially involved in the development of blood cancer or brain tumors. There is independent experimental support from other studies for the involvement of miRNAs from at least three CIS-associated miRNA loci in cancer development.

  7. Data integration for plant genomics--exemplars from the integration of Arabidopsis thaliana databases.

    Science.gov (United States)

    Lysenko, Artem; Lysenko, Atem; Hindle, Matthew Morritt; Taubert, Jan; Saqi, Mansoor; Rawlings, Christopher John

    2009-11-01

    The development of a systems based approach to problems in plant sciences requires integration of existing information resources. However, the available information is currently often incomplete and dispersed across many sources and the syntactic and semantic heterogeneity of the data is a challenge for integration. In this article, we discuss strategies for data integration and we use a graph based integration method (Ondex) to illustrate some of these challenges with reference to two example problems concerning integration of (i) metabolic pathway and (ii) protein interaction data for Arabidopsis thaliana. We quantify the degree of overlap for three commonly used pathway and protein interaction information sources. For pathways, we find that the AraCyc database contains the widest coverage of enzyme reactions and for protein interactions we find that the IntAct database provides the largest unique contribution to the integrated dataset. For both examples, however, we observe a relatively small amount of data common to all three sources. Analysis and visual exploration of the integrated networks was used to identify a number of practical issues relating to the interpretation of these datasets. We demonstrate the utility of these approaches to the analysis of groups of coexpressed genes from an individual microarray experiment, in the context of pathway information and for the combination of coexpression data with an integrated protein interaction network.

  8. Genome-wide data-mining of candidate human splice translational efficiency polymorphisms (STEPs) and an online database.

    Directory of Open Access Journals (Sweden)

    Christopher A Raistrick

    2010-10-01

    Variation in pre-mRNA splicing is common and in some cases caused by genetic variants in intronic splicing motifs. Recent studies into the insulin gene (INS) discovered a polymorphism in a 5' non-coding intron that influences the likelihood of intron retention in the final mRNA, extending the 5' untranslated region and maintaining protein quality. Retention was also associated with increased insulin levels, suggesting that such variants--splice translational efficiency polymorphisms (STEPs)--may relate to disease phenotypes through differential protein expression. We set out to explore the prevalence of STEPs in the human genome and validate this new category of protein quantitative trait loci (pQTL) using publicly available data. Gene transcript and variant data were collected and mined for candidate STEPs in motif regions. Sequences from transcripts containing potential STEPs were analysed for evidence of splice site recognition and an effect in expressed sequence tags (ESTs). 16 publicly released genome-wide association data sets of common diseases were searched for association to candidate polymorphisms with HapMap frequency data. Our study found 3324 candidate STEPs lying in motif sequences of 5' non-coding introns and further mining revealed 170 with transcript evidence of intron retention. 21 potential STEPs had EST evidence of intron retention or exon extension, as well as population frequency data for comparison. Results suggest that the insulin STEP was not a unique example and that many STEPs may occur genome-wide with potentially causal effects in complex disease. An online database of STEPs is freely accessible at http://dbstep.genes.org.uk/.

  9. MouseMine: a new data warehouse for MGI.

    Science.gov (United States)

    Motenko, H; Neuhauser, S B; O'Keefe, M; Richardson, J E

    2015-08-01

    MouseMine (www.mousemine.org) is a new data warehouse for accessing mouse data from Mouse Genome Informatics (MGI). Based on the InterMine software framework, MouseMine supports powerful query, reporting, and analysis capabilities, the ability to save and combine results from different queries, easy integration into larger workflows, and a comprehensive Web Services layer. Through MouseMine, users can access a significant portion of MGI data in new and useful ways. Importantly, MouseMine is also a member of a growing community of online data resources based on InterMine, including those established by other model organism databases. Adopting common interfaces and collaborating on data representation standards are critical to fostering cross-species data analysis. This paper presents a general introduction to MouseMine, presents examples of its use, and discusses the potential for further integration into the MGI interface.

  10. LC-MS/MS-based proteome profiling in Daphnia pulex and Daphnia longicephala: the Daphnia pulex genome database as a key for high throughput proteomics in Daphnia

    Directory of Open Access Journals (Sweden)

    Mayr Tobias

    2009-04-01

    Background: Daphniids, commonly known as waterfleas, serve as important model systems for ecology, evolution and the environmental sciences. The sequencing and annotation of the Daphnia pulex genome both open future avenues of research on this model organism. As proteomics is not only essential to our understanding of cell function but is also a powerful validation tool for predicted genes in genome annotation projects, a first proteomic dataset is presented in this article. Results: A comprehensive set of 701,274 peptide tandem-mass-spectra, derived from Daphnia pulex, was generated, which led to the identification of 531 proteins. To measure the impact of the Daphnia pulex filtered models database for mass spectrometry based Daphnia protein identification, this result was compared with results obtained with the Swiss-Prot and the Drosophila melanogaster database. To further validate the utility of the Daphnia pulex database for research on other Daphnia species, an additional 407,778 peptide tandem-mass-spectra, obtained from Daphnia longicephala, were generated and evaluated, leading to the identification of 317 proteins. Conclusion: Peptides identified in our approach provide the first experimental evidence for the translation of a broad variety of predicted coding regions within the Daphnia genome. Furthermore, it could be demonstrated that identification of Daphnia longicephala proteins using the Daphnia pulex protein database is feasible but shows a slightly reduced identification rate. Data provided in this article clearly demonstrate that the Daphnia genome database is the key for mass spectrometry based high throughput proteomics in Daphnia.

  11. Functional role of bacteriophage transfer RNAs: codon usage analysis of genomic sequences stored in the GENBANK/EMBL/DDBJ databases

    Directory of Open Access Journals (Sweden)

    T Kunisawa

    2006-01-01

    Complete genomic sequence data are stored in the public GenBank/EMBL/DDBJ databases so that any investigator can make use of the data. This report describes a comparative analysis of codon usage that is impossible without such a public and open data system. A limited number of bacteriophages harbor their own transfer RNAs. Based on a comparison between T4 phage-encoded tRNA species and the relative cellular amounts of host Escherichia coli tRNAs, it is hypothesized that T4 tRNAs could serve to supplement host isoacceptor tRNA species that are present in minor amounts and thus enhance the translational efficiency of phage proteins. When compared to their respective host bacteria, the codon usage data of bacteriophages D3, φC31, HP1, D29 and 933W all show an increased frequency of synonymous codons or amino acids that correspond to phage tRNA species, suggesting their supplemental role in the efficient production of phage proteins. The data-analysis presents an example in which the availability of an open and fully accessible database system would allow one to obtain comprehensive insights into a fundamental problem in molecular biology.

  12. The Importance of Biological Databases in Biological Discovery.

    Science.gov (United States)

    Baxevanis, Andreas D; Bateman, Alex

    2015-06-19

    Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.

  13. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics

    Science.gov (United States)

    Cooper, Laurel; Meier, Austin; Laporte, Marie-Angélique; Elser, Justin L; Mungall, Chris; Sinn, Brandon T; Cavaliere, Dario; Carbon, Seth; Dunn, Nathan A; Smith, Barry; Qu, Botong; Preece, Justin; Zhang, Eugene; Todorovic, Sinisa; Gkoutos, Georgios; Doonan, John H; Stevenson, Dennis W; Arnaud, Elizabeth

    2018-01-01

    The Planteome project (http://www.planteome.org) provides a suite of reference and species-specific ontologies for plants and annotations to genes and phenotypes. Ontologies serve as common standards for semantic integration of a large and growing corpus of plant genomics, phenomics and genetics data. The reference ontologies include the Plant Ontology, Plant Trait Ontology and the Plant Experimental Conditions Ontology developed by the Planteome project, along with the Gene Ontology, Chemical Entities of Biological Interest, Phenotype and Attribute Ontology, and others. The project also provides access to species-specific Crop Ontologies developed by various plant breeding and research communities from around the world. We provide integrated data on plant traits, phenotypes, and gene function and expression from 95 plant taxa, annotated with reference ontology terms. The Planteome project is developing a plant gene annotation platform, Planteome Noctua, to facilitate community engagement. All the Planteome ontologies are publicly available and are maintained at the Planteome GitHub site (https://github.com/Planteome) for sharing, tracking revisions and new requests. The annotated data are freely accessible from the ontology browser (http://browser.planteome.org/amigo) and our data repository. PMID:29186578

  14. Database Description - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods. Alternative name: -. DOI: 10.18908/lsdba.nbdc01194-01-000. Cr...ers and QTLs are curated manually from the published literature. The marker information includes marker sequences, genotyping methods...

  15. Recon2Neo4j: applying graph database technologies for managing comprehensive genome-scale networks.

    Science.gov (United States)

    Balaur, Irina; Mazein, Alexander; Saqi, Mansoor; Lysenko, Artem; Rawlings, Christopher J; Auffray, Charles

    2017-04-01

    The goal of this work is to offer a computational framework for exploring data from the Recon2 human metabolic reconstruction model. Advanced user access features have been developed using the Neo4j graph database technology and this paper describes key features such as efficient management of the network data, examples of the network querying for addressing particular tasks, and how query results are converted back to the Systems Biology Markup Language (SBML) standard format. The Neo4j-based metabolic framework facilitates exploration of highly connected and comprehensive human metabolic data and identification of metabolic subnetworks of interest. A Java-based parser component has been developed to convert query results (available in the JSON format) into SBML and SIF formats in order to facilitate further results exploration, enhancement or network sharing. The Neo4j-based metabolic framework is freely available from: https://diseaseknowledgebase.etriks.org/metabolic/browser/. The Java code files developed for this work are available from the following URL: https://github.com/ibalaur/MetabolicFramework. Contact: ibalaur@eisbm.org. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
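
    The conversion of JSON query results into SIF described above can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' Java parser: the JSON layout (a list of records with `source`, `interaction` and `target` keys) and the node names are assumptions made for the example, and a real Neo4j result would need its own mapping.

```python
import json


def json_edges_to_sif(json_text):
    """Convert a JSON array of edge records into SIF text, one edge per line."""
    edges = json.loads(json_text)
    lines = []
    for edge in edges:
        # Hypothetical keys: "source", "interaction", "target".
        # SIF stores each edge as: source <TAB> interaction type <TAB> target.
        lines.append("\t".join([edge["source"], edge["interaction"], edge["target"]]))
    return "\n".join(lines)


# Made-up query result with two edges of a toy metabolic subnetwork.
example = json.dumps([
    {"source": "metabolite_A", "interaction": "substrate_of", "target": "reaction_1"},
    {"source": "reaction_1", "interaction": "produces", "target": "metabolite_B"},
])
sif_text = json_edges_to_sif(example)
print(sif_text)
```

    A tab-separated SIF file of this kind can be loaded by network tools that accept the Simple Interaction Format; SBML export would need a dedicated library and is not attempted here.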

  16. The master two-dimensional gel database of human AMA cell proteins: towards linking protein and genome sequence and mapping information (update 1991)

    DEFF Research Database (Denmark)

    Celis, J E; Leffers, H; Rasmussen, H H

    1991-01-01

    The master two-dimensional gel database of human AMA cells currently lists 3801 cellular and secreted proteins, of which 371 cellular polypeptides (306 IEF; 65 NEPHGE) were added to the master images during the last 10 months. These include: (i) very basic and acidic proteins that do not focus... "autoantigens" and "cDNAs". For convenience we have included an alphabetical list of all known proteins recorded in this database. In the long run, the main goal of this database is to link protein and DNA sequencing and mapping information (Human Genome Program) and to provide an integrated picture...

  17. Mouse genome-wide association and systems genetics identify Asxl2 as a regulator of bone mineral density and osteoclastogenesis.

    Directory of Open Access Journals (Sweden)

    Charles R Farber

    2011-04-01

    Significant advances have been made in the discovery of genes affecting bone mineral density (BMD); however, our understanding of its genetic basis remains incomplete. In the current study, genome-wide association (GWA) and co-expression network analysis were used in the recently described Hybrid Mouse Diversity Panel (HMDP) to identify and functionally characterize novel BMD genes. In the HMDP, a GWA of total body, spinal, and femoral BMD revealed four significant associations (-log10P>5.39) affecting at least one BMD trait on chromosomes (Chrs.) 7, 11, 12, and 17. The associations implicated a total of 163 genes with each association harboring between 14 and 112 genes. This list was reduced to 26 functional candidates by identifying those genes that were regulated by local eQTL in bone or harbored potentially functional non-synonymous (NS) SNPs. This analysis revealed that the most significant BMD SNP on Chr. 12 was a NS SNP in the additional sex combs like-2 (Asxl2) gene that was predicted to be functional. The involvement of Asxl2 in the regulation of bone mass was confirmed by the observation that Asxl2 knockout mice had reduced BMD. To begin to unravel the mechanism through which Asxl2 influenced BMD, a gene co-expression network was created using cortical bone gene expression microarray data from the HMDP strains. Asxl2 was identified as a member of a co-expression module enriched for genes involved in the differentiation of myeloid cells. In bone, osteoclasts are bone-resorbing cells of myeloid origin, suggesting that Asxl2 may play a role in osteoclast differentiation. In agreement, the knockdown of Asxl2 in bone marrow macrophages impaired their ability to form osteoclasts. This study identifies a new regulator of BMD and osteoclastogenesis and highlights the power of GWA and systems genetics in the mouse for dissecting complex genetic traits.
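
    The significance cut-off quoted above is stated on the -log10(P) scale. As a small worked example, the Python sketch below converts it back to a raw P-value and applies it to two made-up P-values; only the 5.39 threshold comes from the abstract, and the helper name is illustrative.

```python
import math

NEG_LOG10_P_THRESHOLD = 5.39  # cut-off quoted in the abstract


def is_significant(p_value, threshold=NEG_LOG10_P_THRESHOLD):
    """Return True when -log10(P) exceeds the threshold."""
    return -math.log10(p_value) > threshold


# Equivalent raw P-value cut-off: P < 10 ** -5.39, roughly 4.07e-6.
p_cutoff = 10 ** (-NEG_LOG10_P_THRESHOLD)
print(f"P-value cut-off: {p_cutoff:.3g}")

# Made-up P-values, for illustration only.
print(is_significant(1e-6))  # -log10(P) = 6, above the threshold
print(is_significant(1e-5))  # -log10(P) = 5, below the threshold
```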

  18. Identification, characterization and metagenome analysis of oocyte-specific genes organized in clusters in the mouse genome

    Directory of Open Access Journals (Sweden)

    Vaiman Daniel

    2005-05-01

    Background: Genes specifically expressed in the oocyte play key roles in oogenesis, ovarian folliculogenesis, fertilization and/or early embryonic development. In an attempt to identify novel oocyte-specific genes in the mouse, we have used an in silico subtraction methodology, and we have focused our attention on genes that are organized in genomic clusters. Results: In the present work, five clusters have been studied: a cluster of thirteen genes characterized by an F-box domain localized on chromosome 9, a cluster of six genes related to T-cell leukaemia/lymphoma protein 1 (Tcl1) on chromosome 12, a cluster composed of a SPErm-associated glutamate (E)-Rich (Speer) protein expressed in the oocyte in the vicinity of four unknown genes specifically expressed in the testis on chromosome 14, a cluster composed of the oocyte secreted protein-1 (Oosp-1) gene and two Oosp-related genes on chromosome 19, all three being characterized by a partial N-terminal zona pellucida-like domain, and another small cluster of two genes on chromosome 19 as well, composed of a TWIK-Related spinal cord K+ channel encoding-gene, and an unknown gene predicted in silico to be testis-specific. The specificity of expression was confirmed by RT-PCR and in situ hybridization for eight and five of them, respectively. Finally, we showed by comparing all of the isolated and clustered oocyte-specific genes identified so far in the mouse genome, that the oocyte-specific clusters are significantly closer to telomeres than isolated oocyte-specific genes are. Conclusion: We have studied five clusters of genes specifically expressed in female germ cells, some of them also being expressed in male germ cells. Moreover, in contrast to non-clustered oocyte-specific genes, those that are organized in clusters tend to map near chromosome ends, suggesting that this specific near-telomere position of oocyte-clusters in rodents could constitute an evolutionary advantage. Understanding the biological...

  19. Genome analysis methods - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Data name: Genome analysis methods. DOI: 10.18908/lsdba.nbdc01194-01-005. Description of data contents: The current status and related information of the genomic analysis about each organism (March, 2014). In the case of organisms carried out genomic analysis, the d...e File name: pgdbj_dna_marker_linkage_map_genome_analysis_methods_en.zip File URL: ftp://ftp.biosciencedbc.j

  20. ATGC database and ATGC-COGs: an updated resource for micro- and macro-evolutionary studies of prokaryotic genomes and protein family annotation.

    Science.gov (United States)

    Kristensen, David M; Wolf, Yuri I; Koonin, Eugene V

    2017-01-04

    The Alignable Tight Genomic Clusters (ATGCs) database is a collection of closely related bacterial and archaeal genomes that provides several tools to aid research into evolutionary processes in the microbial world. Each ATGC is a taxonomy-independent cluster of 2 or more completely sequenced genomes that meet the objective criteria of a high degree of local gene order (synteny) and a small number of synonymous substitutions in the protein-coding genes. As such, each ATGC is suited for analysis of microevolutionary variations within a cohesive group of organisms (e.g. species), whereas the entire collection of ATGCs is useful for macroevolutionary studies. The ATGC database includes many forms of pre-computed data, in particular ATGC-COGs (Clusters of Orthologous Genes), multiple sequence alignments, a set of 'index' orthologs representing the most well-conserved members of each ATGC-COG, the phylogenetic tree of the organisms within each ATGC, etc. Although the ATGC database contains several million proteins from thousands of genomes organized into hundreds of clusters (roughly a 4-fold increase since the last version of the ATGC database), it is now built with completely automated methods and will be regularly updated following new releases of the NCBI RefSeq database. The ATGC database is hosted jointly at the University of Iowa at dmk-brain.ecn.uiowa.edu/ATGC/ and the NCBI at ftp.ncbi.nlm.nih.gov/pub/kristensen/ATGC/atgc_home.html. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  1. Modeling genome-wide dynamic regulatory network in mouse lungs with influenza infection using high-dimensional ordinary differential equations.

    Science.gov (United States)

    Wu, Shuang; Liu, Zhi-Ping; Qiu, Xing; Wu, Hulin

    2014-01-01

    The immune response to viral infection is regulated by an intricate network of many genes and their products. The reverse engineering of gene regulatory networks (GRNs) using mathematical models from time course gene expression data collected after influenza infection is key to our understanding of the mechanisms involved in controlling influenza infection within a host. A five-step pipeline: detection of temporally differentially expressed genes, clustering genes into co-expressed modules, identification of network structure, parameter estimate refinement, and functional enrichment analysis, is developed for reconstructing high-dimensional dynamic GRNs from genome-wide time course gene expression data. Applying the pipeline to the time course gene expression data from influenza-infected mouse lungs, we have identified 20 distinct temporal expression patterns in the differentially expressed genes and constructed a module-based dynamic network using a linear ODE model. Both intra-module and inter-module annotations and regulatory relationships of our inferred network show some interesting findings and are highly consistent with existing knowledge about the immune response in mice after influenza infection. The proposed method is a computationally efficient, data-driven pipeline bridging experimental data, mathematical modeling, and statistical analysis. The application to the influenza infection data elucidates the potentials of our pipeline in providing valuable insights into systematic modeling of complicated biological processes.
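
    To make the linear ODE step concrete, the Python sketch below integrates a toy two-module system dx/dt = A x with a classical Runge-Kutta scheme. It is a sketch under stated assumptions, not the authors' pipeline: the coefficient matrix, initial values and solver are invented for illustration, whereas in the paper the coefficients are estimated from time course expression data.

```python
def simulate_linear_ode(A, x0, t_end, steps):
    """Integrate dx/dt = A x with the classical fourth-order Runge-Kutta method."""

    def deriv(x):
        # Matrix-vector product A x, written out with plain lists.
        return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

    def shifted(x, k, h):
        # x + h * k, element by element.
        return [x_i + h * k_i for x_i, k_i in zip(x, k)]

    h = t_end / steps
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        k1 = deriv(x)
        k2 = deriv(shifted(x, k1, h / 2))
        k3 = deriv(shifted(x, k2, h / 2))
        k4 = deriv(shifted(x, k3, h))
        x = [
            x_i + h / 6 * (a + 2 * b + 2 * c + d)
            for x_i, a, b, c, d in zip(x, k1, k2, k3, k4)
        ]
        trajectory.append(x)
    return trajectory


# Hypothetical regulatory coefficients: module 1 decays on its own,
# module 2 is activated by module 1 and decays more slowly.
A = [[-1.0, 0.0],
     [0.5, -0.5]]
traj = simulate_linear_ode(A, x0=[1.0, 0.0], t_end=1.0, steps=100)
print(traj[-1])
```

    Each row of A lists the assumed influence of every module on one module's rate of change, so a zero entry means no inferred regulatory edge between that pair.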

  2. Cpf1-Database: web-based genome-wide guide RNA library design for gene knockout screens using CRISPR-Cpf1.

    Science.gov (United States)

    Park, Jeongbin; Bae, Sangsu

    2018-03-15

    Following the type II CRISPR-Cas9 system, type V CRISPR-Cpf1 endonucleases have been found to be applicable for genome editing in various organisms in vivo. However, there are as yet no web-based tools capable of optimally selecting guide RNAs (gRNAs) among all possible genome-wide target sites. Here, we present Cpf1-Database, a genome-wide gRNA library design tool for LbCpf1 and AsCpf1, which have DNA recognition sequences of 5'-TTTN-3' at the 5' ends of target sites. Cpf1-Database provides a sophisticated but simple way to design gRNAs for AsCpf1 nucleases on the genome scale. One can easily access the data using a straightforward web interface, and using the powerful collections feature one can easily design gRNAs for thousands of genes in a short time. Free access at http://www.rgenome.net/cpf1-database/. Contact: sangsubae@hanyang.ac.kr.
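
    The recognition rule mentioned above (a 5'-TTTN-3' sequence at the 5' end of the target site) can be illustrated with a short scan. The Python sketch below is not the Cpf1-Database algorithm: it searches the forward strand only, does no scoring or off-target analysis, and the 23-nt protospacer length is an assumption chosen for the example.

```python
import re


def find_cpf1_targets(seq, spacer_len=23):
    """List (position, PAM, protospacer) for every TTTN site on the forward strand.

    spacer_len is an assumed protospacer length, not a value from the abstract.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=(TTT[ACGT])([ACGT]{%d}))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Made-up sequence with a single TTTA site followed by 24 nt.
demo_seq = "GG" + "TTTA" + "ACGT" * 6
targets = find_cpf1_targets(demo_seq)
for position, pam, spacer in targets:
    print(position, pam, spacer)
```

    Covering both strands would mean running the same scan on the reverse complement and mapping positions back to the forward coordinates.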

  3. ATM-deficiency increases genomic instability and metastatic potential in a mouse model of pancreatic cancer.

    Science.gov (United States)

    Drosos, Yiannis; Escobar, David; Chiang, Ming-Yi; Roys, Kathryn; Valentine, Virginia; Valentine, Marc B; Rehg, Jerold E; Sahai, Vaibhav; Begley, Lesa A; Ye, Jianming; Paul, Leena; McKinnon, Peter J; Sosa-Pineda, Beatriz

    2017-09-11

    Germline mutations in ATM (encoding the DNA-damage signaling kinase, ataxia-telangiectasia-mutated) increase Familial Pancreatic Cancer (FPC) susceptibility, and ATM somatic mutations have been identified in resected human pancreatic tumors. Here we investigated how Atm contributes to pancreatic cancer by deleting this gene in a murine model of the disease expressing oncogenic Kras (KrasG12D). We show that partial or total ATM deficiency cooperates with KrasG12D to promote highly metastatic pancreatic cancer. We also reveal that ATM is activated in pancreatic precancerous lesions in the context of DNA damage and cell proliferation, and demonstrate that ATM deficiency leads to persistent DNA damage in both precancerous lesions and primary tumors. Using low passage cultures from primary tumors and liver metastases we show that ATM loss accelerates Kras-induced carcinogenesis without conferring a specific phenotype to pancreatic tumors or changing the status of the tumor suppressors p53, p16Ink4a and p19Arf. However, ATM deficiency markedly increases the proportion of chromosomal alterations in pancreatic primary tumors and liver metastases. More importantly, ATM deficiency also renders murine pancreatic tumors highly sensitive to radiation. These and other findings in our study conclusively establish that ATM activity poses a major barrier to oncogenic transformation in the pancreas via maintaining genomic stability.

  4. CIG-DB: the database for human or mouse immunoglobulin and T cell receptor genes available for cancer studies

    Directory of Open Access Journals (Sweden)

    Furue Motoki

    2010-07-01

    Background: Immunoglobulin (IG, or antibody) and the T-cell receptor (TR) are pivotal proteins in the immune system of higher organisms. In cancer immunotherapy, the immune responses mediated by tumor-epitope-binding IG or TR play important roles in anticancer effects. Although there are public databases specific for immunological genes, their contents have not been associated with clinical studies. Therefore, we developed an integrated database of IG/TR data reported in cancer studies (the Cancer-related Immunological Gene Database [CIG-DB]). Description: This database is designed as a platform to explore public human and murine IG/TR genes sequenced in cancer studies. A total of 38,308 annotation entries for IG/TR proteins were collected from GenBank/DDBJ/EMBL and the Protein Data Bank, and 2,740 non-redundant corresponding MEDLINE references were appended. Next, we filtered the MEDLINE texts by MeSH terms, titles, and abstracts containing keywords related to cancer. After we performed a manual check, we classified the protein entries into two groups: 611 on cancer therapy (Group I) and 1,470 on hematological tumors (Group II). Thus, a total of 2,081 cancer-related IG and TR entries were tabularized. To effectively classify future entries, we developed a computational method based on text mining and canonical discriminant analysis by parsing MeSH/title/abstract words. We performed a leave-one-out cross validation for the method, which showed high accuracy rates: 94.6% for IG references and 94.7% for TR references. We also collected 920 epitope sequences bound with IG/TR. The CIG-DB is equipped with search engines for amino acid sequences and MEDLINE references, sequence analysis tools, and a 3D viewer. This database is accessible without charge or registration at http://www.scchr-cigdb.jp/, and the search results are freely downloadable. Conclusions: The CIG-DB serves as a bridge between immunological gene data and cancer studies, presenting...
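
    The leave-one-out cross validation mentioned above can be sketched in a few lines of Python. This is only an illustration of the validation scheme: a nearest-centroid classifier is swapped in for the authors' text-mining and canonical discriminant analysis method, and the two-dimensional feature vectors and labels are made up.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared distance)."""
    sums, counts = {}, {}
    for features, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((c - v) ** 2 for c, v in zip(centroid, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Made-up feature vectors, e.g. keyword scores derived from MeSH/title/abstract words.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y = ["other", "other", "other", "cancer", "cancer", "cancer"]
accuracy = leave_one_out_accuracy(X, y)
print(f"leave-one-out accuracy: {accuracy:.1%}")
```

    Because every reference is predicted by a model that never saw it, the resulting accuracy estimates how the classifier might behave on future entries.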

  5. The detailed 3D multi-loop aggregate/rosette chromatin architecture and functional dynamic organization of the human and mouse genomes

    DEFF Research Database (Denmark)

    Knoch, Tobias A; Wachsmuth, Malte; Kepper, Nick

    2016-01-01

    BACKGROUND: The dynamic three-dimensional chromatin architecture of genomes and its co-evolutionary connection to its function (the storage, expression, and replication of genetic information) is still one of the central issues in biology. Here, we describe the much debated 3D architecture of the human and mouse genomes from the nucleosomal to the megabase pair level by a novel approach combining selective high-throughput high-resolution chromosomal interaction capture (T2C), polymer simulations, and scaling analysis of the 3D architecture and the DNA sequence. RESULTS: The genome is compacted into a chromatin quasi-fibre with ~5 ± 1 nucleosomes/11 nm, folded into stable ~30-100 kbp loops forming stable loop aggregates/rosettes connected by similar sized linkers. Minor but significant variations in the architecture are seen between cell types and functional states. The architecture and the DNA sequence...

  6. Correlation between sequence conservation and structural thermodynamics of microRNA precursors from human, mouse, and chicken genomes

    Directory of Open Access Journals (Sweden)

    Wang Shengqi

    2010-10-01

    Background: Previous studies have shown that microRNA precursors (pre-miRNAs) have considerably more stable secondary structures than other native RNAs (tRNA, rRNA, and mRNA) and artificial RNA sequences. However, pre-miRNAs with ultra stable secondary structures have not been investigated. It is not known whether there is a tendency in pre-miRNA sequences towards or against ultra stable structures. Furthermore, the relationship between the structural thermodynamic stability of pre-miRNA and their evolution remains unclear. Results: We investigated the correlation between pre-miRNA sequence conservation and structural stability as measured by adjusted minimum folding free energies in pre-miRNAs isolated from human, mouse, and chicken. The analysis revealed that conserved and non-conserved pre-miRNA sequences had structures with similar average stabilities. However, the relatively ultra stable and unstable pre-miRNAs were more likely to be non-conserved than pre-miRNAs with moderate stability. Non-conserved pre-miRNAs had more G+C than A+U nucleotides, while conserved pre-miRNAs contained more A+U nucleotides. Notably, the U content of conserved pre-miRNAs was especially higher than that of non-conserved pre-miRNAs. Further investigations showed that conserved and non-conserved pre-miRNAs exhibited different structural element features, even though they had comparable levels of stability. Conclusions: We proposed that there is a correlation between structural thermodynamic stability and sequence conservation for pre-miRNAs from human, mouse, and chicken genomes. Our analyses suggested that pre-miRNAs with relatively ultra stable or unstable structures were less favoured by natural selection than those with moderately stable structures. Comparison of nucleotide compositions between non-conserved and conserved pre-miRNAs indicated the importance of U nucleotides in the pre-miRNA evolutionary process. Several characteristic structural elements were...
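
    The abstract does not spell out how the adjusted minimum folding free energy or the nucleotide contents were computed. The Python sketch below assumes a commonly used definition, the minimum folding free energy normalised to 100 nucleotides, together with simple base fractions; the sign convention, the input values and the function names are assumptions for illustration, and the folding energy itself would come from a separate RNA folding program.

```python
def adjusted_mfe(mfe_kcal_per_mol, length_nt):
    """Minimum folding free energy per 100 nt (assumed definition, sign kept as given)."""
    return 100.0 * mfe_kcal_per_mol / length_nt


def base_composition(seq):
    """Fractions of G+C, A+U and U in an RNA (or DNA) sequence."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    au = seq.count("A") + seq.count("U")
    return {"G+C": gc / n, "A+U": au / n, "U": seq.count("U") / n}


# Made-up example: an 80-nt precursor with a folding energy of -35.0 kcal/mol.
amfe = adjusted_mfe(-35.0, 80)
composition = base_composition("GGCAUU")
print(amfe)
print(composition)
```

    Normalising by length lets precursors of different sizes be compared on one stability scale.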

  7. Database Description - RMOS | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Database name: RMOS. Alternative nam...arch Unit Shoshi Kikuchi. Database classification: Plant databases - Rice Microarray Data and other Gene Expression Databases. Organism: Taxonomy Name: Oryza sativa; Taxonomy ID: 4530. Database description: The Ric...19&lang=en Whole data download: -. Referenced databases: Rice Expression Database (RED), Rice full-length cDNA Database (KOME), Rice Genome Integrated Map Database (INE), Rice Mutant Panel Database (Tos17), Rice Genome Annotation Database

  8. REDIdb: the RNA editing database.

    Science.gov (United States)

    Picardi, Ernesto; Regina, Teresa Maria Rosaria; Brennicke, Axel; Quagliariello, Carla

    2007-01-01

    The RNA Editing Database (REDIdb) is an interactive, web-based database created and designed with the aim to allocate RNA editing events such as substitutions, insertions and deletions occurring in a wide range of organisms. The database contains both fully and partially sequenced DNA molecules for which editing information is available either by experimental inspection (in vitro) or by computational detection (in silico). Each record of REDIdb is organized in a specific flat-file containing a description of the main characteristics of the entry, a feature table with the editing events and related details and a sequence zone with both the genomic sequence and the corresponding edited transcript. REDIdb is a relational database in which the browsing and identification of editing sites has been simplified by means of two facilities to either graphically display genomic or cDNA sequences or to show the corresponding alignment. In both cases, all editing sites are highlighted in colour and their relative positions are detailed by mousing over. New editing positions can be directly submitted to REDIdb after a user-specific registration to obtain authorized secure access. This first version of REDIdb database stores 9964 editing events and can be freely queried at http://biologia.unical.it/py_script/search.html.

  9. Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries

    Directory of Open Access Journals (Sweden)

    Rodrigues NB

    2002-01-01

    In the last decade microsatellites have become one of the most useful genetic markers used in a large number of organisms due to their abundance and high level of polymorphism. Microsatellites have been used for individual identification, paternity tests, forensic studies and population genetics. Data on microsatellite abundance comes preferentially from microsatellite enriched libraries and DNA sequence databases. We have conducted a search in GenBank of more than 16,000 Schistosoma mansoni ESTs and 42,000 BAC sequences. In addition, we obtained 300 sequences from CA and AT microsatellite enriched genomic libraries. The sequences were searched for simple repeats using the RepeatMasker software. Of 16,022 ESTs, we detected 481 (3%) sequences that contained 622 microsatellites (434 perfect, 164 imperfect and 24 compounds). Of the 481 ESTs, 194 were grouped in 63 clusters containing 2 to 15 ESTs per cluster. Polymorphisms were observed in 16 clusters. The 287 remaining ESTs were orphan sequences. Of the 42,017 BAC end sequences, 1,598 (3.8%) contained microsatellites (2,335 perfect, 287 imperfect and 79 compounds). Of the 1,598 BAC end sequences, 80 were grouped into 17 clusters containing 3 to 17 BAC end sequences per cluster. Microsatellites were present in 67 out of 300 sequences from microsatellite enriched libraries (55 perfect, 38 imperfect and 15 compounds). From all of the observed loci, 55 were selected for having the longest perfect repeats and flanking regions that allowed the design of primers for PCR amplification. Additionally, we describe two new polymorphic microsatellite loci.
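
    The study searched for simple repeats with RepeatMasker. As a rough illustration of what a perfect microsatellite looks like computationally, the Python sketch below swaps in a plain regular expression instead; the unit-length range and minimum repeat count are assumptions, and imperfect or compound repeats are not handled.

```python
import re


def find_perfect_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    The thresholds are illustrative defaults, not those used in the study.
    """
    seq = seq.upper()
    # Lazy unit match so the shortest repeating unit is preferred,
    # followed by at least (min_repeats - 1) further copies of that unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Made-up sequence containing a (CA)8 repeat flanked by unrelated bases.
demo_seq = "GGG" + "CA" * 8 + "TTGACC"
hits = find_perfect_microsatellites(demo_seq)
print(hits)
```

    Dedicated tools additionally score imperfect and compound repeats, which a single regular expression of this kind cannot do.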

  10. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine.

    Science.gov (United States)

    Stenson, Peter D; Mort, Matthew; Ball, Edward V; Shaw, Katy; Phillips, Andrew; Cooper, David N

    2014-01-01

    The Human Gene Mutation Database (HGMD®) is a comprehensive collection of germline mutations in nuclear genes that underlie, or are associated with, human inherited disease. By June 2013, the database contained over 141,000 different lesions detected in over 5,700 different genes, with new mutation entries currently accumulating at a rate exceeding 10,000 per annum. HGMD was originally established in 1996 for the scientific study of mutational mechanisms in human genes. However, it has since acquired a much broader utility as a central unified disease-oriented mutation repository utilized by human molecular geneticists, genome scientists, molecular biologists, clinicians and genetic counsellors as well as by those specializing in biopharmaceuticals, bioinformatics and personalized genomics. The public version of HGMD (http://www.hgmd.org) is freely available to registered users from academic institutions/non-profit organizations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via BIOBASE GmbH.

  11. TcoF-DB v2: update of the database of human and mouse transcription co-factors and transcription factor interactions

    KAUST Repository

    Schmeier, Sebastian

    2016-10-17

    Transcription factors (TFs) play a pivotal role in transcriptional regulation, making them crucial for cell survival and important biological functions. For the regulation of transcription, interactions of different regulatory proteins known as transcription co-factors (TcoFs) and TFs are essential in forming necessary protein complexes. Although TcoFs themselves do not bind DNA directly, their influence on transcriptional regulation and initiation, although indirect, has been shown to be significant, with the functionality of TFs strongly influenced by the presence of TcoFs. In the TcoF-DB v2 database, we collect information on TcoFs. In this article, we describe updates and improvements implemented in TcoF-DB v2. TcoF-DB v2 provides several new features that enable exploration of the roles of TcoFs. The content of the database has significantly expanded, and is enriched with information from Gene Ontology, biological pathways, diseases and molecular signatures. TcoF-DB v2 now includes many more TFs; has substantially increased the number of human TcoFs to 958, and now includes information on mouse (418 new TcoFs). TcoF-DB v2 enables the exploration of information on TcoFs and allows investigations into their influence on transcriptional regulation in humans and mice. TcoF-DB v2 can be accessed at http://tcofdb.org/.

  12. TcoF-DB v2: update of the database of human and mouse transcription co-factors and transcription factor interactions

    KAUST Repository

    Schmeier, Sebastian; Alam, Tanvir; Essack, Magbubah; Bajic, Vladimir B.

    2016-01-01

    Transcription factors (TFs) play a pivotal role in transcriptional regulation, making them crucial for cell survival and important biological functions. For the regulation of transcription, interactions of different regulatory proteins known as transcription co-factors (TcoFs) and TFs are essential in forming necessary protein complexes. Although TcoFs themselves do not bind DNA directly, their influence on transcriptional regulation and initiation, although indirect, has been shown to be significant, with the functionality of TFs strongly influenced by the presence of TcoFs. In the TcoF-DB v2 database, we collect information on TcoFs. In this article, we describe updates and improvements implemented in TcoF-DB v2. TcoF-DB v2 provides several new features that enable exploration of the roles of TcoFs. The content of the database has significantly expanded, and is enriched with information from Gene Ontology, biological pathways, diseases and molecular signatures. TcoF-DB v2 now includes many more TFs; has substantially increased the number of human TcoFs to 958, and now includes information on mouse (418 new TcoFs). TcoF-DB v2 enables the exploration of information on TcoFs and allows investigations into their influence on transcriptional regulation in humans and mice. TcoF-DB v2 can be accessed at http://tcofdb.org/.

  13. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999

    Energy Technology Data Exchange (ETDEWEB)

    Harger, Carol A.

    1999-10-28

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, based upon statistical inferences, the location of matrix attachment regions (MARs) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.]

  14. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999; FINAL

    International Nuclear Information System (INIS)

    Harger, Carol A.

    1999-01-01

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, based upon statistical inferences, the location of matrix attachment regions (MARs) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.]

  15. PineElm_SSRdb: a microsatellite marker database identified from genomic, chloroplast, mitochondrial and EST sequences of pineapple (Ananas comosus (L.) Merrill).

    Science.gov (United States)

    Chaudhary, Sakshi; Mishra, Bharat Kumar; Vivek, Thiruvettai; Magadum, Santoshkumar; Yasin, Jeshima Khan

    2016-01-01

    Simple Sequence Repeats (SSRs) or microsatellites are resourceful molecular genetic markers. There are only a few reports of SSR identification and development in pineapple. The complete genome sequence of pineapple available in the public domain can be used to develop numerous novel SSRs. Therefore, an attempt was made to identify SSRs from genomic, chloroplast, mitochondrial and EST sequences of pineapple, which will help in deciphering the genetic makeup of its germplasm resources. A total of 359511 SSRs were identified in pineapple (356385 from genome sequence, 45 from chloroplast sequence, 249 in mitochondrial sequence and 2832 from EST sequences). The list of EST-SSR markers and their details are available in the database. PineElm_SSRdb is an open source database available for non-commercial academic purposes at http://app.bioelm.com/ with a mapping tool which can develop circular maps of a selected marker set. This database will be of immense use to breeders, researchers and graduates working on Ananas spp. and to others working on cross-species transferability of markers, investigating diversity, mapping and DNA fingerprinting.

  16. Genomes

    National Research Council Canada - National Science Library

    Brown, T. A. (Terence A.)

    2002-01-01

    ... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...

  17. Genome engineering via homologous recombination in mouse embryonic stem (ES) cells: an amazingly versatile tool for the study of mammalian biology

    Directory of Open Access Journals (Sweden)

    BABINET CHARLES

    2001-01-01

    Full Text Available The ability to introduce genetic modifications in the germ line of complex organisms has been a long-standing goal of those who study developmental biology. In this regard, the mouse, a favorite model for the study of mammals, is unique: since the late seventies it has been possible not only to add genes to the mouse genome, as in several other complex organisms, but also to perform gene replacement and modification. This has been made possible via two technological breakthroughs: 1) the isolation and culture of embryonic stem cells (ES), which have the unique ability to colonize all the tissues of a host embryo including its germ line; 2) the development of methods allowing homologous recombination between an incoming DNA and its cognate chromosomal sequence (gene "targeting"). As a result, it has become possible to create mice bearing null mutations in any cloned gene (knock-out mice). Such a possibility has revolutionized the genetic approach to almost all aspects of the biology of the mouse. In recent years, the scope of gene targeting has been widened even more, due to the refinement of the knock-out technology: other types of genetic modifications may now be created, including subtle mutations (point mutations, micro deletions or insertions, etc.) and chromosomal rearrangements such as large deletions, duplications and translocations. Finally, methods have been devised which permit the creation of conditional mutations, allowing the study of gene function throughout the life of an animal when gene inactivation entails embryonic lethality. In this paper, we present an overview of the methods and scenarios used for the programmed modification of the mouse genome, and we underline their enormous interest for the study of mammalian biology.

  18. A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database

    Science.gov (United States)

    Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

    2016-01-01

    The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome’s content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by ... artichoke accessions, as templates. PMID:27648830

  19. The dose of HBV genome contained plasmid has a great impact on HBV persistence in hydrodynamic injection mouse model.

    Science.gov (United States)

    Li, Lei; Li, Sheng; Zhou, Yun; Yang, Lu; Zhou, Di; Yang, Yan; Lu, Mengji; Yang, Dongliang; Song, Jingjiao

    2017-10-25

    The hydrodynamic injection (HI) hepatitis B virus (HBV) mouse model is a useful tool for HBV-related research in vivo. However, in the previous study, only 40% of C57/BL6 mice injected with 10 μg of HBV genome-containing plasmid (pAAV-HBV1.2) remained serum HBsAg positive for more than 6 months, and none of the BALB/c mice injected with 10 μg pAAV-HBV1.2 plasmid DNA remained serum HBsAg positive for more than 4 weeks. In this study, C57/BL6 and BALB/c mice were hydrodynamically injected with different doses of pAAV-HBV1.2 plasmid DNA. HBV-related serum markers were detected by ELISA. ALT levels in the serum were measured using a fully automated biochemistry analyzer. HBcAg-positive cells in the liver were detected by immunohistochemical staining. The mRNA levels of IRF3, ISGs including ISG15, OAS, PKR and immune factors including IFNγ, TNFα, TGFβ, IL-6, IL-10, PDL1 in the liver of the mice were quantified by qRT-PCR. The results showed that injection of a high dose (100 μg) or a low dose (1 μg) of pAAV-HBV1.2 plasmid DNA did not exert a dominant influence on HBV persistence. In contrast, injection of an intermediate dose (5 μg) of pAAV-HBV1.2 plasmid DNA led to significantly prolonged HBsAg expression and HBV persistence in both C57/BL6 (80% of the mice HBsAg positive for more than 6 months) and BALB/c (60% of the mice HBsAg positive for more than 3 months) mice. IFNγ was significantly up-regulated in the liver of the mice injected with 1 μg or 100 μg pAAV-HBV1.2 plasmid DNA. TNFα was significantly up-regulated in the liver of the mice injected with 100 μg pAAV-HBV1.2 plasmid DNA. Moreover, PDL1 was significantly up-regulated in the liver of the mice injected with 5 μg pAAV-HBV1.2 plasmid DNA. In this paper we demonstrated that, in the HBV HI mouse model, the concentration of injected pAAV-HBV1.2 plasmid DNA contributes to the diverse kinetics of HBsAg and HBeAg in the serum as well as the HBcAg expression level in the liver, which then determined the HBV persistence, while the antiviral

  20. Genome-Wide Profiling of Liver X Receptor, Retinoid X Receptor, and Peroxisome Proliferator-Activated Receptor α in Mouse Liver Reveals Extensive Sharing of Binding Sites

    DEFF Research Database (Denmark)

    Boergesen, Michael; Pedersen, Thomas Åskov; Gross, Barbara

    2012-01-01

    and correlate with an LXR-dependent hepatic induction of lipogenic genes. To further investigate the roles of RXR and LXR in the regulation of hepatic gene expression, we have mapped the ligand-regulated genome-wide binding of these factors in mouse liver. We find that the RXR agonist bexarotene primarily......The liver X receptors (LXRs) are nuclear receptors that form permissive heterodimers with retinoid X receptor (RXR) and are important regulators of lipid metabolism in the liver. We have recently shown that RXR agonist-induced hypertriglyceridemia and hepatic steatosis in mice are dependent on LXRs...

  1. Dysregulation of mitotic machinery genes precedes genome instability during spontaneous pre-malignant transformation of mouse ovarian surface epithelial cells

    Directory of Open Access Journals (Sweden)

    Ulises Urzúa

    2016-10-01

    Full Text Available Abstract Background: Based on epidemiological evidence, repetitive ovulation has been proposed to play a role in the origin of ovarian cancer by inducing an aberrant wound rupture-repair process of the ovarian surface epithelium (OSE). Accordingly, long-term cultures of isolated OSE cells undergo in vitro spontaneous transformation, thus developing tumorigenic capacity upon extensive subcultivation. In this work, C57BL/6 mouse OSE (MOSE) cells were cultured up to passage 28 and their RNA and DNA copy number profiles obtained at passages 2, 5, 7, 10, 14, 18, 23, 25 and 28 by means of DNA microarrays. Gene ontology, pathway and network analyses were focused on passages earlier than 20, which is a hallmark of malignancy in this model. Results: At passage 14, 101 genes were up-regulated in the absence of significant DNA copy number changes. Among these, the top-3 enriched functions (>30 fold, adj p < 0.05) comprised 7 genes coding for centralspindlin, chromosome passenger and minichromosome maintenance protein complexes. The genes Ccnb1 (Cyclin B1), Birc5 (Survivin), Nusap1 and Kif23 were the most recurrent in over a dozen GO terms related to the mitotic process. On the other hand, Pten plus the large non-coding RNAs Malat1 and Neat1 were among the 80 down-regulated genes with mRNA processing, nuclear bodies, ER-stress response and tumor suppression as relevant terms. Interestingly, the earliest discrete segmental aneuploidies arose by passage 18 in chromosomes 7, 10, 11, 13, 15, 17 and 19. By passage 23, when MOSE cells express the malignant phenotype, the dysregulated gene expression repertoire expanded, and DNA imbalances enlarged in size and covered additional loci. Conclusion: Prior to early aneuploidies, overexpression of genes coding for the mitotic apparatus in passage-14 pre-malignant MOSE cells indicates an increased proliferation rate suggestive of replicative stress. Concomitant down-regulation of nuclear bodies and RNA processing related genes

  2. Development of a Method to Implement Whole-Genome Bisulfite Sequencing of cfDNA from Cancer Patients and a Mouse Tumor Model

    Directory of Open Access Journals (Sweden)

    Elaine C. Maggi

    2018-01-01

    Full Text Available The goal of this study was to develop a method for whole-genome cell-free DNA (cfDNA) methylation analysis in humans and mice, with the ultimate goal of facilitating the identification of tumor-derived DNA methylation changes in the blood. Plasma or serum from patients with pancreatic neuroendocrine tumors or lung cancer, and plasma from a murine model of pancreatic adenocarcinoma, were used to develop a protocol for cfDNA isolation, library preparation and whole-genome bisulfite sequencing of ultra-low quantities of cfDNA, including tumor-specific DNA. The protocol developed produced high-quality libraries, consistently generating a conversion rate >98%, that will be applicable for the analysis of human and mouse plasma or serum to detect tumor-derived changes in DNA methylation.

  3. Conserved cis-regulatory regions in a large genomic landscape control SHH and BMP-regulated Gremlin1 expression in mouse limb buds

    Directory of Open Access Journals (Sweden)

    Zuniga Aimée

    2012-08-01

    Full Text Available Abstract Background: Mouse limb bud is a prime model to study the regulatory interactions that control vertebrate organogenesis. Major aspects of limb bud development are controlled by feedback loops that define a self-regulatory signalling system. The SHH/GREM1/AER-FGF feedback loop forms the core of this signalling system that operates between the posterior mesenchymal organiser and the ectodermal signalling centre. The BMP antagonist Gremlin1 (GREM1) is a critical node in this system, whose dynamic expression is controlled by BMP, SHH, and FGF signalling and key to normal progression of limb bud development. Previous analysis identified a distant cis-regulatory landscape within the neighbouring Formin1 (Fmn1) locus that is required for Grem1 expression, reminiscent of the genomic landscapes controlling HoxD and Shh expression in limb buds. Results: Three highly conserved regions (HMCO1-3) were identified within the previously defined critical genomic region and tested for their ability to regulate Grem1 expression in mouse limb buds. Using a combination of BAC and conventional transgenic approaches, a 9 kb region located ~70 kb downstream of the Grem1 transcription unit was identified. This region, termed Grem1 Regulatory Sequence 1 (GRS1), is able to recapitulate major aspects of Grem1 expression, as it drives expression of a LacZ reporter into the posterior and, to a lesser extent, in the distal-anterior mesenchyme. Crossing the GRS1 transgene into embryos with alterations in the SHH and BMP pathways established that GRS1 depends on SHH and is modulated by BMP signalling, i.e. integrates inputs from these pathways. Chromatin immunoprecipitation revealed interaction of endogenous GLI3 proteins with the core cis-regulatory elements in the GRS1 region. As GLI3 is a mediator of SHH signal transduction, these results indicated that SHH directly controls Grem1 expression through the GRS1 region. Finally, all cis-regulatory regions within the Grem1

  4. Construction of an Ostrea edulis database from genomic and expressed sequence tags (ESTs) obtained from Bonamia ostreae infected haemocytes: Development of an immune-enriched oligo-microarray.

    Science.gov (United States)

    Pardo, Belén G; Álvarez-Dios, José Antonio; Cao, Asunción; Ramilo, Andrea; Gómez-Tato, Antonio; Planas, Josep V; Villalba, Antonio; Martínez, Paulino

    2016-12-01

    The flat oyster, Ostrea edulis, is one of the main farmed oysters, not only in Europe but also in the United States and Canada. Bonamiosis due to the parasite Bonamia ostreae has been associated with high mortality episodes in this species. This parasite is an intracellular protozoan that infects haemocytes, the main cells involved in oyster defence. Due to the economical and ecological importance of flat oyster, genomic data are badly needed for genetic improvement of the species, but they are still very scarce. The objective of this study is to develop a sequence database, OedulisDB, with new genomic and transcriptomic resources, providing new data and convenient tools to improve our knowledge of the oyster's immune mechanisms. Transcriptomic and genomic sequences were obtained using 454 pyrosequencing and compiled into an O. edulis database, OedulisDB, consisting of two sets of 10,318 and 7159 unique sequences that represent the oyster's genome (WG) and de novo haemocyte transcriptome (HT), respectively. The flat oyster transcriptome was obtained from two strains (naïve and tolerant) challenged with B. ostreae, and from their corresponding non-challenged controls. Approximately 78.5% of 5619 HT unique sequences were successfully annotated by Blast search using public databases. A total of 984 sequences were identified as being related to immune response and several key immune genes were identified for the first time in flat oyster. Additionally, transcriptome information was used to design and validate the first oligo-microarray in flat oyster enriched with immune sequences from haemocytes. Our transcriptomic and genomic sequencing and subsequent annotation have largely increased the scarce resources available for this economically important species and have enabled us to develop an OedulisDB database and accompanying tools for gene expression analysis. This study represents the first attempt to characterize in depth the O. edulis haemocyte transcriptome in

  5. Using the Pathogen-Host Interactions database (PHI-base) to investigate plant pathogen genomes and genes implicated in virulence

    Directory of Open Access Journals (Sweden)

    Martin Urban

    2015-08-01

    Full Text Available New pathogen-host interaction mechanisms can be revealed by integrating mutant phenotype data with genetic information. PHI-base is a multi-species manually curated database combining peer-reviewed published phenotype data from plant and animal pathogens and gene/protein information in a single database.

  6. BGDB: a database of bivalent genes.

    Science.gov (United States)

    Li, Qingyan; Lian, Shuabin; Dai, Zhiming; Xiang, Qian; Dai, Xianhua

    2013-01-01

    A bivalent gene is a gene marked with both H3K4me3 and H3K27me3 epigenetic modifications in the same region, and is proposed to play a pivotal role related to pluripotency in embryonic stem (ES) cells. Identifying these bivalent genes and understanding their functions are important for further research on lineage specification and embryo development. So far, a large amount of genome-wide histone modification data has been generated in mouse and human ES cells. These valuable data make it possible to identify bivalent genes, but no comprehensive data repositories or analysis tools are currently available for bivalent genes. In this work, we develop BGDB, the database of bivalent genes. The database contains 6897 bivalent genes in human and mouse ES cells, which were manually collected from the scientific literature. Each entry contains curated information, including genomic context, sequences, gene ontology and other relevant information. The web services of the BGDB database were implemented with PHP + MySQL + JavaScript, and provide diverse query functions. Database URL: http://dailab.sysu.edu.cn/bgdb/
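
    The definition above (both H3K4me3 and H3K27me3 marks in the same region) can be sketched as an interval-overlap test. This is only an illustration of the definition: BGDB entries were collected manually from the literature, and the function names, the promoter windows and the peak coordinates below are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def has_peak(region, peaks):
    """True if any (chrom, start, end) peak overlaps the given region."""
    chrom, start, end = region
    return any(c == chrom and overlaps(start, end, s, e) for c, s, e in peaks)


def bivalent_genes(promoters, k4me3_peaks, k27me3_peaks):
    """Return sorted gene names whose promoter window carries both marks.

    promoters: dict mapping gene name -> (chrom, start, end)
    k4me3_peaks, k27me3_peaks: lists of (chrom, start, end) tuples
    """
    return sorted(
        gene
        for gene, region in promoters.items()
        if has_peak(region, k4me3_peaks) and has_peak(region, k27me3_peaks)
    )


if __name__ == "__main__":
    # Hypothetical toy coordinates, for illustration only.
    promoters = {
        "geneA": ("chr1", 1000, 3000),
        "geneB": ("chr1", 8000, 10000),
        "geneC": ("chr2", 1000, 3000),
    }
    k4 = [("chr1", 1500, 2500), ("chr1", 8500, 9000)]
    k27 = [("chr1", 2000, 4000), ("chr2", 1200, 1800)]
    print(bivalent_genes(promoters, k4, k27))
```

    A linear scan over peaks is adequate for a toy example; genome-scale peak sets would normally use a sorted or tree-based interval index.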

  7. Database Description - RED | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available ase Description General information of database Database name RED Alternative name Rice Expression Database...enome Research Unit Shoshi Kikuchi E-mail : Database classification Plant databases - Rice Database classifi...cation Microarray, Gene Expression Organism Taxonomy Name: Oryza sativa Taxonomy ID: 4530 Database descripti... Article title: Rice Expression Database: the gateway to rice functional genomics...nt Science (2002) Dec 7 (12):563-564 External Links: Original website information Database maintenance site

  8. Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L.).

    Directory of Open Access Journals (Sweden)

    Bin Han

    Full Text Available Microsatellites or simple sequence repeats (SSRs) are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Though an ordered draft sequence of hexaploid bread wheat has been announced, a systematic analysis of SSRs for wheat has not been reported so far. In the present study, we identified 364,347 SSRs from among 10,603,760 sequences of the Chinese spring wheat (CSW) genome, which were present at a density of 36.68 SSR/Mb. In total, we detected 488 types of motifs ranging from di- to hexanucleotides, among which dinucleotide repeats dominated, accounting for approximately 42.52% of the genome. The density of tri- to hexanucleotide repeats was 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent repeats among di- to hexanucleotide repeats. Among the 21 chromosomes of CSW, the density of repeats was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats on each chromosome, and even on the whole genome, were almost identical. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, which cover the entire genome at a density of 29.73 per Mb. All of the SSR markers were validated by reverse electronic-Polymerase Chain Reaction (re-PCR); 70,564 (23.9%) were found to be monomorphic and 224,703 (76.1%) were found to be polymorphic. A total of 45 monomorphic markers were selected randomly for validation purposes; 24 (53.3%) amplified one locus, 8 (17.8%) amplified multiple identical loci, and 13 (28.9%) did not amplify any fragments from the genomic DNA of CSW. Then a dendrogram was generated based on the 24 monomorphic SSR markers among 20 wheat cultivars and three species of its diploid ancestors showing that monomorphic SSR markers represented a promising

  9. Global Metabolic Reconstruction and Metabolic Gene Evolution in the Cattle Genome

    Science.gov (United States)

    Kim, Woonsu; Park, Hyesun; Seo, Seongwon

    2016-01-01

    The sequence of the cattle genome provided a valuable opportunity to systematically link genetic and metabolic traits of cattle. The objectives of this study were 1) to reconstruct genome-scale cattle-specific metabolic pathways based on the most recent and updated cattle genome build and 2) to identify duplicated metabolic genes in the cattle genome for better understanding of metabolic adaptations in cattle. A bioinformatic pipeline for amalgamating an organism's genomic annotations from multiple sources was updated. Using this, an amalgamated cattle genome database based on UMD_3.1 was created. The amalgamated cattle genome database is composed of a total of 33,292 genes: 19,123 consensus genes between NCBI and Ensembl databases, 8,410 and 5,493 genes only found in NCBI or Ensembl, respectively, and 266 genes from NCBI scaffolds. A metabolic reconstruction of the cattle genome and cattle pathway genome database (PGDB) was also developed using Pathway Tools, followed by an intensive manual curation. The manual curation filled or revised 68 pathway holes, deleted 36 metabolic pathways, and added 23 metabolic pathways. Consequently, the curated cattle PGDB contains 304 metabolic pathways, 2,460 reactions including 2,371 enzymatic reactions, and 4,012 enzymes. Furthermore, this study identified eight duplicated genes in 12 metabolic pathways in the cattle genome compared to human and mouse. Some of these duplicated genes are related to specific hormone biosynthesis and detoxifications. The updated genome-scale metabolic reconstruction is a useful tool for understanding biology and metabolic characteristics in cattle. There have been significant improvements in the quality of cattle genome annotations and the MetaCyc database. The duplicated metabolic genes in the cattle genome compared to human and mouse imply evolutionary changes in the cattle genome and provide useful information for further research on understanding metabolic adaptations of cattle. PMID
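
    The breakdown of the amalgamated database into consensus, source-specific and scaffold genes can be pictured as set operations over gene identifiers. The sketch below assumes, purely for illustration, that genes from the two sources have already been mapped to a shared identifier; the actual pipeline's matching rules are not described in the abstract, and the function names are hypothetical.

```python
def amalgamate(ncbi_ids, ensembl_ids, scaffold_ids=()):
    """Partition gene identifiers from two annotation sources.

    Assumes identifiers are already comparable across sources (an
    assumption made for this sketch, not a statement about the pipeline).
    """
    ncbi, ensembl = set(ncbi_ids), set(ensembl_ids)
    return {
        "consensus": ncbi & ensembl,      # found in both sources
        "ncbi_only": ncbi - ensembl,      # found only in NCBI
        "ensembl_only": ensembl - ncbi,   # found only in Ensembl
        "scaffold": set(scaffold_ids),    # genes placed on scaffolds
    }


def total_genes(parts):
    """Total number of genes across all partitions."""
    return sum(len(ids) for ids in parts.values())


# Counts reported in the abstract; their sum matches the stated total.
REPORTED = {"consensus": 19123, "ncbi_only": 8410, "ensembl_only": 5493, "scaffold": 266}
assert sum(REPORTED.values()) == 33292

if __name__ == "__main__":
    parts = amalgamate({"A", "B", "C"}, {"B", "C", "D"}, {"S1"})
    print({name: sorted(ids) for name, ids in parts.items()}, total_genes(parts))
```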

  10. Database Description - RMG | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available ase Description General information of database Database name RMG Alternative name ...raki 305-8602, Japan National Institute of Agrobiological Sciences E-mail : Database... classification Nucleotide Sequence Databases Organism Taxonomy Name: Oryza sativa Japonica Group Taxonomy ID: 39947 Database...rnal: Mol Genet Genomics (2002) 268: 434–445 External Links: Original website information Database...available URL of Web services - Need for user registration Not available About This Database Database Descri

  11. Database Description - KOME | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available base Description General information of database Database name KOME Alternative nam... Sciences Plant Genome Research Unit Shoshi Kikuchi E-mail : Database classification Plant databases - Rice ...Organism Taxonomy Name: Oryza sativa Taxonomy ID: 4530 Database description Information about approximately ...Hayashizaki Y, Kikuchi S. Journal: PLoS One. 2007 Nov 28; 2(11):e1235. External Links: Original website information Database...OS) Rice mutant panel database (Tos17) A Database of Plant Cis-acting Regulatory

  12. ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases.

    Science.gov (United States)

    Shen, Li; Shao, Ningyi; Liu, Xiaochuan; Nestler, Eric

    2014-04-15

    Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot - a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.

  13. DNA repair efficiency in germ cells and early mouse embryos and consequences for radiation-induced transgenerational genomic damage

    Energy Technology Data Exchange (ETDEWEB)

    Marchetti, Francesco; Wyrobek, Andrew J.

    2009-01-18

    Exposure to ionizing radiation and other environmental agents can affect the genomic integrity of germ cells and induce adverse health effects in the progeny. Efficient DNA repair during gametogenesis and the early embryonic cycles after fertilization is critical for preventing transmission of DNA damage to the progeny and relies on maternal factors stored in the egg before fertilization. The ability of the maternal repair machinery to repair DNA damage in both parental genomes in the fertilizing egg is especially crucial for the fertilizing male genome that has not experienced a DNA repair-competent cellular environment for several weeks prior to fertilization. During the DNA repair-deficient period of spermatogenesis, DNA lesions may accumulate in sperm and be carried into the egg where, if not properly repaired, they could result in the formation of heritable chromosomal aberrations or mutations and associated birth defects. Studies with female mice deficient in specific DNA repair genes have shown that: (i) cell cycle checkpoints are activated in the fertilized egg by DNA damage carried by the sperm; and (ii) the maternal genotype plays a major role in determining the efficiency of repairing genomic lesions in the fertilizing sperm and directly affects the risk for abnormal reproductive outcomes. There is also growing evidence that implicates DNA damage carried by the fertilizing gamete as a mediator of postfertilization processes that contribute to genomic instability in subsequent generations. Transgenerational genomic instability most likely involves epigenetic mechanisms or error-prone DNA repair processes in the early embryo. Maternal and embryonic DNA repair processes during the early phases of mammalian embryonic development can have far-reaching consequences for the genomic integrity and health of subsequent generations.

  14. Comparison of gene coverage of mouse oligonucleotide microarray platforms

    Directory of Open Access Journals (Sweden)

    Medrano Juan F

    2006-03-01

    Full Text Available Abstract Background: The increasing use of DNA microarrays for genetical genomics studies generates a need for platforms with complete coverage of the genome. We have compared the effective gene coverage in the mouse genome of different commercial and noncommercial oligonucleotide microarray platforms by performing an in-house gene annotation of probes. We only used information about probes that is available from vendors and followed a process that any researcher may take to find the gene targeted by a given probe. In order to make consistent comparisons between platforms, probes in each microarray were annotated with an Entrez Gene id and the chromosomal position for each gene was obtained from the UCSC Genome Browser Database. Gene coverage was estimated as the percentage of Entrez Genes with a unique position in the UCSC Genome database that is tested by a given microarray platform. Results: A MySQL relational database was created to store the mapping information for 25,416 mouse genes and for the probes in five microarray platforms (gene coverage level in parentheses): Affymetrix430 2.0 (75.6%), ABI Genome Survey (81.24%), Agilent (79.33%), Codelink (78.09%), Sentrix (90.47%); and four array-ready oligosets: Sigma (47.95%), Operon v.3 (69.89%), Operon v.4 (84.03%), and MEEBO (84.03%). The differences in coverage between platforms were highly conserved across chromosomes. Differences in the number of redundant and unspecific probes were also found among arrays. The database can be queried to compare specific genomic regions using a web interface. The software used to create, update and query the database is freely available as a toolbox named ArrayGene. Conclusion: The software developed here allows researchers to create updated custom databases by using public or proprietary information on genes for any organism. ArrayGene allows easy comparisons of gene coverage between microarray platforms for any region of the genome. The comparison presented here
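
    The coverage measure defined above (the percentage of reference genes tested by at least one probe on a platform) can be written out in a few lines. This is a sketch of the definition only, not the ArrayGene implementation; the function name `gene_coverage` and the toy probe annotations are assumptions made for the example.

```python
def gene_coverage(reference_genes, probe_to_gene):
    """Percentage of reference genes targeted by at least one probe.

    reference_genes: iterable of gene ids with a unique genomic position
    probe_to_gene: dict mapping probe id -> gene id (None if unannotated)
    """
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference gene set is empty")
    # Redundant probes for the same gene count once; probes annotated to
    # genes outside the reference set, or unannotated, are ignored.
    targeted = {gene for gene in probe_to_gene.values() if gene in reference}
    return 100.0 * len(targeted) / len(reference)


if __name__ == "__main__":
    # Hypothetical toy data: four reference genes, five probes.
    reference = [101, 102, 103, 104]
    probes = {"p1": 101, "p2": 101, "p3": 103, "p4": None, "p5": 999}
    print(f"{gene_coverage(reference, probes):.2f}%")
```

    Collapsing probes to a set of targeted genes is what keeps redundant probes from inflating the figure, which is why the abstract reports redundancy separately from coverage.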

  15. Genome-wide Analysis of RARβ Transcriptional Targets in Mouse Striatum Links Retinoic Acid Signaling with Huntington's Disease and Other Neurodegenerative Disorders.

    Science.gov (United States)

    Niewiadomska-Cimicka, Anna; Krzyżosiak, Agnieszka; Ye, Tao; Podleśny-Drabiniok, Anna; Dembélé, Doulaye; Dollé, Pascal; Krężel, Wojciech

    2017-07-01

    Retinoic acid (RA) signaling through retinoic acid receptors (RARs), known for its multiple developmental functions, emerged more recently as an important regulator of adult brain physiology. How RAR-mediated regulation is achieved is poorly known, partly due to the paucity of information on critical target genes in the brain. Also, it is not clear how reduced RA signaling may contribute to pathophysiology of diverse neuropsychiatric disorders. We report the first genome-wide analysis of RAR transcriptional targets in the brain. Using chromatin immunoprecipitation followed by high-throughput sequencing and transcriptomic analysis of RARβ-null mutant mice, we identified genomic targets of RARβ in the striatum. Characterization of RARβ transcriptional targets in the mouse striatum points to mechanisms through which RAR may control brain functions and display neuroprotective activity. Namely, our data indicate with statistical significance (FDR 0.1) a strong contribution of RARβ in controlling neurotransmission, energy metabolism, and transcription, with a particular involvement of G-protein coupled receptor (p = 5.0e-5), cAMP (p = 4.5e-4), and calcium signaling (p = 3.4e-3). Many identified RARβ target genes related to these pathways have been implicated in Alzheimer's, Parkinson's, and Huntington's disease (HD), raising the possibility that compromised RA signaling in the striatum may be a mechanistic link explaining the similar affective and cognitive symptoms in these diseases. The RARβ transcriptional targets were particularly enriched for transcripts affected in HD. Using the R6/2 transgenic mouse model of HD, we show that partial sequestration of RARβ in huntingtin protein aggregates may account for reduced RA signaling reported in HD.

  16. Genome-wide and phase-specific DNA-binding rhythms of BMAL1 control circadian output functions in mouse liver.

    Directory of Open Access Journals (Sweden)

    Guillaume Rey

    2011-02-01

    Full Text Available The mammalian circadian clock uses interlocked negative feedback loops in which the heterodimeric basic helix-loop-helix transcription factor BMAL1/CLOCK is a master regulator. While there is prominent control of liver functions by the circadian clock, the detailed links between circadian regulators and downstream targets are poorly known. Using chromatin immunoprecipitation combined with deep sequencing we obtained a time-resolved and genome-wide map of BMAL1 binding in mouse liver, which allowed us to identify over 2,000 binding sites, with peak binding narrowly centered around Zeitgeber time 6. Annotation of BMAL1 targets confirms carbohydrate and lipid metabolism as the major output of the circadian clock in mouse liver. Moreover, transcription regulators are largely overrepresented, several of which also exhibit circadian activity. Genes of the core circadian oscillator stand out as strongly bound, often at promoter and distal sites. Genomic sequence analysis of the sites identified E-boxes and tandem E1-E2 consensus elements. Electromobility shift assays showed that E1-E2 sites are bound by a dimer of BMAL1/CLOCK heterodimers with a spacing-dependent cooperative interaction, a finding that was further validated in transactivation assays. BMAL1 target genes showed cyclic mRNA expression profiles with a phase distribution centered at Zeitgeber time 10. Importantly, sites with E1-E2 elements showed tighter phases both in binding and mRNA accumulation. Finally, analyzing the temporal profiles of BMAL1 binding, precursor mRNA and mature mRNA levels showed how transcriptional and post-transcriptional regulation contribute differentially to circadian expression phase. Together, our analysis of a dynamic protein-DNA interactome uncovered how genes of the core circadian oscillator crosstalk and drive phase-specific circadian output programs in a complex tissue.

  17. Genome-wide ENU mutagenesis in combination with high density SNP analysis and exome sequencing provides rapid identification of novel mouse models of developmental disease.

    Directory of Open Access Journals (Sweden)

    Georgina Caruana

    Full Text Available Mice harbouring gene mutations that cause phenotypic abnormalities during organogenesis are invaluable tools for linking gene function to normal development and human disorders. To generate mouse models harbouring novel alleles that are involved in organogenesis we conducted a phenotype-driven, genome-wide mutagenesis screen in mice using the mutagen N-ethyl-N-nitrosourea (ENU). ENU was injected into male C57BL/6 mice and the mutations transmitted through the germ-line. ENU-induced mutations were bred to homozygosity and G3 embryos screened at embryonic day (E) 13.5 and E18.5 for abnormalities in limb and craniofacial structures, skin, blood, vasculature, lungs, gut, kidneys, ureters and gonads. From 52 pedigrees screened, 15 were detected with anomalies in one or more of the structures/organs screened. Using single nucleotide polymorphism (SNP)-based linkage analysis in conjunction with candidate gene or next-generation sequencing (NGS), we identified novel recessive alleles for Fras1, Ift140 and Lig1. In this study we have generated mouse models in which the anomalies closely mimic those seen in human disorders. The association between novel mutant alleles and phenotypes will lead to a better understanding of gene function in normal development and establish how their dysfunction causes human anomalies and disease.

  18. A Genome-wide Gene-Expression Analysis and Database in Transgenic Mice during Development of Amyloid or Tau Pathology

    Directory of Open Access Journals (Sweden)

    Mar Matarin

    2015-02-01

    Full Text Available We provide microarray data comparing genome-wide differential expression and pathology throughout life in four lines of “amyloid” transgenic mice (mutant human APP, PSEN1, or APP/PSEN1) and “TAU” transgenic mice (mutant human MAPT gene). Microarray data were validated by qPCR and by comparison to human studies, including genome-wide association study (GWAS) hits. Immune gene expression correlated tightly with plaques whereas synaptic genes correlated negatively with neurofibrillary tangles. Network analysis of immune gene modules revealed six hub genes in hippocampus of amyloid mice, four in common with cortex. The hippocampal network in TAU mice was similar except that Trem2 had hub status only in amyloid mice. The cortical network of TAU mice was entirely different with more hub genes and few in common with the other networks, suggesting reasons for specificity of cortical dysfunction in FTDP17. This Resource opens up many areas for investigation. All data are available and searchable at http://www.mouseac.org.

  19. Complete genome characterisation and phylogenetic position of Tigray hantavirus from the Ethiopian white-footed mouse, Stenocephalemys albipes

    Czech Academy of Sciences Publication Activity Database

    Goüy de Bellocq, Joëlle; Těšíková, Jana; Meheretu, Y.; Čížková, Dagmar; Bryjová, Anna; Leirs, H.; Bryja, Josef

    2016-01-01

    Vol. 45, November (2016), pp. 242-245 ISSN 1567-1348 R&D Projects: GA ČR GCP502/11/J070 Institutional support: RVO:68081766 Keywords: Hantavirus * Murinae * Ethiopia * High throughput sequencing * Genomics Subject RIV: EE - Microbiology, Virology Impact factor: 2.885, year: 2016

  20. Developmental defects and genomic instability after x-irradiation of wild-type and genetically modified mouse pre-implantation and early post-implantation embryos

    International Nuclear Information System (INIS)

    Jacquet, P

    2012-01-01

    Results obtained from the end of the 1950s suggested that ionizing radiation could induce foetal malformations in some mouse strains when administered during early pre-implantation stages. Starting in 1989, data obtained in Germany also showed that radiation exposure during that period could lead to a genomic instability in the surviving foetuses. Furthermore, the same group reported that both malformations and genomic instability could be transmitted to the next generation foetuses after exposure of zygotes to relatively high doses of radiation. As such results were of concern for radiation protection, we investigated this in more detail during recent years, using mice with varying genetic backgrounds including mice heterozygous for mutations involved in important cellular processes like DNA repair, cell cycle regulation or apoptosis. The main parameters which were investigated included morphological development, genomic instability and gene expression in the irradiated embryos or their own progeny. The aim of this review is to critically reassess the results obtained in that field in the different laboratories and to try to draw general conclusions on the risks of developmental defects and genomic instability from an exposure of early embryos to moderate doses of ionizing radiation. Altogether and in the range of doses normally used in diagnostic radiology, the risk of induction of embryonic death and of congenital malformation following the irradiation of a newly fertilised egg is certainly very low when compared to the ‘spontaneous’ risks for such effects. Similarly, the risk of radiation induction of a genomic instability under such circumstances seems to be very small. However, this is not a reason to not apply some precaution principles when possible. One way of doing this is to restrict the use of higher dose examinations on all potentially pregnant women to the first ten days of their menstrual cycle when conception is very unlikely to have occurred

  1. Mining biological databases for candidate disease genes

    Science.gov (United States)

    Braun, Terry A.; Scheetz, Todd; Webster, Gregg L.; Casavant, Thomas L.

    2001-07-01

    The publicly-funded effort to determine the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), has currently produced more than 93% of the 3 billion nucleotides of the human genome in a preliminary `draft' format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies such as expression analysis and resources (micro-arrays or gene chips). These resources are invaluable for the researchers identifying the functional genes of the genome that transcribe and translate into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of this data identified approximately 30,000 - 40,000 human `genes.' However, the bulk of the effort still remains -- to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome to genes. A fortuitous consequence of the HGP is the existence of hundreds of databases containing biological information that may contain relevant data pertaining to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high speed cluster of Linux workstations is used to analyze sequence and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl Syndrome (BBS).

  2. Genomic organization and phylogenetic utility of deer mouse (Peromyscus maniculatus) lymphotoxin-alpha and lymphotoxin-beta

    Directory of Open Access Journals (Sweden)

    Prescott Joseph

    2008-10-01

    Full Text Available Abstract Background: Deer mice (Peromyscus maniculatus) are among the most common mammals in North America and are important reservoirs of several human pathogens, including Sin Nombre hantavirus (SNV). SNV can establish a life-long apathogenic infection in deer mice, which can shed virus in excrement for transmission to humans. Patients that die from hantavirus cardiopulmonary syndrome (HCPS) have been found to express several proinflammatory cytokines, including lymphotoxin (LT), in the lungs. It is thought that these cytokines contribute to the pathogenesis of HCPS. LT is not expressed by virus-specific CD4+ T cells from infected deer mice, suggesting a limited role for this pathway in reservoir responses to hantaviruses. Results: We have cloned the genes encoding deer mouse LTα and LTβ and have found them to be highly similar to orthologous rodent sequences but with some differences in promoter elements. The phylogenetic analyses performed on the LTα, LTβ, and combined data sets yielded a strongly-supported sister-group relationship between the two murines (the house mouse and the rat). The deer mouse, a sigmodontine, appeared as the sister group to the murine clade in all of the analyses. High bootstrap values characterized the grouping of murids. Conclusion: No conspicuous differences compared to other species are present in the predicted amino acid sequences of LTα or LTβ; however, some promoter differences were noted in LTβ. Although more extensive taxonomic sampling is required to confirm the results of our analyses, the preliminary findings indicate that both genes (analyzed both separately and in combination) hold potential for resolving relationships among rodents and other mammals at the subfamily level.

  3. Linkage of cDNA expression profiles of mesencephalic dopaminergic neurons to a genome-wide in situ hybridization database

    Directory of Open Access Journals (Sweden)

    Simon Horst H

    2009-01-01

    Full Text Available Abstract Midbrain dopaminergic neurons are involved in control of emotion, motivation and motor behavior. The loss of one of the subpopulations, substantia nigra pars compacta, is the pathological hallmark of one of the most prominent neurological disorders, Parkinson's disease. Several groups have looked at the molecular identity of midbrain dopaminergic neurons and have suggested the gene expression profile of these neurons. Here, after determining the efficiency of each screen, we provide a linked database of the genes, expressed in this neuronal population, by combining and comparing the results of six previous studies and verification of expression of each gene in dopaminergic neurons, using the collection of in situ hybridization in the Allen Brain Atlas.

  4. SpirPep: an in silico digestion-based platform to assist bioactive peptides discovery from a genome-wide database.

    Science.gov (United States)

    Anekthanakul, Krittima; Hongsthong, Apiradee; Senachak, Jittisak; Ruengjitchatchawalya, Marasri

    2018-04-20

    Bioactive peptides, including biological sources-derived peptides with different biological activities, are protein fragments that influence the functions or conditions of organisms, in particular humans and animals. Conventional methods of identifying bioactive peptides are time-consuming and costly. To speed up the process, several bioinformatics tools have recently been used to facilitate screening of potential peptides prior to their activity assessment in vitro and/or in vivo. In this study, we developed an efficient computational method, SpirPep, which offers many advantages over the currently available tools. The SpirPep web application tool is a one-stop analysis and visualization facility to assist bioactive peptide discovery. The tool is equipped with 15 customized enzymes and 1-3 miscleavage options, which allows in silico digestion of protein sequences encoded by protein-coding genes from single, multiple, or genome-wide scaling, and then directly classifies the peptides by bioactivity using an in-house database that contains bioactive peptides collected from 13 public databases. With this tool, the resulting peptides are categorized by each selected enzyme, and shown in a tabular format where the peptide sequences can be tracked back to their original proteins. The developed tool and webpages are coded in PHP and HTML with CSS/JavaScript. Moreover, the tool allows protein-peptide alignment visualization by Generic Genome Browser (GBrowse) to display the region and details of the proteins and peptides within each parameter, while considering digestion design for the desirable bioactivity. SpirPep is efficient; it takes less than 20 min to digest 3000 proteins (751,860 amino acids) with 15 enzymes and three miscleavages for each enzyme, and only a few seconds for single enzyme digestion. The tool identified more bioactive peptides than the benchmarked tool; an example of validated pentapeptide (FLPIL) from LC-MS/MS was demonstrated. The
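The in silico digestion with miscleavage options described in this record can be illustrated with a minimal Python sketch. It is not SpirPep's implementation (the record says the tool itself is coded in PHP). The cleavage rule used here, a trypsin-like cut after K or R unless followed by P, and the function names `cleavage_sites` and `digest` are assumptions chosen only for illustration; the record does not list the 15 enzymes or their rules.

```python
from typing import List


def cleavage_sites(protein: str) -> List[int]:
    """Return cut positions for an assumed trypsin-like rule.

    Assumption (not from the record): cut after K or R unless the next
    residue is P. A position i means the cut falls between i-1 and i.
    """
    sites = []
    for i, aa in enumerate(protein[:-1]):
        if aa in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    return sites


def digest(protein: str, max_missed: int = 0) -> List[str]:
    """Digest a protein, allowing up to `max_missed` missed cleavages."""
    if not protein:
        return []
    bounds = [0] + cleavage_sites(protein) + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break  # no further cut site to skip over
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any database.
    print(digest("MAKGGRPLKAA", max_missed=1))
```

Each peptide produced this way could then be looked up in a table of known bioactive peptides, which is the classification step the abstract attributes to the tool's in-house database.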

  5. Mining a database of single amplified genomes from Red Sea brine pool extremophiles-improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA).

    KAUST Repository

    Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg

    2014-01-01

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant

  6. Advantages of using the CRISPR/Cas9 system of genome editing to investigate male reproductive mechanisms using mouse models.

    Science.gov (United States)

    Young, Samantha A M; Aitken, R John; Ikawa, Masahito

    2015-01-01

    Gene disruption technology has long been beneficial for the study of male reproductive biology. However, because of the time and cost involved, this technology was not a viable method except in specialist laboratories. The advent of the CRISPR/Cas9 system of gene disruption has ushered in a new era of genetic investigation. Now, it is possible to generate gene-disrupted mouse models in very little time and at very little cost. This Highlight article discusses the application of this technology to study the genetics of male fertility and looks at some of the future uses of this system that could be used to reveal the essential and nonessential genetic components of male reproductive mechanisms.

  7. RegTransBase - A Database Of Regulatory Sequences and Interactions in a Wide Range of Prokaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kazakov, Alexei E.; Cipriano, Michael J.; Novichkov, Pavel S.; Minovitsky, Simon; Vinogradov, Dmitry V.; Arkin, Adam; Mironov, Andrey A.; Gelfand, Mikhail S.; Dubchak, Inna

    2006-07-01

    RegTransBase, a manually curated database of regulatory interactions in prokaryotes, captures the knowledge in published scientific literature using a controlled vocabulary. Although a number of databases describing interactions between regulatory proteins and their binding sites are currently being maintained, they focus mostly on the model organisms Escherichia coli and Bacillus subtilis, or are entirely computationally derived. RegTransBase describes a large number of regulatory interactions reported in many organisms and contains various types of experimental data, in particular: the activation or repression of transcription by an identified direct regulator; determining the transcriptional regulatory function of a protein (or RNA) directly binding to DNA (RNA); mapping or prediction of binding site for a regulatory protein; characterization of regulatory mutations. Currently, the RegTransBase content is derived from about 3000 relevant articles describing over 7000 experiments in relation to 128 microbes. It contains data on the regulation of about 7500 genes and evidence for 6500 interactions with 650 regulators. RegTransBase also contains manually created position weight matrices (PWM) that can be used to identify candidate regulatory sites in over 60 species. RegTransBase is available at http://regtransbase.lbl.gov.
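How a position weight matrix can be used to identify candidate regulatory sites, as this record mentions, can be sketched in a few lines of Python. This is a generic log-odds PWM scan under stated assumptions, not RegTransBase's own matrices or software: the aligned example sites, the uniform background of 0.25, the pseudocount, the threshold, and the function names `build_pwm` and `scan` are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 1.0,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Build a log-odds PWM from equal-length aligned sites (A/C/G/T only)."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background)
                    for b in BASES})
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Slide the PWM along seq and report windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        score = sum(pwm[j][window[j]] for j in range(width))
        if score >= threshold:
            hits.append((i, window, score))
    return hits


if __name__ == "__main__":
    # Hypothetical aligned binding sites, used only to show the mechanics.
    example_sites = ["TTGACA", "TTGACA", "TTGACT", "TTTACA"]
    pwm = build_pwm(example_sites)
    for start, window, score in scan("GGCTTGACAGG", pwm, threshold=5.0):
        print(start, window, round(score, 2))
```

In practice the threshold would be calibrated against known sites and background sequence; the value used here is arbitrary.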

  8. Genomic sequences of murine gamma B- and gamma C-crystallin-encoding genes: promoter analysis and complete evolutionary pattern of mouse, rat and human gamma-crystallins.

    Science.gov (United States)

    Graw, J; Liebstein, A; Pietrowski, D; Schmitt-John, T; Werner, T

    1993-12-22

    The murine genes, gamma B-cry and gamma C-cry, encoding the gamma B- and gamma C-crystallins, were isolated from a genomic DNA library. The complete nucleotide (nt) sequences of both genes were determined from 661 and 711 bp, respectively, upstream from the first exon to the corresponding polyadenylation sites, comprising more than 2650 and 2890 bp, respectively. The new sequences were compared to the partial cDNA sequences available for the murine gamma B-cry and gamma C-cry, as well as to the corresponding genomic sequences from rat and man, at both the nt and predicted amino acid (aa) sequence levels. In the gamma B-cry promoter region, a canonical CCAAT-box, a TATA-box, putative NF-I and C/EBP sites were detected. An R-repeat is inserted 366 bp upstream from the transcription start point. In contrast, the gamma C-cry promoter does not contain a CCAAT-box, but some other putative binding sites for transcription factors (AP-2, UBP-1, LBP-1) were located by computer analysis. The promoter regions of all six gamma-cry from mouse, rat and human, except human psi gamma F-cry, were analyzed for common sequence elements. A complex sequence element of about 70-80 bp was found in the proximal promoter, which contains a gamma-cry-specific and almost invariant sequence (crygpel) of 14 nt, and ends with the also invariant TATA-box. Within the complex sequence element, a minimum of three further features specific for the gamma A-, gamma B- and gamma D/E/F-cry genes can be defined, at least two of which were recently shown to be functional. In addition to these four sequence elements, a subtype-specific structure of inverted repeats with different-sized spacers can be deduced from the multiple sequence alignment. 
    A phylogenetic analysis based on the promoter region, as well as the complete exon 3 of all gamma-cry from mouse, rat and man, suggests separation of only five gamma-cry subtypes (gamma A-, gamma B-, gamma C-, gamma D- and gamma E/F-cry) prior to species separation.

  9. An Integrated Molecular Database on Indian Insects.

    Science.gov (United States)

    Pratheepa, Maria; Venkatesan, Thiruvengadam; Gracy, Gandhi; Jalali, Sushil Kumar; Rangheswaran, Rajagopal; Antony, Jomin Cruz; Rai, Anil

    2018-01-01

    MOlecular Database on Indian Insects (MODII) is an online database linking several databases like Insect Pest Info, Insect Barcode Information System (IBIn), Insect Whole Genome sequence, Other Genomic Resources of National Bureau of Agricultural Insect Resources (NBAIR), Whole Genome sequencing of Honey bee viruses, Insecticide resistance gene database and Genomic tools. This database was developed with a holistic approach for collecting phenomic and genomic information on agriculturally important insects. This insect resource database is available online for free at http://cib.res.in/.

  10. The database of chromosome imbalance regions and genes resided in lung cancer from Asian and Caucasian identified by array-comparative genomic hybridization

    Directory of Open Access Journals (Sweden)

    Lo Fang-Yi

    2012-06-01

    Full Text Available Abstract Background: Cancer-related genes show racial differences. Therefore, identification and characterization of DNA copy number alteration regions in different racial groups helps to dissect the mechanism of tumorigenesis. Methods: Array-comparative genomic hybridization (array-CGH) was analyzed for DNA copy number profile in 40 Asian and 20 Caucasian lung cancer patients. Three methods including MetaCore analysis for disease and pathway correlations, concordance analysis between array-CGH database and the expression array database, and literature search for copy number variation genes were performed to select novel lung cancer candidate genes. Four candidate oncogenes were validated for DNA copy number and mRNA and protein expression by quantitative polymerase chain reaction (qPCR), chromogenic in situ hybridization (CISH), reverse transcriptase-qPCR (RT-qPCR), and immunohistochemistry (IHC) in more patients. Results: We identified 20 chromosomal imbalance regions harboring 459 genes for Caucasian and 17 regions containing 476 genes for Asian lung cancer patients. Seven common chromosomal imbalance regions harboring 117 genes, including gain on 3p13-14, 6p22.1, 9q21.13, 13q14.1, and 17p13.3, and loss on 3p22.2-22.3 and 13q13.3, were found in both Asian and Caucasian patients. Gene validation for four genes including ARHGAP19 (10q24.1) functioning in Rho activity control, FRAT2 (10q24.1) involved in Wnt signaling, PAFAH1B1 (17p13.3) functioning in motility control, and ZNF322A (6p22.1) involved in MAPK signaling was performed using qPCR and RT-qPCR. Mean gene dosage and mRNA expression level of the four candidate genes in tumor tissues were significantly higher than the corresponding normal tissues (P<0.001~P=0.06). In addition, CISH analysis of patients indicated that copy number amplification indeed occurred for ARHGAP19 and ZNF322A genes in lung cancer patients. IHC analysis of paraffin blocks from Asian and Caucasian patients demonstrated that the frequency of

  11. The database of chromosome imbalance regions and genes resided in lung cancer from Asian and Caucasian identified by array-comparative genomic hybridization

    International Nuclear Information System (INIS)

    Lo, Fang-Yi; Nandi, Suvobroto; Salgia, Ravi; Wang, Yi-Ching; Chang, Jer-Wei; Chang, I-Shou; Chen, Yann-Jang; Hsu, Han-Shui; Huang, Shiu-Feng Kathy; Tsai, Fang-Yu; Jiang, Shih Sheng; Kanteti, Rajani

    2012-01-01

    Cancer-related genes show racial differences. Therefore, identification and characterization of DNA copy number alteration regions in different racial groups helps to dissect the mechanism of tumorigenesis. Array-comparative genomic hybridization (array-CGH) was analyzed for DNA copy number profile in 40 Asian and 20 Caucasian lung cancer patients. Three methods including MetaCore analysis for disease and pathway correlations, concordance analysis between array-CGH database and the expression array database, and literature search for copy number variation genes were performed to select novel lung cancer candidate genes. Four candidate oncogenes were validated for DNA copy number and mRNA and protein expression by quantitative polymerase chain reaction (qPCR), chromogenic in situ hybridization (CISH), reverse transcriptase-qPCR (RT-qPCR), and immunohistochemistry (IHC) in more patients. We identified 20 chromosomal imbalance regions harboring 459 genes for Caucasian and 17 regions containing 476 genes for Asian lung cancer patients. Seven common chromosomal imbalance regions harboring 117 genes, including gain on 3p13-14, 6p22.1, 9q21.13, 13q14.1, and 17p13.3, and loss on 3p22.2-22.3 and 13q13.3, were found in both Asian and Caucasian patients. Gene validation for four genes including ARHGAP19 (10q24.1) functioning in Rho activity control, FRAT2 (10q24.1) involved in Wnt signaling, PAFAH1B1 (17p13.3) functioning in motility control, and ZNF322A (6p22.1) involved in MAPK signaling was performed using qPCR and RT-qPCR. Mean gene dosage and mRNA expression level of the four candidate genes in tumor tissues were significantly higher than the corresponding normal tissues (P<0.001~P=0.06). In addition, CISH analysis of patients indicated that copy number amplification indeed occurred for ARHGAP19 and ZNF322A genes in lung cancer patients. IHC analysis of paraffin blocks from Asian and Caucasian patients demonstrated that the frequency of PAFAH1B1 protein overexpression was 68

  12. RatMap--rat genome tools and data.

    Science.gov (United States)

    Petersen, Greta; Johnson, Per; Andersson, Lars; Klinga-Levan, Karin; Gómez-Fabre, Pedro M; Ståhl, Fredrik

    2005-01-01

    The rat genome database RatMap (http://ratmap.org or http://ratmap.gen.gu.se) has been one of the main resources for rat genome information since 1994. The database is maintained by CMB-Genetics at Goteborg University in Sweden and provides information on rat genes, polymorphic rat DNA-markers and rat quantitative trait loci (QTLs), all curated at RatMap. The database is under the supervision of the Rat Gene and Nomenclature Committee (RGNC); thus much attention is paid to rat gene nomenclature. RatMap presents information on rat idiograms, karyotypes and provides a unified presentation of the rat genome sequence and integrated rat linkage maps. A set of tools is also available to facilitate the identification and characterization of rat QTLs, as well as the estimation of exon/intron number and sizes in individual rat genes. Furthermore, comparative gene maps of rat in regard to mouse and human are provided.

  13. Sequences within both the 5' UTR and Gag are required for optimal in vivo packaging and propagation of mouse mammary tumor virus (MMTV) genomic RNA.

    Directory of Open Access Journals (Sweden)

    Farah Mustafa

    Full Text Available BACKGROUND: This study mapped regions of genomic RNA (gRNA) important for packaging and propagation of mouse mammary tumor virus (MMTV). MMTV is a type B betaretrovirus which preassembles intracellularly, a phenomenon distinct from retroviruses that assemble the progeny virion at cell surface just before budding such as the type C human and feline immunodeficiency viruses (HIV and FIV). Studies of FIV and Mason-Pfizer monkey virus (MPMV), a type D betaretrovirus with similar intracellular virion assembly processes as MMTV, have shown that the 5' untranslated region (5' UTR) and 5' end of gag constitute important packaging determinants for gRNA. METHODOLOGY: Three series of MMTV transfer vectors containing incremental amounts of gag or 5' UTR sequences, or incremental amounts of 5' UTR in the presence of 400 nucleotides (nt) of gag were constructed to delineate the extent of 5' sequences that may be involved in MMTV gRNA packaging. Real time PCR measured the packaging efficiency of these vector RNAs into MMTV particles generated by co-transfection of MMTV Gag/Pol, vesicular stomatitis virus envelope glycoprotein (VSV-G Env), and individual transfer vectors into human 293T cells. Transfer vector RNA propagation was monitored by measuring transduction of target HeLaT4 cells following infection with viral particles containing a hygromycin resistance gene expression cassette on the packaged RNA. PRINCIPAL FINDINGS: MMTV requires the entire 5' UTR and a minimum of ~120 nucleotides (nt) at the 5' end of gag for not only efficient gRNA packaging but also propagation of MMTV-based transfer vector RNAs. Vector RNAs without the entire 5' UTR were defective for both efficient packaging and propagation into target cells. CONCLUSIONS/SIGNIFICANCE: These results reveal that the 5' end of MMTV genome is critical for both gRNA packaging and propagation, unlike the recently delineated FIV and MPMV packaging determinants that have been shown to be of bipartite nature.

  14. Effect of Duplicate Genes on Mouse Genetic Robustness: An Update

    Directory of Open Access Journals (Sweden)

    Zhixi Su

    2014-01-01

    Full Text Available In contrast to S. cerevisiae and C. elegans, analyses based on the current knockout (KO) mouse phenotypes led to the conclusion that duplicate genes had almost no role in mouse genetic robustness. It has been suggested that the bias of mouse KO database toward ancient duplicates may possibly cause this knockout duplicate puzzle, that is, a very similar proportion of essential genes (PE) between duplicate genes and singletons. In this paper, we conducted an extensive and careful analysis for the mouse KO phenotype data and corroborated a strong effect of duplicate genes on mouse genetic robustness. Moreover, the effect of duplicate genes on mouse genetic robustness is duplication-age dependent, which holds after ruling out the potential confounding effect from coding-sequence conservation, protein-protein connectivity, functional bias, or the bias of duplicates generated by whole genome duplication (WGD). Our findings suggest that two factors, the sampling bias toward ancient duplicates and very ancient duplicates with a proportion of essential genes higher than that of singletons, have caused the mouse knockout duplicate puzzle; meanwhile, the effect of genetic buffering may be correlated with sequence conservation as well as protein-protein interactivity.

  15. Renal cell tumors with clear cell histology and intact VHL and chromosome 3p: a histological review of tumors from the Cancer Genome Atlas database.

    Science.gov (United States)

    Favazza, Laura; Chitale, Dhananjay A; Barod, Ravi; Rogers, Craig G; Kalyana-Sundaram, Shanker; Palanisamy, Nallasivam; Gupta, Nilesh S; Williamson, Sean R

    2017-11-01

    Clear cell renal cell carcinoma is by far the most common form of kidney cancer; however, a number of histologically similar tumors are now recognized and considered distinct entities. The Cancer Genome Atlas published data set was queried (http://cbioportal.org) for clear cell renal cell carcinoma tumors lacking VHL gene mutation and chromosome 3p loss, for which whole-slide images were reviewed. Of the 418 tumors in the published Cancer Genome Atlas clear cell renal cell carcinoma database, 387 had VHL mutation, copy number loss for chromosome 3p, or both (93%). Of the remaining, 27/31 had whole-slide images for review. One had 3p loss based on karyotype but not sequencing, and three demonstrated VHL promoter hypermethylation. Nine could be reclassified as distinct or emerging entities: translocation renal cell carcinoma (n=3), TCEB1 mutant renal cell carcinoma (n=3), papillary renal cell carcinoma (n=2), and clear cell papillary renal cell carcinoma (n=1). Of the remaining, 6 had other clear cell renal cell carcinoma-associated gene alterations (PBRM1, SMARCA4, BAP1, SETD2), leaving 11 specimens, including 2 high-grade or sarcomatoid renal cell carcinomas and 2 with prominent fibromuscular stroma (not TCEB1 mutant). One of the remaining tumors exhibited gain of chromosome 7 but lacked histological features of papillary renal cell carcinoma. Two tumors previously reported to harbor TFE3 gene fusions also exhibited VHL mutation, chromosome 3p loss, and morphology indistinguishable from clear cell renal cell carcinoma, the significance of which is uncertain. In summary, almost all clear cell renal cell carcinomas harbor VHL mutation, 3p copy number loss, or both. Of tumors with clear cell histology that lack these alterations, a subset can now be reclassified as other entities. Further study will determine whether additional entities exist, based on distinct genetic pathways that may have implications for treatment.

  16. Genomic Testing

    Science.gov (United States)

    ... this database. Top of Page Evaluation of Genomic Applications in Practice and Prevention (EGAPP™) In 2004, the Centers for Disease Control and Prevention launched the EGAPP initiative to establish and test a ... and other applications of genomic technology that are in transition from ...

  17. Site-specific modification of genome with cell-permeable Cre fusion protein in preimplantation mouse embryo

    International Nuclear Information System (INIS)

    Kim, Kyoungmi; Kim, Hwain; Lee, Daekee

    2009-01-01

    Site-specific recombination (SSR) by Cre recombinase and its target sequence, loxP, is a valuable tool in genetic analysis of gene function. Recently, several studies reported successful application of Cre fusion protein containing protein transduction peptide for inducing gene modification in various mammalian cells including ES cell as well as in the whole animal. In this study, we show that a short incubation of preimplantation mouse embryos with purified cell-permeable Cre fusion protein results in efficient SSR. X-Gal staining of preimplantation embryos, heterozygous for Gtrosa26tm1Sor, revealed that treatment of 1-cell or 2-cell embryos with 3 μM of Cre fusion protein for 2 h leads to Cre-mediated excision in 70-85% of embryos. We have examined the effect of the concentration of the Cre fusion protein and the duration of the treatment on embryonic development, established a condition for full term development and survival to adulthood, and demonstrated the germ line transmission of excised Gtrosa26 allele. Potential applications and advantages of the highly efficient technique described here are discussed.

  18. The PRC2-binding long non-coding RNAs in human and mouse genomes are associated with predictive sequence features

    Science.gov (United States)

    Tu, Shiqi; Yuan, Guo-Cheng; Shao, Zhen

    2017-01-01

    Recently, long non-coding RNAs (lncRNAs) have emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape the epigenetic landscape through interactions with chromatin modifying proteins. However, mechanisms contributing to the specificity of such interactions remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to have physical interactions with Polycomb repressive complex 2 (PRC2), and systematically investigated the sequence features of these lncRNAs by developing a new computational pipeline for sequence composition analysis, in which each sequence is considered as a series of transitions between adjacent nucleotides. Through that, PRC2-binding lncRNAs were found to be associated with a set of distinctive and evolutionarily conserved sequence features, which can be utilized to distinguish them from the others with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched with these sequence features, and found they show strong PRC2-binding signals and are more highly conserved across species than the other parts, implying their functional importance.
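
    The abstract above does not spell out its pipeline, so the following is only an illustrative sketch of what "a series of transitions between adjacent nucleotides" could look like in practice: it estimates first-order transition frequencies (the probability of each next nucleotide given the current one) for a single sequence. The function name `transition_frequencies`, the RNA alphabet default, and the T-to-U normalization are assumptions made for this example and are not taken from the record.

```python
from collections import Counter


def transition_frequencies(seq, alphabet="ACGU"):
    """Estimate P(next | current) for adjacent nucleotides in one sequence.

    Hypothetical helper for illustration only; not the published pipeline.
    Returns a dict keyed by dinucleotide (e.g. "AC") with row-normalized
    frequencies. Rows with no observations are filled with 0.0.
    """
    # Normalize case and treat DNA-style input as RNA (assumption).
    seq = seq.upper().replace("T", "U")
    # Count every adjacent pair (current, next) along the sequence.
    pair_counts = Counter(zip(seq, seq[1:]))
    freqs = {}
    for current in alphabet:
        # Only transitions into alphabet symbols contribute to the row total,
        # so ambiguous characters such as N are ignored.
        total = sum(pair_counts[(current, nxt)] for nxt in alphabet)
        for nxt in alphabet:
            count = pair_counts[(current, nxt)]
            freqs[current + nxt] = count / total if total else 0.0
    return freqs


if __name__ == "__main__":
    example = transition_frequencies("AACG")
    print(example["AA"], example["AC"], example["CG"])
```

    A 16-value vector of this kind, computed per lncRNA, is one plausible form of input for a classifier separating PRC2-binding from non-binding transcripts; the features actually used in the study may differ.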

  19. Registered plant list - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)


  20. Genome-wide screen for salmonella genes required for long-term systemic infection of the mouse.

    Directory of Open Access Journals (Sweden)

    2006-02-01

    A microarray-based negative selection screen was performed to identify Salmonella enterica serovar Typhimurium (serovar Typhimurium) genes that contribute to long-term systemic infection in 129X1/SvJ (Nramp1(r)) mice. A high-complexity transposon-mutagenized library was used to infect mice intraperitoneally, and the selective disappearance of mutants was monitored after 7, 14, 21, and 28 d postinfection. One hundred and eighteen genes were identified to contribute to serovar Typhimurium infection of the spleens of mice by 28 d postinfection. The negatively selected mutants represent many known aspects of Salmonella physiology and pathogenesis, although the majority of the identified genes are of putative or unknown function. Approximately 30% of the negatively selected genes correspond to horizontally acquired regions such as those within Salmonella pathogenicity islands (SPI) 1-5, prophages (Gifsy-1 and -2 and remnant), and the pSLT virulence plasmid. In addition, mutations in genes responsible for outer membrane structure and remodeling, such as LPS- and PhoP-regulated and fimbrial genes, were also selected against. Competitive index experiments demonstrated that the secreted SPI2 effectors SseK2 and SseJ as well as the SPI4 locus are attenuated relative to wild-type bacteria during systemic infection. Interestingly, several SPI1-encoded type III secretion system effectors/translocases are required by serovar Typhimurium to establish and, unexpectedly, to persist systemically, challenging the present description of Salmonella pathogenesis. Moreover, we observed a progressive selection against serovar Typhimurium mutants based upon the duration of the infection, suggesting that different classes of genes may be required at distinct stages of infection. Overall, these data indicate that Salmonella long-term systemic infection in the mouse requires a diverse repertoire of virulence factors. This diversity of genes presumably reflects the fact that

  1. Genome-wide analysis of DHEA- and DHT-induced gene expression in mouse hypothalamus and hippocampus.

    Science.gov (United States)

    Mo, Qianxing; Lu, Shifang; Garippa, Carrie; Brownstein, Michael J; Simon, Neal G

    2009-04-01

    Dehydroepiandrosterone (DHEA) is the most abundant steroid in humans and a multi-functional neuroactive steroid that has been implicated in a variety of biological effects in both the periphery and central nervous system. Mechanistic studies of DHEA in the periphery have emphasized its role as a prohormone and those in the brain have focused on effects exerted at cell surface receptors. Recent results demonstrated that DHEA is intrinsically androgenic. It competes with DHT for binding to androgen receptor (AR), induces AR-regulated reporter gene expression in vitro, and exogenous DHEA administration regulates gene expression in peripheral androgen-dependent tissues and LNCaP prostate cancer cells, indicating genomic effects and adding a level of complexity to functional models. The absence of information about the effect of DHEA on gene expression in the CNS is a significant gap in light of continuing clinical interest in the compound as a hormone replacement therapy in older individuals, patients with adrenal insufficiency, and as a treatment that improves sense of well-being, increases libido, relieves depressive symptoms, and serves as a neuroprotective agent. In the present study, ovariectomized CF-1 female mice, an established model for assessing CNS effects of androgens, were treated with DHEA (1 mg/day), dihydrotestosterone (DHT, a potent androgen used as a positive control; 0.1 mg/day) or vehicle (negative control) for 7 days. The effects of DHEA on gene expression were assessed in two regions of the CNS that are enriched in AR, hypothalamus and hippocampus, using DNA microarray, real-time RT-PCR, and immunohistochemistry. RIA of serum samples assessed treatment effects on circulating levels of major steroids. In hypothalamus, DHEA and DHT significantly up-regulated the gene expression of hypocretin (Hcrt; also called orexin), pro-melanin-concentrating hormone (Pmch), and protein kinase C delta (Prkcd), and down-regulated the expression of deleted in bladder

  2. Genome-wide mouse mutagenesis reveals CD45-mediated T cell function as critical in protective immunity to HSV-1.

    Directory of Open Access Journals (Sweden)

    Grégory Caignard

    2013-09-01

    Herpes simplex encephalitis (HSE) is a lethal neurological disease resulting from infection with Herpes Simplex Virus 1 (HSV-1). Loss-of-function mutations in the UNC93B1, TLR3, TRIF, TRAF3, and TBK1 genes have been associated with a human genetic predisposition to HSE, demonstrating the UNC93B-TLR3-type I IFN pathway as critical in protective immunity to HSV-1. However, the TLR3, UNC93B1, and TRIF mutations exhibit incomplete penetrance and represent only a minority of HSE cases, perhaps reflecting the effects of additional host genetic factors. In order to identify new host genes, proteins and signaling pathways involved in HSV-1 and HSE susceptibility, we have implemented the first genome-wide mutagenesis screen in an in vivo HSV-1 infectious model. One pedigree (named P43) segregated a susceptible trait with a fully penetrant phenotype. Genetic mapping and whole exome sequencing led to the identification of the causative nonsense mutation L3X in the Receptor-type tyrosine-protein phosphatase C gene (Ptprc(L3X)), which encodes the tyrosine phosphatase CD45. Expression of MCP1, IL-6, MMP3, MMP8, and the ICP4 viral gene was significantly increased in the brain stems of infected Ptprc(L3X) mice, accounting for hyper-inflammation and pathological damage caused by viral replication. The Ptprc(L3X) mutation drastically affects the early stages of thymocyte development but also the final stage of B cell maturation. Transfer of total splenocytes from heterozygous littermates into Ptprc(L3X) mice resulted in a complete HSV-1 protective effect. Furthermore, T cells were the only cell population to fully restore resistance to HSV-1 in the mutants, an effect that required both the CD4⁺ and CD8⁺ T cells and could be attributed to function of CD4⁺ T helper 1 (Th1) cells in CD8⁺ T cell recruitment to the site of infection. Altogether, these results revealed the CD45-mediated T cell function as potentially critical for infection and viral spread to the

  3. Structural basis of genomic RNA (gRNA) dimerization and packaging determinants of mouse mammary tumor virus (MMTV).

    Science.gov (United States)

    Aktar, Suriya J; Vivet-Boudou, Valérie; Ali, Lizna M; Jabeen, Ayesha; Kalloush, Rawan M; Richer, Delphine; Mustafa, Farah; Marquet, Roland; Rizvi, Tahir A

    2014-11-14

    One of the hallmarks of retroviral life cycle is the efficient and specific packaging of two copies of retroviral gRNA in the form of a non-covalent RNA dimer by the assembling virions. It is becoming increasingly clear that the process of dimerization is closely linked with gRNA packaging, and in some retroviruses, the latter depends on the former. Earlier mutational analysis of the 5' end of the MMTV genome indicated that MMTV gRNA packaging determinants comprise sequences both within the 5' untranslated region (5' UTR) and the beginning of gag. The RNA secondary structure of MMTV gRNA packaging sequences was elucidated employing selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE). SHAPE analyses revealed the presence of a U5/Gag long-range interaction (U5/Gag LRI), not predicted by minimum free-energy structure predictions, that potentially stabilizes the global structure of this region. Structure conservation along with base-pair covariations between different strains of MMTV further supported the SHAPE-validated model. The 5' region of the MMTV gRNA contains multiple palindromic (pal) sequences that could initiate intermolecular interaction during RNA dimerization. In vitro RNA dimerization, SHAPE analysis, and structure prediction approaches on a series of pal mutants revealed that MMTV RNA utilizes a palindromic point of contact to initiate intermolecular interactions between two gRNAs, leading to dimerization. This contact point resides within pal II (5' CGGCCG 3') at the 5' UTR and contains a canonical "GC" dyad and therefore likely constitutes the MMTV RNA dimerization initiation site (DIS). Further analyses of these pal mutants employing in vivo genetic approaches indicate that pal II, as well as pal sequences located in the primer binding site (PBS) are both required for efficient MMTV gRNA packaging. Employing structural prediction, biochemical, and genetic approaches, we show that pal II functions as a primary point of contact between

  4. License - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)


  5. HMMerThread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition.

    Directory of Open Access Journals (Sweden)

    Charles Richard Bradshaw

    Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in

  6. An automated system designed for large scale NMR data deposition and annotation: application to over 600 assigned chemical shift data entries to the BioMagResBank from the Riken Structural Genomics/Proteomics Initiative internal database

    International Nuclear Information System (INIS)

    Kobayashi, Naohiro; Harano, Yoko; Tochio, Naoya; Nakatani, Eiichi; Kigawa, Takanori; Yokoyama, Shigeyuki; Mading, Steve; Ulrich, Eldon L.; Markley, John L.; Akutsu, Hideo; Fujiwara, Toshimichi

    2012-01-01

    Biomolecular NMR chemical shift data are key information for the functional analysis of biomolecules and the development of new techniques for NMR studies utilizing chemical shift statistical information. Structural genomics projects are major contributors to the accumulation of protein chemical shift information. The management of the large quantities of NMR data generated by each project in a local database and the transfer of the data to the public databases are still formidable tasks because of the complicated nature of NMR data. Here we report an automated and efficient system developed for the deposition and annotation of a large number of data sets including 1H, 13C and 15N resonance assignments used for the structure determination of proteins. We have demonstrated the feasibility of our system by applying it to over 600 entries from the internal database generated by the RIKEN Structural Genomics/Proteomics Initiative (RSGI) to the public database, BioMagResBank (BMRB). We have assessed the quality of the deposited chemical shifts by comparing them with those predicted from the PDB coordinate entry for the corresponding protein. The same comparison for other matched BMRB/PDB entries deposited from 2001–2011 has been carried out and the results suggest that the RSGI entries greatly improved the quality of the BMRB database. Since the entries include chemical shifts acquired under strikingly similar experimental conditions, these NMR data can be expected to be a promising resource to improve current technologies as well as to develop new NMR methods for protein studies.

  7. Did androgen-binding protein paralogs undergo neo- and/or Subfunctionalization as the Abp gene region expanded in the mouse genome?

    Science.gov (United States)

    Karn, Robert C; Chung, Amanda G; Laukaitis, Christina M

    2014-01-01

    The Androgen-binding protein (Abp) region of the mouse genome contains 30 Abpa genes encoding alpha subunits and 34 Abpbg genes encoding betagamma subunits, their products forming dimers composed of an alpha and a betagamma subunit. We endeavored to determine how many Abp genes are expressed as proteins in tears and saliva, and as transcripts in the exocrine glands producing them. Using standard PCR, we amplified Abp transcripts from cDNA libraries of C57BL/6 mice and found fifteen Abp gene transcripts in the lacrimal gland and five in the submandibular gland. Proteomic analyses identified proteins corresponding to eleven of the lacrimal gland transcripts, all of them different from the three salivary ABPs reported previously. Our qPCR results showed that five of the six transcripts that lacked corresponding proteins are expressed at very low levels compared to those transcripts with proteins. We found 1) no overlap in the repertoires of expressed Abp paralogs in lacrimal gland/tears and salivary glands/saliva; 2) substantial sex-limited expression of lacrimal gland/tear expressed-paralogs in males but no sex-limited expression in females; and 3) that the lacrimal gland/tear expressed-paralogs are found exclusively in ancestral clades 1, 2 and 3 of the five clades described previously while the salivary glands/saliva expressed-paralogs are found only in clade 5. The number of instances of extremely low levels of transcription without corresponding protein production in paralogs specific to tears and saliva suggested the role of subfunctionalization, a derived condition wherein genes that may have been expressed highly in both glands ancestrally were down-regulated subsequent to duplication. Thus, evidence for subfunctionalization can be seen in our data and we argue that the partitioning of paralog expression between lacrimal and salivary glands that we report here occurred as the result of adaptive evolution.

  8. The mouse-human anatomy ontology mapping project.

    Science.gov (United States)

    Hayamizu, Terry F; de Coronado, Sherri; Fragoso, Gilberto; Sioutos, Nicholas; Kadin, James A; Ringwald, Martin

    2012-01-01

    The overall objective of the Mouse-Human Anatomy Project (MHAP) was to facilitate the mapping and harmonization of anatomical terms used for mouse and human models by Mouse Genome Informatics (MGI) and the National Cancer Institute (NCI). The anatomy resources designated for this study were the Adult Mouse Anatomy (MA) ontology and the set of anatomy concepts contained in the NCI Thesaurus (NCIt). Several methods and software tools were identified and evaluated, then used to conduct an in-depth comparative analysis of the anatomy ontologies. Matches between mouse and human anatomy terms were determined and validated, resulting in a highly curated set of mappings between the two ontologies that has been used by other resources. These mappings will enable linking of data from mouse and human. As the anatomy ontologies have been expanded and refined, the mappings have been updated accordingly. Insights are presented into the overall process of comparing and mapping between ontologies, which may prove useful for further comparative analyses and ontology mapping efforts, especially those involving anatomy ontologies. Finally, issues concerning further development of the ontologies, updates to the mapping files, and possible additional applications and significance were considered. DATABASE URL: http://obofoundry.org/cgi-bin/detail.cgi?id=ma2ncit.

  9. MIPS plant genome information resources.

    Science.gov (United States)

    Spannagl, Manuel; Haberer, Georg; Ernst, Rebecca; Schoof, Heiko; Mayer, Klaus F X

    2007-01-01

    The Munich Institute for Protein Sequences (MIPS) has been involved in maintaining plant genome databases since the Arabidopsis thaliana genome project. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable data sets for model plant genomes as a backbone against which experimental data, for example from high-throughput functional genomics, can be organized and evaluated. In addition, model genomes also form a scaffold for comparative genomics, and much can be learned from genome-wide evolutionary studies.

  10. Clustering Table of the genome insert site of Drosophila GAL4 enhancer trap lines (Cluster List) - GETDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)


  11. Database Description - GETDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available abase Description General information of database Database name GETDB Alternative n...ame Gal4 Enhancer Trap Insertion Database DOI 10.18908/lsdba.nbdc00236-000 Creator Creator Name: Shigeo Haya... Chuo-ku, Kobe 650-0047 Tel: +81-78-306-3185 FAX: +81-78-306-3183 E-mail: Database classification Expression... Invertebrate genome database Organism Taxonomy Name: Drosophila melanogaster Taxonomy ID: 7227 Database des...riginal website information Database maintenance site Drosophila Genetic Resource

  12. QTL list - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)


  13. Plant DB link - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)


  14. Combined genome-wide expression profiling and targeted RNA interference in primary mouse macrophages reveals perturbation of transcriptional networks associated with interferon signalling

    Directory of Open Access Journals (Sweden)

    Craigon Marie

    2009-08-01

    Background: Interferons (IFNs) are potent antiviral cytokines capable of reprogramming the macrophage phenotype through the induction of interferon-stimulated genes (ISGs). Here we have used targeted RNA interference to suppress the expression of a number of key genes associated with IFN signalling in murine macrophages prior to stimulation with interferon-gamma. Genome-wide changes in transcript abundance caused by siRNA activity were measured using exon-level microarrays in the presence or absence of IFNγ. Results: Transfection of murine bone-marrow derived macrophages (BMDMs) with a non-targeting (control) siRNA and 11 sequence-specific siRNAs was performed using a cationic lipid transfection reagent (Lipofectamine2000) prior to stimulation with IFNγ. Total RNA was harvested from cells and gene expression measured on Affymetrix GeneChip Mouse Exon 1.0 ST Arrays. Network-based analysis of these data revealed six siRNAs to cause a marked shift in the macrophage transcriptome in the presence or absence of IFNγ. These six siRNAs targeted the Ifnb1, Irf3, Irf5, Stat1, Stat2 and Nfkb2 transcripts. The perturbation of the transcriptome by the six siRNAs was highly similar in each case and affected the expression of over 600 downstream transcripts. Regulated transcripts were clustered based on co-expression into five major groups corresponding to transcriptional networks associated with the type I and II IFN response, cell cycle regulation, and NF-κB signalling. In addition we have observed a significant non-specific immune stimulation of cells transfected with siRNA using Lipofectamine2000, suggesting use of this reagent in BMDMs, even at low concentrations, is enough to induce a type I IFN response. Conclusion: Our results provide evidence that the type I IFN response in murine BMDMs is dependent on Ifnb1, Irf3, Irf5, Stat1, Stat2 and Nfkb2, and that siRNAs targeted to these genes result in perturbation of key transcriptional networks associated

  15. 9. international mouse genome conference

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1995-12-31

    This conference was held November 12-16, 1995 in Ann Arbor, Michigan. The purpose of this conference was to provide a multidisciplinary forum for exchange of state-of-the-art information on genetic mapping in mice. This report contains abstracts of presentations, focusing on the following areas: mutation identification; comparative mapping; informatics and complex traits; mutagenesis; gene identification and new technology; and genetic and physical mapping.

  16. Legume and Lotus japonicus Databases

    DEFF Research Database (Denmark)

    Hirakawa, Hideki; Mun, Terry; Sato, Shusei

    2014-01-01

    Since the genome sequence of Lotus japonicus, a model plant of family Fabaceae, was determined in 2008 (Sato et al. 2008), the genomes of other members of the Fabaceae family, soybean (Glycine max) (Schmutz et al. 2010) and Medicago truncatula (Young et al. 2011), have been sequenced. In this section, we introduce representative, publicly accessible online resources related to plant materials, integrated databases containing legume genome information, and databases for genome sequence and derived marker information of legume species including L. japonicus...

  17. The Candidate Cancer Gene Database: a database of cancer driver genes from forward genetic screens in mice.

    Science.gov (United States)

    Abbott, Kenneth L; Nyre, Erik T; Abrahante, Juan; Ho, Yen-Yi; Isaksson Vogel, Rachel; Starr, Timothy K

    2015-01-01

    Identification of cancer driver gene mutations is crucial for advancing cancer therapeutics. Due to the overwhelming number of passenger mutations in the human tumor genome, it is difficult to pinpoint causative driver genes. Using transposon mutagenesis in mice many laboratories have conducted forward genetic screens and identified thousands of candidate driver genes that are highly relevant to human cancer. Unfortunately, this information is difficult to access and utilize because it is scattered across multiple publications using different mouse genome builds and strength metrics. To improve access to these findings and facilitate meta-analyses, we developed the Candidate Cancer Gene Database (CCGD, http://ccgd-starrlab.oit.umn.edu/). The CCGD is a manually curated database containing a unified description of all identified candidate driver genes and the genomic location of transposon common insertion sites (CISs) from all currently published transposon-based screens. To demonstrate relevance to human cancer, we performed a modified gene set enrichment analysis using KEGG pathways and show that human cancer pathways are highly enriched in the database. We also used hierarchical clustering to identify pathways enriched in blood cancers compared to solid cancers. The CCGD is a novel resource available to scientists interested in the identification of genetic drivers of cancer. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Investigating core genetic-and-epigenetic cell cycle networks for stemness and carcinogenic mechanisms, and cancer drug design using big database mining and genome-wide next-generation sequencing data.

    Science.gov (United States)

    Li, Cheng-Wei; Chen, Bor-Sen

    2016-10-01

    Recent studies have demonstrated that cell cycle plays a central role in development and carcinogenesis. Thus, the use of big databases and genome-wide high-throughput data to unravel the genetic and epigenetic mechanisms underlying cell cycle progression in stem cells and cancer cells is a matter of considerable interest. Real genetic-and-epigenetic cell cycle networks (GECNs) of embryonic stem cells (ESCs) and HeLa cancer cells were constructed by applying system modeling, system identification, and big database mining to genome-wide next-generation sequencing data. Real GECNs were then reduced to core GECNs of HeLa cells and ESCs by applying principal genome-wide network projection. In this study, we investigated potential carcinogenic and stemness mechanisms for systems cancer drug design by identifying common core and specific GECNs between HeLa cells and ESCs. Integrating drug database information with the specific GECNs of HeLa cells could lead to identification of multiple drugs for cervical cancer treatment with minimal side-effects on the genes in the common core. We found that dysregulation of miR-29C, miR-34A, miR-98, and miR-215; and methylation of ANKRD1, ARID5B, CDCA2, PIF1, STAMBPL1, TROAP, ZNF165, and HIST1H2AJ in HeLa cells could result in cell proliferation and anti-apoptosis through NFκB, TGF-β, and PI3K pathways. We also identified 3 drugs, methotrexate, quercetin, and mimosine, which repressed the activated cell cycle genes, ARID5B, STK17B, and CCL2, in HeLa cells with minimal side-effects.

  19. HOLLYWOOD: a comparative relational database of alternative splicing.

    Science.gov (United States)

    Holste, Dirk; Huo, George; Tung, Vivian; Burge, Christopher B

    2006-01-01

    RNA splicing is an essential step in gene expression, and is often variable, giving rise to multiple alternatively spliced mRNA and protein isoforms from a single gene locus. The design of effective databases to support experimental and computational investigations of alternative splicing (AS) is a significant challenge. In an effort to integrate accurate exon and splice site annotation with current knowledge about splicing regulatory elements and predicted AS events, and to link information about the splicing of orthologous genes in different species, we have developed the Hollywood system. This database was built upon genomic annotation of splicing patterns of known genes derived from spliced alignment of complementary DNAs (cDNAs) and expressed sequence tags, and links features such as splice site sequence and strength, exonic splicing enhancers and silencers, conserved and non-conserved patterns of splicing, and cDNA library information for inferred alternative exons. Hollywood was implemented as a relational database and currently contains comprehensive information for human and mouse. It is accompanied by a web query tool that allows searches for sets of exons with specific splicing characteristics or splicing regulatory element composition, or gives a graphical or sequence-level summary of splicing patterns for a specific gene. A streamlined graphical representation of gene splicing patterns is provided, and these patterns can alternatively be layered onto existing information in the UCSC Genome Browser. The database is accessible at http://hollywood.mit.edu.

  20. Ageing, chronic alcohol consumption and folate are determinants of genomic DNA methylation, p16 promoter methylation and the expression of p16 in the mouse colon

    Science.gov (United States)

    Elder age and chronic alcohol consumption are important risk factors for the development of colon cancer. Each factor can alter genomic and gene-specific DNA methylation. This study examined the effects of aging and chronic alcohol consumption on genomic and p16-specific methylation, and p16 express...

  1. Aging and chronic alcohol consumption are determinants of p16 gene expression, genomic DNA methylation and p16 promoter methylation in the mouse colon

    Science.gov (United States)

    Elder age and chronic alcohol consumption are important risk factors for the development of colon cancer. Each factor can alter genomic and gene-specific DNA methylation. This study examined the effects of aging and chronic alcohol consumption on genomic and p16-specific methylation, and p16 express...

  2. Deeper insight into the structure of the anaerobic digestion microbial community; the biogas microbiome database is expanded with 157 new genomes

    DEFF Research Database (Denmark)

    Treu, Laura; Kougias, Panagiotis; Campanaro, Stefano

    2016-01-01

    This research aimed to better characterize the biogas microbiome by means of high throughput metagenomic sequencing and to elucidate the core microbial consortium existing in biogas reactors independently from the operational conditions. Assembly of shotgun reads followed by an established binning strategy resulted in the highest, up to now, extraction of microbial genomes involved in biogas producing systems. From the 236 extracted genome bins, it was remarkably found that the vast majority of them could only be characterized at high taxonomic levels. This result confirms that the biogas microbiome is comprised by a consortium of unknown species. A comparative analysis between the genome bins of the current study and those extracted from a previous metagenomic assembly demonstrated a similar phylogenetic distribution of the main taxa. Finally, this analysis led to the identification of a subset of common...

  3. Deeper insight into the structure of the anaerobic digestion microbial community; the biogas microbiome database is expanded with 157 new genomes.

    Science.gov (United States)

    Treu, Laura; Kougias, Panagiotis G; Campanaro, Stefano; Bassani, Ilaria; Angelidaki, Irini

    2016-09-01

    This research aimed to better characterize the biogas microbiome by means of high throughput metagenomic sequencing and to elucidate the core microbial consortium existing in biogas reactors independently from the operational conditions. Assembly of shotgun reads followed by an established binning strategy resulted in the highest, up to now, extraction of microbial genomes involved in biogas producing systems. From the 236 extracted genome bins, it was remarkably found that the vast majority of them could only be characterized at high taxonomic levels. This result confirms that the biogas microbiome is comprised by a consortium of unknown species. A comparative analysis between the genome bins of the current study and those extracted from a previous metagenomic assembly demonstrated a similar phylogenetic distribution of the main taxa. Finally, this analysis led to the identification of a subset of common microbes that could be considered as the core essential group in biogas production. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. Quantitative LC-MS Provides No Evidence for m6dA or m4dC in the Genome of Mouse Embryonic Stem Cells and Tissues.

    Science.gov (United States)

    Schiffers, Sarah; Ebert, Charlotte; Rahimoff, René; Kosmatchev, Olesea; Steinbacher, Jessica; Bohne, Alexandra-Viola; Spada, Fabio; Michalakis, Stylianos; Nickelsen, Jörg; Müller, Markus; Carell, Thomas

    2017-09-04

    Until recently, it was believed that the genomes of higher organisms contain, in addition to the four canonical DNA bases, only 5-methyl-dC (m5dC) as a modified base to control epigenetic processes. In recent years, this view has changed dramatically with the discovery of 5-hydroxymethyl-dC (hmdC), 5-formyl-dC (fdC), and 5-carboxy-dC (cadC) in DNA from stem cells and brain tissue. N6-methyldeoxyadenosine (m6dA) is the most recent base reported to be present in the genome of various eukaryotic organisms. This base, together with N4-methyldeoxycytidine (m4dC), was first reported to be a component of bacterial genomes. In this work, we investigated the levels and distribution of these potentially epigenetically relevant DNA bases by using a novel ultrasensitive UHPLC-MS method. We further report quantitative data for m5dC, hmdC, fdC, and cadC, but we were unable to detect either m4dC or m6dA in DNA isolated from mouse embryonic stem cells or brain and liver tissue, which calls into question their epigenetic relevance. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Mining a database of single amplified genomes from Red Sea brine pool extremophiles—improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    Science.gov (United States)

    Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg

    2014-01-01

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available

  6. Mining a database of single amplified genomes from Red Sea brine pool extremophiles-improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA).

    KAUST Repository

    Grötzinger, Stefan W.

    2014-04-07

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available

  7. CardioTF, a database of deconstructing transcriptional circuits in the heart system.

    Science.gov (United States)

    Zhen, Yisong

    2016-01-01

    Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and were able to be visualized by graphical output. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. The CardioTF database can be used as a portal to construct transcriptional network of cardiac development. Database URL: http://www.cardiosignal.org/database/cardiotf.html.

  8. Relational databases

    CERN Document Server

    Bell, D A

    1986-01-01

    Relational Databases explores the major advances in relational databases and provides a balanced analysis of the state of the art in relational databases. Topics covered include capture and analysis of data placement requirements; distributed relational database systems; data dependency manipulation in database schemata; and relational database support for computer graphics and computer aided design. This book is divided into three sections and begins with an overview of the theory and practice of distributed systems, using the example of INGRES from Relational Technology as illustration. The

  9. Biofuel Database

    Science.gov (United States)

    Biofuel Database (Web, free access). This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  10. Community Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This excel spreadsheet is the result of merging at the port level of several of the in-house fisheries databases in combination with other demographic databases such...

  11. Mining a database of single amplified genomes from Red Sea brine pool extremophiles – Improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    Directory of Open Access Journals (Sweden)

    Stefan Wolfgang Grötzinger

    2014-04-01

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile’s genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the INDIGO data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile & Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2,577 E.C. numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from 6 different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
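
    The reliability rule stated in this abstract (a hit counts as reliable when at least two relevant descriptors, GO-terms and/or consensus patterns, are present) can be illustrated with a minimal Python sketch. This is not the authors' published script: the `GeneHits` class, the function names, and all gene and descriptor identifiers are hypothetical placeholders, and the sketch assumes the two filters have already been run and their hits collected per gene.

```python
from dataclasses import dataclass, field


@dataclass
class GeneHits:
    """Descriptors found for one gene (all identifiers are placeholders)."""
    gene_id: str
    go_terms: set = field(default_factory=set)     # hits from the profile filter
    prosite_ids: set = field(default_factory=set)  # hits from the pattern filter


def is_reliable(hits: GeneHits, min_descriptors: int = 2) -> bool:
    """True when at least `min_descriptors` distinct descriptors are present."""
    return len(hits.go_terms) + len(hits.prosite_ids) >= min_descriptors


def reliable_genes(all_hits):
    """Return the IDs of genes that pass the two-descriptor rule."""
    return [h.gene_id for h in all_hits if is_reliable(h)]


# Hypothetical example data, not taken from INDIGO.
example = [
    GeneHits("gene_a", {"GO:placeholder_1"}, {"PS:placeholder_1"}),
    GeneHits("gene_b", {"GO:placeholder_1"}),
    GeneHits("gene_c", set(), {"PS:placeholder_1", "PS:placeholder_2"}),
]
print(reliable_genes(example))  # ['gene_a', 'gene_c']
```

    In this sketch a gene with one GO-term and one pattern passes, as does a gene with two patterns and no GO-term, which is one reading of "GO-terms and/or consensus patterns"; a gene with a single descriptor is rejected.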

  12. Database Administrator

    Science.gov (United States)

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  13. Mouse models of Fanconi anemia

    International Nuclear Information System (INIS)

    Parmar, Kalindi; D'Andrea, Alan; Niedernhofer, Laura J.

    2009-01-01

    Fanconi anemia is a rare inherited disease characterized by congenital anomalies, growth retardation, aplastic anemia and an increased risk of acute myeloid leukemia and squamous cell carcinomas. The disease is caused by mutation in genes encoding proteins required for the Fanconi anemia pathway, a response mechanism to replicative stress, including that caused by genotoxins that cause DNA interstrand crosslinks. Defects in the Fanconi anemia pathway lead to genomic instability and apoptosis of proliferating cells. To date, 13 complementation groups of Fanconi anemia were identified. Five of these genes have been deleted or mutated in the mouse, as well as a sixth key regulatory gene, to create mouse models of Fanconi anemia. This review summarizes the phenotype of each of the Fanconi anemia mouse models and highlights how genetic and interventional studies using the strains have yielded novel insight into therapeutic strategies for Fanconi anemia and into how the Fanconi anemia pathway protects against genomic instability.

  14. Mouse models of Fanconi anemia

    Energy Technology Data Exchange (ETDEWEB)

    Parmar, Kalindi; D'Andrea, Alan [Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, 44 Binney Street, Boston, MA 02115 (United States)]; Niedernhofer, Laura J., E-mail: niedernhoferl@upmc.edu [Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine and Cancer Institute, 5117 Centre Avenue, Hillman Cancer Center, Research Pavilion 2.6, Pittsburgh, PA 15213-1863 (United States)]

    2009-07-31

    Fanconi anemia is a rare inherited disease characterized by congenital anomalies, growth retardation, aplastic anemia and an increased risk of acute myeloid leukemia and squamous cell carcinomas. The disease is caused by mutation in genes encoding proteins required for the Fanconi anemia pathway, a response mechanism to replicative stress, including that caused by genotoxins that cause DNA interstrand crosslinks. Defects in the Fanconi anemia pathway lead to genomic instability and apoptosis of proliferating cells. To date, 13 complementation groups of Fanconi anemia were identified. Five of these genes have been deleted or mutated in the mouse, as well as a sixth key regulatory gene, to create mouse models of Fanconi anemia. This review summarizes the phenotype of each of the Fanconi anemia mouse models and highlights how genetic and interventional studies using the strains have yielded novel insight into therapeutic strategies for Fanconi anemia and into how the Fanconi anemia pathway protects against genomic instability.

  15. Genome update: the 1000th genome - a cautionary tale

    DEFF Research Database (Denmark)

    Lagesen, Karin; Ussery, David; Wassenaar, Gertrude Maria

    2010-01-01

    There are now more than 1000 sequenced prokaryotic genomes deposited in public databases and available for analysis. Currently, although the sequence databases GenBank, DNA Database of Japan and EMBL are synchronized continually, there are slight differences in content at the genomes level for a variety of logistical reasons, including differences in format and loading errors, such as those caused by file transfer protocol interruptions. This means that the 1000th genome will be different in the various databases. Some of the data on the highly accessed web pages are inaccurate, leading to false conclusions, for example about the largest bacterial genome sequenced. Biological diversity is far greater than many have thought. For example, analysis of multiple Escherichia coli genomes has led to an estimate of around 45 000 gene families, more genes than are recognized in the human genome. Moreover...

  16. Federal databases

    International Nuclear Information System (INIS)

    Welch, M.J.; Welles, B.W.

    1988-01-01

    Accident statistics on all modes of transportation are available as risk assessment analytical tools through several federal agencies. This paper reports on the examination of the accident databases by personal contact with the federal staff responsible for administration of the database programs. This activity, sponsored by the Department of Energy through Sandia National Laboratories, is an overview of the national accident data on highway, rail, air, and marine shipping. For each mode, the definition or reporting requirements of an accident are determined and the method of entering the accident data into the database is established. Availability of the database to others, ease of access, costs, and who to contact were the prime questions put to each of the database program managers. Additionally, how the agency uses the accident data was of major interest.

  17. Identification of genomic biomarkers for concurrent diagnosis of drug-induced renal tubular injury using a large-scale toxicogenomics database

    International Nuclear Information System (INIS)

    Kondo, Chiaki; Minowa, Yohsuke; Uehara, Takeki; Okuno, Yasushi; Nakatsu, Noriyuki; Ono, Atsushi; Maruyama, Toshiyuki; Kato, Ikuo; Yamate, Jyoji; Yamada, Hiroshi; Ohno, Yasuo; Urushidani, Tetsuro

    2009-01-01

    Drug-induced renal tubular injury is one of the major concerns in preclinical safety evaluations. Toxicogenomics is becoming a generally accepted approach for identifying chemicals with potential safety problems. In the present study, we analyzed 33 nephrotoxicants and 8 non-nephrotoxic hepatotoxicants to elucidate time- and dose-dependent global gene expression changes associated with proximal tubular toxicity. The compounds were administered orally or intravenously once daily to male Sprague-Dawley rats. The animals were exposed to four different doses of the compounds, and kidney tissues were collected on days 4, 8, 15, and 29. Gene expression profiles were generated from kidney RNA by using Affymetrix GeneChips and analyzed in conjunction with the histopathological changes. We used the filter-type gene selection algorithm based on t-statistics conjugated with the SVM classifier, and achieved a sensitivity of 90% with a selectivity of 90%. Then, 92 genes were extracted as the genomic biomarker candidates that were used to construct the classifier. The gene list contains well-known biomarkers, such as Kidney injury molecule 1, Ceruloplasmin, Clusterin, Tissue inhibitor of metallopeptidase 1, and also novel biomarker candidates. Most of the genes involved in tissue remodeling, the immune/inflammatory response, cell adhesion/proliferation/migration, and metabolism were predominantly up-regulated. Down-regulated genes participated in cell adhesion/proliferation/migration, membrane transport, and signal transduction. Our classifier has better prediction accuracy than any of the well-known biomarkers. Therefore, the toxicogenomics approach would be useful for concurrent diagnosis of renal tubular injury.

  18. Genetic analysis of radiation-induced mouse thymic lymphomas

    International Nuclear Information System (INIS)

    Kominami, R.; Wakabayashi, Y.; Niwa, O.

    2003-01-01

    Mouse thymic lymphomas are one of the classic models of radiation-induced malignancies, and the model has been used for the study of genes involved in carcinogenesis. ras oncogenes were the first to be identified, undergoing mutations in 10 to 30% of lymphomas, and p16INK4a and p19ARF in the INK4a-ARF locus are also frequently inactivated. In our previous study, the inactivation of Ikaros, a key regulator of the lymphoid system, was found in those lymphomas, and it was suggested that there are other responsible genes yet to be discovered. On the other hand, genetic predisposition to radiation-induced lymphoma often differs between strains, and this reflects the presence of low-penetrance genes that can modify the impact of a given mutation. Such modifiers or susceptibility genes have also been little studied. The recent availability of databases on mouse genome information and the power of the mouse genetic system underline the usefulness of the lymphoma model in the search for novel genes involved, which may provide clues to the molecular mechanisms of development of radiogenic lymphoma and to genes involved in human lymphomas and other malignancies. Accordingly, we have carried out positional cloning for the two different types of tumor-related genes. In this symposium, we present our current progress, which includes genetic mapping of susceptibility/resistance loci on mouse chromosomes 4, 5 and 19, and functional analysis of a novel tumor suppressor gene, Rit1/Bcl11b, that was isolated by allelic loss (LOH) mapping and sequence analysis of γ-ray induced mouse thymic lymphomas.

  19. The Degradome database: mammalian proteases and diseases of proteolysis.

    Science.gov (United States)

    Quesada, Víctor; Ordóñez, Gonzalo R; Sánchez, Luis M; Puente, Xose S; López-Otín, Carlos

    2009-01-01

    The degradome is defined as the complete set of proteases present in an organism. The recent availability of whole genomic sequences from multiple organisms has led us to predict the contents of the degradomes of several mammalian species. To ensure the fidelity of these predictions, our methods have included manual curation of individual sequences and, when necessary, direct cloning and sequencing experiments. The results of these studies in human, chimpanzee, mouse and rat have been incorporated into the Degradome database, which can be accessed through a web interface at http://degradome.uniovi.es. The annotations about each individual protease can be retrieved by browsing catalytic classes and families or by searching specific terms. This web site also provides detailed information about genetic diseases of proteolysis, a growing field of great importance for multiple users. Finally, the user can find additional information about protease structures, protease inhibitors, ancillary domains of proteases and differences between mammalian degradomes.

  20. Mouse adhalin

    DEFF Research Database (Denmark)

    Liu, L; Vachon, P H; Kuang, W

    1997-01-01

    ... To analyze the biological roles of adhalin, we cloned the mouse adhalin cDNA, raised peptide-specific antibodies to its cytoplasmic domain, and examined its expression and localization in vivo and in vitro. The mouse adhalin sequence was 80% identical to that of human, rabbit, and hamster. Adhalin was specifically expressed in striated muscle cells and their immediate precursors, and absent in many other cell types. Adhalin expression in embryonic mouse muscle was coincident with primary myogenesis. Its expression was found to be up-regulated at mRNA and protein levels during myogenic differentiation...

  1. RatMap—rat genome tools and data

    Science.gov (United States)

    Petersen, Greta; Johnson, Per; Andersson, Lars; Klinga-Levan, Karin; Gómez-Fabre, Pedro M.; Ståhl, Fredrik

    2005-01-01

    The rat genome database RatMap (http://ratmap.org or http://ratmap.gen.gu.se) has been one of the main resources for rat genome information since 1994. The database is maintained by CMB–Genetics at Göteborg University in Sweden and provides information on rat genes, polymorphic rat DNA-markers and rat quantitative trait loci (QTLs), all curated at RatMap. The database is under the supervision of the Rat Gene and Nomenclature Committee (RGNC); thus much attention is paid to rat gene nomenclature. RatMap presents information on rat idiograms, karyotypes and provides a unified presentation of the rat genome sequence and integrated rat linkage maps. A set of tools is also available to facilitate the identification and characterization of rat QTLs, as well as the estimation of exon/intron number and sizes in individual rat genes. Furthermore, comparative gene maps of rat in regard to mouse and human are provided. PMID:15608244

  2. Evolutionary Genomics of Life in (and from) the Sea

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.; Dehal, Paramvir; Fuerstenberg, Susan I.

    2006-01-09

    High throughput genome sequencing centers that were originally built for the Human Genome Project (Lander et al., 2001; Venter et al., 2001) have now become an engine for comparative genomics. The six largest centers alone are now producing over 150 billion nucleotides per year, more than 50 times the amount of DNA in the human genome, and nearly all of this is directed at projects that promise great insights into the pattern and processes of evolution. Unfortunately, this data is being produced at a pace far exceeding the capacity of the scientific community to provide insightful analysis, and few scientists with training and experience in evolutionary biology have played prominent roles to date. One of the consequences is that poor quality analyses are typical; for example, orthology among genes is generally determined by simple measures of sequence similarity, although this approach was discredited by molecular evolutionary biologists decades ago. Here we discuss how genomes are chosen for sequencing and how the scientific community can have input. We describe the PhIGs database and web tools (Dehal and Boore 2005a; http://PhIGs.org), which provide phylogenetic analysis of all gene families for all completely sequenced genomes and the associated 'Synteny Viewer', which allows comparisons of the relative positions of orthologous genes. This is the best tool available for inferring gene function across multiple genomes. We also describe how we have used the PhIGs methods with the whole genome sequences of a tunicate, fish, mouse, and human to conclusively demonstrate that two rounds of whole genome duplication occurred at the base of vertebrates (Dehal and Boore 2005b). This evidence is found in the large scale structure of the positions of paralogous genes that arose from duplications inferred by evolutionary analysis to have occurred at the base of vertebrates.

  3. Database Replication

    CERN Document Server

    Kemme, Bettina

    2010-01-01

    Database replication is widely used for fault-tolerance, scalability and performance. The failure of one database replica does not stop the system from working as available replicas can take over the tasks of the failed replica. Scalability can be achieved by distributing the load across all replicas, and adding new replicas should the load increase. Finally, database replication can provide fast local access, even if clients are geographically distributed, if data copies are located close to clients. Despite its advantages, replication is not a straightforward technique to apply, and

  4. Optimizing the Targeting of Mouse Parvovirus 1 to Murine Melanoma Selects for Recombinant Genomes and Novel Mutations in the Viral Capsid Gene

    Directory of Open Access Journals (Sweden)

    Matthew Marr

    2018-01-01

    Combining virus-enhanced immunogenicity with direct delivery of immunomodulatory molecules would represent a novel treatment modality for melanoma, and would require development of new viral vectors capable of targeting melanoma cells preferentially. Here we explore the use of rodent protoparvoviruses targeting cells of the murine melanoma model B16F10. An uncloned stock of mouse parvovirus 1 (MPV1) showed some efficacy, which was substantially enhanced following serial passage in the target cell. Molecular cloning of the genes of both starter and selected virus pools revealed considerable sequence diversity. Chimera analysis mapped the majority of the improved infectivity to the product of the major coat protein gene, VP2, in which linked blocks of amino acid changes and one or other of two apparently spontaneous mutations were selected. Intragenic chimeras showed that these represented separable components, both contributing to enhanced infection. Comparison of biochemical parameters of infection by clonal viruses indicated that the enhancement due to changes in VP2 operates after the virus has bound to the cell surface and penetrated into the cell. Construction of an in silico homology model for MPV1 allowed placement of these changes within the capsid shell, and revealed aspects of the capsid involved in infection initiation that had not been previously recognized.

  5. The Sycp1 loci of the mouse genome: successive retropositions of a meiotic gene during the recent evolution of the genus.

    Science.gov (United States)

    Sage, J; Yuan, L; Martin, L; Mattei, M G; Guénet, J L; Liu, J G; Hoög, C; Rassoulzadegan, M; Cuzin, F

    1997-08-15

    The murine Sycp1 gene is expressed at the early stages of meiosis. We show that it is composed of a number of small exons and localized on mouse chromosome 3. In the laboratory strains, two retrogenes were also identified. The first one (Sycp1-ps1), on chromosome 7, has accumulated point mutations and deletions and is not transcribed. A second retrogene (Sycp1-ps2), on chromosome 8, is inserted within the continuity of a moderately repeated element, in an intron of another gene (Cad11). The two retroposition events can be dated to distinct periods in the evolution of the Muridae. Sycp1-ps2 has kept features indicative of a relatively recent origin, namely a nearly intact coding region, a poly(A) tail, and 14-bp terminal repeats. Its recent origin was confirmed by the fact that it is found in all the laboratory strains of mice, but neither in a recent isolate from Mus musculus domesticus wild stocks nor in the closely related subspecies M. musculus musculus, M. m. molossinus, M. m. castaneus, and M. m. bactrianus. Appearance of the more ancient Sycp1-ps1 retrogene is concomitant with the radiation of the genus. It is present in various Mus species (M. spretus, M. spicilegus, M. macedonicus, and M. cookii), but neither in the rat nor in the more closely related Pyromis genus. Transposition of retrotranscripts during meiosis and their hereditary establishment thus appear to occur relatively frequently. They may, therefore, play a significant role in the evolutionary process.

  6. Genome Editing a Mouse Locus Encoding a Variant Histone, H3.3B, to Report on its Expression in Live Animals

    Science.gov (United States)

    Wen, Duancheng; Noh, Kyung-Min; Goldberg, Aaron D.; Allis, C. David; Rosenwaks, Zev; Rafii, Shahin; Banaszynski, Laura A.

    2018-01-01

    Summary Chromatin remodeling via incorporation of histone variants plays a key role in the regulation of embryonic development. The histone variant H3.3 has been associated with a number of early events including formation of the paternal pronucleus upon fertilization. The small number of amino acid differences between H3.3 and its canonical counterparts (H3.1 and H3.2) has limited studies of the developmental significance of H3.3 deposition into chromatin due to difficulties in distinguishing the H3 isoforms. To this end, we used zinc-finger nuclease (ZFN) mediated gene editing to introduce a small C-terminal hemagglutinin (HA) tag to the endogenous H3.3B locus in mouse embryonic stem cells (ESCs), along with an internal ribosome entry site (IRES) and a separately translated fluorescent reporter of expression. This system will allow detection of expression driven by the reporter in cells, animals, and embryos, and will facilitate investigation of differential roles of paternal and maternal H3.3 protein during embryogenesis that would not be possible using variant-specific antibodies. Further, the ability to monitor endogenous H3.3 protein in various cell lineages will enhance our understanding of the dynamics of this histone variant over the course of development. genesis PMID:25262655

  7. Refactoring databases evolutionary database design

    CERN Document Server

    Ambler, Scott W

    2006-01-01

    Refactoring has proven its value in a wide range of development projects–helping software professionals improve system designs, maintainability, extensibility, and performance. Now, for the first time, leading agile methodologist Scott Ambler and renowned consultant Pramodkumar Sadalage introduce powerful refactoring techniques specifically designed for database systems. Ambler and Sadalage demonstrate how small changes to table structures, data, stored procedures, and triggers can significantly enhance virtually any database design–without changing semantics. You’ll learn how to evolve database schemas in step with source code–and become far more effective in projects relying on iterative, agile methodologies. This comprehensive guide and reference helps you overcome the practical obstacles to refactoring real-world databases by covering every fundamental concept underlying database refactoring. Using start-to-finish examples, the authors walk you through refactoring simple standalone databas...

  8. RDD Databases

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This database was established to oversee documents issued in support of fishery research activities including experimental fishing permits (EFP), letters of...

  9. Snowstorm Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Snowstorm Database is a collection of over 500 snowstorms dating back to 1900 and updated operationally. Only storms having large areas of heavy snowfall (10-20...

  10. Dealer Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The dealer reporting databases contain the primary data reported by federally permitted seafood dealers in the northeast. Electronic reporting was implemented May 1,...

  11. Clinical value of miR-452-5p expression in lung adenocarcinoma: A retrospective quantitative real-time polymerase chain reaction study and verification based on The Cancer Genome Atlas and Gene Expression Omnibus databases.

    Science.gov (United States)

    Gan, Xiao-Ning; Luo, Jie; Tang, Rui-Xue; Wang, Han-Lin; Zhou, Hong; Qin, Hui; Gan, Ting-Qing; Chen, Gang

    2017-05-01

    The role and mechanism of miR-452-5p in lung adenocarcinoma remain unclear. In this study, we performed a systematic study to investigate the clinical value of miR-452-5p expression in lung adenocarcinoma. The expression of miR-452-5p in 101 lung adenocarcinoma patients was detected by quantitative real-time polymerase chain reaction. The Cancer Genome Atlas and Gene Expression Omnibus databases were joined to verify the expression level of miR-452-5p in lung adenocarcinoma. Via several online prediction databases and bioinformatics software, pathway and network analyses of miR-452-5p target genes were performed to explore its prospective molecular mechanism. The expression of miR-452-5p in lung adenocarcinoma in house was significantly lower than that in adjacent tissues (p < 0.001). Additionally, the expression level of miR-452-5p was negatively correlated with several clinicopathological parameters including the tumor size (p = 0.014), lymph node metastasis (p = 0.032), and tumor-node-metastasis stage (p = 0.036). Data from The Cancer Genome Atlas also confirmed the low expression of miR-452 in lung adenocarcinoma (p < 0.001). Furthermore, reduced expression of miR-452-5p in lung adenocarcinoma (standard mean deviations = -0.393, 95% confidence interval: -0.774 to -0.011, p = 0.044) was validated by a meta-analysis. Five hub genes targeted by miR-452-5p, including SMAD family member 4, SMAD family member 2, cyclin-dependent kinase inhibitor 1B, tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein epsilon, and tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein beta, were significantly enriched in the cell-cycle pathway. In conclusion, low expression of miR-452-5p tends to play an essential role in lung adenocarcinoma. Bioinformatics analysis might be beneficial to reveal the potential mechanism of miR-452-5p in lung adenocarcinoma.
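
    As a rough consistency check on the meta-analysis figures quoted above, the standard error can be back-calculated from the reported 95% confidence interval and a two-sided p-value derived from it. This is only a sketch, assuming a symmetric interval and a normal approximation; the function name is illustrative and does not come from the study.

```python
import math

def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Two-sided p-value implied by a symmetric 95% CI (normal approximation)."""
    se = (upper - lower) / (2 * z_crit)    # CI half-width divided by the critical value
    z = estimate / se                      # test statistic against a null effect of 0
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail area of a standard normal
    return se, z, p

# Values quoted in the abstract: pooled effect -0.393, 95% CI -0.774 to -0.011.
se, z, p = p_from_ci(-0.393, -0.774, -0.011)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

    The implied p-value lands close to the reported p = 0.044; a small difference is expected because the interval bounds are rounded to three decimals.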

  12. National database

    DEFF Research Database (Denmark)

    Kristensen, Helen Grundtvig; Stjernø, Henrik

    1995-01-01

    Article about the national database for nursing research established at the Danish Institute for Health and Nursing Research (Dansk Institut for Sundheds- og Sygeplejeforskning). The aim of the database is to gather knowledge about research and development activities within nursing.

  13. Sex-specific mouse liver gene expression: genome-wide analysis of developmental changes from pre-pubertal period to young adulthood

    Directory of Open Access Journals (Sweden)

    Conforto Tara L

    2012-04-01

    Background: Early liver development and the transcriptional transitions during hepatogenesis are well characterized. However, gene expression changes during the late postnatal/pre-pubertal to young adulthood period are less well understood, especially with regard to sex-specific gene expression. Methods: Microarray analysis of male and female mouse liver was carried out at 3, 4, and 8 wk of age to elucidate developmental changes in gene expression from the late postnatal/pre-pubertal period to young adulthood. Results: A large number of sex-biased and sex-independent genes showed significant changes during this developmental period. Notably, sex-independent genes involved in cell cycle, chromosome condensation, and DNA replication were down regulated from 3 wk to 8 wk, while genes associated with metal ion binding, ion transport and kinase activity were up regulated. A majority of genes showing sex differential expression in adult liver did not display sex differences prior to puberty, at which time extensive changes in sex-specific gene expression were seen, primarily in males. Thus, in male liver, 76% of male-specific genes were up regulated and 47% of female-specific genes were down regulated from 3 to 8 wk of age, whereas in female liver 67% of sex-specific genes showed no significant change in expression. In both sexes, genes up regulated from 3 to 8 wk were significantly enriched (p p Ihh; female-specific Cdx4, Cux2, Tox, and Trim24 and may contribute to the developmental changes that lead to global acquisition of liver sex-specificity by 8 wk of age. Conclusions: Overall, the observed changes in gene expression during postnatal liver development reflect the deceleration of liver growth and the induction of specialized liver functions, with widespread changes in sex-specific gene expression primarily occurring in male liver.

  14. Full Data of Yeast Interacting Proteins Database (Original Version) - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available List Contact us Yeast Interacting Proteins Database Full Data of Yeast Interacting Proteins Database (Origin...al Version) Data detail Data name Full Data of Yeast Interacting Proteins Database (Original Version) DOI 10....18908/lsdba.nbdc00742-004 Description of data contents The entire data in the Yeast Interacting Proteins Database...eir interactions are required. Several sources including YPD (Yeast Proteome Database, Costanzo, M. C., Hoga...ematic name in the SGD (Saccharomyces Genome Database; http://www.yeastgenome.org /). Bait gene name The gen

  15. dBBQs: dataBase of Bacterial Quality scores

    OpenAIRE

    Wanchai, Visanu; Patumcharoenpol, Preecha; Nookaew, Intawat; Ussery, David

    2017-01-01

    Background: It is well-known that genome sequencing technologies are becoming significantly cheaper and faster. As a result of this, the exponential growth in sequencing data in public databases allows us to explore ever growing large collections of genome sequences. However, it is less known that the majority of available sequenced genome sequences in public databases are not complete, drafts of varying qualities. We have calculated quality scores for around 100,000 bacterial genomes from al...

  16. Experiment Databases

    Science.gov (United States)

    Vanschoren, Joaquin; Blockeel, Hendrik

    Next to running machine learning algorithms based on inductive queries, much can be learned by immediately querying the combined results of many prior studies. Indeed, all around the globe, thousands of machine learning experiments are being executed on a daily basis, generating a constant stream of empirical information on machine learning techniques. While the information contained in these experiments might have many uses beyond their original intent, results are typically described very concisely in papers and discarded afterwards. If we properly store and organize these results in central databases, they can be immediately reused for further analysis, thus boosting future research. In this chapter, we propose the use of experiment databases: databases designed to collect all the necessary details of these experiments, and to intelligently organize them in online repositories to enable fast and thorough analysis of a myriad of collected results. They constitute an additional, queriable source of empirical meta-data based on principled descriptions of algorithm executions, without reimplementing the algorithms in an inductive database. As such, they engender a very dynamic, collaborative approach to experimentation, in which experiments can be freely shared, linked together, and immediately reused by researchers all over the world. They can be set up for personal use, to share results within a lab or to create open, community-wide repositories. Here, we provide a high-level overview of their design, and use an existing experiment database to answer various interesting research questions about machine learning algorithms and to verify a number of recent studies.

  17. Visualization for genomics: the Microbial Genome Viewer.

    Science.gov (United States)

    Kerkhoven, Robert; van Enckevort, Frank H J; Boekhorst, Jos; Molenaar, Douwe; Siezen, Roland J

    2004-07-22

    A Web-based visualization tool, the Microbial Genome Viewer, is presented that allows the user to combine complex genomic data in a highly interactive way. This Web tool enables the interactive generation of chromosome wheels and linear genome maps from genome annotation data stored in a MySQL database. The generated images are in scalable vector graphics (SVG) format, which is suitable for creating high-quality scalable images and dynamic Web representations. Gene-related data such as transcriptome and time-course microarray experiments can be superimposed on the maps for visual inspection. The Microbial Genome Viewer 1.0 is freely available at http://www.cmbi.kun.nl/MGV

  18. DistiLD Database

    DEFF Research Database (Denmark)

    Palleja, Albert; Horn, Heiko; Eliasson, Sabrina

    2012-01-01

    Genome-wide association studies (GWAS) have identified thousands of single nucleotide polymorphisms (SNPs) associated with the risk of hundreds of diseases. However, there is currently no database that enables non-specialists to answer the following simple questions: which SNPs associated...... with diseases are in linkage disequilibrium (LD) with a gene of interest? Which chromosomal regions have been associated with a given disease, and which are the potentially causal genes in each region? To answer these questions, we use data from the HapMap Project to partition each chromosome into so-called LD...... blocks, so that SNPs in LD with each other are preferentially in the same block, whereas SNPs not in LD are in different blocks. By projecting SNPs and genes onto LD blocks, the DistiLD database aims to increase usage of existing GWAS results by making it easy to query and visualize disease...
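
    The idea of projecting SNPs and genes onto LD blocks can be sketched as a simple interval lookup. The block coordinates, SNP identifiers and function names below are made up for illustration; nothing here reflects the actual DistiLD schema or data.

```python
from bisect import bisect_right

# Hypothetical LD blocks on one chromosome: sorted, non-overlapping,
# half-open intervals [start, end) in base pairs.
BLOCKS = [(0, 50_000), (50_000, 120_000), (120_000, 300_000)]
STARTS = [start for start, _ in BLOCKS]

def block_index(position):
    """Return the index of the block containing position, or None if outside all blocks."""
    i = bisect_right(STARTS, position) - 1
    if i >= 0 and position < BLOCKS[i][1]:
        return i
    return None

def blocks_for_gene(gene_start, gene_end):
    """Return the indices of every block overlapped by a gene spanning gene_start..gene_end."""
    return [i for i, (s, e) in enumerate(BLOCKS) if s <= gene_end and gene_start < e]

def snps_linked_to_gene(snps, gene_start, gene_end):
    """Return SNP ids that fall in a block also overlapped by the gene."""
    gene_blocks = set(blocks_for_gene(gene_start, gene_end))
    return [snp_id for snp_id, pos in snps.items() if block_index(pos) in gene_blocks]

snps = {"rs_a": 10_000, "rs_b": 60_000, "rs_c": 130_000}  # made-up ids and positions
print(snps_linked_to_gene(snps, 55_000, 125_000))
```

    Because each SNP maps to exactly one block, a question such as "which disease-associated SNPs are in LD with this gene" reduces to a set-membership test on block indices.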

  19. User Guidelines for the Brassica Database: BRAD.

    Science.gov (United States)

    Wang, Xiaobo; Cheng, Feng; Wang, Xiaowu

    2016-01-01

    The genome sequence of Brassica rapa was first released in 2011. Since then, further Brassica genomes have been sequenced or are undergoing sequencing. It is therefore necessary to develop tools that help users to mine information from genomic data efficiently. This will greatly aid scientific exploration and breeding application, especially for those with low levels of bioinformatic training. Therefore, the Brassica database (BRAD) was built to collect, integrate, illustrate, and visualize Brassica genomic datasets. BRAD provides useful searching and data mining tools, and facilitates the search of gene annotation datasets, syntenic or non-syntenic orthologs, and flanking regions of functional genomic elements. It also includes genome-analysis tools such as BLAST and GBrowse. One of the important aims of BRAD is to build a bridge between Brassica crop genomes and the genome of the model species Arabidopsis thaliana, thus transferring the bulk of A. thaliana gene study information for use with newly sequenced Brassica crops.

  20. Profiling of Escherichia coli Chromosome database.

    Science.gov (United States)

    Yamazaki, Yukiko; Niki, Hironori; Kato, Jun-ichi

    2008-01-01

    The Profiling of Escherichia coli Chromosome (PEC) database (http://www.shigen.nig.ac.jp/ecoli/pec/) is designed to allow E. coli researchers to efficiently access information from functional genomics studies. The database contains two principal types of data: gene essentiality and a large collection of E. coli genetic research resources. The essentiality data are based on data compilation from published single-gene essentiality studies and on cell growth studies of large-deletion mutants. Using the circular and linear viewers for both whole genomes and the minimal genome, users can not only gain an overview of the genome structure but also retrieve information on contigs, gene products, mutants, deletions, and so forth. In particular, genome-wide exhaustive mutants are an essential resource for studying E. coli gene functions. Although the genomic database was constructed independently from the genetic resources database, users may seamlessly access both types of data. In addition to these data, the PEC database also provides a summary of homologous genes of other bacterial genomes and of protein structure information, with a comprehensive interface. The PEC is thus a convenient and useful platform for contemporary E. coli researchers.

  1. Genome-wide retroviral insertional tagging of genes involved in cancer in Cdkn2a-deficient mice

    DEFF Research Database (Denmark)

    Lund, Anders H; Turner, Geoffrey; Trubetskoy, Alla

    2002-01-01

    We have used large-scale insertional mutagenesis to identify functional landmarks relevant to cancer in the recently completed mouse genome sequence. We infected Cdkn2a(-/-) mice with Moloney murine leukemia virus (MoMuLV) to screen for loci that can participate in tumorigenesis in collaboration...... retroviral integration sites and mapped them against the mouse genome sequence databases from Celera and Ensembl. In addition to 17 insertions targeting gene loci known to be cancer-related, we identified a total of 37 new common insertion sites (CISs), of which 8 encode components of signaling pathways...... that are involved in cancer. The effectiveness of large-scale insertional mutagenesis in a sensitized genetic background is demonstrated by the preference for activation of MAP kinase signaling, collaborating with Cdkn2a loss in generating the lymphoid and myeloid tumors. Collectively, our results show that large...

  2. Genomic research in Eucalyptus.

    Science.gov (United States)

    Poke, Fiona S; Vaillancourt, René E; Potts, Brad M; Reid, James B

    2005-09-01

    Eucalyptus L'Hérit. is a genus comprised of more than 700 species that is of vital importance ecologically to Australia and to the forestry industry world-wide, being grown in plantations for the production of solid wood products as well as pulp for paper. With the sequencing of the genomes of Arabidopsis thaliana and Oryza sativa and the recent completion of the first tree genome sequence, Populus trichocarpa, attention has turned to the current status of genomic research in Eucalyptus. For several eucalypt species, large segregating families have been established, high-resolution genetic maps constructed and large EST databases generated. Collaborative efforts have been initiated for the integration of diverse genomic projects and will provide the framework for future research including exploiting the sequence of the entire eucalypt genome which is currently being sequenced. This review summarises the current position of genomic research in Eucalyptus and discusses the direction of future research.

  3. Partnering for functional genomics research conference: Abstracts of poster presentations

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1998-06-01

    This report contains abstracts of posters presented at the Functional Genomics Research Conference held April 16-17, 1998 in Oak Ridge, Tennessee. Attention is focused on the following areas: mouse mutagenesis and genomics; phenotype screening; gene expression analysis; DNA analysis technology development; bioinformatics; comparative analyses of mouse, human, and yeast sequences; and pilot projects to evaluate methodologies.

  4. Using the genome aggregation database, computational pathogenicity prediction tools, and patch clamp heterologous expression studies to demote previously published long QT syndrome type 1 mutations from pathogenic to benign.

    Science.gov (United States)

    Clemens, Daniel J; Lentino, Anne R; Kapplinger, Jamie D; Ye, Dan; Zhou, Wei; Tester, David J; Ackerman, Michael J

    2018-04-01

    Mutations in the KCNQ1-encoded Kv7.1 potassium channel cause long QT syndrome (LQTS) type 1 (LQT1). It has been suggested that ∼10%-20% of rare LQTS case-derived variants in the literature may have been published erroneously as LQT1-causative mutations and may be "false positives." The purpose of this study was to determine which previously published KCNQ1 case variants are likely false positives. A list of all published, case-derived KCNQ1 missense variants (MVs) was compiled. The occurrence of each MV within the Genome Aggregation Database (gnomAD) was assessed. Eight in silico tools were used to predict each variant's pathogenicity. Case-derived variants that were either (1) too frequently found in gnomAD or (2) absent in gnomAD but predicted to be pathogenic by ≤2 tools were considered potential false positives. Three of these variants were characterized functionally using whole-cell patch clamp technique. Overall, there were 244 KCNQ1 case-derived MVs. Of these, 29 (12%) were seen in ≥10 individuals in gnomAD and are demotable. However, 157 of 244 MVs (64%) were absent in gnomAD. Of these, 7 (4%) were predicted to be pathogenic by ≤2 tools, 3 of which we characterized functionally. There was no significant difference in current density between heterozygous KCNQ1-F127L, -P477L, or -L619M variant-containing channels compared to KCNQ1-WT. This study offers preliminary evidence for the demotion of 32 (13%) previously published LQT1 MVs. Of these, 29 were demoted because of their frequent sighting in gnomAD. Additionally, in silico analysis and in vitro functional studies have facilitated the demotion of 3 ultra-rare MVs (F127L, P477L, L619M). Copyright © 2017 Heart Rhythm Society. Published by Elsevier Inc. All rights reserved.
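
    The two demotion criteria and the percentages quoted above can be reproduced with a few lines of arithmetic. This is a sketch of the rule as the abstract states it; the function and variable names are illustrative, and the study's own pipeline is not described in this record.

```python
def is_potential_false_positive(gnomad_individuals, tools_predicting_pathogenic,
                                frequency_cutoff=10, tool_cutoff=2):
    """Apply the two criteria from the abstract.

    gnomad_individuals: number of gnomAD individuals carrying the variant (0 = absent).
    tools_predicting_pathogenic: how many of the eight in silico tools call it pathogenic.
    """
    too_frequent = gnomad_individuals >= frequency_cutoff
    absent_and_weakly_supported = (gnomad_individuals == 0
                                   and tools_predicting_pathogenic <= tool_cutoff)
    return too_frequent or absent_and_weakly_supported

# Counts quoted in the abstract.
total_variants = 244
frequent_in_gnomad = 29        # seen in >= 10 gnomAD individuals
absent_from_gnomad = 157
weakly_predicted = 7           # absent from gnomAD, pathogenic by <= 2 tools
functionally_benign = 3        # F127L, P477L, L619M
demoted = frequent_in_gnomad + functionally_benign

def pct(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)

print(pct(frequent_in_gnomad, total_variants),   # share that is too frequent
      pct(absent_from_gnomad, total_variants),   # share absent from gnomAD
      pct(weakly_predicted, absent_from_gnomad), # share of absent variants weakly predicted
      pct(demoted, total_variants))              # overall share demoted
```

    The rounded values agree with the 12%, 64%, 4% and 13% reported in the abstract.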

  5. A genomic perspective on protein tyrosine phosphatases: gene structure, pseudogenes, and genetic disease linkage

    DEFF Research Database (Denmark)

    Andersen, Jannik N; Jansen, Peter G; Echwald, Søren M

    2004-01-01

    sequence databases, we discovered one novel human PTP gene and defined chromosomal loci and exon structure of the additional 37 genes encoding known PTP transcripts. Direct orthologs were present in the mouse genome for all 38 human PTP genes. In addition, we identified 12 PTP pseudogenes unique to humans...... that have probably contaminated previous bioinformatics analysis of this gene family. PCR amplification and transcript sequencing indicate that some PTP pseudogenes are expressed, but their function (if any) is unknown. Furthermore, we analyzed the enhanced diversity generated by alternative splicing...

  6. The GLIMS Glacier Database

    Science.gov (United States)

    Raup, B. H.; Khalsa, S. S.; Armstrong, R.

    2007-12-01

    The Global Land Ice Measurements from Space (GLIMS) project has built a geospatial and temporal database of glacier data, composed of glacier outlines and various scalar attributes. These data are being derived primarily from satellite imagery, such as from ASTER and Landsat. Each "snapshot" of a glacier is from a specific time, and the database is designed to store multiple snapshots representative of different times. We have implemented two web-based interfaces to the database; one enables exploration of the data via interactive maps (web map server), while the other allows searches based on text-field constraints. The web map server is an Open Geospatial Consortium (OGC) compliant Web Map Server (WMS) and Web Feature Server (WFS). This means that other web sites can display glacier layers from our site over the Internet, or retrieve glacier features in vector format. All components of the system are implemented using Open Source software: Linux, PostgreSQL, PostGIS (geospatial extensions to the database), MapServer (WMS and WFS), and several supporting components such as Proj.4 (a geographic projection library) and PHP. These tools are robust and provide a flexible and powerful framework for web mapping applications. As a service to the GLIMS community, the database contains metadata on all ASTER imagery acquired over glacierized terrain. Reduced-resolution versions of the images (browse imagery) can be viewed either as a layer in the MapServer application, or overlaid on the virtual globe within Google Earth. The interactive map application allows the user to constrain by time what data appear on the map. For example, ASTER or glacier outlines from 2002 only, or from Autumn in any year, can be displayed. The system allows users to download their selected glacier data in a choice of formats. The results of a query based on spatial selection (using a mouse) or text-field constraints can be downloaded in any of these formats: ESRI shapefiles, KML (Google Earth), Map
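
    To illustrate what "other web sites can display glacier layers" means in practice, the sketch below builds a standard WMS 1.1.1 GetMap request URL. The endpoint and layer name are placeholders and not the real GLIMS service addresses; the TIME parameter is the optional WMS time dimension and only has an effect if the server publishes one.

```python
from urllib.parse import urlencode

def wms_getmap_url(base_url, layer, bbox, width=800, height=600, time=None):
    """Build a WMS 1.1.1 GetMap URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                      # empty value requests the default style
        "SRS": "EPSG:4326",                # geographic coordinates in WMS 1.1.1
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    if time is not None:
        params["TIME"] = time              # optional time dimension
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint and layer name, for illustration only.
url = wms_getmap_url("https://example.org/wms", "glacier_outlines",
                     (86.0, 27.5, 87.5, 28.5), time="2002")
print(url)
```

    A client would fetch this URL to obtain a rendered PNG of the layer for the given bounding box; a WFS request for vector features follows the same key-value pattern with different parameters.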

  7. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls....... In a dairy data set, predictions using BayesRC and imputed sequence data from 1000 Bull Genomes were 2% more accurate than with 800k data. We could demonstrate the method identified causal mutations in some cases. Further improvements will come from more accurate imputation of sequence variant genotypes...

  8. Quantitative trait loci affecting phenotypic variation in the vacuolated lens mouse mutant, a multigenic mouse model of neural tube defects

    NARCIS (Netherlands)

    Korstanje, Ron; Desai, Jigar; Lazar, Gloria; King, Benjamin; Rollins, Jarod; Spurr, Melissa; Joseph, Jamie; Kadambi, Sindhuja; Li, Yang; Cherry, Allison; Matteson, Paul G.; Paigen, Beverly; Millonig, James H.

    Korstanje R, Desai J, Lazar G, King B, Rollins J, Spurr M, Joseph J, Kadambi S, Li Y, Cherry A, Matteson PG, Paigen B, Millonig JH. Quantitative trait loci affecting phenotypic variation in the vacuolated lens mouse mutant, a multigenic mouse model of neural tube defects. Physiol Genomics 35:

  9. Stackfile Database

    Science.gov (United States)

    deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

    2013-01-01

    This software provides storage, retrieval, and analysis functionality for managing satellite altimetry data. It improves the efficiency and analysis capabilities of existing database software with improved flexibility and documentation. It offers flexibility in the type of data that can be stored. There is efficient retrieval either across the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason. It was, however, designed to work with a wide variety of satellite measurement data (e.g., Gravity Recovery And Climate Experiment, GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.

  10. Genomic analysis of mouse retinal development.

    Directory of Open Access Journals (Sweden)

    Seth Blackshaw

    2004-09-01

    The vertebrate retina is comprised of seven major cell types that are generated in overlapping but well-defined intervals. To identify genes that might regulate retinal development, gene expression in the developing retina was profiled at multiple time points using serial analysis of gene expression (SAGE). The expression patterns of 1,051 genes that showed developmentally dynamic expression by SAGE were investigated using in situ hybridization. A molecular atlas of gene expression in the developing and mature retina was thereby constructed, along with a taxonomic classification of developmental gene expression patterns. Genes were identified that label both temporal and spatial subsets of mitotic progenitor cells. For each developing and mature major retinal cell type, genes selectively expressed in that cell type were identified. The gene expression profiles of retinal Müller glia and mitotic progenitor cells were found to be highly similar, suggesting that Müller glia might serve to produce multiple retinal cell types under the right conditions. In addition, multiple transcripts that were evolutionarily conserved that did not appear to encode open reading frames of more than 100 amino acids in length ("noncoding RNAs") were found to be dynamically and specifically expressed in developing and mature retinal cell types. Finally, many photoreceptor-enriched genes that mapped to chromosomal intervals containing retinal disease genes were identified. These data serve as a starting point for functional investigations of the roles of these genes in retinal development and physiology.

  11. Extending Database Integration Technology

    National Research Council Canada - National Science Library

    Buneman, Peter

    1999-01-01

    Formal approaches to the semantics of databases and database languages can have immediate and practical consequences in extending database integration technologies to include a vastly greater range...

  12. Mapping data - KOME | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available switchLanguage; BLAST Search Image Search Home About Archive Update History Data ...tional Rice Genome Sequencing Project (IRGSP) Data file File name: kome_mapping_data.zip File URL: ftp://ftp.biosciencedbc.jp/archiv...(Transcriptional Unit) About This Database Database Description Download License Update History of This Database Site Policy | Contact Us Mapping data - KOME | LSDB Archive ...

  13. Comparative Genome Viewer

    International Nuclear Information System (INIS)

    Molineris, I.; Sales, G.

    2009-01-01

    The amount of information about genomes, both in the form of complete sequences and annotations, has been exponentially increasing in the last few years. As a result there is the need for tools providing a graphical representation of such information that should be comprehensive and intuitive. Visual representation is especially important in the comparative genomics field since it should provide a combined view of data belonging to different genomes. We believe that existing tools are limited in this respect as they focus on a single genome at a time (conservation histograms) or compress alignment representation to a single dimension. We have therefore developed a web-based tool called Comparative Genome Viewer (Cgv): it integrates a bidimensional representation of alignments between two regions, both at small and big scales, with the richness of annotations present in other genome browsers. We give access to our system through a web-based interface that provides the user with an interactive representation that can be updated in real time using the mouse to move from region to region and to zoom in on interesting details.

  14. Toward genome-enabled mycology.

    Science.gov (United States)

    Hibbett, David S; Stajich, Jason E; Spatafora, Joseph W

    2013-01-01

    Genome-enabled mycology is a rapidly expanding field that is characterized by the pervasive use of genome-scale data and associated computational tools in all aspects of fungal biology. Genome-enabled mycology is integrative and often requires teams of researchers with diverse skills in organismal mycology, bioinformatics and molecular biology. This issue of Mycologia presents the first complete fungal genomes in the history of the journal, reflecting the ongoing transformation of mycology into a genome-enabled science. Here, we consider the prospects for genome-enabled mycology and the technical and social challenges that will need to be overcome to grow the database of complete fungal genomes and enable all fungal biologists to make use of the new data.

  15. A catalog of the mouse gut metagenome

    DEFF Research Database (Denmark)

    Xiao, Liang; Feng, Qiang; Liang, Suisha

    2015-01-01

    We established a catalog of the mouse gut metagenome comprising ∼2.6 million nonredundant genes by sequencing DNA from fecal samples of 184 mice. To secure high microbiome diversity, we used mouse strains of diverse genetic backgrounds, from different providers, kept in different housing laboratories and fed either a low-fat or high-fat diet. Similar to the human gut microbiome, >99% of the cataloged genes are bacterial. We identified 541 metagenomic species and defined a core set of 26 metagenomic species found in 95% of the mice. The mouse gut microbiome is functionally similar to its human counterpart, with 95.2% of its Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologous groups in common. However, only 4.0% of the mouse gut microbial genes were shared (95% identity, 90% coverage) with those of the human gut microbiome. This catalog provides a useful reference for future studies.
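
    To illustrate the sharing criterion quoted in this abstract (95% identity, 90% coverage), the minimal Python sketch below counts the catalog genes that have at least one alignment hit passing both thresholds. The record layout (`gene`, `identity`, `coverage`), the function name `shared_fraction` and the example values are hypothetical; the abstract does not describe the authors' actual pipeline.

```python
def shared_fraction(hits, total_genes, min_identity=95.0, min_coverage=90.0):
    """Return the fraction of catalog genes with at least one qualifying hit.

    hits: iterable of dicts with hypothetical keys 'gene', 'identity' and
          'coverage' (both percentages), one dict per alignment hit.
    total_genes: number of genes in the catalog being tested.
    """
    shared = {
        h["gene"]
        for h in hits
        if h["identity"] >= min_identity and h["coverage"] >= min_coverage
    }
    return len(shared) / total_genes


# Made-up example: only g1 passes both thresholds (g2 fails on coverage).
example_hits = [
    {"gene": "g1", "identity": 97.0, "coverage": 95.0},
    {"gene": "g2", "identity": 96.0, "coverage": 80.0},
    {"gene": "g1", "identity": 99.0, "coverage": 91.0},
]
example_fraction = shared_fraction(example_hits, total_genes=4)
```

    Collecting gene identifiers in a set means a gene with several qualifying hits is counted once.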

  16. phiGENOME: an integrative navigation throughout bacteriophage genomes.

    Science.gov (United States)

    Stano, Matej; Klucar, Lubos

    2011-11-01

    phiGENOME is a web-based genome browser that generates dynamic and interactive graphical representations of phage genomes stored in phiSITE, a database of gene regulation in bacteriophages. phiGENOME is an integral part of the phiSITE web portal (http://www.phisite.org/phigenome) and was optimised for visualisation of phage genomes with an emphasis on gene regulatory elements. phiGENOME consists of three components: (i) a genome map viewer built using Adobe Flash technology, providing a dynamic and interactive graphical display of phage genomes; (ii) a sequence browser based on precisely formatted HTML tags, providing detailed exploration of genome features at the sequence level; and (iii) a regulation illustrator, based on Scalable Vector Graphics (SVG) and designed for graphical representation of gene regulation. With 542 complete genome sequences accompanied by rich annotations and references, phiGENOME is a unique information resource in the field of phage genomics. Copyright © 2011 Elsevier Inc. All rights reserved.

  17. dBBQs: dataBase of Bacterial Quality scores.

    Science.gov (United States)

    Wanchai, Visanu; Patumcharoenpol, Preecha; Nookaew, Intawat; Ussery, David

    2017-12-28

    It is well known that genome sequencing technologies are becoming significantly cheaper and faster. As a result, the exponential growth of sequencing data in public databases allows us to explore ever-larger collections of genome sequences. However, it is less well known that the majority of sequenced genomes available in public databases are not complete but are drafts of varying quality. We have calculated quality scores for around 100,000 bacterial genomes from all major genome repositories and put them in a fast and easy-to-use database. Prokaryotic genomic data from all sources were collected and combined to make a non-redundant set of bacterial genomes. The genome quality score for each was calculated from four different measurements: assembly quality, number of rRNA and tRNA genes, and the occurrence of conserved functional domains. The dataBase of Bacterial Quality scores (dBBQs) was designed to store and retrieve quality scores. It offers fast searching and download features, the results of which can be used for further analysis. In addition, the search results are shown in interactive JavaScript charts built with DC.js. The analysis of quality scores across major public genome databases finds that around 68% of the genomes are of acceptable quality for many uses. dBBQs (available at http://arc-gem.uams.edu/dbbqs ) provides genome quality scores for all available prokaryotic genome sequences with a user-friendly Web interface. These scores can be used as cut-offs to get a high-quality set of genomes for testing bioinformatics tools or improving the analysis. Moreover, all data of the four measurements that were combined to make the quality score for each genome can potentially be used for further analysis. dBBQs will be updated regularly and is free to use for non-commercial purposes.
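
    The use of quality scores as cut-offs can be sketched as follows. This is a minimal illustration only: the field names (`accession`, `quality_score`), the 0 to 1 score scale, the cutoff value and the sample records are all assumptions, not the dBBQs schema or its download format.

```python
def select_high_quality(genomes, cutoff):
    """Return accessions of genomes whose quality score meets the cutoff.

    genomes: iterable of dicts with hypothetical keys 'accession' and
             'quality_score' (assumed here to lie between 0 and 1).
    cutoff:  minimum score a genome must reach to be kept.
    """
    return [g["accession"] for g in genomes if g["quality_score"] >= cutoff]


# Made-up records for illustration; these are not real dBBQs entries.
example_genomes = [
    {"accession": "genome_A", "quality_score": 0.95},
    {"accession": "genome_B", "quality_score": 0.40},
    {"accession": "genome_C", "quality_score": 0.80},
]
kept = select_high_quality(example_genomes, cutoff=0.8)
```

    A stricter cutoff trades a smaller genome set for higher average quality, which matters most when the set is used to benchmark other tools.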

  18. Tissue- and Cell Type-Specific Expression of the Long Noncoding RNA Klhl14-AS in Mouse

    Directory of Open Access Journals (Sweden)

    Sara Carmela Credendino

    2017-01-01

    Full Text Available lncRNAs are acquiring increasing relevance as regulators in a wide spectrum of biological processes. The extreme heterogeneity in the mechanisms of action of these molecules, however, makes them very difficult to study, especially regarding their molecular function. A novel lncRNA has been recently identified as the most enriched transcript in mouse developing thyroid. Due to its genomic localization antisense to the protein-encoding Klhl14 gene, we named it Klhl14-AS. In this paper, we highlight that mouse Klhl14-AS produces at least five splicing variants, some of which have not been previously described. Klhl14-AS is expressed with a peculiar pattern, characterized by diverse relative abundance of its isoforms in different mouse tissues. We examine the whole expression level of Klhl14-AS in a panel of adult mouse tissues, showing that it is expressed in the thyroid, lung, kidney, testis, ovary, brain, and spleen, although at different levels. In situ hybridization analysis reveals that, in the context of each organ, Klhl14-AS shows a cell type-specific expression. Interestingly, databases report a similar expression profile for human Klhl14-AS. Our observations suggest that this lncRNA could play cell type-specific roles in several organs and pave the way for functional characterization of this gene in appropriate biological contexts.

  19. Phospholipase C δ-type consists of three isozymes: bovine PLCδ2 is a homologue of human/mouse PLCδ4

    International Nuclear Information System (INIS)

    Irino, Yasuhiro; Cho, Hiroyuki; Nakamura, Yoshikazu; Nakahara, Masamichi; Furutani, Masahiro; Suh, Pann-Ghill; Takenawa, Tadaomi; Fukami, Kiyoko

    2004-01-01

    To date, 12 phospholipase C (PLC) isozymes have been identified in mammals, and they are divided into five classes, β-, γ-, δ-, ε-, and ζ-type. PLCδ-type is reported to be composed of four isozymes, PLCδ1-δ4. Here we report that a screening for mouse PLCδ2 from a BAC library with primers that amplify a specific region of bovine PLCδ2 resulted in isolation of one clone containing the mouse PLCδ4 gene. Furthermore, a database search revealed that there is only one gene corresponding to PLCδ2 and PLCδ4 in the mouse and human genomes, indicating that bovine PLCδ2 is a homologue of human and mouse PLCδ4. However, PLCδ2 Western blot analysis with a widely used commercial anti-PLCδ2 antibody showed an expression pattern distinct from that of PLCδ4 in wild-type mice. In addition, an 80-kDa band, which was recognized by antibody against PLCδ2, was smaller than an 85-kDa band detected by anti-PLCδ4 antibody, and the 80-kDa band was detectable in lysates of brain, testis, and spleen from PLCδ4-deficient mice. We also found that immunoprecipitates from brain lysates with this PLCδ2 antibody contained no PLC activity. From these data, we conclude that bovine PLCδ2 is a homologue of human and mouse PLCδ4, and that three isozymes (δ1, δ3, and δ4) exist in the PLCδ family

  20. Home | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ple Search Original Site Database Center for Life Science Kousaku Okubo organ human The dictionary-type data...-SA Detail Taxonomy Icon Taxonomy Icon Download | Simple Search Original Site National Bioscience Database Center Kousaku Okubo...enter for Life Science Kousaku Okubo Dictionary 9 species (human, mouse, rat, zeb

  1. Detection of genomic rearrangements in cucumber using genomecmp software

    Science.gov (United States)

    Kulawik, Maciej; Pawełkowicz, Magdalena Ewa; Wojcieszek, Michał; Pląder, Wojciech; Nowak, Robert M.

    2017-08-01

    Comparative genomics is a rapidly evolving science, driven by the increasing amount of genome sequence information available in databases. A simple comparison of the general features of genomes, such as genome size, number of genes, and chromosome number, presents an entry point into comparative genomic analysis. Here we present the utility of the new tool genomecmp for finding rearrangements across the compared sequences, and its applications in plant comparative genomics.

  2. Database development and management

    CERN Document Server

    Chao, Lee

    2006-01-01

    Introduction to Database Systems; Functions of a Database; Database Management System; Database Components; Database Development Process; Conceptual Design and Data Modeling; Introduction to Database Design Process; Understanding Business Process; Entity-Relationship Data Model; Representing Business Process with Entity-Relationship Model; Table Structure and Normalization; Introduction to Tables; Table Normalization; Transforming Data Models to Relational Databases; DBMS Selection; Transforming Data Models to Relational Databases; Enforcing Constraints; Creating Database for Business Process; Physical Design and Database

  3. Genome-derived vaccines.

    Science.gov (United States)

    De Groot, Anne S; Rappuoli, Rino

    2004-02-01

    Vaccine research entered a new era when the complete genome of a pathogenic bacterium was published in 1995. Since then, more than 97 bacterial pathogens have been sequenced and at least 110 additional projects are now in progress. Genome sequencing has also dramatically accelerated: high-throughput facilities can draft the sequence of an entire microbe (two to four megabases) in 1 to 2 days. Vaccine developers are using microarrays, immunoinformatics, proteomics and high-throughput immunology assays to reduce the truly unmanageable volume of information available in genome databases to a manageable size. Vaccines composed of novel antigens discovered from genome mining are already in clinical trials. Within 5 years we can expect to see a novel class of vaccines composed of genome-predicted, assembled and engineered T- and B-cell epitopes. This article addresses the convergence of three forces--microbial genome sequencing, computational immunology and new vaccine technologies--that are shifting genome mining for vaccines onto the forefront of immunology research.

  4. The Banana Genome Hub

    Science.gov (United States)

    Droc, Gaëtan; Larivière, Delphine; Guignon, Valentin; Yahiaoui, Nabila; This, Dominique; Garsmeur, Olivier; Dereeper, Alexis; Hamelin, Chantal; Argout, Xavier; Dufayard, Jean-François; Lengelle, Juliette; Baurens, Franc-Christophe; Cenci, Alberto; Pitollat, Bertrand; D’Hont, Angélique; Ruiz, Manuel; Rouard, Mathieu; Bocs, Stéphanie

    2013-01-01

    Banana is one of the world’s favorite fruits and one of the most important crops for developing countries. The banana reference genome sequence (Musa acuminata) was recently released. Given the taxonomic position of Musa, the completed genomic sequence has particular comparative value in providing fresh insights into the evolution of the monocotyledons. The study of the banana genome has been enhanced by a number of tools and resources that allow its sequence to be harnessed. First, we set up essential tools such as a Community Annotation System, phylogenomics resources and metabolic pathways. Then, to support post-genomic efforts, we improved existing banana systems (e.g. web front end, query builder), integrated available Musa data into generic systems (e.g. markers and genetic maps, synteny blocks), made other existing systems containing Musa data (e.g. transcriptomics, rice reference genome, workflow manager) interoperable with the banana hub and, finally, generated new results from sequence analyses (e.g. SNP and polymorphism analysis). Several use cases illustrate how the Banana Genome Hub can be used to study gene families. Overall, with this collaborative effort, we discuss the importance of interoperability for data integration between existing information systems. Database URL: http://banana-genome.cirad.fr/ PMID:23707967

  5. Mathematics for Databases

    NARCIS (Netherlands)

    ir. Sander van Laar

    2007-01-01

    A formal description of a database consists of the description of the relations (tables) of the database together with the constraints that must hold on the database. Furthermore the contents of a database can be retrieved using queries. These constraints and queries for databases can very well be

  6. Databases and their application

    NARCIS (Netherlands)

    Grimm, E.C.; Bradshaw, R.H.W; Brewer, S.; Flantua, S.; Giesecke, T.; Lézine, A.M.; Takahara, H.; Williams, J.W.,Jr; Elias, S.A.; Mock, C.J.

    2013-01-01

    During the past 20 years, several pollen database cooperatives have been established. These databases are now constituent databases of the Neotoma Paleoecology Database, a public domain, multiproxy, relational database designed for Quaternary-Pliocene fossil data and modern surface samples. The

  7. DOT Online Database

    Science.gov (United States)

    Page Home Table of Contents Contents Search Database Search Login Login Databases Advisory Circulars accessed by clicking below: Full-Text WebSearch Databases Database Records Date Advisory Circulars 2092 5 data collection and distribution policies. Document Database Website provided by MicroSearch

  8. Centralized mouse repositories.

    Science.gov (United States)

    Donahue, Leah Rae; Hrabe de Angelis, Martin; Hagn, Michael; Franklin, Craig; Lloyd, K C Kent; Magnuson, Terry; McKerlie, Colin; Nakagata, Naomi; Obata, Yuichi; Read, Stuart; Wurst, Wolfgang; Hörlein, Andreas; Davisson, Muriel T

    2012-10-01

    Because the mouse is used so widely for biomedical research and the number of mouse models being generated is increasing rapidly, centralized repositories are essential if the valuable mouse strains and models that have been developed are to be securely preserved and fully exploited. Ensuring the ongoing availability of these mouse strains preserves the investment made in creating and characterizing them and creates a global resource of enormous value. The establishment of centralized mouse repositories around the world for distributing and archiving these resources has provided critical access to and preservation of these strains. This article describes the common and specialized activities provided by major mouse repositories around the world.

  9. Dietary Supplement Ingredient Database

    Science.gov (United States)

    ... and US Department of Agriculture Dietary Supplement Ingredient Database Toggle navigation Menu Home About DSID Mission Current ... values can be saved to build a small database or add to an existing database for national, ...

  10. Energy Consumption Database

    Science.gov (United States)

    Consumption Database The California Energy Commission has created this on-line database for informal reporting ) classifications. The database also provides easy downloading of energy consumption data into Microsoft Excel (XLSX

  11. HERVd: database of human endogenous retroviruses

    Czech Academy of Sciences Publication Activity Database

    Pačes, Jan; Pavlíček, Adam; Pačes, Václav

    2002-01-01

    Vol. 30, No. 1 (2002), pp. 205-206 ISSN 0305-1048 R&D Projects: GA MŠk LN00A079; GA ČR GA301/99/M023 Keywords: HERV; database; human genome Subject RIV: EB - Genetics; Molecular Biology Impact factor: 7.051, year: 2002

  12. 1.15 - Structural Chemogenomics Databases to Navigate Protein–Ligand Interaction Space

    NARCIS (Netherlands)

    Kanev, G.K.; Kooistra, A.J.; de Esch, I.J.P.; de Graaf, C.

    2017-01-01

    Structural chemogenomics databases allow the integration and exploration of heterogeneous genomic, structural, chemical, and pharmacological data in order to extract useful information that is applicable for the discovery of new protein targets and biologically active molecules. Integrated databases

  13. Collecting Taxes Database

    Data.gov (United States)

    US Agency for International Development — The Collecting Taxes Database contains performance and structural indicators about national tax systems. The database contains quantitative revenue performance...

  14. USAID Anticorruption Projects Database

    Data.gov (United States)

    US Agency for International Development — The Anticorruption Projects Database (Database) includes information about USAID projects with anticorruption interventions implemented worldwide between 2007 and...

  15. NoSQL databases

    OpenAIRE

    Mrozek, Jakub

    2012-01-01

    This thesis deals with database systems referred to as NoSQL databases. In the second chapter, I explain basic terms and the theory of database systems. A short explanation is dedicated to database systems based on the relational data model and the SQL standardized query language. Chapter Three explains the concept and history of the NoSQL databases, and also presents database models, major features and the use of NoSQL databases in comparison with traditional database systems. In the fourth ...

  16. Alignment of whole genomes.

    Science.gov (United States)

    Delcher, A L; Kasif, S; Fleischmann, R D; Peterson, J; White, O; Salzberg, S L

    1999-01-01

    A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycobacterium tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation time. From the system output, information on single nucleotide changes, translocations and homologous genes can easily be extracted. Use of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications. PMID:10325427
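
    The system described in this abstract relies on a suffix tree. As a much simpler stand-in, and explicitly not the authors' algorithm, the Python sketch below finds k-mers that occur exactly once in each of two sequences and reports their positions as candidate alignment anchors. The function names, the fixed k-mer length and the toy sequences are assumptions made for illustration; a suffix tree finds variable-length exact matches far more efficiently on genome-scale input.

```python
from collections import Counter


def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def unique_kmer_anchors(seq_a, seq_b, k):
    """Return sorted (pos_a, pos_b) pairs for k-mers unique in both sequences."""
    pos_a = unique_kmer_positions(seq_a, k)
    pos_b = unique_kmer_positions(seq_b, k)
    shared = pos_a.keys() & pos_b.keys()
    return sorted((pos_a[kmer], pos_b[kmer]) for kmer in shared)


# Toy sequences; the second is the first shifted by two bases at the start.
anchors = unique_kmer_anchors("ACGTTGCA", "TTACGTTG", 4)
```

    Requiring uniqueness in both sequences avoids ambiguous anchors inside repeats, which is the same motivation behind using unique matches when aligning whole genomes.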

  17. metabolicMine: an integrated genomics, genetics and proteomics data warehouse for common metabolic disease research.

    Science.gov (United States)

    Lyne, Mike; Smith, Richard N; Lyne, Rachel; Aleksic, Jelena; Hu, Fengyuan; Kalderimis, Alex; Stepan, Radek; Micklem, Gos

    2013-01-01

    Common metabolic and endocrine diseases such as diabetes affect millions of people worldwide and have a major health impact, frequently leading to complications and mortality. In a search for better prevention and treatment, there is ongoing research into the underlying molecular and genetic bases of these complex human diseases, as well as into the links with risk factors such as obesity. Although an increasing number of relevant genomic and proteomic data sets have become available, the quantity and diversity of the data make their efficient exploitation challenging. Here, we present metabolicMine, a data warehouse with a specific focus on the genomics, genetics and proteomics of common metabolic diseases. Developed in collaboration with leading UK metabolic disease groups, metabolicMine integrates data sets from a range of experiments and model organisms alongside tools for exploring them. The current version brings together information covering genes, proteins, orthologues, interactions, gene expression, pathways, ontologies, diseases, genome-wide association studies and single nucleotide polymorphisms. Although the emphasis is on human data, key data sets from mouse and rat are included. These are complemented by interoperation with the RatMine rat genomics database, with a corresponding mouse version under development by the Mouse Genome Informatics (MGI) group. The web interface contains a number of features including keyword search, a library of Search Forms, the QueryBuilder and list analysis tools. This provides researchers with many different ways to analyse, view and flexibly export data. Programming interfaces and automatic code generation in several languages are supported, and many of the features of the web interface are available through web services. The combination of diverse data sets integrated with analysis tools and a powerful query system makes metabolicMine a valuable research resource. The web interface makes it accessible to first

  18. iSyTE 2.0: a database for expression-based gene discovery in the eye

    Science.gov (United States)

    Kakrana, Atul; Yang, Andrian; Anand, Deepti; Djordjevic, Djordje; Ramachandruni, Deepti; Singh, Abhyudai; Huang, Hongzhan

    2018-01-01

    Although successful in identifying new cataract-linked genes, the previous version of the database iSyTE (integrated Systems Tool for Eye gene discovery) was based on expression information on just three mouse lens stages and was functionally limited to visualization by only UCSC-Genome Browser tracks. To increase its efficacy, here we provide an enhanced iSyTE version 2.0 (URL: http://research.bioinformatics.udel.edu/iSyTE) based on well-curated, comprehensive genome-level lens expression data as a one-stop portal for the effective visualization and analysis of candidate genes in lens development and disease. iSyTE 2.0 includes all publicly available lens Affymetrix and Illumina microarray datasets representing a broad range of embryonic and postnatal stages from wild-type and specific gene-perturbation mouse mutants with eye defects. Further, we developed a new user-friendly web interface for direct access and cogent visualization of the curated expression data, which supports convenient searches and a range of downstream analyses. The utility of these new iSyTE 2.0 features is illustrated through examples of established genes associated with lens development and pathobiology, which serve as tutorials for its application by the end-user. iSyTE 2.0 will facilitate the prioritization of eye development and disease-linked candidate genes in studies involving transcriptomics or next-generation sequencing data, linkage analysis and GWAS approaches. PMID:29036527

  19. Genomic research perspectives in Kazakhstan

    Directory of Open Access Journals (Sweden)

    Ainur Akilzhanova

    2014-01-01

    Full Text Available Introduction: Technological advancements rapidly propel the field of genome research. Advances in genetics and genomics such as the sequence of the human genome, the human haplotype map, open access databases, cheaper genotyping and chemical genomics, have transformed basic and translational biomedical research. Several projects in the field of genomic and personalized medicine have been conducted at the Center for Life Sciences in Nazarbayev University. The prioritized areas of research include: genomics of multifactorial diseases, cancer genomics, bioinformatics, genetics of infectious diseases and population genomics. At present, DNA-based risk assessment for common complex diseases, application of molecular signatures for cancer diagnosis and prognosis, genome-guided therapy, and dose selection of therapeutic drugs are the important issues in personalized medicine. Results: To further develop genomic and biomedical projects at the Center for Life Sciences, the development of bioinformatics research and infrastructure and the establishment of new collaborations in the field are essential. Widespread use of genetic tools will allow the identification of diseases before the onset of clinical symptoms, the individualization of drug treatment, and could induce individual behavioral changes on the basis of calculated disease risk. However, many challenges remain for the successful translation of genomic knowledge and technologies into health advances, such as medicines and diagnostics. It is important to integrate research and education in the fields of genomics, personalized medicine, and bioinformatics, which will be possible with the opening of the new Medical Faculty at Nazarbayev University. People in practice and training need to be educated about the key concepts of genomics and engaged so they can effectively apply their knowledge in a manner that will bring the era of genomic medicine to patient care. This requires the development of well

  20. Extreme genomes

    OpenAIRE

    DeLong, Edward F

    2000-01-01

    The complete genome sequence of Thermoplasma acidophilum, an acid- and heat-loving archaeon, has recently been reported. Comparative genomic analysis of this 'extremophile' is providing new insights into the metabolic machinery, ecology and evolution of thermophilic archaea.

  1. Grass genomes

    OpenAIRE

    Bennetzen, Jeffrey L.; SanMiguel, Phillip; Chen, Mingsheng; Tikhonov, Alexander; Francki, Michael; Avramova, Zoya

    1998-01-01

    For the most part, studies of grass genome structure have been limited to the generation of whole-genome genetic maps or the fine structure and sequence analysis of single genes or gene clusters. We have investigated large contiguous segments of the genomes of maize, sorghum, and rice, primarily focusing on intergenic spaces. Our data indicate that much (>50%) of the maize genome is composed of interspersed repetitive DNAs, primarily nested retrotransposons that in...

  2. Cancer genomics

    DEFF Research Database (Denmark)

    Norrild, Bodil; Guldberg, Per; Ralfkiær, Elisabeth Methner

    2007-01-01

    Almost all cells in the human body contain a complete copy of the genome with an estimated number of 25,000 genes. The sequences of these genes make up about three percent of the genome and comprise the inherited set of genetic information. The genome also contains information that determines whe...

  3. Mouse Chromosome Engineering for Modeling Human Disease

    OpenAIRE

    van der Weyden, Louise; Bradley, Allan

    2006-01-01

    Chromosomal rearrangements occur frequently in humans and can be disease-associated or phenotypically neutral. Recent technological advances have led to the discovery of copy-number changes previously undetected by cytogenetic techniques. To understand the genetic consequences of such genomic changes, these mutations need to be modeled in experimentally tractable systems. The mouse is an excellent organism for this analysis because of its biological and genetic similarity to humans, and the e...

  4. The MAR databases: development and implementation of databases specific for marine metagenomics.

    Science.gov (United States)

    Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen; Willassen, Nils P

    2018-01-04

    We introduce the marine databases MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represent a marine prokaryote reference genome database, MarDB includes all incompletely sequenced prokaryotic genomes regardless of level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets visitors browse, filter and search in the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. PrimateLit Database

    Science.gov (United States)

    Primate Info Net Related Databases NCRR PrimateLit: A bibliographic database for primatology Top of any problems with this service. We welcome your feedback. The PrimateLit database is no longer being Resources, National Institutes of Health. The database is a collaborative project of the Wisconsin Primate

  6. Utilizing linkage disequilibrium information from Indian Genome ...

    Indian Academy of Sciences (India)

    Using LD information derived from Indian Genome Variation database (IGVdb) on populations .... Line diagram represents the SNPs selected in Indian (upper panel) and CEPH .... out procedure for extracting DNA from human nucleated cells.

  7. Database Description - RGP physicalmap | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available classification Plant databases - Rice Database classification Sequence Physical map Organism Taxonomy Name: ...inobe Journal: Nature Genetics (1994) 8: 365-372. External Links: Article title: Physical Mapping of Rice Ch...rnal: DNA Research (1997) 4(2): 133-140. External Links: Article title: Physical Mapping of Rice Chromosomes... T Sasaki Journal: Genome Research (1996) 6(10): 935-942. External Links: Article title: Physical mapping of

  8. Genomes to Proteomes

    Energy Technology Data Exchange (ETDEWEB)

    Panisko, Ellen A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Grigoriev, Igor [USDOE Joint Genome Inst., Walnut Creek, CA (United States); Daly, Don S. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Webb-Robertson, Bobbie-Jo [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Baker, Scott E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2009-03-01

    Biologists are awash with genomic sequence data. In large part, this is due to the rapid acceleration in the generation of DNA sequence that occurred as public and private research institutes raced to sequence the human genome. In parallel with the large human genome effort, mostly smaller genomes of other important model organisms were sequenced. Projects following on these initial efforts have made use of technological advances and the DNA sequencing infrastructure that was built for the human and other organism genome projects. As a result, the genome sequences of many organisms are available in high quality draft form. While in many ways this is good news, there are limitations to the biological insights that can be gleaned from DNA sequences alone; genome sequences offer only a bird's eye view of the biological processes endemic to an organism or community. Fortunately, the genome sequences now being produced at such a high rate can serve as the foundation for other global experimental platforms such as proteomics. Proteomic methods offer a snapshot of the proteins present at a point in time for a given biological sample. Current global proteomics methods combine enzymatic digestion, separations, mass spectrometry and database searching for peptide identification. One key aspect of proteomics is the prediction of peptide sequences from mass spectrometry data. Global proteomic analysis uses computational matching of experimental mass spectra with predicted spectra based on databases of gene models that are often generated computationally. Thus, the quality of gene models predicted from a genome sequence is crucial in the generation of high quality peptide identifications. Once peptides are identified they can be assigned to their parent protein. Proteins identified as expressed in a given experiment are most useful when compared to other expressed proteins in a larger biological context or biochemical pathway. In this chapter we will discuss the automatic

  9. Number and location of mouse mammary tumor virus proviral DNA in mouse DNA of normal tissue and of mammary tumors.

    Science.gov (United States)

    Groner, B; Hynes, N E

    1980-01-01

    The Southern DNA filter transfer technique was used to characterize the genomic location of the mouse mammary tumor proviral DNA in different inbred strains of mice. Two of the strains (C3H and CBA) arose from a cross of a Bagg albino (BALB/c) mouse and a DBA mouse. The mouse mammary tumor virus-containing restriction enzyme DNA fragments of these strains had similar patterns, suggesting that the proviruses of these mice are in similar genomic locations. Conversely, the pattern arising from the DNA of the GR mouse, a strain genetically unrelated to the others, appeared different, suggesting that its mouse mammary tumor proviruses are located in different genomic sites. The structure of another gene, that coding for beta-globin, was also compared. The mouse strains which we studied can be categorized into two classes, expressing either one or two beta-globin proteins. The macroenvironment of the beta-globin gene appeared similar among the mouse strains belonging to one genetic class. Female mice of the C3H strain exogenously transmit mouse mammary tumor virus via the milk, and their offspring have a high incidence of mammary tumors. DNA isolated from individual mammary tumors taken from C3H mice or from BALB/c mice foster nursed on C3H mothers was analyzed by the DNA filter transfer technique. Additional mouse mammary tumor virus-containing fragments were found in the DNA isolated from each mammary tumor. These proviral sequences were integrated into different genomic sites in each tumor. PMID:6245257

  10. KALIMER database development

    Energy Technology Data Exchange (ETDEWEB)

    Jeong, Kwan Seong; Lee, Yong Bum; Jeong, Hae Yong; Ha, Kwi Seok

    2003-03-01

    The KALIMER database is an advanced database for the integrated management of liquid metal reactor design technology development using Web applications. The KALIMER design database is composed of a results database, Inter-Office Communication (IOC), a 3D CAD database, and a reserved documents database. The results database holds research results from all phases of liquid metal reactor design technology development in mid-term and long-term nuclear R and D. IOC is a linkage control system between sub-projects for sharing and integrating the research results for KALIMER. The 3D CAD database provides a schematic overview of the KALIMER design structure. The reserved documents database was developed to manage the documents and reports accumulated since project accomplishment.

  11. KALIMER database development

    International Nuclear Information System (INIS)

    Jeong, Kwan Seong; Lee, Yong Bum; Jeong, Hae Yong; Ha, Kwi Seok

    2003-03-01

    The KALIMER database is an advanced database for the integrated management of liquid metal reactor design technology development using Web applications. The KALIMER design database is composed of a results database, Inter-Office Communication (IOC), a 3D CAD database, and a reserved documents database. The results database holds research results from all phases of liquid metal reactor design technology development in mid-term and long-term nuclear R and D. IOC is a linkage control system between sub-projects for sharing and integrating the research results for KALIMER. The 3D CAD database provides a schematic overview of the KALIMER design structure. The reserved documents database was developed to manage the documents and reports accumulated since project accomplishment.

  12. Astonishing advances in mouse genetic tools for biomedical research.

    Science.gov (United States)

    Kaczmarczyk, Lech; Jackson, Walker S

    2015-01-01

    The humble house mouse has long been a workhorse model system in biomedical research. The technology for introducing site-specific genome modifications led to Nobel Prizes for its pioneers and opened a new era of mouse genetics. However, this technology was very time-consuming and technically demanding. As a result, many investigators continued to employ easier genome manipulation methods, though resulting models can suffer from overlooked or underestimated consequences. Another breakthrough, invaluable for the molecular dissection of disease mechanisms, was the invention of high-throughput methods to measure the expression of a plethora of genes in parallel. However, the use of samples containing material from multiple cell types could obfuscate data, and thus interpretations. In this review we highlight some important issues in experimental approaches using mouse models for biomedical research. We then discuss recent technological advances in mouse genetics that are revolutionising human disease research. Mouse genomes are now easily manipulated at precise locations thanks to guided endonucleases, such as transcription activator-like effector nucleases (TALENs) or the CRISPR/Cas9 system, both also having the potential to turn the dream of human gene therapy into reality. Newly developed methods of cell type-specific isolation of transcriptomes from crude tissue homogenates, followed by detection with next generation sequencing (NGS), are vastly improving gene regulation studies. Taken together, these amazing tools simplify the creation of much more accurate mouse models of human disease, and enable the extraction of hitherto unobtainable data.

  13. MIPS: analysis and annotation of proteins from whole genomes.

    Science.gov (United States)

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  14. Logical database design principles

    CERN Document Server

    Garmany, John; Clark, Terry

    2005-01-01

    INTRODUCTION TO LOGICAL DATABASE DESIGN: Understanding a Database Database Architectures Relational Databases Creating the Database System Development Life Cycle (SDLC) Systems Planning: Assessment and Feasibility System Analysis: Requirements System Analysis: Requirements Checklist Models Tracking and Schedules Design Modeling Functional Decomposition Diagram Data Flow Diagrams Data Dictionary Logical Structures and Decision Trees System Design: Logical. SYSTEM DESIGN AND IMPLEMENTATION: The ER Approach Entities and Entity Types Attribute Domains Attributes Set-Valued Attributes Weak Entities Constraint

  15. An Interoperable Cartographic Database

    OpenAIRE

    Slobodanka Ključanin; Zdravko Galić

    2007-01-01

    The concept of producing a prototype of interoperable cartographic database is explored in this paper, including the possibilities of integration of different geospatial data into the database management system and their visualization on the Internet. The implementation includes vectorization of the concept of a single map page, creation of the cartographic database in an object-relation database, spatial analysis, definition and visualization of the database content in the form of a map on t...

  16. Software listing: CHEMTOX database

    International Nuclear Information System (INIS)

    Moskowitz, P.D.

    1993-01-01

    Initially launched in 1983, the CHEMTOX Database was among the first microcomputer databases containing hazardous chemical information. The database is used in many industries and government agencies in more than 17 countries. Updated quarterly, the CHEMTOX Database provides detailed environmental and safety information on 7500-plus hazardous substances covered by dozens of regulatory and advisory sources. This brief listing describes the method of accessing data and provides ordering information for those wishing to obtain the CHEMTOX Database

  17. In Vivo SILAC-Based Proteomics Reveals Phosphoproteome Changes during Mouse Skin Carcinogenesis

    NARCIS (Netherlands)

    Zanivan, S.; Meves, A.; Behrendt, K.; Schoof, E.M.; Neilson, L.J.; Cox, J.; Tang, H.R.; Kalna, G.; Ree, J.H. van; Deursen, J.M.A. van; Trempus, C.S.; Machesky, L.M.; Linding, R.; Wickstrom, S.A.; Fassler, R.; Mann, M.

    2013-01-01

    Cancer progresses through distinct stages, and mouse models recapitulating traits of this progression are frequently used to explore genetic, morphological, and pharmacological aspects of tumor development. To complement genomic investigations of this process, we here quantify phosphoproteomic

  18. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan

    2014-01-01

    mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost......-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion. RESULTS: Utilizing nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger...... fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides...

  19. PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.

    Science.gov (United States)

    Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.

  20. The genomic sequence of ectromelia virus, the causative agent of mousepox

    International Nuclear Information System (INIS)

    Chen Nanhai; Danila, Maria I.; Feng Zehua; Buller, R. Mark L.; Wang Chunlin; Han Xiaosi; Lefkowitz, Elliot J.; Upton, Chris

    2003-01-01

    Ectromelia virus is the causative agent of mousepox, an acute exanthematous disease of mouse colonies in Europe, Japan, China, and the U.S. The Moscow, Hampstead, and NIH79 strains are the most thoroughly studied with the Moscow strain being the most infectious and virulent for the mouse. In the late 1940s mousepox was proposed as a model for the study of the pathogenesis of smallpox and generalized vaccinia in humans. Studies in the last five decades from a succession of investigators have resulted in a detailed description of the virologic and pathologic disease course in genetically susceptible and resistant inbred and out-bred mice. We report the DNA sequence of the left-hand end, the predicted right-hand terminal repeat, and central regions of the genome of the Moscow strain of ectromelia virus (approximately 177,500 bp), which together with the previously sequenced right-hand end, yields a genome of 209,771 bp. We identified 175 potential genes specifying proteins of between 53 and 1924 amino acids, and 29 regions containing sequences related to genes predicted in other poxviruses, but unlikely to encode for functional proteins in ectromelia virus. The translated protein sequences were compared with the protein database for structure/function relationships, and these analyses were used to investigate poxvirus evolution and to attempt to explain at the cellular and molecular level the well-characterized features of the ectromelia virus natural life cycle

  1. The STRING database in 2017

    DEFF Research Database (Denmark)

    Szklarczyk, Damian; Morris, John H; Cook, Helen

    2017-01-01

    A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein-protein interactions, and importing known...... pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer...

  2. An update on the mouse liver proteome

    Directory of Open Access Journals (Sweden)

    Borlak Jürgen

    2009-09-01

    Background: Decoding of the liver proteome is the subject of intense research, but is hampered by methodological constraints. We recently developed an improved protocol for studying rat liver proteins based on 2-DE-MALDI-TOF-MS peptide mass fingerprinting. This methodology has now been applied to develop a mouse liver protein database. Results: Liver proteins were extracted by two different lysis buffers in sequence, followed by liquid-phase IEF pre-fractionation and separation of proteins by 2-DE at two different pH ranges, notably 5-8 and 7-10. Based on 9600 in-gel digests, a total of 643 mouse liver proteins with high sequence coverage (> 20 peptides per protein) could be identified by MALDI-TOF-MS peptide mass fingerprinting. Notably, 255 proteins are novel and have not been reported so far by conventional two-dimensional electrophoresis proteome mapping. Additionally, the present findings for mouse liver were compared to published data of the rat proteome to compile as many proteins as possible in a rodent liver database. Conclusion: Based on 2-DE MALDI-TOF-MS, a significantly improved proteome map of mouse liver was obtained. We discuss some prominent members of newly identified proteins for a better understanding of liver biology.

  3. Database Description - PSCDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available abase Description General information of database Database name PSCDB Alternative n...rial Science and Technology (AIST) Takayuki Amemiya E-mail: Database classification Structure Databases - Protein structure Database...554-D558. External Links: Original website information Database maintenance site Graduate School of Informat...available URL of Web services - Need for user registration Not available About This Database Database Descri...ption Download License Update History of This Database Site Policy | Contact Us Database Description - PSCDB | LSDB Archive ...

  4. An XML-based system for synthesis of data from disparate databases.

    Science.gov (United States)

    Kurc, Tahsin; Janies, Daniel A; Johnson, Andrew D; Langella, Stephen; Oster, Scott; Hastings, Shannon; Habib, Farhat; Camerlengo, Terry; Ervin, David; Catalyurek, Umit V; Saltz, Joel H

    2006-01-01

    Diverse data sets have become key building blocks of translational biomedical research. Data types captured and referenced by sophisticated research studies include high throughput genomic and proteomic data, laboratory data, data from imagery, and outcome data. In this paper, the authors present the application of an XML-based data management system to support integration of data from disparate data sources and large data sets. This system facilitates management of XML schemas and on-demand creation and management of XML databases that conform to these schemas. They illustrate the use of this system in an application for genotype-phenotype correlation analyses. This application implements a method of phenotype-genotype correlation based on phylogenetic optimization of large data sets of mouse SNPs and phenotypic data. The application workflow requires the management and integration of genomic information and phenotypic data from external data repositories and from the results of phenotype-genotype correlation analyses. Our implementation supports the process of carrying out a complex workflow that includes large-scale phylogenetic tree optimizations and application of Maddison's concentrated changes test to large phylogenetic tree data sets. The data management system also allows collaborators to share data in a uniform way and supports complex queries that target data sets.

  5. Directory of IAEA databases

    International Nuclear Information System (INIS)

    1991-11-01

    The first edition of the Directory of IAEA Databases is intended to describe the computerized information sources available to IAEA staff members. It contains a listing of all databases produced at the IAEA, together with information on their availability

  6. Native Health Research Database

    Science.gov (United States)

    ... Indian Health Board) Welcome to the Native Health Database. Please enter your search terms. Basic Search Advanced ... To learn more about searching the Native Health Database, click here. Tutorial Video The NHD has made ...

  7. Cell Centred Database (CCDB)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Cell Centered Database (CCDB) is a web accessible database for high resolution 2D, 3D and 4D data from light and electron microscopy, including correlated imaging.

  8. E3 Staff Database

    Data.gov (United States)

    US Agency for International Development — The E3 Staff database is maintained by the E3 PDMS (Professional Development & Management Services) office. The database is MySQL. It is manually updated by E3 staff as...

  9. NIRS database of the original research database

    International Nuclear Information System (INIS)

    Morita, Kyoko

    1991-01-01

    Recently, library staff arranged and compiled the original research papers written by researchers over the 33 years since the National Institute of Radiological Sciences (NIRS) was established. This paper describes how the internal database of original research papers was created. It is a small example of a hand-made database. It has been accumulated by staff members who have any knowledge about computers or computer programming. (author)

  10. Scopus database: a review.

    Science.gov (United States)

    Burnham, Judy F

    2006-03-08

    The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is inclusive; rather, the two complement each other. If a library can afford only one, the choice must be based on institutional needs.

  11. Aviation Safety Issues Database

    Science.gov (United States)

    Morello, Samuel A.; Ricks, Wendell R.

    2009-01-01

    The aviation safety issues database was instrumental in the refinement and substantiation of the National Aviation Safety Strategic Plan (NASSP). The issues database is a comprehensive set of issues from an extremely broad base of aviation functions, personnel, and vehicle categories, both nationally and internationally. Several aviation safety stakeholders such as the Commercial Aviation Safety Team (CAST) have already used the database. This broader interest was the genesis for making the database publicly accessible and writing this report.

  12. From Genome Sequence to Taxonomy - A Skeptic’s View

    DEFF Research Database (Denmark)

    Özen, Asli Ismihan; Vesth, Tammi Camilla; Ussery, David

    2012-01-01

    The relative ease of sequencing bacterial genomes has resulted in thousands of sequenced bacterial genomes available in the public databases. This same technology now allows for using the entire genome sequence as an identifier for an organism. There are many methods available which attempt to us...

  13. Automated Oracle database testing

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    Ensuring database stability and steady performance in the modern world of agile computing is a major challenge. Changes at any level of the computing infrastructure (OS parameters & packages, kernel versions, database parameters & patches, or even schema changes) can all potentially harm production services. This presentation shows how automatic and regular testing of Oracle databases can be achieved in such an agile environment.

  14. Inleiding database-systemen

    NARCIS (Netherlands)

    Pels, H.J.; Lans, van der R.F.; Pels, H.J.; Meersman, R.A.

    1993-01-01

    This article introduces the main concepts that play a role around databases and gives an overview of the objectives, functions and components of database systems. Although the function of a database is intuitively fairly clear, it is nevertheless, from a technological point of view, a complex

  15. Comparative genome analysis of trypanotolerance QTL | Nganga ...

    African Journals Online (AJOL)

    Homologous sequences were used in the definition of synteny relationships and subsequent identification of the shared disease response genes. The homologous genes within the human genome were then identified and aligned to the bovine radiation hybrid map in order to identify the mouse/bovine homologous regions.

  16. CyanoClust: comparative genome resources of cyanobacteria and plastids

    OpenAIRE

    Sasaki, Naobumi V.; Sato, Naoki

    2010-01-01

    Cyanobacteria, which perform oxygen-evolving photosynthesis as do chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla and one from which many representative genomes have been sequenced. Lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Protein...

  17. Database Description - GenLibi | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ve name Gene Linker to bibliography DOI 10.18908/lsdba.nbdc01093-000 Creator Creator Name: Japan Science and Technology...mouse and rat genes. License CC BY-SA Detail Background and funding Name: JST (Japan Science and Technology ... site Japan Science and Technology Agency URL of the original website http://gene.biosciencedbc.jp/ Operatio...me(s): Journal: External Links: Original website information Database maintenance

  18. An Interoperable Cartographic Database

    Directory of Open Access Journals (Sweden)

    Slobodanka Ključanin

    2007-05-01

    The concept of producing a prototype of interoperable cartographic database is explored in this paper, including the possibilities of integration of different geospatial data into the database management system and their visualization on the Internet. The implementation includes vectorization of the concept of a single map page, creation of the cartographic database in an object-relation database, spatial analysis, definition and visualization of the database content in the form of a map on the Internet.

  19. Keyword Search in Databases

    CERN Document Server

    Yu, Jeffrey Xu; Chang, Lijun

    2009-01-01

    It has become highly desirable to provide users with flexible ways to query/search information over databases as simple as keyword search like Google search. This book surveys the recent developments on keyword search over databases, and focuses on finding structural information among objects in a database using a set of keywords. Such structural information to be returned can be either trees or subgraphs representing how the objects, that contain the required keywords, are interconnected in a relational database or in an XML database. The structural keyword search is completely different from

  20. Nuclear power economic database

    International Nuclear Information System (INIS)

    Ding Xiaoming; Li Lin; Zhao Shiping

    1996-01-01

    The nuclear power economic database (NPEDB), based on ORACLE V6.0, consists of three parts: the economic database of nuclear power stations, the economic database of the nuclear fuel cycle, and the economic database of nuclear power planning and nuclear environment. The economic database of nuclear power stations includes data on general economics, technique, capital cost and benefit, etc. The economic database of the nuclear fuel cycle includes data on technique and nuclear fuel price. The economic database of nuclear power planning and nuclear environment includes data on energy history, forecast, energy balance, electric power and energy facilities.

  1. Genome Imprinting

    Indian Academy of Sciences (India)

    the cell nucleus (mitochondrial and chloroplast genomes), and. (3) traits governed ... tively good embryonic development but very poor development of membranes and ... Human homologies for the type of situation described above are naturally ..... imprint; (b) New modifications of the paternal genome in germ cells of each ...

  2. Baculovirus Genomics

    NARCIS (Netherlands)

    Oers, van M.M.; Vlak, J.M.

    2007-01-01

    Baculovirus genomes are covalently closed circles of double-stranded DNA varying in size between 80 and 180 kilobase pairs. The genomes of more than forty-one baculoviruses have been sequenced to date. The majority of these (37) are pathogenic to lepidopteran hosts; three infect sawflies

  3. Ancient genomes

    OpenAIRE

    Hoelzel, A Rus

    2005-01-01

    Ever since its invention, the polymerase chain reaction has been the method of choice for work with ancient DNA. In an application of modern genomic methods to material from the Pleistocene, a recent study has instead undertaken to clone and sequence a portion of the ancient genome of the cave bear.

  4. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  5. Sequencing Intractable DNA to Close Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  6. FANTOM5 CAGE profiles of human and mouse samples

    NARCIS (Netherlands)

    Noguchi, Shuhei; Arakawa, Takahiro; Fukuda, Shiro; Furuno, Masaaki; Hasegawa, Akira; Hori, Fumi; Ishikawa-Kato, Sachi; Kaida, Kaoru; Kaiho, Ai; Kanamori-Katayama, Mutsumi; Kawashima, Tsugumi; Kojima, Miki; Kubosaki, Atsutaka; Manabe, Ri-ichiroh; Murata, Mitsuyoshi; Nagao-Sato, Sayaka; Nakazato, Kenichi; Ninomiya, Noriko; Nishiyori-Sueki, Hiromi; Noma, Shohei; Saijyo, Eri; Saka, Akiko; Sakai, Mizuho; Simon, Christophe; Suzuki, Naoko; Tagami, Michihira; Watanabe, Shoko; Yoshida, Shigehiro; Arner, Peter; Axton, Richard A.; Babina, Magda; Baillie, J. Kenneth; Barnett, Timothy C.; Beckhouse, Anthony G.; Blumenthal, Antje; Bodega, Beatrice; Bonetti, Alessandro; Briggs, James; Brombacher, Frank; Carlisle, Ailsa J.; Clevers, Hans C.; Davis, Carrie A.; Detmar, Michael; Dohi, Taeko; Edge, Albert S. B.; Edinger, Matthias; Ehrlund, Anna; Ekwall, Karl; Endoh, Mitsuhiro; Enomoto, Hideki; Eslami, Afsaneh; Fagiolini, Michela; Fairbairn, Lynsey; Farach-Carson, Mary C.; Faulkner, Geoffrey J.; Ferrai, Carmelo; Fisher, Malcolm E.; Forrester, Lesley M.; Fujita, Rie; Furusawa, Jun-ichi; Geijtenbeek, Teunis B.; Gingeras, Thomas; Goldowitz, Daniel; Guhl, Sven; Guler, Reto; Gustincich, Stefano; Ha, Thomas J.; Hamaguchi, Masahide; Hara, Mitsuko; Hasegawa, Yuki; Herlyn, Meenhard; Heutink, Peter; Hitchens, Kelly J.; Hume, David A.; Ikawa, Tomokatsu; Ishizu, Yuri; Kai, Chieko; Kawamoto, Hiroshi; Kawamura, Yuki I.; Kempfle, Judith S.; Kenna, Tony J.; Kere, Juha; Khachigian, Levon M.; Kitamura, Toshio; Klein, Sarah; Klinken, S. Peter; Knox, Alan J.; Kojima, Soichi; Koseki, Haruhiko; Koyasu, Shigeo; Lee, Weonju; Lennartsson, Andreas; Mackay-sim, Alan; Mejhert, Niklas; Mizuno, Yosuke; Morikawa, Hiromasa; Morimoto, Mitsuru; Moro, Kazuyo; Morris, Kelly J.; Motohashi, Hozumi; Mummery, Christine L.; Nakachi, Yutaka; Nakahara, Fumio; Nakamura, Toshiyuki; Nakamura, Yukio; Nozaki, Tadasuke; Ogishima, Soichi; Ohkura, Naganari; Ohno, Hiroshi; Ohshima, Mitsuhiro; Okada-Hatakeyama, Mariko; Okazaki, Yasushi; Orlando, Valerio; Ovchinnikov, Dmitry A.; Passier, Robert; Patrikakis, Margaret; Pombo, Ana; Pradhan-Bhatt, Swati; Qin, Xian-Yang; Rehli, Michael; Rizzu, Patrizia; Roy, Sugata; Sajantila, Antti; Sakaguchi, Shimon; Sato, Hiroki; Satoh, Hironori; Savvi, Suzana; Saxena, Alka; Schmidl, Christian; Schneider, Claudio; Schulze-Tanzil, Gundula G.; Schwegmann, Anita; Sheng, Guojun; Shin, Jay W.; Sugiyama, Daisuke; Sugiyama, Takaaki; Summers, Kim M.; Takahashi, Naoko; Takai, Jun; Tanaka, Hiroshi; Tatsukawa, Hideki; Tomoiu, Andru; Toyoda, Hiroo; van de Wetering, Marc; van den Berg, Linda M.; Verardo, Roberto; Vijayan, Dipti; Wells, Christine A.; Winteringham, Louise N.; Wolvetang, Ernst; Yamaguchi, Yoko; Yamamoto, Masayuki; Yanagi-Mizuochi, Chiyo; Yoneda, Misako; Yonekura, Yohei; Zhang, Peter G.; Zucchelli, Silvia; Abugessaisa, Imad; Arner, Erik; Harshbarger, Jayson; Kondo, Atsushi; Lassmann, Timo; Lizio, Marina; Sahin, Serkan; Sengstag, Thierry; Severin, Jessica; Shimoji, Hisashi; Suzuki, Masanori; Suzuki, Harukazu; Kawai, Jun; Kondo, Naoto; Itoh, Masayoshi; Daub, Carsten O.; Kasukawa, Takeya; Kawaji, Hideya; Carninci, Piero; Forrest, Alistair R. R.; Hayashizaki, Yoshihide

    2017-01-01

    In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at a single base-pair resolution and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousands of samples,

  7. Comparison of TCDD-elicited genome-wide hepatic gene expression in Sprague–Dawley rats and C57BL/6 mice

    Energy Technology Data Exchange (ETDEWEB)

    Nault, Rance; Kim, Suntae; Zacharewski, Timothy R., E-mail: tzachare@msu.edu

    2013-03-01

    Although the structure and function of the AhR are conserved, emerging evidence suggests that downstream effects are species-specific. In this study, rat hepatic gene expression data from the DrugMatrix database (National Toxicology Program) were compared to mouse hepatic whole-genome gene expression data following treatment with 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). For the DrugMatrix study, male Sprague–Dawley rats were gavaged daily with 20 μg/kg TCDD for 1, 3 and 5 days, while female C57BL/6 ovariectomized mice were examined 1, 3 and 7 days after a single oral gavage of 30 μg/kg TCDD. A total of 649 rat and 1386 mouse genes (|fold change| ≥ 1.5, P1(t) ≥ 0.99) were differentially expressed following treatment. HomoloGene identified 11,708 orthologs represented across the rat Affymetrix 230 2.0 GeneChip (12,310 total orthologs) and the mouse 4 × 44K v.1 Agilent oligonucleotide array (17,578 total orthologs). Comparative analysis found 563 and 922 orthologs differentially expressed in response to TCDD in the rat and mouse, respectively, with 70 responses associated with immune function and lipid metabolism common to both. Moreover, QRTPCR analysis of Ceacam1 showed divergent expression (induced in rat; repressed in mouse), functionally consistent with TCDD-elicited hepatic steatosis in the mouse but not the rat. Functional analysis identified orthologs involved in nucleotide binding and acetyltransferase activity in rat, while mouse-specific responses were associated with steroid, phospholipid, fatty acid, and carbohydrate metabolism. These results provide further evidence that TCDD elicits species-specific regulation of distinct gene networks, and outline considerations for future comparisons of publicly available microarray datasets. - Highlights: ► We performed a whole-genome comparison of TCDD-regulated genes in mice and rats. ► Previous species comparisons were extended using data from the DrugMatrix database. ► Less than 15% of TCDD

  8. Database Description - RPD | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    ase Description General information of database Database name RPD Alternative name Rice Proteome Database...titute of Crop Science, National Agriculture and Food Research Organization Setsuko Komatsu E-mail: Database... classification Proteomics Resources Plant databases - Rice Organism Taxonomy Name: Oryza sativa Taxonomy ID: 4530 Database... description Rice Proteome Database contains information on protei...and entered in the Rice Proteome Database. The database is searchable by keyword,

  9. Database Description - JSNP | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    base Description General information of database Database name JSNP Alternative nam...n Science and Technology Agency Creator Affiliation: Contact address E-mail : Database...sapiens Taxonomy ID: 9606 Database description A database of about 197,000 polymorphisms in Japanese populat...1):605-610 External Links: Original website information Database maintenance site Institute of Medical Scien...er registration Not available About This Database Database Description Download License Update History of This Database

  10. Database Description - ASTRA | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    abase Description General information of database Database name ASTRA Alternative n...tics Journal Search: Contact address Database classification Nucleotide Sequence Databases - Gene structure,...3702 Taxonomy Name: Oryza sativa Taxonomy ID: 4530 Database description The database represents classified p...(10):1211-6. External Links: Original website information Database maintenance site National Institute of Ad... for user registration Not available About This Database Database Description Dow

  11. Database Description - PLACE | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    abase Description General information of database Database name PLACE Alternative name A Database...Kannondai, Tsukuba, Ibaraki 305-8602, Japan National Institute of Agrobiological Sciences E-mail : Databas...e classification Plant databases Organism Taxonomy Name: Tracheophyta Taxonomy ID: 58023 Database...99, Vol.27, No.1 :297-300 External Links: Original website information Database maintenance site National In...- Need for user registration Not available About This Database Database Descripti

  12. The Nostoc punctiforme Genome

    Energy Technology Data Exchange (ETDEWEB)

    John C. Meeks

    2001-12-31

    Nostoc punctiforme is a filamentous cyanobacterium with extensive phenotypic characteristics and a relatively large genome, approaching 10 Mb. The phenotypic characteristics include a photoautotrophic, diazotrophic mode of growth, but N. punctiforme is also facultatively heterotrophic; its vegetative cells have multiple development alternatives, including terminal differentiation into nitrogen-fixing heterocysts and transient differentiation into spore-like akinetes or motile filaments called hormogonia; and N. punctiforme has broad symbiotic competence with fungi and terrestrial plants, including bryophytes, gymnosperms and an angiosperm. The shotgun-sequencing phase of the N. punctiforme strain ATCC 29133 genome has been completed by the Joint Genome Institute. Annotation of an 8.9 Mb database yielded 7432 open reading frames, 45% of which encode proteins with known or probable known function and 29% of which are unique to N. punctiforme. Comparative analysis of the sequence indicates a genome that is highly plastic and in a state of flux, with numerous insertion sequences and multilocus repeats, as well as genes encoding transposases and DNA modification enzymes. The sequence also reveals the presence of genes encoding putative proteins that collectively define almost all characteristics of cyanobacteria as a group. N. punctiforme has an extensive potential to sense and respond to environmental signals as reflected by the presence of more than 400 genes encoding sensor protein kinases, response regulators and other transcriptional factors. The signal transduction systems and any of the large number of unique genes may play essential roles in the cell differentiation and symbiotic interaction properties of N. punctiforme.

  13. Database Description - Arabidopsis Phenome Database | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Arabidopsis Phenome Database Database Description General information of database Database n... BioResource Center Hiroshi Masuya Database classification Plant databases - Arabidopsis thaliana Organism T...axonomy Name: Arabidopsis thaliana Taxonomy ID: 3702 Database description The Arabidopsis thaliana phenome i...heir effective application. We developed the new Arabidopsis Phenome Database integrating two novel database...seful materials for their experimental research. The other, the “Database of Curated Plant Phenome” focusing

  14. The genome portal of the Department of Energy Joint Genome Institute: 2014 updates

    Energy Technology Data Exchange (ETDEWEB)

    Nordberg, Henrik [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Cantor, Michael [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Dusheyko, Serge [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Hua, Susan [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Poliakov, Alexander [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Shabalov, Igor [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Smirnova, Tatyana [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Grigoriev, Igor V. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]; Dubchak, Inna [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)]

    2013-11-12

    The U.S. Department of Energy (DOE) Joint Genome Institute (JGI), a national user facility, serves the diverse scientific community by providing integrated high-throughput sequencing and computational analysis to enable system-based scientific approaches in support of DOE missions related to clean energy generation and environmental characterization. The JGI Genome Portal (http://genome.jgi.doe.gov) provides unified access to all JGI genomic databases and analytical tools. The JGI maintains extensive data management systems and specialized analytical capabilities to manage and interpret complex genomic data. A user can search, download and explore multiple data sets available for all DOE JGI sequencing projects including their status, assemblies and annotations of sequenced genomes. In this paper, we describe major updates of the Genome Portal in the past 2 years with a specific emphasis on efficient handling of the rapidly growing amount of diverse genomic data accumulated in JGI.

  15. Thoroughbred Horse Single Nucleotide Polymorphism and Expression Database: HSDB

    Directory of Open Access Journals (Sweden)

    Joon-Ho Lee

    2014-09-01

    Genetics is important for breeding and selection of horses, but there is a lack of well-established horse-related browsers or databases. In order to better understand horses, more variants and other integrated information are needed. Thus, we constructed a horse genomic variants database including expression and other information. The Horse Single Nucleotide Polymorphism and Expression Database (HSDB) (http://snugenome2.snu.ac.kr/HSDB) provides the number of unexplored genomic variants still remaining to be identified in the horse genome, including rare variants, by using population genome sequences of eighteen horses and RNA-seq of four horses. The identified single nucleotide polymorphisms (SNPs) were confirmed by comparing them with SNP chip data and variants of RNA-seq, which showed a concordance level of 99.02% and 96.6%, respectively. Moreover, the database provides the genomic variants with their corresponding transcriptional profiles from the same individuals to help understand the functional aspects of these variants. The database will contribute to genetic improvement and breeding strategies of Thoroughbreds.
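
    The concordance levels quoted in this record compare genotype calls from two sources at the same sites. A minimal sketch of how such a rate can be computed follows; it is not the HSDB authors' pipeline, and the function name, the site keys and the toy genotypes are hypothetical values chosen only for illustration.

```python
def concordance(calls_a, calls_b):
    """Fraction of shared sites at which both call sets report the same genotype.

    Each argument maps a site key, here a hypothetical (chromosome, position)
    tuple, to a genotype string. Sites present in only one set are ignored.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two call sets share no sites")
    matches = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return matches / len(shared)


# Toy data (made up): sequencing-derived calls versus SNP-chip calls.
seq_calls = {("chr1", 100): "AG", ("chr1", 250): "GG",
             ("chr2", 40): "CT", ("chr2", 90): "TT"}
chip_calls = {("chr1", 100): "AG", ("chr1", 250): "GG",
              ("chr2", 40): "CC", ("chr3", 5): "AA"}

rate = concordance(seq_calls, chip_calls)
print(f"concordance over shared sites: {rate:.2%}")
```

    Restricting the comparison to shared sites is a design choice of this sketch; a real comparison would also need to settle how missing calls and strand orientation are handled.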

  16. Analysis of 16S libraries of mouse gastrointestinal microflora reveals a large new group of mouse intestinal bacteria

    NARCIS (Netherlands)

    Salzman, NH; de Jong, H; Paterson, Y; Harmsen, HJM; Welling, GW; Bos, NA

    2002-01-01

    Total genomic DNA from samples of intact mouse small intestine, large intestine, caecum and faeces was used as template for PCR amplification of 16S rRNA gene sequences with conserved bacterial primers. Phylogenetic analysis of the amplification products revealed 40 unique 16S rDNA sequences. Of

  17. Characterization and mapping of the mouse NDP (Norrie disease) locus (Ndp).

    Science.gov (United States)

    Battinelli, E M; Boyd, Y; Craig, I W; Breakefield, X O; Chen, Z Y

    1996-02-01

    Norrie disease is a severe X-linked recessive neurological disorder characterized by congenital blindness with progressive loss of hearing. Over half of Norrie patients also manifest different degrees of mental retardation. The gene for Norrie disease (NDP) has recently been cloned and characterized. With the human NDP cDNA, mouse genomic phage libraries were screened for the homolog of the gene. Comparison between mouse and human genomic DNA blots hybridized with the NDP cDNA, as well as analysis of phage clones, shows that the mouse NDP gene is 29 kb in size (28 kb for the human gene). The organization in the two species is very similar. Both have three exons with similar-sized introns and identical exon-intron boundaries between exon 2 and 3. The mouse open reading frame is 393 bp and, like the human coding sequence, is encoded in exons 2 and 3. The absence of six nucleotides in the second mouse exon results in the encoded protein being two amino acids smaller than its human counterpart. The overall homology between the human and mouse NDP protein is 95% and is particularly high (99%) in exon 3, consistent with the apparent functional importance of this region. Analysis of transcription initiation sites suggests the presence of multiple start sites associated with expression of the mouse NDP gene. Pedigree analysis of an interspecific mouse backcross localizes the mouse NDP gene close to Maoa in the conserved segment, which runs from CYBB to PFC in both human and mouse.

  18. Aberrant splicing in transgenes containing introns, exons, and V5 epitopes: lessons from developing an FSHD mouse model expressing a D4Z4 repeat with flanking genomic sequences.

    Directory of Open Access Journals (Sweden)

    Eugénie Ansseau

    The DUX4 gene, encoded within D4Z4 repeats on human chromosome 4q35, has recently emerged as a key factor in the pathogenic mechanisms underlying Facioscapulohumeral muscular dystrophy (FSHD). This recognition prompted development of animal models expressing the DUX4 open reading frame (ORF) alone or embedded within D4Z4 repeats. In the first published model, we used adeno-associated viral vectors (AAV) and strong viral control elements (CMV promoter, SV40 poly A) to demonstrate that the DUX4 cDNA caused dose-dependent toxicity in mouse muscles. As a follow-up, we designed a second generation of DUX4-expressing AAV vectors to more faithfully genocopy the FSHD-permissive D4Z4 repeat region located at 4q35. This new vector (called AAV.D4Z4.V5.pLAM) contained the D4Z4/DUX4 promoter region, a V5 epitope-tagged DUX4 ORF, and the natural 3' untranslated region (pLAM) harboring two small introns, DUX4 exons 2 and 3, and the non-canonical poly A signal required for stabilizing DUX4 mRNA in FSHD. AAV.D4Z4.V5.pLAM failed to recapitulate the robust pathology of our first generation vectors following delivery to mouse muscle. We found that the DUX4.V5 junction sequence created an unexpected splice donor in the pre-mRNA that was preferentially utilized to remove the V5 coding sequence and DUX4 stop codon, yielding non-functional DUX4 protein with 55 additional residues on its carboxyl-terminus. Importantly, we further found that aberrant splicing could occur in any expression construct containing a functional splice acceptor and sequences resembling minimal splice donors. Our findings represent an interesting case study with respect to AAV.D4Z4.V5.pLAM, but more broadly serve as a note of caution for designing constructs containing V5 epitope tags and/or transgenes with downstream introns and exons.

  19. Gaze beats mouse

    DEFF Research Database (Denmark)

    Mateo, Julio C.; San Agustin, Javier; Hansen, John Paulin

    2008-01-01

    Facial EMG for selection is fast, easy and, combined with gaze pointing, it can provide completely hands-free interaction. In this pilot study, 5 participants performed a simple point-and-select task using mouse or gaze for pointing and a mouse button or a facial-EMG switch for selection. Gaze...

  20. Synthesizing genome-wide association studies and expression microarray reveals novel genes that act in the human growth plate to modulate height.

    Science.gov (United States)

    Lui, Julian C; Nilsson, Ola; Chan, Yingleong; Palmer, Cameron D; Andrade, Anenisia C; Hirschhorn, Joel N; Baron, Jeffrey

    2012-12-01

    Previous meta-analysis of genome-wide association (GWA) studies has identified 180 loci that influence adult height. However, each GWA locus typically comprises a set of contiguous genes, only one of which presumably modulates height. We reasoned that many of the causative genes within these loci influence height because they are expressed in and function in the growth plate, a cartilaginous structure that causes bone elongation and thus determines stature. Therefore, we used expression microarray studies of mouse and rat growth plate, human disease databases and a mouse knockout phenotype database to identify genes within the GWAS loci that are likely required for normal growth plate function. Each of these approaches identified significantly more genes within the GWA height loci than at random genomic locations (P ...) ... analysis strongly implicates 78 genes in growth plate function, including multiple genes that participate in PTHrP-IHH, BMP and CNP signaling, and many genes that have not previously been implicated in the growth plate. Thus, this analysis reveals a large number of novel genes that regulate human growth plate chondrogenesis and thereby contribute to the normal variations in human adult height. The analytic approach developed for this study may be applied to GWA studies for other common polygenic traits and diseases, thus providing a new general strategy to identify causative genes within GWA loci and to translate genetic associations into mechanistic biological insights.
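
    The comparison of GWA loci against random genomic locations described in this record can be illustrated with a simple permutation test. The sketch below is one generic way to do this and is not claimed to be the authors' procedure. It assumes a toy genome represented as an ordered gene list, loci represented as (start index, number of genes) windows, and made-up gene names; all function names are hypothetical.

```python
import random


def count_hits(loci, genes, candidates):
    """Count distinct candidate genes that fall inside any locus window."""
    hits = set()
    for start, size in loci:
        hits.update(g for g in genes[start:start + size] if g in candidates)
    return len(hits)


def empirical_p(loci, genes, candidates, n_perm=1000, seed=0):
    """One-sided empirical P value: how often do randomly placed windows of
    the same sizes contain at least as many candidate genes as the real loci?"""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = count_hits(loci, genes, candidates)
    at_least = 0
    for _ in range(n_perm):
        random_loci = [(rng.randrange(len(genes) - size + 1), size)
                       for _, size in loci]
        if count_hits(random_loci, genes, candidates) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Toy example (made up): 200 ordered genes, 6 candidate genes, 3 loci.
genes = [f"g{i}" for i in range(200)]
candidates = {"g10", "g11", "g12", "g50", "g51", "g120"}
loci = [(9, 5), (49, 4), (118, 4)]

observed, p_value = empirical_p(loci, genes, candidates)
print(observed, p_value)
```

    Matching the random windows to the real loci by gene count is the simplest null model; matching by physical length or gene density would be a stricter alternative.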