WorldWideScience

Sample records for mouse genome database

  1. Mouse genome database 2016.

    Science.gov (United States)

    Bult, Carol J; Eppig, Janan T; Blake, Judith A; Kadin, James A; Richardson, Joel E

    2016-01-01

    The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the primary community model organism database for the laboratory mouse and serves as the source for key biological reference data related to mouse genes, gene functions, phenotypes and disease models with a strong emphasis on the relationship of these data to human biology and disease. As the cost of genome-scale sequencing continues to decrease and new technologies for genome editing become widely adopted, the laboratory mouse is more important than ever as a model system for understanding the biological significance of human genetic variation and for advancing the basic research needed to support the emergence of genome-guided precision medicine. Recent enhancements to MGD include new graphical summaries of biological annotations for mouse genes, support for mobile access to the database, tools to support the annotation and analysis of sets of genes, and expanded support for comparative biology through the expansion of homology data.

  2. The Mouse Genome Database (MGD): from genes to mice--a community resource for mouse biology.

    Science.gov (United States)

    Eppig, Janan T; Bult, Carol J; Kadin, James A; Richardson, Joel E; Blake, Judith A; Anagnostopoulos, A; Baldarelli, R M; Baya, M; Beal, J S; Bello, S M; Boddy, W J; Bradt, D W; Burkart, D L; Butler, N E; Campbell, J; Cassell, M A; Corbani, L E; Cousins, S L; Dahmen, D J; Dene, H; Diehl, A D; Drabkin, H J; Frazer, K S; Frost, P; Glass, L H; Goldsmith, C W; Grant, P L; Lennon-Pierce, M; Lewis, J; Lu, I; Maltais, L J; McAndrews-Hill, M; McClellan, L; Miers, D B; Miller, L A; Ni, L; Ormsby, J E; Qi, D; Reddy, T B K; Reed, D J; Richards-Smith, B; Shaw, D R; Sinclair, R; Smith, C L; Szauter, P; Walker, M B; Walton, D O; Washburn, L L; Witham, I T; Zhu, Y

    2005-01-01

    The Mouse Genome Database (MGD) forms the core of the Mouse Genome Informatics (MGI) system (http://www.informatics.jax.org), a model organism database resource for the laboratory mouse. MGD provides essential integration of experimental knowledge for the mouse system with information annotated from both literature and online sources. MGD curates and presents consensus and experimental data representations of genotype (sequence) through phenotype information, including highly detailed reports about genes and gene products. Primary foci of integration are through representations of relationships among genes, sequences and phenotypes. MGD collaborates with other bioinformatics groups to curate a definitive set of information about the laboratory mouse and to build and implement the data and semantic standards that are essential for comparative genome analysis. Recent improvements in MGD discussed here include the enhancement of phenotype resources, the re-development of the International Mouse Strain Resource, IMSR, the update of mammalian orthology datasets and the electronic publication of classic books in mouse genetics.

  3. The Mouse Genome Database: integration of and access to knowledge about the laboratory mouse.

    Science.gov (United States)

    Blake, Judith A; Bult, Carol J; Eppig, Janan T; Kadin, James A; Richardson, Joel E

    2014-01-01

    The Mouse Genome Database (MGD) (http://www.informatics.jax.org) is the community model organism database resource for the laboratory mouse, a premier animal model for the study of genetic and genomic systems relevant to human biology and disease. MGD maintains a comprehensive catalog of genes, functional RNAs and other genome features as well as heritable phenotypes and quantitative trait loci. The genome feature catalog is generated by the integration of computational and manual genome annotations generated by NCBI, Ensembl and Vega/HAVANA. MGD curates and maintains the comprehensive listing of functional annotations for mouse genes using the Gene Ontology, and MGD curates and integrates comprehensive phenotype annotations including associations of mouse models with human diseases. Recent improvements include integration of the latest mouse genome build (GRCm38), improved access to comparative and functional annotations for mouse genes with expanded representation of comparative vertebrate genomes and new loads of phenotype data from high-throughput phenotyping projects. All MGD resources are freely available to the research community.

  4. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

    Science.gov (United States)

    Eppig, Janan T; Blake, Judith A; Bult, Carol J; Kadin, James A; Richardson, Joel E

    2015-01-01

    The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Genome databases

    Energy Technology Data Exchange (ETDEWEB)

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  6. The mouse genome database: genotypes, phenotypes, and models of human disease.

    Science.gov (United States)

    Bult, Carol J; Eppig, Janan T; Blake, Judith A; Kadin, James A; Richardson, Joel E

    2013-01-01

    The laboratory mouse is the premier animal model for studying human biology because all life stages can be accessed experimentally, a completely sequenced reference genome is publicly available and there exists a myriad of genomic tools for comparative and experimental research. In the current era of genome scale, data-driven biomedical research, the integration of genetic, genomic and biological data is essential for realizing the full potential of the mouse as an experimental model. The Mouse Genome Database (MGD; http://www.informatics.jax.org), the community model organism database for the laboratory mouse, is designed to facilitate the use of the laboratory mouse as a model system for understanding human biology and disease. To achieve this goal, MGD integrates genetic and genomic data related to the functional and phenotypic characterization of mouse genes and alleles and serves as a comprehensive catalog for mouse models of human disease. Recent enhancements to MGD include the addition of human ortholog details to mouse Gene Detail pages, the inclusion of microRNA knockouts in MGD's catalog of alleles and phenotypes, the addition of video clips to phenotype images, access to genotype and phenotype data associated with quantitative trait loci (QTL) and improvements to the layout and display of Gene Ontology annotations.

  7. Towards the integration of mouse databases - definition and implementation of solutions to two use-cases in mouse functional genomics.

    Science.gov (United States)

    Gruenberger, Michael; Alberts, Rudi; Smedley, Damian; Swertz, Morris; Schofield, Paul; Schughart, Klaus

    2010-01-22

    The integration of information present in many disparate biological databases represents a major challenge in biomedical research. To define the problems and needs, and to explore strategies for database integration in mouse functional genomics, we consulted the biologist user community and implemented solutions to two user-defined use-cases. We organised workshops, meetings and used a questionnaire to identify the needs of biologist database users in mouse functional genomics. As a result, two use-cases were developed that can be used to drive future designs or extensions of mouse databases. Here, we present the use-cases and describe some initial computational solutions for them. The application for the gene-centric use-case, "MUSIG-Gen" starts from a list of gene names and collects a wide range of data types from several distributed databases in a "shopping cart"-like manner. The iterative user-driven approach is a response to strongly articulated requests from users, especially those without computational biology backgrounds. The application for the phenotype-centric use-case, "MUSIG-Phen", is based on a similar concept and starting from phenotype descriptions retrieves information for associated genes. The use-cases created, and their prototype software implementations should help to better define biologists' needs for database integration and may serve as a starting point for future bioinformatics solutions aimed at end-user biologists.

  8. Towards the integration of mouse databases - definition and implementation of solutions to two use-cases in mouse functional genomics

    Directory of Open Access Journals (Sweden)

    Schofield Paul

    2010-01-01

    Background: The integration of information present in many disparate biological databases represents a major challenge in biomedical research. To define the problems and needs, and to explore strategies for database integration in mouse functional genomics, we consulted the biologist user community and implemented solutions to two user-defined use-cases. Results: We organised workshops, meetings and used a questionnaire to identify the needs of biologist database users in mouse functional genomics. As a result, two use-cases were developed that can be used to drive future designs or extensions of mouse databases. Here, we present the use-cases and describe some initial computational solutions for them. The application for the gene-centric use-case, "MUSIG-Gen" starts from a list of gene names and collects a wide range of data types from several distributed databases in a "shopping cart"-like manner. The iterative user-driven approach is a response to strongly articulated requests from users, especially those without computational biology backgrounds. The application for the phenotype-centric use-case, "MUSIG-Phen", is based on a similar concept and starting from phenotype descriptions retrieves information for associated genes. Conclusion: The use-cases created, and their prototype software implementations should help to better define biologists' needs for database integration and may serve as a starting point for future bioinformatics solutions aimed at end-user biologists.

  9. Rat Genome Database: a unique resource for rat, human, and mouse quantitative trait locus data.

    Science.gov (United States)

    Nigam, Rajni; Laulederkind, Stanley J F; Hayman, G Thomas; Smith, Jennifer R; Wang, Shur-Jen; Lowry, Timothy F; Petri, Victoria; De Pons, Jeff; Tutaj, Marek; Liu, Weisong; Jayaraman, Pushkala; Munzenmaier, Diane H; Worthey, Elizabeth A; Dwinell, Melinda R; Shimoyama, Mary; Jacob, Howard J

    2013-09-16

    The rat has been widely used as a disease model in a laboratory setting, resulting in an abundance of genetic and phenotype data from a wide variety of studies. These data can be found at the Rat Genome Database (RGD, http://rgd.mcw.edu/), which provides a platform for researchers interested in linking genomic variations to phenotypes. Quantitative trait loci (QTLs) form one of the earliest and core datasets, allowing researchers to identify loci harboring genes associated with disease. These QTLs are not only important for those using the rat to identify genes and regions associated with disease, but also for cross-organism analyses of syntenic regions on the mouse and the human genomes to identify potential regions for study in these organisms. Currently, RGD has data on >1,900 rat QTLs that include details about the methods and animals used to determine the respective QTL along with the genomic positions and markers that define the region. RGD also curates human QTLs (>1,900) and houses >4,000 mouse QTLs (imported from Mouse Genome Informatics). Multiple ontologies are used to standardize traits, phenotypes, diseases, and experimental methods to facilitate queries, analyses, and cross-organism comparisons. QTLs are visualized in tools such as GBrowse and GViewer, with additional tools for analysis of gene sets within QTL regions. The QTL data at RGD provide valuable information for the study of mapped phenotypes and identification of candidate genes for disease associations.

  10. The FunGenES database: a genomics resource for mouse embryonic stem cell differentiation.

    Science.gov (United States)

    Schulz, Herbert; Kolde, Raivo; Adler, Priit; Aksoy, Irène; Anastassiadis, Konstantinos; Bader, Michael; Billon, Nathalie; Boeuf, Hélène; Bourillot, Pierre-Yves; Buchholz, Frank; Dani, Christian; Doss, Michael Xavier; Forrester, Lesley; Gitton, Murielle; Henrique, Domingos; Hescheler, Jürgen; Himmelbauer, Heinz; Hübner, Norbert; Karantzali, Efthimia; Kretsovali, Androniki; Lubitz, Sandra; Pradier, Laurent; Rai, Meena; Reimand, Jüri; Rolletschek, Alexandra; Sachinidis, Agapios; Savatier, Pierre; Stewart, Francis; Storm, Mike P; Trouillas, Marina; Vilo, Jaak; Welham, Melanie J; Winkler, Johannes; Wobus, Anna M; Hatzopoulos, Antonis K

    2009-09-03

    Embryonic stem (ES) cells have high self-renewal capacity and the potential to differentiate into a large variety of cell types. To investigate gene networks operating in pluripotent ES cells and their derivatives, the "Functional Genomics in Embryonic Stem Cells" consortium (FunGenES) has analyzed the transcriptome of mouse ES cells in eleven diverse settings representing sixty-seven experimental conditions. To better illustrate gene expression profiles in mouse ES cells, we have organized the results in an interactive database with a number of features and tools. Specifically, we have generated clusters of transcripts that behave the same way under the entire spectrum of the sixty-seven experimental conditions; we have assembled genes in groups according to their time of expression during successive days of ES cell differentiation; we have included expression profiles of specific gene classes such as transcription regulatory factors and Expressed Sequence Tags; transcripts have been arranged in "Expression Waves" and juxtaposed to genes with opposite or complementary expression patterns; we have designed search engines to display the expression profile of any transcript during ES cell differentiation; gene expression data have been organized in animated graphs of KEGG signaling and metabolic pathways; and finally, we have incorporated advanced functional annotations for individual genes or gene clusters of interest and links to microarray and genomic resources. The FunGenES database provides a comprehensive resource for studies into the biology of ES cells.

  11. The FunGenES database: a genomics resource for mouse embryonic stem cell differentiation.

    Directory of Open Access Journals (Sweden)

    Herbert Schulz

    Embryonic stem (ES) cells have high self-renewal capacity and the potential to differentiate into a large variety of cell types. To investigate gene networks operating in pluripotent ES cells and their derivatives, the "Functional Genomics in Embryonic Stem Cells" consortium (FunGenES) has analyzed the transcriptome of mouse ES cells in eleven diverse settings representing sixty-seven experimental conditions. To better illustrate gene expression profiles in mouse ES cells, we have organized the results in an interactive database with a number of features and tools. Specifically, we have generated clusters of transcripts that behave the same way under the entire spectrum of the sixty-seven experimental conditions; we have assembled genes in groups according to their time of expression during successive days of ES cell differentiation; we have included expression profiles of specific gene classes such as transcription regulatory factors and Expressed Sequence Tags; transcripts have been arranged in "Expression Waves" and juxtaposed to genes with opposite or complementary expression patterns; we have designed search engines to display the expression profile of any transcript during ES cell differentiation; gene expression data have been organized in animated graphs of KEGG signaling and metabolic pathways; and finally, we have incorporated advanced functional annotations for individual genes or gene clusters of interest and links to microarray and genomic resources. The FunGenES database provides a comprehensive resource for studies into the biology of ES cells.

  12. Mouse Genome Informatics (MGI)

    Data.gov (United States)

    U.S. Department of Health & Human Services — MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human...

  13. Mouse Phenome Database (MPD)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Mouse Phenome Database (MPD) has characterizations of hundreds of strains of laboratory mice to facilitate translational discoveries and to assist in selection...

  14. RIKEN mouse genome encyclopedia.

    Science.gov (United States)

    Hayashizaki, Yoshihide

    2003-01-01

    We have been working to establish a comprehensive mouse full-length cDNA collection and sequence database covering as many genes as possible, named the RIKEN mouse genome encyclopedia. Recently we have been constructing higher-level annotation (Functional ANnoTation Of Mouse cDNA; FANTOM), based not only on homology searches but also on expression profiles, mapping information and protein-protein databases. More than 1,000,000 clones prepared from 163 tissues were end-sequenced and classified into 159,789 clusters, and 60,770 representative clones were fully sequenced. In conclusion, the 60,770 sequences contained 33,409 unique sequences. The next generation of life science is clearly based on all of the genome information and resources. Based on our cDNA clones, we developed additional systems to explore gene function: a cDNA microarray system to print all of these cDNA clones, a protein-protein interaction screening system, a protein-DNA interaction screening system and so on. The integrated database of all this information is very useful not only for analysis of gene transcriptional networks but also for connecting genes to phenotypes to facilitate the positional candidate approach. In this talk, the prospects for applying these genome resources will be discussed. More information is available at the web page: http://genome.gsc.riken.go.jp/.

  15. Plant Genome Duplication Database.

    Science.gov (United States)

    Lee, Tae-Ho; Kim, Junah; Robertson, Jon S; Paterson, Andrew H

    2017-01-01

    Genome duplication, widespread in flowering plants, is a driving force in evolution. Genome alignments between/within genomes facilitate identification of homologous regions and individual genes to investigate evolutionary consequences of genome duplication. PGDD (the Plant Genome Duplication Database), a public web service database, provides intra- or interplant genome alignment information. At present, PGDD contains information for 47 plants whose genome sequences have been released. Here, we describe methods for identification and estimation of dates of genome duplication and speciation by functions of PGDD. The database is freely available at http://chibba.agtec.uga.edu/duplication/.

  16. MouseCyc: a curated biochemical pathways database for the laboratory mouse

    OpenAIRE

    Evsikov, Alexei V.; Dolan, Mary E.; Genrich, Michael P; Patek, Emily; Bult, Carol J.

    2009-01-01

    Linking biochemical genetic data to the reference genome for the laboratory mouse is important for comparative physiology and for developing mouse models of human biology and disease. We describe here a new database of curated metabolic pathways for the laboratory mouse called MouseCyc. MouseCyc has been integrated with genetic and genomic data for the laboratory mouse available from the Mouse Genome Informatics database and with pathway data from other organisms, including human.

  17. dbSUPER: a database of super-enhancers in mouse and human genome.

    Science.gov (United States)

    Khan, Aziz; Zhang, Xuegong

    2016-01-04

    Super-enhancers are clusters of transcriptional enhancers that drive cell-type-specific gene expression and are crucial to cell identity. Many disease-associated sequence variations are enriched in super-enhancer regions of disease-relevant cell types. Thus, super-enhancers can be used as potential biomarkers for disease diagnosis and therapeutics. Current studies have identified super-enhancers in more than 100 cell types and demonstrated their functional importance. However, a centralized resource to integrate all these findings is not currently available. We developed dbSUPER (http://bioinfo.au.tsinghua.edu.cn/dbsuper/), the first integrated and interactive database of super-enhancers, with the primary goal of providing a resource for assistance in further studies related to transcriptional control of cell identity and disease. dbSUPER provides a responsive and user-friendly web interface to facilitate efficient and comprehensive search and browsing. The data can be easily sent to Galaxy instances, GREAT and Cistrome web-servers for downstream analysis, and can also be visualized in the UCSC genome browser where custom tracks can be added automatically. The data can be downloaded and exported in variety of formats. Furthermore, dbSUPER lists genes associated with super-enhancers and also links to external databases such as GeneCards, UniProt and Entrez. dbSUPER also provides an overlap analysis tool to annotate user-defined regions. We believe dbSUPER is a valuable resource for the biology and genetic research communities.

  18. Rat Genome Database (RGD)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Rat Genome Database (RGD) is a collaborative effort between leading research institutions involved in rat genetic and genomic research to collect, consolidate,...

  19. Querying genomic databases

    Energy Technology Data Exchange (ETDEWEB)

    Baehr, A.; Hagstrom, R.; Joerg, D.; Overbeek, R.

    1991-09-01

    A natural-language interface has been developed that retrieves genomic information by using a simple subset of English. The interface spares the biologist from the task of learning database-specific query languages and computer programming. Currently, the interface deals with the E. coli genome. It can, however, be readily extended and shows promise as a means of easy access to other sequenced genomic databases as well.

  20. Genomic Database Searching.

    Science.gov (United States)

    Hutchins, James R A

    2017-01-01

    The availability of reference genome sequences for virtually all species under active research has revolutionized biology. Analyses of genomic variations in many organisms have provided insights into phenotypic traits, evolution and disease, and are transforming medicine. All genomic data from publicly funded projects are freely available in Internet-based databases, for download or searching via genome browsers such as Ensembl, Vega, NCBI's Map Viewer, and the UCSC Genome Browser. These online tools generate interactive graphical outputs of relevant chromosomal regions, showing genes, transcripts, and other genomic landmarks, and epigenetic features mapped by projects such as ENCODE. This chapter provides a broad overview of the major genomic databases and browsers, and describes various approaches and the latest resources for searching them. Methods are provided for identifying genomic locus and sequence information using gene names or codes, identifiers for DNA and RNA molecules and proteins; also from karyotype bands, chromosomal coordinates, sequences, motifs, and matrix-based patterns. Approaches are also described for batch retrieval of genomic information, performing more complex queries, and analyzing larger sets of experimental data, for example from next-generation sequencing projects.

  1. 10. international mouse genome conference

    Energy Technology Data Exchange (ETDEWEB)

    Meisler, M.H.

    1996-12-31

    Ten years after hosting the First International Mammalian Genome Conference in Paris in 1986, Dr. Jean-Louis Guenet presided over the Tenth Conference at the Pasteur Institute, October 7--10, 1996. The 1986 conference was a satellite to the Human Gene Mapping Workshop and had approximately 50 attendees. The 1996 meeting was attended by 300 scientists from around the world. In the interim, the number of mapped loci in the mouse increased from 1,000 to over 20,000. This report contains a listing of the program and its participants, and two articles that review the meeting and the role of the laboratory mouse in the Human Genome project. More than 200 papers were presented at the conference covering the following topics: International mouse chromosome committee meetings; Mutant generation and identification; Physical and genetic maps; New technology and resources; Chromatin structure and gene regulation; Rat and hamster genetic maps; Informatics and databases; and Quantitative trait analysis.

  2. Genomic Databases for Crop Improvement

    Directory of Open Access Journals (Sweden)

    David Edwards

    2012-03-01

    Genomics is playing an increasing role in plant breeding and this is accelerating with the rapid advances in genome technology. Translating the vast abundance of data being produced by genome technologies requires the development of custom bioinformatics tools and advanced databases. These range from large generic databases which hold specific data types for a broad range of species, to carefully integrated and curated databases which act as a resource for the improvement of specific crops. In this review, we outline some of the features of plant genome databases, identify specific resources for the improvement of individual crops and comment on the potential future direction of crop genome databases.

  3. The UCSC Genome Browser Database

    DEFF Research Database (Denmark)

    Karolchik, D; Kuhn, R M; Baertsch, R

    2008-01-01

    The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrat...

  4. The Riken mouse genome encyclopedia project.

    Science.gov (United States)

    Hayashizaki, Yoshihide

    2003-01-01

    The Riken mouse genome encyclopedia is a comprehensive full-length cDNA collection and sequence database. High-level functional annotation is based on sequence homology search, expression profiling, mapping and protein-protein interactions. More than 1000000 clones prepared from 163 tissues were end-sequenced and classified into 128000 clusters, and 60000 representative clones were fully sequenced, representing 24000 clear protein-encoding genes. The application of the mouse genome database for positional cloning and gene network regulation analysis is reported.

  5. The UCSC genome browser database

    DEFF Research Database (Denmark)

    Kuhn, R M; Karolchik, D; Zweig, A S

    2007-01-01

    The University of California, Santa Cruz Genome Browser Database contains, as of September 2006, sequence and annotation data for the genomes of 13 vertebrate and 19 invertebrate species. The Genome Browser displays a wide variety of annotations at all scales from the single nucleotide level up t...

  6. The UCSC Genome Browser Database

    DEFF Research Database (Denmark)

    Hinrichs, A S; Karolchik, D; Baertsch, R

    2006-01-01

    The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, ...

  7. The Mouse SAGE Site: database of public mouse SAGE libraries.

    Science.gov (United States)

    Divina, Petr; Forejt, Jirí

    2004-01-01

    The Mouse SAGE Site is a web-based database of all available public libraries generated by the Serial Analysis of Gene Expression (SAGE) from various mouse tissues and cell lines. The database contains mouse SAGE libraries organized in a uniform way and provides web-based tools for browsing, comparing and searching SAGE data with reliable tag-to-gene identification. A modified approach based on the SAGEmap database is used for reliable tag identification. The Mouse SAGE Site is maintained on an ongoing basis at the Institute of Molecular Genetics, Academy of Sciences of the Czech Republic and is accessible at the internet address http://mouse.biomed.cas.cz/sage/.

  8. The YH database: the first Asian diploid genome database

    DEFF Research Database (Denmark)

    Li, Guoqing; Ma, Lijia; Song, Chao

    2009-01-01

    The YH database is a server that allows the user to easily browse and download data from the first Asian diploid genome. The aim of this platform is to facilitate the study of this Asian genome and to enable improved organization and presentation of large-scale personal genome data. Powered by GBrowse, we illustrate here the genome sequences, SNPs, and sequencing reads in the MapView. The relationships between phenotype and genotype can be searched by location, dbSNP ID, HGMD ID, gene symbol and disease name. A BLAST web service is also provided for the purpose of aligning query sequence against YH genome consensus. The YH database is currently one of the three personal genome databases, organizing the original data and analysis results in a user-friendly interface, which is an endeavor to achieve fundamental goals for establishing personal medicine. The database is available at http://yh.genomics.org.cn.

  9. The YH database: the first Asian diploid genome database.

    Science.gov (United States)

    Li, Guoqing; Ma, Lijia; Song, Chao; Yang, Zhentao; Wang, Xiulan; Huang, Hui; Li, Yingrui; Li, Ruiqiang; Zhang, Xiuqing; Yang, Huanming; Wang, Jian; Wang, Jun

    2009-01-01

    The YH database is a server that allows the user to easily browse and download data from the first Asian diploid genome. The aim of this platform is to facilitate the study of this Asian genome and to enable improved organization and presentation of large-scale personal genome data. Powered by GBrowse, we illustrate here the genome sequences, SNPs, and sequencing reads in the MapView. The relationships between phenotype and genotype can be searched by location, dbSNP ID, HGMD ID, gene symbol and disease name. A BLAST web service is also provided for aligning query sequences against the YH genome consensus. The YH database is currently one of three personal genome databases; it organizes the original data and analysis results in a user-friendly interface, an effort toward the fundamental goals of personalized medicine. The database is available at http://yh.genomics.org.cn.

  10. Searching and Indexing Genomic Databases via Kernelization

    Directory of Open Access Journals (Sweden)

    Travis Gagie

    2015-02-01

    The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper we survey the twenty-year history of this idea and discuss its relation to kernelization in parameterized complexity.
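
    The idea of searching one reference genome plus only the parts of the other genomes where they differ can be illustrated with a toy sketch. It is not the method surveyed in the paper: it assumes every genome differs from the reference by single-base substitutions only (so coordinates stay aligned), and the names `find_all`, `search_collection` and the `variants` layout are hypothetical.

```python
def find_all(text, pattern):
    """Start positions of every (possibly overlapping) occurrence of pattern in text."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def search_collection(reference, variants, pattern):
    """Search many genomes while scanning the full reference only once.

    variants maps a genome name to {position: base}, i.e. substitutions
    relative to the reference (an assumed, simplified representation).
    Returns {genome name: sorted start positions of pattern}.
    """
    m = len(pattern)
    ref_hits = find_all(reference, pattern)  # the only full-length scan
    results = {}
    for name, subs in variants.items():
        # A reference hit carries over when no substitution falls inside it.
        hits = {h for h in ref_hits if not any(h <= p < h + m for p in subs)}
        # Re-scan only the short windows that a substitution can influence.
        for p in subs:
            lo, hi = max(0, p - m + 1), min(len(reference), p + m)
            window = "".join(subs.get(i, reference[i]) for i in range(lo, hi))
            hits.update(lo + h for h in find_all(window, pattern))
        results[name] = sorted(hits)
    return results


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    variants = {"g1": {4: "T"}, "g2": {0: "T", 9: "G"}}
    print(search_collection(reference, variants, "ACG"))
```

    Each additional genome costs work proportional to its number of differences times the pattern length, rather than to the genome length, which is the saving that motivates this family of approaches.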

  11. Standards for Clinical Grade Genomic Databases.

    Science.gov (United States)

    Yohe, Sophia L; Carter, Alexis B; Pfeifer, John D; Crawford, James M; Cushman-Vokoun, Allison; Caughron, Samuel; Leonard, Debra G B

    2015-11-01

    Next-generation sequencing performed in a clinical environment must meet clinical standards, which requires reproducibility of all aspects of the testing. Clinical-grade genomic databases (CGGDs) are required to classify a variant and to assist in the professional interpretation of clinical next-generation sequencing. Applying quality laboratory standards to the reference databases used for sequence-variant interpretation presents a new challenge for validation and curation. The objective was to define CGGDs and the categories of information they contain, and to frame recommendations for the structure and use of these databases in clinical patient care. Members of the College of American Pathologists Personalized Health Care Committee reviewed the literature and existing state of genomic databases and developed a framework for guiding CGGD development in the future. Clinical-grade genomic databases may provide different types of information. This work group defined 3 layers of information in CGGDs: clinical genomic variant repositories, genomic medical data repositories, and genomic medicine evidence databases. The layers are differentiated by the types of genomic and medical information contained and the utility in assisting with clinical interpretation of genomic variants. Clinical-grade genomic databases must meet specific standards regarding submission, curation, and retrieval of data, as well as the maintenance of privacy and security. These organizing principles for CGGDs should serve as a foundation for future development of specific standards that support the use of such databases for patient care.

  12. mds_ies_db: a database of ciliate genome rearrangements.

    Science.gov (United States)

    Burns, Jonathan; Kukushkin, Denys; Lindblad, Kelsi; Chen, Xiao; Jonoska, Nataša; Landweber, Laura F

    2016-01-01

    Ciliated protists exhibit nuclear dimorphism through the presence of somatic macronuclei (MAC) and germline micronuclei (MIC). In some ciliates, DNA from precursor segments in the MIC genome rearranges to form transcriptionally active genes in the mature MAC genome, making these ciliates model organisms to study the process of somatic genome rearrangement. Similar broad scale, somatic rearrangement events occur in many eukaryotic cells and tumors. The mds_ies_db (http://oxytricha.princeton.edu/mds_ies_db) is a database of genome recombination and rearrangement annotations, and it provides tools for visualization and comparative analysis of precursor and product genomes. The database currently contains annotations for two completely sequenced ciliate genomes: Oxytricha trifallax and Tetrahymena thermophila.

  13. A physical map of the mouse genome

    NARCIS (Netherlands)

    Gregory, SG; Sekhon, M; Schein, J; Zhao, SY; Osoegawa, K; Scott, CE; Evans, RS; Burridge, PW; Cox, TV; Fox, CA; Hutton, RD; Mullenger, IR; Phillips, KJ; Smith, J; Stalker, J; Threadgold, GJ; Birney, E; Wylie, K; Chinwalla, A; Wallis, J; Hillier, L; Carter, J; Gaige, T; Jaeger, S; Kremitzki, C; Layman, D; McGrane, R; Mead, K; Walker, R; Jones, S; Smith, M; Asano, J; Bosdet, I; Chan, S; Chittaranjan, S; Chiu, R; Fjell, C; Fuhrmann, D; Girn, N; Gray, C; Guin, R; Hsiao, L; Krzywinski, M; Kutsche, R; Lee, SS; Mathewson, C; McLeavy, C; Messervier, S; Ness, S; Pandoh, P; Prabhu, AL; Saeedi, P; Smailus, D; Spence, L; Stott, J; Taylor, S; Terpstra, W; Tsai, M; Vardy, J; Wye, N; Yang, G; Shatsman, S; Ayodeji, B; Geer, K; Tsegaye, G; Shvartsbeyn, A; Gebregeorgis, E; Krol, M; Russell, D; Overton, L; Malek, JA; Holmes, M; Heaney, M; Shetty, J; Feldblyum, T; Nierman, WC; Catanese, JJ; Hubbard, T; Waterston, RH; Rogers, J; de Jong, PJ; Fraser, CM; Marra, M; McPherson, JD; Bentley, DR

    2002-01-01

    A physical map of a genome is an essential guide for navigation, allowing the location of any gene or other landmark in the chromosomal DNA. We have constructed a physical map of the mouse genome that contains 296 contigs of overlapping bacterial clones and 16,992 unique markers. The mouse contigs w

  15. The mouse Gene Expression Database (GXD): 2017 update

    Science.gov (United States)

    Finger, Jacqueline H.; Smith, Constance M.; Hayamizu, Terry F.; McCright, Ingeborg J.; Xu, Jingxia; Law, Meiyee; Shaw, David R.; Baldarelli, Richard M.; Beal, Jon S.; Blodgett, Olin; Campbell, Jeff W.; Corbani, Lori E.; Lewis, Jill R.; Forthofer, Kim L.; Frost, Pete J.; Giannatto, Sharon C.; Hutchins, Lucie N.; Miers, Dave B.; Motenko, Howie; Stone, Kevin R.; Eppig, Janan T.; Kadin, James A.; Richardson, Joel E.; Ringwald, Martin

    2017-01-01

    The Gene Expression Database (GXD; www.informatics.jax.org/expression.shtml) is an extensive and well-curated community resource of mouse developmental expression information. Through curation of the scientific literature and by collaborations with large-scale expression projects, GXD collects and integrates data from RNA in situ hybridization, immunohistochemistry, RT-PCR, northern blot and western blot experiments. Expression data from both wild-type and mutant mice are included. The expression data are combined with genetic and phenotypic data in Mouse Genome Informatics (MGI) and made readily accessible to many types of database searches. At present, GXD includes over 1.5 million expression results and more than 300 000 images, all annotated with detailed and standardized metadata. Since our last report in 2014, we have added a large amount of data, we have enhanced data and database infrastructure, and we have implemented many new search and display features. Interface enhancements include: a new Mouse Developmental Anatomy Browser; interactive tissue-by-developmental stage and tissue-by-gene matrix views; capabilities to filter and sort expression data summaries; a batch search utility; gene-based expression overviews; and links to expression data from other species. PMID:27899677

  16. Plant cytogenetics in genome databases

    Science.gov (United States)

    Cytogenetic maps provide an integrated representation of genetic and cytological information that can be used to enhance genome and chromosome research. As genome analysis technologies become more affordable, the density of markers on cytogenetic maps increases, making these resources more useful a...

  17. Mouse Tumor Biology (MTB): a database of mouse models for human cancer.

    Science.gov (United States)

    Bult, Carol J; Krupke, Debra M; Begley, Dale A; Richardson, Joel E; Neuhauser, Steven B; Sundberg, John P; Eppig, Janan T

    2015-01-01

    The Mouse Tumor Biology (MTB; http://tumor.informatics.jax.org) database is a unique online compendium of mouse models for human cancer. MTB provides online access to expertly curated information on diverse mouse models for human cancer and interfaces for searching and visualizing data associated with these models. The information in MTB is designed to facilitate the selection of strains for cancer research and is a platform for mining data on tumor development and patterns of metastases. MTB curators acquire data through manual curation of peer-reviewed scientific literature and from direct submissions by researchers. Data in MTB are also obtained from other bioinformatics resources including PathBase, the Gene Expression Omnibus and ArrayExpress. Recent enhancements to MTB improve the association between mouse models and human genes commonly mutated in a variety of cancers as identified in large-scale cancer genomics studies, provide new interfaces for exploring regions of the mouse genome associated with cancer phenotypes and incorporate data and information related to Patient-Derived Xenograft models of human cancers.

  18. BGD: a database of bat genomes.

    Directory of Open Access Journals (Sweden)

    Jianfei Fang

    Bats account for ~20% of mammalian species and are the only mammals capable of true powered flight. Because of their specialized phenotypic traits, much research has been devoted to the evolution of bats. Whole genome sequences of several bats have been assembled and annotated, but a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established the Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations with an efficient search engine and browse tracks of bat genomes. An easy-to-use phylogenetic analysis tool is also provided to facilitate online phylogenetic study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and benefit the analysis of bat sequences. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.

  19. The UCSC Genome Browser database: 2016 update.

    Science.gov (United States)

    Speir, Matthew L; Zweig, Ann S; Rosenbloom, Kate R; Raney, Brian J; Paten, Benedict; Nejad, Parisa; Lee, Brian T; Learned, Katrina; Karolchik, Donna; Hinrichs, Angie S; Heitner, Steve; Harte, Rachel A; Haeussler, Maximilian; Guruvadoo, Luvina; Fujita, Pauline A; Eisenhart, Christopher; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Barber, Galt P; Haussler, David; Kuhn, Robert M; Kent, W James

    2016-01-01

    For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the "Data Integrator", for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment.

  20. Database Description - TMBETA-GENOME | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    TMBETA-GENOME Database Description General information of database Database name TMBETA-GENOME Alternative n...oinfo/Gromiha/ Database classification Protein sequence databases - Protein prope...: Eukaryota Taxonomy ID: 2759 Database description TMBETA-GENOME is a database for transmembrane β-barrel pr...lgorithms and statistical methods have been performed and the annotation results are accumulated in the database.... Features and manner of utilization of database Users can download lists of sequences predicted as β-bar

  1. A unified gene catalog for the laboratory mouse reference genome.

    Science.gov (United States)

    Zhu, Y; Richardson, J E; Hale, P; Baldarelli, R M; Reed, D J; Recla, J M; Sinclair, R; Reddy, T B K; Bult, C J

    2015-08-01

    We report here a semi-automated process by which mouse genome feature predictions and curated annotations (i.e., genes, pseudogenes, functional RNAs, etc.) from Ensembl, NCBI and Vertebrate Genome Annotation database (Vega) are reconciled with the genome features in the Mouse Genome Informatics (MGI) database (http://www.informatics.jax.org) into a comprehensive and non-redundant catalog. Our gene unification method employs an algorithm (fjoin--feature join) for efficient detection of genome coordinate overlaps among features represented in two annotation data sets. Following the analysis with fjoin, genome features are binned into six possible categories (1:1, 1:0, 0:1, 1:n, n:1, n:m) based on coordinate overlaps. These categories are subsequently prioritized for assessment of annotation equivalencies and differences. The version of the unified catalog reported here contains more than 59,000 entries, including 22,599 protein-coding genes, 12,455 pseudogenes, and 24,007 other feature types (e.g., microRNAs, lincRNAs, etc.). More than 23,000 of the entries in the MGI gene catalog have equivalent gene models in the annotation files obtained from NCBI, Vega, and Ensembl. 12,719 of the features are unique to NCBI relative to Ensembl/Vega; 11,957 are unique to Ensembl/Vega relative to NCBI, and 3095 are unique to MGI. More than 4000 genome features fall into categories that require manual inspection to resolve structural differences in the gene models from different annotation sources. Using the MGI unified gene catalog, researchers can easily generate a comprehensive report of mouse genome features from a single source and compare the details of gene and transcript structure using MGI's mouse genome browser.
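
    The binning of features into the six overlap categories can be sketched roughly as follows. This is not the fjoin algorithm itself (which is described as efficient, whereas this sketch compares every pair of features). The tuple layout `(id, chrom, start, end)`, the closed-coordinate convention, and the reading of each category as the number of features from each set in one connected group of overlaps are all assumptions made for illustration.

```python
from collections import defaultdict


def overlaps(a, b):
    """True when two (id, chrom, start, end) features share a chromosome and at least one base."""
    return a[1] == b[1] and a[2] <= b[3] and b[2] <= a[3]


def bin_features(set_a, set_b):
    """Group overlapping features from two annotation sets and label each group.

    Returns {category: sorted list of (ids from set A, ids from set B)}.
    """
    # Union-find over all features, keyed by (source, id).
    parent = {("A", f[0]): ("A", f[0]) for f in set_a}
    parent.update({("B", f[0]): ("B", f[0]) for f in set_b})

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Naive all-pairs comparison; merge the groups of any overlapping pair.
    for a in set_a:
        for b in set_b:
            if overlaps(a, b):
                parent[find(("A", a[0]))] = find(("B", b[0]))

    groups = defaultdict(lambda: {"A": [], "B": []})
    for source, ident in parent:
        groups[find((source, ident))][source].append(ident)

    def label(n_a, n_b):
        left = "n" if n_a > 1 else str(n_a)
        right = ("m" if n_a > 1 else "n") if n_b > 1 else str(n_b)
        return f"{left}:{right}"

    binned = defaultdict(list)
    for members in groups.values():
        ids_a, ids_b = sorted(members["A"]), sorted(members["B"])
        binned[label(len(ids_a), len(ids_b))].append((ids_a, ids_b))
    return {category: sorted(items) for category, items in binned.items()}


if __name__ == "__main__":
    set_a = [("a1", "chr1", 100, 200), ("a2", "chr1", 300, 400), ("a3", "chr1", 500, 600)]
    set_b = [("b1", "chr1", 150, 250), ("b2", "chr1", 350, 360), ("b3", "chr1", 390, 450)]
    print(bin_features(set_a, set_b))
```

    In a workflow like the one described, the 1:1 groups would be candidates for automatic equivalence, while 1:n, n:1 and n:m groups are the kind that would be queued for closer inspection.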

  2. Genomic organization and sequence analysis of the vomeronasal receptor V2R genes in mouse genome

    Institute of Scientific and Technical Information of China (English)

    YANG Hui; Zhang YaPing

    2007-01-01

    Two multigene superfamilies, named V1R and V2R, encoding seven-transmembrane-domain G-protein coupled receptors (GPCRs) have been identified as pheromone receptors in mammals. Three V2R gene families have been described in mouse and rat. Here we screened the updated mouse genome sequence database and retrieved 63 putative functional V2R genes, including three newly identified genes that form an additional family. We describe the genomic organization of these genes and characterize the conservation of mouse V2R protein sequences. This genomic and sequence information is useful as evidence for inferring the functional domains of V2Rs and should aid future functional studies.

  3. Gramene database: Navigating plant comparative genomics resources

    Directory of Open Access Journals (Sweden)

    Parul Gupta

    2016-11-01

    Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, systems biology, and metabolic engineering. It exploits phylogenetic relationships to enrich the annotation of genomic data and provides tools to perform powerful comparative analyses across a wide spectrum of plant species. It consists of an integrated portal for querying, visualizing and analyzing data for 44 plant reference genomes, genetic variation data sets for 12 species, expression data for 16 species, curated rice pathways and orthology-based pathway projections for 66 plant species including various crops. Here we briefly describe the functions and uses of the Gramene database.

  4. Benchmarking database performance for genomic data.

    Science.gov (United States)

    Khushi, Matloob

    2015-06-01

    Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Genomic operations such as identifying overlapping or non-overlapping regions or nearest gene annotations are common research needs. The data can be saved in a database system for easy management; however, there is at present no comprehensive built-in database algorithm to identify overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although the general searching capability of both databases was almost equivalent. In addition, using the algorithm, pair-wise overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were reported, and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1).
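
    A plain SQL overlap query of the kind such a benchmark would exercise can be sketched as below. This is not the RegMap algorithm, whose details the abstract does not give; it is the textbook interval-overlap predicate expressed as a self-join. SQLite is used only because it ships with Python (the abstract benchmarks PostgreSQL and MySQL), and the table name, column names and dataset labels are hypothetical.

```python
import sqlite3

# Two closed intervals on the same chromosome overlap exactly when
# each one starts no later than the other one ends.
OVERLAP_SQL = """
SELECT a.chrom, a.start_pos, a.end_pos, b.start_pos, b.end_pos
FROM regions AS a
JOIN regions AS b
  ON a.chrom = b.chrom
 AND a.start_pos <= b.end_pos
 AND b.start_pos <= a.end_pos
WHERE a.dataset = ? AND b.dataset = ?
ORDER BY a.chrom, a.start_pos, b.start_pos
"""


def build_db(rows):
    """Load (dataset, chrom, start_pos, end_pos) rows into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE regions (dataset TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER)"
    )
    conn.executemany("INSERT INTO regions VALUES (?, ?, ?, ?)", rows)
    # An index on the join columns is the usual first step before benchmarking.
    conn.execute("CREATE INDEX idx_regions ON regions (dataset, chrom, start_pos)")
    return conn


def overlapping_pairs(conn, dataset_a, dataset_b):
    """Return every pair of overlapping regions between two datasets."""
    return conn.execute(OVERLAP_SQL, (dataset_a, dataset_b)).fetchall()


if __name__ == "__main__":
    rows = [
        ("tf_sites", "chr1", 100, 200),
        ("tf_sites", "chr2", 100, 200),
        ("histone_marks", "chr1", 150, 300),
        ("histone_marks", "chr2", 201, 300),
    ]
    print(overlapping_pairs(build_db(rows), "tf_sites", "histone_marks"))
```

    The same predicate runs unchanged on PostgreSQL and MySQL, so a query like this is a reasonable baseline when comparing engines or a more specialized algorithm.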

  5. ECRbase: Database of Evolutionary Conserved Regions, Promoters, and Transcription Factor Binding Sites in Vertebrate Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Loots, G; Ovcharenko, I

    2006-08-08

    Evolutionary conservation of DNA sequences provides a tool for the identification of functional elements in genomes. We have created a database of evolutionary conserved regions (ECRs) in vertebrate genomes entitled ECRbase that is constructed from a collection of pairwise vertebrate genome alignments produced by the ECR Browser database. ECRbase features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates and a collection of promoters in all vertebrate genomes presented in the database. The database also contains a collection of annotated transcription factor binding sites (TFBS) in all ECRs and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish, and two pufferfish genomes. It is freely accessible at http://ECRbase.dcode.org.

  6. Insights from Human/Mouse genome comparisons

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.

    2003-03-30

    Large-scale public genomic sequencing efforts have provided a wealth of vertebrate sequence data poised to provide insights into mammalian biology. These include deep genomic sequence coverage of human, mouse, rat, zebrafish, and two pufferfish (Fugu rubripes and Tetraodon nigroviridis) (Aparicio et al. 2002; Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002). In addition, a high-priority has been placed on determining the genomic sequence of chimpanzee, dog, cow, frog, and chicken (Boguski 2002). While only recently available, whole genome sequence data have provided the unique opportunity to globally compare complete genome contents. Furthermore, the shared evolutionary ancestry of vertebrate species has allowed the development of comparative genomic approaches to identify ancient conserved sequences with functionality. Accordingly, this review focuses on the initial comparison of available mammalian genomes and describes various insights derived from such analysis.

  7. Bovine Genome Database: new tools for gleaning function from the Bos taurus genome.

    Science.gov (United States)

    Elsik, Christine G; Unni, Deepak R; Diesh, Colin M; Tayal, Aditi; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

    2016-01-01

    We report an update of the Bovine Genome Database (BGD) (http://BovineGenome.org). The goal of BGD is to support bovine genomics research by providing genome annotation and data mining tools. We have developed new genome and annotation browsers using JBrowse and WebApollo for two Bos taurus genome assemblies, the reference genome assembly (UMD3.1.1) and the alternate genome assembly (Btau_4.6.1). Annotation tools have been customized to highlight priority genes for annotation, and to aid annotators in selecting gene evidence tracks from 91 tissue specific RNAseq datasets. We have also developed BovineMine, based on the InterMine data warehousing system, to integrate the bovine genome, annotation, QTL, SNP and expression data with external sources of orthology, gene ontology, gene interaction and pathway information. BovineMine provides powerful query building tools, as well as customized query templates, and allows users to analyze and download genome-wide datasets. With BovineMine, bovine researchers can use orthology to leverage the curated gene pathways of model organisms, such as human, mouse and rat. BovineMine will be especially useful for gene ontology and pathway analyses in conjunction with GWAS and QTL studies.

  8. The UCSC Genome Browser database: 2017 update.

    Science.gov (United States)

    Tyner, Cath; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Eisenhart, Christopher; Fischer, Clayton M; Gibson, David; Gonzalez, Jairo Navarro; Guruvadoo, Luvina; Haeussler, Maximilian; Heitner, Steve; Hinrichs, Angie S; Karolchik, Donna; Lee, Brian T; Lee, Christopher M; Nejad, Parisa; Raney, Brian J; Rosenbloom, Kate R; Speir, Matthew L; Villarreal, Chris; Vivian, John; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James

    2017-01-04

    Since its 2001 debut, the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) team has provided continuous support to the international genomics and biomedical communities through a web-based, open source platform designed for the fast, scalable display of sequence alignments and annotations landscaped against a vast collection of quality reference genome assemblies. The browser's publicly accessible databases are the backbone of a rich, integrated bioinformatics tool suite that includes a graphical interface for data queries and downloads, alignment programs, command-line utilities and more. This year's highlights include newly designed home and gateway pages; a new 'multi-region' track display configuration for exon-only, gene-only and custom regions visualization; new genome browsers for three species (brown kiwi, crab-eating macaque and Malayan flying lemur); eight updated genome assemblies; extended support for new data types such as CRAM, RNA-seq expression data and long-range chromatin interaction pairs; and the unveiling of a new supported mirror site in Japan.

  9. The UCSC Genome Browser database: 2017 update

    Science.gov (United States)

    Tyner, Cath; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Eisenhart, Christopher; Fischer, Clayton M.; Gibson, David; Gonzalez, Jairo Navarro; Guruvadoo, Luvina; Haeussler, Maximilian; Heitner, Steve; Hinrichs, Angie S.; Karolchik, Donna; Lee, Brian T.; Lee, Christopher M.; Nejad, Parisa; Raney, Brian J.; Rosenbloom, Kate R.; Speir, Matthew L.; Villarreal, Chris; Vivian, John; Zweig, Ann S.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2017-01-01

    Since its 2001 debut, the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) team has provided continuous support to the international genomics and biomedical communities through a web-based, open source platform designed for the fast, scalable display of sequence alignments and annotations landscaped against a vast collection of quality reference genome assemblies. The browser's publicly accessible databases are the backbone of a rich, integrated bioinformatics tool suite that includes a graphical interface for data queries and downloads, alignment programs, command-line utilities and more. This year's highlights include newly designed home and gateway pages; a new ‘multi-region’ track display configuration for exon-only, gene-only and custom regions visualization; new genome browsers for three species (brown kiwi, crab-eating macaque and Malayan flying lemur); eight updated genome assemblies; extended support for new data types such as CRAM, RNA-seq expression data and long-range chromatin interaction pairs; and the unveiling of a new supported mirror site in Japan. PMID:27899642

  10. How to Use the Candida Genome Database.

    Science.gov (United States)

    Skrzypek, Marek S; Binkley, Jonathan; Sherlock, Gavin

    2016-01-01

    Studying Candida biology requires access to genomic sequence data in conjunction with experimental information that provides functional context to genes and proteins. The Candida Genome Database (CGD) integrates functional information about Candida genes and their products with a set of analysis tools that facilitate searching for sets of genes and exploring their biological roles. This chapter describes how the various types of information available at CGD can be searched, retrieved, and analyzed. Starting with the guided tour of the CGD Home page and Locus Summary page, this unit shows how to navigate the various assemblies of the C. albicans genome, how to use Gene Ontology tools to make sense of large-scale data, and how to access the microarray data archived at CGD.

  11. Requirements and standards for organelle genome databases

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.

    2006-01-09

    Mitochondria and plastids (collectively called organelles) descended from prokaryotes that adopted an intracellular, endosymbiotic lifestyle within early eukaryotes. Comparisons of their remnant genomes address a wide variety of biological questions, especially when including the genomes of their prokaryotic relatives and the many genes transferred to the eukaryotic nucleus during the transitions from endosymbiont to organelle. The pace of producing complete organellar genome sequences now makes it unfeasible to do broad comparisons using the primary literature and, even if it were feasible, it is now becoming uncommon for journals to accept detailed descriptions of genome-level features. Unfortunately no database is currently useful for this task, since they have little standardization and are riddled with error. Here I outline what is currently wrong and what must be done to make this data useful to the scientific community.

  12. How to use the Candida Genome Database

    Science.gov (United States)

    Skrzypek, Marek S.; Binkley, Jonathan; Sherlock, Gavin

    2016-01-01

    Summary: Studying Candida biology requires access to genomic sequence data in conjunction with experimental information that provides functional context to genes and proteins. The Candida Genome Database (CGD) integrates functional information about Candida genes and their products with a set of analysis tools that facilitate searching for sets of genes and exploring their biological roles. This chapter describes how the various types of information available at CGD can be searched, retrieved, and analyzed. Starting with the guided tour of the CGD Home page and Locus Summary page, this unit shows how to navigate the various assemblies of the C. albicans genome, how to use Gene Ontology tools to make sense of large-scale data, and how to access the microarray data archived at CGD. PMID:26519061

  13. Online genetic databases informing human genome epidemiology

    Directory of Open Access Journals (Sweden)

    Higgins Julian PT

    2007-07-01

    Background: With the advent of high throughput genotyping technology and the information available via projects such as the human genome sequencing and the HapMap project, more and more data relevant to the study of genetics and disease risk will be produced. Systematic reviews and meta-analyses of human genome epidemiology studies rely on the ability to identify relevant studies and to obtain suitable data from these studies. A first port of call for most such reviews is a search of MEDLINE. We examined whether this could be usefully supplemented by identifying databases on the World Wide Web that contain genetic epidemiological information. Methods: We conducted a systematic search for online databases containing genetic epidemiological information on gene prevalence or gene-disease association. In those containing information on genetic association studies, we examined what additional information could be obtained to supplement a MEDLINE literature search. Results: We identified 111 databases containing prevalence data, 67 databases specific to a single gene and only 13 that contained information on gene-disease associations. Most of the latter 13 databases were linked to MEDLINE, although five contained information that may not be available from other sources. Conclusion: There is no single resource of structured data from genetic association studies covering multiple diseases, and in relation to the number of studies being conducted there is very little information specific to gene-disease association studies currently available on the World Wide Web. Until comprehensive data repositories are created and utilized regularly, new data will remain largely inaccessible to many systematic review authors and meta-analysts.

  14. Update History of This Database - TMBETA-GENOME | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Update History of This Database. Date: 2015/03/09. Update contents: TMBETA-GENOME English archive ...site is opened.

  15. Mouse genome engineering using designer nucleases.

    Science.gov (United States)

    Hermann, Mario; Cermak, Tomas; Voytas, Daniel F; Pelczar, Pawel

    2014-04-02

    Transgenic mice carrying site-specific genome modifications (knockout, knock-in) are of vital importance for dissecting complex biological systems as well as for modeling human diseases and testing therapeutic strategies. Recent advances in the use of designer nucleases such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) 9 system for site-specific genome engineering open the possibility to perform rapid targeted genome modification in virtually any laboratory species without the need to rely on embryonic stem (ES) cell technology. A genome editing experiment typically starts with identification of designer nuclease target sites within a gene of interest followed by construction of custom DNA-binding domains to direct nuclease activity to the investigator-defined genomic locus. Designer nuclease plasmids are in vitro transcribed to generate mRNA for microinjection of fertilized mouse oocytes. Here, we provide a protocol for achieving targeted genome modification by direct injection of TALEN mRNA into fertilized mouse oocytes.

  16. MTGD: The Medicago truncatula genome database.

    Science.gov (United States)

    Krishnakumar, Vivek; Kim, Maria; Rosen, Benjamin D; Karamycheva, Svetlana; Bidwell, Shelby L; Tang, Haibao; Town, Christopher D

    2015-01-01

    Medicago truncatula, a close relative of alfalfa (Medicago sativa), is a model legume used for studying symbiotic nitrogen fixation, mycorrhizal interactions and legume genomics. J. Craig Venter Institute (JCVI; formerly TIGR) has been involved in M. truncatula genome sequencing and annotation since 2002 and has maintained a web-based resource providing data to the community for this entire period. The website (http://www.MedicagoGenome.org) has seen major updates in the past year, where it currently hosts the latest version of the genome (Mt4.0), associated data and legacy project information, presented to users via a rich set of open-source tools. A JBrowse-based genome browser interface exposes tracks for visualization. Mutant gene symbols originally assembled and curated by the Frugoli lab are now hosted at JCVI and tie into our community annotation interface, Medicago EuCAP (to be integrated soon with our implementation of WebApollo). Literature pertinent to M. truncatula is indexed and made searchable via the Textpresso search engine. The site also implements MedicMine, an instance of InterMine that offers interconnectivity with other plant 'mines' such as ThaleMine and PhytoMine, and other model organism databases (MODs). In addition to these new features, we continue to provide keyword- and locus identifier-based searches served via a Chado-backed Tripal Instance, a BLAST search interface and bulk downloads of data sets from the iPlant Data Store (iDS). Finally, we maintain an E-mail helpdesk, facilitated by a JIRA issue tracking system, where we receive and respond to questions about the website and requests for specific data sets from the community.

  17. Contemporary approaches for modifying the mouse genome

    Science.gov (United States)

    Adams, David J.; van der Weyden, Louise

    2008-01-01

    The mouse is a premier experimental organism that has contributed significantly to our understanding of vertebrate biology. Manipulation of the mouse genome via embryonic stem (ES) cell technology makes it possible to engineer an almost limitless repertoire of mutations to model human disease and assess gene function. In this review we outline recent advances in mouse experimental genetics and provide a “how-to” guide for those people wishing to access this technology. We also discuss new technologies, such as transposon-mediated mutagenesis, and resources of targeting vectors and ES cells, which are likely to dramatically accelerate the pace with which we can assess gene function in vivo, and the progress of forward and reverse genetic screens in mice. PMID:18559964

  18. Ontology searching and browsing at the Rat Genome Database

    Science.gov (United States)

    Laulederkind, Stanley J. F.; Tutaj, Marek; Shimoyama, Mary; Hayman, G. Thomas; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Wang, Shur-Jen; de Pons, Jeff; Dwinell, Melinda R.; Jacob, Howard J.

    2012-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40 000 rat gene records, as well as human and mouse orthologs, 1857 rat and 1912 human quantitative trait loci (QTLs) and 2347 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. RGD uses more than a dozen different ontologies to standardize annotation information for genes, QTLs and strains. That means a lot of time can be spent searching and browsing ontologies for the appropriate terms needed both for curating and mining the data. RGD has upgraded its ontology term search to make it more versatile and more robust. A term search result is connected to a term browser so the user can fine-tune the search by viewing parent and children terms. Most publicly available term browsers display a hierarchical organization of terms in an expandable tree format. RGD has replaced its old tree browser format with a ‘driller’ type of browser that allows quicker drilling up and down through the term branches, which has been confirmed by testing. The RGD ontology report pages have also been upgraded. Expanded functionality allows more choice in how annotations are displayed and what subsets of annotations are displayed. The new ontology search, browser and report features have been designed to enhance both manual data curation and manual data extraction. Database URL: http://rgd.mcw.edu/rgdweb/ontology/search.html PMID:22434847

  19. Private and Efficient Query Processing on Outsourced Genomic Databases.

    Science.gov (United States)

    Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-09-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and the legal and forensic fields. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing a genome is a time-consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations, and thus, not available for public usage. The cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases to a centralized cloud server to ease access to them. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow the cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20 000 records take around 100 and 150 s, respectively.
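    To make the two query types named in this abstract concrete, here is a minimal plaintext sketch in Python. It does not implement the paper's privacy mechanism (permutation and fake records), and the definitions used are assumptions: a count query is taken to be the number of records matching every requested SNP genotype, and a top-k query is taken to return the k records matching the most requested genotypes. The record layout, SNP identifiers, and function names are hypothetical.

```python
# Plaintext illustration only: no encryption, permutation, or fake records.
# Each record maps a (hypothetical) SNP identifier to a genotype string.
records = [
    {"rs1": "AA", "rs2": "AG"},
    {"rs1": "AA", "rs2": "GG"},
    {"rs1": "AG", "rs2": "AG"},
    {"rs1": "AA", "rs2": "AG"},
]


def match_score(record, query):
    """Number of queried SNPs whose genotype equals the requested one."""
    return sum(1 for snp, genotype in query.items() if record.get(snp) == genotype)


def count_query(records, query):
    """Count records that match every SNP genotype in the query."""
    return sum(1 for r in records if match_score(r, query) == len(query))


def top_k_query(records, query, k):
    """Indices of the k best-matching records (assumed reading of 'top-k').

    Ties are broken by original record order.
    """
    ranked = sorted(range(len(records)),
                    key=lambda i: (-match_score(records[i], query), i))
    return ranked[:k]


query = {"rs1": "AA", "rs2": "AG"}
n_matches = count_query(records, query)
best_three = top_k_query(records, query, 3)
```

    In an outsourced setting, the same two functions would have to run over protected data; the sketch only fixes what the answers mean.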

  20. CyanoBase: the cyanobacteria genome database update 2010.

    Science.gov (United States)

    Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

    2010-01-01

    CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in various formats with other tools, seamlessly.

  1. Exploring human disease using the Rat Genome Database

    Directory of Open Access Journals (Sweden)

    Mary Shimoyama

    2016-10-01

    Rattus norvegicus, the laboratory rat, has been a crucial model for studies of the environmental and genetic factors associated with human diseases for over 150 years. It is the primary model organism for toxicology and pharmacology studies, and has features that make it the model of choice in many complex-disease studies. Since 1999, the Rat Genome Database (RGD; http://rgd.mcw.edu) has been the premier resource for genomic, genetic, phenotype and strain data for the laboratory rat. The primary role of RGD is to curate rat data and validate orthologous relationships with human and mouse genes, and make these data available for incorporation into other major databases such as NCBI, Ensembl and UniProt. RGD also provides official nomenclature for rat genes, quantitative trait loci, strains and genetic markers, as well as unique identifiers. The RGD team adds enormous value to these basic data elements through functional and disease annotations, the analysis and visual presentation of pathways, and the integration of phenotype measurement data for strains used as disease models. Because much of the rat research community focuses on understanding human diseases, RGD provides a number of datasets and software tools that allow users to easily explore and make disease-related connections among these datasets. RGD also provides comprehensive human and mouse data for comparative purposes, illustrating the value of the rat in translational research. This article introduces RGD and its suite of tools and datasets to researchers – within and beyond the rat community – who are particularly interested in leveraging rat-based insights to understand human diseases.

  2. Exploring human disease using the Rat Genome Database

    Science.gov (United States)

    Laulederkind, Stanley J. F.; De Pons, Jeff; Nigam, Rajni; Smith, Jennifer R.; Tutaj, Marek; Petri, Victoria; Hayman, G. Thomas; Wang, Shur-Jen; Ghiasvand, Omid; Thota, Jyothi; Dwinell, Melinda R.

    2016-01-01

    Rattus norvegicus, the laboratory rat, has been a crucial model for studies of the environmental and genetic factors associated with human diseases for over 150 years. It is the primary model organism for toxicology and pharmacology studies, and has features that make it the model of choice in many complex-disease studies. Since 1999, the Rat Genome Database (RGD; http://rgd.mcw.edu) has been the premier resource for genomic, genetic, phenotype and strain data for the laboratory rat. The primary role of RGD is to curate rat data and validate orthologous relationships with human and mouse genes, and make these data available for incorporation into other major databases such as NCBI, Ensembl and UniProt. RGD also provides official nomenclature for rat genes, quantitative trait loci, strains and genetic markers, as well as unique identifiers. The RGD team adds enormous value to these basic data elements through functional and disease annotations, the analysis and visual presentation of pathways, and the integration of phenotype measurement data for strains used as disease models. Because much of the rat research community focuses on understanding human diseases, RGD provides a number of datasets and software tools that allow users to easily explore and make disease-related connections among these datasets. RGD also provides comprehensive human and mouse data for comparative purposes, illustrating the value of the rat in translational research. This article introduces RGD and its suite of tools and datasets to researchers – within and beyond the rat community – who are particularly interested in leveraging rat-based insights to understand human diseases. PMID:27736745

  3. NIG_MoG: a mouse genome navigator for exploring intersubspecific genetic polymorphisms.

    Science.gov (United States)

    Takada, Toyoyuki; Yoshiki, Atsushi; Obata, Yuichi; Yamazaki, Yukiko; Shiroishi, Toshihiko

    2015-08-01

    The National Institute of Genetics Mouse Genome database (NIG_MoG; http://molossinus.lab.nig.ac.jp/msmdb/) primarily comprises the whole-genome sequence data of two inbred mouse strains, MSM/Ms and JF1/Ms. These strains were established at NIG and originated from the Japanese subspecies Mus musculus molossinus. NIG_MoG provides visualized genome polymorphism information, browsing single-nucleotide polymorphisms and short insertions and deletions in the genomes of MSM/Ms and JF1/Ms with respect to C57BL/6J (whose genome is predominantly derived from the West European subspecies M. m. domesticus). This allows users, especially wet-lab biologists, to intuitively recognize intersubspecific genome divergence in these mouse strains using visual data. The database also supports the in silico screening of bacterial artificial chromosome (BAC) clones that contain genomic DNA from MSM/Ms and the standard classical laboratory strain C57BL/6N. NIG_MoG is thus a valuable navigator for exploring mouse genome polymorphisms and BAC clones that are useful for studies of gene function and regulation based on intersubspecific genome divergence.

  4. MIPS: a database for genomes and protein sequences.

    Science.gov (United States)

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  5. Complex Loci in human and mouse genomes.

    Science.gov (United States)

    Engström, Pär G; Suzuki, Harukazu; Ninomiya, Noriko; Akalin, Altuna; Sessa, Luca; Lavorgna, Giovanni; Brozzi, Alessandro; Luzi, Lucilla; Tan, Sin Lam; Yang, Liang; Kunarso, Galih; Ng, Edwin Lian-Chong; Batalov, Serge; Wahlestedt, Claes; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Wells, Christine; Bajic, Vladimir B; Orlando, Valerio; Reid, James F; Lenhard, Boris; Lipovich, Leonard

    2006-04-01

    Mammalian genomes harbor a larger than expected number of complex loci, in which multiple genes are coupled by shared transcribed regions in antisense orientation and/or by bidirectional core promoters. To determine the incidence, functional significance, and evolutionary context of mammalian complex loci, we identified and characterized 5,248 cis-antisense pairs, 1,638 bidirectional promoters, and 1,153 chains of multiple cis-antisense and/or bidirectionally promoted pairs from 36,606 mouse transcriptional units (TUs), along with 6,141 cis-antisense pairs, 2,113 bidirectional promoters, and 1,480 chains from 42,887 human TUs. In both human and mouse, 25% of TUs resided in cis-antisense pairs, only 17% of which were conserved between the two organisms, indicating frequent species specificity of antisense gene arrangements. A sampling approach indicated that over 40% of all TUs might actually be in cis-antisense pairs, and that only a minority of these arrangements are likely to be conserved between human and mouse. Bidirectional promoters were characterized by variable transcriptional start sites and an identifiable midpoint at which overall sequence composition changed strand and the direction of transcriptional initiation switched. In microarray data covering a wide range of mouse tissues, genes in cis-antisense and bidirectionally promoted arrangement showed a higher probability of being coordinately expressed than random pairs of genes. In a case study on homeotic loci, we observed extensive transcription of nonconserved sequences on the noncoding strand, implying that the presence rather than the sequence of these transcripts is of functional importance. Complex loci are ubiquitous, host numerous nonconserved gene structures and lineage-specific exonification events, and may have a cis-regulatory impact on the member genes.
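    The notion of a cis-antisense pair used in this abstract (two transcriptional units sharing a transcribed region in antisense orientation) can be illustrated with a short Python sketch. This is not the authors' pipeline: it assumes each TU is reduced to a single half-open genomic interval with a strand, and the TU names and coordinates below are invented for illustration.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical, simplified transcriptional unit: one interval [start, end) per TU.
TU = namedtuple("TU", ["name", "chrom", "start", "end", "strand"])


def cis_antisense_pairs(tus):
    """Return name pairs of TUs that overlap on opposite strands of one chromosome."""
    pairs = []
    for a, b in combinations(tus, 2):
        same_chrom = a.chrom == b.chrom
        opposite_strands = a.strand != b.strand
        overlapping = a.start < b.end and b.start < a.end
        if same_chrom and opposite_strands and overlapping:
            pairs.append((a.name, b.name))
    return pairs


def fraction_in_pairs(tus, pairs):
    """Fraction of TUs that take part in at least one cis-antisense pair."""
    members = {name for pair in pairs for name in pair}
    return len(members) / len(tus)


tus = [
    TU("A", "chr1", 100, 500, "+"),
    TU("B", "chr1", 400, 900, "-"),
    TU("C", "chr1", 450, 600, "+"),
    TU("D", "chr2", 100, 500, "-"),
]
pairs = cis_antisense_pairs(tus)
share = fraction_in_pairs(tus, pairs)
```

    The pairwise scan is quadratic in the number of TUs, which is fine for a toy example; at the scale reported in the abstract an interval index or a sorted sweep would be the more practical choice.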

  6. Complex Loci in human and mouse genomes.

    Directory of Open Access Journals (Sweden)

    Pär G Engström

    2006-04-01

    Mammalian genomes harbor a larger than expected number of complex loci, in which multiple genes are coupled by shared transcribed regions in antisense orientation and/or by bidirectional core promoters. To determine the incidence, functional significance, and evolutionary context of mammalian complex loci, we identified and characterized 5,248 cis-antisense pairs, 1,638 bidirectional promoters, and 1,153 chains of multiple cis-antisense and/or bidirectionally promoted pairs from 36,606 mouse transcriptional units (TUs), along with 6,141 cis-antisense pairs, 2,113 bidirectional promoters, and 1,480 chains from 42,887 human TUs. In both human and mouse, 25% of TUs resided in cis-antisense pairs, only 17% of which were conserved between the two organisms, indicating frequent species specificity of antisense gene arrangements. A sampling approach indicated that over 40% of all TUs might actually be in cis-antisense pairs, and that only a minority of these arrangements are likely to be conserved between human and mouse. Bidirectional promoters were characterized by variable transcriptional start sites and an identifiable midpoint at which overall sequence composition changed strand and the direction of transcriptional initiation switched. In microarray data covering a wide range of mouse tissues, genes in cis-antisense and bidirectionally promoted arrangement showed a higher probability of being coordinately expressed than random pairs of genes. In a case study on homeotic loci, we observed extensive transcription of nonconserved sequences on the noncoding strand, implying that the presence rather than the sequence of these transcripts is of functional importance. Complex loci are ubiquitous, host numerous nonconserved gene structures and lineage-specific exonification events, and may have a cis-regulatory impact on the member genes.

  7. Plant database resources at The Institute for Genomic Research.

    Science.gov (United States)

    Chan, Agnes P; Rabinowicz, Pablo D; Quackenbush, John; Buell, C Robin; Town, Chris D

    2007-01-01

    With the completion of the genome sequences of the model plants Arabidopsis and rice, and the continuing sequencing efforts of other economically important crop plants, an unprecedented amount of genome sequence data is now available for large-scale genomics studies and analyses, such as the identification and discovery of novel genes, comparative genomics, and functional genomics. Efficient utilization of these large data sets is critically dependent on the ease of access and organization of the data. The plant databases at The Institute for Genomic Research (TIGR) have been set up to maintain various data types including genomic sequence, annotation and analyses, expressed transcript assemblies and analyses, and gene expression profiles from microarray studies. We present here an overview of the TIGR database resources for plant genomics and describe methods to access the data.

  8. Gramene database: navigating plant comparative genomics resources

    Science.gov (United States)

    Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationship...

  9. Recent updates and developments to plant genome size databases

    Science.gov (United States)

    Garcia, Sònia; Leitch, Ilia J.; Anadon-Rosell, Alba; Canela, Miguel Á.; Gálvez, Francisco; Garnatje, Teresa; Gras, Airy; Hidalgo, Oriane; Johnston, Emmeline; Mas de Xaxars, Gemma; Pellicer, Jaume; Siljak-Yakovlev, Sonja; Vallès, Joan; Vitales, Daniel; Bennett, Michael D.

    2014-01-01

    Two plant genome size databases have been recently updated and/or extended: the Plant DNA C-values database (http://data.kew.org/cvalues), and GSAD, the Genome Size in Asteraceae database (http://www.asteraceaegenomesize.com). While the first provides information on nuclear DNA contents across land plants and some algal groups, the second is focused on one of the largest and most economically important angiosperm families, Asteraceae. Genome size data have numerous applications: they can be used in comparative studies on genome evolution, or as a tool to appraise the cost of whole-genome sequencing programs. The growing interest in genome size and increasing rate of data accumulation has necessitated the continued update of these databases. Currently, the Plant DNA C-values database (Release 6.0, Dec. 2012) contains data for 8510 species, while GSAD has 1219 species (Release 2.0, June 2013), representing increases of 17 and 51%, respectively, in the number of species with genome size data, compared with previous releases. Here we provide overviews of the most recent releases of each database, and outline new features of GSAD. The latter include (i) a tool to visually compare genome size data between species, (ii) the option to export data and (iii) a webpage containing information about flow cytometry protocols. PMID:24288377

  10. StellaBase: The Nematostella vectensis Genomics Database

    OpenAIRE

    James C Sullivan; Ryan, Joseph F; Watson, James A.; Webb, Jeramy; Mullikin, James C; Rokhsar, Daniel; Finnerty, John R

    2005-01-01

    StellaBase, the Nematostella vectensis Genomics Database, is a web-based resource that will facilitate desktop and bench-top studies of the starlet sea anemone. Nematostella is an emerging model organism that has already proven useful for addressing fundamental questions in developmental evolution and evolutionary genomics. StellaBase allows users to query the assembled Nematostella genome, a confirmed gene library, and a predicted genome using both keyword and homology based search functions...

  11. A report from the Sixth International Mouse Genome Conference

    Energy Technology Data Exchange (ETDEWEB)

    Brown, S. [Saint Mary's Hospital Medical School, London (United Kingdom). Dept. of Biochemistry and Molecular Genetics]

    1992-12-31

    The Sixth Annual Mouse Genome Conference was held in October, 1992 at Buffalo, USA. The mouse is one of the primary model organisms in the Human Genome Project. Through the use of gene targeting studies the mouse has become a powerful biological model for the study of gene function and, in addition, the comparison of the many homologous mutations identified in human and mouse have widened our understanding of the biology of these two organisms. A primary goal in the mouse genome program has been to create a genetic map of STSs of high resolution (<1cM) that would form the basis for the physical mapping of the whole mouse genome. Buffalo saw substantial new progress towards the goal of a very high density genetic map and the beginnings of substantive efforts towards physical mapping in chromosome regions with a high density of genetic markers.

  12. A Compressed Self-Index for Genomic Databases

    CERN Document Server

    Gagie, Travis; Nekrich, Yakov; Puglisi, Simon J

    2011-01-01

    Advances in DNA sequencing technology will soon result in databases of thousands of genomes. Within a species, individuals' genomes are almost exact copies of each other; e.g., any two human genomes are 99.9% the same. Relative Lempel-Ziv (RLZ) compression takes advantage of this property: it stores the first genome uncompressed or as an FM-index, then compresses the other genomes with a variant of LZ77 that copies phrases only from the first genome. RLZ achieves good compression and supports fast random access; in this paper we show how to support fast search as well, thus obtaining an efficient compressed self-index.
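    The Relative Lempel-Ziv idea described in this abstract, parsing each further genome into phrases copied only from the first (reference) genome, can be sketched in a few lines of Python. This is a naive greedy parse for illustration, not the paper's index: the longest-match search is a brute-force scan where a practical implementation would use a suffix array or FM-index, and the literal fallback for characters absent from the reference is an assumption added so the round trip always works. All names are hypothetical.

```python
from typing import List, Tuple, Union

Phrase = Union[Tuple[int, int], str]  # (position, length) into the reference, or a literal


def rlz_factorize(reference: str, target: str) -> List[Phrase]:
    """Greedily parse target into longest phrases copied from reference."""
    phrases: List[Phrase] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Brute-force longest match starting at target[i]; quadratic, for clarity only.
        for pos in range(len(reference)):
            length = 0
            while (pos + length < len(reference)
                   and i + length < len(target)
                   and reference[pos + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len == 0:
            phrases.append(target[i])  # character not present in the reference
            i += 1
        else:
            phrases.append((best_pos, best_len))
            i += best_len
    return phrases


def rlz_decode(reference: str, phrases: List[Phrase]) -> str:
    """Rebuild the target by copying each phrase out of the reference."""
    parts = []
    for phrase in phrases:
        if isinstance(phrase, str):
            parts.append(phrase)
        else:
            pos, length = phrase
            parts.append(reference[pos:pos + length])
    return "".join(parts)


reference = "ACGTACGTTGCA"
target = "ACGTTGCAACGA"
phrases = rlz_factorize(reference, target)
restored = rlz_decode(reference, phrases)
```

    Because every phrase is just a pair of integers into the reference, decoding any phrase needs no earlier output, which is one way to see why random access is cheap under this scheme.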

  13. GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data.

    Science.gov (United States)

    Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie

    2008-01-01

    The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org.

  14. Database for exchangeable gene trap clones: pathway and gene ontology analysis of exchangeable gene trap clone mouse lines.

    Science.gov (United States)

    Araki, Masatake; Nakahara, Mai; Muta, Mayumi; Itou, Miharu; Yanai, Chika; Yamazoe, Fumika; Miyake, Mikiko; Morita, Ayaka; Araki, Miyuki; Okamoto, Yoshiyuki; Nakagata, Naomi; Yoshinobu, Kumiko; Yamamura, Ken-ichi; Araki, Kimi

    2014-02-01

    Gene trapping in embryonic stem (ES) cells is a proven method for large-scale random insertional mutagenesis in the mouse genome. We have established an exchangeable gene trap system, in which a reporter gene can be exchanged for any other DNA of interest through Cre/mutant lox-mediated recombination. We isolated trap clones, analyzed trapped genes, and constructed the database for Exchangeable Gene Trap Clones (EGTC) [http://egtc.jp]. The number of registered ES cell lines was 1162 on 31 August 2013. We also established 454 mouse lines from trap ES clones and deposited them in the mouse embryo bank at the Center for Animal Resources and Development, Kumamoto University, Japan. The EGTC database is the most extensive academic resource for gene-trap mouse lines. Because we used a promoter-trap strategy, all trapped genes were expressed in ES cells. To understand the general characteristics of the trapped genes in the EGTC library, we used Kyoto Encyclopedia of Genes and Genomes (KEGG) for pathway analysis and found that the EGTC ES clones covered a broad range of pathways. We also used Gene Ontology (GO) classification data provided by Mouse Genome Informatics (MGI) to compare the functional distribution of genes in each GO term between trapped genes in the EGTC mouse lines and total genes annotated in MGI. We found the functional distributions for the trapped genes in the EGTC mouse lines and for the RefSeq genes for the whole mouse genome were similar, indicating that the EGTC mouse lines had trapped a wide range of mouse genes. © 2014 The Authors Development, Growth & Differentiation © 2014 Japanese Society of Developmental Biologists.

  15. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants

    OpenAIRE

    2014-01-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10...

  16. viruSITE—integrated database for viral genomics

    Science.gov (United States)

    Stano, Matej; Beke, Gabor; Klucar, Lubos

    2016-01-01

    Viruses are the most abundant biological entities and the reservoir of most of the genetic diversity in the Earth's biosphere. Viral genomes are very diverse, generally short in length and compared to other organisms carry only few genes. viruSITE is a novel database which brings together high-value information compiled from various resources. viruSITE covers the whole universe of viruses and focuses on viral genomes, genes and proteins. The database contains information on virus taxonomy, host range, genome features, sequential relatedness as well as the properties and functions of viral genes and proteins. All entries in the database are linked to numerous information resources. The above-mentioned features make viruSITE a comprehensive knowledge hub in the field of viral genomics. The web interface of the database was designed so as to offer an easy-to-navigate, intuitive and user-friendly environment. It provides sophisticated text searching and a taxonomy-based browsing system. viruSITE also allows for an alternative approach based on sequence search. A proprietary genome browser generates a graphical representation of viral genomes. In addition to retrieving and visualising data, users can perform comparative genomics analyses using a variety of tools. Database URL: http://www.virusite.org/ PMID:28025349

  17. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different lengths (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in the analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Although automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents a genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
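
    To put the two consensus error rates quoted above (about 1/2000 bp for a draft, about 1/10,000 bp for a finished genome) on a concrete footing, the following is a minimal sketch rather than anything taken from the record itself. The 5 Mb genome length is a hypothetical value chosen only for illustration, and the Phred-style conversion Q = -10 log10(p) is a standard quality-score formula that the abstract does not mention.

```python
import math


def expected_errors(genome_length_bp: int, errors_per_bp: float) -> float:
    """Expected number of consensus errors for a genome of the given length."""
    return genome_length_bp * errors_per_bp


def phred_quality(errors_per_bp: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10.0 * math.log10(errors_per_bp)


# Error rates quoted in the abstract.
DRAFT_RATE = 1 / 2000
FINISHED_RATE = 1 / 10_000

# Hypothetical genome length (assumption, not from the abstract).
GENOME_BP = 5_000_000

if __name__ == "__main__":
    for label, rate in (("draft", DRAFT_RATE), ("finished", FINISHED_RATE)):
        errors = expected_errors(GENOME_BP, rate)
        quality = phred_quality(rate)
        print(f"{label}: {errors:.0f} expected errors, Q{quality:.1f}")
```

    Under these assumptions, a 5 Mb draft would carry about 2,500 expected consensus errors against about 500 for the finished sequence, which corresponds to roughly Q33 versus Q40.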

  18. Uniform standards for genome databases in forest and fruit trees

    Science.gov (United States)

    TreeGenes and tfGDR serve the international forestry and fruit tree genomics research communities, respectively. These databases hold similar sequence data and provide resources for the submission and recovery of this information in order to enable comparative genomics research. Large-scale genotype...

  19. Meeting Report: The Twelfth International Mouse Genome Conference

    Energy Technology Data Exchange (ETDEWEB)

    Manolakou, Katerina; Cross, Sally H.; Simpson, Eleanor H.; Jackson, Ian J.

    1998-10-01

    The annual International Mouse Genome Conference (IMGC) is where, scientifically speaking, classical mouse genetics meets the relative newcomer of genomics. The 12th meeting took place last October in the delightful Bavarian village of Garmisch-Partenkirchen, and we were greeted by the sight on the mountains of the first snowfall of the season. However the discussions left little time for exploration. Minds of participants in Garmisch were focused by a recent document produced by the NIH and by discussions within other funding agencies worldwide. If implemented, the proposals will further enhance the status of the mouse as the principal model for study of the function of the human genome.

  20. Potential translational targets revealed by linking mouse grooming behavioral phenotypes to gene expression using public databases.

    Science.gov (United States)

    Roth, Andrew; Kyzar, Evan J; Cachat, Jonathan; Stewart, Adam Michael; Green, Jeremy; Gaikwad, Siddharth; O'Leary, Timothy P; Tabakoff, Boris; Brown, Richard E; Kalueff, Allan V

    2013-01-10

    Rodent self-grooming is an important, evolutionarily conserved behavior, highly sensitive to pharmacological and genetic manipulations. Mice with aberrant grooming phenotypes are currently used to model various human disorders. Therefore, it is critical to understand the biology of grooming behavior, and to assess its translational validity to humans. The present in-silico study used publicly available gene expression and behavioral data obtained from several inbred mouse strains in the open-field, light-dark box, elevated plus- and elevated zero-maze tests. As grooming duration differed between strains, our analysis revealed several candidate genes with significant correlations between gene expression in the brain and grooming duration. The Allen Brain Atlas, STRING, GoMiner and Mouse Genome Informatics databases were used to functionally map and analyze these candidate mouse genes against their human orthologs, assessing the strain ranking of their expression and the regional distribution of expression in the mouse brain. This allowed us to identify an interconnected network of candidate genes (which have expression levels that correlate with grooming behavior), display altered patterns of expression in key brain areas related to grooming, and underlie important functions in the brain. Collectively, our results demonstrate the utility of large-scale, high-throughput data-mining and in-silico modeling for linking genomic and behavioral data, as well as their potential to identify novel neural targets for complex neurobehavioral phenotypes, including grooming.

  1. wFleaBase: the Daphnia genome database

    Directory of Open Access Journals (Sweden)

    Singan Vasanth R

    2005-03-01

    Background: wFleaBase is a database with the necessary infrastructure to curate, archive and share genetic, molecular and functional genomic data and protocols for an emerging model organism, the microcrustacean Daphnia. Commonly known as the water-flea, Daphnia's ecological merit is unequaled among metazoans, largely because of its sentinel role within freshwater ecosystems and over 200 years of biological investigations. Consequently, the Daphnia Genomics Consortium (DGC) has launched an interdisciplinary research program to create the resources needed to study genes that affect ecological and evolutionary success in natural environments. Discussion: These tools include the genome database wFleaBase, which currently contains functions to search and extract information from expressed sequence tags, genome survey sequences and full genome sequencing projects. This new database is built primarily from core components of the Generic Model Organism Database project, and related bioinformatics tools. Summary: Over the coming year, preliminary genetic maps and the nearly complete genomic sequence of Daphnia pulex will be integrated into wFleaBase, including gene predictions and ortholog assignments based on sequence similarities with eukaryote genes of known function. wFleaBase aims to serve a large ecological and evolutionary research community. Our challenge is to rapidly expand its content and to ultimately integrate genetic and functional genomic information with population-level responses to environmental challenges. URL: http://wfleabase.org/.

  2. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants.

    Science.gov (United States)

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-09-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops.

  3. OryzaGenome: Genome Diversity Database of Wild Oryza Species

    KAUST Repository

    Ohyanagi, Hajime

    2015-11-18

    The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of next-generation sequencing (NGS) technology, a flood of Oryza species reference genomes and genomic variation information has become available in recent years. This genomic information, combined with the comprehensive phenotypic information that we are accumulating in our Oryzabase, can serve as an excellent genotype-phenotype association resource for analyzing rice functional and structural evolution, and the associated diversity of the Oryza genus. Here we integrate our previous and future phenotypic/habitat information and newly determined genotype information into a united repository, named OryzaGenome, providing the variant information with hyperlinks to Oryzabase. The current version of OryzaGenome includes genotype information of 446 O. rufipogon accessions derived by imputation and of 17 accessions derived by imputation-free deep sequencing. Two variant viewers are implemented: SNP Viewer as a conventional genome browser interface and Variant Table as a text-based browser for precise inspection of each variant one by one. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all of the closely and distantly related wild Oryza species from the NIG Wild Rice Collection will be available in future releases. All of the resources can be accessed through http://viewer.shigen.info/oryzagenome/.

  4. OryzaGenome: Genome Diversity Database of Wild Oryza Species.

    Science.gov (United States)

    Ohyanagi, Hajime; Ebata, Toshinobu; Huang, Xuehui; Gong, Hao; Fujita, Masahiro; Mochizuki, Takako; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu; Feng, Qi; Wang, Zi-Xuan; Han, Bin; Kurata, Nori

    2016-01-01

    The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of next-generation sequencing (NGS) technology, a flood of Oryza species reference genomes and genomic variation information has become available in recent years. This genomic information, combined with the comprehensive phenotypic information that we are accumulating in our Oryzabase, can serve as an excellent genotype-phenotype association resource for analyzing rice functional and structural evolution, and the associated diversity of the Oryza genus. Here we integrate our previous and future phenotypic/habitat information and newly determined genotype information into a united repository, named OryzaGenome, providing the variant information with hyperlinks to Oryzabase. The current version of OryzaGenome includes genotype information of 446 O. rufipogon accessions derived by imputation and of 17 accessions derived by imputation-free deep sequencing. Two variant viewers are implemented: SNP Viewer as a conventional genome browser interface and Variant Table as a text-based browser for precise inspection of each variant one by one. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all of the closely and distantly related wild Oryza species from the NIG Wild Rice Collection will be available in future releases. All of the resources can be accessed through http://viewer.shigen.info/oryzagenome/.

  5. Development of a maize molecular evolutionary genomic database.

    Science.gov (United States)

    Du, Chunguang; Buckler, Edward; Muse, Spencer

    2003-01-01

    PANZEA is the first public database for studying maize genomic diversity. It was initiated as a repository of genomic diversity for an NSF Plant Genome project on 'Maize Evolutionary Genomics'. PANZEA is hosted at the Bioinformatics Research Center, North Carolina State University, and is open to the public (http://statgen.ncsu.edu/panzea). PANZEA is designed to capture the interrelationships between germplasm, molecular diversity, phenotypic diversity and genome structure. It has the ability to store, integrate and visualize DNA sequence, enzymatic, SSR (simple sequence repeat) marker, germplasm and phenotypic data. The relational data model is selected and implemented in Oracle. An automated DNA sequence data submission tool has been created that allows project researchers to remotely submit their DNA sequence data directly to PANZEA. On-line database search forms and reports have been created to allow users to search or download germplasm, DNA sequence, gene/locus data and much more, directly from the web.

  6. microPIR2: a comprehensive database for human-mouse comparative study of microRNA-promoter interactions.

    Science.gov (United States)

    Piriyapongsa, Jittima; Bootchai, Chaiwat; Ngamphiw, Chumpol; Tongsima, Sissades

    2014-01-01

    microRNA (miRNA)-promoter interaction resource (microPIR) is a public database containing over 15 million predicted miRNA target sites located within human promoter sequences. These predicted targets are presented along with their related genomic and experimental data, making the microPIR database the most comprehensive repository of miRNA promoter target sites. Here, we describe major updates of the microPIR database including new target predictions in the mouse genome and revised human target predictions. The updated database (microPIR2) now provides ∼80 million human and 40 million mouse predicted target sites. In addition to being a reference database, microPIR2 is a tool for comparative analysis of target sites on the promoters of human-mouse orthologous genes. In particular, this new feature was designed to identify potential miRNA-promoter interactions conserved between species that could be stronger candidates for further experimental validation. We also incorporated additional supporting information to microPIR2 such as nuclear and cytoplasmic localization of miRNAs and miRNA-disease association. Extra search features were also implemented to enable various investigations of targets of interest. Database URL: http://www4a.biotec.or.th/micropir2

  7. Matching curated genome databases: a non trivial task

    Directory of Open Access Journals (Sweden)

    Labedan Bernard

    2008-10-01

    Background: Curated databases of completely sequenced genomes have been designed independently at the NCBI (RefSeq) and EBI (Genome Reviews) to cope with non-standard annotation found in the version of the sequenced genome that has been published by databanks GenBank/EMBL/DDBJ. These curation attempts were expected to review the annotations and to improve their pertinence when using them to annotate newly released genome sequences by homology to previously annotated genomes. However, we observed that such an uncoordinated effort has two unwanted consequences. First, it is not trivial to map the protein identifiers of the same sequence in both databases. Secondly, the two reannotated versions of the same genome differ at the level of their structural annotation. Results: Here, we propose CorBank, a program devised to provide cross-referencing protein identifiers no matter what level of identity is found between their matching sequences. Approximately 98% of the 1,983,258 amino acid sequences are matching, allowing instantaneous retrieval of their respective cross-references. CorBank further allows detecting any differences between the independently curated versions of the same genome. We found that the RefSeq and Genome Reviews versions are perfectly matching for only 50 of the 641 complete genomes we have analyzed. In all other cases there are differences occurring at the level of the coding sequence (CDS), and/or in the total number of CDS in the respective version of the same genome. CorBank is freely accessible at http://www.corbank.u-psud.fr. The CorBank site also contains updated publication of the exhaustive results obtained by comparing RefSeq and Genome Reviews versions of each genome. Accordingly, this web site allows easy search of cross-references between RefSeq, Genome Reviews, and UniProt, for either a single CDS or a whole replicon. Conclusion: CorBank is very efficient in rapid detection of the numerous differences existing

  8. The Yak genome database: an integrative database for studying yak biology and high-altitude adaption.

    Science.gov (United States)

    Hu, Quanjun; Ma, Tao; Wang, Kun; Xu, Ting; Liu, Jianquan; Qiu, Qiang

    2012-11-07

    The yak (Bos grunniens) is a long-haired bovine that lives at high altitudes and is an important source of milk, meat, fiber and fuel. The recent sequencing, assembly and annotation of its genome are expected to further our understanding of the means by which it has adapted to life at high altitudes and its ecologically important traits. The Yak Genome Database (YGD) is an internet-based resource that provides access to genomic sequence data and predicted functional information concerning the genes and proteins of Bos grunniens. The curated data stored in the YGD includes genome sequences, predicted genes and associated annotations, non-coding RNA sequences, transposable elements, single nucleotide variants, and three-way whole-genome alignments between human, cattle and yak. YGD offers useful searching and data mining tools, including the ability to search for genes by name or using function keywords as well as GBrowse genome browsers and/or BLAST servers, which can be used to visualize genome regions and identify similar sequences. Sequence data from the YGD can also be downloaded to perform local searches. A new yak genome database (YGD) has been developed to facilitate studies on high-altitude adaption and bovine genomics. The database will be continuously updated to incorporate new information such as transcriptome data and population resequencing data. The YGD can be accessed at http://me.lzu.edu.cn/yak.

  9. The Yak genome database: an integrative database for studying yak biology and high-altitude adaption

    Directory of Open Access Journals (Sweden)

    Hu Quanjun

    2012-11-01

    Background: The yak (Bos grunniens) is a long-haired bovine that lives at high altitudes and is an important source of milk, meat, fiber and fuel. The recent sequencing, assembly and annotation of its genome are expected to further our understanding of the means by which it has adapted to life at high altitudes and its ecologically important traits. Description: The Yak Genome Database (YGD) is an internet-based resource that provides access to genomic sequence data and predicted functional information concerning the genes and proteins of Bos grunniens. The curated data stored in the YGD includes genome sequences, predicted genes and associated annotations, non-coding RNA sequences, transposable elements, single nucleotide variants, and three-way whole-genome alignments between human, cattle and yak. YGD offers useful searching and data mining tools, including the ability to search for genes by name or using function keywords as well as GBrowse genome browsers and/or BLAST servers, which can be used to visualize genome regions and identify similar sequences. Sequence data from the YGD can also be downloaded to perform local searches. Conclusions: A new yak genome database (YGD) has been developed to facilitate studies on high-altitude adaption and bovine genomics. The database will be continuously updated to incorporate new information such as transcriptome data and population resequencing data. The YGD can be accessed at http://me.lzu.edu.cn/yak.

  10. Specialized microbial databases for inductive exploration of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Cabau Cédric

    2005-02-01

    Background: The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods: The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subqueries, have been implemented. Results: Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore (http://bioinfo.hku.hk/genochore.html), a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya), has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition, they provide a weekly update of searches against the world-wide protein sequence data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion: This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList, Staphylococcus epidermidis) associated with related organisms for comparison.

  11. StellaBase: the Nematostella vectensis Genomics Database.

    Science.gov (United States)

    Sullivan, James C; Ryan, Joseph F; Watson, James A; Webb, Jeramy; Mullikin, James C; Rokhsar, Daniel; Finnerty, John R

    2006-01-01

    StellaBase, the Nematostella vectensis Genomics Database, is a web-based resource that will facilitate desktop and bench-top studies of the starlet sea anemone. Nematostella is an emerging model organism that has already proven useful for addressing fundamental questions in developmental evolution and evolutionary genomics. StellaBase allows users to query the assembled Nematostella genome, a confirmed gene library, and a predicted genome using both keyword and homology based search functions. Data provided by these searches will elucidate gene family evolution in early animals. Unique research tools, including a Nematostella genetic stock library, a primer library, a literature repository and a gene expression library will provide support to the burgeoning Nematostella research community. The development of StellaBase accompanies significant upgrades to CnidBase, the Cnidarian Evolutionary Genomics Database. With the completion of the first sequenced cnidarian genome, genome comparison tools have been added to CnidBase. In addition, StellaBase provides a framework for the integration of additional species-specific databases into CnidBase. StellaBase is available at http://www.stellabase.org.

  12. GenColors-based comparative genome databases for small eukaryotic genomes.

    Science.gov (United States)

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  13. Engineering subtle targeted mutations into the mouse genome.

    Science.gov (United States)

    Menke, Douglas B

    2013-09-01

    Homologous recombination in embryonic stem (ES) cells offers an exquisitely precise mechanism to introduce targeted modifications to the mouse genome. This ability to produce specific alterations to the mouse genome has become an essential tool for the analysis of gene function and the development of mouse models of human disease. Of the many thousands of mouse alleles that have been generated by gene targeting, the majority are designed to completely ablate gene function, to create conditional alleles that are inactivated in the presence of Cre recombinase, or to produce reporter alleles that label specific tissues or cell populations (Eppig et al., 2012, Nucleic Acids Res 40:D881-D886). However, there is a variety of powerful motivations for the introduction of subtle targeted mutations (STMs) such as point mutations, small deletions, or small insertions into the mouse genome. The introduction of STMs allows the ablation of specific transcript isoforms, permits the functional investigation of particular domains or amino acids within a protein, provides the ability to study the role of specific sites within cis-regulatory elements, and can result in better mouse models of human genetic disorders. In this review, I examine the current strategies that are commonly used to introduce STMs into the mouse genome and highlight new gene targeting technologies, including TALENs and CRISPR/Cas, which are likely to influence the future of gene targeting in mice.

  14. KAIKObase: An integrated silkworm genome database and data mining tool

    Directory of Open Access Journals (Sweden)

    Nagaraju Javaregowda

    2009-10-01

    Background: The silkworm, Bombyx mori, is one of the most economically important insects in many developing countries owing to its large-scale cultivation for silk production. With the development of genomic and biotechnological tools, B. mori has also become an important bioreactor for production of various recombinant proteins of biomedical interest. In 2004, two genome sequencing projects for B. mori were reported independently by Chinese and Japanese teams; however, the datasets were insufficient for building long genomic scaffolds, which are essential for unambiguous annotation of the genome. Now, both datasets have been merged and assembled through a joint collaboration between the two groups. Description: Integration of the two data sets of silkworm whole-genome-shotgun sequencing by the Japanese and Chinese groups together with newly obtained fosmid- and BAC-end sequences produced the best continuity (~3.7 Mb in N50 scaffold size) among the sequenced insect genomes and provided a high degree of nucleotide coverage (88%) of all 28 chromosomes. In addition, a physical map of BAC contigs constructed by fingerprinting BAC clones and a SNP linkage map constructed using BAC-end sequences were available. In parallel, proteomic data from two-dimensional polyacrylamide gel electrophoresis in various tissues and developmental stages were compiled into a silkworm proteome database. Finally, a Bombyx trap database was constructed for documenting insertion positions and expression data of transposon insertion lines. Conclusion: For efficient usage of genome information for functional studies, genomic sequences, physical and genetic map information and EST data were compiled into KAIKObase, an integrated silkworm genome database which consists of 4 map viewers, a gene viewer, and sequence, keyword and position search systems to display results and data at the level of nucleotide sequence, gene, scaffold and chromosome. Integration of the

  15. i-Genome: A database to summarize oligonucleotide data in genomes

    Directory of Open Access Journals (Sweden)

    Chang Yu-Chung

    2004-10-01

    Background: Information on the occurrence of sequence features in genomes is crucial to comparative genomics, evolutionary analysis, the analyses of regulatory sequences and the quantitative evaluation of sequences. Computing the frequencies and the occurrences of a pattern in complete genomes is time-consuming. Results: The proposed database provides information about sequence features generated by exhaustively computing the sequences of the complete genome. The repetitive elements in the eukaryotic genomes, such as LINEs, SINEs, Alu and LTR, are obtained from Repbase. The database supports various complete genomes including human, yeast, worm, and 128 microbial genomes. Conclusions: This investigation presents and implements an efficient computational approach to accumulate the occurrences of oligonucleotides or patterns in complete genomes. A database is established to maintain the information on the sequence features, including the distributions of oligonucleotides, the gene distribution, the distribution of repetitive elements in genomes and the occurrences of the oligonucleotides. The database can provide a more effective and efficient way to access the repetitive features in genomes.
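
    The kind of computation that the i-Genome record describes, counting oligonucleotide frequencies and locating pattern occurrences, can be sketched as follows. This is an illustrative, naive in-memory version and not the authors' method; the function names `count_oligonucleotides` and `occurrences` are hypothetical, only one strand is scanned, and windows containing characters other than A, C, G or T are skipped by assumption.

```python
from collections import Counter
from typing import List

DNA_ALPHABET = frozenset("ACGT")


def count_oligonucleotides(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer on the given strand of a DNA sequence.

    Windows containing ambiguous bases (anything outside A, C, G, T)
    are skipped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if DNA_ALPHABET.issuperset(kmer):
            counts[kmer] += 1
    return counts


def occurrences(sequence: str, pattern: str) -> List[int]:
    """Return the 0-based start positions of all overlapping pattern matches."""
    seq, pat = sequence.upper(), pattern.upper()
    positions: List[int] = []
    if not pat:
        return positions
    start = seq.find(pat)
    while start != -1:
        positions.append(start)
        # Advance by one base so that overlapping matches are also found.
        start = seq.find(pat, start + 1)
    return positions


if __name__ == "__main__":
    example = "ACGTACGT"  # toy sequence, not real genome data
    print(count_oligonucleotides(example, 2))
    print(occurrences(example, "ACG"))
```

    A scan like this takes time proportional to the sequence length for each value of k, which is consistent with the abstract's point that repeating it over complete genomes is time-consuming and that precomputed results are worth storing in a database.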

  16. BBGD: an online database for blueberry genomic data

    Directory of Open Access Journals (Sweden)

    Matthews Benjamin F

    2007-01-01

    Background: Blueberry is a member of the Ericaceae family, which also includes closely related cranberry and more distantly related rhododendron, azalea, and mountain laurel. Blueberry is a major berry crop in the United States, and one that has great nutritional and economical value. Extreme low temperatures, however, reduce crop yield and cause major losses to US farmers. A better understanding of the genes and biochemical pathways that are up- or down-regulated during cold acclimation is needed to produce blueberry cultivars with enhanced cold hardiness. To that end, the blueberry genomics database (BBGD) was developed. Along with the analysis tools and web-based query interfaces, the database serves both the broader Ericaceae research community and the blueberry research community specifically by making available ESTs and gene expression data in searchable formats and in elucidating the underlying mechanisms of cold acclimation and freeze tolerance in blueberry. Description: BBGD is the world's first database for blueberry genomics. BBGD is both a sequence and gene expression database. It stores both EST and microarray data and allows scientists to correlate expression profiles with gene function. BBGD is a public online database. Presently, the main focus of the database is the identification of genes in blueberry that are significantly induced or suppressed after low temperature exposure. Conclusion: By using the database, researchers have developed EST-based markers for mapping and have identified a number of "candidate" cold tolerance genes that are highly expressed in blueberry flower buds after exposure to low temperatures.

  17. BBGD: an online database for blueberry genomic data.

    Science.gov (United States)

    Alkharouf, Nadim W; Dhanaraj, Anik L; Naik, Dhananjay; Overall, Chris; Matthews, Benjamin F; Rowland, Lisa J

    2007-01-30

    Blueberry is a member of the Ericaceae family, which also includes closely related cranberry and more distantly related rhododendron, azalea, and mountain laurel. Blueberry is a major berry crop in the United States, and one that has great nutritional and economic value. Extreme low temperatures, however, reduce crop yield and cause major losses to US farmers. A better understanding of the genes and biochemical pathways that are up- or down-regulated during cold acclimation is needed to produce blueberry cultivars with enhanced cold hardiness. To that end, the blueberry genomics database (BBGD) was developed. Along with the analysis tools and web-based query interfaces, the database serves both the broader Ericaceae research community and the blueberry research community specifically by making available ESTs and gene expression data in searchable formats and in elucidating the underlying mechanisms of cold acclimation and freeze tolerance in blueberry. BBGD is the world's first database for blueberry genomics. BBGD is both a sequence and gene expression database. It stores both EST and microarray data and allows scientists to correlate expression profiles with gene function. BBGD is a public online database. Presently, the main focus of the database is the identification of genes in blueberry that are significantly induced or suppressed after low temperature exposure. By using the database, researchers have developed EST-based markers for mapping and have identified a number of "candidate" cold tolerance genes that are highly expressed in blueberry flower buds after exposure to low temperatures.

  18. BRAD, the genetics and genomics database for Brassica plants

    Directory of Open Access Journals (Sweden)

    Li Pingxia

    2011-10-01

    Full Text Available. Background: Brassica species include both vegetable and oilseed crops, which are very important in people's daily lives. The Brassica species also represent an excellent system for studying numerous aspects of plant biology, specifically for the analysis of genome evolution following polyploidy, so they are also very important for scientific research. Now that the genome of Brassica rapa has been assembled, it is time to mine the genome data in depth. Description: BRAD, the Brassica database, is a web-based resource focusing on genome-scale genetic and genomic data for important Brassica crops. BRAD was built based on the first whole genome sequence and on further data analysis of the Brassica A genome species, Brassica rapa (Chiifu-401-42). It provides datasets such as the complete genome sequence of B. rapa, which was de novo assembled from Illumina GA II short reads and from BAC clone sequences, predicted genes and associated annotations, non-coding RNAs, transposable elements (TE), B. rapa genes orthologous to those in A. thaliana, as well as genetic markers and linkage maps. BRAD offers useful searching and data mining tools, including search across annotation datasets, search for syntenic or non-syntenic orthologs, and search of the flanking regions of a given target, as well as BLAST and GBrowse tools. BRAD allows users to enter almost any kind of information, such as a B. rapa or A. thaliana gene ID, physical position or genetic marker. Conclusion: BRAD, a new database that focuses on the genetics and genomics of the Brassica plants, has been developed. It aims to help scientists and breeders use the genome data of Brassica plants fully and efficiently. BRAD will be continuously updated and can be accessed through http://brassicadb.org.

  19. A primer on rapid prototyping of genomic databases in Prolog

    Energy Technology Data Exchange (ETDEWEB)

    Yoshida, Kaoru; Smith, C.L. [Lawrence Berkeley Lab., CA (United States)]; Overbeek, R. [Argonne National Lab., IL (United States). Mathematics and Computer Science Div.]

    1992-01-01

    This report presents a tutorial on how one might create an integrated database of genomic information. We outline the required steps for implementation, give a brief introduction to Prolog, and discuss the query facility supported by our system. Our goal is to enable researchers to begin constructing their own biological information system.

  20. Tripal: a construction toolkit for online genome databases.

    Science.gov (United States)

    Ficklin, Stephen P; Sanderson, Lacey-Anne; Cheng, Chun-Huai; Staton, Margaret E; Lee, Taein; Cho, Il-Hyung; Jung, Sook; Bett, Kirstin E; Main, Doreen

    2011-01-01

    As the availability, affordability and magnitude of genomics and genetics research increase, so does the need to provide online access to resulting data and analyses. A tailored online database is desired by many investigators and research communities; however, managing the Information Technology infrastructure needed to create such a database can be an undesired distraction from primary research or potentially cost prohibitive. Tripal provides simplified site development by merging the power of Drupal, a popular web Content Management System, with that of Chado, a community-derived database schema for storage of genomic, genetic and other related biological data. Tripal provides an interface that extends the content management features of Drupal to the data housed in Chado. Furthermore, Tripal provides a web-based Chado installer, genomic data loaders, and web-based editing of data for organisms, genomic features, biological libraries, controlled vocabularies and stock collections. Also available are Tripal extensions that support loading and visualizations of NCBI BLAST, InterPro, Kyoto Encyclopedia of Genes and Genomes and Gene Ontology analyses, as well as an extension that provides integration of Tripal with GBrowse, a popular GMOD tool. An Application Programming Interface is available to allow creation of custom extensions by site developers, and the look-and-feel of the site is completely customizable through Drupal-based PHP template files. Addition of non-biological content and user-management is afforded through Drupal. Tripal is an open source and freely available software package found at http://tripal.sourceforge.net.

  1. Human-mouse comparative genomics: successes and failures to reveal functional regions of the human genome

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.; Baroukh, Nadine; Rubin, Edward M.

    2003-05-15

    Deciphering the genetic code embedded within the human genome remains a significant challenge despite the human genome consortium's recent success at defining its linear sequence (Lander et al. 2001; Venter et al. 2001). While useful strategies exist to identify a large percentage of protein encoding regions, efforts to accurately define functional sequences in the remaining approximately 97 percent of the genome lag. Our primary interest has been to utilize the evolutionary relationship and the universal nature of genomic sequence information in vertebrates to reveal functional elements in the human genome. This has been achieved through the combined use of vertebrate comparative genomics to pinpoint highly conserved sequences as candidates for biological activity and transgenic mouse studies to address the functionality of defined human DNA fragments. Accordingly, we describe strategies and insights into functional sequences in the human genome through the use of comparative genomics coupled with functional studies in the mouse.

  2. DemaDb: an integrated dematiaceous fungal genomes database.

    Science.gov (United States)

    Kuan, Chee Sian; Yew, Su Mei; Chan, Chai Ling; Toh, Yue Fen; Lee, Kok Wei; Cheong, Wei-Hien; Yee, Wai-Yan; Hoh, Chee-Choong; Yap, Soon-Joo; Ng, Kee Peng

    2016-01-01

    Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in humans, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database, was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that were identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. The DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser, based on the KEGG pathway database, allows users to look into molecular interaction and reaction networks for all KEGG annotated genes. The availability of downloadable files containing assembly, nucleic acid, as well as protein data allows direct retrieval for further downstream work. DemaDb is a useful resource for the fungal research community, especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my.

  3. A comparative encyclopedia of DNA elements in the mouse genome.

    Science.gov (United States)

    Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D; Shen, Yin; Pervouchine, Dmitri D; Djebali, Sarah; Thurman, Robert E; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K; Williams, Brian A; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M A; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D; Bansal, Mukul S; Kellis, Manolis; Keller, Cheryl A; Morrissey, Christapher S; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S; Cayting, Philip; Kawli, Trupti; Boyle, Alan P; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S; Cline, Melissa S; Erickson, Drew T; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A; Rosenbloom, Kate R; Lacerda de Sousa, Beatriz; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W James; Ramalho Santos, Miguel; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J; Wilken, Matthew S; Reh, Thomas A; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P; Neph, Shane; Humbert, Richard; Hansen, R Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E; Orkin, Stuart H; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J; Blobel, Gerd A; Cao, Xiaoyi; Zhong, Sheng; Wang, Ting; Good, Peter J; Lowdon, Rebecca F; Adams, Leslie B; Zhou, Xiao-Qiao; Pazin, Michael J; Feingold, Elise A; Wold, Barbara; Taylor, James; Mortazavi, Ali; Weissman, Sherman M; Stamatoyannopoulos, John A; Snyder, Michael P; Guigo, Roderic; Gingeras, Thomas R; Gilbert, David M; Hardison, Ross C; Beer, Michael A; Ren, Bing

    2014-11-20

    The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

  4. A Comparative Encyclopedia of DNA Elements in the Mouse Genome

    Science.gov (United States)

    Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D.; Shen, Yin; Pervouchine, Dmitri D.; Djebali, Sarah; Thurman, Bob; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K.; Williams, Brian A.; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M. A.; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T.; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D.; Bansal, Mukul S.; Keller, Cheryl A.; Morrissey, Christapher S.; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S.; Cayting, Philip; Kawli, Trupti; Boyle, Alan P.; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S.; Cline, Melissa S.; Erickson, Drew T.; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A.; Rosenbloom, Kate R.; de Sousa, Beatriz Lacerda; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W. James; Santos, Miguel Ramalho; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J.; Wilken, Matthew S.; Reh, Thomas A.; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P.; Neph, Shane; Humbert, Richard; Hansen, R. Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E.; Orkin, Stuart H.; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J.; Blobel, Gerd A.; Good, Peter J.; Lowdon, Rebecca F.; Adams, Leslie B.; Zhou, Xiao-Qiao; Pazin, Michael J.; Feingold, Elise A.; Wold, Barbara; Taylor, James; Kellis, Manolis; Mortazavi, Ali; Weissman, Sherman M.; Stamatoyannopoulos, John; Snyder, Michael P.; Guigo, Roderic; Gingeras, Thomas R.; Gilbert, David M.; Hardison, Ross C.; Beer, Michael A.; Ren, Bing

    2014-01-01

    Summary As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases. PMID:25409824

  5. EuPathDB: the eukaryotic pathogen genomics database resource

    Science.gov (United States)

    Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y.; Brestelli, John; Brunk, Brian P.; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S.; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C.; Lawrence, Cris; Li, Wei; Pinney, Deborah F.; Pulman, Jane A.; Roos, David S.; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J.; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie

    2017-01-01

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host–pathogen interactions. PMID:27903906

  6. EuPathDB: the eukaryotic pathogen genomics database resource.

    Science.gov (United States)

    Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y; Brestelli, John; Brunk, Brian P; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C; Lawrence, Cris; Li, Wei; Pinney, Deborah F; Pulman, Jane A; Roos, David S; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie

    2017-01-04

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host-pathogen interactions.

  7. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

    Science.gov (United States)

    Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

    2013-02-01

    The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up-to-date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a leading Japanese japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics.

  8. The Genome Database for Rosaceae (GDR): year 10 update.

    Science.gov (United States)

    Jung, Sook; Ficklin, Stephen P; Lee, Taein; Cheng, Chun-Huai; Blenda, Anna; Zheng, Ping; Yu, Jing; Bombarely, Aureliano; Cho, Ilhyung; Ru, Sushan; Evans, Kate; Peace, Cameron; Abbott, Albert G; Mueller, Lukas A; Olmstead, Mercy A; Main, Dorrie

    2014-01-01

    The Genome Database for Rosaceae (GDR, http://www.rosaceae.org), the long-standing central repository and data mining resource for Rosaceae research, has been enhanced with new genomic, genetic and breeding data, and improved functionality. Whole genome sequences of apple, peach and strawberry are available to browse or download with a range of annotations, including gene model predictions, aligned transcripts, repetitive elements, polymorphisms, mapped genetic markers, mapped NCBI Rosaceae genes, gene homologs and association of InterPro protein domains, GO terms and Kyoto Encyclopedia of Genes and Genomes pathway terms. Annotated sequences can be queried using search interfaces and visualized using GBrowse. New expressed sequence tag unigene sets are available for major genera, and Pathway data are available through FragariaCyc, AppleCyc and PeachCyc databases. Synteny among the three sequenced genomes can be viewed using GBrowse_Syn. New markers, genetic maps and extensively curated qualitative/Mendelian and quantitative trait loci are available. Phenotype and genotype data from breeding projects and genetic diversity projects are also included. Improved search pages are available for marker, trait locus, genetic diversity and publication data. New search tools for breeders enable selection comparison and assistance with breeding decision making.

  9. ICDS database: interrupted CoDing sequences in prokaryotic genomes.

    Science.gov (United States)

    Perrodou, Emmanuel; Deshayes, Caroline; Muller, Jean; Schaeffer, Christine; Van Dorsselaer, Alain; Ripp, Raymond; Poch, Olivier; Reyrat, Jean-Marc; Lecompte, Odile

    2006-01-01

    Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequences (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing genome sequencing projects. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination.
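
    To make the two reported figures concrete, the sketch below computes sensitivity and specificity from a confusion table. It is a minimal Python illustration under stated assumptions: the abstract does not say which definition of specificity was used (gene-prediction studies sometimes report TP / (TP + FP) under that name), so the standard TN / (TN + FP) form is assumed here, and the counts are hypothetical values chosen only to reproduce 82% and 95%.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real interrupted sequences that the predictor detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of intact sequences correctly left unflagged (assumed definition)."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    tp, fn = 82, 18
    tn, fp = 95, 5
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")
    print(f"specificity = {specificity(tn, fp):.2f}")
```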

  10. The Genome Database for Rosaceae (GDR): year 10 update

    Science.gov (United States)

    Jung, Sook; Ficklin, Stephen P.; Lee, Taein; Cheng, Chun-Huai; Blenda, Anna; Zheng, Ping; Yu, Jing; Bombarely, Aureliano; Cho, Ilhyung; Ru, Sushan; Evans, Kate; Peace, Cameron; Abbott, Albert G.; Mueller, Lukas A.; Olmstead, Mercy A.; Main, Dorrie

    2014-01-01

    The Genome Database for Rosaceae (GDR, http://www.rosaceae.org), the long-standing central repository and data mining resource for Rosaceae research, has been enhanced with new genomic, genetic and breeding data, and improved functionality. Whole genome sequences of apple, peach and strawberry are available to browse or download with a range of annotations, including gene model predictions, aligned transcripts, repetitive elements, polymorphisms, mapped genetic markers, mapped NCBI Rosaceae genes, gene homologs and association of InterPro protein domains, GO terms and Kyoto Encyclopedia of Genes and Genomes pathway terms. Annotated sequences can be queried using search interfaces and visualized using GBrowse. New expressed sequence tag unigene sets are available for major genera, and Pathway data are available through FragariaCyc, AppleCyc and PeachCyc databases. Synteny among the three sequenced genomes can be viewed using GBrowse_Syn. New markers, genetic maps and extensively curated qualitative/Mendelian and quantitative trait loci are available. Phenotype and genotype data from breeding projects and genetic diversity projects are also included. Improved search pages are available for marker, trait locus, genetic diversity and publication data. New search tools for breeders enable selection comparison and assistance with breeding decision making. PMID:24225320

  11. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  12. Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

    Science.gov (United States)

    Wang, Linhai; Yu, Jingyin; Li, Donghua; Zhang, Xiurong

    2015-01-01

    Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus) and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their importance, genetic and genomic analyses on these species have been insufficient due to a lack of reference genome information. The now available S. indicum genome will provide an unprecedented opportunity for studying both S. indicum genetic traits and comparative genomics. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a web-based database with comprehensive sesame genomic, genetic and comparative genomic information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167) and non-coding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/.

  13. Endonucleases: new tools to edit the mouse genome.

    Science.gov (United States)

    Wijshake, Tobias; Baker, Darren J; van de Sluis, Bart

    2014-10-01

    Mouse transgenesis has been instrumental in determining the function of genes in the pathophysiology of human diseases and modification of genes by homologous recombination in mouse embryonic stem cells remains a widely used technology. However, this approach harbors a number of disadvantages, as it is time-consuming and quite laborious. Over the last decade a number of new genome editing technologies have been developed, including zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and clustered regularly interspaced short palindromic repeats/CRISPR-associated (CRISPR/Cas). These systems are characterized by a designed DNA binding protein or RNA sequence fused or co-expressed with a non-specific endonuclease, respectively. The engineered DNA binding protein or RNA sequence guides the nuclease to a specific target sequence in the genome to induce a double strand break. The subsequent activation of the DNA repair machinery then enables the introduction of gene modifications at the target site, such as gene disruption, correction or insertion. Nuclease-mediated genome editing has numerous advantages over conventional gene targeting, including increased efficiency in gene editing, reduced generation time of mutant mice, and the ability to mutagenize multiple genes simultaneously. Although nuclease-driven modifications in the genome are a powerful tool to generate mutant mice, there are concerns about off-target cleavage, especially when using the CRISPR/Cas system. Here, we describe the basic principles of these new strategies in mouse genome manipulation, their inherent advantages, and their potential disadvantages compared to current technologies used to study gene function in mouse models. This article is part of a Special Issue entitled: From Genome to Function.

  14. Addition of a breeding database in the Genome Database for Rosaceae.

    Science.gov (United States)

    Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

    2013-01-01

    Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. New infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage list individuals with parents in common, and results from Individual Variety pages link to all data available on each chosen individual, including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner, to retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will

  15. Mouse genome engineering using designer nucleases

    OpenAIRE

    Hermann, Mario; Cermak, Tomas; Daniel F Voytas; Pelczar, Pawel

    2014-01-01

    Transgenic mice carrying site-specific genome modifications (knockout, knock-in) are of vital importance for dissecting complex biological systems as well as for modeling human diseases and testing therapeutic strategies. Recent advances in the use of designer nucleases such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) 9 system for site-specific geno...

  16. Human and mouse genome analysis using array comparative genomic hybridization

    NARCIS (Netherlands)

    Snijders, Antoine Maria

    2004-01-01

    Almost all human cancers as well as developmental abnormalities are characterized by the presence of genetic alterations, most of which target a gene or a particular genomic locus resulting in altered gene expression and ultimately an altered phenotype. Different types of genetic alterations include

  17. GDR (Genome Database for Rosaceae): integrated web resources for Rosaceae genomics and genetics research

    Directory of Open Access Journals (Sweden)

    Ficklin Stephen

    2004-09-01

    Full Text Available. Background: Peach is being developed as a model organism for Rosaceae, an economically important family that includes fruits and ornamental plants such as apple, pear, strawberry, cherry, almond and rose. The genomics and genetics data of peach can play a significant role in gene discovery and the genetic understanding of related species. The effective utilization of these peach resources, however, requires the development of an integrated and centralized database with associated analysis tools. Description: The Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database. GDR contains comprehensive data of the genetically anchored peach physical map, an annotated peach EST database, Rosaceae maps and markers and all publicly available Rosaceae sequences. Annotations of ESTs include contig assembly, putative function, simple sequence repeats, and anchored position to the peach physical map where applicable. Our integrated map viewer provides a graphical interface to the genetic, transcriptome and physical mapping information. ESTs, BACs and markers can be queried by various categories and the search result sites are linked to the integrated map viewer or to the WebFPC physical map sites. In addition to browsing and querying the database, users can compare their sequences with the annotated GDR sequences via a dedicated sequence similarity server running either the BLAST or FASTA algorithm. To demonstrate the utility of the integrated and fully annotated database and analysis tools, we describe a case study where we anchored Rosaceae sequences to the peach physical and genetic map by sequence similarity. Conclusions: The GDR has been initiated to meet the major deficiency in Rosaceae genomics and genetics research, namely a centralized web database and bioinformatics tools for data storage, analysis and exchange. GDR can be accessed at http://www.genome.clemson.edu/gdr/.

  18. CAGE: A Database of Cancer Genes of Human, Mouse and Rat

    Directory of Open Access Journals (Sweden)

    Sana Khalid

    2011-11-01

    Full Text Available. CAGE is a database of cancer genes of human, mouse and rat. We have designed PCR oligonucleotide primer sequences for each gene, with their features and conditions given. This feature alone greatly helps researchers with PCR amplification of gene sequences, especially in cloning experiments. Currently it encompasses more than 1000 nucleotide entries. Flexible database design, easy expandability, and easy retrieval of information are the main features of this database. The database is publicly available at cgdb.pakbiz.org.

  19. The Rice Genome Knowledgebase (RGKbase): an annotation database for rice comparative genomics and evolutionary biology.

    Science.gov (United States)

    Wang, Dapeng; Xia, Yan; Li, Xinna; Hou, Lixia; Yu, Jun

    2013-01-01

    Over the past 10 years, genomes of cultivated rice cultivars and their wild counterparts have been sequenced, although most efforts have focused on genome assembly and annotation of two major cultivated rice (Oryza sativa L.) subspecies, 93-11 (indica) and Nipponbare (japonica). To integrate information from genome assemblies and annotations for better analysis and application, we now introduce a comparative rice genome database, the Rice Genome Knowledgebase (RGKbase, http://rgkbase.big.ac.cn/RGKbase/). RGKbase is built to have three major components: (i) integrated data curation for rice genomics and molecular biology, which includes genome sequence assemblies, transcriptomic and epigenomic data, genetic variations, quantitative trait loci (QTLs) and the relevant literature; (ii) user-friendly viewers, such as Gbrowse, GeneBrowse and Circos, for genome annotations and evolutionary dynamics; and (iii) bioinformatic tools for compositional and synteny analyses, gene family classifications, gene ontology terms and pathways and gene co-expression networks. RGKbase currently includes data from five rice cultivars and species: Nipponbare (japonica), 93-11 (indica), PA64s (indica), the African rice (Oryza glaberrima) and a wild rice species (Oryza brachyantha). We are also constantly introducing new datasets from a variety of public efforts, such as two recent releases: sequence data from ∼1000 rice varieties, which are mapped onto the reference genome, yielding ample high-quality single-nucleotide polymorphisms and insertions-deletions.

  20. Exploring Protein Function Using the Saccharomyces Genome Database.

    Science.gov (United States)

    Wong, Edith D

    2017-01-01

    Elucidating the function of individual proteins will help to create a comprehensive picture of cell biology, as well as shed light on human disease mechanisms, possible treatments, and cures. Due to its compact genome, and extensive history of experimentation and annotation, the budding yeast Saccharomyces cerevisiae is an ideal model organism in which to determine protein function. This information can then be leveraged to infer functions of human homologs. Despite the large amount of research and biological data about S. cerevisiae, many proteins' functions remain unknown. Here, we explore ways to use the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org ) to predict the function of proteins and gain insight into their roles in various cellular processes.

  1. The integrated web service and genome database for agricultural plants with biotechnology information

    Science.gov (United States)

    Kim, ChangKug; Park, DongSuk; Seol, YoungJoo; Hahn, JangHo

    2011-01-01

    The National Agricultural Biotechnology Information Center (NABIC) constructed an agricultural biology-based infrastructure and developed a Web based relational database for agricultural plants with biotechnology information. The NABIC has concentrated on functional genomics of major agricultural plants, building an integrated biotechnology database for agro-biotech information that focuses on genomics of major agricultural resources. This genome database provides annotated genome information from 1,039,823 records mapped to rice, Arabidopsis, and Chinese cabbage. PMID:21887015

  2. MaizeGDB: The Maize Genetics and Genomics Database.

    Science.gov (United States)

    Harper, Lisa; Gardiner, Jack; Andorf, Carson; Lawrence, Carolyn J

    2016-01-01

    MaizeGDB is the community database for biological information about the crop plant Zea mays. Genomic, genetic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes stored at MaizeGDB. At the project's website ( http://www.maizegdb.org ) are custom interfaces enabling researchers to browse data and to seek out specific information matching explicit search criteria. In addition, pre-compiled reports are made available for particular types of data and bulletin boards are provided to facilitate communication and coordination among members of the community of maize geneticists.

  3. TOPSAN: a dynamic web database for structural genomics.

    Science.gov (United States)

    Ellrott, Kyle; Zmasek, Christian M; Weekes, Dana; Sri Krishna, S; Bakolitsa, Constantina; Godzik, Adam; Wooley, John

    2011-01-01

    The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.

  4. The catfish genome database cBARBEL: an informatic platform for genome biology of ictalurid catfish.

    Science.gov (United States)

    Lu, Jianguo; Peatman, Eric; Yang, Qing; Wang, Shaolin; Hu, Zhiliang; Reecy, James; Kucuktas, Huseyin; Liu, Zhanjiang

    2011-01-01

    The catfish genome database, cBARBEL (abbreviated from catfish Breeder And Researcher Bioinformatics Entry Location) is an online open-access database for genome biology of ictalurid catfish (Ictalurus spp.). It serves as a comprehensive, integrative platform for all aspects of catfish genetics, genomics and related data resources. cBARBEL provides BLAST-based, fuzzy and specific search functions, visualization of catfish linkage, physical and integrated maps, a catfish EST contig viewer with SNP information overlay, and GBrowse-based organization of catfish genomic data based on sequence similarity with zebrafish chromosomes. Subsections of the database are tightly related, allowing a user with a sequence or search string of interest to navigate seamlessly from one area to another. As catfish genome sequencing proceeds and ongoing quantitative trait loci (QTL) projects bear fruit, cBARBEL will allow rapid data integration and dissemination within the catfish research community and to interested stakeholders. cBARBEL can be accessed at http://catfishgenome.org.

  5. Biological Database of Images and Genomes: tools for community annotations linking image and genomic information

    Science.gov (United States)

    Oberlin, Andrew T; Jurkovic, Dominika A; Balish, Mitchell F; Friedberg, Iddo

    2013-01-01

    Genomic data and biomedical imaging data are undergoing exponential growth. However, our understanding of the phenotype–genotype connection linking the two types of data is lagging behind. While there are many types of software that enable the manipulation and analysis of image data and genomic data as separate entities, there is no framework established for linking the two. We present a generic set of software tools, BioDIG, that allows linking of image data to genomic data. BioDIG tools can be applied to a wide range of research problems that require linking images to genomes. BioDIG features the following: rapid construction of web-based workbenches, community-based annotation, user management and web services. By using BioDIG to create websites, researchers and curators can rapidly annotate a large number of images with genomic information. Here we present the BioDIG software tools that include an image module, a genome module and a user management module. We also introduce a BioDIG-based website, MyDIG, which is being used to annotate images of mycoplasmas. Database URL: BioDIG website: http://biodig.org BioDIG source code repository: http://github.com/FriedbergLab/BioDIG The MyDIG database: http://mydig.biodig.org/ PMID:23550062

  6. Genomic responses in mouse models poorly mimic human inflammatory diseases.

    Science.gov (United States)

    Seok, Junhee; Warren, H Shaw; Cuenca, Alex G; Mindrinos, Michael N; Baker, Henry V; Xu, Weihong; Richards, Daniel R; McDonald-Smith, Grace P; Gao, Hong; Hennessy, Laura; Finnerty, Celeste C; López, Cecilia M; Honari, Shari; Moore, Ernest E; Minei, Joseph P; Cuschieri, Joseph; Bankey, Paul E; Johnson, Jeffrey L; Sperry, Jason; Nathens, Avery B; Billiar, Timothy R; West, Michael A; Jeschke, Marc G; Klein, Matthew B; Gamelli, Richard L; Gibran, Nicole S; Brownstein, Bernard H; Miller-Graziano, Carol; Calvano, Steve E; Mason, Philip H; Cobb, J Perren; Rahme, Laurence G; Lowry, Stephen F; Maier, Ronald V; Moldawer, Lyle L; Herndon, David N; Davis, Ronald W; Xiao, Wenzhong; Tompkins, Ronald G

    2013-02-26

    A cornerstone of modern biomedical research is the use of mouse models to explore basic pathophysiological mechanisms, evaluate new therapeutic approaches, and make go or no-go decisions to carry new drug candidates forward into clinical trials. Systematic studies evaluating how well murine models mimic human inflammatory diseases are nonexistent. Here, we show that, although acute inflammatory stresses from different etiologies result in highly similar genomic responses in humans, the responses in corresponding mouse models correlate poorly with the human conditions and also with one another. Among genes changed significantly in humans, the murine orthologs are close to random in matching their human counterparts (e.g., R² between 0.0 and 0.1). In addition to improvements in the current animal model systems, our study supports higher priority for translational medical research to focus on the more complex human conditions rather than relying on mouse models to study human inflammatory diseases.

  7. Genomic responses in mouse models poorly mimic human inflammatory diseases

    Science.gov (United States)

    Seok, Junhee; Warren, H. Shaw; Cuenca, Alex G.; Mindrinos, Michael N.; Baker, Henry V.; Xu, Weihong; Richards, Daniel R.; McDonald-Smith, Grace P.; Gao, Hong; Hennessy, Laura; Finnerty, Celeste C.; López, Cecilia M.; Honari, Shari; Moore, Ernest E.; Minei, Joseph P.; Cuschieri, Joseph; Bankey, Paul E.; Johnson, Jeffrey L.; Sperry, Jason; Nathens, Avery B.; Billiar, Timothy R.; West, Michael A.; Jeschke, Marc G.; Klein, Matthew B.; Gamelli, Richard L.; Gibran, Nicole S.; Brownstein, Bernard H.; Miller-Graziano, Carol; Calvano, Steve E.; Mason, Philip H.; Cobb, J. Perren; Rahme, Laurence G.; Lowry, Stephen F.; Maier, Ronald V.; Moldawer, Lyle L.; Herndon, David N.; Davis, Ronald W.; Xiao, Wenzhong; Tompkins, Ronald G.; Abouhamze, Amer; Balis, Ulysses G. J.; Camp, David G.; De, Asit K.; Harbrecht, Brian G.; Hayden, Douglas L.; Kaushal, Amit; O’Keefe, Grant E.; Kotz, Kenneth T.; Qian, Weijun; Schoenfeld, David A.; Shapiro, Michael B.; Silver, Geoffrey M.; Smith, Richard D.; Storey, John D.; Tibshirani, Robert; Toner, Mehmet; Wilhelmy, Julie; Wispelwey, Bram; Wong, Wing H

    2013-01-01

    A cornerstone of modern biomedical research is the use of mouse models to explore basic pathophysiological mechanisms, evaluate new therapeutic approaches, and make go or no-go decisions to carry new drug candidates forward into clinical trials. Systematic studies evaluating how well murine models mimic human inflammatory diseases are nonexistent. Here, we show that, although acute inflammatory stresses from different etiologies result in highly similar genomic responses in humans, the responses in corresponding mouse models correlate poorly with the human conditions and also with one another. Among genes changed significantly in humans, the murine orthologs are close to random in matching their human counterparts (e.g., R² between 0.0 and 0.1). In addition to improvements in the current animal model systems, our study supports higher priority for translational medical research to focus on the more complex human conditions rather than relying on mouse models to study human inflammatory diseases. PMID:23401516

  8. A new database (GCD) on genome composition for eukaryote and prokaryote genome sequences and their initial analyses.

    Science.gov (United States)

    Kryukov, Kirill; Sumiyama, Kenta; Ikeo, Kazuho; Gojobori, Takashi; Saitou, Naruya

    2012-01-01

    Eukaryote genomes contain many noncoding regions, and they are quite complex. To understand these complexities, we constructed a database, the Genome Composition Database (GCD), of whole-genome composition statistics for 101 eukaryote genomes, as well as more than 1,000 prokaryote genomes. Frequencies of all possible oligonucleotides of length one to ten were counted for each genome, and these observed values were compared with expected values computed from the observed frequencies of oligonucleotides of length 1-4. Deviations from expected values were much larger for eukaryotes than prokaryotes, except for fungal genomes. Mammalian genomes showed the largest deviation among animals. The results of comparison are available online at http://esper.lab.nig.ac.jp/genome-composition-database/.
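
    The observed-versus-expected comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the GCD authors' implementation: it assumes the simplest background model, in which expected values come from single-base (length-1) frequencies only, whereas the abstract uses oligonucleotide frequencies of length 1-4. The function names `kmer_counts` and `observed_expected_ratio` are hypothetical.

```python
from collections import Counter
from math import prod


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping oligonucleotide of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def observed_expected_ratio(seq: str, k: int) -> dict[str, float]:
    """Observed k-mer frequency divided by the frequency expected if
    bases occurred independently (length-1 background; an assumption)."""
    seq = seq.upper()
    # Single-base frequencies estimated from the sequence itself.
    base_freq = {base: n / len(seq) for base, n in Counter(seq).items()}
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    ratios = {}
    for kmer, n in counts.items():
        expected = prod(base_freq[base] for base in kmer)
        ratios[kmer] = (n / total) / expected
    return ratios


if __name__ == "__main__":
    # Toy sequence for illustration only, not real genome data.
    for kmer, ratio in sorted(observed_expected_ratio("ACGTACGT", 2).items()):
        print(kmer, round(ratio, 3))
```

    A ratio near 1 means the oligonucleotide occurs about as often as the background model predicts; larger deviations from 1 indicate over- or under-representation under that model.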

  9. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome

    Directory of Open Access Journals (Sweden)

    Childs Kevin L

    2010-11-01

    Full Text Available. Background: A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion: BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions: We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence.

  10. Nuclear-like Seq in mt Genome - RMG | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available. RMG: Nuclear-like Seq in mt Genome. Data detail: data name, Nuclear-like Seq in mt Genome; description of data co...

  11. Accessing the SEED genome databases via Web services API: tools for programmers

    National Research Council Canada - National Science Library

    Disz, Terry; Akhter, Sajia; Cuevas, Daniel; Olson, Robert; Overbeek, Ross; Vonstein, Veronika; Stevens, Rick; Edwards, Robert A

    2010-01-01

    .... The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes...

  12. An MRI-based atlas and database of the developing mouse brain.

    Science.gov (United States)

    Chuang, Nelson; Mori, Susumu; Yamamoto, Akira; Jiang, Hangyi; Ye, Xin; Xu, Xin; Richards, Linda J; Nathans, Jeremy; Miller, Michael I; Toga, Arthur W; Sidman, Richard L; Zhang, Jiangyang

    2011-01-01

    The advent of mammalian gene engineering and genetically modified mouse models has led to renewed interest in developing resources for referencing and quantitative analysis of mouse brain anatomy. In this study, we used diffusion tensor imaging (DTI) for quantitative characterization of anatomical phenotypes in the developing mouse brain. As an anatomical reference for neuroscience research using mouse models, this paper presents DTI based atlases of ex vivo C57BL/6 mouse brains at several developmental stages. The atlas complements existing histology and MRI-based atlases by providing users access to three-dimensional, high-resolution images of the developing mouse brain, with distinct tissue contrasts and segmentations of major gray matter and white matter structures. The usefulness of the atlas and database was demonstrated by quantitative measurements of the development of major gray matter and white matter structures. Population average images of the mouse brain at several postnatal stages were created using large deformation diffeomorphic metric mapping and their anatomical variations were quantitatively characterized. The atlas and database enhance our ability to examine the neuroanatomy in normal or genetically engineered mouse strains and mouse models of neurological diseases.

  13. An MRI-based Atlas and Database of the Developing Mouse Brain

    Science.gov (United States)

    Chuang, Nelson; Mori, Susumu; Yamamoto, Akira; Jiang, Hangyi; Ye, Xin; Xu, Xin; Richards, Linda J.; Nathans, Jeremy; Miller, Michael I.; Toga, Arthur W.; Sidman, Richard L.; Zhang, Jiangyang

    2010-01-01

    The advent of mammalian gene engineering and genetically modified mouse models has led to renewed interest in developing resources for referencing and quantitative analysis of mouse brain anatomy. In this study, we used diffusion tensor imaging (DTI) for quantitative characterization of anatomical phenotypes in the developing mouse brain. As an anatomical reference for neuroscience research using mouse models, this paper presents DTI based atlases of ex vivo C57BL/6 mouse brains at several developmental stages. The atlas complements existing histology and MRI-based atlases by providing users access to three-dimensional, high-resolution images of the developing mouse brain, with distinct tissue contrasts and segmentations of major gray matter and white matter structures. The usefulness of the atlas and database was demonstrated by quantitative measurements of the development of major gray matter and white matter structures. Population average images of the mouse brain at several postnatal stages were created using large deformation diffeomorphic metric mapping and their anatomical variations were quantitatively characterized. The atlas and database enhance our ability to examine the neuroanatomy in normal or genetically engineered mouse strains and mouse models of neurological diseases. PMID:20656042

  14. Genomics and Public Health Research: Can the State Allow Access to Genomic Databases?

    Directory of Open Access Journals (Sweden)

    M Stanton Jean

    2012-04-01

    Full Text Available. Because many diseases are multifactorial disorders, the scientific progress in genomics and genetics should be taken into consideration in public health research. In this context, genomic databases will constitute an important source of information. Consequently, it is important to identify and characterize the State's role and authority on matters related to public health, in order to verify whether it has access to such databases while engaging in public health genomic research. We first consider the evolution of the concept of public health, as well as its core functions, using a comparative approach (e.g. WHO, PAHO, CDC and the Canadian province of Quebec). Following an analysis of relevant Quebec legislation, the precautionary principle is examined as a possible avenue to justify State access to and use of genomic databases for research purposes. Finally, we consider the influenza pandemic plans developed by WHO, Canada, and Quebec, as examples of key tools framing the public health decision-making process. We observed that State powers in public health are not, in Quebec, well adapted to the expansion of genomics research. We propose that the scope of the concept of research in public health should be clear and include the following characteristics: a commitment to the health and well-being of the population and to their determinants; the inclusion of both applied research and basic research; and an appropriate model of governance (authorization, follow-up, consent, etc.). We also suggest that the strategic approach version of the precautionary principle could guide collective choices in these matters.

  15. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der [California Univ., San Francisco, CA (United States); Lawrence Berkeley Lab., CA (United States)]

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object-oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need for a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  16. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der (California Univ., San Francisco, CA (United States); Lawrence Berkeley Lab., CA (United States))

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object-oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need for a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  17. Analysis of segmental duplications, mouse genome synteny and recurrent cancer-associated amplicons in human chromosome 6p21-p12.

    Science.gov (United States)

    Martin, J W; Yoshimoto, M; Ludkovski, O; Thorner, P S; Zielenska, M; Squire, J A; Nuin, P A S

    2010-06-01

    It has been proposed that regions of microhomology in the human genome could facilitate genomic rearrangements, copy number transitions, and rapid genomic change during tumor progression. To investigate this idea, this study examines the role of repetitive sequence elements, and corresponding syntenic mouse genomic features, in targeting cancer-associated genomic instability of specific regions of the human genome. Automated database-mining algorithms designed to search for frequent copy number transitions and genomic breakpoints were applied to 2 publicly-available online databases and revealed that 6p21-p12 is one of the regions of the human genome most frequently involved in tumor-specific alterations. In these analyses, 6p21-p12 exhibited the highest frequency of genomic amplification in osteosarcomas. Analysis of repetitive elements in regions of homology between human chromosome 6p and the syntenic regions of the mouse genome revealed a strong association between the location of segmental duplications greater than 5 kilobase-pairs and the position of discontinuities at the end of the syntenic region. The presence of clusters of segmental duplications flanking these syntenic regions also correlated with a high frequency of amplification and genomic alteration. Collectively, the experimental findings, in silico analyses, and comparative genomic studies presented here suggest that segmental duplications may facilitate cancer-associated copy number transitions and rearrangements at chromosome 6p21-p12. This process may involve homology-dependent DNA recombination and/or repair, which may also contribute towards the overall plasticity of the human genome.

  18. DG-CST (Disease Gene Conserved Sequence Tags), a database of human–mouse conserved elements associated to disease genes

    Science.gov (United States)

    Boccia, Angelo; Petrillo, Mauro; di Bernardo, Diego; Guffanti, Alessandro; Mignone, Flavio; Confalonieri, Stefano; Luzi, Lucilla; Pesole, Graziano; Paolella, Giovanni; Ballabio, Andrea; Banfi, Sandro

    2005-01-01

    The identification and study of evolutionarily conserved genomic sequences that surround disease-related genes is a valuable tool to gain insight into the functional role of these genes and to better elucidate the pathogenetic mechanisms of disease. We created the DG-CST (Disease Gene Conserved Sequence Tags) database for the identification and detailed annotation of human–mouse conserved genomic sequences that are localized within or in the vicinity of human disease-related genes. CSTs are defined as sequences that show at least 70% identity between human and mouse over a length of at least 100 bp. The database contains CST data relative to over 1088 genes responsible for monogenetic human genetic diseases or involved in the susceptibility to multifactorial/polygenic diseases. DG-CST is accessible via the internet at http://dgcst.ceinge.unina.it/ and may be searched using both simple and complex queries. A graphic browser allows direct visualization of the CSTs and related annotations within the context of the relative gene and its transcripts. PMID:15608249
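
    The CST definition above (at least 70% identity over at least 100 bp) can be illustrated with a minimal sketch. It assumes the two sequences are already aligned to equal length with "-" for gaps, and that identity is counted over all alignment columns; the abstract does not say how DG-CST measures identity or length, so both choices, and the function names, are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"  # a gap column never counts as a match
    )
    return 100.0 * matches / len(aligned_a)


def is_cst(aligned_human: str, aligned_mouse: str,
           min_identity: float = 70.0, min_length: int = 100) -> bool:
    """Apply the thresholds quoted in the abstract to one aligned pair (assumed procedure)."""
    return (len(aligned_human) >= min_length
            and percent_identity(aligned_human, aligned_mouse) >= min_identity)
```

    A pair of 100 aligned columns with 30 mismatches passes this check, while 31 mismatches or a 99-column alignment does not.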

  19. Comparative Study of Apoptosis-related Gene Loci in Human, Mouse and Rat Genomes

    Institute of Scientific and Technical Information of China (English)

    Yan-Bin YIN; Yong ZHANG; Peng YU; Jing-Chu LUO; Ying JIANG; Song-Gang LI

    2005-01-01

    Many genes are involved in the mammalian cell apoptosis pathway. These apoptosis genes often contain characteristic functional domains, and can be classified into at least 15 functional groups, according to previous reports. Using an integrated bioinformatics platform for motif or domain search from three public mammalian proteomes (International Protein Index database for human, mouse, and rat), we systematically cataloged all of the proteins involved in the mammalian apoptosis pathway. By localizing those proteins onto the genomes, we obtained a gene locus centric apoptosis gene catalog for human, mouse and rat. Further phylogenetic analysis showed that most of the apoptosis-related gene loci are conserved among these three mammals. Interestingly, about one-third of apoptosis gene loci form gene clusters on mammalian chromosomes and exist in all three species, which indicates that mammalian apoptosis gene orders are also conserved. In addition, some tandem duplicated gene loci were revealed by comparing gene loci clusters in the three species. All data produced in this work were stored in a relational database and may be viewed at http://pcas.cbi.pku.edu.cn/database/apd.php.

  20. Database Description - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available d and funding Name: Database Integration Coordination Program (FY2011-FY2013) Integration of plant databases...ency (JST) Reference(s) Article title: Plant Genome DataBase Japan (PGDBj): A Portal Website for the Integ...ration of Plant Genome-Related Databases Author name(s): Erika Asamizu, Hisako Ichi

  1. License - TMBETA-GENOME | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available TMBETA-GENOME License License to Use This Database Last updated : 2015/03/09 You may use this database in co...ms regarding the use of this database and the requirements you must follow in using this database.... The license for this database is specified in the Creative Commons Attribution-Share Alike... 2.1 Japan . If you use data from this database, please be sure attribute this database as follows: TMBETA-G...ummary of the Creative Commons Attribution-Share Alike 2.1 Japan is found here . With regard to this database

  2. Genome analysis methods - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available ...ear Year of genome analysis Sequencing method Sequencing method Read counts Read counts Covered genome region Covered...otation method Number of predicted genes Number of predicted genes Genome database Genome database informati...

  3. Where in the genome are we? A cautionary tale of database use in genomics research.

    Directory of Open Access Journals (Sweden)

    Laura Kelly Vaughan

    2013-03-01

    Full Text Available With the advent of high-throughput genomic technologies, the volume of available data is now staggering. In addition, databases that provide resources to annotate, translate and connect biological data have grown exponentially in content and use. The availability of such data emphasizes the importance of bioinformatics and computational biology in genomics research and has led to the development of thousands of tools to integrate and utilize these resources. When utilizing such resources, the principles of reproducible research are often overlooked. In this manuscript we provide selected case studies illustrating issues that may arise while working with genes and genetic polymorphisms. These case studies illustrate potential sources of error that can be introduced if the practices of reproducible research are not employed and non-concurrent databases are used. We also show examples of a lack of transparency regarding these databases when using popular bioinformatics tools. These examples highlight that resources are constantly evolving and that, in order to provide reproducible results, researchers should be aware of and connected to the correct release of the data, particularly when implementing computational tools.

  4. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

    Energy Technology Data Exchange (ETDEWEB)

    Fenner, Marsha W; Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

    2007-12-31

    The Genomes On Line Database (GOLD) is a comprehensive resource of information for genome and metagenome projects world-wide. GOLD provides access to complete and ongoing projects and their associated metadata through pre-computed lists and a search page. The database currently incorporates information for more than 2900 sequencing projects, of which 639 have been completed and the data deposited in the public databases. GOLD is constantly expanding to provide metadata information related to the project and the organism and is compliant with the Minimum Information about a Genome Sequence (MIGS) specifications.

  5. MELOGEN: an EST database for melon functional genomics

    Directory of Open Access Journals (Sweden)

    Puigdomènech Pere

    2007-09-01

    Full Text Available Background: Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. Results: We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. Conclusion: The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. This set of

  6. Whole genome sequence analysis of the TALLYHO/Jng mouse.

    Science.gov (United States)

    Denvir, James; Boskovic, Goran; Fan, Jun; Primerano, Donald A; Parkman, Jacaline K; Kim, Jung Han

    2016-11-11

    The TALLYHO/Jng (TH) mouse is a polygenic model for obesity and type 2 diabetes first described in the literature in 2001. The origin of the TH strain is an outbred colony of the Theiler Original strain and mice derived from this source were selectively bred for male hyperglycemia establishing an inbred strain at The Jackson Laboratory. TH mice manifest many of the disease phenotypes observed in human obesity and type 2 diabetes. We sequenced the whole genome of TH mice maintained at Marshall University to a depth of approximately 64.8X coverage using data from three next generation sequencing runs. Genome-wide, we found approximately 4.31 million homozygous single nucleotide polymorphisms (SNPs) and 1.10 million homozygous small insertions and deletions (indels) of which 98,899 SNPs and 163,720 indels were unique to the TH strain compared to 28 previously sequenced inbred mouse strains. In order to identify potentially clinically-relevant genes, we intersected our list of SNP and indel variants with human orthologous genes in which variants were associated in GWAS studies with obesity, diabetes, and metabolic syndrome, and with genes previously shown to confer a monogenic obesity phenotype in humans, and found several candidate variants that could be functionally tested using TH mice. Further, we filtered our list of variants to those occurring in an obesity quantitative trait locus, tabw2, identified in TH mice and found a missense polymorphism in the Cidec gene and characterized this variant's effect on protein function. We generated a complete catalog of variants in TH mice using the data from whole genome sequencing. Our findings will facilitate the identification of causal variants that underlie metabolic diseases in TH mice and will enable identification of candidate susceptibility genes for complex human obesity and type 2 diabetes.
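
    The filtering described above (variants unique to one strain, then restricted to a locus of interest) amounts to set operations on variant calls. The sketch below is only an illustration of that idea: the `Variant` record, the function names and the toy coordinates are hypothetical, and the authors' actual pipeline and the tabw2 coordinates are not given in the abstract.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str


def private_variants(strain: Iterable[Variant],
                     others: Iterable[Iterable[Variant]]) -> Set[Variant]:
    """Variants called in `strain` that appear in none of the comparison strains."""
    seen_elsewhere: Set[Variant] = set()
    for calls in others:
        seen_elsewhere.update(calls)
    return set(strain) - seen_elsewhere


def within_region(variants: Iterable[Variant],
                  chrom: str, start: int, end: int) -> Set[Variant]:
    """Keep variants inside a closed interval, e.g. a QTL (coordinates supplied by the caller)."""
    return {v for v in variants if v.chrom == chrom and start <= v.pos <= end}
```

    Exact matching on chromosome, position and alleles is the simplest possible notion of "the same variant"; indels in particular may need normalisation before such a comparison is meaningful.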

  7. mouseTube – a database to collaboratively unravel mouse ultrasonic communication [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Nicolas Torquet

    2016-09-01

    Full Text Available Ultrasonic vocalisation is a broadly used proxy to evaluate social communication in mouse models of neuropsychiatric disorders. The efficacy and robustness of testing these models suffer from limited knowledge of the structure and functions of these vocalisations as well as of the way to analyse the data. We created mouseTube, an open database with a web interface, to facilitate sharing and comparison of ultrasonic vocalisations data and metadata attached to a recording file. Metadata describe (1) the acquisition procedure, e.g., hardware, software, sampling frequency, bit depth; (2) the biological protocol used to elicit ultrasonic vocalisations; (3) the characteristics of the individual emitting ultrasonic vocalisations (e.g., strain, sex, age). To promote open science and enable reproducibility, data are made freely available. The website provides searching functions to facilitate the retrieval of recording files of interest. It is designed to enable comparisons of ultrasonic vocalisation emission between strains, protocols or laboratories, as well as to test different analysis algorithms and to search for protocols established to elicit mouse ultrasonic vocalisations. Over the long term, users will be able to download and compare different analysis results for each data file. Such application will boost the knowledge on mouse ultrasonic communication and stimulate sharing and comparison of automatic analysis methods to refine phenotyping techniques in mouse models of neuropsychiatric disorders.

  8. Genome annotations - KOME | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available ...e entry and the word BAC, PAC, chromosome Genomic, or Genomic sequence is included in the entry. Number of d

  9. Heterogeneity in rates of recombination across the mouse genome

    Energy Technology Data Exchange (ETDEWEB)

    Nachman, M.W.; Churchill, G.A. [Cornell Univ., Ithaca, NY (United States)]

    1996-02-01

    If loci are randomly distributed on a physical map, the density of markers on a genetic map will be inversely proportional to recombination rate. We have used this idea, first proposed by Mary Lyon, to estimate recombination rates from the Drosophila melanogaster linkage map. These results were compared with results of two other studies that estimated regional recombination rates in D. melanogaster using both physical and genetic maps. The three methods were largely concordant in identifying large-scale genomic patterns of recombination. The marker density method was then applied to the Mus musculus microsatellite linkage map. The distribution of microsatellites provided evidence for heterogeneity in recombination rates. Centromeric regions for several mouse chromosomes had significantly greater numbers of markers than expected, suggesting that recombination rates were lower in these regions. In contrast, most telomeric regions contained significantly fewer markers than expected. This indicates that recombination rates are elevated at the telomeres of many mouse chromosomes and is consistent with a comparison of the genetic and cytogenetic maps in these regions. The density of markers on a genetic map may provide a generally useful way to estimate regional recombination rates in species for which genetic, but not physical, maps are available. 44 refs., 5 figs., 4 tabs.
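
    The marker density idea can be sketched in a few lines. The example below assumes markers are uniformly distributed on the physical map, splits the genetic map into equal windows (in cM), and reports a relative recombination rate per window as expected count divided by observed count, so 1.0 means the chromosome average. The function name and windowing scheme are hypothetical, and the significance tests used in the study are not reproduced.

```python
import math
from typing import List, Optional, Sequence


def relative_recombination(markers_cm: Sequence[float],
                           map_length_cm: float,
                           window_cm: float) -> List[Optional[float]]:
    """Relative recombination rate per genetic-map window (None if a window has no markers)."""
    n_windows = math.ceil(map_length_cm / window_cm)
    counts = [0] * n_windows
    for pos in markers_cm:
        # A marker sitting exactly at the map end goes into the last window.
        idx = min(int(pos // window_cm), n_windows - 1)
        counts[idx] += 1
    expected = len(markers_cm) / n_windows  # count per window under a uniform rate
    # Marker density is inversely proportional to rate, so crowded windows score below 1.
    return [expected / c if c else None for c in counts]
```

    With markers at 1, 2, 3, 4, 12, 25 and 26 cM on a 30 cM map and 10 cM windows, the first window (four markers) gets the lowest relative rate and the second (one marker) the highest.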

  10. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    Science.gov (United States)

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichment analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains.

  11. EuMicroSatdb: A database for microsatellites in the sequenced genomes of eukaryotes

    Directory of Open Access Journals (Sweden)

    Grover Atul

    2007-07-01

    Full Text Available Background: Microsatellites have immense utility as molecular markers in different fields like genome characterization and mapping, phylogeny and evolutionary biology. Existing microsatellite databases are of limited utility for experimental and computational biologists with regard to their content and information output. EuMicroSatdb (Eukaryotic MicroSatellite database; http://ipu.ac.in/usbt/EuMicroSatdb.htm) is a web based relational database for easy and efficient positional mining of microsatellites from sequenced eukaryotic genomes. Description: A user friendly web interface has been developed for microsatellite data retrieval using Active Server Pages (ASP). The backend database codes for data extraction and assembly have been written using Perl based scripts and C++. Precise need based microsatellites data retrieval is possible using different input parameters like microsatellite type (simple perfect or compound perfect), repeat unit length (mono- to hexa-nucleotide), repeat number, microsatellite length and chromosomal location in the genome. Furthermore, information about clustering of different microsatellites in the genome can also be retrieved. Finally, to facilitate primer designing for PCR amplification of any desired microsatellite locus, 200 bp upstream and downstream sequences are provided. Conclusion: The database allows easy systematic retrieval of comprehensive information about simple and compound microsatellites, microsatellite clusters and their locus coordinates in 31 sequenced eukaryotic genomes. The information content of the database is useful in different areas of research like gene tagging, genome mapping, population genetics, germplasm characterization and in understanding microsatellite dynamics in eukaryotic genomes.
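
    To make the retrieval parameters above concrete, here is a minimal sketch of locating simple perfect microsatellites with mono- to hexa-nucleotide units and cutting out up to 200 bp of flanking sequence on each side. It uses a plain regular expression rather than the Perl and C++ code mentioned in the abstract; the function name and the default of five repeats are assumptions, and compound microsatellites and clustering are not handled.

```python
import re
from typing import Dict, List


def find_perfect_ssrs(seq: str, min_repeats: int = 5, flank: int = 200) -> List[Dict]:
    """Return simple perfect repeats with unit length 1-6, sorted by start (0-based, end-exclusive)."""
    seq = seq.upper()
    hits = []
    for unit_len in range(1, 7):  # mono- to hexa-nucleotide units
        # One captured unit followed by at least (min_repeats - 1) further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are repeats of a shorter motif (e.g. "AA", "ATAT"),
            # since those loci are reported under the shorter unit length.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            start, end = m.start(), m.end()
            hits.append({
                "unit": unit,
                "repeats": (end - start) // unit_len,
                "start": start,
                "end": end,
                "upstream": seq[max(0, start - flank):start],   # for primer design
                "downstream": seq[end:end + flank],
            })
    return sorted(hits, key=lambda h: h["start"])
```

    Near the ends of a sequence the flanks are simply shorter than `flank`; a production tool would also need a policy for ambiguous bases such as N.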

  12. Comparative analysis of genome maintenance genes in naked mole rat, mouse, and human

    NARCIS (Netherlands)

    S.L. Macrae (Sheila L.); Q. Zhang (Quanwei); C. Lemetre (Christophe); I. Seim (Inge); R.B. Calder (Robert B.); J.H.J. Hoeijmakers (Jan); Y. Suh (Yousin); V.N. Gladyshev (Vadim N.); A. Seluanov (Andrei); V. Gorbunova (Vera); J. Vijg (Jan); Z.D. Zhang (Zhengdong D.)

    2015-01-01

    Genome maintenance (GM) is an essential defense system against aging and cancer, as both are characterized by increased genome instability. Here, we compared the copy number variation and mutation rate of 518 GM-associated genes in the naked mole rat (NMR), mouse, and human genomes. GM g

  13. Download - TMBETA-GENOME | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

  14. Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists.

    Science.gov (United States)

    Wiley, Laura K; Sivley, R Michael; Bush, William S

    2013-01-01

    Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist.
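
    For readers unfamiliar with the NCList (nested containment list) data structure named above, the following in-memory sketch shows the core idea: intervals that are fully contained in another interval are stored as its children, so within any one list both starts and ends are increasing and an overlap query can binary-search to the first candidate. This is not the MyNCList MySQL implementation; the class and function names are hypothetical, and intervals are assumed to be half-open [start, end).

```python
from typing import Iterable, List, Optional, Tuple


class Node:
    def __init__(self, start: int, end: int, label: str):
        self.start, self.end, self.label = start, end, label
        self.children: List["Node"] = []  # intervals fully contained in this one


def build_nclist(intervals: Iterable[Tuple[int, int, str]]) -> List[Node]:
    """Nest (start, end, label) intervals by containment and return the top-level list."""
    top: List[Node] = []
    stack: List[Node] = []  # chain of currently open ancestors
    # Sort by start ascending, then end descending, so containers precede their contents.
    for start, end, label in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        node = Node(start, end, label)
        while stack and not (stack[-1].start <= start and end <= stack[-1].end):
            stack.pop()
        (stack[-1].children if stack else top).append(node)
        stack.append(node)
    return top


def query(nodes: List[Node], qs: int, qe: int,
          out: Optional[List[str]] = None) -> List[str]:
    """Labels of all intervals overlapping the half-open query [qs, qe)."""
    if out is None:
        out = []
    # Ends are increasing within a list: binary-search the first node with end > qs.
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid].end > qs:
            hi = mid
        else:
            lo = mid + 1
    i = lo
    while i < len(nodes) and nodes[i].start < qe:
        out.append(nodes[i].label)
        query(nodes[i].children, qs, qe, out)  # overlapping children lie inside this node
        i += 1
    return out
```

    The query visits only lists whose parent overlaps the query range, which is what makes the structure attractive for range-based annotation lookups; storing it in relational tables, as the abstract describes, requires an additional encoding of the parent-child links that is not shown here.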

  15. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community.

    Science.gov (United States)

    Arnaud, Martha B; Chibucos, Marcus C; Costanzo, Maria C; Crabtree, Jonathan; Inglis, Diane O; Lotia, Adil; Orvis, Joshua; Shah, Prachi; Skrzypek, Marek S; Binkley, Gail; Miyasato, Stuart R; Wortman, Jennifer R; Sherlock, Gavin

    2010-01-01

    The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at aspergillus-curator@genome.stanford.edu.

  16. Improving microbial genome annotations in an integrated database context.

    Directory of Open Access Journals (Sweden)

    I-Min A Chen

    Full Text Available Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems is available at http://img.jgi.doe.gov/.
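
    The rule-based validation described above can be pictured as follows. This is a toy sketch, not IMG's rule system: the rule table, phenotype names and pathway names are invented placeholders, and a rule is assumed to fire only when all of its required pathways are annotated as present.

```python
from typing import Dict, Iterable, Set

# Hypothetical rules: phenotype -> pathways that must all be present.
RULES: Dict[str, Set[str]] = {
    "toy_phenotype_A": {"pathway_1", "pathway_2"},
    "toy_phenotype_B": {"pathway_3"},
}


def predict_phenotypes(present_pathways: Iterable[str],
                       rules: Dict[str, Set[str]]) -> Set[str]:
    """Phenotypes whose required pathways are all annotated in the genome."""
    present = set(present_pathways)
    return {phenotype for phenotype, required in rules.items() if required <= present}


def compare(predicted: Set[str], observed: Set[str]) -> Dict[str, Set[str]]:
    """Split phenotypes into agreements and the two kinds of discrepancy."""
    return {
        "confirmed": predicted & observed,
        "predicted_only": predicted - observed,  # prediction lacks experimental support
        "observed_only": observed - predicted,   # annotation may be incomplete
    }
```

    Discrepancies in either direction are only prompts for review; whether they point to an annotation error or to an inadequate rule has to be decided case by case.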

  17. Selective binding of specific mouse genomic DNA fragments by mouse vimentin filaments in vitro.

    Science.gov (United States)

    Wang, X; Tolstonog, G; Shoeman, R L; Traub, P

    1996-03-01

    Mouse vimentin intermediate filaments (IFs) reconstituted in vitro were analyzed for their capacity to select certain DNA sequences from a mixture of about 500-bp-long fragments of total mouse genomic DNA. The fragments preferentially bound by the IFs and enriched by several cycles of affinity binding and polymerase chain reaction (PCR) amplification were cloned and sequenced. In general, they were G-rich and highly repetitive in that they often contained Gn, (GT)n, and (GA)n repeat elements. Other, more complex repeat sequences were identified as well. Apart from the capacity to adopt a Z-DNA and triple helix configuration under superhelical tension, many fragments were potentially able to form cruciform structures and contained consensus binding sites for various transcription factors. All of these sequence elements are known to occur in introns and 5'/3'-flanking regions of genes and to play roles in DNA transcription, recombination and replication. A FASTA search of the EMBL data bank indeed revealed that sequences homologous to the mouse repetitive DNA fragments are commonly associated with gene-regulatory elements. Unexpectedly, vimentin IFs also bound a large number of apparently overlapping, AT-rich DNA fragments that could be aligned into a composite sequence highly homologous to the 234-bp consensus centromere repeat sequence of gamma-satellite DNA. Previous experiments have shown a high affinity of vimentin for G-rich, repetitive telomere DNA sequences, superhelical DNA, and core histones. Taken together, these data support the hypothesis that, after penetration of the double nuclear membrane via an as yet unidentified mechanism, vimentin IFs cooperatively fix repetitive DNA sequence elements in a differentiation-specific manner in the nuclear periphery subjacent to the nuclear lamina and thus participate in the organization of chromatin and in the control of transcription, replication, and recombination processes. This includes aspects of global

  18. Use of Genomic Databases for Inquiry-Based Learning about Influenza

    Science.gov (United States)

    Ledley, Fred; Ndung'u, Eric

    2011-01-01

    The genome projects of the past decades have created extensive databases of biological information with applications in both research and education. We describe an inquiry-based exercise that uses one such database, the National Center for Biotechnology Information Influenza Virus Resource, to advance learning about influenza. This database…

  20. Database of Periodic DNA Regions in Major Genomes

    Directory of Open Access Journals (Sweden)

    Felix E. Frenkel

    2017-01-01

    Full Text Available Summary. We analyzed several prokaryotic and eukaryotic genomes looking for the periodicity sequences availability and employing a new mathematical method. The method envisaged using the random position weight matrices and dynamic programming. Insertions and deletions were allowed inside periodicities, thus adding a novelty to the results we obtained. A periodicity length, one of the key periodicity features, varied from 2 to 50 nt. Totally over 60,000 periodicity sequences were found in 15 genomes including some chromosomes of the H. sapiens (partial), C. elegans, D. melanogaster, and A. thaliana genomes.

  1. Database of Periodic DNA Regions in Major Genomes

    Science.gov (United States)

    2017-01-01

    Summary. We analyzed several prokaryotic and eukaryotic genomes looking for the periodicity sequences availability and employing a new mathematical method. The method envisaged using the random position weight matrices and dynamic programming. Insertions and deletions were allowed inside periodicities, thus adding a novelty to the results we obtained. A periodicity length, one of the key periodicity features, varied from 2 to 50 nt. Totally over 60,000 periodicity sequences were found in 15 genomes including some chromosomes of the H. sapiens (partial), C. elegans, D. melanogaster, and A. thaliana genomes. PMID:28182099

  2. Dioxin induces genomic instability in mouse embryonic fibroblasts.

    Directory of Open Access Journals (Sweden)

    Merja Korkalainen

    Full Text Available Ionizing radiation and certain other exposures have been shown to induce genomic instability (GI), i.e., delayed genetic damage observed many cell generations later in the progeny of the exposed cells. The aim of this study was to investigate induction of GI by a nongenotoxic carcinogen, 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Mouse embryonic fibroblasts (C3H10T1/2) were exposed to 1, 10 or 100 nM TCDD for 2 days. Micronuclei (MN) and expression of selected cancer-related genes were assayed both immediately and at a delayed point in time (8 days). For comparison, similar experiments were done with cadmium, a known genotoxic agent. TCDD treatment induced an elevated frequency of MN at 8 days, but not directly after the exposure. TCDD-induced alterations in gene expression were also mostly delayed, with more changes observed at 8 days than at 2 days. Exposure to cadmium produced an opposite pattern of responses, with pronounced effects immediately after exposure but no increase in MN and few gene expression changes at 8 days. Although all responses to TCDD alone were delayed, menadione-induced DNA damage (measured by the Comet assay) was found to be increased directly after a 2-day TCDD exposure, indicating that the stability of the genome was compromised already at this time point. The results suggested a flat dose-response relationship consistent with dose-response data reported for radiation-induced GI. These findings indicate that TCDD, although not directly genotoxic, induces GI, which is associated with impaired DNA damage response.

  3. Bisphenol a exposure disrupts genomic imprinting in the mouse.

    Directory of Open Access Journals (Sweden)

    Martha Susiarjo

    2013-04-01

    Full Text Available Exposure to endocrine disruptors is associated with developmental defects. One compound of concern, to which humans are widely exposed, is bisphenol A (BPA). In model organisms, BPA exposure is linked to metabolic disorders, infertility, cancer, and behavior anomalies. Recently, BPA exposure has been linked to DNA methylation changes, indicating that epigenetic mechanisms may be relevant. We investigated effects of exposure on genomic imprinting in the mouse as imprinted genes are regulated by differential DNA methylation and aberrant imprinting disrupts fetal, placental, and postnatal development. Through allele-specific and quantitative real-time PCR analysis, we demonstrated that maternal BPA exposure during late stages of oocyte development and early stages of embryonic development significantly disrupted imprinted gene expression in embryonic day (E) 9.5 and 12.5 embryos and placentas. The affected genes included Snrpn, Ube3a, Igf2, Kcnq1ot1, Cdkn1c, and Ascl2; mutations and aberrant regulation of these genes are associated with imprinting disorders in humans. Furthermore, the majority of affected genes were expressed abnormally in the placenta. DNA methylation studies showed that BPA exposure significantly altered the methylation levels of differentially methylated regions (DMRs), including the Snrpn imprinting control region (ICR) and Igf2 DMR1. Moreover, exposure significantly reduced genome-wide methylation levels in the placenta, but not the embryo. Histological and immunohistochemical examinations revealed that these epigenetic defects were associated with abnormal placental development. In contrast to this early exposure paradigm, exposure outside of the epigenetic reprogramming window did not cause significant imprinting perturbations. Our data suggest that early exposure to common environmental compounds has the potential to disrupt fetal and postnatal health through epigenetic changes in the embryo and abnormal development of the

  4. The Ruby UCSC API: accessing the UCSC genome database using Ruby

    Directory of Open Access Journals (Sweden)

    Mishima Hiroyuki

    2012-09-01

    Full Text Available Abstract Background: The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was, however, not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. Results: The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast. The API uses the bin index, if available, when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Conclusions: Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will help biologists query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help are provided via the website at http://rubyucscapi.userecho.com/.
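
    The bin index mentioned in this abstract presumably refers to the UCSC binning scheme, in which each feature is stored with the number of the smallest bin that fully contains it, so that an interval query only has to inspect a handful of bins. The following is a minimal Python sketch of the standard five-level scheme (it is not code from the Ruby UCSC API itself; the function names are illustrative, and coordinates are assumed to be zero-based, half-open, and below 512 Mb).

```python
# Standard UCSC binning constants: five levels, smallest bins span 128 kb.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # 2**17 bases = 128 kb, the size of the smallest bins
BIN_NEXT_SHIFT = 3    # each level up covers 8 times more sequence


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the interval [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("interval exceeds the 512 Mb range of the standard scheme")


def overlapping_bins(start: int, end: int) -> list[int]:
    """List every bin that could hold a feature overlapping [start, end)."""
    bins = []
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins
```

    A query for features overlapping an interval would then restrict the search to rows whose bin is in `overlapping_bins(start, end)` before applying the exact coordinate comparison, which is the step an indexed bin column makes fast.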

  5. An integrated computational pipeline and database to support whole-genome sequence annotation.

    Science.gov (United States)

    Mungall, C J; Misra, S; Berman, B P; Carlson, J; Frise, E; Harris, N; Marshall, B; Shu, S; Kaminker, J S; Prochnik, S E; Smith, C D; Smith, E; Tupy, J L; Wiel, C; Rubin, G M; Lewis, S E

    2002-01-01

    We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable and expandable flexible architecture.

  6. Sputnik: a database platform for comparative plant genomics.

    Science.gov (United States)

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST-derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics.

  7. Complete mitochondrial genome of the gray mouse lemur, Microcebus murinus (Primates, Cheirogaleidae).

    Science.gov (United States)

    Lecompte, Emilie; Crouau-Roy, Brigitte; Aujard, Fabienne; Holota, Hélène; Murienne, Jérôme

    2016-09-01

    We report the high-coverage complete mitochondrial genome sequence of the gray mouse lemur Microcebus murinus. The sequencing was performed on an Illumina HiSeq 2500 platform, with a genome skimming strategy. The total length of this mitogenome is 16 963 bp, containing 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and 1 non-coding region (D-loop region). The genome organization, nucleotide composition and codon usage are similar to those reported for other primates' mitochondrial genomes. The complete mitochondrial genome sequence reported here will be useful for comparative genomics studies in primates.

  8. Genome-wide assembly and analysis of alternative transcripts in mouse

    Science.gov (United States)

    Sharov, Alexei A.; Dudekula, Dawood B.; Ko, Minoru S.H.

    2005-01-01

    To build a mouse gene index with the most comprehensive coverage of alternative transcription/splicing (ATS), we developed an algorithm and a fully automated computational pipeline for transcript assembly from expressed sequences aligned to the genome. We identified 191,946 genomic loci, which included 27,497 protein-coding genes and 11,906 additional gene candidates (e.g., nonprotein-coding, but multiexon). Comparison of the resulting gene index with TIGR, UniGene, DoTS, and ESTGenes databases revealed that it had a greater number of transcripts, a greater average number of exons and introns with proper splicing sites per gene, and longer ORFs. The 27,497 protein-coding genes had 77,138 transcripts, i.e., 2.8 transcripts per gene on average. Close examination of transcripts led to a combinatorial table of 23 types of ATS units, only nine of which were previously described, i.e., 14 types of alternative splicing, seven types of alternative starts, and two types of alternative termination. Of the 20,323 multiexon protein-coding genes with proper splice sites, 47%, 18%, and 14% had alternative splicing, alternative starts, and alternative terminations, respectively. The gene index with the comprehensive ATS will provide a useful platform for analyzing the nature and mechanism of ATS, as well as for designing accurate exon-based DNA microarrays. PMID:15867436
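
    The summary figures quoted in this abstract can be checked against one another with a few lines of arithmetic. The Python sketch below uses only the numbers stated in the abstract; the variable names are illustrative.

```python
# Figures taken from the abstract above.
protein_coding_genes = 27_497
transcripts = 77_138

# Average number of transcripts per protein-coding gene.
transcripts_per_gene = transcripts / protein_coding_genes

# The 23 ATS unit types are reported as three groups.
ats_types = {
    "alternative splicing": 14,
    "alternative starts": 7,
    "alternative terminations": 2,
}
total_ats_types = sum(ats_types.values())

print(round(transcripts_per_gene, 1), total_ats_types)
```

    The ratio rounds to the 2.8 transcripts per gene reported, and the three groups add up to the 23 ATS unit types.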

  9. MIPS PlantsDB: a database framework for comparative plant genome research.

    Science.gov (United States)

    Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of Triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB.

  10. Databases and web tools for cancer genomics study.

    Science.gov (United States)

    Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong

    2015-02-01

    Publicly-accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope that this introduction will promote awareness and facilitate the usage of these resources in the cancer research community.

  11. Databases and Web Tools for Cancer Genomics Study

    Institute of Scientific and Technical Information of China (English)

    Yadong Yang; Xunong Dong; Bingbing Xie; Nan Ding; Juan Chen; Yongjun Li; Qian Zhang; Hongzhu Qu; Xiangdong Fang

    2015-01-01

    Publicly-accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope that this introduction will promote awareness and facilitate the usage of these resources in the cancer research community.

  12. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    Energy Technology Data Exchange (ETDEWEB)

    Liolios, Konstantinos; Chen, Amy; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Phil; Markowitz, Victor; Kyrpides, Nikos C.

    2009-09-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification.

  13. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.

    Science.gov (United States)

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.

  14. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

    Directory of Open Access Journals (Sweden)

    Rodrigo Aniceto

    2015-01-01

    Full Text Available Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.

  15. CR Cistrome: a ChIP-Seq database for chromatin regulators and histone modification linkages in human and mouse.

    Science.gov (United States)

    Wang, Qixuan; Huang, Jinyan; Sun, Hanfei; Liu, Jing; Wang, Juan; Wang, Qian; Qin, Qian; Mei, Shenglin; Zhao, Chengchen; Yang, Xiaoqin; Liu, X Shirley; Zhang, Yong

    2014-01-01

    Diversified histone modifications (HMs) are essential epigenetic features. They play important roles in fundamental biological processes including transcription, DNA repair and DNA replication. Chromatin regulators (CRs), which are indispensable in epigenetics, can mediate HMs to adjust chromatin structures and functions. With the development of ChIP-Seq technology, there is an opportunity to study CR and HM profiles at the whole-genome scale. However, no specific resource for the integration of CR ChIP-Seq data or CR-HM ChIP-Seq linkage pairs is currently available. Therefore, we constructed the CR Cistrome database, available online at http://compbio.tongji.edu.cn/cr and http://cistrome.org/cr/, to further elucidate CR functions and CR-HM linkages. Within this database, we collected all publicly available ChIP-Seq data on CRs in human and mouse and categorized the data into four cohorts: the reader, writer, eraser and remodeler cohorts, together with curated introductions and ChIP-Seq data analysis results. For the HM readers, writers and erasers, we provided further ChIP-Seq analysis data for the targeted HMs and schematized the relationships between them. We believe CR Cistrome is a valuable resource for the epigenetics community.

  16. Expanded national database collection and data coverage in the FINDbase worldwide database for clinically relevant genomic variation allele frequencies.

    Science.gov (United States)

    Viennas, Emmanouil; Komianou, Angeliki; Mizzi, Clint; Stojiljkovic, Maja; Mitropoulou, Christina; Muilu, Juha; Vihinen, Mauno; Grypioti, Panagiota; Papadaki, Styliani; Pavlidis, Cristiana; Zukic, Branka; Katsila, Theodora; van der Spek, Peter J; Pavlovic, Sonja; Tzimas, Giannis; Patrinos, George P

    2017-01-04

    FINDbase (http://www.findbase.org) is a comprehensive data repository that records the prevalence of clinically relevant genomic variants in various populations worldwide, such as pathogenic variants leading mostly to monogenic disorders and pharmacogenomics biomarkers. The database also records the incidence of rare genetic diseases in various populations, all in distinct data modules. Here, we report extensive data content updates in all data modules, with direct implications for clinical pharmacogenomics. Also, we report significant new developments in FINDbase, namely (i) the release of a new version of the ETHNOS software that catalyzes the development and curation of national/ethnic genetic databases, (ii) the migration of all FINDbase data content into 90 distinct national/ethnic mutation databases, all built around Microsoft's PivotViewer (http://www.getpivot.com) software, (iii) new data visualization tools and (iv) the interrelation of FINDbase with the DruGeVar database, with direct implications for clinical pharmacogenomics. The abovementioned updates further enhance the impact of FINDbase as a key resource for Genomic Medicine applications.

  17. FULL-GENOME ANALYSIS OF ALTERNATIVE SPLICING IN MOUSE LIVER AFTER HEPATOTOXICANT EXPOSURE

    Science.gov (United States)

    Alternative splicing plays a role in determining gene function and protein diversity. We have employed whole genome exon profiling using Affymetrix Mouse Exon 1.0 ST arrays to understand the significance of alternative splicing on a genome-wide scale in response to multiple toxic...

  18. Genomic and cDNA cloning of a novel mouse lipoxygenase gene

    NARCIS (Netherlands)

    Willems van Dijk, K.; Steketee, K.; Havekes, L.; Frants, R.; Hofker, M.

    1995-01-01

    A novel 12- and 15-lipoxygenase related gene was isolated from a mouse strain 129 genomic phage library in a screen with a human 15-lipoxygenase cDNA probe. The complete genomic sequence revealed 14 exons and 13 introns covering 7.3 kb of DNA. The splice junctions were verified from the cDNA

  19. Databases, models, and algorithms for functional genomics: a bioinformatics perspective.

    Science.gov (United States)

    Singh, Gautam B; Singh, Harkirat

    2005-02-01

    A variety of patterns have been observed on the DNA and protein sequences that serve as control points for gene expression and cellular functions. Owing to the vital role of such patterns discovered on biological sequences, they are generally cataloged and maintained within internationally shared databases. Furthermore, the variability in a family of observed patterns is often represented using computational models in order to facilitate their search within an uncharacterized biological sequence. As the biological data is comprised of a mosaic of sequence-level motifs, it is important to unravel the synergies of macromolecular coordination utilized in cell-specific differential synthesis of proteins. This article provides an overview of the various pattern representation methodologies and surveys the pattern databases available to molecular biologists. Our aim is to describe the principles behind the computational modeling and analysis techniques utilized in bioinformatics research, with the objective of providing the insight necessary to better understand and effectively utilize the available databases and analysis tools. We also provide a detailed review of DNA sequence-level patterns responsible for structural conformations within the Scaffold or Matrix Attachment Regions (S/MARs).

  20. CMD: a Cotton Microsatellite Database resource for Gossypium genomics

    Directory of Open Access Journals (Sweden)

    Liu Shaolin

    2006-05-01

    Full Text Available Abstract Background: The Cotton Microsatellite Database (CMD; http://www.cottonssr.org) is a curated and integrated web-based relational database providing centralized access to publicly available cotton microsatellites, an invaluable resource for basic and applied research in cotton breeding. Description: At present CMD contains publication, sequence, primer, mapping and homology data for nine major cotton microsatellite projects, collectively representing 5,484 microsatellites. In addition, CMD displays data for three of the microsatellite projects that have been screened against a panel of core germplasm. The standardized panel consists of 12 diverse genotypes including genetic standards, mapping parents, BAC donors, subgenome representatives, unique breeding lines, exotic introgression sources, and contemporary Upland cottons with significant acreage. A suite of online microsatellite data mining tools is accessible at CMD. These include an SSR server which identifies microsatellites, primers, open reading frames, and GC-content of uploaded sequences; BLAST and FASTA servers providing sequence similarity searches against the existing cotton SSR sequences and primers; a CAP3 server to assemble EST sequences into longer transcripts prior to mining for SSRs; and CMap, a viewer for comparing cotton SSR maps. Conclusion: The collection of publicly available cotton SSR markers in a centralized, readily accessible and curated web-enabled database provides a more efficient utilization of microsatellite resources and will help accelerate basic and applied research in molecular breeding and genetic mapping in Gossypium spp.
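
    To illustrate the kind of analysis an SSR server performs, the Python sketch below finds simple sequence repeats with a regular expression and computes GC content. It is an illustration under stated assumptions, not the method used by CMD: the function names, the repeat-unit range (2 to 6 nt) and the minimum of four tandem copies are all hypothetical choices.

```python
import re


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 6, min_repeats: int = 4):
    """Return (start, unit, copies) for each tandem repeat found.

    The thresholds are illustrative assumptions. The unit is matched lazily,
    so the shortest repeating unit at a given position is reported.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


example = "GGT" + "CA" * 6 + "TTG"
print(find_ssrs(example))
print(gc_content("ATGC"))
```

    Because units of two or more bases are matched directly, a long homopolymer run would be reported with a unit such as AA; a production tool would normally normalize units and handle mononucleotide repeats separately.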

  1. BambooGDB: a bamboo genome database with functional annotation and an analysis platform.

    Science.gov (United States)

    Zhao, Hansheng; Peng, Zhenhua; Fei, Benhua; Li, Lubin; Hu, Tao; Gao, Zhimin; Jiang, Zehui

    2014-01-01

    Bamboo, as one of the most important non-timber forest products and fastest-growing plants in the world, represents the only major lineage of grasses that is native to forests. Recent success with the first high-quality draft genome sequence of moso bamboo (Phyllostachys edulis) provides new insights into bamboo genetics and evolution. To further extend our understanding of the bamboo genome and facilitate future studies on the basis of previous achievements, here we have developed BambooGDB, a bamboo genome database with functional annotation and an analysis platform. The de novo sequencing data, together with the full-length complementary DNA and RNA-seq data of moso bamboo, compose the main contents of this database. Based on these sequence data, a comprehensive functional annotation of the bamboo genome was made. In addition, an analytical platform composed of comparative genomic analysis, protein-protein interaction networks, pathway analysis and visualization of genomic data was also constructed. As a set of discovery tools to understand and identify biological mechanisms of bamboo, the platform can be used as a systematic framework for helping to design experiments for further validation. Moreover, diverse and powerful search tools and a convenient browser were incorporated to facilitate the navigation of these data. As far as we know, this is the first genome database for bamboo. Through integrating high-throughput sequencing data, a full functional annotation and several analysis modules, BambooGDB aims to provide worldwide researchers with a central genomic resource and an extensible analysis platform for the bamboo genome. BambooGDB is freely available at http://www.bamboogdb.org/. Database URL: http://www.bamboogdb.org.

  2. The mouse QTL map helps interpret human genome-wide association studies for HDL cholesterol.

    Science.gov (United States)

    Leduc, Magalie S; Lyons, Malcolm; Darvishi, Katayoon; Walsh, Kenneth; Sheehan, Susan; Amend, Sarah; Cox, Allison; Orho-Melander, Marju; Kathiresan, Sekar; Paigen, Beverly; Korstanje, Ron

    2011-06-01

    Genome-wide association (GWA) studies represent a powerful strategy for identifying susceptibility genes for complex diseases in human populations but results must be confirmed and replicated. Because of the close homology between mouse and human genomes, the mouse can be used to add evidence to genes suggested by human studies. We used the mouse quantitative trait loci (QTL) map to interpret results from a GWA study for genes associated with plasma HDL cholesterol levels. We first positioned single nucleotide polymorphisms (SNPs) from a human GWA study on the genomic map for mouse HDL QTL. We then used mouse bioinformatics, sequencing, and expression studies to add evidence for one well-known HDL gene (Abca1) and three newly identified genes (Galnt2, Wwox, and Cdh13), thus supporting the results of the human study. For GWA peaks that occur in human haplotype blocks with multiple genes, we examined the homologous regions in the mouse to prioritize the genes using expression, sequencing, and bioinformatics from the mouse model, showing that some genes were unlikely candidates and adding evidence for candidate genes Mvk and Mmab in one haplotype block and Fads1 and Fads2 in the second haplotype block. Our study highlights the value of mouse genetics for evaluating genes found in human GWA studies.

  3. Lineage-specific biology revealed by a finished genome assembly of the mouse.

    Science.gov (United States)

    Church, Deanna M; Goodstadt, Leo; Hillier, Ladeana W; Zody, Michael C; Goldstein, Steve; She, Xinwe; Bult, Carol J; Agarwala, Richa; Cherry, Joshua L; DiCuccio, Michael; Hlavina, Wratko; Kapustin, Yuri; Meric, Peter; Maglott, Donna; Birtle, Zoë; Marques, Ana C; Graves, Tina; Zhou, Shiguo; Teague, Brian; Potamousis, Konstantinos; Churas, Christopher; Place, Michael; Herschleb, Jill; Runnheim, Ron; Forrest, Daniel; Amos-Landgraf, James; Schwartz, David C; Cheng, Ze; Lindblad-Toh, Kerstin; Eichler, Evan E; Ponting, Chris P

    2009-05-05

    The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non-protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.

  4. Lineage-specific biology revealed by a finished genome assembly of the mouse.

    Directory of Open Access Journals (Sweden)

    Deanna M Church

    2009-05-01

    Full Text Available The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non-protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.

  5. Accessing the SEED Genome Databases via Web Services API: Tools for Programmers

    Directory of Open Access Journals (Sweden)

    Vonstein Veronika

    2010-06-01

    Full Text Available Abstract Background: The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides a Web services-based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. Results: The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform-independent service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrates that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. Conclusions: We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including the new genomes as soon as they come online.

  6. Bioinformatics tools and databases for whole genome sequence analysis of Mycobacterium tuberculosis.

    Science.gov (United States)

    Faksri, Kiatichai; Tan, Jun Hao; Chaiprasert, Angkana; Teo, Yik-Ying; Ong, Rick Twee-Hee

    2016-11-01

    Tuberculosis (TB) is an infectious disease of global public health importance caused by Mycobacterium tuberculosis complex (MTC) in which M. tuberculosis (Mtb) is the major causative agent. Recent advancements in genomic technologies such as next generation sequencing have enabled high throughput cost-effective generation of whole genome sequence information from Mtb clinical isolates, providing new insights into the evolution, genomic diversity and transmission of the Mtb bacteria, including molecular mechanisms of antibiotic resistance. The large volume of sequencing data generated however necessitated effective and efficient management, storage, analysis and visualization of the data and results through development of novel and customized bioinformatics software tools and databases. In this review, we aim to provide a comprehensive survey of the current freely available bioinformatics software tools and publicly accessible databases for genomic analysis of Mtb for identifying disease transmission in molecular epidemiology and in rapid determination of the antibiotic profiles of clinical isolates for prompt and optimal patient treatment.

  7. VibrioBase: A Model for Next-Generation Genome and Annotation Database Development

    Directory of Open Access Journals (Sweden)

    Siew Woh Choo

    2014-01-01

    Full Text Available To facilitate the ongoing research of Vibrio spp., a dedicated platform for the Vibrio research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. We present VibrioBase, a useful resource platform, providing all basic features of a sequence database with the addition of unique analysis tools which could be valuable for the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes and was developed in a user-friendly manner to enable the analysis of these genomic data, particularly in the field of comparative genomics. Besides general data browsing features, VibrioBase offers analysis tools such as BLAST interfaces and the JBrowse genome browser. Other important features of this platform include our newly developed in-house tools, the pairwise genome comparison (PGC) tool and the pathogenomics profiling tool (PathoProT). The PGC tool is useful in the identification and comparative analysis of two genomes, whereas PathoProT is designed for comparative pathogenomics analysis of Vibrio strains. Both of these tools will enable researchers with little experience in bioinformatics to get meaningful information from Vibrio genomes with ease. We have tested the validity and suitability of these tools and features for use in the next-generation database development.

  8. Choosing a genome browser for a Model Organism Database: surveying the maize community.

    Science.gov (United States)

    Sen, Taner Z; Harper, Lisa C; Schaeffer, Mary L; Andorf, Carson M; Seigfried, Trent E; Campbell, Darwin A; Lawrence, Carolyn J

    2010-01-01

    As the B73 maize genome sequencing project neared completion, MaizeGDB began to integrate a graphical genome browser with its existing web interface and database. To ensure that maize researchers would optimally benefit from the potential addition of a genome browser to the existing MaizeGDB resource, personnel at MaizeGDB surveyed researchers' needs. Collected data indicate that existing genome browsers for maize were inadequate and suggest implementation of a browser with quick interface and intuitive tools would meet most researchers' needs. Here, we document the survey's outcomes, review functionalities of available genome browser software platforms and offer our rationale for choosing the GBrowse software suite for MaizeGDB. Because the genome as represented within the MaizeGDB Genome Browser is tied to detailed phenotypic data, molecular marker information, available stocks, etc., the MaizeGDB Genome Browser represents a novel mechanism by which the researchers can leverage maize sequence information toward crop improvement directly. Database URL: http://gbrowse.maizegdb.org/

  9. Endonucleases : new tools to edit the mouse genome

    NARCIS (Netherlands)

    Wijshake, Tobias; Baker, Darren J.; van de Sluis, Bart

    2014-01-01

    Mouse transgenesis has been instrumental in determining the function of genes in the pathophysiology of human diseases and modification of genes by homologous recombination in mouse embryonic stem cells remains a widely used technology. However, this approach harbors a number of disadvantages, as it

  10. MaizeGDB, the community database for maize genetics and genomics

    OpenAIRE

    2004-01-01

    The Maize Genetics and Genomics Database (MaizeGDB) is a central repository for maize sequence, stock, phenotype, genotypic and karyotypic variation, and chromosomal mapping data. In addition, MaizeGDB provides contact information for over 2400 maize cooperative researchers, facilitating interactions between members of the rapidly expanding maize community. MaizeGDB represents the synthesis of all data available previously from ZmDB and from MaizeDB—databases that have been superseded by Maiz...

  11. The Aspergillus Genome Database (AspGD): recent developments in comprehensive multispecies curation, comparative genomics and community resources.

    Science.gov (United States)

    Arnaud, Martha B; Cerqueira, Gustavo C; Inglis, Diane O; Skrzypek, Marek S; Binkley, Jonathan; Chibucos, Marcus C; Crabtree, Jonathan; Howarth, Clinton; Orvis, Joshua; Shah, Prachi; Wymore, Farrell; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin; Wortman, Jennifer R

    2012-01-01

    The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available, web-based resource for researchers studying fungi of the genus Aspergillus, which includes organisms of clinical, agricultural and industrial importance. AspGD curators have now completed comprehensive review of the entire published literature about Aspergillus nidulans and Aspergillus fumigatus, and this annotation is provided with streamlined, ortholog-based navigation of the multispecies information. AspGD facilitates comparative genomics by providing a full-featured genomics viewer, as well as matched and standardized sets of genomic information for the sequenced aspergilli. AspGD also provides resources to foster interaction and dissemination of community information and resources. We welcome and encourage feedback at aspergillus-curator@lists.stanford.edu.

  12. Application of functional genomics to the chimeric mouse model of HCV infection: optimization of microarray protocols and genomics analysis

    Directory of Open Access Journals (Sweden)

    Smith Maria W

    2006-05-01

    Full Text Available Abstract Background Many model systems of human viral disease involve human-mouse chimeric tissue. One such system is the recently developed SCID-beige/Alb-uPA mouse model of hepatitis C virus (HCV) infection which involves a human-mouse chimeric liver. The use of functional genomics to study HCV infection in these chimeric tissues is complicated by the potential cross-hybridization of mouse mRNA on human oligonucleotide microarrays. To identify genes affected by mouse liver mRNA hybridization, mRNA from identical human liver samples labeled with either Cy3 or Cy5 was compared in the presence and absence of known amounts of mouse liver mRNA labeled in only one dye. Results The results indicate that hybridization of mouse mRNA to the corresponding human gene probe on Agilent Human 22 K oligonucleotide microarray does occur. The number of genes affected by such cross-hybridization was subsequently reduced to approximately 300 genes both by increasing the hybridization temperature and using liver samples which contain at least 80% human tissue. In addition, Real Time quantitative RT-PCR using human specific probes was shown to be a valid method to verify the expression level in human cells of known cross-hybridizing genes. Conclusion The identification of genes affected by cross-hybridization of mouse liver RNA on human oligonucleotide microarrays makes it feasible to use functional genomics approaches to study the chimeric SCID-beige/Alb-uPA mouse model of HCV infection. This approach used to study cross-species hybridization on oligonucleotide microarrays can be adapted to other chimeric systems of viral disease to facilitate selective analysis of human gene expression.

  13. RICD: A rice indica cDNA database resource for rice functional genomics

    Directory of Open Access Journals (Sweden)

    Zhang Qifa

    2008-11-01

    Full Text Available Abstract Background The Oryza sativa L. indica subspecies is the most widely cultivated rice. During the last few years, we have collected over 20,000 putative full-length cDNAs and over 40,000 ESTs isolated from various cDNA libraries of two indica varieties Guangluai 4 and Minghui 63. A database of the rice indica cDNAs was therefore built to provide a comprehensive web data source for searching and retrieving the indica cDNA clones. Results Rice Indica cDNA Database (RICD) is an online MySQL-PHP driven database with a user-friendly web interface. It allows investigators to query the cDNA clones by keyword, genome position, nucleotide or protein sequence, and putative function. For each cDNA, it also provides a range of information, including sequences, protein domain annotations, similarity search results, SNPs and InDels information, and hyperlinks to gene annotation in both The Rice Annotation Project Database (RAP-DB) and The TIGR Rice Genome Annotation Resource, the expression atlas in RiceGE and the variation report in Gramene. Conclusion The online rice indica cDNA database provides a cDNA resource with comprehensive information to researchers for functional analysis of indica subspecies and for comparative genomics. The RICD database is available through our website http://www.ncgr.ac.cn/ricd.

  14. CircuitsDB: a database of mixed microRNA/transcription factor feed-forward regulatory circuits in human and mouse

    Directory of Open Access Journals (Sweden)

    Friard Olivier

    2010-08-01

    Full Text Available Abstract Background Transcription Factors (TFs) and microRNAs (miRNAs) are key players for gene expression regulation in higher eukaryotes. In recent years, a large number of bioinformatic studies have been devoted to the elucidation of transcriptional and post-transcriptional (mostly miRNA-mediated) regulatory interactions, but little is known about the interplay between them. Description Here we describe a dynamic web-accessible database, CircuitsDB, supporting a genome-wide transcriptional and post-transcriptional regulatory network integration, for the human and mouse genomes, based on a bioinformatic sequence-analysis approach. In particular, CircuitsDB is currently focused on the study of mixed miRNA/TF Feed-Forward regulatory Loops (FFLs), i.e. elementary circuits in which a master TF regulates an miRNA and together with it a set of Joint Target protein-coding genes. The database was constructed using an ab-initio oligo analysis procedure for the identification of the transcriptional and post-transcriptional interactions. Several external sources of information were then pooled together to obtain the functional annotation of the proposed interactions. Results for human and mouse genomes are presented in an integrated web tool that allows users to explore the circuits, investigate their sequence and functional properties and thus suggest possible biological experiments. Conclusions We present CircuitsDB, a web-server devoted to the study of human and mouse mixed miRNA/TF Feed-Forward regulatory circuits, freely available at: http://biocluster.di.unito.it/circuits/
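
    The circuit definition in this record (a master TF regulates an miRNA, and both regulate a set of joint target genes) can be illustrated with a minimal Python sketch. It only shows the set logic of the definition; it is not the CircuitsDB pipeline, which the abstract says relies on an ab-initio oligo analysis. The function name, the edge-list inputs and the identifiers in the usage note are hypothetical.

```python
from collections import defaultdict


def find_mixed_ffls(tf_to_mirna, tf_to_gene, mirna_to_gene):
    """Enumerate mixed miRNA/TF feed-forward loops from three edge lists.

    Each argument is an iterable of (regulator, target) pairs. The result maps
    (tf, mirna) to the sorted list of joint target genes, i.e. genes regulated
    by both the TF and the miRNA that the TF itself regulates.
    """
    tf_targets = defaultdict(set)
    for tf, gene in tf_to_gene:
        tf_targets[tf].add(gene)

    mirna_targets = defaultdict(set)
    for mirna, gene in mirna_to_gene:
        mirna_targets[mirna].add(gene)

    circuits = {}
    for tf, mirna in tf_to_mirna:
        # Joint targets: intersection of the two regulators' target sets.
        joint = tf_targets.get(tf, set()) & mirna_targets.get(mirna, set())
        if joint:  # a loop needs at least one shared target
            circuits[(tf, mirna)] = sorted(joint)
    return circuits
```

    For example, with the hypothetical edges TF_A -> miR_x, TF_A -> {g1, g2} and miR_x -> {g2, g4}, the only circuit returned is (TF_A, miR_x) with joint target g2.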

  15. Systematic discovery of unannotated genes in 11 yeast species using a database of orthologous genomic segments

    LENUS (Irish Health Repository)

    OhEigeartaigh, Sean S

    2011-07-26

    Abstract Background In standard BLAST searches, no information other than the sequences of the query and the database entries is considered. However, in situations where two genes from different species have only borderline similarity in a BLAST search, the discovery that the genes are located within a region of conserved gene order (synteny) can provide additional evidence that they are orthologs. Thus, for interpreting borderline search results, it would be useful to know whether the syntenic context of a database hit is similar to that of the query. This principle has often been used in investigations of particular genes or genomic regions, but to our knowledge it has never been implemented systematically. Results We made use of the synteny information contained in the Yeast Gene Order Browser database for 11 yeast species to carry out a systematic search for protein-coding genes that were overlooked in the original annotations of one or more yeast genomes but which are syntenic with their orthologs. Such genes tend to have been overlooked because they are short, highly divergent, or contain introns. The key features of our software - called SearchDOGS - are that the database entries are classified into sets of genomic segments that are already known to be orthologous, and that very weak BLAST hits are retained for further analysis if their genomic location is similar to that of the query. Using SearchDOGS we identified 595 additional protein-coding genes among the 11 yeast species, including two new genes in Saccharomyces cerevisiae. We found additional genes for the mating pheromone a-factor in six species including Kluyveromyces lactis. Conclusions SearchDOGS has proven highly successful for identifying overlooked genes in the yeast genomes. We anticipate that our approach can be adapted for study of further groups of species, such as bacterial genomes. More generally, the concept of doing sequence similarity searches against databases to which external

  16. Integrating multiple genome annotation databases improves the interpretation of microarray gene expression data

    Directory of Open Access Journals (Sweden)

    Kennedy Breandan

    2010-01-01

    Full Text Available Abstract Background The Affymetrix GeneChip is a widely used gene expression profiling platform. Since the chips were originally designed, the genome databases and gene definitions have been considerably updated. Thus, more accurate interpretation of microarray data requires parallel updating of the specificity of GeneChip probes. We propose a new probe remapping protocol, using the zebrafish GeneChips as an example, by removing nonspecific probes, and grouping the probes into transcript level probe sets using an integrated zebrafish genome annotation. This genome annotation is based on combining transcript information from multiple databases. This new remapping protocol, especially the new genome annotation, is shown here to be an important factor in improving the interpretation of gene expression microarray data. Results Transcript data from the RefSeq, GenBank and Ensembl databases were downloaded from the UCSC genome browser, and integrated to generate a combined zebrafish genome annotation. Affymetrix probes were filtered and remapped according to the new annotation. The influence of transcript collection and gene definition methods was tested using two microarray data sets. Compared to remapping using a single database, this new remapping protocol results in up to 20% more probes being retained in the remapping, leading to approximately 1,000 more genes being detected. The differentially expressed gene lists are consequently increased by up to 30%. We are also able to detect up to three times more alternative splicing events. A small number of the bioinformatics predictions were confirmed using real-time PCR validation. Conclusions By combining gene definitions from multiple databases, it is possible to greatly increase the numbers of genes and splice variants that can be detected in microarray gene expression experiments.
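
    The two remapping steps named in this record (removing nonspecific probes, then grouping the remaining probes into transcript-level probe sets) can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' protocol: here a probe counts as specific when all of its matching transcripts belong to a single gene, and the `min_probes` threshold is a hypothetical parameter; the abstract does not give the actual criteria.

```python
from collections import defaultdict


def remap_probes(probe_hits, transcript_gene, min_probes=3):
    """Group specific probes into transcript-level probe sets.

    probe_hits:      dict probe_id -> list of transcript ids the probe matches
    transcript_gene: dict transcript id -> gene id (the combined annotation)
    min_probes:      hypothetical minimum probe-set size (an assumption)
    """
    probe_sets = defaultdict(list)
    for probe, transcripts in probe_hits.items():
        annotated = [t for t in transcripts if t in transcript_gene]
        genes = {transcript_gene[t] for t in annotated}
        if len(genes) != 1:
            # Zero genes: probe is unmapped. Several genes: probe is nonspecific.
            continue
        for t in annotated:
            probe_sets[t].append(probe)
    # Keep only transcripts that retain enough probes.
    return {t: sorted(p) for t, p in probe_sets.items() if len(p) >= min_probes}
```

    A probe that matches transcripts of two different genes is dropped, while a probe that matches two transcripts of the same gene contributes to both transcript-level sets.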

  17. Semantically enabling a genome-wide association study database

    Directory of Open Access Journals (Sweden)

    Beck Tim

    2012-12-01

    Full Text Available Abstract Background The amount of data generated from genome-wide association studies (GWAS) has grown rapidly, but considerations for GWAS phenotype data reuse and interchange have not kept pace. This impacts on the work of GWAS Central – a free and open access resource for the advanced querying and comparison of summary-level genetic association data. The benefits of employing ontologies for standardising and structuring data are widely accepted. The complex spectrum of observed human phenotypes (and traits), and the requirement for cross-species phenotype comparisons, calls for reflection on the most appropriate solution for the organisation of human phenotype data. The Semantic Web provides standards for the possibility of further integration of GWAS data and the ability to contribute to the web of Linked Data. Results A pragmatic consideration when applying phenotype ontologies to GWAS data is the ability to retrieve all data, at the most granular level possible, from querying a single ontology graph. We found the Medical Subject Headings (MeSH) terminology suitable for describing all traits (diseases and medical signs and symptoms) at various levels of granularity and the Human Phenotype Ontology (HPO) most suitable for describing phenotypic abnormalities (medical signs and symptoms) at the most granular level. Diseases within MeSH are mapped to HPO to infer the phenotypic abnormalities associated with diseases. Building on the rich semantic phenotype annotation layer, we are able to make cross-species phenotype comparisons and publish a core subset of GWAS data as RDF nanopublications. Conclusions We present a methodology for applying phenotype annotations to a comprehensive genome-wide association dataset and for ensuring compatibility with the Semantic Web. The annotations are used to assist with cross-species genotype and phenotype comparisons. However, further processing and deconstructions of terms may be required to facilitate automatic

  18. ChickVD: a sequence variation database for the chicken genome

    DEFF Research Database (Denmark)

    Wang, Jing; He, Ximiao; Ruan, Jue

    2005-01-01

    Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DNA...... from domestic breeds. Using the Red Jungle Fowl genome sequence as a reference, we identified 3.1 million non-redundant DNA sequence variants. To facilitate the application of our data to avian genetics and to provide a foundation for functional and evolutionary studies, we created the 'Chicken...... Variation Database' (ChickVD). A graphical MapView shows variants mapped onto the chicken genome in the context of gene annotations and other features, including genetic markers, trait loci, cDNAs, chicken orthologs of human disease genes and raw sequence traces. ChickVD also stores information...

  19. Characteristics of the mouse genomic histamine H1 receptor gene

    Energy Technology Data Exchange (ETDEWEB)

    Inoue, Isao; Taniuchi, Ichiro; Kitamura, Daisuke [Kyushu Univ., Fukuoka (Japan)] [and others

    1996-08-15

    We report here the molecular cloning of a mouse histamine H1 receptor gene. The protein deduced from the nucleotide sequence is composed of 488 amino acid residues with characteristic properties of GTP binding protein-coupled receptors. Our results suggest that the mouse histamine H1 receptor gene is a single locus, and no related sequences were detected. Interspecific backcross analysis indicated that the mouse histamine H1 receptor gene (Hrh1) is located in the central region of mouse Chromosome 6 linked to microphthalmia (Mitfmi), ras-related fibrosarcoma oncogene 1 (Raf1), and ret proto-oncogene (Ret) in a region of homology with human chromosome 3p. 12 refs., 3 figs.

  20. Generation of an Oocyte-Specific Cas9 Transgenic Mouse for Genome Editing.

    Directory of Open Access Journals (Sweden)

    Linlin Zhang

    Full Text Available The CRISPR/Cas9 system has been developed as an easy-to-handle and multiplexable approach for engineering eukaryotic genomes by zygote microinjection of Cas9 and sgRNA, but preparing Cas9 for microinjection is laborious and introduces inconsistency into the experiment. Here, we describe a modified strategy for gene targeting using an oocyte-specific Cas9 transgenic mouse. With this mouse line, we successfully achieve precise gene targeting by injection of sgRNAs only into one-cell-stage embryos. Through comprehensive analysis, we also show that the allele complexity and off-target mutagenesis induced by this strategy are clearly lower than those induced by Cas9 mRNA/sgRNA injection. Thus, injection of sgRNAs into oocyte-specific Cas9 transgenic mouse embryos provides a convenient, efficient and reliable approach for mouse genome editing.

  1. LegumeIP: an integrative database for comparative genomics and transcriptomics of model legumes.

    Science.gov (United States)

    Li, Jun; Dai, Xinbin; Liu, Tingsong; Zhao, Patrick Xuechun

    2012-01-01

    Legumes play a vital role in maintaining the nitrogen cycle of the biosphere. They conduct symbiotic nitrogen fixation through endosymbiotic relationships with bacteria in root nodules. However, this and other characteristics of legumes, including mycorrhization, compound leaf development and profuse secondary metabolism, are absent in the typical model plant Arabidopsis thaliana. We present LegumeIP (http://plantgrn.noble.org/LegumeIP/), an integrative database for comparative genomics and transcriptomics of model legumes, for studying gene function and genome evolution in legumes. LegumeIP compiles gene and gene family information, syntenic and phylogenetic context and tissue-specific transcriptomic profiles. The database holds the genomic sequences of three model legumes, Medicago truncatula, Glycine max and Lotus japonicus plus two reference plant species, A. thaliana and Populus trichocarpa, with annotations based on UniProt, InterProScan, Gene Ontology and the Kyoto Encyclopedia of Genes and Genomes databases. LegumeIP also contains large-scale microarray and RNA-Seq-based gene expression data. Our new database is capable of systematic synteny analysis across M. truncatula, G. max, L. japonicus and A. thaliana, as well as construction and phylogenetic analysis of gene families across the five hosted species. Finally, LegumeIP provides comprehensive search and visualization tools that enable flexible queries based on gene annotation, gene family, synteny and relative gene expression.

  2. Comparative analysis of genome maintenance genes in naked mole rat, mouse, and human

    Science.gov (United States)

    MacRae, Sheila L; Zhang, Quanwei; Lemetre, Christophe; Seim, Inge; Calder, Robert B; Hoeijmakers, Jan; Suh, Yousin; Gladyshev, Vadim N; Seluanov, Andrei; Gorbunova, Vera; Vijg, Jan; Zhang, Zhengdong D

    2015-01-01

    Genome maintenance (GM) is an essential defense system against aging and cancer, as both are characterized by increased genome instability. Here, we compared the copy number variation and mutation rate of 518 GM-associated genes in the naked mole rat (NMR), mouse, and human genomes. GM genes appeared to be strongly conserved, with copy number variation in only four genes. Interestingly, we found NMR to have a higher copy number of CEBPG, a regulator of DNA repair, and TINF2, a protector of telomere integrity. NMR, as well as human, was also found to have a lower rate of germline nucleotide substitution than the mouse. Together, the data suggest that the long-lived NMR, as well as human, has more robust GM than mouse and identifies new targets for the analysis of the exceptional longevity of the NMR. PMID:25645816

  3. VitisExpDB: A database resource for grape functional genomics

    Directory of Open Access Journals (Sweden)

    Walker M Andrew

    2008-02-01

    Full Text Available Abstract Background The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. Description VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores ~320,000 EST sequences derived from 8 species/hybrids and their annotation (BLAST top match) details, together with a Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of a non-redundant set of ~20,000 ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. Conclusion The developed database provides a genomic resource to the grape community for functional analysis of genes in the collection and for the grape genome annotation and gene function identification. The VitisExpDB database is available through our website http://cropdisease.ars.usda.gov/vitis_at/main-page.htm.

  4. Sentra : a database of signal transduction proteins for comparative genome analysis.

    Energy Technology Data Exchange (ETDEWEB)

    D'Souza, M.; Glass, E. M.; Syed, M. H.; Zhang, Y.; Rodriguez, A.; Maltsev, N.; Galperin, M. Y.; Mathematics and Computer Science; Univ. of Chicago; NIH

    2007-01-01

    Sentra (http://compbio.mcs.anl.gov/sentra), a database of signal transduction proteins encoded in completely sequenced prokaryotic genomes, has been updated to reflect recent advances in understanding signal transduction events on a whole-genome scale. Sentra consists of two principal components, a manually curated list of signal transduction proteins in 202 completely sequenced prokaryotic genomes and an automatically generated listing of predicted signaling proteins in 235 sequenced genomes that are awaiting manual curation. In addition to two-component histidine kinases and response regulators, the database now lists manually curated Ser/Thr/Tyr protein kinases and protein phosphatases, as well as adenylate and diguanylate cyclases and c-di-GMP phosphodiesterases, as defined in several recent reviews. All entries in Sentra are extensively annotated with relevant information from public databases (e.g. UniProt, KEGG, PDB and NCBI). Sentra's infrastructure was redesigned to support interactive cross-genome comparisons of signal transduction capabilities of prokaryotic organisms from a taxonomic and phenotypic perspective and in the framework of signal transduction pathways from KEGG. Sentra leverages the PUMA2 system to support interactive analysis and annotation of signal transduction proteins by the users.

  5. RadishBase: a database for genomics and genetics of radish.

    Science.gov (United States)

    Shen, Di; Sun, Honghe; Huang, Mingyun; Zheng, Yi; Li, Xixiang; Fei, Zhangjun

    2013-02-01

    Radish is an economically important vegetable crop. During the past several years, large-scale genomics and genetics resources have been accumulated for this species. To store, query, analyze and integrate these radish resources efficiently, we have developed RadishBase (http://bioinfo.bti.cornell.edu/radish), a genomics and genetics database of radish. Currently the database contains radish mitochondrial genome sequences, expressed sequence tag (EST) and unigene sequences and annotations, biochemical pathways, EST-derived single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers, and genetic maps. RadishBase is designed to enable users easily to retrieve and visualize biologically important information through a set of efficient query interfaces and analysis tools, including the BLAST search and unigene annotation query interfaces, and tools to classify unigenes functionally, to identify enriched gene ontology (GO) terms and to visualize genetic maps. A database containing radish pathways predicted from unigene sequences is also included in RadishBase. The tools and interfaces in RadishBase allow efficient mining of recently released and continually expanding large-scale radish genomics and genetics data sets, including the radish genome sequences and RNA-seq data sets.

  6. PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes.

    Science.gov (United States)

    Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A

    2011-01-01

    PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are tandem repeats of nucleotide motifs 1-6 bp in size and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin the phase variations and antigenic variations seen in some bacteria. Although SSR-mediated phase variation and antigenic variation have been well studied in some bacteria, many other species of prokaryotes have yet to be investigated for SSR-mediated adaptive and other evolutionary advantages. As a part of our ongoing studies on SSR polymorphism in prokaryotes, we compared the genome sequences of various strains and isolates available for 85 different species of prokaryotes, extracted a number of SSRs showing length variations and created a relational database called PSSRdb. This database gives useful information such as the location of PSSRs in genomes, length variation across genomes, the regions harboring PSSRs, etc. The information provided in this database is very useful for further research and analysis of SSRs in prokaryotes.
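
    The SSR definition given in this record (tandem repeats of motifs 1-6 bp long) can be made concrete with a short Python sketch based on a regular expression with a backreference. It is an illustration only, not the extraction method used to build PSSRdb; the function name and the `min_repeats` threshold are assumptions, and comparing repeat lengths across strains would additionally require matching loci between genomes, which is not shown.

```python
import re


def find_ssrs(sequence, min_repeats=3):
    """Return (start, motif, repeat_count) for perfect SSRs in a DNA sequence.

    A motif of 1-6 bases must occur at least `min_repeats` times in tandem.
    The lazy quantifier makes the regex try the shortest motif first, so a run
    such as ATATATAT is reported with motif AT rather than ATAT.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    ssrs = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        ssrs.append((match.start(), motif, repeats))  # 0-based start position
    return ssrs
```

    Running the function on the same locus from two strains and comparing the repeat counts is one simple way to flag a length-polymorphic SSR in a toy setting.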

  7. CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics.

    Directory of Open Access Journals (Sweden)

    Mohit Verma

    Full Text Available Chickpea is an important grain legume used as a rich source of protein in human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides the comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database features many tools for similarity search, functional annotation (putative function, PFAM domain and gene ontology) search and comparative gene expression analysis. The current release of CTDB (v2.0) hosts transcriptome datasets with high quality functional annotation from cultivated (desi and kabuli types) and wild chickpea. A catalog of transcription factor families and their expression profiles in chickpea are available in the database. The gene expression data have been integrated to study the expression profiles of chickpea transcripts in major tissues/organs and various stages of flower development. The utilities, such as similarity search, ortholog identification and comparative gene expression have also been implemented in the database to facilitate comparative genomic studies among different legumes and Arabidopsis. Furthermore, the CTDB represents a resource for the discovery of functional molecular markers (microsatellites and single nucleotide polymorphisms) between different chickpea types. We anticipate that integrated information content of this database will accelerate the functional and applied genomic research for improvement of chickpea. The CTDB web service is freely available at http://nipgr.res.in/ctdb.html.

  8. Integrated Database And Knowledge Base For Genomic Prospective Cohort Study In Tohoku Medical Megabank Toward Personalized Prevention And Medicine.

    Science.gov (United States)

    Ogishima, Soichi; Takai, Takako; Shimokawa, Kazuro; Nagaie, Satoshi; Tanaka, Hiroshi; Nakaya, Jun

    2015-01-01

    The Tohoku Medical Megabank project is a national project to revitalize the area of the Tohoku region struck by the Great East Japan Earthquake, and it has conducted a large-scale prospective genome-cohort study. Alongside this study, we have developed an integrated database and knowledge base, which will be a key database for realizing personalized prevention and medicine.

  9. Modelling human regulatory variation in mouse: finding the function in genome-wide association studies and whole-genome sequencing.

    Directory of Open Access Journals (Sweden)

    Jean-François Schmouth

    An increasing body of literature from genome-wide association studies and human whole-genome sequencing highlights the identification of large numbers of candidate regulatory variants of potential therapeutic interest in numerous diseases. Our relatively poor understanding of the functions of non-coding genomic sequence, and the slow and laborious process of experimental validation of the functional significance of human regulatory variants, limit our ability to fully benefit from this information in our efforts to comprehend human disease. Humanized mouse models (HuMMs), in which human genes are introduced into the mouse, suggest an approach to this problem. In the past, HuMMs have been used successfully to study human disease variants; e.g., the complex genetic condition arising from Down syndrome, common monogenic disorders such as Huntington disease and β-thalassemia, and cancer susceptibility genes such as BRCA1. In this commentary, we highlight a novel method for high-throughput single-copy site-specific generation of HuMMs entitled High-throughput Human Genes on the X Chromosome (HuGX). This method can be applied to most human genes for which a bacterial artificial chromosome (BAC) construct can be derived and a mouse-null allele exists. This strategy comprises (1) the use of recombineering technology to create a human variant-harbouring BAC, (2) knock-in of this BAC into the mouse genome using Hprt docking technology, and (3) allele comparison by interspecies complementation. We demonstrate the throughput of the HuGX method by generating a series of seven different alleles for the human NR2E1 gene at Hprt. In future challenges, we consider the current limitations of experimental approaches and call for a concerted effort by the genetics community, for both human and mouse, to solve the challenge of the functional analysis of human regulatory variation.

  10. PairWise Neighbours database: overlaps and spacers among prokaryote genomes

    Directory of Open Access Journals (Sweden)

    Garcia-Vallvé Santiago

    2009-06-01

    Background. Although prokaryotes live in a variety of habitats and differ in metabolic and genomic complexity, they have several genomic architectural features in common. Overlapping genes are a common feature of prokaryote genomes. Overlaps tend to be short because longer overlaps carry a higher risk of deleterious mutations. The spacers between genes also tend to be short because of the tendency to reduce non-coding DNA among prokaryotes. However, they must be long enough to maintain essential regulatory signals such as the Shine-Dalgarno (SD) sequence, which is responsible for efficient translation. Description. PairWise Neighbours is an interactive and intuitive database for retrieving information about the spacers and overlapping genes among bacterial and archaeal genomes. It contains 1,956,294 gene pairs from 678 fully sequenced prokaryote genomes and is freely available at the URL http://genomes.urv.cat/pwneigh. This database provides information about the overlaps and their conservation across species. Furthermore, it allows broad analysis of the intergenic regions, providing useful information such as the location and strength of the SD sequence. Conclusion. Some experiments and bioinformatic analyses rely on correct annotation of the initiation site. Therefore, a database that studies the overlaps and spacers among prokaryotes appears to be desirable. The PairWise Neighbours database permits reliability analysis of the overlapping structures and the study of SD presence and location among adjacent genes, which may help to check the annotation of initiation sites.
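
    The distinction between overlaps and spacers described above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the PairWise Neighbours implementation: the function name `neighbour_distances`, the tuple layout and the use of 1-based inclusive coordinates on a single linear replicon are assumptions, and strand and circularity are ignored.

```python
def neighbour_distances(genes):
    """Classify each pair of adjacent genes as an overlap or a spacer.

    genes: list of (name, start, end) tuples with 1-based inclusive
    coordinates on one replicon (hypothetical input format).
    Returns a list of (gene_a, gene_b, kind, length_in_bp).
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    pairs = []
    for gene_a, gene_b in zip(ordered, ordered[1:]):
        name_a, _, end_a = gene_a
        name_b, start_b, end_b = gene_b
        gap = start_b - end_a - 1
        if gap >= 0:
            # No shared bases: gap is the intergenic spacer length.
            pairs.append((name_a, name_b, "spacer", gap))
        else:
            # Shared bases run from start_b to the earlier of the two ends,
            # which also covers a gene nested inside its neighbour.
            shared = min(end_a, end_b) - start_b + 1
            pairs.append((name_a, name_b, "overlap", shared))
    return pairs
```

    For example, genes at 1-100 and 97-200 share four bases (97 to 100), while genes at 97-200 and 215-300 are separated by a 14 bp spacer.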

  11. Design and implementation of a database for Brucella melitensis genome annotation.

    Science.gov (United States)

    De Hertogh, Benoît; Lahlimi, Leïla; Lambert, Christophe; Letesson, Jean-Jacques; Depiereux, Eric

    2008-03-18

    The genome sequences of three Brucella biovars and of some species close to Brucella sp. have become available, leading to new relationship analysis. Moreover, the automatic genome annotation of the pathogenic bacterium Brucella melitensis has been manually corrected by a consortium of experts, leading to 899 modifications of start site predictions among the 3198 open reading frames (ORFs) examined. This new annotation, coupled with the results of automatic annotation tools applied to the complete genome sequence of B. melitensis (including BLASTs to 9 genomes close to Brucella), provides numerous data sets related to predicted functions, biochemical properties and phylogenetic comparisons. To make these results available, alphaPAGe, a functional auto-updatable database of the corrected genome sequence of B. melitensis, has been built using the entity-relationship (ER) approach and a multi-purpose database structure. A friendly graphical user interface has been designed, and users can retrieve different kinds of information through three levels of queries: (1) the basic search uses classical keywords or sequence identifiers; (2) the original advanced search engine allows users to combine numerous criteria with logical operators: (a) keywords (textual comparison) related to the pCDS's function, family domains and cellular localization; (b) physico-chemical characteristics (numerical comparison) such as isoelectric point or molecular weight and structural criteria such as the nucleic length or the number of transmembrane helices (TMH); (c) similarity scores with Escherichia coli and 10 species phylogenetically close to B. melitensis; (3) complex queries can be performed through an SQL field, which accepts any query that respects the database's structure. The database is publicly available through a Web server at the following url: http://www.fundp.ac.be/urbm/bioinfo/aPAGe.

  12. Automated whole-genome multiple alignment of rat, mouse, and human

    Energy Technology Data Exchange (ETDEWEB)

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  13. OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes.

    Science.gov (United States)

    Chetal, Kashish; Janga, Sarath Chandra

    2015-01-01

    Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons, i.e., codirectionally organized genes in prokaryotic genomes that share a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from having a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic, with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. The user interface is simple and easy to use in terms of visualization, downloading, and querying of data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics.

  14. VaProS: a database-integration approach for protein/genome information retrieval

    KAUST Repository

    Gojobori, Takashi

    2016-12-24

    Life science research now relies heavily on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein–protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that helps life science researchers retrieve experts' knowledge stored in the databases and build new hypotheses about their research target. This web application, named VaProS, emphasizes the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effect of a gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in a quest for the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.

  15. MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level.

    Science.gov (United States)

    Chiapello, Hélène; Gendrault, Annie; Caron, Christophe; Blum, Jérome; Petit, Marie-Agnès; El Karoui, Meriem

    2008-11-27

    The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level but their use and the biological interpretation of results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons. Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The strategy we developed for comparison allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Definition of these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons. The MOSAIC database now includes 493 pairwise comparisons and 35 multiple maximal comparisons representing 78 bacterial species. Genome conserved regions (backbones) and variable segments are presented in various formats for further analysis. A graphical interface allows visualization of aligned genomes and functional annotations. The MOSAIC database is available online at http://genome.jouy.inra.fr/mosaic.
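
    The backbone/variable partition described above can be pictured as an interval complement: once the conserved (backbone) segments of a genome are known, the variable segments are whatever lies between them. The Python sketch below shows only that bookkeeping step under stated assumptions. It is not the MOSAIC alignment strategy; the function name `variable_segments`, the 1-based inclusive coordinates and the treatment of the genome as linear are all illustrative choices.

```python
def variable_segments(genome_length, backbone):
    """Return the regions of a genome not covered by backbone segments.

    backbone: list of (start, end) intervals, 1-based inclusive, in any
    order and possibly overlapping (hypothetical input format).
    The genome is treated as linear; circular wrap-around is ignored.
    """
    variable = []
    cursor = 1  # first position not yet assigned to any segment
    for start, end in sorted(backbone):
        if start > cursor:
            # Uncovered stretch between the previous backbone and this one.
            variable.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= genome_length:
        variable.append((cursor, genome_length))
    return variable
```

    With a 1000 bp genome and backbone segments 101-400, 350-600 and 801-1000, the uncovered regions are 1-100 and 601-800.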

  16. Precision cancer mouse models through genome editing with CRISPR-Cas9

    OpenAIRE

    Mou, Haiwei; Kennedy, Zachary; Anderson, Daniel G.; Yin, Hao; Xue, Wen

    2015-01-01

    The cancer genome is highly complex, with hundreds of point mutations, translocations, and chromosome gains and losses per tumor. To understand the effects of these alterations, precise models are needed. Traditional approaches to the construction of mouse models are time-consuming and laborious, requiring manipulation of embryonic stem cells and multiple steps. The recent development of the clustered regularly interspersed short palindromic repeats (CRISPR)-Cas9 system, a powerful genome-edi...

  17. DNA methylation of Sleeping Beauty with transposition into the mouse genome.

    Science.gov (United States)

    Park, Chang Won; Kren, Betsy T; Largaespada, David A; Steer, Clifford J

    2005-08-01

    The Sleeping Beauty transposon is a recently developed non-viral vector that can mediate insertion of transgenes into the mammalian genome. Foreign DNA elements that are introduced tend to invoke a host-defense mechanism resulting in epigenetic changes, such as DNA methylation, which may induce transcriptional inactivation of mammalian genes. To assess potential epigenetic modifications associated with Sleeping Beauty transposition, we investigated the DNA methylation pattern of transgenes inserted into the mouse genome as well as genomic regions flanking the insertion sites with bisulfite-mediated genomic sequencing. Transgenic mouse lines were created with two different Sleeping Beauty transposons carrying either the Agouti or eGFP transgene. Our results showed that DNA methylation in the keratin-14 promoter and Agouti transgene were negligible. In addition, two different genomic loci flanking the Agouti insertion site exhibited patterns of DNA methylation similar to wild-type mice. In contrast, high levels of DNA methylation were observed in the eGFP transgene and its ROSA26 promoter. These results indicate that transposition via Sleeping Beauty into the mouse genome may result in a significant level of de novo DNA methylation. This may depend on a number of different factors including the cargo DNA sequence, chromosomal context of the insertion site, and/or host genetic background.

  18. PGSB/MIPS PlantsDB Database Framework for the Integration and Analysis of Plant Genome Data.

    Science.gov (United States)

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai; Gundlach, Heidrun; Mayer, Klaus F X

    2017-01-01

    Plant Genome and Systems Biology (PGSB), formerly Munich Institute for Protein Sequences (MIPS) PlantsDB, is a database framework for the integration and analysis of plant genome data, developed and maintained for more than a decade now. Major components of that framework are genome databases and analysis resources focusing on individual (reference) genomes providing flexible and intuitive access to data. Another main focus is the integration of genomes from both model and crop plants to form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny). Data exchange and integrated search functionality with/over many plant genome databases is provided within the transPLANT project.

  19. ProtRepeatsDB: a database of amino acid repeats in genomes

    Directory of Open Access Journals (Sweden)

    Chauhan Virander S

    2006-07-01

    Background. Genome-wide and cross-species comparison of amino acid repeats is an intriguing problem in biology, mainly due to the highly polymorphic nature and diverse functions of amino acid repeats. Innate protein repeats constitute vital functional and structural regions in proteins. Repeats are of great consequence in the evolution of proteins, as evident from analysis of repeats in different organisms. In the post-genomic era, the availability of protein sequences encoded in different genomes provides a unique opportunity to perform large-scale comparative studies of amino acid repeats. ProtRepeatsDB (http://bioinfo.icgeb.res.in/repeats/) is a relational database of perfect and mismatch repeats, access to which is designed as a resource and collection of tools for detection and cross-species comparison of different types of amino acid repeats. Description. ProtRepeatsDB (v1.2) consists of perfect as well as mismatch amino acid repeats in the protein sequences of 141 organisms, the genomes of which are now available. The web interface of ProtRepeatsDB consists of different tools to perform repeat queries based on protein IDs, organism name, repeat sequences, keywords as in FASTA headers, size, frequency, gene ontology (GO) annotation IDs and regular expressions (REGEXP) describing repeats. These tools also allow formulation of a variety of simple, complex and logical queries to facilitate mining and large-scale cross-species comparisons of amino acid repeats. In addition, the database also contains sequence analysis tools to determine repeats in user input sequences. Conclusion. ProtRepeatsDB is a multi-organism database of different types of amino acid repeats present in proteins. It integrates useful tools to perform genome-wide queries for rapid screening and identification of amino acid repeats and facilitates comparative and evolutionary studies of the repeats. The database is useful for identification of species or organism specific

  20. Unlimited Thirst for Genome Sequencing, Data Interpretation, and Database Usage in Genomic Era: The Road towards Fast-Track Crop Plant Improvement.

    Science.gov (United States)

    Dhanapal, Arun Prabhu; Govindaraj, Mahalingam

    2015-01-01

    The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next-generation sequencing methods. Databases have become an integral part of all aspects of scientific research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, has kept expanding in recent years. The databases and associated web portals provide, at a minimum, a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in the use of crop plant databases in the advancing genomic era. The use of databases for variation analysis with other comparative genomics tools and data interpretation platforms is well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement programs is still being achieved widely; otherwise, the end for sequencing is not far away.

  1. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    Science.gov (United States)

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches against the in-house SinEx DB of SEGs and (iii) an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl.

  2. SoyTEdb: a comprehensive database of transposable elements in the soybean genome

    Directory of Open Access Journals (Sweden)

    Zhu Liucun

    2010-02-01

    Background. Transposable elements are the most abundant components of all characterized genomes of higher eukaryotes. It has been documented that these elements not only contribute to the shaping and reshaping of their host genomes, but also play significant roles in regulating gene expression, altering gene function, and creating new genes. Thus, complete identification of transposable elements in sequenced genomes and construction of comprehensive transposable element databases are essential for accurate annotation of genes and other genomic components, for investigation of potential functional interaction between transposable elements and genes, and for study of genome evolution. The recent availability of the soybean genome sequence has provided an unprecedented opportunity for discovery, and structural and functional characterization, of transposable elements in this economically important legume crop. Description. Using a combination of structure-based and homology-based approaches, a total of 32,552 retrotransposons (Class I) and 6,029 DNA transposons (Class II) with clear boundaries and insertion sites were structurally annotated and clearly categorized, and a soybean transposable element database, SoyTEdb, was established. These transposable elements have been anchored in and integrated with the soybean physical map and genetic map, and are browsable and visualizable at any scale along the 20 soybean chromosomes, along with predicted genes and other sequence annotations. BLAST search and other infrastructure tools were implemented to facilitate annotation of transposable elements or fragments from soybean and other related legume species. The majority (> 95%) of these elements (particularly a few hundred low-copy-number families) are first described in this study. Conclusion. SoyTEdb provides resources and information related to transposable elements in the soybean genome, representing the most comprehensive and the largest manually

  3. Genomic structure and refined chromosomal localization of the mouse Ptch2 gene.

    Science.gov (United States)

    Fröhlich, L; Liu, Z; Beier, D R; Lanske, B

    2002-01-01

    The vertebrate Patched 2 (Ptch2) gene encodes a putative membrane-embedded protein which may have roles in Hedgehog signaling during development and in tumorigenesis. We determined the genomic structure of the mouse Ptch2 gene and show that Ptch2 is composed of 22 exons spanning approximately 18 kb of genomic DNA. The exon-intron boundaries were found to be conserved within the human and mouse Ptch2 genes. Analysis of the 5' flanking region revealed a CpG island, the putative promoter region and the transcriptional start site while a polyadenylation signal as well as a mRNA destabilizing motif were identified in the 3' flanking region. Single-strand conformation polymorphism analysis was used to map mouse Ptch2 to chromosome 4 between the microsatellite markers D4Mit20 and D4Mit334.

  4. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Background. Genes are not randomly distributed on a chromosome as was once thought, and this holds even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and was recently reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results. Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance even after removal of tandem repeats. Conclusion. Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.
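
    The comparison "higher than expected by chance" in the abstract above can be illustrated with a simple label-shuffling test. The Python sketch below is one generic way to make such a comparison and is not the method used in the paper: the cluster definition (a run of at least two consecutive testis-specific genes in chromosomal order), the function names and the default parameters are all assumptions, and the removal of tandem repeats is not modelled.

```python
import random


def count_clusters(flags, min_size=2):
    """Count runs of at least min_size consecutive True values.

    flags: one boolean per gene in chromosomal order, True when the gene
    is tissue-specific (hypothetical input format).
    """
    clusters = 0
    run = 0
    for flag in flags:
        if flag:
            run += 1
        else:
            if run >= min_size:
                clusters += 1
            run = 0
    if run >= min_size:
        clusters += 1
    return clusters


def permutation_p_value(flags, n_permutations=1000, seed=0):
    """Estimate how often shuffled gene labels give at least as many clusters."""
    rng = random.Random(seed)
    observed = count_clusters(flags)
    shuffled = list(flags)
    at_least = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if count_clusters(shuffled) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_permutations + 1)
```

    A small p-value from `permutation_p_value` would indicate more clusters than random placement of the same number of tissue-specific genes tends to produce, under the simplified cluster definition assumed here.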

  5. The Alternaria genomes database: a comprehensive resource for a fungal genus comprised of saprophytes, plant pathogens, and allergenic species.

    Science.gov (United States)

    Dang, Ha X; Pryor, Barry; Peever, Tobin; Lawrence, Christopher B

    2015-03-25

    Alternaria is considered one of the most common saprophytic fungal genera on the planet. It is comprised of many species that exhibit a necrotrophic phytopathogenic lifestyle. Several species are clinically associated with allergic respiratory disorders although rarely found to cause invasive infections in humans. Finally, Alternaria spp. are among the most well known producers of diverse fungal secondary metabolites, especially toxins. We have recently sequenced and annotated the genomes of 25 Alternaria spp. including but not limited to many necrotrophic plant pathogens such as A. brassicicola (a pathogen of Brassicaceous crops like cabbage and canola) and A. solani (a major pathogen of Solanaceous plants like potato and tomato), and several saprophytes that cause allergy in human such as A. alternata isolates. These genomes were annotated and compared. Multiple genetic differences were found in the context of plant and human pathogenicity, notably the pro-inflammatory potential of A. alternata. The Alternaria genomes database was built to provide a public platform to access the whole genome sequences, genome annotations, and comparative genomics data of these species. Genome annotation and comparison were performed using a pipeline that integrated multiple computational and comparative genomics tools. Alternaria genome sequences together with their annotation and comparison data were ported to Ensembl database schemas using a self-developed tool (EnsImport). Collectively, data are currently hosted using a customized installation of the Ensembl genome browser platform. Recent efforts in fungal genome sequencing have facilitated the studies of the molecular basis of fungal pathogenicity as a whole system. The Alternaria genomes database provides a comprehensive resource of genomics and comparative data of an important saprophytic and plant/human pathogenic fungal genus. The database will be updated regularly with new genomes when they become available. The

  6. Databases

    Data.gov (United States)

    National Aeronautics and Space Administration — The databases of computational and experimental data from the first Aeroelastic Prediction Workshop are located here. The databases file names tell their contents by...

  7. Highly Efficient Mouse Genome Editing by CRISPR Ribonucleoprotein Electroporation of Zygotes.

    Science.gov (United States)

    Chen, Sean; Lee, Benjamin; Lee, Angus Yiu-Fai; Modzelewski, Andrew J; He, Lin

    2016-07-08

    The CRISPR/Cas9 system has been employed to efficiently edit the genomes of diverse model organisms. CRISPR-mediated mouse genome editing is typically accomplished by microinjection of Cas9 DNA/RNA and single guide RNA (sgRNA) into zygotes to generate modified animals in one step. However, microinjection is a technically demanding, labor-intensive, and costly procedure with poor embryo viability. Here, we describe a simple and economic electroporation-based strategy to deliver Cas9/sgRNA ribonucleoproteins into mouse zygotes with 100% efficiency for in vivo genome editing. Our methodology, designated as CRISPR RNP Electroporation of Zygotes (CRISPR-EZ), enables highly efficient and high-throughput genome editing in vivo, with a significant improvement in embryo viability compared with microinjection. Using CRISPR-EZ, we generated a variety of editing schemes in mouse embryos, including indel (insertion/deletion) mutations, point mutations, large deletions, and small insertions. In a proof-of-principle experiment, we used CRISPR-EZ to target the tyrosinase (Tyr) gene, achieving 88% bi-allelic editing and 42% homology-directed repair-mediated precise sequence modification in live mice. Taken together, CRISPR-EZ is simple, economic, high throughput, and highly efficient with the potential to replace microinjection for in vivo genome editing in mice and possibly in other mammals.

  8. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    Science.gov (United States)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Recently, bioinformatics has become a new field of science, indispensable for the analysis of the millions of nucleic acid sequences currently deposited in international databases (public or private); these databases contain information on genes, RNA, ORFs, proteins, and intergenic regions, including the entire genomes of some species. The analysis of this information requires computer programs, which have been renewed through the use of new mathematical methods and the introduction of artificial intelligence, in addition to the constant creation of supercomputing units built to withstand the heavy workload of sequence analysis. However, innovation is still needed in platforms that allow faster and more effective genomic analyses, with a technological understanding of all biological processes.

  9. Modeling chromosomes in mouse to explore the function of genes, genomic disorders, and chromosomal organization.

    Directory of Open Access Journals (Sweden)

    Véronique Brault

    2006-07-01

    One of the challenges of genomic research after the completion of the human genome project is to assign a function to all the genes and to understand their interactions and organizations. Among the various techniques, the emergence of chromosome engineering tools with the aim to manipulate large genomic regions in the mouse model offers a powerful way to accelerate the discovery of gene functions and provides more mouse models to study normal and pathological developmental processes associated with aneuploidy. The combination of gene targeting in ES cells, recombinase technology, and other techniques makes it possible to generate new chromosomes carrying specific and defined deletions, duplications, inversions, and translocations that are accelerating functional analysis. This review presents the current status of chromosome engineering techniques and discusses the different applications as well as the implication of these new techniques in future research to better understand the function of chromosomal organization and structures.

  10. Specific amplification by PCR of rearranged genomic variable regions of immunoglobulin genes from mouse hybridoma cells.

    Science.gov (United States)

    Berdoz, J; Monath, T P; Kraehenbuhl, J P

    1995-04-01

    We have designed a novel strategy for the isolation of the rearranged genomic fragments encoding the L-VH-D-JH and L-V kappa/lambda-J kappa/lambda regions of mouse immunoglobulin genes. This strategy is based on the PCR amplification of genomic DNA from mouse hybridomas using multiple specific primers chosen in the 5'-untranslated region and in the intron downstream of the rearranged JH/J kappa/lambda sequences. Variable regions with intact coding sequences, including full-length leader peptides (L) can be obtained without previous DNA sequencing. Our strategy is based on a genomic template that produces fragments that do not need to be adapted for recombinant antibody expression, thus facilitating the generation of chimeric and isotype-switched immunoglobulins.

  11. The MiST2 database: a comprehensive genomics resource on microbial signal transduction

    OpenAIRE

    Ulrich, Luke E.; Igor B Zhulin

    2009-01-01

    The MiST2 database (http://mistdb.com) identifies and catalogs the repertoire of signal transduction proteins in microbial genomes. Signal transduction systems regulate the majority of cellular activities including the metabolism, development, host-recognition, biofilm production, virulence, and antibiotic resistance of human pathogens. Thus, knowledge of the proteins and interactions that comprise these communication networks is an essential component to furthering biomedical discovery. Thes...

  12. HGD-Chn: The Database of Genome Diversity and Variation for Chinese Populations.

    Science.gov (United States)

    Hong-Sheng, Gui; Peng, Zhou; Cheng-Bo, Yang; Sheng-Bin, Li

    2009-04-01

    The Database of Genome Diversity and Variation for Chinese Populations aims at a more efficient utilization and sharing of the valuable yet diminishing genetic resources in China (including sample information of healthy populations, healthy pedigrees, disease populations and disease pedigrees; genomic diversity data; disease-related allelic and haplotype data). The database is organized into two parts. (1) Genetic resources of healthy people: a variety of genetic markers (VNTR, STR, SNP, HLA, enzyme markers, etc.) are chosen for their diversity among populations, and their distribution among different ethnic groups in China is stored in the form of allelic frequencies. This also makes possible a further analysis and an overall description of the genetic structure of the Chinese population. (2) Disease genetic resources: four categories are mainly covered, namely chromosomal diseases, monogenic diseases, polygenic diseases, and birth defects. For each kind of disease, a basic introduction and description, sample information, and allelic data of the related genes are included. Aside from research-oriented information, introductory courses for the general public covering genomic diversity and variation, the related experimental techniques, and standards and specifications can also be accessed on our website. Furthermore, a flexible query and submission system with user-friendly interfaces is integrated into the website to simplify user queries and the administrators' database maintenance work. Online data analysis and management tools have been developed using bioinformatics algorithms and programming languages for a better interpretation of the biological data.

  13. EcoGene: a genome sequence database for Escherichia coli K-12.

    Science.gov (United States)

    Rudd, K E

    2000-01-01

    The EcoGene database provides a set of gene and protein sequences derived from the genome sequence of Escherichia coli K-12. EcoGene is a source of re-annotated sequences for the SWISS-PROT and Colibri databases. EcoGene is used for genetic and physical map compilations in collaboration with the Coli Genetic Stock Center. The EcoGene12 release includes 4293 genes. EcoGene12 differs from the GenBank annotation of the complete genome sequence in several ways, including (i) the revision of 706 predicted or confirmed gene start sites, (ii) the correction or hypothetical reconstruction of 61 frame-shifts caused by either sequence error or mutation, (iii) the reconstruction of 14 protein sequences interrupted by the insertion of IS elements, and (iv) predictions that 92 genes are partially deleted gene fragments. A literature survey identified 717 proteins whose N-terminal amino acids have been verified by sequencing. 12 446 cross-references to 6835 literature citations and s are provided. EcoGene is accessible at a new website: http://bmb.med.miami.edu/EcoGene/EcoWeb. Users can search and retrieve individual EcoGene GenePages or they can download large datasets for incorporation into database management systems, facilitating various genome-scale computational and functional analyses.

  14. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    Science.gov (United States)

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively low attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we have developed an algorithm and software program GeneTack for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programed frameshifts (related to recoding events).
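
    To illustrate why a single indel disrupts a protein-coding gene, the sketch below splits a toy sequence into codons before and after deleting one base. The sequence and the helper name `codons` are invented for illustration; this is not the GeneTack algorithm, which predicts frameshift positions ab initio.

```python
def codons(seq: str) -> list[str]:
    """Split a nucleotide sequence into complete codons, starting at position 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

# Toy coding sequence (hypothetical, not taken from any genome).
reference = "ATGGCTGAAACCTGA"

# Simulate a single-base deletion at index 4.
mutant = reference[:4] + reference[5:]

ref_codons = codons(reference)  # reading frame intact, ends with stop codon TGA
mut_codons = codons(mutant)     # every codon after the deletion is altered
```

    Only the first codon survives the deletion; all downstream codons change and the original stop codon drops out of frame, which is the kind of disruption that frameshift prediction looks for.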

  15. Evaluation of relational and NoSQL database architectures to manage genomic annotations.

    Science.gov (United States)

    Schulz, Wade L; Nelson, Brent G; Felker, Donn K; Durant, Thomas J S; Torres, Richard

    2016-12-01

    While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences.
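
    The structural difference between the two storage models can be sketched with the Python standard library alone. Here `sqlite3` stands in for a relational engine and a JSON string stands in for a stored document; neither is the MySQL, MongoDB, or PostgreSQL setup evaluated in the study, the field names are hypothetical rather than the dbSNP schema, and nothing here measures performance.

```python
import json
import sqlite3

# Hypothetical annotation record; field names and values are illustrative only.
snp = {"rs_id": "rs0000001", "chrom": "1", "pos": 12345,
       "genes": ["GENE_A", "GENE_B"]}

# Relational layout: the one-to-many gene list needs its own table and a join.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE snp (rs_id TEXT PRIMARY KEY, chrom TEXT, pos INTEGER)")
db.execute("CREATE TABLE snp_gene (rs_id TEXT REFERENCES snp(rs_id), gene TEXT)")
db.execute("INSERT INTO snp VALUES (?, ?, ?)",
           (snp["rs_id"], snp["chrom"], snp["pos"]))
db.executemany("INSERT INTO snp_gene VALUES (?, ?)",
               [(snp["rs_id"], gene) for gene in snp["genes"]])
rows = db.execute(
    "SELECT s.rs_id, g.gene FROM snp s JOIN snp_gene g ON s.rs_id = g.rs_id "
    "WHERE s.chrom = ? ORDER BY g.gene", ("1",)
).fetchall()

# Document layout: the nested record is stored and retrieved as one unit.
document = json.dumps(snp)
restored = json.loads(document)
```

    The relational version returns one row per gene after a join, while the document version hands back the nested record whole. Which layout is faster in practice depends on the engine, indexes, and workload.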

  16. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    Directory of Open Access Journals (Sweden)

    James J Davis

    2016-02-01

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.
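
    The idea of assigning a function from k-mers can be sketched as a simple vote. Everything below is a toy under stated assumptions: the `signature` table, the choice of k = 3, and the majority-vote rule are invented for illustration and are not the RAST or PATRIC implementation; the subsequent MCL clustering step is not shown.

```python
from collections import Counter
from typing import Optional

def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping k-mers of a protein sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Hypothetical signature k-mers mapped to function labels (toy data).
signature = {"MKT": "kinase", "KTA": "kinase", "GGS": "transporter"}

def assign_function(seq: str, k: int = 3) -> Optional[str]:
    """Pick the function supported by the most signature k-mers, if any."""
    votes = Counter(signature[m] for m in kmers(seq, k) if m in signature)
    return votes.most_common(1)[0][0] if votes else None

label = assign_function("MKTAGGS")   # two kinase k-mers outvote one transporter k-mer
unknown = assign_function("AAAA")    # no signature k-mer present
```

    Because a lookup like this needs no alignment, proteins can be grouped by function label first, and only the members of each group then have to be compared with one another.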

  17. Human Ageing Genomic Resources: integrated databases and tools for the biology and genetics of ageing.

    Science.gov (United States)

    Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E; de Magalhães, João Pedro

    2013-01-01

    The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR features now several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while further continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology.

  18. Human Ageing Genomic Resources: Integrated databases and tools for the biology and genetics of ageing

    Science.gov (United States)

    Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E.; de Magalhães, João Pedro

    2013-01-01

    The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR features now several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while further continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology. PMID:23193293

  19. Genome-wide expression profiling of five mouse models identifies similarities and differences with human psoriasis.

    Directory of Open Access Journals (Sweden)

    William R Swindell

    Development of a suitable mouse model would facilitate the investigation of pathomechanisms underlying human psoriasis and would also assist in development of therapeutic treatments. However, while many psoriasis mouse models have been proposed, no single model recapitulates all features of the human disease, and standardized validation criteria for psoriasis mouse models have not been widely applied. In this study, whole-genome transcriptional profiling is used to compare gene expression patterns manifested by human psoriatic skin lesions with those that occur in five psoriasis mouse models (K5-Tie2, imiquimod, K14-AREG, K5-Stat3C and K5-TGFbeta1). While the cutaneous gene expression profiles associated with each mouse phenotype exhibited statistically significant similarity to the expression profile of psoriasis in humans, each model displayed distinctive sets of similarities and differences in comparison to human psoriasis. For all five models, correspondence to the human disease was strong with respect to genes involved in epidermal development and keratinization. Immune and inflammation-associated gene expression, in contrast, was more variable between models as compared to the human disease. These findings support the value of all five models as research tools, each with identifiable areas of convergence to and divergence from the human disease. Additionally, the approach used in this paper provides an objective and quantitative method for evaluation of proposed mouse models of psoriasis, which can be strategically applied in future studies to score strengths of mouse phenotypes relative to specific aspects of human psoriasis.

  20. ReplicationDomain: a visualization tool and comparative database for genome-wide replication timing data

    Directory of Open Access Journals (Sweden)

    Yokochi Tomoki

    2008-12-01

    Background: Eukaryotic DNA replication is regulated at the level of large chromosomal domains (0.5–5 megabases in mammals) within which replicons are activated relatively synchronously. These domains replicate in a specific temporal order during S-phase and our genome-wide analyses of replication timing have demonstrated that this temporal order of domain replication is a stable property of specific cell types. Results: We have developed ReplicationDomain (http://www.replicationdomain.org) as a web-based database for analysis of genome-wide replication timing maps (replication profiles) from various cell lines and species. This database also provides comparative information of transcriptional expression and is configured to display any genome-wide property (for instance, ChIP-Chip or ChIP-Seq data) via an interactive web interface. Our published microarray data sets are publicly available. Users may graphically display these data sets for a selected genomic region and download the data displayed as text files, or alternatively, download complete genome-wide data sets. Furthermore, we have implemented a user registration system that allows registered users to upload their own data sets. Upon uploading, registered users may choose to: (1) view their data sets privately without sharing; (2) share with other registered users; or (3) make their published or "in press" data sets publicly available, which can fulfill journal and funding agencies' requirements for data sharing. Conclusion: ReplicationDomain is a novel and powerful tool to facilitate the comparative visualization of replication timing in various cell types as well as other genome-wide chromatin features and is considerably faster and more convenient than existing browsers when viewing multi-megabase segments of chromosomes. Furthermore, the data upload function with the option of private viewing or sharing of data sets between registered users should be a valuable resource for the

  1. SymbioGenomesDB: a database for the integration and access to knowledge on host-symbiont relationships.

    Science.gov (United States)

    Reyes-Prieto, Mariana; Vargas-Chávez, Carlos; Latorre, Amparo; Moya, Andrés

    2015-01-01

    Symbiotic relationships occur naturally throughout the tree of life, either in a commensal, mutualistic or pathogenic manner. The genomes of multiple organisms involved in symbiosis are rapidly being sequenced and becoming available, especially those from the microbial world. Currently, there are numerous databases that offer information on specific organisms or models, but none offer a global understanding on relationships between organisms, their interactions and capabilities within their niche, as well as their role as part of a system, in this case, their role in symbiosis. We have developed the SymbioGenomesDB as a community database resource for laboratories which intend to investigate and use information on the genetics and the genomics of organisms involved in these relationships. The ultimate goal of SymbioGenomesDB is to host and support the growing and vast symbiotic-host relationship information, to uncover the genetic basis of such associations. SymbioGenomesDB maintains a comprehensive organization of information on genomes of symbionts from diverse hosts throughout the Tree of Life, including their sequences, their metadata and their genomic features. This catalog of relationships was generated using computational tools, custom R scripts and manual integration of data available in public literature. As a highly curated and comprehensive systems database, SymbioGenomesDB provides web access to all the information of symbiotic organisms, their features and links to the central database NCBI. Three different tools can be found within the database to explore symbiosis-related organisms, their genes and their genomes. Also, we offer an orthology search for one or multiple genes in one or multiple organisms within symbiotic relationships, and every table, graph and output file is downloadable and easy to parse for further analysis. The robust SymbioGenomesDB will be constantly updated to cope with all the data being generated and included in major

  2. Databases

    Directory of Open Access Journals (Sweden)

    Nick Ryan

    2004-01-01

    Databases are deeply embedded in archaeology, underpinning and supporting many aspects of the subject. However, as well as providing a means for storing, retrieving and modifying data, databases themselves must be a result of a detailed analysis and design process. This article looks at this process, and shows how the characteristics of data models affect the process of database design and implementation. The impact of the Internet on the development of databases is examined, and the article concludes with a discussion of a range of issues associated with the recording and management of archaeological data.

  3. Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data.

    Science.gov (United States)

    Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo

    2015-01-01

    Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.
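
    How ortholog links can act as a hub between otherwise separate annotations can be sketched with plain subject-predicate-object tuples. This is an in-memory toy, not RDF tooling or a SPARQL query: the predicates `member`, `taxon`, and `go_term`, the gene identifiers, and the placeholder term `GO:XXXXXXX` are all hypothetical and are not the OrthO vocabulary or MBGD data.

```python
# Toy triples (subject, predicate, object); every identifier is hypothetical.
triples = [
    ("group1", "member", "geneA_eco"),
    ("group1", "member", "geneA_bsu"),
    ("geneA_eco", "taxon", "Escherichia coli"),
    ("geneA_bsu", "taxon", "Bacillus subtilis"),
    ("geneA_eco", "go_term", "GO:XXXXXXX"),
]

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def annotations_via_orthologs(gene: str) -> set[str]:
    """Collect GO terms attached to any member of the gene's ortholog group(s)."""
    groups = {s for s, _, _ in match(p="member", o=gene)}
    members = {o for g in groups for _, _, o in match(s=g, p="member")}
    return {o for m in members for _, _, o in match(s=m, p="go_term")}

terms = annotations_via_orthologs("geneA_bsu")
```

    The gene `geneA_bsu` carries no GO term of its own, yet one is reachable through its ortholog group. A query against a real endpoint would express the same join as graph patterns.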

  4. ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes

    Energy Technology Data Exchange (ETDEWEB)

    Novichkov, Pavel S.; Ratnere, Igor; Wolf, Yuri I.; Koonin, Eugene V.; Dubchak, Inna

    2009-07-23

    The database of Alignable Tight Genomic Clusters (ATGCs) consists of closely related genomes of archaea and bacteria, and is a resource for research into prokaryotic microevolution. Construction of a data set with appropriate characteristics is a major hurdle for this type of study. With the current rate of genome sequencing, it is difficult to follow the progress of the field and to determine which of the available genome sets meet the requirements of a given research project, in particular, with respect to the minimum and maximum levels of similarity between the included genomes. Additionally, extraction of specific content, such as genomic alignments or families of orthologs, from a selected set of genomes is a complicated and time-consuming process. The database addresses these problems by providing an intuitive and efficient web interface to browse precomputed ATGCs, select appropriate ones and access ATGC-derived data such as multiple alignments of orthologous proteins, matrices of pairwise intergenomic distances based on genome-wide analysis of synonymous and nonsynonymous substitution rates and others. The ATGC database will be regularly updated following new releases of the NCBI RefSeq. The database is hosted by the Genomics Division at Lawrence Berkeley National Laboratory and is publicly available at http://atgc.lbl.gov.

  5. Update History of This Database - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Update History of This Database. 2014/10/10: PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods English archive site is opened. 2012/08/08: PGDBj Registered...

  6. Developing genomic knowledge bases and databases to support clinical management: current perspectives.

    Science.gov (United States)

    Huser, Vojtech; Sincan, Murat; Cimino, James J

    2014-01-01

    Personalized medicine, the ability to tailor diagnostic and treatment decisions for individual patients, is seen as the evolution of modern medicine. We characterize here the informatics resources available today or envisioned in the near future that can support clinical interpretation of genomic test results. We assume a clinical sequencing scenario (germline whole-exome sequencing) in which a clinical specialist, such as an endocrinologist, needs to tailor patient management decisions within his or her specialty (targeted findings) but relies on a genetic counselor to interpret off-target incidental findings. We characterize the genomic input data and list various types of knowledge bases that provide genomic knowledge for generating clinical decision support. We highlight the need for patient-level databases with detailed lifelong phenotype content in addition to genotype data and provide a list of recommendations for personalized medicine knowledge bases and databases. We conclude that no single knowledge base can currently support all aspects of personalized recommendations and that consolidation of several current resources into larger, more dynamic and collaborative knowledge bases may offer a future path forward.

  7. Tree shrew database (TreeshrewDB): a genomic knowledge base for the Chinese tree shrew.

    Science.gov (United States)

    Fan, Yu; Yu, Dandan; Yao, Yong-Gang

    2014-11-21

    The tree shrew (Tupaia belangeri) is a small mammal with a close relationship to primates and it has been proposed as an alternative experimental animal to primates in biomedical research. The recent release of a high-quality Chinese tree shrew genome enables more researchers to use this species as a model animal in their studies. With the aim of making access to an extensively annotated genome database straightforward and easy, we have created the Tree shrew Database (TreeshrewDB). This is a web-based platform that integrates the currently available data from the tree shrew genome, including an updated gene set, with a systematic functional annotation and an mRNA expression pattern. In addition, to assist with automatic gene sequence analysis, we have integrated the common programs Blast, Muscle, GBrowse, GeneWise and codeml into TreeshrewDB. We have also developed a pipeline for the analysis of positive selection. The user-friendly interface of TreeshrewDB, which is available at http://www.treeshrewdb.org, will undoubtedly help in many areas of biological research into the tree shrew.

  8. The Strategies WDK: a graphical search interface and web development kit for functional genomics databases.

    Science.gov (United States)

    Fischer, Steve; Aurrecoechea, Cristina; Brunk, Brian P; Gao, Xin; Harb, Omar S; Kraemer, Eileen T; Pennington, Cary; Treatman, Charles; Kissinger, Jessica C; Roos, David S; Stoeckert, Christian J

    2011-01-01

    Web sites associated with the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) have recently introduced a graphical user interface, the Strategies WDK, intended to make advanced searching and set and interval operations easy and accessible to all users. With a design guided by usability studies, the system helps motivate researchers to perform dynamic computational experiments and explore relationships across data sets. For example, PlasmoDB users seeking novel therapeutic targets may wish to locate putative enzymes that distinguish pathogens from their hosts, and that are expressed during appropriate developmental stages. When a researcher runs one of the approximately 100 searches available on the site, the search is presented as a first step in a strategy. The strategy is extended by running additional searches, which are combined with set operators (union, intersect or minus), or genomic interval operators (overlap, contains). A graphical display uses Venn diagrams to make the strategy's flow obvious. The interface facilitates interactive adjustment of the component searches with changes propagating forward through the strategy. Users may save their strategies, creating protocols that can be shared with colleagues. The strategy system has now been deployed on all EuPathDB databases, and successfully deployed by other projects. The Strategies WDK uses a configurable MVC architecture that is compatible with most genomics and biological warehouse databases, and is available for download at code.google.com/p/strategies-wdk. Database URL: www.eupathdb.org.
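
    The set and genomic interval operators named above can be sketched in a few lines. This is only an illustration of the operations themselves, not the Strategies WDK code: the gene IDs, the coordinates, and the helper names `overlaps` and `contains` are hypothetical, and intervals are assumed to be closed and on a single chromosome.

```python
# Hypothetical search results: gene ID -> (start, end) on one chromosome.
search_a = {"g1": (100, 200), "g2": (300, 400), "g3": (500, 600)}
search_b = {"g2": (300, 400), "g4": (150, 250)}

# Set operators combine two result lists by gene ID.
union = set(search_a) | set(search_b)
intersect = set(search_a) & set(search_b)
minus = set(search_a) - set(search_b)

def overlaps(x, y):
    """True if closed intervals x and y share at least one position."""
    return x[0] <= y[1] and y[0] <= x[1]

def contains(x, y):
    """True if interval x fully contains interval y."""
    return x[0] <= y[0] and y[1] <= x[1]

# Interval operator: genes from search A that overlap any gene from search B.
overlap_hits = {a for a, ia in search_a.items()
                if any(overlaps(ia, ib) for ib in search_b.values())}
```

    Each step of a strategy takes the result of the previous step as one operand, so chaining these operations reproduces the step-by-step refinement the interface displays.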

  9. Genome-wide assembly and analysis of alternative transcripts in mouse

    OpenAIRE

    Sharov, Alexei A; Dudekula, Dawood B.; Minoru S.H. Ko

    2005-01-01

    To build a mouse gene index with the most comprehensive coverage of alternative transcription/splicing (ATS), we developed an algorithm and a fully automated computational pipeline for transcript assembly from expressed sequences aligned to the genome. We identified 191,946 genomic loci, which included 27,497 protein-coding genes and 11,906 additional gene candidates (e.g., nonprotein-coding, but multiexon). Comparison of the resulting gene index with TIGR, UniGene, DoTS, and ESTGenes databas...

  10. Genomic imprinting is variably lost during reprogramming of mouse iPS cells

    OpenAIRE

    2013-01-01

    Derivation of induced pluripotent stem (iPS) cells is mainly an epigenetic reprogramming process. It is still quite controversial how genomic imprinting is reprogrammed in iPS cells. Thus, we derived multiple iPS clones from genetically identical mouse somatic cells. We found that parentally inherited imprint was variably lost among these iPS clones. Concurrent with the loss of DNA methylation imprint at the corresponding Snrpn and Peg3 imprinted regions, parental origin-specific expression o...

  11. Transcript copy number estimation using a mouse whole-genome oligonucleotide microarray

    OpenAIRE

    Carter, Mark G.; Sharov, Alexei A; VanBuren, Vincent; Dudekula, Dawood B.; Carmack, Condie E; Nelson, Charlie; Ko, Minoru SH

    2005-01-01

    The ability to quantitatively measure the expression of all genes in a given tissue or cell with a single assay is an exciting promise of gene-expression profiling technology. An in situ-synthesized 60-mer oligonucleotide microarray designed to detect transcripts from all mouse genes was validated, as well as a set of exogenous RNA controls derived from the yeast genome (made freely available without restriction), which allow quantitative estimation of absolute endogenous transcript abundance.
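
    One way exogenous controls of known abundance can support absolute quantification is a calibration curve. The sketch below fits a straight line in log-log space to spike-in controls and uses it to convert a measured signal into a copy number. The numbers are invented, and the log-linear relationship is an assumption made for illustration; the abstract does not state which model the authors used.

```python
import math

# Invented spike-in controls: (known copies per cell, measured signal).
spikes = [(1, 10.0), (10, 100.0), (100, 1000.0), (1000, 10000.0)]

xs = [math.log10(signal) for _, signal in spikes]
ys = [math.log10(copies) for copies, _ in spikes]
n = len(spikes)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Ordinary least-squares fit: log10(copies) = slope * log10(signal) + intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def estimate_copies(signal: float) -> float:
    """Convert a measured signal to an estimated copy number via the fit."""
    return 10 ** (slope * math.log10(signal) + intercept)

endogenous = estimate_copies(500.0)  # signal of a hypothetical endogenous transcript
```

    With real data the fit would not be exact, and estimates outside the range spanned by the controls should be treated with caution.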

  12. Transcript copy number estimation using a mouse whole-genome oligonucleotide microarray

    Science.gov (United States)

    Carter, Mark G; Sharov, Alexei A; VanBuren, Vincent; Dudekula, Dawood B; Carmack, Condie E; Nelson, Charlie; Ko, Minoru SH

    2005-01-01

    The ability to quantitatively measure the expression of all genes in a given tissue or cell with a single assay is an exciting promise of gene-expression profiling technology. An in situ-synthesized 60-mer oligonucleotide microarray designed to detect transcripts from all mouse genes was validated, as well as a set of exogenous RNA controls derived from the yeast genome (made freely available without restriction), which allow quantitative estimation of absolute endogenous transcript abundance. PMID:15998450

  13. Extensive Mobilome-Driven Genome Diversification in Mouse Gut-Associated Bacteroides vulgatus mpk.

    Science.gov (United States)

    Lange, Anna; Beier, Sina; Steimle, Alex; Autenrieth, Ingo B; Huson, Daniel H; Frick, Julia-Stefanie

    2016-04-25

    Like many other Bacteroides species, Bacteroides vulgatus strain mpk, a mouse fecal isolate which was shown to promote intestinal homeostasis, utilizes a variety of mobile elements for genome evolution. Based on sequences collected by Pacific Biosciences SMRT sequencing technology, we discuss the challenges of assembling and studying a bacterial genome of high plasticity. Additionally, we conducted comparative genomics comparing this commensal strain with the B. vulgatus type strain ATCC 8482 as well as multiple other Bacteroides and Parabacteroides strains to reveal the most important differences and identify the unique features of B. vulgatus mpk. The genome of B. vulgatus mpk harbors a large and diverse set of mobile element proteins compared with other sequenced Bacteroides strains. We found evidence of a number of different horizontal gene transfer events and a genome landscape that has been extensively altered by different mobilization events. A CRISPR/Cas system could be identified that provides a possible mechanism for preventing the integration of invading external DNA. We propose that the high genome plasticity and the introduced genome instabilities of B. vulgatus mpk arising from the various mobilization events might play an important role not only in its adaptation to the challenging intestinal environment in general, but also in its ability to interact with the gut microbiota. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  14. Exploring the utility of human DNA methylation arrays for profiling mouse genomic DNA.

    Science.gov (United States)

    Wong, Nicholas C; Ng, Jane; Hall, Nathan E; Lunke, Sebastian; Salmanidis, Marika; Brumatti, Gabriela; Ekert, Paul G; Craig, Jeffrey M; Saffery, Richard

    2013-07-01

    Illumina Infinium Human Methylation (HM) BeadChips are widely used for measuring genome-scale DNA methylation, particularly in relation to epigenome-wide association studies (EWAS). The methylation profile of human samples can be assessed accurately and reproducibly using the HM27 BeadChip (27,578 CpG sites) or its successor, the HM450 BeadChip (482,421 CpG sites). To date no mouse equivalent has been developed, greatly hindering the application of this methodology to the wide range of valuable murine models of disease and development currently in existence. We found that 1308 and 13,715 probes from the HM27 and HM450 BeadChips, respectively, uniquely matched the bisulfite-converted reference mouse genome (mm9). We demonstrate reproducible measurements of DNA methylation at these probes in a range of mouse tissue samples and in a murine cell line model of acute myeloid leukaemia. In the absence of a mouse counterpart, the Infinium Human Methylation BeadChip arrays have utility for methylation profiling in non-human species.

  15. Legal agreements and the governance of research commons: lessons from materials sharing in mouse genomics.

    Science.gov (United States)

    Mishra, Amrita; Bubela, Tania

    2014-04-01

    Omics research infrastructure such as databases and bio-repositories requires effective governance to support pre-competitive research. Governance includes the use of legal agreements, such as Material Transfer Agreements (MTAs). We analyze the use of such agreements in the mouse research commons, including by two large-scale resource development projects: the International Knockout Mouse Consortium (IKMC) and International Mouse Phenotyping Consortium (IMPC). We combine an analysis of legal agreements and semi-structured interviews with 87 members of the mouse model research community to examine legal agreements in four contexts: (1) between researchers; (2) deposit into repositories; (3) distribution by repositories; and (4) exchanges between repositories, especially those that are consortium members of the IKMC and IMPC. We conclude that legal agreements for the deposit and distribution of research reagents should be kept as simple and standard as possible, especially when minimal enforcement capacity and resources exist. Simple and standardized legal agreements reduce transactional bottlenecks and facilitate the creation of a vibrant and sustainable research commons, supported by repositories and databases.

  16. DNA Lossless Differential Compression Algorithm based on Similarity of Genomic Sequence Database

    CERN Document Server

    Afify, Heba; Wahed, Manal Abdel

    2011-01-01

    Modern biological science produces vast amounts of genomic sequence data. This is fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the associated techniques coming from information theory are often perceived as being of interest for data communication and storage. In recent years, substantial effort has gone into applying textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison of genomic databases. This paper presents a differential compression algorithm that is based on the production of difference sequences according to an op-code table in order to optimize the compression of homologous sequences in a dataset. Therefore, the stored data are composed of a reference sequence, the set of differences, and the locations of the differences, instead of storing each sequence individually. This algorithm does not require a priori knowledge about the statistics of the sequence set. The...
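
The idea of storing a reference sequence plus a list of differences and their locations can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the op-code table, so the single substitution op-code "S" and the function names `encode` and `decode` below are assumptions made for illustration, and the sketch only handles sequences of equal length.

```python
from typing import List, Tuple

# Hypothetical difference record: (op-code, position, base).
# Only a substitution op-code "S" is modelled here; the op-code table
# used in the paper is not described in the abstract.
Diff = Tuple[str, int, str]


def encode(reference: str, target: str) -> List[Diff]:
    """Record where an equal-length target differs from the reference."""
    if len(reference) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")
    return [
        ("S", i, t)
        for i, (r, t) in enumerate(zip(reference, target))
        if r != t
    ]


def decode(reference: str, diffs: List[Diff]) -> str:
    """Rebuild the target from the reference and its difference list."""
    seq = list(reference)
    for op, pos, base in diffs:
        if op != "S":
            raise ValueError(f"unknown op-code {op!r}")
        seq[pos] = base
    return "".join(seq)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    target = "ACGTTCGTAA"
    diffs = encode(reference, target)
    print(diffs)
    # Lossless round trip: the target is recovered exactly.
    assert decode(reference, diffs) == target
```

For closely related sequences the difference list is much shorter than the sequence itself, which is where the storage saving comes from; handling insertions and deletions would need additional op-codes.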

  17. Regulatory Features for Odorant Receptor Genes in the Mouse Genome.

    Science.gov (United States)

    Degl'Innocenti, Andrea; D'Errico, Anna

    2017-01-01

    The odorant receptor genes, seven transmembrane receptor genes constituting the largest mammalian gene multifamily, are expressed monogenically and monoallelically in each sensory neuron in the olfactory epithelium. This characteristic, often referred to as the one neuron-one receptor rule, is driven by mostly uncharacterized molecular dynamics, generally named odorant receptor gene choice. Much attention has been paid by the scientific community to the identification of sequences regulating the expression of odorant receptor genes within their loci, where related genes are usually arranged in genomic clusters. A number of studies identified transcription factor binding sites on odorant receptor promoter sequences. Similar binding sites were also found on a number of enhancers that regulate in cis their transcription, but have been proposed to form interchromosomal networks. Odorant receptor gene choice seems to occur via the local removal of strongly repressive epigenetic markings, put in place during the maturation of the sensory neuron on each odorant receptor locus. Here we review the fast-changing state of the art for the study of regulatory features for odorant receptor genes.

  18. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics.

    Science.gov (United States)

    Schoof, Heiko; Ernst, Rebecca; Nazarov, Vladimir; Pfeifer, Lukas; Mewes, Hans-Werner; Mayer, Klaus F X

    2004-01-01

    Arabidopsis thaliana is the most widely studied model plant. Functional genomics is intensively underway in many laboratories worldwide. Beyond the basic annotation of the primary sequence data, the annotated genetic elements of Arabidopsis must be linked to diverse biological data and higher order information such as metabolic or regulatory pathways. The MIPS Arabidopsis thaliana database MAtDB aims to provide a comprehensive resource for Arabidopsis as a genome model that serves as a primary reference for research in plants and is suitable for transfer of knowledge to other plants, especially crops. The genome sequence as a common backbone serves as a scaffold for the integration of data, while, in a complementary effort, these data are enhanced through the application of state-of-the-art bioinformatics tools. This information is visualized on a genome-wide and a gene-by-gene basis with access both for web users and applications. This report updates the information given in a previous report and provides an outlook on further developments. The MAtDB web interface can be accessed at http://mips.gsf.de/proj/thal/db.

  19. Research Update: The materials genome initiative: Data sharing and the impact of collaborative ab initio databases

    Science.gov (United States)

    Jain, Anubhav; Persson, Kristin A.; Ceder, Gerbrand

    2016-05-01

    Materials innovations enable new technological capabilities and drive major societal advancements but have historically required long and costly development cycles. The Materials Genome Initiative (MGI) aims to greatly reduce this time and cost. In this paper, we focus on data reuse in the MGI and, in particular, discuss the impact of three different computational databases based on density functional theory methods to the research community. We also discuss and provide recommendations on technical aspects of data reuse, outline remaining fundamental challenges, and present an outlook on the future of MGI's vision of data sharing.

  20. The genomic landscape shaped by selection on transposable elements across 18 mouse strains.

    Science.gov (United States)

    Nellåker, Christoffer; Keane, Thomas M; Yalcin, Binnaz; Wong, Kim; Agam, Avigail; Belgard, T Grant; Flint, Jonathan; Adams, David J; Frankel, Wayne N; Ponting, Chris P

    2012-06-15

    Transposable element (TE)-derived sequence dominates the landscape of mammalian genomes and can modulate gene function by dysregulating transcription and translation. Our current knowledge of TEs in laboratory mouse strains is limited primarily to those present in the C57BL/6J reference genome, with most mouse TEs being drawn from three distinct classes, namely short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and the endogenous retrovirus (ERV) superfamily. Despite their high prevalence, the different genomic and gene properties controlling whether TEs are preferentially purged from, or are retained by, genetic drift or positive selection in mammalian genomes remain poorly defined. Using whole genome sequencing data from 13 classical laboratory and 4 wild-derived mouse inbred strains, we developed a comprehensive catalogue of 103,798 polymorphic TE variants. We employ this extensive data set to characterize TE variants across the Mus lineage, and to infer neutral and selective processes that have acted over 2 million years. Our results indicate that the majority of TE variants are introduced through the male germline and that only a minority of TE variants exert detectable changes in gene expression. However, among genes with differential expression across the strains there are twice as many TE variants identified as being putative causal variants as expected. Most TE variants that cause gene expression changes appear to be purged rapidly by purifying selection. Our findings demonstrate that past TE insertions have often been highly deleterious, and help to prioritize TE variants according to their likely contribution to gene expression or phenotype variation.

  1. Integration of mouse and human genome-wide association data identifies KCNIP4 as an asthma gene

    NARCIS (Netherlands)

    Himes, Blanca E.; Sheppard, Keith; Berndt, Annerose; Leme, Adriana S.; Myers, Rachel A.; Gignoux, Christopher R.; Levin, Albert M.; Gauderman, W. James; Yang, James J.; Mathias, Rasika A.; Romieu, Isabelle; Torgerson, Dara G.; Roth, Lindsey A.; Huntsman, Scott; Eng, Celeste; Klanderman, Barbara; Ziniti, John; Senter-Sylvia, Jody; Szefler, Stanley J.; Lemanske, Robert F.; Zeiger, Robert S.; Strunk, Robert C.; Martinez, Fernando D.; Boushey, Homer; Chinchilli, Vernon M.; Israel, Elliot; Mauger, David; Koppelman, Gerard H.; Postma, Dirkje S.; Nieuwenhuis, Maartje A. E.; Vonk, Judith M.; Lima, John J.; Irvin, Charles G.; Peters, Stephen P.; Kubo, Michiaki; Tamari, Mayumi; Nakamura, Yusuke; Litonjua, Augusto A.; Tantisira, Kelan G.; Raby, Benjamin A.; Bleecker, Eugene R.; Meyers, Deborah A.; London, Stephanie J.; Barnes, Kathleen C.; Gilliland, Frank D.; Williams, L. Keoki; Burchard, Esteban G.; Nicolae, Dan L.; Ober, Carole; DeMeo, Dawn L.; Silverman, Edwin K.; Paigen, Beverly; Churchill, Gary; Shapiro, Steve D.; Weiss, Scott

    2013-01-01

    Asthma is a common chronic respiratory disease characterized by airway hyperresponsiveness (AHR). The genetics of asthma have been widely studied in mouse and human, and homologous genomic regions have been associated with mouse AHR and human asthma-related phenotypes. Our goal was to identify asthm

  2. 1-CMDb: A Curated Database of Genomic Variations of the One-Carbon Metabolism Pathway.

    Science.gov (United States)

    Bhat, Manoj K; Gadekar, Veerendra P; Jain, Aditya; Paul, Bobby; Rai, Padmalatha S; Satyamoorthy, Kapaettu

    2017-01-01

    The one-carbon metabolism pathway is vital in maintaining tissue homeostasis by driving the critical reactions of folate and methionine cycles. A myriad of genetic and epigenetic events mark the rate of reactions in a tissue-specific manner. Integration of these to predict and provide personalized health management requires robust computational tools that can process multiomics data. The DNA sequences that may determine the chain of biological events and the endpoint reactions within one-carbon metabolism genes remain to be comprehensively recorded. Hence, we designed the one-carbon metabolism database (1-CMDb) as a platform to interrogate its association with a host of human disorders. DNA sequence and network information of a total of 48 genes were extracted from a literature survey and KEGG pathway that are involved in the one-carbon folate-mediated pathway. The information generated, collected, and compiled for all these genes from the UCSC genome browser included the single nucleotide polymorphisms (SNPs), CpGs, copy number variations (CNVs), and miRNAs, and a comprehensive database was created. Furthermore, a significant correlation analysis was performed for SNPs in the pathway genes. Detailed data of SNPs, CNVs, CpG islands, and miRNAs for 48 folate pathway genes were compiled. The SNPs in CNVs (9670), CpGs (984), and miRNAs (14) were also compiled for all pathway genes. The SIFT score, the prediction and PolyPhen score, as well as the prediction for each of the SNPs were tabulated and represented for folate pathway genes. Also included in the database for folate pathway genes were the links to 124 various phenotypes and disease associations as reported in the literature and from publicly available information. 
    A comprehensive database was generated consisting of genomic elements within and among SNPs, CNVs, CpGs, and miRNAs of one-carbon metabolism pathways to facilitate (a) single source of information and (b) integration into large-genome scale network

  3. Translating human genetics into mouse: the impact of ultra-rapid in vivo genome editing.

    Science.gov (United States)

    Aida, Tomomi; Imahashi, Risa; Tanaka, Kohichi

    2014-01-01

    Gene-targeted mutant animals, such as knockout or knockin mice, have dramatically improved our understanding of the functions of genes in vivo and the genetic diversity that characterizes health and disease. However, the generation of targeted mice relies on gene targeting in embryonic stem (ES) cells, which is a time-consuming, laborious, and expensive process. The recent groundbreaking development of several genome editing technologies has enabled the targeted alteration of almost any sequence in any cell or organism. These technologies have now been applied to mouse zygotes (in vivo genome editing), thereby providing new avenues for simple, convenient, and ultra-rapid production of knockout or knockin mice without the need for ES cells. Here, we review recent achievements in the production of gene-targeted mice by in vivo genome editing. © 2013 The Authors Development, Growth & Differentiation © 2013 Japanese Society of Developmental Biologists.

  4. CRISPR/Cas9-Mediated Genome Editing of Mouse Small Intestinal Organoids.

    Science.gov (United States)

    Schwank, Gerald; Clevers, Hans

    2016-01-01

    The CRISPR/Cas9 system is an RNA-guided genome-editing tool that has been recently developed based on the bacterial CRISPR-Cas immune defense system. Due to its versatility and simplicity, it rapidly became the method of choice for genome editing in various biological systems, including mammalian cells. Here we describe a protocol for CRISPR/Cas9-mediated genome editing in murine small intestinal organoids, a culture system in which somatic stem cells are maintained by self-renewal, while giving rise to all major cell types of the intestinal epithelium. This protocol allows the study of gene function in intestinal epithelial homeostasis and pathophysiology and can be extended to epithelial organoids derived from other internal mouse and human organs.

  5. Enhancement of microhomology-mediated genomic rearrangements by transient loss of mouse Bloom syndrome helicase.

    Science.gov (United States)

    Yamanishi, Ayako; Yusa, Kosuke; Horie, Kyoji; Tokunaga, Masahiro; Kusano, Kohji; Kokubu, Chikara; Takeda, Junji

    2013-09-01

    Bloom syndrome, an autosomal recessive disorder of the BLM gene, confers predisposition to a broad spectrum of early-onset cancers in multiple tissue types. Loss of genomic integrity is a primary hallmark of such human malignancies, but many studies using disease-affected specimens are limited in that they are retrospective and devoid of an appropriate experimental control. To overcome this, we devised an experimental system to recapitulate the early molecular events in genetically engineered mouse embryonic stem cells, in which cells undergoing loss of heterozygosity (LOH) can be enriched after inducible down-regulation of Blm expression, with or without site-directed DNA double-strand break (DSB) induction. Transient loss of BLM increased the rate of LOH, whose breakpoints were distributed along the chromosome. Combined with site-directed DSB induction, loss of BLM synergistically increased the rate of LOH and concentrated the breakpoints around the targeted chromosomal region. We characterized the LOH events using specifically tailored genomic tools, such as high-resolution array comparative genomic hybridization and high-density single nucleotide polymorphism genotyping, revealing that the combination of BLM suppression and DSB induction enhanced genomic rearrangements, including deletions and insertions, whose breakpoints were clustered in genomic inverted repeats and associated with junctional microhomologies. Our experimental approach successfully uncovered the detailed molecular mechanisms of as-yet-uncharacterized loss of heterozygosities and reveals the significant contribution of microhomology-mediated genomic rearrangements, which could be widely applicable to the early steps of cancer formation in general.

  6. Comparative genome mapping of the deer mouse (Peromyscus maniculatus) reveals greater similarity to rat (Rattus norvegicus) than to the lab mouse (Mus musculus)

    Directory of Open Access Journals (Sweden)

    O'Neill Rachel J

    2008-02-01

    Background: Deer mice (Peromyscus maniculatus) and congeneric species are the most common North American mammals. They represent an emerging system for the genetic analyses of the physiological and behavioral bases of habitat adaptation. Phylogenetic evidence suggests a much more ancient divergence of Peromyscus from laboratory mice (Mus) and rats (Rattus) than that separating the latter two. Nevertheless, early karyotypic analyses of the three groups suggest that Peromyscus exhibits greater similarities with Rattus than with Mus. Results: Comparative linkage mapping of an estimated 35% of the deer mouse genome was done with respect to the Rattus and Mus genomes. We particularly focused on regions that span synteny breakpoint regions between the rat and mouse genomes. The linkage analysis revealed the Peromyscus genome to have a higher degree of synteny and gene order conservation with the Rattus genome. Conclusion: These data suggest that (1) the Rattus and Peromyscus genomes more closely represent ancestral Muroid and rodent genomes than that of Mus; (2) the high level of genome rearrangement observed in Muroid rodents is especially pronounced in Mus; and (3) evolution of genome organization can operate independently of more commonly assayed measures of genetic change (e.g. SNP frequency).

  7. SNPpy--database management for SNP data from genome wide association studies.

    Directory of Open Access Journals (Sweden)

    Faheem Mitha

    BACKGROUND: We describe SNPpy, a hybrid script database system using the Python SQLAlchemy library coupled with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). This system makes it possible to merge study data with HapMap data and merge across studies for meta-analyses, including data filtering based on the values of phenotype and Single-Nucleotide Polymorphism (SNP) data. SNPpy and its dependencies are open source software. RESULTS: The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS studies and the HapMap Project. We then export these individual datasets to standard data format files that can be imported into statistical software for downstream analyses. CONCLUSIONS: By leveraging the power of relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS studies. The analysis of these studies requires merging across GWAS datasets as well as patient and marker selection. To this end, SNPpy enables the user to filter the data and output the results as standardized GWAS file formats. It performs low-level and flexible data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who seek to deploy central management of their GWAS data.
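
The general pattern of keeping genotypes in relational tables and filtering them by phenotype and marker can be sketched as follows. This is an illustration only and does not reproduce SNPpy's schema or API: the table names, column names, marker identifiers, and the helper functions `build_demo_db` and `genotypes_for` are all hypothetical, and the standard-library `sqlite3` module stands in for the SQLAlchemy and PostgreSQL stack named in the abstract so that the sketch runs without extra dependencies.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical three-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subject (id INTEGER PRIMARY KEY, study TEXT, phenotype TEXT);
        CREATE TABLE snp (id INTEGER PRIMARY KEY, marker TEXT UNIQUE, chrom TEXT);
        CREATE TABLE genotype (
            subject_id INTEGER REFERENCES subject(id),
            snp_id INTEGER REFERENCES snp(id),
            allele_call TEXT,
            PRIMARY KEY (subject_id, snp_id)
        );
        """
    )
    # Made-up records from two hypothetical studies.
    conn.executemany(
        "INSERT INTO subject VALUES (?, ?, ?)",
        [(1, "studyA", "case"), (2, "studyA", "control"), (3, "studyB", "case")],
    )
    conn.executemany(
        "INSERT INTO snp VALUES (?, ?, ?)",
        [(1, "snp_a", "1"), (2, "snp_b", "2")],
    )
    conn.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?)",
        [(1, 1, "AA"), (1, 2, "AG"), (2, 1, "AG"), (3, 1, "GG"), (3, 2, "GG")],
    )
    return conn


def genotypes_for(conn: sqlite3.Connection, phenotype: str, chrom: str):
    """Return genotype calls across all studies, filtered by phenotype and chromosome."""
    rows = conn.execute(
        """
        SELECT s.id, s.study, n.marker, g.allele_call
        FROM genotype AS g
        JOIN subject AS s ON s.id = g.subject_id
        JOIN snp AS n ON n.id = g.snp_id
        WHERE s.phenotype = ? AND n.chrom = ?
        ORDER BY s.id, n.marker
        """,
        (phenotype, chrom),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in genotypes_for(conn, "case", "1"):
        print(row)
```

Because subjects from both studies share one `subject` table, the same query merges across studies and applies the phenotype and marker filters in a single step.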

  8. Construction of a Pan-Genome Allele Database of Salmonella enterica Serovar Enteritidis for Molecular Subtyping and Disease Cluster Identification

    Directory of Open Access Journals (Sweden)

    Yen-Yi Liu

    2016-12-01

    We built a pan-genome allele database with 395 genomes of Salmonella enterica serovar Enteritidis and developed computer tools for analysis of whole genome sequencing (WGS) data of bacterial isolates for disease cluster identification. A web server (http://wgmlst.imst.nsysu.edu.tw) was set up with the database and the tools, allowing users to upload WGS data to generate whole genome multilocus sequence typing (wgMLST) profiles and to perform cluster analysis of wgMLST profiles. The usefulness of the database in disease cluster identification was demonstrated by analyzing a panel of genomes from 55 epidemiologically well-defined S. Enteritidis isolates provided by the Minnesota Department of Health. The wgMLST-based cluster analysis revealed distinct clades that were concordant with the epidemiologically defined outbreaks. Thus, using a common pan-genome allele database, wgMLST can be a promising WGS-based subtyping approach for disease surveillance and outbreak investigation across laboratories.
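
Cluster analysis of allele profiles can be sketched in a few lines. The abstract does not say which distance measure or clustering method the web server uses, so the choices below (a count of differing alleles over loci typed in both isolates, followed by single-linkage grouping under a threshold) are assumptions, as are the function names and the toy profiles.

```python
from typing import Dict, List, Optional

# A profile maps locus name -> allele number; None marks an untyped locus.
Profile = Dict[str, Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci typed in both profiles whose allele numbers differ."""
    return sum(
        1
        for locus, allele in a.items()
        if allele is not None
        and b.get(locus) is not None
        and allele != b[locus]
    )


def single_linkage_clusters(
    profiles: Dict[str, Profile], threshold: int
) -> List[List[str]]:
    """Group isolates linked by chains of pairwise distances <= threshold."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if allele_distance(profiles[x], profiles[y]) <= threshold:
                parent[find(y)] = find(x)

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Toy profiles over four hypothetical loci.
    profiles = {
        "iso1": {"L1": 1, "L2": 1, "L3": 1, "L4": 1},
        "iso2": {"L1": 1, "L2": 1, "L3": 2, "L4": 1},
        "iso3": {"L1": 5, "L2": 7, "L3": 9, "L4": None},
        "iso4": {"L1": 5, "L2": 7, "L3": 9, "L4": 3},
    }
    print(single_linkage_clusters(profiles, threshold=1))
```

Ignoring untyped loci keeps incomplete assemblies from inflating distances; a real analysis would use thousands of loci and a threshold chosen against known outbreaks.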

  9. Analysis of segmental duplications reveals a distinct pattern of continuation-of-synteny between human and mouse genomes.

    Science.gov (United States)

    Mehan, Michael R; Almonte, Maricel; Slaten, Erin; Freimer, Nelson B; Rao, P Nagesh; Ophoff, Roel A

    2007-03-01

    About 5% of the human genome consists of large-scale duplicated segments of almost identical sequences. Segmental duplications (SDs) have been proposed to be involved in non-allelic homologous recombination leading to recurrent genomic variation and disease. It has also been suggested that these SDs are associated with syntenic rearrangements that have shaped the human genome. We have analyzed 14 members of a single family of closely related SDs in the human genome, some of which are associated with common inversion polymorphisms at chromosomes 8p23 and 4p16. Comparative analysis with the mouse genome revealed syntenic inversions for these two human polymorphic loci. In addition, 12 of the 14 SDs, while absent in the mouse genome, occur at the breaks of synteny, suggesting a non-random involvement of these sequences in genome evolution. Furthermore, we observed a syntenic familial relationship between 8 and 12 breakpoint-loci, where broken synteny that ends at one family member resumes at another, even across different chromosomes. Subsequent genome-wide assessment revealed that this relationship, which we named continuation-of-synteny, is not limited to the 8p23 family and occurs 46 times in the human genome with high frequency at specific chromosomes. Our analysis supports a non-random breakage model of genomic evolution with an active involvement of segmental duplications for specific regions of the human genome.

  10. Genomic targets of Brachyury (T) in differentiating mouse embryonic stem cells.

    Directory of Open Access Journals (Sweden)

    Amanda L Evans

    The T-box transcription factor Brachyury (T) is essential for formation of the posterior mesoderm and the notochord in vertebrate embryos. Work in the frog and the zebrafish has identified some direct genomic targets of Brachyury, but little is known about Brachyury targets in the mouse. Here we use chromatin immunoprecipitation and mouse promoter microarrays to identify targets of Brachyury in embryoid bodies formed from differentiating mouse ES cells. The targets we identify are enriched for sequence-specific DNA binding proteins and include components of signal transduction pathways that direct cell fate in the primitive streak and tailbud of the early embryo. Expression of some of these targets, such as Axin2, Fgf8 and Wnt3a, is down regulated in Brachyury mutant embryos and we demonstrate that they are also Brachyury targets in the human. Surprisingly, we do not observe enrichment of the canonical T-domain DNA binding sequence 5'-TCACACCT-3' in the vicinity of most Brachyury target genes. Rather, we have identified an (AC)n repeat sequence, which is conserved in the rat but not in human, zebrafish or Xenopus. We do not understand the significance of this sequence, but speculate that it enhances transcription factor binding in the regulatory regions of Brachyury target genes in rodents. Our work identifies the genomic targets of a key regulator of mesoderm formation in the early mouse embryo, thereby providing insights into the Brachyury-driven genetic regulatory network and allowing us to compare the function of Brachyury in different species.

  11. Manual annotation and analysis of the defensin gene cluster in the C57BL/6J mouse reference genome

    Directory of Open Access Journals (Sweden)

    Dougan Gordon

    2009-12-01

    Background: Host defense peptides are a critical component of the innate immune system. Human alpha- and beta-defensin genes are subject to copy number variation (CNV) and historically the organization of mouse alpha-defensin genes has been poorly defined. Here we present the first full manual genomic annotation of the mouse defensin region on Chromosome 8 of the reference strain C57BL/6J, and the analysis of the orthologous regions of the human and rat genomes. Problems were identified with the reference assemblies of all three genomes. Defensins have been studied for over two decades and their naming has become a critical issue due to incorrect identification of defensin genes derived from different mouse strains and the duplicated nature of this region. Results: The defensin gene cluster region on mouse Chromosome 8 A2 contains 98 gene loci: 53 are likely active defensin genes and 22 defensin pseudogenes. Several TATA box motifs were found for human and mouse defensin genes that likely impact gene expression. Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. All additional mouse defensin loci on Chromosomes 1, 2 and 14 were annotated and unusual splice variants identified. Comparison of the mouse alpha-defensins in the three main mouse reference gene sets, Ensembl, Mouse Genome Informatics (MGI), and NCBI RefSeq, reveals significant inconsistencies in annotation and nomenclature. We are collaborating with the Mouse Genome Nomenclature Committee (MGNC) to establish a standardized naming scheme for alpha-defensins. Conclusions: Prior to this analysis, there was no reliable reference gene set available for the mouse strain C57BL/6J defensin genes, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. Manually curated gene

  12. In vivo genome editing improves muscle function in a mouse model of Duchenne muscular dystrophy

    OpenAIRE

    Nelson, Christopher E.; Hakim, Chady H.; Ousterout, David G.; Thakore, Pratiksha I.; Moreb, Eirik A.; Rivera, Ruth M. Castellanos; Madhavan, Sarina; Pan, Xiufang; Ran, F. Ann; Yan, Winston X.; Asokan, Aravind; Zhang, Feng; Duan, Dongsheng; Gersbach, Charles A.

    2015-01-01

    Duchenne muscular dystrophy (DMD) is a devastating disease affecting about 1 out of 5000 male births and caused by mutations in the dystrophin gene. Genome editing has the potential to restore expression of a modified dystrophin gene from the native locus to modulate disease progression. In this study, adeno-associated virus was used to deliver the CRISPR/Cas9 system to the mdx mouse model of DMD to remove the mutated exon 23 from the dystrophin gene. This includes local and systemic delivery...

  13. In vivo genome editing improves muscle function in a mouse model of Duchenne muscular dystrophy

    OpenAIRE

    Nelson, Christopher E.; Hakim, Chady H.; Ousterout, David G.; Thakore, Pratiksha I; Moreb, Eirik A.; Rivera, Ruth M. Castellanos; Madhavan, Sarina; Pan, Xiufang; Ran, F. Ann; Yan, Winston X.; Asokan, Aravind; Zhang, Feng; Duan, Dongsheng; Gersbach, Charles A.

    2015-01-01

    Duchenne muscular dystrophy (DMD) is a devastating disease affecting about 1 out of 5000 male births and caused by mutations in the dystrophin gene. Genome editing has the potential to restore expression of a modified dystrophin gene from the native locus to modulate disease progression. In this study, adeno-associated virus was used to deliver the CRISPR/Cas9 system to the mdx mouse model of DMD to remove the mutated exon 23 from the dystrophin gene. This includes local and systemic delivery...

  14. Utilizing linkage disequilibrium information from Indian Genome Variation Database for mapping mutations: SCA12 case study

    Indian Academy of Sciences (India)

    Samira Bahl; Ikhlak Ahmed; The Indian Genome Variation Consortium; Mitali Mukerji

    2009-04-01

    Stratification in heterogeneous populations poses an enormous challenge in linkage disequilibrium (LD) based identification of causal loci using surrogate markers. In this study, we demonstrate the enormous potential of endogamous Indian populations for mapping mutations in candidate genes using minimal SNPs, mainly due to larger regions of LD. We show this by a case study of the PPP2R2B gene (∼400 kb) that harbours a CAG repeat, expansion of which has been implicated in spinocerebellar ataxia type 12 (SCA12). Using LD information derived from Indian Genome Variation database (IGVdb) on populations which share similar ethnic and linguistic backgrounds as the SCA12 study population, we could map the causal loci using a minimal set of three SNPs, without the generation of additional basal data from the ethnically matched population. We could also demonstrate transferability of tagSNPs from a related HapMap population for mapping the mutation.
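
    The tagSNP idea in this abstract rests on pairwise linkage disequilibrium. As a minimal sketch (not the authors' pipeline), the standard r² statistic between two biallelic SNPs can be computed from haplotype and allele frequencies. The function names and the 0.8 tagging threshold are assumptions chosen for illustration.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r^2 between two biallelic SNPs.

    p_ab -- frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a  -- frequency of allele A at SNP 1
    p_b  -- frequency of allele B at SNP 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must be strictly between 0 and 1")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def is_tag(p_ab: float, p_a: float, p_b: float, threshold: float = 0.8) -> bool:
    """True if one SNP is a usable surrogate for the other (threshold is an assumed convention)."""
    return ld_r2(p_ab, p_a, p_b) >= threshold
```

    Larger regions of high r², as reported for the endogamous populations, mean that fewer surrogate markers are needed to cover a candidate gene.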

  15. REBASE--a database for DNA restriction and modification: enzymes, genes and genomes.

    Science.gov (United States)

    Roberts, Richard J; Vincze, Tamas; Posfai, Janos; Macelis, Dana

    2015-01-01

    REBASE is a comprehensive and fully curated database of information about the components of restriction-modification (RM) systems. It contains fully referenced information about recognition and cleavage sites for both restriction enzymes and methyltransferases as well as commercial availability, methylation sensitivity, crystal and sequence data. All genomes that are completely sequenced are analyzed for RM system components, and with the advent of PacBio sequencing, the recognition sequences of DNA methyltransferases (MTases) are appearing rapidly. Thus, Type I and Type III systems can now be characterized in terms of recognition specificity merely by DNA sequencing. The contents of REBASE may be browsed from the web http://rebase.neb.com and selected compilations can be downloaded by FTP (ftp.neb.com). Monthly updates are also available via email.
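
    To make the notion of a recognition site concrete, the sketch below scans a DNA string for a restriction enzyme's recognition sequence on either strand. It is an illustration only: it does not use any REBASE file format or API, it handles plain A/C/G/T sites without degenerate bases, and the function names are hypothetical. The EcoRI site GAATTC is used as the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_sites(dna: str, site: str) -> list:
    """Return 0-based top-strand start positions where `site` occurs on either strand."""
    dna = dna.upper()
    site = site.upper()
    # A site on the bottom strand appears as its reverse complement on the top strand.
    patterns = {site, reverse_complement(site)}
    width = len(site)
    return [i for i in range(len(dna) - width + 1) if dna[i:i + width] in patterns]


if __name__ == "__main__":
    print(find_sites("TTGAATTCAAGGATCCGAATTC", "GAATTC"))
```

    Palindromic sites such as GAATTC equal their own reverse complement, so each occurrence is reported once.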

  16. TP53 Variations in Human Cancers: New Lessons from the IARC TP53 Database and Genomics Data.

    Science.gov (United States)

    Bouaoun, Liacine; Sonkin, Dmitriy; Ardin, Maude; Hollstein, Monica; Byrnes, Graham; Zavadil, Jiri; Olivier, Magali

    2016-09-01

    TP53 gene mutations are one of the most frequent somatic events in cancer. The IARC TP53 Database (http://p53.iarc.fr) is a popular resource that compiles occurrence and phenotype data on TP53 germline and somatic variations linked to human cancer. The deluge of data coming from cancer genomic studies generates new data on TP53 variations and attracts a growing number of database users for the interpretation of TP53 variants. Here, we present the current contents and functionalities of the IARC TP53 Database and perform a systematic analysis of TP53 somatic mutation data extracted from this database and from genomic data repositories. This analysis showed that IARC has more TP53 somatic mutation data than genomic repositories (29,000 vs. 4,000). However, the more complete screening achieved by genomic studies highlighted some overlooked facts about TP53 mutations, such as the presence of a significant number of mutations occurring outside the DNA-binding domain in specific cancer types. We also provide an update on TP53 inherited variants including the ones that should be considered as neutral frequent variations. We thus provide an update of current knowledge on TP53 variations in human cancer as well as inform users on the efficient use of the IARC TP53 Database.

  17. Sleeping Beauty transposition in the mouse genome is associated with changes in DNA methylation at the site of insertion.

    Science.gov (United States)

    Park, Chang Won; Park, Jeongmin; Kren, Betsy T; Steer, Clifford J

    2006-08-01

    The Sleeping Beauty (SB) transposon (Tn) system is a nonviral gene delivery tool that has widespread application for transfer of therapeutic genes into the mammalian genome. To determine its utility as a gene delivery system, it was important to assess the epigenetic modifications associated with SB insertion into the genome and potential inactivation of the transgene. This study investigated the DNA methylation pattern of an SB Tn as well as the flanking genomic region at insertion sites in the mouse genome. The ubiquitous ROSA26 promoter and an initial part of the eGFP coding sequence in the SB Tn exhibited high levels of CpG methylation in transgenic mouse lines, irrespective of the chromosomal loci of the insertion sites. In contrast, no detectable CpG methylation in the endogenous mouse ROSA26 counterpart was observed in the same animals. Furthermore, significant hypomethylation was detected in neighboring chromosomal sequences of two unique SB Tn insertions compared to wild-type patterns. Taken together, these results suggest that SB Tn insertions into the mouse genome can be discriminated by DNA methylation machinery from an identical endogenous DNA sequence and can profoundly alter the DNA methylation status of the transgene cargo as well as flanking host genomic regions.

  18. Development of Database and Genomic Medicine for von Hippel-Lindau Disease in Japan

    Science.gov (United States)

    TAKAYANAGI, Shunsaku; MUKASA, Akitake; NAKATOMI, Hirofumi; KANNO, Hiroshi; KURATSU, Jun-ichi; NISHIKAWA, Ryo; MISHIMA, Kazuhiko; NATSUME, Atushi; WAKABAYASHI, Toshihiko; HOUKIN, Kiyohiro; TERASAKA, Shunsuke; YAO, Masahiro; SHINOHARA, Nobuo; SHUIN, Taro; SAITO, Nobuhito

    2017-01-01

    von Hippel-Lindau (VHL) disease is a hereditary tumor disease in which tumors develop in multiple organs, not only as hemangioblastomas (HBs) in the central nervous system, but also as kidney tumors, pheochromocytomas, and so on. Much about the epidemiology of VHL disease remained unknown until fairly recently in Japan, leading to calls for the establishment of a VHL disease epidemiological database in Japanese. To elucidate its epidemiology in Japan, the Japanese Ministry of Health, Labour and Welfare created the VHL Disease Study Group, which was put in charge of carrying out a nationwide epidemiological survey. The survey found close to 400 Japanese VHL disease patients throughout the country. Based on those results, the VHL Disease Study Group created the VHL Disease Treatment Guideline and also a severity classification. It is thought that the prognosis of VHL disease patients can be improved by performing genetic diagnosis and careful follow-up. Accordingly, the University of Tokyo Hospital put in place an in-hospital system for implementing genomic medicine for VHL disease based on genetic diagnosis. For that system, it was especially important to establish (I) accurate genetic diagnostic techniques, (II) genetic counseling capabilities for the patients and their families, and (III) a system of cooperation among multiple departments, including urology departments, and so on. Further elucidation of the epidemiology and the development of genomic medicine are needed to improve the treatment results of VHL disease in Japan. PMID:28070114

  19. Genomic imprinting is variably lost during reprogramming of mouse iPS cells.

    Science.gov (United States)

    Takikawa, Sachiko; Ray, Chelsea; Wang, Xin; Shamis, Yulia; Wu, Tien-Yuan; Li, Xiajun

    2013-09-01

    Derivation of induced pluripotent stem (iPS) cells is mainly an epigenetic reprogramming process. It is still quite controversial how genomic imprinting is reprogrammed in iPS cells. Thus, we derived multiple iPS clones from genetically identical mouse somatic cells. We found that parentally inherited imprint was variably lost among these iPS clones. Concurrent with the loss of DNA methylation imprint at the corresponding Snrpn and Peg3 imprinted regions, parental origin-specific expression of the Snrpn and Zim1 imprinted genes was also lost in these iPS clones. This loss of parental genomic imprinting in iPS cells was likely caused by the reprogramming process during iPS cell derivation because extended culture of iPS cells did not lead to significant increase in the loss of genomic imprinting. Intriguingly, one to several paternal chromosomes appeared to have acquired de novo methylation at the Snrpn and Zac1 imprinted regions in a high percentage of iPS clones. These results might have some implications for future therapeutic applications of iPS cells. Since DNA methylation imprint can be completely erased in some iPS clones at multiple imprinted regions, iPS cell reprogramming may also be employed to dissect the underlying mechanisms of erasure, reacquisition and maintenance of genomic imprinting in mammals.

  20. Novel LanT associated lantibiotic clusters identified by genome database mining.

    Directory of Open Access Journals (Sweden)

    Mangal Singh

    BACKGROUND: Frequent use of antibiotics has led to the emergence of antibiotic resistance in bacteria. Lantibiotic compounds are ribosomally synthesized antimicrobial peptides against which bacteria are not able to develop resistance, making them a good alternative to antibiotics. Nisin is the oldest and most widely used lantibiotic in food preservation, and no significant resistance against it has developed. Given their antimicrobial potential and limited number, there is a need to identify novel lantibiotics. METHODOLOGY/FINDINGS: Identification of novel lantibiotic biosynthetic clusters from an ever increasing database of bacterial genomes can provide a major lead in this direction. To achieve this, a strategy was adopted to identify novel lantibiotic biosynthetic clusters by screening the sequenced genomes for LanT homologs; LanT is a conserved lantibiotic transporter specific to type IB clusters. This strategy resulted in identification of 54 bacterial strains containing LanT homologs that are not known lantibiotic producers. Of these, 24 strains were subjected to a detailed bioinformatic analysis to identify genes encoding precursor peptides, modification enzymes, immunity and quorum sensing proteins. Eight clusters having two LanM determinants, similar to haloduracin and lichenicidin, were identified, along with 13 clusters having a single LanM determinant as in the mersacidin biosynthetic cluster. Besides these, orphan LanT homologs were also identified which might be associated with novel bacteriocins encoded elsewhere in the genome. Three identified gene clusters had a C39 domain containing LanT transporter, associated with the LanBC proteins and double glycine type precursor peptides; the only known example of such a cluster is that of salivaricin. CONCLUSION: This study led to the identification of 8 novel putative two-component lantibiotic clusters along with 13 having a single LanM and

  1. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification.

    Science.gov (United States)

    Reddy, T B K; Thomas, Alex D; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A; Kyrpides, Nikos C

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
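
    As an illustration only, the four record types counted in the abstract (studies, Biosamples, sequencing projects and analysis projects) can be pictured as a nested hierarchy. The sketch below assumes a strict parent-child nesting and uses hypothetical class and field names; it is not the GOLD schema, which the abstract does not describe in detail.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnalysisProject:  # level 4 (hypothetical fields)
    name: str


@dataclass
class SequencingProject:  # level 3
    name: str
    analyses: List[AnalysisProject] = field(default_factory=list)


@dataclass
class Biosample:  # level 2
    name: str
    sequencing_projects: List[SequencingProject] = field(default_factory=list)


@dataclass
class Study:  # level 1
    name: str
    biosamples: List[Biosample] = field(default_factory=list)


def count_records(studies: List[Study]) -> Dict[str, int]:
    """Count the records held at each of the four levels."""
    counts = {"studies": 0, "biosamples": 0,
              "sequencing_projects": 0, "analysis_projects": 0}
    for study in studies:
        counts["studies"] += 1
        for sample in study.biosamples:
            counts["biosamples"] += 1
            for project in sample.sequencing_projects:
                counts["sequencing_projects"] += 1
                counts["analysis_projects"] += len(project.analyses)
    return counts
```

    A summary like the one in the abstract (about 19,200 studies, 56,000 Biosamples, and so on) corresponds to the output of such a per-level count.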

  2. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    Science.gov (United States)

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  3. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    Energy Technology Data Exchange (ETDEWEB)

    Reddy, Tatiparthi B. K. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Thomas, Alex D. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Stamatis, Dimitri [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Bertsch, Jon [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Isbandi, Michelle [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Jansson, Jakob [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Mallajosyula, Jyothi [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Pagani, Ioanna [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lobos, Elizabeth A. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Kyrpides, Nikos C. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); King Abdulaziz Univ., Jeddah (Saudi Arabia)

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  4. Sexually transmitted diseases putative drug target database: A comprehensive database of putative drug targets of pathogens identified by comparative genomics

    Directory of Open Access Journals (Sweden)

    Vijayakumari Malipatil

    2013-01-01

    Conclusion: Diverse data merged in the common framework of this database is expected to be valuable not only for basic studies in clinical bioinformatics, but also for basic studies in immunological, biotechnological and clinical fields.

  5. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases.

    Science.gov (United States)

    Caspi, Ron; Billington, Richard; Ferrer, Luciana; Foerster, Hartmut; Fulcher, Carol A; Keseler, Ingrid M; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S; Karp, Peter D

    2016-01-01

    The MetaCyc database (MetaCyc.org) is a freely accessible comprehensive database describing metabolic pathways and enzymes from all domains of life. The majority of MetaCyc pathways are small-molecule metabolic pathways that have been experimentally determined. MetaCyc contains more than 2400 pathways derived from >46,000 publications, and is the largest curated collection of metabolic pathways. BioCyc (BioCyc.org) is a collection of 5700 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems, and pathway-hole fillers. The BioCyc website offers a variety of tools for querying and analyzing PGDBs, including Omics Viewers and tools for comparative analysis. This article provides an update of new developments in MetaCyc and BioCyc during the last two years, including addition of Gibbs free energy values for compounds and reactions; redesign of the primary gene/protein page; addition of a tool for creating diagrams containing multiple linked pathways; several new search capabilities, including searching for genes based on sequence patterns, searching for databases based on an organism's phenotypes, and a cross-organism search; and a metabolite identifier translation service.

  6. Cre Fused with RVG Peptide Mediates Targeted Genome Editing in Mouse Brain Cells In Vivo

    Directory of Open Access Journals (Sweden)

    Zhiyuan Zou

    2016-12-01

    Cell penetrating peptides (CPPs) are short peptides that can pass through cell membranes. CPPs can facilitate the cellular entry of proteins, macromolecules, nanoparticles and drugs. RVG peptide (RVG hereinafter) is a 29-amino-acid CPP derived from a rabies virus glycoprotein that can cross the blood-brain barrier (BBB) and enter brain cells. However, whether RVG can be used for genome editing in the brain has not been reported. In this work, we combined RVG with Cre recombinase for bacterial expression. The purified RVG-Cre protein cut plasmids in vitro and traversed cell membranes in cultured Neuro2a cells. By tail vein-injecting RVG-Cre into Cre reporter mouse lines mTmG and Rosa26lacZ, we demonstrated that RVG-Cre could target brain cells and achieve targeted somatic genome editing in adult mice. This direct delivery of the gene-editing enzyme protein into mouse brains with RVG is much safer than plasmid- or viral-based methods, holding promise for further applications in the treatment of various brain diseases.

  7. Corticosterone rapidly promotes respiratory burst of mouse peritoneal macrophages via non-genomic mechanism

    Institute of Scientific and Technical Information of China (English)

    SHI Wen-lei; MA Qian; ZHANG Lu-ding; HUANG Jun-long; ZHOU Jian; LIU Lei; SHEN Xing-hua; JIANG Chun-lei

    2011-01-01

    Background: The immunomodulatory effects of glucocorticoids (GCs) have been described as bimodal. High concentrations of GCs exert immunosuppressive effects, whereas low levels of GCs are immunopermissive. While the immunosuppressive mechanisms of GCs have been investigated intensely, the immunopermissive effects of GCs remain unclear. Many studies have shown that GCs can exert rapid non-genomic actions. We herein studied the rapid immunopromoting effects of GCs. Methods: We observed the rapid (within 30 minutes) effects of corticosterone on the respiratory burst of mouse peritoneal macrophages and studied their mechanisms. Superoxide anions were measured by cytochrome C reduction assay. Protein kinase C phosphorylation was measured by Western blotting and membrane fluidity was evaluated by fluorescence polarization measurement. Results: Corticosterone at 10⁻⁸ mol/L and 10⁻⁷ mol/L rapidly increased superoxide anion production by macrophages; this increase was insensitive to the GC-receptor antagonist mifepristone and the protein-synthesis inhibitor cycloheximide. Corticosterone coupled to bovine serum albumin was able to mimic the effects of corticosterone. The effects were independent of the protein kinase C pathway and of changes in membrane fluidity. Conclusions: The results indicate that corticosterone may rapidly promote superoxide anion production by mouse peritoneal macrophages through non-genomic mechanisms. This study may contribute to understanding the effects of GCs under stress conditions and the physiological significance of non-genomic effects of GCs.

  8. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    Science.gov (United States)

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

  9. Generation of mouse models of myeloid malignancy with combinatorial genetic lesions using CRISPR-Cas9 genome editing

    OpenAIRE

    Heckl, Dirk; Kowalczyk, Monika S.; Yudovich, David; Belizaire, Roger; Puram, Rishi V.; McConkey, Marie E.; Thielke, Anne; Aster, Jon C.; Regev, Aviv; Ebert, Benjamin L.

    2014-01-01

    Genome sequencing studies have shown that human malignancies often bear mutations in four or more driver genes [1], but it is difficult to recapitulate this degree of genetic complexity in mouse models using conventional breeding. Here we use the CRISPR-Cas9 system of genome editing [2, 3, 4] to overcome this limitation. By delivering combinations of small guide RNAs (sgRNAs) and Cas9 with a lentiviral vector, we modified up to five genes in a single mouse hematopoietic ...

  10. Inducing mutations in the mouse genome with the chemical mutagen ethylnitrosourea

    Directory of Open Access Journals (Sweden)

    S.M.G. Massironi

    2006-09-01

    When compared to other model organisms whose genome is sequenced, the number of mutations identified in the mouse appears extremely reduced and this situation seriously hampers our understanding of mammalian gene function(s). Another important consequence of this shortage is that a majority of human genetic diseases still await an animal model. To improve the situation, two strategies are currently used: the first makes use of embryonic stem cells, in which one can induce knockout mutations almost at will; the second consists of a genome-wide random chemical mutagenesis, followed by screening for mutant phenotypes and subsequent identification of the genetic alteration(s). Several projects are now in progress making use of one or the other of these strategies. Here, we report an original effort where we mutagenized BALB/c males with the mutagen ethylnitrosourea. Offspring of these males were screened for dominant mutations and a three-generation breeding protocol was set to recover recessive mutations. Eleven mutations were identified (one dominant and ten recessives). Three of these mutations are new alleles (Otop1mlh, Foxn1sepe and probably rodador) at loci where mutations have already been reported, while 4 are new and original alleles (carc, eqlb, frqz, and Sacc). This result indicates that the mouse genome, as expected, is far from being saturated with mutations. More mutations would certainly be discovered using more sophisticated phenotyping protocols. Seven of the 11 new mutant alleles induced in our experiment have been localized on the genetic map as a first step towards positional cloning.

  11. T4SP Database 2.0: An Improved Database for Type IV Secretion Systems in Bacterial Genomes with New Online Analysis Tools

    Directory of Open Access Journals (Sweden)

    Na Han

    2016-01-01

    Type IV secretion system (T4SS) can mediate the passage of macromolecules across cellular membranes and is essential for virulent and genetic material exchange among bacterial species. The Type IV Secretion Project 2.0 (T4SP 2.0) database is an improved and extended version of the platform released in 2013 aimed at assisting with the detection of Type IV secretion systems (T4SS) in bacterial genomes. This advanced version provides users with web server tools for detecting the existence and variations of T4SS genes online. The new interface for the genome browser provides a user-friendly access to the most complete and accurate resource of T4SS gene information (e.g., gene number, name, type, position, sequence, related articles, and quick links to other webs). Currently, this online database includes T4SS information of 5239 bacterial strains. Conclusions. T4SS is one of the most versatile secretion systems necessary for the virulence and survival of bacteria and the secretion of protein and/or DNA substrates from a donor to a recipient cell. This database on virB/D genes of the T4SS system will help scientists worldwide to improve their knowledge on secretion systems and also identify potential pathogenic mechanisms of various microbial species.

  12. T4SP Database 2.0: An Improved Database for Type IV Secretion Systems in Bacterial Genomes with New Online Analysis Tools

    Science.gov (United States)

    Han, Na; Yu, Weiwen; Qiang, Yujun

    2016-01-01

    Type IV secretion system (T4SS) can mediate the passage of macromolecules across cellular membranes and is essential for virulent and genetic material exchange among bacterial species. The Type IV Secretion Project 2.0 (T4SP 2.0) database is an improved and extended version of the platform released in 2013 aimed at assisting with the detection of Type IV secretion systems (T4SS) in bacterial genomes. This advanced version provides users with web server tools for detecting the existence and variations of T4SS genes online. The new interface for the genome browser provides a user-friendly access to the most complete and accurate resource of T4SS gene information (e.g., gene number, name, type, position, sequence, related articles, and quick links to other webs). Currently, this online database includes T4SS information of 5239 bacterial strains. Conclusions. T4SS is one of the most versatile secretion systems necessary for the virulence and survival of bacteria and the secretion of protein and/or DNA substrates from a donor to a recipient cell. This database on virB/D genes of the T4SS system will help scientists worldwide to improve their knowledge on secretion systems and also identify potential pathogenic mechanisms of various microbial species.

  13. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects

    Directory of Open Access Journals (Sweden)

    Holt Carson

    2011-12-01

    Background: Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. Results: We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. Conclusions: MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.

  14. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.

    Science.gov (United States)

    Holt, Carson; Yandell, Mark

    2011-12-22

    Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.

  15. Simulated space radiation-induced mutants in the mouse kidney display widespread genomic change.

    Science.gov (United States)

    Turker, Mitchell S; Grygoryev, Dmytro; Lasarev, Michael; Ohlrich, Anna; Rwatambuga, Furaha A; Johnson, Sorrel; Dan, Cristian; Eckelmann, Bradley; Hryciw, Gwen; Mao, Jian-Hua; Snijders, Antoine M; Gauny, Stacey; Kronenberg, Amy

    2017-01-01

    Exposure to a small number of high-energy heavy charged particles (HZE ions), as found in the deep space environment, could significantly affect astronaut health following prolonged periods of space travel if these ions induce mutations and related cancers. In this study, we used an in vivo mutagenesis assay to define the mutagenic effects of accelerated 56Fe ions (1 GeV/amu, 151 keV/μm) in the mouse kidney epithelium exposed to doses ranging from 0.25 to 2.0 Gy. These doses represent fluences ranging from 1 to 8 particle traversals per cell nucleus. The Aprt locus, located on chromosome 8, was used to select induced and spontaneous mutants. To fully define the mutagenic effects, we used multiple endpoints including mutant frequencies, mutation spectrum for chromosome 8, translocations involving chromosome 8, and mutations affecting non-selected chromosomes. The results demonstrate mutagenic effects that often affect multiple chromosomes for all Fe ion doses tested. For comparison with the most abundant sparsely ionizing particle found in space, we also examined the mutagenic effects of high-energy protons (1 GeV, 0.24 keV/μm) at 0.5 and 1.0 Gy. Similar doses of protons were not as mutagenic as Fe ions for many assays, though genomic effects were detected in Aprt mutants at these doses. Considered as a whole, the data demonstrate that Fe ions are highly mutagenic at the low doses and fluences of relevance to human spaceflight, and that cells with considerable genomic mutations are readily induced by these exposures and persist in the kidney epithelium. The level of genomic change produced by low fluence exposure to heavy ions is reminiscent of the extensive rearrangements seen in tumor genomes suggesting a potential initiation step in radiation carcinogenesis.

  16. A geographically-diverse collection of 418 human gut microbiome pathway genome databases

    KAUST Repository

    Hahn, Aria S.

    2017-04-11

    Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a compendium of environmental pathway genome databases (ePGDBs) constructed from 418 assembled human microbiome datasets using MetaPathways, enabling reproducible functional metagenomic annotation. We also generated metabolic network reconstructions for each metagenome using the Pathway Tools software, empowering researchers and clinicians interested in visualizing and interpreting metabolic pathways encoded by the human gut microbiome. For the first time, GutCyc provides consistent annotations and metabolic pathway predictions, making possible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn’s disease, and type 2 diabetes. GutCyc data products are searchable online, or may be downloaded and explored locally using MetaPathways and Pathway Tools.

  17. FunCoup 3.0: database of genome-wide functional coupling networks.

    Science.gov (United States)

    Schmitt, Thomas; Ogris, Christoph; Sonnhammer, Erik L L

    2014-01-01

    We present an update of the FunCoup database (http://FunCoup.sbc.su.se) of functional couplings, or functional associations, between genes and gene products. Identifying these functional couplings is an important step in the understanding of higher level mechanisms performed by complex cellular processes. FunCoup distinguishes between four classes of couplings: participation in the same signaling cascade, participation in the same metabolic process, co-membership in a protein complex and physical interaction. For each of these four classes, several types of experimental and statistical evidence are combined by Bayesian integration to predict genome-wide functional coupling networks. The FunCoup framework has been completely re-implemented to allow for more frequent future updates. It contains many improvements, such as a regularization procedure to automatically downweight redundant evidences and a novel method to incorporate phylogenetic profile similarity. Several datasets have been updated and new data have been added in FunCoup 3.0. Furthermore, we have developed a new Web site, which provides powerful tools to explore the predicted networks and to retrieve detailed information about the data underlying each prediction.

  18. Complete Genome Sequence of Turicibacter sp. Strain H121, Isolated from the Feces of a Contaminated Germ-Free Mouse

    Science.gov (United States)

    Auchtung, T. A.; Holder, M. E.; Gesell, J. R.; Ajami, N. J.; Duarte, R. T. D.; Itoh, K.; Caspi, R. R.; Petrosino, J. F.; Horai, R.

    2016-01-01

    Turicibacter bacteria are commonly detected in the gastrointestinal tracts and feces of humans and animals, but their phylogeny, ecological role, and pathogenic potential remain unclear. We present here the first complete genome sequence of Turicibacter sp. strain H121, which was isolated from the feces of a mouse line contaminated following germ-free derivation. PMID:27013036

  19. Enhancer identification in mouse embryonic stem cells using integrative modeling of chromatin and genomic features

    Directory of Open Access Journals (Sweden)

    Chen Chih-yu

    2012-04-01

    Full Text Available Abstract Background: Epigenetic modifications, transcription factor (TF) availability and differences in chromatin folding influence how the genome is interpreted by the transcriptional machinery responsible for gene expression. Enhancers buried in non-coding regions are found to be associated with significant differences in histone marks between different cell types. In contrast, gene promoters show more uniform modifications across cell types. Here we used histone modification and chromatin-associated protein ChIP-Seq data sets in mouse embryonic stem (ES) cells as well as genomic features to identify functional enhancer regions. Using co-bound sites of OCT4, SOX2 and NANOG (co-OSN, validated enhancers) and co-bound sites of MYC and MYCN (limited enhancer activity) as enhancer positive and negative training sets, we performed multinomial logistic regression with LASSO regularization to identify key features. Results: Cross validations reveal a combination of p300, H3K4me1, MED12 and NIPBL features to be the top signatures of co-OSN regions. Using a model from 10 signatures, 83% of top 1277 putative 1 kb enhancer regions (probability greater than or equal to 0.8) overlapped with at least one TF peak from 7 mouse ES cell ChIP-Seq data sets. These putative enhancers are associated with increased gene expression of neighbouring genes and significantly enriched in multiple TF bound loci in agreement with combinatorial models of TF binding. Furthermore, we identified several motifs of known TFs significantly enriched in putative enhancer regions compared to random promoter regions and background. Comparison with an active H3K27ac mark in various cell types confirmed cell type-specificity of these enhancers. Conclusions: The top enhancer signatures we identified (p300, H3K4me1, MED12 and NIPBL) will allow for the identification of cell type-specific enhancer regions in diverse cell types.

  20. Organization and evolution of a gene-rich region of the mouse genome: a 12.7-Mb region deleted in the Del(13)Svea36H mouse.

    Science.gov (United States)

    Mallon, Ann-Marie; Wilming, Laurens; Weekes, Joseph; Gilbert, James G R; Ashurst, Jennifer; Peyrefitte, Sandrine; Matthews, Lucy; Cadman, Matthew; McKeone, Richard; Sellick, Chris A; Arkell, Ruth; Botcherby, Marc R M; Strivens, Mark A; Campbell, R Duncan; Gregory, Simon; Denny, Paul; Hancock, John M; Rogers, Jane; Brown, Steve D M

    2004-10-01

    Del(13)Svea36H (Del36H) is a deletion of approximately 20% of mouse chromosome 13 showing conserved synteny with human chromosome 6p22.1-6p22.3/6p25. The human region is lost in some deletion syndromes and is the site of several disease loci. Heterozygous Del36H mice show numerous phenotypes and may model aspects of human genetic disease. We describe 12.7 Mb of finished, annotated sequence from Del36H. Del36H has a higher gene density than the draft mouse genome, reflecting high local densities of three gene families (vomeronasal receptors, serpins, and prolactins) which are greatly expanded relative to human. Transposable elements are concentrated near these gene families. We therefore suggest that their neighborhoods are gene factories, regions of frequent recombination in which gene duplication is more frequent. The gene families show different proportions of pseudogenes, likely reflecting different strengths of purifying selection and/or gene conversion. They are also associated with relatively low simple sequence concentrations, which vary across the region with a periodicity of approximately 5 Mb. Del36H contains numerous evolutionarily conserved regions (ECRs). Many lie in noncoding regions, are detectable in species as distant as Ciona intestinalis, and therefore are candidate regulatory sequences. This analysis will facilitate functional genomic analysis of Del36H and provides insights into mouse genome evolution.

  1. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes.

    Science.gov (United States)

    Chan, Patricia P; Lowe, Todd M

    2016-01-01

    Transfer RNAs represent the largest, most ubiquitous class of non-protein coding RNA genes found in all living organisms. The tRNAscan-SE search tool has become the de facto standard for annotating tRNA genes in genomes, and the Genomic tRNA Database (GtRNAdb) was created as a portal for interactive exploration of these gene predictions. Since its published description in 2009, the GtRNAdb has steadily grown in content, and remains the most commonly cited web-based source of tRNA gene information. In this update, we describe not only a major increase in the number of tRNA predictions (>367000) and genomes analyzed (>4370), but more importantly, the integration of new analytic and functional data to improve the quality and biological context of tRNA gene predictions. New information drawn from other sources includes tRNA modification data, epigenetic data, single nucleotide polymorphisms, gene expression and evolutionary conservation. A richer set of analytic data is also presented, including better tRNA functional prediction, non-canonical features, predicted structural impacts from sequence variants and minimum free energy structural predictions. Views of tRNA genes in genomic context are provided via direct links to the UCSC genome browsers. The database can be searched by sequence or gene features, and is available at http://gtrnadb.ucsc.edu/.

  2. TcruziDB, an Integrated Database, and the WWW Information Server for the Trypanosoma cruzi Genome Project

    Directory of Open Access Journals (Sweden)

    Degrave Wim

    1997-01-01

    Full Text Available Data analysis, presentation and distribution are of utmost importance to a genome project. Public domain software, ACeDB, has been chosen as the common basis for parasite genome databases, and a first release of TcruziDB, the Trypanosoma cruzi genome database, is available by ftp from ftp://iris.dbbm.fiocruz.br/pub/genomedb/TcruziDB as well as versions of the software for different operating systems (ftp://iris.dbbm.fiocruz.br/pub/unixsoft/). Moreover, data originating from the project are available from the WWW server at http://www.dbbm.fiocruz.br. It contains biological and parasitological data on CL Brener, its karyotype, all available T. cruzi sequences from Genbank, data on the EST-sequencing project and on available libraries, a T. cruzi codon table and a listing of activities and participating groups in the genome project, as well as meeting reports. T. cruzi discussion lists (tcruzi-l@iris.dbbm.fiocruz.br and tcgenics@iris.dbbm.fiocruz.br) are being maintained for communication and to promote collaboration in the genome project.

  3. Generation of mouse models of myeloid malignancy with combinatorial genetic lesions using CRISPR-Cas9 genome editing.

    Science.gov (United States)

    Heckl, Dirk; Kowalczyk, Monika S; Yudovich, David; Belizaire, Roger; Puram, Rishi V; McConkey, Marie E; Thielke, Anne; Aster, Jon C; Regev, Aviv; Ebert, Benjamin L

    2014-09-01

    Genome sequencing studies have shown that human malignancies often bear mutations in four or more driver genes, but it is difficult to recapitulate this degree of genetic complexity in mouse models using conventional breeding. Here we use the CRISPR-Cas9 system of genome editing to overcome this limitation. By delivering combinations of small guide RNAs (sgRNAs) and Cas9 with a lentiviral vector, we modified up to five genes in a single mouse hematopoietic stem cell (HSC), leading to clonal outgrowth and myeloid malignancy. We thereby generated models of acute myeloid leukemia (AML) with cooperating mutations in genes encoding epigenetic modifiers, transcription factors and mediators of cytokine signaling, recapitulating the combinations of mutations observed in patients. Our results suggest that lentivirus-delivered sgRNA:Cas9 genome editing should be useful to engineer a broad array of in vivo cancer models that better reflect the complexity of human disease.

  4. Identification of novel tissue-specific genes by analysis of microarray databases: a human and mouse model.

    Science.gov (United States)

    Song, Yan; Ahn, Jinsoo; Suh, Yeunsu; Davis, Michael E; Lee, Kichoon

    2013-01-01

    Understanding the tissue-specific pattern of gene expression is critical in elucidating the molecular mechanisms of tissue development, gene function, and transcriptional regulations of biological processes. Although tissue-specific gene expression information is available in several databases, follow-up strategies to integrate and use these data are limited. The objective of the current study was to identify and evaluate novel tissue-specific genes in human and mouse tissues by performing comparative microarray database analysis and semi-quantitative PCR analysis. We developed a powerful approach to predict tissue-specific genes by analyzing existing microarray data from the NCBI's Gene Expression Omnibus (GEO) public repository. We investigated and confirmed tissue-specific gene expression in the human and mouse kidney, liver, lung, heart, muscle, and adipose tissue. Applying our novel comparative microarray approach, we confirmed 10 kidney, 11 liver, 11 lung, 11 heart, 8 muscle, and 8 adipose specific genes. The accuracy of this approach was further verified by employing semi-quantitative PCR reaction and by searching for gene function information in existing publications. Three novel tissue-specific genes were discovered by this approach including AMDHD1 (amidohydrolase domain containing 1) in the liver, PRUNE2 (prune homolog 2) in the heart, and ACVR1C (activin A receptor, type IC) in adipose tissue. We further confirmed the tissue-specific expression of these 3 novel genes by real-time PCR. Among them, ACVR1C is adipose tissue-specific and adipocyte-specific in adipose tissue, and can be used as an adipocyte developmental marker. From GEO profiles, we predicted the processes in which AMDHD1 and PRUNE2 may participate. Our approach provides a novel way to identify new sets of tissue-specific genes and to predict functions in which they may be involved.

  5. Polycomb repressive complex PRC1 spatially constrains the mouse embryonic stem cell genome.

    Science.gov (United States)

    Schoenfelder, Stefan; Sugar, Robert; Dimond, Andrew; Javierre, Biola-Maria; Armstrong, Harry; Mifsud, Borbala; Dimitrova, Emilia; Matheson, Louise; Tavares-Cadete, Filipe; Furlan-Magaril, Mayra; Segonds-Pichon, Anne; Jurkowski, Wiktor; Wingett, Steven W; Tabbada, Kristina; Andrews, Simon; Herman, Bram; LeProust, Emily; Osborne, Cameron S; Koseki, Haruhiko; Fraser, Peter; Luscombe, Nicholas M; Elderkin, Sarah

    2015-10-01

    The Polycomb repressive complexes PRC1 and PRC2 maintain embryonic stem cell (ESC) pluripotency by silencing lineage-specifying developmental regulator genes. Emerging evidence suggests that Polycomb complexes act through controlling spatial genome organization. We show that PRC1 functions as a master regulator of mouse ESC genome architecture by organizing genes in three-dimensional interaction networks. The strongest spatial network is composed of the four Hox gene clusters and early developmental transcription factor genes, the majority of which contact poised enhancers. Removal of Polycomb repression leads to disruption of promoter-promoter contacts in the Hox gene network. In contrast, promoter-enhancer contacts are maintained in the absence of Polycomb repression, with accompanying widespread acquisition of active chromatin signatures at network enhancers and pronounced transcriptional upregulation of network genes. Thus, PRC1 physically constrains developmental transcription factor genes and their enhancers in a silenced but poised spatial network. We propose that the selective release of genes from this spatial network underlies cell fate specification during early embryonic development.

  6. In vivo genome editing improves muscle function in a mouse model of Duchenne muscular dystrophy.

    Science.gov (United States)

    Nelson, Christopher E; Hakim, Chady H; Ousterout, David G; Thakore, Pratiksha I; Moreb, Eirik A; Castellanos Rivera, Ruth M; Madhavan, Sarina; Pan, Xiufang; Ran, F Ann; Yan, Winston X; Asokan, Aravind; Zhang, Feng; Duan, Dongsheng; Gersbach, Charles A

    2016-01-22

    Duchenne muscular dystrophy (DMD) is a devastating disease affecting about 1 out of 5000 male births and caused by mutations in the dystrophin gene. Genome editing has the potential to restore expression of a modified dystrophin gene from the native locus to modulate disease progression. In this study, adeno-associated virus was used to deliver the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system to the mdx mouse model of DMD to remove the mutated exon 23 from the dystrophin gene. This includes local and systemic delivery to adult mice and systemic delivery to neonatal mice. Exon 23 deletion by CRISPR-Cas9 resulted in expression of the modified dystrophin gene, partial recovery of functional dystrophin protein in skeletal myofibers and cardiac muscle, improvement of muscle biochemistry, and significant enhancement of muscle force. This work establishes CRISPR-Cas9-based genome editing as a potential therapy to treat DMD. Copyright © 2016, American Association for the Advancement of Science.

  7. The mouse Fau gene: genomic structure, chromosomal localization, and characterization of two retropseudogenes.

    Science.gov (United States)

    Casteels, D; Poirier, C; Guénet, J L; Merregaert, J

    1995-01-01

    The Fau gene is the cellular homolog of the fox sequence of the Finkel-Biskis-Reilly murine sarcoma virus (FBR-MuSV). FBR-MuSV acquired the Fau gene by transduction in a transcriptional orientation opposite to that of the genomic Fau gene. The genomic structure of the mouse Fau gene (MMFAU) and its upstream elements have been determined and are similar to those of the human FAU gene. The gene consists of five exons and is located on chromosome 19. The first exon is not translated. The promoter region has no well-defined TATA box but contains the polypyrimidine initiator flanked by regions of high GC content (65%) and shows all of the characteristics of a housekeeping gene. The 5' end of the mRNA transcript was determined by 5' RACE analysis and is located, as expected, in the polypyrimidine initiator site. Furthermore, the sequences of two retropseudogenes (Fau-ps1 and Fau-ps2) are reported. Both pseudogenes are approximately 75% identical to the Fau cDNA, but both are shorter due to a deletion at the 5' end and do not encode a functional protein. Fau-prs is interrupted by an AG-rich region of about 350 bp within the S30 region of the Fau cDNA. Fau-ps1 was localized on chromosome 1 and Fau-ps2 on chromosome 7.

  8. Generation of Mouse Haploid Somatic Cells by Small Molecules for Genome-wide Genetic Screening

    Directory of Open Access Journals (Sweden)

    Zheng-Quan He

    2017-08-01

    Full Text Available The recent success of derivation of mammalian haploid embryonic stem cells (haESCs) has provided a powerful tool for large-scale functional analysis of the mammalian genome. However, haESCs rapidly become diploidized after differentiation, posing challenges for genetic analysis. Here, we show that the spontaneous diploidization of haESCs happens in metaphase due to mitotic slippage. Diploidization can be suppressed by small-molecule-mediated inhibition of CDK1 and ROCK. Through ROCK inhibition, we can generate haploid somatic cells of all three germ layers from haESCs, including terminally differentiated neurons. Using piggyBac transposon-based insertional mutagenesis, we generated a haploid neural cell library harboring genome-wide mutations for genetic screening. As a proof of concept, we screened for Mn2+-mediated toxicity and identified the Park2 gene. Our findings expand the applications of mouse haploid cell technology to somatic cell types and may also shed light on the mechanisms of ploidy maintenance.

  9. A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm

    Directory of Open Access Journals (Sweden)

    Allen Eric E

    2008-10-01

    Full Text Available Abstract Background: The process of horizontal gene transfer (HGT) is believed to be widespread in Bacteria and Archaea, but little comparative data is available addressing its occurrence in complete microbial genomes. Collection of high-quality, automated HGT prediction data based on phylogenetic evidence has previously been impractical for large numbers of genomes at once, due to prohibitive computational demands. DarkHorse, a recently described statistical method for discovering phylogenetically atypical genes on a genome-wide basis, provides a means to solve this problem through lineage probability index (LPI) ranking scores. LPI scores inversely reflect phylogenetic distance between a test amino acid sequence and its closest available database matches. Proteins with low LPI scores are good horizontal gene transfer candidates; those with high scores are not. Description: The DarkHorse algorithm has been applied to 955 microbial genome sequences, and the results organized into a web-searchable relational database, called the DarkHorse HGT Candidate Resource http://darkhorse.ucsd.edu. Users can select individual genomes or groups of genomes to screen by LPI score, search for protein functions by descriptive annotation or amino acid sequence similarity, or select proteins with unusual G+C composition in their underlying coding sequences. The search engine reports LPI scores for match partners as well as query sequences, providing the opportunity to explore whether potential HGT donor sequences are phylogenetically typical or atypical within their own genomes. This information can be used to predict whether or not sufficient information is available to build a well-supported phylogenetic tree using the potential donor sequence. Conclusion: The DarkHorse HGT Candidate database provides a powerful, flexible set of tools for identifying phylogenetically atypical proteins, allowing researchers to explore both individual HGT events in single genomes, and

  10. A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database.

    Science.gov (United States)

    Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

    2016-01-01

    The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome's content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by [...] density across the gene space of 32.5 and 44.9 SSRs/Mbp for perfect and imperfect motifs, respectively. A putative function has been assigned, using the gene ontology approach, to the set of genes harboring at least one SSR. The same search parameters were applied to reveal the SSR content of 14 other plant species for which genome sequence is available. Certain species-specific SSR motifs were identified, along with a hexa-nucleotide motif shared only with the other two Compositae species (sunflower (Helianthus annuus) and horseweed (Conyza canadensis)) included in the study. Finally, a database, called "Cynara cardunculus MicroSatellite DataBase" (CyMSatDB) was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic vs genic context, perfect vs imperfect repeat, motif type, motif sequence and repeat number. The SSR markers were validated via an in silico based PCR analysis adopting two available assembled transcriptomes, derived from contrasting globe artichoke accessions, as templates.

  11. Plastid-like Seq in mt Genome - RMG | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available erences for individual fragments is available. Data file...t were migrated from the plastid genome to the mitochondrial genome. Information on sizes, positions, gene names, homologies and diff

  12. Microarray and comparative genomics-based identification of genes and gene regulatory regions of the mouse immune system

    Directory of Open Access Journals (Sweden)

    Katz Jonathan D

    2004-10-01

    Full Text Available Abstract Background: In this study we have built and mined a gene expression database composed of 65 diverse mouse tissues for genes preferentially expressed in immune tissues and cell types. Using expression pattern criteria, we identified 360 genes with preferential expression in thymus, spleen, peripheral blood mononuclear cells, lymph nodes (unstimulated or stimulated), or in vitro activated T-cells. Results: Gene clusters, formed based on similarity of expression-pattern across either all tissues or the immune tissues only, had highly significant associations both with immunological processes such as chemokine-mediated response, antigen processing, receptor-related signal transduction, and transcriptional regulation, and also with more general processes such as replication and cell cycle control. Within-cluster gene correlations implicated known associations of known genes, as well as immune process-related roles for poorly described genes. To characterize regulatory mechanisms and cis-elements of genes with similar patterns of expression, we used a new version of a comparative genomics-based cis-element analysis tool to identify clusters of cis-elements with compositional similarity among multiple genes. Several clusters contained genes that shared 5–6 cis-elements that included ETS and zinc-finger binding sites. cis-Elements AP2 EGRF ETSF MAZF SP1F ZF5F and AREB ETSF MZF1 PAX5 STAT were shared in a thymus-expressed set; AP4R E2FF EBOX ETSF MAZF SP1F ZF5F and CREB E2FF MAZF PCAT SP1F STAT cis-clusters occurred in activated T-cells; CEBP CREB NFKB SORY and GATA NKXH OCT1 RBIT occurred in stimulated lymph nodes. Conclusion: This study demonstrates a series of analytic approaches that have allowed the implication of genes and regulatory elements that participate in the differentiation, maintenance, and function of the immune system. Polymorphism or mutation of these could adversely impact immune system functions.

  13. A Guide to the PLAZA 3.0 Plant Comparative Genomic Database.

    Science.gov (United States)

    Vandepoele, Klaas

    2017-01-01

    PLAZA 3.0 is an online resource for comparative genomics and offers a versatile platform to study gene functions and gene families or to analyze genome organization and evolution in the green plant lineage. Starting from genome sequence information for over 35 plant species, precomputed comparative genomic data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, and genomic colinearity information within and between species. Complementary functional data sets, a Workbench, and interactive visualization tools are available through a user-friendly web interface, making PLAZA an excellent starting point to translate sequence or omics data sets into biological knowledge. PLAZA is available at http://bioinformatics.psb.ugent.be/plaza/ .

  14. A genome-wide survey on basic helix-loop-helix transcription factors in rat and mouse.

    Science.gov (United States)

    Zheng, Xiaodong; Wang, Yong; Yao, Qin; Yang, Zhe; Chen, Keping

    2009-04-01

    The basic helix-loop-helix (bHLH) proteins play essential roles in a wide range of developmental processes in higher organisms. bHLH family members have been identified in over 20 organisms, including nematode, fruit fly, and human. Our study identified 114 rat and 14 additional mouse bHLH members in rat and mouse genomes, respectively. Phylogenetic analyses revealed that both rat and mouse had 49, 26, 15, 4, 12, and 4 bHLH members in groups A, B, C, D, E, and F, respectively. Only the rat Mxi1 gene has two copies in the genome. All other rat bHLH genes and all mouse bHLH genes are single-copy genes. The chromosomal distribution pattern of mouse, rat, and human bHLH genes suggests the emergence of some bHLH genes through gene duplication, which probably happened at least before the divergence of vertebrates from invertebrates. The present study provides useful information for future studies using rat as a model animal for mammalian development.

  15. Genomic landscapes of endogenous retroviruses unveil intricate genetics of conventional and genetically-engineered laboratory mouse strains.

    Science.gov (United States)

    Lee, Kang-Hoon; Lim, Debora; Chiu, Sophia; Greenhalgh, David; Cho, Kiho

    2016-04-01

    Laboratory strains of mice, both conventional and genetically engineered, have been introduced as critical components of a broad range of studies investigating normal and disease biology. Currently, the genetic identity of laboratory mice is primarily confirmed by surveying polymorphisms in selected sets of "conventional" genes and/or microsatellites in the absence of a single completely sequenced mouse genome. First, we examined variations in the genomic landscapes of transposable repetitive elements, named the TREome, in conventional and genetically engineered mouse strains using murine leukemia virus-type endogenous retroviruses (MLV-ERVs) as a probe. A survey of the genomes from 56 conventional strains revealed strain-specific TREome landscapes, and certain families (e.g., C57BL) of strains were discernible with defined patterns. Interestingly, the TREome landscapes of C3H/HeJ (toll-like receptor-4 [TLR4] mutant) inbred mice were different from its control C3H/HeOuJ (TLR4 wild-type) strain. In addition, a CD14 knock-out strain had a distinct TREome landscape compared to its control/backcross C57BL/6J strain. Second, an examination of superantigen (SAg, a "TREome gene") coding sequences of mouse mammary tumor virus-type ERVs in the genomes of the 46 conventional strains revealed a high diversity, suggesting a potential role of SAgs in strain-specific immune phenotypes. The findings from this study indicate that unexplored and intricate genomic variations exist in laboratory mouse strains, both conventional and genetically engineered. The TREome-based high-resolution genetics surveillance system for laboratory mice would contribute to efficient study design with quality control and accurate data interpretation. This genetics system can be easily adapted to other species ranging from plants to humans.

  16. Large genomic fragment deletions and insertions in mouse using CRISPR/Cas9.

    Directory of Open Access Journals (Sweden)

    Luqing Zhang

    ZFN, TALENs and the CRISPR/Cas9 system have been used to generate point mutations and large fragment deletions and insertions in genomic modifications. The CRISPR/Cas9 system is the most flexible and fastest-developing technology and has been used extensively to make mutations in all kinds of organisms. However, most mutations reported to date are small insertions and deletions. In this report, the CRISPR/Cas9 system was used to make large DNA fragment deletions and insertions in mouse, including deletion of the entire Dip2a gene, about 65 kb in size, and insertion of a β-galactosidase (lacZ) reporter gene larger than 5 kb. About 11.8% (11/93) of transfected and diluted ES clones were positive for the 65 kb deletion. High targeting efficiencies in ES cells were also achieved with G418 selection: 46.2% (12/26) and 73.1% (19/26) for the left and right arms, respectively. Targeted large fragment deletion efficiency was about 21.4% of live pups or 6.0% of injected embryos. Targeted insertion of the lacZ reporter with a NEO cassette showed a targeting rate of 27.1% (13/48) by ES cell transfection and 11.1% (2/18) by direct zygote injection. The procedures bypassed in vitro transcription by direct co-injection of zygotes or co-transfection of embryonic stem cells with circular plasmid DNA. The methods are technically easy, time saving, and cost effective for generating mouse models and will certainly facilitate gene function studies.
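
    As a quick sanity check on the efficiencies quoted in this abstract, the percentages can be recomputed from the reported counts. The following is a minimal sketch in Python; the dictionary labels and the helper name `percent` are illustrative and do not come from the paper.

```python
# Recompute the targeting efficiencies quoted in the abstract.
# Each entry is (positives, total, percentage as quoted); labels are illustrative.
reported = {
    "65 kb deletion, diluted ES clones": (11, 93, 11.8),
    "G418 selection, left arm": (12, 26, 46.2),
    "G418 selection, right arm": (19, 26, 73.1),
    "lacZ insertion, ES cell transfection": (13, 48, 27.1),
    "lacZ insertion, zygote injection": (2, 18, 11.1),
}


def percent(positives, total):
    """Return positives/total as a percentage rounded to one decimal place."""
    return round(100.0 * positives / total, 1)


for label, (k, n, quoted) in reported.items():
    print(f"{label}: {k}/{n} = {percent(k, n)}% (quoted: {quoted}%)")
```

    Under these assumptions, each recomputed value agrees with the quoted percentage to one decimal place.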

  17. The FlyBase database of the Drosophila genome projects and community literature

    Energy Technology Data Exchange (ETDEWEB)

    Gelbart, William; Bayraktaroglu, Leyla; Bettencourt, Brian; Campbell, Kathy; Crosby, Madeline; Emmert, David; Hradecky, Pavel; Huang, Yanmei; Letovsky, Stan; Matthews, Beverly; Russo, Susan; Schroeder, Andrew; Smutniak, Frank; Zhou, Pinglei; Zytkovicz, Mark; Ashburner, Michael; Drysdale, Rachel; de Grey, Aubrey; Foulger, Rebecca; Millburn, Gillian; Yamada, Chihiro; Kaufman, Thomas; Matthews, Kathy; Gilbert, Don; Grumbling, Gary; Strelets, Victor; Shemen, C.; Rubin, Gerald; Berman, Brian; Frise, Erwin; Gibson, Mark; Harris, Nomi; Kaminker, Josh; Lewis, Suzanna; Marshall, Brad; Misra, Sima; Mungall, Christopher; Prochnik, Simon; Richter, John; Smith, Christopher; Shu, ShengQiang; Tupy, Jonathan; Wiel, Colin

    2002-09-16

    FlyBase (http://flybase.bio.indiana.edu/) provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. FlyBase has primary responsibility for the continual reannotation of the D. melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. A complete revision of the annotations of the now-finished euchromatic genomic sequence has been completed. There are many points of entry to the genome within FlyBase, most notably through maps, gene products and ontologies, structured phenotypic and gene expression data, and anatomy.

  18. Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases.

    Science.gov (United States)

    Sanderson, Lacey-Anne; Ficklin, Stephen P; Cheng, Chun-Huai; Jung, Sook; Feltus, Frank A; Bett, Kirstin E; Main, Dorrie

    2013-01-01

    Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including 'Feature Map', 'Genetic', 'Publication', 'Project', 'Contact' and the 'Natural Diversity' modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. DATABASE URL: http://tripal.info/.

  19. An analysis of possible off target effects following CAS9/CRISPR targeted deletions of neuropeptide gene enhancers from the mouse genome.

    Science.gov (United States)

    Hay, Elizabeth Anne; Khalaf, Abdulla Razak; Marini, Pietro; Brown, Andrew; Heath, Karyn; Sheppard, Darrin; MacKenzie, Alasdair

    2017-08-01

    We have successfully used comparative genomics to identify putative regulatory elements within the human genome that contribute to the tissue specific expression of neuropeptides such as galanin and receptors such as CB1. However, a previous inability to rapidly delete these elements from the mouse genome has prevented optimal assessment of their function in-vivo. This has been solved using CAS9/CRISPR genome editing technology which uses a bacterial endonuclease called CAS9 that, in combination with specifically designed guide RNA (gRNA) molecules, cuts specific regions of the mouse genome. However, reports of "off target" effects, whereby the CAS9 endonuclease is able to cut sites other than those targeted, limit the appeal of this technology. We used cytoplasmic microinjection of gRNA and CAS9 mRNA into 1-cell mouse embryos to rapidly generate enhancer knockout mouse lines. The current study describes our analysis of the genomes of these enhancer knockout lines to detect possible off-target effects. Bioinformatic analysis was used to identify the most likely putative off-target sites and to design PCR primers that would amplify these sequences from genomic DNA of founder enhancer deletion mouse lines. Amplified DNA was then sequenced and blasted against the mouse genome sequence to detect off-target effects. Using this approach we were unable to detect any evidence of off-target effects in the genomes of three founder lines using any of the four gRNAs used in the analysis. This study suggests that the problem of off-target effects in transgenic mice has been exaggerated and that CAS9/CRISPR represents a highly effective and accurate method of deleting putative neuropeptide gene enhancer sequences from the mouse genome. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  20. Accelerating discovery for complex neurological and behavioral disorders through systems genetics and integrative genomics in the laboratory mouse.

    Science.gov (United States)

    Bubier, Jason A; Chesler, Elissa J

    2012-04-01

    Recent advances in systems genetics and integrative functional genomics have greatly improved the study of complex neurological and behavioral traits. The methods developed for the integrated characterization of new, high-resolution mouse genetic reference populations and systems genetics give behavioral geneticists an unprecedented opportunity to address questions of the molecular basis of neurological and psychiatric disorders and their comorbidities. Integrative genomics augments these strategies by enabling rapid informatics-assisted candidate gene prioritization, cross-species translation, and mechanistic comparison across related disorders from a wealth of existing data in mouse and other model organisms. Ultimately, through these complementary approaches, finding the mechanisms and sources of genetic variation underlying complex neurobehavioral disease related traits is becoming tractable. Furthermore, these methods enable categorization of neurobehavioral disorders through their underlying biological basis. Together, these model organism-based approaches can lead to a refinement of diagnostic categories and targeted treatment of neurological and psychiatric disease.

  1. License - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    If you use data from this database, please be sure to attribute this database as follows: ... PGDBj Registered plan...

  2. TransportDB 2.0: a database for exploring membrane transporters in sequenced genomes from all domains of life.

    Science.gov (United States)

    Elbourne, Liam D H; Tetu, Sasha G; Hassan, Karl A; Paulsen, Ian T

    2017-01-04

    All cellular life contains an extensive array of membrane transport proteins. The vast majority of these transporters have not been experimentally characterized. We have developed a bioinformatic pipeline to identify and annotate complete sets of transporters in any sequenced genome. This pipeline is now fully automated enabling it to better keep pace with the accelerating rate of genome sequencing. This manuscript describes TransportDB 2.0 (http://www.membranetransport.org/transportDB2/), a completely updated version of TransportDB, which provides access to the large volumes of data generated by our automated transporter annotation pipeline. The TransportDB 2.0 web portal has been rebuilt to utilize contemporary JavaScript libraries, providing a highly interactive interface to the annotation information, and incorporates analysis tools that enable users to query the database on a number of levels. For example, TransportDB 2.0 includes tools that allow users to select annotated genomes of interest from the thousands of species held in the database and compare their complete transporter complements.

  3. TransportDB 2.0: a database for exploring membrane transporters in sequenced genomes from all domains of life

    Science.gov (United States)

    Elbourne, Liam D. H.; Tetu, Sasha G.; Hassan, Karl A.; Paulsen, Ian T.

    2017-01-01

    All cellular life contains an extensive array of membrane transport proteins. The vast majority of these transporters have not been experimentally characterized. We have developed a bioinformatic pipeline to identify and annotate complete sets of transporters in any sequenced genome. This pipeline is now fully automated enabling it to better keep pace with the accelerating rate of genome sequencing. This manuscript describes TransportDB 2.0 (http://www.membranetransport.org/transportDB2/), a completely updated version of TransportDB, which provides access to the large volumes of data generated by our automated transporter annotation pipeline. The TransportDB 2.0 web portal has been rebuilt to utilize contemporary JavaScript libraries, providing a highly interactive interface to the annotation information, and incorporates analysis tools that enable users to query the database on a number of levels. For example, TransportDB 2.0 includes tools that allow users to select annotated genomes of interest from the thousands of species held in the database and compare their complete transporter complements. PMID:27899676

  4. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    Science.gov (United States)

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2007-01-01

    NCBI's reference sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. The database includes 3774 organisms spanning prokaryotes, eukaryotes and viruses, and has records for 2,879,860 proteins (RefSeq release 19). RefSeq records integrate information from multiple sources, when additional data are available from those sources and therefore represent a current description of the sequence and its features. Annotations include coding regions, conserved domains, tRNAs, sequence tagged sites (STS), variation, references, gene and protein product names, and database cross-references. Sequence is reviewed and features are added using a combined approach of collaboration and other input from the scientific community, prediction, propagation from GenBank and curation by NCBI staff. The format of all RefSeq records is validated, and an increasing number of tests are being applied to evaluate the quality of sequence and annotation, especially in the context of complete genomic sequence.

  5. CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Ussery, David

    2004-01-01

    , these results counts to more than 220 pieces of information. The backbone of this solution consists of a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web-server via PHP4, to present a dynamic web...... content for users outside the center. This solution is tightly fitted to existing server infrastructure and the solutions proposed here can perhaps serve as a template for other research groups to solve database issues....

  6. Clarifying the biological significance of the CHK2 K373E somatic mutation discovered in The Cancer Genome Atlas database.

    Science.gov (United States)

    Higashiguchi, Masayoshi; Nagatomo, Izumi; Kijima, Takashi; Morimura, Osamu; Miyake, Kotaro; Minami, Toshiyuki; Koyama, Shohei; Hirata, Haruhiko; Iwahori, Kota; Takimoto, Takayuki; Takeda, Yoshito; Kida, Hiroshi; Kumanogoh, Atsushi

    2016-12-01

    We identified CHK2 K373E as a recurrent mutation in The Cancer Genome Atlas (TCGA) database. In this study, we demonstrate that the K373E mutation disrupts CHK2 autophosphorylation as well as kinase activity, thus leading to impairment of CHK2 functions in suppressing cell proliferation and promoting cell survival after ionizing radiation. We propose that K373E impairs p53-independent induction of p21(WAF1/CIP1) by CHK2. Our data implicate the K373E mutation of CHK2 in tumorigenesis. © 2016 Federation of European Biochemical Societies.

  7. PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes

    OpenAIRE

    Kumar, Pankaj; Chaitanya, Pasumarthy S.; Nagarajaram, Hampapathalu A

    2010-01-01

    PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are the tandem repeats of nucleotide motifs of the sizes 1–6 bp and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin phase variations and antigenic variations seen in s...

  8. A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database

    Science.gov (United States)

    Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

    2016-01-01

    The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome’s content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by <100 nt). Some 73% of the SSRs were composed of dinucleotide motifs. The SSRs were categorized for the numbers of repeats present, their overall length and were allocated to their linkage group. A total of 4,761 perfect and 6,583 imperfect SSRs were present in 3,781 genes (14.11% of the total), corresponding to an overall density across the gene space of 32.5 and 44.9 SSRs/Mbp for perfect and imperfect motifs, respectively. A putative function has been assigned, using the gene ontology approach, to the set of genes harboring at least one SSR. The same search parameters were applied to reveal the SSR content of 14 other plant species for which genome sequence is available. Certain species-specific SSR motifs were identified, along with a hexa-nucleotide motif shared only with the other two Compositae species (sunflower (Helianthus annuus) and horseweed (Conyza canadensis)) included in the study. Finally, a database, called “Cynara cardunculus MicroSatellite DataBase” (CyMSatDB) was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic vs genic context, perfect vs imperfect repeat, motif type, motif sequence and repeat number. The SSR markers were validated via an in silico based PCR analysis adopting two available assembled transcriptomes, derived from contrasting globe artichoke accessions, as templates. PMID:27648830
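
    The counts and densities in this abstract imply a few quantities that it does not state directly. The sketch below back-calculates them in Python, reading the genic densities as 32.5 and 44.9 SSRs/Mbp. The variable names are illustrative, and the results are rough estimates rather than figures reported by the authors ("more than 177,000" is treated as 177,000).

```python
# Back-of-envelope quantities implied by the SSR figures in the abstract.
# All inputs are quoted values; all outputs are approximations, not reported results.
perfect_ssrs = 177_000            # "more than 177,000" perfect SSRs (lower bound)
genome_density = 244.5            # perfect SSRs per Mbp across the genome

genic_perfect = 4_761             # perfect SSRs located in genes
genic_imperfect = 6_583           # imperfect SSRs located in genes
genic_density_perfect = 32.5      # SSRs per Mbp of gene space (perfect)
genic_density_imperfect = 44.9    # SSRs per Mbp of gene space (imperfect)

genes_with_ssr = 3_781
fraction_of_genes = 0.1411        # 14.11% of all genes

# density = count / length, so length = count / density
implied_genome_mbp = perfect_ssrs / genome_density
gene_space_from_perfect = genic_perfect / genic_density_perfect
gene_space_from_imperfect = genic_imperfect / genic_density_imperfect
implied_total_genes = genes_with_ssr / fraction_of_genes

print(f"Implied sequence length surveyed: ~{implied_genome_mbp:.0f} Mbp")
print(f"Implied gene space (perfect SSRs): ~{gene_space_from_perfect:.1f} Mbp")
print(f"Implied gene space (imperfect SSRs): ~{gene_space_from_imperfect:.1f} Mbp")
print(f"Implied total gene count: ~{implied_total_genes:.0f}")
```

    The two gene-space estimates agree closely, which suggests the perfect and imperfect genic densities were computed over the same gene space.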

  9. Genome-Scale Assessment of Age-Related DNA Methylation Changes in Mouse Spermatozoa

    Science.gov (United States)

    Kobayashi, Norio; Okae, Hiroaki; Hiura, Hitoshi; Chiba, Hatsune; Shirakata, Yoshiki; Hara, Kenshiro; Tanemura, Kentaro; Arima, Takahiro

    2016-01-01

    DNA methylation plays important roles in the production and functioning of spermatozoa. Recent studies have suggested that DNA methylation patterns in spermatozoa can change with age, but the regions susceptible to age-related methylation changes remain to be fully elucidated. In this study, we conducted genome-scale DNA methylation profiling of spermatozoa obtained from C57BL/6N mice at 8 weeks (8w), 18 weeks (18w) and 17 months of age (17m). There was no substantial difference in the global DNA methylation patterns between 18w and 17m samples except for a slight increase of methylation levels in long interspersed nuclear elements in the 17m samples. We found that maternally methylated imprinting control regions (mICRs) and spermatogenesis-related gene promoters had 5–10% higher methylation levels in 8w samples than in 18w or 17m samples. Analysis of individual sequence reads suggested that these regions were fully methylated (80–100%) in a subset of 8w spermatozoa. These regions are also known to be highly methylated in a subset of postnatal spermatogonia, which might be the source of the increased DNA methylation in 8w spermatozoa. Another possible source was contamination by somatic cells. Although we carefully purified the spermatozoa, it was difficult to completely exclude the possibility of somatic cell contamination. Further studies are needed to clarify the source of the small increase in DNA methylation in the 8w samples. Overall, our findings suggest that DNA methylation patterns in mouse spermatozoa are relatively stable throughout reproductive life. PMID:27880848

  10. Genomic organization and the tissue distribution of alternatively spliced isoforms of the mouse Spatial gene

    Directory of Open Access Journals (Sweden)

    Mattei Marie-Geneviève

    2004-07-01

    Background: The stromal component of the thymic microenvironment is critical for T lymphocyte generation. Thymocyte differentiation involves a cascade of coordinated stromal genes controlling thymocyte survival, lineage commitment and selection. The "Stromal Protein Associated with Thymii And Lymph-node" (Spatial) gene encodes a putative transcription factor which may be involved in T-cell development. In the testis, the Spatial gene is also expressed by round spermatids during spermatogenesis. Results: The Spatial gene maps to the B3-B4 region of murine chromosome 10 corresponding to the human syntenic region 10q22.1. The mouse Spatial genomic DNA is organised into 10 exons and is alternatively spliced to generate two short isoforms (Spatial-α and -γ) and two other long isoforms (Spatial-δ and -ε) comprising 5 additional exons on the 3' site. Here, we report the cloning of a new short isoform, Spatial-β, which differs from other isoforms by an additional alternative exon of 69 bases. This new exon encodes an interesting proline-rich signature that could confer a particular function on the 34 kDa Spatial-β protein. By quantitative TaqMan RT-PCR, we have shown that the short isoforms are highly expressed in the thymus while the long isoforms are highly expressed in the testis. We further examined the inter-species conservation of Spatial between several mammals and found that the protein, which is rich in proline and positive amino acids, is highly conserved. Conclusions: The Spatial gene generates at least five alternative spliced variants: three short isoforms (Spatial-α, -β and -γ) highly expressed in the thymus and two long isoforms (Spatial-δ and -ε) highly expressed in the testis. These alternative spliced variants could have a tissue specific function.

  11. The Littorina sequence database (LSD)--an online resource for genomic data.

    Science.gov (United States)

    Canbäck, Björn; André, Carl; Galindo, Juan; Johannesson, Kerstin; Johansson, Tomas; Panova, Marina; Tunlid, Anders; Butlin, Roger

    2012-01-01

    We present an interactive, searchable expressed sequence tag database for the periwinkle snail Littorina saxatilis, an upcoming model species in evolutionary biology. The database is the result of a hybrid assembly between Sanger and 454 sequences, 1290 and 147,491 sequences respectively. Normalized and non-normalized cDNA was obtained from different ecotypes of L. saxatilis collected in the UK and Sweden. The Littorina sequence database (LSD) contains 26,537 different contigs, of which 2453 showed similarity with annotated proteins in UniProt. Querying the LSD permits the selection of the taxonomic origin of blast hits for each contig, and the search can be restricted to particular taxonomic groups. The database allows access to UniProt annotations, blast output, protein family domains (PFAM) and Gene Ontology. The database will allow users to search for genetic markers and to identify candidate genes or genes for expression analyses. It is open for additional deposition of sequence information for L. saxatilis and other species of the genus Littorina. The LSD is available at http://mbio-serv2.mbioekol.lu.se/Littorina/.

  12. Development of genomic resources for the prairie vole (Microtus ochrogaster): construction of a BAC library and vole-mouse comparative cytogenetic map

    Directory of Open Access Journals (Sweden)

    Young Larry J

    2010-01-01

    Background: The prairie vole (Microtus ochrogaster) is a premier animal model for understanding the genetic and neurological basis of social behaviors. Unlike other biomedical models, prairie voles display a rich repertoire of social behaviors including the formation of long-term pair bonds and biparental care. However, due to a lack of genomic resources for this species, studies have been limited to a handful of candidate genes. To provide a substrate for future development of genomic resources for this unique model organism, we report the construction and characterization of a bacterial artificial chromosome (BAC) library from a single male prairie vole and a prairie vole-mouse (Mus musculus) comparative cytogenetic map. Results: We constructed a prairie vole BAC library (CHORI-232) consisting of 194,267 recombinant clones with an average insert size of 139 kb. Hybridization-based screening of the gridded library at 19 loci established that the library has an average depth of coverage of ~10×. To obtain a small-scale sampling of the prairie vole genome, we generated 3884 BAC end-sequences totaling ~2.8 Mb. One-third of these BAC-end sequences could be mapped to unique locations in the mouse genome, thereby anchoring 1003 prairie vole BAC clones to an orthologous position in the mouse genome. Fluorescence in situ hybridization (FISH) mapping of 62 prairie vole clones with BAC-end sequences mapping to orthologous positions in the mouse genome was used to develop a first-generation genome-wide prairie vole-mouse comparative cytogenetic map. While conserved synteny was observed between this pair of rodent genomes, rearrangements between the prairie vole and mouse genomes were detected, including a minimum of five inversions and 16 inter-chromosomal rearrangements. Conclusions: The construction of the prairie vole BAC library and the vole-mouse comparative cytogenetic map represent the first genome-wide modern genomic resources developed for this
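
    The library statistics quoted in this abstract can be related to one another with simple arithmetic: clone count times average insert size gives the total cloned DNA, and dividing by the depth of coverage gives a rough implied genome size. The sketch below, in Python, is a back-of-envelope illustration only. The abstract reports coverage from hybridization screening rather than from this calculation, and the variable names are illustrative.

```python
# Relate the BAC library statistics quoted in the abstract.
# Inputs are quoted values; outputs are rough estimates, not reported results.
clones = 194_267              # recombinant clones in the library
avg_insert_kb = 139           # average insert size in kb
coverage = 10                 # approximate depth of coverage (~10x)

bac_end_reads = 3_884         # BAC end-sequences generated
bac_end_total_bp = 2.8e6      # ~2.8 Mb of BAC end-sequence in total

total_cloned_gb = clones * avg_insert_kb / 1e6     # 1 Gb = 1e6 kb
implied_genome_gb = total_cloned_gb / coverage
avg_read_length_bp = bac_end_total_bp / bac_end_reads

print(f"Total cloned DNA: ~{total_cloned_gb:.1f} Gb")
print(f"Implied genome size at ~{coverage}x: ~{implied_genome_gb:.1f} Gb")
print(f"Average BAC end-sequence length: ~{avg_read_length_bp:.0f} bp")
```

    The implied genome size is of the order expected for a rodent genome, so the quoted clone count, insert size and coverage are mutually consistent.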

  13. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    Science.gov (United States)

    Wang, Jia; Chen, Dijun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling

    2014-01-01

    Citrus is one of the most important and widely grown fruit crops, with global production ranking first among all the fruit crops in the world. Sweet orange accounts for more than half of the Citrus production both in fresh fruit and processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to the researchers of sweet orange and other woody plants, which is freely available at http://citrus.hzau.edu.cn/.

  14. FGF: a web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) program...... to efficiently search for and identify gene families. The FGF output displays the results as visual phylogenetic trees including information on gene structure, chromosome position, duplication fate and selective pressure. It is particularly useful to identify pseudogenes and detect changes in gene structure. FGF...... is freely available on a web server at http://fgf.genomics.org.cn/...

  15. FGF: A web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) program...... to efficiently search for and identify gene families. The FGF output displays the results as visual phylogenetic trees including information on gene structure, chromosome position, duplication fate and selective pressure. It is particularly useful to identify pseudogenes and detect changes in gene structure. FGF...... is freely available on a web server at http://fgf.genomics.org.cn/...

  16. Genome-wide mapping of RNA Pol-II promoter usage in mouse tissues by ChIP-seq.

    Science.gov (United States)

    Sun, Hao; Wu, Jiejun; Wickramasinghe, Priyankara; Pal, Sharmistha; Gupta, Ravi; Bhattacharyya, Anirban; Agosto-Perez, Francisco J; Showe, Louise C; Huang, Tim H-M; Davuluri, Ramana V

    2011-01-01

    Alternative promoters that are differentially used in various cellular contexts and tissue types add to the transcriptional complexity in mammalian genome. Identification of alternative promoters and the annotation of their activity in different tissues is one of the major challenges in understanding the transcriptional regulation of the mammalian genes and their isoforms. To determine the use of alternative promoters in different tissues, we performed ChIP-seq experiments using antibody against RNA Pol-II, in five adult mouse tissues (brain, liver, lung, spleen and kidney). Our analysis identified 38 639 Pol-II promoters, including 12 270 novel promoters, for both protein coding and non-coding mouse genes. Of these, 6384 promoters are tissue specific which are CpG poor and we find that only 34% of the novel promoters are located in CpG-rich regions, suggesting that novel promoters are mostly tissue specific. By identifying the Pol-II bound promoter(s) of each annotated gene in a given tissue, we found that 37% of the protein coding genes use alternative promoters in the five mouse tissues. The promoter annotations and ChIP-seq data presented here will aid ongoing efforts of characterizing gene regulatory regions in mammalian genomes.

  17. An encyclopedia of mouse DNA elements (Mouse ENCODE).

    Science.gov (United States)

    Stamatoyannopoulos, John A; Snyder, Michael; Hardison, Ross; Ren, Bing; Gingeras, Thomas; Gilbert, David M; Groudine, Mark; Bender, Michael; Kaul, Rajinder; Canfield, Theresa; Giste, Erica; Johnson, Audra; Zhang, Mia; Balasundaram, Gayathri; Byron, Rachel; Roach, Vaughan; Sabo, Peter J; Sandstrom, Richard; Stehling, A Sandra; Thurman, Robert E; Weissman, Sherman M; Cayting, Philip; Hariharan, Manoj; Lian, Jin; Cheng, Yong; Landt, Stephen G; Ma, Zhihai; Wold, Barbara J; Dekker, Job; Crawford, Gregory E; Keller, Cheryl A; Wu, Weisheng; Morrissey, Christopher; Kumar, Swathi A; Mishra, Tejaswini; Jain, Deepti; Byrska-Bishop, Marta; Blankenberg, Daniel; Lajoie, Bryan R; Jain, Gaurav; Sanyal, Amartya; Chen, Kaun-Bei; Denas, Olgert; Taylor, James; Blobel, Gerd A; Weiss, Mitchell J; Pimkin, Max; Deng, Wulan; Marinov, Georgi K; Williams, Brian A; Fisher-Aylor, Katherine I; Desalvo, Gilberto; Kiralusha, Anthony; Trout, Diane; Amrhein, Henry; Mortazavi, Ali; Edsall, Lee; McCleary, David; Kuan, Samantha; Shen, Yin; Yue, Feng; Ye, Zhen; Davis, Carrie A; Zaleski, Chris; Jha, Sonali; Xue, Chenghai; Dobin, Alex; Lin, Wei; Fastuca, Meagan; Wang, Huaien; Guigo, Roderic; Djebali, Sarah; Lagarde, Julien; Ryba, Tyrone; Sasaki, Takayo; Malladi, Venkat S; Cline, Melissa S; Kirkup, Vanessa M; Learned, Katrina; Rosenbloom, Kate R; Kent, W James; Feingold, Elise A; Good, Peter J; Pazin, Michael; Lowdon, Rebecca F; Adams, Leslie B

    2012-08-13

    To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.

  18. Sequence Collection - TMBETA-GENOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ...ane protein predictions. For a genome that has multiple chromosomes, the entry set for each chromosome is g...aryota) Organism Name Name of the organism. Chromosome number is added to the name if the organism has multiple chromo

  19. Bridging the gap between Big Genome Data Analysis and Database Management Systems

    NARCIS (Netherlands)

    Cijvat, C.P.

    2014-01-01

    The bioinformatics field has encountered a data deluge over the last years, due to increasing speed and decreasing cost of DNA sequencing technology. Today, sequencing the DNA of a single genome only takes about a week, and it can result in up to a terabyte of data. The sequencing data are usual

  20. Genome mapping data table of Drosophila GAL4 enhancer trap lines (Clone List) - GETDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available GETDB Genome mapping data table of Drosophila GAL4 enhancer trap lines (Clone List) Data detail Data name Ge...nome mapping data table of Drosophila GAL4 enhancer trap lines (Clone List) Description of data contents A t...able showing the insert position of the Drosophila GAL4 enhancer trap element and...iption Clone Name Name of the clone of the genome sequence adjacent to the 5'-end of the Drosophila GAL4 enhancer trap...

  1. Defending the genome from the enemy within: mechanisms of retrotransposon suppression in the mouse germline.

    Science.gov (United States)

    Crichton, James H; Dunican, Donncha S; Maclennan, Marie; Meehan, Richard R; Adams, Ian R

    2014-05-01

    The viability of any species requires that the genome is kept stable as it is transmitted from generation to generation by the germ cells. One of the challenges to transgenerational genome stability is the potential mutagenic activity of transposable genetic elements, particularly retrotransposons. There are many different types of retrotransposon in mammalian genomes, and these target different points in germline development to amplify and integrate into new genomic locations. Germ cells, and their pluripotent developmental precursors, have evolved a variety of genome defence mechanisms that suppress retrotransposon activity and maintain genome stability across the generations. Here, we review recent advances in understanding how retrotransposon activity is suppressed in the mammalian germline, how genes involved in germline genome defence mechanisms are regulated, and the consequences of mutating these genome defence genes for the developing germline.

  2. Index of /data/mouse-b6n-bac-clone-db/20120215 [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Index of /data/mouse-b6n-bac-clone-db/20120215 Name Last modified Size Description ...Parent Directory - README.html 02-Apr-2014 10:25 8.1K mouse_b6n_bac_clone.zip 15-Feb-2012 13:33 64M Index of /data/mouse-b6n-bac-clone-db/20120215 ...

  3. Genome-Wide Expression Profiling of Five Mouse Models Identifies Similarities and Differences with Human Psoriasis

    NARCIS (Netherlands)

    Swindell, William R.; Johnston, Andrew; Carbajal, Steve; Han, Gangwen; Wohn, Christian; Lu, Jun; Xing, Xianying; Nair, Rajan P.; Voorhees, John J.; Elder, James T.; Wang, Xiao-Jing; Sano, Shigetoshi; Prens, Errol P.; DiGiovanni, John; Pittelkow, Mark R.; Ward, Nicole L.; Gudjonsson, Johann E.

    2011-01-01

    Development of a suitable mouse model would facilitate the investigation of pathomechanisms underlying human psoriasis and would also assist in development of therapeutic treatments. However, while many psoriasis mouse models have been proposed, no single model recapitulates all features of the huma

  4. Genome-wide expression profiling of five mouse models identifies similarities and differences with human psoriasis

    NARCIS (Netherlands)

    W.R. Swindell (William R.); A. Johnston (Andrew); S. Carbajal (Steve); G. Han (Gangwen); C.T. Wohn (Christopher); J. Lu (Jun); X. Xing (Xianying); R.P. Nair (Rajan P.); J.J. Voorhees (John); J.T. Elder (James); X.J. Wang (Xian Jiang); S. Sano (Shigetoshi); E.P. Prens (Errol); J. DiGiovanni (John); M.R. Pittelkow (Mark R.); N.L. Ward (Nicole); J.E. Gudjonsson (Johann Eli)

    2011-01-01

    Development of a suitable mouse model would facilitate the investigation of pathomechanisms underlying human psoriasis and would also assist in development of therapeutic treatments. However, while many psoriasis mouse models have been proposed, no single model recapitulates all features

  5. Genome-wide expression profiling of five mouse models identifies similarities and differences with human psoriasis

    NARCIS (Netherlands)

    W.R. Swindell (William R.); A. Johnston (Andrew); S. Carbajal (Steve); G. Han (Gangwen); C.T. Wohn (Christopher); J. Lu (Jun); X. Xing (Xianying); R.P. Nair (Rajan P.); J.J. Voorhees (John); J.T. Elder (James); X.J. Wang (Xian Jiang); S. Sano (Shigetoshi); E.P. Prens (Errol); J. DiGiovanni (John); M.R. Pittelkow (Mark R.); N.L. Ward (Nicole); J.E. Gudjonsson (Johann Eli)

    2011-01-01

    Development of a suitable mouse model would facilitate the investigation of pathomechanisms underlying human psoriasis and would also assist in development of therapeutic treatments. However, while many psoriasis mouse models have been proposed, no single model recapitulates all features

  6. Genomic and protein structural maps of adaptive evolution of human influenza A virus to increased virulence in the mouse.

    Directory of Open Access Journals (Sweden)

    Jihui Ping

    Full Text Available Adaptive evolution is characterized by positive and parallel, or repeated selection of mutations. Mouse adaptation of influenza A virus (IAV) produces virulent mutants that demonstrate positive and parallel evolution of mutations in the hemagglutinin (HA) receptor and non-structural protein 1 (NS1) interferon antagonist genes. We now present a genomic analysis of all 11 genes of 39 mouse adapted IAV variants from 10 replicate adaptation experiments. Mutations were mapped on the primary and structural maps of each protein and specific mutations were validated with respect to virulence, replication, and RNA polymerase activity. Mouse adapted (MA) variants obtained after 12 or 20-21 serial infections acquired on average 5.8 and 7.9 nonsynonymous mutations per genome of 11 genes, respectively. Among a total of 115 nonsynonymous mutations, 51 demonstrated properties of natural selection including 27 parallel mutations. The greatest degree of parallel evolution occurred in the HA receptor and ribonucleocapsid components, polymerase subunits (PB1, PB2, PA) and NP. Mutations occurred in host nuclear trafficking factor binding sites as well as sites of virus-virus protein subunit interaction for NP, NS1, HA and NA proteins. Adaptive regions included cap binding and endonuclease domains in the PB2 and PA polymerase subunits. Four mutations in NS1 resulted in loss of binding to the host cleavage and polyadenylation specificity factor (CPSF30), suggesting that a reduction in inhibition of host gene expression was being selected. The most prevalent mutations in PB2 and NP were shown to increase virulence but differed in their ability to enhance replication and demonstrated epistatic effects. Several positively selected RNA polymerase mutations demonstrated increased virulence associated with >300% enhanced polymerase activity. Adaptive mutations that control host range and virulence were identified by their repeated selection to comprise a defined model for

  7. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    Directory of Open Access Journals (Sweden)

    Jia Wang

    Full Text Available Citrus is one of the most important and widely grown fruit crops, with global production ranking first among all the fruit crops in the world. Sweet orange accounts for more than half of the Citrus production both in fresh fruit and processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to the researchers of sweet orange and other woody plants, which is freely available at http://citrus.hzau.edu.cn/.

  8. Genome Transfer Prevents Fragmentation and Restores Developmental Potential of Developmentally Compromised Postovulatory Aged Mouse Oocytes

    Directory of Open Access Journals (Sweden)

    Mitsutoshi Yamada

    2017-03-01

    Full Text Available Changes in oocyte quality can have great impact on the developmental potential of early embryos. Here we test whether nuclear genome transfer from a developmentally incompetent to a developmentally competent oocyte can restore developmental potential. Using in vitro oocyte aging as a model system we performed nuclear transfer in mouse oocytes at metaphase II or at the first interphase, and observed that development to the blastocyst stage and to term was as efficient as in control embryos. The increased developmental potential is explained primarily by correction of abnormal cytokinesis at anaphase of meiosis and mitosis, by a reduction in chromosome segregation errors, and by normalization of the localization of chromosome passenger complex components survivin and cyclin B1. These observations demonstrate that developmental decline is primarily due to abnormal function of cytoplasmic factors involved in cytokinesis, while the genome remains developmentally fully competent.

  9. A tandem repeats database for bacterial genomes: application to the genotyping of Yersinia pestis and Bacillus anthracis

    Directory of Open Access Journals (Sweden)

    Denoeud France

    2001-03-01

    Full Text Available Abstract. Background: Some pathogenic bacteria are genetically very homogeneous, making strain discrimination difficult. In the last few years, tandem repeats have been increasingly recognized as markers of choice for genotyping a number of pathogens. The rapid evolution of these structures appears to contribute to the phenotypic flexibility of pathogens. The availability of whole-genome sequences has opened the way to the systematic evaluation of tandem repeats diversity and application to epidemiological studies. Results: This report presents a database (http://minisatellites.u-psud.fr) of tandem repeats from publicly available bacterial genomes which facilitates the identification and selection of tandem repeats. We illustrate the use of this database by the characterization of minisatellites from two important human pathogens, Yersinia pestis and Bacillus anthracis. In order to avoid simple sequence contingency loci which may be of limited value as epidemiological markers, and to provide genotyping tools amenable to ordinary agarose gel electrophoresis, only tandem repeats with repeat units at least 9 bp long were evaluated. Yersinia pestis contains 64 such minisatellites in which the unit is repeated at least 7 times. An additional collection of 12 loci with at least 6 units and a high internal conservation was also evaluated. Forty-nine are polymorphic among five Yersinia strains (twenty-five among three Y. pestis strains). Bacillus anthracis contains 30 comparable structures in which the unit is repeated at least 10 times. Half of these tandem repeats show polymorphism among the strains tested. Conclusions: Analysis of the currently available bacterial genome sequences classifies Bacillus anthracis and Yersinia pestis as having an average (approximately 30 per Mb) density of tandem repeat arrays longer than 100 bp when compared to the other bacterial genomes analysed to date. In both cases, testing a fraction of these sequences for

  10. Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure

    DEFF Research Database (Denmark)

    Torarinsson, Elfar; Sawera, Milena; Havgaard, Jakob Hull

    2006-01-01

    overlapped by transfrags than regions that are not overlapped by transfrags. To verify the coexpression between predicted candidates in human and mouse, we conducted expression studies by RT-PCR and Northern blotting on mouse candidates, which overlap with transfrags on human chromosome 20. RT-PCR results...... confirmed expression of 32 out of 36 candidates, whereas Northern blots confirmed four out of 12 candidates. Furthermore, many RT-PCR results indicate differential expression in different tissues. Hence, our findings suggest that there are corresponding regions between human and mouse, which contain...

  11. FGF: A web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    to efficiently search for and identify gene families. The FGF output displays the results as visual phylogenetic trees including information on gene structure, chromosome position, duplication fate and selective pressure. It is particularly useful to identify pseudogenes and detect changes in gene structure. FGF......Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) program...

  12. Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform

    CERN Document Server

    Cox, Anthony J; Jakobi, Tobias; Rosone, Giovanna

    2012-01-01

    Motivation: The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often encountered as the outcome of DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human genome scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression of such datasets. Results: We first used simulated reads to explore the relationship between the level of compression and the error rate, the length of the reads and the level of sampling of the underlying genome and compare choices of second-stage compression algorithm. We demonstrate that compression may be greatly improved by a particular reordering of the sequences in the collection and give a novel 'implicit sorting' strategy that enables these benefits to be re...
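
    As a rough illustration of the transform this abstract refers to, the sketch below computes the BWT of a single short string by sorting its rotations, inverts it, and counts character runs. It is a textbook construction under stated assumptions, not the authors' large-scale algorithm for string collections; the function names (bwt, inverse_bwt, run_count) and the "$" sentinel are illustrative choices that do not come from the record.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    Assumes the sentinel does not occur in the text and sorts before every
    other symbol (true for "$" versus ASCII letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


def run_count(s: str) -> int:
    """Number of maximal runs of identical characters in s."""
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
    print(run_count("banana$"), run_count(transformed))  # 7 5
```

    The transform tends to group identical characters into runs, which is what makes a second-stage compressor effective. Sorting full rotations costs far too much memory and time for sequencing-scale data, which is the limitation the abstract says the authors' earlier algorithm addresses.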

  13. The New Genomics: What Molecular Databases Can Tell Us About Human Population Variation and Endocrine Disease.

    Science.gov (United States)

    Rotwein, Peter

    2017-07-01

    Major recent advances in genetics and genomics present unique opportunities for enhancing our understanding of human physiology and disease predisposition. Here I demonstrate how analysis of genomic information can provide new insights into endocrine systems, using the human growth hormone (GH) signaling pathway as an illustrative example. GH is essential for normal postnatal growth in children, and plays important roles in other biological processes throughout life. GH actions are mediated by the GH receptor, primarily via the JAK2 protein tyrosine kinase and the STAT5B transcription factor, and inactivating mutations in this pathway all lead to impaired somatic growth. Variation in GH signaling genes has been evaluated using DNA sequence data from the Exome Aggregation Consortium, a compendium of information from >60,000 individuals. Results reveal many potential missense and other alterations in the coding regions of GH1, GHR, JAK2, and STAT5B, with most changes being uncommon. The total number of different alleles per gene varied by ~threefold, from 101 for GH1 to 338 for JAK2. Several known disease-linked mutations in GH1, GHR, and JAK2 were present but infrequent in the population; however, three amino acid changes in GHR were sufficiently prevalent (~4% to 44% of chromosomes) to suggest that they are not disease causing. Collectively, these data provide new opportunities to understand how genetically driven variability in GH signaling and action may modify human physiology and disease. Copyright © 2017 Endocrine Society.

  14. Efficient genome engineering by targeted homologous recombination in mouse embryos using transcription activator-like effector nucleases.

    Science.gov (United States)

    Sommer, Daniel; Peters, Annika; Wirtz, Tristan; Mai, Maren; Ackermann, Justus; Thabet, Yasser; Schmidt, Jürgen; Weighardt, Heike; Wunderlich, F Thomas; Degen, Joachim; Schultze, Joachim L; Beyer, Marc

    2014-01-01

    Generation of mouse models by introducing transgenes using homologous recombination is critical for understanding fundamental biology and pathology of human diseases. Here we investigate whether artificial transcription activator-like effector nucleases (TALENs), powerful tools that induce DNA double-strand breaks at specific genomic locations, can be combined with a targeting vector to induce homologous recombination for the introduction of a transgene in embryonic stem cells and fertilized murine oocytes. We describe the generation of a conditional mouse model using TALENs, which introduce double-strand breaks at the genomic locus of the special AT-rich sequence-binding protein-1 in combination with a large 14.4 kb targeting template vector. We report successful germline transmission of this allele and demonstrate its recombination in primary cells in the presence of Cre-recombinase. These results suggest that TALEN-assisted induction of DNA double-strand breaks can facilitate homologous recombination of complex targeting constructs directly in oocytes.

  15. The Importance of Biological Databases in Biological Discovery.

    Science.gov (United States)

    Baxevanis, Andreas D; Bateman, Alex

    2015-06-19

    Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.

  16. Transgenerational developmental effects and genomic instability after X-irradiation of preimplantation embryos: Studies on two mouse strains

    Energy Technology Data Exchange (ETDEWEB)

    Jacquet, P., E-mail: pjacquet@sckcen.be [Molecular and Cellular Biology, Institute for Environment, Health and Safety, SCK.CEN, Boeretang 200, B-2400 Mol (Belgium); Buset, J.; Neefs, M. [Molecular and Cellular Biology, Institute for Environment, Health and Safety, SCK.CEN, Boeretang 200, B-2400 Mol (Belgium); Vankerkom, J. [Division of Environmental Research, VITO, Boeretang 200, B-2400 Mol (Belgium); Benotmane, M.A.; Derradji, H. [Molecular and Cellular Biology, Institute for Environment, Health and Safety, SCK.CEN, Boeretang 200, B-2400 Mol (Belgium); Hildebrandt, G. [Department of Radiotherapy and Radiation Oncology, University of Leipzig, Stephanstrasse 9a, D-04103 Leipzig (Germany); Department of Radiotherapy, University of Rostock, Suedring 75, D-18059 Rostock (Germany); Baatout, S. [Molecular and Cellular Biology, Institute for Environment, Health and Safety, SCK.CEN, Boeretang 200, B-2400 Mol (Belgium)

    2010-05-01

    Recent results have shown that irradiation of a single cell, the zygote or 1-cell embryo of various mouse strains, could lead to congenital anomalies in the fetuses. In the Heiligenberger strain, a link between the radiation-induced congenital anomalies and the development of a genomic instability was also suggested. Moreover, further studies showed that in that strain, both congenital anomalies and genomic instability could be transmitted to the next generation. The aim of the experiments described in this paper was to investigate whether such non-targeted transgenerational effects could also be observed in two other radiosensitive mouse strains (CF1 and ICR), using lower radiation doses. Irradiation of the CF1 and ICR female zygotes with 0.2 or 0.4 Gy did not result in a decrease of their fertility after birth, when they had reached sexual maturity. Moreover, females of both strains that had been X-irradiated with 0.2 Gy exhibited higher rates of pregnancy, less resorptions and more living fetuses. Additionally, the mean weight of living fetuses in these groups had significantly increased. Exencephaly and dwarfism were observed in CF1 fetuses issued from control and X-irradiated females. In the control group of that strain, polydactyly and limb deformity were also found. The yields of abnormal fetuses did not differ significantly between the control and X-irradiated groups. Polydactyly, exencephaly and dwarfism were observed in fetuses issued from ICR control females. In addition to these anomalies, gastroschisis, curly tail and open eye were observed at low frequencies in ICR fetuses issued from X-irradiated females. Again, the frequencies of abnormal fetuses found in the different groups did not differ significantly. In both CF1 and ICR mouse strains, irradiation of female zygotes did not result in the development of a genomic instability in the next generation embryos. Overall, our results suggest that, at the moderate doses used, developmental defects

  17. The YEASTRACT database: an upgraded information system for the analysis of gene and genomic transcription regulation in Saccharomyces cerevisiae.

    Science.gov (United States)

    Teixeira, Miguel Cacho; Monteiro, Pedro Tiago; Guerreiro, Joana Fernandes; Gonçalves, Joana Pinho; Mira, Nuno Pereira; dos Santos, Sandra Costa; Cabrito, Tânia Rodrigues; Palma, Margarida; Costa, Catarina; Francisco, Alexandre Paulo; Madeira, Sara Cordeiro; Oliveira, Arlindo Limede; Freitas, Ana Teresa; Sá-Correia, Isabel

    2014-01-01

    The YEASTRACT (http://www.yeastract.com) information system is a tool for the analysis and prediction of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in June 2013, this database contains over 200,000 regulatory associations between transcription factors (TFs) and target genes, including 326 DNA binding sites for 113 TFs. All regulatory associations stored in YEASTRACT were revisited and new information was added on the experimental conditions in which those associations take place and on whether the TF is acting on its target genes as activator or repressor. Based on this information, new queries were developed allowing the selection of specific environmental conditions, experimental evidence or positive/negative regulatory effect. This release further offers tools to rank the TFs controlling a gene or genome-wide response by their relative importance, based on (i) the percentage of target genes in the data set; (ii) the enrichment of the TF regulon in the data set when compared with the genome; or (iii) the score computed using the TFRank system, which selects and prioritizes the relevant TFs by walking through the yeast regulatory network. We expect that with the new data and services made available, the system will continue to be instrumental for yeast biologists and systems biology researchers.

  18. Functional role of bacteriophage transfer RNAs: codon usage analysis of genomic sequences stored in the GENBANK/EMBL/DDBJ databases

    Directory of Open Access Journals (Sweden)

    T Kunisawa

    2006-01-01

    Full Text Available Complete genomic sequence data are stored in the public GenBank/EMBL/DDBJ databases so that any investigator can make use of the data. This report describes a comparative analysis of codon usage that is impossible without such a public and open data system. A limited number of bacteriophages harbor their own transfer RNAs. Based on a comparison between T4 phage-encoded tRNA species and the relative cellular amounts of host Escherichia coli tRNAs, it is hypothesized that T4 tRNAs could serve to supplement host isoacceptor tRNA species that are present in minor amounts and thus enhance the translational efficiency of phage proteins. When compared to their respective host bacteria, the codon usage data of bacteriophages D3, φC31, HP1, D29 and 933W all show an increased frequency of synonymous codons or amino acids that correspond to phage tRNA species, suggesting their supplemental role in the efficient production of phage proteins. The data-analysis presents an example in which the availability of an open and fully accessible database system would allow one to obtain comprehensive insights into a fundamental problem in molecular biology.
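
    The kind of comparison this abstract describes (codon usage in a phage versus its host) can be sketched as below. This is a minimal illustration under stated assumptions: the helper names (codon_counts, codon_frequencies, usage_shift) are hypothetical, the record does not specify the statistic the author used, and the sequences in the usage lines are toy strings, not real phage or host data.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds: str) -> dict:
    """Relative frequency of each codon among all codons in the sequence."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


def usage_shift(phage_cds: str, host_cds: str) -> dict:
    """Phage frequency minus host frequency for every codon seen in either sequence.

    A positive value means the codon is used more often in the phage sequence.
    """
    phage = codon_frequencies(phage_cds)
    host = codon_frequencies(host_cds)
    return {c: phage.get(c, 0.0) - host.get(c, 0.0) for c in set(phage) | set(host)}


if __name__ == "__main__":
    # Toy inputs for illustration only.
    print(codon_counts("ATGAAAAAATAA"))     # ATG x1, AAA x2, TAA x1
    print(usage_shift("AAAAAA", "AAAGGG"))  # AAA: +0.5, GGG: -0.5
```

    In a real analysis the inputs would be concatenated coding sequences retrieved from the public databases, and codons with a positive shift would then be checked against the anticodons of the phage-encoded tRNAs.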

  19. Integration of mouse and human genome-wide association data identifies KCNIP4 as an asthma gene.

    Directory of Open Access Journals (Sweden)

    Blanca E Himes

    Full Text Available Asthma is a common chronic respiratory disease characterized by airway hyperresponsiveness (AHR). The genetics of asthma have been widely studied in mouse and human, and homologous genomic regions have been associated with mouse AHR and human asthma-related phenotypes. Our goal was to identify asthma-related genes by integrating AHR associations in mouse with human genome-wide association study (GWAS) data. We used Efficient Mixed Model Association (EMMA) analysis to conduct a GWAS of baseline AHR measures from males and females of 31 mouse strains. Genes near or containing SNPs with EMMA p-values <0.001 were selected for further study in human GWAS. The results of the previously reported EVE consortium asthma GWAS meta-analysis consisting of 12,958 diverse North American subjects from 9 study centers were used to select a subset of homologous genes with evidence of association with asthma in humans. Following validation attempts in three human asthma GWAS (i.e., Sepracor/LOCCS/LODO/Illumina, GABRIEL, DAG) and two human AHR GWAS (i.e., SHARP, DAG), the Kv channel interacting protein 4 (KCNIP4) gene was identified as nominally associated with both asthma and AHR at a gene- and SNP-level. In EVE, the smallest KCNIP4 association was at rs6833065 (P-value 2.9e-04), while the strongest associations for Sepracor/LOCCS/LODO/Illumina, GABRIEL, DAG were 1.5e-03, 1.0e-03, 3.1e-03 at rs7664617, rs4697177, rs4696975, respectively. At a SNP level, the strongest association across all asthma GWAS was at rs4697177 (P-value 1.1e-04). The smallest P-values for association with AHR were 2.3e-03 at rs11947661 in SHARP and 2.1e-03 at rs402802 in DAG. Functional studies are required to validate the potential involvement of KCNIP4 in modulating asthma susceptibility and/or AHR. Our results suggest that a useful approach to identify genes associated with human asthma is to leverage mouse AHR association data.

  20. CRISPR/Cas9-Mediated Genome Editing of Mouse Small Intestinal Organoids

    NARCIS (Netherlands)

    Schwank, Gerald; Clevers, Hans

    2016-01-01

    The CRISPR/Cas9 system is an RNA-guided genome-editing tool that has been recently developed based on the bacterial CRISPR-Cas immune defense system. Due to its versatility and simplicity, it rapidly became the method of choice for genome editing in various biological systems, including mammalian ce

  1. Draft Genome Sequences of Five Novel Polyketide Synthetase-Containing Mouse Escherichia coli Strains

    Science.gov (United States)

    Mannion, Anthony; Shen, Zeli; Feng, Yan; Garcia, Alexis

    2016-01-01

    We report herein the draft genomes of five novel Escherichia coli strains isolated from surveillance and experimental mice housed at MIT and the Whitehead Institute and describe their genomic characteristics in context with the polyketide synthetase (PKS)-containing pathogenic E. coli strains NC101, IHE3034, and A192PP.

  2. Genome patterns of selection and introgression of haplotypes in natural populations of the house mouse (Mus musculus).

    Directory of Open Access Journals (Sweden)

    Fabian Staubach

    Full Text Available General parameters of selection, such as the frequency and strength of positive selection in natural populations or the role of introgression, are still insufficiently understood. The house mouse (Mus musculus) is a particularly well-suited model system to approach such questions, since it has a defined history of splits into subspecies and populations and since extensive genome information is available. We have used high-density single-nucleotide polymorphism (SNP) typing arrays to assess genomic patterns of positive selection and introgression of alleles in two natural populations of each of the subspecies M. m. domesticus and M. m. musculus. Applying different statistical procedures, we find a large number of regions subject to apparent selective sweeps, indicating frequent positive selection on rare alleles or novel mutations. Genes in the regions include well-studied imprinted loci (e.g. Plagl1/Zac1), homologues of human genes involved in adaptations (e.g. alpha-amylase genes) or in genetic diseases (e.g. Huntingtin and Parkin). Haplotype matching between the two subspecies reveals a large number of haplotypes that show patterns of introgression from specific populations of the respective other subspecies, with at least 10% of the genome being affected by partial or full introgression. Using neutral simulations for comparison, we find that the size and the fraction of introgressed haplotypes are not compatible with a pure migration or incomplete lineage sorting model. Hence, it appears that introgressed haplotypes can rise in frequency due to positive selection and thus can contribute to the adaptive genomic landscape of natural populations. Our data support the notion that natural genomes are subject to complex adaptive processes, including the introgression of haplotypes from other differentiated populations or species at a larger scale than previously assumed for animals. This implies that some of the admixture found in inbred strains of mice

  3. Genome-based identification of cancer genes by proviral tagging in mouse retrovirus-induced T-cell lymphomas.

    Science.gov (United States)

    Kim, Rachel; Trubetskoy, Alla; Suzuki, Takeshi; Jenkins, Nancy A; Copeland, Neal G; Lenz, Jack

    2003-02-01

    The identification of tumor-inducing genes is a driving force for elucidating the molecular mechanisms underlying cancer. Many retroviruses induce tumors by insertion of viral DNA adjacent to cellular oncogenes, resulting in altered expression and/or structure of the encoded proteins. The availability of the mouse genome sequence now allows analysis of retroviral common integration sites in murine tumors to be used as a genetic screen for identification of large numbers of candidate cancer genes. By positioning the sequences of inverse PCR-amplified, virus-host junction fragments within the mouse genome, 19 target genes were identified in T-cell lymphomas induced by the retrovirus SL3-3. The candidate cancer genes included transcription factors (Fos, Gfi1, Lef1, Myb, Myc, Runx3, and Sox3), all three D cyclins, Ras signaling pathway components (Rras2/TC21 and Rasgrp1), and Cmkbr7/CCR7. The most frequent target was Rras2. Insertions as far as 57 kb away from the transcribed portion were associated with substantially increased transcription of Rras2, and no coding sequence mutations, including those typically involved in Ras activation, were detected. These studies demonstrate the power of genome-based analysis of retroviral insertion sites for cancer gene discovery, identify several new genes worth examining for a role in human cancer, and implicate the pathways in which those genes act in lymphomagenesis. They also provide strong genetic evidence that overexpression of unmutated Rras2 contributes to tumorigenesis, thus suggesting that it may also do so if it is inappropriately expressed in human tumors.

  4. A genome survey sequencing of the Java mouse deer (Tragulus javanicus) adds new aspects to the evolution of lineage specific retrotransposons in Ruminantia (Cetartiodactyla).

    Science.gov (United States)

    Gallus, S; Kumar, V; Bertelsen, M F; Janke, A; Nilsson, M A

    2015-10-25

    Ruminantia, the ruminating, hoofed mammals (cow, deer, giraffe and allies) are an unranked artiodactylan clade. Around 50-60 million years ago the BovB retrotransposon entered the ancestral ruminantian genome through horizontal gene transfer. A survey genome screen using 454-pyrosequencing of the Java mouse deer (Tragulus javanicus) and the lesser kudu (Tragelaphus imberbis) was done to investigate and to compare the landscape of transposable elements within Ruminantia. The family Tragulidae (mouse deer) is the only representative of Tragulina and phylogenetically important, because it represents the earliest divergence in Ruminantia. The data analyses show that, relative to other ruminantian species, the lesser kudu genome has seen an expansion of BovB Long INterspersed Elements (LINEs) and BovB related Short INterspersed Elements (SINEs) like BOVA2. In comparison the genome of Java mouse deer has fewer BovB elements than other ruminants, especially Bovinae, and has in addition a novel CHR-3 SINE most likely propagated by LINE-1. By contrast the other ruminants have low amounts of CHR SINEs but high numbers of actively propagating BovB-derived and BovB-propagated SINEs. The survey sequencing data suggest that the transposable element landscape in mouse deer (Tragulina) is unique among Ruminantia, suggesting a lineage specific evolutionary trajectory that does not involve BovB mediated retrotransposition. This shows that the genomic landscape of mobile genetic elements can rapidly change in any lineage.

  5. Human/mouse homology relationships

    Energy Technology Data Exchange (ETDEWEB)

    DeBry, R.W.; Seldin, M.F. [Duke Univ. Medical Center, Durham, NC (United States)]

    1996-05-01

    Conservation of genomic organization in different mammalian species has long been recognized, but only recently has it been possible to examine these relationships systematically on a genome-wide scale in some detail. Mapping of several mammalian species is progressing rapidly, but by far the most detailed information is still to be found in the human and mouse databases. Perhaps the most important aspect of recent progress in genome mapping data. With mapping databases continuing to expand at a greater than linear rate, any attempt at a comprehensive comparative map is doomed to be out of date by the time it is published. However, we feel that it is valuable to provide a summary that is as nearly up to date as possible. We have made a particular effort to include recent human physical mapping data and to identify those mouse genes that have been well-mapped with respect to each other by virtue of having been examined in the same cross. As the human-mouse comparative map becomes more dense, it is not surprising that the observed number of conserved linkage groups continues to increase. Nadeau et al. placed 425 loci on both maps, which delineated over 100 conserved linkage groups. Copeland et al. put a total of 917 markers on both the human and the mouse maps, marking 101 segments of conserved linkage groups. In the present summary, we have placed 1416 loci, and these define at least 181 different conserved linkage groups. 47 refs., 1 fig.

  6. Functionally Charged Polystyrene Particles Activate Immortalized Mouse Microglia (BV2): Cellular and Genomic Response

    Science.gov (United States)

    The effect of particle surface charge on the biological activation of immortalized mouse microglia (BV2) was examined. Same size (~850-950 nm) spherical polystyrene microparticles (SPM) with net negative (carboxyl, COOH-) or positive (dimethyl amino, CH3)2

  7. Genome Editing in Mouse Spermatogonial Stem Cell Lines Using TALEN and Double-Nicking CRISPR/Cas9

    Directory of Open Access Journals (Sweden)

    Takuya Sato

    2015-07-01

    Mouse spermatogonial stem cells (SSCs) can be cultured for multiplication and maintained for long periods while preserving their spermatogenic ability. Although the cultured SSCs, named germline stem (GS) cells, are targets of genome modification, this process remains technically difficult. In the present study, we tested TALEN and double-nicking CRISPR/Cas9 on GS cells, targeting Rosa26 and Stra8 loci as representative genes dispensable and indispensable in spermatogenesis, respectively. Harvested GS cell colonies showed a high targeting efficiency with both TALEN and CRISPR/Cas9. The Rosa26-targeted GS cells differentiated into fertility-competent sperm following transplantation. On the other hand, Stra8-targeted GS cells showed defective spermatogenesis following transplantation, confirming its prime role in the initiation of meiosis. TALEN and CRISPR/Cas9, when applied in GS cells, will be valuable tools in the study of spermatogenesis and for revealing the genetic mechanism of spermatogenic failure.

  8. Genome-wide mapping in a house mouse hybrid zone reveals hybrid sterility loci and Dobzhansky-Muller interactions.

    Science.gov (United States)

    Turner, Leslie M; Harr, Bettina

    2014-12-09

    Mapping hybrid defects in contact zones between incipient species can identify genomic regions contributing to reproductive isolation and reveal genetic mechanisms of speciation. The house mouse features a rare combination of sophisticated genetic tools and natural hybrid zones between subspecies. Male hybrids often show reduced fertility, a common reproductive barrier between incipient species. Laboratory crosses have identified sterility loci, but each encompasses hundreds of genes. We map genetic determinants of testis weight and testis gene expression using offspring of mice captured in a hybrid zone between M. m. musculus and M. m. domesticus. Many generations of admixture enable high-resolution mapping of loci contributing to these sterility-related phenotypes. We identify complex interactions among sterility loci, suggesting that multiple, non-independent genetic incompatibilities contribute to barriers to gene flow in the hybrid zone.

  9. Genome-wide copy number profiling of mouse neural stem cells during differentiation

    Directory of Open Access Journals (Sweden)

    U. Fischer

    2015-09-01

    There is growing evidence that gene amplifications were present in neural stem and progenitor cells during differentiation. We used array-CGH to discover copy number changes, including gene amplifications and deletions, during differentiation of mouse neural stem cells using TGF-β and FCS for differentiation induction. Array data were deposited in GEO (Gene Expression Omnibus, NCBI) under accession number GSE35523. Here, we describe in detail the cell culture features and our TaqMan qPCR experiments to validate the array-CGH analysis. Interpretation of array-CGH experiments regarding gene amplifications in mouse and further detailed analysis of amplified chromosome regions associated with these experiments were published by Fischer and colleagues in Oncotarget (Fischer et al., 2015). We provide additional information on deleted chromosome regions during differentiation and give an overview of copy number changes along a time line during differentiation induction.

  10. Genome-wide analysis of core promoter elements from conserved human and mouse orthologous pairs

    OpenAIRE

    Jin, Victor X.; Singer, Gregory AC; Agosto-Pérez, Francisco J; Liyanarachchi, Sandya; Davuluri, Ramana V.

    2006-01-01

    Background The canonical core promoter elements consist of the TATA box, initiator (Inr), downstream core promoter element (DPE), TFIIB recognition element (BRE) and the newly-discovered motif 10 element (MTE). The motifs for these core promoter elements are highly degenerate, which tends to lead to a high false discovery rate when attempting to detect them in promoter sequences. Results In this study, we have performed the first analysis of these core promoter elements in orthologous mouse a...

  11. Cas-Database: web-based genome-wide guide RNA library design for gene knockout screens using CRISPR-Cas9.

    Science.gov (United States)

    Park, Jeongbin; Kim, Jin-Soo; Bae, Sangsu

    2016-07-01

    CRISPR-derived RNA guided endonucleases (RGENs) have been widely used for both gene knockout and knock-in at the level of single or multiple genes. RGENs are now available for forward genetic screens at genome scale, but single guide RNA (sgRNA) selection at this scale is difficult. We develop an online tool, Cas-Database, a genome-wide gRNA library design tool for Cas9 nucleases from Streptococcus pyogenes (SpCas9). With an easy-to-use web interface, Cas-Database allows users to select optimal target sequences simply by changing the filtering conditions. Furthermore, it provides a powerful way to select multiple optimal target sequences from thousands of genes at once for the creation of a genome-wide library. Cas-Database also provides a web application programming interface (web API) for advanced bioinformatics users. Availability: free access at http://www.rgenome.net/cas-database/. Contact: sangsubae@hanyang.ac.kr or jskim01@snu.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  12. Application of oocyte cryopreservation technology in TALEN-mediated mouse genome editing.

    Science.gov (United States)

    Nakagawa, Yoshiko; Sakuma, Tetsushi; Nakagata, Naomi; Yamasaki, Sho; Takeda, Naoki; Ohmuraya, Masaki; Yamamoto, Takashi

    2014-01-01

    Reproductive engineering techniques, such as in vitro fertilization (IVF) and cryopreservation of embryos or spermatozoa, are essential for preservation, reproduction, and transportation of genetically engineered mice. However, it has not yet been elucidated whether these techniques can be applied for the generation of genome-edited mice using engineered nucleases such as transcription activator-like effector nucleases (TALENs). Here, we demonstrate the usefulness of frozen oocytes fertilized in vitro using frozen sperm for TALEN-mediated genome editing in mice. We examined side-by-side comparisons concerning sperm (fresh vs. frozen), fertilization method (mating vs. IVF), and fertilized oocytes (fresh vs. frozen) for the source of oocytes used for TALEN injection; we found that fertilized oocytes created under all tested conditions were applicable for TALEN-mediated mutagenesis. In addition, we investigated whether the ages in weeks of parental female mice can affect the efficiency of gene modification, by comparing 5-week-old and 8-12-week-old mice as the source of oocytes used for TALEN injection. The genome editing efficiency of an endogenous gene was consistently 95-100% when either 5-week-old or 8-12-week-old mice were used with or without freezing the oocytes. Thus, our report describes the availability of freeze-thawed oocytes and oocytes from female mice at various weeks of age for TALEN-mediated genome editing, thus boosting the convenience of such innovative gene targeting strategies.

  13. Chromosomal evolution of Arvicolinae (Cricetidae, Rodentia). I. The genome homology of tundra vole, field vole, mouse and golden hamster revealed by comparative chromosome painting.

    Science.gov (United States)

    Sitnikova, Natalia A; Romanenko, Svetlana A; O'Brien, Patricia C M; Perelman, Polina L; Fu, Beiyuan; Rubtsova, Nadezhda V; Serdukova, Natalya A; Golenishchev, Feodor N; Trifonov, Vladimir A; Ferguson-Smith, Malcolm A; Yang, Fengtang; Graphodatsky, Alexander S

    2007-01-01

    Cross-species chromosome painting has become the mainstay of comparative cytogenetic and chromosome evolution studies. Here we have made a set of chromosomal painting probes for the field vole (Microtus agrestis) by DOP-PCR amplification of flow-sorted chromosomes. Together with painting probes of golden hamster (Mesocricetus auratus) and mouse (Mus musculus), the field vole probes have been hybridized onto the metaphases of the tundra vole (Microtus oeconomus). A comparative chromosome map between these two voles, golden hamster and mouse has been established based on the results of cross-species chromosome painting and G-banding comparisons. The sets of paints from the field vole, golden hamster and mouse identified a total of 27, 40 and 47 homologous autosomal regions, respectively, in the genome of tundra vole; 16, 41 and 51 fusion/fission rearrangements differentiate the karyotype of the tundra vole from the karyotypes of the field vole, golden hamster and mouse, respectively.

  14. Gene mutations and genomic rearrangements in the mouse as a result of transposon mobilization from chromosomal concatemers.

    Directory of Open Access Journals (Sweden)

    Aron M Geurts

    2006-09-01

    Previous studies of the Sleeping Beauty (SB) transposon system, as an insertional mutagen in the germline of mice, have used reverse genetic approaches. These studies have led to its proposed use for regional saturation mutagenesis by taking a forward-genetic approach. Thus, we used the SB system to mutate a region of mouse Chromosome 11 in a forward-genetic screen for recessive lethal and viable phenotypes. This work represents the first reported use of an insertional mutagen in a phenotype-driven approach. The phenotype-driven approach was successful in both recovering visible and behavioral mutants, including dominant limb and recessive behavioral phenotypes, and allowing for the rapid identification of candidate gene disruptions. In addition, a high frequency of recessive lethal mutations arose as a result of genomic rearrangements near the site of transposition, resulting from transposon mobilization. The results suggest that the SB system could be used in a forward-genetic approach to recover interesting phenotypes, but that local chromosomal rearrangements should be anticipated in conjunction with single-copy, local transposon insertions in chromosomes. Additionally, these mice may serve as a model for chromosome rearrangements caused by transposable elements during the evolution of vertebrate genomes.

  15. ATRX contributes to epigenetic asymmetry and silencing of major satellite transcripts in the maternal genome of the mouse embryo.

    Science.gov (United States)

    De La Fuente, Rabindranath; Baumann, Claudia; Viveiros, Maria M

    2015-05-15

    A striking proportion of human cleavage-stage embryos exhibit chromosome instability (CIN). Notably, until now, no experimental model has been described to determine the origin and mechanisms of complex chromosomal rearrangements. Here, we examined mouse embryos deficient for the chromatin remodeling protein ATRX to determine the cellular mechanisms activated in response to CIN. We demonstrate that ATRX is required for silencing of major satellite transcripts in the maternal genome, where it confers epigenetic asymmetry to pericentric heterochromatin during the transition to the first mitosis. This stage is also characterized by a striking kinetochore size asymmetry established by differences in CENP-C protein between the parental genomes. Loss of ATRX results in increased centromeric mitotic recombination, a high frequency of sister chromatid exchanges and double strand DNA breaks, indicating the formation of mitotic recombination break points. ATRX-deficient embryos exhibit a twofold increase in transcripts for aurora kinase B, the centromeric cohesin ESCO2, DNMT1, the ubiquitin-ligase (DZIP3) and the histone methyl transferase (EHMT1). Thus, loss of ATRX activates a pathway that integrates epigenetic modifications and DNA repair in response to chromosome breaks. These results reveal the cellular response of the cleavage-stage embryo to CIN and uncover a mechanism by which centromeric fission induces the formation of large-scale chromosomal rearrangements. Our results have important implications to determine the epigenetic origins of CIN that lead to congenital birth defects and early pregnancy loss, as well as the mechanisms involved in the oocyte to embryo transition.

  16. REDIdb: the RNA editing database.

    Science.gov (United States)

    Picardi, Ernesto; Regina, Teresa Maria Rosaria; Brennicke, Axel; Quagliariello, Carla

    2007-01-01

    The RNA Editing Database (REDIdb) is an interactive, web-based database created and designed with the aim to allocate RNA editing events such as substitutions, insertions and deletions occurring in a wide range of organisms. The database contains both fully and partially sequenced DNA molecules for which editing information is available either by experimental inspection (in vitro) or by computational detection (in silico). Each record of REDIdb is organized in a specific flat-file containing a description of the main characteristics of the entry, a feature table with the editing events and related details and a sequence zone with both the genomic sequence and the corresponding edited transcript. REDIdb is a relational database in which the browsing and identification of editing sites has been simplified by means of two facilities to either graphically display genomic or cDNA sequences or to show the corresponding alignment. In both cases, all editing sites are highlighted in colour and their relative positions are detailed by mousing over. New editing positions can be directly submitted to REDIdb after a user-specific registration to obtain authorized secure access. This first version of REDIdb database stores 9964 editing events and can be freely queried at http://biologia.unical.it/py_script/search.html.

  17. MouseMine: a new data warehouse for MGI.

    Science.gov (United States)

    Motenko, H; Neuhauser, S B; O'Keefe, M; Richardson, J E

    2015-08-01

    MouseMine (www.mousemine.org) is a new data warehouse for accessing mouse data from Mouse Genome Informatics (MGI). Based on the InterMine software framework, MouseMine supports powerful query, reporting, and analysis capabilities, the ability to save and combine results from different queries, easy integration into larger workflows, and a comprehensive Web Services layer. Through MouseMine, users can access a significant portion of MGI data in new and useful ways. Importantly, MouseMine is also a member of a growing community of online data resources based on InterMine, including those established by other model organism databases. Adopting common interfaces and collaborating on data representation standards are critical to fostering cross-species data analysis. This paper presents a general introduction to MouseMine, presents examples of its use, and discusses the potential for further integration into the MGI interface.

  18. High-resolution magnetic resonance histology of the embryonic and neonatal mouse: a 4D atlas and morphologic database.

    Science.gov (United States)

    Petiet, Alexandra E; Kaufman, Matthew H; Goddeeris, Matthew M; Brandenburg, Jeffrey; Elmore, Susan A; Johnson, G Allan

    2008-08-26

    Engineered mice play an ever-increasing role in defining connections between genotype and phenotypic expression. The potential of magnetic resonance microscopy (MRM) for morphologic phenotyping in the mouse has previously been demonstrated; however, applications have been limited by long scan times, availability of the technology, and a foundation of normative data. This article describes an integrated environment for high-resolution study of normal, transgenic, and mutant mouse models at embryonic and neonatal stages. Three-dimensional images are shown at an isotropic resolution of 19.5 µm (voxel volumes of 8 pL), acquired in 3 h at embryonic days 10.5-19.5 (10 stages) and postnatal days 0-32 (6 stages). A web-accessible atlas encompassing this data was developed, and for critical stages of embryonic development (prenatal days 14.5-18.5), >200 anatomical structures have been identified and labeled. Also, matching optical histology and analysis tools are provided to compare multiple specimens at multiple developmental stages. The utility of the approach is demonstrated in characterizing cardiac septal defects in conditional mutant embryos lacking the Smoothened receptor gene. Finally, a collaborative paradigm is presented that allows sharing of data across the scientific community. This work makes magnetic resonance microscopy of the mouse embryo and neonate broadly available with carefully annotated normative data and an extensive environment for collaborations.

  19. A gene expression resource generated by genome-wide lacZ profiling in the mouse

    Directory of Open Access Journals (Sweden)

    Elizabeth Tuck

    2015-11-01

    Knowledge of the expression profile of a gene is a critical piece of information required to build an understanding of the normal and essential functions of that gene and any role it may play in the development or progression of disease. High-throughput, large-scale efforts are on-going internationally to characterise reporter-tagged knockout mouse lines. As part of that effort, we report an open access adult mouse expression resource, in which the expression profile of 424 genes has been assessed in up to 47 different organs, tissues and sub-structures using a lacZ reporter gene. Many specific and informative expression patterns were noted. Expression was most commonly observed in the testis and brain and was most restricted in white adipose tissue and mammary gland. Over half of the assessed genes presented with an absent or localised expression pattern (categorised as 0-10 positive structures). A link between complexity of expression profile and viability of homozygous null animals was observed; inactivation of genes expressed in ≥21 structures was more likely to result in reduced viability by postnatal day 14 compared with more restricted expression profiles. For validation purposes, this mouse expression resource was compared with Bgee, a federated composite of RNA-based expression data sets. Strong agreement was observed, indicating a high degree of specificity in our data. Furthermore, there were 1207 observations of expression of a particular gene in an anatomical structure where Bgee had no data, indicating a large amount of novelty in our data set. Examples of expression data corroborating and extending genotype-phenotype associations and supporting disease gene candidacy are presented to demonstrate the potential of this powerful resource.

  20. Role of the first mitosis in the remodeling of the parental genomes in mouse embryos

    Institute of Scientific and Technical Information of China (English)

    Hong Lin LIU; Kentaro T. HARA; Fugaku AOKI

    2005-01-01

    Although male and female pronuclei reside in the same zygotic cytoplasm, they differ in many respects, such as volume and transcriptional activity. The aim of this study was to investigate whether these differences are lost during the first mitosis. For this purpose, a new method was developed to inhibit the mixing of the two parental chromosome sets during mitosis, thereby inducing the formation of two nuclei after exit from the mitotic phase. In this method, one-cell embryos were arrested at metaphase by treatment with nocodazole; on exit from the mitotic phase, two nuclei formed in a single karyocyte following treatment with 6-dimethylaminopurine (6-DMAP). These embryos were designated post-mitotic embryos (PM-embryos), in which the two nuclei were derived from the male and female genomes. We found that in the control one-cell embryos that had not been treated with the reagents, the volume of the male pronucleus was about 1.65-fold greater than that of the female pronucleus, whereas the volumes of the two nuclei in the PM-embryos were similar (volume ratio of 1.01). Although a two-fold difference in transcriptional activity was detected between the male and female pronuclei in the control embryos, no difference in transcriptional activity was detected between the two nuclei of PM-embryos. The ratio of transcriptional activity in the nucleus derived from the paternal genome to that from the maternal genome was 1.02, for which no significant difference was detected by the χ² goodness-of-fit test. Therefore, the volumes and transcriptional activities of the male and female nuclei were approximately equal in PM-embryos, which suggests that the asymmetries of pronuclear volume and transcriptional activity between male and female genomes are somehow lost during the first mitosis.

  1. Application of Oocyte Cryopreservation Technology in TALEN-Mediated Mouse Genome Editing

    OpenAIRE

    Nakagawa, Yoshiko; Sakuma, Tetsushi; Nakagata, Naomi; Yamasaki, Sho; Takeda, Naoki; Ohmuraya, Masaki; Yamamoto, Takashi

    2014-01-01

    Reproductive engineering techniques, such as in vitro fertilization (IVF) and cryopreservation of embryos or spermatozoa, are essential for preservation, reproduction, and transportation of genetically engineered mice. However, it has not yet been elucidated whether these techniques can be applied for the generation of genome-edited mice using engineered nucleases such as transcription activator-like effector nucleases (TALENs). Here, we demonstrate the usefulness of frozen oocytes fertilized...

  2. An efficient method to successively introduce transgenes into a given genomic locus in the mouse

    Directory of Open Access Journals (Sweden)

    Weiler Hartmut

    2001-06-01

    Background: Expression of transgenes in mice requires transcriptional regulatory elements that direct expression in a chosen cell type. Unfortunately, the availability of well-characterized promoters that direct bona-fide expression of transgenes in transgenic mice is limited. Here we described a method that allows highly efficient targeting of transgenes to a preselected locus in ES cells. Results: A pgk-LoxP-Neo cassette was introduced into a desired genomic locus by homologous recombination in ES cells. The pgk promoter was then removed from the targeted ES cells by Cre recombinase, thereby restoring the ES cells' sensitivity to G418. We demonstrated that transgenes could be efficiently introduced into this genomic locus by reconstituting a functional Neo gene. Conclusion: This approach is simple and extremely efficient in facilitating the introduction of single-copy transgenes into defined genomic loci. The availability of such an approach greatly enhances the ease of using endogenous regulatory elements to control transgene expression and, in turn, expands the repertoire of elements available for transgene expression.

  3. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine.

    Science.gov (United States)

    Stenson, Peter D; Mort, Matthew; Ball, Edward V; Shaw, Katy; Phillips, Andrew; Cooper, David N

    2014-01-01

    The Human Gene Mutation Database (HGMD®) is a comprehensive collection of germline mutations in nuclear genes that underlie, or are associated with, human inherited disease. By June 2013, the database contained over 141,000 different lesions detected in over 5,700 different genes, with new mutation entries currently accumulating at a rate exceeding 10,000 per annum. HGMD was originally established in 1996 for the scientific study of mutational mechanisms in human genes. However, it has since acquired a much broader utility as a central unified disease-oriented mutation repository utilized by human molecular geneticists, genome scientists, molecular biologists, clinicians and genetic counsellors as well as by those specializing in biopharmaceuticals, bioinformatics and personalized genomics. The public version of HGMD (http://www.hgmd.org) is freely available to registered users from academic institutions/non-profit organizations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via BIOBASE GmbH.

  4. CIG-DB: the database for human or mouse immunoglobulin and T cell receptor genes available for cancer studies.

    Science.gov (United States)

    Nakamura, Yoji; Komiyama, Tomoyoshi; Furue, Motoki; Gojobori, Takashi; Akiyama, Yasuto

    2010-07-27

    Immunoglobulin (IG or antibody) and the T-cell receptor (TR) are pivotal proteins in the immune system of higher organisms. In cancer immunotherapy, the immune responses mediated by tumor-epitope-binding IG or TR play important roles in anticancer effects. Although there are public databases specific for immunological genes, their contents have not been associated with clinical studies. Therefore, we developed an integrated database of IG/TR data reported in cancer studies (the Cancer-related Immunological Gene Database [CIG-DB]). This database is designed as a platform to explore public human and murine IG/TR genes sequenced in cancer studies. A total of 38,308 annotation entries for IG/TR proteins were collected from GenBank/DDBJ/EMBL and the Protein Data Bank, and 2,740 non-redundant corresponding MEDLINE references were appended. Next, we filtered the MEDLINE texts by MeSH terms, titles, and abstracts containing keywords related to cancer. After we performed a manual check, we classified the protein entries into two groups: 611 on cancer therapy (Group I) and 1,470 on hematological tumors (Group II). Thus, a total of 2,081 cancer-related IG and TR entries were tabularized. To effectively classify future entries, we developed a computational method based on text mining and canonical discriminant analysis by parsing MeSH/title/abstract words. We performed a leave-one-out cross validation for the method, which showed high accuracy rates: 94.6% for IG references and 94.7% for TR references. We also collected 920 epitope sequences bound with IG/TR. The CIG-DB is equipped with search engines for amino acid sequences and MEDLINE references, sequence analysis tools, and a 3D viewer. This database is accessible without charge or registration at http://www.scchr-cigdb.jp/, and the search results are freely downloadable. The CIG-DB serves as a bridge between immunological gene data and cancer studies, presenting annotation on IG, TR, and their epitopes. This database

  5. CIG-DB: the database for human or mouse immunoglobulin and T cell receptor genes available for cancer studies

    Directory of Open Access Journals (Sweden)

    Furue Motoki

    2010-07-01

    Background: Immunoglobulin (IG or antibody) and the T-cell receptor (TR) are pivotal proteins in the immune system of higher organisms. In cancer immunotherapy, the immune responses mediated by tumor-epitope-binding IG or TR play important roles in anticancer effects. Although there are public databases specific for immunological genes, their contents have not been associated with clinical studies. Therefore, we developed an integrated database of IG/TR data reported in cancer studies (the Cancer-related Immunological Gene Database [CIG-DB]). Description: This database is designed as a platform to explore public human and murine IG/TR genes sequenced in cancer studies. A total of 38,308 annotation entries for IG/TR proteins were collected from GenBank/DDBJ/EMBL and the Protein Data Bank, and 2,740 non-redundant corresponding MEDLINE references were appended. Next, we filtered the MEDLINE texts by MeSH terms, titles, and abstracts containing keywords related to cancer. After we performed a manual check, we classified the protein entries into two groups: 611 on cancer therapy (Group I) and 1,470 on hematological tumors (Group II). Thus, a total of 2,081 cancer-related IG and TR entries were tabularized. To effectively classify future entries, we developed a computational method based on text mining and canonical discriminant analysis by parsing MeSH/title/abstract words. We performed a leave-one-out cross validation for the method, which showed high accuracy rates: 94.6% for IG references and 94.7% for TR references. We also collected 920 epitope sequences bound with IG/TR. The CIG-DB is equipped with search engines for amino acid sequences and MEDLINE references, sequence analysis tools, and a 3D viewer. This database is accessible without charge or registration at http://www.scchr-cigdb.jp/, and the search results are freely downloadable. Conclusions: The CIG-DB serves as a bridge between immunological gene data and cancer studies, presenting

  6. Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries

    Directory of Open Access Journals (Sweden)

    Rodrigues NB

    2002-01-01

    In the last decade microsatellites have become one of the most useful genetic markers used in a large number of organisms due to their abundance and high level of polymorphism. Microsatellites have been used for individual identification, paternity tests, forensic studies and population genetics. Data on microsatellite abundance comes preferentially from microsatellite enriched libraries and DNA sequence databases. We have conducted a search in GenBank of more than 16,000 Schistosoma mansoni ESTs and 42,000 BAC sequences. In addition, we obtained 300 sequences from CA and AT microsatellite enriched genomic libraries. The sequences were searched for simple repeats using the RepeatMasker software. Of 16,022 ESTs, we detected 481 (3%) sequences that contained 622 microsatellites (434 perfect, 164 imperfect and 24 compounds). Of the 481 ESTs, 194 were grouped in 63 clusters containing 2 to 15 ESTs per cluster. Polymorphisms were observed in 16 clusters. The 287 remaining ESTs were orphan sequences. Of the 42,017 BAC end sequences, 1,598 (3.8%) contained microsatellites (2,335 perfect, 287 imperfect and 79 compounds). Of the 1,598 BAC end sequences, 80 were grouped into 17 clusters containing 3 to 17 BAC end sequences per cluster. Microsatellites were present in 67 out of 300 sequences from microsatellite enriched libraries (55 perfect, 38 imperfect and 15 compounds). From all of the observed loci 55 were selected for having the longest perfect repeats and flanking regions that allowed the design of primers for PCR amplification. Additionally we describe two new polymorphic microsatellite loci.

  7. Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries.

    Science.gov (United States)

    Rodrigues, N B; Loverde, P T; Romanha, A J; Oliveira, G

    2002-01-01

    In the last decade microsatellites have become one of the most useful genetic markers used in a large number of organisms due to their abundance and high level of polymorphism. Microsatellites have been used for individual identification, paternity tests, forensic studies and population genetics. Data on microsatellite abundance comes preferentially from microsatellite enriched libraries and DNA sequence databases. We have conducted a search in GenBank of more than 16,000 Schistosoma mansoni ESTs and 42,000 BAC sequences. In addition, we obtained 300 sequences from CA and AT microsatellite enriched genomic libraries. The sequences were searched for simple repeats using the RepeatMasker software. Of 16,022 ESTs, we detected 481 (3%) sequences that contained 622 microsatellites (434 perfect, 164 imperfect and 24 compounds). Of the 481 ESTs, 194 were grouped in 63 clusters containing 2 to 15 ESTs per cluster. Polymorphisms were observed in 16 clusters. The 287 remaining ESTs were orphan sequences. Of the 42,017 BAC end sequences, 1,598 (3.8%) contained microsatellites (2,335 perfect, 287 imperfect and 79 compounds). Of the 1,598 BAC end sequences, 80 were grouped into 17 clusters containing 3 to 17 BAC end sequences per cluster. Microsatellites were present in 67 out of 300 sequences from microsatellite enriched libraries (55 perfect, 38 imperfect and 15 compounds). From all of the observed loci 55 were selected for having the longest perfect repeats and flanking regions that allowed the design of primers for PCR amplification. Additionally we describe two new polymorphic microsatellite loci.
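
    The search for perfect simple repeats described in the abstract above can be illustrated with a small sketch. The study itself used RepeatMasker; the regular-expression scan below is a simplified stand-in, not that tool. The function name and the thresholds (unit length 1 to 6 bases, at least 5 copies) are assumptions chosen for illustration, and imperfect or compound repeats are not detected.

```python
import re

def find_perfect_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    Hypothetical helper: the thresholds are illustrative defaults, not
    values taken from the study.
    """
    seq = seq.upper()
    # A short unit (tried shortest first) followed by at least
    # (min_repeats - 1) further exact copies of the same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

# A CA repeat of six copies flanked by two bases on each side.
example = "TT" + "CA" * 6 + "GG"
print(find_perfect_microsatellites(example))
```

    Flanking sequence around each hit would still be needed for primer design, which is why the abstract selects loci by both repeat length and flanking regions.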

  8. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999

    Energy Technology Data Exchange (ETDEWEB)

    Harger, Carol A.

    1999-10-28

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, based upon statistical inferences, the location of matrix attachment regions (MARs) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.]

  9. Genomic imprinting variations in the mouse type 3 deiodinase gene between tissues and brain regions.

    Science.gov (United States)

    Martinez, M Elena; Charalambous, Marika; Saferali, Aabida; Fiering, Steven; Naumova, Anna K; St Germain, Donald; Ferguson-Smith, Anne C; Hernandez, Arturo

    2014-11-01

    The Dio3 gene, which encodes the type 3 deiodinase (D3), controls thyroid hormone (TH) availability. The lack of D3 in mice results in tissue overexposure to TH and a broad neuroendocrine phenotype. Dio3 is an imprinted gene, preferentially expressed from the paternally inherited allele in the mouse fetus. However, heterozygous mice with paternal inheritance of the inactivating Dio3 mutation exhibit an attenuated phenotype when compared with that of Dio3 null mice. To investigate this milder phenotype, the allelic expression of Dio3 was evaluated in different mouse tissues. Preferential allelic expression of Dio3 from the paternal allele was observed in fetal tissues and neonatal brain regions, whereas biallelic Dio3 expression occurred in the developing eye, testes, and cerebellum and in the postnatal brain neocortex, which expresses a larger Dio3 mRNA transcript. The newborn hypothalamus manifests the highest degree of Dio3 expression from the paternal allele, compared with other brain regions, and preferential allelic expression of Dio3 in the brain relaxed in late neonatal life. A methylation analysis of two regulatory regions of the Dio3 imprinted domain revealed modest but significant differences between tissues, but these did not consistently correlate with the observed patterns of Dio3 allelic expression. Deletion of the Dio3 gene and promoter did not result in significant changes in the tissue-specific patterns of Dio3 allelic expression. These results suggest the existence of unidentified epigenetic determinants of tissue-specific Dio3 imprinting. The resulting variation in the Dio3 allelic expression between tissues likely explains the phenotypic variation that results from paternal Dio3 haploinsufficiency.

  10. Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure

    DEFF Research Database (Denmark)

    Torarinsson, Elfar; Sawera, Milena; Havgaard, Jakob Hull

    2006-01-01

    been investigated. Owing to the limitations in computational methods, comparative genomics has been lacking the ability to compare such nonconserved sequence regions for conserved structural RNA elements. We have investigated the presence of structural RNA elements by conducting a local structural...... alignment, using FOLDALIGN, on a subset of these 100,000 corresponding regions and estimate that 1800 contain common RNA structures. Comparing our results with the recent mapping of transcribed fragments (transfrags) in human, we find that high-scoring candidates are twice as likely to be found in regions...... expressed non-coding RNA sequences not alignable in primary sequence....

  11. Stimulatory effect of Echinacea purpurea extract on the trafficking activity of mouse dendritic cells: revealed by genomic and proteomic analyses.

    Science.gov (United States)

    Yin, Shu-Yi; Wang, Wen-Hsin; Wang, Bi-Xue; Aravindaram, Kandan; Hwang, Pei-Ing; Wu, Han-Ming; Yang, Ning-Sun

    2010-11-01

    Several Echinacea species have been used as nutraceuticals or botanical drugs for "immunostimulation", but scientific evidence supporting their therapeutic use is still controversial. In this study, a phytocompound mixture extracted from the butanol fraction (BF) of a stem and leaf (S+L) extract of E. purpurea ([BF/S+L/Ep]) containing stringently defined bioactive phytocompounds was obtained using standardized and published procedures. The transcriptomic and proteomic effects of this phytoextract on mouse bone marrow-derived dendritic cells (BMDCs) were analyzed using primary cultures. Treatment of BMDCs with [BF/S+L/Ep] did not significantly influence the phenotypic maturation activity of dendritic cells (DCs). Affymetrix DNA microarray and bioinformatics analyses of genes differentially expressed in DCs treated with [BF/S+L/Ep] for 4 or 12 h revealed that the majority of responsive genes were related to cell adhesion or motility (Cdh10, Itga6, Cdh1, Gja1 and Mmp8), or were chemokines (Cxcl2, Cxcl7) or signaling molecules (Nrxn1, Pkce and Acss1). TRANSPATH database analyses of gene expression and related signaling pathways in treated-DCs predicted the JNK, PP2C-α, AKT, ERK1/2 or MAPKAPK pathways as the putative targets of [BF/S+L/Ep]. In parallel, proteomic analysis showed that the expressions of metabolic-, cytoskeleton- or NF-κB signaling-related proteins were regulated by treatment with [BF/S+L/Ep]. In vitro flow cytometry analysis of chemotaxis-related receptors and in vivo cell trafficking assay further showed that DCs treated with [BF/S+L/Ep] were able to migrate more effectively to peripheral lymph node and spleen tissues than DCs treated as control groups. Results from this study suggest that [BF/S+L/Ep] modulates DC mobility and related cellular physiology in the mouse immune system. Moreover, the signaling networks and molecules highlighted here are potential targets for nutritional or clinical application of Echinacea or other candidate medicinal

  12. Identification, characterization and metagenome analysis of oocyte-specific genes organized in clusters in the mouse genome

    Directory of Open Access Journals (Sweden)

    Vaiman Daniel

    2005-05-01

    Background: Genes specifically expressed in the oocyte play key roles in oogenesis, ovarian folliculogenesis, fertilization and/or early embryonic development. In an attempt to identify novel oocyte-specific genes in the mouse, we have used an in silico subtraction methodology, and we have focused our attention on genes that are organized in genomic clusters. Results: In the present work, five clusters have been studied: a cluster of thirteen genes characterized by an F-box domain localized on chromosome 9, a cluster of six genes related to T-cell leukaemia/lymphoma protein 1 (Tcl1) on chromosome 12, a cluster composed of a SPErm-associated glutamate (E)-Rich (Speer) protein expressed in the oocyte in the vicinity of four unknown genes specifically expressed in the testis on chromosome 14, a cluster composed of the oocyte secreted protein-1 (Oosp-1) gene and two Oosp-related genes on chromosome 19, all three being characterized by a partial N-terminal zona pellucida-like domain, and another small cluster of two genes on chromosome 19 as well, composed of a TWIK-Related spinal cord K+ channel encoding-gene, and an unknown gene predicted in silico to be testis-specific. The specificity of expression was confirmed by RT-PCR and in situ hybridization for eight and five of them, respectively. Finally, we showed by comparing all of the isolated and clustered oocyte-specific genes identified so far in the mouse genome, that the oocyte-specific clusters are significantly closer to telomeres than isolated oocyte-specific genes are. Conclusion: We have studied five clusters of genes specifically expressed in female germ cells, some of them also being expressed in male germ cells. Moreover, contrary to non-clustered oocyte-specific genes, those that are organized in clusters tend to map near chromosome ends, suggesting that this specific near-telomere position of oocyte-clusters in rodents could constitute an evolutionary advantage. Understanding the biological

  13. TcoF-DB v2: update of the database of human and mouse transcription co-factors and transcription factor interactions

    KAUST Repository

    Schmeier, Sebastian

    2016-10-17

    Transcription factors (TFs) play a pivotal role in transcriptional regulation, making them crucial for cell survival and important biological functions. For the regulation of transcription, interactions of different regulatory proteins known as transcription co-factors (TcoFs) and TFs are essential in forming necessary protein complexes. Although TcoFs themselves do not bind DNA directly, their influence on transcriptional regulation and initiation, although indirect, has been shown to be significant, with the functionality of TFs strongly influenced by the presence of TcoFs. In the TcoF-DB v2 database, we collect information on TcoFs. In this article, we describe updates and improvements implemented in TcoF-DB v2. TcoF-DB v2 provides several new features that enable exploration of the roles of TcoFs. The content of the database has significantly expanded, and is enriched with information from Gene Ontology, biological pathways, diseases and molecular signatures. TcoF-DB v2 now includes many more TFs; has substantially increased the number of human TcoFs to 958, and now includes information on mouse (418 new TcoFs). TcoF-DB v2 enables the exploration of information on TcoFs and allows investigations into their influence on transcriptional regulation in humans and mice. TcoF-DB v2 can be accessed at http://tcofdb.org/.

  14. Genome-wide screen for differential DNA methylation associated with neural cell differentiation in mouse.

    Directory of Open Access Journals (Sweden)

    Rene Cortese

    Cellular differentiation involves widespread epigenetic reprogramming, including modulation of DNA methylation patterns. Using Differential Methylation Hybridization (DMH) in combination with a custom DMH array containing 51,243 features covering more than 16,000 murine genes, we carried out a genome-wide screen for cell- and tissue-specific differentially methylated regions (tDMRs) in undifferentiated embryonic stem cells (ESCs), in in-vitro induced neural stem cells (NSCs) and 8 differentiated embryonic and adult tissues. Unsupervised clustering of the generated data showed distinct cell- and tissue-specific DNA methylation profiles, revealing 202 significant tDMRs (p1.96 enrichment for genes involved in neural differentiation, including, for example, Jag1 and Tcf4. Our results provide robust evidence for the relevance of DNA methylation in early neural development and identify novel marker candidates for neural cell differentiation.

  15. Pairing of Homologous Regions in the Mouse Genome Is Associated with Transcription but Not Imprinting Status

    Science.gov (United States)

    Krueger, Christel; King, Michelle R.; Krueger, Felix; Branco, Miguel R.; Osborne, Cameron S.; Niakan, Kathy K.; Higgins, Michael J.; Reik, Wolf

    2012-01-01

    Although somatic homologous pairing is common in Drosophila it is not generally observed in mammalian cells. However, a number of regions have recently been shown to come into close proximity with their homologous allele, and it has been proposed that pairing might be involved in the establishment or maintenance of monoallelic expression. Here, we investigate the pairing properties of various imprinted and non-imprinted regions in mouse tissues and ES cells. We find by allele-specific 4C-Seq and DNA FISH that the Kcnq1 imprinted region displays frequent pairing but that this is not dependent on monoallelic expression. We demonstrate that pairing involves larger chromosomal regions and that the two chromosome territories come close together. Frequent pairing is not associated with imprinted status or DNA repair, but is influenced by chromosomal location and transcription. We propose that homologous pairing is not exclusive to specialised regions or specific functional events, and speculate that it provides the cell with the opportunity of trans-allelic effects on gene regulation. PMID:22802932

  16. Retinoic acid receptors recognize the mouse genome through binding elements with diverse spacing and topology.

    Science.gov (United States)

    Moutier, Emmanuel; Ye, Tao; Choukrallah, Mohamed-Amin; Urban, Sylvia; Osz, Judit; Chatagnon, Amandine; Delacroix, Laurence; Langer, Diana; Rochel, Natacha; Moras, Dino; Benoit, Gerard; Davidson, Irwin

    2012-07-27

    Retinoic acid receptors (RARs) heterodimerize with retinoid X receptors (RXRs) and bind to RA response elements (RAREs) in the regulatory regions of their target genes. Although previous studies on limited sets of RA-regulated genes have defined canonical RAREs as direct repeats of the consensus RGKTCA separated by 1, 2, or 5 nucleotides (DR1, DR2, DR5), we show that in mouse embryoid bodies or F9 embryonal carcinoma cells, RARs occupy a large repertoire of sites with DR0, DR8, and IR0 (inverted repeat 0) elements. Recombinant RAR-RXR binds these non-canonical spacings in vitro with comparable affinities to DR2 and DR5. Most DR8 elements comprise three half-sites with DR2 and DR0 spacings. This specific half-site organization constitutes a previously unrecognized but frequent signature of RAR binding elements. In functional assays, DR8 and IR0 elements act as independent RAREs, whereas DR0 does not. Our results reveal an unexpected diversity in the spacing and topology of binding elements for the RAR-RXR heterodimer. The differential ability of RAR-RXR bound to DR0 compared to DR2, DR5, and DR8 to mediate RA-dependent transcriptional activation indicates that half-site spacing allosterically regulates RAR function.
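
    The element nomenclature in the abstract above (two RGKTCA half-sites separated by n nucleotides for DRn, and a half-site immediately followed by its reverse complement for IR0) can be sketched as a simple sequence scan. This is an illustrative reading of the motif definitions only, not the authors' analysis pipeline; the function names are hypothetical, and only the given strand is searched.

```python
import re

# RGKTCA with the IUPAC codes expanded (R = A or G, K = G or T).
HALF_SITE = "[AG]G[GT]TCA"
# Reverse complement of RGKTCA, i.e. TGAMCY (M = A or C, Y = C or T).
HALF_SITE_RC = "TGA[AC]C[CT]"

def find_direct_repeats(seq, spacer):
    """Start positions of two half-sites separated by `spacer` bases (DR<spacer>)."""
    # The lookahead lets overlapping elements be reported as well.
    pattern = re.compile(
        "(?=%s[ACGT]{%d}%s)" % (HALF_SITE, spacer, HALF_SITE)
    )
    return [m.start() for m in pattern.finditer(seq.upper())]

def find_ir0(seq):
    """Start positions of a half-site directly followed by its reverse complement."""
    pattern = re.compile("(?=%s%s)" % (HALF_SITE, HALF_SITE_RC))
    return [m.start() for m in pattern.finditer(seq.upper())]

# Three half-sites with DR2 then DR0 spacing; the outer pair is 8 bases apart.
composite = "AGGTCA" + "CC" + "AGGTCA" + "AGGTCA"
print(find_direct_repeats(composite, 2))
print(find_direct_repeats(composite, 0))
print(find_direct_repeats(composite, 8))
```

    In the composite example the first and third half-sites are separated by 2 + 6 + 0 = 8 bases, which is how a DR8 element can be built from DR2 and DR0 spacings as the abstract describes.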

  17. Genome-wide analysis of the p53 gene regulatory network in the developing mouse kidney.

    Science.gov (United States)

    Li, Yuwen; Liu, Jiao; McLaughlin, Nathan; Bachvarov, Dimcho; Saifudeen, Zubaida; El-Dahr, Samir S

    2013-10-16

    Despite mounting evidence that p53 senses and responds to physiological cues in vivo, existing knowledge regarding p53 function and target genes is largely derived from studies in cancer or stressed cells. Herein we utilize p53 transcriptome and ChIP-Seq (chromatin immunoprecipitation-high throughput sequencing) analyses to identify p53 regulated pathways in the embryonic kidney, an organ that develops via mesenchymal-epithelial interactions. This integrated approach allowed identification of novel genes that are possible direct p53 targets during kidney development. We find the p53-regulated transcriptome in the embryonic kidney is largely composed of genes regulating developmental, morphogenesis, and metabolic pathways. Surprisingly, genes in cell cycle and apoptosis pathways account for kidney lie within proximal promoters of annotated genes compared with 7% in a representative cancer cell line; 25% of the differentially expressed p53-bound genes are present in nephron progenitors and nascent nephrons, including key transcriptional regulators, components of Fgf, Wnt, Bmp, and Notch pathways, and ciliogenesis genes. The results indicate widespread p53 binding to the genome in vivo and context-dependent differences in the p53 regulon between cancer, stress, and development. To our knowledge, this is the first comprehensive analysis of the p53 transcriptome and cistrome in a developing mammalian organ, substantiating the role of p53 as a bona fide developmental regulator. We conclude p53 targets transcriptional networks regulating nephrogenesis and cellular metabolism during kidney development.

  18. Construction of an Ostrea edulis database from genomic and expressed sequence tags (ESTs) obtained from Bonamia ostreae infected haemocytes: Development of an immune-enriched oligo-microarray.

    Science.gov (United States)

    Pardo, Belén G; Álvarez-Dios, José Antonio; Cao, Asunción; Ramilo, Andrea; Gómez-Tato, Antonio; Planas, Josep V; Villalba, Antonio; Martínez, Paulino

    2016-12-01

    The flat oyster, Ostrea edulis, is one of the main farmed oysters, not only in Europe but also in the United States and Canada. Bonamiosis due to the parasite Bonamia ostreae has been associated with high mortality episodes in this species. This parasite is an intracellular protozoan that infects haemocytes, the main cells involved in oyster defence. Due to the economic and ecological importance of the flat oyster, genomic data are badly needed for genetic improvement of the species, but they are still very scarce. The objective of this study is to develop a sequence database, OedulisDB, with new genomic and transcriptomic resources, providing new data and convenient tools to improve our knowledge of the oyster's immune mechanisms. Transcriptomic and genomic sequences were obtained using 454 pyrosequencing and compiled into an O. edulis database, OedulisDB, consisting of two sets of 10,318 and 7159 unique sequences that represent the oyster's genome (WG) and de novo haemocyte transcriptome (HT), respectively. The flat oyster transcriptome was obtained from two strains (naïve and tolerant) challenged with B. ostreae, and from their corresponding non-challenged controls. Approximately 78.5% of 5619 HT unique sequences were successfully annotated by Blast search using public databases. A total of 984 sequences were identified as being related to immune response and several key immune genes were identified for the first time in flat oyster. Additionally, transcriptome information was used to design and validate the first oligo-microarray in flat oyster enriched with immune sequences from haemocytes. Our transcriptomic and genomic sequencing and subsequent annotation have largely increased the scarce resources available for this economically important species and have enabled us to develop an OedulisDB database and accompanying tools for gene expression analysis. This study represents the first attempt to characterize in depth the O. edulis haemocyte transcriptome in

  19. Database Description - RMG | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    RMG Database... Description General information of database Database name RMG Alternative name Rice Mitochondri...ational Institute of Agrobiological Sciences E-mail : Database classification Nucleotide Sequence Databases ...Organism Taxonomy Name: Oryza sativa Japonica Group Taxonomy ID: 39947 Database description This database co...e of rice mitochondrial genome and information on the analysis results. Features and manner of utilization of database

  20. Correlation between sequence conservation and structural thermodynamics of microRNA precursors from human, mouse, and chicken genomes

    Directory of Open Access Journals (Sweden)

    Wang Shengqi

    2010-10-01

    Background: Previous studies have shown that microRNA precursors (pre-miRNAs) have considerably more stable secondary structures than other native RNAs (tRNA, rRNA, and mRNA) and artificial RNA sequences. However, pre-miRNAs with ultra stable secondary structures have not been investigated. It is not known whether there is a tendency in pre-miRNA sequences towards or against ultra stable structures. Furthermore, the relationship between the structural thermodynamic stability of pre-miRNAs and their evolution remains unclear. Results: We investigated the correlation between pre-miRNA sequence conservation and structural stability as measured by adjusted minimum folding free energies in pre-miRNAs isolated from human, mouse, and chicken. The analysis revealed that conserved and non-conserved pre-miRNA sequences had structures with similar average stabilities. However, the relatively ultra stable and unstable pre-miRNAs were more likely to be non-conserved than pre-miRNAs with moderate stability. Non-conserved pre-miRNAs had more G+C than A+U nucleotides, while conserved pre-miRNAs contained more A+U nucleotides. Notably, the U content of conserved pre-miRNAs was especially higher than that of non-conserved pre-miRNAs. Further investigations showed that conserved and non-conserved pre-miRNAs exhibited different structural element features, even though they had comparable levels of stability. Conclusions: We proposed that there is a correlation between structural thermodynamic stability and sequence conservation for pre-miRNAs from human, mouse, and chicken genomes. Our analyses suggested that pre-miRNAs with relatively ultra stable or unstable structures were less favoured by natural selection than those with moderately stable structures. Comparison of nucleotide compositions between non-conserved and conserved pre-miRNAs indicated the importance of U nucleotides in the pre-miRNA evolutionary process. Several characteristic structural elements were

  1. Stimulatory effect of Echinacea purpurea extract on the trafficking activity of mouse dendritic cells: revealed by genomic and proteomic analyses

    Directory of Open Access Journals (Sweden)

    Wang Bi-Xue

    2010-11-01

    Background: Several Echinacea species have been used as nutraceuticals or botanical drugs for "immunostimulation", but scientific evidence supporting their therapeutic use is still controversial. In this study, a phytocompound mixture extracted from the butanol fraction (BF) of a stem and leaf (S+L) extract of E. purpurea ([BF/S+L/Ep]) containing stringently defined bioactive phytocompounds was obtained using standardized and published procedures. The transcriptomic and proteomic effects of this phytoextract on mouse bone marrow-derived dendritic cells (BMDCs) were analyzed using primary cultures. Results: Treatment of BMDCs with [BF/S+L/Ep] did not significantly influence the phenotypic maturation activity of dendritic cells (DCs). Affymetrix DNA microarray and bioinformatics analyses of genes differentially expressed in DCs treated with [BF/S+L/Ep] for 4 or 12 h revealed that the majority of responsive genes were related to cell adhesion or motility (Cdh10, Itga6, Cdh1, Gja1 and Mmp8), or were chemokines (Cxcl2, Cxcl7) or signaling molecules (Nrxn1, Pkce and Acss1). TRANSPATH database analyses of gene expression and related signaling pathways in treated-DCs predicted the JNK, PP2C-α, AKT, ERK1/2 or MAPKAPK pathways as the putative targets of [BF/S+L/Ep]. In parallel, proteomic analysis showed that the expressions of metabolic-, cytoskeleton- or NF-κB signaling-related proteins were regulated by treatment with [BF/S+L/Ep]. In vitro flow cytometry analysis of chemotaxis-related receptors and in vivo cell trafficking assay further showed that DCs treated with [BF/S+L/Ep] were able to migrate more effectively to peripheral lymph node and spleen tissues than DCs treated as control groups. Conclusion: Results from this study suggest that [BF/S+L/Ep] modulates DC mobility and related cellular physiology in the mouse immune system. Moreover, the signaling networks and molecules highlighted here are potential targets for nutritional or clinical

  2. BGDB: a database of bivalent genes.

    Science.gov (United States)

    Li, Qingyan; Lian, Shuabin; Dai, Zhiming; Xiang, Qian; Dai, Xianhua

    2013-01-01

    A bivalent gene is a gene marked with both H3K4me3 and H3K27me3 epigenetic modifications in the same region, and is proposed to play a pivotal role related to pluripotency in embryonic stem (ES) cells. Identifying these bivalent genes and understanding their functions are important for further research on lineage specification and embryo development. So far, a large amount of genome-wide histone modification data has been generated in mouse and human ES cells. These valuable data make it possible to identify bivalent genes, but no comprehensive data repositories or analysis tools are currently available for bivalent genes. In this work, we develop BGDB, the database of bivalent genes. The database contains 6897 bivalent genes in human and mouse ES cells, which are manually collected from scientific literature. Each entry contains curated information, including genomic context, sequences, gene ontology and other relevant information. The web services of the BGDB database were implemented with PHP + MySQL + JavaScript, and provide diverse query functions. Database URL: http://dailab.sysu.edu.cn/bgdb/

  3. Genome-Wide Scleral Micro- and Messenger-RNA Regulation During Myopia Development in the Mouse.

    Science.gov (United States)

    Metlapally, Ravikanth; Park, Han Na; Chakraborty, Ranjay; Wang, Kevin K; Tan, Christopher C; Light, Jacob G; Pardue, Machelle T; Wildsoet, Christine F

    2016-11-01

    MicroRNAs (miRNAs) have been previously implicated in scleral remodeling in normal eye growth. They have the potential to be therapeutic targets for prevention/retardation of exaggerated eye growth in myopia by modulating scleral matrix remodeling. To explore this potential, genome-wide miRNA and messenger RNA (mRNA) scleral profiles in myopic and control eyes from mice were studied. C57BL/6J mice (n = 7; P28) reared under a 12L:12D cycle were form-deprived (FD) unilaterally for 2 weeks. Refractive error and axial length changes were measured using photorefraction and 1310-nm spectral-domain optical coherence tomography, respectively. Scleral RNA samples from FD and fellow control eyes were processed for microarray assay. Statistical analyses were performed using the National Institute of Aging array analysis tool; group comparisons were made using ANOVA, and gene ontologies were identified using software available on the Web. Findings were confirmed using quantitative PCR in a separate group of mice (n = 7). Form-deprived eyes showed myopic shifts in refractive error (-2.02 ± 0.47 D; P RNA profiles of test eyes with those of control eyes revealed 54 differentially expressed miRNAs and 261 mRNAs fold-change >1.25 (maximum fold change = 1.63 and 2.7 for miRNAs and mRNAs, respectively) (P < 0.05; minimum, P = 0.0001). Significant ontologies showing gene over-representation (P < 0.05) included intermediate filament organization, scaffold protein binding, detection of stimuli, calcium ion, G protein, and phototransduction. Significant differential expression of Let-7a and miR-16-2, and Smok4a, Prph2, and Gnat1 were confirmed. Scleral mi- and mRNAs showed differential expression linked to myopia, supporting the involvement of miRNAs in eye growth regulation. The observed general trend of relatively small fold-changes suggests a tightly controlled, regulatory mechanism for scleral gene expression.

  4. Tripartite ATP-independent periplasmic transporters: application of a relational database for genome-wide analysis of transporter gene frequency and organization.

    Science.gov (United States)

    Mulligan, Christopher; Kelly, David J; Thomas, Gavin H

    2007-01-01

    Tripartite ATP-independent periplasmic (TRAP) transporters are a family of extracytoplasmic solute receptor-dependent secondary transporters that are widespread in the prokaryotic world but which have not been extensively studied. Here, we present results of a genome-wide analysis of TRAP sequences and genome organization from application of TRAPDb, a relational database created for the collection, curation and analysis of TRAP sequences. This has revealed a specific enrichment in the number of TRAP transporters in several bacteria which is consistent with increased use of TRAP transporters in saline environments. Additionally, we report a number of new organizations of TRAP transporter genes and proteins which suggest the recruitment of TRAP transporter components for use in other biological contexts.

  5. An encyclopedia of mouse genes.

    Science.gov (United States)

    Marra, M; Hillier, L; Kucaba, T; Allen, M; Barstead, R; Beck, C; Blistain, A; Bonaldo, M; Bowers, Y; Bowles, L; Cardenas, M; Chamberlain, A; Chappell, J; Clifton, S; Favello, A; Geisel, S; Gibbons, M; Harvey, N; Hill, F; Jackson, Y; Kohn, S; Lennon, G; Mardis, E; Martin, J; Mila, L; McCann, R; Morales, R; Pape, D; Person, B; Prange, C; Ritter, E; Soares, M; Schurk, R; Shin, T; Steptoe, M; Swaller, T; Theising, B; Underwood, K; Wylie, T; Yount, T; Wilson, R; Waterston, R

    1999-02-01

    The laboratory mouse is the premier model system for studies of mammalian development due to the powerful classical genetic analysis possible (see also the Jackson Laboratory web site, http://www.jax.org/) and the ever-expanding collection of molecular tools. To enhance the utility of the mouse system, we initiated a program to generate a large database of expressed sequence tags (ESTs) that can provide rapid access to genes. Of particular significance was the possibility that cDNA libraries could be prepared from very early stages of development, a situation unrealized in human EST projects. We report here the development of a comprehensive database of ESTs for the mouse. The project, initiated in March 1996, has focused on 5' end sequences from directionally cloned, oligo-dT primed cDNA libraries. As of 23 October 1998, 352,040 sequences had been generated, annotated and deposited in dbEST, where they comprised 93% of the total ESTs available for mouse. EST data are versatile and have been applied to gene identification, comparative sequence analysis, comparative gene mapping and candidate disease gene identification, genome sequence annotation, microarray development and the development of gene-based map resources.

  6. Transcription of mouse Sp2 yields alternatively spliced and sub-genomic mRNAs in a tissue- and cell type-specific fashion

    Science.gov (United States)

    Yin, Haifeng; Nichols, Teresa D.; Horowitz, Jonathan M.

    2010-01-01

    The Sp family of transcription factors comprises nine members, Sp1-9, that share a highly conserved DNA-binding domain. Sp2 is a poorly characterized member of this transcription factor family that is widely expressed in murine and human cell lines yet exhibits little DNA-binding or trans-activation activity in these settings. As a prelude to the generation of a “knock-out” mouse strain, we isolated a mouse Sp2 cDNA and performed a detailed analysis of Sp2 transcription in embryonic and adult mouse tissues. We report that (1) the 5′ untranslated region of Sp2 is subject to alternative splicing, (2) Sp2 transcription is regulated by at least two promoters that differ in their cell-type specificity, (3) one Sp2 promoter is highly active in nine mammalian cell lines and strains and is regulated by at least five discrete stimulatory and inhibitory elements, (4) a variety of sub-genomic messages are synthesized from the Sp2 locus in a tissue- and cell type-specific fashion and these transcripts have the capacity to encode a novel partial-Sp2 protein, and (5) RNA in situ hybridization assays indicate that Sp2 is widely expressed during mouse embryogenesis, particularly in the embryonic brain, and robust Sp2 expression occurs in neurogenic regions of the post-natal and adult brain. PMID:20353838

  7. Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L.).

    Directory of Open Access Journals (Sweden)

    Bin Han

    Microsatellites or simple sequence repeats (SSRs) are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Though an ordered draft sequence of hexaploid bread wheat has been announced, a systematic analysis of SSRs in wheat has not been reported so far. In the present study, we identified 364,347 SSRs from among 10,603,760 sequences of the Chinese spring wheat (CSW) genome, which were present at a density of 36.68 SSR/Mb. In total, we detected 488 types of motifs ranging from di- to hexanucleotides, among which dinucleotide repeats dominated, accounting for approximately 42.52% of the genome. The density of tri- to hexanucleotide repeats was 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent repeats among di- to hexanucleotide repeats. Among the 21 chromosomes of CSW, the density of repeats was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats on each chromosome, and even on the whole genome, were almost identical. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, which cover the entire genome at a density of 29.73 per Mb. All of the SSR markers were validated by reverse electronic-Polymerase Chain Reaction (re-PCR); 70,564 (23.9%) were found to be monomorphic and 224,703 (76.1%) were found to be polymorphic. A total of 45 monomorphic markers were selected randomly for validation purposes; 24 (53.3%) amplified one locus, 8 (17.8%) amplified multiple identical loci, and 13 (28.9%) did not amplify any fragments from the genomic DNA of CSW. Then a dendrogram was generated based on the 24 monomorphic SSR markers among 20 wheat cultivars and three species of its diploid ancestors showing that monomorphic SSR markers represented a promising

  8. A new single-nucleotide polymorphisms database for rainbow trout generated through whole genome resequencing of selected samples

    Science.gov (United States)

    Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout, SNP discovery has been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), RNA sequencing, and whole...

  9. The Candida genome database incorporates multiple Candida species: multispecies search and analysis tools with curated gene and protein information for Candida albicans and Candida glabrata.

    Science.gov (United States)

    Inglis, Diane O; Arnaud, Martha B; Binkley, Jonathan; Shah, Prachi; Skrzypek, Marek S; Wymore, Farrell; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin

    2012-01-01

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is an internet-based resource that provides centralized access to genomic sequence data and manually curated functional information about genes and proteins of the fungal pathogen Candida albicans and other Candida species. As the scope of Candida research, and the number of sequenced strains and related species, has grown in recent years, the need for expanded genomic resources has also grown. To answer this need, CGD has expanded beyond storing data solely for C. albicans, now integrating data from multiple species. Herein we describe the incorporation of this multispecies information, which includes curated gene information and the reference sequence for C. glabrata, as well as orthology relationships that interconnect Locus Summary pages, allowing easy navigation between genes of C. albicans and C. glabrata. These orthology relationships are also used to predict GO annotations of their products. We have also added protein information pages that display domains, structural information and physicochemical properties; bibliographic pages highlighting important topic areas in Candida biology; and a laboratory strain lineage page that describes the lineage of commonly used laboratory strains. All of these data are freely available at http://www.candidagenome.org/. We welcome feedback from the research community at candida-curator@lists.stanford.edu.

  10. Dysregulation of mitotic machinery genes precedes genome instability during spontaneous pre-malignant transformation of mouse ovarian surface epithelial cells

    Directory of Open Access Journals (Sweden)

    Ulises Urzúa

    2016-10-01

    Background: Based on epidemiological evidence, repetitive ovulation has been proposed to play a role in the origin of ovarian cancer by inducing an aberrant wound rupture-repair process of the ovarian surface epithelium (OSE). Accordingly, long term cultures of isolated OSE cells undergo in vitro spontaneous transformation, thus developing tumorigenic capacity upon extensive subcultivation. In this work, C57BL/6 mouse OSE (MOSE) cells were cultured up to passage 28 and their RNA and DNA copy number profiles obtained at passages 2, 5, 7, 10, 14, 18, 23, 25 and 28 by means of DNA microarrays. Gene ontology, pathway and network analyses were focused on passages earlier than 20, which is a hallmark of malignancy in this model. Results: At passage 14, 101 genes were up-regulated in the absence of significant DNA copy number changes. Among these, the top-3 enriched functions (>30 fold, adj p < 0.05) comprised 7 genes coding for centralspindlin, chromosome passenger and minichromosome maintenance protein complexes. The genes Ccnb1 (Cyclin B1), Birc5 (Survivin), Nusap1 and Kif23 were the most recurrent in over a dozen GO terms related to the mitotic process. On the other hand, Pten plus the large non-coding RNAs Malat1 and Neat1 were among the 80 down-regulated genes, with mRNA processing, nuclear bodies, ER-stress response and tumor suppression as relevant terms. Interestingly, the earliest discrete segmental aneuploidies arose by passage 18 in chromosomes 7, 10, 11, 13, 15, 17 and 19. By passage 23, when MOSE cells express the malignant phenotype, the dysregulated gene expression repertoire expanded, and DNA imbalances enlarged in size and covered additional loci. Conclusion: Prior to early aneuploidies, overexpression of genes coding for the mitotic apparatus in passage-14 pre-malignant MOSE cells indicates an increased proliferation rate suggestive of replicative stress. Concomitant down-regulation of nuclear bodies and RNA processing related genes

  11. Conserved cis-regulatory regions in a large genomic landscape control SHH and BMP-regulated Gremlin1 expression in mouse limb buds

    Directory of Open Access Journals (Sweden)

    Zuniga Aimée

    2012-08-01

    Background: The mouse limb bud is a prime model to study the regulatory interactions that control vertebrate organogenesis. Major aspects of limb bud development are controlled by feedback loops that define a self-regulatory signalling system. The SHH/GREM1/AER-FGF feedback loop forms the core of this signalling system that operates between the posterior mesenchymal organiser and the ectodermal signalling centre. The BMP antagonist Gremlin1 (GREM1) is a critical node in this system, whose dynamic expression is controlled by BMP, SHH, and FGF signalling and key to normal progression of limb bud development. Previous analysis identified a distant cis-regulatory landscape within the neighbouring Formin1 (Fmn1) locus that is required for Grem1 expression, reminiscent of the genomic landscapes controlling HoxD and Shh expression in limb buds. Results: Three highly conserved regions (HMCO1-3) were identified within the previously defined critical genomic region and tested for their ability to regulate Grem1 expression in mouse limb buds. Using a combination of BAC and conventional transgenic approaches, a 9 kb region located ~70 kb downstream of the Grem1 transcription unit was identified. This region, termed Grem1 Regulatory Sequence 1 (GRS1), is able to recapitulate major aspects of Grem1 expression, as it drives expression of a LacZ reporter into the posterior and, to a lesser extent, in the distal-anterior mesenchyme. Crossing the GRS1 transgene into embryos with alterations in the SHH and BMP pathways established that GRS1 depends on SHH and is modulated by BMP signalling, i.e. integrates inputs from these pathways. Chromatin immunoprecipitation revealed interaction of endogenous GLI3 proteins with the core cis-regulatory elements in the GRS1 region. As GLI3 is a mediator of SHH signal transduction, these results indicated that SHH directly controls Grem1 expression through the GRS1 region. Finally, all cis-regulatory regions within the Grem1

  12. Global Metabolic Reconstruction and Metabolic Gene Evolution in the Cattle Genome.

    Science.gov (United States)

    Kim, Woonsu; Park, Hyesun; Seo, Seongwon

    2016-01-01

    The sequence of the cattle genome provided a valuable opportunity to systematically link genetic and metabolic traits of cattle. The objectives of this study were 1) to reconstruct genome-scale cattle-specific metabolic pathways based on the most recent and updated cattle genome build and 2) to identify duplicated metabolic genes in the cattle genome for better understanding of metabolic adaptations in cattle. A bioinformatic pipeline of an organism for amalgamating genomic annotations from multiple sources was updated. Using this, an amalgamated cattle genome database based on UMD_3.1 was created. The amalgamated cattle genome database is composed of a total of 33,292 genes: 19,123 consensus genes between NCBI and Ensembl databases, 8,410 and 5,493 genes only found in NCBI or Ensembl, respectively, and 266 genes from NCBI scaffolds. A metabolic reconstruction of the cattle genome and cattle pathway genome database (PGDB) was also developed using Pathway Tools, followed by an intensive manual curation. The manual curation filled or revised 68 pathway holes, deleted 36 metabolic pathways, and added 23 metabolic pathways. Consequently, the curated cattle PGDB contains 304 metabolic pathways, 2,460 reactions including 2,371 enzymatic reactions, and 4,012 enzymes. Furthermore, this study identified eight duplicated genes in 12 metabolic pathways in the cattle genome compared to human and mouse. Some of these duplicated genes are related to specific hormone biosynthesis and detoxification. The updated genome-scale metabolic reconstruction is a useful tool for understanding biology and metabolic characteristics in cattle. There have been significant improvements in the quality of cattle genome annotations and the MetaCyc database. The duplicated metabolic genes in the cattle genome compared to human and mouse imply evolutionary changes in the cattle genome and provide useful information for further research on understanding metabolic adaptations of cattle.

  13. A Genome-wide Gene-Expression Analysis and Database in Transgenic Mice during Development of Amyloid or Tau Pathology

    Directory of Open Access Journals (Sweden)

    Mar Matarin

    2015-02-01

    We provide microarray data comparing genome-wide differential expression and pathology throughout life in four lines of “amyloid” transgenic mice (mutant human APP, PSEN1, or APP/PSEN1) and “TAU” transgenic mice (mutant human MAPT gene). Microarray data were validated by qPCR and by comparison to human studies, including genome-wide association study (GWAS) hits. Immune gene expression correlated tightly with plaques whereas synaptic genes correlated negatively with neurofibrillary tangles. Network analysis of immune gene modules revealed six hub genes in hippocampus of amyloid mice, four in common with cortex. The hippocampal network in TAU mice was similar except that Trem2 had hub status only in amyloid mice. The cortical network of TAU mice was entirely different with more hub genes and few in common with the other networks, suggesting reasons for specificity of cortical dysfunction in FTDP17. This Resource opens up many areas for investigation. All data are available and searchable at http://www.mouseac.org.

  14. Mining novel starch-converting Glycoside Hydrolase 70 enzymes from the Nestlé Culture Collection genome database: The Lactobacillus reuteri NCC 2613 GtfB.

    Science.gov (United States)

    Gangoiti, Joana; van Leeuwen, Sander S; Meng, Xiangfeng; Duboux, Stéphane; Vafiadi, Christina; Pijning, Tjaard; Dijkhuizen, Lubbert

    2017-08-30

    The Glycoside hydrolase (GH) family 70 originally was established for glucansucrases of lactic acid bacteria (LAB) converting sucrose into α-glucan polymers. In recent years we have identified 3 subfamilies of GH70 enzymes (designated GtfB, GtfC and GtfD) as 4,6-α-glucanotransferases, cleaving (α1 → 4)-linkages in maltodextrins/starch and synthesizing new (α1 → 6)-linkages. In this work, 106 putative GtfBs were identified in the Nestlé Culture Collection genome database with ~2700 genomes, and the L. reuteri NCC 2613 one was selected for further characterization based on variations in its conserved motifs. Using amylose the L. reuteri NCC 2613 GtfB synthesizes a low-molecular-mass reuteran-like polymer consisting of linear (α1 → 4) sequences interspersed with (α1 → 6) linkages, and (α1 → 4,6) branching points. This product specificity is novel within the GtfB subfamily, mostly comprising 4,6-α-glucanotransferases synthesizing consecutive (α1 → 6)-linkages. Instead, its activity resembles that of the GtfD 4,6-α-glucanotransferases identified in non-LAB strains. This study demonstrates the potential of large-scale genome sequence data for the discovery of enzymes of interest for the food industry. The L. reuteri NCC 2613 GtfB is a valuable addition to the starch-converting GH70 enzyme toolbox. It represents a new evolutionary intermediate between families GH13 and GH70, and provides further insights into the structure-function relationships of the GtfB subfamily enzymes.

  15. Linkage of cDNA expression profiles of mesencephalic dopaminergic neurons to a genome-wide in situ hybridization database

    Directory of Open Access Journals (Sweden)

    Simon Horst H

    2009-01-01

    Midbrain dopaminergic neurons are involved in control of emotion, motivation and motor behavior. The loss of one of the subpopulations, substantia nigra pars compacta, is the pathological hallmark of one of the most prominent neurological disorders, Parkinson's disease. Several groups have looked at the molecular identity of midbrain dopaminergic neurons and have suggested the gene expression profile of these neurons. Here, after determining the efficiency of each screen, we provide a linked database of the genes, expressed in this neuronal population, by combining and comparing the results of six previous studies and verification of expression of each gene in dopaminergic neurons, using the collection of in situ hybridization in the Allen Brain Atlas.

  16. Computational prediction of candidate miRNAs and their targets from the completed Linum ussitatissimum genome and EST database

    Directory of Open Access Journals (Sweden)

    Tiffanie Y. Moss

    2012-06-01

    Flax is an important agronomic crop grown for its fiber (linen) and oil (linseed oil). In spite of many thousands of years of breeding, some fiber varieties have been shown to rapidly respond to environmental stress with heritable changes to their genomes. Many miRNAs appear to be induced by abiotic or biotic conditions experienced through the plant life cycle. Computational miRNA analysis of the flax genome provides a foundation for subsequent research on miRNA function in Linum usitatissimum and may also provide novel insight into any regulatory role the RNAi pathway may play in generating adaptive structural variation in response to environmental stress. Here a bioinformatics approach is used to screen for miRNAs previously identified in other plant species, as well as to predict putative miRNAs unique to a particular species which may not have been identified as they are less abundant or dependent upon a specific set of environmental conditions. Twelve miRNA genes were identified in flax on the basis of unique pre-miRNA positions with structural homology to plant pre-miRNAs and complete sequence homology to published plant miRNAs. These miRNAs were found to belong to 7 miRNA families, with an additional 2 matches corresponding to as yet unnamed poplar miRNAs and a paralogous miRNA with partial sequence homology to mtr-miR4414b. An additional 649 novel and distinct flax miRNA genes were identified to form from canonical hairpin structures and to have putative targets among the ~30,000 flax Unigenes.

  17. High-throughput mouse phenotyping.

    Science.gov (United States)

    Gates, Hilary; Mallon, Ann-Marie; Brown, Steve D M

    2011-04-01

    Comprehensive phenotyping will be required to reveal the pleiotropic functions of a gene and to uncover the wider role of genetic loci within diverse biological systems. The challenge will be to devise phenotyping approaches to characterise the thousands of mutants that are being generated as part of international efforts to acquire a mutant for every gene in the mouse genome. In order to acquire robust datasets of broad based phenotypes from mouse mutants it is necessary to design and implement pipelines that incorporate standardised phenotyping platforms that are validated across diverse mouse genetics centres or mouse clinics. We describe here the rationale and methodology behind one phenotyping pipeline, EMPReSSslim, that was designed as part of the work of the EUMORPHIA and EUMODIC consortia, and which exemplifies some of the challenges facing large-scale phenotyping. EMPReSSslim captures a broad range of data on diverse biological systems, from biochemical to physiological amongst others. Data capture and dissemination is pivotal to the operation of large-scale phenotyping pipelines, including the definition of parameters integral to each phenotyping test and the associated ontological descriptions. EMPReSSslim data is displayed within the EuroPhenome database, where a variety of tools are available to allow the user to search for interesting biological or clinical phenotypes. Copyright © 2011 Elsevier Inc. All rights reserved.

  18. Tet3 and DNA replication mediate demethylation of both the maternal and paternal genomes in mouse zygotes.

    Science.gov (United States)

    Shen, Li; Inoue, Azusa; He, Jin; Liu, Yuting; Lu, Falong; Zhang, Yi

    2014-10-02

    With the exception of imprinted genes and certain repeats, DNA methylation is globally erased during preimplantation development. Recent studies have suggested that Tet3-mediated oxidation of 5-methylcytosine (5mC) and DNA replication-dependent dilution both contribute to global paternal DNA demethylation, but demethylation of the maternal genome occurs via replication. Here we present genome-scale DNA methylation maps for both the paternal and maternal genomes of Tet3-depleted and/or DNA replication-inhibited zygotes. In both genomes, we found that inhibition of DNA replication blocks DNA demethylation independently from Tet3 function and that Tet3 facilitates DNA demethylation largely by coupling with DNA replication. For both genomes, our data indicate that replication-dependent dilution is the major contributor to demethylation, but Tet3 plays an important role, particularly at certain loci. Our study thus defines the respective functions of Tet3 and DNA replication in paternal DNA demethylation and reveals an unexpected contribution of Tet3 to demethylation of the maternal genome.

  19. Mouse p53-Deficient Cancer Models as Platforms for Obtaining Genomic Predictors of Human Cancer Clinical Outcomes

    Science.gov (United States)

    Dueñas, Marta; Santos, Mirentxu; Aranda, Juan F.; Bielza, Concha; Martínez-Cruz, Ana B.; Lorz, Corina; Taron, Miquel; Ciruelos, Eva M.; Rodríguez-Peralto, José L.; Martín, Miguel; Larrañaga, Pedro; Dahabreh, Jubrail; Stathopoulos, George P.; Rosell, Rafael; Paramio, Jesús M.; García-Escudero, Ramón

    2012-01-01

    Mutations in the TP53 gene are very common in human cancers, and are associated with poor clinical outcome. Transgenic mouse models lacking the Trp53 gene or that express mutant Trp53 transgenes produce tumours with malignant features in many organs. We previously showed the transcriptome of a p53-deficient mouse skin carcinoma model to be similar to those of human cancers with TP53 mutations and associated with poor clinical outcomes. This report shows that much of the 682-gene signature of this murine skin carcinoma transcriptome is also present in breast and lung cancer mouse models in which p53 is inhibited. Further, we report validated gene-expression-based tests for predicting the clinical outcome of human breast and lung adenocarcinoma. It was found that human patients with cancer could be stratified based on the similarity of their transcriptome with the mouse skin carcinoma 682-gene signature. The results also provide new targets for the treatment of p53-defective tumours. PMID:22880004

  20. RegTransBase - A Database Of Regulatory Sequences and Interactions in a Wide Range of Prokaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kazakov, Alexei E.; Cipriano, Michael J.; Novichkov, Pavel S.; Minovitsky, Simon; Vinogradov, Dmitry V.; Arkin, Adam; Mironov, Andrey A.; Gelfand, Mikhail S.; Dubchak, Inna

    2006-07-01

    RegTransBase, a manually curated database of regulatory interactions in prokaryotes, captures the knowledge in published scientific literature using a controlled vocabulary. Although a number of databases describing interactions between regulatory proteins and their binding sites are currently being maintained, they focus mostly on the model organisms Escherichia coli and Bacillus subtilis, or are entirely computationally derived. RegTransBase describes a large number of regulatory interactions reported in many organisms and contains various types of experimental data, in particular: the activation or repression of transcription by an identified direct regulator; determining the transcriptional regulatory function of a protein (or RNA) directly binding to DNA (RNA); mapping or prediction of binding site for a regulatory protein; characterization of regulatory mutations. Currently, the RegTransBase content is derived from about 3000 relevant articles describing over 7000 experiments in relation to 128 microbes. It contains data on the regulation of about 7500 genes and evidence for 6500 interactions with 650 regulators. RegTransBase also contains manually created position weight matrices (PWM) that can be used to identify candidate regulatory sites in over 60 species. RegTransBase is available at http://regtransbase.lbl.gov.
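
    To make the idea of scanning with a position weight matrix concrete, here is a minimal sketch in Python. It is not RegTransBase code: the aligned sites, the pseudocount, the uniform background and the function names `build_pwm` and `scan` are all illustrative assumptions, and real matrices would come from curated binding sites.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites.

    Assumes uppercase A/C/G/T only and a uniform background (both assumptions).
    """
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm

def scan(seq, pwm, threshold):
    """Slide the matrix along seq and return (start, window, score) above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits

# Hypothetical aligned binding sites, invented purely for illustration.
example_sites = ["TTGACA", "TTGACT", "TTGATA", "TTTACA"]
example_pwm = build_pwm(example_sites)
example_hits = scan("CCGGTTGACAGGCC", example_pwm, threshold=5.0)
```

    The threshold trades sensitivity for specificity; in practice it would be calibrated against known sites rather than fixed by hand as in this sketch.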

  1. RSSsite: a reference database and prediction tool for the identification of cryptic Recombination Signal Sequences in human and murine genomes.

    Science.gov (United States)

    Merelli, Ivan; Guffanti, Alessandro; Fabbri, Marco; Cocito, Andrea; Furia, Laura; Grazini, Ursula; Bonnal, Raoul J; Milanesi, Luciano; McBlane, Fraser

    2010-07-01

    Recombination signal sequences (RSSs) flanking V, D and J gene segments are recognized and cut by the VDJ recombinase during development of B and T lymphocytes. All RSSs are composed of seven conserved nucleotides, followed by a spacer (containing either 12 +/- 1 or 23 +/- 1 poorly conserved nucleotides) and a conserved nonamer. Errors in V(D)J recombination, including cleavage of cryptic RSSs outside the immunoglobulin and T cell receptor loci, are associated with oncogenic translocations observed in some lymphoid malignancies. We present in this paper the RSSsite web server, which is available from the address http://www.itb.cnr.it/rss. RSSsite consists of a web-accessible database, RSSdb, for the identification of pre-computed potential RSSs, and of the related search tool, DnaGrab, which allows the scoring of potential RSSs in user-supplied sequences. This latter algorithm makes use of probability models, which can be recast as Bayesian networks, taking into account correlations between groups of positions of a sequence, developed starting from specific reference sets of RSSs. In validation laboratory experiments, we selected 33 predicted cryptic RSSs (cRSSs) from 11 chromosomal regions outside the immunoglobulin and TCR loci for functional testing.
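
    The heptamer-spacer-nonamer layout described above can be illustrated with a simple consensus scan in Python. This is only a sketch and deliberately much cruder than DnaGrab, which scores candidates with probability models: here candidates are ranked by mismatch count alone. The consensus strings `CACAGTG` and `ACAAAAACC` are the commonly cited RSS consensus and are an assumption, since the abstract does not state them; the function names are hypothetical.

```python
HEPTAMER = "CACAGTG"    # commonly cited consensus; an assumption, not from the abstract
NONAMER = "ACAAAAACC"   # commonly cited consensus; likewise an assumption
SPACERS = (11, 12, 13, 22, 23, 24)  # spacer lengths 12 +/- 1 and 23 +/- 1

def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_candidate_rss(seq, max_mismatches=2):
    """Return (start, spacer_length, total_mismatches) for each candidate RSS.

    A candidate is a heptamer-like 7-mer followed, after an allowed spacer,
    by a nonamer-like 9-mer, with at most max_mismatches to the two
    consensus elements combined.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - len(HEPTAMER) + 1):
        hep_mm = mismatches(seq[start:start + 7], HEPTAMER)
        if hep_mm > max_mismatches:
            continue
        for spacer in SPACERS:
            non_start = start + 7 + spacer
            nonamer = seq[non_start:non_start + 9]
            if len(nonamer) < 9:
                continue  # candidate would run off the end of the sequence
            total = hep_mm + mismatches(nonamer, NONAMER)
            if total <= max_mismatches:
                hits.append((start, spacer, total))
    return hits

# Synthetic example sequence with a perfect 12-spacer RSS embedded at offset 3.
example = "GGG" + HEPTAMER + "T" * 12 + NONAMER + "GGG"
example_hits = find_candidate_rss(example, max_mismatches=0)
```

    Only one strand is scanned; a fuller sketch would also check the reverse complement.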

  2. The database of chromosome imbalance regions and genes resided in lung cancer from Asian and Caucasian identified by array-comparative genomic hybridization

    Directory of Open Access Journals (Sweden)

    Lo Fang-Yi

    2012-06-01

    Background: Cancer-related genes show racial differences. Therefore, identification and characterization of DNA copy number alteration regions in different racial groups helps to dissect the mechanism of tumorigenesis. Methods: Array-comparative genomic hybridization (array-CGH) was analyzed for DNA copy number profile in 40 Asian and 20 Caucasian lung cancer patients. Three methods including MetaCore analysis for disease and pathway correlations, concordance analysis between array-CGH database and the expression array database, and literature search for copy number variation genes were performed to select novel lung cancer candidate genes. Four candidate oncogenes were validated for DNA copy number and mRNA and protein expression by quantitative polymerase chain reaction (qPCR), chromogenic in situ hybridization (CISH), reverse transcriptase-qPCR (RT-qPCR), and immunohistochemistry (IHC) in more patients. Results: We identified 20 chromosomal imbalance regions harboring 459 genes for Caucasian and 17 regions containing 476 genes for Asian lung cancer patients. Seven common chromosomal imbalance regions harboring 117 genes, including gains on 3p13-14, 6p22.1, 9q21.13, 13q14.1, and 17p13.3 and losses on 3p22.2-22.3 and 13q13.3, were found in both Asian and Caucasian patients. Gene validation for four genes including ARHGAP19 (10q24.1) functioning in Rho activity control, FRAT2 (10q24.1) involved in Wnt signaling, PAFAH1B1 (17p13.3) functioning in motility control, and ZNF322A (6p22.1) involved in MAPK signaling was performed using qPCR and RT-qPCR. Mean gene dosage and mRNA expression level of the four candidate genes in tumor tissues were significantly higher than the corresponding normal tissues (PP=0.06. In addition, CISH analysis of patients indicated that copy number amplification indeed occurred for ARHGAP19 and ZNF322A genes in lung cancer patients. IHC analysis of paraffin blocks from Asian and Caucasian patients demonstrated that the frequency of

  3. DNA repair efficiency in germ cells and early mouse embryos and consequences for radiation-induced transgenerational genomic damage

    Energy Technology Data Exchange (ETDEWEB)

    Marchetti, Francesco; Wyrobek, Andrew J.

    2009-01-18

    Exposure to ionizing radiation and other environmental agents can affect the genomic integrity of germ cells and induce adverse health effects in the progeny. Efficient DNA repair during gametogenesis and the early embryonic cycles after fertilization is critical for preventing transmission of DNA damage to the progeny and relies on maternal factors stored in the egg before fertilization. The ability of the maternal repair machinery to repair DNA damage in both parental genomes in the fertilizing egg is especially crucial for the fertilizing male genome that has not experienced a DNA repair-competent cellular environment for several weeks prior to fertilization. During the DNA repair-deficient period of spermatogenesis, DNA lesions may accumulate in sperm and be carried into the egg where, if not properly repaired, they could result in the formation of heritable chromosomal aberrations or mutations and associated birth defects. Studies with female mice deficient in specific DNA repair genes have shown that: (i) cell cycle checkpoints are activated in the fertilized egg by DNA damage carried by the sperm; and (ii) the maternal genotype plays a major role in determining the efficiency of repairing genomic lesions in the fertilizing sperm and directly affects the risk for abnormal reproductive outcomes. There is also growing evidence that implicates DNA damage carried by the fertilizing gamete as a mediator of postfertilization processes that contribute to genomic instability in subsequent generations. Transgenerational genomic instability most likely involves epigenetic mechanisms or error-prone DNA repair processes in the early embryo. Maternal and embryonic DNA repair processes during the early phases of mammalian embryonic development can have far reaching consequences for the genomic integrity and health of subsequent generations.

  4. Integration of the Rat Recombination and EST Maps in the Rat Genomic Sequence and Comparative Mapping Analysis With the Mouse Genome

    OpenAIRE

    Wilder, Steven P.; Bihoreau, Marie-Thérèse; Argoud, Karène; Watanabe, Takeshi K.; Lathrop, Mark; Gauguier, Dominique

    2004-01-01

    Inbred strains of the laboratory rat are widely used for identifying genetic regions involved in the control of complex quantitative phenotypes of biomedical importance. The draft genomic sequence of the rat now provides essential information for annotating rat quantitative trait locus (QTL) maps. Following the survey of unique rat microsatellite (11,585 including 1648 new markers) and EST (10,067) markers currently available, we have incorporated a selection of 7952 rat EST sequences in an i...

  5. Analysis of SSRs in grape genome and development of SSR database

    Institute of Scientific and Technical Information of China (English)

    蔡斌; 李成慧; 姚泉洪; 周军; 陶建敏; 章镇

    2009-01-01

    We developed a Perl script, SSRFinder, to detect SSRs in grape genome sequence. A total of 114,520 SSRs were identified in the publicly available genomic DNA sequence of Vitis vinifera L. 'Pinot Noir PN40024', released by the French national sequencing center (Genoscope). Among them, 37,648 mononucleotide repeats, 30,123 dinucleotide repeats, 18,705 trinucleotide repeats, 14,566 tetranucleotide repeats, 3,492 pentanucleotide repeats, and 9,986 hexanucleotide repeats were found, accounting for 32.9%, 26.3%, 16.3%, 12.7%, 3.0%, and 8.7% of the total SSRs, respectively. SSRs with poly(A/T)n repeats were the most abundant type, whereas C/G-rich motifs were the rarest. We also assessed the distribution of SSRs across genome fragments. The SSRs were distributed mainly in intergenic regions and were moderately abundant in UTRs. In coding regions, all repeat types were less frequent except tri- and hexanucleotide repeats. To make use of these SSRs, we developed a database on the Internet. The database of grape SSRs (DGSSR) comprehensively collects and annotates grape SSRs. The DGSSR contains all the SSRs detected in the study together with their related information. It provides a flexible query interface and detailed annotations for individual SSRs. It also contains SSRs detected from the Vitis vinifera L. EST dataset. The DGSSR is available at http://www.yaolab.sh.cn/ssr.
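
    The abstract does not describe how SSRFinder works internally, so the following is only a minimal Python sketch of one common way to detect SSRs: a regular expression with a backreference for each motif length from 1 to 6 bp. The minimum repeat counts in MIN_REPEATS and the function names are assumptions made for illustration, not values or names taken from the study. The second function recomputes the class percentages from the counts reported above.

```python
import re

# Assumed minimum number of tandem repeats per motif length (1-6 bp).
# These thresholds are hypothetical; the abstract does not state them.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit.

    For example 'AT' is primitive, while 'AA' and 'ATAT' are not.
    """
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d)
                   for d in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, repeat_count) tuples."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # One k-bp motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'AA' so a poly-A run is only reported as a
            # mononucleotide repeat, not again as a dinucleotide repeat.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def class_percentages(counts):
    """Share of each repeat class in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {k: round(100.0 * v / total, 1) for k, v in counts.items()}
```

    Calling class_percentages with the six counts from the abstract (37648, 30123, 18705, 14566, 3492, 9986) gives a total of 114,520 and reproduces the reported shares. A production tool would also need to handle ambiguous bases, compound repeats, and repeats spanning sequence records, which this sketch ignores.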

  6. Identification and evolutionary analysis of novel exons and alternative splicing events using cross-species EST-to-genome comparisons in human, mouse and rat

    Directory of Open Access Journals (Sweden)

    Ho Jar-Yi

    2006-03-01

    Background: Alternative splicing (AS) is important for evolution and major biological functions in complex organisms. However, the extent of AS in mammals other than human and mouse is largely unknown, making it difficult to study AS evolution in mammals and its biomedical implications. Results: Here we describe a cross-species EST-to-genome comparison algorithm (ENACE) that can identify novel exons for EST-scanty species and distinguish conserved and lineage-specific exons. The identified exons represent not only novel exons but also evolutionarily meaningful AS events that were not previously annotated. A genome-wide AS analysis in human, mouse and rat using ENACE reveals a total of 758 novel cassette-on exons and 167 novel retained introns that have no EST evidence from the same species. RT-PCR-sequencing experiments validated ~50-80% of the tested exons, indicating that a high proportion of the exons predicted by ENACE are present. ENACE is particularly powerful when applied to closely related species. In addition, our analysis shows that the ENACE-identified AS exons tend not to pass the nonsynonymous-to-synonymous substitution ratio test and not to contain protein domains, implying that such exons may be under positive selection or relaxed negative selection. These AS exons may contribute to considerable inter-species functional divergence. Our analysis further indicates that a large number of exons may have been gained or lost during mammalian evolution. Moreover, a functional analysis shows that inter-species divergence of AS events may be substantial in protein carriers and receptor proteins in mammals. These exons may be of interest to studies of AS evolution. The ENACE programs and sequences of the ENACE-identified AS events are available for download. Conclusion: ENACE can identify potential novel cassette exons and retained introns between closely related species using a comparative approach. It can also provide information regarding lineage- or species...

  7. Genome-wide and phase-specific DNA-binding rhythms of BMAL1 control circadian output functions in mouse liver.

    Directory of Open Access Journals (Sweden)

    Guillaume Rey

    2011-02-01

    The mammalian circadian clock uses interlocked negative feedback loops in which the heterodimeric basic helix-loop-helix transcription factor BMAL1/CLOCK is a master regulator. While there is prominent control of liver functions by the circadian clock, the detailed links between circadian regulators and downstream targets are poorly known. Using chromatin immunoprecipitation combined with deep sequencing we obtained a time-resolved and genome-wide map of BMAL1 binding in mouse liver, which allowed us to identify over 2,000 binding sites, with peak binding narrowly centered around Zeitgeber time 6. Annotation of BMAL1 targets confirms carbohydrate and lipid metabolism as the major output of the circadian clock in mouse liver. Moreover, transcription regulators are largely overrepresented, several of which also exhibit circadian activity. Genes of the core circadian oscillator stand out as strongly bound, often at promoter and distal sites. Genomic sequence analysis of the sites identified E-boxes and tandem E1-E2 consensus elements. Electromobility shift assays showed that E1-E2 sites are bound by a dimer of BMAL1/CLOCK heterodimers with a spacing-dependent cooperative interaction, a finding that was further validated in transactivation assays. BMAL1 target genes showed cyclic mRNA expression profiles with a phase distribution centered at Zeitgeber time 10. Importantly, sites with E1-E2 elements showed tighter phases both in binding and mRNA accumulation. Finally, analyzing the temporal profiles of BMAL1 binding, precursor mRNA and mature mRNA levels showed how transcriptional and post-transcriptional regulation contribute differentially to circadian expression phase. Together, our analysis of a dynamic protein-DNA interactome uncovered how genes of the core circadian oscillator crosstalk and drive phase-specific circadian output programs in a complex tissue.

  8. The SWI/SNF protein ATRX co-regulates pseudoautosomal genes that have translocated to autosomes in the mouse genome

    Directory of Open Access Journals (Sweden)

    Fernandes Andrew D

    2008-10-01

    Background: Pseudoautosomal regions (PAR1 and PAR2) in eutherians retain homologous regions between the X and Y chromosomes that play a critical role in the obligatory X-Y crossover during male meiosis. Genes that reside in the PAR1 are exceptional in that they are rich in repetitive sequences and undergo a very high rate of recombination. Remarkably, murine PAR1 homologs have translocated to various autosomes, reflecting the complex recombination history during the evolution of the mammalian X chromosome. Results: We now report that the SNF2-type chromatin remodeling protein ATRX controls the expression of eutherian ancestral PAR1 genes that have translocated to autosomes in the mouse. In addition, we have identified two potentially novel mouse PAR1 orthologs. Conclusion: We propose that the ancestral PAR1 genes share a common epigenetic environment that allows ATRX to control their expression.

  9. Generation of a Knockout Mouse Embryonic Stem Cell Line Using a Paired CRISPR/Cas9 Genome Engineering Tool.

    Science.gov (United States)

    Wettstein, Rahel; Bodak, Maxime; Ciaudo, Constance

    2016-01-01

    CRISPR/Cas9, originally discovered as a bacterial immune system, has recently been engineered into the latest tool to successfully introduce site-specific mutations in a variety of different organisms. Requiring only the Cas9 protein and one engineered guide RNA to function, this system is much less complex in its setup and easier to handle than other guided nucleases such as zinc-finger nucleases or TALENs. Here, we describe the simultaneous transfection of two paired CRISPR sgRNA-Cas9 plasmids into mouse embryonic stem cells (mESCs), resulting in the knockout of the selected target gene. Together with a four-primer evaluation system, it provides an efficient way to generate new independent knockout mouse embryonic stem cell lines.

  10. Genome-wide ENU mutagenesis in combination with high density SNP analysis and exome sequencing provides rapid identification of novel mouse models of developmental disease.

    Directory of Open Access Journals (Sweden)

    Georgina Caruana

    BACKGROUND: Mice harbouring gene mutations that cause phenotypic abnormalities during organogenesis are invaluable tools for linking gene function to normal development and human disorders. To generate mouse models harbouring novel alleles that are involved in organogenesis we conducted a phenotype-driven, genome-wide mutagenesis screen in mice using the mutagen N-ethyl-N-nitrosourea (ENU). METHODOLOGY/PRINCIPAL FINDINGS: ENU was injected into male C57BL/6 mice and the mutations transmitted through the germ-line. ENU-induced mutations were bred to homozygosity and G3 embryos screened at embryonic day (E) 13.5 and E18.5 for abnormalities in limb and craniofacial structures, skin, blood, vasculature, lungs, gut, kidneys, ureters and gonads. Of the 52 pedigrees screened, 15 were found to have anomalies in one or more of the structures/organs screened. Using single nucleotide polymorphism (SNP)-based linkage analysis in conjunction with candidate gene or next-generation sequencing (NGS) we identified novel recessive alleles for Fras1, Ift140 and Lig1. CONCLUSIONS/SIGNIFICANCE: In this study we have generated mouse models in which the anomalies closely mimic those seen in human disorders. The association between novel mutant alleles and phenotypes will lead to a better understanding of gene function in normal development and establish how their dysfunction causes human anomalies and disease.

  11. Glial cell line-derived neurotrophic factor alters the growth characteristics and genomic imprinting of mouse multipotent adult germline stem cells

    Energy Technology Data Exchange (ETDEWEB)

    Jung, Yoon Hee [Department of Bioscience and Biotechnology, Bio-Organ Research Center/Animal Resources Research Center, Konkuk University, Hwayang-dong, Gwangjin-Gu, Seoul 143 701 (Korea, Republic of); Gupta, Mukesh Kumar, E-mail: goops@konkuk.ac.kr [Department of Animal Biotechnology, Bio-Organ Research Center/Animal Resources Research Center, Konkuk University, Hwayang-dong, Gwangjin-Gu, Seoul 143 701 (Korea, Republic of); Oh, Shin Hye [Department of Bioscience and Biotechnology, Bio-Organ Research Center/Animal Resources Research Center, Konkuk University, Hwayang-dong, Gwangjin-Gu, Seoul 143 701 (Korea, Republic of); Uhm, Sang Jun [Department of Animal Biotechnology, Bio-Organ Research Center/Animal Resources Research Center, Konkuk University, Hwayang-dong, Gwangjin-Gu, Seoul 143 701 (Korea, Republic of); Lee, Hoon Taek, E-mail: htl3675@konkuk.ac.kr [Department of Bioscience and Biotechnology, Bio-Organ Research Center/Animal Resources Research Center, Konkuk University, Hwayang-dong, Gwangjin-Gu, Seoul 143 701 (Korea, Republic of); Department of Animal Biotechnology, Bio-Organ Research Center/Animal Resources Research Center, Konkuk University, Hwayang-dong, Gwangjin-Gu, Seoul 143 701 (Korea, Republic of)

    2010-03-10

    This study evaluated whether glial cell line-derived neurotrophic factor (GDNF) is essential for in vitro culture of established mouse multipotent adult germline stem (maGS) cell lines by culturing them in the presence of GDNF, leukemia inhibitory factor (LIF) or both. We show that, in the absence of LIF, GDNF slows the proliferation of maGS cells and results in smaller colonies without any change in the distribution of cells across cell-cycle stages, the expression of pluripotency genes or in vitro differentiation potential. Furthermore, in the absence of LIF, GDNF increased the expression of male germ-line genes, and the cells repopulated the empty seminiferous tubules of W/W^v mutant mice without forming teratomas. GDNF also altered the genomic imprinting of the Igf2, Peg1, and H19 genes but had no effect on DNA methylation of the Oct4, Nanog and Stra8 genes. However, these effects of GDNF were masked in the presence of LIF. GDNF also did not interfere with the multipotency of maGS cells when they were cultured in the presence of LIF. In conclusion, our results suggest that, in the absence of LIF, GDNF alters the growth characteristics of maGS cells and partially imparts to them some germline stem (GS) cell-like characteristics.

  12. Weighted gene co-expression network analysis in identification of metastasis-related genes of lung squamous cell carcinoma based on the Cancer Genome Atlas database

    Science.gov (United States)

    Tian, Feng; Zhao, Jinlong; Kang, Zhenxing

    2017-01-01

    Background: Lung squamous cell carcinoma (lung SCC) is a common type of malignancy. The mechanism of its pathogenesis is unclear. The aim of this study was to identify key genes as diagnostic biomarkers of lung SCC metastasis. Methods: We searched and downloaded mRNA expression data and clinical data from The Cancer Genome Atlas (TCGA) database to identify differences in mRNA expression of primary tumor tissues from lung SCC with and without metastasis. Gene co-expression network analysis, protein-protein interaction (PPI) network analysis, Gene Ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and quantitative real-time polymerase chain reaction (qRT-PCR) were used to explore the biological functions of the identified dysregulated genes. Results: Four hundred and eighty-two differentially expressed genes (DEGs) were identified between lung SCC with and without metastasis. Nineteen modules were identified in lung SCC through weighted gene co-expression network analysis (WGCNA). Twenty-three DEGs and 26 DEGs were significantly enriched in the pink and black modules, respectively. KEGG pathway analysis showed that the 26 DEGs in the black module were significantly enriched in the bile secretion pathway. The forty-nine DEGs in the two gene co-expression modules were used to construct the PPI network. CFTR, in the black module, was the hub protein, with connectivity to 182 genes. qRT-PCR showed that FIGF, SFTPD and DYNLRB2 were significantly down-regulated in tumor samples of lung SCC with metastasis, and that CFTR, SCGB3A2, SSTR1, SCTR and ROPN1L tended to be down-regulated in lung SCC with metastasis compared to lung SCC without metastasis. Conclusions: The dysregulated genes, including CFTR, SCTR and FIGF, might be involved in the pathology of lung SCC metastasis and could be used as potential diagnostic biomarkers or therapeutic targets for lung SCC.
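
    As background to the WGCNA step mentioned in this abstract: in an unsigned weighted co-expression network, the adjacency between two genes is the absolute Pearson correlation of their expression profiles raised to a soft-threshold power. The pure-Python sketch below illustrates only that adjacency step and a per-gene connectivity sum. The power beta=6, the function names and any input data are assumptions for illustration and are not taken from the study. Module detection, which WGCNA performs by hierarchical clustering on a topological overlap measure, is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr maps a gene name to its expression values across samples.
    beta=6 is an assumed default, not a value reported in the abstract.
    """
    genes = list(expr)
    adj = {g: {} for g in genes}
    for g in genes:
        for h in genes:
            if g == h:
                adj[g][h] = 1.0
            else:
                adj[g][h] = abs(pearson(expr[g], expr[h])) ** beta
    return adj


def connectivity(adj):
    """Sum of each gene's adjacencies to all other genes."""
    return {g: sum(w for h, w in row.items() if h != g)
            for g, row in adj.items()}
```

    Raising the correlation to a power keeps strong correlations close to 1 and pushes weak ones toward 0, so a gene with high connectivity in such a network is a candidate hub. Note that the connectivity with 182 genes reported for CFTR refers to the PPI network, which is a different construction from this adjacency matrix.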

  13. Weighted gene co-expression network analysis in identification of metastasis-related genes of lung squamous cell carcinoma based on the Cancer Genome Atlas database.

    Science.gov (United States)

    Tian, Feng; Zhao, Jinlong; Fan, Xinlei; Kang, Zhenxing

    2017-01-01

    Lung squamous cell carcinoma (lung SCC) is a common type of malignancy. The mechanism of its pathogenesis is unclear. The aim of this study was to identify key genes as diagnostic biomarkers of lung SCC metastasis. We searched and downloaded mRNA expression data and clinical data from The Cancer Genome Atlas (TCGA) database to identify differences in mRNA expression of primary tumor tissues from lung SCC with and without metastasis. Gene co-expression network analysis, protein-protein interaction (PPI) network analysis, Gene Ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and quantitative real-time polymerase chain reaction (qRT-PCR) were used to explore the biological functions of the identified dysregulated genes. Four hundred and eighty-two differentially expressed genes (DEGs) were identified between lung SCC with and without metastasis. Nineteen modules were identified in lung SCC through weighted gene co-expression network analysis (WGCNA). Twenty-three DEGs and 26 DEGs were significantly enriched in the pink and black modules, respectively. KEGG pathway analysis showed that the 26 DEGs in the black module were significantly enriched in the bile secretion pathway. The forty-nine DEGs in the two gene co-expression modules were used to construct the PPI network. CFTR, in the black module, was the hub protein, with connectivity to 182 genes. qRT-PCR showed that FIGF, SFTPD and DYNLRB2 were significantly down-regulated in tumor samples of lung SCC with metastasis, and that CFTR, SCGB3A2, SSTR1, SCTR and ROPN1L tended to be down-regulated in lung SCC with metastasis compared to lung SCC without metastasis. The dysregulated genes, including CFTR, SCTR and FIGF, might be involved in the pathology of lung SCC metastasis and could be used as potential diagnostic biomarkers or therapeutic targets for lung SCC.

  14. Genomic organization and chromosomal localization of the human and mouse genes encoding the alpha receptor component for ciliary neurotrophic factor.

    Science.gov (United States)

    Valenzuela, D M; Rojas, E; Le Beau, M M; Espinosa, R; Brannan, C I; McClain, J; Masiakowski, P; Ip, N Y; Copeland, N G; Jenkins, N A

    1995-01-01

    Ciliary neurotrophic factor (CNTF) has recently been found to share receptor components with, and to be structurally related to, a family of broadly acting cytokines, including interleukin-6, leukemia inhibitory factor, and oncostatin M. However, the CNTF receptor complex also includes a CNTF-specific component known as CNTF receptor alpha (CNTFR alpha). Here we describe the molecular cloning of the human and mouse genes encoding CNTFR. We report that the human and mouse genes have an identical intron-exon structure that correlates well with the domain structure of CNTFR alpha. That is, the signal peptide and the immunoglobulin-like domain are each encoded by single exons, the cytokine receptor-like domain is distributed among 4 exons, and the C-terminal glycosyl phosphatidylinositol recognition domain is encoded by the final coding exon. The position of the introns within the cytokine receptor-like domain corresponds to those found in other members of the cytokine receptor superfamily. Confirming a recent study using radiation hybrids, we have also mapped the human CNTFR gene to chromosome band 9p13 and the mouse gene to a syntenic region of chromosome 4.

  15. A versatile genome-scale PCR-based pipeline for high-definition DNA FISH.

    Science.gov (United States)

    Bienko, Magda; Crosetto, Nicola; Teytelman, Leonid; Klemm, Sandy; Itzkovitz, Shalev; van Oudenaarden, Alexander

    2013-02-01

    We developed a cost-effective genome-scale PCR-based method for high-definition DNA FISH (HD-FISH). We visualized gene loci with diffraction-limited resolution, chromosomes as spot clusters and single genes together with transcripts by combining HD-FISH with single-molecule RNA FISH. We provide a database of over 4.3 million primer pairs targeting the human and mouse genomes that is readily usable for rapid and flexible generation of probes.

  16. Molecular marker databases.

    Science.gov (United States)

    Lai, Kaitao; Lorenc, Michał Tadeusz; Edwards, David

    2015-01-01

    The detection and analysis of genetic variation plays an important role in plant breeding and this role is increasing with the continued development of genome sequencing technologies. Molecular genetic markers are important tools to characterize genetic variation and assist with genomic breeding. Processing and storing the growing abundance of molecular marker data being produced requires the development of specific bioinformatics tools and advanced databases. Molecular marker databases range from species specific through to organism wide and often host a variety of additional related genetic, genomic, or phenotypic information. In this chapter, we will present some of the features of plant molecular genetic marker databases, highlight the various types of marker resources, and predict the potential future direction of crop marker databases.

  17. In vivo 3D digital atlas database of the adult C57BL/6J mouse brain by magnetic resonance microscopy

    Directory of Open Access Journals (Sweden)

    Yu Ma

    2008-04-01

    In this study, a 3D digital atlas of the live mouse brain based on magnetic resonance microscopy (MRM) is presented. C57BL/6J adult mouse brains were imaged in vivo on a 9.4 Tesla MR instrument at an isotropic spatial resolution of 100 μm. With sufficient signal-to-noise (SNR) and contrast-to-noise ratio (CNR), 20 brain regions were identified. Several atlases were constructed including 12 individual brain atlases, an average atlas, a probabilistic atlas and average geometrical deformation maps. We also investigated the feasibility of using lower spatial resolution images to improve time efficiency for future morphological phenotyping. All of the new in vivo data were compared to previously published in vitro C57BL/6J mouse brain atlases and the morphological differences were characterized. Our analyses revealed significant volumetric as well as unexpected geometrical differences between the in vivo and in vitro brain groups which in some instances were predictable (e.g. collapsed and smaller ventricles in vitro) but not in other instances. Based on these findings we conclude that although in vitro datasets, compared to in vivo images, offer higher spatial resolutions, superior SNR and CNR, leading to improved image segmentation, in vivo atlases are likely to be an overall better geometric match for in vivo studies, which are necessary for longitudinal examinations of the same animals and for functional brain activation studies. Thus the new in vivo mouse brain atlas dataset presented here is a valuable complement to the current mouse brain atlas collection and will be accessible to the neuroscience community on our public domain mouse brain atlas website.

  18. In Vivo 3D Digital Atlas Database of the Adult C57BL/6J Mouse Brain by Magnetic Resonance Microscopy.

    Science.gov (United States)

    Ma, Yu; Smith, David; Hof, Patrick R; Foerster, Bernd; Hamilton, Scott; Blackband, Stephen J; Yu, Mei; Benveniste, Helene

    2008-01-01

    In this study, a 3D digital atlas of the live mouse brain based on magnetic resonance microscopy (MRM) is presented. C57BL/6J adult mouse brains were imaged in vivo on a 9.4 Tesla MR instrument at an isotropic spatial resolution of 100 μm. With sufficient signal-to-noise (SNR) and contrast-to-noise ratio (CNR), 20 brain regions were identified. Several atlases were constructed including 12 individual brain atlases, an average atlas, a probabilistic atlas and average geometrical deformation maps. We also investigated the feasibility of using lower spatial resolution images to improve time efficiency for future morphological phenotyping. All of the new in vivo data were compared to previously published in vitro C57BL/6J mouse brain atlases and the morphological differences were characterized. Our analyses revealed significant volumetric as well as unexpected geometrical differences between the in vivo and in vitro brain groups which in some instances were predictable (e.g. collapsed and smaller ventricles in vitro) but not in other instances. Based on these findings we conclude that although in vitro datasets, compared to in vivo images, offer higher spatial resolutions, superior SNR and CNR, leading to improved image segmentation, in vivo atlases are likely to be an overall better geometric match for in vivo studies, which are necessary for longitudinal examinations of the same animals and for functional brain activation studies. Thus the new in vivo mouse brain atlas dataset presented here is a valuable complement to the current mouse brain atlas collection and will be accessible to the neuroscience community on our public domain mouse brain atlas website.

  19. The Knockout Mouse Project

    OpenAIRE

    Austin, Christopher P.; Battey, James F.; Bradley, Allan; Bucan, Maja; Capecchi, Mario; Collins, Francis S; Dove, William F.; Duyk, Geoffrey; Dymecki, Susan; Eppig, Janan T.; Grieder, Franziska B.; Heintz, Nathaniel; Hicks, Geoff; Insel, Thomas R; Joyner, Alexandra

    2004-01-01

    Mouse knockout technology provides a powerful means of elucidating gene function in vivo, and a publicly available genome-wide collection of mouse knockouts would be significantly enabling for biomedical discovery. To date, published knockouts exist for only about 10% of mouse genes. Furthermore, many of these are limited in utility because they have not been made or phenotyped in standardized ways, and many are not freely available to researchers. It is time to harness new technologies and e...

  20. MDP, a database linking drug response data to genomic information, identifies dasatinib and statins as a combinatorial strategy to inhibit YAP/TAZ in cancer cells.

    Science.gov (United States)

    Taccioli, Cristian; Sorrentino, Giovanni; Zannini, Alessandro; Caroli, Jimmy; Beneventano, Domenico; Anderlucci, Laura; Lolli, Marco; Bicciato, Silvio; Del Sal, Giannino

    2015-11-17

    Targeted anticancer therapies represent the most effective pharmacological strategies in terms of clinical responses. In this context, genetic alteration of several oncogenes represents an optimal predictor of response to targeted therapy. Integration of large-scale molecular and pharmacological data from cancer cell lines promises to be effective in the discovery of new genetic markers of drug sensitivity and of clinically relevant anticancer compounds. To define novel pharmacogenomic dependencies in cancer, we created the Mutations and Drugs Portal (MDP, http://mdp.unimore.it), a web accessible database that combines the cell-based NCI60 screening of more than 50,000 compounds with genomic data extracted from the Cancer Cell Line Encyclopedia and the NCI60 DTP projects. MDP can be queried for drugs active in cancer cell lines carrying mutations in specific cancer genes or for genetic markers associated to sensitivity or resistance to a given compound. As proof of performance, we interrogated MDP to identify both known and novel pharmacogenomics associations and unveiled an unpredicted combination of two FDA-approved compounds, namely statins and Dasatinib, as an effective strategy to potently inhibit YAP/TAZ in cancer cells.

  1. Genomic organization and phylogenetic utility of deer mouse (Peromyscus maniculatus) lymphotoxin-alpha and lymphotoxin-beta

    Directory of Open Access Journals (Sweden)

    Prescott Joseph

    2008-10-01

    Background: Deer mice (Peromyscus maniculatus) are among the most common mammals in North America and are important reservoirs of several human pathogens, including Sin Nombre hantavirus (SNV). SNV can establish a life-long apathogenic infection in deer mice, which can shed virus in excrement for transmission to humans. Patients that die from hantavirus cardiopulmonary syndrome (HCPS) have been found to express several proinflammatory cytokines, including lymphotoxin (LT), in the lungs. It is thought that these cytokines contribute to the pathogenesis of HCPS. LT is not expressed by virus-specific CD4+ T cells from infected deer mice, suggesting a limited role for this pathway in reservoir responses to hantaviruses. Results: We have cloned the genes encoding deer mouse LTα and LTβ and have found them to be highly similar to orthologous rodent sequences but with some differences in promoter elements. The phylogenetic analyses performed on the LTα, LTβ, and combined data sets yielded a strongly supported sister-group relationship between the two murines (the house mouse and the rat). The deer mouse, a sigmodontine, appeared as the sister group to the murine clade in all of the analyses. High bootstrap values characterized the grouping of murids. Conclusion: No conspicuous differences compared to other species are present in the predicted amino acid sequences of LTα or LTβ; however, some promoter differences were noted in LTβ. Although more extensive taxonomic sampling is required to confirm the results of our analyses, the preliminary findings indicate that both genes (analyzed both separately and in combination) hold potential for resolving relationships among rodents and other mammals at the subfamily level.

  2. DMPD: The p110delta subunit of phosphoinositide 3-kinase is required for the lipopolysaccharide response of mouse B cells. [Dynamic Macrophage Pathway CSML Database]

    Lifescience Database Archive (English)

    15494016 The p110delta subunit of phosphoinositide 3-kinase is required for the lipopolysaccharide response of mouse B cells. ... 5):789-91.

  3. New routes for transgenesis of the mouse.

    Science.gov (United States)

    Belizário, José E; Akamini, Priscilla; Wolf, Philip; Strauss, Bryan; Xavier-Neto, José

    2012-08-01

    Transgenesis refers to the molecular genetic techniques for directing specific insertions, deletions and point mutations in the genome of germ cells in order to create genetically modified organisms (GMO). Genetic modification is becoming more practicable, efficient and predictable with the development and use of a variety of cell and molecular biology tools and DNA sequencing technologies. A collection of plasmid and viral vectors, cell-type specific promoters, positive and negative selectable markers, reporter genes, and drug-inducible Cre-loxP and Flp/FRT recombinase systems is available to ensure efficient transgenesis in the mouse. The technologies for the insertion and removal of genes by homology-directed recombination in embryonic stem (ES) cells and the generation of targeted gain- and loss-of-function alleles have allowed the creation of thousands of mouse models of a variety of diseases. Engineered zinc finger nucleases (ZFNs) and small hairpin RNA-expressing constructs are novel tools with useful properties for gene knockout free of ES cell manipulation. In this review we briefly outline the different approaches and technologies for transgenesis as well as their advantages and disadvantages. We also present an overview of how the novel integrative mouse and human genomic databases and bioinformatics approaches have been used to understand genotype-phenotype relationships of hundreds of mutated and candidate disease genes in mouse models. The updating and continued improvement of genomic technologies will eventually help us unravel biological and pathological processes in such a way that they can be translated more efficiently from mouse to human and vice versa.

  4. Simplified CRISPR tools for efficient genome editing and streamlined protocols for their delivery into mammalian cells and mouse zygotes.

    Science.gov (United States)

    Jacobi, Ashley M; Rettig, Garrett R; Turk, Rolf; Collingwood, Michael A; Zeiner, Sarah A; Quadros, Rolen M; Harms, Donald W; Bonthuis, Paul J; Gregg, Christopher; Ohtsuka, Masato; Gurumurthy, Channabasavaiah B; Behlke, Mark A

    2017-05-15

    Genome editing using the CRISPR/Cas9 system requires the presence of guide RNAs bound to the Cas9 endonuclease as a ribonucleoprotein (RNP) complex in cells, which cleaves the host cell genome at sites specified by the guide RNAs. New genetic material may be introduced during repair of the double-stranded break via homology dependent repair (HDR) if suitable DNA templates are delivered with the CRISPR components. Early methods used plasmid or viral vectors to make these components in the host cell; however, newer approaches using recombinant Cas9 protein with synthetic guide RNAs introduced directly as an RNP complex into cells show faster onset of action with fewer off-target effects. This approach also enables use of chemically modified synthetic guide RNAs that have improved nuclease stability and reduce the risk of triggering an innate immune response in the host cell. This article provides detailed methods for genome editing using the RNP approach with synthetic guide RNAs using lipofection or electroporation in mammalian cells or using microinjection in murine zygotes, with or without addition of a single-stranded HDR template DNA. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  5. The mouse genome displays highly dynamic populations of KRAB-zinc finger protein genes and related genetic units.

    Science.gov (United States)

    Kauzlaric, Annamaria; Ecco, Gabriela; Cassano, Marco; Duc, Julien; Imbeault, Michael; Trono, Didier

    2017-01-01

    KRAB-containing poly-zinc finger proteins (KZFPs) constitute the largest family of transcription factors encoded by mammalian genomes, and growing evidence indicates that they fulfill functions critical to both embryonic development and maintenance of adult homeostasis. KZFP genes underwent broad and independent waves of expansion in many higher vertebrate lineages, yet comprehensive studies of members harbored by a given species are scarce. Here we present a thorough analysis of KZFP genes and related units in the murine genome. We first identified about twice as many elements as previously annotated as either KZFP genes or pseudogenes, notably by assigning to this family an entity formerly considered a large group of Satellite repeats. We could then delineate an organization in clusters distributed throughout the genome, with signs of recombination, translocation, duplication and seeding of new sites by retrotransposition of KZFP genes and related genetic units (KZFP/rGUs). Moreover, we gathered evidence indicating that closely related paralogs had evolved through both drifting and shifting of sequences encoding zinc finger arrays. Finally, we could demonstrate that the KAP1-SETDB1 repressor complex tames the expression of KZFP/rGUs within clusters, yet that the primary targets of this regulation are not the KZFP/rGUs themselves but enhancers contained in neighboring endogenous retroelements and that, underneath, KZFPs conserve highly individualized patterns of expression.

  6. Database Description - KOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available [ Credits ] BLAST Search Image Search Home About Archive Update History Contact us KOME Database... Description General information of database Database name Knowledge-based Oryza Molecular biol...baraki 305-8602, Japan National Institute of Agrobiological Sciences Plant Genome Research Unit Shoshi Kikuchi E-mail : Database... classification Plant databases - Rice Organism Taxonomy Name: Oryza sativa Taxonomy ID: 4530 Database...A clones that were completely sequenced in the Rice full-length cDNA project is shown in the database. The f

  7. Database Description - GETDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us GETDB Database Description General information of database Database name GETDB Alternative n...ame Gal4 Enhancer Trap Insertion Database DOI 10.18908/lsdba.nbdc00236-000 Creator Creator Name: Shigeo Haya... Chuo-ku, Kobe 650-0047 Tel: +81-78-306-3185 FAX: +81-78-306-3183 E-mail: Database classification Expression... Invertebrate genome database Organism Taxonomy Name: Drosophila melanogaster Taxonomy ID: 7227 Database des...cription About 4,600 insertion lines of enhancer trap lines based on the Gal4-UAS

  8. Genome-wide identification of targets and function of individual MicroRNAs in mouse embryonic stem cells.

    Directory of Open Access Journals (Sweden)

    Sophie A Hanina

    2010-10-01

    Full Text Available Mouse Embryonic Stem (ES) cells express a unique set of microRNAs (miRNAs), the miR-290-295 cluster. To elucidate the role of these miRNAs and how they integrate into the ES cell regulatory network requires identification of their direct regulatory targets. The difficulty, however, arises from the limited complementarity of metazoan miRNAs to their targets, with the interaction requiring as few as six nucleotides of the miRNA seed sequence. To identify miR-294 targets, we used Dicer1-null ES cells, which lack all endogenous mature miRNAs, and introduced just miR-294 into these ES cells. We then employed two approaches to discover miR-294 targets in mouse ES cells: transcriptome profiling using microarrays and a biochemical approach to isolate mRNA targets associated with the Argonaute2 (Ago2) protein of the RISC (RNA Induced Silencing Complex) effector, followed by RNA-sequencing. In the absence of Dicer1, the RISC complexes are largely devoid of mature miRNAs and should therefore contain only transfected miR-294 and its base-paired targets. Our data suggest that miR-294 may promote pluripotency by regulating a subset of c-Myc target genes and upregulating pluripotency-associated genes such as Lin28.
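    The six-nucleotide seed pairing mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method (they used microarrays and Ago2-associated RNA sequencing); it only shows what a naive sequence-based seed-site scan looks like. The function names, the choice of miRNA positions 2-7 as the seed, and the example sequences are assumptions made for illustration, and the example miRNA is hypothetical rather than miR-294.

```python
def reverse_complement_rna_to_dna(rna: str) -> str:
    """Return the DNA reverse complement of an RNA sequence."""
    pairs = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna.upper()))


def seed_match_positions(mirna: str, utr_dna: str,
                         seed_start: int = 2, seed_len: int = 6) -> list:
    """Return 0-based start positions in utr_dna that are perfectly
    complementary to the miRNA seed (assumed here: positions 2-7)."""
    seed = mirna[seed_start - 1: seed_start - 1 + seed_len]
    site = reverse_complement_rna_to_dna(seed)
    utr = utr_dna.upper()
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping sites
    return positions


# Hypothetical miRNA and 3' UTR fragment, for illustration only.
example_mirna = "UAGCAUCGGAUCCAAGGUCUAA"
example_utr = "TTTGATGCTAAACCGATGCTGG"
print(seed_match_positions(example_mirna, example_utr))
```

    A six-nucleotide site is expected by chance roughly once every 4,096 bases, which is one reason the abstract describes sequence-only target identification as difficult.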

  9. Databases for Microbiologists

    Science.gov (United States)

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  10. Advantages of using the CRISPR/Cas9 system of genome editing to investigate male reproductive mechanisms using mouse models

    Directory of Open Access Journals (Sweden)

    Samantha A. M. Young

    2015-01-01

    Full Text Available Gene disruption technology has long been beneficial for the study of male reproductive biology. However, because of the time and cost involved, this technology was not a viable method except in specialist laboratories. The advent of the CRISPR/Cas9 system of gene disruption has ushered in a new era of genetic investigation. Now, it is possible to generate gene-disrupted mouse models in very little time and at very little cost. This Highlight article discusses the application of this technology to study the genetics of male fertility and looks at some of the future uses of this system that could be used to reveal the essential and nonessential genetic components of male reproductive mechanisms.

  11. Renal cell tumors with clear cell histology and intact VHL and chromosome 3p: a histological review of tumors from the Cancer Genome Atlas database.

    Science.gov (United States)

    Favazza, Laura; Chitale, Dhananjay A; Barod, Ravi; Rogers, Craig G; Kalyana-Sundaram, Shanker; Palanisamy, Nallasivam; Gupta, Nilesh S; Williamson, Sean R

    2017-07-21

    Clear cell renal cell carcinoma is by far the most common form of kidney cancer; however, a number of histologically similar tumors are now recognized and considered distinct entities. The Cancer Genome Atlas published data set was queried (http://cbioportal.org) for clear cell renal cell carcinoma tumors lacking VHL gene mutation and chromosome 3p loss, for which whole-slide images were reviewed. Of the 418 tumors in the published Cancer Genome Atlas clear cell renal cell carcinoma database, 387 had VHL mutation, copy number loss for chromosome 3p, or both (93%). Of the remaining, 27/31 had whole-slide images for review. One had 3p loss based on karyotype but not sequencing, and three demonstrated VHL promoter hypermethylation. Nine could be reclassified as distinct or emerging entities: translocation renal cell carcinoma (n=3), TCEB1 mutant renal cell carcinoma (n=3), papillary renal cell carcinoma (n=2), and clear cell papillary renal cell carcinoma (n=1). Of the remaining, 6 had other clear cell renal cell carcinoma-associated gene alterations (PBRM1, SMARCA4, BAP1, SETD2), leaving 11 specimens, including 2 high-grade or sarcomatoid renal cell carcinomas and 2 with prominent fibromuscular stroma (not TCEB1 mutant). One of the remaining tumors exhibited gain of chromosome 7 but lacked histological features of papillary renal cell carcinoma. Two tumors previously reported to harbor TFE3 gene fusions also exhibited VHL mutation, chromosome 3p loss, and morphology indistinguishable from clear cell renal cell carcinoma, the significance of which is uncertain. In summary, almost all clear cell renal cell carcinomas harbor VHL mutation, 3p copy number loss, or both. Of tumors with clear cell histology that lack these alterations, a subset can now be reclassified as other entities. Further study will determine whether additional entities exist, based on distinct genetic pathways that may have implications for treatment. Modern Pathology advance online publication, 21

  12. HMMerThread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition.

    Directory of Open Access Journals (Sweden)

    Charles Richard Bradshaw

    Full Text Available Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in

  13. Penetrance of craniofacial anomalies in mouse models of Smith-Magenis syndrome is modified by genomic sequence surrounding Rai1: not all null alleles are alike.

    Science.gov (United States)

    Yan, Jiong; Bi, Weimin; Lupski, James R

    2007-03-01

    Craniofacial abnormality is one of the major clinical manifestations of Smith-Magenis syndrome (SMS). Previous analyses in a mixed genetic background of several SMS mouse models--including Df(11)17/+ and Df(11)17-1/+, which have 2-Mb and 590-kb deletions, respectively, and Rai1(-/+)--revealed that the penetrance of the craniofacial phenotype appears to be influenced by deletion size and genetic background. We generated an additional strain with a 1-Mb deletion intermediate in size between the two described above. Remarkably, the penetrance of its craniofacial anomalies in the mixed background was between those of Df(11)17 and Df(11)17-1. We further analyzed the deletion mutations and the Rai1(-/+) allele in a pure C57BL/6 background, to control for nonlinked modifier loci. The penetrance of the craniofacial anomalies was markedly increased for all the strains in comparison with the mixed background. Mice with Df(11)17 and Df(11)17-1 deletions had a similar penetrance, suggesting that penetrance may be less influenced by deletion size, whereas that of Rai1(-/+) mice was significantly lower than that of the deletion strains. We hypothesize that potential trans-regulatory sequence(s) or gene(s) that reside within the 590-kb genomic interval surrounding Rai1 are the major modifying genetic element(s) affecting the craniofacial penetrance. Moreover, we confirmed the influence of genetic background and different deletion sizes on the phenotype. The complicated control of the penetrance for one phenotype in SMS mouse models provides tools to elucidate molecular mechanisms for penetrance and clearly shows that a null allele caused by chromosomal deletion can have different phenotypic consequences than one caused by gene inactivation.

  14. Effect of Duplicate Genes on Mouse Genetic Robustness: An Update

    Directory of Open Access Journals (Sweden)

    Zhixi Su

    2014-01-01

    Full Text Available In contrast to S. cerevisiae and C. elegans, analyses based on the current knockout (KO) mouse phenotypes led to the conclusion that duplicate genes had almost no role in mouse genetic robustness. It has been suggested that the bias of the mouse KO database toward ancient duplicates may cause this knockout duplicate puzzle, that is, a very similar proportion of essential genes (PE) between duplicate genes and singletons. In this paper, we conducted an extensive and careful analysis of the mouse KO phenotype data and corroborated a strong effect of duplicate genes on mouse genetic robustness. Moreover, the effect of duplicate genes on mouse genetic robustness is duplication-age dependent, which holds after ruling out the potential confounding effect from coding-sequence conservation, protein-protein connectivity, functional bias, or the bias of duplicates generated by whole genome duplication (WGD). Our findings suggest that two factors, the sampling bias toward ancient duplicates and very ancient duplicates with a proportion of essential genes higher than that of singletons, have caused the mouse knockout duplicate puzzle; meanwhile, the effect of genetic buffering may be correlated with sequence conservation as well as protein-protein interactivity.

  15. Sequences within both the 5' UTR and Gag are required for optimal in vivo packaging and propagation of mouse mammary tumor virus (MMTV) genomic RNA.

    Directory of Open Access Journals (Sweden)

    Farah Mustafa

    Full Text Available BACKGROUND: This study mapped regions of genomic RNA (gRNA) important for packaging and propagation of mouse mammary tumor virus (MMTV). MMTV is a type B betaretrovirus which preassembles intracellularly, a phenomenon distinct from retroviruses that assemble the progeny virion at the cell surface just before budding such as the type C human and feline immunodeficiency viruses (HIV and FIV). Studies of FIV and Mason-Pfizer monkey virus (MPMV), a type D betaretrovirus with similar intracellular virion assembly processes as MMTV, have shown that the 5' untranslated region (5' UTR) and 5' end of gag constitute important packaging determinants for gRNA. METHODOLOGY: Three series of MMTV transfer vectors containing incremental amounts of gag or 5' UTR sequences, or incremental amounts of 5' UTR in the presence of 400 nucleotides (nt) of gag were constructed to delineate the extent of 5' sequences that may be involved in MMTV gRNA packaging. Real time PCR measured the packaging efficiency of these vector RNAs into MMTV particles generated by co-transfection of MMTV Gag/Pol, vesicular stomatitis virus envelope glycoprotein (VSV-G Env), and individual transfer vectors into human 293T cells. Transfer vector RNA propagation was monitored by measuring transduction of target HeLaT4 cells following infection with viral particles containing a hygromycin resistance gene expression cassette on the packaged RNA. PRINCIPAL FINDINGS: MMTV requires the entire 5' UTR and a minimum of ~120 nucleotides (nt) at the 5' end of gag for not only efficient gRNA packaging but also propagation of MMTV-based transfer vector RNAs. Vector RNAs without the entire 5' UTR were defective for both efficient packaging and propagation into target cells. CONCLUSIONS/SIGNIFICANCE: These results reveal that the 5' end of the MMTV genome is critical for both gRNA packaging and propagation, unlike the recently delineated FIV and MPMV packaging determinants that have been shown to be of bipartite nature.

  16. Generating a transgenic mouse line stably expressing human MHC surface antigen from a HAC carrying multiple genomic BACs.

    Science.gov (United States)

    Hasegawa, Yoshinori; Ishikura, Tomoyuki; Hasegawa, Takanori; Watanabe, Takashi; Suzuki, Junpei; Nakayama, Manabu; Okamura, Yoshiaki; Okazaki, Tuneko; Koseki, Haruhiko; Ohara, Osamu; Ikeno, Masashi; Masumoto, Hiroshi

    2015-03-01

    The human artificial chromosome (HAC) vector is a promising tool to improve the problematic suppression and position effects of transgene expression frequently seen in transgenic cells and animals produced by conventional plasmid or viral vectors. We generated transgenic mice maintaining a single HAC vector carrying two genomic bacterial artificial chromosomes (BACs) from human HLA-DR loci (DRA and DRB1). Both transgenes on the HAC in transgenic mice exhibited tissue-specific expression in kidney, liver, lung, spleen, lymph node, bone marrow, and thymus cells in RT-PCR analysis. Stable functional expression of a cell surface HLA-DR marker from both transgenes, DRA and DRB1 on the HAC, was detected by flow cytometric analysis of splenocytes and maintained through at least eight filial generations. These results indicate that the de novo HAC system can allow us to manipulate multiple BAC transgenes with coordinated expression as a surface antigen through the generation of transgenic animals.

  17. The PRC2-binding long non-coding RNAs in human and mouse genomes are associated with predictive sequence features

    Science.gov (United States)

    Tu, Shiqi; Yuan, Guo-Cheng; Shao, Zhen

    2017-01-01

    Recently, long non-coding RNAs (lncRNAs) have emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape the epigenetic landscape through interactions with chromatin modifying proteins. However, mechanisms contributing to the specificity of such interactions remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to have physical interactions with Polycomb repressive complex 2 (PRC2), and systematically investigated the sequence features of these lncRNAs by developing a new computational pipeline for sequence composition analysis, in which each sequence is considered as a series of transitions between adjacent nucleotides. Through that, PRC2-binding lncRNAs were found to be associated with a set of distinctive and evolutionarily conserved sequence features, which can be utilized to distinguish them from the others with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched with these sequence features, and found they show strong PRC2-binding signals and are more highly conserved across species than the other parts, implying their functional importance.
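    One simple way to treat a sequence "as a series of transitions between adjacent nucleotides" is to estimate first-order transition frequencies, as in the Python sketch below. This is an illustrative assumption about what such features could look like, not the authors' pipeline, whose details the abstract does not give; the function name and the row-normalisation choice are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = "ACGU"


def transition_frequencies(seq: str) -> dict:
    """Estimate P(next = b | current = a) for every nucleotide pair (a, b).

    DNA input is accepted; T is converted to U. Pairs containing any
    other symbol (e.g. N) are ignored.
    """
    seq = seq.upper().replace("T", "U")
    counts = Counter(
        (a, b)
        for a, b in zip(seq, seq[1:])
        if a in NUCLEOTIDES and b in NUCLEOTIDES
    )
    freqs = {}
    for a in NUCLEOTIDES:
        row_total = sum(counts[(a, b)] for b in NUCLEOTIDES)
        for b in NUCLEOTIDES:
            # Rows with no observations are reported as 0.0.
            freqs[a + b] = counts[(a, b)] / row_total if row_total else 0.0
    return freqs


features = transition_frequencies("ACGUACGA")
print(features["GU"], features["GA"])
```

    The resulting 16 values form a fixed-length feature vector per transcript, which could then be passed to any standard classifier; whether that matches the published method is not stated in the abstract.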

  18. Structural Genomics of Protein Phosphatases

    Energy Technology Data Exchange (ETDEWEB)

    Almo,S.; Bonanno, J.; Sauder, J.; Emtage, S.; Dilorenzo, T.; Malashkevich, V.; Wasserman, S.; Swaminathan, S.; Eswaramoorthy, S.; et al

    2007-01-01

    The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.

  19. Essential developmental, genomic stability, and tumour suppressor functions of the mouse orthologue of hSSB1/NABP2.

    Directory of Open Access Journals (Sweden)

    Wei Shi

    Full Text Available Single-stranded DNA binding proteins (SSBs) regulate multiple DNA transactions, including replication, transcription, and repair. We recently identified SSB1 as a novel protein critical for the initiation of ATM signaling and DNA double-strand break repair by homologous recombination. Here we report that germline Ssb1(-/-) embryos die at birth from respiratory failure due to severe rib cage malformation and impaired alveolar development, coupled with additional skeletal defects. Unexpectedly, Ssb1(-/-) fibroblasts did not exhibit defects in Atm signaling or γ-H2ax focus kinetics in response to ionizing radiation (IR), and B-cell specific deletion of Ssb1 did not affect class-switch recombination in vitro. However, conditional deletion of Ssb1 in adult mice led to increased cancer susceptibility with broad tumour spectrum, impaired male fertility with testicular degeneration, and increased radiosensitivity and IR-induced chromosome breaks in vivo. Collectively, these results demonstrate essential roles of Ssb1 in embryogenesis, spermatogenesis, and genome stability in vivo.

  20. Clustering Table of the genome insert site of Drosophila GAL4 enhancer trap lines (Cluster List) - GETDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available GETDB Clustering Table of the genome insert site of Drosophila GAL4 enhancer trap lines (Cluster List) Data ...detail Data name Clustering Table of the genome insert site of Drosophila GAL4 enhancer trap lines (Cluster ...the Drosophila GAL4 enhancer trap element are clustered by the closeness of their positions from each other.... Us Clustering Table of the genome insert site of Drosophila GAL4 enhancer trap lines (Cluster List) - GETDB | LSDB Archive ...

  1. Genome-wide screen for salmonella genes required for long-term systemic infection of the mouse.

    Directory of Open Access Journals (Sweden)

    2006-02-01

    Full Text Available A microarray-based negative selection screen was performed to identify Salmonella enterica serovar Typhimurium (serovar Typhimurium) genes that contribute to long-term systemic infection in 129X1/SvJ (Nramp1(r)) mice. A high-complexity transposon-mutagenized library was used to infect mice intraperitoneally, and the selective disappearance of mutants was monitored after 7, 14, 21, and 28 d postinfection. One hundred and eighteen genes were identified to contribute to serovar Typhimurium infection of the spleens of mice by 28 d postinfection. The negatively selected mutants represent many known aspects of Salmonella physiology and pathogenesis, although the majority of the identified genes are of putative or unknown function. Approximately 30% of the negatively selected genes correspond to horizontally acquired regions such as those within Salmonella pathogenicity islands (SPI 1-5), prophages (Gifsy-1 and -2 and remnant), and the pSLT virulence plasmid. In addition, mutations in genes responsible for outer membrane structure and remodeling, such as LPS- and PhoP-regulated and fimbrial genes, were also selected against. Competitive index experiments demonstrated that the secreted SPI2 effectors SseK2 and SseJ as well as the SPI4 locus are attenuated relative to wild-type bacteria during systemic infection. Interestingly, several SPI1-encoded type III secretion system effectors/translocases are required by serovar Typhimurium to establish and, unexpectedly, to persist systemically, challenging the present description of Salmonella pathogenesis. Moreover, we observed a progressive selection against serovar Typhimurium mutants based upon the duration of the infection, suggesting that different classes of genes may be required at distinct stages of infection. Overall, these data indicate that Salmonella long-term systemic infection in the mouse requires a diverse repertoire of virulence factors. This diversity of genes presumably reflects the fact that

  2. Legume and Lotus japonicus Databases

    DEFF Research Database (Denmark)

    Hirakawa, Hideki; Mun, Terry; Sato, Shusei

    2014-01-01

    Since the genome sequence of Lotus japonicus, a model plant of family Fabaceae, was determined in 2008 (Sato et al. 2008), the genomes of other members of the Fabaceae family, soybean (Glycine max) (Schmutz et al. 2010) and Medicago truncatula (Young et al. 2011), have been sequenced. In this section, we introduce representative, publicly accessible online resources related to plant materials, integrated databases containing legume genome information, and databases for genome sequence and derived marker information of legume species including L. japonicus...

  3. Genome-wide analysis of DHEA- and DHT-induced gene expression in mouse hypothalamus and hippocampus.

    Science.gov (United States)

    Mo, Qianxing; Lu, Shifang; Garippa, Carrie; Brownstein, Michael J; Simon, Neal G

    2009-04-01

    Dehydroepiandrosterone (DHEA) is the most abundant steroid in humans and a multi-functional neuroactive steroid that has been implicated in a variety of biological effects in both the periphery and central nervous system. Mechanistic studies of DHEA in the periphery have emphasized its role as a prohormone and those in the brain have focused on effects exerted at cell surface receptors. Recent results demonstrated that DHEA is intrinsically androgenic. It competes with DHT for binding to androgen receptor (AR), induces AR-regulated reporter gene expression in vitro, and exogenous DHEA administration regulates gene expression in peripheral androgen-dependent tissues and LnCAP prostate cancer cells, indicating genomic effects and adding a level of complexity to functional models. The absence of information about the effect of DHEA on gene expression in the CNS is a significant gap in light of continuing clinical interest in the compound as a hormone replacement therapy in older individuals, patients with adrenal insufficiency, and as a treatment that improves sense of well-being, increases libido, relieves depressive symptoms, and serves as a neuroprotective agent. In the present study, ovariectomized CF-1 female mice, an established model for assessing CNS effects of androgens, were treated with DHEA (1mg/day), dihydrotestosterone (DHT, a potent androgen used as a positive control; 0.1mg/day) or vehicle (negative control) for 7 days. The effects of DHEA on gene expression were assessed in two regions of the CNS that are enriched in AR, hypothalamus and hippocampus, using DNA microarray, real-time RT-PCR, and immunohistochemistry. RIA of serum samples assessed treatment effects on circulating levels of major steroids. 
    In hypothalamus, DHEA and DHT significantly up-regulated the gene expression of hypocretin (Hcrt; also called orexin), pro-melanin-concentrating hormone (Pmch), and protein kinase C delta (Prkcd), and down-regulated the expression of deleted in bladder

  4. The vasculome of the mouse brain.

    Directory of Open Access Journals (Sweden)

    Shuzhen Guo

    Full Text Available The blood vessel is no longer viewed as passive plumbing for the brain. Increasingly, experimental and clinical findings suggest that cerebral endothelium may possess endocrine and paracrine properties - actively releasing signals into and receiving signals from the neuronal parenchyma. Hence, metabolically perturbed microvessels may contribute to central nervous system (CNS) injury and disease. Furthermore, cerebral endothelium can serve as a sensor and integrator of CNS dysfunction, releasing measurable biomarkers into the circulating bloodstream. Here, we define and analyze the concept of a brain vasculome, i.e. a database of gene expression patterns in cerebral endothelium that can be linked to other databases and systems of CNS mediators and markers. Endothelial cells were purified from mouse brain, heart and kidney glomeruli. Total RNA was extracted and profiled on Affymetrix mouse 430 2.0 micro-arrays. Gene expression analysis confirmed that these brain, heart and glomerular preparations were not contaminated by brain cells (astrocytes, oligodendrocytes, or neurons), cardiomyocytes or kidney tubular cells, respectively. Comparison of the vasculome between brain, heart and kidney glomeruli showed that endothelial gene expression patterns were highly organ-dependent. Analysis of the brain vasculome demonstrated that many functionally active networks were present, including cell adhesion, transporter activity, plasma membrane, leukocyte transmigration, Wnt signaling pathways and angiogenesis. Analysis of representative genome-wide-association-studies showed that genes linked with Alzheimer's disease, Parkinson's disease and stroke were detected in the brain vasculome. Finally, comparison of our mouse brain vasculome with representative plasma protein databases demonstrated significant overlap, suggesting that the vasculome may be an important source of circulating signals in blood.
    Perturbations in cerebral endothelial function may profoundly

  5. The top skin-associated genes: a comparative analysis of human and mouse skin transcriptomes.

    Science.gov (United States)

    Gerber, Peter Arne; Buhren, Bettina Alexandra; Schrumpf, Holger; Homey, Bernhard; Zlotnik, Albert; Hevezi, Peter

    2014-06-01

    The mouse represents a key model system for the study of the physiology and biochemistry of skin. Comparison of skin between mouse and human is critical for interpretation and application of data from mouse experiments to human disease. Here, we review the current knowledge on structure and immunology of mouse and human skin. Moreover, we present a systematic comparison of human and mouse skin transcriptomes. To this end, we have recently used a genome-wide database of human gene expression to identify genes highly expressed in skin, with no, or limited expression elsewhere - human skin-associated genes (hSAGs). Analysis of our set of hSAGs allowed us to generate a comprehensive molecular characterization of healthy human skin. Here, we used a similar database to generate a list of mouse skin-associated genes (mSAGs). A comparative analysis between the top human (n=666) and mouse (n=873) skin-associated genes (SAGs) revealed a total of only 30.2% identity between the two lists. The majority of shared genes encode proteins that participate in structural and barrier functions. Analysis of the top functional annotation terms revealed an overlap for morphogenesis, cell adhesion, structure, and signal transduction. The results of this analysis, discussed in the context of published data, illustrate the diversity between the molecular makeup of the skin of the two species and offer a probable explanation for why results generated in murine in vivo models often fail to translate to humans.

  6. MIPS plant genome information resources.

    Science.gov (United States)

    Spannagl, Manuel; Haberer, Georg; Ernst, Rebecca; Schoof, Heiko; Mayer, Klaus F X

    2007-01-01

    The Munich Institute for Protein Sequences (MIPS) has been involved in maintaining plant genome databases since the Arabidopsis thaliana genome project. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable data sets for model plant genomes as a backbone against which experimental data, for example from high-throughput functional genomics, can be organized and evaluated. In addition, model genomes also form a scaffold for comparative genomics, and much can be learned from genome-wide evolutionary studies.

  7. Integration of Genome-Wide Computation DRE Search, AhR ChIP-chip and Gene Expression Analyses of TCDD-Elicited Responses in the Mouse Liver

    Directory of Open Access Journals (Sweden)

    Matthews Jason

    2011-07-01

    Background: The aryl hydrocarbon receptor (AhR) is a ligand-activated transcription factor (TF) that mediates responses to 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Integration of TCDD-induced genome-wide AhR enrichment, differential gene expression and computational dioxin response element (DRE) analyses further elucidates the hepatic AhR regulatory network. Results: Global ChIP-chip and gene expression analyses were performed on hepatic tissue from immature ovariectomized mice orally gavaged with 30 μg/kg TCDD. ChIP-chip analysis identified 14,446 and 974 AhR enriched regions (1% false discovery rate) at 2 and 24 hrs, respectively. Enrichment density was greatest in the proximal promoter, and more specifically, within ± 1.5 kb of a transcriptional start site (TSS). AhR enrichment also occurred distal to a TSS (e.g. intergenic DNA and 3' UTR), extending the potential gene expression regulatory roles of the AhR. Although TF binding site analyses identified over-represented DRE sequences within enriched regions, approximately 50% of all AhR enriched regions lacked a DRE core (5'-GCGTG-3'). Microarray analysis identified 1,896 TCDD-responsive genes (|fold change| ≥ 1.5, P1(t) > 0.999). Integrating these gene expression data with our ChIP-chip and DRE analyses identified only 625 differentially expressed genes that involved an AhR interaction at a DRE. Functional annotation analysis of differentially regulated genes associated with AhR enrichment identified overrepresented processes related to fatty acid and lipid metabolism and transport, and xenobiotic metabolism, which are consistent with TCDD-elicited steatosis in the mouse liver. Conclusions: Details of the AhR regulatory network have been expanded to include AhR-DNA interactions within intragenic and intergenic genomic regions. Moreover, the AhR can interact with DNA independent of a DRE core, suggesting there are alternative mechanisms of AhR-mediated gene regulation.
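
    As a rough illustration of what "lacking a DRE core" means, the sketch below scans a DNA string for the core pentamer 5'-GCGTG-3' on both strands. It is a minimal exact-match example, not the computational DRE analysis used in the study, which the abstract does not describe; the function names and the test sequences are hypothetical.

```python
# Minimal sketch: exact-match scan for the DRE core 5'-GCGTG-3' on both strands.
# Assumption: a plain substring search; the study's own DRE scoring is not given here.

DRE_CORE = "GCGTG"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_dre_cores(seq):
    """Return (0-based start, strand) for every DRE core match in seq."""
    seq = seq.upper()
    minus_core = reverse_complement(DRE_CORE)  # how a minus-strand core reads on the plus strand
    width = len(DRE_CORE)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if window == DRE_CORE:
            hits.append((start, "+"))
        elif window == minus_core:
            hits.append((start, "-"))
    return hits


def lacks_dre_core(seq):
    """True when a region contains no DRE core on either strand."""
    return not find_dre_cores(seq)
```

    Under this definition, an enriched region would be counted as lacking a DRE core only if `lacks_dre_core` returns True for its full sequence.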

  8. QuadBase: genome-wide database of G4 DNA—occurrence and conservation in human, chimpanzee, mouse and rat promoters and 146 microbes

    OpenAIRE

    Yadav, Vinod Kumar; Abraham, James Kappukalayil; Mani, Prithvi; Kulshrestha, Rashi; Chowdhury, Shantanu

    2007-01-01

    Emerging evidence indicates the importance of G-quadruplex motifs as drug targets [Stuart A. Borman, Ascent of quadruplexes—nucleic acid structures become promising drug targets. Chem. Eng. News, 2007;85, 12–17], which stems from the fact that these motifs are present in a surprising number of promoters, wherein their role in controlling gene expression has been demonstrated for a few. We present a compendium of quadruplex motifs, with particular focus on their occurrence and conservation in ...

  9. HOLLYWOOD: a comparative relational database of alternative splicing.

    Science.gov (United States)

    Holste, Dirk; Huo, George; Tung, Vivian; Burge, Christopher B

    2006-01-01

    RNA splicing is an essential step in gene expression, and is often variable, giving rise to multiple alternatively spliced mRNA and protein isoforms from a single gene locus. The design of effective databases to support experimental and computational investigations of alternative splicing (AS) is a significant challenge. In an effort to integrate accurate exon and splice site annotation with current knowledge about splicing regulatory elements and predicted AS events, and to link information about the splicing of orthologous genes in different species, we have developed the Hollywood system. This database was built upon genomic annotation of splicing patterns of known genes derived from spliced alignment of complementary DNAs (cDNAs) and expressed sequence tags, and links features such as splice site sequence and strength, exonic splicing enhancers and silencers, conserved and non-conserved patterns of splicing, and cDNA library information for inferred alternative exons. Hollywood was implemented as a relational database and currently contains comprehensive information for human and mouse. It is accompanied by a web query tool that allows searches for sets of exons with specific splicing characteristics or splicing regulatory element composition, or gives a graphical or sequence-level summary of splicing patterns for a specific gene. A streamlined graphical representation of gene splicing patterns is provided, and these patterns can alternatively be layered onto existing information in the UCSC Genome Browser. The database is accessible at http://hollywood.mit.edu.

  10. Database for mRNA half-life of 19 977 genes obtained by DNA microarray analysis of pluripotent and differentiating mouse embryonic stem cells.

    Science.gov (United States)

    Sharova, Lioudmila V; Sharov, Alexei A; Nedorezov, Timur; Piao, Yulan; Shaik, Nabeebi; Ko, Minoru S H

    2009-02-01

    Degradation of mRNA is one of the key processes that control the steady-state level of gene expression. However, the rate of mRNA decay for the majority of genes is not known. We successfully obtained the rate of mRNA decay for 19 977 non-redundant genes by microarray analysis of RNA samples obtained from mouse embryonic stem (ES) cells. Median estimated half-life was 7.1 h and only <100 genes, including Prdm1, Myc, Gadd45g, Foxa2, Hes5 and Trib1, showed half-life less than 1 h. In general, mRNA species with short half-life were enriched among genes with regulatory functions (transcription factors), whereas mRNA species with long half-life were enriched among genes related to metabolism and structure (extracellular matrix, cytoskeleton). The stability of mRNAs correlated more significantly with the structural features of genes than the function of genes: mRNA stability showed the most significant positive correlation with the number of exon junctions per open reading frame length, and negative correlation with the presence of PUF-binding motifs and AU-rich elements in 3'-untranslated region (UTR) and CpG di-nucleotides in the 5'-UTR. The mRNA decay rates presented in this report are the largest data set for mammals and the first for ES cells.
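
    To relate the half-lives quoted above to decay rates, the sketch below assumes simple first-order (exponential) decay, which the abstract does not state explicitly; the function names are hypothetical. Under that assumption the rate constant is ln 2 divided by the half-life, so the median half-life of 7.1 h corresponds to roughly 0.098 per hour.

```python
import math

# Minimal sketch, assuming first-order decay: N(t) = N0 * exp(-k * t).
# The abstract reports half-lives; the decay model here is an assumption.


def decay_rate_from_half_life(half_life_h):
    """Convert a half-life in hours to a first-order rate constant (per hour)."""
    return math.log(2) / half_life_h


def fraction_remaining(half_life_h, elapsed_h):
    """Fraction of the initial mRNA still present after elapsed_h hours."""
    return 0.5 ** (elapsed_h / half_life_h)
```

    For a transcript with a 1 h half-life, `fraction_remaining(1.0, 3.0)` gives one eighth of the starting amount after 3 h.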

  11. Database for mRNA Half-Life of 19 977 Genes Obtained by DNA Microarray Analysis of Pluripotent and Differentiating Mouse Embryonic Stem Cells

    Science.gov (United States)

    Sharova, Lioudmila V.; Sharov, Alexei A.; Nedorezov, Timur; Piao, Yulan; Shaik, Nabeebi; Ko, Minoru S.H.

    2009-01-01

    Degradation of mRNA is one of the key processes that control the steady-state level of gene expression. However, the rate of mRNA decay for the majority of genes is not known. We successfully obtained the rate of mRNA decay for 19 977 non-redundant genes by microarray analysis of RNA samples obtained from mouse embryonic stem (ES) cells. Median estimated half-life was 7.1 h and only <100 genes, including Prdm1, Myc, Gadd45g, Foxa2, Hes5 and Trib1, showed half-life less than 1 h. In general, mRNA species with short half-life were enriched among genes with regulatory functions (transcription factors), whereas mRNA species with long half-life were enriched among genes related to metabolism and structure (extracellular matrix, cytoskeleton). The stability of mRNAs correlated more significantly with the structural features of genes than the function of genes: mRNA stability showed the most significant positive correlation with the number of exon junctions per open reading frame length, and negative correlation with the presence of PUF-binding motifs and AU-rich elements in 3′-untranslated region (UTR) and CpG di-nucleotides in the 5′-UTR. The mRNA decay rates presented in this report are the largest data set for mammals and the first for ES cells. PMID:19001483

  12. Did androgen-binding protein paralogs undergo neo- and/or Subfunctionalization as the Abp gene region expanded in the mouse genome?

    Science.gov (United States)

    Karn, Robert C; Chung, Amanda G; Laukaitis, Christina M

    2014-01-01

    The Androgen-binding protein (Abp) region of the mouse genome contains 30 Abpa genes encoding alpha subunits and 34 Abpbg genes encoding betagamma subunits, their products forming dimers composed of an alpha and a betagamma subunit. We endeavored to determine how many Abp genes are expressed as proteins in tears and saliva, and as transcripts in the exocrine glands producing them. Using standard PCR, we amplified Abp transcripts from cDNA libraries of C57BL/6 mice and found fifteen Abp gene transcripts in the lacrimal gland and five in the submandibular gland. Proteomic analyses identified proteins corresponding to eleven of the lacrimal gland transcripts, all of them different from the three salivary ABPs reported previously. Our qPCR results showed that five of the six transcripts that lacked corresponding proteins are expressed at very low levels compared to those transcripts with proteins. We found 1) no overlap in the repertoires of expressed Abp paralogs in lacrimal gland/tears and salivary glands/saliva; 2) substantial sex-limited expression of lacrimal gland/tear expressed-paralogs in males but no sex-limited expression in females; and 3) that the lacrimal gland/tear expressed-paralogs are found exclusively in ancestral clades 1, 2 and 3 of the five clades described previously while the salivary glands/saliva expressed-paralogs are found only in clade 5. The number of instances of extremely low levels of transcription without corresponding protein production in paralogs specific to tears and saliva suggested the role of subfunctionalization, a derived condition wherein genes that may have been expressed highly in both glands ancestrally were down-regulated subsequent to duplication. Thus, evidence for subfunctionalization can be seen in our data and we argue that the partitioning of paralog expression between lacrimal and salivary glands that we report here occurred as the result of adaptive evolution.

  13. Did androgen-binding protein paralogs undergo neo- and/or Subfunctionalization as the Abp gene region expanded in the mouse genome?

    Directory of Open Access Journals (Sweden)

    Robert C Karn

    The Androgen-binding protein (Abp) region of the mouse genome contains 30 Abpa genes encoding alpha subunits and 34 Abpbg genes encoding betagamma subunits, their products forming dimers composed of an alpha and a betagamma subunit. We endeavored to determine how many Abp genes are expressed as proteins in tears and saliva, and as transcripts in the exocrine glands producing them. Using standard PCR, we amplified Abp transcripts from cDNA libraries of C57BL/6 mice and found fifteen Abp gene transcripts in the lacrimal gland and five in the submandibular gland. Proteomic analyses identified proteins corresponding to eleven of the lacrimal gland transcripts, all of them different from the three salivary ABPs reported previously. Our qPCR results showed that five of the six transcripts that lacked corresponding proteins are expressed at very low levels compared to those transcripts with proteins. We found 1) no overlap in the repertoires of expressed Abp paralogs in lacrimal gland/tears and salivary glands/saliva; 2) substantial sex-limited expression of lacrimal gland/tear expressed-paralogs in males but no sex-limited expression in females; and 3) that the lacrimal gland/tear expressed-paralogs are found exclusively in ancestral clades 1, 2 and 3 of the five clades described previously while the salivary glands/saliva expressed-paralogs are found only in clade 5. The number of instances of extremely low levels of transcription without corresponding protein production in paralogs specific to tears and saliva suggested the role of subfunctionalization, a derived condition wherein genes that may have been expressed highly in both glands ancestrally were down-regulated subsequent to duplication. Thus, evidence for subfunctionalization can be seen in our data and we argue that the partitioning of paralog expression between lacrimal and salivary glands that we report here occurred as the result of adaptive evolution.

  14. Genome engineering via homologous recombination in mouse embryonic stem (ES) cells: an amazingly versatile tool for the study of mammalian biology

    Directory of Open Access Journals (Sweden)

    CHARLES BABINET

    2001-09-01

    The ability to introduce genetic modifications in the germ line of complex organisms has be