WorldWideScience

Sample records for databases overview sequencing

  1. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (approximately 1 error/10,000 bp), validated through a number of computer and laboratory experiments.
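To put the two error rates quoted in this abstract side by side, the short sketch below computes the expected number of consensus errors at the draft rate (1 per 2,000 bp) and the finished rate (about 1 per 10,000 bp). The 5 Mbp genome size is a hypothetical value chosen only for illustration; it does not come from the record.

```python
def expected_errors(genome_bp: int, bp_per_error: int) -> float:
    """Expected number of consensus errors for a genome of the given length."""
    return genome_bp / bp_per_error


# Hypothetical genome size (assumption, not from the abstract).
GENOME_BP = 5_000_000

draft = expected_errors(GENOME_BP, 2_000)      # draft assembly rate from the abstract
finished = expected_errors(GENOME_BP, 10_000)  # finished assembly rate from the abstract

print(f"draft: {draft:.0f} errors, finished: {finished:.0f} errors")
```

Under these assumptions, finishing reduces the expected error count by a factor of five, whatever genome size is plugged in.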

  2. Danish clinical databases: An overview

    DEFF Research Database (Denmark)

    Green, Anders

    2011-01-01

    Clinical databases contain data related to diagnostic procedures, treatments and outcomes. In 2001, a scheme was introduced for the approval, supervision and support to clinical databases in Denmark.

  3. Relational Database Technology: An Overview.

    Science.gov (United States)

    Melander, Nicole

    1987-01-01

    Describes the development of relational database technology as it applies to educational settings. Discusses some of the new tools and models being implemented in an effort to provide educators with technologically advanced ways of answering questions about education programs and data. (TW)

  4. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression, an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression; the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  5. Overview of NoSQL and comparison with SQL database ...

    African Journals Online (AJOL)

    Overview of NoSQL and comparison with SQL database management systems. ... Abstract. The increasing need for space in the database community has caused the revolution named NoSQL 'Not Only SQL'. ...

  6. The International Nucleotide Sequence Database Collaboration.

    Science.gov (United States)

    Cochrane, Guy; Karsch-Mizrachi, Ilene; Nakamura, Yasukazu

    2011-01-01

    Under the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org), globally comprehensive public domain nucleotide sequence is captured, preserved and presented. The partners of this long-standing collaboration work closely together to provide data formats and conventions that enable consistent data submission to their databases and support regular data exchange around the globe. Clearly defined policy and governance in relation to free access to data and relationships with journal publishers have positioned INSDC databases as a key provider of the scientific record and a core foundation for the global bioinformatics data infrastructure. While growth in sequence data volumes comes no longer as a surprise to INSDC partners, the uptake of next-generation sequencing technology by mainstream science that we have witnessed in recent years brings a step-change to growth, necessarily making a clear mark on INSDC strategy. In this article, we introduce the INSDC, outline data growth patterns and comment on the challenges of increased growth.

  7. The Sequenced Angiosperm Genomes and Genome Databases.

    Science.gov (United States)

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of humans, animals, and the planet earth. Despite the numerous advances in genome reports or sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in the database reconstruction in the last few years, here we provide a comprehensive review for three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular for specific groups of researchers, while a timely-updated comprehensive database is more powerful for addressing major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote the collaborative studies of important questions in plant biology.

  8. Winnowing sequences from a database search.

    Science.gov (United States)

    Berman, P; Zhang, Z; Wolf, Y I; Koonin, E V; Miller, W

    2000-01-01

    In database searches for sequence similarity, matches to a distinct sequence region (e.g., protein domain) are frequently obscured by numerous matches to another region of the same sequence. In order to cope with this problem, algorithms are developed to discard redundant matches. One model for this problem begins with a list of intervals, each with an associated score; each interval gives the range of positions in the query sequence that align to a database sequence, and the score is that of the alignment. If interval I is contained in interval J, and I's score is less than J's, then I is said to be dominated by J. The problem is then to identify each interval that is dominated by at least K other intervals, where K is a given level of "tolerable redundancy." An algorithm is developed to solve the problem in O(N log N) time and O(N*) space, where N is the number of intervals and N* is a precisely defined value that never exceeds N and is frequently much smaller. This criterion for discarding database hits has been implemented in the Blast program, as illustrated herein with examples. Several variations and extensions of this approach are also described.
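The dominance criterion described in this abstract can be illustrated with a brute-force sketch. This is not the O(N log N) algorithm of the paper, nor the Blast implementation; it is a quadratic-time check written only to make the definition concrete. The names `Interval`, `is_dominated` and `winnow` are hypothetical, and containment is assumed to be inclusive at both ends.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    start: int    # first aligned position in the query sequence
    end: int      # last aligned position in the query sequence
    score: float  # alignment score


def is_dominated(i: Interval, j: Interval) -> bool:
    """True if i is contained in j and i's score is less than j's."""
    return j.start <= i.start and i.end <= j.end and i.score < j.score


def winnow(intervals: List[Interval], k: int) -> List[Interval]:
    """Return the intervals dominated by at least k other intervals.

    Brute-force O(N^2) comparison, for illustration only.
    """
    redundant = []
    for a, i in enumerate(intervals):
        count = sum(
            1 for b, j in enumerate(intervals) if a != b and is_dominated(i, j)
        )
        if count >= k:
            redundant.append(i)
    return redundant


hits = [
    Interval(1, 100, 90.0),
    Interval(10, 50, 40.0),
    Interval(5, 60, 70.0),
    Interval(200, 300, 30.0),
]
print(winnow(hits, 2))
```

With a tolerable redundancy of K = 2, only the hit spanning positions 10 to 50 is flagged, because two higher-scoring hits contain it; the hit at 200 to 300 covers a distinct region and is kept regardless of its low score.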

  9. Databases for neurogenetics: introduction, overview, and challenges.

    Science.gov (United States)

    Sobrido, María-Jesús; Cacheiro, Pilar; Carracedo, Angel; Bertram, Lars

    2012-09-01

    The importance for research and clinical utility of mutation databases, as well as the issues and difficulties entailed in their construction, is discussed within the Human Variome Project. While general principles and standards can apply to most human diseases, some specific questions arise when dealing with the nature of genetic neurological disorders. So far, publicly accessible mutation databases exist for only about half of the genes causing neurogenetic disorders, and considerable work is clearly still needed to optimize their content. The current landscape, main challenges, some potential solutions, and future perspectives on genetic databases for disorders of the nervous system are reviewed in this special issue of Human Mutation on neurogenetics. © 2012 Wiley Periodicals, Inc.

  10. The National Solar Radiation Database (NSRDB): A Brief Overview

    Energy Technology Data Exchange (ETDEWEB)

    Habte, Aron M [National Renewable Energy Laboratory (NREL), Golden, CO (United States)]; Sengupta, Manajit [National Renewable Energy Laboratory (NREL), Golden, CO (United States)]; Lopez, Anthony [National Renewable Energy Laboratory (NREL), Golden, CO (United States)]

    2017-09-25

    This poster presents a high-level overview of the National Solar Radiation Database (NSRDB). The NSRDB uses the physics-based model (PSM), which was developed using: adapted PATMOS-X model for cloud identification and properties, REST-2 model for clear-sky conditions, and NREL's Fast All-sky Radiation Model for Solar Applications (FARMS) for cloudy-sky Global Horizontal Irradiance (GHI) solar irradiance calculations.

  11. MIPS: a database for protein sequences and complete genomes.

    Science.gov (United States)

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  12. Characterization analysis database system (CADS). A system overview

    International Nuclear Information System (INIS)

    1997-12-01

    The CADS database is a standardized, quality-assured, and configuration-controlled data management system developed to assist in the task of characterizing the DOE surplus HEU material. Characterization of the surplus HEU inventory includes identifying the specific material; gathering existing data about the inventory; defining the processing steps that may be necessary to prepare the material for transfer to a blending site; and, ultimately, developing a range of the preliminary cost estimates for those processing steps. Characterization focuses on producing commercial reactor fuel as the final step in material disposition. Based on the project analysis results, the final determination will be made as to the viability of the disposition path for each particular item of HEU. The purpose of this document is to provide an informational overview of the CADS database, its evolution, and its current capabilities. This document describes the purpose of CADS, the system requirements it fulfills, the database structure, and the operational guidelines of the system.

  13. Study of event sequence database for a nuclear power domain

    International Nuclear Information System (INIS)

    Kusumi, Yoshiaki

    1998-01-01

    A retrieval engine developed to extract event sequences from an accident information database using a time series retrieval formula expressed with ordered retrieval terms is explored. This engine outputs not only sequences which completely match a time series retrieval formula, but also sequences which approximately match the formula (fuzzy retrieval). An event sequence database was built in which records consist of three ordered parameters, namely the causal event, the process and the result. The database was then used to assess the feasibility of this engine, and favorable results were obtained. (author)

  14. MIPS: a database for genomes and protein sequences.

    Science.gov (United States)

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  15. Polymorphism Sequence - JSNP | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    JSNP Polymorphism Sequence Data detail Data name Polymorphism Sequence DOI 10.18908/lsdba.nb...dc00114-001 Description of data contents Information on polymorphisms (SNPs and insertions/deletions) and th...se Name database name JSNP_SNP: single nucleotide polymorphism JSNP_InsDel_IND: insertion/deletion JSNP_InsD...ved allele observed 3' Flanking Sequence 3' flanking sequence Offset in Flanking Sequence position of the polymorphism...uence Accession No. accession No. of the sequence for polymorphism screening Offset in Record position of the polymorphism

  16. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Science.gov (United States)

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
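The idea of loading similarity search results into a relational database and summarizing them by taxonomic group can be sketched with Python's built-in sqlite3 module. The table layout, column names, sequence identifiers and E-value threshold below are all hypothetical; they are not the schema of seqdb_demo or search_demo, which this record does not describe.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical search hits: (query protein, matched protein, kingdom of match, E-value).
ROWS = [
    ("ecoli_p1", "hs_a", "Eukaryota", 1e-30),
    ("ecoli_p1", "mj_b", "Archaea", 1e-10),
    ("ecoli_p2", "hs_c", "Eukaryota", 1e-3),
    ("ecoli_p2", "mj_d", "Archaea", 0.5),
    ("ecoli_p3", "hs_e", "Eukaryota", 1e-12),
]


def homolog_counts(rows, max_evalue: float = 1e-6) -> List[Tuple[str, int]]:
    """Count query proteins with a significant hit in each kingdom."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hit ("
        "query_id TEXT, subject_id TEXT, subject_kingdom TEXT, evalue REAL)"
    )
    conn.executemany("INSERT INTO hit VALUES (?, ?, ?, ?)", rows)
    cur = conn.execute(
        """
        SELECT subject_kingdom, COUNT(DISTINCT query_id)
        FROM hit
        WHERE evalue <= ?
        GROUP BY subject_kingdom
        ORDER BY subject_kingdom
        """,
        (max_evalue,),
    )
    result = cur.fetchall()
    conn.close()
    return result


print(homolog_counts(ROWS))
```

Once the hits sit in a table, questions such as "how many query proteins have a significant match in each kingdom" reduce to a single grouped query, which is the kind of summary the abstract refers to.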

  17. Specialized microbial databases for inductive exploration of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Cabau Cédric

    2005-02-01

    Background: The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods: The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subqueries, have been implemented. Results: Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore (http://bioinfo.hku.hk/genochore.html), a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya), has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion: This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList, Staphylococcus epidermidis) associated to related organisms for comparison.

  18. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    Directory of Open Access Journals (Sweden)

    Qingyu Chen

    First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.

  19. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    Science.gov (United States)

    Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.

  20. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der [California Univ., San Francisco, CA (United States); Univ. of California, Berkeley, CA (United States)]

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  1. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der (California Univ., San Francisco, CA (United States) Lawrence Berkeley Lab., CA (United States))

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  2. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  3. BIOPEP database and other programs for processing bioactive peptide sequences.

    Science.gov (United States)

    Minkiewicz, Piotr; Dziuba, Jerzy; Iwaniak, Anna; Dziuba, Marta; Darewicz, Małgorzata

    2008-01-01

    This review presents the potential for application of computational tools in peptide science based on a sample BIOPEP database and program as well as other programs and databases available via the World Wide Web. The BIOPEP application contains a database of biologically active peptide sequences and a program enabling construction of profiles of the potential biological activity of protein fragments, calculation of quantitative descriptors as measures of the value of proteins as potential precursors of bioactive peptides, and prediction of bonds susceptible to hydrolysis by endopeptidases in a protein chain. Other bioactive and allergenic peptide sequence databases are also presented. Programs enabling the construction of binary and multiple alignments between peptide sequences, the construction of sequence motifs attributed to a given type of bioactivity, searching for potential precursors of bioactive peptides, and the prediction of sites susceptible to proteolytic cleavage in protein chains are available via the Internet as are other approaches concerning secondary structure prediction and calculation of physicochemical features based on amino acid sequence. Programs for prediction of allergenic and toxic properties have also been developed. This review explores the possibilities of cooperation between various programs.

  4. The development and application of a Mycoplasma gallisepticum sequence database.

    Science.gov (United States)

    Armour, Natalie K; Laibinis, Victoria A; Collett, Stephen R; Ferguson-Noel, Naola

    2013-01-01

    Molecular analysis was conducted on 36 Mycoplasma gallisepticum DNA extracts from tracheal swab samples of commercial poultry in seven South African provinces between 2009 and 2012. Twelve unique M. gallisepticum genotypes were identified by polymerase chain reaction and sequence analysis of the 16S-23S rRNA intergenic spacer region (IGSR), M. gallisepticum cytadhesin 2 (mgc2), MGA_0319 and gapA genetic regions. The DNA sequences of these genotypes were distinct from those of M. gallisepticum isolates in a database composed of sequences from other countries, vaccine and reference strains. The most prevalent genotype (SA-WT#7) was detected in samples from commercial broilers, broiler breeders and layers in five provinces. South African M. gallisepticum sequences were more similar to those of the live vaccines commercially available in South Africa, but were distinct from that of F strain vaccine, which is not registered for use in South Africa. The IGSR, mgc2 or MGA_0319 sequences of three South African genotypes were identical to those of the ts-11 vaccine strain, necessitating a combination of mgc2 and IGSR targeted sequencing to differentiate South African wild-type genotypes from ts-11 vaccine. To identify and differentiate all 12 wild-types, mgc2, IGSR and MGA_0319 sequencing was required. Sequencing of gapA was least effective at strain differentiation. This research serves as a model for the development of an M. gallisepticum sequence database, and illustrates its application to characterize M. gallisepticum genotypes, select diagnostic tests and better understand the epidemiology of M. gallisepticum.

  5. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of ITS sequences of mycorrhizal fungi.

    Science.gov (United States)

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  6. Comprehensive Genetic Database of Expressed Sequence Tags for Coccolithophorids

    Science.gov (United States)

    Ranji, Mohammad; Hadaegh, Ahmad R.

    Coccolithophorids are unicellular, marine, golden-brown algae (Haptophyta) commonly found in near-surface waters in patchy distributions. They are phytoplankton, a group known to be responsible for much of the Earth's primary production. Phytoplankton, just like plants, live on the energy obtained by photosynthesis, which produces oxygen. A substantial amount of the oxygen in the Earth's atmosphere is produced by phytoplankton through photosynthesis. The single-celled Emiliania huxleyi is the most commonly known species of coccolithophorid and is known for extracting bicarbonate (HCO3) from its environment and producing calcium carbonate to form coccoliths. Coccolithophorids are one of the world's primary producers, contributing about 15% of the average oceanic phytoplankton biomass. They produce elaborate, minute calcite platelets (coccoliths), covering the cell to form a coccosphere and supplying up to 60% of the bulk pelagic calcite deposited on the sea floors. In order to understand the genetics of coccolithophorids and the complexities of their biochemical reactions, we decided to build a database to store a complete profile of these organisms' genomes. Although a variety of such databases currently exist (http://www.geneservice.co.uk/home/), none has yet been developed to comprehensively address the sequencing efforts underway by the coccolithophorid research community. This database is called CocooExpress and is available to the public (http://bioinfo.csusm.edu) for both data queries and sequence contribution.

  7. MSDB: A Comprehensive Database of Simple Sequence Repeats.

    Science.gov (United States)

    Avvaru, Akshay Kumar; Saxena, Saketh; Sowpati, Divya Tej; Mishra, Rakesh Kumar

    2017-06-01

    Microsatellites, also known as Simple Sequence Repeats (SSRs), are short tandem repeats of 1-6 nt motifs present in all genomes, particularly eukaryotes. Besides their usefulness as genome markers, SSRs have been shown to perform important regulatory functions, and variations in their length at coding regions are linked to several disorders in humans. Microsatellites show a taxon-specific enrichment in eukaryotic genomes, and some may be functional. MSDB (Microsatellite Database) is a collection of >650 million SSRs from 6,893 species including Bacteria, Archaea, Fungi, Plants, and Animals. This database is by far the most exhaustive resource to access and analyze SSR data of multiple species. In addition to exploring data in a customizable tabular format, users can view and compare the data of multiple species simultaneously using our interactive plotting system. MSDB is developed using the Django framework and MySQL. It is freely available at http://tdb.ccmb.res.in/msdb. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
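    The definition above (perfect tandem repeats of 1-6 nt motifs) can be illustrated with a minimal sketch. The function name `find_ssrs` and the `min_repeats` threshold are assumptions made for illustration; the abstract does not state which detection criteria MSDB itself uses.

```python
import re

def find_ssrs(seq, min_repeats=3, max_motif=6):
    """Find perfect tandem repeats of 1..max_motif nt motifs.

    Returns a sorted list of (start, end, motif, repeat_count) tuples,
    with 0-based start and exclusive end. The min_repeats threshold is
    an illustrative assumption, not a published MSDB setting.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        # A k-nt motif followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA"),
            # since the shorter motif length already reports that run.
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)

print(find_ssrs("GGACACACACTT"))
```

    Scanning one motif length at a time keeps each regular expression simple and avoids a short repeat masking the start of a longer one.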

  8. Overview of FEED, the feeding experiments end-user database.

    Science.gov (United States)

    Wall, Christine E; Vinyard, Christopher J; Williams, Susan H; Gapeyev, Vladimir; Liu, Xianhua; Lapp, Hilmar; German, Rebecca Z

    2011-08-01

    The Feeding Experiments End-user Database (FEED) is a research tool developed by the Mammalian Feeding Working Group at the National Evolutionary Synthesis Center that permits synthetic, evolutionary analyses of the physiology of mammalian feeding. The tasks of the Working Group are to compile physiologic data sets into a uniform digital format stored at a central source, develop a standardized terminology for describing and organizing the data, and carry out a set of novel analyses using FEED. FEED contains raw physiologic data linked to extensive metadata. It serves as an archive for a large number of existing data sets and a repository for future data sets. The metadata are stored as text and images that describe experimental protocols, research subjects, and anatomical information. The metadata incorporate controlled vocabularies to allow consistent use of the terms used to describe and organize the physiologic data. The planned analyses address long-standing questions concerning the phylogenetic distribution of phenotypes involving muscle anatomy and feeding physiology among mammals, the presence and nature of motor pattern conservation in the mammalian feeding muscles, and the extent to which suckling constrains the evolution of feeding behavior in adult mammals. We expect FEED to be a growing digital archive that will facilitate new research into understanding the evolution of feeding anatomy.

  9. An international overview of database on R and D techniques

    International Nuclear Information System (INIS)

    Yanagihara, S.

    2005-01-01

    Full text: In the early stage of decommissioning activities, which concerned relatively small nuclear facilities, various techniques for the decommissioning and dismantling (D and D) process were developed for application to D and D projects in the 1980s. The successful completion of those projects confirmed that the present techniques are adequate for decommissioning nuclear facilities. However, the present techniques need to be improved so that they can be applied cost-efficiently and safely when decommissioning large-scale commercial nuclear facilities with higher levels of contamination. A review of the present status of D and D projects and decommissioning strategies made clear that cost saving is one of the important aspects of D and D projects. From a technical point of view, techniques such as laser beam cutting, automated remote control of devices and computer simulation are expected to be useful tools for saving cost in the future. In addition, techniques should be properly applied to D and D projects across a wide range of areas, such as the dismantling and decontamination process, project planning, waste management, communication with the regulatory body and the public, safety evaluation and environmental impacts. To cope with overall D and D activities, it might be necessary to make continuous efforts to improve the techniques and to construct a database for evaluating D and D projects from a wide range of viewpoints. (author)

  10. Personal Databases: Of Filing Cabinets and Idiosyncrasy [and] Library Automation: An Overview of the Market.

    Science.gov (United States)

    Molholt, Pat; McDonald, David R.

    1989-01-01

    The first of two articles describes how a team effort by computing centers and academic libraries could aid faculty in the organization of their personal databases. The second provides an overview of the academic library automation market, identifying vendors active in the market and trends of recent years. (CLB)

  11. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    Science.gov (United States)

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
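    The two ideas in this abstract, generating a sequence library subset and storing search results for later analysis, can be sketched with Python's built-in sqlite3 module. The table names, column names and sample rows below are hypothetical and are not the schema of seqdb_demo, which the abstract does not describe.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical two-table schema: sequences and stored similarity hits.
con.executescript("""
CREATE TABLE protein (acc TEXT PRIMARY KEY, taxon TEXT, seq TEXT);
CREATE TABLE hit (query_acc TEXT, subject_acc TEXT, evalue REAL);
""")
con.executemany("INSERT INTO protein VALUES (?, ?, ?)", [
    ("P1", "E. coli", "MKT"),
    ("P2", "S. cerevisiae", "MSA"),
    ("P3", "E. coli", "MLV"),
])

def fasta_subset(con, taxon):
    """Build a FASTA-formatted library restricted to one taxon."""
    rows = con.execute(
        "SELECT acc, seq FROM protein WHERE taxon = ? ORDER BY acc", (taxon,))
    return "".join(f">{acc}\n{seq}\n" for acc, seq in rows)

# Made-up search results, stored so they can be queried later.
con.executemany("INSERT INTO hit VALUES (?, ?, ?)", [
    ("Q1", "P1", 1e-30), ("Q1", "P3", 0.5), ("Q2", "P3", 1e-8),
])

def significant_hits(con, max_evalue):
    """Join stored hits to their annotation, keeping E-values <= cutoff."""
    return con.execute(
        """SELECT h.query_acc, h.subject_acc, p.taxon
           FROM hit h JOIN protein p ON p.acc = h.subject_acc
           WHERE h.evalue <= ?
           ORDER BY h.query_acc, h.evalue""", (max_evalue,)).fetchall()

subset = fasta_subset(con, "E. coli")
print(subset)
print(significant_hits(con, 1e-5))
```

    The subset text could be written to a file and handed to a search program as a smaller library; the stored hits can then be filtered and joined in SQL instead of reparsing search output.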

  12. cDNA sequence quality data - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Budding yeast cDNA sequencing project. Data name: cDNA sequence quality... data. DOI: 10.18908/lsdba.nbdc00838-003. Description of data contents: Phred's quality score. P...

  13. MIPS: a database for protein sequences, homology data and yeast genome information.

    Science.gov (United States)

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database. MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome, the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser'. A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  14. Functional and Structural Overview of G-Protein-Coupled Receptors Comprehensively Obtained from Genome Sequences

    Directory of Open Access Journals (Sweden)

    Makiko Suwa

    2011-04-01

    An understanding of the functional mechanisms of G-protein-coupled receptors (GPCRs) is very important for GPCR-related drug design. We have developed an integrated GPCR database (SEVENS; http://sevens.cbrc.jp/) that includes 64,090 reliable GPCR genes comprehensively identified from 56 eukaryote genome sequences, and overviewed the sequences and structure spaces of the GPCRs. In vertebrates, the number of receptors for biological amines, peptides, etc. is conserved in most species, whereas the number of chemosensory receptors for odorant, pheromone, etc. significantly differs among species. The latter receptors tend to be of the single-exon or few-exon type and make up a high proportion of all GPCRs, whereas some families, such as Class B and Class C receptors, have long lengths due to the presence of many exons. Statistical analyses of amino acid residues reveal that most of the conserved residues in Class A GPCRs are found in the cytoplasmic half regions of transmembrane (TM) helices, while residues characteristic of each subfamily are found in the extracellular half regions. A total of 69 Protein Data Bank (PDB) entries of complete or fragmentary structures could be mapped on the TM/loop regions of Class A GPCRs covering 14 subfamilies.

  15. Database-driven primary analysis of raw sequencing data

    DEFF Research Database (Denmark)

    2014-01-01

    The present invention relates to methods for identifying the source of a biological sequence containing sample from raw sequencing reads. The method may be used to identify the source of unknown DNA and can be used for diagnostic, biodefense, food safety and quality, and hygiene applications...

  16. AgdbNet – antigen sequence database software for bacterial typing

    Directory of Open Access Journals (Sweden)

    Maiden Martin CJ

    2006-06-01

    Background: Bacterial typing schemes based on the sequences of genes encoding surface antigens require databases that provide a uniform, curated, and widely accepted nomenclature of the variants identified. Due to the differences in typing schemes, imposed by the diversity of genes targeted, creating these databases has typically required the writing of one-off code to link the database to a web interface. Here we describe agdbNet, widely applicable web database software that facilitates simultaneous BLAST querying of multiple loci using either nucleotide or peptide sequences. Results: Databases are described by XML files that are parsed by a Perl CGI script. Each database can have any number of loci, which may be defined by nucleotide and/or peptide sequences. The software is currently in use on at least five public databases for the typing of Neisseria meningitidis, Campylobacter jejuni and Streptococcus equi and can be set up to query internal isolate tables or suitably-configured external isolate databases, such as those used for multilocus sequence typing. The style of the resulting website can be fully configured by modifying stylesheets and through the use of customised header and footer files that surround the output of the script. Conclusion: The software provides a rapid means of setting up customised Internet antigen sequence databases. The flexible configuration options enable typing schemes with differing requirements to be accommodated.
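    The idea of describing a database and its loci in an XML file can be sketched as follows. agdbNet itself uses a Perl CGI script, and its real XML schema is not given in the abstract, so the element and attribute names here (`database`, `locus`, `name`, `type`) are purely hypothetical; Python is used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical database description; not agdbNet's actual schema.
CONFIG = """
<database name="example_typing">
  <locus name="locusA" type="nucleotide"/>
  <locus name="locusB" type="peptide"/>
</database>
"""

def load_loci(xml_text):
    """Return a {locus name: sequence type} mapping from the description."""
    root = ET.fromstring(xml_text.strip())
    return {locus.get("name"): locus.get("type") for locus in root.iter("locus")}

loci = load_loci(CONFIG)
print(loci)
```

    Keeping the locus list in a data file rather than in code is what lets one script serve typing schemes with different numbers and kinds of loci.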

  17. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    Directory of Open Access Journals (Sweden)

    Suldbold Jargalmaa

    2017-07-01

    Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species.

  18. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    Science.gov (United States)

    Jargalmaa, Suldbold; Eimes, John A.; Park, Myung Soo; Park, Jae Young; Oh, Seung-Yoon

    2017-01-01

    Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species. PMID:28761785

  19. Severe accident sequence assessment for boiling water reactors: program overview

    International Nuclear Information System (INIS)

    Fontana, M.H.

    1980-10-01

    The Severe Accident Sequence Assessment (SASA) Program was started at the Oak Ridge National Laboratory (ORNL) in June 1980. This report documents the initial planning, specification of objectives, potential uses of the results, plan of attack, and preliminary results. ORNL was assigned the Browns Ferry Unit 1 Plant with the station blackout being the initial sequence set to be addressed. This set includes: (1) loss of offsite and onsite ac power with no coolant injection; and (2) loss of offsite and onsite ac power with high pressure coolant injection (HPCI) and reactor core isolation cooling (RCIC) as long as dc power supply lasts. This report includes representative preliminary results for the former case.

  20. License - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Budding yeast cDNA sequencing project. License to Use This Database. Last updated: 2010/02/15. You may use this databas...ional License described below. The Standard License specifies the license terms regarding the use of this database... and the requirements you must follow in using this database. The Additiona...n the Standard License. Standard License: The Standard License for this database is the license specified in ...the Creative Commons Attribution-Share Alike 2.1 Japan. If you use data from this database

  1. The IAEA's Net Enabled Waste Management Database: Overview and current status

    International Nuclear Information System (INIS)

    Csullog, G.W.; Bell, M.J.; Pozdniakov, I.; Petison, G.; Kostitsin, V.

    2002-01-01

    The IAEA's Net Enabled Waste Management Database (NEWMDB) contains information on national radioactive waste management programmes and organizations, plans and activities, relevant laws and regulations, policies and radioactive waste inventories. The NEWMDB, which was launched on the Internet on 6 July 2001, is the successor to the IAEA's Waste Management Database (WMDB), which was in use during the 1990s. The NEWMDB's first data collection cycle took place from July 2001 to March 2002. This paper provides an overview of the NEWMDB, describes the results of the first data collection cycle, and discusses the way forward for additional data collection cycles. Three companion papers describe (1) the role of the NEWMDB as an international source of information about radioactive waste management, (2) issues related to the variety of waste classification schemes used by IAEA Member States, and (3) the NEWMDB in the context of an indicator of sustainable development for radioactive waste management. (author)

  2. The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences

    Directory of Open Access Journals (Sweden)

    Gibbs Mark J

    2008-02-01

    Background: Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. Results: The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. Conclusion: VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.

  3. The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences.

    Science.gov (United States)

    Fourment, Mathieu; Gibbs, Mark J

    2008-02-05

    Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.

  4. PseudoMLSA: a database for multigenic sequence analysis of Pseudomonas species

    Directory of Open Access Journals (Sweden)

    Lalucat Jorge

    2010-04-01

    Background: The genus Pseudomonas comprises more than 100 species of environmental, clinical, agricultural, and biotechnological interest. Although the recommended method for discriminating bacterial species is DNA-DNA hybridisation, alternative techniques based on multigenic sequence analysis are becoming a common practice in bacterial species discrimination studies. Since there is no general criterion for determining which genes are more useful for species resolution, the number of strains and genes analysed is increasing continuously. As a result, sequences of different genes are dispersed throughout several databases. This sequence information needs to be collected in a common database, in order to be useful for future identification-based projects. Description: The PseudoMLSA Database is a comprehensive database of multiple gene sequences from strains of Pseudomonas species. The core of the database is composed of selected gene sequences from all Pseudomonas type strains validly assigned to the genus through 2008. The database is intended to be useful for MultiLocus Sequence Analysis (MLSA) procedures, for the identification and characterisation of any Pseudomonas bacterial isolate. The sequences are available for download via a direct connection to the National Center for Biotechnology Information (NCBI). Additionally, the database includes an online BLAST interface for flexible nucleotide queries and similarity searches with the user's datasets, and provides a user-friendly output for easily parsing, navigating, and analysing BLAST results. Conclusions: The PseudoMLSA database amasses strains and sequence information of validly described Pseudomonas species, and allows free querying of the database via a user-friendly, web-based interface available at http://www.uib.es/microbiologiaBD/Welcome.html. The web-based platform enables easy retrieval at strain or gene sequence information level; including references to published peer

  5. Quality standards for DNA sequence variation databases to improve clinical management under development in Australia

    Directory of Open Access Journals (Sweden)

    B. Bennetts

    2014-09-01

    Despite the routine nature of comparing sequence variations identified during clinical testing to database records, few databases meet quality requirements for clinical diagnostics. To address this issue, The Royal College of Pathologists of Australasia (RCPA), in collaboration with the Human Genetics Society of Australasia (HGSA) and the Human Variome Project (HVP), is developing standards for DNA sequence variation databases intended for use in the Australian clinical environment. The outputs of this project will be promoted to other health systems and accreditation bodies by the Human Variome Project to support the development of similar frameworks in other jurisdictions.

  6. The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

    Energy Technology Data Exchange (ETDEWEB)

    Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika; Tanaka, Yoshihiro; Teranishi, Kristen S.; Sunagawa, Shinichi; Wong, Mike; Stillman, Jonathon H.

    2010-01-27

    Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences is important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~30K unique sequences (UniSeqs) representing ~19K clusters were generated from ~98K high quality ESTs from a set of tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66% of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases. Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in

  7. Organizing, exploring, and analyzing antibody sequence data: the case for relational-database managers.

    Science.gov (United States)

    Owens, John

    2009-01-01

    Technological advances in the acquisition of DNA and protein sequence information and the resulting onrush of data can quickly overwhelm the scientist unprepared for the volume of information that must be evaluated and carefully dissected to discover its significance. Few laboratories have the luxury of dedicated personnel to organize, analyze, or consistently record a mix of arriving sequence data. A methodology based on a modern relational-database manager is presented that is both a natural storage vessel for antibody sequence information and a conduit for organizing and exploring sequence data and accompanying annotation text. The expertise necessary to implement such a plan is equal to that required by electronic word processors or spreadsheet applications. Antibody sequence projects maintained as independent databases are selectively unified by the relational-database manager into larger database families that contribute to local analyses, reports, interactive HTML pages, or exported to facilities dedicated to sophisticated sequence analysis techniques. Database files are transposable among current versions of Microsoft, Macintosh, and UNIX operating systems.

  8. Combining next-generation sequencing and online databases for microsatellite development in non-model organisms.

    Science.gov (United States)

    Rico, Ciro; Normandeau, Eric; Dion-Côté, Anne-Marie; Rico, María Inés; Côté, Guillaume; Bernatchez, Louis

    2013-12-03

    Next-generation sequencing (NGS) is revolutionising marker development and the rapidly increasing amount of transcriptomes published across a wide variety of taxa is providing valuable sequence databases for the identification of genetic markers without the need to generate new sequences. Microsatellites are still the most important source of polymorphic markers in ecology and evolution. Motivated by our long-term interest in the adaptive radiation of a non-model species complex of whitefishes (Coregonus spp.), in this study, we focus on microsatellite characterisation and multiplex optimisation using transcriptome sequences generated by Illumina® and Roche-454, as well as online databases of Expressed Sequence Tags (EST) for the study of whitefish evolution and demographic history. We identified and optimised 40 polymorphic loci in multiplex PCR reactions and validated the robustness of our analyses by testing several population genetics and phylogeographic predictions using 494 fish from five lakes and 2 distinct ecotypes.

  9. Domain fusion analysis by applying relational algebra to protein sequence and domain databases.

    Science.gov (United States)

    Truong, Kevin; Ikura, Mitsuhiko

    2003-05-06

    Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
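
    The relational idea in this abstract can be illustrated with a small self-join. The sketch below is not the authors' query: the table name `hits`, its columns, the function `find_domain_fusions` and the example rows are all hypothetical. It assumes a table of (protein, organism, domain) predictions and reports pairs of separate proteins in a target organism whose domains co-occur in a single "composite" protein elsewhere.

```python
import sqlite3

# Hypothetical domain predictions: (protein, organism, domain).
EXAMPLE_HITS = [
    ("P1", "human", "Pkinase"),
    ("P2", "human", "SH2"),
    ("P3", "human", "PDZ"),
    ("F1", "yeast", "Pkinase"),  # F1 carries both domains: a fusion
    ("F1", "yeast", "SH2"),
]

def find_domain_fusions(rows, target_organism):
    """Return (protein_a, protein_b, composite) triples for one organism.

    protein_a and protein_b are distinct proteins of the target organism,
    each carrying one of two domains that occur together in `composite`.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (protein TEXT, organism TEXT, domain TEXT)")
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)
    query = """
        SELECT DISTINCT a.protein, b.protein, c1.protein
        FROM hits AS c1
        JOIN hits AS c2 ON c1.protein = c2.protein AND c1.domain < c2.domain
        JOIN hits AS a  ON a.domain = c1.domain
        JOIN hits AS b  ON b.domain = c2.domain
        WHERE a.organism = ? AND b.organism = ?
          AND a.protein <> b.protein
          AND a.protein <> c1.protein AND b.protein <> c1.protein
          -- the component proteins must not themselves be composites
          AND NOT EXISTS (SELECT 1 FROM hits x
                          WHERE x.protein = a.protein AND x.domain = c2.domain)
          AND NOT EXISTS (SELECT 1 FROM hits y
                          WHERE y.protein = b.protein AND y.domain = c1.domain)
        ORDER BY a.protein, b.protein, c1.protein
    """
    result = conn.execute(query, (target_organism, target_organism)).fetchall()
    conn.close()
    return result
```

    Because the whole analysis is a single declarative query, it can simply be re-run whenever the underlying sequence or domain tables are refreshed, which is the property the abstract highlights.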

  10. muBLASTP: database-indexed protein sequence search on multicore CPUs.

    Science.gov (United States)

    Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun

    2016-11-04

    The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Because query indexing and database indexing pose different challenges and have different characteristics, the existing techniques for query-indexed search cannot be applied directly to database-indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers hits identical to those returned by NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedup for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for the protein database and associated optimizations in the BLASTP algorithm, we re-factored the BLASTP algorithm for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index.
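
    To make the distinction between query indexing and database indexing concrete, here is a minimal sketch of the database-indexed side. It is not muBLASTP's index structure: the function names are hypothetical, and it uses exact word matches only, whereas BLASTP also considers high-scoring neighbouring words. The index is built once over the database and then reused for every query.

```python
from collections import defaultdict

def build_database_index(sequences, word_size=3):
    """Map every word of length word_size to its (sequence id, offset) list."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - word_size + 1):
            index[seq[pos:pos + word_size]].append((seq_id, pos))
    return index

def find_seed_hits(index, query, word_size=3):
    """Return (query offset, sequence id, database offset) for exact word hits."""
    hits = []
    for qpos in range(len(query) - word_size + 1):
        word = query[qpos:qpos + word_size]
        for seq_id, dpos in index.get(word, []):
            hits.append((qpos, seq_id, dpos))
    return hits
```

    In a query-indexed search the roles are reversed: the words of the query are indexed and the database is streamed past them, so the index must be rebuilt for each new query.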

  11. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    Science.gov (United States)

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches against the in-house SinEx DB of SEGs and (iii) an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.

  12. Intelligent Access to Sequence and Structure Databases (IASSD) - an interface for accessing information from major web databases.

    Science.gov (United States)

    Ganguli, Sayak; Gupta, Manoj Kumar; Basu, Protip; Banik, Rahul; Singh, Pankaj Kumar; Vishal, Vineet; Bera, Abhisek Ranjan; Chakraborty, Hirak Jyoti; Das, Sasti Gopal

    2014-01-01

    With the advent of the age of big data and advances in high-throughput technology, accessing data has become one of the most important steps in the entire knowledge discovery process. Most users are not able to decipher the query result that is obtained when non-specific keywords or a combination of keywords are used. Intelligent Access to Sequence and Structure Databases (IASSD) is a desktop application for the Windows operating system. It is written in Java and utilizes the Web Services Description Language (WSDL) files and Jar files of E-utilities of various databases such as National Center for Biotechnology Information (NCBI) and Protein Data Bank (PDB). In addition, IASSD allows the user to view protein structure using a JMOL application which supports conditional editing. The Jar file is freely available through e-mail from the corresponding author.

  13. PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes.

    Science.gov (United States)

    Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A

    2011-01-01

    PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are tandem repeats of nucleotide motifs 1-6 bp in size and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin phase variations and antigenic variations seen in some bacteria. Although SSR-mediated phase variation and antigenic variations have been well studied in some bacteria, many other species of prokaryotes have yet to be investigated for SSR-mediated adaptive and other evolutionary advantages. As a part of our on-going studies on SSR polymorphism in prokaryotes, we compared the genome sequences of various strains and isolates available for 85 different species of prokaryotes, extracted a number of SSRs showing length variations, and created a relational database called PSSRdb. This database gives useful information such as location of PSSRs in genomes, length variation across genomes, the regions harboring PSSRs, etc. The information provided in this database is useful for further research and analysis of SSRs in prokaryotes.
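
    The definition of an SSR given here (tandem repeats of a 1-6 bp motif) and the notion of length polymorphism across strains can be sketched as follows. This is an illustration only, not the PSSRdb extraction pipeline: the function names, the minimum of three repeats, and the assumption that the same locus has already been cut out of each strain are all hypothetical.

```python
import re

def find_ssrs(sequence, min_repeats=3):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy {1,6}? makes the regex prefer the shortest motif, so
    'ACACACAC' is reported as motif 'AC' x 4 rather than 'ACAC' x 2.
    """
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(sequence)
    ]

def is_polymorphic(locus_by_strain, motif, min_repeats=3):
    """True if the longest tract of `motif` differs in length between strains."""
    counts = set()
    for seq in locus_by_strain.values():
        tracts = [n for _, mtf, n in find_ssrs(seq, min_repeats) if mtf == motif]
        counts.add(max(tracts, default=0))
    return len(counts) > 1
```

    For example, a locus carrying (AC)4 in one strain and (AC)3 in another would be flagged as polymorphic, whereas identical tracts in all strains would not.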

  14. Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.

    Science.gov (United States)

    Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.

  15. EuMicroSatdb: A database for microsatellites in the sequenced genomes of eukaryotes

    Directory of Open Access Journals (Sweden)

    Grover Atul

    2007-07-01

    Full Text Available. Background: Microsatellites have immense utility as molecular markers in different fields like genome characterization and mapping, phylogeny and evolutionary biology. Existing microsatellite databases are of limited utility for experimental and computational biologists with regard to their content and information output. EuMicroSatdb (Eukaryotic MicroSatellite database; http://ipu.ac.in/usbt/EuMicroSatdb.htm) is a web based relational database for easy and efficient positional mining of microsatellites from sequenced eukaryotic genomes. Description: A user friendly web interface has been developed for microsatellite data retrieval using Active Server Pages (ASP). The backend database codes for data extraction and assembly have been written using Perl based scripts and C++. Precise need based microsatellites data retrieval is possible using different input parameters like microsatellite type (simple perfect or compound perfect), repeat unit length (mono- to hexa-nucleotide), repeat number, microsatellite length and chromosomal location in the genome. Furthermore, information about clustering of different microsatellites in the genome can also be retrieved. Finally, to facilitate primer designing for PCR amplification of any desired microsatellite locus, 200 bp upstream and downstream sequences are provided. Conclusion: The database allows easy systematic retrieval of comprehensive information about simple and compound microsatellites, microsatellite clusters and their locus coordinates in 31 sequenced eukaryotic genomes. The information content of the database is useful in different areas of research like gene tagging, genome mapping, population genetics, germplasm characterization and in understanding microsatellite dynamics in eukaryotic genomes.

  16. Estimating the annotation error rate of curated GO database sequence annotations

    Directory of Open Access Journals (Sweden)

    Brown Alfred L

    2007-05-01

    Full Text Available. Background: Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST matched sequences. Results: We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without use of sequence similarity based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS) had an estimated error rate of 49%. Conclusion: While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high quality source of information.

  17. Thermochemistry in BWR. An overview of applications of program codes and databases

    International Nuclear Information System (INIS)

    Hermansson, H-P.; Becker, R.

    2010-01-01

    The Swedish work on thermodynamics of metal-water systems relevant to BWR conditions has been ongoing since the 1970s, and at present a compilation and adaptation of codes and thermodynamic databases are in progress. In the previous work, basic thermodynamic data were compiled for parts of the system Fe-Cr-Ni-Co-Zn-S-H2O at 25-300 °C. Since some thermodynamic information necessary for temperature extrapolations of data up to 300 °C was not published in the earlier works, these data have now been partially recalculated. This applies especially to the parameters of the HKF-model, which are used to extrapolate the thermodynamic data for ionic and neutral aqua species from 25 °C to BWR temperatures. Using the completed data, quantities such as the change in standard Gibbs energy (ΔG⁰) and the equilibrium constant (log K) can be calculated for further applications at BWR/LWR conditions. In addition, a computer program is currently being developed at Studsvik for the calculation of equilibrium conductivity in high temperature water. The program is intended for PWR applications, but can also be applied to BWR environment. Data as described above will be added to the database of this program. It will be relatively easy to further develop the program, e.g. to calculate Pourbaix diagrams, and these graphs could then be calculated at any temperature. This means that there will be no limitation to the temperatures and total concentrations (usually 10⁻⁶ to 10⁻⁸ mol/kg) as reported in earlier work. It is also easy to add a function generating ΔG⁰ and log K values at selected temperatures. One of the aims of this work was also to survey and collect publicly available thermodynamic program codes and databases of relevance for BWR conditions found in open sources. The focus has been on finding existing compilations and reviews, and some 40 codes and 15 databases were found. Codes and databases are often integrated and such a package is often developed for

  18. mESAdb: microRNA expression and sequence analysis database.

    Science.gov (United States)

    Kaya, Koray D; Karakülah, Gökhan; Yakicier, Cengiz M; Acar, Aybar C; Konu, Ozlen

    2011-01-01

    microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.

  19. An Internet-Accessible DNA Sequence Database for Identifying Fusaria from Human and Animal Infections

    Science.gov (United States)

    Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated wi...

  20. PATACSDB—the database of polyA translational attenuators in coding sequences

    Directory of Open Access Journals (Sweden)

    Malgorzata Habich

    2016-02-01

    Full Text Available. Recent additions to the repertoire of gene expression regulatory mechanisms are polyadenylate (polyA) tracks encoding for poly-lysine runs in protein sequences. Such tracks stall the translation apparatus and induce frameshifting independently of the effects of charged nascent poly-lysine sequence on the ribosome exit channel. As such, they substantially influence the stability of mRNA and the amount of protein produced from a given transcript. Single base changes in these regions are enough to exert a measurable response on both protein and mRNA abundance; this makes each of these sequences a potentially interesting case study for the effects of synonymous mutation, gene dosage balance and natural frameshifting. Here we present PATACSDB, a resource that contains a comprehensive list of polyA tracks from over 250 eukaryotic genomes. Our data is based on the Ensembl genomic database of coding sequences and filtered with the 12A-1 algorithm, which selects sequences of polyA tracks with a minimal length of 12 A’s allowing for one mismatched base. The PATACSDB database is accessible at: http://sysbio.ibb.waw.pl/patacsdb. The source code is available at http://github.com/habich/PATACSDB, and it includes the scripts with which the database can be recreated.
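
    The 12A-1 filter described above can be read as a sliding-window rule. The sketch below is one interpretation, not the PATACSDB source code: it assumes that a track is any 12-nucleotide window containing at least 11 A's, and that overlapping or adjacent windows are merged into a single track. The function name and parameters are hypothetical.

```python
def find_polya_tracks(cds, window=12, max_mismatches=1):
    """Return half-open (start, end) intervals of polyA tracks in a CDS.

    A window qualifies when it contains at most max_mismatches non-A bases;
    qualifying windows that overlap or touch are merged into one interval.
    """
    cds = cds.upper()
    tracks = []
    for start in range(len(cds) - window + 1):
        segment = cds[start:start + window]
        if window - segment.count("A") <= max_mismatches:
            end = start + window
            if tracks and start <= tracks[-1][1]:
                # extend the previous track instead of opening a new one
                tracks[-1] = (tracks[-1][0], end)
            else:
                tracks.append((start, end))
    return tracks
```

    Under this reading, a run of eleven A's interrupted by a single G is reported, while a run of only eleven bases in total is not.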

  1. CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units

    Directory of Open Access Journals (Sweden)

    Maskell Douglas L

    2009-05-01

    Full Text Available. Background: The Smith-Waterman algorithm is one of the most widely used tools for searching biological sequence databases due to its high sensitivity. Unfortunately, the Smith-Waterman algorithm is computationally demanding, which is further compounded by the exponential growth of sequence databases. The recent emergence of many-core architectures, and their associated programming interfaces, provides an opportunity to accelerate sequence database searches using commonly available and inexpensive hardware. Findings: Our CUDASW++ implementation (benchmarked on a single-GPU NVIDIA GeForce GTX 280 graphics card and a dual-GPU GeForce GTX 295 graphics card) provides a significant performance improvement compared to other publicly available implementations, such as SWPS3, CBESW, SW-CUDA, and NCBI-BLAST. CUDASW++ supports query sequences of length up to 59K. For query sequences ranging in length from 144 to 5,478 in Swiss-Prot release 56.6, the single-GPU version achieves an average performance of 9.509 GCUPS with a lowest performance of 9.039 GCUPS and a highest performance of 9.660 GCUPS, and the dual-GPU version achieves an average performance of 14.484 GCUPS with a lowest performance of 10.660 GCUPS and a highest performance of 16.087 GCUPS. Conclusion: CUDASW++ is publicly available open-source software. It provides a significant performance improvement for Smith-Waterman-based protein sequence database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
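
    For readers unfamiliar with the quantities in this abstract, the following CPU-only sketch shows the Smith-Waterman recurrence and how a GCUPS figure (billions of dynamic-programming cell updates per second) is derived. It is not the CUDASW++ kernel: it assumes a simple match/mismatch score and a linear gap penalty, whereas protein searches normally use a substitution matrix and affine gaps, and all names and default values are illustrative.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score, computed row by row in O(len(target)) memory."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # a cell never drops below zero: that is what makes the alignment local
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def gcups(query_length, database_residues, seconds):
    """Giga cell updates per second for one query against a whole database."""
    return query_length * database_residues / seconds / 1e9
```

    Every query residue is compared with every database residue, so the number of cell updates is the product of the two lengths; this is why throughput is quoted in GCUPS rather than in sequences per second.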

  2. Identifying Social Impacts in Product Supply Chains: Overview and Application of the Social Hotspot Database

    Directory of Open Access Journals (Sweden)

    Gregory Norris

    2012-08-01

    Full Text Available. One emerging tool to measure the social-related impacts in supply chains is Social Life Cycle Assessment (S-LCA), a derivative of the well-established environmental LCA technique. LCA has recently started to gain popularity among large corporations and initiatives, such as The Sustainability Consortium or the Sustainable Apparel Coalition. Both have made the technique a cornerstone of their applied-research program. The Social Hotspots Database (SHDB) is an overarching, global database that eases the data collection burden in S-LCA studies. Proposed “hotspots” are production activities or unit processes (also defined as country-specific sectors) in the supply chain that may be at risk for social issues to be present. The SHDB enables efficient application of S-LCA by allowing users to prioritize production activities for which site-specific data collection is most desirable. Data for three criteria are used to inform prioritization: (1) labor intensity in worker hours per unit process; (2) risk for, or opportunity to affect, relevant social themes or sub-categories related to Human Rights, Labor Rights and Decent Work, Governance and Access to Community Services; and (3) gravity of a social issue. The Worker Hours Model was developed using a global input/output economic model and wage rate data. Nearly 200 reputable sources of statistical data have been used to develop 20 Social Theme Tables by country and sector. This paper presents an overview of the SHDB development and features, as well as results from a pilot study conducted on strawberry yogurt. This study, one of seven Social Scoping Assessments mandated by The Sustainability Consortium, identifies the potential social hotspots existing in the supply chain of strawberry yogurt. With this knowledge, companies that manufacture or sell yogurt can refine their data collection efforts in order to put their social responsibility performance in perspective and effectively set up programs and

  3. PrionHome: a database of prions and other sequences relevant to prion phenomena.

    Directory of Open Access Journals (Sweden)

    Djamel Harbi

    Full Text Available. Prions are units of propagation of an altered state of a protein or proteins; prions can propagate from organism to organism, through cooption of other protein copies. Prions contain no necessary nucleic acids, and are important both as pathogenic agents and as a potential force in epigenetic phenomena. The original prions were derived from a misfolded form of the mammalian Prion Protein PrP. Infection by these prions causes neurodegenerative diseases. Other prions cause non-Mendelian inheritance in budding yeast, and sometimes act as diseases of yeast. We report the bioinformatic construction of the PrionHome, a database of >2000 prion-related sequences. The data was collated from various public and private resources and filtered for redundancy. The data was then processed according to a transparent classification system of prionogenic sequences (i.e., sequences that can make prions), prionoids (i.e., proteins that propagate like prions between individual cells), and other prion-related phenomena. There are eight PrionHome classifications for sequences. The first four classifications are derived from experimental observations: prionogenic sequences, prionoids, other prion-related phenomena, and prion interactors. The second four classifications are derived from sequence analysis: orthologs, paralogs, pseudogenes, and candidate-prionogenic sequences. Database entries list: supporting information for PrionHome classifications, prion-determinant areas (where relevant), and disordered and compositionally-biased regions. Also included are literature references for the PrionHome classifications, transcripts and genomic coordinates, and structural data (including comparative models made for the PrionHome from manually curated alignments). We provide database usage examples for both vertebrate and fungal prion contexts. Using the database data, we have performed a detailed analysis of the compositional biases in known budding-yeast prionogenic

  4. PrionHome: a database of prions and other sequences relevant to prion phenomena.

    Science.gov (United States)

    Harbi, Djamel; Parthiban, Marimuthu; Gendoo, Deena M A; Ehsani, Sepehr; Kumar, Manish; Schmitt-Ulms, Gerold; Sowdhamini, Ramanathan; Harrison, Paul M

    2012-01-01

    Prions are units of propagation of an altered state of a protein or proteins; prions can propagate from organism to organism, through cooption of other protein copies. Prions contain no necessary nucleic acids, and are important both as pathogenic agents and as a potential force in epigenetic phenomena. The original prions were derived from a misfolded form of the mammalian Prion Protein PrP. Infection by these prions causes neurodegenerative diseases. Other prions cause non-Mendelian inheritance in budding yeast, and sometimes act as diseases of yeast. We report the bioinformatic construction of the PrionHome, a database of >2000 prion-related sequences. The data was collated from various public and private resources and filtered for redundancy. The data was then processed according to a transparent classification system of prionogenic sequences (i.e., sequences that can make prions), prionoids (i.e., proteins that propagate like prions between individual cells), and other prion-related phenomena. There are eight PrionHome classifications for sequences. The first four classifications are derived from experimental observations: prionogenic sequences, prionoids, other prion-related phenomena, and prion interactors. The second four classifications are derived from sequence analysis: orthologs, paralogs, pseudogenes, and candidate-prionogenic sequences. Database entries list: supporting information for PrionHome classifications, prion-determinant areas (where relevant), and disordered and compositionally-biased regions. Also included are literature references for the PrionHome classifications, transcripts and genomic coordinates, and structural data (including comparative models made for the PrionHome from manually curated alignments). We provide database usage examples for both vertebrate and fungal prion contexts. Using the database data, we have performed a detailed analysis of the compositional biases in known budding-yeast prionogenic sequences, showing

  5. Protein backbone angle restraints from searching a database for chemical shift and sequence homology

    Energy Technology Data Exchange (ETDEWEB)

    Cornilescu, Gabriel; Delaglio, Frank; Bax, Ad [National Institutes of Health, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases (United States)

    1999-03-15

    Chemical shifts of backbone atoms in proteins are exquisitely sensitive to local conformation, and homologous proteins show quite similar patterns of secondary chemical shifts. The inverse of this relation is used to search a database for triplets of adjacent residues with secondary chemical shifts and sequence similarity which provide the best match to the query triplet of interest. The database contains 13Cα, 13Cβ, 13C', 1Hα and 15N chemical shifts for 20 proteins for which a high resolution X-ray structure is available. The computer program TALOS was developed to search this database for strings of residues with chemical shift and residue type homology. The relative importance of the weighting factors attached to the secondary chemical shifts of the five types of resonances relative to that of sequence similarity was optimized empirically. TALOS yields the 10 triplets which have the closest similarity in secondary chemical shift and amino acid sequence to those of the query sequence. If the central residues in these 10 triplets exhibit similar φ and ψ backbone angles, their averages can reliably be used as angular restraints for the protein whose structure is being studied. Tests carried out for proteins of known structure indicate that the root-mean-square difference (rmsd) between the output of TALOS and the X-ray derived backbone angles is about 15°. Approximately 3% of the predictions made by TALOS are found to be in error.
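
    The final step described above, averaging the backbone angles of the best-matching triplets only when they agree, can be sketched as below. This is not TALOS itself: the function names and the 30° agreement threshold are assumptions made for illustration, and a circular mean is used so that angles near ±180° average correctly.

```python
import math

def circular_mean_deg(angles):
    """Mean of angles in degrees that respects wrap-around at +/-180."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))

def angular_difference_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def consensus_restraint(angles, max_spread_deg=30.0):
    """Return the mean angle if all matches lie close to it, else None.

    max_spread_deg is a hypothetical threshold, not a TALOS parameter.
    """
    mean = circular_mean_deg(angles)
    if all(angular_difference_deg(a, mean) <= max_spread_deg for a in angles):
        return mean
    return None
```

    A set of φ values clustered around -60° would yield a restraint near -60°, whereas a set containing an outlier on the opposite side of the circle would yield no restraint for that residue.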

  6. GarlicESTdb: an online database and mining tool for garlic EST sequences

    Directory of Open Access Journals (Sweden)

    Choi Sang-Haeng

    2009-05-01

    Full Text Available. Background: Allium sativum, commonly known as garlic, is a species in the onion genus (Allium), which is a large and diverse one containing over 1,250 species. Its close relatives include chives, onion, leek and shallot. Garlic has been used throughout recorded history for culinary and medicinal purposes and for its health benefits. Currently, interest in garlic is increasing rapidly owing to its nutritional and pharmaceutical value in relation to high blood pressure and cholesterol, atherosclerosis and cancer. Despite this, there are no comprehensive databases available for Expressed Sequence Tags (EST) of garlic for gene discovery and future efforts of genome annotation. That is why we developed a new garlic database and applications to enable comprehensive analysis of garlic gene expression. Description: GarlicESTdb is an integrated database and mining tool for large-scale garlic (Allium sativum) EST sequencing. A total of 21,595 ESTs collected from an in-house cDNA library were used to construct the database. The analysis pipeline is an automated system written in JAVA and consists of the following components: automatic preprocessing of EST reads, assembly of raw sequences, annotation of the assembled sequences, storage of the analyzed information into MySQL databases, and graphic display of all processed data. A web application was implemented with the latest J2EE (Java 2 Platform Enterprise Edition) software technology (JSP/EJB/JavaServlet) for browsing and querying the database, for creation of dynamic web pages on the client side, and for mapping annotated enzymes to KEGG pathways; the AJAX framework was also partially used. The online resources, such as putative annotation, single nucleotide polymorphisms (SNP) and tandem repeat data sets, can be searched by text, explored on the website, searched using BLAST, and downloaded. 
To achieve more significant BLAST results, a curation system was introduced with which biologists can easily edit best-hit annotation

  7. GarlicESTdb: an online database and mining tool for garlic EST sequences.

    Science.gov (United States)

    Kim, Dae-Won; Jung, Tae-Sung; Nam, Seong-Hyeuk; Kwon, Hyuk-Ryul; Kim, Aeri; Chae, Sung-Hwa; Choi, Sang-Haeng; Kim, Dong-Wook; Kim, Ryong Nam; Park, Hong-Seog

    2009-05-18

    Allium sativum, commonly known as garlic, is a species in the onion genus (Allium), which is a large and diverse one containing over 1,250 species. Its close relatives include chives, onion, leek and shallot. Garlic has been used throughout recorded history for culinary and medicinal purposes and for its health benefits. Currently, interest in garlic is increasing rapidly owing to its nutritional and pharmaceutical value in relation to high blood pressure and cholesterol, atherosclerosis and cancer. Despite this, there are no comprehensive databases available for Expressed Sequence Tags (EST) of garlic for gene discovery and future efforts of genome annotation. That is why we developed a new garlic database and applications to enable comprehensive analysis of garlic gene expression. GarlicESTdb is an integrated database and mining tool for large-scale garlic (Allium sativum) EST sequencing. A total of 21,595 ESTs collected from an in-house cDNA library were used to construct the database. The analysis pipeline is an automated system written in JAVA and consists of the following components: automatic preprocessing of EST reads, assembly of raw sequences, annotation of the assembled sequences, storage of the analyzed information into MySQL databases, and graphic display of all processed data. A web application was implemented with the latest J2EE (Java 2 Platform Enterprise Edition) software technology (JSP/EJB/JavaServlet) for browsing and querying the database, for creation of dynamic web pages on the client side, and for mapping annotated enzymes to KEGG pathways; the AJAX framework was also partially used. The online resources, such as putative annotation, single nucleotide polymorphisms (SNP) and tandem repeat data sets, can be searched by text, explored on the website, searched using BLAST, and downloaded. To achieve more significant BLAST results, a curation system was introduced with which biologists can easily edit best-hit annotation information for others to view. The Garlic

  8. Sequence protein identification by randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS): from manual to automatic application of a 'de novo sequencing' approach.

    Science.gov (United States)

    Pascale, Raffaella; Grossi, Gerarda; Cruciani, Gabriele; Mecca, Giansalvatore; Santoro, Donatello; Sarli Calace, Renzo; Falabella, Patrizia; Bianco, Giuliana

    Sequence protein identification by randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS) is a software package developed at the University of Basilicata in Potenza (Italy). It is designed to facilitate the determination of the amino acid sequence of a peptide, as well as the unequivocal identification of proteins in a high-throughput manner, with considerable savings in time, cost and expertise. The package automates a de novo sequencing approach, overcoming its main limits, and offers a versatile platform for the unequivocal identification of proteins in the proteomic field, starting from tandem mass spectrometry data. Its strength is that it is a user-friendly, non-statistical approach, so protein identification can be considered unambiguous.

  9. SeqHound: biological sequence and structure database as a platform for bioinformatics research

    Directory of Open Access Journals (Sweden)

    Dumontier Michel

    2002-10-01

    Background: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit.

  10. Identification of Alternative Splice Variants Using Unique Tryptic Peptide Sequences for Database Searches.

    Science.gov (United States)

    Tran, Trung T; Bollineni, Ravi C; Strozynski, Margarita; Koehler, Christian J; Thiede, Bernd

    2017-07-07

    Alternative splicing is a mechanism in eukaryotes by which different forms of mRNAs are generated from the same gene. Identification of alternative splice variants requires the identification of peptides specific for alternative splice forms. For this purpose, we generated a human database that contains only unique tryptic peptides specific for alternative splice forms from Swiss-Prot entries. This database gives easy access to splice variant-specific peptide sequences that match MS data. Furthermore, we combined this database without alternative splice variant-1-specific peptides with human Swiss-Prot. This combined database can be used as a general database for searching LC-MS data. LC-MS data derived from in-solution digests of two different cell lines (LNCaP, HeLa) and phosphoproteomics studies were analyzed using these two databases. Several nonalternative splice variant-1-specific peptides were found in both cell lines, and some of them seemed to be cell-line-specific. Control and apoptotic phosphoproteomes from Jurkat T cells revealed several nonalternative splice variant-1-specific peptides, and some of them showed clear quantitative differences between the two states.
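
    The idea of "unique tryptic peptides" can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the cleavage rule (cut after K or R unless the next residue is P) is the commonly used trypsin rule, and the function names, the minimum peptide length and the two toy isoform sequences are hypothetical.

```python
from collections import defaultdict


def tryptic_peptides(protein, min_len=6):
    """In silico trypsin digest: cut after K or R unless followed by P.

    min_len is an assumed filter for peptides too short to be informative.
    """
    peptides, start = set(), 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.add(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.add(protein[start:])  # C-terminal peptide without K/R
    return {p for p in peptides if len(p) >= min_len}


def isoform_specific_peptides(isoforms):
    """Map each isoform name to the peptides found in no other isoform."""
    owners = defaultdict(set)
    for name, seq in isoforms.items():
        for pep in tryptic_peptides(seq):
            owners[pep].add(name)
    unique = {name: set() for name in isoforms}
    for pep, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(pep)
    return unique


# Hypothetical toy isoforms sharing an N-terminus and differing at the C-terminus.
toy_isoforms = {
    "variant-1": "MSTAAGKLEEDVVRAAGTTSPK",
    "variant-2": "MSTAAGKLEEDVVRQQWLNNFK",
}
specific = isoform_specific_peptides(toy_isoforms)
```

    In this toy case the shared peptides drop out and each variant keeps only its own C-terminal peptide, which is the kind of entry such a database would store.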

  11. A Public Database of Memory and Naive B-Cell Receptor Sequences.

    Directory of Open Access Journals (Sweden)

    William S DeWitt

    The vast diversity of B-cell receptors (BCR) and secreted antibodies enables the recognition of, and response to, a wide range of epitopes, but this diversity has also limited our understanding of humoral immunity. We present a public database of more than 37 million unique BCR sequences from three healthy adult donors that is many fold deeper than any existing resource, together with a set of online tools designed to facilitate the visualization and analysis of the annotated data. We estimate the clonal diversity of the naive and memory B-cell repertoires of healthy individuals, and provide a set of examples that illustrate the utility of the database, including several views of the basic properties of immunoglobulin heavy chain sequences, such as rearrangement length, subunit usage, and somatic hypermutation positions and dynamics.

  12. High Performance Protein Sequence Database Scanning on the Cell Broadband Engine

    Directory of Open Access Journals (Sweden)

    Adrianto Wirawan

    2009-01-01

    The enormous growth of biological sequence databases has pushed bioinformatics rapidly towards becoming a data-intensive computational science. As a result, the computational power needed by bioinformatics applications is growing rapidly as well. The recent emergence of low-cost parallel multicore accelerator technologies has made it possible to reduce execution times of many bioinformatics applications. In this paper, we demonstrate how the Cell Broadband Engine can be used as a computational platform to accelerate two approaches for protein sequence database scanning: exhaustive and heuristic. We present efficient parallelization techniques for two representative algorithms: the dynamic programming based Smith–Waterman algorithm and the popular BLASTP heuristic. Their implementation on a Playstation®3 leads to significant runtime savings compared to corresponding sequential implementations.

  13. A Public Database of Memory and Naive B-Cell Receptor Sequences.

    Science.gov (United States)

    DeWitt, William S; Lindau, Paul; Snyder, Thomas M; Sherwood, Anna M; Vignali, Marissa; Carlson, Christopher S; Greenberg, Philip D; Duerkopp, Natalie; Emerson, Ryan O; Robins, Harlan S

    2016-01-01

    The vast diversity of B-cell receptors (BCR) and secreted antibodies enables the recognition of, and response to, a wide range of epitopes, but this diversity has also limited our understanding of humoral immunity. We present a public database of more than 37 million unique BCR sequences from three healthy adult donors that is many fold deeper than any existing resource, together with a set of online tools designed to facilitate the visualization and analysis of the annotated data. We estimate the clonal diversity of the naive and memory B-cell repertoires of healthy individuals, and provide a set of examples that illustrate the utility of the database, including several views of the basic properties of immunoglobulin heavy chain sequences, such as rearrangement length, subunit usage, and somatic hypermutation positions and dynamics.

  14. A precipitation database of station-based daily and monthly measurements for West Africa: Overview, quality control and harmonization

    Science.gov (United States)

    Bliefernicht, Jan; Waongo, Moussa; Annor, Thompson; Laux, Patrick; Lorenz, Manuel; Salack, Seyni; Kunstmann, Harald

    2017-04-01

    West Africa is a data-sparse region. High-quality and long-term precipitation data are often not readily available for applications in hydrology, agriculture, meteorology and other needs. To close this gap, we use multiple data sources to develop a precipitation database with long-term daily and monthly time series. This database was compiled from 16 archives including global databases, e.g. from the Global Historical Climatology Network (GHCN), databases from research projects (e.g. the AMMA database) and databases of the national meteorological services of some West African countries. The collection consists of more than 2000 precipitation gauges with measurements dating from 1850 to 2015. Due to erroneous measurements (e.g. temporal offsets, unit conversion errors), missing values and inconsistent metadata, the merging of this precipitation dataset is not straightforward and requires thorough quality control and harmonization. To this end, we developed geostatistical-based algorithms for quality control of individual databases and harmonization to a joint database. The algorithms are based on a pairwise comparison of the correspondence of precipitation time series as a function of the distance between stations. They were tested for precipitation time series from gauges located in a rectangular domain covering Burkina Faso, Ghana, Benin and Togo. This harmonized and quality-controlled precipitation database was recently used for several applications such as the validation of a high-resolution regional climate model and the bias correction of precipitation projections provided by the Coordinated Regional Climate Downscaling Experiment (CORDEX). In this presentation, we will give an overview of the novel daily and monthly precipitation database and the algorithms used for quality control and harmonization. We will also highlight the quality of global and regional archives (e.g. GHCN, GSOD, AMMA database) in comparison to the precipitation databases provided by the

  15. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database.

    Science.gov (United States)

    Carver, Tim; Berriman, Matthew; Tivey, Adrian; Patel, Chinmay; Böhme, Ulrike; Barrell, Barclay G; Parkhill, Julian; Rajandream, Marie-Adèle

    2008-12-01

    Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences. Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text. Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/ http://www.sanger.ac.uk/Software/ACT/

  16. Alignment of high-throughput sequencing data inside in-memory databases.

    Science.gov (United States)

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, high-performance analysis of DNA sequences is of great importance. Computer-supported DNA analysis is still a time-consuming task. In this paper we explore the potential of a new in-memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, utilizing its integrated memory engine for database table creation. We implemented stored procedures containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which indicates high potential in the new in-memory concepts and may lead to further developments of DNA analysis procedures in the future.
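
    The exact search step can be sketched in Python as a backward search over a Burrows-Wheeler-transformed reference, the idea underlying BWA. This is an illustration only, not the stored procedures from the paper: the function names are hypothetical, the suffix array is built naively (fine for a toy string, not for GRCh37), and the short reference below is made up.

```python
def build_fm_index(reference):
    """Build a toy FM-index: suffix array, C table and occurrence counts."""
    text = reference + "$"  # terminator, sorts before A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(text))
    # c_table[ch]: number of characters in text strictly smaller than ch
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i]: occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ


def exact_match(read, index):
    """Return sorted 0-based positions where read occurs exactly."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # half-open interval of suffix-array rows
    for ch in reversed(read):  # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("GATTACAGATTACA")  # made-up reference
hits = exact_match("ATTA", index)
```

    Each read character narrows the suffix-array interval in constant time, so the search cost depends on the read length rather than on the reference length.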

  17. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database

    Science.gov (United States)

    Carver, Tim; Berriman, Matthew; Tivey, Adrian; Patel, Chinmay; Böhme, Ulrike; Barrell, Barclay G.; Parkhill, Julian; Rajandream, Marie-Adèle

    2008-01-01

    Motivation: Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences. Results: Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text. Availability: Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/ http://www.sanger.ac.uk/Software/ACT/ Contact: artemis@sanger.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:18845581

  18. The need for high-quality whole-genome sequence databases in microbial forensics.

    Science.gov (United States)

    Sjödin, Andreas; Broman, Tina; Melefors, Öjar; Andersson, Gunnar; Rasmusson, Birgitta; Knutsson, Rickard; Forsman, Mats

    2013-09-01

    Microbial forensics is an important part of a strengthened capability to respond to biocrime and bioterrorism incidents to aid in the complex task of distinguishing between natural outbreaks and deliberate acts. The goal of a microbial forensic investigation is to identify and criminally prosecute those responsible for a biological attack, and it involves a detailed analysis of the weapon--that is, the pathogen. The recent development of next-generation sequencing (NGS) technologies has greatly increased the resolution that can be achieved in microbial forensic analyses. It is now possible to identify, quickly and in an unbiased manner, previously undetectable genome differences between closely related isolates. This development is particularly relevant for the most deadly bacterial diseases that are caused by bacterial lineages with extremely low levels of genetic diversity. Whole-genome analysis of pathogens is envisaged to be increasingly essential for this purpose. In a microbial forensic context, whole-genome sequence analysis is the ultimate method for strain comparisons as it is informative during identification, characterization, and attribution--all 3 major stages of the investigation--and at all levels of microbial strain identity resolution (ie, it resolves the full spectrum from family to isolate). Given these capabilities, one bottleneck in microbial forensics investigations is the availability of high-quality reference databases of bacterial whole-genome sequences. To be of high quality, databases need to be curated and accurate in terms of sequences, metadata, and genetic diversity coverage. The development of whole-genome sequence databases will be instrumental in successfully tracing pathogens in the future.

  19. Databases

    Digital Repository Service at National Institute of Oceanography (India)

    Kunte, P.D.

    Information on bibliographic as well as numeric/textual databases relevant to coastal geomorphology has been included in a tabular form. Databases cover a broad spectrum of related subjects like coastal environment and population aspects, coastline...

  20. A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection.

    Science.gov (United States)

    Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S

    2018-01-01

    Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have

  1. A curated gluten protein sequence database to support development of proteomics methods for determination of gluten in gluten-free foods.

    Science.gov (United States)

    Bromilow, Sophie; Gethings, Lee A; Buckley, Mike; Bromley, Mike; Shewry, Peter R; Langridge, James I; Clare Mills, E N

    2017-06-23

    The unique physicochemical properties of wheat gluten enable a diverse range of food products to be manufactured. However, gluten triggers coeliac disease, a condition which is treated using a gluten-free diet. Analytical methods are required to confirm if foods are gluten-free, but current immunoassay-based methods can be unreliable. Proteomic methods offer an alternative but require comprehensive and well-annotated sequence databases, which are lacking for gluten. A manually curated database (GluPro V1.0) of gluten proteins, comprising 630 discrete unique full-length protein sequences, has been compiled. It is representative of the different types of gliadin and glutenin components found in gluten. An in silico comparison of their coeliac toxicity was undertaken by analysing the distribution of coeliac toxic motifs. This demonstrated that whilst the α-gliadin proteins contained more toxic motifs, these were distributed across all gluten protein sub-types. Comparison of annotations observed using a discovery proteomics dataset acquired using ion mobility MS/MS showed that more reliable identifications were obtained using the GluPro V1.0 database compared to the complete reviewed Viridiplantae database. This highlights the value of a curated sequence database specifically designed to support the proteomic workflows and the development of methods to detect and quantify gluten. We have constructed the first manually curated open-source wheat gluten protein sequence database (GluPro V1.0) in a FASTA format to support the application of proteomic methods for gluten protein detection and quantification. We have also analysed the manually verified sequences to give the first comprehensive overview of the distribution of sequences able to elicit a reaction in coeliac disease, the prevalent form of gluten intolerance. Provision of this database will improve the reliability of gluten protein identification by proteomic analysis, and aid the development of targeted mass

  2. TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data.

    Science.gov (United States)

    Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie; Zhang, Gong

    2018-01-04

    Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of the same species and type, both on transcriptome and translatome levels. The translation indices (translation ratio, elongation velocity index and translational efficiency) can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity, respectively. All datasets were analyzed using a unified, robust, accurate and experimentally verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyses. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, freeing biologists from complex searching, analysis and comparison of huge sequencing data sets without needing local computational power. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Databases

    Directory of Open Access Journals (Sweden)

    Nick Ryan

    2004-01-01

    Databases are deeply embedded in archaeology, underpinning and supporting many aspects of the subject. However, as well as providing a means for storing, retrieving and modifying data, databases themselves must be a result of a detailed analysis and design process. This article looks at this process, and shows how the characteristics of data models affect the process of database design and implementation. The impact of the Internet on the development of databases is examined, and the article concludes with a discussion of a range of issues associated with the recording and management of archaeological data.

  4. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

    Directory of Open Access Journals (Sweden)

    Rognes Torbjørn

    2011-06-01

    Background: The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. Results: A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Conclusions: Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.
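
    For orientation, the scalar recurrence that SIMD implementations such as SWIPE accelerate can be written in a few lines of Python. This is a plain, unoptimised sketch and not SWIPE's code: it assumes a simple match/mismatch score and a linear gap penalty instead of a BLOSUM matrix with affine gaps, and the function names and toy database are hypothetical.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (score only).

    Scoring values are assumed for illustration; real searches use a
    substitution matrix such as BLOSUM62 and affine gap penalties.
    """
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Each evaluation of this max() is one "cell update".
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search_database(query, database):
    """Score the query against every database sequence, best hit first."""
    hits = [(smith_waterman_score(query, seq), name)
            for name, seq in database.items()]
    return sorted(hits, reverse=True)


# Hypothetical toy database.
toy_db = {"seq1": "GGACGGG", "seq2": "TTTTTTT", "seq3": "ACGTACGT"}
ranking = search_database("ACGT", toy_db)
```

    A query of length m scored against a database of n residues performs m × n cell updates, which is the quantity counted when throughput is reported in GCUPS.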

  5. Secure and robust cloud computing for high-throughput forensic microsatellite sequence analysis and databasing.

    Science.gov (United States)

    Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A

    2017-11-01

    Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation.

    Science.gov (United States)

    Rognes, Torbjørn

    2011-06-01

    The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.

  7. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties that compensate for each other's limitations in accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.
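
    The family-naming rule described above (take the most common molecular function term in a cluster) could look roughly like the following Python sketch. The function name, the alphabetical tie-break, the fallback label and the gene identifiers are assumptions made for illustration and are not taken from the Genome Cluster Database.

```python
from collections import Counter


def name_family(member_terms, default="unknown function"):
    """Name a cluster after its most frequent molecular-function description.

    member_terms maps a member identifier to its list of GO molecular
    function descriptions. Ties are broken alphabetically (an assumption)
    so that the result is deterministic.
    """
    counts = Counter(term for terms in member_terms.values() for term in terms)
    if not counts:
        return default
    top = max(counts.values())
    return min(term for term, n in counts.items() if n == top)


# Hypothetical cluster with made-up member identifiers.
toy_cluster = {
    "geneA": ["kinase activity", "ATP binding"],
    "geneB": ["kinase activity"],
    "geneC": ["ATP binding", "kinase activity"],
}
family_name = name_family(toy_cluster)
```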

  8. An overview of the Phalaenopsis orchid genome through BAC end sequence analysis

    Directory of Open Access Journals (Sweden)

    Hsiao Yu-Yun

    2011-01-01

    Background: Phalaenopsis orchids are popular floral crops, and development of new cultivars is economically important to floricultural industries worldwide. Analysis of orchid genes could facilitate orchid improvement. Bacterial artificial chromosome (BAC) end sequences (BESs) can provide the first glimpses into the sequence composition of a novel genome and can yield molecular markers for use in genetic mapping and breeding. Results: We used two BAC libraries (constructed using the BamHI and HindIII restriction enzymes) of Phalaenopsis equestris to generate pair-end sequences from 2,920 BAC clones (71.4% and 28.6% from the BamHI and HindIII libraries, respectively), at a success rate of 95.7%. A total of 5,535 BESs were generated, representing 4.5 Mb, or about 0.3% of the Phalaenopsis genome. The trimmed sequences ranged from 123 to 1,397 base pairs (bp) in size, with an average edited read length of 821 bp. When these BESs were subjected to sequence homology searches, it was found that 641 (11.6%) were predicted to represent protein-encoding regions, whereas 1,272 (23.0%) contained repetitive DNA. Most of the repetitive DNA sequences were gypsy- and copia-like retrotransposons (41.9% and 12.8%, respectively), whereas only 10.8% were DNA transposons. Further, 950 potential simple sequence repeats (SSRs) were discovered. Dinucleotides were the most abundant repeat motifs; AT/TA dimer repeats were the most frequent SSRs, representing 253 (26.6%) of all identified SSRs. Microsynteny analysis revealed that more BESs mapped to the whole-genome sequences of poplar than to those of grape or Arabidopsis, and even fewer mapped to the rice genome. This work will facilitate analysis of the Phalaenopsis genome, and will help clarify similarities and differences in genome composition between orchids and other plant species. Conclusion: Using BES analysis, we obtained an overview of the Phalaenopsis genome in terms of gene abundance, the presence of repetitive

  9. Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology

    International Nuclear Information System (INIS)

    Shen Yang; Bax, Ad

    2007-01-01

    Chemical shifts of nuclei in or attached to a protein backbone are exquisitely sensitive to their local environment. A computer program, SPARTA, is described that uses this correlation with local structure to predict protein backbone chemical shifts, given an input three-dimensional structure, by searching a newly generated database for triplets of adjacent residues that provide the best match in φ/ψ/χ1 torsion angles and sequence similarity to the query triplet of interest. The database contains 15N, 1HN, 1Hα, 13Cα, 13Cβ and 13C' chemical shifts for 200 proteins for which a high resolution X-ray (≤2.4 Å) structure is available. The relative importance of the weighting factors for the φ/ψ/χ1 angles and sequence similarity was optimized empirically. The weighted, average secondary shifts of the central residues in the 20 best-matching triplets, after inclusion of nearest neighbor, ring current, and hydrogen bonding effects, are used to predict chemical shifts for the protein of known structure. Validation shows good agreement between the SPARTA-predicted and experimental shifts, with standard deviations of 2.52, 0.51, 0.27, 0.98, 1.07 and 1.08 ppm for 15N, 1HN, 1Hα, 13Cα, 13Cβ and 13C', respectively, including outliers

  10. Improved taxonomic assignment of human intestinal 16S rRNA sequences by a dedicated reference database

    NARCIS (Netherlands)

    Ritari, Jarmo; Salojärvi, Jarkko; Lahti, Leo; Vos, de Willem M.

    2015-01-01

    Background: Current sequencing technology enables taxonomic profiling of microbial ecosystems at high resolution and depth by using the 16S rRNA gene as a phylogenetic marker. Taxonomic assignation of newly acquired data is based on sequence comparisons with comprehensive reference databases to

  11. A two-locus DNA sequence database for typing plant and human pathogens within the Fusarium oxysporum species complex

    DEFF Research Database (Denmark)

    O'Donnell, Kerry; Gueidan, C; Sink, S

    2009-01-01

    We constructed a two-locus database, comprising partial translation elongation factor (EF-1alpha) gene sequences and nearly full-length sequences of the nuclear ribosomal intergenic spacer region (IGS rDNA) for 850 isolates spanning the phylogenetic breadth of the Fusarium oxysporum species compl...... of the IGS rDNA sequences may be non-orthologous. We also evaluated enniatin, fumonisin and moniliformin mycotoxin production in vitro within a phylogenetic framework....

  12. Overview of Historical Earthquake Document Database in Japan and Future Development

    Science.gov (United States)

    Nishiyama, A.; Satake, K.

    2014-12-01

    In Japan, damage and disasters from large historical earthquakes have been documented and preserved. Compilation of historical earthquake documents started in the early 20th century, and 33 volumes of historical document source books (about 27,000 pages) have been published. However, these source books are not effectively utilized by researchers because they are contaminated with low-reliability historical records and are difficult to search by keyword, character or date. To overcome these problems and to promote historical earthquake studies in Japan, construction of text databases started in the 21st century. For historical earthquakes from the beginning of the 7th century to the early 17th century, the "Online Database of Historical Documents in Japanese Earthquakes and Eruptions in the Ancient and Medieval Ages" (Ishibashi, 2009) has already been constructed. They investigated the source books or original texts of historical literature, emended the descriptions, and assigned a reliability to each historical document on the basis of its written age. Another project compiled the historical documents for seven damaging earthquakes that occurred along the Sea of Japan coast in Honshu, central Japan, in the Edo period (from the beginning of the 17th century to the middle of the 19th century) and constructed a text database and a seismic intensity database. These are now available on the web (written only in Japanese). However, only about 9% of the earthquake source books have been digitized so far. Therefore, we plan to digitize all of the remaining historical documents under the research program that started in 2014. The specification of the database will be similar to that of the previous ones. We also plan to combine this database with a liquefaction traces database, which will be constructed by another research program, by adding the location information described in historical documents. The constructed database would be used to estimate the distributions of seismic intensities and tsunami

  13. The SDH mutation database: an online resource for succinate dehydrogenase sequence variants involved in pheochromocytoma, paraganglioma and mitochondrial complex II deficiency

    Directory of Open Access Journals (Sweden)

    Devilee Peter

    2005-11-01

    Background: The SDHA, SDHB, SDHC and SDHD genes encode the subunits of succinate dehydrogenase (succinate: ubiquinone oxidoreductase), a component of both the Krebs cycle and the mitochondrial respiratory chain. SDHA, a flavoprotein, and SDHB, an iron-sulfur protein, together constitute the catalytic domain, while SDHC and SDHD encode membrane anchors that allow the complex to participate in the respiratory chain as complex II. Germline mutations of SDHD and SDHB are a major cause of the hereditary forms of the tumors paraganglioma and pheochromocytoma. The largest subunit, SDHA, is mutated in patients with Leigh syndrome and late-onset optic atrophy, but has not as yet been identified as a factor in hereditary cancer. Description: The SDH mutation database is based on the recently described Leiden Open (source) Variation Database (LOVD) system. The variants currently described in the database were extracted from the published literature and in some cases annotated to conform to current mutation nomenclature. Researchers can also directly submit new sequence variants online. Since the identification of SDHD, SDHC, and SDHB as classic tumor suppressor genes in 2000 and 2001, studies from research groups around the world have identified a total of 120 variants. Here we introduce all reported paraganglioma and pheochromocytoma related sequence variations in these genes, in addition to all reported mutations of SDHA. The database is now accessible online. Conclusion: The SDH mutation database offers a valuable tool and resource for clinicians involved in the treatment of patients with paraganglioma-pheochromocytoma, clinical geneticists needing an overview of current knowledge, and geneticists and other researchers needing a solid foundation for further exploration of both these tumor syndromes and SDHA-related phenotypes.

  14. Next-generation sequencing can reveal in vitro-generated PCR crossover products: some artifactual sequences correspond to HLA alleles in the IMGT/HLA database.

    Science.gov (United States)

    Holcomb, C L; Rastrou, M; Williams, T C; Goodridge, D; Lazaro, A M; Tilanus, M; Erlich, H A

    2014-01-01

    The high-resolution human leukocyte antigen (HLA) genotyping assay that we developed using 454 sequencing and Conexio software uses generic polymerase chain reaction (PCR) primers for DRB exon 2. Occasionally, we observed low abundance DRB amplicon sequences that resulted from in vitro PCR 'crossing over' between DRB1 and DRB3/4/5. These hybrid sequences, revealed by the clonal sequencing property of the 454 system, were generally observed at a read depth of 5%-10% of the true alleles. They usually contained at least one mismatch with the IMGT/HLA database, and consequently, were easily recognizable and did not cause a problem for HLA genotyping. Sometimes, however, these artifactual sequences matched a rare allele and the automatic genotype assignment was incorrect. These observations raised two issues: (1) could PCR conditions be modified to reduce such artifacts? and (2) could some of the rare alleles listed in the IMGT/HLA database be artifacts rather than true alleles? Because PCR crossing over occurs during late cycles of PCR, we compared DRB genotypes resulting from 28 and (our standard) 35 cycles of PCR. For all 21 cell line DNAs amplified for 35 cycles, crossover products were detected. In 33% of the cases, these hybrid sequences corresponded to named alleles. With amplification for only 28 cycles, these artifactual sequences were not detectable. To investigate whether some rare alleles in the IMGT/HLA database might be due to PCR artifacts, we analyzed four samples obtained from the investigators who submitted the sequences. In three cases, the sequences were generated from true alleles. In one case, our 454 sequencing revealed an error in the previously submitted sequence. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  15. Overview of intelligent data retrieval methods for waveforms and images in massive fusion databases

    Energy Technology Data Exchange (ETDEWEB)

    Vega, J. [JET-EFDA, Culham Science Center, OX14 3DB Abingdon (United Kingdom); Asociacion EURATOM/CIEMAT para Fusion, Avda. Complutense 22, 28040 Madrid (Spain)], E-mail: jesus.vega@ciemat.es; Murari, A. [JET-EFDA, Culham Science Center, OX14 3DB Abingdon (United Kingdom); Consorzio RFX-Associazione EURATOM ENEA per la Fusione, I-35127 Padua (Italy); Pereira, A.; Portas, A.; Ratta, G.A.; Castro, R. [JET-EFDA, Culham Science Center, OX14 3DB Abingdon (United Kingdom); Asociacion EURATOM/CIEMAT para Fusion, Avda. Complutense 22, 28040 Madrid (Spain)

    2009-06-15

    The JET database contains more than 42 Tbytes of data (waveforms and images) and doubles in size about every 2 years. The ITER database is expected to be orders of magnitude larger. Therefore, data access in such huge databases can no longer be efficiently based on shot number or temporal interval. Taking into account that diagnostics generate reproducible signal patterns (structural shapes) for similar physical behaviour, high level data access systems can be developed. In these systems, the input parameter is a pattern and the outputs are the shot numbers and the temporal locations where similar patterns appear inside the database. These pattern oriented techniques can be used for first data screening of any type of morphological aspect of waveforms and images. The article shows a new technique to look for similar images in huge databases in a fast and efficient way. Also, previous techniques to search for similar waveforms and to retrieve time-series data or images containing any kind of patterns are reviewed.

  16. Reticulamoeba Is a Long-Branched Granofilosean (Cercozoa) That Is Missing from Sequence Databases

    Science.gov (United States)

    Bass, David; Yabuki, Akinori; Santini, Sébastien; Romac, Sarah; Berney, Cédric

    2012-01-01

    We sequenced the 18S ribosomal RNA gene of seven isolates of the enigmatic marine amoeboflagellate Reticulamoeba Grell, which resolved into four genetically distinct Reticulamoeba lineages, two of which correspond to R. gemmipara Grell and R. minor Grell, another with a relatively large cell body forming lacunae, and another that has similarities to both R. minor and R. gemmipara but with a greater propensity to form cell clusters. These lineages together form a long-branched clade that branches within the cercozoan class Granofilosea (phylum Cercozoa), showing phylogenetic affinities with the genus Mesofila. The basic morphology of Reticulamoeba is a roundish or ovoid cell with a more or less irregular outline. Long and branched reticulopodia radiate from the cell. The reticulopodia bear granules that are bidirectionally motile. There is also a biflagellate dispersal stage. Reticulamoeba is frequently observed in coastal marine environmental samples. PCR primers specific to the Reticulamoeba clade confirm that it is a frequent member of benthic marine microbial communities, and is also found in brackish water sediments and freshwater biofilm. However, so far it has not been found in large molecular datasets such as the nucleotide database in NCBI GenBank, metagenomic datasets in Camera, and the marine microbial eukaryote sampling and sequencing consortium BioMarKs, although closely related lineages can be found in some of these datasets using a highly targeted approach. Therefore, although such datasets are very powerful tools in microbial ecology, they may, for several methodological reasons, fail to detect ecologically and evolutionary key lineages. PMID:23226495

  17. Palingol: a declarative programming language to describe nucleic acids' secondary structures and to scan sequence database.

    Science.gov (United States)

    Billoud, B; Kontic, M; Viari, A

    1996-01-01

    At the DNA/RNA level, biological signals are defined by a combination of spatial structures and sequence motifs. Until now, few attempts had been made in writing general purpose search programs that take into account both sequence and structure criteria. Indeed, the most successful structure scanning programs are usually dedicated to particular structures and are written using general purpose programming languages through a complex and time consuming process where the biological problem of defining the structure and the computer engineering problem of looking for it are intimately intertwined. In this paper, we describe a general representation of structures, suitable for database scanning, together with a programming language, Palingol, designed to manipulate it. Palingol has specific data types, corresponding to structural elements-basically helices-that can be arranged in any way to form a complex structure. As a consequence of the declarative approach used in Palingol, the user should only focus on 'what to search for' while the language engine takes care of 'how to look for it'. Therefore, it becomes simpler to write a scanning program and the structural constraints that define the required structure are more clearly identified. PMID:8628670

  18. Overview of soil phosphorus data from a large international soil database

    NARCIS (Netherlands)

    Batjes, N.H.

    2014-01-01

    An overview of extractable soil phosphorus (P-Bray, P-Olsen, P-Mehlich and P-water) and P-retention data held in a large profile database is presented. The primary aim is to assess whether representative P-values, by broad soil group (FAO system), can be determined for each of these analytical

  19. Determining Clostridium difficile intra-taxa diversity by mining multilocus sequence typing databases.

    Science.gov (United States)

    Muñoz, Marina; Ríos-Chaparro, Dora Inés; Patarroyo, Manuel Alfonso; Ramírez, Juan David

    2017-03-14

    Multilocus sequence typing (MLST) is a highly discriminatory typing strategy; it is reproducible and scalable. There is an MLST scheme for Clostridium difficile (CD), a gram positive bacillus causing different pathologies of the gastrointestinal tract. This work was aimed at describing the frequency of the sequence types (STs) and clades (C) reported and evaluating the intra-taxa diversity in the CD MLST database (CD-MLST-db) using an MLSA approach. Analysis of 1778 available isolates showed that clade 1 (C1) was the most frequent worldwide (57.7%), followed by C2 (29.1%). Regarding sequence types (STs), it was found that ST-1, belonging to C2, was the most frequent. The isolates analysed came from 17 countries, mostly from the United Kingdom (UK) (1541 STs, 87.0%). The diversity of the seven housekeeping genes in the MLST scheme, and of the alleles from the profiles (STs), was evaluated to identify CD population structure. It was found that adk and atpA are conserved genes allowing only a limited number of clusters to be discriminated; however, other genes such as drx, glyA and particularly sodA showed high diversity indexes and grouped CD populations in many clusters, suggesting that these genes' contribution to CD typing should be revised. It was found that the CD STs reported to date have a mostly clonal population structure with foreseen events of recombination; however, one group of STs was not assigned to a clade, being highly different and containing at least nine well-supported clusters, suggesting a greater number of clades for CD. This study shows the usefulness of CD-MLST-db as a tool for studying CD distribution and population structure, identifying the need for reviewing the usefulness of sodA as a housekeeping gene within the MLST scheme and suggesting the existence of a greater number of CD clades. The study also shows the plausible exchange of genetic material between STs, contributing towards intra-taxa genetic diversity.
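The abstract refers to per-gene "diversity indexes" without naming the index used. As one common choice in typing studies, the sketch below computes Simpson's index of diversity in its Hunter-Gaston form for the alleles observed at a single locus. The function name `simpson_diversity` and the use of this particular index are assumptions for illustration, not the authors' stated method.

```python
from collections import Counter


def simpson_diversity(alleles):
    """Simpson's index of diversity (Hunter-Gaston form) for one locus.

    alleles: one allele identifier per isolate.
    Returns 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)), where n_i is the
    number of isolates carrying allele i and N is the number of isolates.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(alleles)
    return 1.0 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
```

A locus where every isolate carries the same allele scores 0, and a locus where every isolate carries a different allele scores 1, so a conserved gene would sit near the low end and a highly variable gene near the high end.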

  20. HIVBrainSeqDB: a database of annotated HIV envelope sequences from brain and other anatomical sites

    Directory of Open Access Journals (Sweden)

    O'Connor Niall

    2010-12-01

    Background: The population of HIV replicating within a host consists of independently evolving and interacting sub-populations that can be genetically distinct within anatomical compartments. HIV replicating within the brain causes neurocognitive disorders in up to 20-30% of infected individuals and is a viral sanctuary site for the development of drug resistance. The primary determinant of HIV neurotropism is macrophage tropism, which is primarily determined by the viral envelope (env) gene. However, studies of genetic aspects of HIV replicating in the brain are hindered because existing repositories of HIV sequences are not focused on neurotropic virus nor annotated with neurocognitive and neuropathological status. To address this need, we constructed the HIV Brain Sequence Database. Results: The HIV Brain Sequence Database is a public database of HIV envelope sequences, directly sequenced from brain and other tissues from the same patients. Sequences are annotated with clinical data including viral load, CD4 count, antiretroviral status, neurocognitive impairment, and neuropathological diagnosis, all curated from the original publication. Tissue source is coded using an anatomical ontology, the Foundational Model of Anatomy, to capture the maximum level of detail available, while maintaining ontological relationships between tissues and their subparts. 44 tissue types are represented within the database, grouped into 4 categories: (i) brain, brainstem, and spinal cord; (ii) meninges, choroid plexus, and CSF; (iii) blood and lymphoid; and (iv) other (bone marrow, colon, lung, liver, etc.). Patient coding is correlated across studies, allowing sequences from the same patient to be grouped to increase statistical power. Using Cytoscape, we visualized relationships between studies, patients and sequences, illustrating interconnections between studies and the varying depth of sequencing, patient number, and tissue representation across studies

  1. Improvements in the HbVar database of human hemoglobin variants and thalassemia mutations for population and sequence variation studies.

    NARCIS (Netherlands)

    G.P. Patrinos (George); B. Giardine (Belinda); C. Riemer (Cathy); W. Miller (Webb); D.H. Chui (David); N.P. Anagnou (Nicholas); H. Wajcman (Henri); R.C. Hardison (Ross)

    2004-01-01

    HbVar (http://globin.cse.psu.edu/globin/hbvar/) is a relational database developed by a multi-center academic effort to provide up-to-date and high quality information on the genomic sequence changes leading to hemoglobin variants and all types of thalassemia and

  2. Supporting parents and parenting: An overview of data-based papers recently published in Contemporary Nurse.

    Science.gov (United States)

    Jackson, Debra; Power, Tamara; Dean, Sue; Potgieter, Ingrid; Cleary, Michelle

    2013-10-02

    Nurses have a crucial role to play in supporting parents and in delivering and referring parents to family-support services. In this editorial, we reflect on papers recently published in Contemporary Nurse. We sought to consider data-based papers on parenting published between 2008 and 2012 and elucidate the roles and potential roles of nurses in enhancing and supporting parenting. Parenting is recognised as a crucial variable for achieving positive outcomes for children (Dawson et al. 2012). Poor, inconsistent or abusive parenting is linked to poor outcomes (Griffin et al. 2000, Holt et al. 2008, Patterson et al. 1989), while consistent and effective parenting is associated with enhanced child outcomes (Lamb 2012, Landry et al. 2001). In addition to being important to outcomes for children, perceived parenting quality is also important to parents themselves. Disrupted relationships between parents and their children have been identified as distressing and potentially damaging to both parties (Jackson 2000; East 2006, 2007; Power 2012).

  3. An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

    Directory of Open Access Journals (Sweden)

    Md. Rezaul Karim

    2012-03-01

    Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
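To make the notion of a "maximal contiguous frequent pattern" concrete, here is a small level-wise baseline sketch. It is not the algorithm proposed in the paper, whose details are not given in this abstract; it simply grows contiguous patterns one base at a time, keeps those present in at least `min_support` sequences, and then discards any pattern contained in a longer frequent pattern. The function names and the definition of support as "number of sequences containing the pattern" are assumptions.

```python
from collections import defaultdict


def frequent_substrings(sequences, min_support):
    """Return {pattern: support} for contiguous patterns found in at least
    min_support of the input sequences. No pattern length is given in advance."""
    frequent = {}
    length = 1
    while True:
        support = defaultdict(set)
        for idx, seq in enumerate(sequences):
            for i in range(len(seq) - length + 1):
                sub = seq[i:i + length]
                # Apriori-style pruning: both (length - 1)-mers must be frequent.
                if length == 1 or (sub[:-1] in frequent and sub[1:] in frequent):
                    support[sub].add(idx)
        level = {p: len(ids) for p, ids in support.items() if len(ids) >= min_support}
        if not level:
            return frequent
        frequent.update(level)
        length += 1


def maximal_patterns(sequences, min_support):
    """Frequent contiguous patterns that have no frequent one-base extension."""
    frequent = frequent_substrings(sequences, min_support)
    # Checking extensions by one base is enough: any longer frequent
    # superstring contains a frequent superstring that is one base longer.
    return sorted(
        p for p in frequent
        if not any(len(q) == len(p) + 1 and p in q for q in frequent)
    )
```

For the three toy sequences `ACGTAC`, `TACGTT` and `GGACGT` with a support threshold of 3, the only maximal pattern is `ACGT`; shorter frequent patterns such as `ACG` or `GT` are subsumed by it.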

  4. UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

    Science.gov (United States)

    Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

    2016-01-04

    The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. FishPathogens.eu/vhsv: a user-friendly viral haemorrhagic septicaemia virus isolate and sequence database

    DEFF Research Database (Denmark)

    Jonstrup, Søren Peter; Gray, Tanya; Kahns, Søren

    2009-01-01

    A database has been created, http://www.FishPathogens.eu, with the aim of providing a single repository for collating important information on significant pathogens of aquaculture, relevant to their control and management. This database will be developed, maintained and managed as part of the European Community Reference Laboratory for Fish Diseases function. This concept has been initially developed for viral haemorrhagic septicaemia virus and will be extended in future to include information on other significant aquaculture pathogens. Information included for each isolate comprises sequence...... to obtain data from any selected part of the genome of interest. The output of the sequence search can be readily retrieved as a FASTA file ready to be imported into a sequence alignment tool of choice, facilitating further molecular epidemiological study....

  6. CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Ussery, David

    2004-01-01

    , these results counts to more than 220 pieces of information. The backbone of this solution consists of a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web-server via PHP4, to present a dynamic web...... and frequent addition of new models are factors that require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently...... content for users outside the center. This solution is tightly fitted to existing server infrastructure and the solutions proposed here can perhaps serve as a template for other research groups to solve database issues....

  7. Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries

    Directory of Open Access Journals (Sweden)

    Rodrigues NB

    2002-01-01

    In the last decade microsatellites have become one of the most useful genetic markers used in a large number of organisms due to their abundance and high level of polymorphism. Microsatellites have been used for individual identification, paternity tests, forensic studies and population genetics. Data on microsatellite abundance comes preferentially from microsatellite enriched libraries and DNA sequence databases. We have conducted a search in GenBank of more than 16,000 Schistosoma mansoni ESTs and 42,000 BAC sequences. In addition, we obtained 300 sequences from CA and AT microsatellite enriched genomic libraries. The sequences were searched for simple repeats using the RepeatMasker software. Of 16,022 ESTs, we detected 481 (3%) sequences that contained 622 microsatellites (434 perfect, 164 imperfect and 24 compound). Of the 481 ESTs, 194 were grouped in 63 clusters containing 2 to 15 ESTs per cluster. Polymorphisms were observed in 16 clusters. The 287 remaining ESTs were orphan sequences. Of the 42,017 BAC end sequences, 1,598 (3.8%) contained microsatellites (2,335 perfect, 287 imperfect and 79 compound). Of the 1,598 BAC end sequences, 80 were grouped into 17 clusters containing 3 to 17 BAC end sequences per cluster. Microsatellites were present in 67 out of 300 sequences from microsatellite enriched libraries (55 perfect, 38 imperfect and 15 compound). From all of the observed loci, 55 were selected for having the longest perfect repeats and flanking regions that allowed the design of primers for PCR amplification. Additionally, we describe two new polymorphic microsatellite loci.
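The study above used RepeatMasker to find simple repeats. As a rough illustration of what a "perfect" microsatellite is, the sketch below swaps in a plain regular expression that reports uninterrupted tandem repeats of a short unit. It does not reproduce RepeatMasker's behaviour and does not detect imperfect or compound repeats. The function name and the thresholds (unit length 2 to 6 bases, at least 5 copies) are assumptions chosen for the example.

```python
import re


def find_perfect_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    A regex stand-in for illustration only; thresholds are assumed values.
    """
    # Lazy unit match so the shortest repeating unit is reported,
    # followed by at least (min_repeats - 1) further copies of that unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits
```

Applied to a sequence containing six copies of CA and five copies of AT, the function returns one hit per repeat with its 0-based start position, unit and copy number; the flanking sequence around each hit is what primer design would then draw on.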

  8. Minimotif Miner 3.0: database expansion and significantly improved reduction of false-positive predictions from consensus sequences.

    Science.gov (United States)

    Mi, Tian; Merlin, Jerlin Camilus; Deverasetty, Sandeep; Gryk, Michael R; Bill, Travis J; Brooks, Andrew W; Lee, Logan Y; Rathnayake, Viraj; Ross, Christian A; Sargeant, David P; Strong, Christy L; Watts, Paula; Rajasekaran, Sanguthevar; Schiller, Martin R

    2012-01-01

    Minimotif Miner (MnM available at http://minimotifminer.org or http://mnm.engr.uconn.edu) is an online database for identifying new minimotifs in protein queries. Minimotifs are short contiguous peptide sequences that have a known function in at least one protein. Here we report the third release of the MnM database which has now grown 60-fold to approximately 300,000 minimotifs. Since short minimotifs are by their nature not very complex we also summarize a new set of false-positive filters and linear regression scoring that vastly enhance minimotif prediction accuracy on a test data set. This online database can be used to predict new functions in proteins and causes of disease.

  9. Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

    Science.gov (United States)

    Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David

    2003-01-01

    Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p... PMID:12618375

  10. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    Directory of Open Access Journals (Sweden)

    Marais Gabriel AB

    2011-07-01

    Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing of cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to

  11. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    Science.gov (United States)

    2011-01-01

    Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a

  12. Cluster based on sequence comparison of homologous proteins of 95 organism species - Gclust Server | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available List Contact us Gclust Server Cluster based on sequence comparison of homologous proteins of 95 organism spe...cies Data detail Data name Cluster based on sequence comparison of homologous proteins of 95 organism specie...istory of This Database Site Policy | Contact Us Cluster based on sequence compariso

  13. Overview of BWR Severe Accident Sequence Analyses at Oak Ridge National Laboratory

    International Nuclear Information System (INIS)

    Hodge, S.A.

    1983-01-01

    Since its inception in October 1980, the Severe Accident Sequence Analysis (SASA) program at Oak Ridge National Laboratory (ORNL) has completed four studies including Station Blackout, Scram Discharge Volume Break, Loss of Decay Heat Removal, and Loss of Injection accident sequences for the Browns Ferry Nuclear Plant. The accident analyses incorporated in a SASA study provide much greater detail than that practically achievable in a Probabilistic Risk Assessment (PRA). When applied to the candidate dominant accident sequences identified by a PRA, the detailed SASA results determine if factors neglected by the PRA would have a significant effect on the order of dominant sequences. Ongoing SASA work at ORNL involves the analysis of Anticipated Transients Without Scram (ATWS) sequences for Browns Ferry

  14. MannDB – A microbial database of automated protein sequence analyses and evidence integration for protein characterization

    Directory of Open Access Journals (Sweden)

    Kuczmarski Thomas A

    2006-10-01

    Background MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. Description MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from GenBank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the GenBank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. Conclusion MannDB comprises a large number of genomes and comprehensive protein

  15. ORFer--retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files.

    Science.gov (United States)

    Büssow, Konrad; Hoffmann, Steve; Sievert, Volker

    2002-12-19

    Functional genomics involves parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite for the cloning and recombinant expression of these proteins. A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and also the completeness of the sequence. The program has a graphical user interface, although it can be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single GenBank GI identifiers or accession numbers, or lists of them. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in FASTA or tab-delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information system (LIMS) with appropriate sequence information.
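
    The step of checking that an open reading frame translates to the annotated protein can be sketched as follows. ORFer itself is a Java program and its code is not shown in the abstract; this Python sketch only illustrates the kind of check described, using the standard genetic code, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(orf):
    """Translate a DNA ORF; unknown codons become 'X', stops become '*'."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    codons = (orf[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def orf_matches_protein(orf, protein):
    """True if the ORF translation (minus a final stop) equals the protein."""
    translated = translate(orf)
    if translated.endswith("*"):
        translated = translated[:-1]
    return translated == protein.upper()


if __name__ == "__main__":
    print(translate("ATGGCCAAATAA"))
    print(orf_matches_protein("ATGGCCAAATAA", "MAK"))
```

    Organisms or organelles that use alternative genetic codes would need a different table; the sketch assumes the standard code throughout.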

  16. TMC-SNPdb: an Indian germline variant database derived from whole exome sequences.

    Science.gov (United States)

    Upadhyay, Pawan; Gardi, Nilesh; Desai, Sanket; Sahoo, Bikram; Singh, Ankita; Togar, Trupti; Iyer, Prajish; Prasad, Ratnam; Chandrani, Pratik; Gupta, Sudeep; Dutt, Amit

    2016-01-01

    Cancer is predominantly a somatic disease. A mutant allele present in a cancer cell genome is considered somatic when it is absent from the paired normal genome and from public SNP databases. The current build of dbSNP, the most comprehensive public SNP database, however, inadequately represents several non-European Caucasian populations, posing a limitation in cancer genomic analyses of data from these populations. We present the Tata Memorial Centre-SNP DataBase (TMC-SNPdb) as the first open source, flexible, upgradable, and freely available SNP database (accessible through dbSNP build 149 and ANNOVAR), representing 114 309 unique germline variants generated from whole exome data of 62 normal samples derived from cancer patients of Indian origin. The TMC-SNPdb is presented with a companion subtraction tool that can be executed with a command line option or using an easy-to-use graphical user interface, with the ability to deplete additional Indian population specific SNPs over and above the dbSNP and 1000 Genomes databases. Using an institutionally generated whole exome data set of 132 samples of Indian origin, we demonstrate that TMC-SNPdb could deplete 42, 33 and 28% false positive somatic events post dbSNP depletion in Indian origin tongue, gallbladder, and cervical cancer samples, respectively. Beyond cancer somatic analyses, we anticipate utility of the TMC-SNPdb in several Mendelian germline diseases. In addition to dbSNP build 149 and ANNOVAR, the TMC-SNPdb along with the subtraction tool is available for download in the public domain at the following Database URL: http://www.actrec.gov.in/pi-webpages/AmitDutt/TMCSNP/TMCSNPdp.html. © The Author(s) 2016. Published by Oxford University Press.
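
    The subtraction idea, removing candidate somatic calls that are already known as germline variants, amounts to a set difference. The sketch below is not the TMC-SNPdb companion tool and does not use its file formats or options; the variant tuples, the function name and the toy data are assumptions made for illustration.

```python
def subtract_germline(candidates, *germline_sets):
    """Keep candidate variants that are absent from every germline set.

    Each variant is a hypothetical (chrom, pos, ref, alt) tuple.
    """
    known = set().union(*germline_sets)
    return [variant for variant in candidates if variant not in known]


if __name__ == "__main__":
    # Invented toy data standing in for somatic calls and two databases.
    calls = [
        ("chr1", 1000, "A", "G"),
        ("chr2", 2000, "C", "T"),
        ("chr3", 3000, "G", "A"),
        ("chr4", 4000, "T", "C"),
    ]
    general_db = {("chr1", 1000, "A", "G")}
    population_db = {("chr3", 3000, "G", "A")}

    after_general = subtract_germline(calls, general_db)
    after_both = subtract_germline(calls, general_db, population_db)
    depleted = 100.0 * (len(after_general) - len(after_both)) / len(after_general)
    print(after_both)
    print(round(depleted, 1))
```

    Matching on exact tuples assumes that all inputs are normalized to the same reference build and allele representation.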

  17. Polymorphisms and resistance mutations of hepatitis C virus on sequences in the European hepatitis C virus database

    Science.gov (United States)

    Kliemann, Dimas Alexandre; Tovo, Cristiane Valle; da Veiga, Ana Beatriz Gorini; de Mattos, Angelo Alves; Wood, Charles

    2016-01-01

    AIM To evaluate the occurrence of resistance mutations in treatment-naïve hepatitis C virus (HCV) sequences deposited in the European hepatitis C virus database (euHCVdb). METHODS The sequences were downloaded from the euHCVdb (https://euhcvdb.ibcp.fr/euHCVdb/). The search was performed for full-length NS3 protease, NS5A and NS5B polymerase sequences of HCV, separated by genotypes 1a, 1b, 2a, 2b and 3a, and resulted in 798 NS3, 708 NS5A and 535 NS5B sequences from HCV genotypes 1a, 1b, 2a, 2b and 3a, after the exclusion of sequences containing errors and/or gaps or incomplete sequences, and sequences from patients previously treated with direct antiviral agents (DAA). The sequence alignment was performed with MEGA 6.06 MAC and the resulting protein sequences were then analyzed using BioEdit 7.2.5 for mutations associated with resistance. Only positions that have been described as being associated with failure in treatment in in vivo studies, and/or as conferring a more than 2-fold change in replication in comparison to the wildtype reference strain in in vitro phenotypic assays were included in the analysis. RESULTS The Q80K variant in the NS3 gene was the most prevalent mutation, being found in 44.66% of subtype 1a and 0.25% of subtype 1b. Other frequent mutations observed in more than 2% of the NS3 sequences were: I170V (3.21%) in genotype 1a, and Y56F (15.93%), V132I (23.28%) and I170V (65.20%) in genotype 1b. For the NS5A, 2.21% of the genotype 1a sequences have the P58S mutation, 5.95% of genotype 1b sequences have the R30Q mutation, 15.79% of subtype 2a sequences have the Q30R mutation, 23.08% of subtype 2b sequences have an L31M mutation, and in subtype 3a sequences, 23.08% have the M31L resistance variant. For the NS5B, the V321L RAV was identified in 0.60% of genotype 1a and in 0.32% of genotype 1b sequences, and the N142T variant was observed in 0.32% of subtype 1b sequences. The C316Y, S556G, D559N RAV were identified in 0.33%, 7.82% and 0.32% of

  18. Polymorphisms and resistance mutations of hepatitis C virus on sequences in the European hepatitis C virus database.

    Science.gov (United States)

    Kliemann, Dimas Alexandre; Tovo, Cristiane Valle; da Veiga, Ana Beatriz Gorini; de Mattos, Angelo Alves; Wood, Charles

    2016-10-28

    To evaluate the occurrence of resistance mutations in treatment-naïve hepatitis C virus (HCV) sequences deposited in the European hepatitis C virus database (euHCVdb). The sequences were downloaded from the euHCVdb (https://euhcvdb.ibcp.fr/euHCVdb/). The search was performed for full-length NS3 protease, NS5A and NS5B polymerase sequences of HCV, separated by genotypes 1a, 1b, 2a, 2b and 3a, and resulted in 798 NS3, 708 NS5A and 535 NS5B sequences from HCV genotypes 1a, 1b, 2a, 2b and 3a, after the exclusion of sequences containing errors and/or gaps or incomplete sequences, and sequences from patients previously treated with direct antiviral agents (DAA). The sequence alignment was performed with MEGA 6.06 MAC and the resulting protein sequences were then analyzed using BioEdit 7.2.5 for mutations associated with resistance. Only positions that have been described as being associated with failure in treatment in in vivo studies, and/or as conferring a more than 2-fold change in replication in comparison to the wildtype reference strain in in vitro phenotypic assays were included in the analysis. The Q80K variant in the NS3 gene was the most prevalent mutation, being found in 44.66% of subtype 1a and 0.25% of subtype 1b. Other frequent mutations observed in more than 2% of the NS3 sequences were: I170V (3.21%) in genotype 1a, and Y56F (15.93%), V132I (23.28%) and I170V (65.20%) in genotype 1b. For the NS5A, 2.21% of the genotype 1a sequences have the P58S mutation, 5.95% of genotype 1b sequences have the R30Q mutation, 15.79% of subtype 2a sequences have the Q30R mutation, 23.08% of subtype 2b sequences have an L31M mutation, and in subtype 3a sequences, 23.08% have the M31L resistance variant. For the NS5B, the V321L RAV was identified in 0.60% of genotype 1a and in 0.32% of genotype 1b sequences, and the N142T variant was observed in 0.32% of subtype 1b sequences. The C316Y, S556G, D559N RAV were identified in 0.33%, 7.82% and 0.32% of genotype 1b sequences
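
    Prevalence figures of the kind reported above (for example, the share of sequences carrying Q80K) can be computed from aligned protein sequences with a few lines of code. The study used MEGA and BioEdit; the sketch below is only an illustration of the counting step, with hypothetical function names and invented toy sequences, and it assumes that alignment columns already correspond to reference positions.

```python
import re


def parse_substitution(label):
    """Split a label such as 'Q80K' into (wild_type, position, variant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError("not a substitution label: %r" % label)
    return match.group(1), int(match.group(2)), match.group(3)


def variant_prevalence(aligned_seqs, label):
    """Percent of sequences carrying the variant residue at the position.

    Sequences that are too short or gapped ('-') at that position are
    left out of the denominator.
    """
    _, position, variant = parse_substitution(label)
    index = position - 1  # labels are 1-based
    residues = [s[index] for s in aligned_seqs if len(s) > index and s[index] != "-"]
    if not residues:
        return 0.0
    return 100.0 * residues.count(variant) / len(residues)


if __name__ == "__main__":
    toy_alignment = ["MAQT", "MAKT", "MAQT", "MAQT"]  # invented sequences
    print(variant_prevalence(toy_alignment, "Q3K"))
```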

  19. Methods for decoding Cas9 protospacer adjacent motif (PAM) sequences: A brief overview.

    Science.gov (United States)

    Karvelis, Tautvydas; Gasiunas, Giedrius; Siksnys, Virginijus

    2017-05-15

    Recently Cas9, an RNA-guided DNA endonuclease, has emerged as a powerful tool for targeted genome manipulations. The Cas9 protein can be reprogrammed to cleave, bind or nick any DNA target by simply changing the crRNA sequence; however, a short nucleotide sequence, termed the PAM, is required to initiate crRNA hybridization to the DNA target. The PAM sequence is recognized by the Cas9 protein and must be determined experimentally for each Cas9 variant. Exploration of Cas9 orthologs could offer a diversity of PAM sequences and novel biochemical properties that may be beneficial for genome editing applications. Here we briefly review and compare Cas9 PAM identification assays that can be adopted for other PAM-dependent CRISPR-Cas systems. Copyright © 2017 Elsevier Inc. All rights reserved.
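
    Once a PAM is known, candidate target sites can be located by scanning a sequence for protospacers immediately followed by that PAM. The sketch below illustrates this downstream use only; it is not one of the PAM identification assays reviewed in the article. It assumes a PAM located 3' of the protospacer, scans the forward strand only, and defaults to NGG (the PAM commonly reported for Streptococcus pyogenes Cas9) with a 20-nt protospacer; the function name is hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_targets(dna, pam="NGG", protospacer_len=20):
    """Return (start, protospacer, pam) for forward-strand candidate sites."""
    dna = dna.upper()
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (protospacer_len, pam_regex))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    for site in find_targets("ACGTACGTAGGCC", pam="NGG", protospacer_len=4):
        print(site)
```

    A fuller search would also scan the reverse complement; that is left out here to keep the sketch short.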

  20. ChickVD: a sequence variation database for the chicken genome

    DEFF Research Database (Denmark)

    Wang, Jing; He, Ximiao; Ruan, Jue

    2005-01-01

    Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DN...... on quantitative trait loci using data from collaborating institutions and public resources. Our data can be queried by search engine and homology-based BLAST searches. ChickVD is publicly accessible at http://chicken.genomics.org.cn. Publication date: 2005-Jan-1...

  1. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999

    Energy Technology Data Exchange (ETDEWEB)

    Harger, Carol A.

    1999-10-28

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, based upon statistical inferences, the location of matrix attachment regions (MARs) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.]

  2. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999; FINAL

    International Nuclear Information System (INIS)

    Harger, Carol A.

    1999-01-01

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, based upon statistical inferences, the location of matrix attachment regions (MARs) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.]

  3. Performance of Correspondence Algorithms in Vision-Based Driver Assistance Using an Online Image Sequence Database

    DEFF Research Database (Denmark)

    Klette, Reinhard; Krüger, Norbert; Vaudrey, Tobi

    2011-01-01

    the classification of recorded video data into situations defined by a cooccurrence of some events in recorded traffic scenes. About 100-400 stereo frames (or 4-16 s of recording) are considered a basic sequence, which will be identified with one particular situation. Future testing is expected to be on data...

  4. Amino acid sequences of predicted proteins and their annotation for 95 organism species. - Gclust Server | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available List Contact us Gclust Server Amino acid sequences of predicted proteins and their annotation for 95 organis...m species. Data detail Data name Amino acid sequences of predicted proteins and their annotation for 95 orga...nism species. DOI 10.18908/lsdba.nbdc00464-001 Description of data contents Amino acid sequences of predicted proteins...Database Description Download License Update History of This Database Site Policy | Contact Us Amino acid sequences of predicted prot...eins and their annotation for 95 organism species. - Gclust Server | LSDB Archive ...

  5. Sequence Classification - TMBETA-GENOME | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available ansmembrane helical proteins by applying statistical and machine learning methods to each amino acid sequenc.... Amino Acid Result of predicting β-barrel membrane protein with a statistical method using amino acid compo...sition. ( TMBETADISC-COMP ) Dipeptide Result of predicting β-barrel membrane protein with a statistic...ting β-barrel membrane protein with a statistical method using motifs. ( TMBETADISC-MOTIF ) SVM Result of pr

  6. PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences

    OpenAIRE

    Lescot, Magali; Déhais, Patrice; Thijs, Gert; Marchal, Kathleen; Moreau, Yves; Van de Peer, Yves; Rouzé, Pierre; Rombauts, Stephane

    2002-01-01

    PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific t...

  7. Identification of Anhydrobiosis-related Genes from an Expressed Sequence Tag Database in the Cryptobiotic Midge Polypedilum vanderplanki (Diptera; Chironomidae)

    Science.gov (United States)

    Cornette, Richard; Kanamori, Yasushi; Watanabe, Masahiko; Nakahara, Yuichi; Gusev, Oleg; Mitsumasu, Kanako; Kadono-Okuda, Keiko; Shimomura, Michihiko; Mita, Kazuei; Kikawada, Takahiro; Okuda, Takashi

    2010-01-01

    Some organisms are able to survive the loss of almost all their body water content, entering a latent state known as anhydrobiosis. The sleeping chironomid (Polypedilum vanderplanki) lives in the semi-arid regions of Africa, and its larvae can survive desiccation in an anhydrobiotic form during the dry season. To unveil the molecular mechanisms of this resistance to desiccation, an anhydrobiosis-related Expressed Sequence Tag (EST) database was obtained from the sequences of three cDNA libraries constructed from P. vanderplanki larvae after 0, 12, and 36 h of desiccation. The database contained 15,056 ESTs distributed into 4,807 UniGene clusters. ESTs were classified according to gene ontology categories, and putative expression patterns were deduced for all clusters on the basis of the number of clones in each library; expression patterns were confirmed by real-time PCR for selected genes. Among up-regulated genes, antioxidants, late embryogenesis abundant (LEA) proteins, and heat shock proteins (Hsps) were identified as important groups for anhydrobiosis. Genes related to trehalose metabolism and various transporters were also strongly induced by desiccation. Those results suggest that the oxidative stress response plays a central role in successful anhydrobiosis. Similarly, protein denaturation and aggregation may be prevented by marked up-regulation of Hsps and the anhydrobiosis-specific LEA proteins. A third major feature is the predicted increase in trehalose synthesis and in the expression of various transporter proteins allowing the distribution of trehalose and other solutes to all tissues. PMID:20833722

  8. Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

    Science.gov (United States)

    Caporale, Lynn Helena

    2012-09-01

    This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.

  9. PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools

    Directory of Open Access Journals (Sweden)

    Adeel Malik

    2010-01-01

    Understanding of the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins) as well as noncovalently (protein-carbohydrate complexes) is essential to many biological processes and plays a significant role in normal and disease-associated functions. It is important to have a central repository of knowledge available about these protein-carbohydrate complexes as well as preprocessed data of predicted structures. This can be significantly enhanced by de novo tools which can predict carbohydrate-binding sites for proteins in the absence of the structure of an experimentally known binding site. PROCARB is an open-access database comprising three independently working components, namely, (i) Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from the Protein Data Bank (PDB), (ii) Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins of unknown three-dimensional structure, and (iii) CBS-Pred prediction module, consisting of web servers to predict carbohydrate-binding sites using single sequence or server-generated PSSM. Several precomputed structural and functional properties of complexes are also included in the database for quick analysis. In particular, information about function, secondary structure, solvent accessibility, hydrogen bonds and literature reference, and so forth, is included. In addition, each protein in the database is mapped to Uniprot, Pfam, PDB, and so forth.

  10. The Danish STR sequence database: duplicate typing of 363 Danes with the ForenSeq™ DNA Signature Prep Kit.

    Science.gov (United States)

    Hussing, C; Bytyci, R; Huber, C; Morling, N; Børsting, C

    2018-05-24

    Some STR loci have internal sequence variations, which are not revealed by the standard STR typing methods used in forensic genetics (PCR and fragment length analysis by capillary electrophoresis (CE)). Typing of STRs with next-generation sequencing (NGS) uncovers the sequence variation in the repeat region and in the flanking regions. In this study, 363 Danish individuals were typed for 56 STRs (26 autosomal STRs, 24 Y-STRs, and 6 X-STRs) using the ForenSeq™ DNA Signature Prep Kit to establish a Danish STR sequence database. Increased allelic diversity was observed in 34 STRs by the PCR-NGS assay. The largest increases were found in DYS389II and D12S391, where the numbers of sequenced alleles were around four times larger than the numbers of alleles determined by repeat length alone. Thirteen SNPs and one InDel were identified in the flanking regions of 12 STRs. Furthermore, 36 single positions and five longer stretches in the STR flanking regions were found to have dubious genotyping quality. The combined match probability of the 26 autosomal STRs was 10,000 times larger using the PCR-NGS assay than by using PCR-CE. The typical paternity indices for trios and duos were 500 and 100 times larger, respectively, than those obtained with PCR-CE. The assay also amplified 94 SNPs selected for human identification. Eleven of these loci were not in Hardy-Weinberg equilibrium in the Danish population, most likely because the minimum threshold for allele calling (30 reads) in the ForenSeq™ Universal Analysis Software was too low and frequent allele dropouts were not detected.
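
    The increase in allelic diversity seen with sequencing comes from alleles that share a length but differ in sequence. A minimal sketch of that comparison follows; it is not part of the ForenSeq workflow, the function name is hypothetical, and the repeat motifs and alleles are invented toy data rather than observed Danish alleles.

```python
def allele_counts(repeat_regions):
    """Count distinct alleles by length (CE-style) and by sequence (NGS-style).

    repeat_regions: iterable of repeat-region sequences observed at one locus.
    Returns (alleles_by_length, alleles_by_sequence).
    """
    sequences = set(repeat_regions)
    lengths = {len(seq) for seq in sequences}
    return len(lengths), len(sequences)


if __name__ == "__main__":
    # Toy compound repeats: three alleles share one length, one is longer.
    observed = [
        "AGAT" * 11 + "AGAC" * 7,
        "AGAT" * 12 + "AGAC" * 6,
        "AGAT" * 10 + "AGAC" * 8,
        "AGAT" * 12 + "AGAC" * 7,
    ]
    by_length, by_sequence = allele_counts(observed)
    print(by_length, by_sequence)
```

    In the toy data, fragment length alone distinguishes two alleles while the sequences distinguish four.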

  11. Relational databases

    CERN Document Server

    Bell, D A

    1986-01-01

    Relational Databases explores the major advances in relational databases and provides a balanced analysis of the state of the art in relational databases. Topics covered include capture and analysis of data placement requirements; distributed relational database systems; data dependency manipulation in database schemata; and relational database support for computer graphics and computer aided design. This book is divided into three sections and begins with an overview of the theory and practice of distributed systems, using the example of INGRES from Relational Technology as illustration. The

  12. Translational database selection and multiplexed sequence capture for up front filtering of reliable breast cancer biomarker candidates.

    Directory of Open Access Journals (Sweden)

    Patrik L Ståhl

    Biomarker identification is of utmost importance for the development of novel diagnostics and therapeutics. Here we make use of a translational database selection strategy, utilizing data from the Human Protein Atlas (HPA) on differentially expressed protein patterns in healthy and breast cancer tissues as a means to filter out potential biomarkers for underlying genetic causatives of the disease. DNA was isolated from ten breast cancer biopsies, and the protein coding and flanking non-coding genomic regions corresponding to the selected proteins were extracted in a multiplexed format from the samples using a single DNA sequence capture array. Deep sequencing revealed an even enrichment of the multiplexed samples and a great variation of genetic alterations in the tumors of the sampled individuals. Benefiting from the upstream filtering method, the final set of biomarker candidates could be completely verified through bidirectional Sanger sequencing, revealing a 40 percent false positive rate despite high read coverage. Of the variants encountered in translated regions, nine novel non-synonymous variations were identified and verified, two of which were present in more than one of the ten tumor samples.

  13. Practical Value of Food Pathogen Traceability through Building a Whole-Genome Sequencing Network and Database.

    Science.gov (United States)

    Allard, Marc W; Strain, Errol; Melka, David; Bunning, Kelly; Musser, Steven M; Brown, Eric W; Timme, Ruth

    2016-08-01

    The FDA has created a United States-based open-source whole-genome sequencing network of state, federal, international, and commercial partners. The GenomeTrakr network represents a first-of-its-kind distributed genomic food shield for characterizing and tracing foodborne outbreak pathogens back to their sources. The GenomeTrakr network is leading investigations of outbreaks of foodborne illnesses and compliance actions with more accurate and rapid recalls of contaminated foods as well as more effective monitoring of preventive controls for food manufacturing environments. An expanded network would serve to provide an international rapid surveillance system for pathogen traceback, which is critical to support an effective public health response to bacterial outbreaks. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  14. Overview of recurrent chromosomal losses in retinoblastoma detected by low coverage next generation sequencing

    Science.gov (United States)

    García-Chequer, A.J.; Méndez-Tenorio, A.; Olguín-Ruiz, G.; Sánchez-Vallejo, C.; Isa, P.; Arias, C.F.; Torres, J.; Hernández-Angeles, A.; Ramírez-Ortiz, M.A.; Lara, C.; Cabrera-Muñoz, M.L.; Sadowinski-Pine, S.; Bravo-Ortiz, J.C.; Ramón-García, G.; Diegopérez-Ramírez, J.; Ramírez-Reyes, G.; Casarrubias-Islas, R.; Ramírez, J.; Orjuela, M.A.; Ponce-Castañeda, M.V.

    2016-01-01

    Genes are frequently lost or gained in malignant tumors and the analysis of these changes can be informative about the underlying tumor biology. Retinoblastoma is a pediatric intraocular malignancy, and since deletions in chromosome 13 have been described in this tumor, we performed genome wide sequencing with the Illumina platform to test whether recurrent losses could be detected in low coverage data from DNA pools of Rb cases. An in silico reference profile for each pool was created from the human genome sequence GRCh37p5; a chromosome integrity score and a graphics 40 Kb window analysis approach allowed us to identify, with high resolution, previously reported non-random recurrent losses in all chromosomes of these tumors. We also found a pattern of gains and losses associated with clear and dark cytogenetic bands, respectively. We further analyzed a pool of medulloblastoma and found a more stable genomic profile and previously reported losses in this tumor. This approach facilitates identification of recurrent deletions from many patients that may be biologically relevant for tumor development. PMID:26883451
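
    The windowed analysis described in this abstract can be illustrated with a minimal sketch. It assumes read start positions are already available as integers for one chromosome, and it uses a depth-normalized log2 ratio against a reference profile. The function names, the pseudocount, and the -0.5 threshold are illustrative assumptions, not the authors' chromosome integrity score.

```python
from collections import Counter
import math

WINDOW = 40_000  # 40 kb windows, the size named in the abstract


def window_counts(read_starts, window=WINDOW):
    """Count reads per fixed-size window from 0-based start positions."""
    return Counter(pos // window for pos in read_starts)


def log2_ratios(sample, reference, pseudocount=1.0):
    """Per-window log2(sample / reference) after scaling both to equal depth."""
    s_total = sum(sample.values())
    r_total = sum(reference.values())
    ratios = {}
    for w in sorted(set(sample) | set(reference)):
        s = (sample.get(w, 0) + pseudocount) / s_total
        r = (reference.get(w, 0) + pseudocount) / r_total
        ratios[w] = math.log2(s / r)
    return ratios


def candidate_losses(ratios, threshold=-0.5):
    """Windows whose ratio falls at or below an (assumed) loss threshold."""
    return [w for w, value in ratios.items() if value <= threshold]


# Toy data: the sample has reduced coverage in the second window only.
reference = window_counts([w * WINDOW + i for w in range(3) for i in range(100)])
sample = window_counts(
    list(range(100))
    + [WINDOW + i for i in range(40)]
    + [2 * WINDOW + i for i in range(100)]
)
losses = candidate_losses(log2_ratios(sample, reference))
```

    In pooled low coverage data, a window flagged this way would only be a candidate; recurrence across pools would still need to be checked separately.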

  15. CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

    Science.gov (United States)

    Park, Byung H; Karpinets, Tatiana V; Syed, Mustafa H; Leuze, Michael R; Uberbacher, Edward C

    2010-12-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
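
    As a rough illustration of the second approach, the sketch below derives single-domain rules of the form "domain implies family" using support and confidence, the two standard measures in association rule learning. The record layout, the identifiers (`domA`, `famX`, and so on), and the thresholds are hypothetical; the actual CAT implementation may differ.

```python
from collections import Counter


def domain_family_rules(records, min_support=2, min_confidence=0.8):
    """Return {(domain, family): confidence} for rules passing both thresholds.

    records: iterable of (set_of_domains, family) pairs, one per protein.
    """
    domain_count = Counter()
    pair_count = Counter()
    for domains, family in records:
        for d in domains:
            domain_count[d] += 1
            pair_count[(d, family)] += 1

    rules = {}
    for (d, family), n in pair_count.items():
        confidence = n / domain_count[d]  # P(family | domain)
        if n >= min_support and confidence >= min_confidence:
            rules[(d, family)] = confidence
    return rules


# Hypothetical training records standing in for annotated database entries.
records = [
    ({"domA"}, "famX"),
    ({"domA", "domB"}, "famX"),
    ({"domB"}, "famY"),
    ({"domB"}, "famZ"),
]
rules = domain_family_rules(records)
```

    A new protein carrying `domA` could then be assigned `famX` without a similarity search, which is how the two approaches can complement each other.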

  16. Characterization and compilation of polymorphic simple sequence repeat (SSR) markers of peanut from public database

    Directory of Open Access Journals (Sweden)

    Zhao Yongli

    2012-07-01

    Full Text Available Background: There are several reports describing thousands of SSR markers in the peanut (Arachis hypogaea L.) genome. There is a need to integrate various research reports of peanut DNA polymorphism into a single platform. Further, because of lack of uniformity in the labeling of these markers across the publications, there is some confusion on the identities of many markers. We describe below an effort to develop a central comprehensive database of polymorphic SSR markers in peanut. Findings: We compiled 1,343 SSR markers as detecting polymorphism (14.5%) within a total of 9,274 markers. Amongst all polymorphic SSRs examined, we found that the AG motif (36.5%) was the most abundant, followed by AAG (12.1%), AAT (10.9%), and AT (10.3%). The mean length of SSR repeats in dinucleotide SSRs was significantly longer than that in trinucleotide SSRs. Dinucleotide SSRs showed higher polymorphism frequency for genomic SSRs when compared to trinucleotide SSRs, while for EST-SSRs, the frequency of polymorphic SSRs was higher in trinucleotide SSRs than in dinucleotide SSRs. The correlation of the length of SSR and the frequency of polymorphism revealed that the frequency of polymorphism was decreased as motif repeat number increased. Conclusions: The assembled polymorphic SSRs would enhance the density of the existing genetic maps of peanut, which could also be a useful source of DNA markers suitable for high-throughput QTL mapping and marker-assisted selection in peanut improvement and thus would be of value to breeders.

  17. The PAZAR database of gene regulatory information coupled to the ORCA toolkit for the study of regulatory sequences

    Science.gov (United States)

    Portales-Casamar, Elodie; Arenillas, David; Lim, Jonathan; Swanson, Magdalena I.; Jiang, Steven; McCallum, Anthony; Kirov, Stefan; Wasserman, Wyeth W.

    2009-01-01

    The PAZAR database unites independently created and maintained data collections of transcription factor and regulatory sequence annotation. The flexible PAZAR schema permits the representation of diverse information derived from experiments ranging from biochemical protein–DNA binding to cellular reporter gene assays. Data collections can be made available to the public, or restricted to specific system users. The data ‘boutiques’ within the shopping-mall-inspired system facilitate the analysis of genomics data and the creation of predictive models of gene regulation. Since its initial release, PAZAR has grown in terms of data, features and through the addition of an associated package of software tools called the ORCA toolkit (ORCAtk). ORCAtk allows users to rapidly develop analyses based on the information stored in the PAZAR system. PAZAR is available at http://www.pazar.info. ORCAtk can be accessed through convenient buttons located in the PAZAR pages or via our website at http://www.cisreg.ca/ORCAtk. PMID:18971253

  18. Functional role of bacteriophage transfer RNAs: codon usage analysis of genomic sequences stored in the GENBANK/EMBL/DDBJ databases

    Directory of Open Access Journals (Sweden)

    T Kunisawa

    2006-01-01

    Full Text Available Complete genomic sequence data are stored in the public GenBank/EMBL/DDBJ databases so that any investigator can make use of the data. This report describes a comparative analysis of codon usage that is impossible without such a public and open data system. A limited number of bacteriophages harbor their own transfer RNAs. Based on a comparison between T4 phage-encoded tRNA species and the relative cellular amounts of host Escherichia coli tRNAs, it is hypothesized that T4 tRNAs could serve to supplement host isoacceptor tRNA species that are present in minor amounts and thus enhance the translational efficiency of phage proteins. When compared to their respective host bacteria, the codon usage data of bacteriophages D3, φC31, HP1, D29 and 933W all show an increased frequency of synonymous codons or amino acids that correspond to phage tRNA species, suggesting their supplemental role in the efficient production of phage proteins. The data-analysis presents an example in which the availability of an open and fully accessible database system would allow one to obtain comprehensive insights into a fundamental problem in molecular biology.
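
    A comparison of this kind starts from per-codon counts. The sketch below counts codons in a coding sequence and computes the relative usage within one family of synonymous codons; the same function could be applied to phage and host sequences and the fractions compared. The toy sequence and the function names are illustrative assumptions, not the author's pipeline.

```python
from collections import Counter

# The six leucine codons of the standard genetic code.
LEUCINE_CODONS = ("CTT", "CTC", "CTA", "CTG", "TTA", "TTG")


def codon_counts(cds):
    """Count in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def relative_usage(counts, synonymous):
    """Fraction of each codon within a family of synonymous codons."""
    total = sum(counts[c] for c in synonymous)
    if total == 0:
        return {c: 0.0 for c in synonymous}
    return {c: counts[c] / total for c in synonymous}


# Toy coding sequence: ATG CTG CTG CTA TAA
counts = codon_counts("ATGCTGCTGCTATAA")
leu_usage = relative_usage(counts, LEUCINE_CODONS)
```

    A codon whose fraction is higher in the phage than in the host, and which is read by a phage-encoded tRNA, would be the pattern the abstract describes.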

  19. Identification and Removal of Contaminant Sequences From Ribosomal Gene Databases: Lessons From the Census of Deep Life.

    Science.gov (United States)

    Sheik, Cody S; Reese, Brandi Kiel; Twing, Katrina I; Sylvan, Jason B; Grim, Sharon L; Schrenk, Matthew O; Sogin, Mitchell L; Colwell, Frederick S

    2018-01-01

    Earth's subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium, Aquabacterium, Ralstonia, and Acinetobacter, while the top five most frequently observed genera were Pseudomonas, Propionibacterium, Acinetobacter, Ralstonia, and Sphingomonas. The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process. Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA

  20. Unlimited Thirst for Genome Sequencing, Data Interpretation, and Database Usage in Genomic Era: The Road towards Fast-Track Crop Plant Improvement

    Directory of Open Access Journals (Sweden)

    Arun Prabhu Dhanapal

    2015-01-01

    Full Text Available The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next generation sequencing methods. Databases have become an integral part of all aspects of science research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, has kept expanding in recent years. The databases and associated web portals provide at a minimum a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in dealing with crop plant database utilization in the advancing genomic era. The utilization of databases for variation analysis with other comparative genomics tools, and data interpretation platforms are well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement programs is still being achieved widely; otherwise, the end for sequencing is not far away.

  1. MECP2 variation in Rett syndrome-An overview of current coverage of genetic and phenotype data within existing databases.

    Science.gov (United States)

    Townend, Gillian S; Ehrhart, Friederike; van Kranen, Henk J; Wilkinson, Mark; Jacobsen, Annika; Roos, Marco; Willighagen, Egon L; van Enckevort, David; Evelo, Chris T; Curfs, Leopold M G

    2018-04-27

    Rett syndrome (RTT) is a monogenic rare disorder that causes severe neurological problems. In most cases, it results from a loss-of-function mutation in the gene encoding methyl-CpG-binding protein 2 (MECP2). Currently, about 900 unique MECP2 variations (benign and pathogenic) have been identified and it is suspected that the different mutations contribute to different levels of disease severity. For researchers and clinicians, it is important that genotype-phenotype information is available to identify disease-causing mutations for diagnosis, to aid in clinical management of the disorder, and to provide counseling for parents. In this study, 13 genotype-phenotype databases were surveyed for their general functionality and availability of RTT-specific MECP2 variation data. For each database, we investigated findability and interoperability alongside practical user functionality, and type and amount of genetic and phenotype data. The main conclusions are that, as well as it being challenging to find these databases and the specific MECP2 variants held within, interoperability is as yet poorly developed and searching across databases requires effort. Nevertheless, we found several thousand online database entries for MECP2 variations and their associated phenotypes, diagnosis, or predicted variant effects, which is a good starting point for researchers and clinicians who want to provide, annotate, and use the data. © 2018 The Authors. Human Mutation published by Wiley Periodicals, Inc.

  2. MerCat: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from metagenomic and/or metatranscriptomic sequencing data

    Energy Technology Data Exchange (ETDEWEB)

    White, Richard A.; Panyala, Ajay R.; Glass, Kevin A.; Colby, Sean M.; Glaesemann, Kurt R.; Jansson, Georg C.; Jansson, Janet K.

    2017-02-21

    MerCat is a parallel, highly scalable and modular property software package for robust analysis of features in next-generation sequencing data. MerCat inputs include assembled contigs and raw sequence reads from any platform resulting in feature abundance counts tables. MerCat allows for direct analysis of data properties without reference sequence database dependency commonly used by search tools such as BLAST and/or DIAMOND for compositional analysis of whole community shotgun sequencing (e.g. metagenomes and metatranscriptomes).
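
    The core idea of reference-free k-mer analysis can be sketched in a few lines. The example below counts k-mers across reads or contigs and summarizes the resulting abundance table with a Shannon index. It is a single-threaded illustration under assumed names and does not reproduce MerCat's interface, parallelism, or its specific diversity estimators.

```python
from collections import Counter
import math

DNA = set("ACGT")


def count_kmers(sequences, k):
    """Count all k-mers over the sequences, skipping k-mers with ambiguous bases."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= DNA:
                counts[kmer] += 1
    return counts


def shannon_diversity(counts):
    """Shannon index (natural log) of a k-mer abundance table."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


kmers = count_kmers(["ACGTAC", "acNGT"], k=2)
diversity = shannon_diversity(kmers)
```

    Because nothing here consults a reference database, the same table can be built for any sample, which is the property the abstract emphasizes.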

  3. PineElm_SSRdb: a microsatellite marker database identified from genomic, chloroplast, mitochondrial and EST sequences of pineapple (Ananas comosus (L.) Merrill).

    Science.gov (United States)

    Chaudhary, Sakshi; Mishra, Bharat Kumar; Vivek, Thiruvettai; Magadum, Santoshkumar; Yasin, Jeshima Khan

    2016-01-01

    Simple Sequence Repeats (SSRs), or microsatellites, are resourceful molecular genetic markers. There are only a few reports of SSR identification and development in pineapple. The complete genome sequence of pineapple available in the public domain can be used to develop numerous novel SSRs. Therefore, an attempt was made to identify SSRs from genomic, chloroplast, mitochondrial and EST sequences of pineapple which will help in deciphering the genetic makeup of its germplasm resources. A total of 359511 SSRs were identified in pineapple (356385 from genome sequence, 45 from chloroplast sequence, 249 in mitochondrial sequence and 2832 from EST sequences). The list of EST-SSR markers and their details are available in the database. PineElm_SSRdb is an open source database available for non-commercial academic purpose at http://app.bioelm.com/ with a mapping tool which can develop circular maps of selected marker set. This database will be of immense use to breeders, researchers and graduates working on Ananas spp. and to others working on cross-species transferability of markers, investigating diversity, mapping and DNA fingerprinting.
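
    SSR identification of the kind summarized above is commonly done by scanning for tandemly repeated short motifs. The sketch below does this with a back-referencing regular expression for di- and trinucleotide motifs. The minimum repeat counts (6 and 5) and the function name are assumptions chosen for illustration; the abstract does not state which tool or thresholds the authors used.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    min_repeats maps motif length to the minimum number of tandem copies.
    """
    min_repeats = min_repeats or {2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, n in min_repeats.items():
        # One motif copy captured, then at least n - 1 further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence with one (AG)7 and one (AAT)5 repeat.
toy = "TTT" + "AG" * 7 + "CCC" + "AAT" * 5 + "GG"
ssrs = find_ssrs(toy)
```

    Flanking sequence around each hit would then be extracted for primer design, a step not shown here.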

  4. MPID-T2: a database for sequence-structure-function analyses of pMHC and TR/pMHC structures.

    Science.gov (United States)

    Khan, Javed Mohammed; Cheruku, Harish Reddy; Tong, Joo Chuan; Ranganathan, Shoba

    2011-04-15

    Sequence-structure-function information is critical in understanding the mechanism of pMHC and TR/pMHC binding and recognition. A database for sequence-structure-function information on pMHC and TR/pMHC interactions, MHC-Peptide Interaction Database-TR version 2 (MPID-T2), is now available augmented with the latest PDB and IMGT/3Dstructure-DB data, advanced features and new parameters for the analysis of pMHC and TR/pMHC structures. Availability: http://biolinfo.org/mpid-t2. Contact: shoba.ranganathan@mq.edu.au. Supplementary data are available at Bioinformatics online.

  5. Current understanding of the sequence of events. Overview of current understanding of accident progression at Fukushima Dai-ichi

    International Nuclear Information System (INIS)

    Gulliford, Jim

    2013-01-01

    An overview of the main sequence of events, particularly the evolution of the cores in Units 1-3, was given. The presentation is based on information provided by Dr Okajima of JAEA to the June 2012 Nuclear Science Committee meeting. During the accident, conditions at the plant were such that operators were initially unable to obtain instrument readouts from the control panel and hence could not know what condition the reactors were in (reactor power, pressure, temperature, water height and flow rate, etc.). Subsequently, as electrical power supplies were gradually restored, more data became available. In addition to the reactor data, other information from off-site measurements and from measuring stations inside the site boundary is now available, particularly for radiation dose rates in air. These types of information, combined with detailed knowledge of the plant design and operations history up to the time of the accident, are being used to construct detailed computer models which simulate the behaviour of the reactor core, pressure vessel and containment during the accident sequence. This combination of detailed design/operating data, limited measured data during the accident and computer modelling allows us to construct a fairly clear picture of the accident progression. The main sequence of events (common to Units 1, 2 and 3) is summarised. The OECD/NEA is currently coordinating an international benchmark study of the accident at Fukushima Daiichi known as the BSAF Project. The objectives of this activity are to analyse and evaluate the accident progression and improve severe accident (SA) analysis methods and models. The project provides valuable additional (and corrected) data from plant measurements as well as an improved understanding of the role played by the fuel and cladding design. Based on (limited) plant data and extensive modelling analysis, we have a detailed qualitative description of the Fukushima-Daiichi accident. Further analyses of the type

  6. The master two-dimensional gel database of human AMA cell proteins: towards linking protein and genome sequence and mapping information (update 1991)

    DEFF Research Database (Denmark)

    Celis, J E; Leffers, H; Rasmussen, H H

    1991-01-01

    autoantigens" and "cDNAs". For convenience we have included an alphabetical list of all known proteins recorded in this database. In the long run, the main goal of this database is to link protein and DNA sequencing and mapping information (Human Genome Program) and to provide an integrated picture......The master two-dimensional gel database of human AMA cells currently lists 3801 cellular and secreted proteins, of which 371 cellular polypeptides (306 IEF; 65 NEPHGE) were added to the master images during the last 10 months. These include: (i) very basic and acidic proteins that do not focus...

  7. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes.

    Science.gov (United States)

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species.Database URL: http://geve.med.u-tokai.ac.jp. © The Author(s) 2016. Published by Oxford University Press.

  8. Clinical features and lifestyle of patients with amyotrophic lateral sclerosis in Campania: brief overview of an Italian database

    Directory of Open Access Journals (Sweden)

    Francesca Trojsi

    2012-01-01

    Full Text Available BACKGROUND: Physical activity and occupational exposures appeared to play a relevant role in the pathogenesis of amyotrophic lateral sclerosis (ALS), a neurodegenerative disease of unknown origin. MATERIALS AND METHODS: We aimed to give an overview of the clinical characteristics and lifestyle (occupation and sport) of a population of 395 patients with ALS from Campania, in southern Italy. RESULTS: ALS onset occurred about 11 years earlier in industry workers, whilst the most frequent site of onset among farmers was the upper limbs. Compared to non-athletes, athletes, particularly soccer players, showed ALS onset about 7 years earlier, with higher mortality after 5 years. DISCUSSION AND CONCLUSIONS: We suggest that subjects genetically prone to abnormal response to hypoxia during strenuous physical activity or exposed to neurotoxic agents, such as athletes, farmers or industry workers, might present increased risk of developing ALS. Future case-control and follow-up studies on our population should be implemented to deepen the present results.

  9. AcEST(EST sequences of Adiantum capillus-veneris and their annotation) - AcEST | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available List Contact us AcEST AcEST(EST sequences of Adiantum capillus-veneris and their annotation) Data detail Dat...a name AcEST(EST sequences of Adiantum capillus-veneris and their annotation) DOI 10.18908/lsdba.nbdc00839-0...01 Description of data contents EST sequence of Adiantum capillus-veneris and its annotation (clone ID, libr...le search URL http://togodb.biosciencedbc.jp/togodb/view/archive_acest#en Data acquisition method Capillary ...ainst UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases) Number of data entries Adiantum capillus-veneris

  10. FishPathogens.eu/vhsv: A user-friendly Viral Haemorrhagic Septicaemia Virus (VHSV) isolate and sequence database

    DEFF Research Database (Denmark)

    Jonstrup, Søren Peter; Gray, Tanya; Kahns, Søren

    A database has been created, www.FishPathogens.eu, with the aim of providing a single repository for collating important information on significant pathogens of aquaculture, relevant to their control and management. This database will be developed, maintained and managed as part of the European...

  11. Construction of an Ostrea edulis database from genomic and expressed sequence tags (ESTs) obtained from Bonamia ostreae infected haemocytes: Development of an immune-enriched oligo-microarray.

    Science.gov (United States)

    Pardo, Belén G; Álvarez-Dios, José Antonio; Cao, Asunción; Ramilo, Andrea; Gómez-Tato, Antonio; Planas, Josep V; Villalba, Antonio; Martínez, Paulino

    2016-12-01

    The flat oyster, Ostrea edulis, is one of the main farmed oysters, not only in Europe but also in the United States and Canada. Bonamiosis due to the parasite Bonamia ostreae has been associated with high mortality episodes in this species. This parasite is an intracellular protozoan that infects haemocytes, the main cells involved in oyster defence. Due to the economic and ecological importance of flat oyster, genomic data are badly needed for genetic improvement of the species, but they are still very scarce. The objective of this study is to develop a sequence database, OedulisDB, with new genomic and transcriptomic resources, providing new data and convenient tools to improve our knowledge of the oyster's immune mechanisms. Transcriptomic and genomic sequences were obtained using 454 pyrosequencing and compiled into an O. edulis database, OedulisDB, consisting of two sets of 10,318 and 7159 unique sequences that represent the oyster's genome (WG) and de novo haemocyte transcriptome (HT), respectively. The flat oyster transcriptome was obtained from two strains (naïve and tolerant) challenged with B. ostreae, and from their corresponding non-challenged controls. Approximately 78.5% of 5619 HT unique sequences were successfully annotated by Blast search using public databases. A total of 984 sequences were identified as being related to immune response and several key immune genes were identified for the first time in flat oyster. Additionally, transcriptome information was used to design and validate the first oligo-microarray in flat oyster enriched with immune sequences from haemocytes. Our transcriptomic and genomic sequencing and subsequent annotation have largely increased the scarce resources available for this economically important species and have enabled us to develop an OedulisDB database and accompanying tools for gene expression analysis. This study represents the first attempt to characterize in depth the O. edulis haemocyte transcriptome in

  12. Federal databases

    International Nuclear Information System (INIS)

    Welch, M.J.; Welles, B.W.

    1988-01-01

    Accident statistics on all modes of transportation are available as risk assessment analytical tools through several federal agencies. This paper reports on the examination of the accident databases by personal contact with the federal staff responsible for administration of the database programs. This activity, sponsored by the Department of Energy through Sandia National Laboratories, is an overview of the national accident data on highway, rail, air, and marine shipping. For each mode, the definition or reporting requirements of an accident are determined and the method of entering the accident data into the database is established. Availability of the database to others, ease of access, costs, and whom to contact were the prime questions put to each of the database program managers. Additionally, how the agency uses the accident data was of major interest.

  13. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

    Science.gov (United States)

    Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-07-08

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Computational tools and resources for metabolism-related property predictions. 1. Overview of publicly available (free and commercial) databases and software.

    Science.gov (United States)

    Peach, Megan L; Zakharov, Alexey V; Liu, Ruifeng; Pugliese, Angelo; Tawa, Gregory; Wallqvist, Anders; Nicklaus, Marc C

    2012-10-01

    Metabolism has been identified as a defining factor in drug development success or failure because of its impact on many aspects of drug pharmacology, including bioavailability, half-life and toxicity. In this article, we provide an outline and descriptions of the resources for metabolism-related property predictions that are currently either freely or commercially available to the public. These resources include databases with data on, and software for prediction of, several end points: metabolite formation, sites of metabolic transformation, binding to metabolizing enzymes and metabolic stability. We attempt to place each tool in historical context and describe, wherever possible, the data it was based on. For predictions of interactions with metabolizing enzymes, we show a typical set of results for a small test set of compounds. Our aim is to give a clear overview of the areas and aspects of metabolism prediction in which the currently available resources are useful and accurate, and the areas in which they are inadequate or missing entirely.

  15. A brief overview of the Chemistry-Aerosol Mediterranean Experiment (ChArMEx) database and campaign operation centre (ChOC)

    Science.gov (United States)

    Ferré, Hélène; Dulac, François; Belmahfoud, Nizar; Brissebrat, Guillaume; Cloché, Sophie; Descloitres, Jacques; Fleury, Laurence; Focsa, Loredana; Henriot, Nicolas; Ramage, Karim; Vermeulen, Anne

    2016-04-01

    Initiated in 2010 in the framework of the multidisciplinary research programme MISTRALS (Mediterranean Integrated Studies at Regional and Local Scales; http://www.mistrals-home.org), the Chemistry-Aerosol Mediterranean Experiment (ChArMEx, http://charmex.lsce.ipsl.fr/) aims at federating the scientific community for an updated assessment of the present and future state of the atmospheric environment in the Mediterranean Basin, and of its impacts on the regional climate, air quality, and marine biogeochemistry. The project combines mid- and long-term monitoring, intensive field campaigns, use of satellite data, and modelling studies. In this presentation we provide an overview of the campaign operation centre (http://choc.sedoo.fr/) and project database (http://mistrals.sedoo.fr/ChArMEx), at the end of the first experimental phase of the project that included a series of large campaigns based on airborne means (including balloons and various aircraft) and a network of surface stations. Those campaigns were performed mainly in the western Mediterranean basin in the summers of 2012, 2013 and 2014 with the help of the ChArMEx Operation Centre (ChOC), an open web site whose objective is to gather and display daily quick-looks from model forecasts and near-real time in situ and remote sensing observations of physical and chemical weather conditions relevant for the everyday campaign operation decisions. The ChOC is also useful for post campaign analyses and can be completed with a number of quick-looks of campaign results obtained later in order to offer an easy access to, and comprehensive view of, all available data during the campaign period. The items included are selected according to the objectives and location of the given campaigns. The second experimental phase of ChArMEx from 2015 on is more focused on the eastern basin. In addition, the project operation centre is planned to be adapted for a joint MERMEX-ChArMEx oceanographic cruise (PEACETIME) for a study at

  16. ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases.

    Science.gov (United States)

    Shen, Li; Shao, Ningyi; Liu, Xiaochuan; Nestler, Eric

    2014-04-15

    Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot - a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.

  17. Structural and sequence variants in patients with Silver-Russell syndrome or similar features-Curation of a disease database

    DEFF Research Database (Denmark)

    Tümer, Zeynep; López-Hernández, Julia Angélica; Netchine, Irène

    2018-01-01

    data of these patients. The clinical features are scored according to the Netchine-Harbison clinical scoring system (NH-CSS), which has recently been accepted as standard by consensus. The structural and sequence variations are reviewed and where necessary redescribed according to recent...

  18. A search for pre-main-sequence stars in high-latitude molecular clouds. 3: A survey of the Einstein database

    Science.gov (United States)

    Caillault, Jean-Pierre; Magnani, Loris; Fryer, Chris

    1995-01-01

    In order to discern whether the high-latitude molecular clouds are regions of ongoing star formation, we have used X-ray emission as a tracer of youthful stars. The entire Einstein database yields 18 images which overlap 10 of the clouds mapped partially or completely in the CO (1-0) transition, providing a total of approximately 6 deg squared of overlap. Five previously unidentified X-ray sources were detected: one has an optical counterpart which is a pre-main-sequence (PMS) star, and two have normal main-sequence stellar counterparts, while the other two are probably extragalactic sources. The PMS star is located in a high Galactic latitude Lynds dark cloud, so this result is not too surprising. The translucent clouds, though, have yet to reveal any evidence of star formation.

  19. Overview of errors in the reference sequence and annotation of Mycobacterium tuberculosis H37Rv, and variation amongst its isolates

    KAUST Repository

    Köser, Claudio U.

    2012-06-01

    Since its publication in 1998, the genome sequence of the Mycobacterium tuberculosis H37Rv laboratory strain has acted as the cornerstone for the study of tuberculosis. In this review we address some of the practical aspects that have come to light relating to the use of H37Rv throughout the past decade which are of relevance for the ongoing genomic and laboratory studies of this pathogen. These include errors in the genome reference sequence and its annotation, as well as the recently detected variation amongst isolates of H37Rv from different laboratories. © 2011 Elsevier B.V.

  20. Overview of errors in the reference sequence and annotation of Mycobacterium tuberculosis H37Rv, and variation amongst its isolates

    KAUST Repository

    Köser, Claudio U.; Niemann, Stefan; Summers, David K.; Archer, John A.C.

    2012-01-01

    Since its publication in 1998, the genome sequence of the Mycobacterium tuberculosis H37Rv laboratory strain has acted as the cornerstone for the study of tuberculosis. In this review we address some of the practical aspects that have come to light

  1. (reprocessed)HeliscopeCAGE sequencing, Delve mapping and CAGE TSS aggregation - FANTOM5 | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available switchLanguage; BLAST Search Image Search Home About Archive Update History Data List Contact us FANTOM...ntified by CAGE tag analysis (BED format) *.rdna.fa.gz: rDNA sequences (FASTA format) Data file File name: fantom...5_rp_exp_details.zip File URL: ftp://ftp.biosciencedbc.jp/archive/fantom5/20161221/fantom5_rp_exp_detai...tp://ftp.biosciencedbc.jp/archive/fantom5/datafiles/reprocessed/hg38_latest/basic/ File size: 1.4 TB File na...me: (reprocessed)basic (Mus musculus) File URL: ftp://ftp.biosciencedbc.jp/archive/fantom5/datafiles/reproce

  2. The Importance of Biological Databases in Biological Discovery.

    Science.gov (United States)

    Baxevanis, Andreas D; Bateman, Alex

    2015-06-19

    Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.

  3. Database Description - ASTRA | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available abase Description General information of database Database name ASTRA Alternative n...tics Journal Search: Contact address Database classification Nucleotide Sequence Databases - Gene structure,...3702 Taxonomy Name: Oryza sativa Taxonomy ID: 4530 Database description The database represents classified p...(10):1211-6. External Links: Original website information Database maintenance site National Institute of Ad... for user registration Not available About This Database Database Description Dow

  4. Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L.).

    Directory of Open Access Journals (Sweden)

    Bin Han

    Full Text Available Microsatellites or simple sequence repeats (SSRs) are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Though an ordered draft sequence of hexaploid bread wheat has been announced, a systemic analysis of SSRs in wheat has not been reported so far. In the present study, we identified 364,347 SSRs from among 10,603,760 sequences of the Chinese spring wheat (CSW) genome, which were present at a density of 36.68 SSR/Mb. In total, we detected 488 types of motifs ranging from di- to hexanucleotides, among which dinucleotide repeats dominated, accounting for approximately 42.52% of the genome. The density of tri- to hexanucleotide repeats was 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent repeats among di- to hexanucleotide repeats. Among the 21 chromosomes of CSW, the density of repeats was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats on each chromosome, and even on the whole genome, were almost identical. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, which cover the entire genome at a density of 29.73 per Mb. All of the SSR markers were validated by reverse electronic-Polymerase Chain Reaction (re-PCR); 70,564 (23.9%) were found to be monomorphic and 224,703 (76.1%) were found to be polymorphic. A total of 45 monomorphic markers were selected randomly for validation purposes; 24 (53.3%) amplified one locus, 8 (17.8%) amplified multiple identical loci, and 13 (28.9%) did not amplify any fragments from the genomic DNA of CSW. Then a dendrogram was generated based on the 24 monomorphic SSR markers among 20 wheat cultivars and three species of its diploid ancestors showing that monomorphic SSR markers represented a promising

  5. RegTransBase - A Database Of Regulatory Sequences and Interactions in a Wide Range of Prokaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kazakov, Alexei E.; Cipriano, Michael J.; Novichkov, Pavel S.; Minovitsky, Simon; Vinogradov, Dmitry V.; Arkin, Adam; Mironov, Andrey A.; Gelfand, Mikhail S.; Dubchak, Inna

    2006-07-01

    RegTransBase, a manually curated database of regulatory interactions in prokaryotes, captures the knowledge in published scientific literature using a controlled vocabulary. Although a number of databases describing interactions between regulatory proteins and their binding sites are currently being maintained, they focus mostly on the model organisms Escherichia coli and Bacillus subtilis, or are entirely computationally derived. RegTransBase describes a large number of regulatory interactions reported in many organisms and contains various types of experimental data, in particular: the activation or repression of transcription by an identified direct regulator; determining the transcriptional regulatory function of a protein (or RNA) directly binding to DNA (RNA); mapping or prediction of binding site for a regulatory protein; characterization of regulatory mutations. Currently, the RegTransBase content is derived from about 3000 relevant articles describing over 7000 experiments in relation to 128 microbes. It contains data on the regulation of about 7500 genes and evidence for 6500 interactions with 650 regulators. RegTransBase also contains manually created position weight matrices (PWM) that can be used to identify candidate regulatory sites in over 60 species. RegTransBase is available at http://regtransbase.lbl.gov.
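
    As an illustration of how a position weight matrix can flag candidate regulatory sites, the following minimal Python sketch converts a small count matrix to log-odds scores and scans a sequence for windows above a threshold. The count matrix, the pseudocount, the uniform background, the threshold, and the function names `to_log_odds` and `scan` are all hypothetical choices made for this example; none of them comes from RegTransBase.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix for a 4-bp motif (rows: positions; columns: A, C, G, T).
COUNTS = [
    [8, 1, 0, 1],
    [0, 0, 10, 0],
    [1, 7, 1, 1],
    [9, 0, 0, 1],
]


def to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return pwm


def scan(sequence, pwm, threshold):
    """Return (offset, window, score) for each window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous symbols such as N
        score = sum(pwm[j][BASES.index(base)] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, window, score))
    return hits


PWM = to_log_odds(COUNTS)
HITS = scan("TTAGCATTTAGCA", PWM, threshold=4.0)
for offset, window, score in HITS:
    print(offset, window, round(score, 2))
```

    The threshold controls the trade-off between missed sites and false positives; in practice it would be calibrated against known sites rather than fixed by hand as it is here.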

  6. Experiment Databases

    Science.gov (United States)

    Vanschoren, Joaquin; Blockeel, Hendrik

    Next to running machine learning algorithms based on inductive queries, much can be learned by immediately querying the combined results of many prior studies. Indeed, all around the globe, thousands of machine learning experiments are being executed on a daily basis, generating a constant stream of empirical information on machine learning techniques. While the information contained in these experiments might have many uses beyond their original intent, results are typically described very concisely in papers and discarded afterwards. If we properly store and organize these results in central databases, they can be immediately reused for further analysis, thus boosting future research. In this chapter, we propose the use of experiment databases: databases designed to collect all the necessary details of these experiments, and to intelligently organize them in online repositories to enable fast and thorough analysis of a myriad of collected results. They constitute an additional, queriable source of empirical meta-data based on principled descriptions of algorithm executions, without reimplementing the algorithms in an inductive database. As such, they engender a very dynamic, collaborative approach to experimentation, in which experiments can be freely shared, linked together, and immediately reused by researchers all over the world. They can be set up for personal use, to share results within a lab or to create open, community-wide repositories. Here, we provide a high-level overview of their design, and use an existing experiment database to answer various interesting research questions about machine learning algorithms and to verify a number of recent studies.

  7. Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

    Science.gov (United States)

    Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M

    2018-06-01

    Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men having sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than source attribution methods for identifying transmission risk factors, but neither method provides robust estimates of transmission risk ratios. Source attribution methods can alleviate drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  8. Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

    Science.gov (United States)

    Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M; Loveland, Jane E; Mudge, Jonathan M; Wallin, Craig; Girón, Carlos G; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; Martin, Fergal J; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Suner, Marie-Marthe; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bruford, Elspeth A; Bult, Carol J; Frankish, Adam; Murphy, Terence; Pruitt, Kim D

    2018-01-04

    The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.

  9. KALIMER database development

    Energy Technology Data Exchange (ETDEWEB)

    Jeong, Kwan Seong; Lee, Yong Bum; Jeong, Hae Yong; Ha, Kwi Seok

    2003-03-01

    The KALIMER database is an advanced, Web-based database for the integrated management of liquid metal reactor design technology development. The KALIMER design database is composed of a results database, Inter-Office Communication (IOC), a 3D CAD database, and a reserved documents database. The results database holds research results from all phases of liquid metal reactor design technology development under mid-term and long-term nuclear R&D. IOC is a linkage control system between subprojects for sharing and integrating the research results for KALIMER. The 3D CAD database provides a schematic overview of the KALIMER design structure. The reserved documents database was developed to manage documents and reports since project accomplishment.

  10. KALIMER database development

    International Nuclear Information System (INIS)

    Jeong, Kwan Seong; Lee, Yong Bum; Jeong, Hae Yong; Ha, Kwi Seok

    2003-03-01

    The KALIMER database is an advanced, Web-based database for the integrated management of liquid metal reactor design technology development. The KALIMER design database is composed of a results database, Inter-Office Communication (IOC), a 3D CAD database, and a reserved documents database. The results database holds research results from all phases of liquid metal reactor design technology development under mid-term and long-term nuclear R&D. IOC is a linkage control system between subprojects for sharing and integrating the research results for KALIMER. The 3D CAD database provides a schematic overview of the KALIMER design structure. The reserved documents database was developed to manage documents and reports since project accomplishment.

  11. Predicting Post-Translational Modifications from Local Sequence Fragments Using Machine Learning Algorithms: Overview and Best Practices.

    Science.gov (United States)

    Tatjewski, Marcin; Kierczak, Marcin; Plewczynski, Dariusz

    2017-01-01

    Here, we present two perspectives on the task of predicting post-translational modifications (PTMs) from local sequence fragments using machine learning algorithms. The first describes the fundamental steps required to construct a PTM predictor from the very beginning. These steps include data gathering, feature extraction, and machine-learning classifier selection. The second part of our work discusses in detail the more advanced problems encountered in the PTM prediction task. Probably the most challenging issues covered here are: (1) how to address the class imbalance problem in the training data (we also present statistics describing the problem); (2) how to properly set up cross-validation folds in a way that takes into account the homology of protein data records, for which we present our folds-over-clusters algorithm; and (3) how to efficiently reach for new sources of learning features. The presented techniques and notes result from intense studies in the field, performed by our group and others, and can be useful both for researchers beginning in the field of PTM prediction and for those who want to extend the repertoire of their research techniques.
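
    The abstract does not spell out the folds-over-clusters algorithm, so the Python sketch below shows only the general idea it names: building cross-validation folds in which homologous records never straddle two folds. It assumes that each record already carries a homology-cluster label from some external clustering step; the greedy largest-first balancing rule and the function name `folds_over_clusters` are illustrative assumptions, not the authors' published method.

```python
from collections import defaultdict


def folds_over_clusters(cluster_ids, n_folds):
    """Assign record indices to folds without splitting any cluster.

    cluster_ids[i] is the (assumed, precomputed) homology-cluster label of
    record i. Clusters are placed largest first into whichever fold currently
    holds the fewest records, which keeps fold sizes roughly balanced.
    """
    members = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)

    folds = [[] for _ in range(n_folds)]
    # Sort by descending size, then by label, so the result is deterministic.
    ordered = sorted(members, key=lambda c: (-len(members[c]), str(c)))
    for cluster in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cluster])
    return folds


CLUSTERS = ["c1", "c1", "c2", "c3", "c3", "c3", "c4", "c5"]
FOLDS = folds_over_clusters(CLUSTERS, n_folds=3)
for number, fold in enumerate(FOLDS):
    print(number, sorted(fold))
```

    Keeping whole clusters together prevents near-identical sequences from appearing in both the training and the test split, which would otherwise inflate the estimated accuracy.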

  12. Database Description - TMFunction | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available sidue (or mutant) in a protein. The experimental data are collected from the literature both by searching th...the sequence database, UniProt, structural database, PDB, and literature database

  13. Database Description - RMG | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ase Description General information of database Database name RMG Alternative name ...raki 305-8602, Japan National Institute of Agrobiological Sciences E-mail : Database... classification Nucleotide Sequence Databases Organism Taxonomy Name: Oryza sativa Japonica Group Taxonomy ID: 39947 Database...rnal: Mol Genet Genomics (2002) 268: 434–445 External Links: Original website information Database...available URL of Web services - Need for user registration Not available About This Database Database Descri

  14. HMMerThread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition.

    Directory of Open Access Journals (Sweden)

    Charles Richard Bradshaw

    Full Text Available Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in

  15. SINEBase: a database and tool for SINE analysis.

    Science.gov (United States)

    Vassetzky, Nikita S; Kramerov, Dmitri A

    2013-01-01

    SINEBase (http://sines.eimb.ru) integrates the revisited body of knowledge about short interspersed elements (SINEs). A set of formal definitions concerning SINEs was introduced. All available sequence data were screened through these definitions and the genetic elements misidentified as SINEs were discarded. As a result, 175 SINE families have been recognized in animals, flowering plants and green algae. These families were classified by the modular structure of their nucleotide sequences and the frequencies of different patterns were evaluated. These data formed the basis for the database of SINEs. The SINEBase website can be used in two ways: first, to explore the database of SINE families, and second, to analyse candidate SINE sequences using specifically developed tools. This article presents an overview of the database and the process of SINE identification and analysis.

  16. Analysis of expressed sequence tags from Actinidia: applications of a cross species EST database for gene discovery in the areas of flavor, health, color and ripening

    Directory of Open Access Journals (Sweden)

    Richardson Annette C

    2008-07-01

    Full Text Available Background: Kiwifruit (Actinidia spp.) are a relatively new, but economically important crop grown in many different parts of the world. Commercial success is driven by the development of new cultivars with novel consumer traits including flavor, appearance, healthful components and convenience. To increase our understanding of the genetic diversity and gene-based control of these key traits in Actinidia, we have produced a collection of 132,577 expressed sequence tags (ESTs). Results: The ESTs were derived mainly from four Actinidia species (A. chinensis, A. deliciosa, A. arguta and A. eriantha) and fell into 41,858 non redundant clusters (18,070 tentative consensus sequences and 23,788 EST singletons). Analysis of flavor and fragrance-related gene families (acyltransferases and carboxylesterases) and pathways (terpenoid biosynthesis) is presented in comparison with a chemical analysis of the compounds present in Actinidia including esters, acids, alcohols and terpenes. ESTs are identified for most genes in color pathways controlling chlorophyll degradation and carotenoid biosynthesis. In the health area, data are presented on the ESTs involved in ascorbic acid and quinic acid biosynthesis showing not only that genes for many of the steps in these pathways are represented in the database, but that genes encoding some critical steps are absent. In the convenience area, genes related to different stages of fruit softening are identified. Conclusion: This large EST resource will allow researchers to undertake the tremendous challenge of understanding the molecular basis of genetic diversity in the Actinidia genus as well as provide an EST resource for comparative fruit genomics. The various bioinformatics analyses we have undertaken demonstrate the extent of coverage of ESTs for genes encoding different biochemical pathways in Actinidia.

  17. Generalized Database Management System Support for Numeric Database Environments.

    Science.gov (United States)

    Dominick, Wayne D.; Weathers, Peggy G.

    1982-01-01

    This overview of potential for utilizing database management systems (DBMS) within numeric database environments highlights: (1) major features, functions, and characteristics of DBMS; (2) applicability to numeric database environment needs and user needs; (3) current applications of DBMS technology; and (4) research-oriented and…

  18. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.; Bonny, Talal; Salama, Khaled N.

    2012-01-01

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
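
    The length-based distribution described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the splitting ratio is passed in as `cpu_fraction`, both portions are scored by the same pure-Python function standing in for the CPU and GPU kernels, and a Smith-Waterman local alignment with a linear gap penalty is used as the scoring method, which the abstract does not specify. The names `split_by_length` and `local_alignment_score` are hypothetical.

```python
def split_by_length(database, cpu_fraction):
    """Send the shortest cpu_fraction of sequences to the CPU, the rest to the GPU."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    ordered = sorted(database, key=len)
    cut = int(len(ordered) * cpu_fraction)
    return ordered[:cut], ordered[cut:]


def local_alignment_score(query, target, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score (assumed scoring scheme, linear gaps)."""
    previous = [0] * (len(target) + 1)
    best = 0
    for q in query:
        current = [0]
        for j, t in enumerate(target, start=1):
            diagonal = previous[j - 1] + (match if q == t else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


QUERY = "ACGT"
DATABASE = ["ACGTACGTAC", "ACG", "ACGTA", "TTTTTTTT", "AC"]

CPU_PART, GPU_PART = split_by_length(DATABASE, cpu_fraction=0.4)

# In a real system these two loops would run concurrently on different devices.
CPU_SCORES = {seq: local_alignment_score(QUERY, seq) for seq in CPU_PART}
GPU_SCORES = {seq: local_alignment_score(QUERY, seq) for seq in GPU_PART}
print(CPU_SCORES)
print(GPU_SCORES)
```

    Sorting by length before cutting guarantees that every sequence in the CPU portion is no longer than any sequence in the GPU portion, matching the property stated in the abstract.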

  19. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.

  20. DATABASES DEVELOPED IN INDIA FOR BIOLOGICAL SCIENCES

    Directory of Open Access Journals (Sweden)

    Gitanjali Yadav

    2017-09-01

    Full Text Available The complexity of biological systems requires use of a variety of experimental methods with ever increasing sophistication to probe various cellular processes at molecular and atomic resolution. The availability of technologies for determining nucleic acid sequences of genes and atomic resolution structures of biomolecules prompted development of major biological databases like GenBank and PDB almost four decades ago. India was one of the few countries to realize early the utility of such databases for progress in modern biology/biotechnology. The Department of Biotechnology (DBT), India established the Biotechnology Information System (BTIS) network in the late eighties. Starting with the genome sequencing revolution at the turn of the century, application of high-throughput sequencing technologies in biology and medicine for analysis of genomes, transcriptomes, epigenomes and microbiomes has generated massive volumes of sequence data. The BTIS network has not only provided state of the art computational infrastructure to research institutes and universities for utilizing various biological databases developed abroad in their research, it has also actively promoted research and development (R&D) projects in Bioinformatics to develop a variety of biological databases in diverse areas. It is encouraging to note that a large number of biological databases or data driven software tools developed in India have been published in leading peer reviewed international journals like Nucleic Acids Research, Bioinformatics, Database, BMC, PLoS and NPG series publication. Some of these databases are not only unique but also highly accessed, as reflected in the number of citations. Apart from databases developed by individual research groups, BTIS has initiated consortium projects to develop major India centric databases on Mycobacterium tuberculosis, Rice and Mango, which can potentially have practical applications in health and agriculture. Many of these biological

  1. Overview of a comprehensive resource database for the assessment of recoverable hydrocarbons produced by carbon dioxide enhanced oil recovery

    Science.gov (United States)

    Carolus, Marshall; Biglarbigi, Khosrow; Warwick, Peter D.; Attanasi, Emil D.; Freeman, Philip A.; Lohr, Celeste D.

    2017-10-24

    A database called the “Comprehensive Resource Database” (CRD) was prepared to support U.S. Geological Survey (USGS) assessments of technically recoverable hydrocarbons that might result from the injection of miscible or immiscible carbon dioxide (CO2) for enhanced oil recovery (EOR). The CRD was designed by INTEK Inc., a consulting company under contract to the USGS. The CRD contains data on the location, key petrophysical properties, production, and well counts (number of wells) for the major oil and gas reservoirs in onshore areas and State waters of the conterminous United States and Alaska. The CRD includes proprietary data on petrophysical properties of fields and reservoirs from the “Significant Oil and Gas Fields of the United States Database,” prepared by Nehring Associates in 2012, and proprietary production and drilling data from the “Petroleum Information Data Model Relational U.S. Well Data,” prepared by IHS Inc. in 2012. This report describes the CRD and the computer algorithms used to (1) estimate missing reservoir property values in the Nehring Associates (2012) database, and to (2) generate values of additional properties used to characterize reservoirs suitable for miscible or immiscible CO2 flooding for EOR. Because of the proprietary nature of the data and contractual obligations, the CRD and actual data from Nehring Associates (2012) and IHS Inc. (2012) cannot be presented in this report.

  2. Database principles programming performance

    CERN Document Server

    O'Neil, Patrick

    2014-01-01

    Database: Principles Programming Performance provides an introduction to the fundamental principles of database systems. This book focuses on database programming and the relationships between principles, programming, and performance. Organized into 10 chapters, this book begins with an overview of database design principles and presents a comprehensive introduction to the concepts used by a DBA. This text then provides grounding in many abstract concepts of the relational model. Other chapters introduce SQL, describing its capabilities and covering the statements and functions of the programmi

  3. Improving taxonomic accuracy for fungi in public sequence databases: applying ‘one name one species’ in well-defined genera with Trichoderma/Hypocrea as a test case

    Science.gov (United States)

    Strope, Pooja K; Chaverri, Priscila; Gazis, Romina; Ciufo, Stacy; Domrachev, Michael; Schoch, Conrad L

    2017-01-01

    The ITS (nuclear ribosomal internal transcribed spacer) RefSeq database at the National Center for Biotechnology Information (NCBI) is dedicated to the clear association between name, specimen and sequence data. This database is focused on sequences obtained from type material stored in public collections. While the initial ITS sequence curation effort together with numerous fungal taxonomy experts attempted to cover as many orders as possible, we extended our latest focus to the family and genus ranks. We focused on Trichoderma for several reasons, mainly because the asexual and sexual synonyms were well documented, and a list of proposed names and type material was recently published. In this case study the recent taxonomic information was applied to do a complete taxonomic audit for the genus Trichoderma in the NCBI Taxonomy database. A name status report is available here: https://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi. As a result, the ITS RefSeq Targeted Loci database at NCBI has been augmented with more sequences from type and verified material from Trichoderma species. Additionally, to aid in the cross referencing of data from single loci and genomes we have collected a list of quality records of the RPB2 gene obtained from type material in GenBank that could help validate future submissions. During the process of curation misidentified genomes were discovered, and sequence records from type material were found hidden under previous classifications. Source metadata curation, although more cumbersome, proved to be useful as confirmation of the type material designation. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353 PMID:29220466

  4. Disbiome database: linking the microbiome to disease.

    Science.gov (United States)

    Janssens, Yorick; Nielandt, Joachim; Bronselaer, Antoon; Debunne, Nathan; Verbeke, Frederick; Wynendaele, Evelien; Van Immerseel, Filip; Vandewynckel, Yves-Paul; De Tré, Guy; De Spiegeleer, Bart

    2018-06-04

    Recent research has provided fascinating indications and evidence that the host health is linked to its microbial inhabitants. Due to the development of high-throughput sequencing technologies, more and more data covering microbial composition changes in different disease types are emerging. However, this information is dispersed over a wide variety of medical and biomedical disciplines. Disbiome is a database which collects and presents published microbiota-disease information in a standardized way. The diseases are classified using the MedDRA classification system and the micro-organisms are linked to their NCBI and SILVA taxonomy. Finally, each study included in the Disbiome database is assessed for its reporting quality using a standardized questionnaire. Disbiome is the first database giving a clear, concise and up-to-date overview of microbial composition differences in diseases, together with the relevant information of the studies published. The strength of this database lies within the combination of the presence of references to other databases, which enables both specific and diverse search strategies within the Disbiome database, and the human annotation which ensures a simple and structured presentation of the available data.

  5. Systematization of the protein sequence diversity in enzymes related to secondary metabolic pathways in plants, in the context of big data biology inspired by the KNApSAcK motorcycle database.

    Science.gov (United States)

    Ikeda, Shun; Abe, Takashi; Nakamura, Yukiko; Kibinge, Nelson; Hirai Morita, Aki; Nakatani, Atsushi; Ono, Naoaki; Ikemura, Toshimichi; Nakamura, Kensuke; Altaf-Ul-Amin, Md; Kanaya, Shigehiko

    2013-05-01

    Biology is increasingly becoming a data-intensive science with the recent progress of the omics fields, e.g. genomics, transcriptomics, proteomics and metabolomics. The species-metabolite relationship database, KNApSAcK Core, has been widely utilized and cited in metabolomics research, and chronological analysis of that research work has helped to reveal recent trends in metabolomics research. To meet the needs of these trends, the KNApSAcK database has been extended by incorporating a secondary metabolic pathway database called Motorcycle DB. We examined the enzyme sequence diversity related to secondary metabolism by means of batch-learning self-organizing maps (BL-SOMs). Initially, we constructed a map by using a big data matrix consisting of the frequencies of all possible dipeptides in the protein sequence segments of plants and bacteria. The enzyme sequence diversity of the secondary metabolic pathways was examined by identifying clusters of segments associated with certain enzyme groups in the resulting map. The extent of diversity of 15 secondary metabolic enzyme groups is discussed. Data-intensive approaches such as BL-SOM applied to big data matrices are needed for systematizing protein sequences. Handling big data has become an inevitable part of biology.
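The data matrix described in the record above is built from the frequencies of all possible dipeptides in protein sequence segments. The following is a minimal sketch of how one row of such a matrix could be computed, assuming the 20 standard amino acids (giving 400 dipeptides) and overlapping dipeptide counting. The function name `dipeptide_frequencies` and the handling of non-standard residues are illustrative assumptions, not the authors' procedure, and the BL-SOM step itself is not shown.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are considered.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 possible dipeptides, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_frequencies(segment):
    """Return a 400-element list of relative dipeptide frequencies.

    Overlapping dipeptides are counted; pairs containing a
    non-standard residue are skipped (an illustrative choice).
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(segment) - 1):
        pair = segment[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical segment: the overlapping pairs are AC, CA, AC.
vector = dipeptide_frequencies("ACAC")
print(len(vector))                       # number of features per segment
print(vector[DIPEPTIDES.index("AC")])    # relative frequency of "AC"
```

Stacking one such vector per sequence segment would give the kind of matrix a self-organizing map takes as input.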

  6. Database security in the cloud

    OpenAIRE

    Sakhi, Imal

    2012-01-01

    The aim of the thesis is to give an overview of the database services available in a cloud computing environment, investigate the associated security risks and propose possible countermeasures to minimize them. The thesis also analyzes two cloud database service providers, namely Amazon RDS and Xeround. These two providers were chosen because they are currently amongst the leading cloud database providers and both provide relational cloud databases which makes ...

  7. Generation and analysis of a large-scale expressed sequence Tag database from a full-length enriched cDNA library of developing leaves of Gossypium hirsutum L.

    Directory of Open Access Journals (Sweden)

    Min Lin

    Full Text Available BACKGROUND: Cotton (Gossypium hirsutum L.) is one of the world's most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. METHODOLOGY/PRINCIPAL FINDINGS: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. CONCLUSIONS/SIGNIFICANCE: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence

  8. The Pisa pre-main sequence tracks and isochrones. A database covering a wide range of Z, Y, mass, and age values

    Science.gov (United States)

    Tognelli, E.; Prada Moroni, P. G.; Degl'Innocenti, S.

    2011-09-01

    Context. In recent years new observations of pre-main sequence stars (pre-MS) with Z ≤ Z⊙ have been made available. To take full advantage of the continuously growing amount of data of pre-MS stars in different environments, we need to develop updated pre-MS models for a wide range of metallicity to assign reliable ages and masses to the observed stars. Aims: We present updated evolutionary pre-MS models and isochrones for a fine grid of mass, age, metallicity, and helium values. Methods: We use a standard and well-tested stellar evolutionary code (i.e. FRANEC), that adopts outer boundary conditions from detailed and realistic atmosphere models. In this code, we incorporate additional improvements to the physical inputs related to the equation of state and the low temperature radiative opacities essential to computing low-mass stellar models. Results: We make available via internet a large database of pre-MS tracks and isochrones for a wide range of chemical compositions (Z = 0.0002-0.03), masses (M = 0.2-7.0 M⊙), and ages (1-100 Myr) for a solar-calibrated mixing length parameter α (i.e. 1.68). For each chemical composition, additional models were computed with two different mixing length values, namely α = 1.2 and 1.9. Moreover, for Z ≥ 0.008, we also provided models with two different initial deuterium abundances. The characteristics of the models have been discussed in detail and compared with other work in the literature. The main uncertainties affecting theoretical predictions have been critically discussed. Comparisons with selected data indicate that there is close agreement between theory and observation. Tracks and isochrones are available on the web at http://astro.df.unipi.it/stellar-models/. Tracks and isochrones are also available in electronic form at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/533/A109

  9. Comprehensive two-dimensional gel protein databases offer a global approach to the analysis of human cells: the transformed amnion cells (AMA) master database and its link to genome DNA sequence data

    DEFF Research Database (Denmark)

    Celis, J E; Gesser, B; Rasmussen, H H

    1990-01-01

    , mitochondria, Golgi, ribosomes, intermediate filaments, microfilaments and microtubules), levels in fetal human tissues, partial protein sequences (containing information on 48 human proteins microsequenced so far), cell cycle-regulated proteins, proteins sensitive to interferons alpha, beta, and gamma, heat...

  10. Dictionary as Database.

    Science.gov (United States)

    Painter, Derrick

    1996-01-01

    Discussion of dictionaries as databases focuses on the digitizing of the Oxford English Dictionary (OED) and the use of Standard Generalized Markup Language (SGML). Topics include the creation of a consortium to digitize the OED, document structure, relational databases, text forms, sequence, and discourse. (LRW)

  11. Ocean Drilling Program: Janus Web Database

    Science.gov (United States)

    JANUS Database Send questions/comments about the online database Request data not available online Janus database Search the ODP/TAMU web site ODP's main web site Janus Data Model Data Migration Overview in Janus Data Types and Examples Leg 199, sunrise. Janus Web Database ODP and IODP data are stored in

  12. KALIMER database development (database configuration and design methodology)

    International Nuclear Information System (INIS)

    Jeong, Kwan Seong; Kwon, Young Min; Lee, Young Bum; Chang, Won Pyo; Hahn, Do Hee

    2001-10-01

    KALIMER Database is an advanced database to utilize the integration management for Liquid Metal Reactor Design Technology Development using Web Applications. The KALIMER Design database consists of the Results Database, Inter-Office Communication (IOC), the 3D CAD Database, the Team Cooperation System, and Reserved Documents. The Results Database is a research results database during phase II for Liquid Metal Reactor Design Technology Development of mid-term and long-term nuclear R and D. IOC is a linkage control system between sub-projects to share and integrate the research results for KALIMER. The 3D CAD Database is a schematic design overview for KALIMER. The Team Cooperation System is to inform team members of research cooperation and meetings. Finally, KALIMER Reserved Documents is developed to manage collected data and several documents since project accomplishment. This report describes the features of Hardware and Software and the Database Design Methodology for KALIMER.

  13. tRNA sequence data, annotation data and curation data - tRNADB-CE | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available switchLanguage; BLAST Search Image Search Home About Archive Update History Data List Contact us tRNAD... tRNA sequence data, annotation data and curation data - tRNADB-CE | LSDB Archive ...

  14. Database Description - KAIKOcDNA | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available List Contact us KAIKOcDNA Database Description General information of database Database name KAIKOcDNA Alter...National Institute of Agrobiological Sciences Akiya Jouraku E-mail : Database cla...ssification Nucleotide Sequence Databases Organism Taxonomy Name: Bombyx mori Taxonomy ID: 7091 Database des...rnal: G3 (Bethesda) / 2013, Sep / vol.9 External Links: Original website information Database maintenance si...available URL of Web services - Need for user registration Not available About This Database Database

  15. GOBASE: an organelle genome database

    OpenAIRE

    O'Brien, Emmet A.; Zhang, Yue; Wang, Eric; Marie, Veronique; Badejoko, Wole; Lang, B. Franz; Burger, Gertraud

    2008-01-01

    The organelle genome database GOBASE, now in its 21st release (June 2008), contains all published mitochondrion-encoded sequences (~913 000) and chloroplast-encoded sequences (~250 000) from a wide range of eukaryotic taxa. For all sequences, information on related genes, exons, introns, gene products and taxonomy is available, as well as selected genome maps and RNA secondary structures. Recent major enhancements to database functionality include: (i) addition of an interface for RNA editing...

  16. A search for pre-main sequence stars in the high-latitude molecular clouds. II - A survey of the Einstein database

    Science.gov (United States)

    Caillault, Jean-Pierre; Magnani, Loris

    1990-01-01

    Preliminary results are reported from a survey of every EINSTEIN image that overlaps any high-latitude molecular cloud, carried out in search of X-ray emitting pre-main sequence stars. This survey, together with complementary KPNO and IRAS data, will allow the determination of how prevalent low mass star formation is in these clouds in general and, particularly, in the translucent molecular clouds.

  17. A NGS approach to the encrusting Mediterranean sponge Crella elegans (Porifera, Demospongiae, Poecilosclerida): transcriptome sequencing, characterization and overview of the gene expression along three life cycle stages.

    Science.gov (United States)

    Pérez-Porro, A R; Navarro-Gómez, D; Uriz, M J; Giribet, G

    2013-05-01

    Sponges can be dominant organisms in many marine and freshwater habitats where they play essential ecological roles. They also represent a key group to address important questions in early metazoan evolution. Recent approaches for improving knowledge on sponge biological and ecological functions as well as on animal evolution have focused on the genetic toolkits involved in ecological responses to environmental changes (biotic and abiotic), development and reproduction. These approaches are possible thanks to newly available, massive sequencing technologies, such as the Illumina platform, which facilitate genome and transcriptome sequencing in a cost-effective manner. Here we present the first NGS (next-generation sequencing) approach to understanding the life cycle of an encrusting marine sponge. For this we sequenced libraries of three different life cycle stages of the Mediterranean sponge Crella elegans and generated de novo transcriptome assemblies. Three assemblies were each based on sponge tissue of a particular life cycle stage: non-reproductive tissue, tissue with sperm cysts and tissue with larvae. The fourth assembly pooled the data from all three stages. By aggregating data from all the different life cycle stages we obtained a higher total number of contigs, contigs with BLAST hits and annotated contigs than from the single-stage assemblies. In that multi-stage assembly we obtained a larger number of the developmental regulatory genes known for metazoans than in any other assembly. We also advance the differential expression of selected genes in the three life cycle stages to explore the potential of RNA-seq for improving knowledge on functional processes along the sponge life cycle. © 2013 Blackwell Publishing Ltd.

  18. Biofuel Database

    Science.gov (United States)

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  19. Community Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This excel spreadsheet is the result of merging at the port level of several of the in-house fisheries databases in combination with other demographic databases such...

  20. Mycobacteriophage genome database.

    Science.gov (United States)

    Joseph, Jerrine; Rajendran, Vasanthi; Hassan, Sameer; Kumar, Vanaja

    2011-01-01

    Mycobacteriophage genome database (MGDB) is an exclusive repository of the 64 completely sequenced mycobacteriophages with annotated information. It is a comprehensive compilation of the various gene parameters captured from several databases pooled together to empower mycobacteriophage researchers. The MGDB (Version No. 1.0) comprises 6086 genes from 64 mycobacteriophages classified into 72 families based on the ACLAME database. Manual curation was aided by information available from public databases which was enriched further by analysis. Its web interface allows browsing as well as querying the classification. The main objective is to collect and organize the complexity inherent to mycobacteriophage protein classification in a rational way. The other objective is to browse the existing and new genomes and describe their functional annotation. The database is available for free at http://mpgdb.ibioinformatics.org/mpgdb.php.

  1. Tank waste processing analysis: Database development, tank-by-tank processing requirements, and examples of pretreatment sequences and schedules as applied to Hanford Double-Shell Tank Supernatant Waste - FY 1993

    International Nuclear Information System (INIS)

    Colton, N.G.; Orth, R.J.; Aitken, E.A.

    1994-09-01

    This report gives the results of work conducted in FY 1993 by the Tank Waste Processing Analysis Task for the Underground Storage Tank Integrated Demonstration. The main purpose of this task, led by Pacific Northwest Laboratory, is to demonstrate a methodology to identify processing sequences, i.e., the order in which tanks should be processed. In turn, these sequences may be used to assist in the development of time-phased deployment schedules. Time-phased deployment is implementation of pretreatment technologies over a period of time as technologies are required and/or developed. The work discussed here illustrates how tank-by-tank databases and processing requirements have been used to generate processing sequences and time-phased deployment schedules. The processing sequences take into account requirements such as the amount and types of data available for the tanks, tank waste form and composition, required decontamination factors, types of compact processing units (CPUs) required, and technology availability. These sequences were developed from processing requirements for the tanks, which were determined from spreadsheet analyses. The spreadsheet analysis program was generated by this task in FY 1993. Efforts conducted for this task have focused on the processing requirements for Hanford double-shell tank (DST) supernatant wastes (pumpable liquid) because this waste type is easier to retrieve than the other types (saltcake and sludge), and more tank space would become available for future processing needs. The processing requirements were based on Class A criteria set by the U.S. Nuclear Regulatory Commission and Clean Option goals provided by Pacific Northwest Laboratory.

  2. Database Administrator

    Science.gov (United States)

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  3. Recent updates and developments to plant genome size databases

    Science.gov (United States)

    Garcia, Sònia; Leitch, Ilia J.; Anadon-Rosell, Alba; Canela, Miguel Á.; Gálvez, Francisco; Garnatje, Teresa; Gras, Airy; Hidalgo, Oriane; Johnston, Emmeline; Mas de Xaxars, Gemma; Pellicer, Jaume; Siljak-Yakovlev, Sonja; Vallès, Joan; Vitales, Daniel; Bennett, Michael D.

    2014-01-01

    Two plant genome size databases have been recently updated and/or extended: the Plant DNA C-values database (http://data.kew.org/cvalues), and GSAD, the Genome Size in Asteraceae database (http://www.asteraceaegenomesize.com). While the first provides information on nuclear DNA contents across land plants and some algal groups, the second is focused on one of the largest and most economically important angiosperm families, Asteraceae. Genome size data have numerous applications: they can be used in comparative studies on genome evolution, or as a tool to appraise the cost of whole-genome sequencing programs. The growing interest in genome size and increasing rate of data accumulation has necessitated the continued update of these databases. Currently, the Plant DNA C-values database (Release 6.0, Dec. 2012) contains data for 8510 species, while GSAD has 1219 species (Release 2.0, June 2013), representing increases of 17 and 51%, respectively, in the number of species with genome size data, compared with previous releases. Here we provide overviews of the most recent releases of each database, and outline new features of GSAD. The latter include (i) a tool to visually compare genome size data between species, (ii) the option to export data and (iii) a webpage containing information about flow cytometry protocols. PMID:24288377
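The species counts and percentage increases quoted in the record above can be used to back out the approximate size of the previous releases. This is a small worked calculation, not a figure from the source: the percentages are rounded in the abstract, so the results are estimates only, and the helper name `previous_count` is an illustrative assumption.

```python
def previous_count(current, pct_increase):
    """Estimate the earlier count given the current count and the
    percentage increase that led to it: previous = current / (1 + p/100)."""
    return current / (1 + pct_increase / 100)


# Figures quoted in the abstract; the percentages are rounded there,
# so these are approximations rather than exact release sizes.
plant_c_values_prev = previous_count(8510, 17)
gsad_prev = previous_count(1219, 51)

print(f"Plant DNA C-values, previous release: about {plant_c_values_prev:.0f} species")
print(f"GSAD, previous release: about {gsad_prev:.0f} species")
```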

  4. Databases on biotechnology and biosafety of GMOs.

    Science.gov (United States)

    Degrassi, Giuliano; Alexandrova, Nevena; Ripandelli, Decio

    2003-01-01

    Due to the involvement of scientific, industrial, commercial and public sectors of society, the complexity of the issues concerning the safety of genetically modified organisms (GMOs) for the environment, agriculture, and human and animal health calls for a wide coverage of information. Accordingly, development of the field of biotechnology, along with concerns related to the fate of released GMOs, has led to a rapid development of tools for disseminating such information. As a result, there is a growing number of databases aimed at collecting and storing information related to GMOs. Most of the sites deal with information on environmental releases, field trials, transgenes and related sequences, regulations and legislation, risk assessment documents, and literature. Databases are mainly established and managed by scientific, national or international authorities, and are addressed towards scientists, government officials, policy makers, consumers, farmers, environmental groups and civil society representatives. This complexity can lead to an overlapping of information. The purpose of the present review is to analyse the relevant databases currently available on the web, providing comments on their vastly different information and on the structure of the sites pertaining to different users. A preliminary overview on the development of these sites during the last decade, at both the national and international level, is also provided.

  5. Anatomy and evolution of database search engines-a central component of mass spectrometry based proteomic workflows.

    Science.gov (United States)

    Verheggen, Kenneth; Raeder, Helge; Berven, Frode S; Martens, Lennart; Barsnes, Harald; Vaudel, Marc

    2017-09-13

    Sequence database search engines are bioinformatics algorithms that identify peptides from tandem mass spectra using a reference protein sequence database. Two decades of development, notably driven by advances in mass spectrometry, have provided scientists with more than 30 published search engines, each with its own properties. In this review, we present the common paradigm behind the different implementations, and its limitations for modern mass spectrometry datasets. We also detail how the search engines attempt to alleviate these limitations, and provide an overview of the different software frameworks available to the researcher. Finally, we highlight alternative approaches for the identification of proteomic mass spectrometry datasets, either as a replacement for, or as a complement to, sequence database search engines. © 2017 Wiley Periodicals, Inc.
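The common paradigm mentioned in the record above, matching spectra against peptides derived from a reference protein database, can be illustrated with its first step: in-silico digestion followed by precursor-mass filtering. The sketch below makes several assumptions: a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), standard monoisotopic residue masses, and a ppm tolerance window. All function names and the two-protein database are hypothetical. Real search engines go on to score fragment-ion spectra, which is not shown here.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the summed residues


def tryptic_digest(protein):
    """Split a protein after K or R, except when the next residue is P.
    Missed cleavages are ignored in this simplified sketch."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def candidate_peptides(precursor_mass, database, tol_ppm=10.0):
    """Return (protein name, peptide, mass) tuples whose mass lies
    within tol_ppm of the observed precursor mass."""
    hits = []
    for name, protein in database.items():
        for peptide in tryptic_digest(protein):
            mass = peptide_mass(peptide)
            if abs(mass - precursor_mass) / mass * 1e6 <= tol_ppm:
                hits.append((name, peptide, mass))
    return hits


# Hypothetical two-protein database and an "observed" precursor mass.
database = {"protA": "MKWVTFISLLR", "protB": "GGKAAR"}
observed = peptide_mass("WVTFISLLR")
for name, peptide, mass in candidate_peptides(observed, database):
    print(name, peptide, round(mass, 4))
```

Restricting candidates by precursor mass first keeps the more expensive spectrum scoring limited to a short list of peptides.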

  6. Security aspects of database systems implementation

    OpenAIRE

    Pokorný, Tomáš

    2009-01-01

    The aim of this thesis is to provide a comprehensive overview of database systems security. The reader is introduced to the basics of information security and its development. The following chapter defines a concept of database system security using the ISO/IEC 27000 Standard. The findings from this chapter form a complex list of requirements on database security. One chapter also deals with legal aspects of this domain. The second part of this thesis offers a comparison of four object-relational database s...

  7. The National Solar Radiation Database (NSRDB)

    Energy Technology Data Exchange (ETDEWEB)

    Sengupta, Manajit; Habte, Aron; Lopez, Anthony; Xie, Yu; Molling, Christine; Gueymard, Christian

    2017-03-13

    This presentation provides a high-level overview of the National Solar Radiation Database (NSRDB), including sensing, measurement and forecasting, and discusses observations that are needed for research and product development.

  8. Multigenerational information: the example of the Icelandic Genealogy Database.

    Science.gov (United States)

    Tulinius, Hrafn

    2011-01-01

    The first part of the chapter describes the Icelandic Genealogical Database, how it was created, what it contains, and how it operates. In the second part, an overview of research accomplished with material from the database is given.

  9. Program overview

    International Nuclear Information System (INIS)

    Anon.

    1977-01-01

    The program overview describes the following resources and facilities: laser facilities, main laser room, target room, energy storage, laboratory area, building support systems, general plant project, and the new trailer complex.

  10. Disability Overview

    Science.gov (United States)

    ... About CDC.gov . Disability & Health Home Disability Overview Disability Inclusion Barriers to Inclusion Inclusion Strategies Inclusion in Programs & Activities Resources Healthy Living Disability & Physical Activity Disability & Obesity Disability & Smoking Disability & Breast ...

  11. Vulvovaginitis - overview

    Science.gov (United States)

    ... this page: //medlineplus.gov/ency/article/000897.htm Vulvovaginitis - overview Vulvovaginitis or vaginitis is swelling or infection of the ...

  12. Fast and secure retrieval of DNA sequences

    NARCIS (Netherlands)

    2014-01-01

    Sequence models are retrieved from a sequences index. The sequence models model DNA or RNA sequences stored in a database, and each comprises a finite memory tree source model and parameters for the finite memory tree source model. One or more DNA or RNA sequences stored in the database are

  13. Investigating core genetic-and-epigenetic cell cycle networks for stemness and carcinogenic mechanisms, and cancer drug design using big database mining and genome-wide next-generation sequencing data.

    Science.gov (United States)

    Li, Cheng-Wei; Chen, Bor-Sen

    2016-10-01

    Recent studies have demonstrated that cell cycle plays a central role in development and carcinogenesis. Thus, the use of big databases and genome-wide high-throughput data to unravel the genetic and epigenetic mechanisms underlying cell cycle progression in stem cells and cancer cells is a matter of considerable interest. Real genetic-and-epigenetic cell cycle networks (GECNs) of embryonic stem cells (ESCs) and HeLa cancer cells were constructed by applying system modeling, system identification, and big database mining to genome-wide next-generation sequencing data. Real GECNs were then reduced to core GECNs of HeLa cells and ESCs by applying principal genome-wide network projection. In this study, we investigated potential carcinogenic and stemness mechanisms for systems cancer drug design by identifying common core and specific GECNs between HeLa cells and ESCs. Integrating drug database information with the specific GECNs of HeLa cells could lead to identification of multiple drugs for cervical cancer treatment with minimal side-effects on the genes in the common core. We found that dysregulation of miR-29C, miR-34A, miR-98, and miR-215; and methylation of ANKRD1, ARID5B, CDCA2, PIF1, STAMBPL1, TROAP, ZNF165, and HIST1H2AJ in HeLa cells could result in cell proliferation and anti-apoptosis through NFκB, TGF-β, and PI3K pathways. We also identified 3 drugs, methotrexate, quercetin, and mimosine, which repressed the activated cell cycle genes, ARID5B, STK17B, and CCL2, in HeLa cells with minimal side-effects.

  14. Comparison of sequencing the D2 region of the large subunit ribosomal RNA gene (MicroSEQ®) versus the internal transcribed spacer (ITS) regions using two public databases for identification of common and uncommon clinically relevant fungal species.

    Science.gov (United States)

    Arbefeville, S; Harris, A; Ferrieri, P

    2017-09-01

    Fungal infections cause considerable morbidity and mortality in immunocompromised patients. Rapid and accurate identification of fungi is essential to guide accurately targeted antifungal therapy. With the advent of molecular methods, clinical laboratories can use new technologies to supplement traditional phenotypic identification of fungi. The aims of the study were to evaluate the sole commercially available MicroSEQ® D2 LSU rDNA Fungal Identification Kit compared to the in-house developed internal transcribed spacer (ITS) regions assay in identifying moulds, using two well-known online public databases to analyze sequenced data. A total of 85 common and uncommon clinically relevant fungi isolated from clinical specimens were sequenced for the D2 region of the large subunit (LSU) of ribosomal RNA (rRNA) gene with the MicroSEQ® Kit and the ITS regions with the in-house developed assay. The generated sequenced data were analyzed with the online GenBank and MycoBank public databases. The D2 region of the LSU rRNA gene identified 89.4% or 92.9% of the 85 isolates to the genus level and the full ITS region (f-ITS) 96.5% or 100%, using GenBank or MycoBank, respectively, when compared to the consensus ID. When comparing species-level designations to the consensus ID, D2 region of the LSU rRNA gene aligned with 44.7% (38/85) or 52.9% (45/85) of these isolates in GenBank or MycoBank, respectively. By comparison, f-ITS possessed greater specificity, followed by ITS1, then ITS2 regions using GenBank or MycoBank. Using GenBank or MycoBank, D2 region of the LSU rRNA gene outperformed phenotypic based ID at the genus level. Comparing rates of ID between D2 region of the LSU rRNA gene and the ITS regions in GenBank or MycoBank at the species level against the consensus ID, f-ITS and ITS2 exceeded performance of the D2 region of the LSU rRNA gene, but ITS1 had similar performance to the D2 region of the LSU rRNA gene using MycoBank. Our results indicated that the MicroSEQ® D2 LSU r
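The identification rates in the record above are all fractions of the same 85 isolates, so they can be cross-checked with simple arithmetic. The sketch below reproduces the two species-level percentages from the counts given in the abstract (38/85 and 45/85). It also backs out the isolate counts implied by the genus-level percentages. Those genus-level counts are derived here and are not stated in the source. The helper names are illustrative.

```python
TOTAL_ISOLATES = 85  # number of isolates stated in the abstract


def percent(count, total=TOTAL_ISOLATES):
    """Percentage of isolates, rounded to one decimal place."""
    return round(100 * count / total, 1)


def implied_count(pct, total=TOTAL_ISOLATES):
    """Nearest whole number of isolates consistent with a percentage."""
    return round(total * pct / 100)


# Species-level counts are given explicitly in the abstract.
print(percent(38), percent(45))

# Genus-level counts are NOT given in the abstract; these are derived.
for label, pct in [("D2 / GenBank", 89.4), ("D2 / MycoBank", 92.9),
                   ("f-ITS / GenBank", 96.5), ("f-ITS / MycoBank", 100.0)]:
    count = implied_count(pct)
    print(label, count, percent(count))
```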

  15. Database Replication

    CERN Document Server

    Kemme, Bettina

    2010-01-01

    Database replication is widely used for fault-tolerance, scalability and performance. The failure of one database replica does not stop the system from working as available replicas can take over the tasks of the failed replica. Scalability can be achieved by distributing the load across all replicas, and adding new replicas should the load increase. Finally, database replication can provide fast local access, even if clients are geographically distributed, if data copies are located close to clients. Despite its advantages, replication is not a straightforward technique to apply, and

  16. Refactoring databases evolutionary database design

    CERN Document Server

    Ambler, Scott W

    2006-01-01

    Refactoring has proven its value in a wide range of development projects–helping software professionals improve system designs, maintainability, extensibility, and performance. Now, for the first time, leading agile methodologist Scott Ambler and renowned consultant Pramodkumar Sadalage introduce powerful refactoring techniques specifically designed for database systems. Ambler and Sadalage demonstrate how small changes to table structures, data, stored procedures, and triggers can significantly enhance virtually any database design–without changing semantics. You’ll learn how to evolve database schemas in step with source code–and become far more effective in projects relying on iterative, agile methodologies. This comprehensive guide and reference helps you overcome the practical obstacles to refactoring real-world databases by covering every fundamental concept underlying database refactoring. Using start-to-finish examples, the authors walk you through refactoring simple standalone databas...

  17. An Integrated Molecular Database on Indian Insects.

    Science.gov (United States)

    Pratheepa, Maria; Venkatesan, Thiruvengadam; Gracy, Gandhi; Jalali, Sushil Kumar; Rangheswaran, Rajagopal; Antony, Jomin Cruz; Rai, Anil

    2018-01-01

    MOlecular Database on Indian Insects (MODII) is an online database linking several databases such as Insect Pest Info, Insect Barcode Information System (IBIn), Insect Whole Genome sequence, Other Genomic Resources of National Bureau of Agricultural Insect Resources (NBAIR), Whole Genome sequencing of Honey bee viruses, Insecticide resistance gene database and Genomic tools. This database was developed with a holistic approach for collecting phenomic and genomic information on agriculturally important insects. This insect resource database is available online for free at http://cib.res.in/.

  18. RDD Databases

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This database was established to oversee documents issued in support of fishery research activities including experimental fishing permits (EFP), letters of...

  19. Snowstorm Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Snowstorm Database is a collection of over 500 snowstorms dating back to 1900 and updated operationally. Only storms having large areas of heavy snowfall (10-20...

  20. Dealer Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The dealer reporting databases contain the primary data reported by federally permitted seafood dealers in the northeast. Electronic reporting was implemented May 1,...

  1. National database

    DEFF Research Database (Denmark)

    Kristensen, Helen Grundtvig; Stjernø, Henrik

    1995-01-01

    Article about a national database for nursing research established at the Danish Institute for Health and Nursing Research. The aim of the database is to gather knowledge about research and development activities in nursing.

  2. Overview of AEOD's program for trending reactor operational events

    International Nuclear Information System (INIS)

    Baranowsky, P.W.; O'Reilly, P.D.; Rasmuson, D.M.; Houghton, J.R.

    1994-01-01

    This paper presents an overview of the trending program being performed by AEOD. The major elements of the program include: (1) system and component reliability trending and analysis, (2) special data collection and analysis (e.g., IPE and PRA component failure data, common cause failure event data), (3) risk assessment of safety issues based on actual operating experience, (4) Accident Sequence Precursor (ASP) Program, and (5) trending US industry risk. AEOD plans to maintain up-to-date safety data trends for selected high risk or high regulatory profile components, systems, accident initiators, accident sequences, and regulatory issues. AEOD will also make greater use of PRA insights and perform limited probabilistic safety assessments to evaluate the safety significance of qualitative results. Examples of a system study and an issue evaluation are presented, as well as a summary of the common cause failure event database

  3. Database Description - AcEST | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available abase Description General information of database Database name AcEST Alternative n...hi, Tokyo-to 192-0397 Tel: +81-42-677-1111(ext.3654) E-mail: Database classificat...eneris Taxonomy ID: 13818 Database description This is a database of EST sequences of Adiantum capillus-vene...(3): 223-227. External Links: Original website information Database maintenance site Plant Environmental Res...base Database Description Download License Update History of This Database Site Policy | Contact Us Database Description - AcEST | LSDB Archive ...

  4. Legume and Lotus japonicus Databases

    DEFF Research Database (Denmark)

    Hirakawa, Hideki; Mun, Terry; Sato, Shusei

    2014-01-01

    Since the genome sequence of Lotus japonicus, a model plant of family Fabaceae, was determined in 2008 (Sato et al. 2008), the genomes of other members of the Fabaceae family, soybean (Glycine max) (Schmutz et al. 2010) and Medicago truncatula (Young et al. 2011), have been sequenced. In this section, we introduce representative, publicly accessible online resources related to plant materials, integrated databases containing legume genome information, and databases for genome sequence and derived marker information of legume species including L. japonicus...

  5. GEOTAB. Overview

    International Nuclear Information System (INIS)

    Eriksson, E.; Stark, T.; Johansson, B.; Magnusson, S.; Gerlach, M.; Sehlstedt, S.; Nilsson, A.C.

    1992-01-01

    Since 1977 the Swedish Nuclear Fuel and Waste Management Co. (SKB) has been performing a research and development program for the final disposal of spent nuclear fuel. One aim of this program is to gain knowledge of different bedrock properties. Measurements for the characterization of geological, geophysical, hydrogeological and hydrochemical conditions are performed in specific site investigations as well as for geoscientific projects. Large volumes of data have been produced since the start of the program, in the form of both raw data and results. During the course of the research program these data have been stored in various formats by the different institutions and companies performing the investigations. It was therefore decided that all data from the research and development program should be stored in a single database. The database, called Geotab, is a relational database, based on a database handling system from Mimer Information Systems. The application program, also called GEOTAB, was developed by Ergodata. This manual provides a description of the basics of the database and application programs used in connection with GEOTAB. It is intended as an introduction for new and regular users. The database is a mix of information organized so that storage, retrieval and modification of the data are as efficient as possible. GEOTAB is built around a relational database model. (au)

  6. Modelling Overview

    DEFF Research Database (Denmark)

    Larsen, Lars Bjørn; Vesterager, Johan

    This report provides an overview of the existing models of global manufacturing, describes the required modelling views and associated methods and identifies tools, which can provide support for this modelling activity. The model adopted for global manufacturing is that of an extended enterprise s...

  7. Introductory Overviews

    NARCIS (Netherlands)

    Jakeman, A.J.; Hamilton, S.H.; Athanasiadis, I.N.; Pierce, S.A.

    2015-01-01

    Introductory Overview articles are designed to provide introductory level background to key themes and topics that caters to the eclectic readership of EMS. It is envisaged that these articles will help to break down barriers to shared understanding and dialogue within multidisciplinary teams, and

  8. Conference overview

    Indian Academy of Sciences (India)

    It had 17 plenary talks and as a new feature it also had 8 short talks which ..... absence of black holes, long-term simulations are possible and quantitative .... [3] For a brief overview, see The Nag Memorial Lecture by Ashoke Sen at the Inst-.

  9. Database on veterinary clinical research in homeopathy.

    Science.gov (United States)

    Clausen, Jürgen; Albrecht, Henning

    2010-07-01

    The aim of the present report is to provide an overview of the first database on clinical research in veterinary homeopathy. Detailed searches were performed in the database 'Veterinary Clinical Research-Database in Homeopathy' (http://www.carstens-stiftung.de/clinresvet/index.php). The database contains about 200 entries of randomised clinical trials, non-randomised clinical trials, observational studies, drug provings, case reports and case series. Twenty-two clinical fields are covered and eight different groups of species are included. The database is free of charge and open to all interested veterinarians and researchers. The database enables researchers and veterinarians, sceptics and supporters to get a quick overview of the status of veterinary clinical research in homeopathy; it facilitates the preparation of systematic reviews and may stimulate replications or even new studies. 2010 Elsevier Ltd. All rights reserved.

  10. Prediction methods and databases within chemoinformatics

    DEFF Research Database (Denmark)

    Jónsdóttir, Svava Osk; Jørgensen, Flemming Steen; Brunak, Søren

    2005-01-01

    MOTIVATION: To gather information about available databases and chemoinformatics methods for prediction of properties relevant to the drug discovery and optimization process. RESULTS: We present an overview of the most important databases with 2-dimensional and 3-dimensional structural information about drugs and drug candidates, and of databases with relevant properties. Access to experimental data and numerical methods for selecting and utilizing these data is crucial for developing accurate predictive in silico models. Many interesting predictive methods for classifying the suitability...

  11. "Mr. Database" : Jim Gray and the History of Database Technologies.

    Science.gov (United States)

    Hanwahr, Nils C

    2017-12-01

    Although the widespread use of the term "Big Data" is comparatively recent, it invokes a phenomenon in the developments of database technology with distinct historical contexts. The database engineer Jim Gray, known as "Mr. Database" in Silicon Valley before his disappearance at sea in 2007, was involved in many of the crucial developments since the 1970s that constitute the foundation of exceedingly large and distributed databases. Jim Gray was involved in the development of relational database systems based on the concepts of Edgar F. Codd at IBM in the 1970s before he went on to develop principles of Transaction Processing that enable the parallel and highly distributed performance of databases today. He was also involved in creating forums for discourse between academia and industry, which influenced industry performance standards as well as database research agendas. As a co-founder of the San Francisco branch of Microsoft Research, Gray increasingly turned toward scientific applications of database technologies, e.g., leading the TerraServer project, an online database of satellite images. Inspired by Vannevar Bush's idea of the memex, Gray laid out his vision of a Personal Memex as well as a World Memex, eventually postulating a new era of data-based scientific discovery termed "Fourth Paradigm Science". This article gives an overview of Gray's contributions to the development of database technology as well as his research agendas and shows that central notions of Big Data have been occupying database engineers for much longer than the actual term has been in use.

  12. Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs

    DEFF Research Database (Denmark)

    Friis, Susanne L; Buchard, Anders; Rockenbauer, Eszter

    2016-01-01

    This work introduces the in-house developed Python application STRinNGS for analysis of STR sequence elements in BAM or FASTQ files. STRinNGS identifies sequence reads with STR loci by their flanking sequences, analyses the STR sequence and the flanking regions, and generates a report with the...
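    The idea of locating an STR by its flanking sequences can be illustrated with a minimal Python sketch. This is not the STRinNGS implementation: the function names, the exact-match search, the flank sequences and the example read are all hypothetical, and a real tool would also have to handle sequencing errors, reverse-complement reads and BAM/FASTQ parsing.

```python
import re
from typing import Optional


def extract_str_region(read: str, left_flank: str, right_flank: str) -> Optional[str]:
    """Return the sequence between the two flanks, or None if a flank is missing.

    Hypothetical helper: uses exact string matching only.
    """
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]


def count_repeats(region: str, motif: str) -> int:
    """Number of repeat units in the longest uninterrupted run of `motif`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", region)
    return max((len(run) // len(motif) for run in runs), default=0)


# Made-up read: left flank + five GATA repeats + right flank.
read = "ACGTTGCA" + "GATA" * 5 + "TTCCGGAA"
region = extract_str_region(read, "ACGTTGCA", "TTCCGGAA")
print(region, count_repeats(region, "GATA"))
```

    Exact matching keeps the sketch short; tolerating mismatches in the flanks would require an approximate-matching step that is not shown here.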

  13. Teaching Databases at Southampton University

    OpenAIRE

    Thomas, Ken

    2003-01-01

    In this paper, we describe some of the issues faced when designing a database systems course that will be a compulsory component for second year undergraduates in computer science. The main goal is to give an overview of database systems starting from Codd’s classical paper through to practical implementation using a SQL server (MySQL). For conceptual modelling, we chose UML because of the prior knowledge of the target class. The logical model is derived from the conceptual model and we place ...

  14. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  15. HIV Sequence Compendium 2015

    Energy Technology Data Exchange (ETDEWEB)

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  16. PHOBOS Overview

    Science.gov (United States)

    Hofman, David J.; Phobos Collaboration; Back, B. B.; Baker, M. D.; Ballintijn, M.; Barton, D. S.; Betts, R. R.; Bickley, A. A.; Bindel, R.; Budzanowski, A.; Busza, W.; Carroll, A.; Chai, Z.; Decowski, M. P.; García, E.; Gburek, T.; George, N.; Gulbrandsen, K.; Gushue, S.; Halliwell, C.; Hamblen, J.; Hauer, M.; Heintzelman, G. A.; Henderson, C.; Hollis, R. S.; Hołyński, R.; Holzman, B.; Iordanova, A.; Johnson, E.; Kane, J. L.; Khan, N.; Kulinich, P.; Kuo, C. M.; Lin, W. T.; Manly, S.; Mignerey, A. C.; Nouicer, R.; Olszewski, A.; Pak, R.; Park, I. C.; Reed, C.; Roland, C.; Roland, G.; Sagerer, J.; Seals, H.; Sedykh, I.; Smith, C. E.; Stankiewicz, M. A.; Steinberg, P.; Stephans, G. S. F.; Sukhanov, A.; Tonjes, M. B.; Trzupek, A.; Vale, C.; van Nieuwenhuizen, G. J.; Vaurynovich, S. S.; Verdier, R.; Veres, G. I.; Wenger, E.; Wolfs, F. L. H.; Wosiek, B.; Woźniak, K.; Wysłouch, B.

    2006-11-01

    A brief overview of the current results and conclusions from the PHOBOS experiment at the Relativistic Heavy Ion Collider (RHIC) is given. No evidence is found for non-monotonic behavior of observables measured by PHOBOS in the RHIC energy region. Convincing evidence is found that we have created a state of matter with high energy-density, that is nearly net-baryon free and is strongly interacting. The data are found to exhibit "simple" scaling behaviors, which include extended longitudinal scaling and scaling with the number of participating nucleons. The Au+Au collision charged particle data also exhibit a remarkable factorization of collision energy and geometry.

  17. REDIdb: the RNA editing database.

    Science.gov (United States)

    Picardi, Ernesto; Regina, Teresa Maria Rosaria; Brennicke, Axel; Quagliariello, Carla

    2007-01-01

    The RNA Editing Database (REDIdb) is an interactive, web-based database created and designed with the aim to allocate RNA editing events such as substitutions, insertions and deletions occurring in a wide range of organisms. The database contains both fully and partially sequenced DNA molecules for which editing information is available either by experimental inspection (in vitro) or by computational detection (in silico). Each record of REDIdb is organized in a specific flat-file containing a description of the main characteristics of the entry, a feature table with the editing events and related details and a sequence zone with both the genomic sequence and the corresponding edited transcript. REDIdb is a relational database in which the browsing and identification of editing sites has been simplified by means of two facilities to either graphically display genomic or cDNA sequences or to show the corresponding alignment. In both cases, all editing sites are highlighted in colour and their relative positions are detailed by mousing over. New editing positions can be directly submitted to REDIdb after a user-specific registration to obtain authorized secure access. This first version of REDIdb database stores 9964 editing events and can be freely queried at http://biologia.unical.it/py_script/search.html.
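    The record layout described above (a description, a feature table of editing events, and a sequence zone holding the genomic sequence and the edited transcript) can be sketched as a small Python data structure. This is an illustrative assumption, not the REDIdb flat-file format: the class and field names, the 0-based coordinates and the example sequence are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EditingEvent:
    position: int    # 0-based index into the genomic sequence (assumed convention)
    kind: str        # "substitution", "insertion" or "deletion"
    bases: str = ""  # new base(s) for substitution/insertion; unused for deletion


@dataclass
class EditingRecord:
    description: str
    genomic: str
    events: List[EditingEvent] = field(default_factory=list)

    def edited_transcript(self) -> str:
        """Apply all events to the genomic sequence and return the edited sequence."""
        seq = list(self.genomic)
        # Work from right to left so earlier positions are not shifted by indels.
        for ev in sorted(self.events, key=lambda e: e.position, reverse=True):
            if ev.kind == "substitution":
                seq[ev.position] = ev.bases
            elif ev.kind == "insertion":
                seq[ev.position:ev.position] = list(ev.bases)
            elif ev.kind == "deletion":
                del seq[ev.position]
            else:
                raise ValueError(f"unknown editing event kind: {ev.kind}")
        return "".join(seq)


# Made-up record with one event of each kind.
record = EditingRecord(
    description="hypothetical example entry",
    genomic="ACGTCA",
    events=[
        EditingEvent(1, "substitution", "T"),
        EditingEvent(4, "insertion", "TT"),
        EditingEvent(5, "deletion"),
    ],
)
print(record.edited_transcript())  # ATGTTTC
```

    Storing the events alongside the genomic sequence, as in this sketch, means the edited transcript can be regenerated on demand; the abstract indicates that REDIdb stores both sequences explicitly.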

  18. Experimental overview

    International Nuclear Information System (INIS)

    Nagamiya, Shoji

    1992-01-01

    Five years ago the first heavy-ion beams were accelerated at both the BNL-AGS and the CERN-SPS. This conference is the 5th anniversary in the experimental field. Currently, four experimental groups (E802/E859, E810, E814, E858) are taking data at BNL and eight groups (NA34-3, NA44, NA45, NA35, NA36, NA38, WA80/WA93, WA85) at CERN. Au and Pb beams are about to come, and a lot of activities are going on for RHIC and LHC. The purpose of this talk is to overview where we are, in particular, by looking at the past data. In this talk, the data of proton rapidity distributions are reviewed first to study nuclear transparency, then, the data of energy spectra and slopes, HBT and anti d production are discussed in connection with the evolution of the collision. Third, the data of strangeness production are described. Finally, the status of J/ψ and that of soft photons and electron pairs are briefly overviewed. (orig.)

  19. Novel expressed sequence tag- simple sequence repeats (EST ...

    African Journals Online (AJOL)

    Using different bioinformatic criteria, the SUCEST database was used to mine for simple sequence repeat (SSR) markers. Among 42,189 clusters, 1,425 expressed sequence tag- simple sequence repeats (EST-SSRs) were identified in silico. Trinucleotide repeats were the most abundant SSRs detected. Of 212 primer pairs ...

  20. KALIMER design database development and operation manual

    International Nuclear Information System (INIS)

    Jeong, Kwan Seong; Hahn, Do Hee; Lee, Yong Bum; Chang, Won Pyo

    2000-12-01

    KALIMER Design Database is developed to utilize the integration management for Liquid Metal Reactor Design Technology Development using Web Applications. KALIMER Design database consists of Results Database, Inter-Office Communication (IOC), 3D CAD database, Team Cooperation System, and Reserved Documents. Results Database is a research results database for mid-term and long-term nuclear R and D. IOC is a linkage control system inter sub project to share and integrate the research results for KALIMER. 3D CAD Database is a schematic design overview for KALIMER. Team Cooperation System is to inform team member of research cooperation and meetings. Finally, KALIMER Reserved Documents is developed to manage collected data and several documents since project accomplishment

  1. KALIMER design database development and operation manual

    Energy Technology Data Exchange (ETDEWEB)

    Jeong, Kwan Seong; Hahn, Do Hee; Lee, Yong Bum; Chang, Won Pyo

    2000-12-01

    KALIMER Design Database is developed to utilize the integration management for Liquid Metal Reactor Design Technology Development using Web Applications. KALIMER Design database consists of Results Database, Inter-Office Communication (IOC), 3D CAD database, Team Cooperation System, and Reserved Documents. Results Database is a research results database for mid-term and long-term nuclear R and D. IOC is a linkage control system inter sub project to share and integrate the research results for KALIMER. 3D CAD Database is a schematic design overview for KALIMER. Team Cooperation System is to inform team member of research cooperation and meetings. Finally, KALIMER Reserved Documents is developed to manage collected data and several documents since project accomplishment.

  2. Database Description - eSOL | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available base Description General information of database Database name eSOL Alternative nam...eator Affiliation: The Research and Development of Biological Databases Project, National Institute of Genet...nology 4259 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8501 Japan Email: Tel.: +81-45-924-5785 Database... classification Protein sequence databases - Protein properties Organism Taxonomy Name: Escherichia coli Taxonomy ID: 562 Database...i U S A. 2009 Mar 17;106(11):4201-6. External Links: Original website information Database maintenance site

  3. Stackfile Database

    Science.gov (United States)

    deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

    2013-01-01

    This software provides storage, retrieval, and analysis functionality for managing satellite altimetry data. It improves the efficiency and analysis capabilities of existing database software with improved flexibility and documentation. It offers flexibility in the type of data that can be stored. There is efficient retrieval either across the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason. It was, however, designed to work with a wide variety of satellite measurement data (e.g., Gravity Recovery And Climate Experiment -- GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.

  4. Draft secure medical database standard.

    Science.gov (United States)

    Pangalos, George

    2002-01-01

    Medical database security is a particularly important issue for all healthcare establishments. Medical information systems are intended to support a wide range of pertinent health issues today, for example: assure the quality of care, support effective management of the health services institutions, monitor and contain the cost of care, implement technology into care without violating social values, ensure the equity and availability of care, preserve humanity despite the proliferation of technology, etc. In this context, medical database security aims primarily to support: high availability, accuracy and consistency of the stored data, the medical professional secrecy and confidentiality, and the protection of the privacy of the patient. These properties, though of a technical nature, basically require that the system is actually helpful for medical care and not harmful to patients. These latter properties require in turn not only that fundamental ethical principles are not violated by employing database systems, but also that they are effectively enforced by technical means. This document reviews the existing and emerging work on the security of medical database systems. It presents in detail the problems and requirements related to medical database security. It addresses the problems of medical database security policies, secure design methodologies and implementation techniques. It also describes the current legal framework and regulatory requirements for medical database security. The issue of medical database security guidelines is also examined in detail. The current national and international efforts in the area are studied. It also gives an overview of the research work in the area. The document also presents in detail what is, to our knowledge, the most complete set of security guidelines for the development and operation of medical database systems.

  5. Clinical Databases for Chest Physicians.

    Science.gov (United States)

    Courtwright, Andrew M; Gabriel, Peter E

    2018-04-01

    A clinical database is a repository of patient medical and sociodemographic information focused on one or more specific health condition or exposure. Although clinical databases may be used for research purposes, their primary goal is to collect and track patient data for quality improvement, quality assurance, and/or actual clinical management. This article aims to provide an introduction and practical advice on the development of small-scale clinical databases for chest physicians and practice groups. Through example projects, we discuss the pros and cons of available technical platforms, including Microsoft Excel and Access, relational database management systems such as Oracle and PostgreSQL, and Research Electronic Data Capture. We consider approaches to deciding the base unit of data collection, creating consensus around variable definitions, and structuring routine clinical care to complement database aims. We conclude with an overview of regulatory and security considerations for clinical databases. Copyright © 2018 American College of Chest Physicians. Published by Elsevier Inc. All rights reserved.

  6. dBBQs: dataBase of Bacterial Quality scores

    OpenAIRE

    Wanchai, Visanu; Patumcharoenpol, Preecha; Nookaew, Intawat; Ussery, David

    2017-01-01

    Background: It is well-known that genome sequencing technologies are becoming significantly cheaper and faster. As a result of this, the exponential growth in sequencing data in public databases allows us to explore ever growing large collections of genome sequences. However, it is less known that the majority of available sequenced genome sequences in public databases are not complete, drafts of varying qualities. We have calculated quality scores for around 100,000 bacterial genomes from al...

  7. Combined sequencing of mRNA and DNA from human embryonic stem cells

    Directory of Open Access Journals (Sweden)

    Florian Mertes

    2016-06-01

    Full Text Available Combined transcriptome and whole genome sequencing of the same ultra-low input sample down to single cells is a rapidly evolving approach for the analysis of rare cells. Besides stem cells, rare cells originating from tissues like tumor or biopsies, circulating tumor cells and cells from early embryonic development are under investigation. Herein we describe a universal method applicable for the analysis of minute amounts of sample material (150 to 200 cells) derived from sub-colony structures from human embryonic stem cells. The protocol comprises the combined isolation and separate amplification of poly(A) mRNA and whole genome DNA followed by next generation sequencing. Here we present a detailed description of the method developed and an overview of the results obtained for RNA and whole genome sequencing of human embryonic stem cells; sequencing data are available in the Gene Expression Omnibus (GEO) database under accession number GSE69471.

  8. Phylogenetic reconstruction methods: an overview.

    Science.gov (United States)

    De Bruyn, Alexandre; Martin, Darren P; Lefeuvre, Pierre

    2014-01-01

    Initially designed to infer evolutionary relationships based on morphological and physiological characters, phylogenetic reconstruction methods have greatly benefited from recent developments in molecular biology and sequencing technologies with a number of powerful methods having been developed specifically to infer phylogenies from macromolecular data. This chapter, while presenting an overview of basic concepts and methods used in phylogenetic reconstruction, is primarily intended as a simplified step-by-step guide to the construction of phylogenetic trees from nucleotide sequences using fairly up-to-date maximum likelihood methods implemented in freely available computer programs. While the analysis of chloroplast sequences from various Vanilla species is used as an illustrative example, the techniques covered here are relevant to the comparative analysis of homologous sequences datasets sampled from any group of organisms.

  9. RETRAN overview

    International Nuclear Information System (INIS)

    Agee, L.J.

    1985-01-01

    The RETRAN code has become the industry standard with respect to NSSS transient analysis. The objective of this paper is to present an overview of important RETRAN-related events since the second International meeting in April of 1982. This paper is divided into three parts. The first part addresses the current status of the code with emphasis on the design review of RETRAN-02 MOD002 and the goal of RETRAN-02 in the Reactor Analysis Support Package (RASP). These activities are being undertaken to simplify the use of RETRAN for safety analysis and reload application which may be part of an NRC submittal. The second part of the paper describes significant applications of RETRAN. In the analysis section, special emphasis is placed on validation analyses which compare the code to actual plant data or experimental facilities. The third section briefly describes the pre-release version of RETRAN and the developmental goals for the next version of RETRAN. One major limitation of all state-of-the-art thermal-hydraulic codes is the determination of the structure of the fluid. A brief description of research needs in this area is given.

  10. Open Geoscience Database

    Science.gov (United States)

    Bashev, A.

    2012-04-01

    treatment can be carried out in other programs after the filtered data are extracted into a *.csv file. This makes the database understandable for non-experts. The database employs an open data format (*.csv) and widespread tools: PHP as the programming language, MySQL as the database management system, JavaScript for interaction with Google Maps, and jQuery UI for the user interface. The database is multilingual: it uses association tables that are linked to the elements of the database. In total, the development required about 150 hours. The database still has several problems. The main problem is the reliability of the data. Strictly speaking, an expert system is needed to estimate the reliability, but building such a system would take more resources than the database itself. The second problem is stream selection: how to select the stations that are connected with each other (for example, stations that belong to one water stream) and indicate their sequence. Currently the interface is available in English and Russian, but it can easily be translated into other languages. Some problems, however, have been solved. One example is "the problem of the same station" (sometimes the distance between stations is smaller than the position error): when a new station is added to the database, the application automatically looks for an existing station near that location. The problem of object and parameter types (how to treat "EC" and "electrical conductivity" as the same parameter) has also been solved, using "associative tables". If you would like to see the interface in your language, just contact us, and we will send you the list of terms and phrases to translate. The main advantage of the database is that it is totally open: anyone can view and extract the data and use them for non-commercial purposes at no charge. Registered users can contribute to the database without being paid. We hope that it will be widely used, first of all for educational purposes, but
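
    The duplicate-station check and the parameter synonym handling described in the record above can be illustrated with a minimal Python sketch. It is not the authors' PHP/MySQL code: the function names, the 50 m tolerance, the contents of the alias table and the use of the haversine formula are all assumptions made for illustration.

```python
import math

# Hypothetical alias table: maps known spellings of a parameter to one
# canonical name, in the spirit of the "associative tables" in the abstract.
PARAMETER_ALIASES = {
    "ec": "electrical conductivity",
    "electrical conductivity": "electrical conductivity",
}


def canonical_parameter(name):
    """Return the canonical parameter name, or the cleaned input if unknown."""
    key = name.strip().lower()
    return PARAMETER_ALIASES.get(key, key)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def find_nearby_station(stations, lat, lon, tolerance_m=50.0):
    """Return the closest existing station within tolerance_m, else None.

    `stations` is a list of dicts with "lat" and "lon" keys (assumed layout).
    """
    best, best_distance = None, tolerance_m
    for station in stations:
        distance = haversine_m(lat, lon, station["lat"], station["lon"])
        if distance <= best_distance:
            best, best_distance = station, distance
    return best
```

    A caller would run find_nearby_station before inserting a new record and, if it returns a station, offer that station to the user instead of creating a duplicate. The tolerance would in practice be chosen from the expected position error.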

  11. Extending Database Integration Technology

    National Research Council Canada - National Science Library

    Buneman, Peter

    1999-01-01

    Formal approaches to the semantics of databases and database languages can have immediate and practical consequences in extending database integration technologies to include a vastly greater range...

  12. The phytophthora genome initiative database: informatics and analysis for distributed pathogenomic research.

    Science.gov (United States)

    Waugh, M; Hraber, P; Weller, J; Wu, Y; Chen, G; Inman, J; Kiphart, D; Sobral, B

    2000-01-01

    The Phytophthora Genome Initiative (PGI) is a distributed collaboration to study the genome and evolution of a particularly destructive group of plant pathogenic oomycete, with the goal of understanding the mechanisms of infection and resistance. NCGR provides informatics support for the collaboration as well as a centralized data repository. In the pilot phase of the project, several investigators prepared Phytophthora infestans and Phytophthora sojae EST and Phytophthora sojae BAC libraries and sent them to another laboratory for sequencing. Data from sequencing reactions were transferred to NCGR for analysis and curation. An analysis pipeline transforms raw data by performing simple analyses (i.e., vector removal and similarity searching) that are stored and can be retrieved by investigators using a web browser. Here we describe the database and access tools, provide an overview of the data therein and outline future plans. This resource has provided a unique opportunity for the distributed, collaborative study of a genus from which relatively little sequence data are available. Results may lead to insight into how better to control these pathogens. The homepage of PGI can be accessed at http://www.ncgr.org/pgi, with database access through the database access hyperlink.

  13. Review of Spatial-Database System Usability: Recommendations for the ADDNS Project

    National Research Council Canada - National Science Library

    Abdalla, R. M; Niall, K. K

    2007-01-01

    ...) and three-dimensional (3D) visualizations. This report presents an overview of the basic concepts of GIS and spatial databases, provides an analytical usability evaluation and critically analyses different spatial- database applications...

  14. Reliability databases: State-of-the-art and perspectives

    DEFF Research Database (Denmark)

    Akhmedjanov, Farit

    2001-01-01

    The report gives a history of development and an overview of the existing reliability databases. This overview also describes some other (than computer databases) sources of reliability and failures information, e.g. reliability handbooks, but the main attention is paid to standard models...... and software packages containing the data mentioned. The standards corresponding to collection and exchange of reliability data are observed too. Finally, perspective directions in such data sources development are shown....

  15. Foundations of database systems : an introductory tutorial

    NARCIS (Netherlands)

    Paredaens, J.; Paredaens, J.; Tenenbaum, L. A.

    1994-01-01

    A very short overview is given of the principles of databases. The entity relationship model is used to define the conceptual base. Furthermore, file management, the hierarchical model, the network model, the relational model and the object oriented model are discussed. During the second world war,

  16. GRIP Database original data - GRIPDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Data name: GRIP Database original data. DOI: 10....18908/lsdba.nbdc01665-006. Description of data contents: GRIP Database original data. It consists of data table...s and sequences. Data file: File name: gripdb_original_data.zip. File URL: ftp://ftp.biosciencedbc.jp/archive/gripdb/LATEST/gri...

  17. Simplified validation of borderline hits of database searches

    OpenAIRE

    Thomas, Henrik; Shevchenko, Andrej

    2008-01-01

    Along with unequivocal hits produced by matching multiple MS/MS spectra to database sequences, LC-MS/MS analysis often yields a large number of hits of borderline statistical confidence. To simplify their validation, we propose to use rapid de novo interpretation of all acquired MS/MS spectra and, with the help of a simple software tool, display the candidate sequences together with each database search hit. We demonstrate that comparing hit database sequences and independent de novo interpre...

  18. The UCSC Genome Browser Database: 2008 update

    DEFF Research Database (Denmark)

    Karolchik, D; Kuhn, R M; Baertsch, R

    2007-01-01

    The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrat...

  19. Datamining on distributed medical databases

    DEFF Research Database (Denmark)

    Have, Anna Szynkowiak

    2004-01-01

    This Ph.D. thesis focuses on clustering techniques for Knowledge Discovery in Databases. Various data mining tasks relevant for medical applications are described and discussed. A general framework which combines data projection and data mining and interpretation is presented. An overview...... is available. If data is unlabeled, then it is possible to generate keywords (in case of textual data) or key-patterns, as an informative representation of the obtained clusters. The methods are applied on simple artificial data sets, as well as collections of textual and medical data. In Danish: Denne ph...

  20. Overview of the SBS 2016 Mining Track

    DEFF Research Database (Denmark)

    Bogers, Toine; Hendrickx, Iris; Koolen, Marijn

    2016-01-01

    In this paper we present an overview of the mining track in the Social Book Search (SBS) lab 2016. The mining track addressed two tasks: (1) classifying forum posts as book search requests, and (2) linking book title mentions in forum posts to unique book IDs in a database. Both tasks are important...

  1. Database development and management

    CERN Document Server

    Chao, Lee

    2006-01-01

    Introduction to Database Systems; Functions of a Database; Database Management System; Database Components; Database Development Process; Conceptual Design and Data Modeling; Introduction to Database Design Process; Understanding Business Process; Entity-Relationship Data Model; Representing Business Process with Entity-Relationship Model; Table Structure and Normalization; Introduction to Tables; Table Normalization; Transforming Data Models to Relational Databases; DBMS Selection; Transforming Data Models to Relational Databases; Enforcing Constraints; Creating Database for Business Process; Physical Design and Database

  2. The Ensembl genome database project.

    Science.gov (United States)

    Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M

    2002-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

  3. Database resources for the tuberculosis community.

    Science.gov (United States)

    Lew, Jocelyne M; Mao, Chunhong; Shukla, Maulik; Warren, Andrew; Will, Rebecca; Kuznetsov, Dmitry; Xenarios, Ioannis; Robertson, Brian D; Gordon, Stephen V; Schnappinger, Dirk; Cole, Stewart T; Sobral, Bruno

    2013-01-01

    Access to online repositories for genomic and associated "-omics" datasets is now an essential part of everyday research activity. It is important therefore that the Tuberculosis community is aware of the databases and tools available to them online, as well as for the database hosts to know what the needs of the research community are. One of the goals of the Tuberculosis Annotation Jamboree, held in Washington DC on March 7th-8th 2012, was therefore to provide an overview of the current status of three key Tuberculosis resources, TubercuList (tuberculist.epfl.ch), TB Database (www.tbdb.org), and Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org). Here we summarize some key updates and upcoming features in TubercuList, and provide an overview of the PATRIC site and its online tools for pathogen RNA-Seq analysis. Copyright © 2012 Elsevier Ltd. All rights reserved.

  4. Efficient Disk-Based Techniques for Manipulating Very Large String Databases

    KAUST Repository

    Allam, Amin

    2017-01-01

    Indexing and processing strings are very important topics in database management. Strings can be database records, DNA sequences, protein sequences, or plain text. Various string operations are required for several application categories

  5. Sequence History Update Tool

    Science.gov (United States)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable time and effort savings. With the use of The Sequence History Update Tool what previously took minutes is now done in less than 30 seconds, and now provides a more accurate archival record of the sequence commanding for MRO.

  6. HIV Sequence Compendium 2010

    Energy Technology Data Exchange (ETDEWEB)

    Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Christian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  7. The UCSC Genome Browser Database: update 2006

    DEFF Research Database (Denmark)

    Hinrichs, A S; Karolchik, D; Baertsch, R

    2006-01-01

    The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, ...

  8. Databases of the marine metagenomics

    KAUST Repository

    Mineta, Katsuhiko

    2015-10-28

    The metagenomic data obtained from marine environments are highly useful for understanding marine microbial communities. In comparison with the conventional amplicon-based approach to metagenomics, the recent shotgun sequencing-based approach has become a powerful tool that provides an efficient way of grasping the diversity of the entire microbial community at a sampling point in the sea. However, this approach accelerates the accumulation of metagenome data as well as the increase in data complexity. Moreover, when the metagenomic approach is used for monitoring changes over time in marine environments at multiple seawater locations, metagenomics data will accumulate in tremendous volume and at enormous speed. Because this situation is becoming a reality at many marine research institutions and stations all over the world, it seems obvious that data management and analysis will be confronted by so-called Big Data issues, such as how a database can be constructed efficiently and how useful knowledge should be extracted from a vast amount of data. In this review, we outline all the major marine metagenome databases that are currently publicly available, noting that no database is devoted exclusively to marine metagenomes and that the number of metagenome databases including marine metagenome data is six, which is unexpectedly small. We also extend our explanation to what we call reference databases, which will be useful for constructing a marine metagenome database and for complementing it with important information. Finally, we point out a number of challenges to be overcome in constructing a marine metagenome database.

  9. An overview of recent developments in genomics and associated statistical methods.

    Science.gov (United States)

    Bickel, Peter J; Brown, James B; Huang, Haiyan; Li, Qunhua

    2009-11-13

    The landscape of genomics has changed drastically in the last two decades. Increasingly inexpensive sequencing has shifted the primary focus from the acquisition of biological sequences to the study of biological function. Assays have been developed to study many intricacies of biological systems, and publicly available databases have given rise to integrative analyses that combine information from many sources to draw complex conclusions. Such research was the focus of the recent workshop at the Isaac Newton Institute, 'High dimensional statistics in biology'. Many computational methods from modern genomics and related disciplines were presented and discussed. Using, as much as possible, the material from these talks, we give an overview of modern genomics: from the essential assays that make data-generation possible, to the statistical methods that yield meaningful inference. We point to current analytical challenges, where novel methods, or novel applications of extant methods, are presently needed.

  10. Foundations of RDF Databases

    Science.gov (United States)

    Arenas, Marcelo; Gutierrez, Claudio; Pérez, Jorge

    The goal of this paper is to give an overview of the basics of the theory of RDF databases. We provide a formal definition of RDF that includes the features that distinguish this model from other graph data models. We then move into the fundamental issue of querying RDF data. We start by considering the RDF query language SPARQL, which has been a W3C Recommendation since January 2008. We provide an algebraic syntax and a compositional semantics for this language, study the complexity of the evaluation problem for different fragments of SPARQL, and consider the problem of optimizing the evaluation of SPARQL queries, showing that a natural fragment of this language has some good properties in this respect. We furthermore study the expressive power of SPARQL, by comparing it with some well-known query languages such as relational algebra. We conclude by considering the issue of querying RDF data in the presence of RDFS vocabulary. In particular, we present a recently proposed extension of SPARQL with navigational capabilities.
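
    To make the idea of evaluating a query over RDF triples concrete, the following minimal Python sketch evaluates a conjunction of triple patterns against a small graph by repeatedly extending variable bindings. It covers only the conjunctive (basic graph pattern) fragment and is not the algebra defined in the paper; the function names, the "?name" convention for variables and the representation of triples as plain tuples of strings are assumptions made for illustration.

```python
def is_var(term):
    """Variables are written as strings starting with '?' (assumed convention)."""
    return isinstance(term, str) and term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`.

    Returns the extended binding as a new dict, or None if the pattern
    conflicts with the triple or with values already bound.
    """
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # setdefault binds the variable if it is new, and otherwise
            # returns the existing value so that a conflict can be detected.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def evaluate_bgp(patterns, graph):
    """Return every binding that satisfies all triple patterns in `graph`."""
    solutions = [{}]  # start from the single empty binding
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in graph:
                candidate = match_triple(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions
```

    For example, with graph = [("alice", "knows", "bob"), ("bob", "knows", "carol")], the two patterns ("?x", "knows", "?y") and ("?y", "knows", "?z") share the variable ?y, so only bindings in which the same node appears as the object of one triple and the subject of the other survive. This nested-loop strategy is the simplest possible one and says nothing about the optimizations the paper studies.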

  11. Informatics derived materials databases for multifunctional properties

    International Nuclear Information System (INIS)

    Broderick, Scott; Rajan, Krishna

    2015-01-01

    In this review, we provide an overview of the development of quantitative structure–property relationships incorporating the impact of data uncertainty from small, limited knowledge data sets from which we rapidly develop new and larger databases. Unlike traditional database development, this informatics based approach is concurrent with the identification and discovery of the key metrics controlling structure–property relationships; and even more importantly we are now in a position to build materials databases based on design ‘intent’ and not just design parameters. This permits, for example, the establishment of materials databases that can be used for targeted multifunctional properties and not just one characteristic at a time, as is presently done. This review provides a summary of the computational logic of building such virtual databases and gives some examples in the field of complex inorganic solids for scintillator applications. (review)

  12. Database Software Selection for the Egyptian National STI Network.

    Science.gov (United States)

    Slamecka, Vladimir

    The evaluation and selection of information/data management system software for the Egyptian National Scientific and Technical (STI) Network are described. An overview of the state-of-the-art of database technology elaborates on the differences between information retrieval and database management systems (DBMS). The desirable characteristics of…

  13. The Nordic prescription databases as a resource for pharmacoepidemiological research

    DEFF Research Database (Denmark)

    Wettermark, B; Zoëga, H; Furu, K

    2013-01-01

    All five Nordic countries have nationwide prescription databases covering all dispensed drugs, with potential for linkage to outcomes. The aim of this review is to present an overview of therapeutic areas studied and methods applied in pharmacoepidemiologic studies using data from these databases....

  14. Mathematics for Databases

    NARCIS (Netherlands)

    ir. Sander van Laar

    2007-01-01

    A formal description of a database consists of the description of the relations (tables) of the database together with the constraints that must hold on the database. Furthermore the contents of a database can be retrieved using queries. These constraints and queries for databases can very well be

  15. Databases and their application

    NARCIS (Netherlands)

    Grimm, E.C.; Bradshaw, R.H.W; Brewer, S.; Flantua, S.; Giesecke, T.; Lézine, A.M.; Takahara, H.; Williams, J.W.,Jr; Elias, S.A.; Mock, C.J.

    2013-01-01

    During the past 20 years, several pollen database cooperatives have been established. These databases are now constituent databases of the Neotoma Paleoecology Database, a public domain, multiproxy, relational database designed for Quaternary-Pliocene fossil data and modern surface samples. The

  16. DOT Online Database

    Science.gov (United States)

    Page Home Table of Contents Contents Search Database Search Login Login Databases Advisory Circulars accessed by clicking below: Full-Text WebSearch Databases Database Records Date Advisory Circulars 2092 5 data collection and distribution policies. Document Database Website provided by MicroSearch

  17. The STRING database in 2011

    DEFF Research Database (Denmark)

    Szklarczyk, Damian; Franceschini, Andrea; Kuhn, Michael

    2011-01-01

    present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score...... models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms; the resource can be reached at http://string-db.org....

  18. Engineering method to build the composite structure ply database

    Directory of Open Access Journals (Sweden)

    Qinghua Shi

    In this paper, a new method to build a composite ply database with engineering design constraints is proposed. This method has two levels: the core stacking sequence design and the whole stacking sequence design. The core stacking sequences are obtained by the full permutation algorithm considering the ply ratio requirement and the dispersion character which characterizes the dispersion of ply angles. The whole stacking sequences are the combinations of the core stacking sequences. By excluding the ply sequences which do not meet the engineering requirements, the final ply database is obtained. One example with the constraints that the total layer number is 100 and the ply ratio is 30:60:10 is presented to validate the method. This method provides a new way to set up the ply database based on the engineering requirements without adopting intelligent optimization algorithms. Keywords: Composite ply database, VBA program, Structure design, Stacking sequence
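
    The first level of the method, generating core stacking sequences by full permutation under a ply ratio and a dispersion requirement, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' VBA program: the 10-ply core length, the angle labels (0, 45 standing for the ±45° plies, 90), and the use of "longest run of identical angles" as the dispersion measure are all hypothetical choices, since the abstract does not define them.

```python
from collections import Counter


def multiset_permutations(counts):
    """Yield every distinct ordering of a multiset given as {item: count}."""
    remaining = Counter(counts)
    total = sum(remaining.values())
    prefix = []

    def recurse():
        if len(prefix) == total:
            yield tuple(prefix)
            return
        for item in sorted(remaining):
            if remaining[item] > 0:
                # Place one ply of this angle, recurse, then undo the choice.
                remaining[item] -= 1
                prefix.append(item)
                yield from recurse()
                prefix.pop()
                remaining[item] += 1

    yield from recurse()


def max_run(sequence):
    """Length of the longest run of identical consecutive ply angles."""
    if not sequence:
        return 0
    longest = run = 1
    for previous, current in zip(sequence, sequence[1:]):
        run = run + 1 if current == previous else 1
        longest = max(longest, run)
    return longest


def core_sequences(counts, max_consecutive):
    """Keep only the permutations whose longest run is within the limit."""
    return [seq for seq in multiset_permutations(counts)
            if max_run(seq) <= max_consecutive]


# Assumed 10-ply core matching a 30:60:10 ratio of 0 / 45 / 90 degree plies.
CORE_COUNTS = {0: 3, 45: 6, 90: 1}
```

    Calling core_sequences(CORE_COUNTS, 2) would return the candidate cores in which no angle appears more than twice in a row. Whole stacking sequences for a 100-layer laminate would then be built by combining ten such cores and applying the remaining engineering checks, which are not modelled here.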

  19. Brassica ASTRA: an integrated database for Brassica genomic research.

    Science.gov (United States)

    Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David

    2005-01-01

    Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.

  20. Integrated Medical Model Overview

    Science.gov (United States)

    Myers, J.; Boley, L.; Foy, M.; Goodenow, D.; Griffin, D.; Keenan, A.; Kerstman, E.; Melton, S.; McGuire, K.; Saile, L.

    2015-01-01

    The Integrated Medical Model (IMM) Project represents one aspect of NASA's Human Research Program (HRP) to quantitatively assess medical risks to astronauts for existing operational missions as well as missions associated with future exploration and commercial space flight ventures. The IMM takes a probabilistic approach to assessing the likelihood and specific outcomes of one hundred medical conditions within the envelope of accepted space flight standards of care over a selectable range of mission capabilities. A specially developed Integrated Medical Evidence Database (iMED) maintains evidence-based, organizational knowledge across a variety of data sources. Since becoming operational in 2011, version 3.0 of the IMM, the supporting iMED, and the expertise of the IMM project team have contributed to a wide range of decision and informational processes for the space medical and human research community. This presentation provides an overview of the IMM conceptual architecture and range of application through examples of actual space flight community questions posed to the IMM project.

  1. Database Search Engines: Paradigms, Challenges and Solutions.

    Science.gov (United States)

    Verheggen, Kenneth; Martens, Lennart; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    The first step in identifying proteins from mass spectrometry based shotgun proteomics data is to infer peptides from tandem mass spectra, a task generally achieved using database search engines. In this chapter, the basic principles of database search engines are introduced with a focus on open source software, and the use of database search engines is demonstrated using the freely available SearchGUI interface. This chapter also discusses how to tackle general issues related to sequence database searching and shows how to minimize their impact.

  2. Cancer Genetics Overview (PDQ®)—Health Professional Version

    Science.gov (United States)

    Cancer Genetics Overview discusses hereditary cancers and the role of genetic variants (mutations). Get information about genetic counseling, familial cancer syndromes, genomic sequencing, germline and somatic testing, ethical and legal issues and more in this summary for clinicians.

  3. Dietary Supplement Ingredient Database

    Science.gov (United States)

    ... and US Department of Agriculture Dietary Supplement Ingredient Database Toggle navigation Menu Home About DSID Mission Current ... values can be saved to build a small database or add to an existing database for national, ...

  4. Energy Consumption Database

    Science.gov (United States)

    Consumption Database The California Energy Commission has created this on-line database for informal reporting ) classifications. The database also provides easy downloading of energy consumption data into Microsoft Excel (XLSX

  5. YMDB: the Yeast Metabolome Database

    Science.gov (United States)

    Jewison, Timothy; Knox, Craig; Neveu, Vanessa; Djoumbou, Yannick; Guo, An Chi; Lee, Jacqueline; Liu, Philip; Mandal, Rupasri; Krishnamurthy, Ram; Sinelnikov, Igor; Wilson, Michael; Wishart, David S.

    2012-01-01

    The Yeast Metabolome Database (YMDB, http://www.ymdb.ca) is a richly annotated ‘metabolomic’ database containing detailed information about the metabolome of Saccharomyces cerevisiae. Modeled closely after the Human Metabolome Database, the YMDB contains >2000 metabolites with links to 995 different genes/proteins, including enzymes and transporters. The information in YMDB has been gathered from hundreds of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the YMDB also contains an extensive collection of experimental intracellular and extracellular metabolite concentration data compiled from detailed Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) metabolomic analyses performed in our lab. This is further supplemented with thousands of NMR and MS spectra collected on pure, reference yeast metabolites. Each metabolite entry in the YMDB contains an average of 80 separate data fields including comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, intracellular/extracellular concentrations, growth conditions and substrates, pathway information, enzyme data, gene/protein sequence data, as well as numerous hyperlinks to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of S. cerevisiae's importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers, but also to yeast biologists, systems biologists, the industrial fermentation industry, as well as the beer, wine and spirit industry. PMID:22064855

  6. The MAR databases: development and implementation of databases specific for marine metagenomics.

    Science.gov (United States)

    Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen; Willassen, Nils P

    2018-01-04

    We introduce the marine databases MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represent a marine prokaryote reference genome database, MarDB includes all incompletely sequenced prokaryotic genomes regardless of level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets visitors browse, filter and search in the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Collecting Taxes Database

    Data.gov (United States)

    US Agency for International Development — The Collecting Taxes Database contains performance and structural indicators about national tax systems. The database contains quantitative revenue performance...

  8. USAID Anticorruption Projects Database

    Data.gov (United States)

    US Agency for International Development — The Anticorruption Projects Database (Database) includes information about USAID projects with anticorruption interventions implemented worldwide between 2007 and...

  9. NoSQL databases

    OpenAIRE

    Mrozek, Jakub

    2012-01-01

    This thesis deals with database systems referred to as NoSQL databases. In the second chapter, I explain basic terms and the theory of database systems. A short explanation is dedicated to database systems based on the relational data model and the SQL standardized query language. Chapter Three explains the concept and history of the NoSQL databases, and also presents database models, major features and the use of NoSQL databases in comparison with traditional database systems. In the fourth ...

  10. Integrating Variances into an Analytical Database

    Science.gov (United States)

    Sanchez, Carlos

    2010-01-01

    For this project, I enrolled in numerous SATERN courses that taught the basics of database programming. These include: Basic Access 2007 Forms, Introduction to Database Systems, Overview of Database Design, and others. My main job was to create an analytical database that can handle many stored forms and make it easy to interpret and organize. Additionally, I helped improve an existing database and populate it with information. These databases were designed to be used with data from Safety Variances and DCR forms. The research consisted of analyzing the database and comparing the data to find out which entries were repeated the most. If an entry happened to be repeated several times in the database, that would mean that the rule or requirement targeted by that variance has been bypassed many times already and so the requirement may not really be needed, but rather should be changed to allow the variance's conditions permanently. This project did not only restrict itself to the design and development of the database system, but also worked on exporting the data from the database to a different format (e.g. Excel or Word) so it could be analyzed in a simpler fashion. Thanks to the change in format, the data was organized in a spreadsheet that made it possible to sort the data by categories or types and helped speed up searches. Once my work with the database was done, the records of variances could be arranged so that they were displayed in numerical order, or one could search for a specific document targeted by the variances and restrict the search to only include variances that modified a specific requirement. A great part that contributed to my learning was SATERN, NASA's resource for education. Thanks to the SATERN online courses I took over the summer, I was able to learn many new things about computers and databases and also go more in depth into topics I already knew about.

  11. Overview of national bird population monitoring programs and databases

    Science.gov (United States)

    Gregory S. Butcher; Bruce Peterjohn; C. John Ralph

    1993-01-01

    A number of programs have been set up to monitor populations of nongame migratory birds. We review these programs and their purposes and provide information on obtaining data or results from these programs. In addition, we review recommendations for improving these programs.

  12. BIOSPIDA: A Relational Database Translator for NCBI.

    Science.gov (United States)

    Hagen, Matthew S; Lee, Eva K

    2010-11-13

    As the volume and availability of biological databases continue widespread growth, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. To retrieve all necessary information requires an integrated system that can query multiple databases with minimized overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, Pubmed, MMDB and GenBank have been successfully converted into relational databases and all are easily linkable helping to answer complex biological questions. These tools facilitate research scientists to locally integrate databases from NCBI without significant workload or development time.

  13. Mapping data - KOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ...tional Rice Genome Sequencing Project (IRGSP) Data file File name: kome_mapping_data.zip File URL: ftp://ftp.biosciencedbc.jp/archiv...(Transcriptional Unit) ...

  14. PrimateLit Database

    Science.gov (United States)

    Primate Info Net Related Databases NCRR PrimateLit: A bibliographic database for primatology Top of any problems with this service. We welcome your feedback. The PrimateLit database is no longer being Resources, National Institutes of Health. The database is a collaborative project of the Wisconsin Primate

  15. Transcriptome analysis of the desert locust central nervous system: production and annotation of a Schistocerca gregaria EST database.

    Science.gov (United States)

    Badisco, Liesbeth; Huybrechts, Jurgen; Simonet, Gert; Verlinden, Heleen; Marchal, Elisabeth; Huybrechts, Roger; Schoofs, Liliane; De Loof, Arnold; Vanden Broeck, Jozef

    2011-03-21

    The desert locust (Schistocerca gregaria) displays a fascinating type of phenotypic plasticity, designated as 'phase polyphenism'. Depending on environmental conditions, one genome can be translated into two highly divergent phenotypes, termed the solitarious and gregarious (swarming) phase. Although many of the underlying molecular events remain elusive, the central nervous system (CNS) is expected to play a crucial role in the phase transition process. Locusts have also proven to be interesting model organisms in a physiological and neurobiological research context. However, molecular studies in locusts are hampered by the fact that genome/transcriptome sequence information available for this branch of insects is still limited. We have generated 34,672 raw expressed sequence tags (EST) from the CNS of desert locusts in both phases. These ESTs were assembled in 12,709 unique transcript sequences and nearly 4,000 sequences were functionally annotated. Moreover, the obtained S. gregaria EST information is highly complementary to the existing orthopteran transcriptomic data. Since many novel transcripts encode neuronal signaling and signal transduction components, this paper includes an overview of these sequences. Furthermore, several transcripts being differentially represented in solitarious and gregarious locusts were retrieved from this EST database. The findings highlight the involvement of the CNS in the phase transition process and indicate that this novel annotated database may also add to the emerging knowledge of concomitant neuronal signaling and neuroplasticity events. In summary, we met the need for novel sequence data from desert locust CNS. To our knowledge, we hereby also present the first insect EST database that is derived from the complete CNS. The obtained S. gregaria EST data constitute an important new source of information that will be instrumental in further unraveling the molecular principles of phase polyphenism, in further establishing

  16. Transcriptome analysis of the desert locust central nervous system: production and annotation of a Schistocerca gregaria EST database.

    Directory of Open Access Journals (Sweden)

    Liesbeth Badisco

    Full Text Available BACKGROUND: The desert locust (Schistocerca gregaria) displays a fascinating type of phenotypic plasticity, designated as 'phase polyphenism'. Depending on environmental conditions, one genome can be translated into two highly divergent phenotypes, termed the solitarious and gregarious (swarming) phase. Although many of the underlying molecular events remain elusive, the central nervous system (CNS) is expected to play a crucial role in the phase transition process. Locusts have also proven to be interesting model organisms in a physiological and neurobiological research context. However, molecular studies in locusts are hampered by the fact that genome/transcriptome sequence information available for this branch of insects is still limited. METHODOLOGY: We have generated 34,672 raw expressed sequence tags (EST) from the CNS of desert locusts in both phases. These ESTs were assembled in 12,709 unique transcript sequences and nearly 4,000 sequences were functionally annotated. Moreover, the obtained S. gregaria EST information is highly complementary to the existing orthopteran transcriptomic data. Since many novel transcripts encode neuronal signaling and signal transduction components, this paper includes an overview of these sequences. Furthermore, several transcripts being differentially represented in solitarious and gregarious locusts were retrieved from this EST database. The findings highlight the involvement of the CNS in the phase transition process and indicate that this novel annotated database may also add to the emerging knowledge of concomitant neuronal signaling and neuroplasticity events. CONCLUSIONS: In summary, we met the need for novel sequence data from desert locust CNS. To our knowledge, we hereby also present the first insect EST database that is derived from the complete CNS. The obtained S. gregaria EST data constitute an important new source of information that will be instrumental in further unraveling the molecular

  17. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.

    Science.gov (United States)

    Meinicke, Peter

    2009-09-02

    Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  18. Logical database design principles

    CERN Document Server

    Garmany, John; Clark, Terry

    2005-01-01

    INTRODUCTION TO LOGICAL DATABASE DESIGN Understanding a Database Database Architectures Relational Databases Creating the Database System Development Life Cycle (SDLC) Systems Planning: Assessment and Feasibility System Analysis: Requirements System Analysis: Requirements Checklist Models Tracking and Schedules Design Modeling Functional Decomposition Diagram Data Flow Diagrams Data Dictionary Logical Structures and Decision Trees System Design: Logical SYSTEM DESIGN AND IMPLEMENTATION The ER Approach Entities and Entity Types Attribute Domains Attributes Set-Valued Attributes Weak Entities Constraint

  19. An Interoperable Cartographic Database

    OpenAIRE

    Slobodanka Ključanin; Zdravko Galić

    2007-01-01

    The concept of producing a prototype of interoperable cartographic database is explored in this paper, including the possibilities of integration of different geospatial data into the database management system and their visualization on the Internet. The implementation includes vectorization of the concept of a single map page, creation of the cartographic database in an object-relation database, spatial analysis, definition and visualization of the database content in the form of a map on t...

  20. Software listing: CHEMTOX database

    International Nuclear Information System (INIS)

    Moskowitz, P.D.

    1993-01-01

    Initially launched in 1983, the CHEMTOX Database was among the first microcomputer databases containing hazardous chemical information. The database is used in many industries and government agencies in more than 17 countries. Updated quarterly, the CHEMTOX Database provides detailed environmental and safety information on 7500-plus hazardous substances covered by dozens of regulatory and advisory sources. This brief listing describes the method of accessing data and provides ordering information for those wishing to obtain the CHEMTOX Database

  1. INIS: Database manual

    International Nuclear Information System (INIS)

    2003-01-01

    This document is one in a series of publications known as the INIS Reference Series. It is intended for users of INIS (International Nuclear Information System) output data on various media (FTP file, CD-ROM, e-mail file, earlier magnetic tape, cartridge, etc.). This manual provides a description of each data element including information on contents, structure and usage as well as historical overview of additions, deletions and changes of data elements and their contents that have taken place over the years. Each record contains certain control data fields (001-009), one, two or three bibliographic levels, a set of descriptors, and zero, one or more abstracts, one in English and optionally one or more in another language. In order to facilitate the description of the system, the sequence of data elements is based on the input or, as it is internally called, worksheet format which differs from the exchange format described in the manual IAEA-INIS-7. A separate section is devoted to each data element and deviations from the exchange format are indicated whenever present. As the Record Leader and the Directory are sufficiently explained in Chapter 3.1 of IAEA-INIS-7, the contents of this manual are limited to control fields and data fields; the detailed explanations are intended to supplement the basic information given in Chapter 3.2 of IAEA-INIS-7. Bibliographic levels are used to identify component parts of a publication, i.e. chapters in a book, articles in a journal issue, conference papers in a proceedings volume. All bibliographic levels contained in a record are given in a control data field. Each bibliographic level identifier appears in the subdirectory with a pointer to its position in the record

  2. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal

    2011-08-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known alignment algorithms FASTA (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to a 76% improvement in alignment score compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  3. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal; Salama, Khaled N.

    2011-01-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known alignment algorithms FASTA (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to a 76% improvement in alignment score compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  4. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal

    2011-11-01

    Bioinformatics databases are growing exponentially in size. Processing this large amount of data may take hours even if supercomputers are used. One of the most important processing tools in bioinformatics is sequence alignment. We introduce a fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known sequence alignment algorithms 'GAP' (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to a 51% improvement in alignment score compared with the GAP algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.
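
For readers unfamiliar with the optimal baseline named in this abstract, the following is a minimal sketch of Needleman-Wunsch global alignment scoring. It is not the ABS algorithm and not the authors' code; the function name and the scoring parameters (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not taken from the abstract.
    """
    # prev[j] is the best score for aligning the already-processed
    # prefix of `a` against b[:j]; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: a[:i] aligned against an empty prefix of b (all gaps).
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Diagonal move: align a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Vertical/horizontal moves: insert a gap in b or in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

The full dynamic-programming table costs time proportional to the product of the two sequence lengths, which is why heuristic methods are attractive for large databases. This sketch keeps only two rows because it returns the score alone; recovering the alignment itself would require the full table or a traceback structure.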

  5. IAEA nuclear databases for applications

    International Nuclear Information System (INIS)

    Schwerer, Otto

    2003-01-01

    The Nuclear Data Section (NDS) of the International Atomic Energy Agency (IAEA) provides nuclear data services to scientists on a worldwide scale with particular emphasis on developing countries. More than 100 data libraries are made available cost-free by Internet, CD-ROM and other media. These databases are used for practically all areas of nuclear applications as well as basic research. An overview is given of the most important nuclear reaction and nuclear structure databases, such as EXFOR, CINDA, ENDF, NSR, ENSDF, NUDAT, and of selected special purpose libraries such as FENDL, RIPL, RNAL, the IAEA Photonuclear Data Library, and the IAEA charged-particle cross section database for medical radioisotope production. The NDS also coordinates two international nuclear data centre networks and is involved in data development activities (to create new or improve existing data libraries when the available data are inadequate) and in technology transfer to developing countries, e.g. through the installation and support of the mirror web site of the IAEA Nuclear Data Services at IPEN (operational since March 2000) and by organizing nuclear-data related workshops. By encouraging their participation in IAEA Co-ordinated Research Projects and also by compiling their experimental results in databases such as EXFOR, the NDS helps to make developing countries' contributions to nuclear science visible and conveniently available. The web address of the IAEA Nuclear Data Services is http://www.nds.iaea.org and the NDS mirror service at IPEN (Brasil) can be accessed at http://www.nds.ipen.br/ (author)

  6. Chronic Diseases Overview

    Science.gov (United States)

    ... Plan Templates All Chronic Surveillance Systems Communications Center Social Media Press Room Press Release Archives Multimedia Communication Campaigns Publications Chronic Disease Overview 2016–2017 At A ...

  7. BioWarehouse: a bioinformatics database warehouse toolkit

    Directory of Open Access Journals (Sweden)

    Stringer-Calvert David WJ

    2006-03-01

    Full Text Available Background: This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion: BioWarehouse embodies significant progress on the

  8. BioWarehouse: a bioinformatics database warehouse toolkit.

    Science.gov (United States)

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
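
The kind of cross-database question described in this abstract (what fraction of enzyme activities have no associated sequence) can be sketched as a single SQL query once the sources share one relational store. The example below uses Python's built-in sqlite3 module purely for illustration; the table names, column names and toy rows are hypothetical and are not the BioWarehouse schema or its data.

```python
import sqlite3


def fraction_without_sequence(conn: sqlite3.Connection) -> float:
    """Fraction of EC numbers with no row in the (hypothetical) sequence table."""
    row = conn.execute(
        """
        SELECT AVG(CASE WHEN s.ec_number IS NULL THEN 1.0 ELSE 0.0 END)
        FROM enzyme_activity AS e
        LEFT JOIN (SELECT DISTINCT ec_number FROM protein_sequence) AS s
               ON s.ec_number = e.ec_number
        """
    ).fetchone()
    return row[0]


# Toy warehouse: one table loaded from an enzyme source, one from a
# sequence source. All names and rows are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY);
    CREATE TABLE protein_sequence (
        id INTEGER PRIMARY KEY,
        ec_number TEXT,
        residues TEXT
    );
    INSERT INTO enzyme_activity VALUES
        ('1.1.1.1'), ('2.7.1.1'), ('3.5.1.5'), ('4.2.1.11');
    INSERT INTO protein_sequence (ec_number, residues) VALUES
        ('1.1.1.1', 'MSTAG'), ('1.1.1.1', 'MSIPE'),
        ('2.7.1.1', 'MVHLG'), ('4.2.1.11', 'MAVSK');
    """
)

if __name__ == "__main__":
    print(fraction_without_sequence(conn))
```

The LEFT JOIN keeps every enzyme activity, and activities with no matching sequence surface as NULL on the right-hand side, so the average of the 0/1 indicator is the missing fraction. The DISTINCT subquery prevents an activity with several sequences from being counted more than once.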

  9. Database Description - PSCDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available abase Description General information of database Database name PSCDB Alternative n...rial Science and Technology (AIST) Takayuki Amemiya E-mail: Database classification Structure Databases - Protein structure Database...554-D558. External Links: Original website information Database maintenance site Graduate School of Informat...available URL of Web services - Need for user registration Not available ...

  10. Directory of IAEA databases

    International Nuclear Information System (INIS)

    1991-11-01

    The first edition of the Directory of IAEA Databases is intended to describe the computerized information sources available to IAEA staff members. It contains a listing of all databases produced at the IAEA, together with information on their availability

  11. Native Health Research Database

    Science.gov (United States)

    ... Indian Health Board) Welcome to the Native Health Database. Please enter your search terms. Basic Search Advanced ... To learn more about searching the Native Health Database, click here. Tutorial Video The NHD has made ...

  12. Cell Centred Database (CCDB)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Cell Centered Database (CCDB) is a web accessible database for high resolution 2D, 3D and 4D data from light and electron microscopy, including correlated imaging.

  13. E3 Staff Database

    Data.gov (United States)

    US Agency for International Development — E3 Staff database is maintained by E3 PDMS (Professional Development & Management Services) office. The database is Mysql. It is manually updated by E3 staff as...

  14. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remained unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies and...... in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html.

  15. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr...

  16. Creating databases for biological information: an introduction.

    Science.gov (United States)

    Stein, Lincoln

    2013-06-01

    The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, relational databases, and NoSQL databases. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system. Copyright 2013 by John Wiley & Sons, Inc.
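
As a rough illustration of the first two storage styles this abstract lists, the sketch below contrasts a flat-file scan with an indexed-file lookup. The tab-separated record layout, the gene names and the helper function names are all hypothetical; an in-memory buffer stands in for a file on disk.

```python
import io

# Hypothetical flat file: one record per line, name<TAB>sequence.
FLAT_FILE = "gene1\tATGGCC\ngene2\tATGTTT\ngene3\tATGAAA\n"


def scan_lookup(handle, key):
    """Flat-file style: read every line until the key is found."""
    handle.seek(0)
    while True:
        line = handle.readline()
        if not line:
            return None
        name, seq = line.rstrip("\n").split("\t")
        if name == key:
            return seq


def build_index(handle):
    """Indexed-file style: record the offset where each record starts."""
    index = {}
    handle.seek(0)
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        index[line.split("\t", 1)[0]] = offset
    return index


def indexed_lookup(handle, index, key):
    """Jump straight to the record using the precomputed offset."""
    if key not in index:
        return None
    handle.seek(index[key])
    return handle.readline().rstrip("\n").split("\t")[1]


handle = io.StringIO(FLAT_FILE)
index = build_index(handle)

if __name__ == "__main__":
    print(scan_lookup(handle, "gene2"))
    print(indexed_lookup(handle, index, "gene3"))
```

The scan touches every record before the match, while the indexed lookup pays a one-time cost to build the offset table and then seeks directly. Relational and NoSQL systems add querying, concurrency and integrity features on top of this basic idea, which the sketch does not attempt to show.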

  17. NIRS database of the original research database

    International Nuclear Information System (INIS)

    Morita, Kyoko

    1991-01-01

    Recently, library staff arranged and compiled the original research papers written by researchers in the 33 years since the National Institute of Radiological Sciences (NIRS) was established. This paper describes how the internal database of original research papers was created. It is a small example of a hand-made database, accumulated by staff with knowledge of computers or computer programming. (author)

  18. Scopus database: a review.

    Science.gov (United States)

    Burnham, Judy F

    2006-03-08

    The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is all-inclusive, but the two complement each other. If a library can afford only one, the choice must be based on institutional needs.

  19. Aviation Safety Issues Database

    Science.gov (United States)

    Morello, Samuel A.; Ricks, Wendell R.

    2009-01-01

    The aviation safety issues database was instrumental in the refinement and substantiation of the National Aviation Safety Strategic Plan (NASSP). The issues database is a comprehensive set of issues from an extremely broad base of aviation functions, personnel, and vehicle categories, both nationally and internationally. Several aviation safety stakeholders such as the Commercial Aviation Safety Team (CAST) have already used the database. This broader interest was the genesis of making the database publicly accessible and writing this report.

  20. Automated Oracle database testing

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    Ensuring database stability and steady performance in the modern world of agile computing is a major challenge. Changes at any level of the computing infrastructure (OS parameters and packages, kernel versions, database parameters and patches, or even schema changes) can potentially harm production services. This presentation shows how automatic and regular testing of Oracle databases can be achieved in such an agile environment.

  1. Introduction to database systems (Inleiding database-systemen)

    NARCIS (Netherlands)

    Pels, H.J.; Lans, van der R.F.; Pels, H.J.; Meersman, R.A.

    1993-01-01

    This article introduces the main concepts that play a role around databases and gives an overview of the objectives, functions and components of database systems. Although the function of a database is intuitively fairly clear, it is nevertheless, in technological terms, a complex

  2. Database Description - RMOS | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    base Description General information of database Database name RMOS Alternative nam...arch Unit Shoshi Kikuchi E-mail : Database classification Plant databases - Rice Microarray Data and other Gene Expression Database...s Organism Taxonomy Name: Oryza sativa Taxonomy ID: 4530 Database description The Ric...19&lang=en Whole data download - Referenced database Rice Expression Database (RED) Rice full-length cDNA Database... (KOME) Rice Genome Integrated Map Database (INE) Rice Mutant Panel Database (Tos17) Rice Genome Annotation Database

  3. The World Bacterial Biogeography and Biodiversity through Databases: A Case Study of NCBI Nucleotide Database and GBIF Database

    Directory of Open Access Journals (Sweden)

    Okba Selama

    2013-01-01

    Databases are an essential tool and resource within the field of bioinformatics. The primary aim of this study was to generate an overview of global bacterial biodiversity and biogeography using available data from the two largest public online databases, NCBI Nucleotide and GBIF. The secondary aim was to highlight the contribution each geographic area has to each database. The basis for data analysis of this study was the metadata provided by both databases, mainly the taxonomy and the geographical area of origin of isolation of the microorganism (record). These were directly obtained from GBIF through the online interface, while E-utilities and Python were used in combination with programmatic web service access to obtain data from the NCBI Nucleotide Database. Results indicate that the American continent, and more specifically the USA, is the top contributor, while Africa and Antarctica are less well represented. This highlights the imbalance of exploration within these areas rather than any reduction in biodiversity. This study describes a novel approach to generating global scale patterns of bacterial biodiversity and biogeography and indicates that the Proteobacteria are the most abundant and widely distributed phylum within both databases.
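
    The combination of E-utilities and Python mentioned in this abstract might look roughly like the sketch below. It is not the authors' code: it only builds an ESearch request URL for the NCBI `nuccore` database and reads the record count from a JSON reply. The search term and the sample reply are made up for illustration, and no network request is sent.

```python
import json
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(term, db="nuccore", retmax=0):
    """Build an ESearch URL; retmax=0 asks for the count without ID lists."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)

def record_count(response_text):
    """Extract the hit count from an ESearch JSON reply (count is a string)."""
    return int(json.loads(response_text)["esearchresult"]["count"])

# Illustrative query term; the study's actual queries are not given here.
url = esearch_url("Proteobacteria[Organism]")

# Hand-written stand-in for a server reply, used only to exercise the parser.
sample_reply = '{"esearchresult": {"count": "42", "idlist": []}}'
```

    Counting per taxon this way, one query at a time, keeps each request small; a real run would also need to respect NCBI's request-rate guidance.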

  4. An Interoperable Cartographic Database

    Directory of Open Access Journals (Sweden)

    Slobodanka Ključanin

    2007-05-01

    The concept of producing a prototype of an interoperable cartographic database is explored in this paper, including the possibilities of integrating different geospatial data into the database management system and visualizing them on the Internet. The implementation includes vectorization of the concept of a single map page, creation of the cartographic database in an object-relational database, spatial analysis, and definition and visualization of the database content in the form of a map on the Internet.

  5. Keyword Search in Databases

    CERN Document Server

    Yu, Jeffrey Xu; Chang, Lijun

    2009-01-01

    It has become highly desirable to provide users with flexible ways to query/search information over databases as simple as keyword search like Google search. This book surveys the recent developments on keyword search over databases, and focuses on finding structural information among objects in a database using a set of keywords. Such structural information to be returned can be either trees or subgraphs representing how the objects, that contain the required keywords, are interconnected in a relational database or in an XML database. The structural keyword search is completely different from

  6. Nuclear power economic database

    International Nuclear Information System (INIS)

    Ding Xiaoming; Li Lin; Zhao Shiping

    1996-01-01

    Nuclear power economic database (NPEDB), based on ORACLE V6.0, consists of three parts, i.e., economic data base of nuclear power station, economic data base of nuclear fuel cycle and economic database of nuclear power planning and nuclear environment. Economic database of nuclear power station includes data of general economics, technique, capital cost and benefit, etc. Economic database of nuclear fuel cycle includes data of technique and nuclear fuel price. Economic database of nuclear power planning and nuclear environment includes data of energy history, forecast, energy balance, electric power and energy facilities

  7. The Belle II VXD production database

    Energy Technology Data Exchange (ETDEWEB)

    Valentan, Manfred; Ritter, Martin [Max-Planck-Institut fuer Physik, Muenchen (Germany); Wuerkner, Benedikt; Leitl, Bernhard [Institut fuer Hochenergiephysik, Wien (Austria); Pilo, Federico [Istituto Nazionale di Fisica Nucleare, Pisa (Italy); Collaboration: Belle II-Collaboration

    2015-07-01

    The construction and commissioning of the Belle II Vertex Detector (VXD) is a huge endeavor involving a large number of valuable components. Both subsystems PXD (Pixel Detector) and SVD (Silicon Vertex Detector) deploy a large number of sensors, readout electronic parts and mechanical elements. These items are scattered around the world at many institutes, where they are built, measured and assembled. One has to keep track of measurement configurations and results, know at any time the location of the sensors, their processing state, quality, where they end up in an assembly, and who is responsible. These requirements call for a flexible and extensive database which is able to reflect the processes in the laboratories and the logistics between the institutes. This talk introduces the database requirements of a physics experiment using the PXD construction workflow as a showcase, and presents an overview of the database "HephyDb", which is used by the groups constructing the Belle II VXD.

  8. Overview of research on DICT and English teacher education in AL: a bibliographical study based on the CAPES database of theses and dissertations (Panorama das pesquisas sobre TDIC e formação de professores de língua inglesa em LA: um levantamento bibliográfico a partir da base de dissertações/teses da CAPES)

    Directory of Open Access Journals (Sweden)

    Lucas Moreira dos Anjos Santos

    2013-03-01

    This paper presents an overview of research on Digital Information and Communication Technologies (DICT) and English teacher education carried out in Applied Linguistics in Brazil from 2000 to 2009. To this end, a bibliographical search was carried out in the CAPES database of theses and dissertations using key terms in the subject field. The results point to a wide variety of studies in this field with different focuses, such as the implementation of public policies, the use of digital tools in pre-service and in-service teacher education, the appropriation of digital tools by teachers in their pedagogical practice, and the beliefs and representations constructed around the use of technological tools, among others. We hope to contribute to a mapping of the existing literature on English teacher education and DICT in support of future research.

  9. Using the TIGR gene index databases for biological discovery.

    Science.gov (United States)

    Lee, Yuandan; Quackenbush, John

    2003-11-01

    The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.

  10. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants

    Science.gov (United States)

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-01-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561

  11. NOMAD - more than a simple sequencer

    International Nuclear Information System (INIS)

    Mutti, P.; Cecillon, F.; Elaazzouzi, A.; Le Goc, Y.; Locatelli, J.; Ortiz, H.; Ratel, J.

    2012-01-01

    NOMAD is the new instrument control software of the Institut Laue-Langevin (ILL). The essential elements guiding the software development are code that is highly shareable across the whole instrument suite, a user-oriented design for tailored functionality, and greater autonomy for the instrument teams thanks to a uniform and ergonomic user interface. NOMAD implements a client/server approach. The server is the core of the system, containing all the instrument methods and the hardware drivers, while the GUI (Graphical User Interface) provides all the functionality needed for interaction between user and hardware. All instruments share the same executable, while a set of XML configuration files adapts hardware needs and instrument methods to the specific experimental setup. Thanks to a complete graphical representation of experimental sequences, NOMAD provides an overview of past, present and future operations. Users are free to build their own specific workflows using an intuitive drag-and-drop technique. A complete database of drivers to connect and control all possible instrument components has been created, simplifying the inclusion of a new piece of equipment in an experiment. A web application makes all the relevant information on the status of the experiment available outside the ILL. A set of scientific methods facilitates the interaction between users and hardware, giving access to instrument control and to complex operations with just one click on the interface. (authors)

  12. Database Description - RPD | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    ase Description General information of database Database name RPD Alternative name Rice Proteome Database...titute of Crop Science, National Agriculture and Food Research Organization Setsuko Komatsu E-mail: Database... classification Proteomics Resources Plant databases - Rice Organism Taxonomy Name: Oryza sativa Taxonomy ID: 4530 Database... description Rice Proteome Database contains information on protei...and entered in the Rice Proteome Database. The database is searchable by keyword,

  13. Database Description - JSNP | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    base Description General information of database Database name JSNP Alternative nam...n Science and Technology Agency Creator Affiliation: Contact address E-mail : Database...sapiens Taxonomy ID: 9606 Database description A database of about 197,000 polymorphisms in Japanese populat...1):605-610 External Links: Original website information Database maintenance site Institute of Medical Scien...er registration Not available About This Database Database Description Download License Update History of This Database

  14. Database Description - RED | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    ase Description General information of database Database name RED Alternative name Rice Expression Database...enome Research Unit Shoshi Kikuchi E-mail : Database classification Plant databases - Rice Database classifi...cation Microarray, Gene Expression Organism Taxonomy Name: Oryza sativa Taxonomy ID: 4530 Database descripti... Article title: Rice Expression Database: the gateway to rice functional genomics...nt Science (2002) Dec 7 (12):563-564 External Links: Original website information Database maintenance site

  15. Database Description - PLACE | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    abase Description General information of database Database name PLACE Alternative name A Database...Kannondai, Tsukuba, Ibaraki 305-8602, Japan National Institute of Agrobiological Sciences E-mail : Databas...e classification Plant databases Organism Taxonomy Name: Tracheophyta Taxonomy ID: 58023 Database...99, Vol.27, No.1 :297-300 External Links: Original website information Database maintenance site National In...- Need for user registration Not available About This Database Database Descripti

  16. Database Description - Arabidopsis Phenome Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    List Contact us Arabidopsis Phenome Database Database Description General information of database Database n... BioResource Center Hiroshi Masuya Database classification Plant databases - Arabidopsis thaliana Organism T...axonomy Name: Arabidopsis thaliana Taxonomy ID: 3702 Database description The Arabidopsis thaliana phenome i...heir effective application. We developed the new Arabidopsis Phenome Database integrating two novel database...seful materials for their experimental research. The other, the “Database of Curated Plant Phenome” focusing

  17. GenColors-based comparative genome databases for small eukaryotic genomes.

    Science.gov (United States)

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  18. The UCSC genome browser database: update 2007

    DEFF Research Database (Denmark)

    Kuhn, R M; Karolchik, D; Zweig, A S

    2006-01-01

    The University of California, Santa Cruz Genome Browser Database contains, as of September 2006, sequence and annotation data for the genomes of 13 vertebrate and 19 invertebrate species. The Genome Browser displays a wide variety of annotations at all scales from the single nucleotide level up t...

  19. FARME DB: a functional antibiotic resistance element database

    OpenAIRE

    Wallace, James C.; Port, Jesse A.; Smith, Marissa N.; Faustman, Elaine M.

    2017-01-01

    Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR se...

  20. Operational experience running the HERA-B database system

    International Nuclear Information System (INIS)

    Amaral, V.; Amorim, A.; Batista, J.

    2001-01-01

    The HERA-B database system has been used in the commissioning period of the experiment. The authors present the expertise gathered during this period, covering also the improvements introduced and describing the different classes of problems faced in giving persistency to all non-event information. The authors aim to give a global overview of the Database group activities, the techniques developed and the results, based on the running experiment and on dealing with large data volumes during and after the production phase

  1. Integrated process status overview

    International Nuclear Information System (INIS)

    Gertman, D.I.; Gaudio, P. Jr.

    1986-01-01

    This report summarizes findings to date with the IPSO, a large plant status overview currently under development at the OECD Halden Reactor Project. As part of a joint Halden and Combustion Engineering project, the overview is being tested in part to determine whether the large screen overview concept being entertained for use in the nuclear power plant (NPP) industry will facilitate operator performance. To this end an interactive simulation technique was used to establish a proof-of-principle test for the IPSO. Process control, operations, and human factors experts at Halden participated in the test and evaluation

  2. TaxMan: a taxonomic database manager

    Directory of Open Access Journals (Sweden)

    Blaxter Mark

    2006-12-01

    Background: Phylogenetic analysis of large, multiple-gene datasets, assembled from public sequence databases, is rapidly becoming a popular way to approach difficult phylogenetic problems. Supermatrices (concatenated multiple sequence alignments of multiple genes) can yield more phylogenetic signal than individual genes. However, manually assembling such datasets for a large taxonomic group is time-consuming and error-prone. Additionally, sequence curation, alignment and assessment of the results of phylogenetic analysis are made particularly difficult by the potential for a given gene in a given species to be unrepresented, or to be represented by multiple or partial sequences. We have developed a software package, TaxMan, that largely automates the processes of sequence acquisition, consensus building, alignment and taxon selection to facilitate this type of phylogenetic study. Results: TaxMan uses freely available tools to allow rapid assembly, storage and analysis of large, aligned DNA and protein sequence datasets for user-defined sets of species and genes. The user provides GenBank format files and a list of gene names and synonyms for the loci to analyse. Sequences are extracted from the GenBank files on the basis of annotation and sequence similarity. Consensus sequences are built automatically. Alignment is carried out (where possible, at the protein level) and aligned sequences are stored in a database. TaxMan can automatically determine the best subset of taxa to examine phylogeny at a given taxonomic level. By using the stored aligned sequences, large concatenated multiple sequence alignments can be generated rapidly for a subset and output in analysis-ready file formats. Trees resulting from phylogenetic analysis can be stored and compared with a reference taxonomy. Conclusion: TaxMan allows rapid automated assembly of multigene datasets of aligned sequences for large taxonomic groups. By extracting sequences on the basis of
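
    The supermatrix idea described in this abstract (concatenating per-gene alignments, with some genes missing for some species) can be sketched as follows. This is not TaxMan's code or interface: the function name, the input layout (one dictionary per gene mapping taxon to aligned sequence), and the choice to pad missing genes with gap characters are all assumptions made for illustration.

```python
def concatenate_alignments(alignments, taxa, gap="-"):
    """Join per-gene alignments into one supermatrix row per taxon.

    alignments: list of dicts, one per gene, mapping taxon -> aligned sequence.
    taxa: ordered list of taxa to include in the output.
    A taxon lacking a gene is padded with gap characters of that gene's width.
    """
    rows = {taxon: [] for taxon in taxa}
    for gene_number, alignment in enumerate(alignments, start=1):
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError(f"gene {gene_number}: sequences are not aligned")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(alignment.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}

# Hypothetical toy data: taxon C lacks gene 1, taxon B lacks gene 2.
gene1 = {"A": "ACGT", "B": "AC-T"}
gene2 = {"A": "GG", "C": "GA"}
supermatrix = concatenate_alignments([gene1, gene2], ["A", "B", "C"])
```

    Every output row has the same length, so the result can be written directly to an alignment file format for downstream phylogenetic analysis.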

  3. DNA Sequencing by Capillary Electrophoresis

    Science.gov (United States)

    Karger, Barry L.; Guttman, Andras

    2009-01-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA sequencing methods have evolved from labor-intensive slab gel electrophoresis, through automated multicapillary electrophoresis systems using fluorophore labeling with multispectral imaging, to the “next generation” technologies of cyclic array, hybridization-based, nanopore and single-molecule sequencing. Deciphering the genetic blueprint of Homo sapiens and other genomes, and the follow-up confirmatory sequencing, was only possible with the advent of modern sequencing technologies, which resulted from step-by-step advances contributed by academics, medical personnel and instrument companies. While next generation sequencing is moving ahead at breakneck speed, multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this perspective, we give an overview of the role of capillary electrophoresis in DNA sequencing, based in part on several of our articles in this journal. PMID:19517496

  4. Hazard Analysis Database Report

    CERN Document Server

    Grams, W H

    2000-01-01

    The Hazard Analysis Database was developed in conjunction with the hazard analysis activities conducted in accordance with DOE-STD-3009-94, Preparation Guide for U.S. Department of Energy Nonreactor Nuclear Facility Safety Analysis Reports, for HNF-SD-WM-SAR-067, Tank Farms Final Safety Analysis Report (FSAR). The FSAR is part of the approved Authorization Basis (AB) for the River Protection Project (RPP). This document describes, identifies, and defines the contents and structure of the Tank Farms FSAR Hazard Analysis Database and documents the configuration control changes made to the database. The Hazard Analysis Database contains the collection of information generated during the initial hazard evaluations and the subsequent hazard and accident analysis activities. The Hazard Analysis Database supports the preparation of Chapters 3, 4, and 5 of the Tank Farms FSAR and the Unreviewed Safety Question (USQ) process and consists of two major, interrelated data sets: (1) Hazard Analysis Database: Data from t...

  5. Database Optimizing Services

    Directory of Open Access Journals (Sweden)

    Adrian GHENCEA

    2010-12-01

    Almost every organization has a database at its centre. The database supports different activities, whether production, sales and marketing, or internal operations. Every day, a database is accessed to help with strategic decisions. Meeting such needs therefore requires a high level of security and availability. These needs can be met using a DBMS (Database Management System), which is, in fact, software for a database. Technically speaking, it is software that uses a standard method of cataloguing, retrieving, and running different data queries. A DBMS manages the input data, organizes it, and provides ways for users or other programs to modify or extract the data. Managing the database is an operation that requires periodic updates, optimization and monitoring.

  6. National Database of Geriatrics

    DEFF Research Database (Denmark)

    Kannegaard, Pia Nimann; Vinding, Kirsten L; Hare-Bruun, Helle

    2016-01-01

    AIM OF DATABASE: The aim of the National Database of Geriatrics is to monitor the quality of interdisciplinary diagnostics and treatment of patients admitted to a geriatric hospital unit. STUDY POPULATION: The database population consists of patients who were admitted to a geriatric hospital unit....... Geriatric patients cannot be defined by specific diagnoses. A geriatric patient is typically a frail multimorbid elderly patient with decreasing functional ability and social challenges. The database includes 14-15,000 admissions per year, and the database completeness has been stable at 90% during the past......, percentage of discharges with a rehabilitation plan, and the part of cases where an interdisciplinary conference has taken place. Data are recorded by doctors, nurses, and therapists in a database and linked to the Danish National Patient Register. DESCRIPTIVE DATA: Descriptive patient-related data include...

  7. Tradeoffs in distributed databases

    OpenAIRE

    Juntunen, R. (Risto)

    2016-01-01

    In a distributed database, data is spread across the network into separate nodes with different DBMS systems (Date, 2000). According to the CAP theorem, three database properties (consistency, availability and partition tolerance) cannot all be achieved simultaneously in distributed database systems: two of them can be achieved, but not all three at the same time (Brewer, 2000). Since this theorem there has b...

  8. Specialist Bibliographic Databases

    OpenAIRE

    Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A.; Trukhachev, Vladimir I.; Kostyukova, Elena I.; Gerasimov, Alexey N.; Kitas, George D.

    2016-01-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and d...

  9. Supply Chain Initiatives Database

    Energy Technology Data Exchange (ETDEWEB)

    None

    2012-11-01

    The Supply Chain Initiatives Database (SCID) presents innovative approaches to engaging industrial suppliers in efforts to save energy, increase productivity and improve environmental performance. This comprehensive and freely-accessible database was developed by the Institute for Industrial Productivity (IIP). IIP acknowledges Ecofys for their valuable contributions. The database contains case studies searchable according to the types of activities buyers are undertaking to motivate suppliers, target sector, organization leading the initiative, and program or partnership linkages.

  10. Rapid Diagnostics of Onboard Sequences

    Science.gov (United States)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command
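
    The logging scheme described in this abstract (store each generated file together with its inputs, keyed by a hash, then recover it later from that hash) can be sketched as below. This is not the MSL implementation: SQLite replaces MySQL so the example is self-contained, SHA-256 is an assumed hash function since the abstract does not name one, and the table, column, and file names are hypothetical.

```python
import hashlib
import sqlite3

def open_log():
    """Create an in-memory log table keyed by the digest of the compiled file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scmf_log ("
        " digest TEXT PRIMARY KEY, scmf BLOB, rml_name TEXT, dictionary TEXT)"
    )
    return conn

def log_scmf(conn, scmf, rml_name, dictionary):
    """Record a compiled file with its inputs; return the digest used as key."""
    digest = hashlib.sha256(scmf).hexdigest()  # assumed hash function
    conn.execute(
        "INSERT OR IGNORE INTO scmf_log VALUES (?, ?, ?, ?)",
        (digest, scmf, rml_name, dictionary),
    )
    return digest

def find_by_digest(conn, digest):
    """Recover the inputs and compiled bytes for a digest seen in a record."""
    return conn.execute(
        "SELECT rml_name, dictionary, scmf FROM scmf_log WHERE digest = ?",
        (digest,),
    ).fetchone()

conn = open_log()
# Hypothetical compiled bytes and input names, for illustration only.
key = log_scmf(conn, b"\x01\x02compiled-bytes", "drive_to_rock.rml", "dict_v1")
```

    Because the key is derived from the file contents, two versions of a sequence with the same name get different keys, which is what removes the ambiguity the abstract describes.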

  11. Nevada Operations overview

    International Nuclear Information System (INIS)

    Church, B.W.

    1981-01-01

    A brief overview is given of weapon test site decontamination activities carried out by the Nevada Operations Office. Tabulated data are given for event name, date, location, year of cleanup, radioisotopes present, activity levels, and cost of cleanup.

  12. EURO HAWK Project Overview

    National Research Council Canada - National Science Library

    2003-01-01

    Briefing charts from presentation on a EURO HAWK project overview; an airborne system with stand-off capability for wide-area intelligence, surveillance and reconnaissance meeting European NATO countries' ISR requirements...

  13. Human Reliability Program Overview

    Energy Technology Data Exchange (ETDEWEB)

    Bodin, Michael

    2012-09-25

    This presentation covers the high points of the Human Reliability Program, including certification/decertification, critical positions, due process, organizational structure, program components, personnel security, an overview of the US DOE reliability program, retirees and academia, and security program integration.

  14. Vehicle Technologies Program Overview

    Energy Technology Data Exchange (ETDEWEB)

    none,

    2006-09-05

    Overview of the Vehicle Technologies Program including external assessment and market view; internal assessment, program history and progress; program justification and federal role; program vision, mission, approach, strategic goals, outputs, and outcomes; and performance goals.

  15. Research Program Overview

    Science.gov (United States)

    Pacific Earthquake Engineering Research Center (PEER). Research Program Overview: Tall Buildings Initiative, Transportation Research Program, Lifelines Program, Concrete Grand ...

  16. Overview of Movement Disorders

    Science.gov (United States)

    Overview of Movement Disorders. By Hector A. Gonzalez-Usigli, MD, Professor ... Neurology, HE UMAE Centro Médico Nacional de Occidente; Movement Disorders Clinic, Neurology at IMSS Alberto Espay, MD, ...

  17. Chemical Emergencies Overview

    Science.gov (United States)

    Chemical Emergencies Overview ... themselves during and after such an event. What chemical emergencies are: A chemical emergency occurs when a ...

  18. Wind energy program overview

    International Nuclear Information System (INIS)

    1992-02-01

    This overview emphasizes the amount of electric power that could be provided by wind power rather than traditional fossil fuels. New wind power markets, advances in technology, technology transfer, and wind resources are some topics covered in this publication

  19. HAMMLAB 2000 overview

    International Nuclear Information System (INIS)

    Kvalem, J.

    1998-01-01

    In the form of a collection of overheads, this is an overview of HAMMLAB 2000. It also covers project organization, advisory groups, research agenda, status of simulators, status of software systems, VR Centre, Petro Hammlab and physical planning

  20. Database Description - RGP physicalmap | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available classification Plant databases - Rice Database classification Sequence Physical map Organism Taxonomy Name: ...inobe Journal: Nature Genetics (1994) 8: 365-372. External Links: Article title: Physical Mapping of Rice Ch...rnal: DNA Research (1997) 4(2): 133-140. External Links: Article title: Physical Mapping of Rice Chromosomes... T Sasaki Journal: Genome Research (1996) 6(10): 935-942. External Links: Article title: Physical mapping of

  1. Database Description - SAHG | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available base Description General information of database Database name SAHG Alternative nam...h: Contact address Chie Motono Tel : +81-3-3599-8067 E-mail : Database classification Structure Databases - ...e databases - Protein properties Organism Taxonomy Name: Homo sapiens Taxonomy ID: 9606 Database description... Links: Original website information Database maintenance site The Molecular Profiling Research Center for D...stration Not available ...

  2. Intermodal Passenger Connectivity Database -

    Data.gov (United States)

    Department of Transportation — The Intermodal Passenger Connectivity Database (IPCD) is a nationwide data table of passenger transportation terminals, with data on the availability of connections...

  3. Transporter Classification Database (TCDB)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Transporter Classification Database details a comprehensive classification system for membrane transport proteins known as the Transporter Classification (TC)...

  4. Residency Allocation Database

    Data.gov (United States)

    Department of Veterans Affairs — The Residency Allocation Database is used to determine allocation of funds for residency programs offered by Veterans Affairs Medical Centers (VAMCs). Information...

  5. Smart Location Database - Service

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Smart Location Database (SLD) summarizes over 80 demographic, built environment, transit service, and destination accessibility attributes for every census block...

  6. Veterans Administration Databases

    Science.gov (United States)

    The Veterans Administration Information Resource Center provides database and informatics experts, customer service, expert advice, information products, and web technology to VA researchers and others.

  7. IVR EFP Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This database contains trip-level reports submitted by vessels participating in Exempted Fishery projects with IVR reporting requirements.

  8. Towards Sensor Database Systems

    DEFF Research Database (Denmark)

    Bonnet, Philippe; Gehrke, Johannes; Seshadri, Praveen

    2001-01-01

    . These systems lack flexibility because data is extracted in a predefined way; also, they do not scale to a large number of devices because large volumes of raw data are transferred regardless of the queries that are submitted. In our new concept of sensor database system, queries dictate which data is extracted...... from the sensors. In this paper, we define the concept of sensor databases mixing stored data represented as relations and sensor data represented as time series. Each long-running query formulated over a sensor database defines a persistent view, which is maintained during a given time interval. We...... also describe the design and implementation of the COUGAR sensor database system....

  9. Database Publication Practices

    DEFF Research Database (Denmark)

    Bernstein, P.A.; DeWitt, D.; Heuer, A.

    2005-01-01

    There has been a growing interest in improving the publication processes for database research papers. This panel reports on recent changes in those processes and presents an initial cut at historical data for the VLDB Journal and ACM Transactions on Database Systems.

  10. Smart Location Database - Download

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Smart Location Database (SLD) summarizes over 80 demographic, built environment, transit service, and destination accessibility attributes for every census block...

  11. Database Description - KOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available base Description General information of database Database name KOME Alternative nam... Sciences Plant Genome Research Unit Shoshi Kikuchi E-mail : Database classification Plant databases - Rice ...Organism Taxonomy Name: Oryza sativa Taxonomy ID: 4530 Database description Information about approximately ...Hayashizaki Y, Kikuchi S. Journal: PLoS One. 2007 Nov 28; 2(11):e1235. External Links: Original website information Database...OS) Rice mutant panel database (Tos17) A Database of Plant Cis-acting Regulatory

  12. Update History of This Database - Arabidopsis Phenome Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Arabidopsis Phenome Database, Update History of This Database. Date and update contents: 2017/02/27 Arabidopsis Phenome Database English archive site is opened. - Arabidopsis Phenome Database (http://jphenom...e.info/?page_id=95) is opened.

  13. Update History of This Database - SKIP Stemcell Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available SKIP Stemcell Database, Update History of This Database. Date and update contents: 2017/03/13 SKIP Stemcell Database English archive site is opened. 2013/03/29 SKIP Stemcell Database ( https://www.skip.med.k...eio.ac.jp/SKIPSearch/top?lang=en ) is opened.

  14. Update History of This Database - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Yeast Interacting Proteins Database, Update History of This Database. Date and update contents: 2010/03/29 Yeast Interacting Proteins Database English archive site is opened. 2000/12/4 Yeast Interacting Proteins Database ( http://itolab.cb.k.u-tokyo.ac.jp/Y2H/ ) is released.

  15. Download - Trypanosomes Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Trypanosomes Database, Download. First of all, please read the license of this database. Data ...1.4 KB) Simple search and download. Download via FTP: FTP server is sometimes jammed. If it is, access [here].

  16. Database design and database administration for a kindergarten

    OpenAIRE

    Vítek, Daniel

    2009-01-01

    The bachelor thesis covers the design of a database for a standard kindergarten, the installation of the designed database in Oracle Database 10g Express Edition, and a demonstration of administration tasks in that database system. The database was verified with a purpose-built access application.

  17. FCDD: A Database for Fruit Crops Diseases.

    Science.gov (United States)

    Chauhan, Rupal; Jasrai, Yogesh; Pandya, Himanshu; Chaudhari, Suman; Samota, Chand Mal

    2014-01-01

    The Fruit Crops Diseases Database (FCDD) requires a number of biotechnology and bioinformatics tools. The FCDD is a unique bioinformatics resource that compiles 162 details on fruit crop diseases, including disease type, causal organism, images, symptoms and control. The FCDD contains 171 phytochemicals from 25 fruits, their 2D images and their 20 possible sequences. This information has been manually extracted and verified from numerous sources, including other electronic databases, textbooks and scientific journals. FCDD is fully searchable and supports extensive text search. Its main focus is on providing information on fruit crop diseases, which will help in the discovery of potential drugs from one of the common bioresources, fruits. The database was developed using MySQL, and its interface was developed in PHP, HTML and Java. FCDD is freely available at http://www.fruitcropsdd.com/

  18. SoyDB: a knowledge database of soybean transcription factors

    Directory of Open Access Journals (Sweden)

    Valliyodan Babu

    2010-01-01

    Full Text Available Background: Transcription factors play the crucial role of regulating gene expression and influence almost all biological processes. Systematically identifying and annotating transcription factors can greatly aid further understanding of their functions and mechanisms. In this article, we present SoyDB, a user-friendly database containing comprehensive knowledge of soybean transcription factors. Description: The soybean genome was recently sequenced by the Department of Energy-Joint Genome Institute (DOE-JGI) and is publicly available. Mining of this sequence identified 5,671 soybean genes as putative transcription factors. These genes were comprehensively annotated as an aid to the soybean research community. We developed SoyDB, a knowledge database for all the transcription factors in the soybean genome. The database contains protein sequences, predicted tertiary structures, putative DNA binding sites, domains, homologous templates in the Protein Data Bank (PDB), protein family classifications, multiple sequence alignments, consensus protein sequence motifs, web logo of each family, and web links to the soybean transcription factor database PlantTFDB, known EST sequences, and other general protein databases including Swiss-Prot, Gene Ontology, KEGG, EMBL, TAIR, InterPro, SMART, PROSITE, NCBI, and Pfam. The database can be accessed via an interactive and convenient web server, which supports full-text search, PSI-BLAST sequence search, database browsing by protein family, and automatic classification of a new protein sequence into one of 64 annotated transcription factor families by hidden Markov models. Conclusions: A comprehensive soybean transcription factor database was constructed and made publicly accessible at http://casp.rnet.missouri.edu/soydb/.

  19. Profiling of Escherichia coli Chromosome database.

    Science.gov (United States)

    Yamazaki, Yukiko; Niki, Hironori; Kato, Jun-ichi

    2008-01-01

    The Profiling of Escherichia coli Chromosome (PEC) database (http://www.shigen.nig.ac.jp/ecoli/pec/) is designed to allow E. coli researchers to efficiently access information from functional genomics studies. The database contains two principal types of data: gene essentiality and a large collection of E. coli genetic research resources. The essentiality data are based on data compilation from published single-gene essentiality studies and on cell growth studies of large-deletion mutants. Using the circular and linear viewers for both whole genomes and the minimal genome, users can not only gain an overview of the genome structure but also retrieve information on contigs, gene products, mutants, deletions, and so forth. In particular, genome-wide exhaustive mutants are an essential resource for studying E. coli gene functions. Although the genomic database was constructed independently from the genetic resources database, users may seamlessly access both types of data. In addition to these data, the PEC database also provides a summary of homologous genes of other bacterial genomes and of protein structure information, with a comprehensive interface. The PEC is thus a convenient and useful platform for contemporary E. coli researchers.

  20. Sequencing results of pncA gene at JALMA

    Indian Academy of Sciences (India)

    Sequencing results of pncA gene at JALMA. Red colour indicates novel mutations; blue colour indicates the novel mutations reported at the same codon earlier also.

  1. Directory of IAEA databases

    International Nuclear Information System (INIS)

    1992-12-01

    This second edition of the Directory of IAEA Databases has been prepared within the Division of Scientific and Technical Information (NESI). Its main objective is to describe the computerized information sources available to staff members. This directory contains all databases produced at the IAEA, including databases stored on the mainframe, LANs and PCs. All IAEA Division Directors have been requested to register the existence of their databases with NESI. For the second edition, database owners were requested to review the existing entries for their databases and answer four additional questions. The four additional questions concerned the type of database (e.g. Bibliographic, Text, Statistical etc.), the category of database (e.g. Administrative, Nuclear Data etc.), the available documentation and the type of media used for distribution. In the individual entries on the following pages, the answers to the first two questions (type and category) are always listed, but the answers to the second two questions (documentation and media) are only listed when information has been made available.

  2. HIV Structural Database

    Science.gov (United States)

    SRD 102 HIV Structural Database (Web, free access)   The HIV Protease Structural Database is an archive of experimentally determined 3-D structures of Human Immunodeficiency Virus 1 (HIV-1), Human Immunodeficiency Virus 2 (HIV-2) and Simian Immunodeficiency Virus (SIV) Proteases and their complexes with inhibitors or products of substrate cleavage.

  3. Balkan Vegetation Database

    NARCIS (Netherlands)

    Vassilev, Kiril; Pedashenko, Hristo; Alexandrova, Alexandra; Tashev, Alexandar; Ganeva, Anna; Gavrilova, Anna; Gradevska, Asya; Assenov, Assen; Vitkova, Antonina; Grigorov, Borislav; Gussev, Chavdar; Filipova, Eva; Aneva, Ina; Knollová, Ilona; Nikolov, Ivaylo; Georgiev, Georgi; Gogushev, Georgi; Tinchev, Georgi; Pachedjieva, Kalina; Koev, Koycho; Lyubenova, Mariyana; Dimitrov, Marius; Apostolova-Stoyanova, Nadezhda; Velev, Nikolay; Zhelev, Petar; Glogov, Plamen; Natcheva, Rayna; Tzonev, Rossen; Boch, Steffen; Hennekens, Stephan M.; Georgiev, Stoyan; Stoyanov, Stoyan; Karakiev, Todor; Kalníková, Veronika; Shivarov, Veselin; Russakova, Veska; Vulchev, Vladimir

    2016-01-01

    The Balkan Vegetation Database (BVD; GIVD ID: EU-00-019; http://www.givd.info/ID/EU-00-019) is a regional database that consists of phytosociological relevés from different vegetation types from six countries on the Balkan Peninsula (Albania, Bosnia and Herzegovina, Bulgaria, Kosovo, Montenegro

  4. World Database of Happiness

    NARCIS (Netherlands)

    R. Veenhoven (Ruut)

    1995-01-01

    The World Database of Happiness is an ongoing register of research on subjective appreciation of life. Its purpose is to make the wealth of scattered findings accessible, and to create a basis for further meta-analytic studies. The database involves four sections:
    1.

  5. Fire test database

    International Nuclear Information System (INIS)

    Lee, J.A.

    1989-01-01

    This paper describes a project recently completed for EPRI by Impell. The purpose of the project was to develop a reference database of fire tests performed on non-typical fire rated assemblies. The database is designed for use by utility fire protection engineers to locate test reports for power plant fire rated assemblies. As utilities prepare to respond to Information Notice 88-04, the database will identify utilities, vendors or manufacturers who have specific fire test data. The database contains fire test report summaries for 729 tested configurations. For each summary, a contact is identified from whom a copy of the complete fire test report can be obtained. Five types of configurations are included: doors, dampers, seals, wraps and walls. The database is computerized, with one version for IBM and one for Mac. Each version is accessed through user-friendly software that allows adding, deleting and browsing records. There are five major database files, one for each type of tested configuration. The contents of each provide significant information regarding the test method and the physical attributes of the tested configuration. 3 figs

  6. Children's Culture Database (CCD)

    DEFF Research Database (Denmark)

    Wanting, Birgit

    a Dialogue inspired database with documentation, network (individual and institutional profiles) and current news, paper presented at the research seminar: Electronic access to fiction, Copenhagen, November 11-13, 1996

  7. Atomic Spectra Database (ASD)

    Science.gov (United States)

    SRD 78 NIST Atomic Spectra Database (ASD) (Web, free access)   This database provides access and search capability for NIST critically evaluated data on atomic energy levels, wavelengths, and transition probabilities that are reasonably up-to-date. The NIST Atomic Spectroscopy Data Center has carried out these critical compilations.

  8. Consumer Product Category Database

    Science.gov (United States)

    The Chemical and Product Categories database (CPCat) catalogs the use of over 40,000 chemicals and their presence in different consumer products. The chemical use information is compiled from multiple sources while product information is gathered from publicly available Material Safety Data Sheets (MSDS). EPA researchers are evaluating the possibility of expanding the database with additional product and use information.

  9. Database in Artificial Intelligence.

    Science.gov (United States)

    Wilkinson, Julia

    1986-01-01

    Describes a specialist bibliographic database of literature in the field of artificial intelligence created by the Turing Institute (Glasgow, Scotland) using the BRS/Search information retrieval software. The subscription method for end-users--i.e., annual fee entitles user to unlimited access to database, document provision, and printed awareness…

  10. NoSQL database scaling

    OpenAIRE

    Žardin, Norbert

    2017-01-01

    NoSQL database scaling is a decision in which system resources or financial expenses are traded for database performance or other benefits. Scaling a database may increase or decrease its performance and resource usage, and such changes might have a negative impact on an application that uses the database. This work analyzes how database scaling affects database resource usage and performance. As a result, calculations are acquired, using which database scaling types and differe...

  11. The LHCb configuration database

    CERN Document Server

    Abadie, L; Van Herwijnen, Eric; Jacobsson, R; Jost, B; Neufeld, N

    2005-01-01

    The aim of the LHCb configuration database is to store information about all the controllable devices of the detector. The experiment's control system (which uses PVSS) will configure, start up and monitor the detector from the information in the configuration database. The database will contain devices with their properties, connectivity and hierarchy. The ability to store and rapidly retrieve huge amounts of data, and the navigability between devices, are important requirements. We have collected use cases to ensure the completeness of the design. Using the entity relationship modelling technique we describe the use cases as classes with attributes and links. We designed the schema for the tables using relational diagrams. This methodology has been applied to the TFC (switches) and DAQ system. Other parts of the detector will follow later. The database has been implemented using Oracle to benefit from central CERN database support. The project also foresees the creation of tools to populate, maintain, and co...
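    As a rough illustration of storing devices with properties, hierarchy and connectivity in relational tables, and of navigating between devices, the sketch below may help. It is not the LHCb schema: the table names device and link, the example device names, and the functions build_example and reachable_from are all hypothetical, and SQLite stands in for the Oracle database named in the record.

```python
import sqlite3

# Hypothetical schema: one table for devices (with a parent column for the
# hierarchy) and one table for directed connections between devices.
SCHEMA = """
CREATE TABLE device (
    name   TEXT PRIMARY KEY,
    kind   TEXT NOT NULL,
    parent TEXT REFERENCES device(name)
);
CREATE TABLE link (
    src TEXT NOT NULL REFERENCES device(name),
    dst TEXT NOT NULL REFERENCES device(name),
    PRIMARY KEY (src, dst)
);
"""


def build_example():
    """Create an in-memory database with a few made-up devices and links."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO device VALUES (?, ?, ?)",
        [
            ("crate_1", "crate", None),
            ("switch_a", "switch", "crate_1"),
            ("board_1", "readout_board", "crate_1"),
            ("board_2", "readout_board", "crate_1"),
        ],
    )
    conn.executemany(
        "INSERT INTO link VALUES (?, ?)",
        [("switch_a", "board_1"), ("board_1", "board_2")],
    )
    conn.commit()
    return conn


def reachable_from(conn, start):
    """Return the sorted names of all devices reachable from `start`."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(name) AS (
            SELECT dst FROM link WHERE src = ?
            UNION
            SELECT link.dst FROM link JOIN reach ON link.src = reach.name
        )
        SELECT name FROM reach ORDER BY name
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]
```

    The recursive query uses UNION rather than UNION ALL so that already-visited devices are discarded, which keeps the traversal finite even if the connectivity graph contains cycles.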

  12. Content Is King: Databases Preserve the Collective Information of Science.

    Science.gov (United States)

    Yates, John R

    2018-04-01

    Databases store sequence information experimentally gathered to create resources that further science. In the last 20 years databases have become critical components of fields like proteomics where they provide the basis for large-scale and high-throughput proteomic informatics. Amos Bairoch, winner of the Association of Biomolecular Resource Facilities Frederick Sanger Award, has created some of the important databases proteomic research depends upon for accurate interpretation of data.

  13. Database Description - SKIP Stemcell Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available SKIP Stemcell Database, Database Description. General information of database Database name SKIP Stemcell Database...rsity Journal Search: Contact address http://www.skip.med.keio.ac.jp/en/contact/ Database classification Human Genes and Diseases Database classification Stemcell Article Organism Taxonomy Name: Homo sapiens Taxonomy ID: 9606 Database...ks: Original website information Database maintenance site Center for Medical Genetics, School of medicine, ...lable Web services Not available URL of Web services - Need for user registration Not available ...

  14. INE: a rice genome database with an integrated map view.

    Science.gov (United States)

    Sakata, K; Antonio, B A; Mukai, Y; Nagasaki, H; Sakai, Y; Makino, K; Sasaki, T

    2000-01-01

    The Rice Genome Research Program (RGP) launched a large-scale rice genome sequencing in 1998 aimed at decoding all genetic information in rice. A new genome database called INE (INtegrated rice genome Explorer) has been developed in order to integrate all the genomic information that has been accumulated so far and to correlate these data with the genome sequence. A web interface based on Java applet provides a rapid viewing capability in the database. The first operational version of the database has been completed which includes a genetic map, a physical map using YAC (Yeast Artificial Chromosome) clones and PAC (P1-derived Artificial Chromosome) contigs. These maps are displayed graphically so that the positional relationships among the mapped markers on each chromosome can be easily resolved. INE incorporates the sequences and annotations of the PAC contig. A site on low quality information ensures that all submitted sequence data comply with the standard for accuracy. As a repository of rice genome sequence, INE will also serve as a common database of all sequence data obtained by collaborating members of the International Rice Genome Sequencing Project (IRGSP). The database can be accessed at http://www.dna.affrc.go.jp:82/giot/INE.html or its mirror site at http://www.staff.or.jp/giot/INE.html

  15. The use of sequence-based SSR mining for the development of a vast collection of microsatellites in Aquilegia Formosa

    Science.gov (United States)

    Brandon Schlautman; Vera Pfeiffer; Juan Zalapa; Johanne Brunet

    2014-01-01

    Numerous microsatellite markers were developed for Aquilegia formosa from sequences deposited within the Expressed Sequence Tag (EST), Genomic Survey Sequence (GSS), and Nucleotide databases in NCBI. Microsatellites (SSRs) were identified and primers were designed for 9 SSR containing sequences in the Nucleotide database, 3803 sequences in the EST...

  16. GMDD: a database of GMO detection methods.

    Science.gov (United States)

    Dong, Wei; Yang, Litao; Shen, Kailin; Kim, Banghyun; Kleter, Gijs A; Marvin, Hans J P; Guo, Rong; Liang, Wanqi; Zhang, Dabing

    2008-06-04

    Since more than one hundred genetically modified organism (GMO) events have been developed and approved for commercialization worldwide, GMO analysis methods are essential for the enforcement of GMO labelling regulations. Protein- and nucleic acid-based detection techniques have been developed and used for GMO identification and quantification. However, information for the harmonization and standardization of GMO analysis methods at the global level is needed. The GMO Detection method Database (GMDD) has collected almost all previously developed and reported GMO detection methods, grouped by strategy (screen-, gene-, construct-, and event-specific), and provides a user-friendly search service for detection methods by GMO event name, exogenous gene, protein information, etc. In this database, users can obtain the sequences of exogenous integration, which will facilitate the design of PCR primers and probes. Information on endogenous genes, certified reference materials, reference molecules, and the validation status of developed methods is also included. Furthermore, registered users can submit new detection methods and sequences to this database, and newly submitted information will be released soon after being checked. GMDD contains comprehensive information on GMO detection methods and will make GMO analysis much easier.

  17. Hazard Analysis Database Report

    Energy Technology Data Exchange (ETDEWEB)

    GAULT, G.W.

    1999-10-13

    The Hazard Analysis Database was developed in conjunction with the hazard analysis activities conducted in accordance with DOE-STD-3009-94, Preparation Guide for US Department of Energy Nonreactor Nuclear Facility Safety Analysis Reports, for the Tank Waste Remediation System (TWRS) Final Safety Analysis Report (FSAR). The FSAR is part of the approved TWRS Authorization Basis (AB). This document describes, identifies, and defines the contents and structure of the TWRS FSAR Hazard Analysis Database and documents the configuration control changes made to the database. The TWRS Hazard Analysis Database contains the collection of information generated during the initial hazard evaluations and the subsequent hazard and accident analysis activities. The database supports the preparation of Chapters 3, 4, and 5 of the TWRS FSAR and the USQ process and consists of two major, interrelated data sets: (1) Hazard Evaluation Database--Data from the results of the hazard evaluations; and (2) Hazard Topography Database--Data from the system familiarization and hazard identification.

  18. The Genetic Activity Profile database.

    Science.gov (United States)

    Waters, M D; Stack, H F; Garrett, N E; Jackson, M A

    1991-12-01

    A graphic approach termed a Genetic Activity Profile (GAP) has been developed to display a matrix of data on the genetic and related effects of selected chemical agents. The profiles provide a visual overview of the quantitative (doses) and qualitative (test results) data for each chemical. Either the lowest effective dose (LED) or highest ineffective dose (HID) is recorded for each agent and bioassay. Up to 200 different test systems are represented across the GAP. Bioassay systems are organized according to the phylogeny of the test organisms and the end points of genetic activity. The methodology for the production and evaluation of GAPs has been developed in collaboration with the International Agency for Research on Cancer. Data on individual chemicals have been compiled by IARC and by the U.S. Environmental Protection Agency. Data are available on 299 compounds selected from volumes 1-50 of the IARC Monographs and on 115 compounds identified as Superfund Priority Substances. Software to display the GAPs on an IBM-compatible personal computer is available from the authors. Structurally similar compounds frequently display qualitatively and quantitatively similar GAPs. By examining the patterns of GAPs of pairs and groups of chemicals, it is possible to make more informed decisions regarding the selection of test batteries to be used in evaluating chemical analogs. GAPs have provided useful data for the development of weight-of-evidence hazard ranking schemes. Also, some knowledge of the potential genetic activity of complex environmental mixtures may be gained from assessing the GAPs of component chemicals. The fundamental techniques and computer programs devised for the GAP database may be used to develop similar databases in other disciplines.

  19. Database for propagation models

    Science.gov (United States)

    Kantak, Anil V.

    1991-07-01

    A propagation researcher or a systems engineer who intends to use the results of a propagation experiment is generally faced with various database tasks such as the selection of the computer software, the hardware, and the writing of the programs to pass the data through the models of interest. This task is repeated every time a new experiment is conducted or the same experiment is carried out at a different location generating different data. Thus the users of this data have to spend a considerable portion of their time learning how to implement the computer hardware and the software towards the desired end. This situation may be eased considerably if an easily accessible propagation database is created that has all the accepted (standardized) propagation phenomena models approved by the propagation research community. Also, the handling of data will become easier for the user. Such a database construction can only stimulate the growth of the propagation research if it is available to all the researchers, so that the results of the experiment conducted by one researcher can be examined independently by another, without different hardware and software being used. The database may be made flexible so that the researchers need not be confined only to the contents of the database. Another way in which the database may help the researchers is that they will not have to document the software and hardware tools used in their research, since the propagation research community will know the database already. The following sections show a possible database construction, as well as properties of the database for the propagation research.

  20. Product Licenses Database Application

    CERN Document Server

    Tonkovikj, Petar

    2016-01-01

    The goal of this project is to organize and centralize the data about software tools available to CERN employees, as well as provide a system that would simplify the license management process by providing information about the available licenses and their expiry dates. The project development process consists of two steps: modeling the products (software tools), product licenses, legal agreements and other data related to these entities in a relational database, and developing the front-end user interface so that the user can interact with the database. The result is an ASP.NET MVC web application with interactive views for displaying and managing the data in the underlying database.
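    The relational modeling step described above can be illustrated with a minimal sketch. The actual application is an ASP.NET MVC system whose schema is not given in the abstract; the table names, columns, and sample rows below are hypothetical, and Python with SQLite is used only to keep the example self-contained.

```python
import sqlite3
from datetime import date


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical product and license tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE license (
            id         INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES product(id),
            seats      INTEGER NOT NULL,
            expires_on TEXT NOT NULL  -- ISO 8601 date, so string order equals date order
        );
        """
    )
    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO product (id, name) VALUES (?, ?)",
        [(1, "ToolA"), (2, "ToolB")],
    )
    conn.executemany(
        "INSERT INTO license (product_id, seats, expires_on) VALUES (?, ?, ?)",
        [(1, 10, "2016-03-01"), (1, 5, "2017-01-15"), (2, 50, "2016-02-10")],
    )
    return conn


def expiring_before(conn: sqlite3.Connection, cutoff: date):
    """Return (product name, seats, expiry date) for licenses expiring before cutoff."""
    rows = conn.execute(
        """
        SELECT p.name, l.seats, l.expires_on
        FROM license AS l
        JOIN product AS p ON p.id = l.product_id
        WHERE l.expires_on < ?
        ORDER BY l.expires_on
        """,
        (cutoff.isoformat(),),
    )
    return rows.fetchall()
```

    Storing expiry dates as ISO 8601 text keeps the comparison in the WHERE clause a plain string comparison; a production schema would more likely use a native date type and also model the legal agreements mentioned in the abstract.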

  1. LandIT Database

    DEFF Research Database (Denmark)

    Iftikhar, Nadeem; Pedersen, Torben Bach

    2010-01-01

    ...and reporting purposes. This paper presents the LandIT database, which is the result of the LandIT project, an industrial collaboration project that developed technologies for communication and data integration between farming devices and systems. The LandIT database is in principle based on the ISOBUS standard; however, the standard is extended with additional requirements, such as gradual data aggregation and flexible exchange of farming data. This paper describes the conceptual and logical schemas of the proposed database based on a real-life farming case study....

  2. JICST Factual Database(2)

    Science.gov (United States)

    Araki, Keisuke

    The computer programme, which builds atom-bond connection tables from nomenclatures, is developed. Chemical substances with their nomenclature and varieties of trivial names or experimental code numbers are inputted. The chemical structures of the database are stereospecifically stored and are able to be searched and displayed according to stereochemistry. Source data are from laws and regulations of Japan, RTECS of US and so on. The database plays a central role within the integrated fact database service of JICST and makes interrelational retrieval possible.

  3. User Guidelines for the Brassica Database: BRAD.

    Science.gov (United States)

    Wang, Xiaobo; Cheng, Feng; Wang, Xiaowu

    2016-01-01

    The genome sequence of Brassica rapa was first released in 2011. Since then, further Brassica genomes have been sequenced or are undergoing sequencing. It is therefore necessary to develop tools that help users to mine information from genomic data efficiently. This will greatly aid scientific exploration and breeding application, especially for those with low levels of bioinformatic training. Therefore, the Brassica database (BRAD) was built to collect, integrate, illustrate, and visualize Brassica genomic datasets. BRAD provides useful searching and data mining tools, and facilitates the search of gene annotation datasets, syntenic or non-syntenic orthologs, and flanking regions of functional genomic elements. It also includes genome-analysis tools such as BLAST and GBrowse. One of the important aims of BRAD is to build a bridge between Brassica crop genomes and the genome of the model species Arabidopsis thaliana, thus transferring the bulk of A. thaliana gene study information for use with newly sequenced Brassica crops.

  4. The Saccharomyces Genome Database Variant Viewer.

    Science.gov (United States)

    Sheppard, Travis K; Hitz, Benjamin C; Engel, Stacia R; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla S; Demeter, Janos; Hellerstedt, Sage T; Karra, Kalpana; Nash, Robert S; Paskov, Kelley M; Skrzypek, Marek S; Weng, Shuai; Wong, Edith D; Cherry, J Michael

    2016-01-04

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
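    As a rough illustration of what "alignments and summaries encoded in JSON" could look like, the sketch below compares aligned strain sequences against a reference and serializes the differences. The JSON layout, the field names, and the strain labels are assumptions made for this example; they are not the format actually used by SGD or the Variant Viewer.

```python
import json


def summarize_alignment(reference: str, aligned: dict) -> str:
    """Encode per-strain differences from the reference as a JSON string.

    `reference` and every value in `aligned` are rows of one multiple sequence
    alignment, so they must have equal length ('-' marks a gap). Positions are
    1-based alignment columns, not genomic coordinates.
    """
    variants = []
    for strain, seq in sorted(aligned.items()):
        if len(seq) != len(reference):
            raise ValueError(f"{strain}: aligned sequences must have equal length")
        for pos, (ref_base, alt_base) in enumerate(zip(reference, seq), start=1):
            if ref_base != alt_base:
                variants.append(
                    {"strain": strain, "position": pos, "ref": ref_base, "alt": alt_base}
                )
    return json.dumps({"length": len(reference), "variants": variants}, sort_keys=True)
```

    A web front end could fetch such a document and render one row per strain; keeping only the differences, rather than the full sequences, keeps the payload small when strains are highly similar.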

  5. Physics Survey Overview

    International Nuclear Information System (INIS)

    2002-01-01

    An overview of a series of assignments of the branches of physics carried out by the Board on Physics and Astronomy of the National Research Council. It identifies further theories in physics and makes recommendations on preventive priorities. The Board on Physics and Astronomy (BPA) has conducted a new decadal survey of physics entitled "Physics in a New Era". The survey includes assessments of the main branches of physics as well as certain selected emerging areas. The various elements of the survey were prepared by separately appointed National Research Council (NRC) committees. The BPA formed the Physics Survey Overview Committee (PSOVC) to complete the survey by preparing an overview of the field of physics to summarize and synthesize the results of the various assessments and to address cross-cutting issues that concern physics as a whole.

  6. An overview of GOOD

    NARCIS (Netherlands)

    Paredaens, J.; Van den Bussche, J.; Andries, M.; Gemis, M.; Gyssens, M.; Thyssens, I.; Van Gucht, D.; Sarathy, V.; Saxton, L.V.

    1992-01-01

    GOOD is an acronym, standing for Graph-Oriented Object Database. GOOD is being developed as a joint research effort of Indiana University and the University of Antwerp. The main thrust behind the project is to indicate general concepts that are fundamental to any graph-oriented database

  7. The Danish Gynecological Cancer Nursing Database

    DEFF Research Database (Denmark)

    Seibæk, Lene; Jakobsen, Dorthe Hjort; Høgdall, Claus

    2018-01-01

    ...Database (DGCD) established a nursing database in 2011. The aim of DGCD Nursing is to monitor the quality of preoperative and postoperative care and to generate data for research. MATERIAL AND METHODS: In accordance with the current data protection legislation, real-time data are entered by clinical nurses at all national cancer centers. The DGCD Nursing includes data of preoperative and postoperative care, and nurses are independently represented in the steering committee. The aim of the present article is to present the first results from DGCD Nursing and the national care improvements that have followed... , pain score, vital functions, and psychosocial support. CONCLUSIONS: At national level, DGCD offers a comprehensive overview of the total patient pathway within gynecological cancer surgery. The DGCD Nursing has added to the quality and implementation of evidence-based preoperative and postoperative...

  8. Database for radiation therapy images

    International Nuclear Information System (INIS)

    Shalev, S.; Cosby, S.; Leszczynski, K.; Chu, T.

    1989-01-01

    The authors have developed a database for images acquired during simulation and verification of radiation treatments. Simulation images originate as planning films that are digitized with a video camera, or through direct digitization of fluoroscopic images. Verification images may also be digitized from portal films or acquired with an on-line portal imaging system. Images are classified by the patient, the fraction, the field direction, static or dynamic (movie) sequences, and the type of processing applied. Additional parameters indicate whether the source is a simulation or treatment, whether images are digitized film or real-time acquisitions, and whether treatment is portal or double exposure for beam localization. Examples are presented for images acquired, processed, stored, and displayed with on-line portal imaging system (OPIUM) and digital simulation system (FLIP)

  9. EPICS system: an overview

    International Nuclear Information System (INIS)

    Bartlett, J.F.; Bobbitt, J.S.; Kramper, B.J.; Lahey, T.E.; MacKinnon, B.A.; West, R.E.

    1984-02-01

    This paper presents an overview of the EPICS control system at FERMILAB. EPICS is a distributed, multi-user, interactive system for the control and monitoring of particle beamlines at a high-energy experimental physics laboratory. The overview discusses the operating environment of the control system, the requirements which determined the design decisions, the hardware and software configurations, and plans for the future growth and enhancement of the present system. This paper is the first of three related papers on the EPICS system. The other two cover (1) the system structure and user interface and (2) RSX implementation issues

  10. Livestock Anaerobic Digester Database

    Science.gov (United States)

    The Anaerobic Digester Database provides basic information about anaerobic digesters on livestock farms in the United States, organized in Excel spreadsheets. It includes projects that are under construction, operating, or shut down.

  11. Toxicity Reference Database

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Toxicity Reference Database (ToxRefDB) contains approximately 30 years and $2 billion worth of animal studies. ToxRefDB allows scientists and the interested...

  12. Dissolution Methods Database

    Data.gov (United States)

    U.S. Department of Health & Human Services — For a drug product that does not have a dissolution test method in the United States Pharmacopeia (USP), the FDA Dissolution Methods Database provides information on...

  13. OTI Activity Database

    Data.gov (United States)

    US Agency for International Development — OTI's worldwide activity database is a simple and effective information system that serves as a program management, tracking, and reporting tool. In each country,...

  14. ARTI Refrigerant Database

    Energy Technology Data Exchange (ETDEWEB)

    Calm, J.M. [Calm (James M.), Great Falls, VA (United States)

    1994-05-27

    The Refrigerant Database consolidates and facilitates access to information to assist industry in developing equipment using alternative refrigerants. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern.

  15. Marine Jurisdictions Database

    National Research Council Canada - National Science Library

    Goldsmith, Roger

    1998-01-01

    The purpose of this project was to take the data gathered for the Maritime Claims chart and create a Maritime Jurisdictions digital database suitable for use with oceanographic mission planning objectives...

  16. Medicaid CHIP ESPC Database

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Environmental Scanning and Program Characteristic (ESPC) Database is in a Microsoft (MS) Access format and contains Medicaid and CHIP data, for the 50 states and...

  17. Records Management Database

    Data.gov (United States)

    US Agency for International Development — The Records Management Database is a tool created in Microsoft Access specifically for USAID use. It contains metadata in order to access and retrieve the information...

  18. Reach Address Database (RAD)

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Reach Address Database (RAD) stores the reach address of each Water Program feature that has been linked to the underlying surface water features (streams,...

  19. Household Products Database: Pesticides

    Science.gov (United States)

    ... of Products Manufacturers Ingredients About the Database FAQ Product ... control bulbs carpenter ants caterpillars crabgrass control deer dogs dogs/cats fertilizer w/insecticide fertilizer w/weed ...

  20. Mouse Phenome Database (MPD)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Mouse Phenome Database (MPD) has characterizations of hundreds of strains of laboratory mice to facilitate translational discoveries and to assist in selection...

  1. Consumer Product Category Database

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Chemical and Product Categories database (CPCat) catalogs the use of over 40,000 chemicals and their presence in different consumer products. The chemical use...

  2. Drycleaner Database - Region 7

    Data.gov (United States)

    U.S. Environmental Protection Agency — THIS DATA ASSET NO LONGER ACTIVE: This is metadata documentation for the Region 7 Drycleaner Database (R7DryClnDB) which tracks all Region 7 drycleaners who notify...

  3. National Assessment Database

    Data.gov (United States)

    U.S. Environmental Protection Agency — The National Assessment Database stores and tracks state water quality assessment decisions, Total Maximum Daily Loads (TMDLs) and other watershed plans designed to...

  4. IVR RSA Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This database contains trip-level reports submitted by vessels participating in Research Set-Aside projects with IVR reporting requirements.

  5. Rat Genome Database (RGD)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Rat Genome Database (RGD) is a collaborative effort between leading research institutions involved in rat genetic and genomic research to collect, consolidate,...

  6. The CAPEC Database

    DEFF Research Database (Denmark)

    Nielsen, Thomas Lund; Abildskov, Jens; Harper, Peter Mathias

    2001-01-01

    The Computer-Aided Process Engineering Center (CAPEC) database of measured data was established with the aim to promote greater data exchange in the chemical engineering community. The target properties are pure component properties, mixture properties, and special drug solubility data. The database divides pure component properties into primary, secondary, and functional properties. Mixture properties are categorized in terms of the number of components in the mixture and the number of phases present. The compounds in the database have been classified on the basis of the functional groups in the compound. This classification makes the CAPEC database a very useful tool, for example, in the development of new property models, since properties of chemically similar compounds are easily obtained. A program with efficient search and retrieval functions of properties has been developed.

  7. Danish Urogynaecological Database

    DEFF Research Database (Denmark)

    Hansen, Ulla Darling; Gradel, Kim Oren; Larsen, Michael Due

    2016-01-01

    The Danish Urogynaecological Database is established in order to ensure high quality of treatment for patients undergoing urogynecological surgery. The database contains details of all women in Denmark undergoing incontinence surgery or pelvic organ prolapse surgery amounting to ~5,200 procedures... , complications if relevant, implants used if relevant, 3-6-month postoperative recording of symptoms, if any. A set of clinical quality indicators is being maintained by the steering committee for the database and is published in an annual report which also contains extensive descriptive statistics. The database has a completeness of over 90% of all urogynecological surgeries performed in Denmark. Some of the main variables have been validated using medical records as gold standard. The positive predictive value was above 90%. The data are used as a quality monitoring tool by the hospitals and in a number...

  8. The Danish Urogynaecological Database

    DEFF Research Database (Denmark)

    Guldberg, Rikke; Brostrøm, Søren; Hansen, Jesper Kjær

    2013-01-01

    INTRODUCTION AND HYPOTHESIS: The Danish Urogynaecological Database (DugaBase) is a nationwide clinical database established in 2006 to monitor, ensure and improve the quality of urogynaecological surgery. We aimed to describe its establishment and completeness and to validate selected variables. This is the first study based on data from the DugaBase. METHODS: The database completeness was calculated as a comparison between urogynaecological procedures reported to the Danish National Patient Registry and to the DugaBase. Validity was assessed for selected variables from a random sample of 200 women in the DugaBase from 1 January 2009 to 31 October 2010, using medical records as a reference. RESULTS: A total of 16,509 urogynaecological procedures were registered in the DugaBase by 31 December 2010. The database completeness has increased by calendar time, from 38.2% in 2007 to 93.2% in 2010 for public...

  9. Danish Pancreatic Cancer Database

    DEFF Research Database (Denmark)

    Fristrup, Claus; Detlefsen, Sönke; Palnæs Hansen, Carsten

    2016-01-01

    AIM OF DATABASE: The Danish Pancreatic Cancer Database aims to prospectively register the epidemiology, diagnostic workup, diagnosis, treatment, and outcome of patients with pancreatic cancer in Denmark at an institutional and national level. STUDY POPULATION: Since May 1, 2011, all patients with microscopically verified ductal adenocarcinoma of the pancreas have been registered in the database. As of June 30, 2014, the total number of patients registered was 2,217. All data are cross-referenced with the Danish Pathology Registry and the Danish Patient Registry to ensure the completeness of registrations... : Death is monitored using data from the Danish Civil Registry. This registry monitors the survival status of the Danish population, and the registration is virtually complete. All data in the database are audited by all participating institutions, with respect to baseline characteristics, key indicators...

  10. Food Habits Database (FHDBS)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The NEFSC Food Habits Database has two major sources of data. The first, and most extensive, is the standard NEFSC Bottom Trawl Surveys Program. During these...

  11. Functionally Graded Materials Database

    Science.gov (United States)

    Kisara, Katsuto; Konno, Tomomi; Niino, Masayuki

    2008-02-01

    The Functionally Graded Materials Database (hereinafter referred to as the FGMs Database) was opened to the public via the Internet in October 2002, and since then it has been managed by the Japan Aerospace Exploration Agency (JAXA). As of October 2006, the database includes 1,703 research information entries, together with data on 2,429 researchers and 509 institutions, among other records. Reading materials such as "Applicability of FGMs Technology to Space Plane" and "FGMs Application to Space Solar Power System (SSPS)" were prepared in FY 2004 and 2005, respectively. The English version of "FGMs Application to Space Solar Power System (SSPS)" is now under preparation. This paper explains the FGMs Database, describing the research information data, the sitemap and how to use it. User access results and users' interests are discussed on the basis of an access analysis.

  12. Tethys Acoustic Metadata Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Tethys database houses the metadata associated with the acoustic data collection efforts by the Passive Acoustic Group. These metadata include dates, locations...

  13. NLCD 2011 database

    Data.gov (United States)

    U.S. Environmental Protection Agency — National Land Cover Database 2011 (NLCD 2011) is the most recent national land cover product created by the Multi-Resolution Land Characteristics (MRLC) Consortium....

  14. Medicare Coverage Database

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Medicare Coverage Database (MCD) contains all National Coverage Determinations (NCDs) and Local Coverage Determinations (LCDs), local articles, and proposed NCD...

  15. Household Products Database

    Data.gov (United States)

    U.S. Department of Health & Human Services — This database links over 4,000 consumer brands to health effects from Material Safety Data Sheets (MSDS) provided by the manufacturers and allows scientists and...

  16. Global Volcano Locations Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — NGDC maintains a database of over 1,500 volcano locations obtained from the Smithsonian Institution Global Volcanism Program, Volcanoes of the World publication. The...

  17. 1988 Spitak Earthquake Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The 1988 Spitak Earthquake database is an extensive collection of geophysical and geological data, maps, charts, images and descriptive text pertaining to the...

  18. Uranium Location Database

    Data.gov (United States)

    U.S. Environmental Protection Agency — A GIS compiled locational database in Microsoft Access of ~15,000 mines with uranium occurrence or production, primarily in the western United States. The metadata...

  19. INIST: databases reorientation

    International Nuclear Information System (INIS)

    Bidet, J.C.

    1995-01-01

    INIST is a CNRS (Centre National de la Recherche Scientifique) laboratory devoted to the processing of scientific and technical information and to the management of this information compiled in a database. A reorientation of the database content was proposed in 1994 to increase the transfer of research towards enterprises and services, to develop more automated access to the information, and to create a quality assurance plan. The catalog of publications comprises 5800 periodical titles (1300 for fundamental research and 4500 for applied research). A science and technology multi-thematic database will be created in 1995 for the retrieval of applied and technical information. "Grey literature" (reports, theses, proceedings, etc.) and human and social sciences data will be added to the database using information selected from the existing GRISELI and Francis databases. Strong modifications are also planned in the thematic coverage of Earth sciences and will considerably reduce the geological information content. (J.S.). 1 tab

  20. Fine Arts Database (FAD)

    Data.gov (United States)

    General Services Administration — The Fine Arts Database records information on federally owned art in the control of the GSA; this includes the location, current condition and information on artists.

  1. Kansas Cartographic Database (KCD)

    Data.gov (United States)

    Kansas Data Access and Support Center — The Kansas Cartographic Database (KCD) is an exact digital representation of selected features from the USGS 7.5 minute topographic map series. Features that are...

  2. Mining biological databases for candidate disease genes

    Science.gov (United States)

    Braun, Terry A.; Scheetz, Todd; Webster, Gregg L.; Casavant, Thomas L.

    2001-07-01

    The publicly funded effort to determine the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), has so far produced more than 93% of the 3 billion nucleotides of the human genome in a preliminary 'draft' format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies such as expression analysis and resources (micro-arrays or gene chips). These resources are invaluable for the researchers identifying the functional genes of the genome that transcribe and translate into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of these data identified approximately 30,000-40,000 human 'genes'. However, the bulk of the effort still remains: to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome to genes. A fortuitous consequence of the HGP is the existence of hundreds of databases containing biological information that may contain relevant data pertaining to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high-speed cluster of Linux workstations is used to analyze sequences and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl Syndrome (BBS).
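    The interval-based mining step mentioned at the end of the abstract can be sketched as a simple overlap filter. This is only an illustration under stated assumptions: the record type, field names, and coordinate convention are hypothetical, and the authors' system (which also performs distributed sequence alignment on a cluster) is not described in enough detail to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedSequence:
    """A sequence with a mapped genomic position (hypothetical record layout)."""
    name: str
    chromosome: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def candidates_in_interval(seqs, chromosome: str, start: int, end: int):
    """Return the sequences on `chromosome` that overlap the interval [start, end]."""
    if start > end:
        raise ValueError("interval start must not exceed interval end")
    return [
        s for s in seqs
        if s.chromosome == chromosome and s.start <= end and s.end >= start
    ]
```

    Two closed intervals overlap exactly when each one starts no later than the other ends, which is the condition tested in the list comprehension. For many intervals or very large sequence sets, an interval tree or a sorted index would avoid the linear scan.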

  3. Database Description - PGDBj - Ortholog DB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    ...e relevant data in the databases. By submitting queries to the PGDBj Ortholog DB with keywords or amino acid sequences, users... taxa including both model plants and crop plants. Following the links obtained, users can retrieve the actu...

  4. Developments in diffraction databases

    International Nuclear Information System (INIS)

    Jenkins, R.

    1999-01-01

    There are a number of databases available to the diffraction community. Two of the more important of these are the Powder Diffraction File (PDF) maintained by the International Centre for Diffraction Data (ICDD), and the Inorganic Crystal Structure Database (ICSD) maintained by Fachinformationszentrum (FIZ, Karlsruhe). In application, the PDF has been used as an indispensable tool in phase identification and identification of unknowns. The ICSD database has extensive and explicit reference to the structures of compounds: atomic coordinates, space group and even thermal vibration parameters. A similar database, but for organic compounds, is maintained by the Cambridge Crystallographic Data Centre. These databases are often used as independent sources of information. However, little thought has been given to how to exploit the combined properties of structural database tools. A recently completed agreement between ICDD and FIZ, plus ICDD and Cambridge, provides a first step in complementary use of the PDF and the ICSD databases. The focus of this paper (as indicated below) is to examine ways of exploiting the combined properties of both databases. In 1996, there were approximately 76,000 entries in the PDF and approximately 43,000 entries in the ICSD database. The ICSD database has now been used to calculate entries in the PDF. Thus, to derive d-spacing and peak intensity data requires the synthesis of full diffraction patterns, i.e., we use the structural data in the ICSD database and then add instrumental resolution information. The combined data from PDF and ICSD can be effectively used in many ways. For example, we can calculate PDF data for an ideally random crystal distribution and also in the absence of preferred orientation. Again, we can use systematic studies of intermediate members in solid solutions series to help produce reliable quantitative phase analyses. In some cases, we can study how solid solution properties vary with composition and
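    To make the link between structural data and d-spacings concrete, here is a minimal sketch of the geometric part of a calculated pattern for the simplest case, a cubic lattice, using Bragg's law (wavelength = 2 d sin θ). It is not the ICDD or FIZ procedure: peak intensities, non-cubic cells, and the instrumental resolution mentioned in the abstract are all left out, and the function names are chosen for this example only.

```python
import math


def cubic_d_spacing(a: float, h: int, k: int, l: int) -> float:
    """Interplanar spacing d for reflection (h k l) of a cubic cell with edge a."""
    if (h, k, l) == (0, 0, 0):
        raise ValueError("(0 0 0) is not a reflection")
    return a / math.sqrt(h * h + k * k + l * l)


def two_theta_deg(d: float, wavelength: float) -> float:
    """Diffraction angle 2-theta in degrees from Bragg's law, first order."""
    ratio = wavelength / (2.0 * d)
    if ratio > 1.0:
        raise ValueError("reflection is not observable at this wavelength")
    return 2.0 * math.degrees(math.asin(ratio))


# Illustrative input: silicon (a about 5.4309 angstrom) with Cu K-alpha1
# radiation (about 1.5406 angstrom); d and the wavelength share the same unit.
d_111 = cubic_d_spacing(5.4309, 1, 1, 1)
angle_111 = two_theta_deg(d_111, 1.5406)
```

    A full calculated entry would repeat this for every allowed reflection, compute structure-factor intensities from the atomic coordinates, and then broaden the peaks with an instrument profile.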

  5. Database Replication Prototype

    OpenAIRE

    Vandewall, R.

    2000-01-01

    This report describes the design of a Replication Framework that facilitates the implementation and comparison of database replication techniques. Furthermore, it discusses the implementation of a Database Replication Prototype and compares the performance measurements of two replication techniques based on the Atomic Broadcast communication primitive: pessimistic active replication and optimistic active replication. The main contributions of this report can be split into four parts....
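    The idea shared by both techniques, that replicas stay consistent when every replica applies the same deterministic operations in the same order, can be shown with a toy sketch. The classes below are hypothetical and are not taken from the report; the single-process `Sequencer` merely stands in for an atomic broadcast primitive and handles neither failures nor concurrency, and the sketch does not distinguish the pessimistic from the optimistic variant.

```python
class Replica:
    """A deterministic key-value state machine."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of operations applied so far

    def apply(self, op):
        kind, key, value = op
        if kind == "set":
            self.state[key] = value
        elif kind == "add":
            self.state[key] = self.state.get(key, 0) + value
        else:
            raise ValueError(f"unknown operation: {kind}")
        self.applied += 1


class Sequencer:
    """Toy stand-in for atomic broadcast: one global delivery order for all replicas."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.log = []  # the agreed total order of operations

    def broadcast(self, op):
        # Append to the global log, then deliver to every replica in log order.
        self.log.append(op)
        for replica in self.replicas:
            replica.apply(op)
```

    Because "set" and "add" do not commute, delivering them in different orders at different replicas would leave the copies inconsistent, which is why a total order is required.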

  6. Database on Wind Characteristics

    DEFF Research Database (Denmark)

    Højstrup, J.; Ejsing Jørgensen, Hans; Lundtang Petersen, Erik

    1999-01-01

    This report describes the work and results of the project Database on Wind Characteristics, which was sponsored partly by the European Commission within the framework of the JOULE III program under contract JOR3-CT95-0061...

  7. ORACLE DATABASE SECURITY

    OpenAIRE

    Cristina-Maria Titrade

    2011-01-01

    This paper presents some security issues, namely database system-level security, data-level security, user-level security, user management, resource management and password management. Security is a constant concern in database design and development. Usually, the question is not whether security should exist, but rather how extensive it should be. A typical DBMS has several levels of security, in addition to those offered by the operating system or network. Typically, a DBMS has user a...

  8. Database computing in HEP

    International Nuclear Information System (INIS)

    Day, C.T.; Loken, S.; MacFarlane, J.F.; May, E.; Lifka, D.; Lusk, E.; Price, L.E.; Baden, A.

    1992-01-01

    The major SSC experiments are expected to produce up to 1 Petabyte of data per year each. Once the primary reconstruction is completed by farms of inexpensive processors, I/O becomes a major factor in further analysis of the data. We believe that the application of database techniques can significantly reduce the I/O performed in these analyses. We present examples of such I/O reductions in prototypes based on relational and object-oriented databases of CDF data samples.

  9. Update History of This Database - Trypanosomes Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us Trypanosomes Database Update History of This Database Date Update contents 2014/05/07 The co...ntact information is corrected. The features and manner of utilization of the database are corrected. 2014/02/04 Trypanosomes Databas...e English archive site is opened. 2011/04/04 Trypanosomes Database ( http://www.tan...paku.org/tdb/ ) is opened. About This Database Database Description Download Lice...nse Update History of This Database Site Policy | Contact Us Update History of This Database - Trypanosomes Database | LSDB Archive ...

  10. Specialist Bibliographic Databases.

    Science.gov (United States)

    Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A; Trukhachev, Vladimir I; Kostyukova, Elena I; Gerasimov, Alexey N; Kitas, George D

    2016-05-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls.

  11. Specialist Bibliographic Databases

    Science.gov (United States)

    2016-01-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485

  12. REDfly: a Regulatory Element Database for Drosophila.

    Science.gov (United States)

    Gallo, Steven M; Li, Long; Hu, Zihua; Halfon, Marc S

    2006-02-01

    Bioinformatics studies of transcriptional regulation in the metazoa are significantly hindered by the absence of readily available data on large numbers of transcriptional cis-regulatory modules (CRMs). Even the richly annotated Drosophila melanogaster genome lacks extensive CRM information. We therefore present here a database of Drosophila CRMs curated from the literature complete with both DNA sequence and a searchable description of the gene expression pattern regulated by each CRM. This resource should greatly facilitate the development of computational approaches to CRM discovery as well as bioinformatics analyses of regulatory sequence properties and evolution.

  13. An overview of wheat genome sequencing and its implications for ...

    Indian Academy of Sciences (India)

    National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi 110 067, India ... Wheat (Triticum aestivum L.) serves as the staple food for. 30% of the global .... bread wheat genome is a product of multiple rounds of hybrid.

  14. Developing an Online Database of National and Sub-National Clean Energy Policies

    Energy Technology Data Exchange (ETDEWEB)

    Haynes, R.; Cross, S.; Heinemann, A.; Booth, S.

    2014-06-01

    The Database of State Incentives for Renewables and Efficiency (DSIRE) was established in 1995 to provide summaries of energy efficiency and renewable energy policies offered by the federal and state governments. This primer provides an overview of the major policy, research, and technical topics to be considered when creating a similar clean energy policy database and website.

  15. Indexing, learning and content-based retrieval for special purpose image databases

    NARCIS (Netherlands)

    M.J. Huiskes (Mark); E.J. Pauwels (Eric)

    2005-01-01

    This chapter deals with content-based image retrieval in special purpose image databases. As image data is amassed ever more effortlessly, building efficient systems for searching and browsing of image databases becomes increasingly urgent. We provide an overview of the current

  16. Using Web Database Tools To Facilitate the Construction of Knowledge in Online Courses.

    Science.gov (United States)

    McNeil, Sara G.; Robin, Bernard R.

    This paper presents an overview of database tools that dynamically generate World Wide Web materials and focuses on the use of these tools to support research activities, as well as teaching and learning. Database applications have been used in classrooms to support learning activities for over a decade, but, although business and e-commerce have…

  17. PAMDB: a comprehensive Pseudomonas aeruginosa metabolome database.

    Science.gov (United States)

    Huang, Weiliang; Brewer, Luke K; Jones, Jace W; Nguyen, Angela T; Marcu, Ana; Wishart, David S; Oglesby-Sherrouse, Amanda G; Kane, Maureen A; Wilks, Angela

    2018-01-04

    The Pseudomonas aeruginosa Metabolome Database (PAMDB, http://pseudomonas.umaryland.edu) is a searchable, richly annotated metabolite database specific to P. aeruginosa. P. aeruginosa is a soil organism and significant opportunistic pathogen that adapts to its environment through a versatile energy metabolism network. Furthermore, P. aeruginosa is a model organism for the study of biofilm formation, quorum sensing, and bioremediation processes, each of which is dependent on unique pathways and metabolites. The PAMDB is modelled on the Escherichia coli (ECMDB), yeast (YMDB) and human (HMDB) metabolome databases and contains >4370 metabolites and 938 pathways with links to over 1260 genes and proteins. The database information was compiled from electronic databases, journal articles and mass spectrometry (MS) metabolomic data obtained in our laboratories. For each metabolite entered, we provide detailed compound descriptions, names and synonyms, structural and physicochemical information, nuclear magnetic resonance (NMR) and MS spectra, enzymes and pathway information, as well as gene and protein sequences. The database allows extensive searching via chemical names, structure and molecular weight, together with gene, protein and pathway relationships. The PAMDB and its future iterations will provide a valuable resource to biologists, natural product chemists and clinicians in identifying active compounds, potential biomarkers and clinical diagnostics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. "First generation" automated DNA sequencing technology.

    Science.gov (United States)

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
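
    The base-calling quality scores mentioned in this abstract can be illustrated with a short sketch. It assumes the widely used Phred convention, Q = -10 log10(P), which the abstract itself does not name; the function names are hypothetical and chosen only for this example.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred-style score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred-style score implied by an error probability: Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def expected_errors(qualities: list[float]) -> float:
    """Expected number of miscalled bases in a read, given per-base scores."""
    return sum(phred_to_error_prob(q) for q in qualities)


if __name__ == "__main__":
    # A hypothetical four-base read with mixed quality scores.
    read_qualities = [20, 30, 30, 40]
    print(expected_errors(read_qualities))
```

    Under this convention a score of 20 corresponds to one expected error in 100 base calls and a score of 30 to one in 1,000, so summing the per-base probabilities gives a quick estimate of how many bases in a read are likely to be wrong.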

  19. Virtual Reality: An Overview.

    Science.gov (United States)

    Franchi, Jorge

    1994-01-01

    Highlights of this overview of virtual reality include optics; interface devices; virtual worlds; potential applications, including medicine and archaeology; problems, including costs; current research and development; future possibilities; and a listing of vendors and suppliers of virtual reality products. (Contains 11 references.) (LRW)

  20. Miniature UAVs : An overview

    NARCIS (Netherlands)

    Weimar, P.W.L.; Kerkkamp, J.S.F.; Wiel, R.A.N.; Meiller, P.P.; Bos, J.G.H.

    2014-01-01

    With this book TNO provides an overview of topics related to Miniature Unmanned Aerial Vehicles (MUAVs). Both novices and experts may find this publication valuable. The Netherlands Organisation for Applied Scientific Research TNO conducts research on UAVs and MUAVs, see for example [1], on the

  1. Overview of religions.

    Science.gov (United States)

    Brooks, Nicky

    2004-01-01

    This article provides a brief overview of 9 religions: Christianity, Judaism, Jehovah's Witnesses, The Church of Jesus Christ of Latter-Day Saints, Christian Science, Islam, Hinduism, Sikhism, and Buddhism. Basic information on the origins, language, naming practices, diet, personal hygiene, and dress requirements is provided. For additional information, Web sites for each of these religions are also provided.

  2. ISAF Overview Brief

    Science.gov (United States)

    2011-01-01

    Eager Afghans NEW PA STUDENTS NATO / ISAF UNCLASSIFIED NATO / ISAF UNCLASSIFIED ISAF Overview Brief, MHS Conference 2011 – Attrition, Leader deficit...doctors (male/fem), 2 nurses, 2 midwives. 100k-300k XRAY, surgery, OB, physiotherapy , pediatrician, pharmacist, dentist. 10k-15k An extension of the BHC

  3. Breast Cancer Overview

    Science.gov (United States)

    This is Cancer.Net’s Guide to Breast Cancer. Use the menu below to choose the Overview/ ... social workers, and patient advocates.

  4. Pentaquarks. An experimental overview

    International Nuclear Information System (INIS)

    Barna, D.

    2005-01-01

    Since the recent observation of a pentaquark (Θ⁺ = qqqqq-bar) state (see Nakano et al. (LEPS Collaboration), Phys. Rev. Lett. 91 (2003) 012002-1), several positive and negative experimental results have emerged. These results are reviewed, with an attempt to find common features among them. (author)

  5. Cascade annealing: an overview

    International Nuclear Information System (INIS)

    Doran, D.G.; Schiffgens, J.O.

    1976-04-01

    Concepts and an overview of radiation displacement damage modeling and annealing kinetics are presented. Short-term annealing methodology is described and results of annealing simulations performed on damage cascades generated using the Marlowe and Cascade programs are included. Observations concerning the inconsistencies and inadequacies of current methods are presented along with simulation of high energy cascades and simulation of longer-term annealing.

  6. THX Experiment Overview

    Science.gov (United States)

    Wernet, Mark; Wroblewski, Adam; Locke, Randy; Georgiadis, Nick

    2016-01-01

    This presentation provides an overview of experiments conducted at NASA GRC to provide turbulent flow measurements needed for new turbulence model development and validation. The experiments include particle image velocimetry (PIV) and hot-wire measurements of mean flow velocity and temperature fields, as well as fluctuating components.

  7. Database Description - RPSD | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available base Description General information of database Database name RPSD Alternative nam...e Rice Protein Structure Database DOI 10.18908/lsdba.nbdc00749-000 Creator Creator Name: Toshimasa Yamazaki ... Ibaraki 305-8602, Japan National Institute of Agrobiological Sciences Toshimasa Yamazaki E-mail : Databas...e classification Structure Databases - Protein structure Organism Taxonomy Name: Or...or name(s): Journal: External Links: Original website information Database maintenance site National Institu

  8. Private and Efficient Query Processing on Outsourced Genomic Databases.

    Science.gov (United States)

    Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-09-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations, and thus, not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20 000 records takes around 100 and 150 s, respectively.

  9. Database Description - DGBY | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available base Description General information of database Database name DGBY Alternative name Database...EL: +81-29-838-8066 E-mail: Database classification Microarray Data and other Gene Expression Databases Orga...nism Taxonomy Name: Saccharomyces cerevisiae Taxonomy ID: 4932 Database descripti...-called phenomics). We uploaded these data on this website which is designated DGBY(Database for Gene expres...ma J, Ando A, Takagi H. Journal: Yeast. 2008 Mar;25(3):179-90. External Links: Original website information Database

  10. Transition physics and scaling overview

    International Nuclear Information System (INIS)

    Carlstrom, T.N.

    1996-01-01

    This paper presents an overview of recent experimental progress towards understanding H-mode transition physics and scaling. Terminology and techniques for studying H-mode are reviewed and discussed. The model of shear E × B flow stabilization of edge fluctuations at the L-H transition is gaining wide acceptance and is further supported by observations of edge rotation on a number of new devices. Observations of poloidal asymmetries of edge fluctuations and dephasing of density and potential fluctuations after the transition pose interesting challenges for understanding H-mode physics. Dedicated scans to determine the scaling of the power threshold have now been performed on many machines. A clear B_t dependence is universally observed but dependence on the line averaged density is complicated. Other dependencies are also reported. The effects of neutrals and error fields on the power threshold are under investigation. The ITER threshold database has matured and offers guidance to the power threshold scaling issues relevant to next-step devices. (author)

  11. Transition physics and scaling overview

    International Nuclear Information System (INIS)

    Carlstrom, T.N.

    1995-12-01

    This paper presents an overview of recent experimental progress towards understanding H-mode transition physics and scaling. Terminology and techniques for studying H-mode are reviewed and discussed. The model of shear E × B flow stabilization of edge fluctuations at the L-H transition is gaining wide acceptance and is further supported by observations of edge rotation on a number of new devices. Observations of poloidal asymmetries of edge fluctuations and dephasing of density and potential fluctuations after the transition pose interesting challenges for understanding H-mode physics. Dedicated scans to determine the scaling of the power threshold have now been performed on many machines. A clear B_t dependence is universally observed but dependence on the line averaged density is complicated. Other dependencies are also reported. The effects of neutrals and error fields on the power threshold are under investigation. The ITER threshold database has matured and offers guidance to the power threshold scaling issues relevant to next-step devices.

  12. Big Data Analytics An Overview

    Directory of Open Access Journals (Sweden)

    Jayshree Dwivedi

    2015-08-01

    Full Text Available Data that is beyond the storage capacity and processing power of conventional systems is called big data. The term is used for data sets so large or complex that traditional data-processing approaches cannot cope with them. Big data size is a constantly moving target, ranging year by year from a few dozen terabytes to many petabytes; on social networking sites, for example, the amount of data produced by people is growing rapidly every year. Big data is not only data; it has become a complete subject that includes various tools, techniques and frameworks. It covers the rapid spread and evolution of data, both structured and unstructured. Big data is a set of techniques and technologies that require new forms of integration to uncover large hidden values from large datasets that are diverse, complex and of a massive scale. It is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead massively parallel software running on tens, hundreds or even thousands of servers. A big data environment is used to capture, organize and analyze the various types of data. In this paper we describe applications, problems and tools of big data and give an overview of big data.

  13. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences

    Directory of Open Access Journals (Sweden)

    Meinicke Peter

    2009-09-01

    Full Text Available Abstract Background Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Description Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. Conclusion For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  14. The SWISS-PROT protein sequence data bank: current status.

    OpenAIRE

    Bairoch, A; Boeckmann, B

    1994-01-01

    SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively, since 1988, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library. The SWISS-PROT protein sequence data bank consists of sequence entries. Sequence entries are composed of different line types, each with its own format. For standardization purposes the format of SWISS-PROT follows as closely as possible that of the EMBL Nucleotide Sequence Databa...

  15. Human Performance Event Database

    International Nuclear Information System (INIS)

    Trager, E. A.

    1998-01-01

    The purpose of this paper is to describe several aspects of a Human Performance Event Database (HPED) that is being developed by the Nuclear Regulatory Commission. These include the background, the database structure and basis for the structure, the process for coding and entering event records, the results of preliminary analyses of information in the database, and plans for the future. In 1992, the Office for Analysis and Evaluation of Operational Data (AEOD) within the NRC decided to develop a database for information on human performance during operating events. The database was needed to help classify and categorize the information and to help feed operating experience information back to licensees and others. An NRC interoffice working group prepared a list of human performance information that should be reported for events; the list was based on the Human Performance Investigation Process (HPIP) that had been developed by the NRC as an aid in investigating events. The structure of the HPED was based on that list. The HPED currently includes data on events described in augmented inspection team (AIT) and incident investigation team (IIT) reports from 1990 through 1996, AEOD human performance studies from 1990 through 1993, recent NRR special team inspections, and licensee event reports (LERs) that were prepared for the events. (author)

  16. The CUTLASS database facilities

    International Nuclear Information System (INIS)

    Jervis, P.; Rutter, P.

    1988-09-01

    The enhancement of the CUTLASS database management system to provide improved facilities for data handling is seen as a prerequisite to its effective use for future power station data processing and control applications. This particularly applies to the larger projects such as AGR data processing system refurbishments, and the data processing systems required for the new Coal Fired Reference Design stations. In anticipation of the need for improved data handling facilities in CUTLASS, the CEGB established a User Sub-Group in the early 1980's to define the database facilities required by users. Following the endorsement of the resulting specification and a detailed design study, the database facilities have been implemented as an integral part of the CUTLASS system. This paper provides an introduction to the range of CUTLASS Database facilities, and emphasises the role of Database as the central facility around which future Kit 1 and (particularly) Kit 6 CUTLASS based data processing and control systems will be designed and implemented. (author)

  17. ADANS database specification

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-01-16

    The purpose of the Air Mobility Command (AMC) Deployment Analysis System (ADANS) Database Specification (DS) is to describe the database organization and storage allocation and to provide the detailed data model of the physical design and information necessary for the construction of the parts of the database (e.g., tables, indexes, rules, defaults). The DS includes entity relationship diagrams, table and field definitions, reports on other database objects, and a description of the ADANS data dictionary. ADANS is the automated system used by Headquarters AMC and the Tanker Airlift Control Center (TACC) for airlift planning and scheduling of peacetime and contingency operations as well as for deliberate planning. ADANS also supports planning and scheduling of Air Refueling Events by the TACC and the unit-level tanker schedulers. ADANS receives input in the form of movement requirements and air refueling requests. It provides a suite of tools for planners to manipulate these requirements/requests against mobility assets and to develop, analyze, and distribute schedules. Analysis tools are provided for assessing the products of the scheduling subsystems, and editing capabilities support the refinement of schedules. A reporting capability provides formatted screen, print, and/or file outputs of various standard reports. An interface subsystem handles message traffic to and from external systems. The database is an integral part of the functionality summarized above.

  18. Database Description - Trypanosomes Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us Trypanosomes Database Database Description General information of database Database name Trypanosomes Database...stitute of Genetics Research Organization of Information and Systems Yata 1111, Mishima, Shizuoka 411-8540, JAPAN E mail: Database...y Name: Trypanosoma Taxonomy ID: 5690 Taxonomy Name: Homo sapiens Taxonomy ID: 9606 Database description The... Article title: Author name(s): Journal: External Links: Original website information Database maintenance s...DB (Protein Data Bank) KEGG PATHWAY Database DrugPort Entry list Available Query search Available Web servic

  19. VKCDB: Voltage-gated potassium channel database

    Directory of Open Access Journals (Sweden)

    Gallin Warren J

    2004-01-01

    Full Text Available Abstract Background The family of voltage-gated potassium channels comprises a functionally diverse group of membrane proteins. They help maintain and regulate the potassium ion-based component of the membrane potential and are thus central to many critical physiological processes. VKCDB (Voltage-gated potassium [K] Channel DataBase) is a database of structural and functional data on these channels. It is designed as a resource for research on the molecular basis of voltage-gated potassium channel function. Description Voltage-gated potassium channel sequences were identified by using BLASTP to search GENBANK and SWISSPROT. Annotations for all voltage-gated potassium channels were selectively parsed and integrated into VKCDB. Electrophysiological and pharmacological data for the channels were collected from published journal articles. Transmembrane domain predictions by TMHMM and PHD are included for each VKCDB entry. Multiple sequence alignments of conserved domains of channels of the four Kv families and the KCNQ family are also included. Currently VKCDB contains 346 channel entries. It can be browsed and searched using a set of functionally relevant categories. Protein sequences can also be searched using a local BLAST engine. Conclusions VKCDB is a resource for comparative studies of voltage-gated potassium channels. The methods used to construct VKCDB are general; they can be used to create specialized databases for other protein families. VKCDB is accessible at http://vkcdb.biology.ualberta.ca.

  20. Database Description - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Frontier Sciences, The University of Tokyo(when creating) Creator Name: Takashi ...ating) Contact address GSFS - CB-06, Transdisciplinary Sciences, The University of ...Ito* Creator Affiliation: Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo(when cre

  1. Conserved Domain Database (CDD)

    Data.gov (United States)

    U.S. Department of Health & Human Services — CDD is a protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins.

  2. The LHCb configuration database

    CERN Document Server

    Abadie, Lana; Gaspar, Clara; Jacobsson, Richard; Jost, Beat; Neufeld, Niko

    2005-01-01

    The Experiment Control System (ECS) will handle the monitoring, configuration and operation of all the LHCb experimental equipment. All parameters required to configure electronics equipment under the control of the ECS will reside in a configuration database. The database will contain two kinds of information:
    1. Configuration properties about devices, such as hardware addresses, geographical location, and operational parameters associated with particular running modes (dynamic properties).
    2. Connectivity between devices: this consists of describing the output and input connections of a device (static properties).
    The representation of these data using tables must be complete so that it can provide all the required information to the ECS and must cater for all the subsystems. The design should also guarantee a fast response time, even if a query results in a large volume of data being loaded from the database into the ECS. To fulfil these constraints, we apply the following methodology: Determine from the d...
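
    The two kinds of information described in this abstract (per-device configuration properties and device-to-device connectivity) can be sketched as relational tables. This is only an illustration under stated assumptions: the table names, column names, and sample rows are hypothetical and not taken from the LHCb design, and SQLite stands in for whatever database system the experiment actually uses.

```python
import sqlite3

# Hypothetical schema: none of these names come from the LHCb paper.
SCHEMA = """
CREATE TABLE device (
    device_id  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    hw_address TEXT NOT NULL,          -- hardware address
    location   TEXT NOT NULL           -- geographical location
);
CREATE TABLE setting (                 -- parameters per running mode
    device_id    INTEGER NOT NULL REFERENCES device(device_id),
    running_mode TEXT NOT NULL,
    parameter    TEXT NOT NULL,
    value        TEXT NOT NULL,
    PRIMARY KEY (device_id, running_mode, parameter)
);
CREATE TABLE link (                    -- output port -> input port
    src_device INTEGER NOT NULL REFERENCES device(device_id),
    src_port   INTEGER NOT NULL,
    dst_device INTEGER NOT NULL REFERENCES device(device_id),
    dst_port   INTEGER NOT NULL,
    PRIMARY KEY (src_device, src_port)
);
"""


def build_example_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO device VALUES (?, ?, ?, ?)",
        [(1, "board_a", "0x10", "rack 1"), (2, "switch_b", "0x20", "rack 2")],
    )
    conn.executemany(
        "INSERT INTO setting VALUES (?, ?, ?, ?)",
        [(1, "physics", "threshold", "12"), (1, "calibration", "threshold", "3")],
    )
    conn.execute("INSERT INTO link VALUES (1, 0, 2, 4)")
    conn.commit()
    return conn


def settings_for_mode(conn: sqlite3.Connection, mode: str) -> list[tuple]:
    """All (device, parameter, value) rows needed for one running mode."""
    rows = conn.execute(
        """SELECT d.name, s.parameter, s.value
           FROM setting AS s JOIN device AS d ON d.device_id = s.device_id
           WHERE s.running_mode = ?
           ORDER BY d.name, s.parameter""",
        (mode,),
    )
    return rows.fetchall()


def downstream_of(conn: sqlite3.Connection, device_name: str) -> list[str]:
    """Names of devices fed directly by the outputs of the given device."""
    rows = conn.execute(
        """SELECT dst.name
           FROM link AS l
           JOIN device AS src ON src.device_id = l.src_device
           JOIN device AS dst ON dst.device_id = l.dst_device
           WHERE src.name = ?
           ORDER BY dst.name""",
        (device_name,),
    )
    return [row[0] for row in rows]
```

    Keeping the running mode in the key of the settings table means that loading the configuration for one mode is a single query, which is one plausible way to approach the fast-response requirement the abstract mentions.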

  3. Database Application Schema Forensics

    Directory of Open Access Journals (Sweden)

    Hector Quintus Beyers

    2014-12-01

    Full Text Available The application schema layer of a Database Management System (DBMS) can be modified to deliver results that may warrant a forensic investigation. Table structures can be corrupted by changing the metadata of a database, or operators of the database can be altered to deliver incorrect results when used in queries. This paper discusses categories of possibilities that exist to alter the application schema, with some practical examples. Two forensic environments are introduced in which a forensic investigation can take place. Arguments are provided for why these environments are important. Methods are presented for how these environments can be achieved for the application schema layer of a DBMS. A process is proposed for how forensic evidence should be extracted from the application schema layer of a DBMS. The application schema forensic evidence identification process can be applied to a wide range of forensic settings.
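
    One simple way to notice that schema metadata has been altered, as this abstract describes, is to compare a fingerprint of the current schema against a trusted baseline. The sketch below is not the process proposed in the paper; it is an assumption-laden illustration that uses SQLite's `sqlite_master` catalog and a hypothetical `schema_fingerprint` helper.

```python
import hashlib
import sqlite3


def schema_fingerprint(conn: sqlite3.Connection) -> str:
    """SHA-256 digest over the schema catalog (tables, indexes, views, triggers)."""
    rows = conn.execute(
        "SELECT type, name, tbl_name, sql FROM sqlite_master ORDER BY type, name"
    ).fetchall()
    digest = hashlib.sha256()
    for row in rows:
        # repr() keeps NULL definitions (auto-created indexes) distinguishable.
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def schema_changed(conn: sqlite3.Connection, baseline: str) -> bool:
    """True if the current schema no longer matches the recorded baseline."""
    return schema_fingerprint(conn) != baseline


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
    baseline = schema_fingerprint(conn)

    # A schema-level change, such as a view that misreports balances.
    conn.execute("CREATE VIEW account_report AS SELECT id, balance * 0 AS balance FROM account")
    print(schema_changed(conn, baseline))
```

    A fingerprint only signals that something changed; it does not say what or when, so in practice it would be a trigger for the kind of evidence-extraction process the paper proposes rather than a replacement for it.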

  4. Tibetan Magmatism Database

    Science.gov (United States)

    Chapman, James B.; Kapp, Paul

    2017-11-01

    A database containing previously published geochronologic, geochemical, and isotopic data on Mesozoic to Quaternary igneous rocks in the Himalayan-Tibetan orogenic system is presented. The database is intended to serve as a repository for new and existing igneous rock data and is publicly accessible through a web-based platform that includes an interactive map and data table interface with search, filtering, and download options. To illustrate the utility of the database, the age, location, and εHf(t) composition of magmatism from the central Gangdese batholith in the southern Lhasa terrane are compared. The data identify three high-flux events, which peak at 93, 50, and 15 Ma. They are characterized by inboard arc migration and a temporal and spatial shift to more evolved isotopic compositions.

  5. Database Vs Data Warehouse

    Directory of Open Access Journals (Sweden)

    2007-01-01

    Full Text Available Data warehouse technology includes a set of concepts and methods that offer users useful information for decision making. The need to build a data warehouse arises from the need to improve the quality of information in the organization. Data coming from different sources, in a variety of forms, both structured and unstructured, are filtered according to business rules and integrated into a single large data collection. With the help of IT solutions, managers have come to understand that the data stored in operational systems, including databases, are an informational gold mine that must be exploited. Data warehouses have been developed to answer the increasing demand for complex analysis, which could not be properly achieved with operational databases. The present paper emphasizes some of the criteria that information application developers can use to choose between a database solution and a data warehouse solution.

  6. The Danish Sarcoma Database

    DEFF Research Database (Denmark)

    Jørgensen, Peter Holmberg; Lausten, Gunnar Schwarz; Pedersen, Alma B

    2016-01-01

    AIM: The aim of the database is to gather information about sarcomas treated in Denmark in order to continuously monitor and improve the quality of sarcoma treatment in a local, a national, and an international perspective. STUDY POPULATION: Patients in Denmark diagnosed with a sarcoma, both...... skeletal and extraskeletal, are to be registered since 2009. MAIN VARIABLES: The database contains information about appearance of symptoms; date of receiving referral to a sarcoma center; date of first visit; whether surgery has been performed elsewhere before referral, diagnosis, and treatment; tumor...... of Diseases - tenth edition codes and TNM Classification of Malignant Tumours, and date of death (after yearly coupling to the Danish Civil Registration System). Data quality and completeness are currently secured. CONCLUSION: The Danish Sarcoma Database is population based and includes sarcomas occurring...

  7. Danish Gynecological Cancer Database

    DEFF Research Database (Denmark)

    Sørensen, Sarah Mejer; Bjørn, Signe Frahm; Jochumsen, Kirsten Marie

    2016-01-01

    AIM OF DATABASE: The Danish Gynecological Cancer Database (DGCD) is a nationwide clinical cancer database and its aim is to monitor the treatment quality of Danish gynecological cancer patients, and to generate data for scientific purposes. DGCD also records detailed data on the diagnostic measures...... data forms as follows: clinical data, surgery, pathology, pre- and postoperative care, complications, follow-up visits, and final quality check. DGCD is linked with additional data from the Danish "Pathology Registry", the "National Patient Registry", and the "Cause of Death Registry" using the unique...... Danish personal identification number (CPR number). DESCRIPTIVE DATA: Data from DGCD and registers are available online in the Statistical Analysis Software portal. The DGCD forms cover almost all possible clinical variables used to describe gynecological cancer courses. The only limitation...

  8. RODOS database adapter

    International Nuclear Information System (INIS)

    Xie Gang

    1995-11-01

    Integrated data management is an essential aspect of many automated information systems such as RODOS, a real-time on-line decision support system for nuclear emergency management. In particular, the application software must provide access management to different commercial database systems. This report presents the tools necessary for adapting embedded SQL applications to both HP-ALLBASE/SQL and CA-Ingres/SQL databases. The design of the database adapter and the concept of RODOS embedded SQL syntax are discussed by considering some of the most important features of SQL functions and by identifying significant differences between SQL implementations. Finally, part of the software developed and the administrator's and installation guides are described. (orig.)

  9. The Danish Depression Database

    DEFF Research Database (Denmark)

    Videbech, Poul Bror Hemming; Deleuran, Anette

    2016-01-01

    AIM OF DATABASE: The purpose of the Danish Depression Database (DDD) is to monitor and facilitate the improvement of the quality of the treatment of depression in Denmark. Furthermore, the DDD has been designed to facilitate research. STUDY POPULATION: Inpatients as well as outpatients...... with depression, aged above 18 years, and treated in the public psychiatric hospital system were enrolled. MAIN VARIABLES: Variables include whether the patient has been thoroughly somatically examined and has been interviewed about the psychopathology by a specialist in psychiatry. The Hamilton score as well...... as an evaluation of the risk of suicide are measured before and after treatment. Whether psychiatric aftercare has been scheduled for inpatients and the rate of rehospitalization are also registered. DESCRIPTIVE DATA: The database was launched in 2011. Every year since then ~5,500 inpatients and 7,500 outpatients...

  10. 600 MW nuclear power database

    International Nuclear Information System (INIS)

    Cao Ruiding; Chen Guorong; Chen Xianfeng; Zhang Yishu

    1996-01-01

    The 600 MW nuclear power database, based on ORACLE 6.0, consists of three parts: a nuclear power plant database, a nuclear power position database, and a nuclear power equipment database. It contains a large amount of technical data and pictures on nuclear power, provided by engineering design organizations and individuals. The database can assist nuclear power designers.

  11. The Neotoma Paleoecology Database

    Science.gov (United States)

    Grimm, E. C.; Ashworth, A. C.; Barnosky, A. D.; Betancourt, J. L.; Bills, B.; Booth, R.; Blois, J.; Charles, D. F.; Graham, R. W.; Goring, S. J.; Hausmann, S.; Smith, A. J.; Williams, J. W.; Buckland, P.

    2015-12-01

    The Neotoma Paleoecology Database (www.neotomadb.org) is a multiproxy, open-access, relational database that includes fossil data for the past 5 million years (the late Neogene and Quaternary Periods). Modern distributional data for various organisms are also being made available for calibration and paleoecological analyses. The project is a collaborative effort among individuals from more than 20 institutions worldwide, including domain scientists representing a spectrum of Pliocene-Quaternary fossil data types, as well as experts in information technology. Working groups are active for diatoms, insects, ostracodes, pollen and plant macroscopic remains, testate amoebae, rodent middens, vertebrates, age models, geochemistry and taphonomy. Groups are also active in developing online tools for data analyses and for developing modules for teaching at different levels. A key design concept of NeotomaDB is that stewards for various data types are able to remotely upload and manage data. Cooperatives for different kinds of paleo data, or from different regions, can appoint their own stewards. Over the past year, much progress has been made on development of the steward software-interface that will enable this capability. The steward interface uses web services that provide access to the database. More generally, these web services enable remote programmatic access to the database, which both desktop and web applications can use and which provide real-time access to the most current data. Use of these services can alleviate the need to download the entire database, which can be out-of-date as soon as new data are entered. In general, the Neotoma web services deliver data either from an entire table or from the results of a view. Upon request, new web services can be quickly generated. Future developments will likely expand the spatial and temporal dimensions of the database. NeotomaDB is open to receiving new datasets and stewards from the global Quaternary community

  12. The Danish Sarcoma Database

    Directory of Open Access Journals (Sweden)

    Jorgensen PH

    2016-10-01

    Full Text Available Peter Holmberg Jørgensen,1 Gunnar Schwarz Lausten,2 Alma B Pedersen3 1Tumor Section, Department of Orthopedic Surgery, Aarhus University Hospital, Aarhus, 2Tumor Section, Department of Orthopedic Surgery, Rigshospitalet, Copenhagen, 3Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark Aim: The aim of the database is to gather information about sarcomas treated in Denmark in order to continuously monitor and improve the quality of sarcoma treatment in a local, a national, and an international perspective. Study population: Patients in Denmark diagnosed with a sarcoma, both skeletal and extraskeletal, are to be registered since 2009. Main variables: The database contains information about appearance of symptoms; date of receiving referral to a sarcoma center; date of first visit; whether surgery has been performed elsewhere before referral, diagnosis, and treatment; tumor characteristics such as location, size, malignancy grade, and growth pattern; details on treatment (kind of surgery, amount of radiation therapy, type and duration of chemotherapy); complications of treatment; local recurrence and metastases; and comorbidity. In addition, several quality indicators are registered in order to measure the quality of care provided by the hospitals and make comparisons between hospitals and with international standards. Descriptive data: Demographic patient-specific data such as age, sex, region of living, comorbidity, World Health Organization's International Classification of Diseases – tenth edition codes and TNM Classification of Malignant Tumours, and date of death (after yearly coupling to the Danish Civil Registration System). Data quality and completeness are currently secured. Conclusion: The Danish Sarcoma Database is population based and includes sarcomas occurring in Denmark since 2009. It is a valuable tool for monitoring sarcoma incidence and quality of treatment and its improvement, postoperative

  13. C# Database Basics

    CERN Document Server

    Schmalz, Michael

    2012-01-01

    Working with data and databases in C# certainly can be daunting if you're coming from VB6, VBA, or Access. With this hands-on guide, you'll shorten the learning curve considerably as you master accessing, adding, updating, and deleting data with C#, the basic skills you need if you intend to program with this language. No previous knowledge of C# is necessary. By following the examples in this book, you'll learn how to tackle several database tasks in C#, such as working with SQL Server, building data entry forms, and using data in a web service. The book's code samples will help you get started.

  14. Danish Palliative Care Database

    DEFF Research Database (Denmark)

    Grønvold, Mogens; Adsersen, Mathilde; Hansen, Maiken Bang

    2016-01-01

    Aims: The aim of the Danish Palliative Care Database (DPD) is to monitor, evaluate, and improve the clinical quality of specialized palliative care (SPC) (ie, the activity of hospital-based palliative care teams/departments and hospices) in Denmark. Study population: The study population is all...... patients were registered in DPD during the 5 years 2010–2014. Of those registered, 96% had cancer. Conclusion: DPD is a national clinical quality database for SPC having clinically relevant variables and high data and patient completeness....

  15. The CATH database

    Directory of Open Access Journals (Sweden)

    Knudsen Michael

    2010-02-01

    Full Text Available The CATH database provides hierarchical classification of protein domains based on their folding patterns. Domains are obtained from protein structures deposited in the Protein Data Bank and both domain identification and subsequent classification use manual as well as automated procedures. The accompanying website http://www.cathdb.info provides an easy-to-use entry to the classification, allowing for both browsing and downloading of data. Here, we give a brief review of the database, its corresponding website and some related tools.

  16. The Danish Anaesthesia Database

    DEFF Research Database (Denmark)

    Antonsen, Kristian; Rosenstock, Charlotte Vallentin; Lundstrøm, Lars Hyldborg

    2016-01-01

    AIM OF DATABASE: The aim of the Danish Anaesthesia Database (DAD) is the nationwide collection of data on all patients undergoing anesthesia. Collected data are used for quality assurance, quality development, and serve as a basis for research projects. STUDY POPULATION: The DAD was founded in 2004....... In addition, an annual DAD report is a benchmark for departments nationwide. CONCLUSION: The DAD is covering the anesthetic process for the majority of patients undergoing anesthesia in Denmark. Data in the DAD are increasingly used for both quality and research projects....

  17. MARKS ON ART database

    DEFF Research Database (Denmark)

    van Vlierden, Marieke; Wadum, Jørgen; Wolters, Margreet

    2016-01-01

    Master's marks, monograms, and quality marks are often found embossed or stamped on works of art from 1300-1700. An illustrated database of these types of marks is being established at the Netherlands Institute for Art History (RKD) in The Hague.

  18. The magnet database system

    International Nuclear Information System (INIS)

    Baggett, P.; Delagi, N.; Leedy, R.; Marshall, W.; Robinson, S.L.; Tompkins, J.C.

    1991-01-01

    This paper describes the current status of MagCom, a central database of SSC magnet information that is available to all magnet scientists via network connections. The database has been designed to contain the specifications and measured values of important properties for major materials, plus configuration information (specifying which individual items were used in each cable, coil, and magnet) and the test results on completed magnets. These data will help magnet scientists to track and control the production process and to correlate the performance of magnets with the properties of their constituents
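
    The configuration information described above (which items were used in each cable, coil, and magnet) can be pictured as a small relational model. The sketch below is only an illustration and does not reflect the actual MagCom design: the SQLite backend, the table names component, used_in, and measurement, and all sample identifiers and values are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE component (
        id   TEXT PRIMARY KEY,
        kind TEXT NOT NULL            -- 'cable', 'coil', or 'magnet'
    );
    CREATE TABLE used_in (            -- configuration: child was built into parent
        child  TEXT NOT NULL REFERENCES component(id),
        parent TEXT NOT NULL REFERENCES component(id)
    );
    CREATE TABLE measurement (        -- measured property values per component
        component_id TEXT NOT NULL REFERENCES component(id),
        property     TEXT NOT NULL,
        value        REAL NOT NULL
    );
    """
)

# Made-up sample data: one cable wound into one coil, installed in one magnet
conn.executemany(
    "INSERT INTO component VALUES (?, ?)",
    [("cable-1", "cable"), ("coil-1", "coil"), ("magnet-1", "magnet")],
)
conn.executemany(
    "INSERT INTO used_in VALUES (?, ?)",
    [("cable-1", "coil-1"), ("coil-1", "magnet-1")],
)
conn.execute(
    "INSERT INTO measurement VALUES ('cable-1', 'critical_current_A', 9500.0)"
)


def constituents(conn: sqlite3.Connection, item_id: str) -> list[str]:
    """Return every component built, directly or indirectly, into item_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE parts(id) AS (
            SELECT child FROM used_in WHERE parent = ?
            UNION
            SELECT u.child FROM used_in AS u JOIN parts AS p ON u.parent = p.id
        )
        SELECT id FROM parts ORDER BY id
        """,
        (item_id,),
    ).fetchall()
    return [row[0] for row in rows]


parts_of_magnet = constituents(conn, "magnet-1")
```

    Joining the result of constituents with the measurement table is one way test results on a finished magnet could be related to the measured properties of its constituents.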

  19. Yucca Mountain digital database

    International Nuclear Information System (INIS)

    Daudt, C.R.; Hinze, W.J.

    1992-01-01

    This paper discusses the Yucca Mountain Digital Database (DDB) which is a digital, PC-based geographical database of geoscience-related characteristics of the proposed high-level waste (HLW) repository site of Yucca Mountain, Nevada. It was created to provide the US Nuclear Regulatory Commission's (NRC) Advisory Committee on Nuclear Waste (ACNW) and its staff with a visual perspective of geological, geophysical, and hydrological features at the Yucca Mountain site as discussed in the Department of Energy's (DOE) pre-licensing reports

  20. ARTI refrigerant database

    Energy Technology Data Exchange (ETDEWEB)

    Calm, J.M.

    1998-03-15

    The Refrigerant Database is an information system on alternative refrigerants, associated lubricants, and their use in air conditioning and refrigeration. It consolidates and facilitates access to thermophysical properties, compatibility, environmental, safety, application and other information. It provides corresponding information on older refrigerants, to assist manufacturers and those using alternative refrigerants, to make comparisons and determine differences. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern. The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air conditioning and refrigeration equipment. It also references documents addressing compatibility of refrigerants and lubricants with other materials.