WorldWideScience

Sample records for annotation databases improves

  1. Improving microbial genome annotations in an integrated database context.

    Directory of Open Access Journals (Sweden)

    I-Min A Chen

    Full Text Available Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency concerns the biological coherence of annotations, while completeness concerns the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule-based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred from the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems is available at http://img.jgi.doe.gov/.

  2. SURFACE: a database of protein surface regions for functional annotation

    OpenAIRE

    Ferrè, Fabrizio; Ausiello, Gabriele; Zanzoni, Andreas; Helmer-Citterich, Manuela

    2004-01-01

    The SURFACE (SUrface Residues and Functions Annotated, Compared and Evaluated, URL http://cbm.bio.uniroma2.it/surface/) database is a repository of annotated and compared protein surface regions. SURFACE contains the results of a large-scale protein annotation and local structural comparison project. A non-redundant set of protein chains is used to build a database of protein surface patches, defined as putative surface functional sites. Each patch is annotated with sequence and structure-der...

  3. How well are protein structures annotated in secondary databases?

    Science.gov (United States)

    Rother, Kristian; Michalsky, Elke; Leser, Ulf

    2005-09-01

    We investigated to what extent Protein Data Bank (PDB) entries are annotated with second-party information based on existing cross-references between PDB and 15 other databases. We report 2 interesting findings. First, there is a clear "annotation gap" for structures less than 7 years old for secondary databases that are manually curated. Second, the examined databases overlap with each other quite well, dividing the PDB into 2 well-annotated thirds and one poorly annotated third. Both observations should be taken into account in any study depending on the selection of protein structures by their annotation.

  4. Product annotations - KOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available [ Credits ] BLAST Search Image Search Home About Archive Update History Contact us ...ile name: kome_product_annotation.zip File URL: ftp://ftp.biosciencedbc.jp/archiv...ate History of This Database Site Policy | Contact Us Product annotations - KOME | LSDB Archive ...

  5. Believe It or Not: Adding Belief Annotations to Databases

    CERN Document Server

    Gatterbauer, Wolfgang; Khoussainova, Nodira; Suciu, Dan

    2009-01-01

    We propose a database model that allows users to annotate data with belief statements. Our motivation comes from scientific database applications where a community of users is working together to assemble, revise, and curate a shared data repository. As the community accumulates knowledge and the database content evolves over time, it may contain conflicting information and members can disagree on the information it should store. For example, Alice may believe that a tuple should be in the database, whereas Bob disagrees. He may also insert the reason why he thinks Alice believes the tuple should be in the database, and explain what he thinks the correct tuple should be instead. We propose a formal model for Belief Databases that interprets users' annotations as belief statements. These annotations can refer both to the base data and to other annotations. We give a formal semantics based on a fragment of multi-agent epistemic logic and define a query language over belief databases. We then prove a key technic...

  6. AUTOMATIC ANNOTATION OF QUERY RESULTS FROM DEEP WEB DATABASE

    OpenAIRE

    Chaitanya Bhosale

    2015-01-01

    In recent years, web database extraction and annotation has received increasing attention from the database community. When a search query is submitted to the interface, a search result page is generated. Search Result Records (SRRs) are the records obtained from a web database (WDB), and these SRRs are used to display the result for each query. Each SRR contains multiple data units, each corresponding to one semantic unit. These search results can be used in many web applic...

  7. Genome annotations - KOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available [ Credits ] BLAST Search Image Search Home About Archive Update History Contact us ....zip File URL: ftp://ftp.biosciencedbc.jp/archive/kome/LATEST/kome_genome_annotat...e Update History of This Database Site Policy | Contact Us Genome annotations - KOME | LSDB Archive ...

  8. Large-scale annotation of small-molecule libraries using public databases.

    Science.gov (United States)

    Zhou, Yingyao; Zhou, Bin; Chen, Kaisheng; Yan, S Frank; King, Frederick J; Jiang, Shumei; Winzeler, Elizabeth A

    2007-01-01

    While many large publicly accessible databases provide excellent annotation for biological macromolecules, the same is not true for small chemical compounds. Commercial data sources also fail to provide an annotation interface for large numbers of compounds and tend to be cost-prohibitive for wide availability to biomedical researchers. Therefore, using annotation information for the selection of lead compounds from a modern-day high-throughput screening (HTS) campaign presently occurs only on a very limited scale. The recent rapid expansion of the NIH PubChem database provides an opportunity to link existing biological databases with compound catalogs and provides relevant information that could potentially improve the information garnered from large-scale screening efforts. Using the 2.5 million compound collection at the Genomics Institute of the Novartis Research Foundation (GNF) as a model, we determined that approximately 4% of the library contained compounds with potential annotation in such databases as PubChem and the World Drug Index (WDI) as well as related databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and ChemIDplus. Furthermore, exact structure match analysis showed that 32% of GNF compounds can be linked to third-party databases via PubChem. We also showed that annotations such as MeSH (medical subject headings) terms can be applied to in-house HTS databases to identify signature biological inhibition profiles of interest as well as to expedite the assay validation process. The automated annotation of thousands of screening hits in batch is becoming feasible and has the potential to play an essential role in the hit-to-lead decision-making process.
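The exact-structure linking described above can be sketched as a join on a canonical structure key. In practice a cheminformatics toolkit would compute such a key (e.g., an InChIKey); the "KEY-..." strings, GNF-style compound identifiers, and annotation records below are all hypothetical placeholders:

```python
# Toy mapping from a canonical structure key to external annotation records.
pubchem_annotations = {
    "KEY-ASPIRIN": {"cid": 2244, "mesh": ["Anti-Inflammatory Agents"]},
    "KEY-CAFFEINE": {"cid": 2519, "mesh": ["Central Nervous System Stimulants"]},
}
# In-house library: compound ID -> canonical structure key (placeholders).
library = {
    "GNF-0001": "KEY-ASPIRIN",   # exact structure match -> gets annotated
    "GNF-0002": "KEY-UNKNOWN",   # no match in the external catalog
}

def annotate(library, annotations):
    """Join in-house compound IDs to external records on the structure key."""
    return {cid: annotations[key]
            for cid, key in library.items() if key in annotations}

hits = annotate(library, pubchem_annotations)
```

The fraction of the library that acquires annotation (here 1 of 2 compounds) corresponds to the coverage figures the abstract reports.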

  9. DIDA: A curated and annotated digenic diseases database.

    Science.gov (United States)

    Gazzo, Andrea M; Daneels, Dorien; Cilia, Elisa; Bonduelle, Maryse; Abramowicz, Marc; Van Dooren, Sonia; Smits, Guillaume; Lenaerts, Tom

    2016-01-01

    DIDA (DIgenic diseases DAtabase) is a novel database that provides for the first time detailed information on genes and associated genetic variants involved in digenic diseases, the simplest form of oligogenic inheritance. The database is accessible via http://dida.ibsquare.be and currently includes 213 digenic combinations involved in 44 different digenic diseases. These combinations are composed of 364 distinct variants, which are distributed over 136 distinct genes. The web interface provides browsing and search functionalities, as well as documentation and help pages, general database statistics and references to the original publications from which the data have been collected. The possibility to submit novel digenic data to DIDA is also provided. Creating this new repository was essential as current databases do not allow one to retrieve detailed records regarding digenic combinations. Genes, variants, diseases and digenic combinations in DIDA are annotated with manually curated information and information mined from other online resources. Next to providing a unique resource for the development of new analysis methods, DIDA gives clinical and molecular geneticists a tool to find the most comprehensive information on the digenic nature of their diseases of interest. PMID:26481352

  10. A Manual Curation Strategy to Improve Genome Annotation: Application to a Set of Haloarchaeal Genomes

    Directory of Open Access Journals (Sweden)

    Friedhelm Pfeiffer

    2015-06-01

    Full Text Available Genome annotation errors are a persistent problem that impede research in the biosciences. A manual curation effort is described that attempts to produce high-quality genome annotations for a set of haloarchaeal genomes (Halobacterium salinarum and Hbt. hubeiense, Haloferax volcanii and Hfx. mediterranei, Natronomonas pharaonis and Nmn. moolapensis, Haloquadratum walsbyi strains HBSQ001 and C23, Natrialba magadii, Haloarcula marismortui and Har. hispanica, and Halohasta litchfieldiae). Genomes are checked for missing genes, start codon misassignments, and disrupted genes. Assignments of a specific function are preferably based on experimentally characterized homologs (Gold Standard Proteins). To avoid overannotation, which is a major source of database errors, we restrict annotation to only general function assignments when support for a specific substrate assignment is insufficient. This strategy results in annotations that are resistant to the plethora of errors that compromise public databases. Annotation consistency is rigorously validated for ortholog pairs from the genomes surveyed. The annotation is regularly crosschecked against the UniProt database to further improve annotations and increase the level of standardization. Enhanced genome annotations are submitted to public databases (EMBL/GenBank, UniProt, to the benefit of the scientific community. The enhanced annotations are also publicly available via HaloLex.
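The ortholog-pair consistency validation mentioned above amounts to comparing product annotations across genomes and flagging disagreements for curation. A minimal sketch with hypothetical locus tags and product strings (the actual HaloLex procedure is more involved):

```python
def inconsistent_orthologs(annotations, ortholog_pairs):
    """Return ortholog pairs whose product annotations disagree."""
    return [(a, b) for a, b in ortholog_pairs
            if annotations.get(a) != annotations.get(b)]

# Hypothetical locus tags and product annotations.
annotations = {
    "HVO_0001": "DNA polymerase II small subunit",
    "NP_0001":  "DNA polymerase II small subunit",   # consistent pair
    "HVO_0002": "signal transducer protein",
    "NP_0002":  "conserved hypothetical protein",    # flagged for curation
}
flagged = inconsistent_orthologs(
    annotations, [("HVO_0001", "NP_0001"), ("HVO_0002", "NP_0002")])
```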

  11. MannDB: A microbial annotation database for protein characterization

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, C; Lam, M; Smith, J; Zemla, A; Dyer, M; Kuczmarski, T; Vitalis, E; Slezak, T

    2006-05-19

    MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from GenBank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the GenBank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high

  12. Web Database Query Interface Annotation Based on User Collaboration

    Institute of Scientific and Technical Information of China (English)

    LIU Wei; LIN Can; MENG Xiaofeng

    2006-01-01

    A vision-based query interface annotation method is used to relate attributes and form elements in form-based web query interfaces; this method can reach an accuracy of 82%. A user participation method is then used to tune the result: users can answer "yes" or "no" for existing annotations, or manually annotate form elements. Mass feedback is added to the annotation algorithm to produce more accurate results. By this approach, query interface annotation can reach perfect accuracy.
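The feedback mechanism described above can be sketched as vote aggregation over user answers: automatic predictions survive unless enough users reject them. This is an illustrative reconstruction, not the authors' algorithm; the threshold and data shapes are assumptions:

```python
from collections import Counter

def refine(annotations, feedback, threshold=0.8):
    """annotations: {form_element: predicted_attribute} from the automatic step.
    feedback: (form_element, 'yes'|'no') votes collected from users.
    Keep a prediction when no feedback exists yet, or when the share of
    'yes' votes reaches the (assumed) threshold."""
    votes = {}
    for element, answer in feedback:
        votes.setdefault(element, Counter())[answer] += 1
    kept = {}
    for element, attribute in annotations.items():
        c = votes.get(element)
        if c is None or c["yes"] / (c["yes"] + c["no"]) >= threshold:
            kept[element] = attribute
    return kept

annotations = {"q1": "title", "q2": "author", "q3": "year"}
feedback = [("q1", "yes")] * 4 + [("q1", "no"), ("q2", "yes"),
            ("q2", "no"), ("q2", "no")]
kept = refine(annotations, feedback)
```

Here "q1" survives (4 of 5 users confirm it), "q2" is dropped, and "q3" is kept for lack of feedback.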

  13. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome

    Directory of Open Access Journals (Sweden)

    Childs Kevin L

    2010-11-01

    Full Text Available Abstract Background A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence.

  14. ASGARD: an open-access database of annotated transcriptomes for emerging model arthropod species.

    Science.gov (United States)

    Zeng, Victor; Extavour, Cassandra G

    2012-01-01

    The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck in genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to utilize whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to provide putative orthology assignment, coding region determination, protein domain identification and Gene Ontology (GO) term annotation for all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, GO term annotation or Basic Local Alignment Search Tool. User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs and graphical representation of the location of protein domains and matches to similar sequences from the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information.
We anticipate that this database will be useful for members of multiple research communities, including developmental

  15. Core Data of Yeast Interacting Proteins Database (Annotation Updated Version) - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available [ Credits ] BLAST Search Image Search Home About Archive Update History Contact us ...Yeast Interacting Proteins Database Core Data of Yeast Interacting Proteins Database (Annotation Updated Ver...sion) Data detail Data name Core Data of Yeast Interacting Proteins Database (Annotation Updated Version) De...ne name and description) is updated by the SGD (Saccharomyces Genome Database; http://www.yeastgenome.org/ ,...text file. Data file File name: core_updated.zip File URL: ftp://ftp.biosciencedbc.jp/archive/yeast_y2h/LATEST/core_update

  16. RiceDB: A Web-Based Integrated Database for Annotating Rice Microarray

    Institute of Scientific and Technical Information of China (English)

    HE Fei; SHI Qing-yun; CHEN Ming; WU Ping

    2007-01-01

    RiceDB, a web-based integrated database to annotate rice microarrays in various biological contexts, was developed. It is composed of eight modules. The RiceMap module archives the process of mapping Affymetrix probe sets to different databases about rice, and annotates the genes represented by a microarray set by retrieving annotation information via the identifier or accession number of each database; the RiceGO module indicates the association between a microarray set and gene ontology (GO) categories; the RiceKO module is used to annotate a microarray set based on the KEGG biochemical pathways; the RiceDO module indicates the information of domains associated with a microarray set; the RiceUP module is used to obtain promoter sequences for all genes represented by a microarray set; the RiceMR module lists potential microRNAs which regulate the genes represented by a microarray set; RiceCD and RiceGF are used to annotate the genes represented by a microarray set in the context of chromosome distribution and rice paralogous family distribution. The results of automatic annotation are mostly consistent with manual annotation. Biological interpretation of the microarray data is quickened with the help of RiceDB.

  17. Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation.

    Science.gov (United States)

    Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

    2014-01-01

    Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quick and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manual reviewed and annotated up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific

  18. DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis

    Directory of Open Access Journals (Sweden)

    Baseler Michael W

    2007-11-01

    Full Text Available Abstract Background Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases in order to integrate comprehensive annotation information for their genes, which becomes one of the bottlenecks, particularly for the analytic task associated with a large gene list. Thus, a highly centralized and ready-to-use gene-annotation knowledgebase is in demand for high throughput gene functional analysis. Description The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. The grouping of such identifiers improves the cross-reference capability, particularly across NCBI and UniProt systems, enabling more than 40 publicly available functional annotation sources to be comprehensively integrated and centralized by the DAVID gene clusters. The simple, pair-wise, text format files which make up the DAVID Knowledgebase are freely downloadable for various data analysis uses. In addition, a well organized web interface allows users to query different types of heterogeneous annotations in a high-throughput manner. Conclusion The DAVID Knowledgebase is designed to facilitate high throughput gene functional analysis. For a given gene list, it not only provides the quick accessibility to a wide range of heterogeneous annotation data in a centralized location, but also enriches the level of biological information for an individual gene. Moreover, the entire DAVID Knowledgebase is freely downloadable or searchable at http://david.abcc.ncifcrf.gov/knowledgebase/.
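The single-linkage agglomeration behind the DAVID Gene Concept can be sketched with a union-find structure: any two identifiers connected by a cross-reference, directly or transitively, end up in the same gene cluster. The identifier strings below are illustrative and this is not the production DAVID code:

```python
from collections import defaultdict

class DisjointSet:
    """Union-find; linked identifiers share a common root."""
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Illustrative cross-reference pairs, as found in ID-mapping tables.
pairs = [
    ("ENTREZ:7157", "UNIPROT:P04637"),
    ("UNIPROT:P04637", "ENSEMBL:ENSG00000141510"),  # chains to the first pair
    ("ENTREZ:1017", "UNIPROT:P24941"),              # a separate cluster
]
ds = DisjointSet()
for a, b in pairs:
    ds.union(a, b)

# Collect the final clusters: root identifier -> member identifiers.
clusters = defaultdict(set)
for ident in ds.parent:
    clusters[ds.find(ident)].add(ident)
```

The three chained identifiers collapse into one cluster of size 3, with the remaining pair forming a second cluster, mirroring how DAVID agglomerates identifiers across NCBI and UniProt.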

  19. The Saccharomyces Genome Database: Exploring Genome Features and Their Annotations.

    Science.gov (United States)

    Cherry, J Michael

    2015-12-01

    Genomic-scale assays result in data that provide information over the entire genome. Such base pair resolution data cannot be summarized easily except via a graphical viewer. A genome browser is a tool that displays genomic data and experimental results as horizontal tracks. Genome browsers allow searches for a chromosomal coordinate or a feature, such as a gene name, but they do not allow searches by function or upstream binding site. Entry into a genome browser requires that you identify the gene name or chromosomal coordinates for a region of interest. A track provides a representation for genomic results and is displayed as a row of data shown as line segments to indicate regions of the chromosome with a feature. Another type of track presents a graph or wiggle plot that indicates the processed signal intensity computed for a particular experiment or set of experiments. Wiggle plots are typical for genomic assays such as the various next-generation sequencing methods (e.g., chromatin immunoprecipitation [ChIP]-seq or RNA-seq), where the signal represents a peak of DNA binding, histone modification, or the mapping of an RNA sequence. Here we explore the browser that has been built into the Saccharomyces Genome Database (SGD). PMID:26631126

  20. Non-redundant patent sequence databases with value-added annotations at two levels.

    Science.gov (United States)

    Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo

    2010-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/.
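The two-level clustering described above can be sketched as follows: a level-1 key is the MD5 checksum of the full-length sequence, so identical sequences collapse into one cluster, and level 2 sub-groups each checksum cluster by patent family. All sequence identifiers, sequences, and family labels below are hypothetical:

```python
import hashlib
from collections import defaultdict

def level1_key(seq):
    """MD5 checksum of the upper-cased sequence string: identical
    full-length sequences collapse into one level-1 cluster."""
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()

def cluster(records):
    """records: (sequence_id, sequence, patent_family) triples."""
    level1 = defaultdict(list)
    for seq_id, seq, family in records:
        level1[level1_key(seq)].append((seq_id, family))
    # Level 2: sub-group each level-1 cluster by patent family.
    level2 = {}
    for checksum, members in level1.items():
        by_family = defaultdict(list)
        for seq_id, family in members:
            by_family[family].append(seq_id)
        level2[checksum] = dict(by_family)
    return level1, level2

records = [
    ("P1", "MKTAYIAK", "FAM-A"),
    ("P2", "mktayiak", "FAM-A"),  # same sequence, same family
    ("P3", "MKTAYIAK", "FAM-B"),  # same sequence, different family
    ("P4", "MAVQLFRK", "FAM-A"),
]
l1, l2 = cluster(records)
```

P1–P3 share one level-1 cluster, which level 2 splits into a FAM-A group (P1, P2) and a FAM-B group (P3).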

  1. Annotated checklist and database for vascular plants of the Jemez Mountains

    Energy Technology Data Exchange (ETDEWEB)

    Foxx, T. S.; Pierce, L.; Tierney, G. D.; Hansen, L. A.

    1998-03-01

    Studies done in the last 40 years have provided information to construct a checklist of the Jemez Mountains. The present database and checklist build on the basic list compiled by Teralene Foxx and Gail Tierney in the early 1980s. The checklist is annotated with taxonomic information, geographic and biological information, economic uses, wildlife cover, revegetation potential, and ethnographic uses. There are nearly 1000 species that have been noted for the Jemez Mountains. This list is cross-referenced with the US Department of Agriculture Natural Resources Conservation Service PLANTS database species names and acronyms. All information will soon be available on a Web Page.

  2. High accuracy mass spectrometry analysis as a tool to verify and improve gene annotation using Mycobacterium tuberculosis as an example

    Directory of Open Access Journals (Sweden)

    Prasad Swati

    2008-07-01

    Full Text Available Abstract Background While the genomic annotations of diverse lineages of the Mycobacterium tuberculosis complex are available, divergences between gene prediction methods are still a challenge for unbiased protein dataset generation. M. tuberculosis gene annotation is an example, where the most used datasets from two independent institutions (Sanger Institute and The Institute for Genomic Research, TIGR) differ by up to 12% in the number of annotated open reading frames, and 46% of the genes contained in both annotations have different start codons. Such differences emphasize the importance of identifying the sequences of protein products to validate each gene annotation, including its sequence coding area. Results With this objective, we submitted a culture filtrate sample from M. tuberculosis to a high-accuracy LTQ-Orbitrap mass spectrometer analysis and applied refined N-terminal prediction to perform comparison of the two gene annotations. From a total of 449 proteins identified from the MS data, we validated 35 tryptic peptides that were specific to one of the two datasets, representing 24 different proteins. Of those, 5 proteins were only annotated in the Sanger database. In the remaining proteins, the observed differences were due to differences in annotation of transcriptional start sites. Conclusion Our results indicate that, even in a less complex sample likely to represent only 10% of the bacterial proteome, we were still able to detect major differences between different gene annotation approaches. This gives hope that high-throughput proteomics techniques can be used to improve and validate gene annotations, and in particular for verification of high-throughput, automatic gene annotations.

  3. The Saccharomyces Genome Database: Gene Product Annotation of Function, Process, and Component.

    Science.gov (United States)

    Cherry, J Michael

    2015-12-01

    An ontology is a highly structured form of controlled vocabulary. Each entry in the ontology is commonly called a term. These terms are used when talking about an annotation. However, each term has a definition that, like the definition of a word found within a dictionary, provides the complete usage and detailed explanation of the term. It is critical to consult a term's definition because the distinction between terms can be subtle. The use of ontologies in biology started as a way of unifying communication between scientific communities and to provide a standard dictionary for different topics, including molecular functions, biological processes, mutant phenotypes, chemical properties and structures. The creation of ontology terms and their definitions often requires debate to reach agreement but the result has been a unified descriptive language used to communicate knowledge. In addition to terms and definitions, ontologies require a relationship used to define the type of connection between terms. In an ontology, a term can have more than one parent term, the term above it in an ontology, as well as more than one child, the term below it in the ontology. Many ontologies are used to construct annotations in the Saccharomyces Genome Database (SGD), as in all modern biological databases; however, Gene Ontology (GO), a descriptive system used to categorize gene function, is the most extensively used ontology in SGD annotations. Examples included in this protocol illustrate the structure and features of this ontology. PMID:26631125
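The parent/child structure described above makes an ontology a directed acyclic graph rather than a tree, since a term may have several parents. A minimal sketch with invented term names (not real GO terms) shows how annotation to one term propagates upward to every ancestor:

```python
# Toy ontology fragment with invented term names (not real GO terms).
# A term may list more than one parent, so the vocabulary forms a DAG.
parents = {
    "mitotic cell cycle": ["cell cycle", "mitotic process"],
    "cell cycle": ["cellular process"],
    "mitotic process": ["cellular process"],
    "cellular process": [],
}

def ancestors(term):
    """Every term reachable by following parent links upward."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

# Annotating a gene product to a term implies all of its ancestors.
implied = ancestors("mitotic cell cycle")
```

The leaf term has two parents, and both parent paths converge on the same root, which is why the distinction between sibling terms (and their definitions) matters when choosing an annotation term.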

  5. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects

    Directory of Open Access Journals (Sweden)

    Holt Carson

    2011-12-01

    Full Text Available Abstract Background Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. Results We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. Conclusions MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.

  6. Amino acid sequences of predicted proteins and their annotation for 95 organism species. - Gclust Server | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ID: ID of a sequence. Cluster ID: ID of the cluster (gclust_cluster is referenced). Annotation in original database: annotation at the original website. Species: species name. Length: amino acid sequence length

  7. Xylella fastidiosa comparative genomic database is an information resource to explore the annotation, genomic features, and biology of different strains

    Directory of Open Access Journals (Sweden)

    Alessandro M. Varani

    2012-01-01

    Full Text Available The Xylella fastidiosa comparative genomic database is a scientific resource that aims to provide a user-friendly interface for accessing high-quality, manually curated genomic annotation and comparative sequence analysis, as well as for identifying and mapping prophage-like elements, a marked feature of Xylella genomes. Here we describe the database and tools for exploring the biology of this important plant pathogen. The hallmarks of this database are its high-quality genomic annotation, its functional and comparative genomic analysis, and the identification and mapping of prophage-like elements. It is available at http://www.xylella.lncc.br.

  8. Generation, annotation, analysis and database integration of 16,500 white spruce EST clusters

    Directory of Open Access Journals (Sweden)

    Siddiqui Asim

    2005-10-01

    Full Text Available Abstract Background The sequencing and analysis of ESTs is for now the only practical approach for large-scale gene discovery and annotation in conifers because their very large genomes are unlikely to be sequenced in the near future. Our objective was to produce extensive collections of ESTs and cDNA clones to support the manufacture of cDNA microarrays and gene discovery in white spruce (Picea glauca [Moench] Voss). Results We produced 16 cDNA libraries from different tissues and a variety of treatments, and partially sequenced 50,000 cDNA clones. High-quality 3' and 5' reads were assembled into 16,578 consensus sequences, 45% of which represented full-length inserts. Consensus sequences derived from 5' and 3' reads of the same cDNA clone were linked to define 14,471 transcripts. A large proportion (84%) of the spruce sequences matched a pine sequence, but only 68% of the spruce transcripts had homologs in Arabidopsis or rice. Nearly all the sequences that matched the Populus trichocarpa genome (the only sequenced tree genome) also matched the rice or Arabidopsis genomes. We used several sequence similarity search approaches for the assignment of putative functions, including BLAST searches against general and specialized databases (transcription factors, cell wall related proteins), Gene Ontology term assignment, and Hidden Markov Model searches against PFAM protein families and domains. In total, 70% of the spruce transcripts displayed matches to proteins of known or unknown function in the Uniref100 database (blastx e-value Arabidopsis or rice genomes. Detailed analysis of the translationally controlled tumour protein and S-adenosylmethionine synthetase families confirmed a twofold size difference. Sequences and annotations were organized in a dedicated database, SpruceDB. Several search tools were developed to mine the data either based on their occurrence in the cDNA libraries or on functional annotations. Conclusion This report illustrates specific

  9. Improving tRNAscan-SE Annotation Results via Ensemble Classifiers.

    Science.gov (United States)

    Zou, Quan; Guo, Jiasheng; Ju, Ying; Wu, Meihong; Zeng, Xiangxiang; Hong, Zhiling

    2015-11-01

    tRNAscan-SE is a tRNA detection program that is widely used for tRNA annotation; however, its false positive rate is unacceptable for large sequences. Here, we used a machine learning method to improve the tRNAscan-SE results. A new predictor, tRNA-Predict, was designed. We obtained real and pseudo-tRNA sequences as training data sets using tRNAscan-SE and constructed three different tRNA feature sets. We then set up an ensemble classifier, LibMutil, to predict tRNAs from the training data. The positive data set of 623 tRNA sequences was obtained from tRNAdb 2009, and the negative data set was the false positive tRNAs predicted by tRNAscan-SE. Our in silico experiments revealed a prediction accuracy of 95.1% for tRNA-Predict using 10-fold cross-validation. tRNA-Predict was developed to distinguish functional tRNAs from pseudo-tRNAs rather than to predict tRNAs from a genome-wide scan. However, tRNA-Predict can work with the output of tRNAscan-SE, which is a genome-wide scanning method, to improve the tRNAscan-SE annotation results. The tRNA-Predict web server is accessible at http://datamining.xmu.edu.cn/∼gjs/tRNA-Predict. PMID:27491037
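The ensemble idea can be illustrated in outline. The sketch below assumes nothing about the actual LibMutil classifiers or tRNA feature sets: it combines simple per-feature threshold classifiers by majority vote and scores them with 10-fold cross-validation on synthetic data, all of which are toy stand-ins:

```python
# Hedged illustration of ensemble voting plus 10-fold cross-validation.
# Data, features and classifiers are invented, not LibMutil's.
import random

rng = random.Random(0)
# synthetic "real" tRNAs (label 1) score high on three features,
# "pseudo" tRNAs (label 0) score low
data = ([((rng.gauss(0.8, 0.1), rng.gauss(0.7, 0.1), rng.gauss(0.9, 0.1)), 1)
         for _ in range(50)] +
        [((rng.gauss(0.2, 0.1), rng.gauss(0.3, 0.1), rng.gauss(0.1, 0.1)), 0)
         for _ in range(50)])
rng.shuffle(data)

def train_threshold(train, f):
    # learn a cut on feature f: midpoint of the two class means
    pos = [x[f] for x, y in train if y == 1]
    neg = [x[f] for x, y in train if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[f] > cut else 0

def majority_vote(clfs, x):
    votes = [c(x) for c in clfs]
    return max(set(votes), key=votes.count)

correct = total = 0
k = 10
for i in range(k):                      # 10-fold cross-validation
    test = data[i::k]
    train = [d for j, d in enumerate(data) if j % k != i]
    clfs = [train_threshold(train, f) for f in range(3)]
    correct += sum(majority_vote(clfs, x) == y for x, y in test)
    total += len(test)

print(f"10-fold CV accuracy: {correct / total:.2f}")
```

The point of the ensemble is that a majority over several weak, feature-specific classifiers tolerates the occasional error of any single one.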

  10. Semi-Automated Annotation of Biobank Data Using Standard Medical Terminologies in a Graph Database.

    Science.gov (United States)

    Hofer, Philipp; Neururer, Sabrina; Goebel, Georg

    2016-01-01

    Data describing biobank resources frequently contain unstructured free-text information or follow insufficient coding standards. (Bio-)medical ontologies like the Orphanet Rare Disease Ontology (ORDO) or the Human Disease Ontology (DOID) provide a high number of concepts, synonyms and entity relationship properties. Such standard terminologies increase the quality and granularity of input data by adding comprehensive semantic background knowledge from validated entity relationships. Moreover, cross-references between terminology concepts facilitate data integration across databases using different coding standards. In order to encourage the use of standard terminologies, our aim is to identify and link relevant concepts with free-text diagnosis inputs within a biobank registry. Relevant concepts are selected automatically by lexical matching and SPARQL queries against an RDF triplestore. To ensure the correctness of annotations, proposed concepts have to be confirmed by medical data administration experts before they are entered into the registry database. Relevant (bio-)medical terminologies describing diseases and phenotypes were identified and stored in a graph database, which was tied to a local biobank registry. Concept recommendations during data input trigger a structured description of medical data and facilitate data linkage between heterogeneous systems. PMID:27577487
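The lexical-matching step might look roughly like the sketch below, which normalizes free-text diagnoses and looks them up against concept labels and synonyms. The two concept entries are illustrative stand-ins rather than actual ORDO/DOID content, and the SPARQL querying and expert-confirmation steps described above are omitted:

```python
# Sketch of the lexical-matching step only: map free-text diagnosis
# entries to candidate terminology concepts via a normalized
# label/synonym index. Concept entries are illustrative stand-ins.
import re

concepts = {
    "DOID:9352": {"labels": ["type 2 diabetes mellitus", "adult-onset diabetes"]},
    "DOID:1612": {"labels": ["breast cancer", "malignant breast neoplasm"]},
}

def normalize(text):
    # lowercase and strip punctuation so 'Adult-onset' matches 'adult-onset'
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

index = {normalize(label): cid
         for cid, c in concepts.items() for label in c["labels"]}

def propose(free_text):
    """Return candidate concept IDs; experts confirm before storage."""
    hit = index.get(normalize(free_text))
    return [hit] if hit else []

print(propose("Adult-onset Diabetes"))  # -> ['DOID:9352']
print(propose("sprained ankle"))        # -> []
```

A proposal-then-confirmation design like this keeps the automation cheap while leaving final authority over the registry with the domain experts.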

  11. SureChEMBL: a large-scale, chemically annotated patent document database.

    Science.gov (United States)

    Papadatos, George; Davies, Mark; Dedman, Nathan; Chambers, Jon; Gaulton, Anna; Siddle, James; Koks, Richard; Irvine, Sean A; Pettersson, Joe; Goncharoff, Nicko; Hersey, Anne; Overington, John P

    2016-01-01

    SureChEMBL is a publicly available large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature according to an automated text and image-mining pipeline on a daily basis. SureChEMBL provides access to a previously unavailable, open and timely set of annotated compound-patent associations, complemented with sophisticated combined structure and keyword-based search capabilities against the compound repository and patent document corpus; given the wealth of knowledge hidden in patent documents, analysis of SureChEMBL data has immediate applications in drug discovery, medicinal chemistry and other commercial areas of chemical science. Currently, the database contains 17 million compounds extracted from 14 million patent documents. Access is available through a dedicated web-based interface and data downloads at: https://www.surechembl.org/. PMID:26582922

  12. miRFANs: an integrated database for Arabidopsis thaliana microRNA function annotations

    Directory of Open Access Journals (Sweden)

    Liu Hui

    2012-05-01

    Full Text Available Abstract Background Plant microRNAs (miRNAs) have been revealed to play important roles in developmental control, hormone secretion, cell differentiation and proliferation, and response to environmental stresses. However, our knowledge about the regulatory mechanisms and functions of miRNAs remains very limited. The main difficulties lie in two aspects. On the one hand, the number of experimentally validated miRNA targets is very limited and the predicted targets often include many false positives, which constrains us from revealing the functions of miRNAs. On the other hand, the regulation of miRNAs is known to be spatio-temporally specific, which increases the difficulty of understanding their regulatory mechanisms. Description In this paper we present miRFANs, an online database for Arabidopsis thaliana miRNA function annotations. We integrated various types of datasets, including miRNA-target interactions, transcription factors (TFs) and their targets, expression profiles, genomic annotations and pathways, into a comprehensive database, and developed various statistical and mining tools, together with a user-friendly web interface. For each miRNA target predicted by psRNATarget, TargetAlign and UEA target-finder, or recorded in TarBase and miRTarBase, the effect of its up-regulated or down-regulated miRNA on the expression level of the target gene is evaluated by carrying out differential expression analysis of both miRNA and target expression profiles acquired under the same (or similar) experimental condition and in the same tissue. Moreover, each miRNA target is associated with gene ontology and pathway terms, together with the target site information and regulating miRNAs predicted by different computational methods. These associated terms may provide valuable insight into the functions of each miRNA. Conclusion First, a comprehensive collection of miRNA targets for Arabidopsis thaliana provides valuable information about the functions of

  13. Improving decoy databases for protein folding algorithms

    KAUST Repository

    Lindsey, Aaron

    2014-01-01

    Copyright © 2014 ACM. Predicting protein structures and simulating protein folding are two of the most important problems in computational biology today. Simulation methods rely on a scoring function to distinguish the native structure (the most energetically stable) from non-native structures. Decoy databases are collections of non-native structures used to test and verify these functions. We present a method to evaluate and improve the quality of decoy databases by adding novel structures and removing redundant structures. We test our approach on 17 different decoy databases of varying size and type and show significant improvement across a variety of metrics. We also test our improved databases on a popular modern scoring function and show that they contain a greater number of native-like structures than the original databases, thereby producing a more rigorous database for testing scoring functions.
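The redundancy-removal idea above can be sketched as a greedy filter that keeps a structure only if its distance to every structure kept so far exceeds a cutoff. The toy version below uses unaligned per-atom RMSD on invented coordinates and an invented cutoff; a real implementation would superpose structures (e.g. via Kabsch alignment) before computing RMSD:

```python
# Toy sketch of redundancy removal for a decoy database: keep a decoy
# only if its RMSD to every decoy already kept exceeds a cutoff.
# Coordinates and cutoff are invented for illustration.
import math

def rmsd(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

def deduplicate(structures, cutoff=1.0):
    kept = []
    for s in structures:
        if all(rmsd(s, k) > cutoff for k in kept):
            kept.append(s)
    return kept

decoys = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (5.0, 4.0, 3.0)]
print(len(deduplicate(decoys)))  # -> 2: the near-identical pair collapses
```

The greedy scan is quadratic in the worst case, which is acceptable for databases of moderate size; larger collections would need spatial indexing or clustering.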

  14. A database for coconut crop improvement

    OpenAIRE

    Rajagopal, Velamoor; Manimekalai, Ramaswamy; Devakumar, Krishnamurthy; Rajesh; Karun, Anitha; Niral, Vittal; Gopal, Murali; Aziz, Shamina; Gunasekaran, Marimuthu; Kumar, Mundappurathe Ramesh; Chandrasekar, Arumugam

    2005-01-01

    Coconut crop improvement requires a number of biotechnology and bioinformatics tools. A database containing information on CG (coconut germplasm), CCI (coconut cultivar identification), CD (coconut disease), MIFSPC (microbial information systems in plantation crops) and VO (vegetable oils) is described. The database was developed using MySQL and PostgreSQL running on the Linux operating system. The database interface is developed in PHP, HTML and Java. Availability: http://www.bioinfcpcri.org

  15. A database of annotated promoters of genes associated with common respiratory and related diseases

    KAUST Repository

    Chowdhary, Rajesh

    2012-07-01

    Many genes have been implicated in the pathogenesis of common respiratory and related diseases (RRDs), yet the underlying mechanisms are largely unknown. Differential gene expression patterns in diseased and healthy individuals suggest that RRDs affect or are affected by modified transcription regulation programs. It is thus crucial to characterize implicated genes in terms of transcriptional regulation. For this purpose, we conducted a promoter analysis of genes associated with 11 common RRDs including allergic rhinitis, asthma, bronchiectasis, bronchiolitis, bronchitis, chronic obstructive pulmonary disease, cystic fibrosis, emphysema, eczema, psoriasis, and urticaria, many of which are thought to be genetically related. The objective of the present study was to obtain deeper insight into the transcriptional regulation of these disease-associated genes by annotating their promoter regions with transcription factors (TFs) and TF binding sites (TFBSs). We discovered many TFs that are significantly enriched in the target disease groups, including associations that have been documented in the literature. We also identified a number of putative TFs/TFBSs that appear to be novel. The results of our analysis are provided in an online database that is freely accessible to researchers at http://www.respiratorygenomics.com. Promoter-associated TFBS information and related genomic features, such as histone modification sites, microsatellites, CpG islands, and SNPs, are graphically summarized in the database. Users can compare and contrast underlying mechanisms of specific RRDs relative to candidate genes, TFs, gene ontology terms, micro-RNAs, and biological pathways for the conduct of meta-analyses. This database represents a novel, useful resource for RRD researchers. Copyright © 2012 by the American Thoracic Society.

  16. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of ITS sequences of mycorrhizal fungi.

    Directory of Open Access Journals (Sweden)

    Leho Tedersoo

    Full Text Available Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether, 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source was added to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.
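A minimal quality screen in this spirit might flag sequences that are too short or contain too many ambiguous bases. The thresholds and accession names below are invented, not those used by UNITE, and chimera detection requires alignment-based checks well beyond this sketch:

```python
# Minimal, illustrative quality screen for ITS-like sequences: flag
# anything too short or too ambiguous. Thresholds are invented.
def low_quality(seq, min_len=100, max_ambiguous=0.02):
    seq = seq.upper()
    if len(seq) < min_len:
        return True
    return seq.count("N") / len(seq) > max_ambiguous

# hypothetical accessions: clean, N-rich, and too short
seqs = {"acc1": "ACGT" * 50, "acc2": "ACGTN" * 40, "acc3": "ACGT" * 10}
flagged = [acc for acc, s in seqs.items() if low_quality(s)]
print(flagged)  # -> ['acc2', 'acc3']
```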

  17. MITOS: improved de novo metazoan mitochondrial genome annotation.

    Science.gov (United States)

    Bernt, Matthias; Donath, Alexander; Jühling, Frank; Externbrink, Fabian; Florentz, Catherine; Fritzsch, Guido; Pütz, Joern; Middendorf, Martin; Stadler, Peter F

    2013-11-01

    About 2000 completely sequenced mitochondrial genomes are available from the NCBI RefSeq database together with manually curated annotations of their protein-coding genes, rRNAs, and tRNAs. This annotation information, which has accumulated over two decades, has been obtained with a diverse set of computational tools and annotation strategies. Despite all efforts of manual curation, it is still plagued by misassignments of reading directions, erroneous gene names, and missing as well as false positive annotations, in particular for the RNA genes. Taken together, this causes substantial problems for fully automatic pipelines that aim to use these data comprehensively for studies of animal phylogenetics and the molecular evolution of mitogenomes. The MITOS pipeline is designed to compute a consistent de novo annotation of mitogenomic sequences. We show that the results of MITOS match RefSeq and MitoZoa in terms of annotation coverage and quality. At the same time we avoid biases, inconsistencies of nomenclature, and typos originating from manual curation strategies. The MITOS pipeline is accessible online at http://mitos.bioinf.uni-leipzig.de.

  18. Annotated text databases in the context of the Kaj Munk corpus

    DEFF Research Database (Denmark)

    Sandborg-Petersen, Ulrik

    most of the demands which Doedens placed on an annotated text database system, as I show in Chapters 4, 5, and 14. The dissertation is structured as follows. Part I on “Theoretrical Foundations” encompasses Chapters 1 to 8. Chapter 1 introduces the topics of the thesis, and provides definitions of some...... of the main terms used in the dissertation. Chapter 2 provides a literature review. Chapter 3 provides a brief introduction to the topic of “ontology”, for use in later chapters. Chapter 4 introduces and discusses the EMdF model, while Chapter 5 does the same for the MQL query language. Chapter 6 introduces...... expressible in the EMdF model) on the one hand, and time on the other. Part II on “Applications” encompasses Chapters 9 to 13. Chapter 9 introduces Part II. Chapter 10 describes how the theoretical foundations laid in Part I can be used to implement the Kaj Munk Corpus in the EMdF model. Chapter 11 discusses...

  19. Improved annotation of conjugated bile acid hydrolase superfamily members in Gram-positive bacteria

    NARCIS (Netherlands)

    Lambert, J.M.; Siezen, R.J.; Vos, de W.M.; Kleerebezem, M.

    2008-01-01

    Most Gram-positive bacteria inhabiting the gastrointestinal tract are capable of hydrolysing bile salts. Bile salt hydrolysis is thought to play an important role in various biological processes in the host. Therefore, correct annotation of bacterial bile salt hydrolases (Bsh) in public databases (E

  20. Polymorphism Identification and Improved Genome Annotation of Brassica rapa Through Deep RNA Sequencing

    OpenAIRE

    Devisetty, Upendra Kumar; Covington, Michael F.; Tat, An V.; Lekkala, Saradadevi; Maloof, Julin N.

    2014-01-01

    The mapping and functional analysis of quantitative traits in Brassica rapa can be greatly improved with the availability of physically positioned, gene-based genetic markers and accurate genome annotation. In this study, deep transcriptome RNA sequencing (RNA-Seq) of Brassica rapa was undertaken with two objectives: SNP detection and improved transcriptome annotation. We performed SNP detection on two varieties that are parents of a mapping population to aid in development of a marker system...

  1. Sequence ID and annotation information - Gclust Server | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)


  2. Improving HIV proteome annotation: new features of BioAfrica HIV Proteomics Resource.

    Science.gov (United States)

    Druce, Megan; Hulo, Chantal; Masson, Patrick; Sommer, Paula; Xenarios, Ioannis; Le Mercier, Philippe; De Oliveira, Tulio

    2016-01-01

    The Human Immunodeficiency Virus (HIV) is one of the pathogens that cause the greatest global concern, with approximately 35 million people currently infected with HIV. Extensive HIV research has been performed, generating a large amount of HIV and host genomic data. However, no effective vaccine that protects the host from HIV infection is available and HIV is still spreading at an alarming rate, despite effective antiretroviral (ARV) treatment. In order to develop effective therapies, we need to expand our knowledge of the interaction between HIV and host proteins. In contrast to virus proteins, which often rapidly evolve drug resistance mutations, the host proteins are essentially invariant within all humans. Thus, if we can identify the host proteins needed for virus replication, such as those involved in transporting viral proteins to the cell surface, we have a chance of interrupting viral replication. There is no proteome resource that summarizes this interaction, making research on this subject a difficult enterprise. In order to fill this gap in knowledge, we curated a resource that presents detailed annotation on the interaction between the HIV proteome and host proteins. Our resource was produced in collaboration with ViralZone and used manual curation techniques developed by UniProtKB/Swiss-Prot. Our new website also used previous annotations of the BioAfrica HIV-1 Proteome Resource, which has been accessed by approximately 10 000 unique users a year since its inception in 2005. The novel features include a dedicated new page for each HIV protein, a graphic display of its function and a section on its interaction with host proteins. Our new webpages also add information on the genomic location of each HIV protein and the position of ARV drug resistance mutations. Our improved BioAfrica HIV-1 Proteome Resource fills a gap in the current knowledge of biocuration. Database URL: http://www.bioafrica.net/proteomics/HIVproteome.html. PMID:27087306

  3. The IMM Face Database - An Annotated Dataset of 240 Face Images

    DEFF Research Database (Denmark)

    Nordstrøm, Michael M.; Larsen, Mads; Sierakowski, Janusz;

    This note describes a dataset consisting of 240 annotated monocular images of 40 different human faces. Points of correspondence are placed on each image so the dataset can be readily used for building statistical models of shape. Format specifications and terms of use are also given in this note....

  4. ASPicDB: a database of annotated transcript and protein variants generated by alternative splicing

    Science.gov (United States)

    Martelli, Pier L.; D’Antonio, Mattia; Bonizzoni, Paola; Castrignanò, Tiziana; D’Erchia, Anna M.; D’Onorio De Meo, Paolo; Fariselli, Piero; Finelli, Michele; Licciulli, Flavio; Mangiulli, Marina; Mignone, Flavio; Pavesi, Giulio; Picardi, Ernesto; Rizzi, Raffaella; Rossi, Ivan; Valletti, Alessio; Zauli, Andrea; Zambelli, Federico; Casadio, Rita; Pesole, Graziano

    2011-01-01

    Alternative splicing is emerging as a major mechanism for the expansion of the transcriptome and proteome diversity, particularly in human and other vertebrates. However, the proportion of alternative transcripts and proteins actually endowed with functional activity is currently highly debated. We present here a new release of ASPicDB, which now provides a unique annotation resource of human protein variants generated by alternative splicing. A total of 256 939 protein variants from 17 191 multi-exon genes have been extensively annotated through state-of-the-art machine learning tools providing information on the protein type (globular and transmembrane), localization, presence of PFAM domains, signal peptides, GPI-anchor propeptides, transmembrane and coiled-coil segments. Furthermore, full-length variants can now be specifically selected based on the annotation of CAGE-tags and polyA signals and/or polyA sites, marking transcription initiation and termination sites, respectively. The retrieval can be carried out at gene, transcript, exon, protein or splice site level, allowing the selection of data sets fulfilling one or more features set by the user. The retrieval interface also enables the selection of protein variants showing specific differences in the annotated features. ASPicDB is available at http://www.caspur.it/ASPicDB/. PMID:21051348

  5. Recent improvements to the SMART domain-based sequence annotation resource.

    Science.gov (United States)

    Letunic, Ivica; Goodstadt, Leo; Dickens, Nicholas J; Doerks, Tobias; Schultz, Joerg; Mott, Richard; Ciccarelli, Francesca; Copley, Richard R; Ponting, Chris P; Bork, Peer

    2002-01-01

    SMART (Simple Modular Architecture Research Tool, http://smart.embl-heidelberg.de) is a web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models. This brings the total to over 600 domain families that are widely represented among nuclear, signalling and extracellular proteins. Annotation now includes links to the Online Mendelian Inheritance in Man (OMIM) database in cases where a human disease is associated with one or more mutations in a particular domain. We have implemented new analysis methods and updated others. New advanced queries provide direct access to the SMART relational database using SQL. This database now contains information on intrinsic sequence features such as transmembrane regions, coiled-coils, signal peptides and internal repeats. SMART output can now be easily included in users' documents. A SMART mirror has been created at http://smart.ox.ac.uk. PMID:11752305

  7. Improvement of the European thermodynamic database NUCLEA

    Energy Technology Data Exchange (ETDEWEB)

    Brissoneau, L.; Journeau, C.; Piluso, P. [CEA Cadarache, DEN, F-13108 St Paul Les Durance (France); Bakardjieva, S. [Acad Sci Czech Republic, Inst Inorgan Chem, CZ-25068 Rez (Czech Republic); Barrachin, M. [Inst Radioprotect and Surete Nucl, St Paul Les Durance (France); Bechta, S. [NITI, Aleksandrov Res Inst Technol, Sosnovyi Bor (Russian Federation); Bottomley, D. [Commiss European Communities, Joint Res Ctr, Inst Transuranium Elements, D-76125 Karlsruhe (Germany); Cheynet, B.; Fischer, E. [Thermodata, F-38400 St Martin Dheres (France); Kiselova, M. [Nucl Res Inst UJV, Rez 25068 (Czech Republic); Mezentseva, L. [Russian Acad Sci, Inst Silicate Chem, St Petersburg (Russian Federation)

    2010-07-01

    Modelling of corium behaviour during a severe accident requires knowledge of the phases present at equilibrium for a given corium composition, temperature and pressure. The thermodynamic database NUCLEA in combination with a Gibbs Energy minimizer is the European reference tool to achieve this goal. This database has been improved thanks to the analysis of bibliographical data and to EU-funded experiments performed within the SARNET network, PLINIUS as well as the ISTC CORPHAD and EVAN projects. To assess the uncertainty range associated with Energy Dispersive X-ray analyses, a round-robin exercise has been launched in which a UO{sub 2}-containing corium-concrete interaction sample from VULCANO has been analyzed by three European laboratories with satisfactorily small differences. (authors)

  8. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    Science.gov (United States)

    Wang, Jia; Chen, Dijun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling

    2014-01-01

    Citrus is one of the most important and most widely grown fruit crops, with global production ranking first among all fruit crops in the world. Sweet orange accounts for more than half of Citrus production, both as fresh fruit and as processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia) and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, integrating ab initio gene prediction with EST, RNA-seq and RNA paired-end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface showing the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to researchers of sweet orange and other woody plants, and is freely available at http://citrus.hzau.edu.cn/.

  9. Citrus sinensis Annotation Project (CAP): A Comprehensive Database for Sweet Orange Genome

    OpenAIRE

    Wang, Jia; Chen, DiJun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-ling

    2014-01-01

    Citrus is one of the most important and most widely grown fruit crops, with global production ranking first among all fruit crops in the world. Sweet orange accounts for more than half of Citrus production, both as fresh fruit and as processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia) and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-...

  10. A kingdom-specific protein domain HMM library for improved annotation of fungal genomes

    Directory of Open Access Journals (Sweden)

    Oliver Stephen G

    2007-04-01

    Full Text Available Abstract Background Pfam is a general-purpose database of protein domain alignments and profile Hidden Markov Models (HMMs), which is very popular for the annotation of sequence data produced by genome sequencing projects. Pfam provides models that are often very general in terms of the taxa that they cover, and it has previously been suggested that such general models may lack some of the specificity or selectivity that would be provided by kingdom-specific models. Results Here we present a general approach to creating domain libraries of HMMs for sub-taxa of a kingdom. Taking fungal species as an example, we construct a domain library of HMMs (called Fungal Pfam or FPfam) using sequences from 30 genomes, consisting of 24 species from the ascomycetes group and two basidiomycetes, Ustilago maydis, a fungal pathogen of maize, and the white rot fungus Phanerochaete chrysosporium. In addition, we include the microsporidian Encephalitozoon cuniculi, an obligate intracellular parasite, and two non-fungal species, the oomycetes Phytophthora sojae and Phytophthora ramorum, both plant pathogens. We evaluate the performance in terms of coverage against the original 30 genomes used in training FPfam and against five more recently sequenced fungal genomes that can be considered an independent test set. We show that kingdom-specific models such as FPfam can find instances of both novel and well-characterized domains, increase overall coverage and detect more domains per sequence, typically with higher bitscores than Pfam gives for the same domain families. An evaluation of the effect of changing E-values on the coverage shows that the performance of FPfam is consistent over the range of E-values applied. Conclusion Kingdom-specific models are shown to provide improved coverage. However, as the models become more specific, some sequences found by Pfam may be missed by the models in FPfam, and some of the families represented in the test set are not present in FPfam.
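
The coverage and domains-per-sequence measures used in evaluations like this can be sketched as follows. The hit data and E-value cutoff below are invented for illustration and are not actual Pfam or FPfam output.

```python
def coverage(hits, e_cutoff=1e-3):
    """Fraction of sequences with at least one domain hit below the
    E-value cutoff, plus mean domains per annotated sequence.
    Input: {sequence_id: [(domain, e_value), ...]} (toy data)."""
    annotated = {
        s: [d for d, e in ds if e <= e_cutoff] for s, ds in hits.items()
    }
    with_hit = [s for s, ds in annotated.items() if ds]
    frac = len(with_hit) / len(hits)
    per_seq = (sum(len(annotated[s]) for s in with_hit) / len(with_hit)
               if with_hit else 0.0)
    return frac, per_seq

hmm_hits = {
    "seq1": [("Kinase", 1e-20), ("SH3", 1e-5)],
    "seq2": [("Zn_finger", 0.5)],  # above cutoff: not counted
    "seq3": [("Kinase", 1e-8)],
    "seq4": [],
}
print(coverage(hmm_hits))  # (0.5, 1.5)
```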

  11. HIVBrainSeqDB: a database of annotated HIV envelope sequences from brain and other anatomical sites

    Directory of Open Access Journals (Sweden)

    O'Connor Niall

    2010-12-01

    Full Text Available Abstract Background The population of HIV replicating within a host consists of independently evolving and interacting sub-populations that can be genetically distinct within anatomical compartments. HIV replicating within the brain causes neurocognitive disorders in up to 20-30% of infected individuals and is a viral sanctuary site for the development of drug resistance. The primary determinant of HIV neurotropism is macrophage tropism, which is primarily determined by the viral envelope (env) gene. However, studies of genetic aspects of HIV replicating in the brain are hindered because existing repositories of HIV sequences are neither focused on neurotropic virus nor annotated with neurocognitive and neuropathological status. To address this need, we constructed the HIV Brain Sequence Database. Results The HIV Brain Sequence Database is a public database of HIV envelope sequences, directly sequenced from brain and other tissues from the same patients. Sequences are annotated with clinical data including viral load, CD4 count, antiretroviral status, neurocognitive impairment, and neuropathological diagnosis, all curated from the original publication. Tissue source is coded using an anatomical ontology, the Foundational Model of Anatomy, to capture the maximum level of detail available, while maintaining ontological relationships between tissues and their subparts. 44 tissue types are represented within the database, grouped into 4 categories: (i) brain, brainstem, and spinal cord; (ii) meninges, choroid plexus, and CSF; (iii) blood and lymphoid; and (iv) other (bone marrow, colon, lung, liver, etc.). Patient coding is correlated across studies, allowing sequences from the same patient to be grouped to increase statistical power. Using Cytoscape, we visualized relationships between studies, patients and sequences, illustrating interconnections between studies and the varying depth of sequencing, patient number, and tissue representation across studies.

  12. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    Directory of Open Access Journals (Sweden)

    Marais Gabriel AB

    2011-07-01

    Full Text Available Abstract Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing of cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene species and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to

  13. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    Directory of Open Access Journals (Sweden)

    Jia Wang

    Full Text Available Citrus is one of the most important and most widely grown fruit crops, with global production ranking first among all fruit crops in the world. Sweet orange accounts for more than half of Citrus production, both as fresh fruit and as processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia) and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, integrating ab initio gene prediction with EST, RNA-seq and RNA paired-end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface showing the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to researchers of sweet orange and other woody plants, and is freely available at http://citrus.hzau.edu.cn/.

  14. Enhanced oil recovery using improved aqueous fluid-injection methods: an annotated bibliography. [328 citations]

    Energy Technology Data Exchange (ETDEWEB)

    Meister, M.J.; Kettenbrink, G.K.; Collins, A.G.

    1976-10-01

    This annotated bibliography contains abstracts, prepared by the authors, of articles published between 1968 and early 1976 on tests of improved aqueous fluid injection methods (i.e., polymer and surfactant floods). The abstracts have been written and organized to facilitate studies of the oil recovery potential of polymer and surfactant floods under known reservoir conditions. 328 citations.

  15. Improving Automated Annotation of Benthic Survey Images Using Wide-band Fluorescence

    Science.gov (United States)

    Beijbom, Oscar; Treibitz, Tali; Kline, David I.; Eyal, Gal; Khen, Adi; Neal, Benjamin; Loya, Yossi; Mitchell, B. Greg; Kriegman, David

    2016-03-01

    Large-scale imaging techniques are used increasingly for ecological surveys. However, manual analysis can be prohibitively expensive, creating a bottleneck between collected images and desired data-products. This bottleneck is particularly severe for benthic surveys, where millions of images are obtained each year. Recent automated annotation methods may provide a solution, but reflectance images do not always contain sufficient information for adequate classification accuracy. In this work, the FluorIS, a low-cost modified consumer camera, was used to capture wide-band wide-field-of-view fluorescence images during a field deployment in Eilat, Israel. The fluorescence images were registered with standard reflectance images, and an automated annotation method based on convolutional neural networks was developed. Our results demonstrate a 22% reduction in classification error-rate when using both image types compared to using reflectance images alone. The improvements were particularly large for the coral reef genera Platygyra, Acropora and Millepora, where classification recall improved by 38%, 33%, and 41%, respectively. We conclude that convolutional neural networks can be used to combine reflectance and fluorescence imagery in order to significantly improve automated annotation accuracy and reduce the manual annotation bottleneck.
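
Once the fluorescence images are registered to the reflectance images, feeding both to one network amounts to stacking channels pixel by pixel. A minimal sketch, using plain nested lists rather than a real image library, with made-up pixel values:

```python
def stack_channels(reflectance, fluorescence):
    """Combine a registered reflectance/fluorescence image pair into
    one multi-channel input, pixel by pixel (nested lists, H x W x C)."""
    assert len(reflectance) == len(fluorescence)
    return [
        [r_px + f_px for r_px, f_px in zip(r_row, f_row)]
        for r_row, f_row in zip(reflectance, fluorescence)
    ]

# 1x2 toy images with 3 channels each -> 6-channel pixels
refl = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]
fluo = [[[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]]
stacked = stack_channels(refl, fluo)
print(len(stacked[0][0]))  # 6
```

In practice the stacked array would be the input tensor of the convolutional network, in place of the usual 3-channel RGB input.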

  16. T3DB: a comprehensively annotated database of common toxins and their targets.

    Science.gov (United States)

    Lim, Emilia; Pon, Allison; Djoumbou, Yannick; Knox, Craig; Shrivastava, Savita; Guo, An Chi; Neveu, Vanessa; Wishart, David S

    2010-01-01

    In an effort to capture meaningful biological, chemical and mechanistic information about clinically relevant, commonly encountered or important toxins, we have developed the Toxin and Toxin-Target Database (T3DB). The T3DB is a unique bioinformatics resource that compiles comprehensive information about common or ubiquitous toxins and their toxin-targets into a single electronic repository. The database currently contains over 2900 small molecule and peptide toxins, 1300 toxin-targets and more than 33,000 toxin-target associations. Each T3DB record (ToxCard) contains over 80 data fields providing detailed information on chemical properties and descriptors, toxicity values, protein and gene sequences (for both targets and toxins), molecular and cellular interaction data, toxicological data, mechanistic information and references. This information has been manually extracted and manually verified from numerous sources, including other electronic databases, government documents, textbooks and scientific journals. A key focus of the T3DB is on providing 'depth' over 'breadth' with detailed descriptions, mechanisms of action, and information on toxins and toxin-targets. T3DB is fully searchable and supports extensive text, sequence, chemical structure and relational query searches, similar to those found in the Human Metabolome Database (HMDB) and DrugBank. Potential applications of the T3DB include clinical metabolomics, toxin target prediction, toxicity prediction and toxicology education. The T3DB is available online at http://www.t3db.org. PMID:19897546

  17. Modeling Loosely Annotated Images with Imagined Annotations

    CERN Document Server

    Tang, Hong; Chen, Yunhao

    2008-01-01

    In this paper, we present an approach to learning latent semantic analysis models from loosely annotated images for automatic image annotation and indexing. The given annotation in training images is loose due to: (1) ambiguous correspondences between visual features and annotated keywords; (2) incomplete lists of annotated keywords. The second reason motivates us to enrich the incomplete annotation in a simple way before learning topic models. In particular, some imagined keywords are poured into the incomplete annotation through measuring similarity between keywords. Then, both given and imagined annotations are used to learn probabilistic topic models for automatically annotating new images. We conduct experiments on a typical Corel dataset of images and loose annotations, and compare the proposed method with state-of-the-art discrete annotation methods (using a set of discrete blobs to represent an image). The proposed method improves word-driven probabilistic Latent Semantic Analysis (PLSA-words) up to ...
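
The enrichment step described above, pouring imagined keywords into an incomplete annotation based on keyword similarity, might look roughly like this. Raw co-occurrence counts stand in for whatever similarity measure the authors actually use, and the corpus is a toy example.

```python
from collections import Counter
from itertools import combinations

# Toy corpus of (incomplete) image annotations.
corpus = [
    {"sky", "plane", "cloud"},
    {"sky", "cloud", "sun"},
    {"grass", "horse"},
    {"sky", "plane"},
]

# Symmetric keyword co-occurrence counts across the corpus.
cooc = Counter()
for ann in corpus:
    for a, b in combinations(sorted(ann), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def enrich(annotation, vocab, min_cooc=2):
    """Add 'imagined' keywords: any vocabulary word that co-occurs
    often enough with a keyword already present."""
    imagined = {
        w for w in vocab - annotation
        if any(cooc[(w, k)] >= min_cooc for k in annotation)
    }
    return annotation | imagined

vocab = set().union(*corpus)
print(sorted(enrich({"sky"}, vocab)))  # ['cloud', 'plane', 'sky']
```

Both the given and the imagined keywords would then be passed to the topic-model training step.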

  18. InterMitoBase: An annotated database and analysis platform of protein-protein interactions for human mitochondria

    Directory of Open Access Journals (Sweden)

    Zhang Chenyu

    2011-06-01

    Full Text Available Abstract Background The mitochondrion is an essential organelle which plays important roles in diverse biological processes, such as metabolism, apoptosis, signal transduction and the cell cycle. Characterizing the protein-protein interactions (PPIs) that execute mitochondrial functions is fundamental to understanding the mechanisms underlying biological functions and diseases associated with mitochondria. Investigations examining mitochondria are expanding to the system level because of the accumulation of mitochondrial proteomes and the human interactome. Consequently, the development of a database that provides the entire protein interaction map of the human mitochondrion is urgently required. Results InterMitoBase provides a comprehensive interactome of human mitochondria. It contains the PPIs in biological pathways mediated by mitochondrial proteins, the PPIs between mitochondrial proteins and non-mitochondrial proteins, as well as the PPIs between mitochondrial proteins. The current version of InterMitoBase covers 5,883 non-redundant PPIs of 2,813 proteins, integrated from a wide range of resources including PubMed, KEGG, BioGRID, HPRD, DIP and IntAct. Comprehensive curation has been carried out on the interactions derived from PubMed. All the interactions in InterMitoBase are annotated according to the information collected from their original sources, GenBank and GO. Additionally, InterMitoBase features a user-friendly graphic visualization platform to present functional and topological analysis of the PPI networks identified. This should aid researchers in the study of underlying biological properties. Conclusions InterMitoBase is designed as an integrated PPI database which provides the most up-to-date PPI information for human mitochondria. It also works as a platform, integrating several on-line tools for PPI analysis. As an analysis platform and as a PPI database, InterMitoBase will be an important resource for the study of mitochondrial biochemistry.

  19. A New Improved Algorithm for Distributed Databases

    OpenAIRE

    K. Karpagam; Balasubramanian, R

    2011-01-01

    The development of the web and of data stores from disparate sources has contributed to the growth of very large data sources and distributed systems. Large amounts of data are stored in distributed databases, since it is difficult to store them in a single place on account of communication, efficiency and security. Research on mining association rules in distributed databases has more relevance in today's world. Recently, as the need to mine patterns across distributed databases has grown, Dist...

  20. VibrioBase: A Model for Next-Generation Genome and Annotation Database Development

    Directory of Open Access Journals (Sweden)

    Siew Woh Choo

    2014-01-01

    Full Text Available To facilitate the ongoing research of Vibrio spp., a dedicated platform for the Vibrio research community is needed to host the fast-growing amount of genomic data and to facilitate the analysis of these data. We present VibrioBase, a useful resource platform providing all basic features of a sequence database together with unique analysis tools that could be valuable for the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes in a user-friendly framework that enables the analysis of these genomic data, particularly in the field of comparative genomics. Besides general data-browsing features, VibrioBase offers analysis tools such as BLAST interfaces and the JBrowse genome browser. Other important features of this platform include our newly developed in-house tools: the pairwise genome comparison (PGC) tool and the pathogenomics profiling tool (PathoProT). The PGC tool is useful for the identification and comparative analysis of two genomes, whereas PathoProT is designed for comparative pathogenomics analysis of Vibrio strains. Both of these tools will enable researchers with little experience in bioinformatics to extract meaningful information from Vibrio genomes with ease. We have tested the validity and suitability of these tools and features for use in next-generation database development.

  1. CrAgDb--a database of annotated chaperone repertoire in archaeal genomes.

    Science.gov (United States)

    Rani, Shikha; Srivastava, Abhishikha; Kumar, Manish; Goel, Manisha

    2016-03-01

    Chaperones are a diverse class of ubiquitous proteins that assist other cellular proteins in folding correctly and maintaining their native structure. Many different chaperones cooperate to constitute the 'proteostasis' machinery in the cells. It has been proposed earlier that archaeal organisms could be ideal model systems for deciphering the basic functioning of the 'protein folding machinery' in higher eukaryotes. Several chaperone families have been characterized in archaea over the years but mostly one protein at a time, making it difficult to decipher the composition and mechanistics of the protein folding system as a whole. In order to deal with these lacunae, we have developed a database of all archaeal chaperone proteins, CrAgDb (Chaperone repertoire in Archaeal genomes). The data have been presented in a systematic way with intuitive browse and search facilities for easy retrieval of information. Access to these curated datasets should expedite large-scale analysis of archaeal chaperone networks and significantly advance our understanding of operation and regulation of the protein folding machinery in archaea. Researchers could then translate this knowledge to comprehend the more complex protein folding pathways in eukaryotic systems. The database is freely available at http://14.139.227.92/mkumar/cragdb/. PMID:26862144

  2. EST sequences and their annotation (amino acid sequence and results of homology search) - Dicty_cDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available EST sequences derived from five developmental stages and their annotations (amino acid sequence and homology search results, with target DBs: dicty EST-DB, DNA-DB and protein-DB). Record fields include Clone ID (ID of the cDNA clone), Atlas ID (ID of, and link to, the Atlas database at http://dictycdb.biol.tsukuba.ac.jp/~tools/bin/ISH/index.html) and NBRP ID (ID of the cDNA clone).

  3. EnzDP: improved enzyme annotation for metabolic network reconstruction based on domain composition profiles.

    Science.gov (United States)

    Nguyen, Nam-Ninh; Srihari, Sriganesh; Leong, Hon Wai; Chong, Ket-Fah

    2015-10-01

    Determining the entire complement of enzymes and their enzymatic functions is a fundamental step for reconstructing the metabolic network of cells. High-quality enzyme annotation helps in enhancing metabolic networks reconstructed from the genome, especially by reducing gaps and increasing the enzyme coverage. Currently, structure-based and network-based approaches can only cover a limited number of enzyme families, and the accuracy of homology-based approaches can be further improved. The bottom-up homology-based approach improves coverage by rebuilding Hidden Markov Model (HMM) profiles for all known enzymes. However, its clustering procedure relies heavily on BLAST similarity scores, ignoring protein domains/patterns, and is sensitive to changes in cut-off thresholds. Here, we use functional domain architecture to score the association between domain families and enzyme families (Domain-Enzyme Association Scoring, DEAS). The DEAS score is used to calculate the similarity between proteins, which is then used in the clustering procedure in place of the sequence similarity score. We improve the enzyme annotation protocol using a stringent classification procedure, and by choosing optimal threshold settings and checking for active sites. Our analysis shows that our stringent protocol EnzDP can cover up to 90% of enzyme families available in Swiss-Prot. It achieves a high accuracy of 94.5% based on five-fold cross-validation. EnzDP outperforms existing methods across several testing scenarios. Thus, EnzDP serves as a reliable automated tool for enzyme annotation and metabolic network reconstruction. Available at: www.comp.nus.edu.sg/~nguyennn/EnzDP . PMID:26542446
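
A protein-to-protein similarity computed from domain composition, the quantity EnzDP's clustering relies on, can be sketched with a plain Jaccard measure. This is a simplified stand-in, not the actual DEAS formula (which weights domain-enzyme associations), and the Pfam accessions are arbitrary examples.

```python
def domain_similarity(domains_a, domains_b):
    """Jaccard similarity over two proteins' domain compositions;
    a simplified stand-in for a DEAS-weighted similarity."""
    a, b = set(domains_a), set(domains_b)
    return len(a & b) / len(a | b) if a | b else 0.0

p1 = ["PF00561", "PF12697"]             # arbitrary example domain IDs
p2 = ["PF00561", "PF12697", "PF08386"]
p3 = ["PF00069"]
print(round(domain_similarity(p1, p2), 3))  # 0.667
print(domain_similarity(p1, p3))            # 0.0
```

Clustering on a score like this groups proteins by shared domain architecture rather than by raw BLAST similarity.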

  4. Novel transcriptome assembly and improved annotation of the whiteleg shrimp (Litopenaeus vannamei), a dominant crustacean in global seafood mariculture

    OpenAIRE

    Ghaffari, Noushin; Sanchez-Flores, Alejandro; Doan, Ryan; Garcia-Orozco, Karina D.; Chen, Patricia L.; Ochoa-Leyva, Adrian; Lopez-Zavala, Alonso A.; Carrasco, J. Salvador; Hong, Chris; Brieba, Luis G.; Rudiño-Piñera, Enrique; Blood, Philip D; Jason E. Sawyer; Charles D Johnson; Dindot, Scott V.

    2014-01-01

    We present a new transcriptome assembly of the Pacific whiteleg shrimp (Litopenaeus vannamei), the species most farmed for human consumption. Its functional annotation, a substantial improvement over previous ones, is provided freely. RNA-Seq with Illumina HiSeq technology was used to analyze samples extracted from shrimp abdominal muscle, hepatopancreas, gills and pleopods. We used the Trinity and Trinotate software suites for transcriptome assembly and annotation, respectively. The quality ...

  5. SigmoID: a user-friendly tool for improving bacterial genome annotation through analysis of transcription control signals.

    Science.gov (United States)

    Nikolaichik, Yevgeny; Damienikan, Aliaksandr U

    2016-01-01

    The majority of bacterial genome annotations are currently automated and based on a 'gene by gene' approach. Regulatory signals and operon structures are rarely taken into account which often results in incomplete and even incorrect gene function assignments. Here we present SigmoID, a cross-platform (OS X, Linux and Windows) open-source application aiming at simplifying the identification of transcription regulatory sites (promoters, transcription factor binding sites and terminators) in bacterial genomes and providing assistance in correcting annotations in accordance with regulatory information. SigmoID combines a user-friendly graphical interface to well known command line tools with a genome browser for visualising regulatory elements in genomic context. Integrated access to online databases with regulatory information (RegPrecise and RegulonDB) and web-based search engines speeds up genome analysis and simplifies correction of genome annotation. We demonstrate some features of SigmoID by constructing a series of regulatory protein binding site profiles for two groups of bacteria: Soft Rot Enterobacteriaceae (Pectobacterium and Dickeya spp.) and Pseudomonas spp. Furthermore, we inferred over 900 transcription factor binding sites and alternative sigma factor promoters in the annotated genome of Pectobacterium atrosepticum. These regulatory signals control putative transcription units covering about 40% of the P. atrosepticum chromosome. Reviewing the annotation in cases where it didn't fit with regulatory information allowed us to correct product and gene names for over 300 loci.

  6. SigmoID: a user-friendly tool for improving bacterial genome annotation through analysis of transcription control signals.

    Science.gov (United States)

    Nikolaichik, Yevgeny; Damienikan, Aliaksandr U

    2016-01-01

    The majority of bacterial genome annotations are currently automated and based on a 'gene by gene' approach. Regulatory signals and operon structures are rarely taken into account which often results in incomplete and even incorrect gene function assignments. Here we present SigmoID, a cross-platform (OS X, Linux and Windows) open-source application aiming at simplifying the identification of transcription regulatory sites (promoters, transcription factor binding sites and terminators) in bacterial genomes and providing assistance in correcting annotations in accordance with regulatory information. SigmoID combines a user-friendly graphical interface to well known command line tools with a genome browser for visualising regulatory elements in genomic context. Integrated access to online databases with regulatory information (RegPrecise and RegulonDB) and web-based search engines speeds up genome analysis and simplifies correction of genome annotation. We demonstrate some features of SigmoID by constructing a series of regulatory protein binding site profiles for two groups of bacteria: Soft Rot Enterobacteriaceae (Pectobacterium and Dickeya spp.) and Pseudomonas spp. Furthermore, we inferred over 900 transcription factor binding sites and alternative sigma factor promoters in the annotated genome of Pectobacterium atrosepticum. These regulatory signals control putative transcription units covering about 40% of the P. atrosepticum chromosome. Reviewing the annotation in cases where it didn't fit with regulatory information allowed us to correct product and gene names for over 300 loci. PMID:27257541

  7. SigmoID: a user-friendly tool for improving bacterial genome annotation through analysis of transcription control signals

    Science.gov (United States)

    Damienikan, Aliaksandr U.

    2016-01-01

    The majority of bacterial genome annotations are currently automated and based on a ‘gene by gene’ approach. Regulatory signals and operon structures are rarely taken into account which often results in incomplete and even incorrect gene function assignments. Here we present SigmoID, a cross-platform (OS X, Linux and Windows) open-source application aiming at simplifying the identification of transcription regulatory sites (promoters, transcription factor binding sites and terminators) in bacterial genomes and providing assistance in correcting annotations in accordance with regulatory information. SigmoID combines a user-friendly graphical interface to well known command line tools with a genome browser for visualising regulatory elements in genomic context. Integrated access to online databases with regulatory information (RegPrecise and RegulonDB) and web-based search engines speeds up genome analysis and simplifies correction of genome annotation. We demonstrate some features of SigmoID by constructing a series of regulatory protein binding site profiles for two groups of bacteria: Soft Rot Enterobacteriaceae (Pectobacterium and Dickeya spp.) and Pseudomonas spp. Furthermore, we inferred over 900 transcription factor binding sites and alternative sigma factor promoters in the annotated genome of Pectobacterium atrosepticum. These regulatory signals control putative transcription units covering about 40% of the P. atrosepticum chromosome. Reviewing the annotation in cases where it didn’t fit with regulatory information allowed us to correct product and gene names for over 300 loci. PMID:27257541
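
Binding-site profiles of the kind described above are, at heart, position weight matrices slid along the genome. A minimal sketch follows; the matrix and sequence are invented toys, not a real sigma-factor profile from SigmoID.

```python
# Per-position base weights for an invented 4-base motif "TTGA".
PWM = {
    0: {"T": 2, "A": 0, "C": 0, "G": 0},
    1: {"T": 2, "A": 0, "C": 0, "G": 0},
    2: {"G": 2, "A": 0, "C": 0, "T": 0},
    3: {"A": 2, "C": 1, "G": 0, "T": 0},
}

def scan(sequence, threshold=7):
    """Slide the profile along the sequence; report windows whose
    summed per-position weight reaches the threshold."""
    hits = []
    width = len(PWM)
    for i in range(len(sequence) - width + 1):
        score = sum(PWM[j][sequence[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, sequence[i:i + width], score))
    return hits

print(scan("ACTTGACGTTGC"))  # [(2, 'TTGA', 8), (8, 'TTGC', 7)]
```

Real tools work with log-odds scores against a background model, but the sliding-window scoring is the same idea.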

  8. Creating Annotation Tools with the Annotation Graph Toolkit

    OpenAIRE

    Maeda, Kazuaki; Bird, Steven; Ma, Xiaoyi; Lee, Haejoong

    2002-01-01

    The Annotation Graph Toolkit is a collection of software supporting the development of annotation tools based on the annotation graph model. The toolkit includes application programming interfaces for manipulating annotation graph data and for importing data from other formats. There are interfaces for the scripting languages Tcl and Python, a database interface, specialized graphical user interfaces for a variety of annotation tasks, and several sample applications. This paper describes all ...

  9. Improving integrative searching of systems chemical biology data using semantic annotation

    Directory of Open Access Journals (Sweden)

    Chen Bin

    2012-03-01

    Full Text Available Abstract Background Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued. Results We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology. Availability Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The document is available at http://chem2bio2owl.wikispaces.com.
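
The kind of semantic query an annotated RDF graph supports can be sketched with a tiny in-memory triple store, where None plays the role of a SPARQL variable. The entities and relation names are invented examples, not actual Chem2Bio2OWL terms.

```python
# Toy triple store: (subject, predicate, object) tuples.
triples = [
    ("aspirin", "targets", "PTGS1"),
    ("aspirin", "targets", "PTGS2"),
    ("PTGS2", "involved_in", "prostaglandin_pathway"),
    ("ibuprofen", "targets", "PTGS2"),
]

def query(s=None, p=None, o=None):
    """Match triples against a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Which compounds target a protein in the prostaglandin pathway?
pathway_proteins = {s for s, _, _ in query(p="involved_in",
                                           o="prostaglandin_pathway")}
drugs = {s for s, _, o in query(p="targets") if o in pathway_proteins}
print(sorted(drugs))  # ['aspirin', 'ibuprofen']
```

Chaining two such patterns is exactly the join a SPARQL basic graph pattern performs; the ontology's contribution is giving the predicates well-defined, machine-reasonable meaning.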

  10. Annotation of novel neuropeptide precursors in the migratory locust based on transcript screening of a public EST database and mass spectrometry

    Directory of Open Access Journals (Sweden)

    De Loof Arnold

    2006-08-01

    Full Text Available Abstract Background For holometabolous insects there has been an explosion of proteomic and peptidomic information thanks to large genome sequencing projects. Heterometabolous insects, although comprising many important species, have been far less studied. The migratory locust Locusta migratoria, a heterometabolous insect, is one of the most infamous agricultural pests. These locusts undergo a well-known and profound phase transition from the relatively harmless solitary form to a ferocious gregarious form. The underlying regulatory mechanisms of this phase transition are not fully understood, but neuropeptides are undoubtedly involved. However, neuropeptide research in locusts is hampered by the absence of genomic information. Results Recently, EST (Expressed Sequence Tag) databases from Locusta migratoria were constructed. Using bioinformatic tools, we searched these EST databases specifically for neuropeptide precursors. Based on known locust neuropeptide sequences, we confirmed the sequence of several previously identified neuropeptide precursors (i.e. pacifastin-related peptides), which consolidated our method. In addition, we found two novel neuroparsin precursors and annotated the hitherto unknown tachykinin precursor. Besides one of the known tachykinin peptides, this EST contained an additional tachykinin-like sequence. Using neuropeptide precursors from Drosophila melanogaster as a query, we succeeded in annotating the Locusta neuropeptide F, allatostatin-C and ecdysis-triggering hormone precursors, which until now had not been identified in locusts or in any other heterometabolous insect. For the tachykinin precursor, the ecdysis-triggering hormone precursor and the allatostatin-C precursor, translation of the predicted neuropeptides in neural tissues was confirmed with mass spectrometric techniques. Conclusion In this study we describe the annotation of 6 novel neuropeptide precursors and the neuropeptides they encode from the
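
    The transcript-screening step can be sketched as a motif scan over conceptual translations of EST contigs. The regular expression below (an FxGxR tachykinin-like core followed by a glycine amidation donor and a dibasic cleavage site) and the sequences are simplified illustrations, not the actual patterns or data used in the study.

```python
import re

# Sketch of EST screening for neuropeptide precursor candidates: scan
# translated EST sequences for a simplified tachykinin-like signature.
MOTIF = re.compile(r"F.G.RGK[KR]")

def screen(translations):
    """Return (est_id, match_position) pairs for candidate precursors."""
    hits = []
    for est_id, protein in translations.items():
        m = MOTIF.search(protein)
        if m:
            hits.append((est_id, m.start()))
    return hits

# Invented demo sequences: the first carries the signature, the second does not.
ests = {
    "EST001": "MKTLLVLAFAGMSLAKPSADFFGLRGKKSEEV",
    "EST002": "MSTNPKPQRKTKRNTNRRPQDVKFPGG",
}
candidates = screen(ests)
```

    A real screen would of course translate all six reading frames and use validated motif definitions, but the funnel from thousands of ESTs to a handful of candidates for mass-spectrometric confirmation has this shape.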

  11. Improving Network Scalability Using NoSql Database

    Directory of Open Access Journals (Sweden)

    Ankita Bhatewara, Kalyani Waghmare

    2012-12-01

    Full Text Available The traditional database is designed for structured data and complex queries. In the cloud environment, the scale of data is very large, the data is non-structured, and requests for the data are dynamic; these characteristics raise new challenges for data storage and administration. In this context, the NoSQL database comes into the picture. This paper discusses some non-structured databases. It also discusses the advantages and disadvantages of Cassandra and how Cassandra is used to improve the scalability of the network compared to RDBMS.
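
    A core mechanism behind the horizontal scalability of Cassandra-style stores is consistent hashing: each node owns an arc of a fixed hash ring, so adding or removing a node relocates only the keys on the affected arc instead of rehashing the whole dataset. The sketch below shows the idea with MD5-derived tokens; the node names and 32-bit ring are illustrative choices, not Cassandra's actual partitioner.

```python
import hashlib
from bisect import bisect_right

RING_BITS = 32  # illustrative ring size

def _token(s):
    """Deterministic position on the hash ring for a node name or key."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** RING_BITS)

class Ring:
    def __init__(self, nodes):
        # Each node is placed at its token; it owns keys up to its position.
        self._ring = sorted((_token(n), n) for n in nodes)

    def node_for(self, key):
        """First node clockwise from the key's token owns the key."""
        tokens = [t for t, _ in self._ring]
        i = bisect_right(tokens, _token(key)) % len(self._ring)
        return self._ring[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")
```

    Because key placement depends only on the hash ring, a client can route any request without a central coordinator, which is what lets such systems scale out where a single RDBMS instance cannot.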

  12. Image Annotation and Database Mining to Create a Novel Screen for the Chemotype-Dependent Crystallization of HCV NS3 Protease

    Energy Technology Data Exchange (ETDEWEB)

    H Klei; K Kish; M Russo; S Michalczyk; M Cahn; J Tredup; C Chang; J Khan; E Baldwin

    2011-12-31

    An effective process for screening, imaging, and optimizing crystallization trials using a combination of external and internal hardware and software has been deployed. The combination of this infrastructure with a vast annotated crystallization database enables the creation of custom crystallization screening strategies. Because of the strong chemotype-dependent crystallization observed with HCV NS3 protease (HCVPr), this strategy was applied to a chemotype resistant to all prior crystallization efforts. The crystallization database was mined for ingredients used to generate earlier HCVPr/inhibitor co-crystals. A random screen was created from the most prolific ingredients. A previously untested combination of proven ingredients was identified that led to a successful crystallization condition for the resistant chemotype.

  13. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Science.gov (United States)

    Ely, Bert; Scott, LaTia Etheredge

    2014-01-01

    Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome utilizing the computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons, since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks in third codon position GC content to the location of protein-coding regions could be used to verify the annotation of any genome that has a GC content greater than 60%.
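
    The verification signal the authors exploit is easy to compute: in a genome with overall GC content above 60%, genuine coding regions show elevated GC at the third codon position (GC3), because synonymous third-base choices drift toward the genomic bias. A minimal sketch, with an illustrative 0.60 threshold rather than the paper's exact criterion:

```python
# Sketch: GC content at third codon positions of an in-frame CDS,
# usable as a rough coding-vs-noncoding signal in GC-rich genomes.

def gc3(cds):
    """Fraction of G/C at third codon positions of an in-frame CDS."""
    third = cds[2::3].upper()  # every third base, starting at position 3
    return sum(b in "GC" for b in third) / len(third)

def looks_coding(cds, threshold=0.60):
    """Illustrative rule: flag sequences whose GC3 matches a GC-rich genome."""
    return gc3(cds) >= threshold

high = gc3("ATGGCGGACGGC")   # all third positions are G/C
low = gc3("ATGGCAGATGGA")    # mostly A/T third positions
```

    Applied in sliding windows, a profile like this lets spurious ORFs (whose GC3 resembles noncoding DNA) stand out against the genome-wide coding signal.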

  15. Fedora Content Modelling for Improved Services for Research Databases

    DEFF Research Database (Denmark)

    Elbæk, Mikael Karstensen; Heller, Alfred; Pedersen, Gert Schmeltz

    A re-implementation of the research database of the Technical University of Denmark, DTU, is based on Fedora. The backbone consists of content models for primary and secondary entities and their relationships, giving flexible and powerful extraction capabilities for interoperability and reporting. By adopting such an abstract data model, the platform enables new and improved services for researchers, librarians and administrators.

  16. An annotated bibliography on the Copper River Delta with emphasis on waterfowl habitat management and improvements

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — Annotated, coded bibliography based on holdings of Chugach National Forest office files, Alaska Dept. of Fish & Game Cordova office files, University of Alaska...

  17. Recent improvements to the SMART domain-based sequence annotation resource

    OpenAIRE

    Letunic, I.; Goodstadt, L.; Dickens, N.J.; Doerks, T.; Schultz, J; R. Mott; Ciccarelli, F.; Copley, R. R.; Ponting, C. P.; Bork, P.

    2002-01-01

    SMART (Simple Modular Architecture Research Tool, http://smart.embl-heidelberg.de) is a web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models....

  18. Expression profiling of hypothetical genes in Desulfovibrio vulgaris leads to improved functional annotation

    Energy Technology Data Exchange (ETDEWEB)

    Elias, Dwayne A.; Mukhopadhyay, Aindrila; Joachimiak, Marcin P.; Drury, Elliott C.; Redding, Alyssa M.; Yen, Huei-Che B.; Fields, Matthew W.; Hazen, Terry C.; Arkin, Adam P.; Keasling, Jay D.; Wall, Judy D.

    2008-10-27

    Hypothetical and conserved hypothetical genes account for >30 percent of the genes in sequenced bacterial genomes. For the sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough, 347 of the 3634 genes were annotated as conserved hypothetical (9.5 percent) along with 887 hypothetical genes (24.4 percent). Given this large fraction of the genome, it is plausible that some of these genes serve critical cellular roles. The study goals were to determine which genes were expressed and to provide a more functionally based annotation. To accomplish this, expression profiles of 1234 hypothetical and conserved genes were used from transcriptomic datasets of 11 environmental stresses, complemented with shotgun LC-MS/MS and AMT tag proteomic data. Genes were divided into putatively polycistronic operons and those predicted to be monocistronic, then classified by basal expression levels and grouped according to changes in expression for one or multiple stresses. 1212 of these genes were transcribed, with 786 producing detectable proteins. There was no evidence for expression of 17 predicted genes. Except for the latter, monocistronic gene annotation was expanded using the above criteria along with matching Clusters of Orthologous Groups. Polycistronic genes were annotated in the same manner, with inferences from their proximity to more confidently annotated genes. Two targeted deletion mutants were used as test cases to determine the relevance of the inferred functional annotations.
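
    The tiered use of transcript and protein evidence can be sketched as a simple classification. The field names and tier labels below are invented for illustration, not the study's actual annotation vocabulary: the point is only that a gene with both transcript and protein evidence earns a stronger annotation note than one with transcript evidence alone.

```python
# Sketch: rank hypothetical genes by the expression evidence supporting them.

def evidence_tier(gene):
    if gene["protein_detected"]:
        return "expressed protein"
    if gene["transcript_detected"]:
        return "transcribed, protein not detected"
    return "no expression evidence"

# Invented example records (locus tags mimic D. vulgaris style only).
genes = [
    {"id": "DVU0001", "transcript_detected": True,  "protein_detected": True},
    {"id": "DVU0002", "transcript_detected": True,  "protein_detected": False},
    {"id": "DVU0003", "transcript_detected": False, "protein_detected": False},
]
tiers = {g["id"]: evidence_tier(g) for g in genes}
```

    In the study this ranking was further refined by operon context and COG matches, but the evidence hierarchy is the backbone of the re-annotation.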

  19. GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations.

    Science.gov (United States)

    Sangrador-Vegas, Amaia; Mitchell, Alex L; Chang, Hsin-Yu; Yong, Siew-Yit; Finn, Robert D

    2016-01-01

    The removal of annotation from biological databases is often perceived as an indicator of erroneous annotation. As a corollary, annotation stability is considered to be a measure of reliability. However, diverse data-driven events can affect the stability of annotations in both primary protein sequence databases and the protein family databases that are built upon the sequence databases and used to help annotate them. Here, we describe some of these events and their consequences for the InterPro database, and demonstrate that annotation removal or reassignment is not always linked to incorrect annotation by the curator. Database URL: http://www.ebi.ac.uk/interpro.

  20. Facilitating functional annotation of chicken microarray data

    Directory of Open Access Journals (Sweden)

    Gresham Cathy R

    2009-10-01

    Full Text Available Abstract Background Modeling results from chicken microarray studies is challenging for researchers due to the limited functional annotation associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the biggest arrays and a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO). However, the GO annotation data presented by Affymetrix is incomplete; for example, it does not show references linked to manually annotated functions. In addition, there is no tool that enables microarray researchers to directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers a significant amount of time searching multiple GO databases for functional information. Results We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on the Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets we developed an Array GO Mapper (AGOM) tool to help researchers quickly retrieve corresponding functional information for their datasets. Conclusion Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray datasets into more reliable biological functional information by using the AGOM tool. The diseases, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies.
The GO annotation data generated will be available for public use via AgBase website and
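
    Functionally, an AGOM-style lookup is a join between a researcher's probeset list and an annotation table. The sketch below uses made-up probeset IDs and GO assignments to show the shape of the operation, including the report of probesets that remain unannotated.

```python
# Sketch: attach GO terms to a list of array probesets from an annotation
# table.  IDs and GO assignments are invented for illustration.

annotation = {
    "Gga.1234.1.S1_at": {"GO:0006412"},                 # translation
    "Gga.5678.1.S1_at": {"GO:0006955", "GO:0045087"},   # immune response
}

def map_go(probesets):
    """Return per-probeset GO terms plus the list of unannotated probesets."""
    mapped = {p: annotation.get(p, set()) for p in probesets}
    unannotated = [p for p, terms in mapped.items() if not terms]
    return mapped, unannotated

mapped, missing = map_go(["Gga.1234.1.S1_at", "Gga.9999.9.S9_at"])
```

    Reporting the unmapped probesets matters in practice: it tells the researcher exactly where the array's annotation coverage, rather than their experiment, is the limiting factor.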

  1. AraPPISite: a database of fine-grained protein-protein interaction site annotations for Arabidopsis thaliana.

    Science.gov (United States)

    Li, Hong; Yang, Shiping; Wang, Chuan; Zhou, Yuan; Zhang, Ziding

    2016-09-01

    Knowledge about protein interaction sites provides detailed information on protein-protein interactions (PPIs). To date, nearly 20,000 PPIs from Arabidopsis thaliana have been identified. Nevertheless, interaction site information has been largely missing from previously published PPI databases. Here, AraPPISite, a database that presents fine-grained interaction details for A. thaliana PPIs, is established. First, the experimentally determined 3D structures of 27 A. thaliana PPIs are collected from the Protein Data Bank database and the predicted 3D structures of 3023 A. thaliana PPIs are modeled by using two well-established template-based docking methods. For each experimental/predicted complex structure, AraPPISite not only provides an interactive user interface for browsing interaction sites, but also lists detailed evolutionary and physicochemical properties of these sites. Second, AraPPISite assigns domain-domain interactions or domain-motif interactions to 4286 PPIs whose 3D structures cannot be modeled. In this case, users can easily query protein interaction regions at the sequence level. AraPPISite is a free and user-friendly database, which does not require user registration or any configuration on local machines. We anticipate AraPPISite can serve as a helpful database resource for users with less experience in structural biology or protein bioinformatics to probe the details of PPIs, and thus accelerate the studies of plant genetics and functional genomics. AraPPISite is available at http://systbio.cau.edu.cn/arappisite/index.html . PMID:27338257

  3. EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome

    Directory of Open Access Journals (Sweden)

    Hamilton John P

    2007-10-01

    Full Text Available Abstract Background Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate (1) the submission of gene annotations to an annotation project, (2) the review of the submitted models by project annotators, and (3) the incorporation of the submitted models in the ongoing annotation effort. Results We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CAs) has been the annotation of gene families. Annotations can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules and the coordinates of these alignments, along with functional annotations, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website http://rice.tigr.org, as well as in the Community Annotation track of the Genome Browser. Conclusion We have applied EuCAP to rice. As of July 2007, the

  4. The Drosophila melanogaster PeptideAtlas facilitates the use of peptide data for improved fly proteomics and genome annotation

    Directory of Open Access Journals (Sweden)

    King Nichole L

    2009-02-01

    Full Text Available Abstract Background Crucial foundations of any quantitative systems biology experiment are correct genome and proteome annotations. Protein databases compiled from high-quality empirical protein identifications that are in turn based on correct gene models increase the correctness, sensitivity, and quantitative accuracy of systems biology genome-scale experiments. Results In this manuscript, we present the Drosophila melanogaster PeptideAtlas, a fly proteomics and genomics resource of unsurpassed depth. Based on peptide mass spectrometry data collected in our laboratory, the portal http://www.drosophila-peptideatlas.org allows querying fly protein data observed with respect to gene model confirmation and splice site verification, as well as for the identification of proteotypic peptides suited for targeted proteomics studies. Additionally, the database provides consensus mass spectra for observed peptides along with qualitative and quantitative information about the number of observations of a particular peptide and the sample(s) in which it was observed. Conclusion PeptideAtlas is an open access database for the Drosophila community that has several features and applications that support (1) reduction of the complexity inherently associated with performing targeted proteomic studies, (2) designing and accelerating shotgun proteomics experiments, (3) confirming or questioning gene models, and (4) adjusting gene models such that they are in line with observed Drosophila peptides. While the database consists of proteomic data, it is not required that the user be a proteomics expert.
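
    One preprocessing step such a resource depends on is an in-silico tryptic digest, from which candidate proteotypic peptides are drawn. Below is a minimal sketch using the standard cleave-after-K/R-but-not-before-P rule; the 7-25 residue window is an illustrative choice rather than PeptideAtlas's exact filter, and the demo protein is arbitrary.

```python
import re

# Sketch: in-silico tryptic digest.  Trypsin cleaves C-terminal to K or R,
# except when the next residue is P; lookbehind/lookahead splits accordingly.

def tryptic_peptides(protein, min_len=7, max_len=25):
    """Tryptic peptides within an MS-friendly length window."""
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in peptides if min_len <= len(p) <= max_len]

peps = tryptic_peptides("MKWVTFISLLFLFSSAYSRGVFRRDAHK")
```

    Peptides passing such a filter, once repeatedly observed in spectra, are the ones a targeted-proteomics assay would monitor for the parent gene model.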

  5. The Plant Ontology Database: A Community Resource for Plant Structure and Developmental Stages Controlled Vocabulary and Annotations

    Science.gov (United States)

    The Plant Ontology Consortium (POC, http://www.plantontology.org) is a collaborative effort among model plant genome databases and plant researchers that aims to create, maintain and facilitate the use of a controlled vocabulary (ontology) for plants. The ontology allows users to ascribe attributes o...

  6. Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae

    DEFF Research Database (Denmark)

    Vongsangnak, Wanwipa; Olsen, Peter; Hansen, Kim;

    2008-01-01

    Background: Since ancient times the filamentous fungus Aspergillus oryzae has been used in the fermentation industry for the production of fermented sauces and the production of industrial enzymes. Recently, the genome sequence of A. oryzae with 12,074 annotated genes was released but the number...

  7. Improving knowledge management through the support of image examination and data annotation using DICOM structured reporting.

    Science.gov (United States)

    Torres, José Salavert; Damian Segrelles Quilis, J; Espert, Ignacio Blanquer; García, Vicente Hernandez

    2012-12-01

    An important effort has been invested in improving the image diagnosis process in different medical areas using information technologies. The field of medical imaging involves two main data types: medical images and reports. Developments based on the DICOM standard have proved to be a convenient and widespread solution among the medical community. The main objective of this work is to design a Web application prototype that will be able to improve diagnosis and follow-up of breast cancer patients. It is based on the TRENCADIS middleware, which provides a knowledge-oriented storage model composed of federated repositories of DICOM image studies and DICOM-SR medical reports. The full structure and contents of the diagnosis reports are used as metadata for indexing images. The TRENCADIS infrastructure takes full advantage of Grid technologies by deploying multi-resource grid services that enable multiple views (report schemes) of the knowledge database. The paper presents a real deployment of this Web application prototype in the Dr. Peset Hospital, providing radiologists with a tool to create, store and search diagnostic reports based on breast cancer explorations (mammography, magnetic resonance, ultrasound, pre-surgery biopsy and post-surgery biopsy), improving support for diagnostic decisions. Technical details for use cases (outlining enhanced multi-resource grid service communication and processing steps) and interactions between actors and the deployed prototype are described. As a result, information is more structured, the logic is clearer, network messages have been reduced and, in general, the system is more resistant to failures. PMID:22841747
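
    The indexing idea is that, because DICOM-SR reports are structured, their coded fields can serve directly as searchable metadata for the associated image studies. The flat report schema below is invented for illustration and is far simpler than the actual DICOM-SR content tree, but it shows how structured reports turn free-text search into exact field matching.

```python
# Sketch: index image studies by coded fields from their structured reports.
# Field names and values are illustrative, not DICOM-SR codes.

reports = [
    {"study": "ST-001", "modality": "MG", "birads": 2, "finding": "cyst"},
    {"study": "ST-002", "modality": "MR", "birads": 4, "finding": "mass"},
    {"study": "ST-003", "modality": "MG", "birads": 5, "finding": "mass"},
]

def search(reports, **criteria):
    """Return study IDs whose report fields match all given criteria."""
    return [r["study"] for r in reports
            if all(r.get(k) == v for k, v in criteria.items())]

suspicious_mg = search(reports, modality="MG", finding="mass")
```

    In the deployed system the equivalent queries run across federated grid repositories, but the payoff is the same: a radiologist can retrieve prior studies by finding and assessment rather than by scanning report text.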

  9. Potential impacts of OCS oil and gas activities on fisheries. Volume 1. Annotated bibliography and database descriptions for target species distribution and abundance studies. Section 1, Part 1. Final report. [Outer Continental Shelf

    Energy Technology Data Exchange (ETDEWEB)

    Tear, L.M.

    1989-10-01

    The purpose of the volume is to present an annotated bibliography of unpublished and grey literature related to the distribution and abundance of select species of finfish and shellfish along the coasts of the United States. The volume also includes descriptions of databases that contain information related to target species' distribution and abundance. An index is provided at the end of each section to help the reader locate studies or databases related to a particular species.

  10. Potential impacts of OCS oil and gas activities on fisheries. Volume 1. Annotated bibliography and data-base descriptions for target-species distribution and abundance studies. Section 2. Final report. [Outer Continental Shelf

    Energy Technology Data Exchange (ETDEWEB)

    Tear, L.M.

    1989-10-01

    The purpose of the volume is to present an annotated bibliography of unpublished and grey literature related to the distribution and abundance of select species of finfish and shellfish along the coasts of the United States. The volume also includes description of databases that contain information related to target species' distribution and abundance. An index is provided at the end of each section to help the reader locate studies or databases related to a particular species.

  11. Potential impacts of OCS oil and gas activities on fisheries. Volume 1. Annotated bibliography and database descriptions for target-species distribution and abundance studies. Section 1, Part 2. Final report. [Outer Continental Shelf

    Energy Technology Data Exchange (ETDEWEB)

    Tear, L.M.

    1989-10-01

    The purpose of the volume is to present an annotated bibliography of unpublished and grey literature related to the distribution and abundance of select species of finfish and shellfish along the coasts of the United States. The volume also includes descriptions of databases that contain information related to target species' distribution and abundance. An index is provided at the end of each section to help the reader locate studies or databases related to a particular species.

  12. Annotated English

    CERN Document Server

    Hernandez-Orallo, Jose

    2010-01-01

    This document presents Annotated English, a system of diacritical symbols which turns English pronunciation into a precise and unambiguous process. The annotations are defined and located in such a way that the original English text is not altered (not even a letter), thus allowing for consistent reading and learning of the English language with and without annotations. The annotations are based on a set of general rules that keep the frequency of annotations from becoming dramatically high. This helps the reader associate annotations with exceptions, and makes it possible to shape, internalise and consolidate some rules for the English language which otherwise are weakened by the enormous number of exceptions in English pronunciation. The advantages of this annotation system are manifold. Any existing text can be annotated without a significant increase in size. This means that we can get an annotated version of any document or book with the same number of pages and font size. Since no letter is affected, the ...

  13. Algal functional annotation tool

    Energy Technology Data Exchange (ETDEWEB)

    2012-07-12

    Abstract BACKGROUND: Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. DESCRIPTION: The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes
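
    The enrichment search such a suite offers typically reduces to a hypergeometric tail probability: given that K of the N genes in the genome carry a term, how surprising is it that k of the n genes in the user's list carry it? A sketch with toy numbers (not drawn from the Chlamydomonas annotation):

```python
from math import comb

# Sketch: hypergeometric tail P(X >= k) used for term-enrichment scoring.

def hypergeom_tail(N, K, n, k):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    return sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    ) / comb(N, n)

# Toy example: 5 of 20 selected genes carry a term held by 50 of 1000 genes.
p = hypergeom_tail(N=1000, K=50, n=20, k=5)
```

    With an expected count of about one (20 x 50/1000), observing five annotated genes gives a small p-value, which is exactly the signal an annotation suite uses to surface the "underlying biological themes" of a gene list.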

  14. Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis.

    Science.gov (United States)

    Blohm, Philipp; Frishman, Goar; Smialowski, Pawel; Goebels, Florian; Wachinger, Benedikt; Ruepp, Andreas; Frishman, Dmitrij

    2014-01-01

    Knowledge about non-interacting proteins (NIPs) is important for training algorithms to predict protein-protein interactions (PPIs) and for assessing the false positive rates of PPI detection efforts. We present the second version of Negatome, a database of proteins and protein domains that are unlikely to engage in physical interactions (available online at http://mips.helmholtz-muenchen.de/proj/ppi/negatome). Negatome is derived by manual curation of the literature and by analyzing three-dimensional structures of protein complexes. The main methodological innovation in Negatome 2.0 is the use of an advanced text mining procedure to guide the manual annotation process. Potential non-interactions were identified by a modified version of Excerbt, a text mining tool based on semantic sentence analysis. Manual verification shows that nearly half of the text mining results with the highest confidence values correspond to NIP pairs. Compared to the first version, the contents of the database have grown by over 300%.

  15. New tools and improvements in the Exoplanet Transit Database

    Directory of Open Access Journals (Sweden)

    Pejcha O.

    2011-02-01

    Full Text Available The comprehensive collection of available light curves, the prediction possibilities and the online model-fitting procedure available via the Exoplanet Transit Database have become very popular in the community. In this paper we summarize the changes that we made to the ETD during the last year (including the Kepler candidates in the prediction section, modeling of an unknown planet in the model-fit section, and some other small improvements). All these new tools cannot be found in the main ETD paper.

  16. Databases

    Data.gov (United States)

    National Aeronautics and Space Administration — The databases of computational and experimental data from the first Aeroelastic Prediction Workshop are located here. The database file names tell their contents...

  17. Databases

    Digital Repository Service at National Institute of Oceanography (India)

    Kunte, P.D.

    Information on bibliographic as well as numeric/textual databases relevant to coastal geomorphology has been included in a tabular form. Databases cover a broad spectrum of related subjects like coastal environment and population aspects, coastline...

  18. Information Retrieval in Domain-Specific Databases: An Analysis To Improve the User Interface of the Alcohol Studies Database.

    Science.gov (United States)

    Jantz, Ronald

    2003-01-01

    Describes the methodology and results of the log analysis for the Alcohol Studies Database (ASDB), a domain-specific database supported by the Center of Alcohol Studies at Rutgers University Libraries. The objectives were to better understand user search behavior, to analyze failure rates, and to develop approaches for improving the user…

  19. AN ENCRYPTION ALGORITHM FOR IMPROVING DATABASE SECURITY USING ROT & REA

    Directory of Open Access Journals (Sweden)

    M. Sujitha

    2015-06-01

    Full Text Available A database is an organized collection of data, and many users want to store their personal and confidential data in such databases. Unauthorized persons may try to obtain data from the database and misuse it without the owner's knowledge. To overcome such problems, an advanced control mechanism, known as database security, was introduced. Encryption algorithms are one way to protect a database from the various threats or hackers who target confidential information. This paper discusses the proposed encryption algorithm for securing such databases.
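
The abstract names "ROT & REA" but does not describe either algorithm, so as a generic illustration of the rotation-cipher family the title invokes, here is a minimal ROT-style sketch (function names are hypothetical, and a fixed-shift cipher offers no real security):

```python
def rot_encrypt(plaintext: str, shift: int = 13) -> str:
    """Rotate alphabetic characters by a fixed shift (Caesar/ROT family);
    non-letters pass through unchanged."""
    out = []
    for ch in plaintext:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr(base + (ord(ch) - base + shift) % 26))
        else:
            out.append(ch)
    return ''.join(out)

def rot_decrypt(ciphertext: str, shift: int = 13) -> str:
    """Decrypt by rotating in the opposite direction."""
    return rot_encrypt(ciphertext, -shift % 26)
```

With shift 13 the transform is its own inverse (ROT13), e.g. `rot_encrypt("Secret", 13)` yields `"Frperg"`.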

  20. Improving the reliability of material databases using multiscale approaches

    CERN Document Server

    Rollet, Y; Carrère, N; Leroy, F -H; Maire, J -F

    2007-01-01

    This article addresses the propagation of constitutive uncertainties between scales occurring in the multiscale modelling of fibre-reinforced composites. The amplification of such uncertainties through upward or downward transitions by a homogenisation model is emphasized and exemplified with the Mori-Tanaka model. In particular, the sensitivity to data uncertainty in the inverse determination of constituent parameters based on downward transitions is stressed on an example. Then a database improvement method, which exploits simultaneously the available information on constitutive uncertainties at all scales instead of just propagating those associated with one scale, is presented and shown to yield substantial reductions in uncertainty for both the constitutive parameters and the response of structures. The latter finding is demonstrated on two examples of structures, with significant gains in confidence obtained on both.

  1. Improvement WSD Dictionary Using Annotated Corpus and Testing it with Simplified Lesk Algorithm

    Directory of Open Access Journals (Sweden)

    Ahmed H. Aliwy

    2015-02-01

    Full Text Available WSD is a task with a long history in computational linguistics and an open problem in NLP. This research focuses on increasing the accuracy of the Lesk algorithm with the assistance of an annotated corpus, using Narodowy Korpus Jezyka Polskiego (NKJP, "Polish National Corpus"). The NKJP_WSI (NKJP Word Sense Inventory) is used as the sense inventory. The Lesk algorithm is first implemented on the whole corpus (training and test) to obtain baseline results, with the assistance of a special dictionary that contains all possible senses for each ambiguous word. In this implementation, the similarity equation from information retrieval is applied, using tf-idf with a small modification in order to meet the requirements. Experimental results show accuracies of 82.016% and 84.063% without and with deleting stop words, respectively. Moreover, this paper practically addresses the challenge of execution time: we propose a special structure for building a second dictionary from the corpus in order to reduce the time complexity of the training process. The new dictionary contains all the possible words (only those which help in solving WSD) with their tf-idf values from the existing dictionary, built with the assistance of the annotated corpus. Furthermore, experimental results show that the two tests are identical in accuracy, while the execution time of the second test dropped by a factor of 20 compared to the first test.

  2. Databases

    Directory of Open Access Journals (Sweden)

    Nick Ryan

    2004-01-01

    Full Text Available Databases are deeply embedded in archaeology, underpinning and supporting many aspects of the subject. However, as well as providing a means for storing, retrieving and modifying data, databases themselves must be a result of a detailed analysis and design process. This article looks at this process, and shows how the characteristics of data models affect the process of database design and implementation. The impact of the Internet on the development of databases is examined, and the article concludes with a discussion of a range of issues associated with the recording and management of archaeological data.

  3. Leveraging Genomic Annotations and Pleiotropic Enrichment for Improved Replication Rates in Schizophrenia GWAS

    DEFF Research Database (Denmark)

    Wang, Yunpeng; Thompson, Wesley K; Schork, Andrew J;

    2016-01-01

    Most of the genetic architecture of schizophrenia (SCZ) has not yet been identified. Here, we apply a novel statistical algorithm called Covariate-Modulated Mixture Modeling (CM3), which incorporates auxiliary information (heterozygosity, total linkage disequilibrium, genomic annotations...)... meta-analysis sub-studies into training and replication samples. We fit a scale mixture of two Gaussians model to each stratum, obtaining parameter estimates that minimize the sum of squared differences of the scale-mixture model with the stratified nonparametric estimates. We apply this approach to the recent genome-wide association study (GWAS) of SCZ (n = 82,315), obtaining a good fit between the model-based and observed effect sizes and replication probabilities. We observed that SNPs with low enrichment scores replicate with a lower probability than SNPs with high enrichment scores even when both...

  4. The Ensembl gene annotation system.

    Science.gov (United States)

    Aken, Bronwen L; Ayling, Sarah; Barrell, Daniel; Clarke, Laura; Curwen, Valery; Fairley, Susan; Fernandez Banet, Julio; Billis, Konstantinos; García Girón, Carlos; Hourlier, Thibaut; Howe, Kevin; Kähäri, Andreas; Kokocinski, Felix; Martin, Fergal J; Murphy, Daniel N; Nag, Rishi; Ruffier, Magali; Schuster, Michael; Tang, Y Amy; Vogel, Jan-Hinnerk; White, Simon; Zadissa, Amonida; Flicek, Paul; Searle, Stephen M J

    2016-01-01

    The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html. PMID:27337980

  5. Radio Station Digital Audio Database Automatic Annotation and Retrieval Technology%广播电台数字音频资料库的自动标注及检索技术研究

    Institute of Scientific and Technical Information of China (English)

    王晓艳; 梁晋春; 姚颖颖; 马艳

    2013-01-01

    With the development of broadcasting, the amount of audio data held by radio stations has increased dramatically, and the demand for automation and efficient retrieval in the management of radio program content is growing. Based on an analysis of radio digital audio database construction and of the problems in content management, this paper focuses on several key technologies involved in the automatic annotation and retrieval of audio metadata, in the expectation that they can be applied directly to the content management of broadcast programs and improve work efficiency.

  6. Contig sequences and their annotation (amino acid sequence and results of homology search), and expression profile - Dicty_cDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available [ Credits ] BLAST Search Image Search Home About Archive Update History Contact us Dicty_cDB Contig sequences and their annotation (amino acid sequence and results of homology search), and expression profile Data detail Data name Contig sequences and their annotation (amino acid sequence and results of homology search), and expression profile Description of data contents Contig sequences of cDNA sequences of Dictyostelium discoideum and their annotation (amino acid sequence and resul

  7. Algal functional annotation tool

    Energy Technology Data Exchange (ETDEWEB)

    Lopez, D. [UCLA; Casero, D. [UCLA; Cokus, S. J. [UCLA; Merchant, S. S. [UCLA; Pellegrini, M. [UCLA

    2012-07-01

    The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes on KEGG pathway maps and batch gene identifier conversion.

  8. Supporting Keyword Search for Image Retrieval with Integration of Probabilistic Annotation

    Directory of Open Access Journals (Sweden)

    Tie Hua Zhou

    2015-05-01

    Full Text Available The ever-increasing quantities of digital photo resources are annotated with enriching vocabularies to form semantic annotations. Photo-sharing social networks have boosted the need for efficient and intuitive querying to respond to user requirements in large-scale image collections. In order to help users formulate efficient and effective image retrieval, we present a novel integration of a probabilistic model based on keyword query architecture that models the probability distribution of image annotations, allowing users to obtain satisfactory results from image retrieval via the integration of multiple annotations. We focus on the annotation integration step in order to specify the meaning of each image annotation, thus leading to the most representative annotations of the intent of a keyword search. For this demonstration, we show how a probabilistic model has been integrated with semantic annotations to allow users to intuitively define explicit and precise keyword queries in order to retrieve satisfactory image results distributed in heterogeneous large data sources. Our experiments on the SBU database (collected by Stony Brook University) show that (i) our integrated annotation contains higher-quality representatives and semantic matches; and (ii) annotation integration can indeed improve image search result quality.
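
One hedged way to read "integration of multiple annotations" is to combine per-annotation probabilities into a single query score, e.g. scoring an image by the probability that at least one of its annotations matches each keyword. This noisy-OR scheme is an assumption for illustration, not the paper's actual model:

```python
def rank_images(query_terms, image_annotations):
    """Rank images for a keyword query.

    image_annotations: dict image_id -> {annotation_label: probability}.
    Score = product over query terms of the probability that at least
    one matching annotation is correct (a noisy-OR per term)."""
    scores = {}
    for image, annots in image_annotations.items():
        score = 1.0
        for term in query_terms:
            p_miss = 1.0
            for label, p in annots.items():
                if label == term:
                    p_miss *= (1.0 - p)
            score *= (1.0 - p_miss)   # 0 if no annotation matches the term
        scores[image] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

An image annotated {"beach": 0.9, "sunset": 0.8} scores 0.72 for the query ["beach", "sunset"], outranking images that match only one term.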

  9. Next Generation Models for Storage and Representation of Microbial Biological Annotation

    Energy Technology Data Exchange (ETDEWEB)

    Quest, Daniel J [ORNL; Land, Miriam L [ORNL; Brettin, Thomas S [ORNL; Cottingham, Robert W [ORNL

    2010-01-01

    Background Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way. Results Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files. Conclusions The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to
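
The abstract's idea of TURTLE as a human-readable yet semantically-aware counterpart to a GenBank/EMBL record can be illustrated with a small fragment. The `gene:` namespace and property names here are hypothetical and the coordinates are illustrative; SO:0000704 is the Sequence Ontology class for "gene":

```turtle
@prefix gene: <http://example.org/genome/> .
@prefix so:   <http://purl.obolibrary.org/obo/SO_> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# One gene feature expressed as triples: the same facts a GenBank
# feature-table row carries, but queryable and ontology-linked.
gene:b0001 a so:0000704 ;            # SO:0000704 = "gene"
    rdfs:label "thrL" ;
    gene:startPosition 190 ;
    gene:endPosition   255 ;
    gene:strand        "+" ;
    gene:product       "thr operon leader peptide" .
```

Because each statement is a subject-predicate-object triple, annotation from multiple databases can be merged by simple graph union, which is the decoupling the authors describe.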

  10. Shared Context for Improving Collaboration in Database Administration

    Directory of Open Access Journals (Sweden)

    Hassane Tahir

    2013-05-01

    Full Text Available Database administrators (DBAs) and experts face a large spectrum of procedures in order to ensure the ongoing operational functionality and efficiency of their organization's databases and the applications that access those databases. Unfortunately, these procedures cannot be used directly in a multitude of specific situations and contexts. To deal with situation specificity and the context at hand, DBAs often cooperate with different actors such as developers, system administrators, data administrators, end users, etc. However, communication processes are often complex because (1) actors come from domains different from the DBA's domain (e.g. business users), and (2) the context in which a database activity (e.g. incident solving) occurs may not be shared and understood in the same way by all actors. The paper presents how to make shared context explicit in a cooperative work, together with an analysis of the cooperative work from the viewpoint of one of the actors. Making context explicit is possible through a formalism allowing a uniform representation of elements of knowledge, reasoning and contexts, like the Contextual Graphs formalism. Currently, we are developing an experience base that will be used by a system to support DBAs.

  11. Improved search of PCA databases for spectro-polarimetric inversion

    CERN Document Server

    Casini, R; Lites, B W; Ariste, A Lopez

    2013-01-01

    We describe a simple technique for the acceleration of spectro-polarimetric inversions based on principal component analysis (PCA) of Stokes profiles. This technique involves the indexing of the database models based on the sign of the projections (PCA coefficients) of the first few relevant orders of principal components of the four Stokes parameters. In this way, each model in the database can be attributed a distinctive binary number of $2^{4n}$ bits, where $n$ is the number of PCA orders used for the indexing. Each of these binary numbers (indexes) identifies a group of "compatible" models for the inversion of a given set of observed Stokes profiles sharing the same index. The complete set of the binary numbers so constructed evidently determines a partition of the database. The search of the database for the PCA inversion of spectro-polarimetric data can profit greatly from this indexing. In practical cases it becomes possible to approach the ideal acceleration factor of $2^{4n}$ as compared to the syste...
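
The indexing scheme described can be sketched directly: take the signs of the projections onto the first n principal components for each of the four Stokes parameters, concatenate them into a 4n-bit integer, and partition the model database by that index. The function names and data layout below are assumptions:

```python
def pca_sign_index(coeffs, n_orders):
    """Build the 2^(4n)-valued binary index from the signs of the first
    n_orders PCA coefficients of each of the four Stokes parameters.

    coeffs: dict mapping 'I', 'Q', 'U', 'V' to sequences of PCA
    projection coefficients, most significant order first."""
    bits = []
    for stokes in ('I', 'Q', 'U', 'V'):
        for c in coeffs[stokes][:n_orders]:
            bits.append('1' if c >= 0 else '0')
    return int(''.join(bits), 2)

def partition_database(models, n_orders):
    """Group database models by sign index; inverting an observed set of
    Stokes profiles then searches only the group sharing its index."""
    groups = {}
    for model_id, coeffs in models.items():
        groups.setdefault(pca_sign_index(coeffs, n_orders), []).append(model_id)
    return groups
```

If the indexed models are spread evenly across the 2^(4n) groups, the search touches only a 2^(-4n) fraction of the database, which is the ideal acceleration factor the abstract cites.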

  12. WOVOdat, A Worldwide Volcano Unrest Database, to Improve Eruption Forecasts

    Science.gov (United States)

    Widiwijayanti, C.; Costa, F.; Win, N. T. Z.; Tan, K.; Newhall, C. G.; Ratdomopurbo, A.

    2015-12-01

    WOVOdat is the World Organization of Volcano Observatories' Database of Volcanic Unrest: an international effort to develop common standards for compiling and storing data on volcanic unrest in a centralized database, freely accessible on the web for reference during volcanic crises, comparative studies, and basic research on pre-eruption processes. WOVOdat will be to volcanology as an epidemiological database is to medicine. Despite the large spectrum of monitoring techniques, interpreting monitoring data throughout the evolution of unrest and making timely forecasts remain the most challenging tasks for volcanologists. The field of eruption forecasting is becoming more quantitative, based on the understanding of pre-eruptive magmatic processes and the dynamic interaction between the variables at play in a volcanic system. Such forecasts must also acknowledge and express their uncertainties; therefore most current research in this field focuses on the application of event-tree analysis to reflect multiple possible scenarios and the probability of each scenario. Such forecasts are critically dependent on comprehensive and authoritative global volcano unrest data sets - the very information currently collected in WOVOdat. As the database becomes more complete, Boolean searches, side-by-side digital (and thus scalable) comparisons of unrest, and pattern recognition will generate reliable results. Statistical distributions obtained from WOVOdat can then be used to estimate the probabilities of each scenario after specific patterns of unrest. We have established the main web interface for data submission and visualization, and have now incorporated ~20% of worldwide unrest data into the database, covering more than 100 eruptive episodes. In the upcoming years we will concentrate on acquiring data from volcano observatories, developing a robust data query interface, optimizing data mining, and creating tools by which WOVOdat can be used for probabilistic eruption

  13. Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence

    Directory of Open Access Journals (Sweden)

    Dorrell Nick

    2007-06-01

    Full Text Available Abstract Background Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation. Results Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised. Conclusions Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes.

  14. AcEST(EST sequences of Adiantum capillus-veneris and their annotation) - AcEST | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available [ Credits ] BLAST Search Image Search Home About Archive Update History Contact us AcEST AcEST (EST sequences of Adiantum capillus-veneris and their annotation) Data detail Data name AcEST (EST sequences of Adiantum capillus-veneris and their annotation) Description of data contents EST sequence of A...db/view/archive_acest#en Data acquisition method Capillary sequencer Data analysi...atabases) Number of data entries Adiantum capillus-veneris ESTs: 30,540. Data item Description Clone id Clone ID of EST sequence

  15. Annotated Answer Set Programming

    OpenAIRE

    Straccia, Umberto

    2005-01-01

    We present Annotated Answer Set Programming, which extends the expressive power of disjunctive logic programming with annotation terms, taken from the generalized annotated logic programming framework.

  16. Annotation Method (AM): SE22_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available ...together with predicted molecular formulae and putative structures, were provided as metabolite annotations. Comparison with public databases was performed. A grading system was introduced to describe the evidence supporting the annotations. ...

  17. Genome re-annotation: a wiki solution?

    OpenAIRE

    Salzberg, Steven L.

    2007-01-01

    The annotation of most genomes becomes outdated over time, owing in part to our ever-improving knowledge of genomes and in part to improvements in bioinformatics software. Unfortunately, annotation is rarely if ever updated and resources to support routine reannotation are scarce. Wiki software, which would allow many scientists to edit each genome's annotation, offers one possible solution.

  18. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences

    DEFF Research Database (Denmark)

    Huerta-Cepas, Jaime; Szklarczyk, Damian; Forslund, Kristoffer;

    2016-01-01

    eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making ...

  19. tRNA sequence data, annotation data and curation data - tRNADB-CE | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available [ Credits ] BLAST Search Image Search Home About Archive Update History Contact us tRNADB-CE tRNA sequence data, annotation data and curation data Data detail Data name tRNA sequence data, an... first intron 1st Intron end position End position of first intron Seq tRNA sequence Upstream seq. tRNA gene upstream sequence ...-leaf secondary structures of tRNA gene Downstream seq. tRNA gene downstream sequence (10 bps) 1st Intron seq. First intron sequence ...nd position of second intron 2nd Intron seq. Second intron sequence Decision from

  20. An Improved Algorithm for Generating Database Transactions from Relational Algebra Specifications

    CERN Document Server

    Dougherty, Daniel J

    2010-01-01

    Alloy is a lightweight modeling formalism based on relational algebra. In prior work with Fisler, Giannakopoulos, Krishnamurthi, and Yoo, we have presented a tool, Alchemy, that compiles Alloy specifications into implementations that execute against persistent databases. The foundation of Alchemy is an algorithm for rewriting relational algebra formulas into code for database transactions. In this paper we report on recent progress in improving the robustness and efficiency of this transformation.

  1. An Improved Algorithm for Generating Database Transactions from Relational Algebra Specifications

    OpenAIRE

    Dougherty, Daniel J.

    2010-01-01

    Alloy is a lightweight modeling formalism based on relational algebra. In prior work with Fisler, Giannakopoulos, Krishnamurthi, and Yoo, we have presented a tool, Alchemy, that compiles Alloy specifications into implementations that execute against persistent databases. The foundation of Alchemy is an algorithm for rewriting relational algebra formulas into code for database transactions. In this paper we report on recent progress in improving the robustness and efficiency of this transforma...

  2. An Improved Algorithm for Generating Database Transactions from Relational Algebra Specifications

    Directory of Open Access Journals (Sweden)

    Daniel J. Dougherty

    2010-03-01

    Full Text Available Alloy is a lightweight modeling formalism based on relational algebra. In prior work with Fisler, Giannakopoulos, Krishnamurthi, and Yoo, we have presented a tool, Alchemy, that compiles Alloy specifications into implementations that execute against persistent databases. The foundation of Alchemy is an algorithm for rewriting relational algebra formulas into code for database transactions. In this paper we report on recent progress in improving the robustness and efficiency of this transformation.

  3. Annotated bibliography

    International Nuclear Information System (INIS)

    Under a cooperative agreement with the U.S. Department of Energy's Office of Science and Technology, Waste Policy Institute (WPI) is conducting a five-year research project to develop a research-based approach for integrating communication products in stakeholder involvement related to innovative technology. As part of the research, WPI developed this annotated bibliography, which contains almost 100 citations of articles/books/resources involving topics related to communication and public involvement aspects of deploying innovative cleanup technology. To compile the bibliography, WPI performed on-line literature searches (e.g., Dialog, International Association of Business Communicators, Public Relations Society of America, Chemical Manufacturers Association, etc.), consulted past years' proceedings of major environmental waste cleanup conferences (e.g., Waste Management), networked with professional colleagues and DOE sites to gather reports or case studies, and received input during the August 1996 Research Design Team meeting held to discuss the project's research methodology. Articles were selected for annotation based upon their perceived usefulness to the broad range of public involvement and communication practitioners

  4. An on-line scaling method for improving scalability of a database cluster

    Institute of Scientific and Technical Information of China (English)

    JANG Yong-ll; LEE Chung-ho; LEE Jae-dong; BAE Hae-young

    2004-01-01

    The explosive growth of the Internet and of database applications has driven databases to be more scalable and available, and to support on-line scaling without interrupting service. To support more clients' queries without downtime and without degrading response time, more nodes have to be added while the database is running. This paper presents an overview of a scalable and available database that satisfies the above characteristics, and we propose a novel on-line scaling method. Our method improves on the existing on-line scaling method for faster response times and higher throughput. It reduces unnecessary network use, i.e., it decreases the amount of data copying by reusing backup data. Also, our on-line scaling operation can proceed in parallel by selecting adequate nodes as the new node's copy sources. Our performance study shows that our method results in a significant reduction in data copy time.
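
A sketch of the copy-planning idea the abstract describes: source each partition's copy from a backup replica so primaries keep serving queries, and spread sources across distinct nodes so copies can run in parallel. The data structures here are invented for illustration:

```python
def plan_copy_tasks(partitions):
    """Plan data copies for partitions moving to a newly added node.

    partitions: dict partition_id -> {'primary': node, 'backups': [nodes]}.
    Prefers an idle backup replica as the copy source (falling back to
    any backup, then the primary), so that primaries stay available and
    tasks with distinct sources can run in parallel.
    Returns a list of (partition_id, source_node) tasks."""
    tasks = []
    busy = set()
    for pid, info in partitions.items():
        sources = info['backups'] or [info['primary']]
        idle = [n for n in sources if n not in busy]
        src = idle[0] if idle else sources[0]
        busy.add(src)
        tasks.append((pid, src))
    return tasks
```

With two partitions whose backups live on nodes B and {B, C}, the planner picks B for the first copy and C for the second, so both transfers can overlap instead of queuing on one node.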

  5. Towards a Library of Standard Operating Procedures (SOPs) for (meta)genomic annotation

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Angiuoli, Samuel V.; Cochrane, Guy; Field, Dawn; Garrity, George; Gussman, Aaron; Kodira, Chinnappa D.; Klimke, William; Kyrpides, Nikos; Madupu, Ramana; Markowitz, Victor; Tatusova, Tatiana; Thomson, Nick; White, Owen

    2008-04-01

    Genome annotations describe the features of genomes and accompany sequences in genome databases. The methodologies used to generate genome annotation are diverse and typically vary amongst groups. Descriptions of the annotation procedure are helpful in interpreting genome annotation data. Standard Operating Procedures (SOPs) for genome annotation describe the processes that generate genome annotations. Some groups are currently documenting procedures but standards are lacking for structure and content of annotation SOPs. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse a central online repository of SOPs.

  6. Structured Max Margin Learning on Image Annotation and Multimodal Image Retrieval

    OpenAIRE

    Guo, Zhen; Xing, Eric P.; Faloutsos, Christos; Zhongfei,

    2010-01-01

    In this chapter, we discuss a multimodal framework for image annotation and retrieval based on the max margin approach. The whole problem is mapped to a quadratic programming problem. Our framework is highly scalable in the sense that it takes constant time to accommodate a database update, without retraining on the database from scratch. The evaluation results show significant performance improvements over a state-of-the-art method.

  7. Lessons learned while building the Deepwater Horizon Database: Toward improved data sharing in coastal science

    Science.gov (United States)

    Thessen, Anne E.; McGinnis, Sean; North, Elizabeth W.

    2016-02-01

    Process studies and coupled-model validation efforts in geosciences often require integration of multiple data types across time and space. For example, improved prediction of hydrocarbon fate and transport is an important societal need that fundamentally relies upon synthesis of oceanography and hydrocarbon chemistry. Yet, there are no publicly accessible databases that integrate these diverse data types in a georeferenced format, nor are there guidelines for developing such a database. The objective of this research was to analyze the process of building one such database to provide baseline information on data sources and data sharing and to document the challenges and solutions that arose during this major undertaking. The resulting Deepwater Horizon Database was approximately 2.4 GB in size and contained over 8 million georeferenced data points collected from industry, government databases, volunteer networks, and individual researchers. The major technical challenges that were overcome were reconciliation of terms, units, and quality flags, which was necessary to effectively integrate the disparate data sets. Assembling this database required the development of relationships with individual researchers and data managers, which often involved extensive e-mail contact. The average number of emails exchanged per data set was 7.8. Of the 95 relevant data sets that were discovered, 38 (40%) were obtained, either in whole or in part. Over one third (36%) of the requests for data went unanswered. The majority of responses were received after the first request (64%) and within the first week of the first request (67%). Although fewer than half of the potentially relevant datasets were incorporated into the database, the level of sharing (40%) was high compared to some other disciplines where sharing can be as low as 10%. Our suggestions for building integrated databases include budgeting significant time for e-mail exchanges, being cognizant of the cost versus

  8. Improvement of the Oracle setup and database design at the Heidelberg ion therapy center

    International Nuclear Information System (INIS)

    The HIT (Heidelberg Ion Therapy) center is an accelerator facility for cancer therapy using both carbon ions and protons, located at the university hospital in Heidelberg. It provides three therapy treatment rooms: two with fixed beam exit (both in clinical use), and a unique gantry with a rotating beam head, currently under commissioning. The backbone of the proprietary accelerator control system is an Oracle database running on a Windows server, storing and delivering data on beam cycles, error logging, measured values, and the device parameters and beam settings for about 100,000 combinations of energy, beam size and particle rate used in treatment plans. Since going operational, we have encountered performance problems with the current database setup. We therefore started an analysis focused on the following topics: the hardware resources of the database server, the configuration of the Oracle instance, and a review of the database design, which has undergone several changes since its original design. The analysis revealed issues in all three areas. The outdated server will soon be replaced by a state-of-the-art machine. We present improvements to the Oracle configuration, the optimization of SQL statements, and performance tuning of the database design by adding new indexes, whose effect was directly visible in accelerator operation, while data integrity was improved by additional foreign key constraints. (authors)
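
    As a hedged illustration of the two fixes credited with the improvement (new indexes and foreign key constraints), the sketch below uses SQLite instead of Oracle and invents the table names; the actual HIT schema and DDL are not given in the abstract.

```python
import sqlite3

# Hypothetical miniature schema: a beam-setting table referenced by cycles.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if enabled
conn.execute("CREATE TABLE beam_setting (id INTEGER PRIMARY KEY, energy REAL)")
conn.execute("""CREATE TABLE cycle (
    id INTEGER PRIMARY KEY,
    setting_id INTEGER REFERENCES beam_setting(id))""")

# An index on the frequently filtered column speeds up lookups ...
conn.execute("CREATE INDEX idx_cycle_setting ON cycle(setting_id)")

# ... and the foreign key rejects orphaned rows, improving data integrity.
conn.execute("INSERT INTO beam_setting VALUES (1, 270.5)")
conn.execute("INSERT INTO cycle VALUES (1, 1)")        # OK: parent row exists
try:
    conn.execute("INSERT INTO cycle VALUES (2, 99)")   # no such beam setting
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The same pattern (`CREATE INDEX` plus `FOREIGN KEY` constraints) applies in Oracle, only with Oracle's DDL dialect.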

  9. Information Technologies in Public Health Management: A Database on Biocides to Improve Quality of Life

    Directory of Open Access Journals (Sweden)

    A Grigoriu

    2012-05-01

    Full Text Available Background: Biocides for prolonging the shelf life of a large variety of materials have been used extensively over the last decades. Worldwide biocide consumption was estimated at about 12.4 billion dollars in 2011, and is expected to increase in 2012. As biocides are substances we come into contact with in our everyday lives, access to this type of information is of paramount importance in order to ensure an appropriate living environment. Consequently, a database where information may be quickly processed, sorted, and easily accessed, according to different search criteria, is the most desirable solution. The main aim of this work was to design and implement a relational database with complete information about biocides used in public health management to improve the quality of life. Methods: Design and implementation of a relational database for biocides, by using the software "phpMyAdmin". Results: A database that allows for efficient collection, storage, and management of information, including chemical properties and applications of a large quantity of biocides, as well as its adequate dissemination into the public health environment. Conclusion: The information contained in the database presented herein promotes the adequate use of biocides by means of information technologies, which in consequence may help achieve important improvements in our quality of life.

  10. Gene Ontology annotations and resources.

    Science.gov (United States)

    Blake, J A; Dolan, M; Drabkin, H; Hill, D P; Li, Ni; Sitnikov, D; Bridges, S; Burgess, S; Buza, T; McCarthy, F; Peddinti, D; Pillai, L; Carbon, S; Dietze, H; Ireland, A; Lewis, S E; Mungall, C J; Gaudet, P; Chrisholm, R L; Fey, P; Kibbe, W A; Basu, S; Siegele, D A; McIntosh, B K; Renfro, D P; Zweifel, A E; Hu, J C; Brown, N H; Tweedie, S; Alam-Faruque, Y; Apweiler, R; Auchinchloss, A; Axelsen, K; Bely, B; Blatter, M -C; Bonilla, C; Bouguerleret, L; Boutet, E; Breuza, L; Bridge, A; Chan, W M; Chavali, G; Coudert, E; Dimmer, E; Estreicher, A; Famiglietti, L; Feuermann, M; Gos, A; Gruaz-Gumowski, N; Hieta, R; Hinz, C; Hulo, C; Huntley, R; James, J; Jungo, F; Keller, G; Laiho, K; Legge, D; Lemercier, P; Lieberherr, D; Magrane, M; Martin, M J; Masson, P; Mutowo-Muellenet, P; O'Donovan, C; Pedruzzi, I; Pichler, K; Poggioli, D; Porras Millán, P; Poux, S; Rivoire, C; Roechert, B; Sawford, T; Schneider, M; Stutz, A; Sundaram, S; Tognolli, M; Xenarios, I; Foulgar, R; Lomax, J; Roncaglia, P; Khodiyar, V K; Lovering, R C; Talmud, P J; Chibucos, M; Giglio, M Gwinn; Chang, H -Y; Hunter, S; McAnulla, C; Mitchell, A; Sangrador, A; Stephan, R; Harris, M A; Oliver, S G; Rutherford, K; Wood, V; Bahler, J; Lock, A; Kersey, P J; McDowall, D M; Staines, D M; Dwinell, M; Shimoyama, M; Laulederkind, S; Hayman, T; Wang, S -J; Petri, V; Lowry, T; D'Eustachio, P; Matthews, L; Balakrishnan, R; Binkley, G; Cherry, J M; Costanzo, M C; Dwight, S S; Engel, S R; Fisk, D G; Hitz, B C; Hong, E L; Karra, K; Miyasato, S R; Nash, R S; Park, J; Skrzypek, M S; Weng, S; Wong, E D; Berardini, T Z; Huala, E; Mi, H; Thomas, P D; Chan, J; Kishore, R; Sternberg, P; Van Auken, K; Howe, D; Westerfield, M

    2013-01-01

    The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources. PMID:23161678

  11. An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB

    OpenAIRE

    Bell, Michael J; Colin S Gillespie; Swan, Daniel; Lord, Phillip

    2012-01-01

    Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly, time-consuming and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases ...

  12. Database improvements of national student enrollment registry. Link external systems to NSER DB

    Directory of Open Access Journals (Sweden)

    Ph. D. Cosmin Catalin Olteanu

    2012-05-01

    Full Text Available The general idea is to improve the NSER database into a strong, unified information system in which data is collected from all universities. Employers and all academic institutions can easily check someone’s background through a national portal simply by logging in. As a result of this work, the author found that the system has its flaws but can be improved.

  13. Curation of the genome annotation of Pichia pastoris (Komagataella phaffii) CBS7435 from gene level to protein function.

    Science.gov (United States)

    Valli, Minoska; Tatto, Nadine E; Peymann, Armin; Gruber, Clemens; Landes, Nils; Ekker, Heinz; Thallinger, Gerhard G; Mattanovich, Diethard; Gasser, Brigitte; Graf, Alexandra B

    2016-09-01

    As manually curated and non-automated BLAST analysis of the published Pichia pastoris genome sequences revealed many differences between the gene annotations of the strains GS115 and CBS7435, RNA-Seq analysis, supported by proteomics, was performed to improve the genome annotation. Detailed analyses of sequence alignments and protein domain predictions were performed to extend the functional genome annotation to all P. pastoris sequences. This allowed the identification of 492 new ORFs, 4916 hypothetical UTRs and the correction of 341 incorrect ORF predictions, which were mainly due to the presence of upstream ATGs or erroneous intron predictions. Moreover, 175 previously erroneously annotated ORFs needed to be removed from the annotation. In total, we have annotated 5325 ORFs. Regarding the functionality of those genes, we improved all gene and protein descriptions. Thereby, the percentage of ORFs with functional annotation was increased from 48% to 73%. Furthermore, we defined functional groups, covering 25 biological cellular processes of interest, by grouping all genes that are part of the defined process. All data are presented in the newly launched genome browser and database available at www.pichiagenome.org. In summary, we present a wide spectrum of curation of the P. pastoris genome annotation, from gene level to protein function. PMID:27388471

  14. Adaptive Lockable Units to Improve Data Availability in a Distributed Database System

    Directory of Open Access Journals (Sweden)

    Khaled Maabreh

    2016-01-01

    Full Text Available Distributed database systems have become a phenomenon and have been considered a crucial source of information for numerous users. Users with different jobs are using such systems locally or via the Internet to meet their professional requirements. Distributed database systems consist of a number of sites connected over a computer network. Each site deals with its own database and interacts with other sites as needed. Data replication in these systems is considered a key factor in improving data availability. However, it may affect system performance when most of the transactions that access the data contain write or a mix of read and write operations because of exclusive locks and update propagation. This research proposes a new adaptive approach for increasing the availability of data contained in a distributed database system. The proposed approach suggests a new lockable unit by increasing the database hierarchy tree by one level to include attributes as lockable units instead of the entire row. This technique may allow several transactions to access the database row simultaneously by utilizing some attributes and keeping others available for other transactions. Data in a distributed database system can be accessed locally or remotely by a distributed transaction, with each distributed transaction decomposed into several sub-transactions called participants or agents. These agents access the data at multiple sites and must guarantee that any changes to the data must be committed in order to complete the main transaction. The experimental results show that using attribute-level locking will increase data availability, reliability, and throughput, as well as enhance overall system performance. Moreover, it will increase the overhead of managing such a large number of locks, which will be managed according to the qualification of the query.
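
    A minimal sketch of the proposed lockable unit, with all names invented: keying locks by (row, attribute) rather than by row alone lets two transactions touch different attributes of the same row concurrently, while still blocking conflicting access to the same attribute.

```python
class AttributeLockManager:
    """Toy lock table keyed by (row_id, attribute) instead of row_id alone."""

    def __init__(self):
        self.locks = {}  # (row_id, attribute) -> holding transaction id

    def acquire(self, txn, row_id, attribute):
        key = (row_id, attribute)
        holder = self.locks.get(key)
        if holder not in (None, txn):
            return False           # attribute held by another transaction
        self.locks[key] = txn
        return True

    def release_all(self, txn):
        self.locks = {k: v for k, v in self.locks.items() if v != txn}

mgr = AttributeLockManager()
assert mgr.acquire("T1", row_id=7, attribute="salary")
assert mgr.acquire("T2", row_id=7, attribute="address")     # same row, no conflict
assert not mgr.acquire("T2", row_id=7, attribute="salary")  # same attribute: blocked
mgr.release_all("T1")
assert mgr.acquire("T2", row_id=7, attribute="salary")      # free after release
```

With row-level locking, the second `acquire` above would already block; the finer unit is what raises availability, at the price of managing more lock entries, as the abstract notes.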

  15. Identifying phosphopeptides by searching a site-annotated protein database

    Institute of Scientific and Technical Information of China (English)

    程凯; 王方军; 边阳阳; 叶明亮; 邹汉法

    2015-01-01

    Phosphoproteome analysis is one of the important research fields in proteomics. In shotgun proteomics, phosphopeptides can be identified directly by setting phosphorylation as a variable modification in the database search. However, the search space increases significantly when variable modifications are set in post-translational modification (PTM) analysis, which decreases identification sensitivity. This is because setting a variable modification on a specific type of amino acid residue means that every residue of that type in the database might be modified, which is not consistent with actual conditions. Phosphorylation and dephosphorylation are regulated by protein kinases and phosphatases, which act only on particular substrates; therefore, only residues within specific sequences are potential modification sites. To address this issue, we extracted the characteristic sequences from identified phosphorylation sites and created an annotated database containing phosphorylation site information, which allows the search engine to set variable modifications only on the serine, threonine and tyrosine residues previously identified as phosphorylated. In this database, only annotated serine, threonine and tyrosine residues can be modified. This strategy significantly reduces the search space. The performance of this new database searching strategy was evaluated by searching different types of data with Mascot, and higher sensitivity for phosphopeptide identification was achieved with high reliability.
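
    A back-of-the-envelope sketch of why site annotation shrinks the search space (the peptide and site positions are invented): every S/T/Y residue treated as potentially modified doubles the number of candidate peptidoforms a search engine must score, so restricting variable modifications to previously observed sites cuts the candidate count exponentially.

```python
def candidate_forms(peptide, allowed_sites=None):
    """Count (un)modified peptidoforms under one variable phosphorylation.
    allowed_sites restricts which residue positions may carry a phosphate,
    mimicking the site-annotated database idea."""
    sites = [i for i, aa in enumerate(peptide) if aa in "STY"]
    if allowed_sites is not None:
        sites = [i for i in sites if i in allowed_sites]
    return 2 ** len(sites)       # each candidate site: modified or not

pep = "SGSYSTPEPTIDES"           # 7 S/T/Y residues
print(candidate_forms(pep))           # 128: every S/T/Y considered modifiable
print(candidate_forms(pep, {0, 3}))   # 4: only two previously observed sites
```

The real reduction depends on how many sites the annotation retains, but the exponential dependence on the number of modifiable residues is what drives the sensitivity gain.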

  16. The surplus value of semantic annotations

    NARCIS (Netherlands)

    M. Marx

    2010-01-01

    We compare the costs of semantic annotation of textual documents to its benefits for information processing tasks. Semantic annotation can improve the performance of retrieval tasks and facilitates an improved search experience through faceted search, focused retrieval, better document summaries, an

  17. Computer systems for annotation of single molecule fragments

    Energy Technology Data Exchange (ETDEWEB)

    Schwartz, David Charles; Severin, Jessica

    2016-07-19

    There are provided computer systems for visualizing and annotating single molecule images. Annotation systems in accordance with this disclosure allow a user to mark and annotate single molecules of interest and their restriction enzyme cut sites thereby determining the restriction fragments of single nucleic acid molecules. The markings and annotations may be automatically generated by the system in certain embodiments and they may be overlaid translucently onto the single molecule images. An image caching system may be implemented in the computer annotation systems to reduce image processing time. The annotation systems include one or more connectors connecting to one or more databases capable of storing single molecule data as well as other biomedical data. Such diverse array of data can be retrieved and used to validate the markings and annotations. The annotation systems may be implemented and deployed over a computer network. They may be ergonomically optimized to facilitate user interactions.

  18. Database Description - Q-TARO | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Q-TARO Database Description General information of database Database name Q-TARO Alternative name QTL Annotation Rice Online...extracted from 463 reports as representative QTLs. To arrange QTL information, we constructed QTL database (QTL Annotation Rice Onlin...chnology, QT-1006, and Genomics for Agricultural Innovation, GIR-1003). Reference(s) Article title: Q-TARO: QTL Annotation Rice Onlin

  19. Novel definition files for human GeneChips based on GeneAnnot

    Directory of Open Access Journals (Sweden)

    Ferrari Sergio

    2007-11-01

    Full Text Available Abstract Background Improvements in genome sequence annotation revealed discrepancies in the original probeset/gene assignments in Affymetrix microarrays, and the existence of differences between annotations and effective alignments of probes and transcription products. In the current generation of Affymetrix human GeneChips, most probesets include probes matching transcripts from more than one gene and probes which do not match any transcribed sequence. Results We developed a novel set of custom Chip Definition Files (CDF) and the corresponding Bioconductor libraries for Affymetrix human GeneChips, based on the information contained in the GeneAnnot database. GeneAnnot-based CDFs are composed of unique custom probesets, including only probes matching a single gene. Conclusion GeneAnnot-based custom CDFs solve the problem of a reliable reconstruction of expression levels and eliminate the existence of more than one probeset per gene, which often leads to discordant expression signals for the same transcript when gene differential expression is the focus of the analysis. GeneAnnot CDFs are freely distributed and fully compliant with Affymetrix standards and all available software for gene expression analysis. The CDF libraries are available from http://www.xlab.unimo.it/GA_CDF, along with supplementary information (CDF libraries, installation guidelines and R code, CDF statistics, and analysis results).
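
    The custom-CDF idea can be sketched as a simple filter (probe and gene names are invented): keep only probes that align to exactly one gene, then group the survivors into one probeset per gene.

```python
from collections import defaultdict

# probe id -> set of genes its sequence matches (hypothetical alignment data)
probe_hits = {
    "p1": {"GENE_A"},
    "p2": {"GENE_A", "GENE_B"},   # cross-hybridizing probe: dropped
    "p3": {"GENE_B"},
    "p4": set(),                  # matches no transcribed sequence: dropped
}

probesets = defaultdict(list)
for probe, genes in probe_hits.items():
    if len(genes) == 1:           # keep only uniquely matching probes
        probesets[next(iter(genes))].append(probe)

print(dict(probesets))  # one custom probeset per gene: p2 and p4 excluded
```

This is exactly the property that removes discordant signals: with at most one probeset per gene, a transcript can no longer yield conflicting expression values.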

  20. CGKB: an annotation knowledge base for cowpea (Vigna unguiculata L.) methylation filtered genomic genespace sequences

    Directory of Open Access Journals (Sweden)

    Spraggins Thomas A

    2007-04-01

    Full Text Available Abstract Background Cowpea [Vigna unguiculata (L.) Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI), funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace) recovered using methylation filtration technology, and providing annotation and analysis of the sequence data. Description CGKB, Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS) isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: a GSS annotation and comparative genomics knowledge base, a GSS enzyme and metabolic pathway knowledge base, and a GSS simple sequence repeats (SSRs) knowledge base for molecular marker discovery. A homology-based approach was applied for annotation of the GSS, mainly using BLASTX against four public FASTA-formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniProtKB-PIR (Protein Information Resource), and UniProtKB-TrEMBL). Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. 
The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene prediction program and the

  1. Concept annotation in the CRAFT corpus

    Directory of Open Access Journals (Sweden)

    Bada Michael

    2012-07-01

    Full Text Available Abstract Background Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. Conclusions As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are

  2. Can Inferred Provenance and Its Visualisation Be Used to Detect Erroneous Annotation? A Case Study Using UniProtKB

    OpenAIRE

    Bell, Michael J.; Matthew Collison; Phillip Lord

    2013-01-01

    A constant influx of new data poses a challenge in keeping the annotation in biological databases current. Most biological databases contain significant quantities of textual annotation, which often contains the richest source of knowledge. Many databases reuse existing knowledge; during the curation process annotations are often propagated between entries. However, this is often not made explicit. Therefore, it can be hard, potentially impossible, for a reader to identify where an annotation...

  3. Retrieval-based Face Annotation by Weak Label Regularized Local Coordinate Coding.

    Science.gov (United States)

    Wang, Dayong; Hoi, Steven C H; He, Ying; Zhu, Jianke; Mei, Tao; Luo, Jiebo

    2013-08-01

    Retrieval-based face annotation is a promising paradigm for mining massive web facial images for automated face annotation. This paper addresses a critical problem of this paradigm, i.e., how to effectively perform annotation by exploiting similar facial images and their weak labels, which are often noisy and incomplete. In particular, we propose an effective Weak Label Regularized Local Coordinate Coding (WLRLCC) technique, which exploits the principle of local coordinate coding in learning sparse features, and employs the idea of graph-based weak label regularization to enhance the weak labels of the similar facial images. We present an efficient optimization algorithm to solve the WLRLCC task. We conduct extensive empirical studies on two large-scale web facial image databases: (i) a Western celebrity database with a total of 6,025 persons and 714,454 web facial images, and (ii) an Asian celebrity database with 1,200 persons and 126,070 web facial images. The encouraging results validate the efficacy of the proposed WLRLCC algorithm. To further improve efficiency and scalability, we also propose a PCA-based approximation scheme and an offline approximation scheme (AWLRLCC), which generally maintain comparable results but significantly reduce the time cost. Finally, we show that WLRLCC can also tackle two existing face annotation tasks with promising performance.

  4. Collaborative annotation of 3D crystallographic models.

    Science.gov (United States)

    Hunter, J; Henderson, M; Khan, I

    2007-01-01

    This paper describes the AnnoCryst system, a tool that was designed to enable authenticated collaborators to share online discussions about 3D crystallographic structures through the asynchronous attachment, storage, and retrieval of annotations. Annotations are personal comments, interpretations, questions, assessments, or references that can be attached to files, data, digital objects, or Web pages. The AnnoCryst system enables annotations to be attached to 3D crystallographic models retrieved from either private local repositories (e.g., Fedora) or public online databases (e.g., Protein Data Bank or Inorganic Crystal Structure Database) via a Web browser. The system uses the Jmol plugin for viewing and manipulating the 3D crystal structures but extends Jmol by providing an additional interface through which annotations can be created, attached, stored, searched, browsed, and retrieved. The annotations are stored on a standardized Web annotation server (Annotea), which has been extended to support 3D macromolecular structures. Finally, the system is embedded within a security framework that is capable of authenticating users and restricting access only to trusted colleagues.

  5. Improving Hip Fracture Care in Ireland: A Preliminary Report of the Irish Hip Fracture Database

    Directory of Open Access Journals (Sweden)

    Prasad Ellanti

    2014-01-01

    IHFD is a clinically led web-based audit. We summarize the data collected on hip fractures from April 2012 to March 2013 from 8 centres. Results. There were 843 patients, the majority (70%) female. The 80–89-year age group accounted for the majority of fractures (44%). Most (71%) sustained a fall at home. Intertrochanteric fractures (40%) were most common. Only 28% were admitted to an orthopaedic ward within 4 hours. The majority (97%) underwent surgery, with 44% having surgery within 36 hours. Medical optimization (35%) and lack of theatre space (26%) accounted for most of the surgical delay. While 29% were discharged home, 33% were discharged to a nursing home or other long-stay facilities. There was a 4% in-hospital mortality rate. Conclusions. Several key areas needing improvement, in both the database and aspects of patient care, have been highlighted. The implementation of similar databases has led to improved hip fracture care in other countries and we believe this can be replicated in Ireland.

  6. Improving Quality and Quantity of Contributions: Two Models for Promoting Knowledge Exchange with Shared Databases

    Science.gov (United States)

    Cress, U.; Barquero, B.; Schwan, S.; Hesse, F. W.

    2007-01-01

    Shared databases are used for knowledge exchange in groups. Whether a person is willing to contribute knowledge to a shared database presents a social dilemma: Each group member saves time and energy by not contributing any information to the database and by using the database only to retrieve information which was contributed by others. But if…

  7. Selectome update: quality control and computational improvements to a database of positive selection.

    Science.gov (United States)

    Moretti, Sébastien; Laurenczy, Balazs; Gharib, Walid H; Castella, Briséïs; Kuzniar, Arnold; Schabauer, Hannes; Studer, Romain A; Valle, Mario; Salamin, Nicolas; Stockinger, Heinz; Robinson-Rechavi, Marc

    2014-01-01

    Selectome (http://selectome.unil.ch/) is a database of positive selection, based on a branch-site likelihood test. This model estimates the number of nonsynonymous substitutions (dN) and synonymous substitutions (dS) to evaluate the variation in selective pressure (dN/dS ratio) over branches and over sites. Since the original release of Selectome, we have benchmarked and implemented a thorough quality control procedure on multiple sequence alignments, aiming to provide minimum false-positive results. We have also improved the computational efficiency of the branch-site test implementation, allowing larger data sets and more frequent updates. Release 6 of Selectome includes all gene trees from Ensembl for Primates and Glires, as well as a large set of vertebrate gene trees. A total of 6810 gene trees have some evidence of positive selection. Finally, the web interface has been improved to be more responsive and to facilitate searches and browsing. PMID:24225318
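
    For readers unfamiliar with the notation, a minimal sketch of how a dN/dS (omega) ratio is read; the numbers below are invented, and Selectome's actual branch-site likelihood test is far more involved than this point estimate.

```python
def omega(dn, ds):
    """dN/dS ratio: >1 suggests positive selection, ~1 neutral evolution,
    <1 purifying selection."""
    return dn / ds

# Invented substitution-rate estimates for three hypothetical branches.
for dn, ds in [(0.02, 0.10), (0.08, 0.08), (0.30, 0.10)]:
    w = omega(dn, ds)
    label = "positive" if w > 1 else ("neutral" if w == 1 else "purifying")
    print(f"omega={w:.1f} -> {label} selection")
```

The branch-site model extends this by letting omega vary over both branches and codon sites, which is what lets Selectome localize the signal of positive selection.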

  9. Integrative Tissue-Specific Functional Annotations in the Human Genome Provide Novel Insights on Many Complex Traits and Improve Signal Prioritization in Genome Wide Association Studies

    Science.gov (United States)

    Wang, Qian; He, Beixin Julie; Zhao, Hongyu

    2016-01-01

    Extensive efforts have been made to understand genomic function through both experimental and computational approaches, yet proper annotation still remains challenging, especially in non-coding regions. In this manuscript, we introduce GenoSkyline, an unsupervised learning framework to predict tissue-specific functional regions through integrating high-throughput epigenetic annotations. GenoSkyline successfully identified a variety of non-coding regulatory machinery including enhancers, regulatory miRNA, and hypomethylated transposable elements in extensive case studies. Integrative analysis of GenoSkyline annotations and results from genome-wide association studies (GWAS) led to novel biological insights on the etiologies of a number of human complex traits. We also explored using tissue-specific functional annotations to prioritize GWAS signals and predict relevant tissue types for each risk locus. Brain and blood-specific annotations led to better prioritization performance for schizophrenia than standard GWAS p-values and non-tissue-specific annotations. As for coronary artery disease, heart-specific functional regions were highly enriched for GWAS signals, but previously identified risk loci were found to be most functional in other tissues, suggesting a substantial proportion of still undetected heart-related loci. In summary, GenoSkyline annotations can guide genetic studies at multiple resolutions and provide valuable insights in understanding complex diseases. GenoSkyline is available at http://genocanyon.med.yale.edu/GenoSkyline. PMID:27058395

  10. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc;

    in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... prove useful for less heritable traits such as diseases and fertility...

  11. Annotations for Intersection Typechecking

    Directory of Open Access Journals (Sweden)

    Joshua Dunfield

    2013-07-01

    Full Text Available In functional programming languages, the classic form of annotation is a single type constraint on a term. Intersection types add complications: a single term may have to be checked several times against different types, in different contexts, requiring annotation with several types. Moreover, it is useful (in some systems, necessary) to indicate the context in which each such type is to be used. This paper explores the technical design space of annotations in systems with intersection types. Earlier work (Dunfield and Pfenning 2004) introduced contextual typing annotations, which we now tease apart into more elementary mechanisms: a "right hand" annotation (the standard form), a "left hand" annotation (the context in which a right-hand annotation is to be used), a merge that allows for multiple annotations, and an existential binder for index variables. The most novel element is the left-hand annotation, which guards terms (and right-hand annotations) with a judgment that must follow from the current context.

  12. MimoSA: a system for minimotif annotation

    Directory of Open Access Journals (Sweden)

    Kundeti Vamsi

    2010-06-01

    Full Text Available Abstract Background Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. Results We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, MimoSA provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to

  13. Automated Data Mining of A Proprietary Database System for Physician Quality Improvement

    International Nuclear Information System (INIS)

    Purpose: Physician practice quality improvement is a subject of intense national debate. This report describes using a software data acquisition program to mine an existing, commonly used proprietary radiation oncology database to assess physician performance. Methods and Materials: Between 2003 and 2004, a manual analysis was performed of electronic portal image (EPI) review records. Custom software was recently developed to mine the record-and-verify database and the review process of EPI at our institution. In late 2006, a report was developed that allowed for immediate review of physician completeness and speed of EPI review for any prescribed period. Results: The software extracted >46,000 EPIs between 2003 and 2007, providing EPI review status and time to review by each physician. Between 2003 and 2007, the department EPI review improved from 77% to 97% (range, 85.4-100%), with a decrease in the mean time to review from 4.2 days to 2.4 days. The initial intervention in 2003 to 2004 was moderately successful in changing the EPI review patterns; it was not repeated because of the time required to perform it. However, the implementation in 2006 of the automated review tool yielded a profound change in practice. Using the software, the automated chart review required ∼1.5 h for mining and extracting the data for the 4-year period. Conclusion: This study quantified the EPI review process as it evolved during a 4-year period at our institution and found that automation of data retrieval and review simplified and facilitated physician quality improvement

  14. National Trauma Database (NTrD)--improving trauma care: first year report.

    Science.gov (United States)

    Sabariah, F J; Ramesh, N; Mahathar, A W

    2008-09-01

    The first Malaysian National Trauma Database was launched in May 2006 with five tertiary referral centres to determine the fundamental data on major trauma, subsequently to evaluate the major trauma management and to come up with guidelines for improved trauma care. A prospective study, using standardized and validated questionnaires, was carried out from May 2006 till April 2007 for all cases admitted and referred to the participating hospitals. During the one year period, 123,916 trauma patients were registered, of which 933 (0.75%) were classified as major trauma. Patients with blunt injury made up for 83.9% of cases and RTA accounted for 72.6% of injuries with 64.9% involving motorcyclist and pillion rider. 42.8% had severe head injury with an admission Glasgow Coma Scale (GCS) of 3-8 and the Revised Trauma Score (RTS) of 5-6 were recorded in 28.8% of patients. The distribution of Injury Severity Score (ISS) showed that 42.9% of cases were in the range of 16-24. Only 1.9% and 6.3% of the patients were reviewed by the Emergency Physician and Surgeon respectively. Patients with admission systolic blood pressure of less than 90 mmHg had a death rate of 54.6%. Patients with severe head injury (GCS report has successfully demonstrated its significance in giving essential data on major trauma in Malaysia, however further expansion of the study may reflect more comprehensive trauma database in this country.

  15. Improving a Database Management Systems Course Through Student Learning Styles: A Pilot Study

    Directory of Open Access Journals (Sweden)

    Adem UZUN

    2011-06-01

    Full Text Available This is a pilot study, which aims to reorganize a course to better serve learners’ learning styles. In essence, this study is a case study to improve the performance of the Database Management Systems Course in the department of Computer Education and Instructional Technologies (CEIT) at Uludag University. Learning styles of students were analyzed through Felder-Soloman's Index of Learning Styles (ILS). Part of the data was collected during Spring 2009. The participants were the students of the respective course. Findings showed that participants were mostly visual, active and sensory type learners. They were balanced on the sequential-global dimension. No significant relationship was found between the learning styles and achievement scores. This result forms appropriate pre-study conditions for the upcoming study. For the upcoming study, it was decided that learning materials suited to the characteristics of the participants be developed, with blended learning proposed as the delivery method.

  16. Computing human image annotation.

    Science.gov (United States)

    Channin, David S; Mongkolwat, Pattanasak; Kleper, Vladimir; Rubin, Daniel L

    2009-01-01

    An image annotation is the explanatory or descriptive information about the pixel data of an image that is generated by a human (or machine) observer. An image markup is the graphical symbols placed over the image to depict an annotation. In the majority of current, clinical and research imaging practice, markup is captured in proprietary formats and annotations are referenced only in free text radiology reports. This makes these annotations difficult to query, retrieve and compute upon, hampering their integration into other data mining and analysis efforts. This paper describes the National Cancer Institute's Cancer Biomedical Informatics Grid's (caBIG) Annotation and Image Markup (AIM) project, focusing on how to use AIM to query for annotations. The AIM project delivers an information model for image annotation and markup. The model uses controlled terminologies for important concepts. All of the classes and attributes of the model have been harmonized with the other models and common data elements in use at the National Cancer Institute. The project also delivers XML schemata necessary to instantiate AIMs in XML as well as a software application for translating AIM XML into DICOM S/R and HL7 CDA. Large collections of AIM annotations can be built and then queried as Grid or Web services. Using the tools of the AIM project, image annotations and their markup can be captured and stored in human and machine readable formats. This enables the inclusion of human image observation and inference as part of larger data mining and analysis activities. PMID:19964202

  17. The use of database linkage technique for the improvement of information on mother-infant deaths in Ceara, 2006

    OpenAIRE

    Augediva Maria Jucá Pordeus; Nádia Maria Girão Saraiva de Almeida; Luciano Pamplona de Góes Cavalcanti; José Gomes Bezerra Filho; Lindélia Sobreira Coriolano

    2009-01-01

    Objective: To assess the use of the database linkage technique to improve information on mother-infant deaths by recovering unregistered and/or ignored variables from the deaths of children under one year old in the city of Fortaleza in 2006. Methods: The linkage of the SIM (Mortality Information System) and SINASC (Live Births Information System) databases was done by selecting variables common to the two systems. Using the Reclink III software, the perfect pairs “...

  18. T4SP Database 2.0: An Improved Database for Type IV Secretion Systems in Bacterial Genomes with New Online Analysis Tools

    Science.gov (United States)

    Han, Na; Yu, Weiwen; Qiang, Yujun

    2016-01-01

    Type IV secretion system (T4SS) can mediate the passage of macromolecules across cellular membranes and is essential for virulent and genetic material exchange among bacterial species. The Type IV Secretion Project 2.0 (T4SP 2.0) database is an improved and extended version of the platform released in 2013 aimed at assisting with the detection of Type IV secretion systems (T4SS) in bacterial genomes. This advanced version provides users with web server tools for detecting the existence and variations of T4SS genes online. The new interface for the genome browser provides a user-friendly access to the most complete and accurate resource of T4SS gene information (e.g., gene number, name, type, position, sequence, related articles, and quick links to other websites). Currently, this online database includes T4SS information of 5239 bacterial strains. Conclusions. T4SS is one of the most versatile secretion systems necessary for the virulence and survival of bacteria and the secretion of protein and/or DNA substrates from a donor to a recipient cell. This database on virB/D genes of the T4SS system will help scientists worldwide to improve their knowledge on secretion systems and also identify potential pathogenic mechanisms of various microbial species.

  19. Evaluating techniques for metagenome annotation using simulated sequence data.

    Science.gov (United States)

    Randle-Boggis, Richard J; Helgason, Thorunn; Sapp, Melanie; Ashton, Peter D

    2016-07-01

    The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for annotating DNA sequences. The tools, databases and parameters used can have a significant impact on the results: naïve choice of these factors can result in a false representation of community composition and function. We use a simulated metagenome to show how different parameters affect annotation accuracy by evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and Megablast. This simulated metagenome allowed the recovery of known organism and function abundances to be quantitatively evaluated, which is not possible for environmental metagenomes. The performance of each program and database varied, e.g. One Codex correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced many false positive annotations. This effect decreased as the taxonomic level investigated increased. Selecting more stringent parameters decreases the annotation sensitivity, but increases precision. Ultimately, there is a trade-off between taxonomic resolution and annotation accuracy. These results should be considered when annotating metagenomes and interpreting results from previous studies. PMID:27162180
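
The precision/sensitivity trade-off the authors describe can be made concrete with a small sketch. The toy reads, taxa, and annotator outputs below are invented for illustration; a simulated metagenome is exactly what makes such a ground-truth comparison possible.

```python
def annotation_scores(predicted, truth):
    """Compare per-read taxon annotations against the known composition
    of a simulated metagenome. Returns (precision, sensitivity)."""
    correct = sum(1 for read, taxon in predicted.items() if truth.get(read) == taxon)
    precision = correct / len(predicted) if predicted else 0.0  # fraction of calls that are right
    sensitivity = correct / len(truth)                          # fraction of the truth recovered
    return precision, sensitivity

truth = {"r1": "Escherichia", "r2": "Bacillus", "r3": "Vibrio", "r4": "Bacillus"}
# Stringent parameters: fewer annotations, all correct -> high precision, low sensitivity
stringent = {"r1": "Escherichia", "r2": "Bacillus"}
# Permissive parameters: every read annotated, one wrongly -> more recovered, lower precision
permissive = {"r1": "Escherichia", "r2": "Bacillus", "r3": "Vibrio", "r4": "Listeria"}
print(annotation_scores(stringent, truth))   # (1.0, 0.5)
print(annotation_scores(permissive, truth))  # (0.75, 0.75)
```

This is the trade-off the abstract notes: stringent parameters increase precision at the cost of sensitivity, and the effect weakens at coarser taxonomic levels.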

  20. A Radiocarbon Database for Improving Understanding of Global Soil Carbon Dynamics: Part I

    Science.gov (United States)

    Torn, M. S.; Trumbore, S.; Smith, L. J.; Nave, L. E.; Sierra, C. A.; Harden, J. W.; Agarwal, D.; van Ingen, C.; Radiocarbon Database Workshop 2011

    2011-12-01

    Soils play a large role in the global carbon cycle, but soil carbon stocks and dynamics remain highly uncertain. Radiocarbon (14C) observations from soils and soil respiration provide one of the only ways to infer terrestrial carbon turnover times or to test ecosystem carbon models. Although a wealth of such observations exists, they are scattered in small data sets held by individual researchers, and have not been compiled in a form easy to use for multi-site analysis, global assessments, or model testing. Here we introduce a new, global radiocarbon database that will synthesize datasets from multiple contributors to facilitate research on four broad questions: (1) What are current patterns of soil carbon dynamics, and what factors influence these patterns? (2) What is the sequestration capacity of different soils? (3) What are likely impacts of global change on the soil resource? (4) How well do models represent important carbon cycle processes, and how can they be improved? In addition to assembling data in a common format for analyses, this database will offer query capabilities and the ability to combine data with gridded global products, such as temporally resolved temperature and precipitation, NPP and GPP, and a climate-based decomposition index. Some of the near-term synthesis goals include analyzing depth profiles of 14C across gradients in ecosystem state factors (climate, organisms, relief, parent material, time, and human influence) and soil orders; mapping surface-soil 14C values on soil temperature and moisture; and comparing soil carbon turnover times to NPP and soil carbon stocks. We are currently incorporating data from 18 contributors and six continents, with 14C measurements from soils representing nine soil orders, plant and microbial tissues, and respiration fluxes. Our intention is to grow the database and make it available to a wide community of scientists.
For example, observations for different disturbance, experimental treatment, or

  1. Genome Annotation Transfer Utility (GATU): rapid annotation of viral genomes using a closely related reference genome

    Directory of Open Access Journals (Sweden)

    Upton Chris

    2006-06-01

    Full Text Available Abstract Background Since DNA sequencing has become easier and cheaper, an increasing number of closely related viral genomes have been sequenced. However, many of these have been deposited in GenBank without annotations, severely limiting their value to researchers. While maintaining comprehensive genomic databases for a set of virus families at the Viral Bioinformatics Resource Center http://www.biovirus.org and Viral Bioinformatics – Canada http://www.virology.ca, we found that researchers were unnecessarily spending time annotating viral genomes that were close relatives of already annotated viruses. We have therefore designed and implemented a novel tool, Genome Annotation Transfer Utility (GATU), to transfer annotations from a previously annotated reference genome to a new target genome, thereby greatly reducing this laborious task. Results GATU transfers annotations from a reference genome to a closely related target genome, while still giving the user final control over which annotations should be included. GATU also detects open reading frames present in the target but not the reference genome and provides the user with a variety of bioinformatics tools to quickly determine if these ORFs should also be included in the annotation. After this process is complete, GATU saves the newly annotated genome as a GenBank, EMBL or XML-format file. The software is coded in Java and runs on a variety of computer platforms. Its user-friendly Graphical User Interface is specifically designed for users trained in the biological sciences. Conclusion GATU greatly simplifies the initial stages of genome annotation by using a closely related genome as a reference. 
It is not intended to be a gene prediction tool or a "complete" annotation system, but we have found that it significantly reduces the time required for annotation of genes and mature peptides as well as helping to standardize gene names between related organisms by transferring reference genome

  2. GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation.

    Directory of Open Access Journals (Sweden)

    Dongjun Chung

    2014-11-01

    Full Text Available Results from Genome-Wide Association Studies (GWAS) have shown that complex diseases are often affected by many genetic variants with small or moderate effects. Identifications of these risk variants remain a very challenging problem. There is a need to develop more powerful statistical methods to leverage available information to improve upon traditional approaches that focus on a single GWAS dataset without incorporating additional data. In this paper, we propose a novel statistical approach, GPA (Genetic analysis incorporating Pleiotropy and Annotation), to increase statistical power to identify risk variants through joint analysis of multiple GWAS data sets and annotation information because: (1) accumulating evidence suggests that different complex diseases share common risk bases, i.e., pleiotropy; and (2) functionally annotated variants have been consistently demonstrated to be enriched among GWAS hits. GPA can integrate multiple GWAS datasets and functional annotations to seek association signals, and it can also perform hypothesis testing to test the presence of pleiotropy and enrichment of functional annotation. Statistical inference of the model parameters and SNP ranking is achieved through an EM algorithm that can handle genome-wide markers efficiently. When we applied GPA to jointly analyze five psychiatric disorders with annotation information, not only did GPA identify many weak signals missed by the traditional single phenotype analysis, but it also revealed relationships in the genetic architecture of these disorders. Using our hypothesis testing framework, statistically significant pleiotropic effects were detected among these psychiatric disorders, and the markers annotated in the central nervous system genes and eQTLs from the Genotype-Tissue Expression (GTEx) database were significantly enriched. We also applied GPA to a bladder cancer GWAS data set with the ENCODE DNase-seq data from 125 cell lines. 
GPA was able to detect cell

  3. Retrieval-based face annotation by weak label regularized local coordinate coding.

    Science.gov (United States)

    Wang, Dayong; Hoi, Steven C H; He, Ying; Zhu, Jianke; Mei, Tao; Luo, Jiebo

    2014-03-01

    Auto face annotation, which aims to detect human faces from a facial image and assign them proper human names, is a fundamental research problem and beneficial to many real-world applications. In this work, we address this problem by investigating a retrieval-based annotation scheme of mining massive web facial images that are freely available over the Internet. In particular, given a facial image, we first retrieve the top n similar instances from a large-scale web facial image database using content-based image retrieval techniques, and then use their labels for auto annotation. Such a scheme has two major challenges: 1) how to retrieve the similar facial images that truly match the query, and 2) how to exploit the noisy labels of the top similar facial images, which may be incorrect or incomplete due to the nature of web images. In this paper, we propose an effective Weak Label Regularized Local Coordinate Coding (WLRLCC) technique, which exploits the principle of local coordinate coding by learning sparse features, and employs the idea of graph-based weak label regularization to enhance the weak labels of the similar facial images. An efficient optimization algorithm is proposed to solve the WLRLCC problem. Moreover, an effective sparse reconstruction scheme is developed to perform the face annotation task. We conduct extensive empirical studies on several web facial image databases to evaluate the proposed WLRLCC algorithm from different aspects. The experimental results validate its efficacy. We share the two constructed databases "WDB" (714,454 images of 6,025 people) and "ADB" (126,070 images of 1,200 people) with the public. To further improve the efficiency and scalability, we also propose an offline approximation scheme (AWLRLCC) which generally maintains comparable results but significantly reduces the annotation time. PMID:24457510
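
Stripped to its core, the retrieval-based scheme amounts to label voting over the top-n retrieved neighbors. The sketch below shows that baseline idea only; it is not the WLRLCC algorithm itself, whose sparse coding and weak-label regularization are far more involved, and all names and scores are invented.

```python
from collections import Counter

def annotate_by_retrieval(query_scores, labels, n=5):
    """Given similarity scores of database images to a query face and their
    (possibly noisy) name labels, return the majority name among the top-n
    most similar images."""
    top = sorted(query_scores, key=query_scores.get, reverse=True)[:n]
    votes = Counter(labels[img] for img in top if labels.get(img))
    return votes.most_common(1)[0][0] if votes else None

scores = {"img1": 0.95, "img2": 0.90, "img3": 0.88, "img4": 0.40, "img5": 0.86}
labels = {"img1": "Alice", "img2": "Alice", "img3": "Bob", "img4": "Carol", "img5": "Alice"}
print(annotate_by_retrieval(scores, labels, n=4))  # Alice (3 of the top-4 votes)
```

The two challenges in the abstract map directly onto this sketch: retrieval quality determines which images land in `top`, and label noise is why simple voting needs the regularization that WLRLCC provides.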

  4. Annotation Method (AM): SE41_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available -MS Fragment Viewer (http://webs2.kazusa.or.jp/msmsfragmentviewer/) are used for annotation and identification of the compounds. ... ...e used for primary database search. Peaks with no hit to these databases are then selected to secondary sear...ch using EX-HR2 (http://webs2.kazusa.or.jp/mfsearcher/) databases. After the database search processes, each

  5. Annotating Honorifics Denoting Social Ranking of Referents

    OpenAIRE

    Nariyama, Shigeko; Nakaiwa, Hiromi; Siegel, Melanie

    2011-01-01

    This paper proposes an annotating scheme that encodes honorifics (respectful words). Honorifics are used extensively in Japanese, reflecting the social relationship (e.g. social ranks and age) of the referents. This referential information is vital for resolving zero pronouns and improving machine translation outputs. Annotating honorifics is a complex task that involves identifying a predicate with honorifics, assigning ranks to referents of the predicate, calibrating the ranks, and co...

  6. Feature Selection in Detection of Adverse Drug Reactions from the Health Improvement Network (THIN Database

    Directory of Open Access Journals (Sweden)

    Yihui Liu

    2015-02-01

    Full Text Available Adverse drug reactions (ADRs) are a widely recognized public health issue and are among the most common reasons for withdrawing drugs from the market. Prescription event monitoring (PEM) is an important approach to detecting adverse drug reactions. The main challenge with this method is automatically extracting the medical events or side effects from the high-throughput medical events collected in day-to-day clinical practice. In this study we propose a novel concept of a feature matrix to detect ADRs. The feature matrix, extracted from big medical data in The Health Improvement Network (THIN) database, is created to characterize the medical events of patients who take drugs, and builds the foundation for handling this irregular, large-scale medical data. Feature selection methods are then performed on the feature matrix to detect the significant features, and finally the ADRs can be located based on those features. The experiments are carried out on three drugs: Atorvastatin, Alendronate, and Metoclopramide. Major side effects for each drug are detected and better performance is achieved compared to other computerized methods. Since the detected ADRs are based on computerized methods, further investigation is needed.
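
Reading between the lines of the abstract, the feature matrix is a patients-by-events incidence matrix built from prescription-event records. A hypothetical sketch (the patient IDs and event names are invented, and the real THIN-based matrix is far larger and richer):

```python
def build_feature_matrix(records, events):
    """Build a binary patients-by-events matrix from prescription-event
    records: row i, column j is 1 if patient i reported event j."""
    patients = sorted({p for p, _ in records})
    observed = set(records)
    matrix = [[1 if (p, e) in observed else 0 for e in events] for p in patients]
    return patients, matrix

records = [("p1", "nausea"), ("p1", "myalgia"), ("p2", "nausea"), ("p3", "rash")]
events = ["nausea", "myalgia", "rash"]
patients, matrix = build_feature_matrix(records, events)
print(patients)  # ['p1', 'p2', 'p3']
print(matrix)    # [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Feature selection then operates on the columns of this matrix to pick out events significantly associated with a drug, which is where candidate ADRs emerge.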

  7. Improving the Computational Performance of Ontology-Based Classification Using Graph Databases

    Directory of Open Access Journals (Sweden)

    Thomas J. Lampoltshammer

    2015-07-01

    Full Text Available The increasing availability of very high-resolution remote sensing imagery (i.e., from satellites, airborne laser scanning, or aerial photography) represents both a blessing and a curse for researchers. The manual classification of these images, or other similar geo-sensor data, is time-consuming and leads to subjective and non-deterministic results. Due to this fact, (semi-)automated classification approaches are in high demand in affected research areas. Ontologies provide a proper way of automated classification for various kinds of sensor data, including remotely sensed data. However, the processing of data entities—so-called individuals—is one of the most cost-intensive computational operations within ontology reasoning. Therefore, an approach based on graph databases is proposed to overcome the high time consumption of the classification task. The introduced approach shifts the classification task from the classical Protégé environment and its common reasoners to the proposed graph-based approaches. For validation, the authors tested the approach on a simulation scenario based on a real-world example. The results demonstrate a promising improvement in classification speed: up to 80,000 times faster than the Protégé-based approach.

  8. Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms

    Directory of Open Access Journals (Sweden)

    Haznedaroglu Berat Z

    2012-07-01

    Full Text Available Abstract Background The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single k-mer choices might result in the loss of unique contiguous sequences (contigs) and relevant biological information. A common solution to this problem is the clustering of single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, the success of assembly strategies does not consider the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis that is focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual k-mers and clustered assemblies (CA) were considered using three representative software packages. Pair-wise comparison analyses (between individual k-mers and CAs) were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG) ortholog identifiers (KOIs), and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly. Results Analyses of single k-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of k-mers (k-19 to k-63). For each k-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other k-mer assemblies. Producing a non-redundant CA of k-mers 19 to 63 resulted in a more complete functional annotation than any single k-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs) in the assemblies of individual k-mers (k-19 to k-63) that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented. 
Conclusions This study demonstrated that different k-mer choices result in various quantities
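
At the level of annotation identifiers, the clustering logic above reduces to set union and difference over per-assembly annotation sets. A toy sketch with invented KEGG ortholog IDs (the real analysis compares dozens of k-mer assemblies, not three):

```python
def merge_annotations(assemblies):
    """Given per-k-mer assembly annotation sets (e.g. KEGG ortholog IDs),
    report the non-redundant union and the IDs unique to each assembly."""
    union = set().union(*assemblies.values())
    unique = {k: kois - set().union(*(v for j, v in assemblies.items() if j != k))
              for k, kois in assemblies.items()}
    return union, unique

assemblies = {
    "k19": {"K00001", "K00002"},
    "k31": {"K00002", "K00003"},
    "k63": {"K00003", "K00004"},
}
union, unique = merge_annotations(assemblies)
print(sorted(union))          # ['K00001', 'K00002', 'K00003', 'K00004']
print(sorted(unique["k19"]))  # ['K00001']
```

The study's finding maps onto this sketch: the union (the clustered assembly) recovers more annotations than any single k-mer set, yet each k-mer can still contribute identifiers found nowhere else, hence the recovery workflow.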

  9. Personalization of Annotated OLAP Systems (Personnalisation de Systèmes OLAP Annotés)

    CERN Document Server

    Jerbi, Houssem; Ravat, Franck; Teste, Olivier

    2010-01-01

    This paper deals with personalization of annotated OLAP systems. Data constellation is extended to support annotations and user preferences. Annotations reflect the decision-maker experience whereas user preferences enable users to focus on the most interesting data. User preferences allow annotated contextual recommendations helping the decision-maker during his/her multidimensional navigations.

  10. Quality standards for DNA sequence variation databases to improve clinical management under development in Australia

    Directory of Open Access Journals (Sweden)

    B. Bennetts

    2014-09-01

    Full Text Available Despite the routine nature of comparing sequence variations identified during clinical testing to database records, few databases meet quality requirements for clinical diagnostics. To address this issue, The Royal College of Pathologists of Australasia (RCPA), in collaboration with the Human Genetics Society of Australasia (HGSA) and the Human Variome Project (HVP), is developing standards for DNA sequence variation databases intended for use in the Australian clinical environment. The outputs of this project will be promoted to other health systems and accreditation bodies by the Human Variome Project to support the development of similar frameworks in other jurisdictions.

  11. ProGMap: an integrated annotation resource for protein orthology

    NARCIS (Netherlands)

    Kuzniar, A.; Lin, K.; He, Y.; Nijveen, H.; Pongor, S.; Leunissen, J.A.M.

    2009-01-01

    Current protein sequence databases employ different classification schemes that often provide conflicting annotations, especially for poorly characterized proteins. ProGMap (Protein Group Mappings, http://www.bioinformatics.nl/progmap) is a web-tool designed to help researchers and database annotators

  12. An Introduction to Genome Annotation.

    Science.gov (United States)

    Campbell, Michael S; Yandell, Mark

    2015-12-17

    Genome projects have evolved from large international undertakings to tractable endeavors for a single lab. Accurate genome annotation is critical for successful genomic, genetic, and molecular biology experiments. These annotations can be generated using a number of approaches and available software tools. This unit describes methods for genome annotation and a number of software tools commonly used in gene annotation.

  13. On Anomalies in Annotation Systems

    CERN Document Server

    Brust, Matthias R

    2007-01-01

    Today's computer-based annotation systems implement a wide range of functionalities that often go beyond those available in traditional paper-and-pencil annotations. Conceptually, annotation systems are based on thoroughly investigated psycho-sociological and pedagogical learning theories. They offer a huge diversity of annotation types that can be placed in textual as well as in multimedia format. Additionally, annotations can be published or shared with a group of interested parties via well-organized repositories. Although highly sophisticated annotation systems exist both conceptually and technologically, their acceptance remains somewhat limited. In this paper, we argue that current annotation systems suffer from several fundamental problems that are inherent in the traditional paper-and-pencil annotation paradigm. As a solution, we propose to shift the annotation paradigm for the implementation of annotation systems.

  14. Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae

    Directory of Open Access Journals (Sweden)

    Deng Jixin

    2009-02-01

    were assigned to the 3 root terms. The Version 5 GO annotation is publicly queryable via the GO site http://amigo.geneontology.org/cgi-bin/amigo/go.cgi. Additionally, the genome of M. oryzae is constantly being refined and updated as new information is incorporated. For the latest GO annotation of the Version 6 genome, please visit our website http://scotland.fgl.ncsu.edu/smeng/GoAnnotationMagnaporthegrisea.html. The preliminary GO annotation of the Version 6 genome is stored in a local MySQL database that is publicly queryable via a user-friendly interface, the Adhoc Query System. Conclusion Our analysis provides comprehensive and robust GO annotations of the M. oryzae genome assemblies that will be a solid foundation for further functional interrogation of M. oryzae.

  15. Semantic annotation of mutable data.

    Directory of Open Access Journals (Sweden)

    Robert A Morris

    Full Text Available Electronic annotation of scientific data is very similar to annotation of documents. Both types of annotation amplify the original object, add related knowledge to it, and dispute or support assertions in it. In each case, annotation is a framework for discourse about the original object, and, in each case, an annotation needs to clearly identify its scope and its own terminology. However, electronic annotation of data differs from annotation of documents: the content of the annotations, including expectations and supporting evidence, is more often shared among members of networks. Any consequent actions taken by the holders of the annotated data could be shared as well. But even those current annotation systems that admit data as their subject often make it difficult or impossible to annotate at fine-enough granularity to use the results in this way for data quality control. We address these kinds of issues by offering simple extensions to an existing annotation ontology and describe how the results support an interest-based distribution of annotations. We are using the result to design and deploy a platform that supports annotation services overlaid on networks of distributed data, with particular application to data quality control. Our initial instance supports a set of natural science collection metadata services. An important application is the support for data quality control and provision of missing data. A previous proof of concept demonstrated such use based on data annotations modeled with XML-Schema.

  16. Improved Search of Principal Component Analysis Databases for Spectro-polarimetric Inversion

    Science.gov (United States)

    Casini, R.; Asensio Ramos, A.; Lites, B. W.; López Ariste, A.

    2013-08-01

    We describe a simple technique for the acceleration of spectro-polarimetric inversions based on principal component analysis (PCA) of Stokes profiles. This technique involves the indexing of the database models based on the sign of the projections (PCA coefficients) of the first few relevant orders of principal components of the four Stokes parameters. In this way, each model in the database can be attributed a distinctive binary number of 4n bits, where n is the number of PCA orders used for the indexing. Each of these binary numbers (indices) identifies a group of "compatible" models for the inversion of a given set of observed Stokes profiles sharing the same index. The complete set of the binary numbers so constructed evidently determines a partition of the database. The search of the database for the PCA inversion of spectro-polarimetric data can profit greatly from this indexing. In practical cases it becomes possible to approach the ideal acceleration factor of 2^(4n) as compared to the systematic search of a non-indexed database for a traditional PCA inversion. This indexing method relies on the existence of a physical meaning in the sign of the PCA coefficients of a model. For this reason, the presence of model ambiguities and of spectro-polarimetric noise in the observations limits in practice the number n of relevant PCA orders that can be used for the indexing.
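The sign-based indexing described in this abstract can be sketched in a few lines. The database and "observed" profile below are synthetic stand-ins; a real application would use the PCA coefficients of model and observed Stokes profiles:

```python
import numpy as np

def sign_index(coeffs, n):
    """Pack the signs of the first n PCA coefficients of each of the
    four Stokes parameters (I, Q, U, V) into a 4n-bit integer index."""
    bits = (np.asarray(coeffs)[:, :n] >= 0).astype(int).ravel()  # (4, n) -> 4n bits
    idx = 0
    for b in bits:
        idx = (idx << 1) | int(b)
    return idx

rng = np.random.default_rng(0)
# Hypothetical database: 10000 models x 4 Stokes parameters x 8 PCA coefficients
database = rng.standard_normal((10000, 4, 8))

n = 3  # PCA orders used for indexing -> 2**(4*3) = 4096 possible indices
partition = {}
for m, model in enumerate(database):
    partition.setdefault(sign_index(model, n), []).append(m)

# Inverting an observed profile only searches its "compatible" group,
# instead of the whole database.
observed = rng.standard_normal((4, 8))
candidates = partition.get(sign_index(observed, n), [])
print(len(candidates), "of", len(database), "models need to be searched")
```

With uniformly distributed signs each group holds roughly a 2^(-4n) fraction of the models, which is where the quoted ideal acceleration factor comes from.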

  17. FINDING GENERIFS VIA GENE ONTOLOGY ANNOTATIONS

    OpenAIRE

    Lu, Zhiyong; Cohen, K Bretonnel; Hunter, Lawrence

    2006-01-01

    A Gene Reference Into Function (GeneRIF) is a concise phrase describing a function of a gene in the Entrez Gene database. Applying techniques from the area of natural language processing known as automatic summarization, it is possible to link the Entrez Gene database, the Gene Ontology, and the biomedical literature. A system was implemented that automatically suggests a sentence from a PubMed/MEDLINE abstract as a candidate GeneRIF by exploiting a gene’s GO annotations along with location f...

  18. Human Genome Annotation

    Science.gov (United States)

    Gerstein, Mark

    A central problem for 21st century science is annotating the human genome and making this annotation useful for the interpretation of personal genomes. My talk will focus on annotating the 99% of the genome that does not code for canonical genes, concentrating on intergenic features such as structural variants (SVs), pseudogenes (protein fossils), binding sites, and novel transcribed RNAs (ncRNAs). In particular, I will describe how we identify regulatory sites and variable blocks (SVs) based on processing next-generation sequencing experiments. I will further explain how we cluster together groups of sites to create larger annotations. Next, I will discuss a comprehensive pseudogene identification pipeline, which has enabled us to identify >10K pseudogenes in the genome and analyze their distribution with respect to age, protein family, and chromosomal location. Throughout, I will try to introduce some of the computational algorithms and approaches that are required for genome annotation. Much of this work has been carried out in the framework of the ENCODE, modENCODE, and 1000 genomes projects.

  19. Computational evaluation of TIS annotation for prokaryotic genomes

    OpenAIRE

    Zhu Huaiqiu; Ju Li-Ning; Zheng Xiaobin; Hu Gang-Qing; She Zhen-Su

    2008-01-01

    Abstract Background Accurate annotation of translation initiation sites (TISs) is essential for understanding the translation initiation mechanism. However, the reliability of TIS annotation in widely used databases such as RefSeq is uncertain due to the lack of experimental benchmarks. Results Based on a homogeneity assumption that gene translation-related signals are uniformly distributed across a genome, we have established a computational method for a large-scale quantitative assessment o...

  20. Evaluation of Three Automated Genome Annotations for Halorhabdus utahensis

    DEFF Research Database (Denmark)

    Bakke, Peter; Carney, Nick; DeLoache, Will;

    2009-01-01

    Genome annotations are accumulating rapidly and depend heavily on automated annotation systems. Many genome centers offer annotation systems but no one has compared their output in a systematic way to determine accuracy and inherent errors. Errors in the annotations are routinely deposited in...... databases such as NCBI and used to validate subsequent annotation errors. We submitted the genome sequence of halophilic archaeon Halorhabdus utahensis to be analyzed by three genome annotation services. We have examined the output from each service in a variety of ways in order to compare the methodology...... and effectiveness of the annotations, as well as to explore the genes, pathways, and physiology of the previously unannotated genome. The annotation services differ considerably in gene calls, features, and ease of use. We had to manually identify the origin of replication and the species...

  1. An annotation system for 3D fluid flow visualization

    Science.gov (United States)

    Loughlin, Maria M.; Hughes, John F.

    1995-01-01

    Annotation is a key activity of data analysis. However, current systems for data analysis focus almost exclusively on visualization. We propose a system which integrates annotations into a visualization system. Annotations are embedded in 3D data space, using the Post-it metaphor. This embedding allows context-based information storage and retrieval, and facilitates information sharing in collaborative environments. We provide a traditional database filter and a Magic Lens filter to create specialized views of the data. The system has been customized for fluid flow applications, with features which allow users to store parameters of visualization tools and sketch 3D volumes.

  2. The UCSC Genome Browser database: 2016 update.

    Science.gov (United States)

    Speir, Matthew L; Zweig, Ann S; Rosenbloom, Kate R; Raney, Brian J; Paten, Benedict; Nejad, Parisa; Lee, Brian T; Learned, Katrina; Karolchik, Donna; Hinrichs, Angie S; Heitner, Steve; Harte, Rachel A; Haeussler, Maximilian; Guruvadoo, Luvina; Fujita, Pauline A; Eisenhart, Christopher; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Barber, Galt P; Haussler, David; Kuhn, Robert M; Kent, W James

    2016-01-01

    For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the "Data Integrator", for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment. PMID:26590259

  4. Using database technology to improve STEM student retention: A total quality management approach to early alert and intervention

    OpenAIRE

    Sam Khoury; Kouroush Jenab; Donald Staub; Mark Rajai

    2012-01-01

    Students at risk of dropping out of Science, Technology, Engineering, and Mathematics (STEM) programs often display signs that indicate they are at risk. A need exists to identify at risk STEM students early and to develop and implement effective intervention strategies that utilize the Total Quality Management (TQM) approach. Most of all, a database system is needed to track this early intervention process, if retention rates are to be improved. To address this need at a small community coll...

  5. Database Constraints Applied to Metabolic Pathway Reconstruction Tools

    Directory of Open Access Journals (Sweden)

    Jordi Vilaplana

    2014-01-01

    Full Text Available Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we adjusted and tuned the configurable parameters of the database server to maximize the performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes.

  7. The GATO gene annotation tool for research laboratories.

    Science.gov (United States)

    Fujita, A; Massirer, K B; Durham, A M; Ferreira, C E; Sogayar, M C

    2005-11-01

    Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO) is a Bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from everywhere through the internet or may be run locally if a large number of sequences are going to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Usually, installation and application of annotation systems require experience and are time consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to access it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. Minimum computer free space required is 2 MB. PMID:16258624

  8. Improvements to SFCOMPO - a database on isotopic composition of spent nuclear fuel

    International Nuclear Information System (INIS)

    Isotopic composition is among the most relevant data used in calculating the burnup of irradiated nuclear fuel. Since autumn 2002, the Organisation for Economic Co-operation and Development/Nuclear Energy Agency (OECD/NEA) has operated a database of isotopic composition, SFCOMPO, initially developed at the Japan Atomic Energy Research Institute. This paper describes the latest version of SFCOMPO and the future development plan in OECD/NEA. (author)

  9. Selectome update: quality control and computational improvements to a database of positive selection.

    OpenAIRE

    Moretti S; Laurenczy B.; Gharib W.H.; Castella B; Kuzniar A.; Schabauer H.; Studer R.A.; Valle M.; Salamin N.; Stockinger H.; Robinson-Rechavi M.

    2014-01-01

    Selectome (http://selectome.unil.ch/) is a database of positive selection, based on a branch-site likelihood test. This model estimates the number of nonsynonymous substitutions (dN) and synonymous substitutions (dS) to evaluate the variation in selective pressure (dN/dS ratio) over branches and over sites. Since the original release of Selectome, we have benchmarked and implemented a thorough quality control procedure on multiple sequence alignments, aiming to provide minimum false-positive ...

  10. Improved Integrity Constraints Checking in Distributed Databases by Exploiting Local Checking

    Institute of Scientific and Technical Information of China (English)

    Ali A.Alwan; Hamidah Ibrahim; Nur Izura Udzir

    2009-01-01

    Most previous studies of integrity constraint checking in distributed databases derive simplified forms of the initial integrity constraints that have the sufficiency property, since a sufficient test is known to be cheaper than the complete test against the initial constraint: it involves less data to be transferred across the network and can always be evaluated at the target site (a single site). These studies are limited, however, because they depend strictly on the assumption that an update operation is executed at the site where the relation specified in the update operation is located, which is not always true. When that assumption fails, the sufficient test, which previous work proved to be a local test, is no longer appropriate. This paper proposes an approach to checking integrity constraints in a distributed database that utilizes as much as possible the local information stored at the target site. The proposed approach derives support tests as an alternative to the existing complete and sufficient tests, with the aim of increasing the amount of local checking regardless of where the update operation is submitted. Several analyses were performed to evaluate the proposed approach, and the results show that support tests benefit distributed databases by enabling local constraint checking.
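As an illustration of the general idea behind local testing (not the paper's actual support-test derivation), a sufficient test can let the target site accept an update from local data alone, falling back to a cross-network complete test only when local evidence is missing. The relations and helper below are hypothetical:

```python
# Referential constraint: every emp.dept must exist in dept.
# Relations are modelled as in-memory sets of tuples; emp is stored at
# this site, dept at a remote site.
local_emp = {("alice", "sales"), ("bob", "hr")}
remote_dept = {("sales",), ("hr",), ("it",)}

def remote_lookup(dept):
    """Simulated cross-network check -- the expensive complete test."""
    return (dept,) in remote_dept

def insert_emp(name, dept):
    # Sufficient (local) test: if some local emp tuple already references
    # this dept, the constraint is known to hold for that value, so no
    # data needs to cross the network.
    if any(d == dept for _, d in local_emp):
        local_emp.add((name, dept))
        return "accepted locally"
    # Fall back to the complete test at the remote site.
    if remote_lookup(dept):
        local_emp.add((name, dept))
        return "accepted after remote check"
    return "rejected"

print(insert_emp("carol", "sales"))  # local evidence suffices
print(insert_emp("dave", "it"))      # needs the remote site
print(insert_emp("eve", "legal"))    # violates the constraint
```

Note the asymmetry the abstract exploits: a passed sufficient test settles the update locally, but a failed one says nothing, forcing the complete test.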

  11. Getting Priorities Straight: Improving Linux Support for Database I/O

    DEFF Research Database (Denmark)

    Hall, Christoffer; Bonnet, Philippe

    2005-01-01

    The Linux 2.6 kernel supports asynchronous I/O as a result of propositions from the database industry. This is a positive evolution but is it a panacea? In the context of the Badger project, a collaboration between MySQL AB and University of Copenhagen, we evaluate how MySQL/InnoDB can best take...... advantage of Linux asynchronous I/O and how Linux can help MySQL/InnoDB best take advantage of the underlying I/O bandwidth. This is a crucial problem for the increasing number of MySQL servers deployed for very large database applications. In this paper, we first show that the conservative I/O submission...... policy used by InnoDB (as well as Oracle 9.2) leads to an under-utilization of the available I/O bandwidth. We then show that introducing prioritized asynchronous I/O in Linux will allow MySQL/InnoDB and the other Linux databases to fully utilize the available I/O bandwidth using a more aggressive I...

  12. Apollo2Go: a web service adapter for the Apollo genome viewer to enable distributed genome annotation

    Directory of Open Access Journals (Sweden)

    Mayer Klaus FX

    2007-08-01

    Full Text Available Abstract Background Apollo, a genome annotation viewer and editor, has become a widely used genome annotation and visualization tool for distributed genome annotation projects. When using Apollo for annotation, database updates are carried out by uploading intermediate annotation files into the respective database. This non-direct database upload is laborious and evokes problems of data synchronicity. Results To overcome these limitations we extended the Apollo data adapter with a generic, configurable web service client that is able to retrieve annotation data in a GAME-XML-formatted string and pass it on to Apollo's internal input routine. Conclusion This Apollo web service adapter, Apollo2Go, simplifies the data exchange in distributed projects and aims to render the annotation process more comfortable. The Apollo2Go software is freely available from ftp://ftpmips.gsf.de/plants/apollo_webservice.

  13. Annotated bibliography traceability

    NARCIS (Netherlands)

    Narain, G.

    2006-01-01

    This annotated bibliography contains summaries of articles and chapters of books, which are relevant to traceability. After each summary there is a part about the relevancy of the paper for the LEI project. The aim of the LEI-project is to gain insight in several aspects of traceability in order to

  14. Collaborative Movie Annotation

    Science.gov (United States)

    Zad, Damon Daylamani; Agius, Harry

    In this paper, we focus on metadata for self-created movies like those found on YouTube and Google Video, the durations of which are increasing in line with falling upload restrictions. While simple tags may have been sufficient for most purposes for traditionally very short video footage that contains a relatively small amount of semantic content, this is not the case for movies of longer duration which embody more intricate semantics. Creating metadata is a time-consuming process that takes a great deal of individual effort; however, this effort can be greatly reduced by harnessing the power of Web 2.0 communities to create, update and maintain it. Consequently, we consider the annotation of movies within Web 2.0 environments, such that users create and share that metadata collaboratively, and propose an architecture for collaborative movie annotation. This architecture arises from the results of an empirical experiment where metadata creation tools, YouTube and an MPEG-7 modelling tool, were used by users to create movie metadata. The next section discusses related work in the areas of collaborative retrieval and tagging. Then, we describe the experiments that were undertaken on a sample of 50 users. Next, the results are presented which provide some insight into how users interact with existing tools and systems for annotating movies. Based on these results, the paper then develops an architecture for collaborative movie annotation.

  15. Annotation of Regular Polysemy

    DEFF Research Database (Denmark)

    Martinez Alonso, Hector

    Regular polysemy has received a lot of attention from the theory of lexical semantics and from computational linguistics. However, there is no consensus on how to represent the sense of underspecified examples at the token level, namely when annotating or disambiguating senses of metonymic words...

  16. Intellectuals in China: Annotations.

    Science.gov (United States)

    Parker, Franklin

    This annotated bibliography of 72 books, journal articles, government reports, and newspaper feature stories focuses on the changing role of intellectuals in China, primarily since the 1949 Chinese Revolution. Particular attention is given to the Hundred Flowers Movement of 1957 and the Cultural Revolution. Most of the cited works are in English,…

  17. Annotation: The Savant Syndrome

    Science.gov (United States)

    Heaton, Pamela; Wallace, Gregory L.

    2004-01-01

    Background: Whilst interest has focused on the origin and nature of the savant syndrome for over a century, it is only within the past two decades that empirical group studies have been carried out. Methods: The following annotation briefly reviews relevant research and also attempts to address outstanding issues in this research area.…

  18. Annotation of Ehux ESTs

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-06-12

    22 percent of ESTs do not align with scaffolds. The EST Pipeline assembles 17126 consensi from the non-aligned ESTs. The Annotation Pipeline predicts 8564 ORFs on the consensi. Domain analysis of the ORFs reveals missing genes. Cluster analysis reveals missing genes. Expression analysis reveals potential strain-specific genes.

  19. Mouse genome database 2016.

    Science.gov (United States)

    Bult, Carol J; Eppig, Janan T; Blake, Judith A; Kadin, James A; Richardson, Joel E

    2016-01-01

    The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the primary community model organism database for the laboratory mouse and serves as the source for key biological reference data related to mouse genes, gene functions, phenotypes and disease models with a strong emphasis on the relationship of these data to human biology and disease. As the cost of genome-scale sequencing continues to decrease and new technologies for genome editing become widely adopted, the laboratory mouse is more important than ever as a model system for understanding the biological significance of human genetic variation and for advancing the basic research needed to support the emergence of genome-guided precision medicine. Recent enhancements to MGD include new graphical summaries of biological annotations for mouse genes, support for mobile access to the database, tools to support the annotation and analysis of sets of genes, and expanded support for comparative biology through the expansion of homology data.

  20. Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob; Hohimer, Ryan E.; White, Amanda M.

    2006-06-06

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

  1. Automating Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob L.; Hohimer, Ryan E.; White, Amanda M.

    2006-01-22

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

  2. SNAD: sequence name annotation-based designer

    Directory of Open Access Journals (Sweden)

    Gorbalenya Alexander E

    2009-08-01

    Full Text Available Abstract Background A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate the utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually, which can be a tedious exercise prone to mistakes and omissions. Results Here we introduce SNAD (Sequence Name Annotation-based Designer), which mediates automatic conversion of sequence UIDs (associated with a multiple alignment or phylogenetic tree, or supplied as a plain-text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit the wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between the quality of sequence annotation and the efficiency of communication and knowledge dissemination among researchers.
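    The substitution SNAD automates can be sketched as a template applied to per-UID annotation when renaming tip labels in a Newick tree. The UIDs, annotation values, and template below are illustrative assumptions, not SNAD's actual templates or database queries.

```python
import re

# Sketch of SNAD-style UID -> name substitution in a Newick tree.
# ANNOTATION and TEMPLATE are invented for illustration; SNAD itself pulls
# annotation from cognate entries in external databases.

ANNOTATION = {
    "AY274119": {"organism": "SARS coronavirus", "acronym": "SARS-CoV"},
    "AF304460": {"organism": "Human coronavirus 229E", "acronym": "HCoV-229E"},
}

TEMPLATE = "{acronym}_{uid}"  # a user-defined naming template

def rename_newick(newick, annotation, template):
    """Replace each known UID label with a template-derived name."""
    def repl(match):
        uid = match.group(0)
        if uid in annotation:
            return template.format(uid=uid, **annotation[uid])
        return uid  # leave unknown labels untouched
    return re.sub(r"[A-Za-z0-9]+", repl, newick)

tree = "(AY274119,AF304460);"
print(rename_newick(tree, ANNOTATION, TEMPLATE))
# (SARS-CoV_AY274119,HCoV-229E_AF304460);
```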

  3. Effective function annotation through catalytic residue conservation.

    Science.gov (United States)

    George, Richard A; Spriggs, Ruth V; Bartlett, Gail J; Gutteridge, Alex; MacArthur, Malcolm W; Porter, Craig T; Al-Lazikani, Bissan; Thornton, Janet M; Swindells, Mark B

    2005-08-30

    Because of the extreme impact of genome sequencing projects, protein sequences without accompanying experimental data now dominate public databases. Homology searches, by providing an opportunity to transfer functional information between related proteins, have become the de facto way to address this. Although a single, well annotated, close relationship will often facilitate sufficient annotation, this situation is not always the case, particularly if mutations are present in important functional residues. When only distant relationships are available, the transfer of function information is more tenuous, and the likelihood of encountering several well annotated proteins with different functions is increased. The consequence for a researcher is a range of candidate functions with little way of knowing which, if any, are correct. Here, we address the problem directly by introducing a computational approach to accurately identify and segregate related proteins into those with a functional similarity and those where function differs. This approach should find a wide range of applications, including the interpretation of genomics/proteomics data and the prioritization of targets for high-throughput structure determination. The method is generic, but here we concentrate on enzymes and apply high-quality catalytic site data. In addition to providing a series of comprehensive benchmarks to show the overall performance of our approach, we illustrate its utility with specific examples that include the correct identification of haptoglobin as a nonenzymatic relative of trypsin, discrimination of acid-d-amino acid ligases from a much larger ligase pool, and the successful annotation of BioH, a structural genomics target.
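    The core idea, comparing a query to annotated relatives only at known catalytic positions rather than over the whole sequence, can be sketched as follows. The sequences, alignment, and catalytic columns are invented for illustration; the published method uses curated catalytic-site data and real alignments.

```python
# Toy illustration of discriminating function by catalytic-residue
# conservation: score a query against each annotated relative only at
# known catalytic positions. All sequences and positions are invented.

def catalytic_identity(query, relative, catalytic_positions):
    """Fraction of catalytic positions at which residues match."""
    matches = sum(1 for i in catalytic_positions if query[i] == relative[i])
    return matches / len(catalytic_positions)

# Pre-aligned toy sequences; a catalytic triad at alignment columns 2, 5, 8.
query     = "MAHVCDTLSG"
enzyme    = "MSHICDALSG"   # conserves H, D, S at columns 2, 5, 8
nonenzyme = "MSQICNALAG"   # catalytic residues lost (cf. haptoglobin)

triad = [2, 5, 8]
print(catalytic_identity(query, enzyme, triad))     # 1.0
print(catalytic_identity(query, nonenzyme, triad))  # 0.0
```

    Overall sequence identity between the two relatives is similar; only the catalytic-position score separates the functional enzyme from the nonenzymatic relative.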

  4. Linguistic variations and morphosyntactic annotation of Latin classical texts

    Directory of Open Access Journals (Sweden)

    Céline Poudat

    2010-01-01

    Full Text Available This paper assesses the performance of three taggers (MBT, TnT and TreeTagger) when used for the morphosyntactic annotation of classical Latin texts. With this aim in view, we selected the training corpora, as well as the samples used for tests, from the texts of the LASLA database. The texts were chosen according to their ability to allow testing of the taggers' sensitivity to stylistic, diachronic, generic or discursive variations. On the one hand, this research pinpoints the achievements of each tagger according to the various corpora. On the other hand, the paper shows that these taggers can be used as true heuristic instruments and can help to significantly improve the description of the corpus.

  5. Measuring frailty in population-based healthcare databases: multi-dimensional prognostic indices for the improvement of geriatric care

    Directory of Open Access Journals (Sweden)

    Janet Sultana

    2016-04-01

    Full Text Available The prognostic evaluation of geriatric patients is critical in helping clinicians to weigh the risks versus the benefits of available therapeutic options. Frailty contributes significantly to the risk of mortality in older patients and is already known to have implications for the outcome of treatment in a clinical context. The multi-dimensional prognostic index (MPI) is a prognostic tool based on a comprehensive geriatric assessment and includes detailed information on patient cognition, functionality, disease and drug burden. The value of the MPI in predicting mortality has already been shown in hospital and community settings but never in a population-based healthcare database setting. One of the aims of the ongoing EU-funded MPI_Age project is to improve our understanding of how geriatric frailty data can be identified in healthcare databases and whether this can be used to predict serious adverse events associated with pharmacotherapy. Our findings suggest that data on functionality in elderly patients are poorly registered in The Health Improvement Network (THIN), a UK nationwide general practice database, and only a few of the functionality domains could be used in a population-based analysis. The most commonly registered functionality information was related to mobility, dressing, accommodation and cognition. Our results suggest that some of these functionality domains are predictive of short- and long-term mortality in community-dwelling patients. This may have implications for observational research, where frailty is an unmeasured confounder.

  6. Documenting and Improving the Hourly Wage Measure in the Danish IDA Database

    DEFF Research Database (Denmark)

    Lund, Christian Giødesen; Vejlin, Rune Majlund

    This paper overhauls the hourly wage measure that is most often used in Danish research, the TIMELON variable in the IDA database. Based on a replication that we have constructed, we provide a documentation of the wage variable, the first of its kind, then continue with a performance analysis. We...... year. We analyse these puzzles in depth and solve almost all of them. Finally, we propose a new hourly wage measure that incorporates all the solutions and we show that it performs much better....

  7. Animal QTLdb: an improved database tool for livestock animal QTL/association data dissemination in the post-genome era

    OpenAIRE

    Hu, Zhi-Liang; Park, Carissa A.; Wu, Xiao-Lin; Reecy, James M.

    2013-01-01

    The Animal QTL database (QTLdb; http://www.animalgenome.org/QTLdb) is designed to house all publicly available QTL and single-nucleotide polymorphism/gene association data on livestock animal species. An earlier version was published in the Nucleic Acids Research Database issue in 2007. Since then, we have continued our efforts to develop new and improved database tools to allow more data types, parameters and functions. Our efforts have transformed the Animal QTLdb into a tool that actively ...

  8. Automated analysis and annotation of basketball video

    Science.gov (United States)

    Saur, Drew D.; Tan, Yap-Peng; Kulkarni, Sanjeev R.; Ramadge, Peter J.

    1997-01-01

    Automated analysis and annotation of video sequences are important for digital video libraries, content-based video browsing and data mining projects. A successful video annotation system should provide users with a useful video content summary in a reasonable processing time. Given the wide variety of video genres available today, automatically extracting meaningful video content for annotation remains difficult with currently available techniques. However, a wide range of video has inherent structure, such that some prior knowledge about the video content can be exploited to improve our understanding of the high-level video semantic content. In this paper, we develop tools and techniques for analyzing structured video by using the low-level information available directly from MPEG compressed video. Being able to work directly in the video compressed domain can greatly reduce the processing time and enhance storage efficiency. As a testbed, we have developed a basketball annotation system which combines the low-level information extracted from the MPEG stream with prior knowledge of basketball video structure to provide high-level content analysis, annotation and browsing for events such as wide-angle and close-up views, fast breaks, steals, potential shots, number of possessions and possession times. We expect our approach can also be extended to structured video in other domains.
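    As an illustration of the kind of low-level structure analysis described above, the following sketch flags shot boundaries (e.g. cuts between wide-angle and close-up views) from frame-to-frame intensity-histogram differences. The frames and threshold are synthetic; the paper's system works on DC coefficients taken directly from the MPEG stream rather than decoded pixels.

```python
# Sketch of shot-boundary detection: declare a cut when the frame-to-frame
# intensity-histogram difference exceeds a threshold. Frames and threshold
# are synthetic stand-ins for compressed-domain (DC-coefficient) data.

def histogram(frame, bins=4, maxval=256):
    h = [0] * bins
    for px in frame:
        h[px * bins // maxval] += 1
    return h

def detect_cuts(frames, threshold):
    """Return indices where a new shot begins."""
    cuts = []
    for i in range(1, len(frames)):
        a, b = histogram(frames[i - 1]), histogram(frames[i])
        diff = sum(abs(x - y) for x, y in zip(a, b))
        if diff > threshold:
            cuts.append(i)
    return cuts

dark  = [20] * 64            # 64-pixel "wide-angle" frames
light = [200] * 64           # "close-up" frames after a cut
frames = [dark, dark, dark, light, light]
print(detect_cuts(frames, threshold=32))  # [3]
```

    Detected boundaries become the shots on which higher-level events (fast breaks, steals, potential shots) are then annotated.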

  9. Nutrition & Adolescent Pregnancy: A Selected Annotated Bibliography.

    Science.gov (United States)

    National Agricultural Library (USDA), Washington, DC.

    This annotated bibliography on nutrition and adolescent pregnancy is intended to be a source of technical assistance for nurses, nutritionists, physicians, educators, social workers, and other personnel concerned with improving the health of teenage mothers and their babies. It is divided into two major sections. The first section lists selected…

  10. Performance Improvement with Web Based Database on Library Information System of SMK Yadika 5

    Directory of Open Access Journals (Sweden)

    Pualam Dipa Nusantara

    2015-12-01

    Full Text Available The difficulty of managing data on the book collection is a problem often faced by librarians, and it affects the quality of service. Previously, the collection was arranged and recorded in separate Word and Excel files, and borrowing and returning transactions were not recorded in an integrated way. The library system presented here manages the book collection and reduces the problems library staff often experience when serving students who borrow books, in particular the frequent difficulty of tracking which books are still on loan. The system also records the fines charged to students (borrowers) for late returns or lost books. The conclusion of this study is that library performance can be improved with a library system using a web database.
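    A minimal sketch of such an integrated back end, with the catalogue, loans, and fines in one store instead of separate files. The schema and fine rate are illustrative assumptions, not taken from the paper.

```python
import sqlite3
from datetime import date

# Sketch of an integrated library database: catalogue, loans, and fines
# in one place. Schema and fine rate are illustrative assumptions.

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT,
                        on_loan INTEGER DEFAULT 0);
    CREATE TABLE loans (book_id INTEGER, borrower TEXT,
                        due TEXT, returned TEXT);
""")
db.execute("INSERT INTO books (title) VALUES ('Database Design')")

def borrow(book_id, borrower, due):
    db.execute("UPDATE books SET on_loan = 1 WHERE id = ?", (book_id,))
    db.execute("INSERT INTO loans VALUES (?, ?, ?, NULL)",
               (book_id, borrower, due))

def return_book(book_id, returned, fine_per_day=500):  # assumed fine rate
    (due,) = db.execute(
        "SELECT due FROM loans WHERE book_id = ? AND returned IS NULL",
        (book_id,)).fetchone()
    db.execute(
        "UPDATE loans SET returned = ? WHERE book_id = ? AND returned IS NULL",
        (returned, book_id))
    db.execute("UPDATE books SET on_loan = 0 WHERE id = ?", (book_id,))
    late = max(0, (date.fromisoformat(returned) - date.fromisoformat(due)).days)
    return late * fine_per_day

borrow(1, "student-42", "2015-12-01")
print(return_book(1, "2015-12-04"))  # 3 days late -> 1500
```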

  11. Functional annotation and ENU

    OpenAIRE

    Gunn, Teresa M.

    2012-01-01

    Functional annotation of every gene in the mouse genome is a herculean task that requires a multifaceted approach. Many large-scale initiatives are contributing to this undertaking. The International Knockout Mouse Consortium (IKMC) plans to mutate every protein-coding gene, using a combination of gene trapping and gene targeting in embryonic stem cells. Many other groups are performing mutagenesis using the chemical mutagen ethylnitrosourea (ENU) or transposon-based systems to induce mutations, screening ...

  12. Using the Homes Energy Efficiency Database as a research resource for residential insulation improvements

    International Nuclear Information System (INIS)

    In devising viable energy efficiency policies that can reduce the greenhouse gas emissions of existing dwellings (e.g. the UK's Green Deal), data are required on current insulation levels and their influences. One such data source is the seldom-used UK Energy Saving Trust's Homes Energy Efficiency Database (HEED), which this paper investigates using Norfolk UK local authorities as a case study. The HEED's reactive and longitudinal data collection strategies contribute to underlying biases, which likely explain its differences from the English Housing Survey and the UK 2001 Census. These differences had a cascading effect in that they manifested themselves in the indicative financial and carbon assessments undertaken. Similarly, sampling concerns also affected the correlations found for influences on current dwelling insulation levels. Provided one is transparent about potential biases and data concerns, the HEED can play a substantial role in guiding policy decisions and understanding dwelling stock characteristics (e.g. what makes dwellings 'Hard to Treat'). In particular, its vast (national) geographic coverage yet high resolution enables local context to be explored: a factor that this study shows to significantly shape insulation levels. - Highlights: • The Homes Energy Efficiency Database's (HEED) integrity and role are investigated. • HEED biases exist due to reactive and longitudinal data collection strategies. • Biases contribute to differences with the Census and English Housing Survey. • Its high resolution and national coverage can bring local context to the fore. • It can play a substantial role in shaping residential energy efficiency policies

  13. Fuzzy Emotional Semantic Analysis and Automated Annotation of Scene Images

    Directory of Open Access Journals (Sweden)

    Jianfang Cao

    2015-01-01

    Full Text Available With the advances in electronic and imaging techniques, the production of digital images has rapidly increased, and the extraction and automated annotation of the emotional semantics implied by images have become issues that must be urgently addressed. To better simulate human subjectivity and ambiguity in understanding scene images, the current study proposes an emotional semantic annotation method for scene images based on fuzzy set theory. A fuzzy membership degree was calculated to describe the emotional degree of a scene image and was implemented using the Adaboost algorithm and a back-propagation (BP) neural network. The automated annotation method was trained and tested using scene images from the SUN Database. The annotation results were then compared with those based on manual annotation. Our method showed an annotation accuracy rate of 91.2% for basic emotional values and 82.4% after extended emotional values were added, which correspond to increases of 5.5% and 8.9%, respectively, compared with the results from using a single BP neural network algorithm. Furthermore, the retrieval accuracy rate based on our method reached approximately 89%. This study attempts to lay a solid foundation for the automated emotional semantic annotation of more types of images and is therefore of practical significance.
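    The fuzzy-membership idea can be illustrated with a simple trapezoidal membership function over a classifier score: an image is not assigned one hard label but a degree in [0, 1] for each emotion class. The function shape, class names, and numbers are invented; the paper derives memberships from an Adaboost-selected BP network, not from fixed trapezoids.

```python
# Sketch of fuzzy emotional membership: a score gets graded membership in
# overlapping emotion classes. Class boundaries here are invented.

def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy membership function (0 outside [a, d], 1 on [b, c])."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge

# Membership of a network output score in two overlapping emotion classes.
score = 0.55
calm     = trapezoid(score, 0.0, 0.1, 0.4, 0.7)
cheerful = trapezoid(score, 0.4, 0.6, 0.9, 1.0)
print(round(calm, 2), round(cheerful, 2))  # 0.5 0.75
```

    The overlap is the point: one image can be partly "calm" and partly "cheerful", mirroring human ambiguity.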

  14. Semantic Based Image Annotation Using Descriptive Features and Retagging approach

    Directory of Open Access Journals (Sweden)

    P.Nagarani

    2012-02-01

    Full Text Available Semantic annotation of an image is an important and difficult task in content-based image retrieval (CBIR). In the proposed model, the low-level features of the images are described using color and texture features, and the model is used for semantic annotation of images. Textual annotations, or tags, associated with multimedia content are among the most effective means of organizing and supporting search over digital images and multimedia databases. The quality of the tags is refined using an image retagging method. The process is formulated as a multi-path graph-based problem, which in parallel identifies the visual content of the images, the semantic correlation of the tags, and the primary information provided by users. Automated image annotation is preferred because, given the countless images in our lives, it is not possible to annotate them all by hand; annotation by computer is thus a potential and promising solution to this problem. The ability to annotate images semantically based on the objects that they contain is essential in image retrieval, as it provides the mechanism to take advantage of existing text retrieval systems.

  15. Semantic Based Image Annotation Using Descriptive Features and Retagging approach

    Directory of Open Access Journals (Sweden)

    P.Nagarani

    2012-03-01

    Full Text Available Semantic annotation of an image is an important and difficult task in content-based image retrieval (CBIR). In the proposed model, the low-level features of the images are described using color and texture features, and the model is used for semantic annotation of images. Textual annotations, or tags, associated with multimedia content are among the most effective means of organizing and supporting search over digital images and multimedia databases. The quality of the tags is refined using an image retagging method. The process is formulated as a multi-path graph-based problem, which in parallel identifies the visual content of the images, the semantic correlation of the tags, and the primary information provided by users. Automated image annotation is preferred because, given the countless images in our lives, it is not possible to annotate them all by hand; annotation by computer is thus a potential and promising solution to this problem. The ability to annotate images semantically based on the objects that they contain is essential in image retrieval, as it provides the mechanism to take advantage of existing text retrieval systems.

  16. Annotations of Mexican bullfighting videos for semantic index

    Science.gov (United States)

    Montoya Obeso, Abraham; Oropesa Morales, Lester Arturo; Fernando Vázquez, Luis; Cocolán Almeda, Sara Ivonne; Stoian, Andrei; García Vázquez, Mireya Saraí; Zamudio Fuentes, Luis Miguel; Montiel Perez, Jesús Yalja; de la O Torres, Saul; Ramírez Acosta, Alejandro Alvaro

    2015-09-01

    Video annotation is important for web indexing and browsing systems. Indeed, in order to evaluate the performance of video query and mining techniques, databases with concept annotations are required. It is therefore necessary to generate a database with a semantic index that represents the digital content of the Mexican bullfighting atmosphere. This paper proposes a scheme for making complex annotations in a video within the framework of a multimedia search engine project. Each video is partitioned using our segmentation algorithm, which creates shots of different lengths and different numbers of frames. To make complex annotations about the video, we use the ELAN software. The annotations are done in two steps: first, we annotate the whole content of each shot; second, we describe the actions in terms of camera parameters such as direction, position and depth. As a consequence, we obtain a more complete descriptor of every action. In both cases we use the concepts of the TRECVid 2014 dataset, and we also propose new concepts. This methodology allows us to generate a database with the necessary information to create descriptors and algorithms capable of detecting actions, in order to automatically index and classify new bullfighting multimedia content.

  17. NOAA's Integrated Tsunami Database: Data for improved forecasts, warnings, research, and risk assessments

    Science.gov (United States)

    Stroker, Kelly; Dunbar, Paula; Mungov, George; Sweeney, Aaron; McCullough, Heather; Carignan, Kelly

    2015-04-01

    The National Oceanic and Atmospheric Administration (NOAA) has primary responsibility in the United States for tsunami forecasts, warnings, and research, and supports community resiliency. NOAA's National Geophysical Data Center (NGDC) and the co-located World Data Service for Geophysics provide a unique collection of data enabling communities to ensure preparedness and resilience to tsunami hazards. Immediately following a damaging or fatal tsunami event there is a need for authoritative data and information. The NGDC Global Historical Tsunami Database (http://www.ngdc.noaa.gov/hazard/) includes all tsunami events, regardless of intensity, as well as earthquakes and volcanic eruptions that caused fatalities, moderate damage, or generated a tsunami. The long-term data from these events, including photographs of damage, provide clues to what might happen in the future. NGDC catalogs the information on global historical tsunamis and uses these data to produce qualitative tsunami hazard assessments at regional levels. In addition to the socioeconomic effects of a tsunami, NGDC also obtains water level data from the coasts and the deep ocean at stations operated by the NOAA/NOS Center for Operational Oceanographic Products and Services, the NOAA Tsunami Warning Centers, and the National Data Buoy Center (NDBC), and produces research-quality data to isolate seismic waves (in the case of the deep-ocean sites) and the tsunami signal. These water-level data provide evidence of sea-level fluctuation and possible inundation events. NGDC is also building high-resolution digital elevation models (DEMs) to support real-time forecasts, implemented at 75 US coastal communities. After a damaging or fatal event, NGDC begins to collect and integrate data and information from many organizations into the hazards databases. Sources of data include our NOAA partners, the U.S. Geological Survey, the UNESCO Intergovernmental Oceanographic Commission (IOC) and the International Tsunami Information Center

  18. IMPROVING EMISSIONS ESTIMATES WITH COMPUTATIONAL INTELLIGENCE, DATABASE EXPANSION, AND COMPREHENSIVE VALIDATION

    Science.gov (United States)

    The report discusses an EPA investigation of techniques to improve methods for estimating volatile organic compound (VOC) emissions from area sources. Using the automobile refinishing industry for a detailed area source case study, an emission estimation method is being developed...

  19. Improving the Discoverability and Availability of Sample Data and Imagery in NASA's Astromaterials Curation Digital Repository Using a New Common Architecture for Sample Databases

    Science.gov (United States)

    Todd, N. S.; Evans, C.

    2015-06-01

    NASA’s Astromaterials Curation Office is leading an initiative to create a common framework for its databases to improve discoverability and access to data and imagery from NASA-curated extraterrestrial samples for planetary science researchers.

  20. Effects of Annotations and Homework on Learning Achievement: An Empirical Study of Scratch Programming Pedagogy

    Science.gov (United States)

    Su, Addison Y. S.; Huang, Chester S. J.; Yang, Stephen J. H.; Ding, T. J.; Hsieh, Y. Z.

    2015-01-01

    In Taiwan elementary schools, Scratch programming has been taught for more than four years. Previous studies have shown that personal annotation is a useful learning method that improves learning performance. An annotation-based Scratch programming (ASP) system provides for the creation, sharing, and review of annotations and homework solutions in…

  1. Similarity landscapes: An improved method for scientific visualization of information from protein and DNA database searches

    Energy Technology Data Exchange (ETDEWEB)

    Dogget, N.; Myers, G. [Los Alamos National Lab., NM (United States); Wills, C.J. [Univ. of California, San Diego, CA (United States)

    1998-12-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The authors have used computer simulations and examination of a variety of databases to address a wide range of evolutionary questions. The authors have found that there is a clear distinction in the evolution of HIV-1 and HIV-2, with the former and more virulent virus evolving more rapidly at a functional level. The authors have discovered highly non-random patterns in the evolution of HIV-1 that can be attributed to a variety of selective pressures. In the course of examination of microsatellite DNA (short repeat regions) in microorganisms, the authors have found clear differences between prokaryotes and eukaryotes in their distribution, differences that can be tied to different selective pressures. They have developed a new method (topiary pruning) for enhancing the phylogenetic information contained in DNA sequences. Most recently, the authors have discovered effects in complex rainforest ecosystems that indicate strong frequency-dependent interactions between host species and their parasites, leading to the maintenance of ecosystem variability.

  2. Using database technology to improve STEM student retention: A total quality management approach to early alert and intervention

    Directory of Open Access Journals (Sweden)

    Sam Khoury

    2012-04-01

    Full Text Available Students at risk of dropping out of Science, Technology, Engineering, and Mathematics (STEM) programs often display signs that indicate they are at risk. A need exists to identify at-risk STEM students early and to develop and implement effective intervention strategies that utilize the Total Quality Management (TQM) approach. Above all, a database system is needed to track this early intervention process if retention rates are to be improved. To address this need at a small community college in North Carolina, a system was developed and underwent a pilot study in Fall 2009 and Spring 2010. The two pilot groups were compared to the two control groups to identify differences in retention, course credit completion rates, and grade point averages (GPA). The first pilot group displayed no significant differences, while the second pilot group displayed significant differences in most of the areas analyzed in the study, indicating that a database system can be used to improve STEM student retention. While the second of the two pilot groups displayed promising results, managerial and logistical issues that impeded success, such as less-than-optimal instructor involvement, were identified. This paper describes the design, implementation, and preliminary results of this study and outlines the need for further research to confirm these preliminary findings.
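    An early-alert step of the kind described can be sketched as a threshold query over a student database, flagging students for follow-up intervention. The schema, names, and cut-offs below are invented for illustration and are not the college's actual system.

```python
import sqlite3

# Sketch of an early-alert query: flag students whose attendance or grade
# falls below a threshold. Schema, names, and cut-offs are assumptions.

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE students
              (id INTEGER, name TEXT, attendance REAL, grade REAL)""")
db.executemany("INSERT INTO students VALUES (?, ?, ?, ?)", [
    (1, "Ada",   0.95, 88.0),
    (2, "Grace", 0.60, 71.0),   # low attendance -> at risk
    (3, "Alan",  0.90, 58.0),   # low grade -> at risk
])

AT_RISK = """SELECT name FROM students
             WHERE attendance < 0.75 OR grade < 65 ORDER BY id"""
flagged = [row[0] for row in db.execute(AT_RISK)]
print(flagged)  # ['Grace', 'Alan']
```

    In a TQM-style process, each flag would also trigger and log an intervention record, closing the feedback loop the paper describes.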

  3. CHIANTI—AN ATOMIC DATABASE FOR EMISSION LINES. XIII. SOFT X-RAY IMPROVEMENTS AND OTHER CHANGES

    International Nuclear Information System (INIS)

    The CHIANTI spectral code consists of two parts: an atomic database and a suite of computer programs in Python and IDL. Together, they allow the calculation of the optically thin spectrum of astrophysical objects and provide spectroscopic plasma diagnostics for the analysis of astrophysical spectra. The database includes atomic energy levels, wavelengths, radiative transition probabilities, collision excitation rate coefficients, ionization, and recombination rate coefficients, as well as data to calculate free-free, free-bound, and two-photon continuum emission. Version 7.1 has been released, which includes improved data for several ions, recombination rates, and element abundances. In particular, it provides a large expansion of the CHIANTI models for key Fe ions from Fe VIII to Fe XIV to improve the predicted emission in the 50-170 Å wavelength range. All data and programs are freely available at http://www.chiantidatabase.org and in SolarSoft, while the Python interface to CHIANTI can be found at http://chiantipy.sourceforge.net.

  4. Analysis and Improvement of the Multilevel Secure Database BLP Model

    Institute of Scientific and Technical Information of China (English)

    赵海燕; 赵静

    2013-01-01

    In view of the defects of the current multilevel secure database BLP model, improved methods for its security classification, scope and access rules are proposed, in order to improve the security of the database.

  5. A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases

    Directory of Open Access Journals (Sweden)

    Karp Peter D

    2004-06-01

    probability threshold of 0.9 during cross-validation using known reactions in computationally-predicted pathway databases. After applying our method to 513 pathway holes in 333 pathways from three Pathway/Genome databases, we increased the number of complete pathways by 42%. We made putative assignments to 46% of the holes, including annotation of 17 sequences of previously unknown function. Conclusions Our pathway hole filler can be used not only to increase the utility of Pathway/Genome databases to both experimental and computational researchers, but also to improve predictions of protein function.
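    The Bayesian step behind such hole filling can be sketched as a two-hypothesis posterior: combine a prior that a candidate gene catalyses the missing reaction with homology evidence for that candidate. All probabilities below are invented for illustration; the published model uses richer, data-derived likelihoods.

```python
# Sketch of a Bayesian pathway-hole assignment: P(function | evidence)
# from a prior and the likelihood of the homology evidence under each
# hypothesis. All numbers are invented for illustration.

def posterior(prior, p_evidence_given_fn, p_evidence_given_not_fn):
    """P(candidate has the missing function | homology evidence)."""
    p_joint = p_evidence_given_fn * prior
    p_other = p_evidence_given_not_fn * (1.0 - prior)
    return p_joint / (p_joint + p_other)

# Strong homology evidence: likely under the true function, rare otherwise.
p = posterior(prior=0.05, p_evidence_given_fn=0.8, p_evidence_given_not_fn=0.01)
print(round(p, 3))  # 0.808
# Only assignments clearing a threshold (the paper reports using 0.9)
# would be accepted as putative hole fillers.
```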

  6. GANESH: Software for Customized Annotation of Genome Regions

    OpenAIRE

    Huntley, Derek; Hummerich, Holger; Smedley, Damian; Kittivoravitkul, Sasivimol; McCarthy, Mark; Little, Peter; Sergot, Marek

    2003-01-01

    GANESH is a software package designed to support the genetic analysis of regions of human and other genomes. It provides a set of components that may be assembled to construct a self-updating database of DNA sequence, mapping data, and annotations of possible genome features. Once one or more remote sources of data for the target region have been identified, all sequences for that region are downloaded, assimilated, and subjected to a (configurable) set of standard database-searching an...

  7. EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome

    OpenAIRE

    Hamilton John P; Campbell Matthew; Thibaud-Nissen Françoise; Zhu Wei; Buell C

    2007-01-01

    Abstract Background Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging k...

  8. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs.

    Science.gov (United States)

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs through proper allocation of block and thread numbers.
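    For reference, this is the recurrence the GPU mapping parallelizes: a plain CPU Smith-Waterman with a linear gap penalty. The scoring parameters (match 2, mismatch -1, gap -1) are an illustrative choice, not those of CUDASW++.

```python
# Reference CPU Smith-Waterman (local alignment, linear gap penalty).
# Returns only the optimal score; a full implementation would also
# traceback to recover the alignment itself.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                     # local alignment floor
                          H[i - 1][j - 1] + s,   # align / substitute
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACACACTA", "AGCACACA"))  # 12
```

    Each anti-diagonal of H can be computed in parallel, which is what GPU implementations exploit; shared memory holds the query profile and neighbouring cells.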

  9. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs

    Directory of Open Access Journals (Sweden)

    Liang-Tsung Huang

    2015-01-01

    Full Text Available Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs through proper allocation of block and thread numbers.

  10. neXtA5: accelerating annotation of articles via automated approaches in neXtProt.

    Science.gov (United States)

    Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick

    2016-01-01

    The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. Our system, neXtA5, is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE over some predefined axes. This report focuses on three axes: Diseases, and the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named entities as stored in the BioMed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is the following: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0, or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein-protein interactions, which require specific relationship extraction capabilities. In parallel, user-friendly interfaces powered with a set of JSON web services are currently being
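
    The linear combination that merges the two ranked lists can be sketched as follows. This is a hedged illustration: the function name and the coefficient `alpha` are hypothetical stand-ins for the per-axis tuned coefficients the authors report.

```python
# Merge a search-engine ranking and an entity-density ranking by a
# weighted sum of their scores, then sort PMIDs by the combined score.
def merge_rankings(engine_scores, density_scores, alpha=0.7):
    """engine_scores/density_scores: {pmid: score}. Returns ranked PMIDs."""
    combined = {
        pmid: alpha * engine_scores.get(pmid, 0.0)
              + (1 - alpha) * density_scores.get(pmid, 0.0)
        for pmid in set(engine_scores) | set(density_scores)
    }
    return sorted(combined, key=combined.get, reverse=True)
```

    In practice `alpha` would be tuned separately for each curation axis, which is what the reported fine-tuning of coefficients amounts to.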

  12. On Semantic Annotation in Clarin-PL Parallel Corpora

    Directory of Open Access Journals (Sweden)

    Violetta Koseska-Toszewa

    2015-12-01

    Full Text Available On Semantic Annotation in Clarin-PL Parallel Corpora In the article, the authors present a proposal for semantic annotation in the Clarin-PL parallel corpora: Polish-Bulgarian-Russian and Polish-Lithuanian. Semantic annotation of quantification is a novum in developing sentence-level semantics in multilingual parallel corpora, which is why the semantic annotation is manual. The authors hope it will be of interest to IT specialists working on automatic processing of the given natural languages. Semantic annotation, defined as it is here, will make contrastive studies of natural languages more efficient, which in turn will help verify the results of those studies, and will certainly improve human and machine translations.

  13. GIFtS: annotation landscape analysis with GeneCards

    Directory of Open Access Journals (Sweden)

    Dalah Irina

    2009-10-01

    Full Text Available Abstract Background Gene annotation is a pivotal component in computational genomics, encompassing prediction of gene function, expression analysis, and sequence scrutiny. Hence, quantitative measures of the annotation landscape constitute a pertinent bioinformatics tool. GeneCards® is a gene-centric compendium of rich annotative information for over 50,000 human gene entries, building upon 68 data sources, including Gene Ontology (GO), pathways, interactions, phenotypes, publications and many more. Results We present the GeneCards Inferred Functionality Score (GIFtS), which allows a quantitative assessment of a gene's annotation status, by exploiting the unique wealth and diversity of GeneCards information. The GIFtS tool, linked from the GeneCards home page, facilitates browsing the human genome by searching for the annotation level of a specified gene, retrieving a list of genes within a specified range of GIFtS values, obtaining random genes with a specific GIFtS value, and experimenting with the GIFtS weighting algorithm for a variety of annotation categories. The bimodal shape of the GIFtS distribution suggests a division of the human gene repertoire into two main groups: the high-GIFtS peak consists almost entirely of protein-coding genes; the low-GIFtS peak consists of genes from all of the categories. Cluster analysis of GIFtS annotation vectors provides the classification of gene groups by detailed positioning in the annotation arena. GIFtS also provides measures which enable the evaluation of the databases that serve as GeneCards sources. An inverse correlation is found (for GIFtS>25) between the number of genes annotated by each source, and the average GIFtS value of genes associated with that source. Three typical source prototypes are revealed by their GIFtS distribution: genome-wide sources, sources comprising mainly highly annotated genes, and sources comprising mainly poorly annotated genes. The degree of accumulated knowledge for a
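
    The idea of a weighted, category-based annotation score can be illustrated with a toy function. The category names and weights below are hypothetical; the actual GIFtS algorithm, with its configurable weighting, lives on the GeneCards site.

```python
# A GIFtS-style score: the weighted fraction of annotation categories
# present for a gene, scaled to 0-100.
def gifts_score(annotations, weights):
    """annotations: {category: bool}; weights: {category: float}."""
    total = sum(weights.values())
    present = sum(w for cat, w in weights.items() if annotations.get(cat, False))
    return round(100.0 * present / total, 1)
```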

  14. A Factor Graph Approach to Automated GO Annotation.

    Science.gov (United States)

    Spetale, Flavio E; Tapia, Elizabeth; Krsticevic, Flavia; Roda, Fernando; Bulacio, Pilar

    2016-01-01

    As the volume of genomic data grows, computational methods become essential for providing a first glimpse of gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with a main focus on annotation precision, and heuristic alternatives, with a main focus on scalability, have been described in the literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message-passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of the most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum. PMID:26771463
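
    A much simpler heuristic conveys the consistency constraint that the factor graph formalizes: under GO's true-path rule, an annotation to a term implies annotation to all its ancestors, so a parent's score should never fall below a child's. The sketch below (terms and scores invented) propagates raw scores up a toy is-a tree; it is not the authors' message-passing algorithm.

```python
def depth(term, parents):
    """Number of is-a links from term to the root of the toy tree."""
    d = 0
    while term in parents:
        term, d = parents[term], d + 1
    return d

def propagate_up(scores, parents):
    """scores: {term: raw score}; parents: {child: parent} (a tree).
    Returns scores adjusted so every parent >= max of its children."""
    fixed = dict(scores)
    # visit the deepest terms first, so support reaches the root in one pass
    for child in sorted(parents, key=lambda t: -depth(t, parents)):
        p = parents[child]
        fixed[p] = max(fixed.get(p, 0.0), fixed[child])
    return fixed
```

    The factor-graph formulation goes further by also modeling the noise in each raw prediction, rather than trusting the maximum outright.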

  15. Predicting word sense annotation agreement

    DEFF Research Database (Denmark)

    Martinez Alonso, Hector; Johannsen, Anders Trærup; Lopez de Lacalle, Oier;

    2015-01-01

    High agreement is a common objective when annotating data for word senses. However, a number of factors make perfect agreement impossible, e.g. the limitations of the sense inventories, the difficulty of the examples or the interpretation preferences of the annotators. Estimating potential...... agreement is thus a relevant task to supplement the evaluation of sense annotations. In this article we propose two methods to predict agreement on word-annotation instances. We experiment with a continuous representation and a three-way discretization of observed agreement. In spite of the difficulty...

  16. MEPD: medaka expression pattern database, genes and more.

    Science.gov (United States)

    Alonso-Barba, Juan I; Rahman, Raza-Ur; Wittbrodt, Joachim; Mateo, Juan L

    2016-01-01

    The Medaka Expression Pattern Database (MEPD; http://mepd.cos.uni-heidelberg.de/) is designed as a repository of medaka expression data for the scientific community. In this update we present two main improvements. First, we have changed the previous clone-centric view for in situ data to a gene-centric view. This is possible because now we have linked all the data present in MEPD to the medaka gene annotation in ENSEMBL. In addition, we have also connected the medaka genes in MEPD to their corresponding orthologous genes in zebrafish, again using the ENSEMBL database. Based on this, we provide a link to the Zebrafish Model Organism Database (ZFIN) to allow researchers to compare expression data between these two fish model organisms. As a second major improvement, we have modified the design of the database to enable it to host expression patterns of regulatory elements, promoters or enhancers, in addition to gene expression. The combination of gene expression, by traditional in situ, and regulatory element expression, typically by a fluorescent reporter gene, within the same platform assures consistency in terms of annotation. In our opinion, this will allow researchers to uncover new insights between the expression domains of genes and their regulatory landscape. PMID:26450962

  17. The IMPROVE_A temperature protocol for thermal/optical carbon analysis: maintaining consistency with a long-term database.

    Science.gov (United States)

    Chow, Judith C; Watson, John G; Chen, L W Antony; Chang, M C Oliver; Robinson, Norman F; Trimble, Dana; Kohl, Steven

    2007-09-01

    Thermally derived carbon fractions including organic carbon (OC) and elemental carbon (EC) have been reported for the U.S. Interagency Monitoring of PROtected Visual Environments (IMPROVE) network since 1987 and have been found useful in source apportionment studies and to evaluate quartz-fiber filter adsorption of organic vapors. The IMPROVE_A temperature protocol defines temperature plateaus for thermally derived carbon fractions of 140 degrees C for OC1, 280 degrees C for OC2, 480 degrees C for OC3, and 580 degrees C for OC4 in a helium (He) carrier gas and 580 degrees C for EC1, 740 degrees C for EC2, and 840 degrees C for EC3 in a 98% He/2% oxygen (O2) carrier gas. These temperatures differ from those used previously because new hardware used for the IMPROVE thermal/optical reflectance (IMPROVE_TOR) protocol better represents the sample temperature than did the old hardware. A newly developed temperature calibration method demonstrates that these temperatures better represent sample temperatures in the older units used to quantify IMPROVE carbon fractions from 1987 through 2004. Only the thermal fractions are affected by changes in temperature. The OC and EC by TOR are insensitive to the change in temperature protocol, and therefore the long-term consistency of the IMPROVE database is conserved. A method to detect small quantities of O2 in the pure He carrier gas shows that O2 levels above 100 ppmv also affect the comparability of thermal carbon fractions but have little effect on the IMPROVE_TOR split between OC and EC. PMID:17912920
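
    For reference, the IMPROVE_A plateaus quoted above collect into a small lookup table (the temperature and carrier-gas values are taken directly from the abstract; the dictionary layout is ours):

```python
# IMPROVE_A thermal/optical carbon fractions: (plateau temperature in
# degrees C, carrier gas), as defined in the protocol described above.
IMPROVE_A = {
    "OC1": (140, "He"),
    "OC2": (280, "He"),
    "OC3": (480, "He"),
    "OC4": (580, "He"),
    "EC1": (580, "98% He / 2% O2"),
    "EC2": (740, "98% He / 2% O2"),
    "EC3": (840, "98% He / 2% O2"),
}
```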

  18. The use of database linkage technique for the improvement of information on mother-infant deaths in Ceara, 2006

    Directory of Open Access Journals (Sweden)

    Augediva Maria Jucá Pordeus

    2009-09-01

    Full Text Available Objective: To assess the use of the database linkage technique in improving information on mother-infant deaths by recovering unregistered and/or ignored variables from the deaths of children under one year old in the city of Fortaleza in 2006. Methods: The linkage of the SIM (Mortality Information System) and SINASC (Live Births Information System) databases was done by selecting variables common to the two systems. Using the Reclink III software, the perfect pairs “Death certificate/Birth certificate” (DO/DN) were identified by means of the DO variables: sex, race/color, birth weight, mother’s age, gestational age, type of pregnancy, type of birth, mother’s occupation and mother’s schooling. Results: 40,391 live births and 706 deaths of children under one year old were registered. 516 (73.1%) DO were identified with their respective DN. The completeness of the variables occupation and mother’s schooling increased from 31.4% and 35.8% to 64.6% and 72.8%, respectively. Regarding the mother’s age, the increase in information was 45.2%. Conclusion: The use of the Reclink III software in the routine of health services enabled a large recovery of information that had not been filled in on the Death certificate (DO), and therefore may promote a better understanding of infant mortality in the studied populations.
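
    The core of such a linkage can be sketched as a comparison of shared variables between a death record (DO) and candidate birth records (DN). The field names and the agreement threshold below are illustrative assumptions; Reclink III itself performs probabilistic matching with per-field weights.

```python
# Deterministic sketch of DO/DN record linkage: pair a death record with
# the birth record that agrees on the most shared fields, if enough agree.
FIELDS = ["sex", "birth_weight", "mother_age", "birth_type"]

def link(death_record, birth_records, min_agree=3):
    best, best_n = None, 0
    for birth in birth_records:
        n = sum(death_record.get(f) == birth.get(f) for f in FIELDS)
        if n > best_n:
            best, best_n = birth, n
    return best if best_n >= min_agree else None
```

    Once a pair is accepted, fields that were blank or ignored on the DO can be filled from the matched DN, which is how the study recovered the missing variables.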

  19. An improved tropospheric ozone database retrieved from SCIAMACHY Limb-Nadir-Matching method

    Science.gov (United States)

    Jia, Jia; Rozanov, Alexei; Ladstätter-Weißenmayer, Annette; Ebojie, Felix; Rahpoe, Nabiz; Bötel, Stefan; Burrows, John

    2015-04-01

    Tropospheric ozone is one of the most important greenhouse gases and the main component of photochemical smog. It is either transported from the stratosphere or photochemically produced during pollution events in the troposphere, where it threatens the respiratory system. To investigate the sources and transport mechanisms of tropospheric ozone on a global scale, the limb-nadir matching (LNM) technique is applied to SCIAMACHY measurements to retrieve tropospheric ozone. Because about 90% of ozone is located in the stratosphere and only about 10% in the troposphere, this use of satellite data requires high-quality nadir and limb data. In this study we show an improvement of the SCIAMACHY limb data as well as its influence on the tropospheric ozone results. The limb-nadir matching technique is also refined to increase the quality of the tropospheric ozone product. The results are validated against ozonesonde measurements.

  20. EST-PAC a web package for EST annotation and protein sequence prediction

    Directory of Open Access Journals (Sweden)

    Strahm Yvan

    2006-10-01

    Full Text Available Abstract With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC, a web-oriented, multi-platform software package for expressed sequence tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: (1) searching local or remote biological databases for sequence similarities using Blast services, (2) predicting protein coding sequences from EST data and (3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results and management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics.

  1. MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

    Science.gov (United States)

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579

  3. The UCSC Genome Browser Database: 2008 update

    DEFF Research Database (Denmark)

    Karolchik, D; Kuhn, R M; Baertsch, R;

    2007-01-01

    The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrate...... and 21 invertebrate species as of September 2007. For each assembly, the GBD contains a collection of annotation data aligned to the genomic sequence. Highlights of this year's additions include a 28-species human-based vertebrate conservation annotation, an enhanced UCSC Genes set, and more human...... variation, MGC, and ENCODE data. The database is optimized for fast interactive performance with a set of web-based tools that may be used to view, manipulate, filter and download the annotation data. New toolset features include the Genome Graphs tool for displaying genome-wide data sets, session saving...

  4. Aldo-keto reductase (AKR) superfamily: genomics and annotation.

    Science.gov (United States)

    Mindnich, Rebekka D; Penning, Trevor M

    2009-07-01

    Aldo-keto reductases (AKRs) are phase I metabolising enzymes that catalyse the reduced nicotinamide adenine dinucleotide (phosphate) (NAD(P)H)-dependent reduction of carbonyl groups to yield primary and secondary alcohols on a wide range of substrates, including aliphatic and aromatic aldehydes and ketones, ketoprostaglandins, ketosteroids and xenobiotics. In so doing they functionalise the carbonyl group for conjugation (phase II enzyme reactions). Although functionally diverse, AKRs form a protein superfamily based on their high sequence identity and common protein fold, the (α/β)8-barrel structure. Well over 150 AKR enzymes, from diverse organisms, have been annotated so far and given systematic names according to a nomenclature that is based on multiple protein sequence alignment and degree of identity. Annotation of non-vertebrate AKRs at the National Center for Biotechnology Information or Vertebrate Genome Annotation (vega) database does not often include the systematic nomenclature name, so the most comprehensive overview of all annotated AKRs is found on the AKR website (http://www.med.upenn.edu/akr/). This site also hosts links to more detailed and specialised information (e.g. on crystal structures, gene expression and single nucleotide polymorphisms [SNPs]). The protein-based AKR nomenclature allows unambiguous identification of a given enzyme but does not reflect the wealth of genomic and transcriptomic variation that exists in the various databases. In this context, identification of putative new AKRs and their distinction from pseudogenes are challenging. This review provides a short summary of the characteristic features of AKR biochemistry and structure that have been reviewed in great detail elsewhere, and focuses mainly on nomenclature and database entries of human AKRs that so far have not been subject to systematic annotation. Recent developments in the annotation of SNP and transcript variance in AKRs are also summarised. PMID:19706366

  5. Aldo-keto reductase (AKR superfamily: Genomics and annotation

    Directory of Open Access Journals (Sweden)

    Mindnich Rebekka D

    2009-07-01

    Full Text Available Abstract Aldo-keto reductases (AKRs) are phase I metabolising enzymes that catalyse the reduced nicotinamide adenine dinucleotide (phosphate) (NAD(P)H)-dependent reduction of carbonyl groups to yield primary and secondary alcohols on a wide range of substrates, including aliphatic and aromatic aldehydes and ketones, ketoprostaglandins, ketosteroids and xenobiotics. In so doing they functionalise the carbonyl group for conjugation (phase II enzyme reactions). Although functionally diverse, AKRs form a protein superfamily based on their high sequence identity and common protein fold, the (α/β)8-barrel structure. Well over 150 AKR enzymes, from diverse organisms, have been annotated so far and given systematic names according to a nomenclature that is based on multiple protein sequence alignment and degree of identity. Annotation of non-vertebrate AKRs at the National Center for Biotechnology Information or Vertebrate Genome Annotation (vega) database does not often include the systematic nomenclature name, so the most comprehensive overview of all annotated AKRs is found on the AKR website (http://www.med.upenn.edu/akr/). This site also hosts links to more detailed and specialised information (e.g. on crystal structures, gene expression and single nucleotide polymorphisms [SNPs]). The protein-based AKR nomenclature allows unambiguous identification of a given enzyme but does not reflect the wealth of genomic and transcriptomic variation that exists in the various databases. In this context, identification of putative new AKRs and their distinction from pseudogenes are challenging. This review provides a short summary of the characteristic features of AKR biochemistry and structure that have been reviewed in great detail elsewhere, and focuses mainly on nomenclature and database entries of human AKRs that so far have not been subject to systematic annotation. Recent developments in the annotation of SNP and transcript variance in AKRs are also summarised.

  6. Fire-induced water-repellent soils, an annotated bibliography

    Science.gov (United States)

    Kalendovsky, M.A.; Cannon, S.H.

    1997-01-01

    The development and nature of water-repellent, or hydrophobic, soils are important issues in evaluating hillslope response to fire. The following annotated bibliography was compiled to consolidate existing published research on the topic. Emphasis was placed on the types, causes, effects and measurement techniques of water repellency, particularly with respect to wildfires and prescribed burns. Each annotation includes a general summary of the respective publication, as well as highlights of interest to this focus. Although some references on the development of water repellency without fires, the chemistry of hydrophobic substances, and the remediation of water-repellent conditions are included, coverage of these topics is not intended to be comprehensive. To develop this bibliography, the GeoRef, Agricola, and Water Resources Abstracts databases were searched for appropriate references, and the bibliographies of each reference were then reviewed for additional entries. Additional references will be added to this bibliography as they become available. The annotated bibliography can be accessed on the Web at http://geohazards.cr.usgs.gov/html_files/landslides/ofr97-720/biblio.html. A database consisting of the references and keywords is available through a link at the above address. The database was compiled using the EndNote2 Plus software from Niles and Associates, which is required to search it.

  7. The volatile compound BinBase mass spectral database

    Directory of Open Access Journals (Sweden)

    Barupal Dinesh K

    2011-08-01

    Full Text Available Abstract Background Volatile compounds comprise diverse chemical groups with wide-ranging sources and functions. These compounds originate from major pathways of secondary metabolism in many organisms and play essential roles in chemical ecology in both the plant and animal kingdoms. In past decades, sampling methods and instrumentation for the analysis of complex volatile mixtures have improved; however, design and implementation of database tools to process and store the complex datasets have lagged behind. Description The volatile compound BinBase (vocBinBase) is an automated peak annotation and database system developed for the analysis of GC-TOF-MS data derived from complex volatile mixtures. The vocBinBase DB is an extension of the previously reported metabolite BinBase software developed to track and identify derivatized metabolites. The BinBase algorithm uses deconvoluted spectra and peak metadata (retention index, unique ion, spectral similarity, peak signal-to-noise ratio, and peak purity) from the Leco ChromaTOF software, and annotates peaks using a multi-tiered filtering system with stringent thresholds. The vocBinBase algorithm assigns the identity of compounds existing in the database. Volatile compound assignments are supported by the Adams mass spectral-retention index library, which contains over 2,000 plant-derived volatile compounds. Novel molecules that are not found within vocBinBase are automatically added using strict mass spectral and experimental criteria. Users obtain fully annotated data sheets with quantitative information for all volatile compounds for studies that may consist of thousands of chromatograms. The vocBinBase database may also be queried across different studies, currently comprising 1,537 unique mass spectra generated from 1.7 million deconvoluted mass spectra of 3,435 samples (18 species). Mass spectra with retention indices and volatile profiles are available as a free download under the CC-BY agreement (http
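
    The multi-tiered filtering idea can be illustrated as follows: a deconvoluted peak is assigned to a library Bin only when both retention index and spectral similarity pass strict thresholds. The field names and threshold values are invented for illustration; vocBinBase's actual criteria are stricter and more numerous (unique ion, signal-to-noise, purity).

```python
# Assign a peak to the first library Bin whose retention index is within
# a window and whose spectral similarity passes a threshold.
def annotate_peak(peak, bins, similarity, ri_window=5.0, min_sim=0.8):
    """peak: {"ri": float, ...}; bins: [{"name": str, "ri": float}, ...];
    similarity: callable scoring peak vs. bin spectra in [0, 1]."""
    for b in bins:
        if abs(peak["ri"] - b["ri"]) <= ri_window and similarity(peak, b) >= min_sim:
            return b["name"]
    return None  # unmatched: a candidate new Bin, pending strict criteria
```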

  8. Reliability Of A Surgeon-Reported Morbidity And Mortality Database: A Comparison Of Short-Term Morbidity Between The Scoliosis Research Society And National Surgical Quality Improvement Program Databases

    Science.gov (United States)

    Martin, Christopher T.; Pugely, Andrew J.; Gao, Yubo; Skovrlj, Branko; Lee, Nathan J.; Cho, Samuel K.; Mendoza-Lattes, Sergio

    2016-01-01

    Background There exists a lack of comparison between large national healthcare databases reporting surgical morbidity and mortality. Prior authors have expressed concern that the Scoliosis Research Society (SRS) membership may have underreported complications in spinal surgery. Thus, the purpose of the present study was to compare the incidence of morbidity between the SRS and National Surgical Quality Improvement Program (NSQIP) databases. Methods We reviewed patients enrolled between 2012 and 2013, with a total of 96,875 patients identified in the SRS dataset and 15,909 in the combined adult and pediatric NSQIP dataset. Patients were matched based on diagnostic category, and a univariate analysis was used to compare reported complication rates in the categories of perioperative infection, neurologic injury, and mortality. The SRS database only requires detailed demographic data reporting on patients that have had a complication event. We compared the demographics and comorbidities of this subgroup, and used this as a surrogate to assess the potential magnitude of confounders. Results Small differences existed between the SRS and NSQIP databases in terms of mortality (0.1% v. 0.2%), infection (1.2% v. 2%), and neurologic injury (0.8% v. 0.1%) (p<0.001 for each comparison). Infection rates were consistently lower across multiple diagnostic sub-categories in the SRS database, whereas neurologic injury rates were consistently lower in the NSQIP database. These differences reached statistical significance across several diagnostic subcategories, but the clinical magnitude of the differences was small. Amongst the patients with a complication, modest differences in comorbidities existed between the two cohorts. Conclusion Overall, the incidence of short-term morbidity and mortality was similar between the two databases. There were modest differences in comorbidities, which may explain the small differences observed in morbidity. Concerns regarding possible under
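
    The univariate comparison of complication rates between two cohorts is typically a two-proportion test; a minimal sketch (with invented counts, not the study's data) is:

```python
import math

# Two-proportion z-test: compare event rates x1/n1 and x2/n2 using the
# pooled proportion for the standard error.
def two_proportion_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se                          # z statistic
```

    A |z| above roughly 1.96 corresponds to p < 0.05; with cohorts as large as those above, even clinically small rate differences reach p < 0.001.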

  9. The Mouse Genome Database: integration of and access to knowledge about the laboratory mouse.

    Science.gov (United States)

    Blake, Judith A; Bult, Carol J; Eppig, Janan T; Kadin, James A; Richardson, Joel E

    2014-01-01

    The Mouse Genome Database (MGD) (http://www.informatics.jax.org) is the community model organism database resource for the laboratory mouse, a premier animal model for the study of genetic and genomic systems relevant to human biology and disease. MGD maintains a comprehensive catalog of genes, functional RNAs and other genome features as well as heritable phenotypes and quantitative trait loci. The genome feature catalog is generated by the integration of computational and manual genome annotations generated by NCBI, Ensembl and Vega/HAVANA. MGD curates and maintains the comprehensive listing of functional annotations for mouse genes using the Gene Ontology, and MGD curates and integrates comprehensive phenotype annotations including associations of mouse models with human diseases. Recent improvements include integration of the latest mouse genome build (GRCm38), improved access to comparative and functional annotations for mouse genes with expanded representation of comparative vertebrate genomes and new loads of phenotype data from high-throughput phenotyping projects. All MGD resources are freely available to the research community.

  10. Surgery Risk Assessment (SRA) Database

    Data.gov (United States)

    Department of Veterans Affairs — The Surgery Risk Assessment (SRA) database is part of the VA Surgical Quality Improvement Program (VASQIP). This database contains assessments of selected surgical...

  11. Genome Wide Re-Annotation of Caldicellulosiruptor saccharolyticus with New Insights into Genes Involved in Biomass Degradation and Hydrogen Production.

    Directory of Open Access Journals (Sweden)

    Nupoor Chowdhary

    Full Text Available Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2) production, but it still has major drawbacks, such as sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole-genome re-annotation was carried out to update the incomplete genome information that causes gaps in knowledge, especially in the area of metabolic engineering, and to improve the H2-producing capabilities of C. saccharolyticus. Whole-genome re-annotation was performed through manual means for 2,682 Coding Sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology could successfully add functions for 409 hypothetical proteins (HPs) and 46 proteins previously annotated as putative, and assigned more accurate functions to the known protein sequences. Homology-based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology-based methods, such as genomic context approaches, have been developed for protein function prediction. Using non-homology-based functional prediction methods, we were able to assign cellular processes or physical complexes to 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs, generated from the MicroScope Platform, to the original genome, with functional predictions for 49 of them. The re-annotation of HPs and new CDSs is stored in a relational database that is available on the MicroScope web-based platform. In parallel, comparative genome analyses were performed among members of the genus Caldicellulosiruptor to understand the function and evolutionary processes. Further, with results from integrated re-annotation studies (homology and genomic context approaches), we strongly

  12. BIOFILTER AS A FUNCTIONAL ANNOTATION PIPELINE FOR COMMON AND RARE COPY NUMBER BURDEN.

    Science.gov (United States)

    Kim, Dokyoon; Lucas, Anastasia; Glessner, Joseph; Verma, Shefali S; Bradford, Yuki; Li, Ruowang; Frase, Alex T; Hakonarson, Hakon; Peissig, Peggy; Brilliant, Murray; Ritchie, Marylyn D

    2016-01-01

    Recent studies on copy number variation (CNV) have suggested that an increasing burden of CNVs is associated with susceptibility or resistance to disease. A large number of genes or genomic loci contribute to complex diseases such as autism. Thus, total genomic copy number burden, as an accumulation of copy number change, is a meaningful measure of genomic instability to identify the association between global genetic effects and phenotypes of interest. However, no systematic annotation pipeline has been developed to interpret biological meaning based on the accumulation of copy number change across the genome associated with a phenotype of interest. In this study, we develop a comprehensive and systematic pipeline for annotating copy number variants into genes/genomic regions and subsequently pathways and other gene groups using Biofilter - a bioinformatics tool that aggregates over a dozen publicly available databases of prior biological knowledge. Next we conduct enrichment tests of biologically defined groupings of CNVs including genes, pathways, Gene Ontology, or protein families. We applied the proposed pipeline to a CNV dataset from the Marshfield Clinic Personalized Medicine Research Project (PMRP) in a quantitative trait phenotype derived from the electronic health record - total cholesterol. We identified several significant pathways such as toll-like receptor signaling pathway and hepatitis C pathway, gene ontologies (GOs) of nucleoside triphosphatase activity (NTPase) and response to virus, and protein families such as cell morphogenesis that are associated with the total cholesterol phenotype based on CNV profiles (permutation p-value pipeline could improve the interpretability of copy number burden analysis where hundreds of loci or genes contribute toward disease susceptibility via biological knowledge groups such as pathways. This CNV annotation pipeline with Biofilter can be used for CNV data from any genotyping or sequencing platform and to
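    The enrichment tests described above ask whether the genes hit by CNVs over-represent a pathway, GO term or protein family. The study reports permutation p-values; a minimal closed-form sketch of the same idea is the one-sided hypergeometric over-representation test (the numbers in the test below are illustrative):

```python
from math import comb

def enrichment_p(N, K, n, k):
    """One-sided hypergeometric over-representation p-value:
    probability of seeing k or more pathway genes among the n genes
    hit by CNVs, given K pathway genes in a background of N genes."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)
```

    A permutation test replaces the closed form with an empirical null obtained by repeatedly shuffling gene labels, which is more robust when genes are not exchangeable (e.g., due to gene length or CNV size biases).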

  13. Annotating Speech Corpus for Prosody Modeling in Indian Language Text to Speech Systems

    Directory of Open Access Journals (Sweden)

    Kiruthiga S

    2012-01-01

    Full Text Available A spoken language system, whether a speech synthesis or a speech recognition system, starts with building a speech corpus. We give a detailed survey of issues and a methodology for selecting the appropriate speech unit when building a speech corpus for Indian language Text to Speech systems. The paper ultimately aims to improve the intelligibility of the synthesized speech in Text to Speech synthesis systems. To begin with, an appropriate text file should be selected for building the speech corpus. Then a corresponding speech file is generated and stored. This speech file is the phonetic representation of the selected text file. The speech file is processed at different levels, viz., paragraphs, sentences, phrases, words, syllables and phones. These are called the speech units of the file. Research has been done taking these units as the basic unit for processing. This paper analyses the research done using phones, diphones, triphones, syllables and polysyllables as the basic unit for speech synthesis. The paper also provides a recommended set of combinations for polysyllables. Concatenative speech synthesis involves the concatenation of these basic units to synthesize intelligible, natural-sounding speech. The speech units are annotated with relevant prosodic information about each unit, manually or automatically, based on an algorithm. The database consisting of the units along with their annotated information is called the annotated speech corpus. A clustering technique is used on the annotated speech corpus to select the appropriate unit for concatenation, based on the lowest total join cost of the speech unit.
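    The final step, picking units by lowest total join cost, can be sketched as a Viterbi-style dynamic program over candidate units. This is a generic illustration of join-cost minimization, not the specific clustering algorithm the paper recommends:

```python
def select_units(candidates, join_cost):
    """candidates[t]: list of candidate units for target position t.
    join_cost(a, b): cost of concatenating unit a followed by unit b.
    Returns the unit sequence with the lowest total join cost
    (a Viterbi-style dynamic-programming search)."""
    # paths maps each unit in the current layer to (cost, path so far)
    paths = {u: (0.0, [u]) for u in candidates[0]}
    for layer in candidates[1:]:
        nxt = {}
        for u in layer:
            cost, path = min(
                (c + join_cost(p, u), path + [u])
                for p, (c, path) in paths.items())
            nxt[u] = (cost, path)
        paths = nxt
    return min(paths.values())[1]
```

    In practice `join_cost` would be derived from the annotated prosodic information, and clustering serves to shrink the candidate lists before this search.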

  14. Semantic annotation for live and posterity logging of video documents

    Science.gov (United States)

    Bertini, Marco; Del Bimbo, Alberto; Nunziati, W.

    2003-06-01

    Broadcasters usually envision two basic applications for video databases: Live Logging and Posterity Logging. The former aims at providing effective annotation of video in quasi-real time and supports extraction of meaningful clips from the live stream; it is usually performed by assistant producers working at the same location as the event. The latter provides annotation for later reuse of video material and is the prerequisite for retrieval by content from video digital libraries; it is performed by trained librarians. Both require that annotation is performed, to a great extent, automatically. Video information structure must encompass both low- and intermediate-level video organization and the event relationships that define specific highlights and situations. Analysis of the visual data of the video stream permits the extraction of hints, the identification of events and the detection of highlights. All of this must be supported by a-priori knowledge of the video domain and effective reasoning engines capable of capturing the inherent semantics of the visual events.

  15. Image Annotation by Latent Community Detection and Multikernel Learning.

    Science.gov (United States)

    Gu, Yun; Qian, Xueming; Li, Qing; Wang, Meng; Hong, Richang; Tian, Qi

    2015-11-01

    Automatic image annotation is an attractive service for users and administrators of online photo sharing websites. In this paper, we propose an image annotation approach that exploits the latent semantic community of labels and multikernel learning (LCMKL). First, a concept graph is constructed for labels, indicating the relationships between concepts. Based on the concept graph, semantic communities are explored using an automatic community detection method. For an image to be annotated, a multikernel support vector machine is used to determine the image's latent community from its visual features. Then, candidate labels are ranked by combining intracommunity and intercommunity ranking. Experiments on the NUS-WIDE database and the IAPR TC-12 data set demonstrate that LCMKL outperforms some state-of-the-art approaches. PMID:26068319
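    A minimal sketch of the multikernel ingredient: a multikernel SVM operates on a weighted combination of precomputed Gram matrices, one per visual feature type. The weights below are fixed for illustration, whereas MKL methods learn them jointly with the classifier:

```python
def combine_kernels(kernels, weights):
    """Convex combination K = sum_m w_m * K_m of precomputed kernel
    (Gram) matrices -- the core construction in multikernel methods.
    Weights must be non-negative and sum to one to keep the result
    a valid kernel."""
    assert abs(sum(weights) - 1.0) < 1e-9 and all(w >= 0 for w in weights)
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]
```

    The combined matrix can then be passed to any kernel SVM solver that accepts a precomputed kernel.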

  16. Improving database design teaching in secondary education: action research implementation for documentation of didactic requirements and strategies

    OpenAIRE

    Fessakis, George; Dimitracopoulou, Angelique; Komis, Vassilis

    2005-01-01

    This is the author's version of the journal article, published in Computers in Human Behavior © Elsevier Ltd. 2005, available at http://dx.doi.org/10.1016/j.chb.2004.06.006 Database design and use are of educational interest for both utilitarian and learning reasons. Database technology has significant economic impact, and the demand for database design cannot be met by the existing pool of trained experts. Furthermore, the database management systems available at schools could be used for the design an...

  17. Sentiment Analysis of Document Based on Annotation

    CERN Document Server

    Shukla, Archana

    2011-01-01

    I present a tool which assesses the quality of a document, or its usefulness, based on annotations. Annotations may include comments, notes, observations, highlights, underlines, explanations, questions, requests for help, etc. Comments are used for evaluative purposes, while the others are used for summarization or expansion. Further, these comments may be attached to another annotation; such annotations are referred to as meta-annotations. Not all annotations receive equal weight. My tool considers highlights and underlines as well as comments to infer the collective sentiment of annotators. The collective sentiment of annotators is classified as positive, negative or objective. My tool computes the collective sentiment of annotations in two ways: it counts all the annotations present on the document, and it also computes sentiment scores of all annotations, including comments, to obtain the collective sentiment about the document and judge its quality. I demonstrate the use of the tool on a research paper.
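    A hedged sketch of the weighted aggregation the abstract describes: each annotation type carries a weight and a polarity, and the signed sum gives the collective label. The weights here are hypothetical, not taken from the tool:

```python
# Hypothetical per-annotation-type weights (not from the paper).
WEIGHTS = {"comment": 1.0, "highlight": 0.5, "underline": 0.3}

def collective_sentiment(annotations):
    """annotations: (kind, polarity) pairs, with polarity in
    {-1, 0, +1} for negative / objective / positive.
    Returns the weighted collective sentiment label."""
    score = sum(WEIGHTS.get(kind, 0.0) * pol for kind, pol in annotations)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "objective"
```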

  18. Fast T Wave Detection Calibrated by Clinical Knowledge with Annotation of P and T Waves

    Directory of Open Access Journals (Sweden)

    Mohamed Elgendi

    2015-07-01

    Full Text Available Background: There are limited studies on the automatic detection of T waves in arrhythmic electrocardiogram (ECG signals. This is perhaps because there is no available arrhythmia dataset with annotated T waves. There is a growing need to develop numerically-efficient algorithms that can accommodate the new trend of battery-driven ECG devices. Moreover, there is also a need to analyze long-term recorded signals in a reliable and time-efficient manner, therefore improving the diagnostic ability of mobile devices and point-of-care technologies. Methods: Here, the T wave annotation of the well-known MIT-BIH arrhythmia database is discussed and provided. Moreover, a simple fast method for detecting T waves is introduced. A typical T wave detection method has been reduced to a basic approach consisting of two moving averages and dynamic thresholds. The dynamic thresholds were calibrated using four clinically known types of sinus node response to atrial premature depolarization (compensation, reset, interpolation, and reentry. Results: The determination of T wave peaks is performed and the proposed algorithm is evaluated on two well-known databases, the QT and MIT-BIH Arrhythmia databases. The detector obtained a sensitivity of 97.14% and a positive predictivity of 99.29% over the first lead of the validation databases (total of 221,186 beats. Conclusions: We present a simple yet very reliable T wave detection algorithm that can be potentially implemented on mobile battery-driven devices. In contrast to complex methods, it can be easily implemented in a digital filter design.
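    A minimal sketch of the two-moving-averages idea: an event-scale average crossing above a cycle-scale average marks a block of interest, and the maximum inside each block is taken as the wave peak. The window sizes and the fixed offset here are illustrative; the paper calibrates dynamic thresholds from clinical knowledge:

```python
def moving_average(x, w):
    """Centered moving average with window w (edges use a
    truncated window)."""
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - w // 2), min(len(x), i + w // 2 + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def detect_waves(signal, w_event=5, w_cycle=15, offset=0.0):
    """Mark blocks where the short (event-scale) average exceeds the
    long (cycle-scale) average plus an offset; return the index of
    the peak inside each block."""
    ma_event = moving_average(signal, w_event)
    ma_cycle = moving_average(signal, w_cycle)
    blocks, start = [], None
    for i, (e, c) in enumerate(zip(ma_event, ma_cycle)):
        if e > c + offset:
            if start is None:
                start = i
        elif start is not None:
            blocks.append((start, i - 1))
            start = None
    if start is not None:
        blocks.append((start, len(signal) - 1))
    return [max(range(a, b + 1), key=lambda i: signal[i])
            for a, b in blocks]
```

    On a real ECG the two windows would be sized to the expected T wave and heartbeat durations, and the offset adapted beat-by-beat.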

  19. Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data

    Directory of Open Access Journals (Sweden)

    Merchant Sabeeha S

    2011-07-01

    Full Text Available Abstract Background Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, these are usually limited and integrate functional terms from only a small number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. Description The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of

  20. Caliper Context Annotation Library

    Energy Technology Data Exchange (ETDEWEB)

    2015-09-30

    To understand the performance of parallel programs, developers need to be able to relate performance measurement data with context information, such as the call path / line numbers or iteration numbers where measurements were taken. Caliper provides a generic way to specify and collect multi-dimensional context information across the software stack, and to provide it to third-party measurement tools or write it into a file or database in the form of context streams.
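    A toy illustration of the context-stream idea, in which nested attribute/value annotations are snapshotted alongside each measurement. This is a generic Python sketch of the concept, not Caliper's actual API:

```python
import contextlib

class ContextStream:
    """Toy multi-dimensional context annotation: attributes pushed
    on a stack form the context dimensions, and each recorded
    measurement is stored with a snapshot of the current context."""
    def __init__(self):
        self.stack = []    # active (attribute, value) pairs
        self.records = []  # (context snapshot, measurement) pairs

    @contextlib.contextmanager
    def annotate(self, attribute, value):
        self.stack.append((attribute, value))
        try:
            yield
        finally:
            self.stack.pop()

    def record(self, measurement):
        # snapshot the current context dimensions with the measurement
        self.records.append((dict(self.stack), measurement))
```

    Nesting `annotate("phase", ...)` inside a loop that also annotates the iteration number yields records tagged with both dimensions, which is the kind of correlation the description refers to.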

  1. Representing NCBO Annotator results in standard RDF with the Annotation Ontology

    OpenAIRE

    Melzi, Soumia; Jonquet, Clement

    2014-01-01

    International audience; Semantic annotation is part of the Semantic Web vision. The Annotation Ontology is a model that has been proposed to represent any annotation in standard RDF. The NCBO Annotator Web service is a broadly used service for annotation in the biomedical domain, offered within the BioPortal platform and giving access to more than 350 ontologies. This paper presents a new output format to represent the NCBO Annotator results in RDF with the Annotation Ontology. We briefly...

  2. The Disease and Gene Annotations (DGA): an annotation resource for human disease.

    Science.gov (United States)

    Peng, Kai; Xu, Wei; Zheng, Jianyong; Huang, Kegui; Wang, Huisong; Tong, Jiansong; Lin, Zhifeng; Liu, Jun; Cheng, Wenqing; Fu, Dong; Du, Pan; Kibbe, Warren A; Lin, Simon M; Xia, Tian

    2013-01-01

    Disease and Gene Annotations database (DGA, http://dga.nubic.northwestern.edu) is a collaborative effort aiming to provide a comprehensive and integrative annotation of the human genes in disease network context by integrating computable controlled vocabulary of the Disease Ontology (DO version 3 revision 2510, which has 8043 inherited, developmental and acquired human diseases), NCBI Gene Reference Into Function (GeneRIF) and molecular interaction network (MIN). DGA integrates these resources together using semantic mappings to build an integrative set of disease-to-gene and gene-to-gene relationships with excellent coverage based on current knowledge. DGA is kept current by periodically reparsing DO, GeneRIF, and MINs. DGA provides a user-friendly and interactive web interface system enabling users to efficiently query, download and visualize the DO tree structure and annotations as a tree, a network graph or a tabular list. To facilitate integrative analysis, DGA provides a web service Application Programming Interface for integration with external analytic tools.

  3. STUDY ON THE CONCEPT DESIGN PROCESS FOR THE CRASHWORTHINESS IMPROVEMENT OF AN AUTOMOTIVE BODY STRUCTURE USING A KNOWLEDGE-BASED DATABASE

    Directory of Open Access Journals (Sweden)

    W. Y. Ki

    2016-04-01

    Full Text Available The purpose of this study is to propose a concept design process for crashworthiness improvement of an automotive body structure using technical information on the major joints and members of vehicles. First, in order to collect the technical information on major joints and members, 43 vehicles were selected using benchmark data. The collected technical information for the selected vehicles comprised the cross-sectional shapes of each joint and member, which were used for the analysis of joint stiffness, crashworthiness and static stiffness of the member to build a database along with cross-section properties. This study applied a statistical technique to perform a concept design of an automotive body using the analyzed information and selected cross sections meeting the design objectives. The criteria for the selection of the cross sections were defined by subdividing the design objectives of the automotive body structure and its constraints into members and joints. In order to configure an analysis model of an automotive body structure using the selected cross sections, a shape parametric model was used, and crashworthiness was assessed to evaluate the configured automotive body structure. The evaluation showed that crashworthiness was improved by 15% and 12%, respectively, compared to an existing body structure. In addition, the weight of the body structure was reduced by 4.2%. Through this study, a process that can rapidly and effectively derive and evaluate the concept design of an automotive body structure was defined. It is expected that, henceforth, this process will be helpful for the study of automotive body structures.

  4. Meteor showers an annotated catalog

    CERN Document Server

    Kronk, Gary W

    2014-01-01

    Meteor showers are among the most spectacular celestial events that may be observed by the naked eye, and have been the object of fascination throughout human history. In “Meteor Showers: An Annotated Catalog,” the interested observer can access detailed research on over 100 annual and periodic meteor streams in order to capitalize on these majestic spectacles. Each meteor shower entry includes details of their discovery, important observations and orbits, and gives a full picture of duration, location in the sky, and expected hourly rates. Armed with a fuller understanding, the amateur observer can better view and appreciate the shower of their choice. The original book, published in 1988, has been updated with over 25 years of research in this new and improved edition. Almost every meteor shower study is expanded, with some original minor showers being dropped while new ones are added. The book also includes breakthroughs in the study of meteor showers, such as accurate predictions of outbursts as well ...

  5. Collaborative Semantic Annotation of Images : Ontology-Based Model

    Directory of Open Access Journals (Sweden)

    Damien E. ZOMAHOUN

    2015-12-01

    Full Text Available In the quest for models that could help to represent the meaning of images, some approaches have used contextual knowledge by building semantic hierarchies. Others have resorted to the integration of image analysis improvement knowledge and image interpretation using ontologies. The images are often annotated with a set of keywords (or ontologies), whose relevance remains highly subjective and related to only one interpretation (one annotator). However, an image can get many associated semantics because annotators can interpret it differently. The purpose of this paper is to propose a collaborative annotation system that brings out the meaning of images from the different interpretations of annotators. The different works carried out in this paper lead to a semantic model of an image, i.e. the different meanings that a picture may have. This method relies on the different tools of the Semantic Web, especially ontologies.

  6. Enriching a biomedical event corpus with meta-knowledge annotation

    Directory of Open Access Journals (Sweden)

    Thompson Paul

    2011-10-01

    information to be specified as part of the search criteria. This can assist in a number of important tasks, e.g., finding new experimental knowledge to facilitate database curation, enabling textual inference to detect entailments and contradictions, etc. To our knowledge, our scheme is unique within the field with regards to the diversity of meta-knowledge aspects annotated for each event.

  7. NCBI prokaryotic genome annotation pipeline.

    Science.gov (United States)

    Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat; Chetvernin, Vyacheslav; Nawrocki, Eric P; Zaslavsky, Leonid; Lomsadze, Alexandre; Pruitt, Kim D; Borodovsky, Mark; Ostell, James

    2016-08-19

    Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/. PMID:27342282

  9. Comparative Omics-Driven Genome Annotation Refinement: Application across Yersiniae

    Energy Technology Data Exchange (ETDEWEB)

    Rutledge, Alexandra C.; Jones, Marcus B.; Chauhan, Sadhana; Purvine, Samuel O.; Sanford, James; Monroe, Matthew E.; Brewer, Heather M.; Payne, Samuel H.; Ansong, Charles; Frank, Bryan C.; Smith, Richard D.; Peterson, Scott; Motin, Vladimir L.; Adkins, Joshua N.

    2012-03-27

    Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. To date, the perceived value of manual curation for genome annotations is not offset by the real cost and time associated with the process. In order to keep up with the large number of sequences generated, the annotation process is now performed almost exclusively in an automated fashion for most genome sequencing projects. One possible way to reduce errors inherent to automated computational annotations is to apply data from 'omics' measurements (i.e., transcriptomic and proteomic) to the un-annotated genome with a proteogenomic-based approach. This approach does require additional experimental and bioinformatics methods to include omics technologies; however, it is readily automatable and can benefit from rapid developments occurring in those research domains as well. The annotation process can be improved by experimental validation of transcription and translation and by the discovery of annotation errors. Here the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species, as is becoming common in sequencing efforts. Transcriptomic and proteomic data derived from three highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis Pestoides F, and Y. pseudotuberculosis PB1/+) were used to demonstrate a comprehensive comparative omics-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 previously incorrect protein-coding sequences (e.g., observed frameshifts, extended start sites, and translated pseudogenes) within the three current Yersinia genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent
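    The proteogenomic validation step can be caricatured as matching observed peptides against predicted protein sequences. Real pipelines match spectra against a six-frame translation of the genome, which is how novel and mis-annotated coding sequences are found; the toy substring check below (with hypothetical names) conveys only the validation half of the idea:

```python
def validate_by_peptides(proteome, peptides):
    """Mark each predicted protein 'validated' if any observed
    peptide is an exact substring of its sequence, otherwise
    'unconfirmed' -- a toy version of proteogenomic validation."""
    return {name: "validated" if any(p in seq for p in peptides)
                  else "unconfirmed"
            for name, seq in proteome.items()}
```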

  10. The UCSC Genome Browser database: 2015 update

    OpenAIRE

    Rosenbloom, Kate R.; Armstrong, Joel; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R.; Fujita, Pauline A.; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A.; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S.; Hubley, Robert

    2014-01-01

    Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple al...

  11. Improvement of the efficiency of artificial insemination services through the use of radioimmunoassay and a computer database application

    International Nuclear Information System (INIS)

    A study was conducted at several locations in four provinces of Indonesia to evaluate and increase the efficiency of artificial insemination (AI) services provided to cattle farmers and to improve the feeding and reproductive management practices. Radioimmunoassay (RIA) for progesterone measurement was used together with the computer program Artificial Insemination Database Application (AIDA) to monitor the success of AI and for the early diagnosis of non-pregnancy and reproductive disorders in dairy and beef cattle. Baseline surveys showed that the average calving to first service interval (CFSI) ranged from 121.3 ± 78.2 days in West Java to 203.5 ± 118.3 in West Sumatra, and the conception rate (CR) to first AI ranged from 27% in South Sulawesi to 44% in West Java. Supplementary feeding with urea-molasses multi-nutrient blocks (UMMB) combined with training of farmers on improved husbandry practices reduced the CFSI from 150.6 ± 66.3 days to 102.3 ± 36.5 days and increased the CR from 27% to 49% in South Sulawesi. Similar interventions in West Java reduced the CFSI from 121.3 ± 78.2 days to 112.1 ± 80.9 days and increased the CR from 34% to 37%. Results from measurement of progesterone in milk or blood samples collected on days 0, 10-12 and 22-24 after AI showed that 25% of the animals were non-cyclic or anovulatory, while 8.7% were pregnant at the time of AI. Investigation of cows with breeding problems using measurement of progesterone in combination with clinical examination revealed a range of problems, including true anoestrus, sub-oestrus or missed oestrus, persistent CL and luteal cysts. The ability to make an accurate diagnosis enabled the provision of appropriate advice or treatment for overcoming the problems. Anti-progesterone serum and 125I-Progesterone tracer for use in RIA were produced locally and were found to have acceptable characteristics. The tracer had good specific activity and stability for up to 12 weeks. The production of standards

  12. DBGC: A Database of Human Gastric Cancer

    OpenAIRE

    Chao Wang; Jun Zhang; Mingdeng Cai; Zhenggang Zhu; Wenjie Gu; Yingyan Yu; Xiaoyan Zhang

    2015-01-01

    The Database of Human Gastric Cancer (DBGC) is a comprehensive database that integrates various human gastric cancer-related data resources. Human gastric cancer-related transcriptomics projects, proteomics projects, mutations, biomarkers and drug-sensitive genes from different sources were collected and unified in this database. Moreover, epidemiological statistics of gastric cancer patients in China and clinicopathological information annotated with gastric cancer cases were also integrated...

  13. Uncertainty modeling for ontology-based mammography annotation with intelligent BI-RADS scoring.

    Science.gov (United States)

    Bulu, Hakan; Alpkocak, Adil; Balci, Pinar

    2013-05-01

    This paper presents an ontology-based annotation system and BI-RADS (Breast Imaging Reporting and Data System) score reasoning with Semantic Web technologies in mammography. The annotation system is based on the Mammography Annotation Ontology (MAO), on which the BI-RADS score reasoning operates. However, ontologies are based on crisp logic and cannot handle uncertainty. Consequently, we propose a Bayesian-based approach to model uncertainty in the mammography ontology and make reasoning over BI-RADS scores possible with SQWRL (Semantic Query-enhanced Web Rule Language). First, we give general information about our system and present details of the mammography annotation ontology, its main concepts and relationships. Then, we express uncertainty in mammography and present approaches to handle uncertainty issues. The system is evaluated with two manually annotated datasets, DEMS (Dokuz Eylul University Mammography Set) and DDSM (Digital Database for Screening Mammography). We report experimental results in terms of accuracy, sensitivity, precision and uncertainty-level measures.
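    As a minimal illustration of the Bayesian treatment of uncertainty mentioned above, the sketch below performs a single-finding update of a malignancy belief. All probabilities are invented for the example; they are not values from the paper or from BI-RADS.

```python
# Minimal sketch of Bayesian uncertainty handling: update the belief that a
# mass is malignant given one uncertain imaging finding. All probabilities
# here are invented for illustration, not values from the paper or BI-RADS.

def posterior(prior, likelihood_pos, likelihood_neg):
    """Bayes' rule: P(malignant | finding observed)."""
    evidence = likelihood_pos * prior + likelihood_neg * (1.0 - prior)
    return likelihood_pos * prior / evidence

# Assumed numbers: 10% prior; the finding is seen in 80% of malignant and
# 20% of benign cases.
p = posterior(prior=0.10, likelihood_pos=0.80, likelihood_neg=0.20)
print(round(p, 3))  # 0.308
```

    Chaining such updates over several findings is the single-node analogue of the Bayesian-network reasoning the paper layers onto the crisp ontology.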

  14. eID: A System for Exploration of Image Databases.

    Science.gov (United States)

    Stan, Daniela; Sethi, Ishwar K.

    2003-01-01

    Describes an exploration system for large image databases. The system, which consists of three stages, allows the user to interpret and annotate an image in the context in which that image appears, dramatically reducing the time taken to annotate a large collection of images. Includes 25 figures and two tables. (AEF)

  15. A Human-Curated Annotation of the Candida albicans Genome.

    Directory of Open Access Journals (Sweden)

    2005-07-01

    Full Text Available Recent sequencing and assembly of the genome for the fungal pathogen Candida albicans used simple automated procedures for the identification of putative genes. We have reviewed the entire assembly, both by hand and with additional bioinformatic resources, to accurately map and describe 6,354 genes and to identify 246 genes whose original database entries contained sequencing errors (or possibly mutations) that affect their reading frame. Comparison with other fungal genomes permitted the identification of numerous fungus-specific genes that might be targeted for antifungal therapy. We also observed that, compared to other fungi, the protein-coding sequences in the C. albicans genome are especially rich in short sequence repeats. Finally, our improved annotation permitted a detailed analysis of several multigene families, and comparative genomic studies showed that C. albicans has a far greater catabolic range, encoding respiratory Complex 1, several novel oxidoreductases and ketone body degrading enzymes, malonyl-CoA and enoyl-CoA carriers, several novel amino acid degrading enzymes, a variety of secreted catabolic lipases and proteases, and numerous transporters to assimilate the resulting nutrients. The results of these efforts will ensure that the Candida research community has uniform and comprehensive genomic information for medical research as well as for future diagnostic and therapeutic applications.

  16. The UCSC Genome Browser Database: update 2006

    DEFF Research Database (Denmark)

    Hinrichs, A S; Karolchik, D; Baertsch, R;

    2006-01-01

    The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data. The database is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. The Genome Browser displays a wide variety of annotations at all scales from single nucleotide level up to a full chromosome. The Table Browser provides direct access to the database tables and sequence data, enabling complex queries on genome-wide datasets. The Proteome Browser graphically displays protein properties. The Gene Sorter allows filtering...

  17. AphidBase: A centralized bioinformatic resource for annotation of the pea aphid genome

    OpenAIRE

    Legeai, Fabrice; Shigenobu, Shuji; Gauthier, Jean-Pierre; Colbourne, John; Rispe, Claude; Collin, Olivier; Richards, Stephen; Wilson, Alex C. C.; Tagu, Denis

    2010-01-01

    AphidBase is a centralized bioinformatic resource that was developed to facilitate community annotation of the pea aphid genome by the International Aphid Genomics Consortium (IAGC). The AphidBase Information System, designed to organize and distribute genomic data and annotations for a large international community, was constructed using open source software tools from the Generic Model Organism Database (GMOD). The system includes Apollo and GBrowse utilities as well as a wiki, blast search c...

  18. Several Thoughts on Improving the Teaching Quality of Database Systems

    Institute of Scientific and Technical Information of China (English)

    程录庆

    2011-01-01

    Based on a study of the ability structure and characteristics of database technology professionals, this paper proposes a knowledge structure for database systems, analyses the current state of teaching in university database system courses, and discusses ways of improving the teaching quality of database systems in several respects: the knowledge structure of database systems, the differentiation of teaching levels, and innovation in practical teaching.

  19. Comparative validation of the D. melanogaster modENCODE transcriptome annotation

    OpenAIRE

    Chen, Zhen-Xia; Sturgill, David; Qu, Jiaxin; Jiang, Huaiyang; Park, Soo; Boley, Nathan; Suzuki, Ana Maria; Anthony R. Fletcher; David C Plachetzki; FitzGerald, Peter C.; Artieri, Carlo G.; Atallah, Joel; Barmina, Olga; Brown, James B.; Blankenburg, Kerstin P

    2014-01-01

    Accurate gene model annotation of reference genomes is critical for making them useful. The modENCODE project has improved the D. melanogaster genome annotation by using deep and diverse high-throughput data. Since transcriptional activity that has been evolutionarily conserved is likely to have an advantageous function, we have performed large-scale interspecific comparisons to increase confidence in predicted annotations. To support comparative genomics, we filled in divergence gaps in the ...

  20. Data mining of public SNP databases for the selection of intragenic SNPs

    NARCIS (Netherlands)

    Aerts, J.; Wetzels, Y.; Cohen, N.; Aerssens, J.

    2002-01-01

    Different strategies to search public single nucleotide polymorphism (SNP) databases for intragenic SNPs were evaluated. First, we assembled a strategy to annotate SNPs onto candidate genes based on a BLAST search of public SNP databases (Intragenic SNP Annotation by BLAST, ISAB). Only BLAST hits th
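    The ISAB approach hinges on filtering BLAST hits before accepting a SNP-to-gene assignment. The sketch below shows the general shape of such a filter over BLAST tabular (outfmt 6-style) output; the thresholds and field layout are illustrative assumptions, not the criteria used in the paper (whose abstract is truncated above).

```python
# Illustrative sketch of BLAST-based SNP-to-gene annotation: keep only hits
# that pass identity and alignment-length filters. The thresholds and the
# tabular field layout are assumptions, not the paper's actual criteria.

def confident_hits(blast_tabular, min_identity=98.0, min_align_len=50):
    """Parse BLAST tabular lines (query, subject, %identity, length)
    and keep hits passing both filters."""
    kept = []
    for line in blast_tabular.strip().splitlines():
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, align_len = float(fields[2]), int(fields[3])
        if identity >= min_identity and align_len >= min_align_len:
            kept.append((query, subject, identity))
    return kept

example = (
    "rs123\tGENE_A\t99.2\t120\n"   # passes both filters
    "rs123\tGENE_B\t91.0\t120\n"   # identity too low
    "rs456\tGENE_C\t100.0\t30"     # alignment too short
)
print(confident_hits(example))  # [('rs123', 'GENE_A', 99.2)]
```

    A real pipeline would run the SNP flanking sequence against the gene set with `blastn` and feed its tabular output to a filter of this shape.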

  1. HPIDB 2.0: a curated database for host–pathogen interactions

    Science.gov (United States)

    Ammari, Mais G.; Gresham, Cathy R.; McCarthy, Fiona M.; Nanduri, Bindu

    2016-01-01

    Identification and analysis of host–pathogen interactions (HPI) is essential to study infectious diseases. However, HPI data are sparse in existing molecular interaction databases, especially for agricultural host–pathogen systems. Therefore, resources that annotate, predict and display the HPI that underpin infectious diseases are critical for developing novel intervention strategies. HPIDB 2.0 (http://www.agbase.msstate.edu/hpi/main.html) is a resource for HPI data, and contains 45,238 manually curated entries in the current release. Since the first description of the database in 2010, multiple enhancements to HPIDB data and interface services were made that are described here. Notably, HPIDB 2.0 now provides targeted biocuration of molecular interaction data. As a member of the International Molecular Exchange consortium, annotations provided by HPIDB 2.0 curators meet community standards to provide detailed contextual experimental information and facilitate data sharing. Moreover, HPIDB 2.0 provides access to rapidly available community annotations that capture minimum molecular interaction information to address immediate researcher needs for HPI network analysis. In addition to curation, HPIDB 2.0 integrates HPI from existing external sources and contains tools to infer additional HPI where annotated data are scarce. Compared to other interaction databases, our data collection approach ensures HPIDB 2.0 users access the most comprehensive HPI data from a wide range of pathogens and their hosts (594 pathogen and 70 host species, as of February 2016). Improvements also include enhanced search capacity, addition of Gene Ontology functional information, and implementation of network visualization. The changes made to HPIDB 2.0 content and interface ensure that users, especially agricultural researchers, are able to easily access and analyse high quality, comprehensive HPI data. All HPIDB 2.0 data are updated regularly, are publicly available for direct

  3. : a database of ciliate genome rearrangements.

    Science.gov (United States)

    Burns, Jonathan; Kukushkin, Denys; Lindblad, Kelsi; Chen, Xiao; Jonoska, Nataša; Landweber, Laura F

    2016-01-01

    Ciliated protists exhibit nuclear dimorphism through the presence of somatic macronuclei (MAC) and germline micronuclei (MIC). In some ciliates, DNA from precursor segments in the MIC genome rearranges to form transcriptionally active genes in the mature MAC genome, making these ciliates model organisms to study the process of somatic genome rearrangement. Similar broad scale, somatic rearrangement events occur in many eukaryotic cells and tumors. The (http://oxytricha.princeton.edu/mds_ies_db) is a database of genome recombination and rearrangement annotations, and it provides tools for visualization and comparative analysis of precursor and product genomes. The database currently contains annotations for two completely sequenced ciliate genomes: Oxytricha trifallax and Tetrahymena thermophila.

  4. Including Functional Annotations and Extending the Collection of Structural Classifications of Protein Loops (ArchDB)

    Directory of Open Access Journals (Sweden)

    Antoni Hermoso

    2007-01-01

    Full Text Available Loops represent an important part of protein structures. The study of loops is critical for two main reasons: First, loops are often involved in protein function, stability and folding. Second, despite improvements in experimental and computational structure prediction methods, modeling the conformation of loops remains problematic. Here, we present a structural classification of loops, ArchDB, a mine of information with applications in both fields mentioned: loop structure prediction and function prediction. ArchDB (http://sbi.imim.es/archdb) is a database of classified protein loop motifs. The current database provides four different classification sets tailored for different purposes. ArchDB-40, a loop classification derived from SCOP40, is well suited for modeling common loop motifs. Since features relevant to loop structure or function can be more easily determined on well-populated clusters, we have developed ArchDB-95, a loop classification derived from SCOP95. This new classification set shows a ∼40% increase in the number of subclasses, and a large 7-fold increase in the number of putative structure/function-related subclasses. We also present ArchDB-EC, a classification of loop motifs from enzymes, and ArchDB-KI, a manually annotated classification of loop motifs from kinases. Information about ligand contacts and PDB sites has been included in all classification sets. Improvements in our classification scheme are described, as well as several new database features, such as the ability to query by conserved annotations, sequence similarity, or uploading 3D coordinates of a protein. The lengths of classified loops range between 0 and 36 residues. ArchDB offers an exhaustive sampling of loop structures. Functional information about loops and links with related biological databases are also provided. All this information and the possibility to browse/query the database through a web-server outline a useful tool with application in the

  5. Improved taxonomic assignment of human intestinal 16S rRNA sequences by a dedicated reference database

    NARCIS (Netherlands)

    Ritari, Jarmo; Salojärvi, Jarkko; Lahti, Leo; Vos, de Willem M.

    2015-01-01

    Background: Current sequencing technology enables taxonomic profiling of microbial ecosystems at high resolution and depth by using the 16S rRNA gene as a phylogenetic marker. Taxonomic assignation of newly acquired data is based on sequence comparisons with comprehensive reference databases to f

  6. Improvements in the HbVar database of human hemoglobin variants and thalassemia mutations for population and sequence variation studies.

    NARCIS (Netherlands)

    G.P. Patrinos (George); B. Giardine (Belinda); C. Riemer (Cathy); W. Miller (Webb); D.H. Chui (David); N.P. Anagnou (Nicholas); H. Wajcman (Henri); R.C. Hardison (Ross)

    2004-01-01

    textabstractHbVar (http://globin.cse.psu.edu/globin/hbvar/) is a relational database developed by a multi-center academic effort to provide up-to-date and high quality information on the genomic sequence changes leading to hemoglobin variants and all types of thalassemia and hemogl

  7. Nationwide quality improvement of groin hernia repair from the Danish Hernia Database of 87,840 patients from 1998 to 2005

    DEFF Research Database (Denmark)

    Kehlet, H.; Bay-Nielsen, M.

    2008-01-01

    BACKGROUND: Increased focus and research on surgical technique and anaesthesia in groin hernia repair have improved outcomes from centres of interest in hernia surgery, but little information is available from nationwide data to document the incorporation of scientific evidence into general clinical practice. AIM: To review outcomes after groin hernia repair in Denmark from the Danish Hernia Database 1998-2005 in 87,840 patients. RESULTS: The nationwide Danish hernia collaboration, with two annual meetings discussing its own results and those of others, has led to a >50% reduction in reoperation rates, increased use of the Lichtenstein hernia technique, a higher rate of outpatient surgery, near elimination of regional anaesthesia, and documentation of and focus on the incidence and mechanisms of chronic pain. CONCLUSION: Establishment of nationwide groin hernia databases leads to general improvement...

  8. CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics.

    Directory of Open Access Journals (Sweden)

    Mohit Verma

    Full Text Available Chickpea is an important grain legume used as a rich source of protein in the human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides a comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database features many tools for similarity search, functional annotation (putative function, PFAM domain and gene ontology search) and comparative gene expression analysis. The current release of CTDB (v2.0) hosts transcriptome datasets with high quality functional annotation from cultivated (desi and kabuli types) and wild chickpea. A catalog of transcription factor families and their expression profiles in chickpea are available in the database. The gene expression data have been integrated to study the expression profiles of chickpea transcripts in major tissues/organs and various stages of flower development. The utilities, such as similarity search, ortholog identification and comparative gene expression, have also been implemented in the database to facilitate comparative genomic studies among different legumes and Arabidopsis. Furthermore, the CTDB represents a resource for the discovery of functional molecular markers (microsatellites and single nucleotide polymorphisms) between different chickpea types. We anticipate that the integrated information content of this database will accelerate functional and applied genomic research for the improvement of chickpea. The CTDB web service is freely available at http://nipgr.res.in/ctdb.html.

  9. Analysis of the National Surgical Quality Improvement Program Database in 19,100 Patients Undergoing Implant-Based Breast Reconstruction: Complication Rates With Acellular Dermal Matrix

    OpenAIRE

    Shuster, Marina

    2015-01-01

    Background: The use of acellular dermal matrices has become increasingly popular in immediate and delayed tissue expander/implant–based breast reconstruction. However, it is unclear whether their use is associated with increased postoperative complication rates. Using the American College of Surgeons National Surgical Quality Improvement Program database, the authors assessed baseline differences in demographics and comorbidities with and without acellular dermal matrix and determined whether...

  10. Annotated Bibliography, Grades K-6.

    Science.gov (United States)

    Massachusetts Dept. of Education, Boston. Bureau of Nutrition Education and School Food Services.

    This annotated bibliography on nutrition is for the use of teachers at the elementary grade level. It contains a list of books suitable for reading about nutrition and foods for pupils from kindergarten through the sixth grade. Films and audiovisual presentations for classroom use are also listed. The names and addresses from which these materials…

  11. Bioinformatics for plant genome annotation

    NARCIS (Netherlands)

    Fiers, M.W.E.J.

    2006-01-01

    Large amounts of genome sequence data are available and much more will become available in the near future. A DNA sequence alone has, however, limited use. Genome annotation is required to assign biological interpretation to the DNA sequence. This thesis describ

  12. Annotated Bibliography on Humanistic Education

    Science.gov (United States)

    Ganung, Cynthia

    1975-01-01

    Part I of this annotated bibliography deals with books and articles on such topics as achievement motivation, process education, transactional analysis, discipline without punishment, role-playing, interpersonal skills, self-acceptance, moral education, self-awareness, values clarification, and non-verbal communication. Part II focuses on…

  13. Multicultural Education. An Annotated Bibliography.

    Science.gov (United States)

    Narang, H. L.

    This annotated bibliography contains references to books, journal articles, ERIC documents, doctoral dissertations, and audio-visual materials on the subject of multicultural education. Topics include integrating multiculturalism in school subjects, prejudice and discrimination, intercultural communication, ethnic identity and ethnic bias.…

  14. Nikos Kazantzakis: An Annotated Bibliography.

    Science.gov (United States)

    Qiu, Kui

    This research paper consists of an annotated bibliography about Nikos Kazantzakis, one of the major modern Greek writers and author of "The Last Temptation of Christ,""Zorba the Greek," and many other works. Because of Kazantzakis' position in world literature there are many critical works about him; however, bibliographical control of these works…

  15. HBVRegDB: Annotation, comparison, detection and visualization of regulatory elements in hepatitis B virus sequences

    Directory of Open Access Journals (Sweden)

    Firth Andrew E

    2007-12-01

    Full Text Available Abstract Background The many Hepadnaviridae sequences available have widely varied functional annotation. The genomes are very compact (~3.2 kb) but contain multiple layers of functional regulatory elements in addition to coding regions. Key regions are subject to purifying selection, as mutations in these regions will produce non-functional viruses. Results These genomic sequences have been organized into a structured database to facilitate research at the molecular level. HBVRegDB is a comparative genomic analysis tool with an integrated underlying sequence database. The database contains genomic sequence data from representative viruses. In addition to INSDC and RefSeq annotation, HBVRegDB also contains expert and systematically calculated annotations (e.g. promoters) and comparative genome analysis results (e.g. blastn, tblastx). It also contains analyses based on curated HBV alignments. Information about conserved regions – including primary conservation (e.g. CDS-Plotcon) and RNA secondary structure predictions (e.g. Alidot) – is integrated into the database. A large amount of data is graphically presented using GBrowse (Generic Genome Browser) adapted for analysis of viral genomes. Flexible query access is provided based on any annotated genomic feature. Novel regulatory motifs can be found by analysing the annotated sequences. Conclusion HBVRegDB serves as a knowledge database and as a comparative genomic analysis tool for molecular biologists investigating HBV. It is publicly available and complementary to other viral and HBV focused datasets and tools (http://hbvregdb.otago.ac.nz). The availability of multiple and highly annotated sequences of viral genomes in one database combined with comparative analysis tools facilitates detection of novel genomic elements.

  16. MEGAnnotator: a user-friendly pipeline for microbial genomes assembly and annotation.

    Science.gov (United States)

    Lugli, Gabriele Andrea; Milani, Christian; Mancabelli, Leonardo; van Sinderen, Douwe; Ventura, Marco

    2016-04-01

    Genome annotation is one of the key actions that must be undertaken in order to decipher the genetic blueprint of organisms. Thus, a correct and reliable annotation is essential in rendering genomic data valuable. Here, we describe a bioinformatics pipeline based on freely available software programs coordinated by a multithreaded script named MEGAnnotator (Multithreaded Enhanced prokaryotic Genome Annotator). This pipeline allows the generation of multiple annotated formats fulfilling the NCBI guidelines for assembled microbial genome submission, based on DNA shotgun sequencing reads, and minimizes manual intervention, while also reducing waiting times between software program executions and improving final quality of both assembly and annotation outputs. MEGAnnotator provides an efficient way to pre-arrange the assembly and annotation work required to process NGS genome sequence data. The script improves the final quality of microbial genome annotation by reducing ambiguous annotations. Moreover, the MEGAnnotator platform allows the user to perform a partial annotation of pre-assembled genomes and includes an option to accomplish metagenomic data set assemblies. MEGAnnotator platform will be useful for microbiologists interested in genome analyses of bacteria as well as those investigating the complexity of microbial communities that do not possess the necessary skills to prepare their own bioinformatics pipeline.

  17. Systems Theory and Communication. Annotated Bibliography.

    Science.gov (United States)

    Covington, William G., Jr.

    This annotated bibliography presents annotations of 31 books and journal articles dealing with systems theory and its relation to organizational communication, marketing, information theory, and cybernetics. Materials were published between 1963 and 1992 and are listed alphabetically by author. (RS)

  18. Stackfile Database

    Science.gov (United States)

    deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

    2013-01-01

    This software provides storage, retrieval and analysis functionality for managing satellite altimetry data. It improves the efficiency and analysis capabilities of existing database software with improved flexibility and documentation. It offers flexibility in the type of data that can be stored, and efficient retrieval either across the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason. It was, however, designed to work with a wide variety of satellite measurement data (e.g., the Gravity Recovery And Climate Experiment -- GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.

  19. Database Publication Practices

    DEFF Research Database (Denmark)

    Bernstein, P.A.; DeWitt, D.; Heuer, A.;

    2005-01-01

    There has been a growing interest in improving the publication processes for database research papers. This panel reports on recent changes in those processes and presents an initial cut at historical data for the VLDB Journal and ACM Transactions on Database Systems.

  20. Genome-Wide Functional Annotation of Human Protein-Coding Splice Variants Using Multiple Instance Learning.

    Science.gov (United States)

    Panwar, Bharat; Menon, Rajasree; Eksi, Ridvan; Li, Hong-Dong; Omenn, Gilbert S; Guan, Yuanfang

    2016-06-01

    The vast majority of human multiexon genes undergo alternative splicing and produce a variety of splice variant transcripts and proteins, which can perform different functions. These protein-coding splice variants (PCSVs) greatly increase the functional diversity of proteins. Most functional annotation algorithms have been developed at the gene level; the lack of isoform-level gold standards is an important intellectual limitation for currently available machine learning algorithms. The accumulation of a large amount of RNA-seq data in the public domain greatly increases our ability to examine the functional annotation of genes at isoform level. In the present study, we used a multiple instance learning (MIL)-based approach for predicting the function of PCSVs. We used transcript-level expression values and gene-level functional associations from the Gene Ontology database. A support vector machine (SVM)-based 5-fold cross-validation technique was applied. Comparatively, genes with multiple PCSVs performed better than single PCSV genes, and performance also improved when more examples were available to train the models. We demonstrated our predictions using literature evidence of ADAM15, LMNA/C, and DMXL2 genes. All predictions have been implemented in a web resource called "IsoFunc", which is freely available for the global scientific community through http://guanlab.ccmb.med.umich.edu/isofunc . PMID:27142340
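    The MIL framing used here treats each gene as a "bag" of isoform "instances": a gene is positive for a GO term if at least one of its isoforms carries the function. A minimal sketch of that bag/instance logic follows; the scores are invented (the study itself derives them from RNA-seq features via SVMs), and the function names are ours.

```python
# Minimal sketch of the multiple-instance learning (MIL) framing: a gene
# (bag) is positive for a function if at least one of its isoforms
# (instances) is, so the bag score is the maximum instance score. The
# scores below are invented for illustration.

def gene_score(isoform_scores):
    """MIL bag score: the gene is as positive as its best-scoring isoform."""
    return max(isoform_scores)

def responsible_isoform(isoform_scores):
    """Index of the isoform to which the gene-level function is attributed."""
    return max(range(len(isoform_scores)), key=lambda i: isoform_scores[i])

scores = [0.12, 0.87, 0.33]  # three splice variants of one hypothetical gene
print(gene_score(scores))           # 0.87
print(responsible_isoform(scores))  # 1
```

    This max-over-instances rule is what lets gene-level Gene Ontology labels train an isoform-level predictor despite the absence of isoform-level gold standards.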

  1. A Method of Gene-Function Annotation Based on Variable Precision Rough Sets

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    It is very important in the field of bioinformatics to apply computers to perform function annotation for newly sequenced biological sequences. Based on the GO database and the BLAST program, a novel method for the function annotation of new biological sequences is presented using variable-precision rough set theory. The proposed method is applied to real data in the GO database to examine its effectiveness. Numerical results show that the proposed method has better precision, recall rate and harmonic mean value compared with existing methods.
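    The variable-precision element can be made concrete: in classical rough sets, an equivalence class belongs to the lower approximation of a concept only if it is fully contained in it; VPRS relaxes this to a majority-inclusion threshold β. The grouping criterion, data and β below are illustrative assumptions, not the paper's.

```python
# Illustrative sketch of a variable-precision rough set (VPRS) lower
# approximation: an equivalence class of sequences (e.g. grouped by shared
# BLAST hits) supports a GO term if at least a fraction beta of its members
# already carry the term. The data and beta value are invented.

def vprs_lower_approximation(classes, positives, beta=0.8):
    """Union of equivalence classes whose inclusion degree in `positives` is >= beta."""
    lower = set()
    for eq_class in classes:
        inclusion = len(eq_class & positives) / len(eq_class)
        if inclusion >= beta:
            lower |= eq_class
    return lower

classes = [{"s1", "s2", "s3", "s4", "s5"}, {"s6", "s7"}]
positives = {"s1", "s2", "s3", "s4", "s6"}  # sequences already annotated with the term
print(sorted(vprs_lower_approximation(classes, positives)))  # ['s1', 's2', 's3', 's4', 's5']
```

    Note that the classical (β = 1) lower approximation would reject the first class because "s5" is unannotated; lowering β is what lets the method transfer the annotation to it.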

  2. Development and annotation of perennial Triticeae ESTs and SSR markers.

    Science.gov (United States)

    Bushman, B Shaun; Larson, Steve R; Mott, Ivan W; Cliften, Paul F; Wang, Richard R-C; Chatterton, N Jerry; Hernandez, Alvaro G; Ali, Shahjahan; Kim, Ryan W; Thimmapuram, Jyothi; Gong, George; Liu, Lei; Mikel, Mark A

    2008-10-01

    Triticeae contains hundreds of species of both annual and perennial types. Although substantial genomic tools are available for annual Triticeae cereals such as wheat and barley, the perennial Triticeae lack sufficient genomic resources for genetic mapping or diversity research. To increase the amount of sequence information available in the perennial Triticeae, three expressed sequence tag (EST) libraries were developed and annotated for Pseudoroegneria spicata, a mixture of both Elymus wawawaiensis and E. lanceolatus, and a Leymus cinereus x L. triticoides interspecific hybrid. The ESTs were combined into unigene sets of 8 780 unigenes for P. spicata, 11 281 unigenes for Leymus, and 7 212 unigenes for Elymus. Unigenes were annotated based on putative orthology to genes from rice, wheat, barley, other Poaceae, Arabidopsis, and the non-redundant database of the NCBI. Simple sequence repeat (SSR) markers were developed, tested for amplification and polymorphism, and aligned to the rice genome. Leymus EST markers homologous to rice chromosome 2 genes were syntenous on Leymus homeologous groups 6a and 6b (previously 1b), demonstrating promise for in silico comparative mapping. All ESTs and SSR markers are available on an EST information management and annotation database (http://titan.biotec.uiuc.edu/triticeae/). PMID:18923529

  3. Use of Annotations for Component and Framework Interoperability

    Science.gov (United States)

    David, O.; Lloyd, W.; Carlson, J.; Leavesley, G. H.; Geter, F.

    2009-12-01

    The popular programming languages Java and C# provide annotations, a form of meta-data construct. Software frameworks for web integration, web services, database access, and unit testing now take advantage of annotations to reduce the complexity of APIs and the quantity of integration code between the application and framework infrastructure. Adopting annotation features in frameworks has been observed to lead to cleaner and leaner application code. The USDA Object Modeling System (OMS) version 3.0 fully embraces the annotation approach and additionally defines a meta-data standard for components and models. In version 3.0 framework/model integration previously accomplished using API calls is now achieved using descriptive annotations. This enables the framework to provide additional functionality non-invasively such as implicit multithreading, and auto-documenting capabilities while achieving a significant reduction in the size of the model source code. Using a non-invasive methodology leads to models and modeling components with only minimal dependencies on the modeling framework. Since models and modeling components are not directly bound to framework by the use of specific APIs and/or data types they can more easily be reused both within the framework as well as outside of it. To study the effectiveness of an annotation based framework approach with other modeling frameworks, a framework-invasiveness study was conducted to evaluate the effects of framework design on model code quality. A monthly water balance model was implemented across several modeling frameworks and several software metrics were collected. The metrics selected were measures of non-invasive design methods for modeling frameworks from a software engineering perspective. It appears that the use of annotations positively impacts several software quality measures. In a next step, the PRMS model was implemented in OMS 3.0 and is currently being implemented for water supply forecasting in the

  4. Annotating images by mining image search results

    NARCIS (Netherlands)

    X.J. Wang; L. Zhang; X. Li; W.Y. Ma

    2008-01-01

    Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search results

  5. Are clickthrough data reliable as image annotations?

    NARCIS (Netherlands)

    Tsikrika, T.; Diou, C.; Vries, A.P. de; Delopoulos, A.

    2009-01-01

    We examine the reliability of clickthrough data as concept-based image annotations, by comparing them against manual annotations, for different concept categories. Our analysis shows that, for many concepts, the image annotations generated by using clickthrough data are reliable, with up to 90% of t

  6. The Ideal Characteristics and Content of a Database and Its Useful Application in Improving the Effectiveness of Direct Marketing Campaigns

    Institute of Scientific and Technical Information of China (English)

    QIN Zhi-chao

    2013-01-01

    Direct marketing is now a well-known discipline and is widely used in almost every industry around the world. The mid-to-late 2000s saw huge growth in direct marketing due to the development of technology and the increasing number of well-educated marketers (Tapp, 2008). According to the UK's Institute of Direct Marketing (as cited in Sargeant & West, 2001, p. 7), direct marketing is "the planned recording, analysis and tracking of customers' direct response behaviour over time ... in order to develop future marketing strategies for long term customer loyalty and to ensure continued business growth". As Tapp (2008) points out, the database is the core of direct marketing. So what is a database in the field of direct marketing? A definition is given by Tapp (2008, p. 32): "A marketing database is a list of customers' and prospects' records that enables strategic analysis, and individual selections for communication and customer service support. The data is organized around the customer".

  7. Transcript-based redefinition of grouped oligonucleotide probe sets using AceView: High-resolution annotation for microarrays

    Directory of Open Access Journals (Sweden)

    Cam Margaret C

    2007-03-01

    Full Text Available Abstract Background Extracting biological information from high-density Affymetrix arrays is a multi-step process that begins with the accurate annotation of microarray probes. Shortfalls in the original Affymetrix probe annotation have been described; however, few studies have provided rigorous solutions for routine data analysis. Results Using AceView, a comprehensive human transcript database, we have reannotated the probes by matching them to RNA transcripts instead of genes. Based on this transcript-level annotation, a new probe set definition was created in which every probe in a probe set maps to a common set of AceView gene transcripts. In addition, using artificial data sets we identified that a minimal probe set size of 4 is necessary for reliable statistical summarization. We further demonstrate that applying the new probe set definition can detect specific transcript variants contributing to differential expression and it also improves cross-platform concordance. Conclusion We conclude that our transcript-level reannotation and redefinition of probe sets complement the original Affymetrix design. Redefinitions introduce probe sets whose sizes may not support reliable statistical summarization; therefore, we advocate using our transcript-level mapping redefinition in a secondary analysis step rather than as a replacement. Knowing which specific transcripts are differentially expressed is important to properly design probe/primer pairs for validation purposes. For convenience, we have created custom chip-description-files (CDFs and annotation files for our new probe set definitions that are compatible with Bioconductor, Affymetrix Expression Console or third party software.
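    The redefinition step described above can be sketched concisely: probes are regrouped by the exact set of AceView transcripts they map to, and groups below the reported minimum of 4 probes are flagged as too small for reliable statistical summarization. Probe and transcript identifiers here are made-up examples, not real AceView data.

```python
# Illustrative sketch (not the authors' pipeline) of transcript-level probe
# set redefinition: probes mapping to the same set of transcripts form one
# probe set; sets with fewer than MIN_PROBES probes are set aside.
from collections import defaultdict

MIN_PROBES = 4  # minimal probe set size reported in the abstract

def redefine_probe_sets(probe_to_transcripts):
    groups = defaultdict(list)
    for probe, transcripts in probe_to_transcripts.items():
        groups[frozenset(transcripts)].append(probe)
    kept = {k: v for k, v in groups.items() if len(v) >= MIN_PROBES}
    dropped = {k: v for k, v in groups.items() if len(v) < MIN_PROBES}
    return kept, dropped

probes = {"p1": ["T1", "T2"], "p2": ["T1", "T2"], "p3": ["T1", "T2"],
          "p4": ["T1", "T2"], "p5": ["T1"]}
kept, dropped = redefine_probe_sets(probes)
print(len(kept), len(dropped))  # 1 1
```

The `dropped` groups correspond to the redefined probe sets the authors advise excluding from primary analysis rather than summarizing unreliably.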

  8. Enabling Ontology Based Semantic Queries in Biomedical Database Systems.

    Science.gov (United States)

    Zheng, Shuai; Wang, Fusheng; Lu, James; Saltz, Joel

    2012-01-01

    While current biomedical ontology repositories offer primitive query capabilities, it is difficult or cumbersome to support ontology based semantic queries directly in semantically annotated biomedical databases. The problem may be largely attributed to the mismatch between the models of the ontologies and the databases, and the mismatch between the query interfaces of the two systems. To fully realize semantic query capabilities based on ontologies, we develop a system DBOntoLink to provide unified semantic query interfaces by extending database query languages. With DBOntoLink, semantic queries can be directly and naturally specified as extended functions of the database query languages without any programming needed. DBOntoLink is adaptable to different ontologies through customizations and supports major biomedical ontologies hosted at the NCBO BioPortal. We demonstrate the use of DBOntoLink in a real world biomedical database with semantically annotated medical image annotations. PMID:23404054
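    The general idea behind ontology-extended database queries can be illustrated with a toy example. Everything below is hypothetical (a made-up ontology fragment and table; DBOntoLink's actual extended functions differ): a query term is expanded to itself plus its transitive subclasses, and the expanded set is used as an ordinary SQL IN-list.

```python
# Hypothetical illustration of ontology-based query expansion against a
# relational table of semantic annotations. Ontology and data are toys.
import sqlite3

SUBCLASS = {  # child -> parent edges of a made-up ontology fragment
    "adenocarcinoma": "carcinoma",
    "squamous cell carcinoma": "carcinoma",
    "carcinoma": "neoplasm",
}

def expand(term):
    """Return the term plus every transitive subclass."""
    hits = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS.items():
            if parent in hits and child not in hits:
                hits.add(child)
                changed = True
    return hits

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE annotation (image_id TEXT, term TEXT)")
db.executemany("INSERT INTO annotation VALUES (?, ?)",
               [("img1", "adenocarcinoma"), ("img2", "neoplasm"),
                ("img3", "melanoma")])
terms = sorted(expand("carcinoma"))
rows = db.execute(
    "SELECT image_id FROM annotation WHERE term IN (%s)"
    % ",".join("?" * len(terms)), terms).fetchall()
print(sorted(r[0] for r in rows))  # ['img1']
```

Querying for "carcinoma" matches the image annotated with the subclass "adenocarcinoma", which a plain string match would miss; that mismatch between ontology model and database query is what the paper's extended query functions address.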

  9. Context, Dependency and Annotation Analysis in Java EE

    OpenAIRE

    Božidar, Darko

    2012-01-01

    The goal of this bachelor’s thesis is to analyze two of Java EE’s features, CDI and annotations, and to use the acquired knowledge to build a simple web application based on CDI and developed annotations. For this purpose it was necessary to clarify what CDI does and what it offers. Previously mentioned features were therefore firstly thoroughly examined to find out what improvements to the Java EE platform, if any, they provide. The main purpose of this thesis is to explore and analyse how t...

  10. Supply Chain Initiatives Database

    Energy Technology Data Exchange (ETDEWEB)

    None

    2012-11-01

    The Supply Chain Initiatives Database (SCID) presents innovative approaches to engaging industrial suppliers in efforts to save energy, increase productivity and improve environmental performance. This comprehensive and freely-accessible database was developed by the Institute for Industrial Productivity (IIP). IIP acknowledges Ecofys for their valuable contributions. The database contains case studies searchable according to the types of activities buyers are undertaking to motivate suppliers, target sector, organization leading the initiative, and program or partnership linkages.

  11. Effects of Teaching Strategies in Annotated Bibliography Writing

    Science.gov (United States)

    Tan-de Ramos, Jennifer

    2015-01-01

    The study examines the effect of teaching strategies on improving the writing of students at the tertiary level. Specifically, three teaching approaches--the use of modelling, grammar-based, and information element-focused--were tested for their effect on the writing of annotated bibliographies in three research classes at a university in Manila.…

  12. High-fidelity data embedding for image annotation.

    Science.gov (United States)

    He, Shan; Kirovski, Darko; Wu, Min

    2009-02-01

    High fidelity is a demanding requirement for data hiding, especially for images with artistic or medical value. This correspondence proposes a high-fidelity image watermarking for annotation with robustness to moderate distortion. To achieve the high fidelity of the embedded image, we introduce a visual perception model that aims at quantifying the local tolerance to noise for arbitrary imagery. Based on this model, we embed two kinds of watermarks: a pilot watermark that indicates the existence of the watermark and an information watermark that conveys a payload of several dozen bits. The objective is to embed 32 bits of metadata into a single image in such a way that it is robust to JPEG compression and cropping. We demonstrate the effectiveness of the visual model and the application of the proposed annotation technology using a database of challenging photographic and medical images that contain a large amount of smooth regions.

  13. Relational databases

    CERN Document Server

    Bell, D A

    1986-01-01

    Relational Databases explores the major advances in relational databases and provides a balanced analysis of the state of the art in relational databases. Topics covered include capture and analysis of data placement requirements; distributed relational database systems; data dependency manipulation in database schemata; and relational database support for computer graphics and computer aided design. This book is divided into three sections and begins with an overview of the theory and practice of distributed systems, using the example of INGRES from Relational Technology as illustration. The

  14. Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs.

    Directory of Open Access Journals (Sweden)

    Norihiro Maeda

    2006-04-01

    Full Text Available The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts, providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

  15. Design and Evaluation of Data Annotation Workflows for CAVE-like Virtual Environments.

    Science.gov (United States)

    Pick, Sebastian; Weyers, Benjamin; Hentschel, Bernd; Kuhlen, Torsten W

    2016-04-01

    Data annotation finds increasing use in Virtual Reality applications with the goal to support the data analysis process, such as architectural reviews. In this context, a variety of different annotation systems for application to immersive virtual environments have been presented. While many interesting interaction designs for the data annotation workflow have emerged from them, important details and evaluations are often omitted. In particular, we observe that the process of handling metadata to interactively create and manage complex annotations is often not covered in detail. In this paper, we strive to improve this situation by focusing on the design of data annotation workflows and their evaluation. We propose a workflow design that facilitates the most important annotation operations, i.e., annotation creation, review, and modification. Our workflow design is easily extensible in terms of supported annotation and metadata types as well as interaction techniques, which makes it suitable for a variety of application scenarios. To evaluate it, we have conducted a user study in a CAVE-like virtual environment in which we compared our design to two alternatives in terms of a realistic annotation creation task. Our design obtained good results in terms of task performance and user experience.

  16. IMG ER: A System for Microbial Genome Annotation Expert Review and Curation

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M.; Mavromatis, Konstantinos; Ivanova, Natalia N.; Chen, I-Min A.; Chu, Ken; Kyrpides, Nikos C.

    2009-05-25

    A rapidly increasing number of microbial genomes are sequenced by organizations worldwide and are eventually included into various public genome data resources. The quality of the annotations depends largely on the original dataset providers, with erroneous or incomplete annotations often carried over into the public resources and difficult to correct. We have developed an Expert Review (ER) version of the Integrated Microbial Genomes (IMG) system, with the goal of supporting systematic and efficient revision of microbial genome annotations. IMG ER provides tools for the review and curation of annotations of both new and publicly available microbial genomes within IMG's rich integrated genome framework. New genome datasets are included into IMG ER prior to their public release either with their native annotations or with annotations generated by IMG ER's annotation pipeline. IMG ER tools allow addressing annotation problems detected with IMG's comparative analysis tools, such as genes missed by gene prediction pipelines or genes without an associated function. Over the past year, IMG ER was used for improving the annotations of about 150 microbial genomes.

  17. The UCSC genome browser database: update 2007

    DEFF Research Database (Denmark)

    Kuhn, R M; Karolchik, D; Zweig, A S;

    2006-01-01

    The University of California, Santa Cruz Genome Browser Database contains, as of September 2006, sequence and annotation data for the genomes of 13 vertebrate and 19 invertebrate species. The Genome Browser displays a wide variety of annotations at all scales from the single nucleotide level up...... to a full chromosome and includes assembly data, genes and gene predictions, mRNA and EST alignments, and comparative genomics, regulation, expression and variation data. The database is optimized for fast interactive performance with web tools that provide powerful visualization and querying capabilities......; an expanded SNP annotation track; and many new display options. The Genome Browser, other tools, downloadable data files and links to documentation and other information can be found at http://genome.ucsc.edu/....

  18. A new Holocene sea-level database for the US Gulf Coast: Improving constraints for past and future sea levels

    Science.gov (United States)

    Hijma, M.; Tornqvist, T. E.; Hu, P.; Gonzalez, J.; Hill, D. F.; Horton, B. P.; Engelhart, S. E.

    2011-12-01

    The interpretation of present-day sea-level change, as well as the prediction of future relative sea-level (RSL) rise and its spatial variability, depend increasingly on the ability of glacial isostatic adjustment (GIA) models to reveal non-eustatic components of RSL change. GIA results from the redistribution of mass due to the growth and decay of ice sheets. As a consequence, formerly ice-covered areas are still rebounding and currently experience RSL fall, while in other areas the rate of RSL rise is enhanced due to glacial forebulge collapse. The development of GIA models relies to a large extent on the availability of quality-controlled Holocene RSL data. There is thus an urgent need for systematically compiled and publicly available databases of geological RSL data that can be used not only for the purposes mentioned above, but also can serve to underpin coastal management and policy decisions. We have focused our efforts to develop a Holocene sea-level database for the Atlantic and Gulf coasts of the US. Many of the research problems that can be addressed with this sea-level database revolve around the identification of crustal motions due to glacial forebulge collapse that affects the entire region and likely extends beyond South Florida. For the east coast, GIA-related subsidence rates have been calculated with unprecedented precision: age and elevation errors. Many sea-level indicators are related to a specific tide level (e.g., peat that formed between highest astronomical tide and mean high water level). We use paleotidal modeling to account for any changes during the Holocene. We furthermore highlight a number of errors associated with 14C dating that have rarely, if ever, been considered in previous studies of this nature. Second, we show the spatially variable RSL history along the US Gulf Coast. The rates of RSL rise reflect differential GIA, augmented in the Mississippi Delta region by enhanced rates of subsidence due to sediment loading. 

  19. Improving quality of breast cancer surgery through development of a national breast cancer surgical outcomes (BRCASO) research database

    International Nuclear Information System (INIS)

    Common measures of surgical quality are 30-day morbidity and mortality, which poorly describe breast cancer surgical quality given its extremely low morbidity and mortality rates. Several national quality programs have collected additional surgical quality measures; however, program participation is voluntary and results may not be generalizable to all surgeons. We developed the Breast Cancer Surgical Outcomes (BRCASO) database to capture meaningful breast cancer surgical quality measures among a non-voluntary sample, and to study variation in these measures across providers, facilities, and health plans. This paper describes our study protocol and data collection methods, and summarizes the strengths and limitations of these data. We included 4524 women ≥18 years diagnosed with breast cancer between 2003 and 2008. All women with initial breast cancer surgery performed by a surgeon employed at the University of Vermont or three Cancer Research Network (CRN) health plans were eligible for inclusion. From the CRN institutions, we collected electronic administrative data including tumor registry information, Current Procedure Terminology codes for breast cancer surgeries, surgeons, surgical facilities, and patient demographics. We supplemented electronic data with medical record abstraction to collect additional pathology and surgery detail. All data were manually abstracted at the University of Vermont. The CRN institutions pre-filled 30% (22 out of 72) of elements using electronic data. The remaining elements, including detailed pathology margin status and breast and lymph node surgeries, required chart abstraction. The mean age was 61 years (range 20-98 years); 70% of women were diagnosed with invasive ductal carcinoma, 20% with ductal carcinoma in situ, and 10% with invasive lobular carcinoma. The BRCASO database is one of the largest, multi-site research resources of meaningful breast cancer surgical quality data in the United States. Assembling data from electronic

  20. BBP: Brucella genome annotation with literature mining and curation

    Directory of Open Access Journals (Sweden)

    He Yongqun

    2006-07-01

    Full Text Available Abstract Background Brucella species are Gram-negative, facultative intracellular bacteria that cause brucellosis in humans and animals. Sequences of four Brucella genomes have been published, and various Brucella gene and genome data and analysis resources exist. A web gateway to integrate these resources will greatly facilitate Brucella research. Brucella genome data in current databases is largely derived from computational analysis without experimental validation typically found in peer-reviewed publications. It is partially due to the lack of a literature mining and curation system able to efficiently incorporate the large amount of literature data into genome annotation. It is further hypothesized that literature-based Brucella gene annotation would increase understanding of complicated Brucella pathogenesis mechanisms. Results The Brucella Bioinformatics Portal (BBP is developed to integrate existing Brucella genome data and analysis tools with literature mining and curation. The BBP InterBru database and Brucella Genome Browser allow users to search and analyze genes of 4 currently available Brucella genomes and link to more than 20 existing databases and analysis programs. Brucella literature publications in PubMed are extracted and can be searched by a TextPresso-powered natural language processing method, a MeSH browser, a keywords search, and an automatic literature update service. To efficiently annotate Brucella genes using the large amount of literature publications, a literature mining and curation system coined Limix is developed to integrate computational literature mining methods with a PubSearch-powered manual curation and management system. The Limix system is used to quickly find and confirm 107 Brucella gene mutations including 75 genes shown to be essential for Brucella virulence. The 75 genes are further clustered using COG. In addition, 62 Brucella genetic interactions are extracted from literature publications. 
These

  1. Multimedia database retrieval technology and applications

    CERN Document Server

    Muneesawang, Paisarn; Guan, Ling

    2014-01-01

    This book explores multimedia applications that emerged from computer vision and machine learning technologies. These state-of-the-art applications include MPEG-7, interactive multimedia retrieval, multimodal fusion, annotation, and database re-ranking. The application-oriented approach maximizes reader understanding of this complex field. Established researchers explain the latest developments in multimedia database technology and offer a glimpse of future technologies. The authors emphasize the crucial role of innovation, inspiring users to develop new applications in multimedia technologies

  2. NUREBASE: database of nuclear hormone receptors

    OpenAIRE

    Duarte, Jorge; Perrière, Guy; Laudet, Vincent; Robinson-Rechavi, Marc

    2002-01-01

    Nuclear hormone receptors are an abundant class of ligand activated transcriptional regulators, found in varying numbers in all animals. Based on our experience of managing the official nomenclature of nuclear receptors, we have developed NUREBASE, a database containing protein and DNA sequences, reviewed protein alignments and phylogenies, taxonomy and annotations for all nuclear receptors. The reviewed NUREBASE is completed by NUREBASE_DAILY, automatically updated every 24 h. Both databases...

  3. Computational annotation of genes differentially expressed along olive fruit development

    Directory of Open Access Journals (Sweden)

    Martinelli Federico

    2009-10-01

    Full Text Available Abstract Background Olea europaea L. is a traditional tree crop of the Mediterranean basin with a worldwide economical high impact. Differently from other fruit tree species, little is known about the physiological and molecular basis of the olive fruit development and a few sequences of genes and gene products are available for olive in public databases. This study deals with the identification of large sets of differentially expressed genes in developing olive fruits and the subsequent computational annotation by means of different software. Results mRNA from fruits of the cv. Leccino sampled at three different stages [i.e., initial fruit set (stage 1, completed pit hardening (stage 2 and veraison (stage 3] was used for the identification of differentially expressed genes putatively involved in main processes along fruit development. Four subtractive hybridization libraries were constructed: forward and reverse between stage 1 and 2 (libraries A and B, and 2 and 3 (libraries C and D. All sequenced clones (1,132 in total were analyzed through BlastX against non-redundant NCBI databases and about 60% of them showed similarity to known proteins. A total of 89 out of 642 differentially expressed unique sequences was further investigated by Real-Time PCR, showing a validation of the SSH results as high as 69%. Library-specific cDNA repertories were annotated according to the three main vocabularies of the gene ontology (GO: cellular component, biological process and molecular function. BlastX analysis, GO terms mapping and annotation analysis were performed using the Blast2GO software, a research tool designed with the main purpose of enabling GO based data mining on sequence sets for which no GO annotation is yet available. Bioinformatic analysis pointed out a significantly different distribution of the annotated sequences for each GO category, when comparing the three fruit developmental stages. 
The olive fruit-specific transcriptome dataset was

  4. Effective and Efficient Multi-Facet Web Image Annotation

    Institute of Scientific and Technical Information of China (English)

    Jia Chen; Yi-He Zhu; Hao-Fen Wang; Wei Jin; Yong Yu

    2012-01-01

    The vast amount of images available on the Web calls for an effective and efficient search service to help users find relevant images. The prevalent way is to provide a keyword interface for users to submit queries. However, the number of images without any tags or annotations is beyond the reach of manual efforts. To overcome this, automatic image annotation techniques have emerged, which are generally a process of selecting a suitable set of tags for a given image without user intervention. However, there are three main challenges with respect to Web-scale image annotation: scalability, noise-resistance and diversity. Scalability has a twofold meaning: first, an automatic image annotation system should be scalable with respect to billions of images on the Web; second, it should be able to automatically identify several relevant tags among a huge tag set for a given image within seconds or even faster. Noise-resistance means that the system should be robust enough against typos and ambiguous terms used in tags. Diversity means that image content may include both scenes and objects, which are further described by multiple different image features constituting different facets in annotation. In this paper, we propose a unified framework to tackle the above three challenges for automatic Web image annotation. It mainly involves two components: tag candidate retrieval and multi-facet annotation. In the former, content-based indexing and a concept-based codebook are leveraged to solve the scalability and noise-resistance issues. In the latter, the joint feature map has been designed to describe different facets of tags in annotations and the relations between these facets. A tag graph is adopted to represent tags in the entire annotation, and the structured learning technique is employed to construct a learning model on top of the tag graph based on the generated joint feature map. Millions of images from Flickr are used in our evaluation. Experimental results show that we have achieved 33% performance

  5. Restauro-G: A Rapid Genome Re-Annotation System for Comparative Genomics

    Institute of Scientific and Technical Information of China (English)

    Satoshi Tamaki; Kazuharu Arakawa; Nobuaki Kono; Masaru Tomita

    2007-01-01

    Annotations of complete genome sequences submitted directly from sequencing projects are diverse in terms of annotation strategies and update frequencies. These inconsistencies make comparative studies difficult. To allow rapid data preparation of a large number of complete genomes, automation and speed are important for genome re-annotation. Here we introduce an open-source rapid genome re-annotation software system, Restauro-G, specialized for bacterial genomes. Restauro-G re-annotates a genome by similarity searches utilizing the BLAST-Like Alignment Tool, referring to protein databases such as UniProt KB, NCBI nr, NCBI COGs, Pfam, and PSORTb. Re-annotation by Restauro-G achieved over 98% accuracy for most bacterial chromosomes in comparison with the original manually curated annotation of EMBL releases. Restauro-G was developed in the generic bioinformatics workbench G-language Genome Analysis Environment and is distributed at http://restauro-g.iab.keio.ac.jp/ under the GNU General Public License.

  6. Biofuel Database

    Science.gov (United States)

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  7. Community Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This excel spreadsheet is the result of merging at the port level of several of the in-house fisheries databases in combination with other demographic databases...

  8. SNP mining porcine ESTs with MAVIANT, a novel tool for SNP evaluation and annotation

    DEFF Research Database (Denmark)

    Panitz, Frank; Stengaard, Henrik; Hornshoj, Henrik;

    2007-01-01

    in public repositories makes it feasible to evaluate SNP predictions on the DNA chromatogram level. MAVIANT, a platform-independent Multipurpose Alignment VIewing and Annotation Tool, provides DNA chromatogram and alignment views and facilitates evaluation of predictions. In addition, it supports direct...... manual annotation, which is immediately accessible and can be easily shared with external collaborators. RESULTS: Large-scale SNP mining of polymorphisms bases on porcine EST sequences yielded more than 7900 candidate SNPs in coding regions (cSNPs), which were annotated relative to the human genome. Non......-synonymous SNPs were analyzed for their potential effect on the protein structure/function using the PolyPhen and SIFT prediction programs. Predicted SNPs and annotations are stored in a web-based database. Using MAVIANT SNPs can visually be verified based on the DNA sequencing traces. A subset of candidate SNPs...
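    The synonymous vs. non-synonymous distinction central to the cSNP analysis above can be sketched by translating the reference and variant codons. The codon table below is deliberately abbreviated to the entries this example needs (a full standard table would be used in practice), and the function name is an illustrative assumption.

```python
# Minimal sketch of calling a coding SNP synonymous or non-synonymous by
# comparing the amino acids encoded by the reference and variant codons.
# CODON_TABLE is abbreviated to just the codons used in this example.
CODON_TABLE = {"GAA": "E", "GAG": "E", "AAA": "K"}

def classify_csnp(codon, pos, alt_base):
    """codon: reference codon; pos: 0-based SNP position within the codon."""
    var = codon[:pos] + alt_base + codon[pos + 1:]
    ref_aa, var_aa = CODON_TABLE[codon], CODON_TABLE[var]
    return "synonymous" if ref_aa == var_aa else "non-synonymous"

print(classify_csnp("GAA", 2, "G"))  # GAA->GAG, both Glu: synonymous
print(classify_csnp("GAA", 0, "A"))  # GAA->AAA, Glu->Lys: non-synonymous
```

Only the non-synonymous calls would then be passed to structure/function effect predictors such as PolyPhen and SIFT, as the abstract describes.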

  9. The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4).

    Science.gov (United States)

    Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Tennessen, Kristin; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C

    2016-01-01

    The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provided via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation is followed by functional annotation including assignment of protein product names and connection to various protein family databases. PMID:26918089

  10. Database Administrator

    Science.gov (United States)

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  11. rVarBase: an updated database for regulatory features of human variants

    OpenAIRE

    Guo, Liyuan; Du, Yang; Qu, Susu; Wang, Jing

    2015-01-01

    We present here the rVarBase database (http://rv.psych.ac.cn), an updated version of the rSNPBase database, to provide reliable and detailed regulatory annotations for known and novel human variants. This update expands the database to include additional types of human variants, such as copy number variations (CNVs) and novel variants, and include additional types of regulatory features. Now rVarBase annotates variants in three dimensions: chromatin states of the surrounding regions, overlapp...

  12. Gene coexpression network analysis as a source of functional annotation for rice genes.

    Directory of Open Access Journals (Sweden)

    Kevin L Childs

Full Text Available With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional
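As a rough illustration of module detection in a coexpression network: compute pairwise correlations between expression profiles and group strongly correlated genes. WGCNA itself uses soft thresholding and hierarchical clustering, so this is only a simplified analogue, and the expression profiles below are invented:

```python
# Toy sketch of coexpression-module detection: compute pairwise Pearson
# correlations between gene expression profiles and merge genes whose
# absolute correlation exceeds a cutoff into one module. This hard
# cutoff stands in for WGCNA's soft thresholding and clustering.
from itertools import combinations

def pearson(x, y):
    # Assumes non-constant profiles (non-zero variance)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def modules(profiles, cutoff=0.8):
    """profiles: {gene: [expression per sample]} -> list of gene sets."""
    parent = {g: g for g in profiles}
    def find(g):  # union-find root lookup
        while parent[g] != g:
            g = parent[g]
        return g
    for g1, g2 in combinations(profiles, 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= cutoff:
            parent[find(g2)] = find(g1)
    groups = {}
    for g in profiles:
        groups.setdefault(find(g), set()).add(g)
    return list(groups.values())

profiles = {
    "geneA": [1, 2, 3, 4],   # geneB is exactly 2x geneA: correlated
    "geneB": [2, 4, 6, 8],
    "geneC": [5, 1, 4, 2],   # weakly correlated with the others
}
print(modules(profiles))
```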

  13. VHLdb: A database of von Hippel-Lindau protein interactors and mutations.

    Science.gov (United States)

    Tabaro, Francesco; Minervini, Giovanni; Sundus, Faiza; Quaglia, Federica; Leonardi, Emanuela; Piovesan, Damiano; Tosatto, Silvio C E

    2016-01-01

Mutations in the von Hippel-Lindau tumor suppressor protein (pVHL) predispose to the development of tumors affecting specific target organs, such as the retina, epididymis, adrenal glands, pancreas and kidneys. Currently, more than 400 pVHL interacting proteins are either described in the literature or predicted in public databases. These data are scattered among several different sources, slowing down the comprehension of pVHL's biological role. Here we present VHLdb, a novel database collecting available interaction and mutation data on pVHL to provide novel integrated annotations. In VHLdb, pVHL interactors are organized according to two annotation levels, manual and automatic. Mutation data are easily accessible and a novel visualization tool has been implemented. A user-friendly feedback function to improve database content through community-driven curation is also provided. VHLdb presently contains 478 interactors, of which 117 have been manually curated, and 1,074 mutations. This makes it the largest available database for pVHL-related information. VHLdb is available from URL: http://vhldb.bio.unipd.it/. PMID:27511743

  14. [Analysis of the Cochrane Review: Interventions for Improving Upper Limb Function after Stroke. Cochrane Database Syst Rev. 2014,11:CD010820].

    Science.gov (United States)

    Sousa Nanji, Liliana; Torres Cardoso, André; Costa, João; Vaz-Carneiro, António

    2015-01-01

Impairment of the upper limbs is quite frequent after stroke, making rehabilitation an essential step towards clinical recovery and patient empowerment. This review aimed to synthesize existing evidence regarding interventions for upper limb function improvement after stroke and to assess which would bring some benefit. The Cochrane Database of Systematic Reviews, the Database of Reviews of Effects and PROSPERO databases were searched until June 2013 and 40 reviews were included, covering 503 studies, 18 078 participants and 18 interventions, as well as different doses and settings of interventions. The main results were: 1- Information currently available is insufficient to assess the effectiveness of each intervention and to enable comparison of interventions; 2- Transcranial direct current stimulation brings no benefit for outcomes of activities of daily living; 3- Moderate-quality evidence showed a beneficial effect of constraint-induced movement therapy, mental practice, mirror therapy, interventions for sensory impairment, virtual reality and repetitive task practice; 4- Unilateral arm training may be more effective than bilateral arm training; 5- Moderate-quality evidence showed a beneficial effect of robotics on measures of impairment and ADLs; 6- There is no evidence of benefit or harm for techniques such as repetitive transcranial magnetic stimulation, music therapy, pharmacological interventions, electrical stimulation and other therapies. Currently available evidence is insufficient and of low quality, not supporting clear clinical decisions. High-quality studies are still needed. PMID:26667856

  15. Study of fatigue failures to improve safety based on the analyses of failure database in Japanese nuclear power plant

    International Nuclear Information System (INIS)

After the Fukushima-Daiichi accident, nuclear power plants have been strongly requested to enhance safety against severe accidents. Preventing fatigue failure in operating nuclear power plants, especially at the pressure boundary of components, is important to ensure safety. This paper studies the failure database of nuclear power plants in Japan (NUCIA) and extracts the events caused by fatigue failure. Fatigue failures can be classified according to mechanism, such as mechanical vibration, fluid vibration, high-cycle thermal fatigue, low-cycle fatigue, and fretting fatigue. The study focuses on the classification of the safety-function significance of the affected systems, the trends, and the countermeasures taken for each kind of fatigue failure. The detailed analysis of fatigue failures in safety-related components leads to proposed modifications of the maintenance program and of the screening criteria for components to be evaluated to ensure safety at the system level. (author)

  16. Studying Oogenesis in a Non-model Organism Using Transcriptomics: Assembling, Annotating, and Analyzing Your Data.

    Science.gov (United States)

    Carter, Jean-Michel; Gibbs, Melanie; Breuker, Casper J

    2016-01-01

    This chapter provides a guide to processing and analyzing RNA-Seq data in a non-model organism. This approach was implemented for studying oogenesis in the Speckled Wood Butterfly Pararge aegeria. We focus in particular on how to perform a more informative primary annotation of your non-model organism by implementing our multi-BLAST annotation strategy. We also provide a general guide to other essential steps in the next-generation sequencing analysis workflow. Before undertaking these methods, we recommend you familiarize yourself with command line usage and fundamental concepts of database handling. Most of the operations in the primary annotation pipeline can be performed in Galaxy (or equivalent standalone versions of the tools) and through the use of common database operations (e.g. to remove duplicates) but other equivalent programs and/or custom scripts can be implemented for further automation. PMID:27557578

  17. An Efficient Annotation of Search Results Based on Feature

    Directory of Open Access Journals (Sweden)

    A. Jebha

    2015-10-01

Full Text Available With the increased number of web databases, a major part of the deep web consists of database content. In several search engines, the encoded data in result pages returned from the web often come from structured databases, referred to as web databases (WDB). A result page returned from a WDB contains multiple search result records (SRR). Data units obtained from these databases are encoded into dynamic result pages intended for manual browsing. In order to make these units machine-processable, the relevant information is extracted and data units are assigned meaningful labels. In this paper, feature ranking is proposed to extract the relevant information from the features of a WDB. Feature ranking is a practical way to enhance understanding of the data and to identify relevant features. This research explores the performance of the feature ranking process by using linear support vector machines with various features of the WDB for annotation of relevant results. Experimental results of the proposed system show improvement over earlier methods.

  18. An editing environment for DNA sequence analysis and annotation

    Energy Technology Data Exchange (ETDEWEB)

    Uberbacher, E.C.; Xu, Y.; Shah, M.B.; Olman, V.; Parang, M.; Mural, R.

    1998-12-31

This paper presents a computer system for analyzing and annotating large-scale genomic sequences. The core of the system is a multiple-gene structure identification program, which predicts the most probable gene structures based on the given evidence, including pattern recognition, EST and protein homology information. A graphics-based user interface provides an environment which allows the user to interactively control the evidence to be used in the gene identification process. To overcome the computational bottleneck in the database similarity search used in the gene identification process, the authors have developed an effective way to partition a database into a set of sub-databases of related sequences, and reduced the search problem on a large database to a signature identification problem and a search problem on a much smaller sub-database. This reduces the number of sequences to be searched from N to O(√N) on average, and hence greatly reduces the search time, where N is the number of sequences in the original database. The system provides the user with the ability to facilitate and modify the analysis and modeling in real time.
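The signature-based partition search described above can be sketched as a two-stage lookup. This toy version partitions sequences round-robin and uses shared k-mers as both the group signature and the similarity measure; a real system would cluster related sequences together and use proper alignment scores:

```python
# Sketch of the partitioned-search idea: split the reference database
# into ~sqrt(N) groups, represent each group by a signature (here, the
# union of its k-mers), match the query against signatures first, then
# run the full comparison only inside the best-matching group.
import math

def kmers(seq, k=3):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(database, n_groups=None):
    n_groups = n_groups or max(1, math.isqrt(len(database)))
    groups = [dict() for _ in range(n_groups)]
    for i, (name, seq) in enumerate(database.items()):
        groups[i % n_groups][name] = seq  # placeholder for real clustering
    signatures = [set().union(*(kmers(s) for s in g.values())) if g else set()
                  for g in groups]
    return groups, signatures

def search(query, groups, signatures):
    q = kmers(query)
    # Stage 1: O(sqrt(N)) signature comparisons pick a candidate group
    best = max(range(len(groups)), key=lambda i: len(q & signatures[i]))
    # Stage 2: full comparison against the ~sqrt(N) members of that group
    return max(groups[best], key=lambda name: len(q & kmers(groups[best][name])))

database = {"s1": "AAAACCCC", "s2": "GGGGTTTT",
            "s3": "ACACACAC", "s4": "GTGTGTGT"}
groups, signatures = build_index(database, n_groups=2)
print(search("GGGGTTTT", groups, signatures))
```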

  19. MIDST: Interoperability for Semantic Annotations

    Science.gov (United States)

    Atzeni, Paolo; Del Nostro, Pierluigi; Paolozzi, Stefano

In recent years, interoperability of ontologies and databases has received a lot of attention. However, most of the work has concentrated on specific problems (such as storing an ontology in a database or making database data available to ontologies) and referred to specific models for each of the two. Here, we propose an approach that aims at being more general and model independent. In fact, it works for different dialects for ontologies and for various data models for databases. Also, it supports translations in both directions (ontologies to databases and vice versa) and it allows for flexibility in the translations, so that customization is possible. The proposal extends recent work for schema and data translation (the MIDST project, which implements the ModelGen operator proposed in model management), which relies on a metamodel approach, where data models and variations thereof are described in a common framework and translations are built as compositions of elementary ones.

  20. Adding Value to Large Multimedia Collections through Annotation Technologies and Tools: Serving Communities of Interest.

    Science.gov (United States)

    Shabajee, Paul; Miller, Libby; Dingley, Andy

A group of research projects based at HP-Labs Bristol, the University of Bristol (England) and ARKive (a new large multimedia database project focused on the world's biodiversity, based in the United Kingdom) are working to develop a flexible model for the indexing of multimedia collections that allows users to annotate content utilizing extensible…

  1. PvTFDB: a Phaseolus vulgaris transcription factors database for expediting functional genomics in legumes

    Science.gov (United States)

    Bhawna; Bonthala, V.S.; Gajula, MNV Prasad

    2016-01-01

The common bean [Phaseolus vulgaris (L.)] is one of the essential proteinaceous vegetables grown in developing countries. However, its production is challenged by low yields caused by numerous biotic and abiotic stress conditions. Regulatory transcription factors (TFs) represent a key component of the genome and are the most significant targets for producing stress-tolerant crops; hence functional genomic studies of these TFs are important. Therefore, we have constructed a web-accessible TF database for P. vulgaris, called PvTFDB, which contains 2370 putative TF gene models in 49 TF families. This database provides comprehensive information for each of the identified TFs, including sequence data, functional annotation, SSRs with their primer sets, protein physical properties, chromosomal location, phylogeny, tissue-specific gene expression data, orthologues, cis-regulatory elements and gene ontology (GO) assignment. Altogether, this information will help expedite functional genomic studies of specific TFs of interest. The objectives of this database are to support functional genomic studies of common bean TFs and to help elucidate the regulatory mechanisms underlying various stress responses, easing breeding strategies for variety production. The database offers search interfaces by gene ID and functional annotation, and browsing interfaces by family and by chromosome. It will also serve as a promising central repository for researchers as well as breeders who are working towards crop improvement of legume crops. In addition, the database provides users with unrestricted public access, and the entire data present in the database can be downloaded freely. Database URL: http://www.multiomics.in/PvTFDB/ PMID:27465131

  2. PvTFDB: a Phaseolus vulgaris transcription factors database for expediting functional genomics in legumes.

    Science.gov (United States)

    Bhawna; Bonthala, V S; Gajula, Mnv Prasad

    2016-01-01

The common bean [Phaseolus vulgaris (L.)] is one of the essential proteinaceous vegetables grown in developing countries. However, its production is challenged by low yields caused by numerous biotic and abiotic stress conditions. Regulatory transcription factors (TFs) represent a key component of the genome and are the most significant targets for producing stress-tolerant crops; hence functional genomic studies of these TFs are important. Therefore, we have constructed a web-accessible TF database for P. vulgaris, called PvTFDB, which contains 2370 putative TF gene models in 49 TF families. This database provides comprehensive information for each of the identified TFs, including sequence data, functional annotation, SSRs with their primer sets, protein physical properties, chromosomal location, phylogeny, tissue-specific gene expression data, orthologues, cis-regulatory elements and gene ontology (GO) assignment. Altogether, this information will help expedite functional genomic studies of specific TFs of interest. The objectives of this database are to support functional genomic studies of common bean TFs and to help elucidate the regulatory mechanisms underlying various stress responses, easing breeding strategies for variety production. The database offers search interfaces by gene ID and functional annotation, and browsing interfaces by family and by chromosome. It will also serve as a promising central repository for researchers as well as breeders who are working towards crop improvement of legume crops. In addition, the database provides users with unrestricted public access, and the entire data present in the database can be downloaded freely. Database URL: http://www.multiomics.in/PvTFDB/. PMID:27465131

  3. Human cancer databases (review).

    Science.gov (United States)

    Pavlopoulou, Athanasia; Spandidos, Demetrios A; Michalopoulos, Ioannis

    2015-01-01

    Cancer is one of the four major non‑communicable diseases (NCD), responsible for ~14.6% of all human deaths. Currently, there are >100 different known types of cancer and >500 genes involved in cancer. Ongoing research efforts have been focused on cancer etiology and therapy. As a result, there is an exponential growth of cancer‑associated data from diverse resources, such as scientific publications, genome‑wide association studies, gene expression experiments, gene‑gene or protein‑protein interaction data, enzymatic assays, epigenomics, immunomics and cytogenetics, stored in relevant repositories. These data are complex and heterogeneous, ranging from unprocessed, unstructured data in the form of raw sequences and polymorphisms to well‑annotated, structured data. Consequently, the storage, mining, retrieval and analysis of these data in an efficient and meaningful manner pose a major challenge to biomedical investigators. In the current review, we present the central, publicly accessible databases that contain data pertinent to cancer, the resources available for delivering and analyzing information from these databases, as well as databases dedicated to specific types of cancer. Examples for this wealth of cancer‑related information and bioinformatic tools have also been provided. PMID:25369839

  4. DBMLoc: a Database of proteins with multiple subcellular localizations

    Directory of Open Access Journals (Sweden)

    Zhou Yun

    2008-02-01

Full Text Available Abstract Background Subcellular localization information is one of the key features for protein function research. Locating to a specific subcellular compartment is essential for a protein to function efficiently. Proteins with multiple localizations provide additional functional clues; such proteins may account for a high proportion, possibly more than 35%. Description We have developed a database of proteins with multiple subcellular localizations, designated DBMLoc. The initial release contains 10470 multiple subcellular localization-annotated entries. Annotations are collected from primary protein databases, specific subcellular localization databases and literature texts. All the protein entries are cross-referenced to GO annotations and SwissProt. Protein-protein interactions are also annotated. They are classified into 12 large subcellular localization categories based on the GO hierarchical architecture and original annotations. Download, search and sequence BLAST tools are also available on the website. Conclusion DBMLoc is a protein database which collects proteins with more than one subcellular localization annotation. It is freely accessible at http://www.bioinfo.tsinghua.edu.cn/DBMLoc/index.htm.

  5. Project Aloha:indexing, highlighting and annotation

    OpenAIRE

    Fallahkhair, Sanaz; Kennedy, Ian

    2010-01-01

Lifelong learning requires many skills that are often not taught or are poorly taught. Such skills include speed reading, critical analysis, creative thinking, active reading and even a "little" skill like annotation. There are many ways that readers annotate. A short classification of some ways that readers may annotate includes underlining, using coloured highlighters, interlinear notes, marginal notes, and disassociated notes. This paper presents an investigation into the use of a tool for ...

  6. Circulation Improvement of Articles in Journals written by Non-English Language - Development of a Special Journal Titles Translation List of Journals written in Japanese for the International Bibliographical Database

    OpenAIRE

    Gonda, Mayuki (JAEA); Ikeda, Kiyoshi; Kunii, Katsuhiko (JAEA); NAKAJIMA, Hidemitsu; Itabashi, Keizo (JAEA); Koike, Akemi (TOSS); Igarashi, Ayumi; GreyNet, Grey Literature Network Service

    2010-01-01

Non-English articles are still "Grey Literature" due to language barriers, even though material circulation has improved in the Internet era as it has for English articles. In the INIS Database, bibliographic information such as titles and abstracts is written in English. This feature of the INIS Database contributes to the improvement of international circulation of scientific information from the nuclear field. However, titles of journals written in non-English languages were desc...

  7. Semantic annotation of Web data applied to risk in food.

    Science.gov (United States)

    Hignette, Gaëlle; Buche, Patrice; Couvert, Olivier; Dibie-Barthélemy, Juliette; Doussot, David; Haemmerlé, Ollivier; Mettler, Eric; Soler, Lydie

    2008-11-30

    A preliminary step to risk in food assessment is the gathering of experimental data. In the framework of the Sym'Previus project (http://www.symprevius.org), a complete data integration system has been designed, grouping data provided by industrial partners and data extracted from papers published in the main scientific journals of the domain. Those data have been classified by means of a predefined vocabulary, called ontology. Our aim is to complement the database with data extracted from the Web. In the framework of the WebContent project (www.webcontent.fr), we have designed a semi-automatic acquisition tool, called @WEB, which retrieves scientific documents from the Web. During the @WEB process, data tables are extracted from the documents and then annotated with the ontology. We focus on the data tables as they contain, in general, a synthesis of data published in the documents. In this paper, we explain how the columns of the data tables are automatically annotated with data types of the ontology and how the relations represented by the table are recognised. We also give the results of our experimentation to assess the quality of such an annotation.
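The column-annotation step described above can be sketched minimally: tag a column with the ontology type whose term lexicon covers most of its cells, and tag purely numeric columns as numeric. The mini-ontology, type names and coverage threshold below are invented for the example and are not the actual Sym'Previus ontology:

```python
# Illustrative sketch of annotating a data-table column with an
# ontology data type: the column gets the type whose term lexicon
# covers the largest share of its cells; purely numeric columns are
# tagged with a numeric type. All names here are assumptions.
ONTOLOGY = {
    "Microorganism": {"listeria monocytogenes", "escherichia coli"},
    "FoodProduct": {"milk", "cheese", "minced beef"},
}

def is_number(cell):
    try:
        float(cell)
        return True
    except ValueError:
        return False

def annotate_column(cells, min_coverage=0.5):
    cells = [c.strip().lower() for c in cells if c.strip()]
    if not cells:
        return None
    if all(is_number(c) for c in cells):
        return "NumericValue"
    best_type, best_cov = None, 0.0
    for otype, lexicon in ONTOLOGY.items():
        cov = sum(c in lexicon for c in cells) / len(cells)
        if cov > best_cov:
            best_type, best_cov = otype, cov
    return best_type if best_cov >= min_coverage else None

print(annotate_column(["Milk", "Cheese", "minced beef"]))
print(annotate_column(["4.2", "37", "0.5"]))
```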

  8. Knowledge Annotation maknig implicit knowledge explicit

    CERN Document Server

    Dingli, Alexiei

    2011-01-01

Did you ever read something in a book, feel the need to comment, take up a pencil and scribble something on the book's text? If you did, you just annotated a book. But that process has now become something fundamental and revolutionary in these days of computing. Annotation is all about adding further information to text, pictures, movies and even to physical objects. In practice, anything which can be identified either virtually or physically can be annotated. In this book, we will delve into what makes annotations, and analyse their significance for the future evolutions of the web. We wil

  9. ANNOTATION SUPPORTED OCCLUDED OBJECT TRACKING

    Directory of Open Access Journals (Sweden)

    Devinder Kumar

    2012-08-01

Full Text Available Tracking occluded objects at different depths has become an extremely important component of study for any video sequence, with wide applications in object tracking, scene recognition, coding, video editing and mosaicking. The paper studies the ability of annotation to track the occluded object based on pyramids with variation in depth, further establishing a threshold at which the ability of the system to track the occluded object fails. Image annotation is applied to 3 similar video sequences varying in depth. In the experiment, one bike occludes the other at a depth of 60cm, 80cm and 100cm respectively. Another experiment is performed on tracking humans at similar depths to authenticate the results. The paper also computes the frame-by-frame error incurred by the system, supported by detailed simulations. This system can be effectively used to analyze the error in motion tracking and further correct the error, leading to flawless tracking. This can be of great interest to computer scientists designing surveillance systems.

  10. The introduction of the personnel dosimetry information system in Greece designed as a relational database and the improvements achieved

    International Nuclear Information System (INIS)

Dose record keeping is the making and keeping of personnel dose records for radiation workers. It is an essential part of the process of monitoring the exposure of individuals to radiation and shares the same objectives. Dose record keeping is becoming more and more critical because of the importance of statistical analysis and epidemiological studies in radiation protection, and of the increasing cooperation and exchange of personnel between countries. The GAEC's personnel dosimetry laboratory provides personnel dosimetry for the whole country and keeps the official central dose record. The personnel dosimetry information system had been established in electronic form in 1989, in the Cobol language. Since then, various arguments appeared that imposed a change of the database used. Some of them are: 1. There was no distinction between establishments and their laboratories. 2. Workers did not have a unique code number; consequently, the total dose of a person working in more than one place could not be estimated. Workers were directly related to their workplace, so if somebody changed his working place he was treated as a new entry, resulting in an overestimation of the number of monitored workers and introducing a source of errors in the collective and average dose calculations. 3. With the increasing applications of ionising radiation, many types of dosemeters became indispensable, e.g. for beta and gamma, for neutrons and for the extremities. Also, the new category of outside workers appeared, requiring special treatment. All these distinctions were not achievable with the previous system. 4. In recent years there has appeared an increasing interest in statistical analysis of personal doses. A program written in Cobol does not offer many possibilities and has no flexibility for such analysis. The new information system has been rebuilt under the design of a relational database with more possibilities and more flexibility. (authors)

  11. Mining a database of single amplified genomes from Red Sea brine pool extremophiles – Improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA

    Directory of Open Access Journals (Sweden)

    Stefan Wolfgang Grötzinger

    2014-04-01

Full Text Available Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophiles' genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the INDIGO data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile & Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO) terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2,577 E.C. numbers was translated into 171 GO terms and 49 consensus patterns. A subset of INDIGO sequences consisting of 58 SAGs from six different taxons of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
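The two-descriptor reliability rule is simple to sketch. The gene names, GO terms and PROSITE IDs below are illustrative placeholders, not data from INDIGO:

```python
# Minimal sketch of the PPM reliability rule: a candidate enzyme
# annotation is kept only if at least two relevant descriptors
# (GO-term profile hits and/or PROSITE pattern hits) support it.
def reliable_annotations(candidates, min_descriptors=2):
    """candidates: {gene: {"go_hits": [...], "pattern_hits": [...]}}"""
    reliable = {}
    for gene, hits in candidates.items():
        n = len(hits.get("go_hits", [])) + len(hits.get("pattern_hits", []))
        if n >= min_descriptors:
            reliable[gene] = hits
    return reliable

candidates = {
    "geneA": {"go_hits": ["GO:0016787"], "pattern_hits": ["PS00120"]},
    "geneB": {"go_hits": ["GO:0016787"], "pattern_hits": []},
}
print(sorted(reliable_annotations(candidates)))  # only geneA passes
```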

  12. Mining a database of single amplified genomes from Red Sea brine pool extremophiles-improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA).

    KAUST Repository

    Grötzinger, Stefan W

    2014-04-07

Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophiles' genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO). Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO) terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 Enzyme Commission (EC) numbers was translated into 171 GO terms and 49 consensus patterns. A subset of INDIGO sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available.
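The reliability rule described above (a hit is kept only when at least two relevant descriptors support it) can be sketched as a simple filter. The function and identifiers below are illustrative, not taken from the INDIGO scripts:

```python
def is_reliable(go_terms, prosite_patterns, min_descriptors=2):
    """Classify a candidate function assignment as reliable when at
    least `min_descriptors` distinct descriptors (GO terms and/or
    PROSITE consensus patterns) support it, mirroring the PPM rule."""
    descriptors = len(set(go_terms)) + len(set(prosite_patterns))
    return descriptors >= min_descriptors

# A hit supported by one GO term and one pattern is kept;
# a hit with a single descriptor is filtered out.
print(is_reliable(["GO:0008270"], ["PS00142"]))  # True
print(is_reliable(["GO:0008270"], []))           # False
```

Two GO terms with no pattern (or vice versa) would equally pass, since the rule counts descriptors of either kind.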

  13. Development and Evaluation of the Emotional Slovenian Speech Database – EmoLUKS

    Directory of Open Access Journals (Sweden)

    Tadej Justin

    2015-12-01

Full Text Available The paper describes the development of a Slovenian emotional speech database, primarily intended for speech synthesis, and explores additional annotation that extends its use to emotion recognition tasks. The paper focuses on a methodology for annotating paralinguistic speaker information, illustrated by the annotation of speaker emotions in Slovenian radio dramas. The emotional speech database EmoLUKS was built from the speech material of 17 Slovenian radio dramas obtained from the national radio and television broadcaster (RTV Slovenia), which made the audio material available to the university for processing and annotation under an academic license. The utterances of one male and one female speaker were transcribed, segmented and then annotated with emotional states. The annotation of the emotional states was conducted in two stages with our own web-based crowdsourcing application. Repeating the assessments with the same volunteers at different times allowed us to compare the annotators' decisions, so we also report on their consistency. Each utterance was labeled by the annotators' majority vote and added to the emotional speech database EmoLUKS. The material currently consists of 1385 recordings from one male (975 recordings) and one female (410 recordings) speaker and contains labeled emotional speech with a total duration of around 1 hour and 15 minutes. The paper presents the two-stage annotation process used to label the data and demonstrates the usefulness of the annotation methodology. We evaluate the consistency of the annotated speech material with a speaker-dependent automatic emotion recognition system. The results are reported as unweighted and weighted average recalls and precisions for 2-class and 7-class recognition experiments. The results additionally confirm our presumption that an emotional speech database, despite its
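Unweighted average recall, one of the metrics reported above, averages per-class recalls so that each emotion class counts equally regardless of how many utterances it has. A minimal sketch (function and label names are illustrative):

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: each class contributes equally,
    which matters for imbalanced emotion classes."""
    classes = set(y_true)
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Three "anger" utterances (two recognized) and one "joy" utterance
# (recognized): per-class recalls are 2/3 and 1, so UAR = 5/6.
uar = unweighted_average_recall(["anger", "anger", "anger", "joy"],
                                ["anger", "anger", "joy", "joy"])
print(round(uar, 4))  # 0.8333
```

A weighted average recall over the same data would instead weight each class by its utterance count, rewarding the majority class.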

  14. Optimizing high performance computing workflow for protein functional annotation.

    Science.gov (United States)

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. Based on the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.

  15. Functional Annotations of Paralogs: A Blessing and a Curse.

    Science.gov (United States)

    Zallot, Rémi; Harrison, Katherine J; Kolaczkowski, Bryan; de Crécy-Lagard, Valérie

    2016-01-01

    Gene duplication followed by mutation is a classic mechanism of neofunctionalization, producing gene families with functional diversity. In some cases, a single point mutation is sufficient to change the substrate specificity and/or the chemistry performed by an enzyme, making it difficult to accurately separate enzymes with identical functions from homologs with different functions. Because sequence similarity is often used as a basis for assigning functional annotations to genes, non-isofunctional gene families pose a great challenge for genome annotation pipelines. Here we describe how integrating evolutionary and functional information such as genome context, phylogeny, metabolic reconstruction and signature motifs may be required to correctly annotate multifunctional families. These integrative analyses can also lead to the discovery of novel gene functions, as hints from specific subgroups can guide the functional characterization of other members of the family. We demonstrate how careful manual curation processes using comparative genomics can disambiguate subgroups within large multifunctional families and discover their functions. We present the COG0720 protein family as a case study. We also discuss strategies to automate this process to improve the accuracy of genome functional annotation pipelines. PMID:27618105

  16. Mercator: a fast and simple web server for genome scale functional annotation of plant sequence data.

    Science.gov (United States)

    Lohse, Marc; Nagel, Axel; Herter, Thomas; May, Patrick; Schroda, Michael; Zrenner, Rita; Tohge, Takayuki; Fernie, Alisdair R; Stitt, Mark; Usadel, Björn

    2014-05-01

    Next-generation technologies generate an overwhelming amount of gene sequence data. Efficient annotation tools are required to make these data amenable to functional genomics analyses. The Mercator pipeline automatically assigns functional terms to protein or nucleotide sequences. It uses the MapMan 'BIN' ontology, which is tailored for functional annotation of plant 'omics' data. The classification procedure performs parallel sequence searches against reference databases, compiles the results and computes the most likely MapMan BINs for each query. In the current version, the pipeline relies on manually curated reference classifications originating from the three reference organisms (Arabidopsis, Chlamydomonas, rice), various other plant species that have a reviewed SwissProt annotation, and more than 2000 protein domain and family profiles at InterPro, CDD and KOG. Functional annotations predicted by Mercator achieve accuracies above 90% when benchmarked against manual annotation. In addition to mapping files for direct use in the visualization software MapMan, Mercator provides graphical overview charts, detailed annotation information in a convenient web browser interface and a MapMan-to-GO translation table to export results as GO terms. Mercator is available free of charge via http://mapman.gabipd.org/web/guest/app/Mercator.

  17. Database Manager

    Science.gov (United States)

    Martin, Andrew

    2010-01-01

    It is normal practice today for organizations to store large quantities of records of related information as computer-based files or databases. Purposeful information is retrieved by performing queries on the data sets. The purpose of DATABASE MANAGER is to communicate to students the method by which the computer performs these queries. This…

  18. Functional annotation of human cytomegalovirus gene products: an update

    Directory of Open Access Journals (Sweden)

    Ellen eVan Damme

    2014-05-01

Full Text Available Human cytomegalovirus (HCMV) is an opportunistic double-stranded DNA virus with one of the largest viral genomes known. The 235 kb genome is divided into a unique long (UL) and a unique short (US) region, which are flanked by terminal and internal repeats. The expression of HCMV genes is highly complex and involves the production of protein coding transcripts, polyadenylated long non-coding RNAs, polyadenylated antisense transcripts and a variety of non-polyadenylated RNAs such as microRNAs. Although the function of many of these transcripts is unknown, they are suggested to play a direct or regulatory role in the delicately orchestrated processes that ensure HCMV replication and life-long persistence. This review focuses on annotating the complete viral genome based on three sources of information. First, previous reviews were used as a template for the functional keywords to ensure continuity; second, the UniProt database was used to further enrich the functional database; and finally, the literature was manually curated for novel functions of HCMV gene products. Novel discoveries are discussed in light of the viral life cycle. This functional annotation highlights still poorly understood regions of the genome, but most importantly it can give insight into functional clusters and/or may be helpful in the analysis of transcriptomics and proteomics studies.

  19. Re: Pregabalin prescriptions in the United Kingdom – a drug utilisation study of The Health Improvement Network (THIN) primary care database by Asomaning et al

    DEFF Research Database (Denmark)

    Pottegård, Anton; Tjäderborn, M.; Schjerning, Ole;

    2016-01-01

    general practice database. Methods This observational drug utilisation study (DUS) analysed pregabalin prescription data from the UK Health Improvement Network primary care database between September 2004 and July 2009. Patient demographics, diagnoses (by READ codes) and pregabalin dosing data were......) prescribed average daily dose (ADD) of pregabalin for all patients was 150.0 (162.5) mg/day; this was highest in patients with epilepsy (191.9 mg/day), followed by neuropathic pain (158.0 mg/day) and GAD (150.0 mg/day). Only 1.0% (136/13,480) of patients were prescribed an ADD of pregabalin over the maximum...... approved dose of 600 mg/day. Of these, 18.4% (25/136) of patients had a history of substance abuse compared with 14.0% (1884/13,480) in the full population. Conclusion Data from this DUS indicated that the majority of pregabalin prescribing in the UK was consistent with product labelling. The proportion...

  20. DRAM BASED PARAMETER DATABASE OPTIMIZATION

    OpenAIRE

    Marcinkevicius, Tadas

    2012-01-01

    This thesis suggests an improved parameter database implementation for one of Ericsson products. The parameter database is used during the initialization of the system as well as during the later operation. The database size is constantly growing because the parameter database is intended to be used with different hardware configurations. When a new technology platform is released, multiple revisions with additional features and functionalities are later created, resulting in introduction of ...

  1. Hydrogen Leak Detection Sensor Database

    Science.gov (United States)

    Baker, Barton D.

    2010-01-01

    This slide presentation reviews the characteristics of the Hydrogen Sensor database. The database is the result of NASA's continuing interest in and improvement of its ability to detect and assess gas leaks in space applications. The database specifics and a snapshot of an entry in the database are reviewed. Attempts were made to determine the applicability of each of the 65 sensors for ground and/or vehicle use.

  2. Annotation of regular polysemy and underspecification

    DEFF Research Database (Denmark)

    Martínez Alonso, Héctor; Pedersen, Bolette Sandford; Bel, Núria

    2013-01-01

We present the result of an annotation task on regular polysemy for a series of semantic classes or dot types in English, Danish and Spanish. This article describes the annotation process, the results in terms of inter-encoder agreement, and the sense distributions obtained with two methods...

  3. Ground Truth Annotation in T Analyst

    DEFF Research Database (Denmark)

    2015-01-01

    This video shows how to annotate the ground truth tracks in the thermal videos. The ground truth tracks are produced to be able to compare them to tracks obtained from a Computer Vision tracking approach. The program used for annotation is T-Analyst, which is developed by Aliaksei Laureshyn, Ph...

  4. Creating Gaze Annotations in Head Mounted Displays

    DEFF Research Database (Denmark)

    Mardanbeigi, Diako; Qvarfordt, Pernilla

    2015-01-01

To facilitate distributed communication in mobile settings, we developed GazeNote for creating and sharing gaze annotations in head mounted displays (HMDs). With gaze annotations it is possible to point out objects of interest within an image and add a verbal description. To create an annotation, ...

  5. DIMA – Annotation guidelines for German intonation

    DEFF Research Database (Denmark)

    Kügler, Frank; Smolibocki, Bernadett; Arnold, Denis;

    2015-01-01

    easier since German intonation is currently annotated according to different models. To this end, we aim to provide guidelines that are easy to learn. The guidelines were evaluated running an inter-annotator reliability study on three different speech styles (read speech, monologue and dialogue...

  6. Using Rhetorical Annotations for Generating Video Documentaries

    NARCIS (Netherlands)

    Bocconi, S.; Nack, F.-M.; Hardman, L.

    2005-01-01

    We use rhetorical annotations to specify a generation process that can assemble meaningful video sequences with a communicative goal and an argumentative progression. Our annotation schema encodes the verbal information contained in the audio channel, identifying the claims the interviewees make and

  8. GAMOLA: a new local solution for sequence annotation and analyzing draft and finished prokaryotic genomes.

    Science.gov (United States)

    Altermann, Eric; Klaenhammer, Todd R

    2003-01-01

Laboratories working with draft phase genomes have specific software needs, such as the unattended processing of hundreds of single scaffolds and subsequent sequence annotation. In addition, it is critical to follow the "movement" and the manual annotation of single open reading frames (ORFs) within the successive sequence updates. Even with finished genomes, regular database updates can lead to significant changes in the annotation of single ORFs. In functional genomics it is important to mine data and identify new genetic targets rapidly and easily. Often there is no need for sophisticated relational databases (RDB) that greatly reduce the system-independent access of the results. Another aspect is the internet dependency of most software packages. If users are working with confidential data, this dependency poses a security issue. GAMOLA was designed to handle the numerous scaffolds and changing contents of draft phase genomes in an automated process and stores the results for each predicted ORF in flatfile databases. In addition, annotation transfers, ORF designation tracking, Blast comparisons, and primer design for whole genome microarrays have been implemented. The software is available under the license of North Carolina State University. A website and a downloadable example are accessible under (http://fsweb2.schaub.ncsu.edu/TRKwebsite/index.htm). PMID:14506845

  9. BG7: a new approach for bacterial genome annotation designed for next generation sequencing data.

    Directory of Open Access Journals (Sweden)

    Pablo Pareja-Tobes

Full Text Available BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even in the step of preliminary assembly of the genome. It is especially efficient at detecting unexpected genes horizontally acquired from distant bacterial or archaeal genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating the ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation frequently found in genomes that are not fully assembled. The current version of BG7 is developed in Java and takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally on any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge number of genomes being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak, in which BG7 was used to produce the first annotations the day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been proved for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Moreover, thanks to its flexibility, our system could be very easily adapted to work with new technologies in the future.

  10. SeqAnt: A web service to rapidly identify and annotate DNA sequence variations

    Directory of Open Access Journals (Sweden)

    Patel Viren

    2010-09-01

Full Text Available Abstract Background The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research. Results SeqAnt (Sequence Annotator) is an open source web service and software package that rapidly annotates DNA sequence variants and identifies recessive or compound heterozygous loci in human, mouse, fly, and worm genome sequencing experiments. Variants are characterized with respect to their functional type, frequency, and evolutionary conservation. Annotated variants can be viewed on a web browser, downloaded in a tab-delimited text file, or directly uploaded in a BED format to the UCSC genome browser. To demonstrate the speed of SeqAnt, we annotated a series of publicly available datasets that ranged in size from 37 to 3,439,107 variant sites. The total time to completely annotate these data ranged from 0.17 seconds to 28 minutes 49.8 seconds. Conclusion SeqAnt is an open source web service and software package that overcomes a critical bottleneck facing research and clinical geneticists using second-generation sequencing platforms. SeqAnt will prove especially useful for those investigators who lack dedicated bioinformatics personnel or infrastructure in their laboratories.
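The BED export mentioned above relies on a coordinate convention worth illustrating: BED positions are 0-based and half-open, while variant positions are usually reported 1-based. A minimal sketch of that conversion (the function is hypothetical, not part of SeqAnt):

```python
def variant_to_bed(chrom, pos_1based, name):
    """Format a single-base variant as a BED line. BED uses 0-based,
    half-open coordinates, so a 1-based position p spans (p-1, p)."""
    return f"{chrom}\t{pos_1based - 1}\t{pos_1based}\t{name}"

# A variant reported at 1-based position 100 occupies BED interval [99, 100).
print(variant_to_bed("chr7", 100, "var1"))
```

Getting this off-by-one conversion wrong is a classic source of errors when moving annotations between tools.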

  11. The Listeria monocytogenes strain 10403S BioCyc database.

    Science.gov (United States)

    Orsi, Renato H; Bergholz, Teresa M; Wiedmann, Martin; Boor, Kathryn J

    2015-01-01

Listeria monocytogenes is a food-borne pathogen of humans and other animals. The striking ability to survive several stresses usually used for food preservation makes L. monocytogenes one of the biggest concerns to the food industry, while the high mortality of listeriosis in specific groups of humans makes it a great concern for public health. Previous studies have shown that a regulatory network involving alternative sigma (σ) factors and transcription factors is pivotal to stress survival. However, few studies have evaluated the metabolic networks controlled by these regulatory mechanisms. The L. monocytogenes BioCyc database uses the strain 10403S as a model. Computer-generated initial annotation for all genes also allowed for identification, annotation and display of predicted reactions and pathways carried out by a single cell. Further ongoing manual curation based on published data as well as database mining for selected genes allowed the more refined annotation of functions, which, in turn, allowed for annotation of new pathways and fine-tuning of previously defined pathways to more L. monocytogenes-specific pathways. Using RNA-Seq data, several transcription start sites and promoter regions were mapped to the 10403S genome and annotated within the database. Additionally, the identification of promoter regions and a comprehensive review of available literature allowed the annotation of several regulatory interactions involving σ factors and transcription factors. The L. monocytogenes 10403S BioCyc database is a new resource for researchers studying Listeria and related organisms. It allows users to (i) have a comprehensive view of all reactions and pathways predicted to take place within the cell in the cellular overview, as well as to (ii) upload their own data, such as differential expression data, to visualize the data in the scope of predicted pathways and regulatory networks and to carry on enrichment analyses using several different annotations

  12. The STRING database in 2011

    DEFF Research Database (Denmark)

    Szklarczyk, Damian; Franceschini, Andrea; Kuhn, Michael;

    2011-01-01

    An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and...... present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score, and...

  13. The integrated web service and genome database for agricultural plants with biotechnology information

    OpenAIRE

    Kim, ChangKug; Park, DongSuk; Seol, YoungJoo; Hahn, JangHo

    2011-01-01

    The National Agricultural Biotechnology Information Center (NABIC) constructed an agricultural biology-based infrastructure and developed a Web based relational database for agricultural plants with biotechnology information. The NABIC has concentrated on functional genomics of major agricultural plants, building an integrated biotechnology database for agro-biotech information that focuses on genomics of major agricultural resources. This genome database provides annotated genome information...

  14. Manual Annotation of Translational Equivalence The Blinker Project

    CERN Document Server

    Melamed, I D

    1998-01-01

    Bilingual annotators were paid to link roughly sixteen thousand corresponding words between on-line versions of the Bible in modern French and modern English. These annotations are freely available to the research community from http://www.cis.upenn.edu/~melamed . The annotations can be used for several purposes. First, they can be used as a standard data set for developing and testing translation lexicons and statistical translation models. Second, researchers in lexical semantics will be able to mine the annotations for insights about cross-linguistic lexicalization patterns. Third, the annotations can be used in research into certain recently proposed methods for monolingual word-sense disambiguation. This paper describes the annotated texts, the specially-designed annotation tool, and the strategies employed to increase the consistency of the annotations. The annotation process was repeated five times by different annotators. Inter-annotator agreement rates indicate that the annotations are reasonably rel...
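Inter-annotator agreement of the kind reported above is commonly quantified with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch for two annotators (this is a standard formula, not the Blinker project's own code):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences:
    (po - pe) / (1 - pe), where po is observed agreement and
    pe is chance agreement from each annotator's label frequencies.
    Assumes the annotators do not agree purely by chance (pe < 1)."""
    assert len(a) == len(b) and a
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)

# Observed agreement 3/4, chance agreement 1/2 -> kappa = 0.5.
print(cohens_kappa(["yes", "yes", "no", "no"],
                   ["yes", "no", "no", "no"]))  # 0.5
```

Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance.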

  15. Ontology Based Document Annotation: Trends and Open Research Problems

    OpenAIRE

    Corcho, Oscar

    2006-01-01

    Metadata is used to describe documents and applications, improving information seeking and retrieval and its understanding and use. Metadata can be expressed in a wide variety of vocabularies and languages, and can be created and maintained with a variety of tools. Ontology based annotation refers to the process of creating metadata using ontologies as their vocabularies. We present similarities and differences with respect to other approaches for metadata creation, and describe languages and...

  16. Database Replication

    CERN Document Server

    Kemme, Bettina

    2010-01-01

Database replication is widely used for fault-tolerance, scalability and performance. The failure of one database replica does not stop the system from working, as available replicas can take over the tasks of the failed replica. Scalability can be achieved by distributing the load across all replicas, and adding new replicas should the load increase. Finally, database replication can provide fast local access even for geographically distributed clients, if data copies are located close to them. Despite its advantages, replication is not a straightforward technique to apply, and

  17. Probabilistic Databases

    CERN Document Server

    Suciu, Dan; Koch, Christop

    2011-01-01

    Probabilistic databases are databases where the value of some attributes or the presence of some records are uncertain and known only with some probability. Applications in many areas such as information extraction, RFID and scientific data management, data cleaning, data integration, and financial risk assessment produce large volumes of uncertain data, which are best modeled and processed by a probabilistic database. This book presents the state of the art in representation formalisms and query processing techniques for probabilistic data. It starts by discussing the basic principles for rep

  18. Blobology: exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots.

    Science.gov (United States)

    Kumar, Sujai; Jones, Martin; Koutsovoulos, Georgios; Clarke, Michael; Blaxter, Mark

    2013-01-01

    Generating the raw data for a de novo genome assembly project for a target eukaryotic species is relatively easy. This democratization of access to large-scale data has allowed many research teams to plan to assemble the genomes of non-model organisms. These new genome targets are very different from the traditional, inbred, laboratory-reared model organisms. They are often small, and cannot be isolated free of their environment - whether ingested food, the surrounding host organism of parasites, or commensal and symbiotic organisms attached to or within the individuals sampled. Preparation of pure DNA originating from a single species can be technically impossible, but assembly of mixed-organism DNA can be difficult, as most genome assemblers perform poorly when faced with multiple genomes in different stoichiometries. This class of problem is common in metagenomic datasets that deliberately try to capture all the genomes present in an environment, but replicon assembly is not often the goal of such programs. Here we present an approach to extracting, from mixed DNA sequence data, subsets that correspond to single species' genomes and thus improving genome assembly. We use both numerical (proportion of GC bases and read coverage) and biological (best-matching sequence in annotated databases) indicators to aid partitioning of draft assembly contigs, and the reads that contribute to those contigs, into distinct bins that can then be subjected to rigorous, optimized assembly, through the use of taxon-annotated GC-coverage plots (TAGC plots). We also present Blobsplorer, a tool that aids exploration and selection of subsets from TAGC-annotated data. Partitioning the data in this way can rescue poorly assembled genomes, and reveal unexpected symbionts and commensals in eukaryotic genome projects. The TAGC plot pipeline script is available from https://github.com/blaxterlab/blobology, and the Blobsplorer tool from https://github.com/mojones/Blobsplorer.
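The numerical half of the partitioning described above can be sketched as follows: compute each contig's GC proportion and, together with its read coverage, assign it to a bin. The bin names and ranges below are illustrative choices a user might make after inspecting a TAGC plot; they are not part of the blobology pipeline itself:

```python
def gc_fraction(seq):
    """Proportion of G and C bases in a contig sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def assign_bin(contig_seq, coverage, bins):
    """Assign a contig to the first bin whose GC and coverage ranges
    contain it. `bins` is a list of
    (name, (gc_lo, gc_hi), (cov_lo, cov_hi)) tuples."""
    gc = gc_fraction(contig_seq)
    for name, (gc_lo, gc_hi), (cov_lo, cov_hi) in bins:
        if gc_lo <= gc <= gc_hi and cov_lo <= coverage <= cov_hi:
            return name
    return "unassigned"

# Hypothetical bins chosen from a TAGC plot: a low-GC, high-coverage
# target genome and a higher-GC, low-coverage symbiont.
bins = [("target", (0.30, 0.45), (50, 200)),
        ("symbiont", (0.55, 0.70), (5, 30))]
print(assign_bin("ATATGC" * 10, 120, bins))  # target
```

Reads contributing to each binned contig would then be extracted and reassembled per bin, as the abstract describes.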

  19. Blobology: exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots

    Directory of Open Access Journals (Sweden)

    Sujai eKumar

    2013-11-01

Full Text Available Generating the raw data for a de novo genome assembly project for a target eukaryotic species is relatively easy. This democratisation of access to large-scale data has allowed many research teams to plan to assemble the genomes of non-model organisms. These new genome targets are very different from the traditional, inbred, laboratory-reared model organisms. They are often small, and cannot be isolated free of their environment - whether ingested food, the surrounding host organism of parasites, or commensal and symbiotic organisms attached to or within the individuals sampled. Preparation of pure DNA originating from a single species can be technically impossible, but assembly of mixed-organism DNA can be difficult, as most genome assemblers perform poorly when faced with multiple genomes in different stoichiometries. This class of problem is common in metagenomic datasets that deliberately try to capture all the genomes present in an environment, but replicon assembly is not often the goal of such programmes. Here we present an approach to extracting, from mixed DNA sequence data, subsets that correspond to single species' genomes and thus improving genome assembly. We use both numerical (proportion of GC bases and read coverage) and biological (best-matching sequence in annotated databases) indicators to aid partitioning of draft assembly contigs, and the reads that contribute to those contigs, into distinct bins that can then be subjected to rigorous, optimised assembly, through the use of taxon-annotated GC-coverage plots (TAGC plots). We also present Blobsplorer, a tool that aids exploration and selection of subsets from TAGC-annotated data. Partitioning the data in this way can rescue poorly assembled genomes, and reveal unexpected symbionts and commensals in eukaryotic genome projects.
The TAGC plot pipeline script is available from http://github.com/blaxterlab/blobology, and the Blobsplorer tool from https://github.com/mojones/Blobsplorer.

  20. A database of immunoglobulins with integrated tools: DIGIT.

    KAUST Repository

    Chailyan, Anna

    2011-11-10

    The DIGIT (Database of ImmunoGlobulins with Integrated Tools) database (http://biocomputing.it/digit) is an integrated resource storing sequences of annotated immunoglobulin variable domains and enriched with tools for searching and analyzing them. The annotations in the database include information on the type of antigen, the respective germline sequences and on pairing information between light and heavy chains. Other annotations, such as the identification of the complementarity determining regions, assignment of their structural class and identification of mutations with respect to the germline, are computed on the fly and can also be obtained for user-submitted sequences. The system allows customized BLAST searches and automatic building of 3D models of the domains to be performed.

  1. Conserved Domain Database (CDD)

    Data.gov (United States)

    U.S. Department of Health & Human Services — CDD is a protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins.

  2. Dealer Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The dealer reporting databases contain the primary data reported by federally permitted seafood dealers in the northeast. Electronic reporting was implemented May...

  3. RDD Databases

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This database was established to oversee documents issued in support of fishery research activities including experimental fishing permits (EFP), letters of...

  4. An Approach to Mine Textual Information From Pubmed Database

    Directory of Open Access Journals (Sweden)

    G Charles Babu

    2012-05-01

    Full Text Available The web has greatly improved access to scientific literature. A wide spectrum of research data has been created and collected by researchers. However, textual information on the web is largely disorganized, with research articles spread across archive sites, institution sites, journal sites and researcher homepages. Data are widely available over the Internet, and the volume and variety of these data pose challenges for storage and retrieval. Datasets can be made more accessible and user-friendly through annotation, aggregation and cross-linking to other datasets. Biomedical datasets are growing exponentially, and new curated information appears regularly in resources such as MEDLINE, PubMed and ScienceDirect. Therefore, a context-based text-mining tool was developed in Python to search a large database such as PubMed for a given keyword and retrieve records published between specified years.
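A keyword-plus-date-range query of the kind described above can be expressed against NCBI's public E-utilities `esearch` endpoint. The helper name and defaults below are illustrative; only the endpoint and its parameters (`db`, `term`, `mindate`, `maxdate`, `datetype`, `retmax`) come from the E-utilities interface.

```python
# Sketch: build an NCBI E-utilities esearch URL restricted to a
# publication-date range (no network call is made here).
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_query_url(keyword, start_year, end_year, retmax=20):
    """URL that searches PubMed for `keyword` between two years."""
    params = {
        "db": "pubmed",
        "term": keyword,
        "mindate": str(start_year),
        "maxdate": str(end_year),
        "datetype": "pdat",   # filter on publication date
        "retmax": str(retmax),
    }
    return EUTILS + "?" + urlencode(params)

url = pubmed_query_url("annotation", 2005, 2012)
```

Fetching the URL returns an XML list of PubMed IDs, which a crawler like the one described in the abstract would then resolve to article records.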

  5. National database

    DEFF Research Database (Denmark)

    Kristensen, Helen Grundtvig; Stjernø, Henrik

    1995-01-01

    Article about a national database for nursing research established at the Dansk Institut for Sundheds- og Sygeplejeforskning (Danish Institute for Health and Nursing Research). The aim of the database is to gather knowledge about research and development activities within nursing.

  6. Can inferred provenance and its visualisation be used to detect erroneous annotation? A case study using UniProtKB.

    Directory of Open Access Journals (Sweden)

    Michael J Bell

    Full Text Available A constant influx of new data poses a challenge in keeping the annotation in biological databases current. Most biological databases contain significant quantities of textual annotation, which often contains the richest source of knowledge. Many databases reuse existing knowledge; during the curation process annotations are often propagated between entries. However, this is often not made explicit. Therefore, it can be hard, or even impossible, for a reader to identify where an annotation originated. Within this work we attempt to identify annotation provenance and track its subsequent propagation. Specifically, we exploit annotation reuse within the UniProt Knowledgebase (UniProtKB, at the level of individual sentences. We describe a visualisation approach for the provenance and propagation of sentences in UniProtKB which enables a large-scale statistical analysis. Initially, levels of sentence reuse within UniProtKB were analysed, showing that reuse is highly prevalent, which enables the tracking of provenance and propagation. By analysing sentences throughout UniProtKB, a number of interesting propagation patterns were identified, covering over [Formula: see text] sentences. Over [Formula: see text] sentences remain in the database after they have been removed from the entries where they originally occurred. Analysing a subset of these sentences suggests that approximately [Formula: see text] are erroneous, whilst [Formula: see text] appear to be inconsistent. These results suggest that being able to visualise sentence propagation and provenance can aid in the determination of the accuracy and quality of textual annotation. Source code and supplementary data are available from the authors' website at http://homepages.cs.ncl.ac.uk/m.j.bell1/sentence_analysis/.
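The core bookkeeping behind sentence-level provenance can be sketched simply: record, for each annotation sentence, the release and entry in which it first appears, so later occurrences can be traced back to an origin. The data structures below are invented for illustration; the actual study operates over successive UniProtKB releases.

```python
# Toy sketch of sentence-level provenance tracking across database releases.

def track_provenance(releases):
    """releases: ordered list of (release_id, {entry_id: [sentences]}).

    Returns {sentence: (first_release, first_entry)} - the earliest
    occurrence of each annotation sentence, i.e. its inferred origin.
    """
    origin = {}
    for release_id, entries in releases:
        for entry_id, sentences in entries.items():
            for s in sentences:
                origin.setdefault(s, (release_id, entry_id))
    return origin

releases = [
    ("r1", {"P1": ["Binds ATP."]}),
    ("r2", {"P1": ["Binds ATP."], "P2": ["Binds ATP.", "Membrane protein."]}),
]
origin = track_provenance(releases)
```

With the origin table in hand, every later reuse of a sentence (here, "Binds ATP." appearing in entry P2) can be visualised as propagation from its inferred source.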

  7. A high-performance spatial database based approach for pathology imaging algorithm evaluation

    Directory of Open Access Journals (Sweden)

    Fusheng Wang

    2013-01-01

    Full Text Available Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) develop a set of queries to support data sampling and result comparisons; (4) achieve high-performance computation capacity via a parallel data management infrastructure, with parallel data loading and spatial indexing optimizations. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison, where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation, where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The...

  8. Making web annotations persistent over time

    Energy Technology Data Exchange (ETDEWEB)

    Sanderson, Robert [Los Alamos National Laboratory; Van De Sompel, Herbert [Los Alamos National Laboratory

    2010-01-01

    As Digital Libraries (DL) become more aligned with the web architecture, their functional components need to be fundamentally rethought in terms of URIs and HTTP. Annotation, a core scholarly activity enabled by many DL solutions, exhibits a clearly unacceptable characteristic when existing models are applied to the web: due to the representations of web resources changing over time, an annotation made about a web resource today may no longer be relevant to the representation that is served from that same resource tomorrow. We assume the existence of archived versions of resources, and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework that allows seamless navigation from the URI of a resource to archived versions of that resource, and arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource.
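The Memento datetime negotiation the authors build on works through an HTTP request header: a client asks a TimeGate for the version of a resource closest to a given instant by sending `Accept-Datetime` with an RFC 1123 date (per RFC 7089). The sketch below only constructs the header; the function name and example date are illustrative.

```python
# Sketch: headers an annotation client might send to a Memento TimeGate
# to request the archived version of a resource as it existed when the
# annotation was made.
from datetime import datetime, timezone
from email.utils import format_datetime

def memento_headers(annotated_at):
    """HTTP headers for datetime negotiation against a TimeGate."""
    # RFC 7089 requires an RFC 1123 date in GMT.
    return {"Accept-Datetime": format_datetime(annotated_at, usegmt=True)}

when = datetime(2010, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
headers = memento_headers(when)
```

Issuing a GET with these headers against a TimeGate URI redirects to the memento (archived version) nearest to the annotation's creation time, which is exactly what reconstructing an annotation requires.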

  9. Beginning Science Teachers' Use of a Digital Video Annotation Tool to Promote Reflective Practices

    Science.gov (United States)

    McFadden, Justin; Ellis, Joshua; Anwar, Tasneem; Roehrig, Gillian

    2014-06-01

    The development of teachers as reflective practitioners is a central concept in national guidelines for teacher preparation and induction (National Council for Accreditation of Teacher Education 2008). The Teacher Induction Network (TIN) supports the development of reflective practice for beginning secondary science teachers through the creation of online "communities of practice" (Barab et al. in Inf Soc, 237-256, 2003), which have been shown to have positive impacts on teacher collaboration, communication, and reflection. Specifically, TIN integrated the use of asynchronous, video annotation as an affordance to directly facilitate teachers' reflection on their classroom practices (Tripp and Rich in Teach Teach Educ 28(5):728-739, 2013). This study examines the use of video annotation as a tool for developing reflective practices for beginning secondary science teachers. Teachers were enrolled in an online teacher induction course designed to promote reflective practice and inquiry-based instruction. A modified version of the Learning to Notice Framework (Sherin and van Es in J Teach Educ 60(1):20-37, 2009) was used to classify teachers' annotations on video of their teaching. Findings from the study include the tendency of teachers to focus on themselves in their annotations, as well as a preponderance of annotations focused on lower-level reflective practices of description and explanation. Suggestions for utilizing video annotation tools are discussed, as well as design features, which could be improved to further the development of richer annotations and deeper reflective practices.

  10. Comparative validation of the D. melanogaster modENCODE transcriptome annotation.

    Science.gov (United States)

    Chen, Zhen-Xia; Sturgill, David; Qu, Jiaxin; Jiang, Huaiyang; Park, Soo; Boley, Nathan; Suzuki, Ana Maria; Fletcher, Anthony R; Plachetzki, David C; FitzGerald, Peter C; Artieri, Carlo G; Atallah, Joel; Barmina, Olga; Brown, James B; Blankenburg, Kerstin P; Clough, Emily; Dasgupta, Abhijit; Gubbala, Sai; Han, Yi; Jayaseelan, Joy C; Kalra, Divya; Kim, Yoo-Ah; Kovar, Christie L; Lee, Sandra L; Li, Mingmei; Malley, James D; Malone, John H; Mathew, Tittu; Mattiuzzo, Nicolas R; Munidasa, Mala; Muzny, Donna M; Ongeri, Fiona; Perales, Lora; Przytycka, Teresa M; Pu, Ling-Ling; Robinson, Garrett; Thornton, Rebecca L; Saada, Nehad; Scherer, Steven E; Smith, Harold E; Vinson, Charles; Warner, Crystal B; Worley, Kim C; Wu, Yuan-Qing; Zou, Xiaoyan; Cherbas, Peter; Kellis, Manolis; Eisen, Michael B; Piano, Fabio; Kionte, Karin; Fitch, David H; Sternberg, Paul W; Cutter, Asher D; Duff, Michael O; Hoskins, Roger A; Graveley, Brenton R; Gibbs, Richard A; Bickel, Peter J; Kopp, Artyom; Carninci, Piero; Celniker, Susan E; Oliver, Brian; Richards, Stephen

    2014-07-01

    Accurate gene model annotation of reference genomes is critical for making them useful. The modENCODE project has improved the D. melanogaster genome annotation by using deep and diverse high-throughput data. Since transcriptional activity that has been evolutionarily conserved is likely to have an advantageous function, we have performed large-scale interspecific comparisons to increase confidence in predicted annotations. To support comparative genomics, we filled in divergence gaps in the Drosophila phylogeny by generating draft genomes for eight new species. For comparative transcriptome analysis, we generated mRNA expression profiles on 81 samples from multiple tissues and developmental stages of 15 Drosophila species, and we performed cap analysis of gene expression in D. melanogaster and D. pseudoobscura. We also describe conservation of four distinct core promoter structures composed of combinations of elements at three positions. Overall, each type of genomic feature shows a characteristic divergence rate relative to neutral models, highlighting the value of multispecies alignment in annotating a target genome that should prove useful in the annotation of other high priority genomes, especially human and other mammalian genomes that are rich in noncoding sequences. We report that the vast majority of elements in the annotation are evolutionarily conserved, indicating that the annotation will be an important springboard for functional genetic testing by the Drosophila community. PMID:24985915

  11. An integrated database to enhance the identification of SNP markers for rice

    OpenAIRE

    Kim, ChangKug; Yoon, UngHan; Lee, GangSeob; Park, SungHan; Seol, Young-Joo; Lee, HwanKi; Hahn, JangHo

    2009-01-01

    The National Academy of Agricultural Science (NAAS) has developed a web-based marker database to provide information about SNP markers in rice. The database consists of three major functional categories: map viewing, marker searching and gene annotation. It provides information on 12,829 SNP markers, including their locations on the 12 rice chromosomes. The annotation of each SNP marker provides information such as marker name, EST number, gene definition and general marker information. Users...

  12. DASHR: database of small human noncoding RNAs.

    Science.gov (United States)

    Leung, Yuk Yee; Kuksa, Pavel P; Amlie-Wolf, Alexandre; Valladares, Otto; Ungar, Lyle H; Kannan, Sampath; Gregory, Brian D; Wang, Li-San

    2016-01-01

    Small non-coding RNAs (sncRNAs) are highly abundant RNAs. Here, we present the Database of small human noncoding RNAs (DASHR), which provides searchable, unified annotation and expression information for full sncRNA transcripts and the mature RNA products derived from these larger RNAs. DASHR contains the most comprehensive information to date on human sncRNA genes and mature sncRNA products. DASHR provides a simple user interface for researchers to view sequence and secondary structure, compare expression levels, and examine evidence of specific processing across all sncRNA genes and mature sncRNA products in various human tissues. DASHR annotation and expression data cover all major classes of sncRNAs, including microRNAs (miRNAs), Piwi-interacting RNAs (piRNAs), small nuclear, nucleolar and cytoplasmic RNAs (sn-, sno- and scRNAs, respectively), transfer RNAs (tRNAs), and ribosomal RNAs (rRNAs). Currently, DASHR (v1.0) integrates 187 small RNA high-throughput sequencing (smRNA-seq) datasets with over 2.5 billion reads and annotation data from multiple public sources. DASHR contains annotations for ∼48,000 human sncRNA genes and mature sncRNA products, 82% of which are expressed in one or more of the curated tissues. DASHR is available at http://lisanwanglab.org/DASHR.

  13. Annotating user-defined abstractions for optimization

    Energy Technology Data Exchange (ETDEWEB)

    Quinlan, D; Schordan, M; Vuduc, R; Yi, Q

    2005-12-05

    This paper discusses the features of an annotation language that we believe to be essential for optimizing user-defined abstractions. These features should capture semantics of function, data, and object-oriented abstractions, express abstraction equivalence (e.g., a class represents an array abstraction), and permit extension of traditional compiler optimizations to user-defined abstractions. Our future work will include developing a comprehensive annotation language for describing the semantics of general object-oriented abstractions, as well as automatically verifying and inferring the annotated semantics.

  14. Crowdsourcing and annotating NER for Twitter #drift

    DEFF Research Database (Denmark)

    Fromreide, Hege; Hovy, Dirk; Søgaard, Anders

    2014-01-01

    We present two new NER datasets for Twitter; a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets; (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible...

  15. The BioGRID interaction database: 2015 update.

    Science.gov (United States)

    Chatr-Aryamontri, Andrew; Breitkreutz, Bobby-Joe; Oughtred, Rose; Boucher, Lorrie; Heinicke, Sven; Chen, Daici; Stark, Chris; Breitkreutz, Ashton; Kolas, Nadine; O'Donnell, Lara; Reguly, Teresa; Nixon, Julie; Ramage, Lindsay; Winter, Andrew; Sellam, Adnane; Chang, Christie; Hirschman, Jodi; Theesfeld, Chandra; Rust, Jennifer; Livstone, Michael S; Dolinski, Kara; Tyers, Mike

    2015-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein interactions curated from the primary biomedical literature for all major model organism species and humans. As of September 2014, the BioGRID contains 749,912 interactions as drawn from 43,149 publications that represent 30 model organisms. This interaction count represents a 50% increase compared to our previous 2013 BioGRID update. BioGRID data are freely distributed through partner model organism databases and meta-databases and are directly downloadable in a variety of formats. In addition to general curation of the published literature for the major model species, BioGRID undertakes themed curation projects in areas of particular relevance for biomedical sciences, such as the ubiquitin-proteasome system and various human disease-associated interaction networks. BioGRID curation is coordinated through an Interaction Management System (IMS) that facilitates the compilation of interaction records through structured evidence codes, phenotype ontologies, and gene annotation. The BioGRID architecture has been improved in order to support a broader range of interaction and post-translational modification types, to allow the representation of more complex multi-gene/protein interactions, to account for cellular phenotypes through structured ontologies, to expedite curation through semi-automated text-mining approaches, and to enhance curation quality control. PMID:25428363

  16. Feedback Driven Annotation and Refactoring of Parallel Programs

    DEFF Research Database (Denmark)

    Larsen, Per

    This thesis combines programmer knowledge and feedback to improve modeling and optimization of software. The research is motivated by two observations. First, there is a great need for automatic analysis of software for embedded systems - to expose and model parallelism inherent in programs. Second, some program properties are beyond reach of such analysis for theoretical and practical reasons - but can be described by programmers. Three aspects are explored. The first is annotation of the source code. Two annotations are introduced. These allow more accurate modeling of parallelism... are not effective unless programmers are told how and when they are beneficial. A prototype compilation feedback system was developed in collaboration with IBM Haifa Research Labs. It reports issues that prevent further analysis to the programmer. Performance evaluation shows that three programs performed significantly...

  17. The Danish Urogynaecological Database

    DEFF Research Database (Denmark)

    Guldberg, Rikke; Brostrøm, Søren; Hansen, Jesper Kjær;

    2013-01-01

    INTRODUCTION AND HYPOTHESIS: The Danish Urogynaecological Database (DugaBase) is a nationwide clinical database established in 2006 to monitor, ensure and improve the quality of urogynaecological surgery. We aimed to describe its establishment and completeness and to validate selected variables. This is the first study based on data from the DugaBase. METHODS: The database completeness was calculated as a comparison between urogynaecological procedures reported to the Danish National Patient Registry and to the DugaBase. Validity was assessed for selected variables from a random sample of 200 women in the DugaBase from 1 January 2009 to 31 October 2010, using medical records as a reference. RESULTS: A total of 16,509 urogynaecological procedures were registered in the DugaBase by 31 December 2010. The database completeness has increased by calendar time, from 38.2 % in 2007 to 93.2 % in 2010 for public...

  18. Linking human diseases to animal models using ontology-based phenotype annotation.

    Directory of Open Access Journals (Sweden)

    Nicole L Washington

    2009-11-01

    Full Text Available Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. This annotation and search method provides a novel and efficient means to identify
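One simple way to compare ontology-based phenotype annotations, in the spirit of the similarity metrics the abstract mentions, is to expand each term to its set of ancestors in the ontology and take the Jaccard index of the expanded sets. The tiny anatomy ontology and annotations below are invented purely for illustration; the study itself uses four metrics and real cross-species ontologies.

```python
# Toy sketch: phenotype similarity as Jaccard index over
# ancestor-expanded ontology term sets.

# child -> parent edges of a tiny made-up anatomy ontology
PARENT = {"forelimb": "limb", "hindlimb": "limb", "limb": "appendage"}

def ancestors(term):
    """The term plus all of its ancestors up the is-a hierarchy."""
    out = {term}
    while term in PARENT:
        term = PARENT[term]
        out.add(term)
    return out

def phenotype_similarity(terms_a, terms_b):
    """Jaccard index over ancestor-expanded term sets."""
    a = set().union(*(ancestors(t) for t in terms_a))
    b = set().union(*(ancestors(t) for t in terms_b))
    return len(a & b) / len(a | b)

# "forelimb" and "hindlimb" are distinct terms but share the ancestors
# "limb" and "appendage", so they are partially similar.
sim = phenotype_similarity({"forelimb"}, {"hindlimb"})
```

This ancestor expansion is what lets annotations made at different levels of granularity (or in different species, via a homology ontology) still be scored as related.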

  19. Modeling Social Annotation: a Bayesian Approach

    CERN Document Server

    Plangprasopchok, Anon

    2008-01-01

    Collaborative tagging systems, such as del.icio.us, CiteULike, and others, allow users to annotate objects, e.g., Web pages or scientific papers, with descriptive labels called tags. The social annotations, contributed by thousands of users, can potentially be used to infer categorical knowledge, classify documents or recommend new relevant information. Traditional text inference methods do not make best use of socially-generated data, since they do not take into account variations in individual users' perspectives and vocabulary. In a previous work, we introduced a simple probabilistic model that takes interests of individual annotators into account in order to find hidden topics of annotated objects. Unfortunately, our proposed approach had a number of shortcomings, including overfitting, local maxima and the requirement to specify values for some parameters. In this paper we address these shortcomings in two ways. First, we extend the model to a fully Bayesian framework. Second, we describe an infinite ver...

  20. Annotation of Scientific Summaries for Information Retrieval

    CERN Document Server

    Ibekwe-Sanjuan, Fidelia; Eric, Sanjuan; Eric, Charton

    2011-01-01

    We present a methodology combining surface NLP and Machine Learning techniques for ranking abstracts and generating summaries based on annotated corpora. The corpora were annotated with meta-semantic tags indicating the category of information a sentence is bearing (objective, findings, newthing, hypothesis, conclusion, future work, related work). The annotated corpus is fed into an automatic summarizer for query-oriented abstract ranking and multi-abstract summarization. To adapt the summarizer to these two tasks, two novel weighting functions were devised in order to take into account the distribution of the tags in the corpus. Results, although still preliminary, encourage us to pursue this line of work and to find better ways of building IR systems that take semantic annotations in a corpus into account.

  1. SASL: A Semantic Annotation System for Literature

    Science.gov (United States)

    Yuan, Pingpeng; Wang, Guoyin; Zhang, Qin; Jin, Hai

    Due to ambiguity, search engines for scientific literature may not return the right results. One efficient solution is to automatically annotate the literature and attach semantic information to it. Generally, semantic annotation requires identifying entities before attaching semantic information to them. However, due to abbreviations and other factors, it is very difficult to identify entities correctly. This paper presents a Semantic Annotation System for Literature (SASL), which uses Wikipedia as a knowledge base to annotate literature. SASL mainly attaches semantic information to terminology, academic institutions, conferences, journals, etc. Many of these are usually abbreviated, which induces ambiguity. SASL therefore uses regular expressions to extract the mapping between the full names of entities and their abbreviations. Since the full names of several entities may map to a single abbreviation, SASL introduces a Hidden Markov Model for name disambiguation. Finally, the paper presents experimental results, which confirm that SASL achieves good performance.
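The abbreviation-mapping step described above can be sketched with a single regular expression that pairs a capitalised multi-word name with the parenthesised abbreviation that follows it. The pattern is a deliberate simplification for illustration; SASL's actual extraction rules and its HMM-based disambiguation are richer than this.

```python
# Sketch: map "Full Name (ABBR)" occurrences to an abbreviation table.
import re

# one to six capitalised words, followed by an all-caps abbreviation
# in parentheses
ABBR = re.compile(r"((?:[A-Z][\w-]*\s+){1,6})\(([A-Z]{2,})\)")

def extract_abbreviations(text):
    """Return {abbreviation: full name} pairs found in the text."""
    mapping = {}
    for m in ABBR.finditer(text):
        mapping[m.group(2)] = m.group(1).strip()
    return mapping

text = "We use the Hidden Markov Model (HMM) and Support Vector Machine (SVM)."
pairs = extract_abbreviations(text)
```

When several full names collapse onto the same abbreviation across a corpus, a per-abbreviation disambiguation model (an HMM, in SASL's case) is needed to pick the right expansion in context.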

  2. The Mouse Genome Database (MGD): from genes to mice--a community resource for mouse biology.

    Science.gov (United States)

    Eppig, Janan T; Bult, Carol J; Kadin, James A; Richardson, Joel E; Blake, Judith A; Anagnostopoulos, A; Baldarelli, R M; Baya, M; Beal, J S; Bello, S M; Boddy, W J; Bradt, D W; Burkart, D L; Butler, N E; Campbell, J; Cassell, M A; Corbani, L E; Cousins, S L; Dahmen, D J; Dene, H; Diehl, A D; Drabkin, H J; Frazer, K S; Frost, P; Glass, L H; Goldsmith, C W; Grant, P L; Lennon-Pierce, M; Lewis, J; Lu, I; Maltais, L J; McAndrews-Hill, M; McClellan, L; Miers, D B; Miller, L A; Ni, L; Ormsby, J E; Qi, D; Reddy, T B K; Reed, D J; Richards-Smith, B; Shaw, D R; Sinclair, R; Smith, C L; Szauter, P; Walker, M B; Walton, D O; Washburn, L L; Witham, I T; Zhu, Y

    2005-01-01

    The Mouse Genome Database (MGD) forms the core of the Mouse Genome Informatics (MGI) system (http://www.informatics.jax.org), a model organism database resource for the laboratory mouse. MGD provides essential integration of experimental knowledge for the mouse system with information annotated from both literature and online sources. MGD curates and presents consensus and experimental data representations of genotype (sequence) through phenotype information, including highly detailed reports about genes and gene products. Primary foci of integration are through representations of relationships among genes, sequences and phenotypes. MGD collaborates with other bioinformatics groups to curate a definitive set of information about the laboratory mouse and to build and implement the data and semantic standards that are essential for comparative genome analysis. Recent improvements in MGD discussed here include the enhancement of phenotype resources, the re-development of the International Mouse Strain Resource, IMSR, the update of mammalian orthology datasets and the electronic publication of classic books in mouse genetics.

  3. Biological Databases

    Directory of Open Access Journals (Sweden)

    Kaviena Baskaran

    2013-12-01

    Full Text Available Biology has entered a new era of database-driven information distribution, and such databases have become a primary channel for publishing information. This data publishing was initially carried out over Internet Gopher, which offered easy and affordable access to information resources through powerful research tools. The more important task now is the development of high-quality, professionally operated electronic data publishing sites. To enhance this service, appropriate editorial policies for electronic data publishing have been established, and the editors of articles shoulder the responsibility.

  4. The UCSC Genome Browser database: 2016 update

    OpenAIRE

    Speir, Matthew L; Zweig, Ann S.; Rosenbloom, Kate R.; Raney, Brian J.; Paten, Benedict; Nejad, Parisa; Rowe, Laurence D.; Learned, Katrina; Karolchik, Donna; Hinrichs, Angie S.; Heitner, Steve; Harte, Rachel A.; Haeussler, Maximilian; Guruvadoo, Luvina; Fujita, Pauline A.

    2015-01-01

    For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for vari...

  5. Learning Object Annotation for Agricultural Learning Repositories

    OpenAIRE

    Ebner, Hannes; Manouselis, Nikos; Palmér, Matthias; Enoksson, Fredrik; Palavitsinis, Nikos; Kastrantas, Kostas; Naeve, Ambjörn

    2009-01-01

    This paper introduces a Web-based tool that has been developed to facilitate learning object annotation in agricultural learning repositories with IEEE LOM-compliant metadata. More specifically, it presents how an application profile of the IEEE LOM standard has been developed for the description of learning objects on organic agriculture and agroecology. Then, it describes the design and prototype development of the Organic.Edunet repository tool: a Web-based tool for annotating learning objects ...

  6. Services for annotation of biomedical text

    OpenAIRE

    Hakenberg, Jörg

    2008-01-01

    Motivation: Text mining in the biomedical domain in recent years has focused on the development of tools for recognizing named entities and extracting relations. This research was driven by the need for such tools as basic components of more advanced solutions. Named entity recognition, entity mention normalization, and relationship extraction have now reached a stage where they perform comparably to human annotators (considering inter-annotator agreement, measured in many studies to be aro...

  7. Fluid Annotations in an Open World

    DEFF Research Database (Denmark)

    Zellweger, Polle Trescott; Bouvin, Niels Olof; Jehøj, Henning;

    2001-01-01

    Fluid Documents use animated typographical changes to provide a novel and appealing user experience for hypertext browsing and for viewing document annotations in context. This paper describes an effort to broaden the utility of Fluid Documents by using the open hypermedia Arakne Environment to layer fluid annotations and links on top of arbitrary HTML pages on the World Wide Web. Changes to both Fluid Documents and Arakne are required.

  8. A Flexible Object-of-Interest Annotation Framework for Online Video Portals

    Directory of Open Access Journals (Sweden)

    Robert Sorschag

    2012-02-01

    Full Text Available In this work, we address the use of object recognition techniques to annotate what is shown where in online video collections. These annotations are suitable for retrieving specific video scenes in response to object-related text queries, which is not possible with the manually generated metadata used by current portals. We are not the first to present object annotations generated with content-based analysis methods. However, the proposed framework possesses some outstanding features that offer good prospects for its application in real video portals. Firstly, it can easily be used as a background module in any video environment. Secondly, it is not based on a fixed analysis chain but on an extensive recognition infrastructure that can be used with all kinds of visual features, matching and machine learning techniques. New recognition approaches can be integrated into this infrastructure at low development cost, and the set of recognition approaches used can be reconfigured even on a running system. Thus, this framework might also benefit from future advances in computer vision. Thirdly, we present an automatic selection approach to support the use of different recognition strategies for different objects. Last but not least, visual analysis can be performed efficiently on distributed, multi-processor environments, and a database schema is presented to store the resulting video annotations as well as the off-line generated low-level features in a compact form. We achieve promising results in an annotation case study and the instance search task of the TRECVID 2011 challenge.

  9. The AnnoLite and AnnoLyze programs for comparative annotation of protein structures

    Directory of Open Access Journals (Sweden)

    Dopazo Joaquín

    2007-05-01

    Full Text Available Abstract Background Advances in structural biology, including structural genomics, have resulted in a rapid increase in the number of experimentally determined protein structures. However, about half of the structures deposited by the structural genomics consortia have little or no information about their biological function. Therefore, there is a need for tools for automatically and comprehensively annotating the function of protein structures. We aim to provide such tools by applying comparative protein structure annotation that relies on detectable relationships between protein structures to transfer functional annotations. Here we introduce two programs, AnnoLite and AnnoLyze, which use the structural alignments deposited in the DBAli database. Description AnnoLite predicts the SCOP, CATH, EC, InterPro, PfamA, and GO terms with an average sensitivity of ~90% and average precision of ~80%. AnnoLyze predicts ligand binding site and domain interaction patches with an average sensitivity of ~70% and average precision of ~30%, correctly localizing binding sites for small molecules in ~95% of its predictions. Conclusion The AnnoLite and AnnoLyze programs for comparative annotation of protein structures can reliably and automatically annotate new protein structures. The programs are fully accessible via the Internet as part of the DBAli suite of tools at http://salilab.org/DBAli/.

  10. Towards the VWO Annotation Service: a Success Story of the IMAGE RPI Expert Rating System

    Science.gov (United States)

    Reinisch, B. W.; Galkin, I. A.; Fung, S. F.; Benson, R. F.; Kozlov, A. V.; Khmyrov, G. M.; Garcia, L. N.

    2010-12-01

    … Especially useful are queries of the annotation database for successive plasmagrams containing echo traces. Several success stories of the RPI ERS using this capability will be discussed, particularly in terms of how they may be extended to develop the VWO Annotation Service.

  11. JGI Plant Genomics Gene Annotation Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with many repeats, genome duplications and tandem duplications. Genes encode a wealth of information useful in studying an organism, and it is critical to have high-quality, stable gene annotation. Thanks to advances in sequencing technology, the genomes and transcriptomes of many plant species have been sequenced. To turn these vast amounts of sequence data into gene annotations or re-annotations in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for details. Here we present the genome annotations of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for chlamy, which was annotated by a third party. The genome annotations of these and other species are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front-page snapshot are shown below.

  12. MicrobesFlux: a web platform for drafting metabolic models from the KEGG database

    Directory of Open Access Journals (Sweden)

    Feng Xueyang

    2012-08-01

    Full Text Available Abstract Background Concurrent with the efforts currently underway in mapping microbial genomes using high-throughput sequencing methods, systems biologists are building metabolic models to characterize and predict cell metabolisms. One of the key steps in building a metabolic model is using multiple databases to collect and assemble essential information about genome annotations and the architecture of the metabolic network for a specific organism. To speed up metabolic model development for a large number of microorganisms, we need a user-friendly platform to construct metabolic networks and to perform constraint-based flux balance analysis based on genome databases and experimental results. Results We have developed a semi-automatic, web-based platform (MicrobesFlux) for generating and reconstructing metabolic models for annotated microorganisms. MicrobesFlux is able to automatically download the metabolic network (including enzymatic reactions and metabolites) of ~1,200 species from the KEGG database (Kyoto Encyclopedia of Genes and Genomes) and then convert it to a metabolic model draft. The platform also provides diverse customized tools, such as gene knockouts and the introduction of heterologous pathways, for users to reconstruct the model network. The reconstructed metabolic network can be formulated as a constraint-based flux model to predict and analyze the carbon fluxes in microbial metabolisms. The simulation results can be exported in the SBML format (The Systems Biology Markup Language). Furthermore, we also demonstrated the platform functionalities by developing an FBA model (including 229 reactions) for a recently annotated bioethanol producer, Thermoanaerobacter sp. strain X514, to predict its biomass growth and ethanol production. Conclusion MicrobesFlux is an installation-free and open-source platform that enables biologists without prior programming knowledge to develop metabolic models for annotated microorganisms in the KEGG database.
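
    The constraint-based formulation that MicrobesFlux exports can be sketched in a few lines. The toy network below is an invented illustration, not taken from the paper: a linear chain (uptake -> A, A -> B, B -> biomass) whose steady-state condition S·v = 0 forces all three fluxes to be equal, so the optimal biomass flux is simply the tightest upper bound and no LP solver is needed.

```python
# Toy flux balance analysis: maximize biomass flux v3 subject to
# steady state S.v = 0 and flux bounds. For this linear chain the
# steady state forces v1 = v2 = v3, so the optimum is the smallest
# upper bound; a real model would hand S and the bounds to an LP solver.

S = [  # rows: metabolites A, B; columns: reactions v1, v2, v3
    [1, -1, 0],   # A: produced by v1 (uptake), consumed by v2
    [0, 1, -1],   # B: produced by v2, consumed by v3 (biomass)
]
bounds = [(0, 10), (0, 1000), (0, 1000)]  # (lb, ub) per reaction

def steady_state(S, v, tol=1e-9):
    """Check that every metabolite's net production is zero."""
    return all(abs(sum(s * x for s, x in zip(row, v))) < tol for row in S)

# The chain constraint v1 = v2 = v3 makes the optimum the tightest bound.
v_opt = [min(ub for _, ub in bounds)] * 3
assert steady_state(S, v_opt)
assert all(lb <= x <= ub for x, (lb, ub) in zip(v_opt, bounds))
print("max biomass flux:", v_opt[2])  # max biomass flux: 10
```

    Real genome-scale models have thousands of reactions, so the vertex is found by linear programming rather than by inspection; the steady-state and bound constraints are exactly the ones shown here.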

  13. BGD: a database of bat genomes.

    Directory of Open Access Journals (Sweden)

    Jianfei Fang

    Full Text Available Bats account for ~20% of mammalian species and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. Several whole-genome sequences of bats have now been assembled and annotated; however, a uniform resource for the annotated bat genomes has been unavailable. To make the extensive data associated with bat genomes accessible to the general biological community, we established the Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and it offers browsable tracks of the bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is provided to facilitate online phylogenetic study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and benefit bat sequence analysis. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.

  14. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community.

    Science.gov (United States)

    Arnaud, Martha B; Chibucos, Marcus C; Costanzo, Maria C; Crabtree, Jonathan; Inglis, Diane O; Lotia, Adil; Orvis, Joshua; Shah, Prachi; Skrzypek, Marek S; Binkley, Gail; Miyasato, Stuart R; Wortman, Jennifer R; Sherlock, Gavin

    2010-01-01

    The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at aspergillus-curator@genome.stanford.edu.

  15. An annotation based approach to support design communication

    CERN Document Server

    Hisarciklilar, Onur

    2007-01-01

    The aim of this paper is to propose an approach based on the concept of annotation for supporting design communication. We describe a co-operative design case study in which we analyse annotation practices, mainly focused on design minutes recorded during project reviews, and we point out specific requirements concerning annotation needs. Based on these requirements, we propose an annotation model, inspired by Speech Act Theory (SAT), to support communication in a 3D digital environment. We define two types of annotations in the engineering design context: locutionary and illocutionary annotations. The annotations described in this paper are materialised by a set of digital artefacts, which have a semantic dimension allowing designers to express and record elements of technical justification, traces of contradictory debates, etc. We first clarify the semantic annotation concept, then define general properties of annotations in the engineering design context, and the role of annotations in...

  16. STRUCTURED WIKI WITH ANNOTATION FOR KNOWLEDGE MANAGEMENT: AN APPLICATION TO CULTURAL HERITAGE

    Directory of Open Access Journals (Sweden)

    Eric Leclercq

    2011-01-01

    Full Text Available In this paper, we highlight how semantic wikis can be relevant solutions for building cooperative data-driven applications in domains characterized by rapidly evolving knowledge. We point out the semantic capabilities of annotated databases and structured wikis to provide better-quality content, to support complex queries, and to serve different types of users. We then compare database application development with wikis for domains that encompass evolving knowledge. We detail the architecture of WikiBridge, a semantic wiki that integrates template forms and allows complex annotations as well as consistency checking. We describe the archaeological CARE project and explain the conceptual modeling approach. A specific section is dedicated to ontology design, which provides the compulsory foundational knowledge for the application. We finally report related work on the use of semantic wikis for archaeological projects.

  17. KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases.

    Science.gov (United States)

    Xie, Chen; Mao, Xizeng; Huang, Jiaju; Ding, Yang; Wu, Jianmin; Dong, Shan; Kong, Lei; Gao, Ge; Li, Chuan-Yun; Wei, Liping

    2011-07-01

    High-throughput experimental technologies often identify dozens to hundreds of genes related to, or changed in, a biological or pathological process. From these genes one wants to identify biological pathways that may be involved and diseases that may be implicated. Here, we report a web server, KOBAS 2.0, which annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations. It allows for both ID mapping and cross-species sequence similarity mapping. It then performs statistical tests to identify statistically significantly enriched pathways and diseases. KOBAS 2.0 incorporates knowledge across 1327 species from 5 pathway databases (KEGG PATHWAY, PID, BioCyc, Reactome and Panther) and 5 human disease databases (OMIM, KEGG DISEASE, FunDO, GAD and NHGRI GWAS Catalog). KOBAS 2.0 can be accessed at http://kobas.cbi.pku.edu.cn.
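
    The statistical test behind enrichment servers of this kind is typically the hypergeometric test: given an input gene list, how surprising is the observed overlap with a pathway's gene set? A minimal stdlib sketch follows; the gene counts are made-up, and KOBAS 2.0 itself offers several test and correction choices not shown here.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) when drawing n genes from N total, of which K are in the pathway."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / total

# Hypothetical numbers: 20000 genes, 50 in a pathway, 100 input genes, 5 hits.
# The expected overlap is only 100 * 50 / 20000 = 0.25 genes.
p = hypergeom_enrichment_p(N=20000, K=50, n=100, k=5)
print(f"p = {p:.2e}")  # well below 0.05, so the pathway looks enriched
```

    In practice a server runs this test once per pathway and then applies a multiple-testing correction (e.g. Benjamini-Hochberg) across all pathways tested.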

  18. Protein function annotation by local binding site surface similarity.

    Science.gov (United States)

    Spitzer, Russell; Cleves, Ann E; Varela, Rocco; Jain, Ajay N

    2014-04-01

    Hundreds of protein crystal structures exist for proteins whose function cannot be confidently determined from sequence similarity. Surflex-PSIM, a previously reported surface-based protein similarity algorithm, provides an alternative method for hypothesizing function for such proteins. The method now supports fully automatic binding site detection and is fast enough to screen comprehensive databases of protein binding sites. The binding site detection methodology was validated on apo/holo cognate protein pairs, correctly identifying 91% of ligand binding sites in holo structures and 88% in apo structures where corresponding sites existed. For correctly detected apo binding sites, the cognate holo site was the most similar binding site 87% of the time. PSIM was used to screen a set of proteins that had poorly characterized functions at the time of crystallization, but were later biochemically annotated. Using a fully automated protocol, this set of 8 proteins was screened against ∼60,000 ligand binding sites from the PDB. PSIM correctly identified functional matches that predated query protein biochemical annotation for five out of the eight query proteins. A panel of 12 currently unannotated proteins was also screened, resulting in a large number of statistically significant binding site matches, some of which suggest likely functions for the poorly characterized proteins.

  19. TOPSAN: a collaborative annotation environment for structural genomics

    Directory of Open Access Journals (Sweden)

    Weekes Dana

    2010-08-01

    Full Text Available Abstract Background Many protein structures determined in high-throughput structural genomics centers, despite their significant novelty and importance, are available only as PDB depositions and are not accompanied by a peer-reviewed manuscript. Because of this they are not accessible by the standard tools of literature searches, remaining underutilized by the broad biological community. Results To address this issue we have developed TOPSAN, The Open Protein Structure Annotation Network, a web-based platform that combines the openness of the wiki model with the quality control of scientific communication. TOPSAN enables research collaborations and scientific dialogue among globally distributed participants, the results of which are reviewed by experts and eventually validated by peer review. The immediate goal of TOPSAN is to harness the combined experience, knowledge, and data from such collaborations in order to enhance the impact of the astonishing number and diversity of structures being determined by structural genomics centers and high-throughput structural biology. Conclusions TOPSAN combines features of automated annotation databases and formal, peer-reviewed scientific research literature, providing an ideal vehicle to bridge a gap between rapidly accumulating data from high-throughput technologies and a much slower pace for its analysis and integration with other, relevant research.

  20. Multilingual Twitter Sentiment Classification: The Role of Human Annotators.

    Directory of Open Access Journals (Sweden)

    Igor Mozetič

    Full Text Available What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive as ordered.

  1. Multilingual Twitter Sentiment Classification: The Role of Human Annotators.

    Science.gov (United States)

    Mozetič, Igor; Grčar, Miha; Smailović, Jasmina

    2016-01-01

    What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered. PMID:27149621
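
    The inter-annotator agreement the authors monitor can be illustrated with Cohen's kappa for two annotators: observed agreement corrected for the agreement expected by chance. The labels below are invented for illustration; the study itself uses self- and inter-annotator agreement measures over the ordered classes negative, neutral, positive.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    assert len(a) == len(b)
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca) / n ** 2    # agreement expected by chance
    return (po - pe) / (1 - pe)

# Hypothetical sentiment labels from two annotators for six tweets.
ann1 = ["pos", "neg", "neu", "pos", "pos", "neg"]
ann2 = ["pos", "neg", "pos", "pos", "neu", "neg"]
print(round(cohen_kappa(ann1, ann2), 3))  # 0.455
```

    A classifier whose agreement with held-out human labels approaches this kappa is, per the paper's argument, performing about as well as the training data permits.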

  2. Searching the protein structure database for ligand-binding site similarities using CPASS v.2

    Directory of Open Access Journals (Sweden)

    Caprez Adam

    2011-01-01

    Full Text Available Abstract Background A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ~30% are explicitly annotated as "hypothetical" or "uncharacterized" proteins. Our Comparison of Protein Active-Site Structures (CPASS v.2) database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function. Findings We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability, and it also enables job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one-versus-all, one-versus-list, or list-versus-list comparisons. Solvent accessible surface area, ligand root-mean-square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated. Conclusions CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~38%, dramatically increasing the likelihood of a positive search result. The modification to the CPASS similarity function is effective in reducing CPASS similarity scores ...

  3. Tools and databases of the KOMICS web portal for preprocessing, mining, and dissemination of metabolomics data.

    Science.gov (United States)

    Sakurai, Nozomu; Ara, Takeshi; Enomoto, Mitsuo; Motegi, Takeshi; Morishita, Yoshihiko; Kurabayashi, Atsushi; Iijima, Yoko; Ogata, Yoshiyuki; Nakajima, Daisuke; Suzuki, Hideyuki; Shibata, Daisuke

    2014-01-01

    A metabolome--the collection of comprehensive quantitative data on metabolites in an organism--has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data.

  4. Tools and Databases of the KOMICS Web Portal for Preprocessing, Mining, and Dissemination of Metabolomics Data

    Directory of Open Access Journals (Sweden)

    Nozomu Sakurai

    2014-01-01

    Full Text Available A metabolome—the collection of comprehensive quantitative data on metabolites in an organism—has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal, where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data.

  5. Tools and databases of the KOMICS web portal for preprocessing, mining, and dissemination of metabolomics data.

    Science.gov (United States)

    Sakurai, Nozomu; Ara, Takeshi; Enomoto, Mitsuo; Motegi, Takeshi; Morishita, Yoshihiko; Kurabayashi, Atsushi; Iijima, Yoko; Ogata, Yoshiyuki; Nakajima, Daisuke; Suzuki, Hideyuki; Shibata, Daisuke

    2014-01-01

    A metabolome--the collection of comprehensive quantitative data on metabolites in an organism--has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data. PMID:24949426

  6. MOCAT2: a metagenomic assembly, annotation and profiling framework

    Science.gov (United States)

    Kultima, Jens Roat; Coelho, Luis Pedro; Forslund, Kristoffer; Huerta-Cepas, Jaime; Li, Simone S.; Driessen, Marja; Voigt, Anita Yvonne; Zeller, Georg; Sunagawa, Shinichi; Bork, Peer

    2016-01-01

    Summary: MOCAT2 is a software pipeline for metagenomic sequence assembly and gene prediction with novel features for taxonomic and functional abundance profiling. The automated generation and efficient annotation of non-redundant reference catalogs by propagating pre-computed assignments from 18 databases covering various functional categories allows for fast and comprehensive functional characterization of metagenomes. Availability and Implementation: MOCAT2 is implemented in Perl 5 and Python 2.7, designed for 64-bit UNIX systems and offers support for high-performance computer usage via LSF, PBS or SGE queuing systems; source code is freely available under the GPL3 license at http://mocat.embl.de. Contact: bork@embl.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153620

  7. SABER: The Searchable Annotated Bibliography of Education Research in Astronomy

    Science.gov (United States)

    Bruning, David H.; Bailey, J. M.; Brissenden, G.

    2006-12-01

    Starting a new research project in astronomy education is hard because the literature is scattered throughout many journals. Relevant astronomy education research may be in psychology journals, science education journals, physics education journals, or even in science journals themselves. Tracking the vast realm of literature is difficult, especially since libraries do not carry many of these journals and related abstracting services. SABER is an online resource (http://astronomy.uwp.edu/saber/) that was started in 2001 specifically to reduce this “scatter” by compiling into one place an annotated bibliography of relevant education research articles. The database now includes more than 150 articles specifically addressing astronomy education research. Visit SABER and see what it can do for you.

  8. PANADA: protein association network annotation, determination and analysis.

    Directory of Open Access Journals (Sweden)

    Alberto J M Martin

    Full Text Available Increasingly large numbers of proteins require methods for functional annotation. This is typically based on pairwise inference from the homology of either protein sequence or structure. Recently, similarity networks have been presented to leverage both the ability to visualize relationships between proteins and to assess the transferability of functional inference. Here we present PANADA, a novel toolkit for the visualization and analysis of protein similarity networks in Cytoscape. Networks can be constructed based on pairwise sequence or structural alignments, either on a set of proteins or, alternatively, by database search from a single sequence. The PANADA web server, an executable for download, examples and extensive help files are available at: http://protein.bio.unipd.it/panada/.
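
    The annotation-transfer idea behind similarity networks of this kind can be sketched in a few stdlib lines: threshold pairwise alignment scores into a graph, then propagate labels within connected components. The proteins, scores and threshold below are invented for illustration, and real tools such as PANADA assess transferability rather than copying labels wholesale.

```python
from collections import defaultdict

def similarity_network(scores, threshold):
    """Build an undirected similarity graph, keeping edges at or above threshold."""
    graph = defaultdict(set)
    for (a, b), s in scores.items():
        if s >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def transfer_annotations(graph, known):
    """Propagate each known functional label to every protein reachable from it."""
    labels = dict(known)
    for seed, label in known.items():
        stack, seen = [seed], {seed}
        while stack:
            node = stack.pop()
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    labels.setdefault(nb, label)
                    stack.append(nb)
    return labels

# Hypothetical pairwise alignment scores on a 0-1 scale.
scores = {("P1", "P2"): 0.92, ("P2", "P3"): 0.85,
          ("P3", "P4"): 0.30, ("P4", "P5"): 0.88}
net = similarity_network(scores, threshold=0.8)
annotated = transfer_annotations(net, {"P1": "kinase"})
print(annotated)  # P2 and P3 inherit "kinase"; P4 and P5 stay unannotated
```

    The weak P3-P4 edge falls below the threshold, so the putative kinase label does not leak into the second component; choosing that threshold well is exactly the transferability question these tools address.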

  9. MalaCards: an integrated compendium for diseases and their annotation

    Science.gov (United States)

    Rappaport, Noa; Nativ, Noam; Stelzer, Gil; Twik, Michal; Guan-Golan, Yaron; Iny Stein, Tsippi; Bahir, Iris; Belinky, Frida; Morrey, C. Paul; Safran, Marilyn; Lancet, Doron

    2013-01-01

    Comprehensive disease classification, integration and annotation are crucial for biomedical discovery. At present, disease compilation is incomplete, heterogeneous and often lacking systematic inquiry mechanisms. We introduce MalaCards, an integrated database of human maladies and their annotations, modeled on the architecture and strategy of the GeneCards database of human genes. MalaCards mines and merges 44 data sources to generate a computerized card for each of 16 919 human diseases. Each MalaCard contains disease-specific prioritized annotations, as well as inter-disease connections, empowered by the GeneCards relational database, its searches and GeneDecks set analyses. First, we generate a disease list from 15 ranked sources, using disease-name unification heuristics. Next, we use four schemes to populate MalaCards sections: (i) directly interrogating disease resources, to establish integrated disease names, synonyms, summaries, drugs/therapeutics, clinical features, genetic tests and anatomical context; (ii) searching GeneCards for related publications, and for associated genes with corresponding relevance scores; (iii) analyzing disease-associated gene sets in GeneDecks to yield affiliated pathways, phenotypes, compounds and GO terms, sorted by a composite relevance score and presented with GeneCards links; and (iv) searching within MalaCards itself, e.g. for additional related diseases and anatomical context. The latter forms the basis for the construction of a disease network, based on shared MalaCards annotations, embodying associations based on etiology, clinical features and clinical conditions. This broadly disposed network has a power-law degree distribution, suggesting that this might be an inherent property of such networks. Work in progress includes hierarchical malady classification, ontological mapping and disease set analyses, striving to make MalaCards an even more effective tool for biomedical research. Database URL: http

  10. ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes

    OpenAIRE

    Hu, Gang-Qing; Zheng, Xiaobin; Yang, Yi-Fan; Ortet, Philippe; She, Zhen-Su; Zhu, Huaiqiu

    2007-01-01

Correct annotation of the translation initiation site (TIS) is essential both for experiments and for bioinformatics studies of the prokaryotic translation initiation mechanism, as well as for understanding gene regulation and gene structure. Here we describe a comprehensive database, ProTISA, which collects TISs confirmed through a variety of available evidence for prokaryotic genomes, including Swiss-Prot experiment records, literature, conserved domain hits and sequence alignment between orthologous gene...

  11. The CUTLASS database facilities

    International Nuclear Information System (INIS)

The enhancement of the CUTLASS database management system to provide improved facilities for data handling is seen as a prerequisite to its effective use for future power station data processing and control applications. This particularly applies to the larger projects such as AGR data processing system refurbishments, and the data processing systems required for the new Coal Fired Reference Design stations. In anticipation of the need for improved data handling facilities in CUTLASS, the CEGB established a User Sub-Group in the early 1980s to define the database facilities required by users. Following the endorsement of the resulting specification and a detailed design study, the database facilities have been implemented as an integral part of the CUTLASS system. This paper provides an introduction to the range of CUTLASS Database facilities, and emphasises the role of Database as the central facility around which future Kit 1 and (particularly) Kit 6 CUTLASS based data processing and control systems will be designed and implemented. (author)

  12. Computer database of ambulatory EEG signals.

    Science.gov (United States)

    Jayakar, P B; Brusse, E; Patrick, J P; Shwedyk, E; Seshia, S S

    1987-01-01

    The paper describes an ambulatory EEG database. The database contains segments of AEEGs done on 45 subjects. Each epoch (1/8th second or more) of AEEG data has been annotated into 1 of 40 classes. The classes represent background activity, paroxysmal patterns and artifacts. The majority of classes have over 200 discrete epochs. The structure is flexible enough to allow additional epochs to be readily added. The database is stored on transportable media such as digital magnetic tape or hard disk and is thus available to other researchers in the field. The database can be used to design, evaluate and compare EEG signal processing algorithms and pattern recognition systems. It can also serve as an educational medium in EEG laboratories.

  13. Annotated chemical patent corpus: a gold standard for text mining.

    Directory of Open Access Journals (Sweden)

    Saber A Akhondi

Full Text Available Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line breaks due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org.
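Inter-annotator agreement scores like those derived for the harmonized subset are commonly computed with Cohen's kappa, which corrects raw agreement for chance. A minimal sketch, assuming per-token entity labels from two hypothetical annotators (the corpus's own agreement metric may differ):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement: probability both pick the same label independently.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical token labels from two annotator groups.
kappa = cohens_kappa(["CHEM", "CHEM", "DIS", "O"],
                     ["CHEM", "O", "DIS", "O"])
```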

  14. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    Directory of Open Access Journals (Sweden)

    Shu-Chuan Chen

Full Text Available The MixtureTree Annotator, written in JAVA, allows the user to automatically color any phylogenetic tree in Newick format generated from any phylogeny reconstruction program and output the Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over any other programs which perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular methods that lack good built-in visualization tools, for example MEGA, Mesquite, PHY-FI, TreeView, TreeGraph and Geneious, may give results containing human errors, because colors must be added to each node manually, or suffer from other limitations, for example being able to color only by a numeric attribute such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process.
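Coloring a Newick tree by sequence name, as described above, boils down to extracting leaf labels and mapping name prefixes to colors. A minimal sketch in Python (the actual tool is written in Java and emits Nexus output); the palette and naming convention here are invented:

```python
import re

# Hypothetical palette keyed by sequence-name prefix; the real tool
# derives colors from the sequence names themselves.
PALETTE = {"seqA": "#ff0000", "seqB": "#0000ff"}

def leaf_names(newick):
    """Pull leaf labels out of a Newick string (unquoted labels only)."""
    # A leaf label follows '(' or ',' and runs until ':', ',', ')' or ';'.
    return re.findall(r"[(,]([^(),:;]+)", newick)

def color_map(newick, palette, default="#000000"):
    """Assign each leaf a color by the prefix of its name."""
    colors = {}
    for name in leaf_names(newick):
        prefix = name.split("_")[0]
        colors[name] = palette.get(prefix, default)
    return colors

tree = "((seqA_1:0.1,seqA_2:0.2):0.05,seqB_1:0.3);"
cmap = color_map(tree, PALETTE)
```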

  15. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    Science.gov (United States)

    Chen, Shu-Chuan; Ogata, Aaron

    2015-01-01

The MixtureTree Annotator, written in JAVA, allows the user to automatically color any phylogenetic tree in Newick format generated from any phylogeny reconstruction program and output the Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over any other programs which perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular methods that lack good built-in visualization tools, for example MEGA, Mesquite, PHY-FI, TreeView, TreeGraph and Geneious, may give results containing human errors, because colors must be added to each node manually, or suffer from other limitations, for example being able to color only by a numeric attribute such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process.

  16. MitBASE: a comprehensive and integrated mitochondrial DNA database

    OpenAIRE

    Antimonelli, M.; Altamura, N.; Benne, R.; Boyen, C; Brennicke, A; Carone, A; Cooper, J. M.; D'Elia, D.; Montalvo, de, A.; Pinto, de, B.; Robertis, De, M.; Golik, P.; Grienenberger, J M; Knoop, V.; Lanave, C.

    1999-01-01

MitBASE is an integrated and comprehensive database of mitochondrial DNA data which collects all available information from different organisms and from intraspecific variants and mutants. Research institutions from different countries are involved, each in charge of developing, collecting and annotating data for the organisms they are specialised in. The design of the actual structure of the database and its implementation in a user-friendly format are the care of the European Bioinformatics I...

  17. TrSDB: a proteome database of transcription factors

    OpenAIRE

    Hermoso, Antoni; Aguilar, Daniel; Aviles, Francesc X.; Querol, Enrique

    2004-01-01

    TrSDB—TranScout Database—(http://ibb.uab.es/trsdb) is a proteome database of eukaryotic transcription factors based upon predicted motifs by TranScout and data sources such as InterPro and Gene Ontology Annotation. Nine eukaryotic proteomes are included in the current version. Extensive and diverse information for each database entry, different analyses considering TranScout classification and similarity relationships are offered for research on transcription factors or gene expression.

  18. Generation of Comprehensive Thoracic Oncology Database - Tool for Translational Research

    OpenAIRE

    Surati, Mosmi; Robinson, Matthew; Nandi, Suvobroto; Faoro, Leonardo; Demchuk, Carley; Kanteti, Rajani; Ferguson, Benjamin; Gangadhar, Tara; Hensing, Thomas; Hasina, Rifat; Husain, Aliya; Ferguson, Mark; Karrison, Theodore; Salgia, Ravi

    2011-01-01

    The Thoracic Oncology Program Database Project was created to serve as a comprehensive, verified, and accessible repository for well-annotated cancer specimens and clinical data to be available to researchers within the Thoracic Oncology Research Program. This database also captures a large volume of genomic and proteomic data obtained from various tumor tissue studies. A team of clinical and basic science researchers, a biostatistician, and a bioinformatics expert was convened to design the ...

  19. Lynx: a database and knowledge extraction engine for integrative medicine

    OpenAIRE

    Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T Conrad; Maltsev, Natalia

    2013-01-01

    We have developed Lynx (http://lynx.ci.uchicago.edu)—a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provid...

  20. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

    OpenAIRE

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2014-01-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integra...

  1. A Neotropical Miocene Pollen Database Employing Image-Based Search and Semantic Modeling

    Directory of Open Access Journals (Sweden)

    Jing Ginger Han

    2014-08-01

Full Text Available Premise of the study: Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. Methods: Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. Results: Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. Discussion: Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery.
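Content-based retrieval of the kind described reduces, at its core, to nearest-neighbor search over feature vectors. A toy sketch, assuming hypothetical 3-component feature vectors rather than the system's real visual descriptors or index structures:

```python
import math

def euclidean(u, v):
    """Plain Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def retrieve(query, index, k=2):
    """Return the k images whose feature vectors lie closest to the query."""
    ranked = sorted(index, key=lambda name: euclidean(query, index[name]))
    return ranked[:k]

# Hypothetical feature vectors (e.g. coarse shape/texture/color scores).
index = {
    "pollen_01": (0.9, 0.1, 0.3),
    "pollen_02": (0.2, 0.8, 0.7),
    "pollen_03": (0.85, 0.15, 0.35),
}
hits = retrieve((0.9, 0.1, 0.3), index, k=2)
```

Production systems replace the linear scan with the kind of database-indexing structures the abstract mentions, but the ranking criterion is the same.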

  2. Recent International Documents and Journal Articles from the ERIC Database.

    Science.gov (United States)

    International Journal of Early Years Education, 1998

    1998-01-01

    Annotates recent international documents and journal articles from the ERIC database. Document topics include racial equality, and balancing early childhood education and work. Journal article topics include foster care in Iraqi Kurdistan; child care in Sweden; teacher-child interaction in Australian centers; teacher education in Brazil, Iceland,…

  3. Mathematical Notation in Bibliographic Databases.

    Science.gov (United States)

    Pasterczyk, Catherine E.

    1990-01-01

    Discusses ways in which using mathematical symbols to search online bibliographic databases in scientific and technical areas can improve search results. The representations used for Greek letters, relations, binary operators, arrows, and miscellaneous special symbols in the MathSci, Inspec, Compendex, and Chemical Abstracts databases are…

  4. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Hongliang Liu

Full Text Available BACKGROUND: Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology. RESULTS: Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51%) unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17%) unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, and 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. CONCLUSION: The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.
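Potential simple sequence repeats like the 2,781 reported above are typically found by scanning for short units repeated in tandem, which a backreferencing regular expression can express compactly. A minimal sketch (the unit-length and repeat-count thresholds are illustrative, not those used in the study):

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Scan for simple sequence repeats: a short unit repeated in tandem.

    Returns (start_position, repeat_unit, whole_repeat_run) tuples.
    """
    # Lazy unit so the shortest repeating unit is preferred; the
    # backreference \1 requires the same unit to recur immediately.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [(m.start(), m.group(1), m.group(0)) for m in pattern.finditer(seq)]

# Toy sequence: an (AT)6 dinucleotide repeat embedded in flanking bases.
hits = find_ssrs("GGC" + "AT" * 6 + "CCGA")
```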

  5. The impact of blood transfusion on perioperative outcomes following gastric cancer resection: an analysis of the American College of Surgeons National Surgical Quality Improvement Program database

    Science.gov (United States)

    Elmi, Maryam; Mahar, Alyson; Kagedan, Daniel; Law, Calvin H.L.; Karanicolas, Paul J.; Lin, Yulia; Callum, Jeannie; Coburn, Natalie G.; Hallet, Julie

    2016-01-01

Background Red blood cell transfusions (RBCT) carry risk of transfusion-related immunomodulation that may impact postoperative recovery. This study examined the association between perioperative RBCT and short-term postoperative outcomes following gastrectomy for gastric cancer. Methods Using the American College of Surgeons National Surgical Quality Improvement Program database, we compared outcomes of patients (transfused v. nontransfused) undergoing elective gastrectomy for gastric cancer (2007–2012). Outcomes were 30-day major morbidity, mortality and length of stay. The association between perioperative RBCT and outcomes was estimated using modified Poisson, logistic, or negative binomial regression. Results Of the 3243 patients in the entire cohort, we included 2884 patients with nonmissing data, of whom 535 (18.6%) received RBCT. Overall 30-day major morbidity and mortality were 20% and 3.5%, respectively. After adjustment for baseline and clinical characteristics, RBCT was independently associated with increased 30-day mortality (relative risk [RR] 3.1, 95% confidence interval [CI] 1.9–5.0), major morbidity (RR 1.4, 95% CI 1.2–1.8), length of stay (RR 1.2, 95% CI 1.1–1.2), infections (RR 1.4, 95% CI 1.1–1.6), cardiac complications (RR 1.8, 95% CI 1.0–3.2) and respiratory failure (RR 2.3, 95% CI 1.6–3.3). Conclusion Red blood cell transfusions are associated with worse postoperative short-term outcomes in patients with gastric cancer. Blood management strategies are needed to reduce the use of RBCT after gastrectomy for gastric cancer. PMID:27668330
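A relative risk with a 95% confidence interval, as reported above, can be computed from a 2x2 table using the usual log-normal approximation. The sketch below uses invented counts, not the study's data; the paper itself used modified Poisson regression to adjust for covariates, which this crude unadjusted estimate does not do:

```python
import math

def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Point estimate and 95% CI for a relative risk (log-normal approximation)."""
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    # Standard error of log(RR) from the 2x2 cell counts.
    se = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, lo, hi

# Hypothetical counts: 40 deaths among 535 transfused, 60 among 2349 not.
rr, lo, hi = relative_risk(40, 535, 60, 2349)
```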

  6. Optimizing Spatial Databases

    Directory of Open Access Journals (Sweden)

    Anda VELICANU

    2010-01-01

Full Text Available This paper describes the best way to improve the optimization of spatial databases: through spatial indexes. The most common and widely used spatial indexes, the R-tree and the Quadtree, are presented, analyzed and compared in this paper. A few examples are also given of queries that run in Oracle Spatial and are supported by an R-tree spatial index. Spatial databases offer special features that can be very helpful when needing to represent such data. But in terms of storage and time costs, spatial data can require a lot of resources. This is why optimizing the database is one of the most important aspects when working with large volumes of data.
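The quadtree mentioned above partitions space recursively into four quadrants, so a range query can discard whole subtrees whose cells miss the query rectangle. A minimal point-quadtree sketch (real database implementations such as Oracle Spatial's are far more elaborate):

```python
class Quadtree:
    """Minimal point quadtree: split a square cell into four when it overflows."""

    def __init__(self, x, y, size, capacity=2):
        self.x, self.y, self.size = x, y, size  # lower-left corner and side length
        self.capacity = capacity
        self.points = []
        self.children = None

    def insert(self, px, py):
        if self.children is not None:
            self._child_for(px, py).insert(px, py)
        elif len(self.points) < self.capacity:
            self.points.append((px, py))
        else:
            # Overflow: split and push all points down into the quadrants.
            self._split()
            for qx, qy in self.points + [(px, py)]:
                self._child_for(qx, qy).insert(qx, qy)
            self.points = []

    def _split(self):
        h = self.size / 2
        self.children = [
            Quadtree(self.x, self.y, h, self.capacity),
            Quadtree(self.x + h, self.y, h, self.capacity),
            Quadtree(self.x, self.y + h, h, self.capacity),
            Quadtree(self.x + h, self.y + h, h, self.capacity),
        ]

    def _child_for(self, px, py):
        h = self.size / 2
        i = (1 if px >= self.x + h else 0) + (2 if py >= self.y + h else 0)
        return self.children[i]

    def query(self, x0, y0, x1, y1):
        """All stored points inside the rectangle [x0, x1] x [y0, y1]."""
        # Prune: skip any cell that cannot intersect the query rectangle.
        if x1 < self.x or x0 > self.x + self.size or y1 < self.y or y0 > self.y + self.size:
            return []
        found = [(px, py) for px, py in self.points if x0 <= px <= x1 and y0 <= py <= y1]
        for child in self.children or []:
            found += child.query(x0, y0, x1, y1)
        return found

tree = Quadtree(0, 0, 100)
for p in [(10, 10), (80, 80), (12, 14), (90, 5)]:
    tree.insert(*p)
near_origin = tree.query(0, 0, 20, 20)
```

The pruning test in `query` is what turns a linear scan into the logarithmic-ish lookups that make spatial indexes worthwhile.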

  7. Database Vs Data Warehouse

    Directory of Open Access Journals (Sweden)

    2007-01-01

Full Text Available Data warehouse technology includes a set of concepts and methods that offer the users useful information for decision making. The necessity to build a data warehouse arises from the need to improve the quality of information in the organization. The data coming from different sources, having a variety of forms - both structured and unstructured - are filtered according to business rules and are integrated in a single large data collection. Using informatics solutions, managers have understood that the data stored in operational systems - including databases - are an informational gold mine that must be exploited. Data warehouses have been developed to answer the increasing demands for complex analysis, which could not be properly achieved with operational databases. The present paper emphasizes some of the criteria that information application developers can use in order to choose between a database solution or a data warehouse one.

  8. dcGOR: an R package for analysing ontologies and protein domain annotations.

    Directory of Open Access Journals (Sweden)

    Hai Fang

    2014-10-01

Full Text Available I introduce an open-source R package 'dcGOR' to provide the bioinformatics community with the ease to analyse ontologies and protein domain annotations, particularly those in the dcGO database. The dcGO is a comprehensive resource for protein domain annotations using a panel of ontologies including Gene Ontology. Although increasing in popularity, this database needs statistical and graphical support to meet its full potential. Moreover, there are no bioinformatics tools specifically designed for domain ontology analysis. As an add-on package built in the R software environment, dcGOR offers a basic infrastructure with great flexibility and functionality. It implements new data structure to represent domains, ontologies, annotations, and all analytical outputs as well. For each ontology, it provides various mining facilities, including: (i) domain-based enrichment analysis and visualisation; (ii) construction of a domain (semantic) similarity network according to ontology annotations; and (iii) significance analysis for estimating a contact (statistical) significance network. To reduce runtime, most analyses support high-performance parallel computing. Taking as inputs a list of protein domains of interest, the package is able to easily carry out in-depth analyses in terms of functional, phenotypic and diseased relevance, and network-level understanding. More importantly, dcGOR is designed to allow users to import and analyse their own ontologies and annotations on domains (taken from SCOP, Pfam and InterPro) and RNAs (from Rfam) as well. The package is freely available at CRAN for easy installation, and also at GitHub for version control. The dedicated website with reproducible demos can be found at http://supfam.org/dcGOR.
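The domain-based enrichment analysis that dcGOR provides is, at bottom, a hypergeometric test: given N domains of which K carry an annotation term, how surprising is it to see k annotated domains in a sample of n? A minimal sketch of that test in Python (dcGOR itself is an R package and applies multiple-testing corrections not shown here):

```python
from math import comb

def enrichment_pvalue(N, K, n, k):
    """One-sided hypergeometric p-value: probability of drawing at least k
    annotated domains when n of N domains are sampled and K carry the term."""
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    ) / comb(N, n)

# Toy numbers: 3 of 5 sampled domains carry a term held by 10 of 100 domains.
p = enrichment_pvalue(N=100, K=10, n=5, k=3)
```

A small `p` here suggests the term is over-represented in the sample relative to the background.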

  9. Critical Assessment of Function Annotation Meeting, 2011

    Energy Technology Data Exchange (ETDEWEB)

    Friedberg, Iddo

    2015-01-21

    The Critical Assessment of Function Annotation meeting was held July 14-15, 2011 at the Austria Conference Center in Vienna, Austria. There were 73 registered delegates at the meeting. We thank the DOE for this award. It helped us organize and support a scientific meeting AFP 2011 as a special interest group (SIG) meeting associated with the ISMB 2011 conference. The conference was held in Vienna, Austria, in July 2011. The AFP SIG was held on July 15-16, 2011 (immediately preceding the conference). The meeting consisted of two components, the first being a series of talks (invited and contributed) and discussion sections dedicated to protein function research, with an emphasis on the theory and practice of computational methods utilized in functional annotation. The second component provided a large-scale assessment of computational methods through participation in the Critical Assessment of Functional Annotation (CAFA).

  10. Annotation of selection strengths in viral genomes

    DEFF Research Database (Denmark)

    McCauley, Stephen; de Groot, Saskia; Mailund, Thomas;

    2007-01-01

…and intergenomic regions. The presence of multiple coding regions complicates the concept of Ka/Ks ratio, and thus begs for an alternative approach when investigating selection strengths. Building on the paper by McCauley & Hein (2006), we develop a method for annotating a viral genome coding in overlapping… may thus achieve an annotation both of coding regions as well as selection strengths, allowing us to investigate different selection patterns and hypotheses. Results: We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, as well as four Hepatitis B sequences. We… obtain an annotation of the coding regions, as well as a posterior probability for each site of the strength of selection acting on it. From this we may deduce the average posterior selection acting on the different genes. Whilst we are encouraged to see in HIV2 that the known-to-be-conserved genes gag…
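The Ka/Ks ratio mentioned above contrasts non-synonymous with synonymous substitutions. A crude sketch of the underlying classification step, tallying single-codon differences between two aligned coding sequences via the standard genetic code (real Ka/Ks estimators also normalize by the number of synonymous and non-synonymous sites, which this omits):

```python
# Standard genetic code built from the canonical TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def count_substitutions(seq_a, seq_b):
    """Crude tally of synonymous vs. non-synonymous codon substitutions
    between two aligned, in-frame coding sequences."""
    syn = nonsyn = 0
    for pos in range(0, len(seq_a) - 2, 3):
        ca, cb = seq_a[pos:pos + 3], seq_b[pos:pos + 3]
        if ca == cb:
            continue
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            syn += 1  # same amino acid: silent change
        else:
            nonsyn += 1  # amino acid changed
    return syn, nonsyn

# GGT->GGC is silent (both Gly); ATG->AGG changes Met to Arg.
syn, nonsyn = count_substitutions("GGTATG", "GGCAGG")
```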

  11. Graph Annotations in Modeling Complex Network Topologies

    CERN Document Server

    Dimitropoulos, Xenofontas; Vahdat, Amin; Riley, George

    2007-01-01

    The coarsest approximation of the structure of a complex network, such as the Internet, is a simple undirected unweighted graph. This approximation, however, loses too much detail. In reality, objects represented by vertices and edges in such a graph possess some non-trivial internal structure that varies across and differentiates among distinct types of links or nodes. In this work, we abstract such additional information as network annotations. We introduce a network topology modeling framework that treats annotations as an extended correlation profile of a network. Assuming we have this profile measured for a given network, we present an algorithm to rescale it in order to construct networks of varying size that still reproduce the original measured annotation profile. Using this methodology, we accurately capture the network properties essential for realistic simulations of network applications and protocols, or any other simulations involving complex network topologies, including modeling and simulation ...

  12. I2Cnet medical image annotation service.

    Science.gov (United States)

    Chronaki, C E; Zabulis, X; Orphanoudakis, S C

    1997-01-01

    I2Cnet (Image Indexing by Content network) aims to provide services related to the content-based management of images in healthcare over the World-Wide Web. Each I2Cnet server maintains an autonomous repository of medical images and related information. The annotation service of I2Cnet allows specialists to interact with the contents of the repository, adding comments or illustrations to medical images of interest. I2Cnet annotations may be communicated to other users via e-mail or posted to I2Cnet for inclusion in its local repositories. This paper discusses the annotation service of I2Cnet and argues that such services pave the way towards the evolution of active digital medical image libraries.

  13. Corpus annotation for mining biomedical events from literature

    Directory of Open Access Journals (Sweden)

    Tsujii Jun'ichi

    2008-01-01

Full Text Available Abstract Background Advanced Text Mining (TM), such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering, has increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation. Results We have completed a new type of semantic annotation, event annotation, which is an addition to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech), syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design a scheme of annotation which meets specific requirements of text annotation, (2) to achieve biology-oriented annotation which reflects biologists' interpretation of text, and (3) to ensure the homogeneity of annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to successful completion of a large scale annotation. Conclusion The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain.

  14. The De Novo Transcriptome and Its Functional Annotation in the Seed Beetle Callosobruchus maculatus.

    Science.gov (United States)

    Sayadi, Ahmed; Immonen, Elina; Bayram, Helen; Arnqvist, Göran

    2016-01-01

Despite their unparalleled biodiversity, the genomic resources available for beetles (Coleoptera) remain relatively scarce. We present an integrative and high quality annotated transcriptome of the beetle Callosobruchus maculatus, an important and cosmopolitan agricultural pest as well as an emerging model species in ecology and evolutionary biology. Using Illumina sequencing technology, we sequenced 492 million read pairs generated from 51 samples of different developmental stages (larvae, pupae and adults) of C. maculatus. Reads were de novo assembled using the Trinity software into a single combined assembly as well as into three separate assemblies based on data from the different developmental stages. The combined assembly generated 218,192 transcripts and 145,883 putative genes. Putative genes were annotated with the Blast2GO software and the Trinotate pipeline. In total, 33,216 putative genes were successfully annotated using Blastx against the Nr (non-redundant) database and 13,382 were assigned to 34,100 Gene Ontology (GO) terms. We classified 5,475 putative genes into Clusters of Orthologous Groups (COG) and 116 metabolic pathways maps were predicted based on the annotation. Our analyses suggested that the transcriptional specificity increases with ontogeny. For example, out of 33,216 annotated putative genes, 51 were only expressed in larvae, 63 only in pupae and 171 only in adults. Our study illustrates the importance of including samples from several developmental stages when the aim is to provide an integrative and high quality annotated transcriptome. Our results will represent an invaluable resource for those working with the ecology, evolution and pest control of C. maculatus, as well as for comparative studies of the transcriptomics and genomics of beetles more generally. PMID:27442123

  15. Versatile annotation and publication quality visualization of protein complexes using POLYVIEW-3D

    Directory of Open Access Journals (Sweden)

    Meller Jaroslaw

    2007-08-01

Full Text Available Abstract Background Macromolecular visualization as well as automated structural and functional annotation tools play an increasingly important role in the post-genomic era, contributing significantly towards the understanding of molecular systems and processes. For example, three dimensional (3D) models help in exploring protein active sites and functional hot spots that can be targeted in drug design. Automated annotation and visualization pipelines can also reveal other functionally important attributes of macromolecules. These goals are dependent on the availability of advanced tools that integrate better the existing databases, annotation servers and other resources with state-of-the-art rendering programs. Results We present a new tool for protein structure analysis, with the focus on annotation and visualization of protein complexes, which is an extension of our previously developed POLYVIEW web server. By integrating the web technology with state-of-the-art software for macromolecular visualization, such as the PyMol program, POLYVIEW-3D enables combining versatile structural and functional annotations with a simple web-based interface for creating publication quality structure rendering, as well as animated images for PowerPoint™, web sites and other electronic resources. The service is platform independent and no plug-ins are required. Several examples of how POLYVIEW-3D can be used for structural and functional analysis in the context of protein-protein interactions are presented to illustrate the available annotation options. Conclusion The POLYVIEW-3D server features PyMol image rendering, which provides detailed and high quality presentation of macromolecular structures, with an easy to use web-based interface. POLYVIEW-3D also provides a wide array of options for automated structural and functional analysis of proteins and their complexes. Thus, the POLYVIEW-3D server may become an important resource for researchers and educators in

  16. Subsidized optimal ART for HIV-positive temporary residents of Australia improves virological outcomes: results from the Australian HIV Observational Database Temporary Residents Access Study

    Directory of Open Access Journals (Sweden)

    Kathy Petoumenos

    2015-02-01

    Full Text Available Introduction: HIV-positive (HIV+) temporary residents living in Australia legally are unable to access government-subsidized antiretroviral treatment (ART), which is provided via Medicare to Australian citizens and permanent residents. Currently, no information is being systematically collected on non-Medicare-eligible HIV+ patients in Australia. The objectives of this study are to describe the population recruited to the Australian HIV Observational Database (AHOD) Temporary Residents Access Study (ATRAS) and to determine the short- and long-term outcomes of receiving subsidized optimal ART and the impact on onward HIV transmission. Methods: ATRAS was established in 2011. Eligible patients were recruited via the AHOD network. Key HIV-related characteristics were recorded at baseline and prospectively. Additional visa-related information was also recorded at baseline and updated annually. Descriptive statistics were used to describe the ATRAS cohort in terms of visa status by key demographic characteristics, including sex, region of birth and HIV disease status. CD4 cell count (mean and SD) and the proportion with undetectable (<50 copies/ml) HIV viral load are reported at baseline and at 6 and 12 months of follow-up. We also estimate the proportional reduction in onward HIV transmission based on the reduction in the proportion of people with detectable HIV viral load. Results: A total of 180 patients were recruited to ATRAS by June 2012, and by July 2013, 39 patients no longer required ART via ATRAS, 35 of whom became eligible for Medicare-funded medication. At enrolment, 63% of ATRAS patients were receiving ART from alternative sources, 47% had an undetectable HIV viral load (<50 copies/ml) and the median CD4 cell count was 343 cells/µl (IQR: 222–479). At 12 months of follow-up, 85% had an undetectable viral load. We estimated a 75% reduction in the risk of onward HIV transmission with the improved rate of undetectable viral load. Conclusions: The

  17. Accessing the SEED Genome Databases via Web Services API: Tools for Programmers

    Directory of Open Access Journals (Sweden)

    Vonstein Veronika

    2010-06-01

    Full Text Available Abstract Background The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides a Web services-based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. Results The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform-independent service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrates that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. Conclusions We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including new genomes as soon as they come online.

  18. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  19. Software for computing and annotating genomic ranges.

    Directory of Open Access Journals (Sweden)

    Michael Lawrence

    Full Text Available We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.
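    The overlap and nearest-neighbour operations described in the abstract above can be illustrated with a minimal Python sketch. This is not the Bioconductor implementation (IRanges/GenomicRanges implement these in R and C with far more efficient data structures); the `Range` type and the linear scan here are illustrative only, using closed-interval coordinates:

    ```python
    from collections import namedtuple

    # Illustrative stand-in for an annotated genomic range (closed interval).
    Range = namedtuple("Range", ["chrom", "start", "end"])

    def find_overlaps(query, subjects):
        """Return subject ranges on the query's chromosome that overlap it.
        Two closed intervals overlap when each starts before the other ends."""
        return [s for s in subjects
                if s.chrom == query.chrom
                and s.start <= query.end and query.start <= s.end]

    def nearest(query, subjects):
        """Return the subject range on the same chromosome nearest the query."""
        def distance(s):
            if s.start <= query.end and query.start <= s.end:
                return 0  # overlapping ranges are at distance zero
            return min(abs(s.start - query.end), abs(query.start - s.end))
        same_chrom = [s for s in subjects if s.chrom == query.chrom]
        return min(same_chrom, key=distance) if same_chrom else None

    genes = [Range("chr1", 100, 500), Range("chr1", 900, 1200), Range("chr2", 50, 80)]
    peak = Range("chr1", 450, 600)
    print(find_overlaps(peak, genes))            # only the chr1:100-500 gene overlaps
    print(nearest(Range("chr1", 700, 750), genes))  # chr1:900-1200 is closer (150 bp) than chr1:100-500 (200 bp)
    ```

    Production tools replace the linear scan with interval trees or nested containment lists, which is what makes genome-scale overlap queries tractable.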

  20. Annotation for information extraction from mammography reports.

    Science.gov (United States)

    Bozkurt, Selen; Gulkesen, Kemal Hakan; Rubin, Daniel

    2013-01-01

    Inter and intra-observer variability in mammographic interpretation is a challenging problem, and decision support systems (DSS) may be helpful to reduce variation in practice. Since radiology reports are created as unstructured text reports, Natural language processing (NLP) techniques are needed to extract structured information from reports in order to provide the inputs to DSS. Before creating NLP systems, producing high quality annotated data set is essential. The goal of this project is to develop an annotation schema to guide the information extraction tasks needed from free-text mammography reports. PMID:23823416

  1. SymGRASS: a database of sugarcane orthologous genes involved in arbuscular mycorrhiza and root nodule symbiosis

    Science.gov (United States)

    2013-01-01

    Background The rationale for gathering information from plants procuring nitrogen through symbiotic interactions controlled by a common genetic program for sustainable biofuel production is the high energy demand of applying synthetic nitrogen fertilizers. We curated sequence information publicly available for the biofuel plant sugarcane, performed an analysis of the common SYM pathway known to control symbiosis in other plants, and provide results, sequences and literature links as an online database. Methods Sugarcane sequences and information were downloaded from the nucEST database, cleaned and trimmed with seqclean, assembled with TGICL plus a translated mapping method, and annotated. The annotation is based on BLAST searches against a locally formatted plant Uniprot90 generated with CD-HIT for functional assignment, rpsBLAST against the CDD database for conserved domain analysis, and a BLAST search against sorghum sequences for Gene Ontology (GO) assignment. Gene expression was normalized according to the Unigene standard and presented as ESTs/100 kb. Protein sequences known in the SYM pathway were used as queries to search the SymGRASS sequence database. Additionally, antimicrobial peptides described in the PhytAMP database served as queries to retrieve and generate expression profiles of these defense genes in the libraries compared to the libraries obtained under symbiotic interactions. Results We describe SymGRASS, a database of sugarcane orthologous genes involved in arbuscular mycorrhiza (AM) and root nodule (RN) symbiosis. The database aggregates knowledge about sequences, tissues, organs, developmental stages and experimental conditions, and provides annotation and the level of gene expression for sugarcane transcripts and SYM orthologous genes through a web interface. Several candidate genes were found for all nodes in the pathway, and interestingly a set of symbiosis-specific genes was found. 
Conclusions The knowledge integrated in SymGRASS may guide studies on

  2. Benchmarking database performance for genomic data.

    Science.gov (United States)

    Khushi, Matloob

    2015-06-01

    Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Performing various genomic operations, such as identifying overlapping/non-overlapping regions or nearest gene annotations, is a common research need. The data can be saved in a database system for easy management; however, at present no database provides a comprehensive built-in algorithm to identify overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although the general searching capability of both databases was almost equivalent. In addition, using the algorithm, pair-wise overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were computed, and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1).
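    The abstract does not give RegMap's actual SQL, but the core of any SQL-based overlap operation is the standard interval predicate: two regions on the same chromosome overlap when each starts before the other ends. A minimal, self-contained sketch using SQLite (table and column names are illustrative, not from the paper):

    ```python
    import sqlite3

    # In-memory demo of the kind of overlap query such benchmarks exercise.
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE peaks (chrom TEXT, start INT, stop INT);
    CREATE TABLE genes (chrom TEXT, start INT, stop INT, name TEXT);
    CREATE INDEX idx_genes ON genes (chrom, start, stop);
    """)
    con.executemany("INSERT INTO peaks VALUES (?,?,?)",
                    [("chr1", 150, 250), ("chr1", 5000, 5100)])
    con.executemany("INSERT INTO genes VALUES (?,?,?,?)",
                    [("chr1", 100, 300, "GENE_A"), ("chr1", 400, 900, "GENE_B")])

    # Overlap predicate: g.start <= p.stop AND p.start <= g.stop
    rows = con.execute("""
        SELECT p.start, p.stop, g.name
        FROM peaks p JOIN genes g
          ON g.chrom = p.chrom
         AND g.start <= p.stop
         AND p.start <= g.stop
        ORDER BY p.start
    """).fetchall()
    print(rows)  # [(150, 250, 'GENE_A')]
    ```

    At benchmark scale the interesting differences come from how each engine plans this join; a composite index on `(chrom, start, stop)` is the usual starting point.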

  3. Solar Tutorial and Annotation Resource (STAR)

    Science.gov (United States)

    Showalter, C.; Rex, R.; Hurlburt, N. E.; Zita, E. J.

    2009-12-01

    We have written a software suite designed to facilitate solar data analysis by scientists, students, and the public, anticipating enormous datasets from future instruments. Our "STAR" suite includes an interactive learning section explaining 15 classes of solar events. Users learn software tools that exploit humans' superior ability (over computers) to identify many events. Annotation tools include time slice generation to quantify loop oscillations, the interpolation of event shapes using natural cubic splines (for loops, sigmoids, and filaments) and closed cubic splines (for coronal holes). Learning these tools in an environment where examples are provided prepares new users to comfortably utilize annotation software with new data. Upon completion of our tutorial, users are presented with media of various solar events and asked to identify and annotate the images, to test their mastery of the system. Goals of the project include public input into the data analysis of very large datasets from future solar satellites, and increased public interest and knowledge about the Sun. In 2010, the Solar Dynamics Observatory (SDO) will be launched into orbit. SDO's advancements in solar telescope technology will generate a terabyte per day of high-quality data, requiring innovation in data management. While major projects are developing automated feature-recognition software so that computers can complete much of the initial event tagging and analysis, such software still cannot annotate features such as sigmoids, coronal magnetic loops, coronal dimming, etc., due to large amounts of data concentrated in relatively small areas. Previously, solar physicists manually annotated these features, but with the imminent influx of data it is unrealistic to expect specialized researchers to examine every image that computers cannot fully process. A new approach is needed to efficiently process these data. Providing analysis tools and data access to students and the public has proven

  4. The Beltsville Human Nutrition Research Center's Porcine Immunology and Nutrition Resource Database

    Science.gov (United States)

    Several diverse genomics-based databases have developed to facilitate research with human and rodent models. Current porcine gene databases, however, lack the nutritional and immunological orientation and robust annotation to design effective molecular tools to study relevant pig models. To address ...

  5. Aspergillus flavus Blast2GO gene ontology database: elevated growth temperature alters amino acid metabolism

    Science.gov (United States)

    The availability of a representative gene ontology (GO) database is a prerequisite for a successful functional genomics study. Using online Blast2GO resources we constructed a GO database of Aspergillus flavus. Of the predicted total 13,485 A. flavus genes 8,987 were annotated with GO terms. The mea...

  6. Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes

    DEFF Research Database (Denmark)

    Santos Delgado, Alberto; Wernersson, Rasmus; Jensen, Lars Juhl

    2015-01-01

    3.0, we have updated the content of the database to reflect changes to genome annotation, added new mRNA and protein expression data, and integrated cell-cycle phenotype information from high-content screens and model-organism databases. The new version of Cyclebase also features a new web interface...

  7. Identification and correction of abnormal, incomplete and mispredicted proteins in public databases

    Directory of Open Access Journals (Sweden)

    Bányai László

    2008-08-01

    Full Text Available Abstract Background Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries, based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i) conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii) presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii) co-occurrence of extracellular and nuclear domains; (iv) violation of domain integrity; (v) chimeras encoded by two or more genes located on different chromosomes. Results Analyses of predicted EnsEMBL protein sequences of nine deuterostome (Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio and Ciona intestinalis) and two protostome species (Caenorhabditis elegans and Drosophila melanogaster) have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON
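    Rules of the kind the abstract above lists, such as the transmembrane-segment rule (ii) and the compartment co-occurrence rule (iii), lend themselves to a simple illustration. The following Python sketch is not MisPred's implementation; the entry fields are hypothetical stand-ins for the domain and topology annotations such routines would consume:

    ```python
    def check_entry(entry):
        """Return a list of rule violations for one protein entry (illustrative)."""
        problems = []
        # entry["domains"] is a list of (domain_name, compartment) pairs.
        compartments = {c for _, c in entry["domains"]}
        # Rule (ii)-style check: extracellular AND cytoplasmic domains require
        # a transmembrane segment to connect the two sides of the membrane.
        if {"extracellular", "cytoplasmic"} <= compartments and not entry["has_tm_segment"]:
            problems.append("extracellular+cytoplasmic domains without TM segment")
        # Rule (iii)-style check: extracellular and nuclear domains should not
        # co-occur in a single well-formed protein.
        if {"extracellular", "nuclear"} <= compartments:
            problems.append("co-occurrence of extracellular and nuclear domains")
        return problems

    suspect = {"domains": [("EGF", "extracellular"), ("kinase", "cytoplasmic")],
               "has_tm_segment": False}
    receptor = {"domains": [("EGF", "extracellular"), ("kinase", "cytoplasmic")],
                "has_tm_segment": True}
    print(check_entry(suspect))   # flags the missing TM segment
    print(check_entry(receptor))  # [] - a plausible single-pass receptor
    ```

    The real pipeline derives these features from sequence analysis (signal peptide and transmembrane predictors, domain databases) rather than from pre-filled dictionaries.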

  8. Ranking Biomedical Annotations with Annotator’s Semantic Relevancy

    Directory of Open Access Journals (Sweden)

    Aihua Wu

    2014-01-01

    Full Text Available Biomedical annotation is a common and effective artifact for researchers to discuss, show opinions, and share discoveries. It has become increasingly popular in many online research communities and carries much useful information. Ranking biomedical annotations is a critical problem for data users seeking information efficiently. As the annotator's knowledge about the annotated entity normally determines the quality of the annotations, we evaluate that knowledge, that is, the semantic relationship between annotator and entity, in two ways. The first is extracting relational information from credible websites by mining association rules between an annotator and a biomedical entity. The second is frequent pattern mining from historical annotations, which reveals common features of biomedical entities that an annotator can annotate with high quality. We propose a weighted and concept-extended RDF model to represent an annotator, a biomedical entity, and their background attributes, and we merge information from the two ways as the context of an annotator. On this basis, we present a method to rank annotations by evaluating their correctness according to users' votes and the semantic relevancy between the annotator and the annotated entity. The experimental results show that the approach is applicable and efficient even when the data set is large.
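    The ranking step described above combines community votes with the annotator's semantic relevancy to the annotated entity. A minimal Python sketch of one possible weighted combination; the weights, field names, and linear scoring form are illustrative assumptions, not taken from the paper:

    ```python
    def rank_annotations(annotations, w_vote=0.4, w_rel=0.6):
        """Sort annotations by a weighted sum of normalized vote count and
        annotator-entity semantic relevancy (relevancy assumed in [0, 1])."""
        max_votes = max(a["votes"] for a in annotations) or 1  # avoid division by zero
        def score(a):
            return w_vote * a["votes"] / max_votes + w_rel * a["relevancy"]
        return sorted(annotations, key=score, reverse=True)

    anns = [
        {"id": "a1", "votes": 10, "relevancy": 0.2},  # popular, weakly relevant annotator
        {"id": "a2", "votes": 3,  "relevancy": 0.9},  # few votes, highly relevant annotator
    ]
    print([a["id"] for a in rank_annotations(anns)])  # ['a2', 'a1']
    ```

    With relevancy weighted above raw popularity, the annotation by the more knowledgeable annotator ranks first despite receiving fewer votes, which is the behaviour the paper's evaluation targets.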

  9. Reactome: a database of reactions, pathways and biological processes.

    Science.gov (United States)

    Croft, David; O'Kelly, Gavin; Wu, Guanming; Haw, Robin; Gillespie, Marc; Matthews, Lisa; Caudy, Michael; Garapati, Phani; Gopinath, Gopal; Jassal, Bijay; Jupe, Steven; Kalatskaya, Irina; Mahajan, Shahana; May, Bruce; Ndegwa, Nelson; Schmidt, Esther; Shamovsky, Veronica; Yung, Christina; Birney, Ewan; Hermjakob, Henning; D'Eustachio, Peter; Stein, Lincoln

    2011-01-01

    Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSICQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice. PMID:21067998

  10. The Pfam protein families database: towards a more sustainable future.

    Science.gov (United States)

    Finn, Robert D; Coggill, Penelope; Eberhardt, Ruth Y; Eddy, Sean R; Mistry, Jaina; Mitchell, Alex L; Potter, Simon C; Punta, Marco; Qureshi, Matloob; Sangrador-Vegas, Amaia; Salazar, Gustavo A; Tate, John; Bateman, Alex

    2016-01-01

    In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.

  11. Prototype semantic infrastructure for automated small molecule classification and annotation in lipidomics

    Directory of Open Access Journals (Sweden)

    Dumontier Michel

    2011-07-01

    Full Text Available Abstract Background The development of high-throughput experimentation has led to astronomical growth in the number of biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. Unfortunately, efforts to annotate, classify, and analyze these chemical entities have largely remained in the hands of human curators using manual or semi-automated protocols, leaving many novel entities unclassified. Since chemical function is often closely linked to structure, accurate structure-based classification and annotation of chemical entities is imperative to understanding their functionality. Results As part of an exploratory study, we have investigated the utility of semantic web technologies in automated chemical classification and annotation of lipids. Our prototype framework consists of two components: an ontology and a set of federated web services that operate upon it. The formal lipid ontology we use here extends a part of the LiPrO ontology and draws on the lipid hierarchy in the LIPID MAPS database, as well as literature-derived knowledge. The federated semantic web services that operate upon this ontology are deployed within the Semantic Annotation, Discovery, and Integration (SADI) framework. Structure-based lipid classification is enacted by two core services. First, a structural annotation service detects and enumerates relevant functional groups for a specified chemical structure. A second service reasons over lipid ontology class descriptions using the attributes obtained from the annotation service and identifies the appropriate lipid classification. We extend the utility of these core services by combining them with additional SADI services that retrieve associations between lipids and proteins and identify publications related to specified lipid types. 
We analyze the performance of SADI-enabled eicosanoid classification relative to the LIPID MAPS classification and reflect on the contribution of

  12. Argo: enabling the development of bespoke workflows and services for disease annotation.

    Science.gov (United States)

    Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia

    2016-01-01

    Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest. With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources. This article presents the application of Argo's capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V's User Interactive Track (IAT), we demonstrated and evaluated Argo's suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track's top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database. In this work, we highlight Argo's support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition models

  13. Bibliografia de Aztlan: An Annotated Chicano Bibliography.

    Science.gov (United States)

    Barrios, Ernie, Ed.

    More than 300 books and articles published from 1920 to 1971 are reviewed in this annotated bibliography of literature on the Chicano. The citations and reviews are categorized by subject area and deal with contemporary Chicano history, education, health, history of Mexico, literature, native Americans, philosophy, political science, pre-Columbian…

  14. Annotated bibliography of psychomotor testing. Technical report

    Energy Technology Data Exchange (ETDEWEB)

    Ervin, C.

    1987-03-01

    An annotated bibliography of 67 publications in the field of psychomotor testing has been prepared. The collection includes technical reports, journal articles, papers presented at scientific meetings, books and conference proceedings. The publications were assembled as preliminary work in the development of a dexterity test battery designed to measure the effects of chemical-defense-treatment drugs.

  15. Annotated Bibliography of EDGE2D Use

    International Nuclear Information System (INIS)

    This annotated bibliography is intended to help EDGE2D users, and particularly new users, find existing published literature that has used EDGE2D. Our idea is that a person can find existing studies which may relate to his intended use, as well as gain ideas about other possible applications by scanning the attached tables

  16. Genotyping and annotation of Affymetrix SNP arrays

    DEFF Research Database (Denmark)

    Lamy, Philippe; Andersen, Claus Lindbjerg; Wikman, Friedrik

    2006-01-01

    allows us to annotate SNPs that have poor performance, either because of poor experimental conditions or because for one of the alleles the probes do not behave in a dose-response manner. Generally, our method agrees well with a method developed by Affymetrix. When both methods make a call they agree...

  17. Organizational and Intercultural Communication: An Annotated Bibliography.

    Science.gov (United States)

    Constantinides, Helen; St. Amant, Kirk; Kampf, Connie

    2001-01-01

    Presents a 27-item annotated bibliography that overviews theories of organization from the viewpoint of culture, using five themes of organizational research as a framework. Notes that each section introduces specific theories of international, intercultural, or organizational communication, building upon them through a series of related articles,…

  18. Statistical mechanics of ontology based annotations

    CERN Document Server

    Hoyle, David C

    2016-01-01

    We present a statistical mechanical theory of the process of annotating an object with terms selected from an ontology. The term selection process is formulated as an ideal lattice gas model, but in a highly structured inhomogeneous field. The model enables us to explain patterns recently observed in real-world annotation data sets, in terms of the underlying graph structure of the ontology. By relating the external field strengths to the information content of each node in the ontology graph, the statistical mechanical model also allows us to propose a number of practical metrics for assessing the quality of both the ontology, and the annotations that arise from its use. Using the statistical mechanical formalism we also study an ensemble of ontologies of differing size and complexity; an analysis not readily performed using real data alone. Focusing on regular tree ontology graphs we uncover a rich set of scaling laws describing the growth in the optimal ontology size as the number of objects being annotate...

  19. Statistical mechanics of ontology based annotations

    Science.gov (United States)

    Hoyle, David C.; Brass, Andrew

    2016-01-01

    We present a statistical mechanical theory of the process of annotating an object with terms selected from an ontology. The term selection process is formulated as an ideal lattice gas model, but in a highly structured inhomogeneous field. The model enables us to explain patterns recently observed in real-world annotation data sets, in terms of the underlying graph structure of the ontology. By relating the external field strengths to the information content of each node in the ontology graph, the statistical mechanical model also allows us to propose a number of practical metrics for assessing the quality of both the ontology, and the annotations that arise from its use. Using the statistical mechanical formalism we also study an ensemble of ontologies of differing size and complexity; an analysis not readily performed using real data alone. Focusing on regular tree ontology graphs we uncover a rich set of scaling laws describing the growth in the optimal ontology size as the number of objects being annotated increases. In doing so we provide a further possible measure for assessment of ontologies.

  20. La Mujer Chicana: An Annotated Bibliography, 1976.

    Science.gov (United States)

    Chapa, Evey, Ed.; And Others

    Intended to provide interested persons, researchers, and educators with information about "la mujer Chicana", this annotated bibliography cites 320 materials published between 1916 and 1975, with the majority being between 1960 and 1975. The 12 sections cover the following subject areas: Chicana publications; Chicana feminism and "el movimiento";…

  1. Annotated Bibliography of EDGE2D Use

    Energy Technology Data Exchange (ETDEWEB)

    J.D. Strachan and G. Corrigan

    2005-06-24

    This annotated bibliography is intended to help EDGE2D users, and particularly new users, find existing published literature that has used EDGE2D. Our idea is that a person can find existing studies which may relate to his intended use, as well as gain ideas about other possible applications by scanning the attached tables.

  2. An Annotated Publications List on Homelessness.

    Science.gov (United States)

    Tutunjian, Beth Ann

    This annotated publications list on homelessness contains citations for 19 publications, most of which deal with problems of alcohol or drug abuse among homeless persons. Citations are listed alphabetically by author and cover the topics of homelessness and alcoholism, drug abuse, public policy, research methodologies, mental illness, alcohol- and…

  3. Teleconferencing, an annotated bibliography, volume 3

    Science.gov (United States)

    Shervis, K.

    1971-01-01

    In this annotated and indexed listing of works on teleconferencing, emphasis has been placed upon teleconferencing as real-time, two way audio communication with or without visual aids. However, works on the use of television in two-way or multiway nets, data transmission, regional communications networks and on telecommunications in general are also included.

  4. Postsecondary Peer Cooperative Learning Programs: Annotated Bibliography

    Science.gov (United States)

    Arendale, David R., Comp.

    2005-01-01

    Purpose: This annotated bibliography is focused intentionally on postsecondary peer cooperative learning programs that increase student achievement. Peer learning has been popular in education for decades. As both a pedagogy and a learning strategy, it has been frequently adapted for a wide range of academic content areas at the elementary,…

  5. Small Group Communication: An Annotated Bibliography.

    Science.gov (United States)

    Gouran, Dennis S.; Guadagnino, Christopher S.

    This annotated bibliography includes sources of information that are primarily concerned with problem solving, decision making, and processes of social influence in small groups, and secondarily deal with other aspects of communication and interaction in groups, such as conflict management and negotiation. The 57 entries, all dating from 1980…

  6. Ludwig von Mises: An Annotated Bibliography.

    Science.gov (United States)

    Gordon, David

    A 117-item annotated bibliography of books, articles, essays, lectures, and reviews by economist Ludwig von Mises is presented. The bibliography is arranged chronologically, and is followed by an alphabetical listing of the citations, excluding books. An index and information on the Ludwig von Mises Institute at Auburn University (Alabama) are…

  7. Kwanzaa: A Selective Annotated Bibliography for Teachers.

    Science.gov (United States)

    Dupree, Sandra K., Comp.; Gillum, Holly A., Comp.

    This annotated bibliography about Kwanzaa, an end-of-the-year holiday that emphasizes an appreciation for the culture of African Americans, aims to provide ready access to information for classroom teachers. Noting that Kwanzaa (celebrated from December 26 to January 1) is an important cultural event, the bibliography states that the festival…

  8. DNAVis: interactive visualization of comparative genome annotations

    NARCIS (Netherlands)

    Fiers, M.W.E.J.; Wetering, van de H.; Peeters, T.H.J.M.; Wijk, van J.J.; Nap, J.P.H.

    2006-01-01

    The software package DNAVis offers a fast, interactive and real-time visualization of DNA sequences and their comparative genome annotations. DNAVis implements advanced methods of information visualization such as linked views, perspective walls and semantic zooming, in addition to the display of the…

  9. Semantic Annotation to Support Automatic Taxonomy Classification

    DEFF Research Database (Denmark)

    Kim, Sanghee; Ahmed, Saeema; Wallace, Ken

    2006-01-01

    This paper presents a new taxonomy classification method that generates classification criteria from a small number of important sentences identified through semantic annotations, e.g. cause-effect. Rhetorical Structure Theory (RST) is used to discover the semantics (Mann et al. 1988). Specifically...

  10. Skin Cancer Education Materials: Selected Annotations.

    Science.gov (United States)

    National Cancer Inst. (NIH), Bethesda, MD.

    This annotated bibliography presents 85 entries on a variety of approaches to cancer education. The entries are grouped under three broad headings, two of which contain smaller sub-divisions. The first heading, Public Education, contains prevention and general information, and non-print materials. The second heading, Professional Education,…

  11. FragKB: structural and literature annotation resource of conserved peptide fragments and residues.

    Directory of Open Access Journals (Sweden)

    Ashish V Tendulkar

    Full Text Available BACKGROUND: FragKB (Fragment Knowledgebase is a repository of clusters of structurally similar fragments from proteins. Fragments are annotated with information at the level of sequence, structure and function, integrating biological descriptions derived from multiple existing resources and text mining. METHODOLOGY: FragKB contains approximately 400,000 conserved fragments from 4,800 representative proteins from PDB. Literature annotations are extracted from more than 1,700 articles and are available for over 12,000 fragments. The underlying systematic annotation workflow of FragKB ensures efficient update and maintenance of this database. The information in FragKB can be accessed through a web interface that facilitates sequence and structural visualization of fragments together with known literature information on the consequences of specific residue mutations and functional annotations of proteins and fragment clusters. FragKB is accessible online at http://ubio.bioinfo.cnio.es/biotools/fragkb/. SIGNIFICANCE: The information presented in FragKB can be used for modeling protein structures, for designing novel proteins and for functional characterization of related fragments. The current release is focused on functional characterization of proteins through inspection of conservation of the fragments.

  12. A web-based platform for rice microarray annotation and data analysis

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    Rice (Oryza sativa) feeds over half of the global population. A web-based integrated platform for rice microarray annotation and data analysis in various biological contexts is presented, which provides a convenient query for comprehensive annotation compared with similar databases. Coupled with existing rice microarray data, it provides online analysis methods from the perspective of bioinformatics. This comprehensive bioinformatics analysis platform is composed of five modules: data retrieval, microarray annotation, sequence analysis, results visualization and data analysis. The BioChip module facilitates the retrieval of microarray data information via identifiers of "Probe Set ID", "Locus ID" and "Analysis Name". The BioAnno module is used to annotate the gene or probe set based on the gene function, the domain information, the KEGG biochemical and regulatory pathways and the potential microRNAs that regulate the genes. The BioSeq module lists all of the related sequence information for a microarray probe set. The BioView module provides various visual results for the microarray data. The BioAnaly module is used to analyze rice microarray data sets.

  13. MEETING: Chlamydomonas Annotation Jamboree - October 2003

    Energy Technology Data Exchange (ETDEWEB)

    Grossman, Arthur R

    2007-04-13

    Shotgun sequencing of the nuclear genome of Chlamydomonas reinhardtii (Chlamydomonas throughout) was performed at an approximate 10X coverage by JGI. Roughly half of the genome is now contained on 26 scaffolds, all of which are at least 1.6 Mb, and the coverage of the genome is ~95%. There are now over 200,000 cDNA sequence reads that we have generated as part of the Chlamydomonas genome project (Grossman, 2003; Shrager et al., 2003; Grossman et al., 2007; Merchant et al., 2007); other sequences have also been generated by the Kazusa sequence group (Asamizu et al., 1999; Asamizu et al., 2000) or individual laboratories that have focused on specific genes. Shrager et al. (2003) placed the reads into distinct contigs (an assemblage of reads with overlapping nucleotide sequences), and contigs that group together as part of the same genes have been designated ACEs (assembly of contigs generated from EST information). All of the reads have also been mapped to the Chlamydomonas nuclear genome and the cDNAs and their corresponding genomic sequences have been reassembled, and the resulting assemblage is called an ACEG (an Assembly of contiguous EST sequences supported by genomic sequence) (Jain et al., 2007). Most of the unique genes or ACEGs are also represented by gene models that have been generated by the Joint Genome Institute (JGI, Walnut Creek, CA). These gene models have been placed onto the DNA scaffolds and are presented as a track on the Chlamydomonas genome browser associated with the genome portal (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html). Ultimately, the meeting grant awarded by DOE has helped enormously in the development of an annotation pipeline (a set of guidelines used in the annotation of genes) and resulted in high quality annotation of over 4,000 genes; the annotators were from both Europe and the USA. Some of the people who led the annotation initiative were Arthur Grossman, Olivier Vallon, and Sabeeha Merchant (with many individual…

  14. Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

    Directory of Open Access Journals (Sweden)

    Anika Oellrich

    Full Text Available Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO Annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources.
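The evaluation described above combines the outputs of several concept recognition systems into a silver standard and scores it with precision, recall and F1. A minimal sketch of both steps, assuming a simple majority-vote construction and representing annotations as (start, end, concept) tuples; all annotation data here is invented.

```python
from collections import Counter

def silver_standard(system_outputs, min_votes=2):
    """Keep an annotation if at least min_votes systems produced it."""
    votes = Counter(a for output in system_outputs for a in set(output))
    return {a for a, n in votes.items() if n >= min_votes}

def prf(predicted, gold):
    """Precision, recall and F1 of a predicted annotation set vs. a gold set."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Invented annotations: (start_offset, end_offset, concept_id)
gold = {(0, 4, "C1"), (10, 15, "C2"), (20, 25, "C3")}
systems = [
    {(0, 4, "C1"), (10, 15, "C2")},              # hypothetical system A output
    {(0, 4, "C1"), (30, 35, "C9")},              # hypothetical system B output
    {(0, 4, "C1"), (10, 15, "C2"), (30, 35, "C9")},
]
silver = silver_standard(systems)
print(silver, prf(silver, gold))
```

Raising `min_votes` trades recall for precision, which mirrors the trade-off the abstract reports when high-precision, low-recall systems are merged into the vote.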

  15. OligoRAP - an Oligo Re-Annotation Pipeline to improve annotation and estimate target specificity

    NARCIS (Netherlands)

    P.B.T. Neerincx; H. Rauwerda; H. Nie; M.A.M. Groenen; T.M. Breit; J.A.M. Leunissen

    2008-01-01

    Background: High throughput gene expression studies using oligonucleotide microarrays depend on the specificity of each oligonucleotide (oligo or probe) for its target gene. However, target-specific probes can only be designed when the reference genome of the species at hand has been completely sequenced…

  16. The Saccharomyces Genome Database Variant Viewer.

    Science.gov (United States)

    Sheppard, Travis K; Hitz, Benjamin C; Engel, Stacia R; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla S; Demeter, Janos; Hellerstedt, Sage T; Karra, Kalpana; Nash, Robert S; Paskov, Kelley M; Skrzypek, Marek S; Weng, Shuai; Wong, Edith D; Cherry, J Michael

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. PMID:26578556

  17. Tomato Functional Genomics Database: a comprehensive resource and analysis package for tomato functional genomics.

    Science.gov (United States)

    Fei, Zhangjun; Joung, Je-Gun; Tang, Xuemei; Zheng, Yi; Huang, Mingyun; Lee, Je Min; McQuinn, Ryan; Tieman, Denise M; Alba, Rob; Klee, Harry J; Giovannoni, James J

    2011-01-01

    Tomato Functional Genomics Database (TFGD) provides a comprehensive resource to store, query, mine, analyze, visualize and integrate large-scale tomato functional genomics data sets. The database is functionally expanded from the previously described Tomato Expression Database by including metabolite profiles as well as large-scale tomato small RNA (sRNA) data sets. Computational pipelines have been developed to process microarray, metabolite and sRNA data sets archived in the database, respectively, and TFGD provides downloads of all the analyzed results. TFGD is also designed to enable users to easily retrieve biologically important information through a set of efficient query interfaces and analysis tools, including improved array probe annotations as well as tools to identify co-expressed genes, significantly affected biological processes and biochemical pathways from gene expression data sets and miRNA targets, and to integrate transcript and metabolite profiles, and sRNA and mRNA sequences. The suite of tools and interfaces in TFGD allow intelligent data mining of recently released and continually expanding large-scale tomato functional genomics data sets. TFGD is available at http://ted.bti.cornell.edu. PMID:20965973

  18. OAHG: an integrated resource for annotating human genes with multi-level ontologies

    Science.gov (United States)

    Cheng, Liang; Sun, Jie; Xu, Wanying; Dong, Lixiang; Hu, Yang; Zhou, Meng

    2016-01-01

    OAHG, an integrated resource, aims to establish a comprehensive functional annotation resource for human protein-coding genes (PCGs), miRNAs, and lncRNAs by multi-level ontologies involving Gene Ontology (GO), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Many previous studies have focused on inferring putative properties and biological functions of PCGs and non-coding RNA genes from different perspectives. During the past several decades, a few databases have been designed to annotate the functions of PCGs, miRNAs, and lncRNAs, respectively. Some of the functional descriptions in these databases were mapped to standardized terminologies, such as GO, which could be helpful for further analysis. Despite these developments, there is no comprehensive resource recording the function of these three important types of genes. The current version of OAHG, release 1.0 (Jun 2016), integrates three ontologies involving GO, DO, and HPO, six gene functional databases and two interaction databases. Currently, OAHG contains 1,434,694 entries involving 16,929 PCGs, 637 miRNAs, 193 lncRNAs, and 24,894 terms of ontologies. During the performance evaluation, OAHG shows consistency with existing gene interactions and the structure of the ontologies. For example, structurally more similar terms tend to be associated with more shared genes (Pearson correlation r² = 0.2428, p < 2.2e–16). PMID:27703231
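The consistency check reported above correlates term similarity with the number of genes terms share. A dependency-free sketch of that computation, assuming a plain Pearson coefficient; the (similarity, shared-gene-count) pairs below are invented placeholders, not OAHG data.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical term pairs: (structural similarity, number of shared genes)
pairs = [(0.1, 2), (0.3, 5), (0.5, 4), (0.7, 9), (0.9, 12)]
sim = [p[0] for p in pairs]
shared = [p[1] for p in pairs]
r = pearson(sim, shared)
print(r, r ** 2)  # r-squared, the form reported in the abstract
```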

  19. Ontology Learning and Semantic Annotation: a Necessary Symbiosis

    OpenAIRE

    Giovannetti, Emiliano; Marchi, Simone; Montemagni, Simonetta; Bartolini, Roberto

    2008-01-01

    Semantic annotation of text requires the dynamic merging of linguistically structured information and a "world model", usually represented as a domain-specific ontology. On the other hand, the process of engineering a domain ontology through a semi-automatic ontology learning system requires the availability of a considerable amount of semantically annotated documents. Facing this bootstrapping paradox requires an incremental process of annotation-acquisition-annotation, whereby domain-specific...

  20. AnnaBot: A Static Verifier for Java Annotation Usage

    OpenAIRE

    Ian Darwin

    2010-01-01

    This paper describes AnnaBot, one of the first tools to verify correct use of Annotation-based metadata in the Java programming language. These Annotations are a standard Java 5 mechanism used to attach metadata to types, methods, or fields without using an external configuration file. A binary representation of the Annotation becomes part of the compiled “.class” file, for inspection by another component or library at runtime. Java Annotations were introduced into the Java language in 2004 a...