literature annotation resource: Topics by WorldWideScience.org

Sample records for literature annotation resource

FragKB: structural and literature annotation resource of conserved peptide fragments and residues.

Directory of Open Access Journals (Sweden)

Ashish V Tendulkar

Full Text Available BACKGROUND: FragKB (Fragment Knowledgebase is a repository of clusters of structurally similar fragments from proteins. Fragments are annotated with information at the level of sequence, structure and function, integrating biological descriptions derived from multiple existing resources and text mining. METHODOLOGY: FragKB contains approximately 400,000 conserved fragments from 4,800 representative proteins from PDB. Literature annotations are extracted from more than 1,700 articles and are available for over 12,000 fragments. The underlying systematic annotation workflow of FragKB ensures efficient update and maintenance of this database. The information in FragKB can be accessed through a web interface that facilitates sequence and structural visualization of fragments together with known literature information on the consequences of specific residue mutations and functional annotations of proteins and fragment clusters. FragKB is accessible online at http://ubio.bioinfo.cnio.es/biotools/fragkb/. SIGNIFICANCE: The information presented in FragKB can be used for modeling protein structures, for designing novel proteins and for functional characterization of related fragments. The current release is focused on functional characterization of proteins through inspection of conservation of the fragments.
Annotated bibliography of cultural resources literature for the Nevada Nuclear Waste Storage Investigations

International Nuclear Information System (INIS)

1983-11-01

This annotated bibliography of the cultural resources literature pertinent for the Nevada Nuclear Waste Storage Investigations was assembled in order to (1) identify and evaluate the prehistoric and historic properties previously recorded in the Nevada Nuclear Waste Storage Investigations Project Area of southern Nye County, Nevada, (2) identify and develop research problems that have been and/or could be addressed by the cultural resources of this area, (3) isolate factors that might be important in the selection of a potential locality for a high level nuclear waste repository in the project area, and (4) critically evaluate the adequacy and current status of cultural resources knowledge in the project area. 195 references
A literature-based approach to annotation and browsing of Web resources

Directory of Open Access Journals (Sweden)

Miguel A. Sicilia

2003-01-01

Full Text Available The emerging Semantic Web technologies critically depend on the availability of shared knowledge representations called ontologies, which are intended to encode consensual knowledge about specific domains. Currently, the proposed processes for building and maintaining those ontologies entail the joint effort of groups of representative domain experts, which can be expensive in terms of co-ordination and in terms of time to reach consensus.In this paper, literature-based ontologies, which can be initially developed by a single expert and maintained continuously, are proposed as preliminary alternatives to group-generated domain ontologies, or as early versions for them. These ontologies encode domain knowledge in the form of terms and relations along with the (formal or informal bibliographical resources that define or deal with them, which makes them specially useful for domains in which a common terminology or jargon is not soundly established. A general-purpose metamodelling framework for literature-based ontologies - which has been used in two concrete domains - is described, along with a proposed methodology and a specific resource annotation approach. In addition, the implementation of an RDF-based Web resource browser - that uses the ontologies to guide the user in the exploration of a corpus of digital resources- is presented as a proof of concept.
Annotated Bibliography of Children's Literature Resources on War, Terrorism, and Disaster since 1945: By Continents/Countries for Grades K-8

Science.gov (United States)

Gangi, Jane M.

2009-01-01

This article presents an annotated bibliography of children's literature resources on war, terrorism, and disaster since 1945. This annotated bibliography focuses on grades K-8 from different continents/countries, including (1) Africa; (2) Asia; (3) The Caribbean; (4) Central and South America; (5) Europe; and (6) The Middle East.
Partnerships panel: natural, resource partnerships: literature synthesis and research agenda

Science.gov (United States)

Steve Selin; Nancy Myers

1995-01-01

This paper presents a summary of an annotated bibliography on natural resource partnerships. Resource areas and management functions addressed in the partnership literature are examined. Partnership research is summarized and broken into categories including: Partnership outcomes, assessing the potential for partnerships, characteristics of successful partnerships,...
Annotating the biomedical literature for the human variome.

Science.gov (United States)

Verspoor, Karin; Jimeno Yepes, Antonio; Cavedon, Lawrence; McIntosh, Tara; Herten-Crabb, Asha; Thomas, Zoë; Plazzer, John-Paul

2013-01-01

This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome. The corpus is available at http://opennicta.com/home/health/variome.
PCAS – a precomputed proteome annotation database resource

Directory of Open Access Journals (Sweden)

Luo Jingchu

2003-11-01

Full Text Available Abstract Background Many model proteomes or "complete" sets of proteins of given organisms are now publicly available. Much effort has been invested in computational annotation of those "draft" proteomes. Motif or domain based algorithms play a pivotal role in functional classification of proteins. Employing most available computational algorithms, mainly motif or domain recognition algorithms, we set up to develop an online proteome annotation system with integrated proteome annotation data to complement existing resources. Results We report here the development of PCAS (ProteinCentric Annotation System as an online resource of pre-computed proteome annotation data. We applied most available motif or domain databases and their analysis methods, including hmmpfam search of HMMs in Pfam, SMART and TIGRFAM, RPS-PSIBLAST search of PSSMs in CDD, pfscan of PROSITE patterns and profiles, as well as PSI-BLAST search of SUPERFAMILY PSSMs. In addition, signal peptide and TM are predicted using SignalP and TMHMM respectively. We mapped SUPERFAMILY and COGs to InterPro, so the motif or domain databases are integrated through InterPro. PCAS displays table summaries of pre-computed data and a graphical presentation of motifs or domains relative to the protein. As of now, PCAS contains human IPI, mouse IPI, and rat IPI, A. thaliana, C. elegans, D. melanogaster, S. cerevisiae, and S. pombe proteome. PCAS is available at http://pak.cbi.pku.edu.cn/proteome/gca.php Conclusion PCAS gives better annotation coverage for model proteomes by employing a wider collection of available algorithms. Besides presenting the most confident annotation data, PCAS also allows customized query so users can inspect statistically less significant boundary information as well. Therefore, besides providing general annotation information, PCAS could be used as a discovery platform. We plan to update PCAS twice a year. We will upgrade PCAS when new proteome annotation algorithms
SAS- Semantic Annotation Service for Geoscience resources on the web

Science.gov (United States)

Elag, M.; Kumar, P.; Marini, L.; Li, R.; Jiang, P.

2015-12-01

There is a growing need for increased integration across the data and model resources that are disseminated on the web to advance their reuse across different earth science applications. Meaningful reuse of resources requires semantic metadata to realize the semantic web vision for allowing pragmatic linkage and integration among resources. Semantic metadata associates standard metadata with resources to turn them into semantically-enabled resources on the web. However, the lack of a common standardized metadata framework as well as the uncoordinated use of metadata fields across different geo-information systems, has led to a situation in which standards and related Standard Names abound. To address this need, we have designed SAS to provide a bridge between the core ontologies required to annotate resources and information systems in order to enable queries and analysis over annotation from a single environment (web). SAS is one of the services that are provided by the Geosematnic framework, which is a decentralized semantic framework to support the integration between models and data and allow semantically heterogeneous to interact with minimum human intervention. Here we present the design of SAS and demonstrate its application for annotating data and models. First we describe how predicates and their attributes are extracted from standards and ingested in the knowledge-base of the Geosemantic framework. Then we illustrate the application of SAS in annotating data managed by SEAD and annotating simulation models that have web interface. SAS is a step in a broader approach to raise the quality of geoscience data and models that are published on the web and allow users to better search, access, and use of the existing resources based on standard vocabularies that are encoded and published using semantic technologies.
Resources for Achieving Sex Equity: An Annotated Bibliography.

Science.gov (United States)

Miller, Susan W., Comp.

This annotated bibliography provides a list of resources dealing with sex equity in vocational education. The bibliography first provides operational definitions of "sexism,""sex fair,""sex affirmative,""sex bias," and "affirmative action." It then lists resources under the following topics and/or bibliographic forms: (1) sex role definition, (2)…
Annotating gene sets by mining large literature collections with protein networks.

Science.gov (United States)

Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W; Han, Jiawei; Peng, Jian; Ideker, Trey

2018-01-01

Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.
Hydrology and water resources overview for the Nevada Nuclear Waste Storage Investigations, Nevada Test Site, Nye County, Nevada: annotated bibliography

International Nuclear Information System (INIS)

French, R.H.; Elzeftawy, A.; Elliot, B.

1984-06-01

The literature available regarding hydrology and utilization of water resources in the southwestern Nevada Test Site area is reviewed. In the context of this annotated bibliography, hydrology is defined to include hydrometeorology, surface water resources, and groundwater resources. Water utilization includes water supply, demand and use; future supply, demand and use; and wastewater treatment and disposal. The bibliography is arranged in alphabetical order and indexed with both technical key words and geographical key words
MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

Directory of Open Access Journals (Sweden)

Gustavo Arango-Argoty

Full Text Available Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/, which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution.
MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

Science.gov (United States)

Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

2016-01-01

Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution.
MetaStorm: A Public Resource for Customizable Metagenomics Annotation

Science.gov (United States)

Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S.; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

2016-01-01

Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579
Rural Development Literature 1976-1977: An Updated Annotated Bibliography.

Science.gov (United States)

Buzzard, Shirley, Comp.

More than 100 books and articles on rural development published during 1976-77 are annotated in this selective bibliography. Concentrating on social science literature, the bibliography is interdisciplinary in nature, spanning agricultural economics, anthropology, community development, community health, and rural sociology. Types of works…
Metab2MeSH: annotating compounds with medical subject headings.

Science.gov (United States)

Sartor, Maureen A; Ade, Alex; Wright, Zach; States, David; Omenn, Gilbert S; Athey, Brian; Karnovsky, Alla

2012-05-15

Progress in high-throughput genomic technologies has led to the development of a variety of resources that link genes to functional information contained in the biomedical literature. However, tools attempting to link small molecules to normal and diseased physiology and published data relevant to biologists and clinical investigators, are still lacking. With metabolomics rapidly emerging as a new omics field, the task of annotating small molecule metabolites becomes highly relevant. Our tool Metab2MeSH uses a statistical approach to reliably and automatically annotate compounds with concepts defined in Medical Subject Headings, and the National Library of Medicine's controlled vocabulary for biomedical concepts. These annotations provide links from compounds to biomedical literature and complement existing resources such as PubChem and the Human Metabolome Database.
The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation.

Science.gov (United States)

Profiti, Giuseppe; Martelli, Pier Luigi; Casadio, Rita

2017-07-03

BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures and the cluster is associated to a hidden Markov model that allows building template-target alignment suitable for structural modeling. Some other 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB-accession, Fasta sequence, GO-term, PFAM-domain, organism, PDB and ligand/s. When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Radicalization: An Overview and Annotated Bibliography of Open-Source Literature

Science.gov (United States)

2006-12-15

particularly with liberal, democratic, and humanistic Muslims Phares points to Jihadism as the main root cause of terrorism and suggests that defending...An Overview and Annotated Bibliography of Open-Source Literature 155 of ambiguity), epistemic and existential needs theory (need for closure...This book presents Terror Management Theory, which addresses behavioral and psychological responses to terrorist events. An existential
Adolescent Literacy Resources: An Annotated Bibliography. Second Edition 2009

Science.gov (United States)

Center on Instruction, 2009

2009-01-01

This annotated bibliography updated from a 2007 edition, is intended as a resource for technical assistance providers as they work with states on adolescent literacy. This revision includes current research and documents of practical use in guiding improvements in grades 4-12 reading instruction in the content areas and in interventions for…
A Resource of Quantitative Functional Annotation for Homo sapiens Genes.

Science.gov (United States)

Taşan, Murat; Drabkin, Harold J; Beaver, John E; Chua, Hon Nian; Dunham, Julie; Tian, Weidong; Blake, Judith A; Roth, Frederick P

2012-02-01

The body of human genomic and proteomic evidence continues to grow at ever-increasing rates, while annotation efforts struggle to keep pace. A surprisingly small fraction of human genes have clear, documented associations with specific functions, and new functions continue to be found for characterized genes. Here we assembled an integrated collection of diverse genomic and proteomic data for 21,341 human genes and make quantitative associations of each to 4333 Gene Ontology terms. We combined guilt-by-profiling and guilt-by-association approaches to exploit features unique to the data types. Performance was evaluated by cross-validation, prospective validation, and by manual evaluation with the biological literature. Functional-linkage networks were also constructed, and their utility was demonstrated by identifying candidate genes related to a glioma FLN using a seed network from genome-wide association studies. Our annotations are presented-alongside existing validated annotations-in a publicly accessible and searchable web interface.

Thermal effects on aquatic organisms: annotated bibliography of the 1974 literature

International Nuclear Information System (INIS)

Coutant, C.C.; Talmage, S.S.; Carrier, R.F.; Collier, B.N.

1975-06-01

The annotated bibliography covers the 1974 literature concerning thermal effects on aquatic organisms. Emphasis is placed on the effects of the release of thermal effluents on aquatic ecosystems. Indexes are provided for: author, keywords, subject category, geographic location, taxon, and title (alphabetical listing of keyword-in-context of the nontrivial words in the title). (CH)
Designing Flightdeck Procedures: Literature Resources

Science.gov (United States)

Feldman, Jolene; Barshi, Immanuel; Degani, Asaf; Loukopoulou, Loukia; Mauro, Robert

2017-01-01

This technical publication contains the titles, abstracts, summaries, descriptions, and/or annotations of available literature sources on procedure design and development, requirements, and guidance. It is designed to provide users with an easy access to available resources on the topic of procedure design, and with a sense of the contents of these sources. This repository of information is organized into the following publication sources: Research (e.g., journal articles, conference proceedings), Manufacturers' (e.g., operation manuals, newsletters), and Regulatory and/or Government (e.g., advisory circulars, reports). An additional section contains synopses of Accident/Incident Reports involving procedures. This work directly supports a comprehensive memorandum by Barshi, Mauro, Degani, & Loukopoulou (2016) that summarizes the results of a multi-year project, partially funded by the FAA, to develop technical reference materials that support guidance on the process of developing cockpit procedures (see "Designing Flightdeck Procedures" https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20160013263.pdf). An extensive treatment of this topic is presented in a forthcoming book by the same authors.
Gene annotation from scientific literature using mappings between keyword systems.

Science.gov (United States)

Pérez, Antonio J; Perez-Iratxeta, Carolina; Bork, Peer; Thode, Guillermo; Andrade, Miguel A

2004-09-01

The description of genes in databases by keywords helps the non-specialist to quickly grasp the properties of a gene and increases the efficiency of computational tools that are applied to gene data (e.g. searching a gene database for sequences related to a particular biological process). However, the association of keywords to genes or protein sequences is a difficult process that ultimately implies examination of the literature related to a gene. To support this task, we present a procedure to derive keywords from the set of scientific abstracts related to a gene. Our system is based on the automated extraction of mappings between related terms from different databases using a model of fuzzy associations that can be applied with all generality to any pair of linked databases. We tested the system by annotating genes of the SWISS-PROT database with keywords derived from the abstracts linked to their entries (stored in the MEDLINE database of scientific references). The performance of the annotation procedure was much better for SWISS-PROT keywords (recall of 47%, precision of 68%) than for Gene Ontology terms (recall of 8%, precision of 67%). The algorithm can be publicly accessed and used for the annotation of sequences through a web server at http://www.bork.embl.de/kat
Annotation an effective device for student feedback: a critical review of the literature.

Science.gov (United States)

Ball, Elaine C

2010-05-01

of annotation influences student learning and assessment or, indeed, helps tutors to employ better annotative practices [Juwah, C., Macfarlane-Dick, D., Matthew, B., Nicol, D., Ross, D., Smith, B., 2004. Enhancing student learning through effective formative feedback. The Higher Education Academy, 1-40; Jewitt, C., Kress, G., 2005. English in classrooms: only write down what you need to know: annotation for what? English in Education, 39(1), 5-18]. There is little evidence on ways to heighten students' self-awareness when their essays are returned with annotated feedback [Storch, N., Tapper, J., 1997. Student annotations: what NNS and NS university students say about their own writing. Journal of Second Language Writing, 6(3), 245-265]. The literature review clarifies forms of annotation as feedback practice and offers a summary of the challenges and usefulness of annotation. Copyright 2009. Published by Elsevier Ltd.
Making web annotations persistent over time

Energy Technology Data Exchange (ETDEWEB)

Sanderson, Robert [Los Alamos National Laboratory; Van De Sompel, Herbert [Los Alamos National Laboratory

2010-01-01

As Digital Libraries (DL) become more aligned with the web architecture, their functional components need to be fundamentally rethought in terms of URIs and HTTP. Annotation, a core scholarly activity enabled by many DL solutions, exhibits a clearly unacceptable characteristic when existing models are applied to the web: due to the representations of web resources changing over time, an annotation made about a web resource today may no longer be relevant to the representation that is served from that same resource tomorrow. We assume the existence of archived versions of resources, and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework that allows seamless navigation from the URI of a resource to archived versions of that resource, and arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource.
Annotated bibliography of coal in the Caribbean region. [Lignite

Energy Technology Data Exchange (ETDEWEB)

Orndorff, R.C.

1985-01-01

The purpose of preparing this annotated bibliography was to compile information on coal localities for the Caribbean region used for preparation of a coal map of the region. Also, it serves as a brief reference list of publications for future coal studies in the Caribbean region. It is in no way an exhaustive study or complete listing of coal literature for the Caribbean. All the material was gathered from published literature with the exception of information from Cuba which was supplied from a study by Gordon Wood of the US Geological Survey, Branch of Coal Resources. Following the classification system of the US Geological Survey (Wood and others, 1983), the term coal resources has been used in this report for reference to general estimates of coal quantities even though authors of the material being annotated may have used the term coal reserves in a similar denotation. The literature ranges from 1857 to 1981. The countries listed include Colombia, Mexico, Venezuela, Cuba, the Dominican Republic, Haiti, Jamaica, Puerto Rico, and the countries of Central America.
Solar Tutorial and Annotation Resource (STAR)

Science.gov (United States)

Showalter, C.; Rex, R.; Hurlburt, N. E.; Zita, E. J.

2009-12-01

We have written a software suite designed to facilitate solar data analysis by scientists, students, and the public, anticipating enormous datasets from future instruments. Our “STAR" suite includes an interactive learning section explaining 15 classes of solar events. Users learn software tools that exploit humans’ superior ability (over computers) to identify many events. Annotation tools include time slice generation to quantify loop oscillations, the interpolation of event shapes using natural cubic splines (for loops, sigmoids, and filaments) and closed cubic splines (for coronal holes). Learning these tools in an environment where examples are provided prepares new users to comfortably utilize annotation software with new data. Upon completion of our tutorial, users are presented with media of various solar events and asked to identify and annotate the images, to test their mastery of the system. Goals of the project include public input into the data analysis of very large datasets from future solar satellites, and increased public interest and knowledge about the Sun. In 2010, the Solar Dynamics Observatory (SDO) will be launched into orbit. SDO’s advancements in solar telescope technology will generate a terabyte per day of high-quality data, requiring innovation in data management. While major projects develop automated feature recognition software, so that computers can complete much of the initial event tagging and analysis, still, that software cannot annotate features such as sigmoids, coronal magnetic loops, coronal dimming, etc., due to large amounts of data concentrated in relatively small areas. Previously, solar physicists manually annotated these features, but with the imminent influx of data it is unrealistic to expect specialized researchers to examine every image that computers cannot fully process. A new approach is needed to efficiently process these data. Providing analysis tools and data access to students and the public have proven
Pipeline to upgrade the genome annotations

Directory of Open Access Journals (Sweden)

Lijin K. Gopi

2017-12-01

Full Text Available Current era of functional genomics is enriched with good quality draft genomes and annotations for many thousands of species and varieties with the support of the advancements in the next generation sequencing technologies (NGS. Around 25,250 genomes, of the organisms from various kingdoms, are submitted in the NCBI genome resource till date. Each of these genomes was annotated using various tools and knowledge-bases that were available during the period of the annotation. It is obvious that these annotations will be improved if the same genome is annotated using improved tools and knowledge-bases. Here we present a new genome annotation pipeline, strengthened with various tools and knowledge-bases that are capable of producing better quality annotations from the consensus of the predictions from different tools. This resource also perform various additional annotations, apart from the usual gene predictions and functional annotations, which involve SSRs, novel repeats, paralogs, proteins with transmembrane helices, signal peptides etc. This new annotation resource is trained to evaluate and integrate all the predictions together to resolve the overlaps and ambiguities of the boundaries. One of the important highlights of this resource is the capability of predicting the phylogenetic relations of the repeats using the evolutionary trace analysis and orthologous gene clusters. We also present a case study, of the pipeline, in which we upgrade the genome annotation of Nelumbo nucifera (sacred lotus. It is demonstrated that this resource is capable of producing an improved annotation for a better understanding of the biology of various organisms.
Rfam: annotating families of non-coding RNA sequences.

Science.gov (United States)

Daub, Jennifer; Eberhardt, Ruth Y; Tate, John G; Burge, Sarah W

2015-01-01

The primary task of the Rfam database is to collate experimentally validated noncoding RNA (ncRNA) sequences from the published literature and facilitate the prediction and annotation of new homologues in novel nucleotide sequences. We group homologous ncRNA sequences into "families" and related families are further grouped into "clans." We collate and manually curate data cross-references for these families from other databases and external resources. Our Web site offers researchers a simple interface to Rfam and provides tools with which to annotate their own sequences using our covariance models (CMs), through our tools for searching, browsing, and downloading information on Rfam families. In this chapter, we will work through examples of annotating a query sequence, collating family information, and searching for data.
Assessment of community-submitted ontology annotations from a novel database-journal partnership.

Science.gov (United States)

Berardini, Tanya Z; Li, Donghui; Muller, Robert; Chetty, Raymond; Ploetz, Larry; Singh, Shanker; Wensel, April; Huala, Eva

2012-01-01

As the scientific literature grows, leading to an increasing volume of published experimental data, so does the need to access and analyze this data using computational tools. The most commonly used method to convert published experimental data on gene function into controlled vocabulary annotations relies on a professional curator, employed by a model organism database or a more general resource such as UniProt, to read published articles and compose annotation statements based on the articles' contents. A more cost-effective and scalable approach capable of capturing gene function data across the whole range of biological research organisms in computable form is urgently needed. We have analyzed a set of ontology annotations generated through collaborations between the Arabidopsis Information Resource and several plant science journals. Analysis of the submissions entered using the online submission tool shows that most community annotations were well supported and the ontology terms chosen were at an appropriate level of specificity. Of the 503 individual annotations that were submitted, 97% were approved and community submissions captured 72% of all possible annotations. This new method for capturing experimental results in a computable form provides a cost-effective way to greatly increase the available body of annotations without sacrificing annotation quality. Database URL: www.arabidopsis.org.
DFAST and DAGA: web-based integrated genome annotation tools and resources.

Science.gov (United States)

Tanizawa, Yasuhiro; Fujisawa, Takatomo; Kaminuma, Eli; Nakamura, Yasukazu; Arita, Masanori

2016-01-01

Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission could be done seamlessly online. The online workspace would be especially useful for users not familiar with bioinformatics skills. In addition, we have developed a genome repository, DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus , obtained from both DDBJ/ENA/GenBank and Sequence Read Archive (SRA). All the genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess the taxonomic position based on genomic sequence information, we used the average nucleotide identity (ANI), which showed high discriminative power to determine whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. By exploiting the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii , whose variation between subgroups is larger than the well-accepted ANI threshold of 95% to differentiate species. DFAST and DAGA are freely accessible at https://dfast.nig.ac.jp.
Annotated bibliography

International Nuclear Information System (INIS)

1997-08-01

Under a cooperative agreement with the U.S. Department of Energy's Office of Science and Technology, Waste Policy Institute (WPI) is conducting a five-year research project to develop a research-based approach for integrating communication products in stakeholder involvement related to innovative technology. As part of the research, WPI developed this annotated bibliography which contains almost 100 citations of articles/books/resources involving topics related to communication and public involvement aspects of deploying innovative cleanup technology. To compile the bibliography, WPI performed on-line literature searches (e.g., Dialog, International Association of Business Communicators Public Relations Society of America, Chemical Manufacturers Association, etc.), consulted past years proceedings of major environmental waste cleanup conferences (e.g., Waste Management), networked with professional colleagues and DOE sites to gather reports or case studies, and received input during the August 1996 Research Design Team meeting held to discuss the project's research methodology. Articles were selected for annotation based upon their perceived usefulness to the broad range of public involvement and communication practitioners
Annotated Bibliography of Alcohol, Other Drug, and Violence Prevention Resources, 2006-2008

Science.gov (United States)

Segars, Lance, Ed.; Akinola, Olayinka, Ed.

2009-01-01

The U.S. Department of Education's Higher Education Center for Alcohol and Other Drug Abuse and Violence Prevention has developed this annotated bibliography to provide those interested in prevention at colleges and universities--and in surrounding communities--with a ready reference of current, important, and available information resources.…
Policies and Background Literature for Self-Education on Research Data Management: An Annotated Bibliography

Science.gov (United States)

Goben, Abigail; Raszewski, Rebecca

2015-01-01

Librarians navigating research data management self-education have an increasing body of literature to choose from, which may become overwhelming. This annotated bibliography reviews: (1) U.S. federal policies; (2) articles; and (3) books to assist librarians who are self-educating on research data management or are seeking background reading…
Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression.

Science.gov (United States)

Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda

2017-06-26

The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis
Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence

Directory of Open Access Journals (Sweden)

Dorrell Nick

2007-06-01

Full Text Available Abstract Background Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation. Results Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised. Conclusions Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes.
The Africa Collection: An Annotated Historical Resource Bibliography for the Student of Africa.

Science.gov (United States)

Lynn, Karen

This annotated bibliographic collection of resources on Africa including non-fiction, fiction, texts, poetry, draft papers, addresses, periodicals, film, records, and travel agencies is designed to aid secondary students and their teachers interested in research on Africa. An instructional approach is taken, drawing upon examples to demonstrate…
OAHG: an integrated resource for annotating human genes with multi-level ontologies.

Science.gov (United States)

Cheng, Liang; Sun, Jie; Xu, Wanying; Dong, Lixiang; Hu, Yang; Zhou, Meng

2016-10-05

OAHG, an integrated resource, aims to establish a comprehensive functional annotation resource for human protein-coding genes (PCGs), miRNAs, and lncRNAs by multi-level ontologies involving Gene Ontology (GO), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Many previous studies have focused on inferring putative properties and biological functions of PCGs and non-coding RNA genes from different perspectives. During the past several decades, a few of databases have been designed to annotate the functions of PCGs, miRNAs, and lncRNAs, respectively. A part of functional descriptions in these databases were mapped to standardize terminologies, such as GO, which could be helpful to do further analysis. Despite these developments, there is no comprehensive resource recording the function of these three important types of genes. The current version of OAHG, release 1.0 (Jun 2016), integrates three ontologies involving GO, DO, and HPO, six gene functional databases and two interaction databases. Currently, OAHG contains 1,434,694 entries involving 16,929 PCGs, 637 miRNAs, 193 lncRNAs, and 24,894 terms of ontologies. During the performance evaluation, OAHG shows the consistencies with existing gene interactions and the structure of ontology. For example, terms with more similar structure could be associated with more associated genes (Pearson correlation γ 2 = 0.2428, p < 2.2e-16).
MimoSA: a system for minimotif annotation

Directory of Open Access Journals (Sweden)

Kundeti Vamsi

2010-06-01

Full Text Available Abstract Background Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. Results We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, Mimosa provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to
Resources on Campus Governance and Employment Relations, 1967-1977. With Essay, Annotations, and Indexes.

Science.gov (United States)

Tice, Terrence N.

An annotated bibliography containing 1,110 items from 1967 through early 1977 and other relevant resources covers virtually all the significant material on campus collective bargaining and closely related issues regarding campus governance and employment relations. It complements five publications from 1972-76 and indexes both authors and…

Using literature and data to learn Bayesian networks as clinical models of ovarian tumors

DEFF Research Database (Denmark)

Antal, P.; Fannes, G.; Timmerman, D.

2004-01-01

Thanks to its increasing availability, electronic literature has become a potential source of information for the development of complex Bayesian networks (BN), when human expertise is missing or data is scarce or contains much noise. This opportunity raises the question of how to integrate...... information from free-text resources with statistical data in learning Bayesian networks. Firstly, we report on the collection of prior information resources in the ovarian cancer domain, which includes "kernel" annotations of the domain variables. We introduce methods based on the annotations and literature...
A Review and Annotated Bibliography of the Literature Pertaining to Team and Small Group Performance (1989 to 1999)

National Research Council Canada - National Science Library

LaJoie, Andrew

1999-01-01

.... Training and military doctrine has been evolving to reflect this emphasis on teamwork. The purpose of this annotated bibliography is to review literature published over the last ten years concerning team and small group performance...
Use of wildlife webcams - Literature review and annotated bibliography

Science.gov (United States)

Ratz, Joan M.; Conk, Shannon J.

2010-01-01

The U.S. Fish and Wildlife Service National Conservation Training Center requested a literature review product that would serve as a resource to natural resource professionals interested in using webcams to connect people with nature. The literature review focused on the effects on the public of viewing wildlife through webcams and on information regarding installation and use of webcams. We searched the peer reviewed, published literature for three topics: wildlife cameras, virtual tourism, and technological nature. Very few publications directly addressed the effect of viewing wildlife webcams. The review of information on installation and use of cameras yielded information about many aspects of the use of remote photography, but not much specifically regarding webcams. Aspects of wildlife camera use covered in the literature review include: camera options, image retrieval, system maintenance and monitoring, time to assemble, power source, light source, camera mount, frequency of image recording, consequences for animals, and equipment security. Webcam technology is relatively new and more publication regarding the use of the technology is needed. Future research should specifically study the effect that viewing wildlife through webcams has on the viewers' conservation attitudes, behaviors, and sense of connectedness to nature.
BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments.

Science.gov (United States)

López-Fernández, H; Reboiro-Jato, M; Glez-Peña, D; Aparicio, F; Gachet, D; Buenaga, M; Fdez-Riverola, F

2013-07-01

Automatic term annotation from biomedical documents and external information linking are becoming a necessary prerequisite in modern computer-aided medical learning systems. In this context, this paper presents BioAnnote, a flexible and extensible open-source platform for automatically annotating biomedical resources. Apart from other valuable features, the software platform includes (i) a rich client enabling users to annotate multiple documents in a user friendly environment, (ii) an extensible and embeddable annotation meta-server allowing for the annotation of documents with local or remote vocabularies and (iii) a simple client/server protocol which facilitates the use of our meta-server from any other third-party application. In addition, BioAnnote implements a powerful scripting engine able to perform advanced batch annotations. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Thermal effects on aquatic organisms: an annotated bibliography of the 1976 literature

Energy Technology Data Exchange (ETDEWEB)

Talmage, S.S. (comp.)

1978-05-01

This bibliography, containing 784 annotated references on the effects of temperature on aquatic organisms, is part of an assessment of the literature on the effects of thermal power plants on the environment. The effects of thermal discharges at power plant sites are emphasized. Laboratory and field studies on temperature tolerance and the effects of temperature changes on reproduction, development, growth, distribution, physiology, and sensitivity to other stresses are included. Indexes are provided for author, keywords, subject category, geographic location of the study, taxon, and title (alphabetical listing of keywords-in-context of nontrivial words in the title).
Thermal effects on aquatic organisms: an annotated bibliography of the 1977 literature

International Nuclear Information System (INIS)

Talmage, S.S.

1978-12-01

This bibliography, containing 537 references from the 1977 literature, is the seventh in a series of annotated bibliographies on the effects of heat on aquatic organisms. The effects of thermal discharges at power plant sites are emphasized. Laboratory and field studies on temperature tolerance and the effects of temperature changes on reproduction, development, growth, distribution, physiology, and sensitivity to other stresses are included. References in the bibliography are divided into three subject categories: marine systems, freshwater systems, and estuaries. The references are arranged alphabetically by first author. Indexes are provided for author, keywords, subject category, geographic location of the study, taxon, and title
Improving HIV proteome annotation: new features of BioAfrica HIV Proteomics Resource.

Science.gov (United States)

Druce, Megan; Hulo, Chantal; Masson, Patrick; Sommer, Paula; Xenarios, Ioannis; Le Mercier, Philippe; De Oliveira, Tulio

2016-01-01

The Human Immunodeficiency Virus (HIV) is one of the pathogens that cause the greatest global concern, with approximately 35 million people currently infected with HIV. Extensive HIV research has been performed, generating a large amount of HIV and host genomic data. However, no effective vaccine that protects the host from HIV infection is available and HIV is still spreading at an alarming rate, despite effective antiretroviral (ARV) treatment. In order to develop effective therapies, we need to expand our knowledge of the interaction between HIV and host proteins. In contrast to virus proteins, which often rapidly evolve drug resistance mutations, the host proteins are essentially invariant within all humans. Thus, if we can identify the host proteins needed for virus replication, such as those involved in transporting viral proteins to the cell surface, we have a chance of interrupting viral replication. There is no proteome resource that summarizes this interaction, making research on this subject a difficult enterprise. In order to fill this gap in knowledge, we curated a resource presents detailed annotation on the interaction between the HIV proteome and host proteins. Our resource was produced in collaboration with ViralZone and used manual curation techniques developed by UniProtKB/Swiss-Prot. Our new website also used previous annotations of the BioAfrica HIV-1 Proteome Resource, which has been accessed by approximately 10 000 unique users a year since its inception in 2005. The novel features include a dedicated new page for each HIV protein, a graphic display of its function and a section on its interaction with host proteins. Our new webpages also add information on the genomic location of each HIV protein and the position of ARV drug resistance mutations. Our improved BioAfrica HIV-1 Proteome Resource fills a gap in the current knowledge of biocuration.Database URL:http://www.bioafrica.net/proteomics/HIVproteome.html. © The Author(s) 2016. Published
Annotated bibliography of the Russian languages literature on glaciology for 2015

Directory of Open Access Journals (Sweden)

V. M. Kotlyakov

2017-01-01

Full Text Available The proposed annual bibliography continues annotated lists of the Russian‑language literature on glaciology that were regularly published in the past. It includes 245 references grouped into the following ten sections: 1 general issues of glaciology; 2 physics and chemistry of ice; 3 atmospheric ice; 4 snow cover; 5 ava‑ lanches and glacial mudflows; 6 sea ice; 7 river and lake ice; 8 icings and ground ice; 9 the glaciers and ice caps; 10 palaeoglaciology. In addition to the works of the current year, some works of earlier years are added, that, for various reasons, were not included in previous bibliographies.
Counselor's Information Service. A Quarterly Annotated Bibliography of Current Literature on Educational and Vocational Guidance. Volume 32, Number 1.

Science.gov (United States)

B'nai B'rith, Washington, DC. Career and Counseling Services.

This quarterly annotated bibliography of current literature on educational and vocational guidance describes various pamphlets, guides and brochures which provide occupational information, educational, vocational and personal guidance, guidance administration and procedures, information on student aids and aids for the teacher. Additional guidance…
MIPS bacterial genomes functional annotation benchmark dataset.

Science.gov (United States)

Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Wernen

2005-05-15

Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab
Thermal effects on aquatic organisms: an annotated bibliography of the 1977 literature

Energy Technology Data Exchange (ETDEWEB)

Talmage, S.S. (comp.)

1978-12-01

This bibliography, containing 537 references from the 1977 literature, is the seventh in a series of annotated bibliographies on the effects of heat on aquatic organisms. The effects of thermal discharges at power plant sites are emphasized. Laboratory and field studies on temperature tolerance and the effects of temperature changes on reproduction, development, growth, distribution, physiology, and sensitivity to other stresses are included. References in the bibliography are divided into three subject categories: marine systems, freshwater systems, and estuaries. The references are arranged alphabetically by first author. Indexes are provided for author, keywords, subject category, geographic location of the study, taxon, and title (alphabetical listing of keywords-in-context of nontrivial words in the title).
A Selected Annotated Bibliography on Work Time Options.

Science.gov (United States)

Ivantcho, Barbara

This annotated bibliography is divided into three sections. Section I contains annotations of general publications on work time options. Section II presents resources on flexitime and the compressed work week. In Section III are found resources related to these reduced work time options: permanent part-time employment, job sharing, voluntary…
Geothermal wetlands: an annotated bibliography of pertinent literature

Energy Technology Data Exchange (ETDEWEB)

Stanley, N.E.; Thurow, T.L.; Russell, B.F.; Sullivan, J.F.

1980-05-01

This annotated bibliography covers the following topics: algae, wetland ecosystems; institutional aspects; macrophytes - general, production rates, and mineral absorption; trace metal absorption; wetland soils; water quality; and other aspects of marsh ecosystems. (MHR)
Concept annotation in the CRAFT corpus.

Science.gov (United States)

Bada, Michael; Eckert, Miriam; Evans, Donald; Garcia, Kristin; Shipley, Krista; Sitnikov, Dmitry; Baumgartner, William A; Cohen, K Bretonnel; Verspoor, Karin; Blake, Judith A; Hunter, Lawrence E

2012-07-09

Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.
An Annotated Bibliography of the Literature Dealing with the Incorporation of Right Brain Learning into Left Brain Oriented Schools.

Science.gov (United States)

Lewallen, Martha

Articles and documents concerning brain growth and hemispheric specialization, theories of cognitive style, educational implications of brain research, and right-brain learning activities are cited in this annotated bibliography. Citations are preceded by a glossary of terms and followed by a brief review of the assembled literature. Educational…
Quantum Computing: Selected Internet Resources for Librarians, Researchers, and the Casually Curious

OpenAIRE

Cirasella, Jill

2009-01-01

This article is an annotated selection of the most important and informative Internet resources for learning about quantum computing, finding quantum computing literature, and tracking quantum computing news.
A computational platform to maintain and migrate manual functional annotations for BioCyc databases.

Science.gov (United States)

Walsh, Jesse R; Sen, Taner Z; Dickerson, Julie A

2014-10-12

BioCyc databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integration as new knowledge is discovered. As functional annotations are improved, scalable methods are needed for curators to manage annotations without detailed knowledge of the specific design of the BioCyc database. We have developed CycTools, a software tool which allows curators to maintain functional annotations in a model organism database. This tool builds on existing software to improve and simplify annotation data imports of user provided data into BioCyc databases. Additionally, CycTools automatically resolves synonyms and alternate identifiers contained within the database into the appropriate internal identifiers. Automating steps in the manual data entry process can improve curation efforts for major biological databases. The functionality of CycTools is demonstrated by transferring GO term annotations from MaizeCyc to matching proteins in CornCyc, both maize metabolic pathway databases available at MaizeGDB, and by creating strain specific databases for metabolic engineering.
Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation.

Science.gov (United States)

Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

2014-01-01

Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quick and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manual reviewed and annotated up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific
Resource impact: curse or blessing? a literature survey

International Nuclear Information System (INIS)

Stevens, P.

2003-01-01

Common sense and economic theory suggest large revenues from natural resource projects should generate economic progress and development. Yet much evidence argues the opposite and that resource-rich countries suffer from 'resource curse'. This paper provides a survey of the academic literature on the impact of natural resources on an economy. The topic has long attracted interest in the economics literature but more recently, interest has revived. The paper first considers the large body of empirical work examining the relationship between resource abundance, poor economic performance and poverty. While this evidence supports the view of a negative impact, it is not without criticism and some assert a few countries managed instead to receive a 'blessing'. The paper assesses how the literature explains the transmission mechanisms between resource revenues and economic damage. Six areas are discussed: a long-term decline in terms of trade; revenue volatility; Dutch disease; crowding out effects; increasing the role of the state; and the sociocultural and political impacts. Finally, various options from the literature to avoid negative impacts are analysed: not developing the mineral deposits; diversifying the economy away from dependence on oil, gas and mineral exports; sterilising the incoming revenue; the use of stabilisation and oil funds; and reconsidering investment policies. The paper finishes by assessing what political reforms might be needed to carry out the necessary policies to avoid negative impacts. (author)
An Atlas of annotations of Hydra vulgaris transcriptome.

Science.gov (United States)

Evangelista, Daniela; Tripathi, Kumar Parijat; Guarracino, Mario Rosario

2016-09-22

RNA sequencing takes advantage of the Next Generation Sequencing (NGS) technologies for analyzing RNA transcript counts with an excellent accuracy. Trying to interpret this huge amount of data in biological information is still a key issue, reason for which the creation of web-resources useful for their analysis is highly desiderable. Starting from a previous work, Transcriptator, we present the Atlas of Hydra's vulgaris, an extensible web tool in which its complete transcriptome is annotated. In order to provide to the users an advantageous resource that include the whole functional annotated transcriptome of Hydra vulgaris water polyp, we implemented the Atlas web-tool contains 31.988 accesible and downloadable transcripts of this non-reference model organism. Atlas, as a freely available resource, can be considered a valuable tool to rapidly retrieve functional annotation for transcripts differentially expressed in Hydra vulgaris exposed to the distinct experimental treatments. WEB RESOURCE URL: http://www-labgtp.na.icar.cnr.it/Atlas .

A selective annotated bibliography for clinical audiology (1989-2009): books.

Science.gov (United States)

Ferrer-Vinent, Susan T; Ferrer-Vinent, Ignacio J

2010-12-01

This is the 2nd in a series of 3 planned companion articles that present a selected, annotated, and indexed bibliography of clinical audiology publications from 1989 through 2009. Research and preparation of the bibliography were based on published guidelines, professional audiology experience, and professional librarian experience. The first article in the series covered reference works. This article focuses on other books. The planned third companion article will present periodicals and online resources. Audiologists and librarians can use this bibliography to help them identify relevant clinical audiology literature.
Quantum Computing: Selected Internet Resources for Librarians, Researchers, and the Casually Curious

Science.gov (United States)

Cirasella, Jill

2009-01-01

This article presents an annotated selection of the most important and informative Internet resources for learning about quantum computing, finding quantum computing literature, and tracking quantum computing news. All of the quantum computing resources described in this article are freely available, English-language web sites that fall into one…
DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis

Directory of Open Access Journals (Sweden)

Baseler Michael W

2007-11-01

Full Text Available Abstract Background Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases in order to integrate comprehensive annotation information for their genes, which becomes one of the bottlenecks, particularly for the analytic task associated with a large gene list. Thus, a highly centralized and ready-to-use gene-annotation knowledgebase is in demand for high throughput gene functional analysis. Description The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. The grouping of such identifiers improves the cross-reference capability, particularly across NCBI and UniProt systems, enabling more than 40 publicly available functional annotation sources to be comprehensively integrated and centralized by the DAVID gene clusters. The simple, pair-wise, text format files which make up the DAVID Knowledgebase are freely downloadable for various data analysis uses. In addition, a well organized web interface allows users to query different types of heterogeneous annotations in a high-throughput manner. Conclusion The DAVID Knowledgebase is designed to facilitate high throughput gene functional analysis. For a given gene list, it not only provides the quick accessibility to a wide range of heterogeneous annotation data in a centralized location, but also enriches the level of biological information for an individual gene. Moreover, the entire DAVID Knowledgebase is freely downloadable or searchable at http://david.abcc.ncifcrf.gov/knowledgebase/.
Metingear: a development environment for annotating genome-scale metabolic models.

Science.gov (United States)

May, John W; James, A Gordon; Steinbeck, Christoph

2013-09-01

Genome-scale metabolic models often lack annotations that would allow them to be used for further analysis. Previous efforts have focused on associating metabolites in the model with a cross reference, but this can be problematic if the reference is not freely available, multiple resources are used or the metabolite is added from a literature review. Associating each metabolite with chemical structure provides unambiguous identification of the components and a more detailed view of the metabolism. We have developed an open-source desktop application that simplifies the process of adding database cross references and chemical structures to genome-scale metabolic models. Annotated models can be exported to the Systems Biology Markup Language open interchange format. Source code, binaries, documentation and tutorials are freely available at http://johnmay.github.com/metingear. The application is implemented in Java with bundles available for MS Windows and Macintosh OS X.
Potential use of geothermal resources in the Snake River Basin: an environmental overview. Volume II. Annotated bibliography

Energy Technology Data Exchange (ETDEWEB)

Spencer, S.G.; Russell, B.F.; Sullivan, J.F. (eds.)

1979-09-01

This volume is a partially annotated bibliography of reference materials pertaining to the seven KGRA's. The bibliography is divided into sections by program element as follows: terrestrial ecology, aquatic ecology, heritage resources, socioeconomics and demography, geology, geothermal, soils, hydrology and water quality, seismicity, and subsidence. Cross-referencing is available for those references which are applicable to specific KGRA's. (MHR)
A selective annotated bibliography for clinical audiology (1988-2008): reference works.

Science.gov (United States)

Ferrer-Vinent, Susan T; Ferrer-Vinent, Ignacio J

2009-06-01

This is the 1st in a series of 3 planned companion articles that present a selected, annotated, and indexed bibliography of clinical audiology publications from 1988 to 2008. Research and preparation of the bibliography were based on published guidelines, professional audiology experience, and professional librarian experience. This article presents reference works (dictionaries, encyclopedias, handbooks, and manuals). The future planned articles will cover other monographs, periodicals, and online resources. Audiologists and librarians can use these lists as a guide when seeking clinical audiology literature.
WormBase: Annotating many nematode genomes.

Science.gov (United States)

Howe, Kevin; Davis, Paul; Paulini, Michael; Tuli, Mary Ann; Williams, Gary; Yook, Karen; Durbin, Richard; Kersey, Paul; Sternberg, Paul W

2012-01-01

WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.
Transcriptomic resources for the medicinal legume Mucuna pruriens: de novo transcriptome assembly, annotation, identification and validation of EST-SSR markers.

Science.gov (United States)

Sathyanarayana, N; Pittala, Ranjith Kumar; Tripathi, Pankaj Kumar; Chopra, Ratan; Singh, Heikham Russiachand; Belamkar, Vikas; Bhardwaj, Pardeep Kumar; Doyle, Jeff J; Egan, Ashley N

2017-05-25

The medicinal legume Mucuna pruriens (L.) DC. has attracted attention worldwide as a source of the anti-Parkinson's drug L-Dopa. It is also a popular green manure cover crop that offers many agronomic benefits including high protein content, nitrogen fixation and soil nutrients. The plant currently lacks genomic resources and there is limited knowledge on gene expression, metabolic pathways, and genetics of secondary metabolite production. Here, we present transcriptomic resources for M. pruriens, including a de novo transcriptome assembly and annotation, as well as differential transcript expression analyses between root, leaf, and pod tissues. We also develop microsatellite markers and analyze genetic diversity and population structure within a set of Indian germplasm accessions. One-hundred ninety-one million two hundred thirty-three thousand two hundred forty-two bp cleaned reads were assembled into 67,561 transcripts with mean length of 626 bp and N50 of 987 bp. Assembled sequences were annotated using BLASTX against public databases with over 80% of transcripts annotated. We identified 7,493 simple sequence repeat (SSR) motifs, including 787 polymorphic repeats between the parents of a mapping population. 134 SSRs from expressed sequenced tags (ESTs) were screened against 23 M. pruriens accessions from India, with 52 EST-SSRs retained after quality control. Population structure analysis using a Bayesian framework implemented in fastSTRUCTURE showed nearly similar groupings as with distance-based (neighbor-joining) and principal component analyses, with most of the accessions clustering per geographical origins. Pair-wise comparison of transcript expression in leaves, roots and pods identified 4,387 differentially expressed transcripts with the highest number occurring between roots and leaves. Differentially expressed transcripts were enriched with transcription factors and transcripts annotated as belonging to secondary metabolite pathways. The M
Xylella fastidiosa comparative genomic database is an information resource to explore the annotation, genomic features, and biology of different strains

Directory of Open Access Journals (Sweden)

Alessandro M. Varani

2012-01-01

Full Text Available The Xylella fastidiosa comparative genomic database is a scientific resource with the aim to provide a user-friendly interface for accessing high-quality manually curated genomic annotation and comparative sequence analysis, as well as for identifying and mapping prophage-like elements, a marked feature of Xylella genomes. Here we describe a database and tools for exploring the biology of this important plant pathogen. The hallmarks of this database are the high quality genomic annotation, the functional and comparative genomic analysis and the identification and mapping of prophage-like elements. It is available from web site http://www.xylella.lncc.br.
dictyBase 2015: Expanding data and annotations in a new software environment.

Science.gov (United States)

Basu, Siddhartha; Fey, Petra; Jimenez-Morales, David; Dodson, Robert J; Chisholm, Rex L

2015-08-01

dictyBase is the model organism database for the social amoeba Dictyostelium discoideum and related species. The primary mission of dictyBase is to provide the biomedical research community with well-integrated high quality data, and tools that enable original research. Data presented at dictyBase is obtained from sequencing centers, groups performing high throughput experiments such as large-scale mutagenesis studies, and RNAseq data, as well as a growing number of manually added functional gene annotations from the published literature, including Gene Ontology, strain, and phenotype annotations. Through the Dicty Stock Center we provide the community with an impressive amount of annotated strains and plasmids. Recently, dictyBase accomplished a major overhaul to adapt an outdated infrastructure to the current technological advances, thus facilitating the implementation of innovative tools and comparative genomics. It also provides new strategies for high quality annotations that enable bench researchers to benefit from the rapidly increasing volume of available data. dictyBase is highly responsive to its users needs, building a successful relationship that capitalizes on the vast efforts of the Dictyostelium research community. dictyBase has become the trusted data resource for Dictyostelium investigators, other investigators or organizations seeking information about Dictyostelium, as well as educators who use this model system. © 2015 Wiley Periodicals, Inc.
Comparison of concept recognizers for building the Open Biomedical Annotator

Directory of Open Access Journals (Sweden)

Rubin Daniel

2009-09-01

Full Text Available Abstract The National Center for Biomedical Ontology (NCBO is developing a system for automated, ontology-based access to online biomedical resources (Shah NH, et al.: Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 2009, 10(Suppl 2:S1. The system's indexing workflow processes the text metadata of diverse resources such as datasets from GEO and ArrayExpress to annotate and index them with concepts from appropriate ontologies. This indexing requires the use of a concept-recognition tool to identify ontology concepts in the resource's textual metadata. In this paper, we present a comparison of two concept recognizers – NLM's MetaMap and the University of Michigan's Mgrep. We utilize a number of data sources and dictionaries to evaluate the concept recognizers in terms of precision, recall, speed of execution, scalability and customizability. Our evaluations demonstrate that Mgrep has a clear edge over MetaMap for large-scale service oriented applications. Based on our analysis we also suggest areas of potential improvements for Mgrep. We have subsequently used Mgrep to build the Open Biomedical Annotator service. The Annotator service has access to a large dictionary of biomedical terms derived from the United Medical Language System (UMLS and NCBO ontologies. The Annotator also leverages the hierarchical structure of the ontologies and their mappings to expand annotations. The Annotator service is available to the community as a REST Web service for creating ontology-based annotations of their data.
Annotated bibliography of Software Engineering Laboratory literature

Science.gov (United States)

Morusiewicz, Linda; Valett, Jon D.

1991-01-01

An annotated bibliography of technical papers, documents, and memorandums produced by or related to the Software Engineering Laboratory is given. More than 100 publications are summarized. These publications cover many areas of software engineering and range from research reports to software documentation. All materials have been grouped into eight general subject areas for easy reference: The Software Engineering Laboratory; The Software Engineering Laboratory: Software Development Documents; Software Tools; Software Models; Software Measurement; Technology Evaluations; Ada Technology; and Data Collection. Subject and author indexes further classify these documents by specific topic and individual author.
An annotated corpus with nanomedicine and pharmacokinetic parameters.

Science.gov (United States)

Lewinski, Nastassja A; Jimenez, Ivan; McInnes, Bridget T

2017-01-01

A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration's Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided.
Discovering gene annotations in biomedical text databases

Directory of Open Access Journals (Sweden)

Ozsoyoglu Gultekin

2008-03-01

Full Text Available Abstract Background Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is clearly a need forautomated computational tools to annotate the genes and gene products with Gene Ontology concepts by computationally capturing the related knowledge embedded in textual data. Results In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products in article abstracts from PubMed, and translates the discoveredknowledge into Gene Ontology (GO concepts, a widely-used standardized vocabulary of genomic traits. GEANN utilizes textual "extraction patterns", and a semantic matching framework to locate phrases matching to a pattern and produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN has reached to the precision level of 78% at therecall level of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Use of WordNet for semantic pattern matching improves the precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general. Conclusion GEANN is useful for two distinct purposes: (i automating the annotation of genomic entities with Gene Ontology concepts, and (ii providing existing annotations with additional "evidence articles" from the literature. The use of textual extraction patterns that are constructed based on the existing annotations achieve high precision. The semantic pattern matching framework provides a more flexible pattern matching scheme with respect to "exactmatching" with the advantage of locating approximate
Image annotation under X Windows

Science.gov (United States)

Pothier, Steven

1991-08-01

A mechanism for attaching graphic and overlay annotation to multiple bits/pixel imagery while providing levels of performance approaching that of native mode graphics systems is presented. This mechanism isolates programming complexity from the application programmer through software encapsulation under the X Window System. It ensures display accuracy throughout operations on the imagery and annotation including zooms, pans, and modifications of the annotation. Trade-offs that affect speed of display, consumption of memory, and system functionality are explored. The use of resource files to tune the display system is discussed. The mechanism makes use of an abstraction consisting of four parts; a graphics overlay, a dithered overlay, an image overly, and a physical display window. Data structures are maintained that retain the distinction between the four parts so that they can be modified independently, providing system flexibility. A unique technique for associating user color preferences with annotation is introduced. An interface that allows interactive modification of the mapping between image value and color is discussed. A procedure that provides for the colorization of imagery on 8-bit display systems using pixel dithering is explained. Finally, the application of annotation mechanisms to various applications is discussed.
The Eimeria Transcript DB: an integrated resource for annotated transcripts of protozoan parasites of the genus Eimeria

Science.gov (United States)

Rangel, Luiz Thibério; Novaes, Jeniffer; Durham, Alan M.; Madeira, Alda Maria B. N.; Gruber, Arthur

2013-01-01

Parasites of the genus Eimeria infect a wide range of vertebrate hosts, including chickens. We have recently reported a comparative analysis of the transcriptomes of Eimeria acervulina, Eimeria maxima and Eimeria tenella, integrating ORESTES data produced by our group and publicly available Expressed Sequence Tags (ESTs). All cDNA reads have been assembled, and the reconstructed transcripts have been submitted to a comprehensive functional annotation pipeline. Additional studies included orthology assignment across apicomplexan parasites and clustering analyses of gene expression profiles among different developmental stages of the parasites. To make all this body of information publicly available, we constructed the Eimeria Transcript Database (EimeriaTDB), a web repository that provides access to sequence data, annotation and comparative analyses. Here, we describe the web interface, available sequence data sets and query tools implemented on the site. The main goal of this work is to offer a public repository of sequence and functional annotation data of reconstructed transcripts of parasites of the genus Eimeria. We believe that EimeriaTDB will represent a valuable and complementary resource for the Eimeria scientific community and for those researchers interested in comparative genomics of apicomplexan parasites. Database URL: http://www.coccidia.icb.usp.br/eimeriatdb/ PMID:23411718
Objective-guided image annotation.

Science.gov (United States)

Mao, Qi; Tsang, Ivor Wai-Hung; Gao, Shenghua

2013-04-01

Automatic image annotation, which is usually formulated as a multi-label classification problem, is one of the major tools used to enhance the semantic understanding of web images. Many multimedia applications (e.g., tag-based image retrieval) can greatly benefit from image annotation. However, the insufficient performance of image annotation methods prevents these applications from being practical. On the other hand, specific measures are usually designed to evaluate how well one annotation method performs for a specific objective or application, but most image annotation methods do not consider optimization of these measures, so that they are inevitably trapped into suboptimal performance of these objective-specific measures. To address this issue, we first summarize a variety of objective-guided performance measures under a unified representation. Our analysis reveals that macro-averaging measures are very sensitive to infrequent keywords, and hamming measure is easily affected by skewed distributions. We then propose a unified multi-label learning framework, which directly optimizes a variety of objective-specific measures of multi-label learning tasks. Specifically, we first present a multilayer hierarchical structure of learning hypotheses for multi-label problems based on which a variety of loss functions with respect to objective-guided measures are defined. And then, we formulate these loss functions as relaxed surrogate functions and optimize them by structural SVMs. According to the analysis of various measures and the high time complexity of optimizing micro-averaging measures, in this paper, we focus on example-based measures that are tailor-made for image annotation tasks but are seldom explored in the literature. Experiments show consistency with the formal analysis on two widely used multi-label datasets, and demonstrate the superior performance of our proposed method over state-of-the-art baseline methods in terms of example-based measures on four
Annotated bibliography: overview of energy and mineral resources for the Nevada nuclear-waste-storage investigations, Nevada Test Site, Nye County, Nevada

International Nuclear Information System (INIS)

Bell, E.J.; Larson, L.T.

1982-09-01

This Annotated Bibliography was prepared for the US Department of Energy as part of the Environmental Area Characterization for the Nevada Nuclear Waste Storage Investigations (NNWSI) at the Nevada Test Site (NTS). References were selected to specifically address energy resources including hydrocarbons, geothermal and radioactive fuel materials, mineral resources including base and precious metals and associated minerals, and industrial minerals and rock materials which occur in the vicinity of the NNWSI area
Semantic annotation of consumer health questions.

Science.gov (United States)

Kilicoglu, Halil; Ben Abacha, Asma; Mrabet, Yassine; Shooshan, Sonya E; Rodriguez, Laritza; Masterton, Kate; Demner-Fushman, Dina

2018-02-06

Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only fully be expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from the natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations. The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded highest agreement, while the agreement for more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than that in CHQA-email. Pairwise inter-annotator agreement proved most
MIPS: curated databases and comprehensive secondary data resources in 2010.

Science.gov (United States)

Mewes, H Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F X; Stümpflen, Volker; Antonov, Alexey

2011-01-01

The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38,000,000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).

Elucidating high-dimensional cancer hallmark annotation via enriched ontology.

Science.gov (United States)

Yan, Shankai; Wong, Ka-Chun

2017-09-01

Cancer hallmark annotation is a promising technique that could discover novel knowledge about cancer from the biomedical literature. The automated annotation of cancer hallmarks could reveal relevant cancer transformation processes in the literature or extract the articles that correspond to the cancer hallmark of interest. It acts as a complementary approach that can retrieve knowledge from massive text information, advancing numerous focused studies in cancer research. Nonetheless, the high-dimensional nature of cancer hallmark annotation imposes a unique challenge. To address the curse of dimensionality, we compared multiple cancer hallmark annotation methods on 1580 PubMed abstracts. Based on the insights, a novel approach, UDT-RF, which makes use of ontological features is proposed. It expands the feature space via the Medical Subject Headings (MeSH) ontology graph and utilizes novel feature selections for elucidating the high-dimensional cancer hallmark annotation space. To demonstrate its effectiveness, state-of-the-art methods are compared and evaluated by a multitude of performance metrics, revealing the full performance spectrum on the full set of cancer hallmarks. Several case studies are conducted, demonstrating how the proposed approach could reveal novel insights into cancers. https://github.com/cskyan/chmannot. Copyright © 2017 Elsevier Inc. All rights reserved.
Annotated bibliography of software engineering laboratory literature

Science.gov (United States)

Kistler, David; Bristow, John; Smith, Don

1994-01-01

This document is an annotated bibliography of technical papers, documents, and memorandums produced by or related to the Software Engineering Laboratory. Nearly 200 publications are summarized. These publications cover many areas of software engineering and range from research reports to software documentation. This document has been updated and reorganized substantially since the original version (SEL-82-006, November 1982). All materials have been grouped into eight general subject areas for easy reference: (1) The Software Engineering Laboratory; (2) The Software Engineering Laboratory: Software Development Documents; (3) Software Tools; (4) Software Models; (5) Software Measurement; (6) Technology Evaluations; (7) Ada Technology; and (8) Data Collection. This document contains an index of these publications classified by individual author.
Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles.

Science.gov (United States)

Cohen, K Bretonnel; Lanfranchi, Arrick; Choi, Miji Joo-Young; Bada, Michael; Baumgartner, William A; Panteleyeva, Natalya; Verspoor, Karin; Palmer, Martha; Hunter, Lawrence E

2017-08-17

Coreference resolution is the task of finding strings in text that have the same referent as other strings. Failures of coreference resolution are a common cause of false negatives in information extraction from the scientific literature. In order to better understand the nature of the phenomenon of coreference in biomedical publications and to increase performance on the task, we annotated the Colorado Richly Annotated Full Text (CRAFT) corpus with coreference relations. The corpus was manually annotated with coreference relations, including identity and appositives for all coreferring base noun phrases. The OntoNotes annotation guidelines, with minor adaptations, were used. Interannotator agreement ranges from 0.480 (entity-based CEAF) to 0.858 (Class-B3), depending on the metric that is used to assess it. The resulting corpus adds nearly 30,000 annotations to the previous release of the CRAFT corpus. Differences from related projects include a much broader definition of markables, connection to extensive annotation of several domain-relevant semantic classes, and connection to complete syntactic annotation. Tool performance was benchmarked on the data. A publicly available out-of-the-box, general-domain coreference resolution system achieved an F-measure of 0.14 (B3), while a simple domain-adapted rule-based system achieved an F-measure of 0.42. An ensemble of the two reached F of 0.46. Following the IDENTITY chains in the data would add 106,263 additional named entities in the full 97-paper corpus, for an increase of 76% percent in the semantic classes of the eight ontologies that have been annotated in earlier versions of the CRAFT corpus. The project produced a large data set for further investigation of coreference and coreference resolution in the scientific literature. The work raised issues in the phenomenon of reference in this domain and genre, and the paper proposes that many mentions that would be considered generic in the general domain are not
Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

Science.gov (United States)

Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

2016-01-01

Increasing evidences indicated that function annotation of human genome in molecular level and phenotype level is very important for systematic analysis of genes. In this study, we presented a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description of GeneRIFs could be annotated by a text mining tool Open Biomedical Annotator (OBA), and each Entrez gene could be mapped to Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all the records about human genes of GeneRIFs, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource of human genome. High consistency of term frequency of individual gene (Pearson correlation = 0.6401, p = 2.2e - 16) and gene frequency of individual term (Pearson correlation = 0.1298, p = 3.686e - 14) in GeneRIFs and GOA shows our annotation resource is very reliable.
Public Relations: Selected, Annotated Bibliography.

Science.gov (United States)

Demo, Penny

Designed for students and practitioners of public relations (PR), this annotated bibliography focuses on recent journal articles and ERIC documents. The 34 citations include the following: (1) surveys of public relations professionals on career-related education; (2) literature reviews of research on measurement and evaluation of PR and…
A framework for annotating human genome in disease context.

Science.gov (United States)

Xu, Wei; Wang, Huisong; Cheng, Wenqing; Fu, Dong; Xia, Tian; Kibbe, Warren A; Lin, Simon M

2012-01-01

Identification of gene-disease association is crucial to understanding disease mechanism. A rapid increase in biomedical literatures, led by advances of genome-scale technologies, poses challenge for manually-curated-based annotation databases to characterize gene-disease associations effectively and timely. We propose an automatic method-The Disease Ontology Annotation Framework (DOAF) to provide a comprehensive annotation of the human genome using the computable Disease Ontology (DO), the NCBO Annotator service and NCBI Gene Reference Into Function (GeneRIF). DOAF can keep the resulting knowledgebase current by periodically executing automatic pipeline to re-annotate the human genome using the latest DO and GeneRIF releases at any frequency such as daily or monthly. Further, DOAF provides a computable and programmable environment which enables large-scale and integrative analysis by working with external analytic software or online service platforms. A user-friendly web interface (doa.nubic.northwestern.edu) is implemented to allow users to efficiently query, download, and view disease annotations and the underlying evidences.
An open annotation ontology for science on web 3.0.

Science.gov (United States)

Ciccarese, Paolo; Ocana, Marco; Garcia Castro, Leyla Jael; Das, Sudeshna; Clark, Tim

2011-05-17

There is currently a gap between the rich and expressive collection of published biomedical ontologies, and the natural language expression of biomedical papers consumed on a daily basis by scientific researchers. The purpose of this paper is to provide an open, shareable structure for dynamic integration of biomedical domain ontologies with the scientific document, in the form of an Annotation Ontology (AO), thus closing this gap and enabling application of formal biomedical ontologies directly to the literature as it emerges. Initial requirements for AO were elicited by analysis of integration needs between biomedical web communities, and of needs for representing and integrating results of biomedical text mining. Analysis of strengths and weaknesses of previous efforts in this area was also performed. A series of increasingly refined annotation tools were then developed along with a metadata model in OWL, and deployed for feedback and additional requirements the ontology to users at a major pharmaceutical company and a major academic center. Further requirements and critiques of the model were also elicited through discussions with many colleagues and incorporated into the work. This paper presents Annotation Ontology (AO), an open ontology in OWL-DL for annotating scientific documents on the web. AO supports both human and algorithmic content annotation. It enables "stand-off" or independent metadata anchored to specific positions in a web document by any one of several methods. In AO, the document may be annotated but is not required to be under update control of the annotator. AO contains a provenance model to support versioning, and a set model for specifying groups and containers of annotation. AO is freely available under open source license at http://purl.org/ao/, and extensive documentation including screencasts is available on AO's Google Code page: http://code.google.com/p/annotation-ontology/ . The Annotation Ontology meets critical requirements for
The GATO gene annotation tool for research laboratories

Directory of Open Access Journals (Sweden)

A. Fujita

2005-11-01

Full Text Available Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO is a Bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from everywhere through the internet or may be run locally if a large number of sequences are going to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Usually, installation and application of annotation systems require experience and are time consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to access it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. Minimum computer free space required is 2 MB.
A topic modeling approach for web service annotation

Directory of Open Access Journals (Sweden)

Leandro Ordóñez-Ante

2014-06-01

Full Text Available The actual implementation of semantic-based mechanisms for service retrieval has been restricted, given the resource-intensive procedure involved in the formal specification of services, which generally comprises associating semantic annotations to their documentation sources. Typically, developer performs such a procedure by hand, requiring specialized knowledge on models for semantic description of services (e.g. OWL-S, WSMO, SAWSDL, as well as formal specifications of knowledge. Thus, this semantic-based service description procedure turns out to be a cumbersome and error-prone task. This paper introduces a proposal for service annotation, based on processing web service documentation for extracting information regarding its offered capabilities. By uncovering the hidden semantic structure of such information through statistical analysis techniques, we are able to associate meaningful annotations to the services operations/resources, while grouping those operations into non-exclusive semantic related categories. This research paper belongs to the TelComp 2.0 project, which Colciencas and University of Cauca founded in cooperation.
Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae

Directory of Open Access Journals (Sweden)

Deng Jixin

2009-02-01

Full Text Available Abstract Background Magnaporthe oryzae, the causal agent of blast disease of rice, is the most destructive disease of rice worldwide. The genome of this fungal pathogen has been sequenced and an automated annotation has recently been updated to Version 6 http://www.broad.mit.edu/annotation/genome/magnaporthe_grisea/MultiDownloads.html. However, a comprehensive manual curation remains to be performed. Gene Ontology (GO annotation is a valuable means of assigning functional information using standardized vocabulary. We report an overview of the GO annotation for Version 5 of M. oryzae genome assembly. Methods A similarity-based (i.e., computational GO annotation with manual review was conducted, which was then integrated with a literature-based GO annotation with computational assistance. For similarity-based GO annotation a stringent reciprocal best hits method was used to identify similarity between predicted proteins of M. oryzae and GO proteins from multiple organisms with published associations to GO terms. Significant alignment pairs were manually reviewed. Functional assignments were further cross-validated with manually reviewed data, conserved domains, or data determined by wet lab experiments. Additionally, biological appropriateness of the functional assignments was manually checked. Results In total, 6,286 proteins received GO term assignment via the homology-based annotation, including 2,870 hypothetical proteins. Literature-based experimental evidence, such as microarray, MPSS, T-DNA insertion mutation, or gene knockout mutation, resulted in 2,810 proteins being annotated with GO terms. Of these, 1,673 proteins were annotated with new terms developed for Plant-Associated Microbe Gene Ontology (PAMGO. In addition, 67 experiment-determined secreted proteins were annotated with PAMGO terms. Integration of the two data sets resulted in 7,412 proteins (57% being annotated with 1,957 distinct and specific GO terms. Unannotated proteins
ANNOTATED BIBLIOGRAPHY OF RUSSIAN LITERATURE ON GLACIOLOGY IN 2010

Directory of Open Access Journals (Sweden)

V. M. Kotlyakov

2012-01-01

Full Text Available This bibliography includes 294 books and articles published mostly in 2010. Annotated books and articles relate to ten thematic sections of glaciology:1. General problems of glaciology2. Physics and chemistry of ice3. Ice in atmosphere4. Snow cover5. Snow avalanches and glacial mudflows6. Sea ice7. River and lake ice8. Icings and ground ice9. Glaciers and ice sheets10. Paleoglaciology
Assessment of disease named entity recognition on a corpus of annotated sentences.

Science.gov (United States)

Jimeno, Antonio; Jimenez-Ruiz, Ernesto; Lee, Vivian; Gaudan, Sylvain; Berlanga, Rafael; Rebholz-Schuhmann, Dietrich

2008-04-11

In recent years, the recognition of semantic types from the biomedical scientific literature has been focused on named entities like protein and gene names (PGNs) and gene ontology terms (GO terms). Other semantic types like diseases have not received the same level of attention. Different solutions have been proposed to identify disease named entities in the scientific literature. While matching the terminology with language patterns suffers from low recall (e.g., Whatizit) other solutions make use of morpho-syntactic features to better cover the full scope of terminological variability (e.g., MetaMap). Currently, MetaMap that is provided from the National Library of Medicine (NLM) is the state of the art solution for the annotation of concepts from UMLS (Unified Medical Language System) in the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort has been invested so far to generate an annotated dataset that links disease entities in text to disease entries in a database, thesaurus or ontology and that could serve as a gold standard to benchmark text mining solutions. As part of our research work, we have taken a corpus that has been delivered in the past for the identification of associations of genes to diseases based on the UMLS Metathesaurus and we have reprocessed and re-annotated the corpus. We have gathered annotations for disease entities from two curators, analyzed their disagreement (0.51 in the kappa-statistic) and composed a single annotated corpus for public use. Thereafter, three solutions for disease named entity recognition including MetaMap have been applied to the corpus to automatically annotate it with UMLS Metathesaurus concepts. The resulting annotations have been benchmarked to compare their performance. The annotated corpus is publicly available at ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/diseases and can serve as a benchmark to other systems. In addition, we found
Practical guide to electronic resources in the humanities

CERN Document Server

Dubnjakovic, Ana

2010-01-01

From full-text article databases to digitized collections of primary source materials, newly emerging electronic resources have radically impacted how research in the humanities is conducted and discovered. This book, covering high-quality, up-to-date electronic resources for the humanities, is an easy-to-use annotated guide for the librarian, student, and scholar alike. It covers online databases, indexes, archives, and many other critical tools in key humanities disciplines including philosophy, religion, languages and literature, and performing and visual arts. Succinct overviews of key eme
An annotated corpus with nanomedicine and pharmacokinetic parameters

Directory of Open Access Journals (Sweden)

Lewinski NA

2017-10-01

Full Text Available Nastassja A Lewinski,1 Ivan Jimenez,1 Bridget T McInnes2 1Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, VA, 2Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA Abstract: A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided. Keywords: nanotechnology, informatics, natural language processing, text mining, corpora
Annotated chemical patent corpus: a gold standard for text mining.

Directory of Open Access Journals (Sweden)

Saber A Akhondi

Full Text Available Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line break due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org.
An Annotated Bibliography of Isotonic Weight-Training Methods.

Science.gov (United States)

Wysong, John V.

This literature study was conducted to compare and evaluate various types and techniques of weight lifting so that a weight lifting program could be selected or devised for a secondary school. Annotations of 32 research reports, journal articles, and monographs on isotonic strength training are presented. The literature in the first part of the…
Improving Microbial Genome Annotations in an Integrated Database Context

Science.gov (United States)

Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.

2013-01-01

Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/. PMID:23424620
Improving microbial genome annotations in an integrated database context.

Directory of Open Access Journals (Sweden)

I-Min A Chen

Full Text Available Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/.
EST-PAC a web package for EST annotation and protein sequence prediction

Directory of Open Access Journals (Sweden)

Strahm Yvan

2006-10-01

Full Text Available Abstract With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequences tags (EST from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC a web oriented multi-platform software package for expressed sequences tag (EST annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1 searching local or remote biological databases for sequence similarities using Blast services, 2 predicting protein coding sequence from EST data and, 3 annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results and, management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics.
BioCause: Annotating and analysing causality in the biomedical domain.

Science.gov (United States)

Mihăilă, Claudiu; Ohta, Tomoko; Pyysalo, Sampo; Ananiadou, Sophia

2013-01-16

Biomedical corpora annotated with event-level information represent an important resource for domain-specific information extraction (IE) systems. However, bio-event annotation alone cannot cater for all the needs of biologists. Unlike work on relation and event extraction, most of which focusses on specific events and named entities, we aim to build a comprehensive resource, covering all statements of causal association present in discourse. Causality lies at the heart of biomedical knowledge, such as diagnosis, pathology or systems biology, and, thus, automatic causality recognition can greatly reduce the human workload by suggesting possible causal connections and aiding in the curation of pathway models. A biomedical text corpus annotated with such relations is, hence, crucial for developing and evaluating biomedical text mining. We have defined an annotation scheme for enriching biomedical domain corpora with causality relations. This schema has subsequently been used to annotate 851 causal relations to form BioCause, a collection of 19 open-access full-text biomedical journal articles belonging to the subdomain of infectious diseases. These documents have been pre-annotated with named entity and event information in the context of previous shared tasks. We report an inter-annotator agreement rate of over 60% for triggers and of over 80% for arguments using an exact match constraint. These increase significantly using a relaxed match setting. Moreover, we analyse and describe the causality relations in BioCause from various points of view. This information can then be leveraged for the training of automatic causality detection systems. Augmenting named entity and event annotations with information about causal discourse relations could benefit the development of more sophisticated IE systems. These will further influence the development of multiple tasks, such as enabling textual inference to detect entailments, discovering new facts and providing new

Automatic assignment of prokaryotic genes to functional categories using literature profiling.

Directory of Open Access Journals (Sweden)

Raul Torrieri

Full Text Available In the last years, there was an exponential increase in the number of publicly available genomes. Once finished, most genome projects lack financial support to review annotations. A few of these gene annotations are based on a combination of bioinformatics evidence, however, in most cases, annotations are based solely on sequence similarity to a previously known gene, which was most probably annotated in the same way. As a result, a large number of predicted genes remain unassigned to any functional category despite the fact that there is enough evidence in the literature to predict their function. We developed a classifier trained with term-frequency vectors automatically disclosed from text corpora of an ensemble of genes representative of each functional category of the J. Craig Venter Institute Comprehensive Microbial Resource (JCVI-CMR ontology. The classifier achieved up to 84% precision with 68% recall (for confidence≥0.4, F-measure 0.76 (recall and precision equally weighted in an independent set of 2,220 genes, from 13 bacterial species, previously classified by JCVI-CMR into unambiguous categories of its ontology. Finally, the classifier assigned (confidence≥0.7 to functional categories a total of 5,235 out of the ∼24 thousand genes previously in categories "Unknown function" or "Unclassified" for which there is literature in MEDLINE. Two biologists reviewed the literature of 100 of these genes, randomly picket, and assigned them to the same functional categories predicted by the automatic classifier. Our results confirmed the hypothesis that it is possible to confidently assign genes of a real world repository to functional categories, based exclusively on the automatic profiling of its associated literature. The LitProf--Gene Classifier web server is accessible at: www.cebio.org/litprofGC.
Supporting Keyword Search for Image Retrieval with Integration of Probabilistic Annotation

Directory of Open Access Journals (Sweden)

Tie Hua Zhou

2015-05-01

Full Text Available The ever-increasing quantities of digital photo resources are annotated with enriching vocabularies to form semantic annotations. Photo-sharing social networks have boosted the need for efficient and intuitive querying to respond to user requirements in large-scale image collections. In order to help users formulate efficient and effective image retrieval, we present a novel integration of a probabilistic model based on keyword query architecture that models the probability distribution of image annotations: allowing users to obtain satisfactory results from image retrieval via the integration of multiple annotations. We focus on the annotation integration step in order to specify the meaning of each image annotation, thus leading to the most representative annotations of the intent of a keyword search. For this demonstration, we show how a probabilistic model has been integrated to semantic annotations to allow users to intuitively define explicit and precise keyword queries in order to retrieve satisfactory image results distributed in heterogeneous large data sources. Our experiments on SBU (collected by Stony Brook University database show that (i our integrated annotation contains higher quality representatives and semantic matches; and (ii the results indicating annotation integration can indeed improve image search result quality.
MSeqDR mvTool: A mitochondrial DNA Web and API resource for comprehensive variant annotation, universal nomenclature collation, and reference genome conversion.

Science.gov (United States)

Shen, Lishuang; Attimonelli, Marcella; Bai, Renkui; Lott, Marie T; Wallace, Douglas C; Falk, Marni J; Gai, Xiaowu

2018-06-01

Accurate mitochondrial DNA (mtDNA) variant annotation is essential for the clinical diagnosis of diverse human diseases. Substantial challenges to this process include the inconsistency in mtDNA nomenclatures, the existence of multiple reference genomes, and a lack of reference population frequency data. Clinicians need a simple bioinformatics tool that is user-friendly, and bioinformaticians need a powerful informatics resource for programmatic usage. Here, we report the development and functionality of the MSeqDR mtDNA Variant Tool set (mvTool), a one-stop mtDNA variant annotation and analysis Web service. mvTool is built upon the MSeqDR infrastructure (https://mseqdr.org), with contributions of expert curated data from MITOMAP (https://www.mitomap.org) and HmtDB (https://www.hmtdb.uniba.it/hmdb). mvTool supports all mtDNA nomenclatures, converts variants to standard rCRS- and HGVS-based nomenclatures, and annotates novel mtDNA variants. Besides generic annotations from dbNSFP and Variant Effect Predictor (VEP), mvTool provides allele frequencies in more than 47,000 germline mitogenomes, and disease and pathogenicity classifications from MSeqDR, Mitomap, HmtDB and ClinVar (Landrum et al., 2013). mvTools also provides mtDNA somatic variants annotations. "mvTool API" is implemented for programmatic access using inputs in VCF, HGVS, or classical mtDNA variant nomenclatures. The results are reported as hyperlinked html tables, JSON, Excel, and VCF formats. MSeqDR mvTool is freely accessible at https://mseqdr.org/mvtool.php. © 2018 Wiley Periodicals, Inc.
Algae from the arid southwestern United States: an annotated bibliography

Energy Technology Data Exchange (ETDEWEB)

Thomas, W.H.; Gaines, S.R.

1983-06-01

Desert algae are attractive biomass producers for capturing solar energy through photosynthesis of organic matter. They are probably capable of higher yields and efficiencies of light utilization than higher plants, and are already adapted to extremes of sunlight intensity, salinity and temperature such as are found in the desert. This report consists of an annotated bibliography of the literature on algae from the arid southwestern United States. It was prepared in anticipation of efforts to isolate desert algae and study their yields in the laboratory. These steps are necessary prior to setting up outdoor algal culture ponds. Desert areas are attractive for such applications because land, sunlight, and, to some extent, water resources are abundant there. References are sorted by state.
Low-level radioactive waste technology: a selected, annotated bibliography

International Nuclear Information System (INIS)

Fore, C.S.; Carrier, R.F.; Brewster, R.H.; Hyder, L.K.; Barnes, K.A.

1981-10-01

This annotated bibliography of 416 references represents the third in a series to be published by the Hazardous Materials Information Center containing scientific, technical, economic, and regulatory information relevant to low-level radioactive waste technology. The bibliography focuses on disposal site, environmental transport, and waste treatment studies as well as general reviews on the subject. The publication covers both domestic and foreign literature for the period 1951 to 1981. Major chapters selected are Chemical and Physical Aspects; Container Design and Performance; Disposal Site; Environmental Transport; General Studies and Reviews; Geology, Hydrology, and Site Resources; Regulatory and Economic Aspects; Social Aspects; Transportation Technology; Waste Production; and Waste Treatment. Entries in each of the chapters are further classified as a field study, laboratory study, theoretical study, or general overview involving one or more of these research areas
A Linked Data-Based Collaborative Annotation System for Increasing Learning Achievements

Science.gov (United States)

Zarzour, Hafed; Sellami, Mokhtar

2017-01-01

With the emergence of the Web 2.0, collaborative annotation practices have become more mature in the field of learning. In this context, several recent studies have shown the powerful effects of the integration of annotation mechanism in learning process. However, most of these studies provide poor support for semantically structured resources,…
MIPS: analysis and annotation of genome information in 2007.

Science.gov (United States)

Mewes, H W; Dietmann, S; Frishman, D; Gregory, R; Mannhaupt, G; Mayer, K F X; Münsterkötter, M; Ruepp, A; Spannagl, M; Stümpflen, V; Rattei, T

2008-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. To cope with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
The Universal Protein Resource (UniProt): an expanding universe of protein information.

Science.gov (United States)

Wu, Cathy H; Apweiler, Rolf; Bairoch, Amos; Natale, Darren A; Barker, Winona C; Boeckmann, Brigitte; Ferro, Serenella; Gasteiger, Elisabeth; Huang, Hongzhan; Lopez, Rodrigo; Magrane, Michele; Martin, Maria J; Mazumder, Raja; O'Donovan, Claire; Redaschi, Nicole; Suzek, Baris

2006-01-01

The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.
Plant protein annotation in the UniProt Knowledgebase.

Science.gov (United States)

Schneider, Michel; Bairoch, Amos; Wu, Cathy H; Apweiler, Rolf

2005-05-01

The Swiss-Prot, TrEMBL, Protein Information Resource (PIR), and DNA Data Bank of Japan (DDBJ) protein database activities have united to form the Universal Protein Resource (UniProt) Consortium. UniProt presents three database layers: the UniProt Archive, the UniProt Knowledgebase (UniProtKB), and the UniProt Reference Clusters. The UniProtKB consists of two sections: UniProtKB/Swiss-Prot (fully manually curated entries) and UniProtKB/TrEMBL (automated annotation, classification and extensive cross-references). New releases are published fortnightly. A specific Plant Proteome Annotation Program (http://www.expasy.org/sprot/ppap/) was initiated to cope with the increasing amount of data produced by the complete sequencing of plant genomes. Through UniProt, our aim is to provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information that will allow the plant community to fully explore and utilize the wealth of information available for both plant and non-plant model organisms.
Essential Annotation Schema for Ecology (EASE)-A framework supporting the efficient data annotation and faceted navigation in ecology.

Science.gov (United States)

Pfaff, Claas-Thido; Eichenberg, David; Liebergesell, Mario; König-Ries, Birgitta; Wirth, Christian

2017-01-01

Ecology has become a data intensive science over the last decades which often relies on the reuse of data in cross-experimental analyses. However, finding data which qualifies for the reuse in a specific context can be challenging. It requires good quality metadata and annotations as well as efficient search strategies. To date, full text search (often on the metadata only) is the most widely used search strategy although it is known to be inaccurate. Faceted navigation is providing a filter mechanism which is based on fine granular metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full text search creates a system of filters which allows to refine and improve the results towards more relevance. We developed a framework for the efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas which originate from widely accepted metadata standards, textbooks, scientific literature, and vocabularies as well as from expert knowledge contributed by researchers from ecology and adjacent disciplines.
Essential Annotation Schema for Ecology (EASE-A framework supporting the efficient data annotation and faceted navigation in ecology.

Directory of Open Access Journals (Sweden)

Claas-Thido Pfaff

Full Text Available Ecology has become a data intensive science over the last decades which often relies on the reuse of data in cross-experimental analyses. However, finding data which qualifies for the reuse in a specific context can be challenging. It requires good quality metadata and annotations as well as efficient search strategies. To date, full text search (often on the metadata only is the most widely used search strategy although it is known to be inaccurate. Faceted navigation is providing a filter mechanism which is based on fine granular metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full text search creates a system of filters which allows to refine and improve the results towards more relevance. We developed a framework for the efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas which originate from widely accepted metadata standards, textbooks, scientific literature, and vocabularies as well as from expert knowledge contributed by researchers from ecology and adjacent disciplines.
Estimating the annotation error rate of curated GO database sequence annotations

Directory of Open Access Journals (Sweden)

Brown Alfred L

2007-05-01

Full Text Available Abstract Background Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO sequence database (GOSeqLite. This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST matched sequences. Results We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006 at between 28% and 30%. Annotations made without use of sequence similarity based methods (non-ISS had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS had an estimated error rate of 49%. Conclusion While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high quality source of information.
ACID: annotation of cassette and integron data

Directory of Open Access Journals (Sweden)

Stokes Harold W

2009-04-01

Full Text Available Abstract Background Although integrons and their associated gene cassettes are present in ~10% of bacteria and can represent up to 3% of the genome in which they are found, very few have been properly identified and annotated in public databases. These genetic elements have been overlooked in comparison to other vectors that facilitate lateral gene transfer between microorganisms. Description By automating the identification of integron integrase genes and of the non-coding cassette-associated attC recombination sites, we were able to assemble a database containing all publicly available sequence information regarding these genetic elements. Specialists manually curated the database and this information was used to improve the automated detection and annotation of integrons and their encoded gene cassettes. ACID (annotation of cassette and integron data can be searched using a range of queries and the data can be downloaded in a number of formats. Users can readily annotate their own data and integrate it into ACID using the tools provided. Conclusion ACID is a community resource providing easy access to annotations of integrons and making tools available to detect them in novel sequence data. ACID also hosts a forum to prompt integron-related discussion, which can hopefully lead to a more universal definition of this genetic element.
Annotation of the protein coding regions of the equine genome

DEFF Research Database (Denmark)

Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.

2015-01-01

Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced m...... and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross...
Book Reviews, Annotation, and Web Technology.

Science.gov (United States)

Schulze, Patricia

From reading texts to annotating web pages, grade 6-8 students rely on group cooperation and individual reading and writing skills in this research project that spans six 50-minute lessons. Student objectives for this project are that they will: read, discuss, and keep a journal on a book in literature circles; understand the elements of and…
H2DB: a heritability database across multiple species by annotating trait-associated genomic loci.

Science.gov (United States)

Kaminuma, Eli; Fujisawa, Takatomo; Tanizawa, Yasuhiro; Sakamoto, Naoko; Kurata, Nori; Shimizu, Tokurou; Nakamura, Yasukazu

2013-01-01

H2DB (http://tga.nig.ac.jp/h2db/), an annotation database of genetic heritability estimates for humans and other species, has been developed as a knowledge database to connect trait-associated genomic loci. Heritability estimates have been investigated for individual species, particularly in human twin studies and plant/animal breeding studies. However, there appears to be no comprehensive heritability database for both humans and other species. Here, we introduce an annotation database for genetic heritabilities of various species that was annotated by manually curating online public resources in PUBMED abstracts and journal contents. The proposed heritability database contains attribute information for trait descriptions, experimental conditions, trait-associated genomic loci and broad- and narrow-sense heritability specifications. Annotated trait-associated genomic loci, for which most are single-nucleotide polymorphisms derived from genome-wide association studies, may be valuable resources for experimental scientists. In addition, we assigned phenotype ontologies to the annotated traits for the purposes of discussing heritability distributions based on phenotypic classifications.
Essential Annotation Schema for Ecology (EASE)—A framework supporting the efficient data annotation and faceted navigation in ecology

Science.gov (United States)

Eichenberg, David; Liebergesell, Mario; König-Ries, Birgitta; Wirth, Christian

2017-01-01

Ecology has become a data intensive science over the last decades which often relies on the reuse of data in cross-experimental analyses. However, finding data which qualifies for the reuse in a specific context can be challenging. It requires good quality metadata and annotations as well as efficient search strategies. To date, full text search (often on the metadata only) is the most widely used search strategy although it is known to be inaccurate. Faceted navigation is providing a filter mechanism which is based on fine granular metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full text search creates a system of filters which allows to refine and improve the results towards more relevance. We developed a framework for the efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas which originate from widely accepted metadata standards, textbooks, scientific literature, and vocabularies as well as from expert knowledge contributed by researchers from ecology and adjacent disciplines. PMID:29023519
Using literature and data to learn Bayesian networks as clinical models of ovarian tumors.

Science.gov (United States)

Antal, Peter; Fannes, Geert; Timmerman, Dirk; Moreau, Yves; De Moor, Bart

2004-03-01

Thanks to its increasing availability, electronic literature has become a potential source of information for the development of complex Bayesian networks (BN), when human expertise is missing or data is scarce or contains much noise. This opportunity raises the question of how to integrate information from free-text resources with statistical data in learning Bayesian networks. Firstly, we report on the collection of prior information resources in the ovarian cancer domain, which includes "kernel" annotations of the domain variables. We introduce methods based on the annotations and literature to derive informative pairwise dependency measures, which are derived from the statistical cooccurrence of the names of the variables, from the similarity of the "kernel" descriptions of the variables and from a combined method. We perform wide-scale evaluation of these text-based dependency scores against an expert reference and against data scores (the mutual information (MI) and a Bayesian score). Next, we transform the text-based dependency measures into informative text-based priors for Bayesian network structures. Finally, we report the benefit of such informative text-based priors on the performance of a Bayesian network for the classification of ovarian tumors from clinical data.
Resource Disambiguator for the Web: Extracting Biomedical Resources and Their Citations from the Scientific Literature.

Directory of Open Access Journals (Sweden)

Ibrahim Burak Ozyurt

Full Text Available The NIF Registry developed and maintained by the Neuroscience Information Framework is a cooperative project aimed at cataloging research resources, e.g., software tools, databases and tissue banks, funded largely by governments and available as tools to research scientists. Although originally conceived for neuroscience, the NIF Registry has over the years broadened in the scope to include research resources of general relevance to biomedical research. The current number of research resources listed by the Registry numbers over 13K. The broadening in scope to biomedical science led us to re-christen the NIF Registry platform as SciCrunch. The NIF/SciCrunch Registry has been cataloging the resource landscape since 2006; as such, it serves as a valuable dataset for tracking the breadth, fate and utilization of these resources. Our experience shows research resources like databases are dynamic objects, that can change location and scope over time. Although each record is entered manually and human-curated, the current size of the registry requires tools that can aid in curation efforts to keep content up to date, including when and where such resources are used. To address this challenge, we have developed an open source tool suite, collectively termed RDW: Resource Disambiguator for the (Web. RDW is designed to help in the upkeep and curation of the registry as well as in enhancing the content of the registry by automated extraction of resource candidates from the literature. The RDW toolkit includes a URL extractor from papers, resource candidate screen, resource URL change tracker, resource content change tracker. Curators access these tools via a web based user interface. Several strategies are used to optimize these tools, including supervised and unsupervised learning algorithms as well as statistical text analysis. The complete tool suite is used to enhance and maintain the resource registry as well as track the usage of individual
Plant Protein Annotation in the UniProt Knowledgebase1

Science.gov (United States)

Schneider, Michel; Bairoch, Amos; Wu, Cathy H.; Apweiler, Rolf

2005-01-01

The Swiss-Prot, TrEMBL, Protein Information Resource (PIR), and DNA Data Bank of Japan (DDBJ) protein database activities have united to form the Universal Protein Resource (UniProt) Consortium. UniProt presents three database layers: the UniProt Archive, the UniProt Knowledgebase (UniProtKB), and the UniProt Reference Clusters. The UniProtKB consists of two sections: UniProtKB/Swiss-Prot (fully manually curated entries) and UniProtKB/TrEMBL (automated annotation, classification and extensive cross-references). New releases are published fortnightly. A specific Plant Proteome Annotation Program (http://www.expasy.org/sprot/ppap/) was initiated to cope with the increasing amount of data produced by the complete sequencing of plant genomes. Through UniProt, our aim is to provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information that will allow the plant community to fully explore and utilize the wealth of information available for both plant and nonplant model organisms. PMID:15888679

annot8r: GO, EC and KEGG annotation of EST datasets

Directory of Open Access Journals (Sweden)

Schmid Ralf

2008-04-01

Full Text Available Abstract Background The expressed sequence tag (EST methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. Results annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO, Enzyme Commission (EC and Kyoto Encyclopaedia of Genes and Genomes (KEGG annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and also in a relational postgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. Conclusion annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non
NoGOA: predicting noisy GO annotations using evidences and sparse representation.

Science.gov (United States)

Yu, Guoxian; Lu, Chang; Wang, Jun

2017-07-21

Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to resources limitation, only a small portion of annotations are manually checked by curators, and the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently report that there are still considerable noisy (or incorrect) annotations. Given the wide application of annotations, however, how to identify noisy annotations is an important but yet seldom studied open problem. We introduce a novel approach called NoGOA to predict noisy annotations. NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. Secondly, it preliminarily predicts noisy annotations of a gene based on aggregated votes from semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived on different periods, and then weights entries of the association matrix via estimated ratios and propagates weights to ancestors of direct annotations using GO hierarchy. Finally, it integrates evidence-weighted association matrix and aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. Taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and removing noisy annotations improves the performance of gene function prediction. The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA .
Systemic Planning: An Annotated Bibliography and Literature Guide. Exchange Bibliography No. 91.

Science.gov (United States)

Catanese, Anthony James

Systemic planning is an operational approach to using scientific rigor and qualitative judgment in a complementary manner. It integrates rigorous techniques and methods from systems analysis, cybernetics, decision theory, and work programing. The annotated reference sources in this bibliography include those works that have been most influential…
Persuasion: Attitude/Behavior Change. A Selected, Annotated Bibliography.

Science.gov (United States)

Benoit, William L.

Designed for teachers, students and researchers of the psychological dimensions of attitude and behavior change, this annotated bibliography lists books, bibliographies and articles on the subject ranging from general introductions and surveys through specific research studies, and from theoretical position essays to literature reviews. The 42…
A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

OpenAIRE

Hamed Hassanzadeh; MohammadReza Keyvanpour

2011-01-01

The Semantic Web is an extension of the current web in which information is given well-defined meaning. The perspective of Semantic Web is to promote the quality and intelligence of the current web by changing its contents into machine understandable form. Therefore, semantic level information is one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. There are many obstacles against the Semantic Annotation, such as ...
A robust data-driven approach for gene ontology annotation.

Science.gov (United States)

Li, Yanpeng; Yu, Hong

2014-01-01

Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation became a major bottleneck of database curation. BioCreative IV GO annotation task aims to evaluate the performance of system that automatically assigns GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, RDE-based method achieved over 20% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both the two tasks. © The Author(s) 2014. Published by Oxford University Press.
High-performance web services for querying gene and variant annotation.

Science.gov (United States)

Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus; Tsueng, Ginger; Juchler, Moritz; Gopal, Nikhil; Stupp, Gregory S; Putman, Timothy E; Ainscough, Benjamin J; Griffith, Obi L; Torkamani, Ali; Whetzel, Patricia L; Mungall, Christopher J; Mooney, Sean D; Su, Andrew I; Wu, Chunlei

2016-05-06

Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times permonth. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info . Both are offered free of charge to the research community.
Low-level radioactive waste technology: a selected, annotated bibliography. [416 references

Energy Technology Data Exchange (ETDEWEB)

Fore, C.S.; Carrier, R.F.; Brewster, R.H.; Hyder, L.K.; Barnes, K.A.

1981-10-01

This annotated bibliography of 416 references represents the third in a series to be published by the Hazardous Materials Information Center containing scientific, technical, economic, and regulatory information relevant to low-level radioactive waste technology. The bibliography focuses on disposal site, environmental transport, and waste treatment studies as well as general reviews on the subject. The publication covers both domestic and foreign literature for the period 1951 to 1981. Major chapters selected are Chemical and Physical Aspects; Container Design and Performance; Disposal Site; Environmental Transport; General Studies and Reviews; Geology, Hydrology, and Site Resources; Regulatory and Economic Aspects; Social Aspects; Transportation Technology; Waste Production; and Waste Treatment. Entries in each of the chapters are further classified as a field study, laboratory study, theoretical study, or general overview involving one or more of these research areas.
Annotated Bibliography of EDGE2D Use

Energy Technology Data Exchange (ETDEWEB)

J.D. Strachan and G. Corrigan

2005-06-24

This annotated bibliography is intended to help EDGE2D users, and particularly new users, find existing published literature that has used EDGE2D. Our idea is that a person can find existing studies which may relate to his intended use, as well as gain ideas about other possible applications by scanning the attached tables.
Annotated Bibliography of EDGE2D Use

International Nuclear Information System (INIS)

Strachan, J.D.; Corrigan, G.

2005-01-01

This annotated bibliography is intended to help EDGE2D users, and particularly new users, find existing published literature that has used EDGE2D. Our idea is that a person can find existing studies which may relate to his intended use, as well as gain ideas about other possible applications by scanning the attached tables
Ubiquitous Annotation Systems

DEFF Research Database (Denmark)

Hansen, Frank Allan

2006-01-01

Ubiquitous annotation systems allow users to annotate physical places, objects, and persons with digital information. Especially in the field of location based information systems much work has been done to implement adaptive and context-aware systems, but few efforts have focused on the general...... requirements for linking information to objects in both physical and digital space. This paper surveys annotation techniques from open hypermedia systems, Web based annotation systems, and mobile and augmented reality systems to illustrate different approaches to four central challenges ubiquitous annotation...... systems have to deal with: anchoring, structuring, presentation, and authoring. Through a number of examples each challenge is discussed and HyCon, a context-aware hypermedia framework developed at the University of Aarhus, Denmark, is used to illustrate an integrated approach to ubiquitous annotations...
Facilitating Full-text Access to Biomedical Literature Using Open Access Resources.

Science.gov (United States)

Kang, Hongyu; Hou, Zhen; Li, Jiao

2015-01-01

Open access (OA) resources and local libraries often have their own literature databases, especially in the field of biomedicine. We have developed a method of linking a local library to a biomedical OA resource facilitating researchers' full-text article access. The method uses a model based on vector space to measure similarities between two articles in local library and OA resources. The method achieved an F-score of 99.61%. This method of article linkage and mapping between local library and OA resources is available for use. Through this work, we have improved the full-text access of the biomedical OA resources.
Stakeholder Involvement in Decision Making: A Short Guide to Issues, Approaches and Resources

International Nuclear Information System (INIS)

2015-01-01

Radioactive waste management is embedded in broader societal issues such as the environment, risk management, energy, health policy and sustainability. In all these fields, there is an increasing demand for public involvement and engagement. This 2015 update of Stakeholder Involvement Techniques: Short Guide and Annotated Bibliography, assists practitioners and non-specialists by outlining the steps and issues associated with stakeholder involvement in decision making and by facilitating access to useful online resources (handbooks, toolboxes and case studies). The updated guide has been considerably enriched with experiences since 2004 and includes extensive references to the literature. It is published alongside the release of an online annotated bibliography that will be updated regularly. (authors)
Gray literature: An important resource in systematic reviews.

Science.gov (United States)

Paez, Arsenio

2017-08-01

Systematic reviews aide the analysis and dissemination of evidence, using rigorous and transparent methods to generate empirically attained answers to focused research questions. Identifying all evidence relevant to the research questions is an essential component, and challenge, of systematic reviews. Gray literature, or evidence not published in commercial publications, can make important contributions to a systematic review. Gray literature can include academic papers, including theses and dissertations, research and committee reports, government reports, conference papers, and ongoing research, among others. It may provide data not found within commercially published literature, providing an important forum for disseminating studies with null or negative results that might not otherwise be disseminated. Gray literature may thusly reduce publication bias, increase reviews' comprehensiveness and timeliness, and foster a balanced picture of available evidence. Gray literature's diverse formats and audiences can present a significant challenge in a systematic search for evidence. However, the benefits of including gray literature may far outweigh the cost in time and resource needed to search for it, and it is important for it to be included in a systematic review or review of evidence. A carefully thought out gray literature search strategy may be an invaluable component of a systematic review. This narrative review provides guidance about the benefits of including gray literature in a systematic review, and sources for searching through gray literature. An illustrative example of a search for evidence within gray literature sources is presented to highlight the potential contributions of such a search to a systematic review. Benefits and challenges of gray literature search methods are discussed, and recommendations made. © 2017 Chinese Cochrane Center, West China Hospital of Sichuan University and John Wiley & Sons Australia, Ltd.
Grey literature: An important resource in systematic reviews.

Science.gov (United States)

Paez, Arsenio

2017-12-21

Systematic reviews aid the analysis and dissemination of evidence, using rigorous and transparent methods to generate empirically attained answers to focused research questions. Identifying all evidence relevant to the research questions is an essential component, and challenge, of systematic reviews. Grey literature, or evidence not published in commercial publications, can make important contributions to a systematic review. Grey literature can include academic papers, including theses and dissertations, research and committee reports, government reports, conference papers, and ongoing research, among others. It may provide data not found within commercially published literature, providing an important forum for disseminating studies with null or negative results that might not otherwise be disseminated. Grey literature may thusly reduce publication bias, increase reviews' comprehensiveness and timeliness and foster a balanced picture of available evidence. Grey literature's diverse formats and audiences can present a significant challenge in a systematic search for evidence. However, the benefits of including grey literature may far outweigh the cost in time and resource needed to search for it, and it is important for it to be included in a systematic review or review of evidence. A carefully thought out grey literature search strategy may be an invaluable component of a systematic review. This narrative review provides guidance about the benefits of including grey literature in a systematic review, and sources for searching through grey literature. An illustrative example of a search for evidence within grey literature sources is presented to highlight the potential contributions of such a search to a systematic review. Benefits and challenges of grey literature search methods are discussed, and recommendations made. © 2017 Chinese Cochrane Center, West China Hospital of Sichuan University and John Wiley & Sons Australia, Ltd.
NCBI disease corpus: a resource for disease name recognition and concept normalization.

Science.gov (United States)

Doğan, Rezarta Islamaj; Leaman, Robert; Lu, Zhiyong

2014-02-01

Information encoded in natural language in biomedical literature publications is only useful if efficient and reliable ways of accessing and analyzing that information are available. Natural language processing and text mining tools are therefore essential for extracting valuable information, however, the development of powerful, highly effective tools to automatically detect central biomedical concepts such as diseases is conditional on the availability of annotated corpora. This paper presents the disease name and concept annotations of the NCBI disease corpus, a collection of 793 PubMed abstracts fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community. Each PubMed abstract was manually annotated by two annotators with disease mentions and their corresponding concepts in Medical Subject Headings (MeSH®) or Online Mendelian Inheritance in Man (OMIM®). Manual curation was performed using PubTator, which allowed the use of pre-annotations as a pre-step to manual annotations. Fourteen annotators were randomly paired and differing annotations were discussed for reaching a consensus in two annotation phases. In this setting, a high inter-annotator agreement was observed. Finally, all results were checked against annotations of the rest of the corpus to assure corpus-wide consistency. The public release of the NCBI disease corpus contains 6892 disease mentions, which are mapped to 790 unique disease concepts. Of these, 88% link to a MeSH identifier, while the rest contain an OMIM identifier. We were able to link 91% of the mentions to a single disease concept, while the rest are described as a combination of concepts. In order to help researchers use the corpus to design and test disease identification methods, we have prepared the corpus as training, testing and development sets. To demonstrate its utility, we conducted a benchmarking experiment where we compared three different
GoGene: gene annotation in the fast lane.

Science.gov (United States)

Plake, Conrad; Royer, Loic; Winnenburg, Rainer; Hakenberg, Jörg; Schroeder, Michael

2009-07-01

High-throughput screens such as microarrays and RNAi screens produce huge amounts of data. They typically result in hundreds of genes, which are often further explored and clustered via enriched GeneOntology terms. The strength of such analyses is that they build on high-quality manual annotations provided with the GeneOntology. However, the weakness is that annotations are restricted to process, function and location and that they do not cover all known genes in model organisms. GoGene addresses this weakness by complementing high-quality manual annotation with high-throughput text mining extracting co-occurrences of genes and ontology terms from literature. GoGene contains over 4,000,000 associations between genes and gene-related terms for 10 model organisms extracted from more than 18,000,000 PubMed entries. It does not cover only process, function and location of genes, but also biomedical categories such as diseases, compounds, techniques and mutations. By bringing it all together, GoGene provides the most recent and most complete facts about genes and can rank them according to novelty and importance. GoGene accepts keywords, gene lists, gene sequences and protein sequences as input and supports search for genes in PubMed, EntrezGene and via BLAST. Since all associations of genes to terms are supported by evidence in the literature, the results are transparent and can be verified by the user. GoGene is available at http://gopubmed.org/gogene.
Impingement: an annotated bibliography

International Nuclear Information System (INIS)

Uziel, M.S.; Hannon, E.H.

1979-04-01

This bibliography of 655 annotated references on impingement of aquatic organisms at intake structures of thermal-power-plant cooling systems was compiled from the published and unpublished literature. The bibliography includes references from 1928 to 1978 on impingement monitoring programs; impingement impact assessment; applicable law; location and design of intake structures, screens, louvers, and other barriers; fish behavior and swim speed as related to impingement susceptibility; and the effects of light, sound, bubbles, currents, and temperature on fish behavior. References are arranged alphabetically by author or corporate author. Indexes are provided for author, keywords, subject category, geographic location, taxon, and title
Grey Guide: A Community Driven Open Resource Project in Grey Literature

OpenAIRE

Biagioni, Stefania; Giannini, Silvia

2017-01-01

In December 2013, the GreyGuide Project was formerly launched as an online forum and repository of good practice in grey literature. The GreyGuide manages Open Source Repositories and provides a unique resource in the field of grey literature that is long awaited and which responds to the information needs of a diverse, international grey literature community. As GreyNet's web access Portal, the GreyGuide now provides a wealth of content that was previously either confined to web pages or was...
Bats and wind energy: a literature synthesis and annotated bibliography

Science.gov (United States)

Ellison, Laura E.

2012-01-01

Turbines have been used to harness energy from wind for hundreds of years. However, with growing concerns about climate change, wind energy has only recently entered the mainstream of global electricity production. Since early on in the development of wind-energy production, concerns have arisen about the potential impacts of turbines to wildlife; these concerns have especially focused on the mortality of birds. Despite recent improvements to turbines that have resulted in reduced mortality of birds, there is clear evidence that bat mortality at wind turbines is of far greater conservation concern. Bats of certain species are dying by the thousands at turbines across North America, and the species consistently affected tend to be those that rely on trees as roosts and most migrate long distances. Turbine-related bat mortalities are now affecting nearly a quarter of all bat species occurring in the United States and Canada. Most documented bat mortality at wind-energy facilities has occurred in late summer and early fall and has involved tree bats, with hoary bats (Lasiurus cinereus) being the most prevalent among fatalities. This literature synthesis and annotated bibliography focuses on refereed journal publications and theses about bats and wind-energy development in North America (United States and Canada). Thirty-six publications and eight theses were found, and their key findings were summarized. These publications date from 1996 through 2011, with the bulk of publications appearing from 2007 to present, reflecting the relatively recent conservation concerns about bats and wind energy. The idea for this Open-File Report formed while organizing a joint U.S. Fish and Wildlife Service/U.S. Geological Survey "Bats and Wind Energy Workshop," on January 25-26, 2012. The purposes of the workshop were to develop a list of research priorities to support decision making concerning bats with respect to siting and operations of wind-energy facilities across the United

Workshops for the Handicapped; An Annotated Bibliography - No. 6.

Science.gov (United States)

Perkins, Dorothy C., Comp.; And Others

An annotated bibliography of workshops for the handicapped covers the literature on work programs for the period July, 1968 through June, 1969. One hundred and fifty four publications were reviewed; the number of articles on administration, management, and planning of facilities and programs has increased since the last edition. (Author/RJ)
Entrainment: an annotated bibliography. Interim report

International Nuclear Information System (INIS)

Carrier, R.F.; Hannon, E.H.

1979-04-01

The 604 annotated references in this bibliography on the effects of pumped entrainment of aquatic organisms through the cooling systems of thermal power plants were compiled from published and unpublished literature and cover the years 1947 through 1977. References to published literature were obtained by searching large-scale commercial data bases, ORNL in-house-generated data bases, relevant journals, and periodical bibliographies. The unpublished literature is a compilation of Sections 316(a) and 316(b) demonstrations, environmental impact statements, and environmental reports prepared by the utilities in compliance with Federal Water Pollution Control Administration regulations. The bibliography includes references on monitoring studies at power plant sites, laboratory studies of physical and biological effects on entrained organisms, engineering strategies for the mitigation of entrainment effects, and selected theoretical studies concerned with the methodology for determining entrainment effects
3D facial landmarks: Inter-operator variability of manual annotation

DEFF Research Database (Denmark)

Fagertun, Jens; Harder, Stine; Rosengren, Anders

2014-01-01

Background Manual annotation of landmarks is a known source of variance, which exist in all fields of medical imaging, influencing the accuracy and interpretation of the results. However, the variability of human facial landmarks is only sparsely addressed in the current literature as opposed to ...
Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop.

Science.gov (United States)

Brister, James Rodney; Bao, Yiming; Kuiken, Carla; Lefkowitz, Elliot J; Le Mercier, Philippe; Leplae, Raphael; Madupu, Ramana; Scheuermann, Richard H; Schobel, Seth; Seto, Donald; Shrivastava, Susmita; Sterk, Peter; Zeng, Qiandong; Klimke, William; Tatusova, Tatiana

2010-10-01

Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world's biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.
Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop

Directory of Open Access Journals (Sweden)

Qiandong Zeng

2010-10-01

Full Text Available Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world’s biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.
Annotated Bibliography on School Finance: Policy and Political Issues; Federal Government; State Issues; Non-Public Schools; Accountability.

Science.gov (United States)

Gipson, Joella

Limited to periodical literature, this annotated bibliography on school finance contains 81 references grouped in 5 categories: (1) policy and politica issues, (2) federal government, (3) state issues, (4) aid to nonpublic schools, and (5) accountability. Following the bibliographic citations, annotations range from 4 to 15 lines and conclude by…
Grey Literature Searching for Health Sciences Systematic Reviews: A Prospective Study of Time Spent and Resources Utilized.

Science.gov (United States)

Saleh, Ahlam A; Ratajeski, Melissa A; Bertolet, Marnie

To identify estimates of time taken to search grey literature in support of health sciences systematic reviews and to identify searcher or systematic review characteristics that may impact resource selection or time spent searching. A survey was electronically distributed to searchers embarking on a new systematic review. Characteristics of the searcher and systematic review were collected along with time spent searching and what resources were searched. Time and resources were tabulated and resources were categorized as grey or non-grey. Data was analyzed using Kruskal-Wallis tests. Out of 81 original respondents, 21% followed through with completion of the surveys in their entirety. The median time spent searching all resources was 471 minutes, and of those a median of 85 minutes were spent searching grey literature. The median number of resources used in a systematic review search was four and the median number of grey literature sources searched was two. The amount of time spent searching was influenced by whether the systematic review was grant funded. Additionally, the number of resources searched was impacted by institution type and whether systematic review training was received. This study characterized the amount of time for conducting systematic review searches including searching the grey literature, in addition to the number and types of resources used. This may aid searchers in planning their time, along with providing benchmark information for future studies. This paper contributes by quantifying current grey literature search patterns and associating them with searcher and review characteristics. Further discussion and research into the search approach for grey literature in support of systematic reviews is encouraged.
Semi-Semantic Annotation: A guideline for the URDU.KON-TB treebank POS annotation

Directory of Open Access Journals (Sweden)

Qaiser ABBAS

2016-12-01

Full Text Available This work elaborates the semi-semantic part of speech annotation guidelines for the URDU.KON-TB treebank: an annotated corpus. A hierarchical annotation scheme was designed to label the part of speech and then applied on the corpus. This raw corpus was collected from the Urdu Wikipedia and the Jang newspaper and then annotated with the proposed semi-semantic part of speech labels. The corpus contains text of local & international news, social stories, sports, culture, finance, religion, traveling, etc. This exercise finally contributed a part of speech annotation to the URDU.KON-TB treebank. Twenty-two main part of speech categories are divided into subcategories, which conclude the morphological, and semantical information encoded in it. This article reports the annotation guidelines in major; however, it also briefs the development of the URDU.KON-TB treebank, which includes the raw corpus collection, designing & employment of annotation scheme and finally, its statistical evaluation and results. The guidelines presented as follows, will be useful for linguistic community to annotate the sentences not only for the national language Urdu but for the other indigenous languages like Punjab, Sindhi, Pashto, etc., as well.
Chado controller: advanced annotation management with a community annotation system.

Science.gov (United States)

Guignon, Valentin; Droc, Gaëtan; Alaux, Michael; Baurens, Franc-Christophe; Garsmeur, Olivier; Poiron, Claire; Carver, Tim; Rouard, Mathieu; Bocs, Stéphanie

2012-04-01

We developed a controller that is compliant with the Chado database schema, GBrowse and genome annotation-editing tools such as Artemis and Apollo. It enables the management of public and private data, monitors manual annotation (with controlled vocabularies, structural and functional annotation controls) and stores versions of annotation for all modified features. The Chado controller uses PostgreSQL and Perl. The Chado Controller package is available for download at http://www.gnpannot.org/content/chado-controller and runs on any Unix-like operating system, and documentation is available at http://www.gnpannot.org/content/chado-controller-doc The system can be tested using the GNPAnnot Sandbox at http://www.gnpannot.org/content/gnpannot-sandbox-form valentin.guignon@cirad.fr; stephanie.sidibe-bocs@cirad.fr Supplementary data are available at Bioinformatics online.
Discourse Analysis in Stylistics and Literature Instruction.

Science.gov (United States)

Short, Mick

1990-01-01

A review of research regarding discourse analysis in stylistics and literature instruction covers studies of text, systematic analysis, meaning, style, literature pedagogy, and applied linguistics. A 10-citation annotated bibliography and a larger unannotated bibliography are included. (CB)
Annotated bibliography of highly ionized atoms of importance to plasmas

International Nuclear Information System (INIS)

Schmieder, R.W.

1975-04-01

A bibliography is presented of the literature on highly ionized atoms which have relevance to plasmas. The bibliography is annotated with keywords, and indexed by subjects and authors. It should be of greatest use to researchers working on the problems of impurity cooling and diagnostics of CTR plasmas. (U.S.)
Use of Annotations for Component and Framework Interoperability

Science.gov (United States)

David, O.; Lloyd, W.; Carlson, J.; Leavesley, G. H.; Geter, F.

2009-12-01

western United States at the USDA NRCS National Water and Climate Center. PRMS is a component based modular precipitation-runoff model developed to evaluate the impacts of various combinations of precipitation, climate, and land use on streamflow and general basin hydrology. The new OMS 3.0 PRMS model source code is more concise and flexible as a result of using the new framework’s annotation based approach. The fully annotated components are now providing information directly for (i) model assembly and building, (ii) dataflow analysis for implicit multithreading, (iii) automated and comprehensive model documentation of component dependencies, physical data properties, (iv) automated model and component testing, and (v) automated audit-traceability to account for all model resources leading to a particular simulation result. Experience to date has demonstrated the multi-purpose value of using annotations. Annotations are also a feasible and practical method to enable interoperability among models and modeling frameworks. As a prototype example, model code annotations were used to generate binding and mediation code to allow the use of OMS 3.0 model components within the OpenMI context.
HUMAN RESOURCE MANAGEMENT AND CORPORATE SOCIAL RESPONSIBILITY: A SYSTEMATIC LITERATURE REVIEW

OpenAIRE

FERREIRA, ELİZABETH REAL DE OLIVEIRA – PEDRO; SAUR, IRİNA; AMARAL,

2013-01-01

We perform a systematic literature review on academic papers in Human Resources Management and Corporate Social Responsibility in ISI Current Contents. Based on 117 academic papers from 2001 to date, we perform content analysis in a grounded-theory methodological approach and map the field of Human Resources Management and Corporate Social Responsibility, identifying main schools of thought (invisible colleges) and main players. We see a tendency to increase publications from 2008 onwards. We...
Grass buffers for playas in agricultural landscapes: An annotated bibliography

Science.gov (United States)

Melcher, Cynthia P.; Skagen, Susan K.

2005-01-01

This bibliography and associated literature synthesis (Melcher and Skagen, 2005) was developed for the Playa Lakes Joint Venture (PLJV). The PLJV sought compilation and annotation of the literature on grass buffers for protecting playas from runoff containing sediments, nutrients, pesticides, and other contaminants. In addition, PLJV sought information regarding the extent to which buffers may attenuate the precipitation runoff needed to fill playas, and avian use of buffers. We emphasize grass buffers, but we also provide information on other buffer types.
Natural-Annotation-based Unsupervised Construction of Korean-Chinese Domain Dictionary

Science.gov (United States)

Liu, Wuying; Wang, Lin

2018-03-01

The large-scale bilingual parallel resource is significant to statistical learning and deep learning in natural language processing. This paper addresses the automatic construction issue of the Korean-Chinese domain dictionary, and presents a novel unsupervised construction method based on the natural annotation in the raw corpus. We firstly extract all Korean-Chinese word pairs from Korean texts according to natural annotations, secondly transform the traditional Chinese characters into the simplified ones, and finally distill out a bilingual domain dictionary after retrieving the simplified Chinese words in an extra Chinese domain dictionary. The experimental results show that our method can automatically build multiple Korean-Chinese domain dictionaries efficiently.
Model and Interoperability using Meta Data Annotations

Science.gov (United States)

David, O.

2011-12-01

Software frameworks and architectures are in need for meta data to efficiently support model integration. Modelers have to know the context of a model, often stepping into modeling semantics and auxiliary information usually not provided in a concise structure and universal format, consumable by a range of (modeling) tools. XML often seems the obvious solution for capturing meta data, but its wide adoption to facilitate model interoperability is limited by XML schema fragmentation, complexity, and verbosity outside of a data-automation process. Ontologies seem to overcome those shortcomings, however the practical significance of their use remains to be demonstrated. OMS version 3 took a different approach for meta data representation. The fundamental building block of a modular model in OMS is a software component representing a single physical process, calibration method, or data access approach. Here, programing language features known as Annotations or Attributes were adopted. Within other (non-modeling) frameworks it has been observed that annotations lead to cleaner and leaner application code. Framework-supported model integration, traditionally accomplished using Application Programming Interfaces (API) calls is now achieved using descriptive code annotations. Fully annotated components for various hydrological and Ag-system models now provide information directly for (i) model assembly and building, (ii) data flow analysis for implicit multi-threading or visualization, (iii) automated and comprehensive model documentation of component dependencies, physical data properties, (iv) automated model and component testing, calibration, and optimization, and (v) automated audit-traceability to account for all model resources leading to a particular simulation result. Such a non-invasive methodology leads to models and modeling components with only minimal dependencies on the modeling framework but a strong reference to its originating code. Since models and
tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles.

Science.gov (United States)

Cejuela, Juan Miguel; McQuilton, Peter; Ponting, Laura; Marygold, Steven J; Stefancsik, Raymund; Millburn, Gillian H; Rost, Burkhard

2014-01-01

The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the 'tagtog' system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation. DATABASE URL: www.tagtog.net, www.flybase.org.
Annotated bibliography of economic literature on wetlands

Science.gov (United States)

Douglas, Aaron J.

1989-01-01

This bibliography is intended for the use of wetlands scientists, policy analysts, and natural resource professionals who have little acquaintance with natural resource economics, and natural resource professionals who have some background in economic analysis and wish to sharpen their appreciation of the specialized methods used to value the nonmarket uses of wetland resources. It is not intended to serve as a first primer of natural resource economics. The purpose of including this discussion is to introduce the reader to the fact that specialized language and analytic techniques are used in this field, and that summary discussion of these techniques are not available in introductory or intermediate level economics textbooks. A key difficulty in economic analysis lies in the need that economists have to express common-sense terms such as "demand" or "supply" in a precise way; this facilitates the interpretation of data and is a powerful aid in making internally consistent, policy analysis. Natural resource economists would like to find a consistent, intuitively plausible measure of the social benefits conferred by some good or service. The most common fallacy noneconomists make in this field is to use expenditures as a measure of well-being or benefits. This measure is defective; expenditures may rise, while benefits fall. The following simple example should clarify the issue. Suppose that a certain population center, in the 1940's, is located 5 miles from a riverine recreation site. Suppose that a factory opens up 15 miles away from the site during the 1950's, and closes at the end of the 1960's; and that during this 20-year period, the bulk of this region's populace resides 15 miles from the site, close to the factory. In the 1970's, the populace of the region returns to the old population center, 5 miles from the recreation site. The benefits conferred by the site diminished during the 1950's and 1960's, even though travel (and even total) expenditures
An Annotated Checklist of the Mammals of Kuwait

Directory of Open Access Journals (Sweden)

Peter J. Cowan

2013-12-01

Full Text Available An annotated checklist of the mammals of Kuwait is presented, based on the literature, personal communications, a Kuwait website and a blog and the author’s observations. Twenty five species occur, a further four are uncommon or rare visitors, six used to occur whilst another two are of doubtful provenance. This list should assist those planning desert rehabilitation, animal reintroduction and protected area projects in Kuwait.
Multi-label literature classification based on the Gene Ontology graph

Directory of Open Access Journals (Sweden)

Lu Xinghua

2008-12-01

Full Text Available Abstract Background The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. Results In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Conclusion Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate

Exploring and linking biomedical resources through multidimensional semantic spaces.

Science.gov (United States)

Berlanga, Rafael; Jiménez-Ruiz, Ernesto; Nebot, Victoria

2012-01-25

The semantic integration of biomedical resources is still a challenging issue which is required for effective information processing and data analysis. The availability of comprehensive knowledge resources such as biomedical ontologies and integrated thesauri greatly facilitates this integration effort by means of semantic annotation, which allows disparate data formats and contents to be expressed under a common semantic space. In this paper, we propose a multidimensional representation for such a semantic space, where dimensions regard the different perspectives in biomedical research (e.g., population, disease, anatomy and protein/genes). This paper presents a novel method for building multidimensional semantic spaces from semantically annotated biomedical data collections. This method consists of two main processes: knowledge and data normalization. The former one arranges the concepts provided by a reference knowledge resource (e.g., biomedical ontologies and thesauri) into a set of hierarchical dimensions for analysis purposes. The latter one reduces the annotation set associated to each collection item into a set of points of the multidimensional space. Additionally, we have developed a visual tool, called 3D-Browser, which implements OLAP-like operators over the generated multidimensional space. The method and the tool have been tested and evaluated in the context of the Health-e-Child (HeC) project. Automatic semantic annotation was applied to tag three collections of abstracts taken from PubMed, one for each target disease of the project, the Uniprot database, and the HeC patient record database. We adopted the UMLS Meta-thesaurus 2010AA as the reference knowledge resource. Current knowledge resources and semantic-aware technology make possible the integration of biomedical resources. Such an integration is performed through semantic annotation of the intended biomedical data resources. This paper shows how these annotations can be exploited for
Annotation of nerve cord transcriptome in earthworm Eisenia fetida

Directory of Open Access Journals (Sweden)

Vasanthakumar Ponesakki

2017-12-01

Full Text Available In annelid worms, the nerve cord serves as a crucial organ to control the sensory and behavioral physiology. The inadequate genome resource of earthworms has prioritized the comprehensive analysis of their transcriptome dataset to monitor the genes express in the nerve cord and predict their role in the neurotransmission and sensory perception of the species. The present study focuses on identifying the potential transcripts and predicting their functional features by annotating the transcriptome dataset of nerve cord tissues prepared by Gong et al., 2010 from the earthworm Eisenia fetida. Totally 9762 transcripts were successfully annotated against the NCBI nr database using the BLASTX algorithm and among them 7680 transcripts were assigned to a total of 44,354 GO terms. The conserve domain analysis indicated the over representation of P-loop NTPase domain and calcium binding EF-hand domain. The COG functional annotation classified 5860 transcript sequences into 25 functional categories. Further, 4502 contig sequences were found to map with 124 KEGG pathways. The annotated contig dataset exhibited 22 crucial neuropeptides having considerable matches to the marine annelid Platynereis dumerilii, suggesting their possible role in neurotransmission and neuromodulation. In addition, 108 human stem cell marker homologs were identified including the crucial epigenetic regulators, transcriptional repressors and cell cycle regulators, which may contribute to the neuronal and segmental regeneration. The complete functional annotation of this nerve cord transcriptome can be further utilized to interpret genetic and molecular mechanisms associated with neuronal development, nervous system regeneration and nerve cord function.
Historical Slovenian Language Resources

Directory of Open Access Journals (Sweden)

Tomaž Erjavec

2013-09-01

Full Text Available EXTENDED ABSTRACT:The paper presents three language resources enabling better full-text access to digitised printed historical Slovenian texts: a hand-annotated corpus, a hand-annotated lexicon of historical words and a collection of transcribed texts. The aim of the resources is twofold: on one hand they support empirical linguistic research (corpus, collection and represent a reference tool for the research of historical Slovenian (lexicon while on the other hand they may serve as training data for the development of Human Language Technologies enabling better full-text search in digital libraries containing Slovenian written cultural heritage, modernisation of historical texts, and the development of better technological solutions for text recognition and scanning. The hand annotated corpus of historical Slovenian contains the text from 1,000 pages sampled from the years 1750 to 1900, two texts date to the end of the 16th or 17th century. The corpus contains a little more than 250,000 word tokens; each of them being annotated with hand validated linguistic features: modernised form, lemma or base form, and morhpo-syntactic description. Thus the word token »ajfram« is annotated with the normalised form »ajfrom«, by the lemma »ajfer« and morphosyntactic description »Som« or »Samostalnik« (noun, »občni« (common, »moški« (masculine and a modernised form »gorečnost« (fervour. At first the corpus was annotated automatically and then manually verified and corrected. The lexicon was created automatically from the hand-annotated corpus. It contains only attested word-forms and examples of use. The word-forms are ordered under their modern equivalents. All the modern forms of a particular word constitute a dictionary entry, defined by its lemma with conjoint information i.e. the morpho-syntactic description and the closest contemporary synonyms. Thus the entry »ajfrer/Som/gorečnost« is annotated by two modernised words »ajfra �
MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

Directory of Open Access Journals (Sweden)

Shu-Chuan Chen

Full Text Available The MixtureTree Annotator, written in JAVA, allows the user to automatically color any phylogenetic tree in Newick format generated from any phylogeny reconstruction program and output the Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over any other programs which perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular methods, which lack good built-in visualization tools, for example, MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious, may give results with human errors due to either manually adding colors to each node or with other limitations, for example only using color based on a number, such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process.
Software engineering laboratory series: Annotated bibliography of software engineering laboratory literature

Science.gov (United States)

Morusiewicz, Linda; Valett, Jon

1992-01-01

This document is an annotated bibliography of technical papers, documents, and memorandums produced by or related to the Software Engineering Laboratory. More than 100 publications are summarized. These publications cover many areas of software engineering and range from research reports to software documentation. This document has been updated and reorganized substantially since the original version (SEL-82-006, November 1982). All materials have been grouped into eight general subject areas for easy reference: (1) the Software Engineering Laboratory; (2) the Software Engineering Laboratory: Software Development Documents; (3) Software Tools; (4) Software Models; (5) Software Measurement; (6) Technology Evaluations; (7) Ada Technology; and (8) Data Collection. This document contains an index of these publications classified by individual author.
Microsyntactic Annotation of Corpora and its Use in Computational Linguistics Tasks

Directory of Open Access Journals (Sweden)

Iomdin Leonid

2017-12-01

Full Text Available Microsyntax is a linguistic discipline dealing with idiomatic elements whose important properties are strongly related to syntax. In a way, these elements may be viewed as transitional entities between the lexicon and the grammar, which explains why they are often underrepresented in both of these resource types: the lexicographer fails to see such elements as full-fledged lexical units, while the grammarian finds them too specific to justify the creation of individual well-developed rules. As a result, such elements are poorly covered by linguistic models used in advanced modern computational linguistic tasks like high-quality machine translation or deep semantic analysis. A possible way to mend the situation and improve the coverage and adequate treatment of microsyntactic units in linguistic resources is to develop corpora with microsyntactic annotation, closely linked to specially designed lexicons. The paper shows how this task is solved in the deeply annotated corpus of Russian, SynTagRus.
Stakeholder Confidence and Radioactive Waste management - An annotated glossary of key terms

International Nuclear Information System (INIS)

Martell, Meritxell; Pescatore, Claudio; Mays, Claire

2013-01-01

The OECD Nuclear Energy Agency (NEA) Forum on Stakeholder Confidence (FSC) Annotated Glossary is a review of concepts central to societal decision making about radioactive waste management. It records the evolution in understanding that has taken place in the group as the FSC has worked with these concepts over time. This should be a useful resource not only for new FSC participants but also for others: this annotated glossary forms a good reference handbook for future texts regarding societal aspects of radioactive waste management and its governance. Each glossary entry is structured, to the extent possible, as follows: - The term and its variants, if any, in FSC literature are identified. - The common FSC understanding of the concept and any guidance are captured, based upon a review of all FSC documents to date. - Any evolution of the concept observed over the decade of FSC work is analysed. - The FSC interpretation of the symbolic dimension is explored. - The current status of outlook in the FSC, and intended activities according to the current Programme of Work (2010 and beyond) are assessed. Overall, although different persons and groups may assign different meanings to words, and although terminology will continue to evolve, this glossary is the FSC's 'state-of-the-art' guide to key terms in use. As such, it should prove to be a handy reference for all those interested in the governance of radioactive waste management
Promoting positive parenting: an annotated bibliography.

Science.gov (United States)

Ahmann, Elizabeth

2002-01-01

Positive parenting is built on respect for children and helps develop self-esteem, inner discipline, self-confidence, responsibility, and resourcefulness. Positive parenting is also good for parents: parents feel good about parenting well. It builds a sense of dignity. Positive parenting can be learned. Understanding normal development is a first step, so that parents can distinguish common behaviors in a stage of development from "problems." Central to positive parenting is developing thoughtful approaches to child guidance that can be used in place of anger, manipulation, punishment, and rewards. Support for developing creative and loving approaches to meet special parenting challenges, such as temperament, disabilities, separation and loss, and adoption, is sometimes necessary as well. This annotated bibliography offers resources to professionals helping parents and to parents wishing to develop positive parenting skills.
Evaluating Hierarchical Structure in Music Annotations.

Science.gov (United States)

McFee, Brian; Nieto, Oriol; Farbood, Morwaread M; Bello, Juan Pablo

2017-01-01

Music exhibits structure at multiple scales, ranging from motifs to large-scale functional components. When inferring the structure of a piece, different listeners may attend to different temporal scales, which can result in disagreements when they describe the same piece. In the field of music informatics research (MIR), it is common to use corpora annotated with structural boundaries at different levels. By quantifying disagreements between multiple annotators, previous research has yielded several insights relevant to the study of music cognition. First, annotators tend to agree when structural boundaries are ambiguous. Second, this ambiguity seems to depend on musical features, time scale, and genre. Furthermore, it is possible to tune current annotation evaluation metrics to better align with these perceptual differences. However, previous work has not directly analyzed the effects of hierarchical structure because the existing methods for comparing structural annotations are designed for "flat" descriptions, and do not readily generalize to hierarchical annotations. In this paper, we extend and generalize previous work on the evaluation of hierarchical descriptions of musical structure. We derive an evaluation metric which can compare hierarchical annotations holistically across multiple levels. sing this metric, we investigate inter-annotator agreement on the multilevel annotations of two different music corpora, investigate the influence of acoustic properties on hierarchical annotations, and evaluate existing hierarchical segmentation algorithms against the distribution of inter-annotator agreement.
Evaluating Hierarchical Structure in Music Annotations

Directory of Open Access Journals (Sweden)

Brian McFee

2017-08-01

Full Text Available Music exhibits structure at multiple scales, ranging from motifs to large-scale functional components. When inferring the structure of a piece, different listeners may attend to different temporal scales, which can result in disagreements when they describe the same piece. In the field of music informatics research (MIR, it is common to use corpora annotated with structural boundaries at different levels. By quantifying disagreements between multiple annotators, previous research has yielded several insights relevant to the study of music cognition. First, annotators tend to agree when structural boundaries are ambiguous. Second, this ambiguity seems to depend on musical features, time scale, and genre. Furthermore, it is possible to tune current annotation evaluation metrics to better align with these perceptual differences. However, previous work has not directly analyzed the effects of hierarchical structure because the existing methods for comparing structural annotations are designed for “flat” descriptions, and do not readily generalize to hierarchical annotations. In this paper, we extend and generalize previous work on the evaluation of hierarchical descriptions of musical structure. We derive an evaluation metric which can compare hierarchical annotations holistically across multiple levels. sing this metric, we investigate inter-annotator agreement on the multilevel annotations of two different music corpora, investigate the influence of acoustic properties on hierarchical annotations, and evaluate existing hierarchical segmentation algorithms against the distribution of inter-annotator agreement.
Impingement and entrainment: an updated annotated bibliography. Final report

International Nuclear Information System (INIS)

Yost, F.E.; Uziel, M.S.

1981-05-01

Presented as an annotated bibliography are 1343 references dealing with entrainment and impingement effects on aquatic organisms passing through the cooling systems of thermal power plants. The references were obtained from open literature and from environmental reports and impact statements prepared by or for the electric utility industry. Two earlier bibliographies contain literature from 1950 through 1976. This update contains additional literature acquired since 1976. Topics covered are site-specific field studies at facilities located on lakes, reservoirs, rivers, or estuaries. The studies include special engineering studies, laboratory studies, studies of biological effects, reviews and methodologies, and studies of the mitigation of effects. References are arranged alphabetically by author, and indexes are provided to personal and corporate authors, and to facility, waterbody, and taxonomic names
The Universal Protein Resource (UniProt)

Data.gov (United States)

U.S. Department of Health & Human Services — The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase...
Community annotation and bioinformatics workforce development in concert--Little Skate Genome Annotation Workshops and Jamborees.

Science.gov (United States)

Wang, Qinghua; Arighi, Cecilia N; King, Benjamin L; Polson, Shawn W; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F; Page, Shallee T; Rendino, Marc Farnum; Thomas, William Kelley; Udwary, Daniel W; Wu, Cathy H

2012-01-01

Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.
Community annotation and bioinformatics workforce development in concert—Little Skate Genome Annotation Workshops and Jamborees

Science.gov (United States)

Wang, Qinghua; Arighi, Cecilia N.; King, Benjamin L.; Polson, Shawn W.; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F.; Page, Shallee T.; Farnum Rendino, Marc; Thomas, William Kelley; Udwary, Daniel W.; Wu, Cathy H.

2012-01-01

Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome. PMID:22434832
Reasoning with Annotations of Texts

OpenAIRE

Ma , Yue; Lévy , François; Ghimire , Sudeep

2011-01-01

International audience; Linguistic and semantic annotations are important features for text-based applications. However, achieving and maintaining a good quality of a set of annotations is known to be a complex task. Many ad hoc approaches have been developed to produce various types of annotations, while comparing those annotations to improve their quality is still rare. In this paper, we propose a framework in which both linguistic and domain information can cooperate to reason with annotat...
GAS METHANE HYDRATES-RESEARCH STATUS, ANNOTATED BIBLIOGRAPHY, AND ENERGY IMPLICATIONS

Energy Technology Data Exchange (ETDEWEB)

James Sorensen; Jaroslav Solc; Bethany Bolles

2000-07-01

The objective of this task as originally conceived was to compile an assessment of methane hydrate deposits in Alaska from available sources and to make a very preliminary evaluation of the technical and economic feasibility of producing methane from these deposits for remote power generation. Gas hydrates have recently become a target of increased scientific investigation both from the standpoint of their resource potential to the natural gas and oil industries and of their positive and negative implications for the global environment After we performed an extensive literature review and consulted with representatives of the U.S. Geological Survey (USGS), Canadian Geological Survey, and several oil companies, it became evident that, at the current stage of gas hydrate research, the available information on methane hydrates in Alaska does not provide sufficient grounds for reaching conclusions concerning their use for energy production. Hence, the original goals of this task could not be met, and the focus was changed to the compilation and review of published documents to serve as a baseline for possible future research at the Energy & Environmental Research Center (EERC). An extensive annotated bibliography of gas hydrate publications has been completed. The EERC will reassess its future research opportunities on methane hydrates to determine where significant initial contributions could be made within the scope of limited available resources.
Selected Resources and Bibliography

Science.gov (United States)

New Directions for Higher Education, 2011

2011-01-01

This chapter provides an annotated bibliography of resources pertaining to international branch campuses (IBCs). This collection of references has been selected to represent the breadth of emerging scholarship on cross-border higher education and is intended to provide further resources on a range of concerns surrounding cross-border higher…
UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View.

Science.gov (United States)

Boutet, Emmanuel; Lieberherr, Damien; Tognolli, Michael; Schneider, Michel; Bansal, Parit; Bridge, Alan J; Poux, Sylvain; Bougueleret, Lydie; Xenarios, Ioannis

2016-01-01

The Universal Protein Resource (UniProt, http://www.uniprot.org ) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc).The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available expertly manually annotated protein sequences obtained from a broad spectrum of organisms. Plant protein entries are produced in the frame of the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. High level annotations provided by UniProtKB/Swiss-Prot are widely used to predict annotation of newly available proteins through automatic pipelines.The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry. We will also present some of the tools and databases that are linked to each entry.
Desiderata for ontologies to be used in semantic annotation of biomedical documents.

Science.gov (United States)

Bada, Michael; Hunter, Lawrence

2011-02-01

A wealth of knowledge valuable to the translational research scientist is contained within the vast biomedical literature, but this knowledge is typically in the form of natural language. Sophisticated natural-language-processing systems are needed to translate text into unambiguous formal representations grounded in high-quality consensus ontologies, and these systems in turn rely on gold-standard corpora of annotated documents for training and testing. To this end, we are constructing the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-text biomedical journal articles that are being manually annotated with the entire sets of terms from select vocabularies, predominantly from the Open Biomedical Ontologies (OBO) library. Our efforts in building this corpus has illuminated infelicities of these ontologies with respect to the semantic annotation of biomedical documents, and we propose desiderata whose implementation could substantially improve their utility in this task; these include the integration of overlapping terms across OBOs, the resolution of OBO-specific ambiguities, the integration of the BFO with the OBOs and the use of mid-level ontologies, the inclusion of noncanonical instances, and the expansion of relations and realizable entities. Copyright © 2010 Elsevier Inc. All rights reserved.
Annotation of toponyms in TEI digital literary editions and linking to the web of data

Directory of Open Access Journals (Sweden)

Frontini, Francesca

2016-07-01

Full Text Available This paper aims to discuss the challenges and benefits of the annotation of place names in literary texts and literary criticism. We shall first highlight the problems of encoding spatial information in digital editions using the TEI format by means of two manual annotation experiments and the discussion of various cases. This will lead to the question of how to use existing semantic web resources to complement and enrich toponym mark-up, in particular to provide mentions with precise georeferencing. Finally the automatic annotation of a large corpus will show the potential of visualizing places from texts, by illustrating an analysis of the evolution of literary life from the spatial and geographical point of view.

Students' Resources for Stance-Taking in the Literature Classroom

DEFF Research Database (Denmark)

Kabel, Kristine

of specialization within LCT (Maton, 2007, 2010), my analyses show a variation in students’ interpersonal meaning-making choices, linking their literary response texts within the same task to either primarily a knower or a knowledge code. This variation suggests a tension in the literature education at this time......Making aspects of privileged ways of participating visible is central to support students’ literacy development within different educational disciplines (Hasan, 1996, 2011). In my doctoral work I focus on the discipline of literature in lower secondary school in the school subject of Danish......, exploring students’ resources for stance-taking in their written literary response texts. In my presentation on Friday I will outline the theoretical grounding of the study and the preliminary findings. Drawing on the appraisal system within SFL (Hood, 2011; Martin & White, 2005) and the dimension...
Data Acquisition and Linguistic Resources

Science.gov (United States)

Strassel, Stephanie; Christianson, Caitlin; McCary, John; Staderman, William; Olive, Joseph

All human language technology demands substantial quantities of data for system training and development, plus stable benchmark data to measure ongoing progress. While creation of high quality linguistic resources is both costly and time consuming, such data has the potential to profoundly impact not just a single evaluation program but language technology research in general. GALE's challenging performance targets demand linguistic data on a scale and complexity never before encountered. Resources cover multiple languages (Arabic, Chinese, and English) and multiple genres -- both structured (newswire and broadcast news) and unstructured (web text, including blogs and newsgroups, and broadcast conversation). These resources include significant volumes of monolingual text and speech, parallel text, and transcribed audio combined with multiple layers of linguistic annotation, ranging from word aligned parallel text and Treebanks to rich semantic annotation.
Women's Suffrage: A Sampler of ERIC Resources.

Science.gov (United States)

Risinger, C. Frederick

1995-01-01

Discusses the resources available from the ERIC System on issues related to women's suffrage and women's rights. Includes an annotated bibliography of six resources, including lesson plans and historiography reviews. (CFR)
SciLite: a platform for displaying text-mined annotations as a means to link research articles with biological data

Science.gov (United States)

Talo, Francesco; Ide-Smith, Michele; Gobeill, Julien; Carter, Jacob; Batista-Navarro, Riza; Ananiadou, Sophia; Ruch, Patrick; McEntyre, Johanna

2017-01-01

The tremendous growth in biological data has resulted in an increase in the number of research papers being published. This presents a great challenge for scientists in searching and assimilating facts described in those papers. Particularly, biological databases depend on curators to add highly precise and useful information that are usually extracted by reading research articles. Therefore, there is an urgent need to find ways to improve linking literature to the underlying data, thereby minimising the effort in browsing content and identifying key biological concepts. As part of the development of Europe PMC, we have developed a new platform, SciLite, which integrates text-mined annotations from different sources and overlays those outputs on research articles. The aim is to aid researchers and curators using Europe PMC in finding key concepts more easily and provide links to related resources or tools, bridging the gap between literature and biological data. PMID:28948232
Predicting word sense annotation agreement

DEFF Research Database (Denmark)

Martinez Alonso, Hector; Johannsen, Anders Trærup; Lopez de Lacalle, Oier

2015-01-01

High agreement is a common objective when annotating data for word senses. However, a number of factors make perfect agreement impossible, e.g. the limitations of the sense inventories, the difficulty of the examples or the interpretation preferences of the annotations. Estimating potential...... agreement is thus a relevant task to supplement the evaluation of sense annotations. In this article we propose two methods to predict agreement on word-annotation instances. We experiment with a continuous representation and a three-way discretization of observed agreement. In spite of the difficulty...
Additional Resources on Asian Americans.

Science.gov (United States)

Kodama, Corinne Maekawa; Lee, Sunny; Liang, Christopher T. H.; Alvarez, Alvin N.; McEwen, Marylu K.

2002-01-01

The authors identify Asian American associations and organizations, academic journals, periodicals, and media resources. Selected annotated resources on Asian American activism and politics, counseling and psychology, educational issues, gender and sexual orientation, history, policy reports, and racial and ethnic identity are also included.…
Questionnaires for research: an annotated bibliography on design, construction, and use.

Science.gov (United States)

Dale R. Potter; Kathryn M. Sharpe; John C. Hendee; Roger N. Clark

1972-01-01

Questionnaires as social science tools are used increasingly to study people aspects of outdoor recreation and other natural resource fields. An annotated bibliography including subjective evaluations of each article and a keyword list is presented for 193 references to aid researchers and managers in the design, construction, and use of mail questionnaires.
CommWalker: correctly evaluating modules in molecular networks in light of annotation bias.

Science.gov (United States)

Luecken, M D; Page, M J T; Crosby, A J; Mason, S; Reinert, G; Deane, C M

2018-03-15

Detecting novel functional modules in molecular networks is an important step in biological research. In the absence of gold standard functional modules, functional annotations are often used to verify whether detected modules/communities have biological meaning. However, as we show, the uneven distribution of functional annotations means that such evaluation methods favor communities of well-studied proteins. We propose a novel framework for the evaluation of communities as functional modules. Our proposed framework, CommWalker, takes communities as inputs and evaluates them in their local network environment by performing short random walks. We test CommWalker's ability to overcome annotation bias using input communities from four community detection methods on two protein interaction networks. We find that modules accepted by CommWalker are similarly co-expressed as those accepted by current methods. Crucially, CommWalker performs well not only in well-annotated regions, but also in regions otherwise obscured by poor annotation. CommWalker community prioritization both faithfully captures well-validated communities and identifies functional modules that may correspond to more novel biology. The CommWalker algorithm is freely available at opig.stats.ox.ac.uk/resources or as a docker image on the Docker Hub at hub.docker.com/r/lueckenmd/commwalker/. deane@stats.ox.ac.uk. Supplementary data are available at Bioinformatics online.
Alignment-Annotator web server: rendering and annotating sequence alignments.

Science.gov (United States)

Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

2014-07-01

Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed at server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interfaces. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported (BioDAS servers, the UniProt, the Catalytic Site Atlas and the PDB). Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip-archive containing the HTML files. Because of the use of HTML the resulting interactive alignment can be viewed on any platform including Windows, Mac OS X, Linux, Android and iOS in any standard web browser. Importantly, no plugins nor Java are required and therefore Alignment-Anotator represents the first interactive browser-based alignment visualization. http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Similarity maps and hierarchical clustering for annotating FT-IR spectral images.

Science.gov (United States)

Zhong, Qiaoyong; Yang, Chen; Großerüschkamp, Frederik; Kallenbach-Thieltges, Angela; Serocka, Peter; Gerwert, Klaus; Mosig, Axel

2013-11-20

Unsupervised segmentation of multi-spectral images plays an important role in annotating infrared microscopic images and is an essential step in label-free spectral histopathology. In this context, diverse clustering approaches have been utilized and evaluated in order to achieve segmentations of Fourier Transform Infrared (FT-IR) microscopic images that agree with histopathological characterization. We introduce so-called interactive similarity maps as an alternative annotation strategy for annotating infrared microscopic images. We demonstrate that segmentations obtained from interactive similarity maps lead to similarly accurate segmentations as segmentations obtained from conventionally used hierarchical clustering approaches. In order to perform this comparison on quantitative grounds, we provide a scheme that allows to identify non-horizontal cuts in dendrograms. This yields a validation scheme for hierarchical clustering approaches commonly used in infrared microscopy. We demonstrate that interactive similarity maps may identify more accurate segmentations than hierarchical clustering based approaches, and thus are a viable and due to their interactive nature attractive alternative to hierarchical clustering. Our validation scheme furthermore shows that performance of hierarchical two-means is comparable to the traditionally used Ward's clustering. As the former is much more efficient in time and memory, our results suggest another less resource demanding alternative for annotating large spectral images.
Genome-wide Annotation, Identification, and Global Transcriptomic Analysis of Regulatory or Small RNA Gene Expression in Staphylococcus aureus.

Science.gov (United States)

Carroll, Ronan K; Weiss, Andy; Broach, William H; Wiemels, Richard E; Mogen, Austin B; Rice, Kelly C; Shaw, Lindsey N

2016-02-09

In Staphylococcus aureus, hundreds of small regulatory or small RNAs (sRNAs) have been identified, yet this class of molecule remains poorly understood and severely understudied. sRNA genes are typically absent from genome annotation files, and as a consequence, their existence is often overlooked, particularly in global transcriptomic studies. To facilitate improved detection and analysis of sRNAs in S. aureus, we generated updated GenBank files for three commonly used S. aureus strains (MRSA252, NCTC 8325, and USA300), in which we added annotations for >260 previously identified sRNAs. These files, the first to include genome-wide annotation of sRNAs in S. aureus, were then used as a foundation to identify novel sRNAs in the community-associated methicillin-resistant strain USA300. This analysis led to the discovery of 39 previously unidentified sRNAs. Investigating the genomic loci of the newly identified sRNAs revealed a surprising degree of inconsistency in genome annotation in S. aureus, which may be hindering the analysis and functional exploration of these elements. Finally, using our newly created annotation files as a reference, we perform a global analysis of sRNA gene expression in S. aureus and demonstrate that the newly identified tsr25 is the most highly upregulated sRNA in human serum. This study provides an invaluable resource to the S. aureus research community in the form of our newly generated annotation files, while at the same time presenting the first examination of differential sRNA expression in pathophysiologically relevant conditions. Despite a large number of studies identifying regulatory or small RNA (sRNA) genes in Staphylococcus aureus, their annotation is notably lacking in available genome files. In addition to this, there has been a considerable lack of cross-referencing in the wealth of studies identifying these elements, often leading to the same sRNA being identified multiple times and bearing multiple names. In this work
The BioC-BioGRID corpus: full text articles annotated for curation of protein–protein and genetic interactions

Science.gov (United States)

Kim, Sun; Chatr-aryamontri, Andrew; Chang, Christie S.; Oughtred, Rose; Rust, Jennifer; Wilbur, W. John; Comeau, Donald C.; Dolinski, Kara; Tyers, Mike

2017-01-01

A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein–protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. The purpose, characteristics and possible future
Survey of literature relating to energy development in Utah's Colorado Plateau

Energy Technology Data Exchange (ETDEWEB)

Larsen, A.

1980-06-01

This study examines various energy resources in Utah including oil impregnated rocks (oil shale and oil sand deposits), geothermal, coal, uranium, oil and natural gas in terms of the following dimensions: resurce potential and location; resource technology, development and production status; resource development requirements; potential environmental and socio-economic impacts; and transportation tradeoffs. The advantages of minemouth power plants in comparison to combined cycle or hybrid power plants are also examined. Annotative bibliographies of the energy resources are presented in the appendices. Specific topics summarized in these annotative bibliographies include: economics, environmental impacts, water requirements, production technology, and siting requirements.
Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

Science.gov (United States)

Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

2013-01-01

Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).
Annotated bibliography of structural equation modelling: technical work.

Science.gov (United States)

Austin, J T; Wolfle, L M

1991-05-01

Researchers must be familiar with a variety of source literature to facilitate the informed use of structural equation modelling. Knowledge can be acquired through the study of an expanding literature found in a diverse set of publishing forums. We propose that structural equation modelling publications can be roughly classified into two groups: (a) technical and (b) substantive applications. Technical materials focus on the procedures rather than substantive conclusions derived from applications. The focus of this article is the former category; included are foundational/major contributions, minor contributions, critical and evaluative reviews, integrations, simulations and computer applications, precursor and historical material, and pedagogical textbooks. After a brief introduction, we annotate 294 articles in the technical category dating back to Sewall Wright (1921).
Resource-efficient supply chains: a research framework, literature review and research agenda

NARCIS (Netherlands)

Matopoulos, A.; Barros, A.C.; Vorst, van der J.G.A.J.

2015-01-01

Purpose – The study aims to define a research agenda for creating resource-efficient supply chains (RESCs) by identifying and analysing their key characteristics as well as future research opportunities. Design/methodology/approach – We follow a systematic review method to analyse the literature and
Climate change and migration : A review of the literature

NARCIS (Netherlands)

O.A. Gómez (Oscar)

2013-01-01

textabstractThe present literature review aims to provide a panoramic view of the different ways in which the link between climate change and migration has been addressed in the existing literature, building on the recent non-annotated bibliography issued by the International Organization for
Literature, Science, and Cooking in the Primary Grades.

Science.gov (United States)

Donoghue, Mildred R.

Since the balanced literacy program presently mandated in California makes literature an integral part of the curriculum and leaves even less time for study of the sciences, this annotated bibliography provides some recommended literature together with the science concepts that evolve from those books. The bibliography also offers cooking recipes…
eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences

OpenAIRE

Huerta-Cepas, J.; Szklarczyk, D.; Forslund, K.; Cook, H.; Heller, D.; Walter, M.C.; Rattei, T.; Mende, D.R.; Sunagawa, S.; Kuhn, M.; Jensen, L.J.; von Mering, C.; Bork, P.

2016-01-01

eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows for a better propagation of functional terms across nested OGs and led to the novel annotation of 95 890 previously uncharacterized OGs, increasing...
Assessment and management of animal damage in Pacific Northwest forests: an annotated bibliography.

Science.gov (United States)

D.M. Loucks; H.C. Black; M.L. Roush; S.R. Radosevich

1990-01-01

This annotated bibliography of published literature provides a comprehensive source of information on animal damage assessment and management for forest land managers and others in the Pacific Northwest. Citations and abstracts from more than 900 papers are indexed by subject and author. The publication complements and supplements A Silvicultural Approach to...

The Co-regulation Data Harvester: Automating gene annotation starting from a transcriptome database

Science.gov (United States)

Tsypin, Lev M.; Turkewitz, Aaron P.

Identifying co-regulated genes provides a useful approach for defining pathway-specific machinery in an organism. To be efficient, this approach relies on thorough genome annotation, a process much slower than genome sequencing per se. Tetrahymena thermophila, a unicellular eukaryote, has been a useful model organism and has a fully sequenced but sparsely annotated genome. One important resource for studying this organism has been an online transcriptomic database. We have developed an automated approach to gene annotation in the context of transcriptome data in T. thermophila, called the Co-regulation Data Harvester (CDH). Beginning with a gene of interest, the CDH identifies co-regulated genes by accessing the Tetrahymena transcriptome database. It then identifies their closely related genes (orthologs) in other organisms by using reciprocal BLAST searches. Finally, it collates the annotations of those orthologs' functions, which provides the user with information to help predict the cellular role of the initial query. The CDH, which is freely available, represents a powerful new tool for analyzing cell biological pathways in Tetrahymena. Moreover, to the extent that genes and pathways are conserved between organisms, the inferences obtained via the CDH should be relevant, and can be explored, in many other systems.
Using Nonexperts for Annotating Pharmacokinetic Drug-Drug Interaction Mentions in Product Labeling: A Feasibility Study.

Science.gov (United States)

Hochheiser, Harry; Ning, Yifan; Hernandez, Andres; Horn, John R; Jacobson, Rebecca; Boyce, Richard D

2016-04-11

Because vital details of potential pharmacokinetic drug-drug interactions are often described in free-text structured product labels, manual curation is a necessary but expensive step in the development of electronic drug-drug interaction information resources. The use of nonexperts to annotate potential drug-drug interaction (PDDI) mentions in drug product label annotation may be a means of lessening the burden of manual curation. Our goal was to explore the practicality of using nonexpert participants to annotate drug-drug interaction descriptions from structured product labels. By presenting annotation tasks to both pharmacy experts and relatively naïve participants, we hoped to demonstrate the feasibility of using nonexpert annotators for drug-drug information annotation. We were also interested in exploring whether and to what extent natural language processing (NLP) preannotation helped improve task completion time, accuracy, and subjective satisfaction. Two experts and 4 nonexperts were asked to annotate 208 structured product label sections under 4 conditions completed sequentially: (1) no NLP assistance, (2) preannotation of drug mentions, (3) preannotation of drug mentions and PDDIs, and (4) a repeat of the no-annotation condition. Results were evaluated within the 2 groups and relative to an existing gold standard. Participants were asked to provide reports on the time required to complete tasks and their perceptions of task difficulty. One of the experts and 3 of the nonexperts completed all tasks. Annotation results from the nonexpert group were relatively strong in every scenario and better than the performance of the NLP pipeline. The expert and 2 of the nonexperts were able to complete most tasks in less than 3 hours. Usability perceptions were generally positive (3.67 for expert, mean of 3.33 for nonexperts). The results suggest that nonexpert annotation might be a feasible option for comprehensive labeling of annotated PDDIs across a broader
Electronic Grey Literature in Accelerator Science and Its Allied Subjects : Selected Web Resources for Scientists and Engineers

CERN Document Server

Rajendiran, P

2006-01-01

Grey literature Web resources in the field of accelerator science and its allied subjects are collected for the scientists and engineers of RRCAT (Raja Ramanna Centre for Advanced Technology). For definition purposes the different types of grey literature are described. The Web resources collected and compiled in this article (with an overview and link for each) specifically focus on technical reports, preprints or e-prints, which meet the main information needs of RRCAT users.
Essential Requirements for Digital Annotation Systems

Directory of Open Access Journals (Sweden)

ADRIANO, C. M.

2012-06-01

Full Text Available Digital annotation systems are usually based on partial scenarios and arbitrary requirements. Accidental and essential characteristics are usually mixed in non explicit models. Documents and annotations are linked together accidentally according to the current technology, allowing for the development of disposable prototypes, but not to the support of non-functional requirements such as extensibility, robustness and interactivity. In this paper we perform a careful analysis on the concept of annotation, studying the scenarios supported by digital annotation tools. We also derived essential requirements based on a classification of annotation systems applied to existing tools. The analysis performed and the proposed classification can be applied and extended to other type of collaborative systems.
LS-SNP/PDB: annotated non-synonymous SNPs mapped to Protein Data Bank structures.

Science.gov (United States)

Ryan, Michael; Diekhans, Mark; Lien, Stephanie; Liu, Yun; Karchin, Rachel

2009-06-01

LS-SNP/PDB is a new WWW resource for genome-wide annotation of human non-synonymous (amino acid changing) SNPs. It serves high-quality protein graphics rendered with UCSF Chimera molecular visualization software. The system is kept up-to-date by an automated, high-throughput build pipeline that systematically maps human nsSNPs onto Protein Data Bank structures and annotates several biologically relevant features. LS-SNP/PDB is available at (http://ls-snp.icm.jhu.edu/ls-snp-pdb) and via links from protein data bank (PDB) biology and chemistry tabs, UCSC Genome Browser Gene Details and SNP Details pages and PharmGKB Gene Variants Downloads/Cross-References pages.
An Annotated Bibliography of Research on Reading and Adults Learning English as a Second Language.

Science.gov (United States)

Burt, Miriam, Comp.; Florez, MaryAnn, Comp.; Terrill, Lynda, Comp.; Van Duzer, Carol, Comp.

This annotated bibliography contains 27 references regarding research on reading and adults learning English as a Second Language (ESL). None of the resources are more than 10 years old. (Adjunct ERIC Clearinghouse for ESL Literacy Education) (KFT)
CGKB: an annotation knowledge base for cowpea (Vigna unguiculata L. methylation filtered genomic genespace sequences

Directory of Open Access Journals (Sweden)

Spraggins Thomas A

2007-04-01

Full Text Available Abstract Background Cowpea [Vigna unguiculata (L. Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI, funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace recovered using methylation filtration technology and providing annotation and analysis of the sequence data. Description CGKB, Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: GSS annotation and comparative genomics knowledge base, GSS enzyme and metabolic pathway knowledge base, and GSS simple sequence repeats (SSRs knowledge base for molecular marker discovery. A homology-based approach was applied for annotations of the GSS, mainly using BLASTX against four public FASTA formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniprotKB-PIR (Protein Information Resource, and UniProtKB-TrEMBL. Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene predication program and the
Semantic annotation in biomedicine: the current landscape.

Science.gov (United States)

Jovanović, Jelena; Bagheri, Ebrahim

2017-09-22

The abundance and unstructured nature of biomedical texts, be it clinical or research content, impose significant challenges for the effective and efficient use of information and knowledge stored in such texts. Annotation of biomedical documents with machine intelligible semantics facilitates advanced, semantics-based text management, curation, indexing, and search. This paper focuses on annotation of biomedical entity mentions with concepts from relevant biomedical knowledge bases such as UMLS. As a result, the meaning of those mentions is unambiguously and explicitly defined, and thus made readily available for automated processing. This process is widely known as semantic annotation, and the tools that perform it are known as semantic annotators.Over the last dozen years, the biomedical research community has invested significant efforts in the development of biomedical semantic annotation technology. Aiming to establish grounds for further developments in this area, we review a selected set of state of the art biomedical semantic annotators, focusing particularly on general purpose annotators, that is, semantic annotation tools that can be customized to work with texts from any area of biomedicine. We also examine potential directions for further improvements of today's annotators which could make them even more capable of meeting the needs of real-world applications. To motivate and encourage further developments in this area, along the suggested and/or related directions, we review existing and potential practical applications and benefits of semantic annotators.
Small molecule annotation for the Protein Data Bank.

Science.gov (United States)

Sen, Sanchayita; Young, Jasmine; Berrisford, John M; Chen, Minyu; Conroy, Matthew J; Dutta, Shuchismita; Di Costanzo, Luigi; Gao, Guanghua; Ghosh, Sutapa; Hudson, Brian P; Igarashi, Reiko; Kengaku, Yumiko; Liang, Yuhe; Peisach, Ezra; Persikova, Irina; Mukhopadhyay, Abhik; Narayanan, Buvaneswari Coimbatore; Sahni, Gaurav; Sato, Junko; Sekharan, Monica; Shao, Chenghua; Tan, Lihua; Zhuravleva, Marina A

2014-01-01

The Protein Data Bank (PDB) is the single global repository for three-dimensional structures of biological macromolecules and their complexes, and its more than 100,000 structures contain more than 20,000 distinct ligands or small molecules bound to proteins and nucleic acids. Information about these small molecules and their interactions with proteins and nucleic acids is crucial for our understanding of biochemical processes and vital for structure-based drug design. Small molecules present in a deposited structure may be attached to a polymer or may occur as a separate, non-covalently linked ligand. During curation of a newly deposited structure by wwPDB annotation staff, each molecule is cross-referenced to the PDB Chemical Component Dictionary (CCD). If the molecule is new to the PDB, a dictionary description is created for it. The information about all small molecule components found in the PDB is distributed via the ftp archive as an external reference file. Small molecule annotation in the PDB also includes information about ligand-binding sites and about covalent and other linkages between ligands and macromolecules. During the remediation of the peptide-like antibiotics and inhibitors present in the PDB archive in 2011, it became clear that additional annotation was required for consistent representation of these molecules, which are quite often composed of several sequential subcomponents including modified amino acids and other chemical groups. The connectivity information of the modified amino acids is necessary for correct representation of these biologically interesting molecules. The combined information is made available via a new resource called the Biologically Interesting molecules Reference Dictionary, which is complementary to the CCD and is now routinely used for annotation of peptide-like antibiotics and inhibitors. © The Author(s) 2014. Published by Oxford University Press.
Remote sensing of natural resources. Quarterly literature review, October-December 1980

International Nuclear Information System (INIS)

Gonzales, R.W.; Inglis, M.H.

1981-02-01

This review covers literature pertaining to documented data and data gathering techniques that are performed or obtained remotely from space, aircraft, or ground-based stations. All of the documentation is related to remote sensing sensors or the remote sensing of the natural resources. Section headings are: general; geology; environmental quality; hydrology; vegetation; oceanography; regional planning and land use; data manipulation; and instrumentation and technology
AstroDAbis: Annotations and Cross-Matches for Remote Catalogues

Science.gov (United States)

Gray, N.; Mann, R. G.; Morris, D.; Holliman, M.; Noddle, K.

2012-09-01

Astronomers are good at sharing data, but poorer at sharing knowledge. Almost all astronomical data ends up in open archives, and access to these is being simplified by the development of the global Virtual Observatory (VO). This is a great advance, but the fundamental problem remains that these archives contain only basic observational data, whereas all the astrophysical interpretation of that data — which source is a quasar, which a low-mass star, and which an image artefact — is contained in journal papers, with very little linkage back from the literature to the original data archives. It is therefore currently impossible for an astronomer to pose a query like “give me all sources in this data archive that have been identified as quasars” and this limits the effective exploitation of these archives, as the user of an archive has no direct means of taking advantage of the knowledge derived by its previous users. The AstroDAbis service aims to address this, in a prototype service enabling astronomers to record annotations and cross-identifications in the AstroDAbis service, annotating objects in other catalogues. We have deployed two interfaces to the annotations, namely one astronomy-specific one using the TAP protocol (Dowler et al. 2011), and a second exploiting generic Linked Open Data (LOD) and RDF techniques.
Contributions to In Silico Genome Annotation

KAUST Repository

Kalkatawi, Manal M.

2017-11-30

Genome annotation is an important topic since it provides information for the foundation of downstream genomic and biological research. It is considered as a way of summarizing part of existing knowledge about the genomic characteristics of an organism. Annotating different regions of a genome sequence is known as structural annotation, while identifying functions of these regions is considered as a functional annotation. In silico approaches can facilitate both tasks that otherwise would be difficult and timeconsuming. This study contributes to genome annotation by introducing several novel bioinformatics methods, some based on machine learning (ML) approaches. First, we present Dragon PolyA Spotter (DPS), a method for accurate identification of the polyadenylation signals (PAS) within human genomic DNA sequences. For this, we derived a novel feature-set able to characterize properties of the genomic region surrounding the PAS, enabling development of high accuracy optimized ML predictive models. DPS considerably outperformed the state-of-the-art results. The second contribution concerns developing generic models for structural annotation, i.e., the recognition of different genomic signals and regions (GSR) within eukaryotic DNA. We developed DeepGSR, a systematic framework that facilitates generating ML models to predict GSR with high accuracy. To the best of our knowledge, no available generic and automated method exists for such task that could facilitate the studies of newly sequenced organisms. The prediction module of DeepGSR uses deep learning algorithms to derive highly abstract features that depend mainly on proper data representation and hyperparameters calibration. DeepGSR, which was evaluated on recognition of PAS and translation initiation sites (TIS) in different organisms, yields a simpler and more precise representation of the problem under study, compared to some other hand-tailored models, while producing high accuracy prediction results. Finally
Active learning reduces annotation time for clinical concept extraction.

Science.gov (United States)

Kholghi, Mahnoosh; Sitbon, Laurianne; Zuccon, Guido; Nguyen, Anthony

2017-10-01

To investigate: (1) the annotation time savings by various active learning query strategies compared to supervised learning and a random sampling baseline, and (2) the benefits of active learning-assisted pre-annotations in accelerating the manual annotation process compared to de novo annotation. There are 73 and 120 discharge summary reports provided by Beth Israel institute in the train and test sets of the concept extraction task in the i2b2/VA 2010 challenge, respectively. The 73 reports were used in user study experiments for manual annotation. First, all sequences within the 73 reports were manually annotated from scratch. Next, active learning models were built to generate pre-annotations for the sequences selected by a query strategy. The annotation/reviewing time per sequence was recorded. The 120 test reports were used to measure the effectiveness of the active learning models. When annotating from scratch, active learning reduced the annotation time up to 35% and 28% compared to a fully supervised approach and a random sampling baseline, respectively. Reviewing active learning-assisted pre-annotations resulted in 20% further reduction of the annotation time when compared to de novo annotation. The number of concepts that require manual annotation is a good indicator of the annotation time for various active learning approaches as demonstrated by high correlation between time rate and concept annotation rate. Active learning has a key role in reducing the time required to manually annotate domain concepts from clinical free text, either when annotating from scratch or reviewing active learning-assisted pre-annotations. Copyright © 2017 Elsevier B.V. All rights reserved.
The BioC-BioGRID corpus: full text articles annotated for curation of protein-protein and genetic interactions.

Science.gov (United States)

Islamaj Dogan, Rezarta; Kim, Sun; Chatr-Aryamontri, Andrew; Chang, Christie S; Oughtred, Rose; Rust, Jennifer; Wilbur, W John; Comeau, Donald C; Dolinski, Kara; Tyers, Mike

2017-01-01

A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein-protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. The purpose, characteristics and possible future
Annotated bibliography on artificial recharge of ground water, 1955-67

Science.gov (United States)

Signor, Donald C.; Growitz, Douglas J.; Kam, William

1970-01-01

Artificial ground-water recharge has become more important as water use by agriculture, industry, and municipalities increases. Water management agencies are increasingly interested in potential use of recharge for pollution abatement, waste-water disposal, and re-use and reclamation of locally available supplies. Research projects and theoretical analyses of operational recharge systems show increased scientific emphasis on the practice. Overall ground-water basin management systems generally now contain considerations of artificial recharge, whether by direct or indirect methods. Artificial ground-water recharge is a means of conserving surface runoff for future use in places where it would otherwise be lost, of protecting ground-water basins from salt-water encroachment along coastal areas, and of storing and distributing imported water. The biblio-graphy emphasizes technology; however, annotations of articles on waste-water reclamation, ground-water management and ground-water basin management are included. Subjects closely related to artificial recharge, including colloidal flow through porous media, field or laboratory instrumentation, and waste disposal by deep well injection are included where they specifically relate to potential recharge problems. Where almost the same material has been published in several journals, all references are included on the assumption that some publications may be more readily available to interested persons than others. Other publications, especially those of foreign literature, provided abstracts that were used freely as time limitations precluded obtaining and annotating all materials. Abstracts taken from published sources are noted. These are: "Abstracts of North American Geology," U.S. Department of the Interior, Geological Survey; "Abstracts of Recent Published Material on Foil and Water Conservation," ARS-41 series, Agricultural F.esearch Service, U.S. Department of Agriculture; "Water and1 Water
Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

Directory of Open Access Journals (Sweden)

Mazo Ilya

2007-07-01

Full Text Available Abstract Background Uncovering cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. Protein functions, often referred to as its annotations, are believed to manifest themselves through topology of the networks of inter-proteins interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having the general biological significance, such demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interactions datasets. Results We developed a method for automatic extraction of protein functional annotation from scientific text based on the Natural Language Processing (NLP technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology
RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

Energy Technology Data Exchange (ETDEWEB)

Brettin, Thomas; Davis, James J.; Disz, Terry; Edwards, Robert A.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Thomason, James A.; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R.; Xia, Fangfang

2015-02-10

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.
RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes.

Science.gov (United States)

Brettin, Thomas; Davis, James J; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Olsen, Gary J; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Thomason, James A; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R; Xia, Fangfang

2015-02-10

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.
Improving adult immunization equity: Where do the published research literature and existing resources lead?

Science.gov (United States)

Prins, Wendy; Butcher, Emily; Hall, Laura Lee; Puckrein, Gary; Rosof, Bernard

2017-05-25

Evidence suggests that disparities in adult immunization (AI) rates are growing. Providers need adequate patient resources and information about successful interventions to help them engage in effective practices to reduce AI disparities. The primary purposes of this paper were to review and summarize the evidence base regarding interventions to reduce AI disparities and to scan for relevant resources that could support providers in their AI efforts to specifically target disparities. First, building on a literature review conducted by the U.S. Centers for Disease Control and Prevention, we searched the peer-reviewed literature to identify articles that either discussed interventions to reduce AI disparities or provided reasons and associations for disparities. We scanned the articles and conducted an internet search to identify tools and resources to support efforts to improve AI rates. We limited both searches to resources that addressed influenza, pneumococcal, hepatitis B, Tdap, and/or herpes zoster vaccinations. We found that most articles characterized AI disparities, but several discussed strategies for reducing AI disparities, including practice-based changes, communication and health literacy approaches, and partnering with community-based organizations. The resources we identified were largely fact sheets and handouts for patients and journal articles for providers. Most resources pertain to influenza vaccination and Spanish was the most prevalent language after English. More evaluation is needed to assess the health literacy levels of the materials. We conclude that additional research is needed to identify effective ways to reduce AI disparities and more resources are needed to support providers in their efforts. We recommend identifying best practices of high performers, further reviewing the appropriateness and usefulness of available resources, and prioritizing which gaps should be addressed. Copyright © 2017 Elsevier Ltd. All rights reserved.
Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

Science.gov (United States)

Oellrich, Anika; Collier, Nigel; Smedley, Damian; Groza, Tudor

2015-01-01

Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trails corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the Sh

Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

Directory of Open Access Journals (Sweden)

Anika Oellrich

Full Text Available Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES, the National Center for Biomedical Ontology (NCBO Annotator, the Biomedical Concept Annotation System (BeCAS and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trails corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74% and their quality (best F1-measure of 33%, independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%, the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content
USI: a fast and accurate approach for conceptual document annotation.

Science.gov (United States)

Fiorini, Nicolas; Ranwez, Sylvie; Montmain, Jacky; Ranwez, Vincent

2015-03-14

Semantic approaches such as concept-based information retrieval rely on a corpus in which resources are indexed by concepts belonging to a domain ontology. In order to keep such applications up-to-date, new entities need to be frequently annotated to enrich the corpus. However, this task is time-consuming and requires a high-level of expertise in both the domain and the related ontology. Different strategies have thus been proposed to ease this indexing process, each one taking advantage from the features of the document. In this paper we present USI (User-oriented Semantic Indexer), a fast and intuitive method for indexing tasks. We introduce a solution to suggest a conceptual annotation for new entities based on related already indexed documents. Our results, compared to those obtained by previous authors using the MeSH thesaurus and a dataset of biomedical papers, show that the method surpasses text-specific methods in terms of both quality and speed. Evaluations are done via usual metrics and semantic similarity. By only relying on neighbor documents, the User-oriented Semantic Indexer does not need a representative learning set. Yet, it provides better results than the other approaches by giving a consistent annotation scored with a global criterion - instead of one score per concept.
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.

Science.gov (United States)

Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P

2016-04-01

Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved.
New in protein structure and function annotation: hotspots, single nucleotide polymorphisms and the 'Deep Web'.

Science.gov (United States)

Bromberg, Yana; Yachdav, Guy; Ofran, Yanay; Schneider, Reinhard; Rost, Burkhard

2009-05-01

The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, there is detailed knowledge about neither their function nor their structure. Comprehensive efforts towards the expert curation of sequence annotations have failed to meet the demand of the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots, and the evolution of these into the most successful type of prediction of protein function from sequence will be discussed. Second, a new tool, comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution, will be described. While these two new sub-fields of protein prediction represent the breakthroughs that have been achieved methodologically, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the worldwide web in any browser with the 'Deep Web' (ie, proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation.
Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

International Nuclear Information System (INIS)

Yu Jia-Feng; Sui Tian-Xiang; Wang Ji-Hua; Wang Hong-Mei; Wang Chun-Ling; Jing Li

2015-01-01

Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. (special topic)
Computer systems for annotation of single molecule fragments

Science.gov (United States)

Schwartz, David Charles; Severin, Jessica

2016-07-19

There are provided computer systems for visualizing and annotating single molecule images. Annotation systems in accordance with this disclosure allow a user to mark and annotate single molecules of interest and their restriction enzyme cut sites thereby determining the restriction fragments of single nucleic acid molecules. The markings and annotations may be automatically generated by the system in certain embodiments and they may be overlaid translucently onto the single molecule images. An image caching system may be implemented in the computer annotation systems to reduce image processing time. The annotation systems include one or more connectors connecting to one or more databases capable of storing single molecule data as well as other biomedical data. Such diverse array of data can be retrieved and used to validate the markings and annotations. The annotation systems may be implemented and deployed over a computer network. They may be ergonomically optimized to facilitate user interactions.
Linking human diseases to animal models using ontology-based phenotype annotation.

Directory of Open Access Journals (Sweden)

Nicole L Washington

2009-11-01

Full Text Available Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ methodology, wherein the affected entity (E and how it is affected (Q are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM. These human annotations were loaded into our Ontology-Based Database (OBD along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. This annotation and search method provides a novel and efficient means to identify
An analysis on the entity annotations in biological corpora [v1; ref status: indexed, http://f1000r.es/2o0

Directory of Open Access Journals (Sweden)

Mariana Neves

2014-04-01

Full Text Available Collection of documents annotated with semantic entities and relationships are crucial resources to support development and evaluation of text mining solutions for the biomedical domain. Here I present an overview of 36 corpora and show an analysis on the semantic annotations they contain. Annotations for entity types were classified into six semantic groups and an overview on the semantic entities which can be found in each corpus is shown. Results show that while some semantic entities, such as genes, proteins and chemicals are consistently annotated in many collections, corpora available for diseases, variations and mutations are still few, in spite of their importance in the biological domain.
Motion lecture annotation system to learn Naginata performances

Science.gov (United States)

Kobayashi, Daisuke; Sakamoto, Ryota; Nomura, Yoshihiko

2013-12-01

This paper describes a learning assistant system using motion capture data and annotation to teach "Naginata-jutsu" (a skill to practice Japanese halberd) performance. There are some video annotation tools such as YouTube. However these video based tools have only single angle of view. Our approach that uses motion-captured data allows us to view any angle. A lecturer can write annotations related to parts of body. We have made a comparison of effectiveness between the annotation tool of YouTube and the proposed system. The experimental result showed that our system triggered more annotations than the annotation tool of YouTube.
BEACON: automated tool for Bacterial GEnome Annotation ComparisON

KAUST Repository

Kalkatawi, Manal M.

2015-08-18

Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON’s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/
BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

Science.gov (United States)

Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B

2015-08-18

Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
JGI Plant Genomics Gene Annotation Pipeline

Energy Technology Data Exchange (ETDEWEB)

Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

2014-07-14

Plant genomes vary in size and are highly complex with a high amount of repeats, genome duplication and tandem duplication. Gene encodes a wealth of information useful in studying organism and it is critical to have high quality and stable gene annotation. Thanks to advancement of sequencing technology, many plant species genomes have been sequenced and transcriptomes are also sequenced. To use these vastly large amounts of sequence data to make gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim with aid of a RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice except for chlamy which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and accessible via JGI Phytozome portal whose URL and front page snapshot are shown below.
Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

Science.gov (United States)

Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

2016-01-04

The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Annotating temporal information in clinical narratives.

Science.gov (United States)

Sun, Weiyi; Rumshisky, Anna; Uzuner, Ozlem

2013-12-01

Temporal information in clinical narratives plays an important role in patients' diagnosis, treatment and prognosis. In order to represent narrative information accurately, medical natural language processing (MLP) systems need to correctly identify and interpret temporal information. To promote research in this area, the Informatics for Integrating Biology and the Bedside (i2b2) project developed a temporally annotated corpus of clinical narratives. This corpus contains 310 de-identified discharge summaries, with annotations of clinical events, temporal expressions and temporal relations. This paper describes the process followed for the development of this corpus and discusses annotation guideline development, annotation methodology, and corpus quality. Copyright © 2013 Elsevier Inc. All rights reserved.
Bridging the Gap: Enriching YouTube Videos with Jazz Music Annotations

Directory of Open Access Journals (Sweden)

Stefan Balke

2018-02-01

Full Text Available Web services allow permanent access to music from all over the world. Especially in the case of web services with user-supplied content, e.g., YouTube™, the available metadata is often incomplete or erroneous. On the other hand, a vast amount of high-quality and musically relevant metadata has been annotated in research areas such as Music Information Retrieval (MIR. Although they have great potential, these musical annotations are often inaccessible to users outside the academic world. With our contribution, we want to bridge this gap by enriching publicly available multimedia content with musical annotations available in research corpora, while maintaining easy access to the underlying data. Our web-based tools offer researchers and music lovers novel possibilities to interact with and navigate through the content. In this paper, we consider a research corpus called the Weimar Jazz Database (WJD as an illustrating example scenario. The WJD contains various annotations related to famous jazz solos. First, we establish a link between the WJD annotations and corresponding YouTube videos employing existing retrieval techniques. With these techniques, we were able to identify 988 corresponding YouTube videos for 329 solos out of 456 solos contained in the WJD. We then embed the retrieved videos in a recently developed web-based platform and enrich the videos with solo transcriptions that are part of the WJD. Furthermore, we integrate publicly available data resources from the Semantic Web in order to extend the presented information, for example, with a detailed discography or artists-related information. Our contribution illustrates the potential of modern web-based technologies for the digital humanities, and novel ways for improving access and interaction with digitized multimedia content.
Facilitating functional annotation of chicken microarray data

Directory of Open Access Journals (Sweden)

Gresham Cathy R

2009-10-01

Full Text Available Abstract Background Modeling results from chicken microarray studies is challenging for researchers due to little functional annotation associated with these arrays. The Affymetrix GenChip chicken genome array, one of the biggest arrays that serve as a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO. However the GO annotation data presented by Affymetrix is incomplete, for example, they do not show references linked to manually annotated functions. In addition, there is no tool that facilitates microarray researchers to directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers amount of time in searching multiple GO databases for functional information. Results We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets we developed an Array GO Mapper (AGOM tool to help researchers to quickly retrieve corresponding functional information for their dataset. Conclusion Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray dataset into more reliable biological functional information by using AGOM tool. The disease, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies. The GO annotation data generated will be available for public use via AgBase website and
How Important Are Student-Selected versus Instructor-Selected Literature Resources for Students' Learning and Motivation in Problem-Based Learning?

Science.gov (United States)

Wijnia, Lisette; Loyens, Sofie M.; Derous, Eva; Schmidt, Henk G.

2015-01-01

In problem-based learning students are responsible for their own learning process, which becomes evident when they must act independently, for example, when selecting literature resources for individual study. It is a matter of debate whether it is better to have students select their own literature resources or to present them with a list of…
Improving integrative searching of systems chemical biology data using semantic annotation.

Science.gov (United States)

Chen, Bin; Ding, Ying; Wild, David J

2012-03-08

Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued. We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology. Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The document is available at http://chem2bio2owl.wikispaces.com.
A review and evaluation of the Langley Research Center's Scientific and Technical Information Program. Results of phase 5. Design and evaluation of STI systems: A selected, annotated bibliography

Science.gov (United States)

Pinelli, T. E.; Hinnebusch, P. A.; Jaffe, J. M.

1981-01-01

A selected, annotated bibliography of literature citations related to the design and evaluation of STI systems is presented. The use of manual and machine-readable literature searches; the review of numerous books, periodicals reports, and papers; and the selection and annotation of literature citations were required. The bibliography was produced because the information was needed to develop the methodology for the review and evaluation project, and a survey of the literature did not reveal the existence of a single published source of information pertinent to the subject. Approximately 200 citations are classified in four subject areas. The areas include information - general; information systems - design and evaluation, including information products and services; information - use and need; and information - economics.
Climate change and water supply, management and use: A literature review

International Nuclear Information System (INIS)

Chang, L.H.; Draves, J.D.; Hunsaker, C.T.

1992-05-01

There is evidence that atmospheric concentrations Of C0 2 , tropospheric 0 3 , and CH 4 , among other gases that contribute to the greenhouse effect, have increased in recent decades, and that these changes may induce changes in global air temperatures and regional climate features in coming years. A literature review was conducted to sample the literature base on which our understanding of the water resource impacts of climate change rests. Water resource issues likely to be important include hydrologic response to climate change, the resilience of water supply systems to changing climatic and hydrologic conditions, and the effects of climate change on water quality and water uses (such as navigation and energy generation). A computer-assisted search of literature on the effects of climate change on these subjects was conducted. All studies were classified by type of paper (e.g., review, discussion, case study), region, water resource variable studied, and source of climate scenario. The resulting bibliography containing more than 200 references was largely annotated. Case studies of potential hydrologic impacts have been more common than studies of impacts on water management or water use, but this apparent research gap is decreasing. Case studies demonstrating methods of incorporating potential risks of climate change into water project planning and management have been performed. Considerable variability in regional coverage exists; the Great Lakes basin and California receive relatively more attention than such regions as New England and the Missouri River basin. General circulation model-based and hypothetical climate scenarios have been the dominant sources of climate scenarios used in case studies, although a variety of other methods for developing climate scenarios have been developed

Climate change and water supply, management and use: A literature review

Energy Technology Data Exchange (ETDEWEB)

Chang, L.H.; Draves, J.D.; Hunsaker, C.T.

1992-05-01

There is evidence that atmospheric concentrations Of C0{sub 2}, tropospheric 0{sub 3}, and CH{sub 4}, among other gases that contribute to the greenhouse effect, have increased in recent decades, and that these changes may induce changes in global air temperatures and regional climate features in coming years. A literature review was conducted to sample the literature base on which our understanding of the water resource impacts of climate change rests. Water resource issues likely to be important include hydrologic response to climate change, the resilience of water supply systems to changing climatic and hydrologic conditions, and the effects of climate change on water quality and water uses (such as navigation and energy generation). A computer-assisted search of literature on the effects of climate change on these subjects was conducted. All studies were classified by type of paper (e.g., review, discussion, case study), region, water resource variable studied, and source of climate scenario. The resulting bibliography containing more than 200 references was largely annotated. Case studies of potential hydrologic impacts have been more common than studies of impacts on water management or water use, but this apparent research gap is decreasing. Case studies demonstrating methods of incorporating potential risks of climate change into water project planning and management have been performed. Considerable variability in regional coverage exists; the Great Lakes basin and California receive relatively more attention than such regions as New England and the Missouri River basin. General circulation model-based and hypothetical climate scenarios have been the dominant sources of climate scenarios used in case studies, although a variety of other methods for developing climate scenarios have been developed.
Dictionary-driven protein annotation.

Science.gov (United States)

Rigoutsos, Isidore; Huynh, Tien; Floratos, Aris; Parida, Laxmi; Platt, Daniel

2002-09-01

Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/ bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes. Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were
De novo transcriptome assembly and its annotation for the aposematic wood tiger moth (Parasemia plantaginis

Directory of Open Access Journals (Sweden)

Juan A. Galarza

2017-06-01

Full Text Available In this paper we report the public availability of transcriptome resources for the aposematic wood tiger moth (Parasemia plantaginis. A comprehensive assembly methods, quality statistics, and annotation are provided. This reference transcriptome may serve as a useful resource for investigating functional gene activity in aposematic Lepidopteran species. All data is freely available at the European Nucleotide Archive (http://www.ebi.ac.uk/ena under study accession number: PRJEB14172.
Prototype semantic infrastructure for automated small molecule classification and annotation in lipidomics.

Science.gov (United States)

Chepelev, Leonid L; Riazanov, Alexandre; Kouznetsov, Alexandre; Low, Hong Sang; Dumontier, Michel; Baker, Christopher J O

2011-07-26

The development of high-throughput experimentation has led to astronomical growth in biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. Unfortunately, efforts to annotate, classify, and analyze these chemical entities have largely remained in the hands of human curators using manual or semi-automated protocols, leaving many novel entities unclassified. Since chemical function is often closely linked to structure, accurate structure-based classification and annotation of chemical entities is imperative to understanding their functionality. As part of an exploratory study, we have investigated the utility of semantic web technologies in automated chemical classification and annotation of lipids. Our prototype framework consists of two components: an ontology and a set of federated web services that operate upon it. The formal lipid ontology we use here extends a part of the LiPrO ontology and draws on the lipid hierarchy in the LIPID MAPS database, as well as literature-derived knowledge. The federated semantic web services that operate upon this ontology are deployed within the Semantic Annotation, Discovery, and Integration (SADI) framework. Structure-based lipid classification is enacted by two core services. Firstly, a structural annotation service detects and enumerates relevant functional groups for a specified chemical structure. A second service reasons over lipid ontology class descriptions using the attributes obtained from the annotation service and identifies the appropriate lipid classification. We extend the utility of these core services by combining them with additional SADI services that retrieve associations between lipids and proteins and identify publications related to specified lipid types. We analyze the performance of SADI-enabled eicosanoid classification relative to the LIPID MAPS classification and reflect on the contribution of our integrative methodology in the context of
Prototype semantic infrastructure for automated small molecule classification and annotation in lipidomics

Directory of Open Access Journals (Sweden)

Dumontier Michel

2011-07-01

Full Text Available Abstract Background The development of high-throughput experimentation has led to astronomical growth in biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. Unfortunately, efforts to annotate, classify, and analyze these chemical entities have largely remained in the hands of human curators using manual or semi-automated protocols, leaving many novel entities unclassified. Since chemical function is often closely linked to structure, accurate structure-based classification and annotation of chemical entities is imperative to understanding their functionality. Results As part of an exploratory study, we have investigated the utility of semantic web technologies in automated chemical classification and annotation of lipids. Our prototype framework consists of two components: an ontology and a set of federated web services that operate upon it. The formal lipid ontology we use here extends a part of the LiPrO ontology and draws on the lipid hierarchy in the LIPID MAPS database, as well as literature-derived knowledge. The federated semantic web services that operate upon this ontology are deployed within the Semantic Annotation, Discovery, and Integration (SADI framework. Structure-based lipid classification is enacted by two core services. Firstly, a structural annotation service detects and enumerates relevant functional groups for a specified chemical structure. A second service reasons over lipid ontology class descriptions using the attributes obtained from the annotation service and identifies the appropriate lipid classification. We extend the utility of these core services by combining them with additional SADI services that retrieve associations between lipids and proteins and identify publications related to specified lipid types. We analyze the performance of SADI-enabled eicosanoid classification relative to the LIPID MAPS classification and reflect on the contribution of
Annotating activation/inhibition relationships to protein-protein interactions using gene ontology relations.

Science.gov (United States)

Yim, Soorin; Yu, Hasun; Jang, Dongjin; Lee, Doheon

2018-04-11

Signaling pathways can be reconstructed by identifying 'effect types' (i.e. activation/inhibition) of protein-protein interactions (PPIs). Effect types are composed of 'directions' (i.e. upstream/downstream) and 'signs' (i.e. positive/negative), thereby requiring directions as well as signs of PPIs to predict signaling events from PPI networks. Here, we propose a computational method for systemically annotating effect types to PPIs using relations between functional information of proteins. We used regulates, positively regulates, and negatively regulates relations in Gene Ontology (GO) to predict directions and signs of PPIs. These relations indicate both directions and signs between GO terms so that we can project directions and signs between relevant GO terms to PPIs. Independent test results showed that our method is effective for predicting both directions and signs of PPIs. Moreover, our method outperformed a previous GO-based method that did not consider the relations between GO terms. We annotated effect types to human PPIs and validated several highly confident effect types against literature. The annotated human PPIs are available in Additional file 2 to aid signaling pathway reconstruction and network biology research. We annotated effect types to PPIs by using regulates, positively regulates, and negatively regulates relations in GO. We demonstrated that those relations are effective for predicting not only signs, but also directions of PPIs. The usefulness of those relations suggests their potential applications to other types of interactions such as protein-DNA interactions.
The effectiveness of annotated (vs. non-annotated) digital pathology slides as a teaching tool during dermatology and pathology residencies.

Science.gov (United States)

Marsch, Amanda F; Espiritu, Baltazar; Groth, John; Hutchens, Kelli A

2014-06-01

With today's technology, paraffin-embedded, hematoxylin & eosin-stained pathology slides can be scanned to generate high quality virtual slides. Using proprietary software, digital images can also be annotated with arrows, circles and boxes to highlight certain diagnostic features. Previous studies assessing digital microscopy as a teaching tool did not involve the annotation of digital images. The objective of this study was to compare the effectiveness of annotated digital pathology slides versus non-annotated digital pathology slides as a teaching tool during dermatology and pathology residencies. A study group composed of 31 dermatology and pathology residents was asked to complete an online pre-quiz consisting of 20 multiple choice style questions, each associated with a static digital pathology image. After completion, participants were given access to an online tutorial composed of digitally annotated pathology slides and subsequently asked to complete a post-quiz. A control group of 12 residents completed a non-annotated version of the tutorial. Nearly all participants in the study group improved their quiz score, with an average improvement of 17%, versus only 3% (P = 0.005) in the control group. These results support the notion that annotated digital pathology slides are superior to non-annotated slides for the purpose of resident education. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Automatic annotation of head velocity and acceleration in Anvil

DEFF Research Database (Denmark)

Jongejan, Bart

2012-01-01

We describe an automatic face tracker plugin for the ANVIL annotation tool. The face tracker produces data for velocity and for acceleration in two dimensions. We compare the annotations generated by the face tracking algorithm with independently made manual annotations for head movements....... The annotations are a useful supplement to manual annotations and may help human annotators to quickly and reliably determine onset of head movements and to suggest which kind of head movement is taking place....
Mesotext. Framing and exploring annotations

NARCIS (Netherlands)

Boot, P.; Boot, P.; Stronks, E.

2007-01-01

From the introduction: Annotation is an important item on the wish list for digital scholarly tools. It is one of John Unsworth’s primitives of scholarship (Unsworth 2000). Especially in linguistics,a number of tools have been developed that facilitate the creation of annotations to source material
Annotating images by mining image search results

NARCIS (Netherlands)

Wang, X.J.; Zhang, L.; Li, X.; Ma, W.Y.

2008-01-01

Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud[OPEN

Science.gov (United States)

Merchant, Nirav

2016-01-01

Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957
Teaching and Learning Communities through Online Annotation

Science.gov (United States)

van der Pluijm, B.

2016-12-01

What do colleagues do with your assigned textbook? What they say or think about the material? Want students to be more engaged in their learning experience? If so, online materials that complement standard lecture format provide new opportunity through managed, online group annotation that leverages the ubiquity of internet access, while personalizing learning. The concept is illustrated with the new online textbook "Processes in Structural Geology and Tectonics", by Ben van der Pluijm and Stephen Marshak, which offers a platform for sharing of experiences, supplementary materials and approaches, including readings, mathematical applications, exercises, challenge questions, quizzes, alternative explanations, and more. The annotation framework used is Hypothes.is, which offers a free, open platform markup environment for annotation of websites and PDF postings. The annotations can be public, grouped or individualized, as desired, including export access and download of annotations. A teacher group, hosted by a moderator/owner, limits access to members of a user group of teachers, so that its members can use, copy or transcribe annotations for their own lesson material. Likewise, an instructor can host a student group that encourages sharing of observations, questions and answers among students and instructor. Also, the instructor can create one or more closed groups that offers study help and hints to students. Options galore, all of which aim to engage students and to promote greater responsibility for their learning experience. Beyond new capacity, the ability to analyze student annotation supports individual learners and their needs. For example, student notes can be analyzed for key phrases and concepts, and identify misunderstandings, omissions and problems. Also, example annotations can be shared to enhance notetaking skills and to help with studying. Lastly, online annotation allows active application to lecture posted slides, supporting real-time notetaking
Displaying Annotations for Digitised Globes

Science.gov (United States)

Gede, Mátyás; Farbinger, Anna

2018-05-01

Thanks to the efforts of the various globe digitising projects, nowadays there are plenty of old globes that can be examined as 3D models on the computer screen. These globes usually contain a lot of interesting details that an average observer would not entirely discover for the first time. The authors developed a website that can display annotations for such digitised globes. These annotations help observers of the globe to discover all the important, interesting details. Annotations consist of a plain text title, a HTML formatted descriptive text and a corresponding polygon and are stored in KML format. The website is powered by the Cesium virtual globe engine.
THE DIMENSIONS OF COMPOSITION ANNOTATION.

Science.gov (United States)

MCCOLLY, WILLIAM

ENGLISH TEACHER ANNOTATIONS WERE STUDIED TO DETERMINE THE DIMENSIONS AND PROPERTIES OF THE ENTIRE SYSTEM FOR WRITING CORRECTIONS AND CRITICISMS ON COMPOSITIONS. FOUR SETS OF COMPOSITIONS WERE WRITTEN BY STUDENTS IN GRADES 9 THROUGH 13. TYPESCRIPTS OF THE COMPOSITIONS WERE ANNOTATED BY CLASSROOM ENGLISH TEACHERS. THEN, 32 ENGLISH TEACHERS JUDGED…
Evaluation of three automated genome annotations for Halorhabdus utahensis.

Directory of Open Access Journals (Sweden)

Peter Bakke

2009-07-01

Full Text Available Genome annotations are accumulating rapidly and depend heavily on automated annotation systems. Many genome centers offer annotation systems but no one has compared their output in a systematic way to determine accuracy and inherent errors. Errors in the annotations are routinely deposited in databases such as NCBI and used to validate subsequent annotation errors. We submitted the genome sequence of halophilic archaeon Halorhabdus utahensis to be analyzed by three genome annotation services. We have examined the output from each service in a variety of ways in order to compare the methodology and effectiveness of the annotations, as well as to explore the genes, pathways, and physiology of the previously unannotated genome. The annotation services differ considerably in gene calls, features, and ease of use. We had to manually identify the origin of replication and the species-specific consensus ribosome-binding site. Additionally, we conducted laboratory experiments to test H. utahensis growth and enzyme activity. Current annotation practices need to improve in order to more accurately reflect a genome's biological potential. We make specific recommendations that could improve the quality of microbial annotation projects.
Basic environmental bookshelf: An annotated bibliography. Current issue paper No. 116

Energy Technology Data Exchange (ETDEWEB)

Yeager, L.

1995-11-01

This bibliography is intended to provide objective basic information on science and the environment, and includes books and journal articles that should be understandable to the nonspecialist. The bibliographic entries are arranged alphabetically by author within broad categories such as acid precipitation, air quality, climate change, ecology and ecosystems, energy, environmental assessment, hazardous wastes, law, natural history, ozone depletion, parks and urban issues, public policy, toxic chemicals, and water resources. Most entries have brief annotations.
An Annotated Selective Bibliography on Human Performance in Fault Diagnosis Tasks. Technical Report 435. Final Report.

Science.gov (United States)

Johnson, William B.; And Others

This annotated bibliography developed in connection with an ongoing investigation of the use of computer simulations for fault diagnosis training cites 61 published works taken predominantly from the disciplines of engineering, psychology, and education. A review of the existing literature included computer searches of the past ten years of…
Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer's disease.

Directory of Open Access Journals (Sweden)

Qiongshi Lu

2017-07-01

Full Text Available Continuing efforts from large international consortia have made genome-wide epigenomic and transcriptomic annotation data publicly available for a variety of cell and tissue types. However, synthesis of these datasets into effective summary metrics to characterize the functional non-coding genome remains a challenge. Here, we present GenoSkyline-Plus, an extension of our previous work through integration of an expanded set of epigenomic and transcriptomic annotations to produce high-resolution, single tissue annotations. After validating our annotations with a catalog of tissue-specific non-coding elements previously identified in the literature, we apply our method using data from 127 different cell and tissue types to present an atlas of heritability enrichment across 45 different GWAS traits. We show that broader organ system categories (e.g. immune system increase statistical power in identifying biologically relevant tissue types for complex diseases while annotations of individual cell types (e.g. monocytes or B-cells provide deeper insights into disease etiology. Additionally, we use our GenoSkyline-Plus annotations in an in-depth case study of late-onset Alzheimer's disease (LOAD. Our analyses suggest a strong connection between LOAD heritability and genetic variants contained in regions of the genome functional in monocytes. Furthermore, we show that LOAD shares a similar localization of SNPs to monocyte-functional regions with Parkinson's disease. Overall, we demonstrate that integrated genome annotations at the single tissue level provide a valuable tool for understanding the etiology of complex human diseases. Our GenoSkyline-Plus annotations are freely available at http://genocanyon.med.yale.edu/GenoSkyline.
Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer's disease.

Science.gov (United States)

Lu, Qiongshi; Powles, Ryan L; Abdallah, Sarah; Ou, Derek; Wang, Qian; Hu, Yiming; Lu, Yisi; Liu, Wei; Li, Boyang; Mukherjee, Shubhabrata; Crane, Paul K; Zhao, Hongyu

2017-07-01

Continuing efforts from large international consortia have made genome-wide epigenomic and transcriptomic annotation data publicly available for a variety of cell and tissue types. However, synthesis of these datasets into effective summary metrics to characterize the functional non-coding genome remains a challenge. Here, we present GenoSkyline-Plus, an extension of our previous work through integration of an expanded set of epigenomic and transcriptomic annotations to produce high-resolution, single tissue annotations. After validating our annotations with a catalog of tissue-specific non-coding elements previously identified in the literature, we apply our method using data from 127 different cell and tissue types to present an atlas of heritability enrichment across 45 different GWAS traits. We show that broader organ system categories (e.g. immune system) increase statistical power in identifying biologically relevant tissue types for complex diseases while annotations of individual cell types (e.g. monocytes or B-cells) provide deeper insights into disease etiology. Additionally, we use our GenoSkyline-Plus annotations in an in-depth case study of late-onset Alzheimer's disease (LOAD). Our analyses suggest a strong connection between LOAD heritability and genetic variants contained in regions of the genome functional in monocytes. Furthermore, we show that LOAD shares a similar localization of SNPs to monocyte-functional regions with Parkinson's disease. Overall, we demonstrate that integrated genome annotations at the single tissue level provide a valuable tool for understanding the etiology of complex human diseases. Our GenoSkyline-Plus annotations are freely available at http://genocanyon.med.yale.edu/GenoSkyline.
Mediagraphy: Print and Nonprint Resources.

Science.gov (United States)

Burdett, Anna E.

2003-01-01

Lists media-related journals, books, ERIC documents, journal articles, and nonprint resources published in 2001-2002. The annotated entries are classified under the following headings: artificial intelligence; computer assisted instruction; distance education; educational research; educational technology; information science and technology;…

Annotate-it: a Swiss-knife approach to annotation, analysis and interpretation of single nucleotide variation in human disease.

Science.gov (United States)

Sifrim, Alejandro; Van Houdt, Jeroen Kj; Tranchevent, Leon-Charles; Nowakowska, Beata; Sakai, Ryo; Pavlopoulos, Georgios A; Devriendt, Koen; Vermeesch, Joris R; Moreau, Yves; Aerts, Jan

2012-01-01

The increasing size and complexity of exome/genome sequencing data requires new tools for clinical geneticists to discover disease-causing variants. Bottlenecks in identifying the causative variation include poor cross-sample querying, constantly changing functional annotation and not considering existing knowledge concerning the phenotype. We describe a methodology that facilitates exploration of patient sequencing data towards identification of causal variants under different genetic hypotheses. Annotate-it facilitates handling, analysis and interpretation of high-throughput single nucleotide variant data. We demonstrate our strategy using three case studies. Annotate-it is freely available and test data are accessible to all users at http://www.annotate-it.org.
Questions of quality in repositories of open educational resources: a literature review

Directory of Open Access Journals (Sweden)

Javiera Atenas

2014-07-01

Full Text Available Open educational resources (OER are teaching and learning materials which are freely available and openly licensed. Repositories of OER (ROER are platforms that host and facilitate access to these resources. ROER should not just be designed to store this content – in keeping with the aims of the OER movement, they should support educators in embracing open educational practices (OEP such as searching for and retrieving content that they will reuse, adapt or modify as needed, without economic barriers or copyright restrictions. This paper reviews key literature on OER and ROER, in order to understand the roles ROER are said or supposed to fulfil in relation to furthering the aims of the OER movement. Four themes which should shape repository design are identified, and the following 10 quality indicators (QI for ROER effectiveness are discussed: featured resources; user evaluation tools; peer review; authorship of the resources; keywords of the resources; use of standardised metadata; multilingualism of the repositories; inclusion of social media tools; specification of the creative commons license; availability of the source code or original files. These QI form the basis of a method for the evaluation of ROER initiatives which, in concert with considerations of achievability and long-term sustainability, should assist in enhancement and development.
Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

Science.gov (United States)

Yu, Jia-Feng; Sui, Tian-Xiang; Wang, Hong-Mei; Wang, Chun-Ling; Jing, Li; Wang, Ji-Hua

2015-12-01

Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. Project supported by the National Natural Science Foundation of China (Grant Nos. 61302186 and 61271378) and the Funding from the State Key Laboratory of Bioelectronics of Southeast University.
Improving protein function prediction methods with integrated literature data

Directory of Open Access Journals (Sweden)

Gabow Aaron P

2008-04-01

Full Text Available Abstract Background Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity. Results We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably-sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10% which yield the best trade off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder
Annotated bibliography for the humpback chub (Gila cypha) with emphasis on the Grand Canyon population.

Energy Technology Data Exchange (ETDEWEB)

Goulet, C. T.; LaGory, K. E.; Environmental Science Division

2009-10-05

Glen Canyon Dam is a hydroelectric facility located on the Colorado River in Arizona that is operated by the U.S. Bureau of Reclamation (Reclamation) for multiple purposes including water storage, flood control, power generation, recreation, and enhancement of fish and wildlife. Glen Canyon Dam operations have been managed for the last several years to improve conditions for the humpback chub (Gila cypha) and other ecosystem components. An extensive amount of literature has been produced on the humpback chub. We developed this annotated bibliography to assist managers and researchers in the Grand Canyon as they perform assessments, refine management strategies, and develop new studies to examine the factors affecting humpback chub. The U.S. Geological Survey recently created a multispecies bibliography (including references on the humpback chub) entitled Bibliography of Native Colorado River Big Fishes (available at www.fort.usgs.gov/Products/data/COFishBib). That bibliography, while quite extensive and broader in scope than ours, is not annotated, and, therefore, does not provide any of the information in the original literature. In developing this annotated bibliography, we have attempted to assemble abstracts from relevant published literature. We present here abstracts taken unmodified from individual reports and articles except where noted. The bibliography spans references from 1976 to 2009 and is organized in five broad topical areas, including: (1) biology, (2) ecology, (3) impacts of dam operations, (4) other impacts, and (5) conservation and management, and includes twenty subcategories. Within each subcategory, we present abstracts alphabetically by author and chronologically by year. We present relevant articles not specific to either the humpback chub or Glen Canyon Dam, but cited in other included reports, under the Supporting Articles subcategory. We provide all citations in alphabetical order in Section 7.
Hawaii Geothermal Project annotated bibliography: Biological resources of the geothermal subzones, the transmission corridors and the Puna District, Island of Hawaii

Energy Technology Data Exchange (ETDEWEB)

Miller, S.E.; Burgett, J.M. [Fish and Wildlife Service, Honolulu, HI (United States). Pacific Islands Office

1993-10-01

Task 1 of the Hawaii Geothermal Project Interagency Agreement between the Fish and Wildlife Service and the Department of Energy-Oak Ridge National Laboratory (DOE) includes an annotated bibliography of published and unpublished documents that cover biological issues related to the lowland rain forest in Puna, adjacent areas, transmission corridors, and in the proposed Hawaii Geothermal Project (HGP). The 51 documents reviewed in this report cover the main body of biological information for these projects. The full table of contents and bibliography for each document is included along with two copies (as requested in the Interagency Agreement) of the biological sections of each document. The documents are reviewed in five main categories: (1) geothermal subzones (29 documents); (2) transmission cable routes (8 documents); (3) commercial satellite launching facility (Spaceport; 1 document); (4) manganese nodule processing facility (2 documents); (5) water resource development (1 document); and (6) ecosystem stability and introduced species (11 documents).
Knowledge Representation and Management. From Ontology to Annotation. Findings from the Yearbook 2015 Section on Knowledge Representation and Management.

Science.gov (United States)

Charlet, J; Darmoni, S J

2015-08-13

To summarize the best papers in the field of Knowledge Representation and Management (KRM). A comprehensive review of medical informatics literature was performed to select some of the most interesting papers of KRM published in 2014. Four articles were selected, two focused on annotation and information retrieval using an ontology. The two others focused mainly on ontologies, one dealing with the usage of a temporal ontology in order to analyze the content of narrative document, one describing a methodology for building multilingual ontologies. Semantic models began to show their efficiency, coupled with annotation tools.
Diverse Image Annotation

KAUST Repository

Wu, Baoyuan

2017-11-09

In this work we study the task of image annotation, of which the goal is to describe an image using a few tags. Instead of predicting the full list of tags, here we target for providing a short list of tags under a limited number (e.g., 3), to cover as much information as possible of the image. The tags in such a short list should be representative and diverse. It means they are required to be not only corresponding to the contents of the image, but also be different to each other. To this end, we treat the image annotation as a subset selection problem based on the conditional determinantal point process (DPP) model, which formulates the representation and diversity jointly. We further explore the semantic hierarchy and synonyms among the candidate tags, and require that two tags in a semantic hierarchy or in a pair of synonyms should not be selected simultaneously. This requirement is then embedded into the sampling algorithm according to the learned conditional DPP model. Besides, we find that traditional metrics for image annotation (e.g., precision, recall and F1 score) only consider the representation, but ignore the diversity. Thus we propose new metrics to evaluate the quality of the selected subset (i.e., the tag list), based on the semantic hierarchy and synonyms. Human study through Amazon Mechanical Turk verifies that the proposed metrics are more close to the humans judgment than traditional metrics. Experiments on two benchmark datasets show that the proposed method can produce more representative and diverse tags, compared with existing image annotation methods.
Diverse Image Annotation

KAUST Repository

Wu, Baoyuan; Jia, Fan; Liu, Wei; Ghanem, Bernard

2017-01-01

In this work we study the task of image annotation, of which the goal is to describe an image using a few tags. Instead of predicting the full list of tags, here we target for providing a short list of tags under a limited number (e.g., 3), to cover as much information as possible of the image. The tags in such a short list should be representative and diverse. It means they are required to be not only corresponding to the contents of the image, but also be different to each other. To this end, we treat the image annotation as a subset selection problem based on the conditional determinantal point process (DPP) model, which formulates the representation and diversity jointly. We further explore the semantic hierarchy and synonyms among the candidate tags, and require that two tags in a semantic hierarchy or in a pair of synonyms should not be selected simultaneously. This requirement is then embedded into the sampling algorithm according to the learned conditional DPP model. Besides, we find that traditional metrics for image annotation (e.g., precision, recall and F1 score) only consider the representation, but ignore the diversity. Thus we propose new metrics to evaluate the quality of the selected subset (i.e., the tag list), based on the semantic hierarchy and synonyms. Human study through Amazon Mechanical Turk verifies that the proposed metrics are more close to the humans judgment than traditional metrics. Experiments on two benchmark datasets show that the proposed method can produce more representative and diverse tags, compared with existing image annotation methods.
Annotating individual human genomes.

Science.gov (United States)

Torkamani, Ali; Scott-Van Zeeland, Ashley A; Topol, Eric J; Schork, Nicholas J

2011-10-01

Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants. Copyright © 2011 Elsevier Inc. All rights reserved.
ANNOTATING INDIVIDUAL HUMAN GENOMES*

Science.gov (United States)

Torkamani, Ali; Scott-Van Zeeland, Ashley A.; Topol, Eric J.; Schork, Nicholas J.

2014-01-01

Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely to amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants. PMID:21839162
Mollified in Madison: Optimism in Contemporary American Literature.

Science.gov (United States)

Vale, Geraldine R.

1989-01-01

Discusses difficulties in teaching/reading "depressing" twentieth-century American literature. Suggests that the underlying themes are not depressing, and illustrates this assertion with examples from Edith Wharton's "Ethan Frome" and F. Scott Fitzgerald's "The Great Gatsby." Provides an annotated list of 25…
Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT+TREMBL.

Science.gov (United States)

Apweiler, R; Gateau, A; Contrino, S; Martin, M J; Junker, V; O'Donovan, C; Lang, F; Mitaritonna, N; Kappus, S; Bairoch, A

1997-01-01

SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporating sequences without proper sequence analysis and annotation, we cannot speed up the incorporation of new incoming data indefinitely. However, as we also want to make the sequences available as fast as possible, we introduced TREMBL (TRanslation of EMBL nucleotide sequence database), a supplement to SWISS-PROT. TREMBL consists of computer-annotated entries in SWISS-PROT format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except for CDS already included in SWISS-PROT. While TREMBL is already of immense value, its computer-generated annotation does not match the quality of SWISS-PROTs. The main difference is in the protein functional information attached to sequences. With this in mind, we are dedicating substantial effort to develop and apply computer methods to enhance the functional information attached to TREMBL entries.
Improving integrative searching of systems chemical biology data using semantic annotation

Directory of Open Access Journals (Sweden)

Chen Bin

2012-03-01

Full Text Available Abstract Background Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued. Results We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i simplifies the process of building SPARQL queries, (ii enables useful new kinds of queries on the data and (iii makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology. Availability Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The document is available at http://chem2bio2owl.wikispaces.com.
GSV Annotated Bibliography

Energy Technology Data Exchange (ETDEWEB)

Roberts, Randy S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Pope, Paul A. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Jiang, Ming [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Trucano, Timothy G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aragon, Cecilia R. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ni, Kevin [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Wei, Thomas [Argonne National Lab. (ANL), Argonne, IL (United States); Chilton, Lawrence K. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Bakel, Alan [Argonne National Lab. (ANL), Argonne, IL (United States)

2010-09-14

The following annotated bibliography was developed as part of the geospatial algorithm verification and validation (GSV) project for the Simulation, Algorithms and Modeling program of NA-22. Verification and Validation of geospatial image analysis algorithms covers a wide range of technologies. Papers in the bibliography are thus organized into the following five topic areas: Image processing and analysis, usability and validation of geospatial image analysis algorithms, image distance measures, scene modeling and image rendering, and transportation simulation models. Many other papers were studied during the course of the investigation including. The annotations for these articles can be found in the paper "On the verification and validation of geospatial image analysis algorithms".
Annotating the human genome with Disease Ontology

Science.gov (United States)

Osborne, John D; Flatow, Jared; Holko, Michelle; Lin, Simon M; Kibbe, Warren A; Zhu, Lihua (Julie); Danila, Maria I; Feng, Gang; Chisholm, Rex L

2009-01-01

Background The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of human genome. PMID:19594883
Annotating non-coding regions of the genome.

Science.gov (United States)

Alexander, Roger P; Fang, Gang; Rozowsky, Joel; Snyder, Michael; Gerstein, Mark B

2010-08-01

Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.
PATRIC, the bacterial bioinformatics database and analysis resource.

Science.gov (United States)

Wattam, Alice R; Abraham, David; Dalay, Oral; Disz, Terry L; Driscoll, Timothy; Gabbard, Joseph L; Gillespie, Joseph J; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K; Olson, Robert; Overbeek, Ross; Pusch, Gordon D; Shukla, Maulik; Schulman, Julie; Stevens, Rick L; Sullivan, Daniel E; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J C; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W

2014-01-01

The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10,000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.
From efficacy to equity: Literature review of decision criteria for resource allocation and healthcare decisionmaking

NARCIS (Netherlands)

Guindo, L.A.; Wagner, M.; Baltussen, R.; Rindress, D.; van Til, Janine Astrid; Kind, P.; Goetghebeur, M.

2012-01-01

Objectives Resource allocation is a challenging issue faced by health policy decisionmakers requiring careful consideration of many factors. Objectives of this study were to identify decision criteria and their frequency reported in the literature on healthcare decisionmaking. Method An extensive
Annotating public fungal ITS sequences from the built environment according to the MIxS-Built Environment standard – a report from a May 23-24, 2016 workshop (Gothenburg, Sweden

Directory of Open Access Journals (Sweden)

Kessy Abarenkov

2016-09-01

Full Text Available Recent molecular studies have identified substantial fungal diversity in indoor environments. Fungi and fungal particles have been linked to a range of potentially unwanted effects in the built environment, including asthma, decay of building materials, and food spoilage. The study of the built mycobiome is hampered by a number of constraints, one of which is the poor state of the metadata annotation of fungal DNA sequences from the built environment in public databases. In order to enable precise interrogation of such data – for example, “retrieve all fungal sequences recovered from bathrooms” – a workshop was organized at the University of Gothenburg (May 23-24, 2016 to annotate public fungal barcode (ITS sequences according to the MIxS-Built Environment annotation standard (http://gensc.org/mixs/. The 36 participants assembled a total of 45,488 data points from the published literature, including the addition of 8,430 instances of countries of collection from a total of 83 countries, 5,801 instances of building types, and 3,876 instances of surface-air contaminants. The results were implemented in the UNITE database for molecular identification of fungi (http://unite.ut.ee and were shared with other online resources. Data obtained from human/animal pathogenic fungi will furthermore be verified on culture based metadata for subsequent inclusion in the ISHAM-ITS database (http://its.mycologylab.org.

Systems Theory and Communication. Annotated Bibliography.

Science.gov (United States)

Covington, William G., Jr.

This annotated bibliography presents annotations of 31 books and journal articles dealing with systems theory and its relation to organizational communication, marketing, information theory, and cybernetics. Materials were published between 1963 and 1992 and are listed alphabetically by author. (RS)
The surplus value of semantic annotations

NARCIS (Netherlands)

Marx, M.

2010-01-01

We compare the costs of semantic annotation of textual documents to its benefits for information processing tasks. Semantic annotation can improve the performance of retrieval tasks and facilitates an improved search experience through faceted search, focused retrieval, better document summaries,
Ground disposal of oil shale wastes: a review with an indexed annotated bibliography through 1976

Energy Technology Data Exchange (ETDEWEB)

Routson, R.C.; Bean, R.M.

1977-12-01

This review covers the available literature concerning ground-disposed wastes and effluents of a potential oil shale industry. Ground disposal has been proposed for essentially all of the solid and liquid wastes produced (Pfeffer, 1974). Since an oil shale industry is not actually in operation, the review is anticipatory in nature. The section, Oil Shale Technology, provides essential background for interpreting the literature on potential shale oil wastes and the topics are treated more completely in the section entitled Environmental Aspects of the Potential Disposal of Oil Shale Wastes to Ground. The first section of the annotated bibliography cites literature concerning potential oil shale wastes and the second section cites literature concerning oil shale technology. Each section contains references arranged historically by year. An index is provided.
Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer’s disease

Science.gov (United States)

Abdallah, Sarah; Ou, Derek; Wang, Qian; Hu, Yiming; Lu, Yisi; Liu, Wei; Li, Boyang; Mukherjee, Shubhabrata; Crane, Paul K.; Zhao, Hongyu

2017-01-01

Continuing efforts from large international consortia have made genome-wide epigenomic and transcriptomic annotation data publicly available for a variety of cell and tissue types. However, synthesis of these datasets into effective summary metrics to characterize the functional non-coding genome remains a challenge. Here, we present GenoSkyline-Plus, an extension of our previous work through integration of an expanded set of epigenomic and transcriptomic annotations to produce high-resolution, single tissue annotations. After validating our annotations with a catalog of tissue-specific non-coding elements previously identified in the literature, we apply our method using data from 127 different cell and tissue types to present an atlas of heritability enrichment across 45 different GWAS traits. We show that broader organ system categories (e.g. immune system) increase statistical power in identifying biologically relevant tissue types for complex diseases while annotations of individual cell types (e.g. monocytes or B-cells) provide deeper insights into disease etiology. Additionally, we use our GenoSkyline-Plus annotations in an in-depth case study of late-onset Alzheimer’s disease (LOAD). Our analyses suggest a strong connection between LOAD heritability and genetic variants contained in regions of the genome functional in monocytes. Furthermore, we show that LOAD shares a similar localization of SNPs to monocyte-functional regions with Parkinson’s disease. Overall, we demonstrate that integrated genome annotations at the single tissue level provide a valuable tool for understanding the etiology of complex human diseases. Our GenoSkyline-Plus annotations are freely available at http://genocanyon.med.yale.edu/GenoSkyline. PMID:28742084
Treating metadata as annotations: separating the content markup from the content

Directory of Open Access Journals (Sweden)

Fredrik Paulsson

2007-11-01

Full Text Available The use of digital learning resources creates an increasing need for semantic metadata, describing the whole resource, as well as parts of resources. Traditionally, schemas such as Text Encoding Initiative (TEI have been used to add semantic markup for parts of resources. This is not sufficient for use in a Ã¢Â€Âmetadata ecologyÃ¢Â€Â, where metadata is distributed, coherent to different Application Profiles, and added by different actors. A new methodology, where metadata is Ã¢Â€Âœpointed inÃ¢Â€Â as annotations, using XPointers, and RDF is proposed. A suggestion for how such infrastructure can be implemented, using existing open standards for metadata, and for the web is presented. We argue that such methodology and infrastructure is necessary to realize the decentralized metadata infrastructure needed for a Ã¢Â€Âœmetadata ecology".
Annotation-based enrichment of Digital Objects using open-source frameworks

Directory of Open Access Journals (Sweden)

Marcus Emmanuel Barnes

2017-07-01

Full Text Available The W3C Web Annotation Data Model, Protocol, and Vocabulary unify approaches to annotations across the web, enabling their aggregation, discovery and persistence over time. In addition, new javascript libraries provide the ability for users to annotate multi-format content. In this paper, we describe how we have leveraged these developments to provide annotation features alongside Islandora’s existing preservation, access, and management capabilities. We also discuss our experience developing with the Web Annotation Model as an open web architecture standard, as well as our approach to integrating mature external annotation libraries. The resulting software (the Web Annotation Utility Module for Islandora accommodates annotation across multiple formats. This solution can be used in various digital scholarship contexts.
Operations research for resource planning and -use in radiotherapy: a literature review.

Science.gov (United States)

Vieira, Bruno; Hans, Erwin W; van Vliet-Vroegindeweij, Corine; van de Kamer, Jeroen; van Harten, Wim

2016-11-25

The delivery of radiotherapy (RT) involves the use of rather expensive resources and multi-disciplinary staff. As the number of cancer patients receiving RT increases, timely delivery becomes increasingly difficult due to the complexities related to, among others, variable patient inflow, complex patient routing, and the joint planning of multiple resources. Operations research (OR) methods have been successfully applied to solve many logistics problems through the development of advanced analytical models for improved decision making. This paper presents the state of the art in the application of OR methods for logistics optimization in RT, at various managerial levels. A literature search was performed in six databases covering several disciplines, from the medical to the technical field. Papers included in the review were published in peer-reviewed journals from 2000 to 2015. Data extraction includes the subject of research, the OR methods used in the study, the extent of implementation according to a six-stage model and the (potential) impact of the results in practice. From the 33 papers included in the review, 18 addressed problems related to patient scheduling (of which 12 focus on scheduling patients on linear accelerators), 8 focus on strategic decision making, 5 on resource capacity planning, and 2 on patient prioritization. Although calculating promising results, none of the papers reported a full implementation of the model with at least a thorough pre-post performance evaluation, indicating that, apart from possible reporting bias, implementation rates of OR models in RT are probably low. The literature on OR applications in RT covers a wide range of approaches from strategic capacity management to operational scheduling levels, and shows that considerable benefits in terms of both waiting times and resource utilization are likely to be achieved. Various fields can be further developed, for instance optimizing the coordination between the available
Current and future trends in marine image annotation software

Science.gov (United States)

Gomes-Pereira, Jose Nuno; Auger, Vincent; Beisiegel, Kolja; Benjamin, Robert; Bergmann, Melanie; Bowden, David; Buhl-Mortensen, Pal; De Leo, Fabio C.; Dionísio, Gisela; Durden, Jennifer M.; Edwards, Luke; Friedman, Ariell; Greinert, Jens; Jacobsen-Stout, Nancy; Lerner, Steve; Leslie, Murray; Nattkemper, Tim W.; Sameoto, Jessica A.; Schoening, Timm; Schouten, Ronald; Seager, James; Singh, Hanumant; Soubigou, Olivier; Tojeira, Inês; van den Beld, Inge; Dias, Frederico; Tempera, Fernando; Santos, Ricardo S.

2016-12-01

Given the need to describe, analyze and index large quantities of marine imagery data for exploration and monitoring activities, a range of specialized image annotation tools have been developed worldwide. Image annotation - the process of transposing objects or events represented in a video or still image to the semantic level, may involve human interactions and computer-assisted solutions. Marine image annotation software (MIAS) have enabled over 500 publications to date. We review the functioning, application trends and developments, by comparing general and advanced features of 23 different tools utilized in underwater image analysis. MIAS requiring human input are basically a graphical user interface, with a video player or image browser that recognizes a specific time code or image code, allowing to log events in a time-stamped (and/or geo-referenced) manner. MIAS differ from similar software by the capability of integrating data associated to video collection, the most simple being the position coordinates of the video recording platform. MIAS have three main characteristics: annotating events in real time, posteriorly to annotation and interact with a database. These range from simple annotation interfaces, to full onboard data management systems, with a variety of toolboxes. Advanced packages allow to input and display data from multiple sensors or multiple annotators via intranet or internet. Posterior human-mediated annotation often include tools for data display and image analysis, e.g. length, area, image segmentation, point count; and in a few cases the possibility of browsing and editing previous dive logs or to analyze the annotations. The interaction with a database allows the automatic integration of annotations from different surveys, repeated annotation and collaborative annotation of shared datasets, browsing and querying of data. Progress in the field of automated annotation is mostly in post processing, for stable platforms or still images
PANNZER2: a rapid functional annotation web server.

Science.gov (United States)

Törönen, Petri; Medlar, Alan; Holm, Liisa

2018-05-08

The unprecedented growth of high-throughput sequencing has led to an ever-widening annotation gap in protein databases. While computational prediction methods are available to make up the shortfall, a majority of public web servers are hindered by practical limitations and poor performance. Here, we introduce PANNZER2 (Protein ANNotation with Z-scoRE), a fast functional annotation web server that provides both Gene Ontology (GO) annotations and free text description predictions. PANNZER2 uses SANSparallel to perform high-performance homology searches, making bulk annotation based on sequence similarity practical. PANNZER2 can output GO annotations from multiple scoring functions, enabling users to see which predictions are robust across predictors. Finally, PANNZER2 predictions scored within the top 10 methods for molecular function and biological process in the CAFA2 NK-full benchmark. The PANNZER2 web server is updated on a monthly schedule and is accessible at http://ekhidna2.biocenter.helsinki.fi/sanspanz/. The source code is available under the GNU Public Licence v3.
Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

Directory of Open Access Journals (Sweden)

Marais Gabriel AB

2011-07-01

Full Text Available Abstract Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO terms, and thousands of single-nucleotide polymorphisms (SNPs were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49% that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to
Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

Science.gov (United States)

2011-01-01

Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a
Autobiographical and Biographical Resources for History of Psychology.

Science.gov (United States)

Ware, Mark E.

This document presents a variety of resources for students to use in preparing class presentations and historical research projects on the history of psychology. The resources pertain to historical figures in the field of psychology. Many of the references included in this bibliography are annotated and the names of the psychologists contained…
Biases in the experimental annotations of protein function and their effect on our understanding of protein function space.

Directory of Open Access Journals (Sweden)

Alexandra M Schnoes

Full Text Available The ongoing functional annotation of proteins relies upon the work of curators to capture experimental findings from scientific literature and apply them to protein sequence and structure data. However, with the increasing use of high-throughput experimental assays, a small number of experimental studies dominate the functional protein annotations collected in databases. Here, we investigate just how prevalent is the "few articles - many proteins" phenomenon. We examine the experimentally validated annotation of proteins provided by several groups in the GO Consortium, and show that the distribution of proteins per published study is exponential, with 0.14% of articles providing the source of annotations for 25% of the proteins in the UniProt-GOA compilation. Since each of the dominant articles describes the use of an assay that can find only one function or a small group of functions, this leads to substantial biases in what we know about the function of many proteins. Mass-spectrometry, microscopy and RNAi experiments dominate high throughput experiments. Consequently, the functional information derived from these experiments is mostly of the subcellular location of proteins, and of the participation of proteins in embryonic developmental pathways. For some organisms, the information provided by different studies overlap by a large amount. We also show that the information provided by high throughput experiments is less specific than those provided by low throughput experiments. Given the experimental techniques available, certain biases in protein function annotation due to high-throughput experiments are unavoidable. Knowing that these biases exist and understanding their characteristics and extent is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments.
Biases in the Experimental Annotations of Protein Function and Their Effect on Our Understanding of Protein Function Space

Science.gov (United States)

Schnoes, Alexandra M.; Ream, David C.; Thorman, Alexander W.; Babbitt, Patricia C.; Friedberg, Iddo

2013-01-01

The ongoing functional annotation of proteins relies upon the work of curators to capture experimental findings from scientific literature and apply them to protein sequence and structure data. However, with the increasing use of high-throughput experimental assays, a small number of experimental studies dominate the functional protein annotations collected in databases. Here, we investigate just how prevalent is the “few articles - many proteins” phenomenon. We examine the experimentally validated annotation of proteins provided by several groups in the GO Consortium, and show that the distribution of proteins per published study is exponential, with 0.14% of articles providing the source of annotations for 25% of the proteins in the UniProt-GOA compilation. Since each of the dominant articles describes the use of an assay that can find only one function or a small group of functions, this leads to substantial biases in what we know about the function of many proteins. Mass-spectrometry, microscopy and RNAi experiments dominate high throughput experiments. Consequently, the functional information derived from these experiments is mostly of the subcellular location of proteins, and of the participation of proteins in embryonic developmental pathways. For some organisms, the information provided by different studies overlap by a large amount. We also show that the information provided by high throughput experiments is less specific than those provided by low throughput experiments. Given the experimental techniques available, certain biases in protein function annotation due to high-throughput experiments are unavoidable. Knowing that these biases exist and understanding their characteristics and extent is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments. PMID:23737737
Nutrition Books and Resources 1971.

Science.gov (United States)

Hawaii Dietetic Association, Honolulu.

This is an annotated bibliography listing books, resources, and films and filmstrips on the subject of nutrition. Sections include: Food Sense; Controlling Your Weight; Feeding Your Family; Food for Teens; Learning and Teaching Nutrition; Other Sources; and Films and Filmstrips. The material is in pamphlet form. (LK)
Literature classification for semi-automated updating of biological knowledgebases

DEFF Research Database (Denmark)

Olsen, Lars Rønn; Kudahl, Ulrich Johan; Winther, Ole

2013-01-01

abstracts yielded classification accuracy of 0.95, thus showing significant value in support of data extraction from the literature. Conclusion: We here propose a conceptual framework for semi-automated extraction of epitope data embedded in scientific literature using principles from text mining...... types of biological data, such as sequence data, are extensively stored in biological databases, functional annotations, such as immunological epitopes, are found primarily in semi-structured formats or free text embedded in primary scientific literature. Results: We defined and applied a machine...
Chest x-ray screening practices: an annotated bibliography

International Nuclear Information System (INIS)

Torchia, M.; DuChez, J.

1980-03-01

This annotated bibliography is a review of the scientific literature on the selection of asymptomatic patients for chest x-ray screening examinations. Selected articles cover a period of time from 1969 through 1979. The articles are organized under 10 main topics which correspond to various categories of chest x-ray screening examinations performed in the United States today. Within each main topic, the articles are presented in chronological order. To aid the reader in identifying specific citations, an author index and a list of citations by journal have been included for user reference. The standard format for each citation includes the title of each article, the author(s), journal, volume, page, date, and abstract
Annotation of the Domestic Pig Genome by Quantitative Proteogenomics.

Science.gov (United States)

Marx, Harald; Hahne, Hannes; Ulbrich, Susanne E; Schnieke, Angelika; Rottmann, Oswald; Frishman, Dmitrij; Kuster, Bernhard

2017-08-04

The pig is one of the earliest domesticated animals in the history of human civilization and represents one of the most important livestock animals. The recent sequencing of the Sus scrofa genome was a major step toward the comprehensive understanding of porcine biology, evolution, and its utility as a promising large animal model for biomedical and xenotransplantation research. However, the functional and structural annotation of the Sus scrofa genome is far from complete. Here, we present mass spectrometry-based quantitative proteomics data of nine juvenile organs and six embryonic stages between 18 and 39 days after gestation. We found that the data provide evidence for and improve the annotation of 8176 protein-coding genes including 588 novel and 321 refined gene models. The analysis of tissue-specific proteins and the temporal expression profiles of embryonic proteins provides an initial functional characterization of expressed protein interaction networks and modules including as yet uncharacterized proteins. Comparative transcript and protein expression analysis to human organs reveal a moderate conservation of protein translation across species. We anticipate that this resource will facilitate basic and applied research on Sus scrofa as well as its porcine relatives.
MitoBamAnnotator: A web-based tool for detecting and annotating heteroplasmy in human mitochondrial DNA sequences.

Science.gov (United States)

Zhidkov, Ilia; Nagar, Tal; Mishmar, Dan; Rubin, Eitan

2011-11-01

The use of Next-Generation Sequencing of mitochondrial DNA is becoming widespread in biological and clinical research. This, in turn, creates a need for a convenient tool that detects and analyzes heteroplasmy. Here we present MitoBamAnnotator, a user friendly web-based tool that allows maximum flexibility and control in heteroplasmy research. MitoBamAnnotator provides the user with a comprehensively annotated overview of mitochondrial genetic variation, allowing for an in-depth analysis with no prior knowledge in programming. Copyright Â© 2011 Elsevier B.V. and Mitochondria Research Society. All rights reserved. All rights reserved.
Correction of the Caulobacter crescentus NA1000 genome annotation.

Directory of Open Access Journals (Sweden)

Bert Ely

Full Text Available Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small, but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome utilizing computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks third codon position GC content to the location of protein coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.

Paradigmatic approaches used in enterprise resource planning systems research: A systematic literature review

Directory of Open Access Journals (Sweden)

Kevin Burgess

2013-03-01

Full Text Available The purpose of this paper is to determine the range of research paradigms employed in a smaller subset of Information Systems (IS literature, namely Enterprise Resource Planning (ERP systems. A systematic literature review based on papers that mentioned ERPs was drawn from eight of the most highly ranked journals according to their h-index. The findings indicate that the majority (96.6% of the ERP research papers were conducted within a positivist research paradigm, which is a far higher proportion than is suggested by other research in the general IS literature (approximately 81%. This paper suggests that there is a strong case for ERP researchers to look at existing paradigm selection and how effectively their research relates to the ERP body of knowledge, especially in respect to the issues of importance to managers within organizations (notably social and change management issues. This research also identified areas where existing paradigm evaluation methods could be enhanced and refined in respect to non-positivist classifications.
Geomicrobiology and its relevance to nuclear waste disposal - a further annotated bibliography

International Nuclear Information System (INIS)

West, J.M.; Arme, S.C.

1984-07-01

Scientific investigations into the disposal of high/intermediate level radioactive waste into deep geological formations includes work in the field of geomicrobiology. It has been shown that microbes exist in deep and shallow geological formations and that they could alter the geochemical environment of a waste repository and influence radionuclide migration. A preliminary literature survey indicated a lack of annotated material and an initial report (West, McKinley and Christofi, 1982) provided the first bibliography. This report is an updated annotated bibliography of relevant geomicrobiological research published since 1982 and should be used in conjunction with the previous report. Those without specific biological experience may find the glossary of common terms in the initial report of use. A major area of work which has not been referenced widely here is near surface disposal of both radioactive and conventional industrial waste. This vast topic can be accessed via some of the general reviews in this report. (author)
BOWiki: an ontology-based wiki for annotation of data and integration of knowledge in biology

Directory of Open Access Journals (Sweden)

Gregorio Sergio E

2009-05-01

Full Text Available Abstract Motivation Ontology development and the annotation of biological data using ontologies are time-consuming exercises that currently require input from expert curators. Open, collaborative platforms for biological data annotation enable the wider scientific community to become involved in developing and maintaining such resources. However, this openness raises concerns regarding the quality and correctness of the information added to these knowledge bases. The combination of a collaborative web-based platform with logic-based approaches and Semantic Web technology can be used to address some of these challenges and concerns. Results We have developed the BOWiki, a web-based system that includes a biological core ontology. The core ontology provides background knowledge about biological types and relations. Against this background, an automated reasoner assesses the consistency of new information added to the knowledge base. The system provides a platform for research communities to integrate information and annotate data collaboratively. Availability The BOWiki and supplementary material is available at http://www.bowiki.net/. The source code is available under the GNU GPL from http://onto.eva.mpg.de/trac/BoWiki.
Annotation of regular polysemy and underspecification

DEFF Research Database (Denmark)

Martínez Alonso, Héctor; Pedersen, Bolette Sandford; Bel, Núria

2013-01-01

We present the result of an annotation task on regular polysemy for a series of seman- tic classes or dot types in English, Dan- ish and Spanish. This article describes the annotation process, the results in terms of inter-encoder agreement, and the sense distributions obtained with two methods...
Easing the Pain of Divorce through Children's Literature.

Science.gov (United States)

Kramer, Pamela A.; Smith, Gail G.

1998-01-01

Describes children's typical reactions to divorce and practical techniques that teachers and caregivers can use in helping children deal with divorce. The use of bibliotherapy is described as a means to help children. An annotated bibliography of select literature is provided, along with descriptions of developmentally appropriate activities that…
A semi-automatic annotation tool for cooking video

Science.gov (United States)

Bianco, Simone; Ciocca, Gianluigi; Napoletano, Paolo; Schettini, Raimondo; Margherita, Roberto; Marini, Gianluca; Gianforme, Giorgio; Pantaleo, Giuseppe

2013-03-01

In order to create a cooking assistant application to guide the users in the preparation of the dishes relevant to their profile diets and food preferences, it is necessary to accurately annotate the video recipes, identifying and tracking the foods of the cook. These videos present particular annotation challenges such as frequent occlusions, food appearance changes, etc. Manually annotate the videos is a time-consuming, tedious and error-prone task. Fully automatic tools that integrate computer vision algorithms to extract and identify the elements of interest are not error free, and false positive and false negative detections need to be corrected in a post-processing stage. We present an interactive, semi-automatic tool for the annotation of cooking videos that integrates computer vision techniques under the supervision of the user. The annotation accuracy is increased with respect to completely automatic tools and the human effort is reduced with respect to completely manual ones. The performance and usability of the proposed tool are evaluated on the basis of the time and effort required to annotate the same video sequences.
Experiments with crowdsourced re-annotation of a POS tagging data set

DEFF Research Database (Denmark)

Hovy, Dirk; Plank, Barbara; Søgaard, Anders

2014-01-01

Crowdsourcing lets us collect multiple annotations for an item from several annotators. Typically, these are annotations for non-sequential classification tasks. While there has been some work on crowdsourcing named entity annotations, researchers have assumed that syntactic tasks such as part......-of-speech (POS) tagging cannot be crowdsourced. This paper shows that workers can actually annotate sequential data almost as well as experts. Further, we show that the models learned from crowdsourced annotations fare as well as the models learned from expert annotations in downstream tasks....
MPEG-7 based video annotation and browsing

Science.gov (United States)

Hoeynck, Michael; Auweiler, Thorsten; Wellhausen, Jens

2003-11-01

The huge amount of multimedia data produced worldwide requires annotation in order to enable universal content access and to provide content-based search-and-retrieval functionalities. Since manual video annotation can be time consuming, automatic annotation systems are required. We review recent approaches to content-based indexing and annotation of videos for different kind of sports and describe our approach to automatic annotation of equestrian sports videos. We especially concentrate on MPEG-7 based feature extraction and content description, where we apply different visual descriptors for cut detection. Further, we extract the temporal positions of single obstacles on the course by analyzing MPEG-7 edge information. Having determined single shot positions as well as the visual highlights, the information is jointly stored with meta-textual information in an MPEG-7 description scheme. Based on this information, we generate content summaries which can be utilized in a user-interface in order to provide content-based access to the video stream, but further for media browsing on a streaming server.
A phylogenomic gene cluster resource: The phylogeneticallyinferred groups (PhlGs) database

Energy Technology Data Exchange (ETDEWEB)

Dehal, Paramvir S.; Boore, Jeffrey L.

2005-08-25

We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although these non-phylogenetic gene family clusters have been used broadly for gene annotation, errors are known to be introduced by the artifactual association of slowly evolving paralogs and lack of annotation for those more rapidly evolving. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address pattern and mechanism of the evolution of the genome. The automated generation of evolutionary gene clusters, creation of gene trees, determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression information, and genomic context is an important resource to the scientific community.
Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of its sequences of mycorrhizal fungi.

Science.gov (United States)

Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

2011-01-01

Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.
Copyright Resources for School Librarians

Science.gov (United States)

Johnson, Yvonne M.; Johnson, Nicole M.

2016-01-01

This article provides a collection of annotated citations for online resources of interest to school librarians; the focus is on copyright law, related information, and guidelines. The citations are organized by themes based on common issues. Copyright protects originally created works, including movies, recorded music performances, novels,…
Ground Truth Annotation in T Analyst

DEFF Research Database (Denmark)

2015-01-01

This video shows how to annotate the ground truth tracks in the thermal videos. The ground truth tracks are produced to be able to compare them to tracks obtained from a Computer Vision tracking approach. The program used for annotation is T-Analyst, which is developed by Aliaksei Laureshyn, Ph...
The Viking viewer for connectomics: scalable multi-user annotation and summarization of large volume data sets.

Science.gov (United States)

Anderson, J R; Mohammed, S; Grimm, B; Jones, B W; Koshevoy, P; Tasdizen, T; Whitaker, R; Marc, R E

2011-01-01

Modern microscope automation permits the collection of vast amounts of continuous anatomical imagery in both two and three dimensions. These large data sets present significant challenges for data storage, access, viewing, annotation and analysis. The cost and overhead of collecting and storing the data can be extremely high. Large data sets quickly exceed an individual's capability for timely analysis and present challenges in efficiently applying transforms, if needed. Finally annotated anatomical data sets can represent a significant investment of resources and should be easily accessible to the scientific community. The Viking application was our solution created to view and annotate a 16.5 TB ultrastructural retinal connectome volume and we demonstrate its utility in reconstructing neural networks for a distinctive retinal amacrine cell class. Viking has several key features. (1) It works over the internet using HTTP and supports many concurrent users limited only by hardware. (2) It supports a multi-user, collaborative annotation strategy. (3) It cleanly demarcates viewing and analysis from data collection and hosting. (4) It is capable of applying transformations in real-time. (5) It has an easily extensible user interface, allowing addition of specialized modules without rewriting the viewer. © 2010 The Authors Journal of Microscopy © 2010 The Royal Microscopical Society.
Low-level radioactive waste technology: a selected, annotated bibliography

International Nuclear Information System (INIS)

Fore, C.S.; Vaughan, N.D.; Hyder, L.K.

1980-10-01

This annotated bibliography of 447 references contains scientific, technical, economic, and regulatory information relevant to low-level radioactive waste technology. The bibliography focuses on environmental transport, disposal site, and waste treatment studies. The publication covers both domestic and foreign literature for the period 1952 to 1979. Major chapters selected are Chemical and Physical Aspects; Container Design and Performance; Disposal Site; Environmental Transport; General Studies and Reviews; Geology, Hydrology and Site Resources; Regulatory and Economic Aspects; Transportation Technology; Waste Production; and Waste Treatment. Specialized data fields have been incorporated into the data file to improve the ease and accuracy of locating pertinent references. Specific radionuclides for which data are presented are listed in the Measured Radionuclides field, and specific parameters which affect the migration of these radionuclides are presented in the Measured Parameters field. In addition, each document referenced in this bibliography has been assigned a relevance number to facilitate sorting the documents according to their pertinence to low-level radioactive waste technology. The documents are rated 1, 2, 3, or 4, with 1 indicating direct applicability to low-level radioactive waste technology and 4 indicating that a considerable amount of interpretation is required for the information presented to be applied. The references within each chapter are arranged alphabetically by leading author, corporate affiliation, or title of the document. Indexes are provide for (1) author(s), (2) keywords, (3) subject category, (4) title, (5) geographic location, (6) measured parameters, (7) measured radionuclides, and (8) publication description
Low-level radioactive waste technology: a selected, annotated bibliography

Energy Technology Data Exchange (ETDEWEB)

Fore, C.S.; Vaughan, N.D.; Hyder, L.K.

1980-10-01

This annotated bibliography of 447 references contains scientific, technical, economic, and regulatory information relevant to low-level radioactive waste technology. The bibliography focuses on environmental transport, disposal site, and waste treatment studies. The publication covers both domestic and foreign literature for the period 1952 to 1979. Major chapters selected are Chemical and Physical Aspects; Container Design and Performance; Disposal Site; Environmental Transport; General Studies and Reviews; Geology, Hydrology and Site Resources; Regulatory and Economic Aspects; Transportation Technology; Waste Production; and Waste Treatment. Specialized data fields have been incorporated into the data file to improve the ease and accuracy of locating pertinent references. Specific radionuclides for which data are presented are listed in the Measured Radionuclides field, and specific parameters which affect the migration of these radionuclides are presented in the Measured Parameters field. In addition, each document referenced in this bibliography has been assigned a relevance number to facilitate sorting the documents according to their pertinence to low-level radioactive waste technology. The documents are rated 1, 2, 3, or 4, with 1 indicating direct applicability to low-level radioactive waste technology and 4 indicating that a considerable amount of interpretation is required for the information presented to be applied. The references within each chapter are arranged alphabetically by leading author, corporate affiliation, or title of the document. Indexes are provide for (1) author(s), (2) keywords, (3) subject category, (4) title, (5) geographic location, (6) measured parameters, (7) measured radionuclides, and (8) publication description.
Propagating annotations of molecular networks using in silico fragmentation.

Science.gov (United States)

da Silva, Ricardo R; Wang, Mingxun; Nothias, Louis-Félix; van der Hooft, Justin J J; Caraballo-Rodríguez, Andrés Mauricio; Fox, Evan; Balunas, Marcy J; Klassen, Jonathan L; Lopes, Norberto Peporine; Dorrestein, Pieter C

2018-04-18

The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most of our biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. One of the alternative approaches used to annotate an unknown fragmentation mass spectrum is through the use of in silico predictions. One of the challenges of in silico annotation is the uncertainty around the correct structure among the predicted candidate lists. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to a MS/MS spectrum in spectral libraries. This is accomplished through creating a network consensus of re-ranked structural candidates using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web-platform https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp.
Gene calling and bacterial genome annotation with BG7.

Science.gov (United States)

Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

2015-01-01

New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is sequence error tolerant maintaining their capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).
Annotation of the Evaluative Language in a Dependency Treebank

Directory of Open Access Journals (Sweden)

Šindlerová Jana

2017-12-01

Full Text Available In the paper, we present our efforts to annotate evaluative language in the Prague Dependency Treebank 2.0. The project is a follow-up of the series of annotations of small plaintext corpora. It uses automatic identification of potentially evaluative nodes through mapping a Czech subjectivity lexicon to syntactically annotated data. These nodes are then manually checked by an annotator and either dismissed as standing in a non-evaluative context, or confirmed as evaluative. In the latter case, information about the polarity orientation, the source and target of evaluation is added by the annotator. The annotations unveiled several advantages and disadvantages of the chosen framework. The advantages involve more structured and easy-to-handle environment for the annotator, visibility of syntactic patterning of the evaluative state, effective solving of discontinuous structures or a new perspective on the influence of good/bad news. The disadvantages include little capability of treating cases with evaluation spread among more syntactically connected nodes at once, little capability of treating metaphorical expressions, or disregarding the effects of negation and intensification in the current scheme.
PATRIC, the bacterial bioinformatics database and analysis resource

Science.gov (United States)

Wattam, Alice R.; Abraham, David; Dalay, Oral; Disz, Terry L.; Driscoll, Timothy; Gabbard, Joseph L.; Gillespie, Joseph J.; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olson, Robert; Overbeek, Ross; Pusch, Gordon D.; Shukla, Maulik; Schulman, Julie; Stevens, Rick L.; Sullivan, Daniel E.; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J.C.; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W.

2014-01-01

The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein–protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue. PMID:24225323
DOE Hydropower Program biennial report 1996-1997 (with an updated annotated bibliography)

Energy Technology Data Exchange (ETDEWEB)

Rinehart, B.N.; Francfort, J.E.; Sommers, G.L. [Lockheed Idaho Technologies Co., Idaho Falls, ID (United States); Cada, G.F.; Sale, M.J. [Oak Ridge National Lab., TN (United States)

1997-06-01

This report, the latest in a series of biennial Hydropower Program reports sponsored by the US Department of Energy, summarizes the research and development and technology transfer activities of fiscal years 1996 and 1997. The report discusses the activities in the six areas of the hydropower program: advanced hydropower turbine systems; environmental research; hydropower research and development; renewable Indian energy resources; resource assessment; and technology transfer. The report also includes an annotated bibliography of reports pertinent to hydropower, written by the staff of the Idaho National Engineering and Environmental Laboratory, Oak Ridge National Laboratory, Federal and state agencies, cities, metropolitan water districts, irrigation companies, and public and independent utilities. Most reports are available from the National Technical Information Service.

The caBIG annotation and image Markup project.

Science.gov (United States)

Channin, David S; Mongkolwat, Pattanasak; Kleper, Vladimir; Sepukar, Kastubh; Rubin, Daniel L

2010-04-01

Image annotation and markup are at the core of medical interpretation in both the clinical and the research setting. Digital medical images are managed with the DICOM standard format. While DICOM contains a large amount of meta-data about whom, where, and how the image was acquired, DICOM says little about the content or meaning of the pixel data. An image annotation is the explanatory or descriptive information about the pixel data of an image that is generated by a human or machine observer. An image markup is the graphical symbols placed over the image to depict an annotation. While DICOM is the standard for medical image acquisition, manipulation, transmission, storage, and display, there are no standards for image annotation and markup. Many systems expect annotation to be reported verbally, while markups are stored in graphical overlays or proprietary formats. This makes it difficult to extract and compute with both of them. The goal of the Annotation and Image Markup (AIM) project is to develop a mechanism, for modeling, capturing, and serializing image annotation and markup data that can be adopted as a standard by the medical imaging community. The AIM project produces both human- and machine-readable artifacts. This paper describes the AIM information model, schemas, software libraries, and tools so as to prepare researchers and developers for their use of AIM.
Interoperable Multimedia Annotation and Retrieval for the Tourism Sector

NARCIS (Netherlands)

Chatzitoulousis, Antonios; Efraimidis, Pavlos S.; Athanasiadis, I.N.

2015-01-01

The Atlas Metadata System (AMS) employs semantic web annotation techniques in order to create an interoperable information annotation and retrieval platform for the tourism sector. AMS adopts state-of-the-art metadata vocabularies, annotation techniques and semantic web technologies.
The UNO Aviation Monograph Series: Aviation Security: An Annotated Bibliography of Responses to the Gore Commission

Science.gov (United States)

Carrico, John S.; Schaaf, Michaela M.

1998-01-01

This monograph is a companion to UNOAI Monograph 96-2, "The Image of Airport Security: An Annotated Bibliography," compiled in June 1996. The White House Commission on Aviation Safety and Security, headed by Vice President Al Gore, was formed as a result of the TWA Flight 800 crash in August 1996. The Commission's final report included 31 recommendations addressed toward aviation security. The recommendations were cause for security issues to be revisited in the media and by the aviation industry. These developments necessitated the need for an updated bibliography to review the resulting literature. Many of the articles were written in response to the recommendations made by the Gore Commission. "Aviation Security: An Annotated Bibliography of Responses to the Gore Commission" is the result of this need.
A Novel Approach to Semantic and Coreference Annotation at LLNL

Energy Technology Data Exchange (ETDEWEB)

Firpo, M

2005-02-04

A case is made for the importance of high quality semantic and coreference annotation. The challenges of providing such annotation are described. Asperger's Syndrome is introduced, and the connections are drawn between the needs of text annotation and the abilities of persons with Asperger's Syndrome to meet those needs. Finally, a pilot program is recommended wherein semantic annotation is performed by people with Asperger's Syndrome. The primary points embodied in this paper are as follows: (1) Document annotation is essential to the Natural Language Processing (NLP) projects at Lawrence Livermore National Laboratory (LLNL); (2) LLNL does not currently have a system in place to meet its need for text annotation; (3) Text annotation is challenging for a variety of reasons, many related to its very rote nature; (4) Persons with Asperger's Syndrome are particularly skilled at rote verbal tasks, and behavioral experts agree that they would excel at text annotation; and (6) A pilot study is recommend in which two to three people with Asperger's Syndrome annotate documents and then the quality and throughput of their work is evaluated relative to that of their neuro-typical peers.
Ecological investigations at power plant cooling lakes, reservoirs, and ponds: an annotated bibliography. Final report

International Nuclear Information System (INIS)

Yost, F.E.; Talmage, S.S.

1981-06-01

Presented as an annotated bibliography are 541 references dealing with ecological investigations at power plants which use cooling lakes, ponds, or reservoirs. The references were obtained from open literature and from environmental reports and impact statements prepared for or by the electric utility industry. The literature covers the period 1950 through mid-1980. Topics covered include site-specific studies at facilities using cooling lakes, ponds, or reservoirs, as well as special studies, engineering studies, and general studies. References are arranged alphabetically by author and indexes are provided to personal and corporate authors, facility names, regions, and taxonomic names
Focus on Firearms. Biblio Alert! New Resources for Preventing Injury and Violence.

Science.gov (United States)

National Center for Education in Maternal and Child Health, Arlington, VA.

This annotated bibliography provides information on important resources on firearm injuries. It is divided into two sections: (1) New Resources, including journal articles (28 titles), reports and books (15 titles), other resources (13 titles), and popular press (3 titles); and (2) Classics, including journal articles (10 titles), reports and…
Policy Development for Grey Literature Resources. An Assessment of the Pisa Declaration

OpenAIRE

Savić, Dobrica (IAEA-NIS); Farace, Dominic J. (GreyNet); Frantzen, Jerry (GreyNet); Biagioni, Stefania (ISTI-CNR); Carlesi, Carlo (ISTI-CNR); Gruttemeier, Herbert (Inist-CNRS); Stock, Christiane (Inist-CNRS); Puccinelli, Roberto (CNR); GreyNet, Grey Literature Network Service

2017-01-01

In the spring of 2014, a workshop took place at the Italian National Council of Research in Pisa. The topic of this event dealt with policy development for grey literature resources. Some seventy participants from nine countries took an active part in the workshop – the outcome of which produced what is today known as the Pisa Declaration. This fifteen point document arising from the input of those who attended the workshop sought to provide a roadmap that would help to serve diverse communit...
Policy Development for Grey Literature Resources: An Assessment of the Pisa Declaration

OpenAIRE

Savic, Dobrica; Farace, Dominic; Frantzen, Jerry; Biagioni, Stefania; Carlesi, Carlo; Gruttemeier, Herbert; Stock, Christiane

2017-01-01

In the spring of 2014, a workshop took place at the Italian National Council of Research in Pisa1. The topic of this event dealt with policy development for grey literature resources. Some seventy participants from nine countries took an active part in the workshop - the outcome of which produced what is today known as the Pisa Declaration2. This fifteen point document arising from the input of those who attended the workshop sought to provide a roadmap that would help to serve diverse commun...
Combined evidence annotation of transposable elements in genome sequences.

Directory of Open Access Journals (Sweden)

Hadi Quesneville

2005-07-01

Full Text Available Transposable elements (TEs are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced from combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1, and we found a substantially higher number of TEs (n = 6,013 than previously identified (n = 1,572. Most of the new TEs derive from small fragments of a few hundred nucleotides long and highly abundant families not previously annotated (e.g., INE-1. We also estimated that 518 TE copies (8.6% are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other
Plann: A command-line application for annotating plastome sequences.

Science.gov (United States)

Huang, Daisie I; Cronk, Quentin C B

2015-08-01

Plann automates the process of annotating a plastome sequence in GenBank format for either downstream processing or for GenBank submission by annotating a new plastome based on a similar, well-annotated plastome. Plann is a Perl script to be executed on the command line. Plann compares a new plastome sequence to the features annotated in a reference plastome and then shifts the intervals of any matching features to the locations in the new plastome. Plann's output can be used in the National Center for Biotechnology Information's tbl2asn to create a Sequin file for GenBank submission. Unlike Web-based annotation packages, Plann is a locally executable script that will accurately annotate a plastome sequence to a locally specified reference plastome. Because it executes from the command line, it is ready to use in other software pipelines and can be easily rerun as a draft plastome is improved.
Semantator: annotating clinical narratives with semantic web ontologies.

Science.gov (United States)

Song, Dezhao; Chute, Christopher G; Tao, Cui

2012-01-01

To facilitate clinical research, clinical data needs to be stored in a machine processable and understandable way. Manual annotating clinical data is time consuming. Automatic approaches (e.g., Natural Language Processing systems) have been adopted to convert such data into structured formats; however, the quality of such automatically extracted data may not always be satisfying. In this paper, we propose Semantator, a semi-automatic tool for document annotation with Semantic Web ontologies. With a loaded free text document and an ontology, Semantator supports the creation/deletion of ontology instances for any document fragment, linking/disconnecting instances with the properties in the ontology, and also enables automatic annotation by connecting to the NCBO annotator and cTAKES. By representing annotations in Semantic Web standards, Semantator supports reasoning based upon the underlying semantics of the owl:disjointWith and owl:equivalentClass predicates. We present discussions based on user experiences of using Semantator.
MIPS: analysis and annotation of proteins from whole genomes.

Science.gov (United States)

Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

2004-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
MEETING: Chlamydomonas Annotation Jamboree - October 2003

Energy Technology Data Exchange (ETDEWEB)

Grossman, Arthur R

2007-04-13

Shotgun sequencing of the nuclear genome of Chlamydomonas reinhardtii (Chlamydomonas throughout) was performed at an approximate 10X coverage by JGI. Roughly half of the genome is now contained on 26 scaffolds, all of which are at least 1.6 Mb, and the coverage of the genome is ~95%. There are now over 200,000 cDNA sequence reads that we have generated as part of the Chlamydomonas genome project (Grossman, 2003; Shrager et al., 2003; Grossman et al. 2007; Merchant et al., 2007); other sequences have also been generated by the Kasuza sequence group (Asamizu et al., 1999; Asamizu et al., 2000) or individual laboratories that have focused on specific genes. Shrager et al. (2003) placed the reads into distinct contigs (an assemblage of reads with overlapping nucleotide sequences), and contigs that group together as part of the same genes have been designated ACEs (assembly of contigs generated from EST information). All of the reads have also been mapped to the Chlamydomonas nuclear genome and the cDNAs and their corresponding genomic sequences have been reassembled, and the resulting assemblage is called an ACEG (an Assembly of contiguous EST sequences supported by genomic sequence) (Jain et al., 2007). Most of the unique genes or ACEGs are also represented by gene models that have been generated by the Joint Genome Institute (JGI, Walnut Creek, CA). These gene models have been placed onto the DNA scaffolds and are presented as a track on the Chlamydomonas genome browser associated with the genome portal (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html). Ultimately, the meeting grant awarded by DOE has helped enormously in the development of an annotation pipeline (a set of guidelines used in the annotation of genes) and resulted in high quality annotation of over 4,000 genes; the annotators were from both Europe and the USA. Some of the people who led the annotation initiative were Arthur Grossman, Olivier Vallon, and Sabeeha Merchant (with many individual
Transcriptome sequencing and annotation for the Jamaican fruit bat (Artibeus jamaicensis.

Directory of Open Access Journals (Sweden)

Timothy I Shaw

Full Text Available The Jamaican fruit bat (Artibeus jamaicensis is one of the most common bats in the tropical Americas. It is thought to be a potential reservoir host of Tacaribe virus, an arenavirus closely related to the South American hemorrhagic fever viruses. We performed transcriptome sequencing and annotation from lung, kidney and spleen tissues using 454 and Illumina platforms to develop this species as an animal model. More than 100,000 contigs were assembled, with 25,000 genes that were functionally annotated. Of the remaining unannotated contigs, 80% were found within bat genomes or transcriptomes. Annotated genes are involved in a broad range of activities ranging from cellular metabolism to genome regulation through ncRNAs. Reciprocal BLAST best hits yielded 8,785 sequences that are orthologous to mouse, rat, cattle, horse and human. Species tree analysis of sequences from 2,378 loci was used to achieve 95% bootstrap support for the placement of bat as sister to the clade containing horse, dog, and cattle. Through substitution rate estimation between bat and human, 32 genes were identified with evidence for positive selection. We also identified 466 immune-related genes, which may be useful for studying Tacaribe virus infection of this species. The Jamaican fruit bat transcriptome dataset is a resource that should provide additional candidate markers for studying bat evolution and ecology, and tools for analysis of the host response and pathology of disease.
Ontology modularization to improve semantic medical image annotation.

Science.gov (United States)

Wennerberg, Pinar; Schulz, Klaus; Buitelaar, Paul

2011-02-01

Searching for medical images and patient reports is a significant challenge in a clinical setting. The contents of such documents are often not described in sufficient detail thus making it difficult to utilize the inherent wealth of information contained within them. Semantic image annotation addresses this problem by describing the contents of images and reports using medical ontologies. Medical images and patient reports are then linked to each other through common annotations. Subsequently, search algorithms can more effectively find related sets of documents on the basis of these semantic descriptions. A prerequisite to realizing such a semantic search engine is that the data contained within should have been previously annotated with concepts from medical ontologies. One major challenge in this regard is the size and complexity of medical ontologies as annotation sources. Manual annotation is particularly time consuming labor intensive in a clinical environment. In this article we propose an approach to reducing the size of clinical ontologies for more efficient manual image and text annotation. More precisely, our goal is to identify smaller fragments of a large anatomy ontology that are relevant for annotating medical images from patients suffering from lymphoma. Our work is in the area of ontology modularization, which is a recent and active field of research. We describe our approach, methods and data set in detail and we discuss our results. Copyright © 2010 Elsevier Inc. All rights reserved.
An annotated bibliography of scientific literature on managing forests for carbon benefits

Science.gov (United States)

Sarah J. Hines; Linda S. Heath; Richard A. Birdsey

2010-01-01

Managing forests for carbon benefits is a consideration for climate change, bioenergy, sustainability, and ecosystem services. A rapidly growing body of scientific literature on forest carbon management includes experimental, modeling, and synthesis approaches, at the stand- to landscape- to continental-level. We conducted a search of the scientific literature on the...
[Prescription annotations in Welfare Pharmacy].

Science.gov (United States)

Han, Yi

2018-03-01

Welfare Pharmacy contains medical formulas documented by the government and official prescriptions used by the official pharmacy in the pharmaceutical process. In the last years of Southern Song Dynasty, anonyms gave a lot of prescription annotations, made textual researches for the name, source, composition and origin of the prescriptions, and supplemented important historical data of medical cases and researched historical facts. The annotations of Welfare Pharmacy gathered the essence of medical theory, and can be used as precious materials to correctly understand the syndrome differentiation, compatibility regularity and clinical application of prescriptions. This article deeply investigated the style and form of the prescription annotations in Welfare Pharmacy, the name of prescriptions and the evolution of terminology, the major functions of the prescriptions, processing methods, instructions for taking medicine and taboos of prescriptions, the medical cases and clinical efficacy of prescriptions, the backgrounds, sources, composition and cultural meanings of prescriptions, proposed that the prescription annotations played an active role in the textual dissemination, patent medicine production and clinical diagnosis and treatment of Welfare Pharmacy. This not only helps understand the changes in the names and terms of traditional Chinese medicines in Welfare Pharmacy, but also provides the basis for understanding the knowledge sources, compatibility regularity, important drug innovations and clinical medications of prescriptions in Welfare Pharmacy. Copyright© by the Chinese Pharmaceutical Association.
A database of annotated tentative orthologs from crop abiotic stress transcripts.

Science.gov (United States)

Balaji, Jayashree; Crouch, Jonathan H; Petite, Prasad V N S; Hoisington, David A

2006-10-07

A minimal requirement to initiate a comparative genomics study on plant responses to abiotic stresses is a dataset of orthologous sequences. The availability of a large amount of sequence information, including those derived from stress cDNA libraries allow for the identification of stress related genes and orthologs associated with the stress response. Orthologous sequences serve as tools to explore genes and their relationships across species. For this purpose, ESTs from stress cDNA libraries across 16 crop species including 6 important cereal crops and 10 dicots were systematically collated and subjected to bioinformatics analysis such as clustering, grouping of tentative orthologous sets, identification of protein motifs/patterns in the predicted protein sequence, and annotation with stress conditions, tissue/library source and putative function. All data are available to the scientific community at http://intranet.icrisat.org/gt1/tog/homepage.htm. We believe that the availability of annotated plant abiotic stress ortholog sets will be a valuable resource for researchers studying the biology of environmental stresses in plant systems, molecular evolution and genomics.
Resources for Underwater Robotics Education

Science.gov (United States)

Wallace, Michael L.; Freitas, William M.

2016-01-01

4-H clubs can build and program underwater robots from raw materials. An annotated resource list for engaging youth in building underwater remotely operated vehicles (ROVs) is provided. This article is a companion piece to the Research in Brief article "Building Teen Futures with Underwater Robotics" in this issue of the "Journal of…
SYSTEMATIC LITERATURE REVIEW ON RESOURCE ALLOCATION AND RESOURCE SCHEDULING IN CLOUD COMPUTING

OpenAIRE

B. Muni Lavanya; C. Shoba Bindu

2016-01-01

The objective the work is intended to highlight the key features and afford finest future directions in the research community of Resource Allocation, Resource Scheduling and Resource management from 2009 to 2016. Exemplifying how research on Resource Allocation, Resource Scheduling and Resource management has progressively increased in the past decade by inspecting articles, papers from scientific and standard publications. Survey materialized in three-fold process. Firstly, investigate on t...

Annotating abstract pronominal anaphora in the DAD project

DEFF Research Database (Denmark)

Navarretta, Costanza; Olsen, Sussi Anni

2008-01-01

n this paper we present an extension of the MATE/GNOME annotation scheme for anaphora (Poesio 2004) which accounts for abstract anaphora in Danish and Italian. By abstract anaphora it is here meant pronouns whose linguistic antecedents are verbal phrases, clauses and discourse segments. The exten......n this paper we present an extension of the MATE/GNOME annotation scheme for anaphora (Poesio 2004) which accounts for abstract anaphora in Danish and Italian. By abstract anaphora it is here meant pronouns whose linguistic antecedents are verbal phrases, clauses and discourse segments....... The extended scheme, which we call the DAD annotation scheme, allows to annotate information about abstract anaphora which is important to investigate their use, see Webber (1988), Gundel et al. (2003), Navarretta (2004) and which can influence their automatic treatment. Intercoder agreement scores obtained...... by applying the DAD annotation scheme on texts and dialogues in the two languages are given and show that th information proposed in the scheme can be recognised in a reliable way....
DOE Hydropower Program biennial report 1990--1991 (with updated annotated bibliography)

Energy Technology Data Exchange (ETDEWEB)

Chappell, J.R.; Rinehart, B.N.; Sommers, G.L. (Idaho National Engineering Lab., Idaho Falls, ID (United States)); Sale, M.J. (Oak Ridge National Lab., TN (United States))

1991-07-01

This report summarizes the activities of the US Department of Energy's (DOE) Hydropower Program for fiscal years 1990 and 1991, and provides an annotated bibliography of research, engineering, operations, regulations, and costs of projects pertinent to hydropower development. The Hydropower Program is organized as follows: background (including Technology Development and Engineering Research and Development); Resource Assessment; National Energy Strategy; Technology Transfer; Environmental Research; and, the bibliography discusses reports written by both private and non-Federal Government sectors. Most reports are available from the National Technical Information Service. 5 figs., 2 tabs.
The Biomedical Resource Ontology (BRO) to enable resource discovery in clinical and translational research.

Science.gov (United States)

Tenenbaum, Jessica D; Whetzel, Patricia L; Anderson, Kent; Borromeo, Charles D; Dinov, Ivo D; Gabriel, Davera; Kirschner, Beth; Mirel, Barbara; Morris, Tim; Noy, Natasha; Nyulas, Csongor; Rubenson, David; Saxman, Paul R; Singh, Harpreet; Whelan, Nancy; Wright, Zach; Athey, Brian D; Becich, Michael J; Ginsburg, Geoffrey S; Musen, Mark A; Smith, Kevin A; Tarantal, Alice F; Rubin, Daniel L; Lyster, Peter

2011-02-01

The biomedical research community relies on a diverse set of resources, both within their own institutions and at other research centers. In addition, an increasing number of shared electronic resources have been developed. Without effective means to locate and query these resources, it is challenging, if not impossible, for investigators to be aware of the myriad resources available, or to effectively perform resource discovery when the need arises. In this paper, we describe the development and use of the Biomedical Resource Ontology (BRO) to enable semantic annotation and discovery of biomedical resources. We also describe the Resource Discovery System (RDS) which is a federated, inter-institutional pilot project that uses the BRO to facilitate resource discovery on the Internet. Through the RDS framework and its associated Biositemaps infrastructure, the BRO facilitates semantic search and discovery of biomedical resources, breaking down barriers and streamlining scientific research that will improve human health. Copyright © 2010 Elsevier Inc. All rights reserved.
Liberating literacies: L1-students resources for stance-taking in the literature classroom

DEFF Research Database (Denmark)

Kabel, Kristine; Brok, Lene Storgaard

2015-01-01

a pattern in students’ linguistic choices in the literature classroom and their metadiscourses. Moreover, privileged ways of participating in group work about text production involve strategies that enhance students’ development of an independent voice and of resources for stance-taking. Such strategies can...... by approaches to the importance of explorative meaning-making processes in the classroom (Flower, 1994; Aadahl et al., 2010) and by social semiotic notions of reflection literacy (Hasan, 1996, 2011) as well as critical literacy (Gee, 2012; Gibbons, 2006;), which emphasize students’ meta knowledge and agency...... 5 classes) and consist of students’ written texts, student interviews, video recorded classroom observations and field notes. Preliminary results show a variety in students’ resources for stance-taking, specifically in regard to what extent other voices are integrated in texts, and they show...
Quick Pad Tagger : An Efficient Graphical User Interface for Building Annotated Corpora with Multiple Annotation Layers

OpenAIRE

Marc Schreiber; Kai Barkschat; Bodo Kraft; Albert Zundorf

2015-01-01

More and more domain specific applications in the internet make use of Natural Language Processing (NLP) tools (e. g. Information Extraction systems). The output quality of these applications relies on the output quality of the used NLP tools. Often, the quality can be increased by annotating a domain specific corpus. However, annotating a corpus is a time consuming and exhaustive task. To reduce the annota tion time we present...
Supplementary Material for: BEACON: automated tool for Bacterial GEnome Annotation ComparisON

KAUST Repository

Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

2015-01-01

Abstract Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACONâ s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27Â %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
Extending eScience Provenance with User-Submitted Semantic Annotations

Science.gov (United States)

Michaelis, J.; Zednik, S.; West, P.; Fox, P. A.; McGuinness, D. L.

2010-12-01

eScience based systems generate provenance of their data products, related to such things as: data processing, data collection conditions, expert evaluation, and data product quality. Recent advances in web-based technology offer users the possibility of making annotations to both data products and steps in accompanying provenance traces, thereby expanding the utility of such provenance for others. These contributing users may have varying backgrounds, ranging from system experts to outside domain experts to citizen scientists. Furthermore, such users may wish to make varying types of annotations - ranging from documenting the purpose of a provenance step to raising concerns about the quality of data dependencies. Semantic Web technologies allow for such kinds of rich annotations to be made to provenance through the use of ontology vocabularies for (i) organizing provenance, and (ii) organizing user/annotation classifications. Furthermore, through Linked Data practices, Semantic linkages may be made from provenance steps to external data of interest. A desire for Semantically-annotated provenance has been motivated by data management issues in the Mauna Loa Solar Observatory’s (MLSO) Advanced Coronal Observing System (ACOS). In ACOS, photomoeter-based readings are taken of solar activity and subsequently processed into final data products consumable by end users. At intermediate stages of ACOS processing, factors such as evaluations by human experts and weather conditions are logged, which could impact data product quality. If such factors are linked via user-submitted annotations to provenance, it could be significantly beneficial for other users. Likewise, the background of a user could impact the credibility of their annotations. For example, an annotation made by a citizen scientist describing the purpose of a provenance step may not be as reliable as a similar annotation made by an ACOS project member. For this work, we have developed a software package that
TOPSAN: use of a collaborative environment for annotating, analyzing and disseminating data on JCSG and PSI structures

International Nuclear Information System (INIS)

Krishna, S. Sri; Weekes, Dana; Bakolitsa, Constantina; Elsliger, Marc-André; Wilson, Ian A.; Godzik, Adam; Wooley, John

2010-01-01

Specific use cases of TOPSAN, an innovative collaborative platform for creating, sharing and distributing annotations and insights about protein structures, such as those determined by high-throughput structural genomics in the Protein Structure Initiative (PSI), are described. TOPSAN is the main annotation platform for JCSG structures and serves as a conduit for initiating collaborations with the biological community, as illustrated in this special issue of Acta Crystallographica Section F. Developed at the JCSG with the goal of opening a dialogue on the novel protein structures with the broader biological community, TOPSAN is a unique tool for fostering distributed collaborations and provides an efficient pathway to peer-reviewed publications. The NIH Protein Structure Initiative centers, such as the Joint Center for Structural Genomics (JCSG), have developed highly efficient technological platforms that are capable of experimentally determining the three-dimensional structures of hundreds of proteins per year. However, the overwhelming majority of the almost 5000 protein structures determined by these centers have yet to be described in the peer-reviewed literature. In a high-throughput structural genomics environment, the process of structure determination occurs independently of any associated experimental characterization of function, which creates a challenge for the annotation and analysis of structures and the publication of these results. This challenge has been addressed by developing TOPSAN (‘The Open Protein Structure Annotation Network’), which enables the generation of knowledge via collaborations among globally distributed contributors supported by automated amalgamation of available information. TOPSAN currently provides annotations for all protein structures determined by the JCSG in addition to preliminary annotations on a large number of structures from the other PSI production centers. TOPSAN-enabled collaborations have resulted in
Harnessing Collaborative Annotations on Online Formative Assessments

Science.gov (United States)

Lin, Jian-Wei; Lai, Yuan-Cheng

2013-01-01

This paper harnesses collaborative annotations by students as learning feedback on online formative assessments to improve the learning achievements of students. Through the developed Web platform, students can conduct formative assessments, collaboratively annotate, and review historical records in a convenient way, while teachers can generate…
Crowdsourcing and annotating NER for Twitter #drift

DEFF Research Database (Denmark)

Fromreide, Hege; Hovy, Dirk; Søgaard, Anders

2014-01-01

We present two new NER datasets for Twitter; a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a......) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets, (b) state-of-the-art performance across various datasets can beobtained from crowdsourced annotations, making it more feasible...
SNAD: sequence name annotation-based designer

Directory of Open Access Journals (Sweden)

Gorbalenya Alexander E

2009-08-01

Full Text Available Abstract Background A growing diversity of biological data is tagged with unique identifiers (UIDs associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually that may be a tedious exercise prone to mistakes and omissions. Results Here we introduce SNAD (Sequence Name Annotation-based Designer that mediates automatic conversion of sequence UIDs (associated with multiple alignment or phylogenetic tree, or supplied as plain text list into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between quality of sequence annotation, and efficiency of communication and knowledge dissemination among researchers.
The Plant Genome Integrative Explorer Resource: PlantGenIE.org.

Science.gov (United States)

Sundell, David; Mannapperuma, Chanaka; Netotea, Sergiu; Delhomme, Nicolas; Lin, Yao-Cheng; Sjödin, Andreas; Van de Peer, Yves; Jansson, Stefan; Hvidsten, Torgeir R; Street, Nathaniel R

2015-12-01

Accessing and exploring large-scale genomics data sets remains a significant challenge to researchers without specialist bioinformatics training. We present the integrated PlantGenIE.org platform for exploration of Populus, conifer and Arabidopsis genomics data, which includes expression networks and associated visualization tools. Standard features of a model organism database are provided, including genome browsers, gene list annotation, Blast homology searches and gene information pages. Community annotation updating is supported via integration of WebApollo. We have produced an RNA-sequencing (RNA-Seq) expression atlas for Populus tremula and have integrated these data within the expression tools. An updated version of the ComPlEx resource for performing comparative plant expression analyses of gene coexpression network conservation between species has also been integrated. The PlantGenIE.org platform provides intuitive access to large-scale and genome-wide genomics data from model forest tree species, facilitating both community contributions to annotation improvement and tools supporting use of the included data resources to inform biological insight. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
Creating Gaze Annotations in Head Mounted Displays

DEFF Research Database (Denmark)

Mardanbeigi, Diako; Qvarfordt, Pernilla

2015-01-01

To facilitate distributed communication in mobile settings, we developed GazeNote for creating and sharing gaze annotations in head mounted displays (HMDs). With gaze annotations it possible to point out objects of interest within an image and add a verbal description. To create an annota- tion...
DaMold: A data-mining platform for variant annotation and visualization in molecular diagnostics research.

Science.gov (United States)

Pandey, Ram Vinay; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas

2017-07-01

Next-generation sequencing (NGS) has become a powerful and efficient tool for routine mutation screening in clinical research. As each NGS test yields hundreds of variants, the current challenge is to meaningfully interpret the data and select potential candidates. Analyzing each variant while manually investigating several relevant databases to collect specific information is a cumbersome and time-consuming process, and it requires expertise and familiarity with these databases. Thus, a tool that can seamlessly annotate variants with clinically relevant databases under one common interface would be of great help for variant annotation, cross-referencing, and visualization. This tool would allow variants to be processed in an automated and high-throughput manner and facilitate the investigation of variants in several genome browsers. Several analysis tools are available for raw sequencing-read processing and variant identification, but an automated variant filtering, annotation, cross-referencing, and visualization tool is still lacking. To fulfill these requirements, we developed DaMold, a Web-based, user-friendly tool that can filter and annotate variants and can access and compile information from 37 resources. It is easy to use, provides flexible input options, and accepts variants from NGS and Sanger sequencing as well as hotspots in VCF and BED formats. DaMold is available as an online application at http://damold.platomics.com/index.html, and as a Docker container and virtual machine at https://sourceforge.net/projects/damold/. © 2017 Wiley Periodicals, Inc.
Ontological Annotation with WordNet

Energy Technology Data Exchange (ETDEWEB)

Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob; Hohimer, Ryan E.; White, Amanda M.

2006-06-06

Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.
Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

Energy Technology Data Exchange (ETDEWEB)

Kuo, Alan; Grigoriev, Igor

2009-04-17

Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~;;10k genes. ~;;12percent of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~;;9percent rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also,>90percent of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. Weconclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.
Developmental gene discovery in a hemimetabolous insect: de novo assembly and annotation of a transcriptome for the cricket Gryllus bimaculatus.

Directory of Open Access Journals (Sweden)

Victor Zeng

Full Text Available Most genomic resources available for insects represent the Holometabola, which are insects that undergo complete metamorphosis like beetles and flies. In contrast, the Hemimetabola (direct developing insects, representing the basal branches of the insect tree, have very few genomic resources. We have therefore created a large and publicly available transcriptome for the hemimetabolous insect Gryllus bimaculatus (cricket, a well-developed laboratory model organism whose potential for functional genetic experiments is currently limited by the absence of genomic resources. cDNA was prepared using mRNA obtained from adult ovaries containing all stages of oogenesis, and from embryo samples on each day of embryogenesis. Using 454 Titanium pyrosequencing, we sequenced over four million raw reads, and assembled them into 21,512 isotigs (predicted transcripts and 120,805 singletons with an average coverage per base pair of 51.3. We annotated the transcriptome manually for over 400 conserved genes involved in embryonic patterning, gametogenesis, and signaling pathways. BLAST comparison of the transcriptome against the NCBI non-redundant protein database (nr identified significant similarity to nr sequences for 55.5% of transcriptome sequences, and suggested that the transcriptome may contain 19,874 unique transcripts. For predicted transcripts without significant similarity to known sequences, we assessed their similarity to other orthopteran sequences, and determined that these transcripts contain recognizable protein domains, largely of unknown function. We created a searchable, web-based database to allow public access to all raw, assembled and annotated data. This database is to our knowledge the largest de novo assembled and annotated transcriptome resource available for any hemimetabolous insect. We therefore anticipate that these data will contribute significantly to more effective and higher-throughput deployment of molecular analysis tools in
Mining the literature: new methods to exploit keyword profiles

OpenAIRE

Andrade-Navarro, Miguel A

2012-01-01

Bibliographic records in the PubMed database of biomedical literature are annotated with Medical Subject Headings (MeSH) by curators, which summarize the content of the articles. Two recent publications explain how to generate profiles of MeSH terms for a set of bibliographic records and to use them to define any given concept by its associated literature. These concepts can then be related by their keyword profiles, and this can be used, for example, to detect new associations between genes ...
Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data

Directory of Open Access Journals (Sweden)

Merchant Sabeeha S

2011-07-01

Full Text Available Abstract Background Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. Description The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of
Integrating Enterprise Resource Planning (SAP) in the Accounting Curriculum: A Systematic Literature Review and Case Study

Science.gov (United States)

Blount, Yvette; Abedin, Babak; Vatanasakdakul, Savanid; Erfani, Seyedezahra

2016-01-01

This study investigates how an enterprise resource planning (ERP) software package SAP was integrated into the curriculum of an accounting information systems (AIS) course in an Australian university. Furthermore, the paper provides a systematic literature review of articles published between 1990 and 2013 to understand how ERP systems were…

Construction of coffee transcriptome networks based on gene annotation semantics

Directory of Open Access Journals (Sweden)

Castillo Luis F.

2012-12-01

Full Text Available Gene annotation is a process that encompasses multiple approaches on the analysis of nucleic acids or protein sequences in order to assign structural and functional characteristics to gene models. When thousands of gene models are being described in an organism genome, construction and visualization of gene networks impose novel challenges in the understanding of complex expression patterns and the generation of new knowledge in genomics research. In order to take advantage of accumulated text data after conventional gene sequence analysis, this work applied semantics in combination with visualization tools to build transcriptome networks from a set of coffee gene annotations. A set of selected coffee transcriptome sequences, chosen by the quality of the sequence comparison reported by Basic Local Alignment Search Tool (BLAST and Interproscan, were filtered out by coverage, identity, length of the query, and e-values. Meanwhile, term descriptors for molecular biology and biochemistry were obtained along the Wordnet dictionary in order to construct a Resource Description Framework (RDF using Ruby scripts and Methontology to find associations between concepts. Relationships between sequence annotations and semantic concepts were graphically represented through a total of 6845 oriented vectors, which were reduced to 745 non-redundant associations. A large gene network connecting transcripts by way of relational concepts was created where detailed connections remain to be validated for biological significance based on current biochemical and genetics frameworks. Besides reusing text information in the generation of gene connections and for data mining purposes, this tool development opens the possibility to visualize complex and abundant transcriptome data, and triggers the formulation of new hypotheses in metabolic pathways analysis.
Functional annotation of hierarchical modularity.

Directory of Open Access Journals (Sweden)

Kanchana Padmanabhan

Full Text Available In biological networks of molecular interactions in a cell, network motifs that are biologically relevant are also functionally coherent, or form functional modules. These functionally coherent modules combine in a hierarchical manner into larger, less cohesive subsystems, thus revealing one of the essential design principles of system-level cellular organization and function-hierarchical modularity. Arguably, hierarchical modularity has not been explicitly taken into consideration by most, if not all, functional annotation systems. As a result, the existing methods would often fail to assign a statistically significant functional coherence score to biologically relevant molecular machines. We developed a methodology for hierarchical functional annotation. Given the hierarchical taxonomy of functional concepts (e.g., Gene Ontology and the association of individual genes or proteins with these concepts (e.g., GO terms, our method will assign a Hierarchical Modularity Score (HMS to each node in the hierarchy of functional modules; the HMS score and its p-value measure functional coherence of each module in the hierarchy. While existing methods annotate each module with a set of "enriched" functional terms in a bag of genes, our complementary method provides the hierarchical functional annotation of the modules and their hierarchically organized components. A hierarchical organization of functional modules often comes as a bi-product of cluster analysis of gene expression data or protein interaction data. Otherwise, our method will automatically build such a hierarchy by directly incorporating the functional taxonomy information into the hierarchy search process and by allowing multi-functional genes to be part of more than one component in the hierarchy. In addition, its underlying HMS scoring metric ensures that functional specificity of the terms across different levels of the hierarchical taxonomy is properly treated. We have evaluated our
Fluid Annotations in a Open World

DEFF Research Database (Denmark)

Zellweger, Polle Trescott; Bouvin, Niels Olof; Jehøj, Henning

2001-01-01

Fluid Documents use animated typographical changes to provide a novel and appealing user experience for hypertext browsing and for viewing document annotations in context. This paper describes an effort to broaden the utility of Fluid Documents by using the open hypermedia Arakne Environment to l...... to layer fluid annotations and links on top of abitrary HTML pages on the World Wide Web. Changes to both Fluid Documents and Arakne are required....
Evacuation in emergencies: An annotated guide to research

Energy Technology Data Exchange (ETDEWEB)

Vogt, B.M.; Sorensen, J.H.

1987-02-01

The purpose of this literature review was to explore the relevant sources of knowledge regarding evacuation related issues among recent work published in the social sciences and emergency planning fields. In organizing the material, we looked primarily for articles that included either a theoretical or empirical basis for the findings. By empirical, we mean that the findings were based on data taken from actual research gained through surveys, questionnaires, interviews or a combination of these, and the use of secondary sources. The theoretical material consisted of work that built on past research or which explored the use of models. Some conceptual work, raising issues not covered by the general format, were included if they attempted to synthesize aspects of the literature now segmented or which asked additional questions about topics related to but not necessarily within the strict realm of research studies. The material was divided as to the emphasis placed on the individual or the organizational level of behavior. Empirically the individual, family or household response is easier to assess because of the bias that may intrude when individuals of organizations or agencies involved in the evacuation process are interviewed regarding their official status. The annotations of the literature as well as the specific key findings from each study, where appropriate, are organized by hazard type. An author index is provided.
Annotated Bibliography. Conference Materials: Strategy Conference on Education and the Economy (Washington, D.C., September 22-24, 1981).

Science.gov (United States)

Center for Law and Education, Boston, MA.

The 22 items included in a packet distributed to participants at a 1981 conference on education and the economy are listed in this annotated bibliography, with sources for the items identified. The materials include articles, chapters from books, and monographs, as well as a resource guide and bibliography concerning rural economic and community…
Black English Annotations for Elementary Reading Programs.

Science.gov (United States)

Prasad, Sandre

This report describes a program that uses annotations in the teacher's editions of existing reading programs to indicate the characteristics of black English that may interfere with the reading process of black children. The first part of the report provides a rationale for the annotation approach, explaining that the discrepancy between written…
Workflow and web application for annotating NCBI BioProject transcriptome data.

Science.gov (United States)

Vera Alvarez, Roberto; Medeiros Vidal, Newton; Garzón-Martínez, Gina A; Barrero, Luz S; Landsman, David; Mariño-Ramírez, Leonardo

2017-01-01

The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. URL: http://www.ncbi.nlm.nih.gov/projects/physalis/. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
Special Issue: Annotated Bibliography for Volumes XIX-XXXII.

Science.gov (United States)

Pullin, Richard A.

1998-01-01

This annotated bibliography lists 310 articles from the "Journal of Cooperative Education" from Volumes XIX-XXXII, 1983-1997. Annotations are presented in the order they appear in the journal; author and subject indexes are provided. (JOW)
Integrative specimen information service - a campus-wide resource for tissue banking, experimental data annotation, and analysis services.

Science.gov (United States)

Schadow, Gunther; Dhaval, Rakesh; McDonald, Clement J; Ragg, Susanne

2006-01-01

We present the architecture and approach of an evolving campus-wide information service for tissues with clinical and data annotations to be used and contributed to by clinical researchers across the campus. The services provided include specimen tracking, long term data storage, and computational analysis services. The project is conceived and sustained by collaboration among researchers on the campus as well as participation in standards organizations and national collaboratives.
Environmental effects of postfire logging: literature review and annotated bibliography.

Science.gov (United States)

James D. McIver; Lynn Starr

2000-01-01

The scientific literature on logging after wildfire is reviewed, with a focus on environmental effects of logging and removal of large woody structure. Rehabilitation, the practice of planting or seeding after logging, is not reviewed here. Several publications are cited that can be described as âcommentaries,â intended to help frame the public debate. We review 21...
Traditional African Religion: A Resource Unit.

Science.gov (United States)

Garland, William E.

This resource unit is based on research conducted by Lynn Mitchell and Ernest Valenzuela, experienced classroom teachers of African history and culture. The unit consists of an introduction by Mr. Garland and two major parts. Part I is an annotated bibliography of selected sources on various aspects of traditional African Religion useful in…
BAT: An open-source, web-based audio events annotation tool

OpenAIRE

Blai Meléndez-Catalan, Emilio Molina, Emilia Gómez

2017-01-01

In this paper we present BAT (BMAT Annotation Tool), an open-source, web-based tool for the manual annotation of events in audio recordings developed at BMAT (Barcelona Music and Audio Technologies). The main feature of the tool is that it provides an easy way to annotate the salience of simultaneous sound sources. Additionally, it allows to define multiple ontologies to adapt to multiple tasks and offers the possibility to cross-annotate audio data. Moreover, it is easy to install and deploy...
Annotating images by mining image search results.

Science.gov (United States)

Wang, Xin-Jing; Zhang, Lei; Li, Xirong; Ma, Wei-Ying

2008-11-01

Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search results. Some 2.4 million images with their surrounding text are collected from a few photo forums to support this approach. The entire process is formulated in a divide-and-conquer framework where a query keyword is provided along with the uncaptioned image to improve both the effectiveness and efficiency. This is helpful when the collected data set is not dense everywhere. In this sense, our approach contains three steps: 1) the search process to discover visually and semantically similar search results, 2) the mining process to identify salient terms from textual descriptions of the search results, and 3) the annotation rejection process to filter out noisy terms yielded by Step 2. To ensure real-time annotation, two key techniques are leveraged-one is to map the high-dimensional image visual features into hash codes, the other is to implement it as a distributed system, of which the search and mining processes are provided as Web services. As a typical result, the entire process finishes in less than 1 second. Since no training data set is required, our approach enables annotating with unlimited vocabulary and is highly scalable and robust to outliers. Experimental results on both real Web images and a benchmark image data set show the effectiveness and efficiency of the proposed algorithm. It is also worth noting that, although the entire approach is illustrated within the divide-and conquer framework, a query keyword is not crucial to our current implementation. We provide experimental results to prove this.
Collaborative Paper-Based Annotation of Lecture Slides

Science.gov (United States)

Steimle, Jurgen; Brdiczka, Oliver; Muhlhauser, Max

2009-01-01

In a study of notetaking in university courses, we found that the large majority of students prefer paper to computer-based media like Tablet PCs for taking notes and making annotations. Based on this finding, we developed CoScribe, a concept and system which supports students in making collaborative handwritten annotations on printed lecture…
Music journals in South Africa 1854-2010: an annotated bibliography

African Journals Online (AJOL)

Music journals in South Africa 1854-2010: an annotated bibliography. ... The article focuses on presenting an annotated bibliography of music journalism in South Africa from as early as 1854 until 2010. Most of ... Key words: annotated bibliography, electronic journals, music journals, periodicals, South African music history ...
Literature and information related to the natural resources of the North Aleutian Basin of Alaska.

Energy Technology Data Exchange (ETDEWEB)

Stull, E.A.; Hlohowskyj, I.; LaGory, K. E.; Environmental Science Division

2008-01-31

The North Aleutian Basin Planning Area of the Minerals Management Service (MMS) is a large geographic area with significant natural resources. The Basin includes most of the southeastern part of the Bering Sea Outer Continental Shelf, including all of Bristol Bay. The area supports important habitat for a wide variety of species and globally significant habitat for birds and marine mammals, including several federally listed species. Villages and communities of the Alaska Peninsula and other areas bordering or near the Basin rely on its natural resources (especially commercial and subsistence fishing) for much of their sustenance and livelihood. The offshore area of the North Aleutian Basin is considered to have important hydrocarbon reserves, especially natural gas. In 2006, the MMS released a draft proposed program, 'Outer Continental Shelf Oil and Gas Leasing Program, 2007-2012' and an accompanying draft programmatic environmental impact statement (EIS). The draft proposed program identified two lease sales proposed in the North Aleutian Basin in 2010 and 2012, subject to restrictions. The area proposed for leasing in the Basin was restricted to the Sale 92 Area in the southwestern portion. Additional EISs will be needed to evaluate the potential effects of specific lease actions, exploration activities, and development and production plans in the Basin. A full range of updated multidisciplinary scientific information will be needed to address oceanography, fate and effects of oil spills, marine ecosystems, fish, fisheries, birds, marine mammals, socioeconomics, and subsistence in the Basin. Scientific staff at Argonne National Laboratory were contracted to assist MMS with identifying and prioritizing information needs related to potential future oil and gas leasing and development activities in the North Aleutian Basin. Argonne focused on three related tasks: (1) identify and gather relevant literature published since 1996, (2) synthesize and
Plann: A command-line application for annotating plastome sequences1

Science.gov (United States)

Huang, Daisie I.; Cronk, Quentin C. B.

2015-01-01

Premise of the study: Plann automates the process of annotating a plastome sequence in GenBank format for either downstream processing or for GenBank submission by annotating a new plastome based on a similar, well-annotated plastome. Methods and Results: Plann is a Perl script to be executed on the command line. Plann compares a new plastome sequence to the features annotated in a reference plastome and then shifts the intervals of any matching features to the locations in the new plastome. Plann’s output can be used in the National Center for Biotechnology Information’s tbl2asn to create a Sequin file for GenBank submission. Conclusions: Unlike Web-based annotation packages, Plann is a locally executable script that will accurately annotate a plastome sequence to a locally specified reference plastome. Because it executes from the command line, it is ready to use in other software pipelines and can be easily rerun as a draft plastome is improved. PMID:26312193
Selected, annotated bibliography of studies relevant to the isolation of nuclear wastes

International Nuclear Information System (INIS)

Hyder, L.K.; Fore, C.S.; Vaughan, N.D.; Faust, R.A.

1980-09-01

This annotated bibliography of 705 references represents the first in a series to be published by the Ecological Sciences Information Center containing scientific, technical, economic, and regulatory information relevant to nuclear waste isolation. Most references discuss deep geologic disposal, with fewer studies of deep seabed disposal; space disposal is also included. The publication covers both domestic and foreign literature for the period 1954 to 1980. Major chapters selected are Chemical and Physical Aspects; Container Design and Performance; Disposal Site; Envirnmental Transport; General Studies and Reviews; Geology, Hydrology and Site Resources; Regulatory and Economic Aspects; Repository Design and Engineering; Transportation Technology; Waste Production; and Waste Treatment. Specialized data fields have been incorporated to improve the ease and accuracy of locating pertinent references. Specific radionuclides for which data are presented are listed in the Measured Radionuclides field, and specific parameters which affect the migration of these radionuclides are presented in the Measured Parameters field. The references within each chapter are arranged alphabetically by leading author, corporate affiliation, or title of the document. When the author is not given, the corporate affiliation appears first. If these two levels of authorship are not given, the title of the document is used as the identifying level. Indexes are provided for author(s), keywords, subject category, title, geographic location, measured parameters, measured radionuclides, and publication description
Genomic variant annotation workflow for clinical applications [version 2; referees: 2 approved

Directory of Open Access Journals (Sweden)

Thomas Thurnherr

2016-10-01

Full Text Available Annotation and interpretation of DNA aberrations identified through next-generation sequencing is becoming an increasingly important task. Even more so in the context of data analysis pipelines for medical applications, where genomic aberrations are associated with phenotypic and clinical features. Here we describe a workflow to identify potential gene targets in aberrated genes or pathways and their corresponding drugs. To this end, we provide the R/Bioconductor package rDGIdb, an R wrapper to query the drug-gene interaction database (DGIdb. DGIdb accumulates drug-gene interaction data from 15 different resources and allows filtering on different levels. The rDGIdb package makes these resources and tools available to R users. Moreover, rDGIdb queries can be automated through incorporation of the rDGIdb package into NGS sequencing pipelines.
Evaluation of web-based annotation of ophthalmic images for multicentric clinical trials.

Science.gov (United States)

Chalam, K V; Jain, P; Shah, V A; Shah, Gaurav Y

2006-06-01

An Internet browser-based annotation system can be used to identify and describe features in digitalized retinal images, in multicentric clinical trials, in real time. In this web-based annotation system, the user employs a mouse to draw and create annotations on a transparent layer, that encapsulates the observations and interpretations of a specific image. Multiple annotation layers may be overlaid on a single image. These layers may correspond to annotations by different users on the same image or annotations of a temporal sequence of images of a disease process, over a period of time. In addition, geometrical properties of annotated figures may be computed and measured. The annotations are stored in a central repository database on a server, which can be retrieved by multiple users in real time. This system facilitates objective evaluation of digital images and comparison of double-blind readings of digital photographs, with an identifiable audit trail. Annotation of ophthalmic images allowed clinically feasible and useful interpretation to track properties of an area of fundus pathology. This provided an objective method to monitor properties of pathologies over time, an essential component of multicentric clinical trials. The annotation system also allowed users to view stereoscopic images that are stereo pairs. This web-based annotation system is useful and valuable in monitoring patient care, in multicentric clinical trials, telemedicine, teaching and routine clinical settings.

e-MIR2: a public online inventory of medical informatics resources.

Science.gov (United States)

de la Calle, Guillermo; García-Remesal, Miguel; Nkumu-Mbomio, Nelida; Kulikowski, Casimir; Maojo, Victor

2012-08-02

Over the past years, the number of available informatics resources in medicine has grown exponentially. While specific inventories of such resources have already begun to be developed for Bioinformatics (BI), comparable inventories are as yet not available for the Medical Informatics (MI) field, so that locating and accessing them currently remains a difficult and time-consuming task. We have created a repository of MI resources from the scientific literature, providing free access to its contents through a web-based service. We define informatics resources as all those elements that constitute, serve to define or are used by informatics systems, ranging from architectures or development methodologies to terminologies, vocabularies, databases or tools. Relevant information describing the resources is automatically extracted from manuscripts published in top-ranked MI journals. We used a pattern matching approach to detect the resources' names and their main features. Detected resources are classified according to three different criteria: functionality, resource type and domain. To facilitate these tasks, we have built three different classification schemas by following a novel approach based on folksonomies and social tagging. We adopted the terminology most frequently used by MI researchers in their publications to create the concepts and hierarchical relationships belonging to the classification schemas. The classification algorithm identifies the categories associated with resources and annotates them accordingly. The database is then populated with this data after manual curation and validation. We have created an online repository of MI resources to assist researchers in locating and accessing the most suitable resources to perform specific tasks. The database contains 609 resources at the time of writing and is available at http://www.gib.fi.upm.es/eMIR2. We are continuing to expand the number of available resources by taking into account further
MitoLSDB: a comprehensive resource to study genotype to phenotype correlations in human mitochondrial DNA variations.

Directory of Open Access Journals (Sweden)

Shamnamole K

Full Text Available Human mitochondrial DNA (mtDNA encodes a set of 37 genes which are essential structural and functional components of the electron transport chain. Variations in these genes have been implicated in a broad spectrum of diseases and are extensively reported in literature and various databases. In this study, we describe MitoLSDB, an integrated platform to catalogue disease association studies on mtDNA (http://mitolsdb.igib.res.in. The main goal of MitoLSDB is to provide a central platform for direct submissions of novel variants that can be curated by the Mitochondrial Research Community. MitoLSDB provides access to standardized and annotated data from literature and databases encompassing information from 5231 individuals, 675 populations and 27 phenotypes. This platform is developed using the Leiden Open (source Variation Database (LOVD software. MitoLSDB houses information on all 37 genes in each population amounting to 132397 variants, 5147 unique variants. For each variant its genomic location as per the Revised Cambridge Reference Sequence, codon and amino acid change for variations in protein-coding regions, frequency, disease/phenotype, population, reference and remarks are also listed. MitoLSDB curators have also reported errors documented in literature which includes 94 phantom mutations, 10 NUMTs, six documentation errors and one artefactual recombination. MitoLSDB is the largest repository of mtDNA variants systematically standardized and presented using the LOVD platform. We believe that this is a good starting resource to curate mtDNA variants and will facilitate direct submissions enhancing data coverage, annotation in context of pathogenesis and quality control by ensuring non-redundancy in reporting novel disease associated variants.
Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature.

Science.gov (United States)

Müller, H-M; Van Auken, K M; Li, Y; Sternberg, P W

2018-03-09

The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and takes advantage of Lucene indexing and search technology for efficient handling of millions of full text documents. Like Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results may now be viewed as highlighted passages in the context of full text. To facilitate biocuration efforts, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send resulting annotations to external curation databases. As an example of such a curation form, we describe integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium. Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of literature by integrating keyword and category searches with viewing search results in the context of the full text. It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements
Annotating Logical Forms for EHR Questions.

Science.gov (United States)

Roberts, Kirk; Demner-Fushman, Dina

2016-05-01

This paper discusses the creation of a semantically annotated corpus of questions about patient data in electronic health records (EHRs). The goal is to provide the training data necessary for semantic parsers to automatically convert EHR questions into a structured query. A layered annotation strategy is used which mirrors a typical natural language processing (NLP) pipeline. First, questions are syntactically analyzed to identify multi-part questions. Second, medical concepts are recognized and normalized to a clinical ontology. Finally, logical forms are created using a lambda calculus representation. We use a corpus of 446 questions asking for patient-specific information. From these, 468 specific questions are found containing 259 unique medical concepts and requiring 53 unique predicates to represent the logical forms. We further present detailed characteristics of the corpus, including inter-annotator agreement results, and describe the challenges automatic NLP systems will face on this task.
Managing and Querying Image Annotation and Markup in XML.

Science.gov (United States)

Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

2010-01-01

Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.
Managing and Querying Image Annotation and Markup in XML

Science.gov (United States)

Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

2010-01-01

Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid. PMID:21218167
Expressed Peptide Tags: An additional layer of data for genome annotation

Energy Technology Data Exchange (ETDEWEB)

Savidor, Alon [ORNL; Donahoo, Ryan S [ORNL; Hurtado-Gonzales, Oscar [University of Tennessee, Knoxville (UTK); Verberkmoes, Nathan C [ORNL; Shah, Manesh B [ORNL; Lamour, Kurt H [ORNL; McDonald, W Hayes [ORNL

2006-01-01

While genome sequencing is becoming ever more routine, genome annotation remains a challenging process. Identification of the coding sequences within the genomic milieu presents a tremendous challenge, especially for eukaryotes with their complex gene architectures. Here we present a method to assist the annotation process through the use of proteomic data and bioinformatics. Mass spectra of digested protein preparations of the organism of interest were acquired and searched against a protein database created by a six frame translation of the genome. The identified peptides were mapped back to the genome, compared to the current annotation, and then categorized as supporting or extending the current genome annotation. We named the classified peptides Expressed Peptide Tags (EPTs). The well annotated bacterium Rhodopseudomonas palustris was used as a control for the method and showed high degree of correlation between EPT mapping and the current annotation, with 86% of the EPTs confirming existing gene calls and less than 1% of the EPTs expanding on the current annotation. The eukaryotic plant pathogens Phytophthora ramorum and Phytophthora sojae, whose genomes have been recently sequenced and are much less well annotated, were also subjected to this method. A series of algorithmic steps were taken to increase the confidence of EPT identification for these organisms, including generation of smaller sub-databases to be searched against, and definition of EPT criteria that accommodates the more complex eukaryotic gene architecture. As expected, the analysis of the Phytophthora species showed less correlation between EPT mapping and their current annotation. While ~77% of Phytophthora EPTs supported the current annotation, a portion of them (7.2% and 12.6% for P. ramorum and P. sojae, respectively) suggested modification to current gene calls or identified novel genes that were missed by the current genome annotation of these organisms.
Update of identification and estimation of socioeconomic impacts resulting from perceived risks and changing images: An annotated bibliography

International Nuclear Information System (INIS)

Nieves, L.A.; Clark, D.E.; Wernette, D.

1991-08-01

This annotated bibliography reviews selected literature published through August 1991 on the identification of perceived risks and methods for estimating the economic impacts of risk perception. It updates the literature review found in Argonne National Laboratory report ANL/EAIS/TM-24 (February 1990). Included in this update are (1) a literature review of the risk perception process, of the relationship between risk perception and economic impacts, of economic methods and empirical applications, and interregional market interactions and adjustments; (2) a working bibliography (that includes the documents abstracted in the 1990 report); (3) a topical index to the abstracts found in both reports; and (4) abstracts of selected articles found in this update
Update of identification and estimation of socioeconomic impacts resulting from perceived risks and changing images: An annotated bibliography

Energy Technology Data Exchange (ETDEWEB)

Nieves, L.A.; Clark, D.E.; Wernette, D.

1991-08-01

This annotated bibliography reviews selected literature published through August 1991 on the identification of perceived risks and methods for estimating the economic impacts of risk perception. It updates the literature review found in Argonne National Laboratory report ANL/EAIS/TM-24 (February 1990). Included in this update are (1) a literature review of the risk perception process, of the relationship between risk perception and economic impacts, of economic methods and empirical applications, and interregional market interactions and adjustments; (2) a working bibliography (that includes the documents abstracted in the 1990 report); (3) a topical index to the abstracts found in both reports; and (4) abstracts of selected articles found in this update.
Radioactive occurrences in veins and igneous and metamorphic rocks of New Mexico with annotated bibliography

International Nuclear Information System (INIS)

McLemore, V.T.

1982-02-01

The primary objectives of this report are to list known radioactive occurrences in veins and igneous and metamorphic rocks in New Mexico, and to provide an annotated bibliography of geologic reports concerning these regions. Only plutonic, metamorphic, vein, and Precambrian quartz-pebble conglomerate uranium deposits are considered in this report; other nonsandstone uranium deposits (such as shale, limestone, phosphorite, coal, evaporative precipitates, and fossil placer deposits) will be considered at a later time. These objectives were achieved through a literature search. Some field examinations of some of the radioactive occurrences have been completed. A table of known radioactive occurrences in veins and igneous and metamorphic rocks was compiled from the literature (Appendix I)
PedAM: a database for Pediatric Disease Annotation and Medicine.

Science.gov (United States)

Jia, Jinmeng; An, Zhongxin; Ming, Yue; Guo, Yongli; Li, Wei; Li, Xin; Liang, Yunxiang; Guo, Dongming; Tai, Jun; Chen, Geng; Jin, Yaqiong; Liu, Zhimei; Ni, Xin; Shi, Tieliu

2018-01-04

There is a significant number of children around the world suffering from the consequence of the misdiagnosis and ineffective treatment for various diseases. To facilitate the precision medicine in pediatrics, a database namely the Pediatric Disease Annotations & Medicines (PedAM) has been built to standardize and classify pediatric diseases. The PedAM integrates both biomedical resources and clinical data from Electronic Medical Records to support the development of computational tools, by which enables robust data analysis and integration. It also uses disease-manifestation (D-M) integrated from existing biomedical ontologies as prior knowledge to automatically recognize text-mined, D-M-specific syntactic patterns from 774 514 full-text articles and 8 848 796 abstracts in MEDLINE. Additionally, disease connections based on phenotypes or genes can be visualized on the web page of PedAM. Currently, the PedAM contains standardized 8528 pediatric disease terms (4542 unique disease concepts and 3986 synonyms) with eight annotation fields for each disease, including definition synonyms, gene, symptom, cross-reference (Xref), human phenotypes and its corresponding phenotypes in the mouse. The database PedAM is freely accessible at http://www.unimd.org/pedam/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Systematically profiling and annotating long intergenic non-coding RNAs in human embryonic stem cell.

Science.gov (United States)

Tang, Xing; Hou, Mei; Ding, Yang; Li, Zhaohui; Ren, Lichen; Gao, Ge

2013-01-01

While more and more long intergenic non-coding RNAs (lincRNAs) were identified to take important roles in both maintaining pluripotency and regulating differentiation, how these lincRNAs may define and drive cell fate decisions on a global scale are still mostly elusive. Systematical profiling and comprehensive annotation of embryonic stem cells lincRNAs may not only bring a clearer big picture of these novel regulators but also shed light on their functionalities. Based on multiple RNA-Seq datasets, we systematically identified 300 human embryonic stem cell lincRNAs (hES lincRNAs). Of which, one forth (78 out of 300) hES lincRNAs were further identified to be biasedly expressed in human ES cells. Functional analysis showed that they were preferentially involved in several early-development related biological processes. Comparative genomics analysis further suggested that around half of the identified hES lincRNAs were conserved in mouse. To facilitate further investigation of these hES lincRNAs, we constructed an online portal for biologists to access all their sequences and annotations interactively. In addition to navigation through a genome browse interface, users can also locate lincRNAs through an advanced query interface based on both keywords and expression profiles, and analyze results through multiple tools. By integrating multiple RNA-Seq datasets, we systematically characterized and annotated 300 hES lincRNAs. A full functional web portal is available freely at http://scbrowse.cbi.pku.edu.cn. As the first global profiling and annotating of human embryonic stem cell lincRNAs, this work aims to provide a valuable resource for both experimental biologists and bioinformaticians.
e-MIR2: a public online inventory of medical informatics resources

Directory of Open Access Journals (Sweden)

de la Calle Guillermo

2012-08-01

Full Text Available Abstract Background Over the past years, the number of available informatics resources in medicine has grown exponentially. While specific inventories of such resources have already begun to be developed for Bioinformatics (BI, comparable inventories are as yet not available for the Medical Informatics (MI field, so that locating and accessing them currently remains a difficult and time-consuming task. Description We have created a repository of MI resources from the scientific literature, providing free access to its contents through a web-based service. We define informatics resources as all those elements that constitute, serve to define or are used by informatics systems, ranging from architectures or development methodologies to terminologies, vocabularies, databases or tools. Relevant information describing the resources is automatically extracted from manuscripts published in top-ranked MI journals. We used a pattern matching approach to detect the resources’ names and their main features. Detected resources are classified according to three different criteria: functionality, resource type and domain. To facilitate these tasks, we have built three different classification schemas by following a novel approach based on folksonomies and social tagging. We adopted the terminology most frequently used by MI researchers in their publications to create the concepts and hierarchical relationships belonging to the classification schemas. The classification algorithm identifies the categories associated with resources and annotates them accordingly. The database is then populated with this data after manual curation and validation. Conclusions We have created an online repository of MI resources to assist researchers in locating and accessing the most suitable resources to perform specific tasks. The database contains 609 resources at the time of writing and is available at http://www.gib.fi.upm.es/eMIR2. We are continuing to expand the number
Fuzzy Emotional Semantic Analysis and Automated Annotation of Scene Images

Directory of Open Access Journals (Sweden)

Jianfang Cao

2015-01-01

Full Text Available With the advances in electronic and imaging techniques, the production of digital images has rapidly increased, and the extraction and automated annotation of emotional semantics implied by images have become issues that must be urgently addressed. To better simulate human subjectivity and ambiguity for understanding scene images, the current study proposes an emotional semantic annotation method for scene images based on fuzzy set theory. A fuzzy membership degree was calculated to describe the emotional degree of a scene image and was implemented using the Adaboost algorithm and a back-propagation (BP neural network. The automated annotation method was trained and tested using scene images from the SUN Database. The annotation results were then compared with those based on artificial annotation. Our method showed an annotation accuracy rate of 91.2% for basic emotional values and 82.4% after extended emotional values were added, which correspond to increases of 5.5% and 8.9%, respectively, compared with the results from using a single BP neural network algorithm. Furthermore, the retrieval accuracy rate based on our method reached approximately 89%. This study attempts to lay a solid foundation for the automated emotional semantic annotation of more types of images and therefore is of practical significance.
Multiview Hessian regularization for image annotation.

Science.gov (United States)

Liu, Weifeng; Tao, Dacheng

2013-07-01

The rapid development of computer hardware and Internet technology makes large scale data dependent models computationally tractable, and opens a bright avenue for annotating images through innovative machine learning algorithms. Semisupervised learning (SSL) therefore received intensive attention in recent years and was successfully deployed in image annotation. One representative work in SSL is Laplacian regularization (LR), which smoothes the conditional distribution for classification along the manifold encoded in the graph Laplacian, however, it is observed that LR biases the classification function toward a constant function that possibly results in poor generalization. In addition, LR is developed to handle uniformly distributed data (or single-view data), although instances or objects, such as images and videos, are usually represented by multiview features, such as color, shape, and texture. In this paper, we present multiview Hessian regularization (mHR) to address the above two problems in LR-based image annotation. In particular, mHR optimally combines multiple HR, each of which is obtained from a particular view of instances, and steers the classification function that varies linearly along the data manifold. We apply mHR to kernel least squares and support vector machines as two examples for image annotation. Extensive experiments on the PASCAL VOC'07 dataset validate the effectiveness of mHR by comparing it with baseline algorithms, including LR and HR.
Ten steps to get started in Genome Assembly and Annotation

Science.gov (United States)

Dominguez Del Angel, Victoria; Hjerde, Erik; Sterck, Lieven; Capella-Gutierrez, Salvadors; Notredame, Cederic; Vinnere Pettersson, Olga; Amselem, Joelle; Bouri, Laurent; Bocs, Stephanie; Klopp, Christophe; Gibrat, Jean-Francois; Vlasova, Anna; Leskosek, Brane L.; Soler, Lucile; Binzer-Panchal, Mahesh; Lantz, Henrik

2018-01-01

As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR). PMID:29568489
Sharing Map Annotations in Small Groups: X Marks the Spot

Science.gov (United States)

Congleton, Ben; Cerretani, Jacqueline; Newman, Mark W.; Ackerman, Mark S.

Advances in location-sensing technology, coupled with an increasingly pervasive wireless Internet, have made it possible (and increasingly easy) to access and share information with context of one’s geospatial location. We conducted a four-phase study, with 27 students, to explore the practices surrounding the creation, interpretation and sharing of map annotations in specific social contexts. We found that annotation authors consider multiple factors when deciding how to annotate maps, including the perceived utility to the audience and how their contributions will reflect on the image they project to others. Consumers of annotations value the novelty of information, but must be convinced of the author’s credibility. In this paper we describe our study, present the results, and discuss implications for the design of software for sharing map annotations.
Annotation-based feature extraction from sets of SBML models.

Science.gov (United States)

Alm, Rebekka; Waltemath, Dagmar; Wolfien, Markus; Wolkenhauer, Olaf; Henkel, Ron

2015-01-01

Model repositories such as BioModels Database provide computational models of biological systems for the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics improve model classification, allow to identify additional features for model retrieval tasks, and enable the comparison of sets of models. In this paper we discuss four methods for annotation-based feature extraction from model sets. We tested all methods on sets of models in SBML format which were composed from BioModels Database. To characterize each of these sets, we analyzed and extracted concepts from three frequently used ontologies, namely Gene Ontology, ChEBI and SBO. We find that three out of the methods are suitable to determine characteristic features for arbitrary sets of models: The selected features vary depending on the underlying model set, and they are also specific to the chosen model set. We show that the identified features map on concepts that are higher up in the hierarchy of the ontologies than the concepts used for model annotations. Our analysis also reveals that the information content of concepts in ontologies and their usage for model annotation do not correlate. Annotation-based feature extraction enables the comparison of model sets, as opposed to existing methods for model-to-keyword comparison, or model-to-model comparison.
An Annotated Bibliography of the Literature Dealing with Adolescent Suicide and What Educators Can Do To Help Reduce This Problem.

Science.gov (United States)

Van Dyke, Catherine A.

The purpose of this study was to provide an overview of the complex and growing problem of adolescent suicide. The annotated bibliography consists of 7 articles on recent facts and data, 10 articles on causes determined by research, 11 items on indicators seen in adolescents, and 14 documents on how educators can help. A lack of secure social…
RNA-seq analysis of Quercus pubescens Leaves: de novo transcriptome assembly, annotation and functional markers development.

Directory of Open Access Journals (Sweden)

Sara Torre

Full Text Available Quercus pubescens Willd., a species distributed from Spain to southwest Asia, ranks high for drought tolerance among European oaks. Q. pubescens performs a role of outstanding significance in most Mediterranean forest ecosystems, but few mechanistic studies have been conducted to explore its response to environmental constrains, due to the lack of genomic resources. In our study, we performed a deep transcriptomic sequencing in Q. pubescens leaves, including de novo assembly, functional annotation and the identification of new molecular markers. Our results are a pre-requisite for undertaking molecular functional studies, and may give support in population and association genetic studies. 254,265,700 clean reads were generated by the Illumina HiSeq 2000 platform, with an average length of 98 bp. De novo assembly, using CLC Genomics, produced 96,006 contigs, having a mean length of 618 bp. Sequence similarity analyses against seven public databases (Uniprot, NR, RefSeq and KOGs at NCBI, Pfam, InterPro and KEGG resulted in 83,065 transcripts annotated with gene descriptions, conserved protein domains, or gene ontology terms. These annotations and local BLAST allowed identify genes specifically associated with mechanisms of drought avoidance. Finally, 14,202 microsatellite markers and 18,425 single nucleotide polymorphisms (SNPs were, in silico, discovered in assembled and annotated sequences. We completed a successful global analysis of the Q. pubescens leaf transcriptome using RNA-seq. The assembled and annotated sequences together with newly discovered molecular markers provide genomic information for functional genomic studies in Q. pubescens, with special emphasis to response mechanisms to severe constrain of the Mediterranean climate. Our tools enable comparative genomics studies on other Quercus species taking advantage of large intra-specific ecophysiological differences.

Roadmap for annotating transposable elements in eukaryote genomes.

Science.gov (United States)

Permal, Emmanuelle; Flutre, Timothée; Quesneville, Hadi

2012-01-01

Current high-throughput techniques have made it feasible to sequence even the genomes of non-model organisms. However, the annotation process now represents a bottleneck to genome analysis, especially when dealing with transposable elements (TE). Combined approaches, using both de novo and knowledge-based methods to detect TEs, are likely to produce reasonably comprehensive and sensitive results. This chapter provides a roadmap for researchers involved in genome projects to address this issue. At each step of the TE annotation process, from the identification of TE families to the annotation of TE copies, we outline the tools and good practices to be used.
A method for automatically extracting infectious disease-related primers and probes from the literature

Directory of Open Access Journals (Sweden)

Pérez-Rey David

2010-08-01

Full Text Available Abstract Background Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to navigate this important information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1 convert each document into a tree of paper sections, (2 detect the candidate sequences using a set of finite state machine-based recognizers, (3 refine problem sequences using a rule-based expert system, and (4 annotate the extracted sequences with their related organism/gene information. Results We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. Conclusions We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch.
One hundred prime references on hydrogeochemical and stream sediment surveying for uranium as internationally practiced, including 60 annotated references

International Nuclear Information System (INIS)

Sharp, R.R. Jr.; Bolivar, S.L.

1981-04-01

The United States Department of Energy (DOE), formerly the US ERDA, has initiated a nationwide Hydrogeochemical and Stream Sediment Reconnaissance (HSSR). This program is part of the US National Uranium Resource Evaluation, designed to provide an improved estimate for the availability and economics of nuclear fuel resources and make available to industry information for use in exploration and development of uranium resources. The Los Alamos National Laboratory is responsible for completing the HSSR in Rocky Mountain states of New Mexico, Colorado, Wyoming, and Montana and in the state of Alaska. This report contains a compilation of 100 prime references on uranium hydrogeochemical and stream sediment reconnaissance as internationally practiced prior to 1977. The major emphasis in selection of these references was directed toward constructing a HSSR program with the purpose of identifying uranium in the Los Alamos National Laboratory area of responsibility. The context of the annotated abstracts are the authors' concept of what the respective article contains relative to uranium geochemistry and hydrogeochemical and stream sediment surveying. Consequently, in many cases, significant portions of the original articles are not discussed. The text consists of two parts. Part I contains 100 prime references, alphabetically arranged. Part II contains 60 select annotated abstracts, listed in chronological order
ExpTreeDB: web-based query and visualization of manually annotated gene expression profiling experiments of human and mouse from GEO.

Science.gov (United States)

Ni, Ming; Ye, Fuqiang; Zhu, Juanjuan; Li, Zongwei; Yang, Shuai; Yang, Bite; Han, Lu; Wu, Yongge; Chen, Ying; Li, Fei; Wang, Shengqi; Bo, Xiaochen

2014-12-01

Numerous public microarray datasets are valuable resources for the scientific communities. Several online tools have made great steps to use these data by querying related datasets with users' own gene signatures or expression profiles. However, dataset annotation and result exhibition still need to be improved. ExpTreeDB is a database that allows for queries on human and mouse microarray experiments from Gene Expression Omnibus with gene signatures or profiles. Compared with similar applications, ExpTreeDB pays more attention to dataset annotations and result visualization. We introduced a multiple-level annotation system to depict and organize original experiments. For example, a tamoxifen-treated cell line experiment is hierarchically annotated as 'agent→drug→estrogen receptor antagonist→tamoxifen'. Consequently, retrieved results are exhibited by an interactive tree-structured graphics, which provide an overview for related experiments and might enlighten users on key items of interest. The database is freely available at http://biotech.bmi.ac.cn/ExpTreeDB. Web site is implemented in Perl, PHP, R, MySQL and Apache. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
The Effects of Multimedia Annotations on Iranian EFL Learners’ L2 Vocabulary Learning

Directory of Open Access Journals (Sweden)

Saeideh Ahangari

2010-05-01

Full Text Available In our modern technological world, Computer-Assisted Language learning (CALL is a new realm towards learning a language in general, and learning L2 vocabulary in particular. It is assumed that the use of multimedia annotations promotes language learners’ vocabulary acquisition. Therefore, this study set out to investigate the effects of different multimedia annotations (still picture annotations, dynamic picture annotations, and written annotations on L2 vocabulary learning. To fulfill this objective, the researchers selected sixty four EFL learners as the participants of this study. The participants were randomly assigned to one of the four groups: a control group that received no annotations and three experimental groups that received: still picture annotations, dynamic picture annotations, and written annotations. Each participant was required to take a pre-test. A vocabulary post- test was also designed and administered to the participants in order to assess the efficacy of each annotation. First for each group a paired t-test was conducted between their pre and post test scores in order to observe their improvement; then through an ANCOVA test the performance of four groups was compared. The results showed that using multimedia annotations resulted in a significant difference in the participants’ vocabulary learning. Based on the results of the present study, multimedia annotations are suggested as a vocabulary teaching strategy.
Cross-Lingual Morphological Tagging for Low-Resource Languages

OpenAIRE

Buys, Jan; Botha, Jan A.

2016-01-01

Morphologically rich languages often lack the annotated linguistic resources required to develop accurate natural language processing tools. We propose models suitable for training morphological taggers with rich tagsets for low-resource languages without using direct supervision. Our approach extends existing approaches of projecting part-of-speech tags across languages, using bitext to infer constraints on the possible tags for a given word type or token. We propose a tagging model using Ws...
pGenN, a Gene Normalization Tool for Plant Genes and Proteins in Scientific Literature

Science.gov (United States)

Ding, Ruoyao; Arighi, Cecilia N.; Lee, Jung-Youn; Wu, Cathy H.; Vijay-Shanker, K.

2015-01-01

Background Automatically detecting gene/protein names in the literature and connecting them to databases records, also known as gene normalization, provides a means to structure the information buried in free-text literature. Gene normalization is critical for improving the coverage of annotation in the databases, and is an essential component of many text mining systems and database curation pipelines. Methods In this manuscript, we describe a gene normalization system specifically tailored for plant species, called pGenN (pivot-based Gene Normalization). The system consists of three steps: dictionary-based gene mention detection, species assignment, and intra species normalization. We have developed new heuristics to improve each of these phases. Results We evaluated the performance of pGenN on an in-house expertly annotated corpus consisting of 104 plant relevant abstracts. Our system achieved an F-value of 88.9% (Precision 90.9% and Recall 87.2%) on this corpus, outperforming state-of-art systems presented in BioCreative III. We have processed over 440,000 plant-related Medline abstracts using pGenN. The gene normalization results are stored in a local database for direct query from the pGenN web interface (proteininformationresource.org/pgenn/). The annotated literature corpus is also publicly available through the PIR text mining portal (proteininformationresource.org/iprolink/). PMID:26258475
Approaches to Working with Children, Young People and Families for Traveller, Irish Traveller, Gypsy, Roma and Show People Communities. Annotated Bibliography for the Children's Workforce Development Council

Science.gov (United States)

Robinson, Mark; Martin, Kerry; Wilkin, Carol

2008-01-01

This annoted bibliography relays a range of issues and approaches to working with Travellers, Irish Travellers, Gypsies, Roma and Show People. This is an accompanying document to the literature review report, ED501860.
Environmental aspects of the transuranics. A selected, annotated bibliography. Volume 9

International Nuclear Information System (INIS)

Ensminger, J.T.; Fore, C.S.; Dailey, N.S.

1978-09-01

This ninth published bibliography of 589 references is compiled from the Nevada Applied Ecology Information Center's Data Base on the Environmental Aspects of the Transuranics. The data base was built to provide information support to the Nevada Applied Ecology Group (NAEG) of DOE's Nevada Operations Office. The general scope covers environmental aspects of uranium and the transuranic elements, with emphasis on plutonium. This annotated bibliography highlights literature on plutonium 238 and 239 and americium 241 in the critical organs of man and animals. Studies on the migration of plutonium and the transplutonics through the environment are also emphasized. Supporting information on ecology of the Nevada Test Site and reviews and summarizing literature on other radionuclides have been included at the request of the NAEG. The references are arranged by subject category with leading authors appearing alphabetically within each category. Indexes are provided for author(s), geographic location, keywords, taxonomic name, title, and publication description
Recent revisions of phosphate rock reserves and resources: reassuring or misleading? An in-depth literature review of global estimates of phosphate rock reserves and resources

Science.gov (United States)

Edixhoven, J. D.; Gupta, J.; Savenije, H. H. G.

2013-09-01

Phosphate rock (PR) is a finite mineral indispensible for fertilizer production and a major pollutant. High grade PR is obtained from deposits which took millions of years to form and are gradually being depleted. Over the past three years, global PR reserves as reported by US Geological Survey (USGS) have seen a massive increase, from 16 000 Mt PR in 2010 to 65 000 Mt PR in 2011. The bulk of this four-fold increase is based on a 2010 report by International Fertilizer Development Center (IFDC), which increased Moroccan reserves from 5700 Mt PR as reported by USGS, to 51 000 Mt PR, reported as upgraded ("beneficiated") concentrate. IFDC used a starkly simplified classification compared to the classification used by USGS and proposed that agreement should be reached on PR resource terminology which should be as simple as possible. The report has profoundly influenced the PR scarcity debate, shifting the emphasis from depletion to the pollution angle of the phosphate problem. Various analysts adopted the findings of IFDC and USGS, and argued that that following depletion of reserves, uneconomic deposits (resources and occurrences) will remain available which will extend the lifetime of available deposits to thousands of years. Given the near total dependence of food production on PR, data on PR deposits must be transparent, comparable, reliable and credible. Based on an in-depth literature review, we analyze (i) how IFDC's simplified terminology compares to international best practice in resource classification and whether it is likely to yield data that meets the abovementioned requirements; (ii) whether the difference between ore reserves and reserves as concentrate is sufficiently noted in the literature, and (iii) whether the IFDC report and its estimate of PR reserves and resources is reliable. We conclude that, while there is a global development toward common criteria in resource reporting, IFDC's definitions contravene this development and - due to their
Evaluating Functional Annotations of Enzymes Using the Gene Ontology.

Science.gov (United States)

Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal; Babbitt, Patricia C

2017-01-01

The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. From identifying the nearest well annotated homologue of a protein of interest to predicting where misannotation has occurred to knowing how confident you can be in the annotations assigned to those proteins is critical. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene products, we focus here a describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.
AutoFACT: An Automatic Functional Annotation and Classification Tool

Directory of Open Access Journals (Sweden)

Lang B Franz

2005-06-01

Full Text Available Abstract Background Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1 analyzes nucleotide and protein sequence data; (2 determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3 assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4 generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1–2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at http://megasun.bch.umontreal.ca/Software/AutoFACT.htm.
MicroScope—an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data

Science.gov (United States)

Vallenet, David; Belda, Eugeni; Calteau, Alexandra; Cruveiller, Stéphane; Engelen, Stefan; Lajus, Aurélie; Le Fèvre, François; Longin, Cyrille; Mornico, Damien; Roche, David; Rouy, Zoé; Salvignol, Gregory; Scarpelli, Claude; Thil Smith, Adam Alexander; Weiman, Marion; Médigue, Claudine

2013-01-01

MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genomes annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest. PMID:23193269
Enhancing faba bean (Vicia faba L.) genome resources

NARCIS (Netherlands)

Cooper, James W.; Wilson, Michael H.; Derks, M.F.L.; Smit, Sandra; Kunert, Karl J.; Cullis, Christopher; Foyer, C.H.

2017-01-01

Grain legume improvement is currently impeded by a lack of genomic resources. The paucity of genome information for faba bean can be attributed to the intrinsic difficulties of assembling/annotating its giant (~13 Gb) genome. In order to address this challenge, RNA-sequencing analysis was performed
LeARN: a platform for detecting, clustering and annotating non-coding RNAs

Directory of Open Access Journals (Sweden)

Schiex Thomas

2008-01-01

Full Text Available Abstract Background In the last decade, sequencing projects have led to the development of a number of annotation systems dedicated to the structural and functional annotation of protein-coding genes. These annotation systems manage the annotation of the non-protein coding genes (ncRNAs in a very crude way, allowing neither the edition of the secondary structures nor the clustering of ncRNA genes into families which are crucial for appropriate annotation of these molecules. Results LeARN is a flexible software package which handles the complete process of ncRNA annotation by integrating the layers of automatic detection and human curation. Conclusion This software provides the infrastructure to deal properly with ncRNAs in the framework of any annotation project. It fills the gap between existing prediction software, that detect independent ncRNA occurrences, and public ncRNA repositories, that do not offer the flexibility and interactivity required for annotation projects. The software is freely available from the download section of the website http://bioinfo.genopole-toulouse.prd.fr/LeARN
MicroScope: a platform for microbial genome annotation and comparative genomics.

Science.gov (United States)

Vallenet, D; Engelen, S; Mornico, D; Cruveiller, S; Fleury, L; Lajus, A; Rouy, Z; Roche, D; Salvignol, G; Scarpelli, C; Médigue, C

2009-01-01

The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope's rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of
Fluid inclusions in salt: an annotated bibliography

International Nuclear Information System (INIS)

Isherwood, D.J.

1979-01-01

An annotated bibliography is presented which was compiled while searching the literature for information on fluid inclusions in salt for the Nuclear Regulatory Commission's study on the deep-geologic disposal of nuclear waste. The migration of fluid inclusions in a thermal gradient is a potential hazard to the safe disposal of nuclear waste in a salt repository. At the present time, a prediction as to whether this hazard precludes the use of salt for waste disposal can not be made. Limited data from the Salt-Vault in situ heater experiments in the early 1960's (Bradshaw and McClain, 1971) leave little doubt that fluid inclusions can migrate towards a heat source. In addition to the bibliography, there is a brief summary of the physical and chemical characteristics that together with the temperature of the waste will determine the chemical composition of the brine in contact with the waste canister, the rate of fluid migration, and the brine-canister-waste interactions
Procedures for the elicitation of expert judgements in the probabilistic risk analysis of the long-term effects of radioactive waste repositories: an annotated bibliography

International Nuclear Information System (INIS)

Watson, S.R.

1993-01-01

This annotated bibliography describes the key literature relevant to the elicitation of expert judgements in radioactive waste management. The bibliography is divided into seven sections; section 2 lists the literature exploring the proper interpretation of probabilities used in Probabilistic Risk Analysis (PRA). Section 3 lists literature describing other calculi for handling uncertainty in a numerical fashion. In section 4 comments are given on how to elicit probabilities from individuals as a measure of subjective degrees of belief and section 5 lists the literature concerning how expert judgements can be combined. Sections 6 and 7 list literature giving an overview of the issues involved in PRA for radioactive waste repositories. (author)
Decontamination of metals by melt refining/slagging. An annotated bibliography: Update on stainless steel and steel

Energy Technology Data Exchange (ETDEWEB)

Worchester, S.A.; Twidwell, L.G.; Paolini, D.J.; Weldon, T.A. [Montana Tech of the Univ., of Montana (United States); Mizia, R.E. [Lockheed Idaho Technologies Co., Idaho Falls, ID (United States)

1995-01-01

The following presentation is an update to a previous annotation, i.e., WINCO-1138. The literature search and annotated review covers all metals used in the nuclear industries but the emphasis of this update is directed toward work performed on mild steels. As the number of nuclear installations undergoing decontamination and decommissioning (D&D) increases, current radioactive waste storage space is consumed and establishment of new waste storage areas becomes increasingly difficult, the problem of handling and storing radioactive scrap metal (RSM) gains increasing importance in the DOE Environmental Restoration and Waste Management Program. To alleviate present and future waste problems, Lockheed Idaho Technologies Co (LITCO) is managing a program for the recycling of RSM for beneficial use within the DOE complex. As part of that effort, Montana Tech has been awarded a contract to help optimize melting and refining technology for the recycling of stainless steel RSM. The scope of the Montana Tech program includes a literature survey, a decontaminating slag design study, small wide melting studies to determine optimum slag compositions for removal of radioactive contaminant surrogates, analysis of preferred melting techniques, and coordination of large scale melting demonstrations (100--2,000 lbs) to be conducted at selected facilities. The program will support recycling and decontaminating stainless steel RSM for use in waste canisters for Idaho Waste Immobilization Facility densified high level waste and Pit 9/RWMC boxes. This report is the result of the literature search conducted to establish a basis for experimental melt/slag program development. The program plan will be jointly developed by Montana Tech and LITCO.
Automating Ontological Annotation with WordNet

Energy Technology Data Exchange (ETDEWEB)

Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob L.; Hohimer, Ryan E.; White, Amanda M.

2006-01-22

Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

ONEMercury: Towards Automatic Annotation of Earth Science Metadata

Science.gov (United States)

Tuarob, S.; Pouchard, L. C.; Noy, N.; Horsburgh, J. S.; Palanisamy, G.

2012-12-01

Earth sciences have become more data-intensive, requiring access to heterogeneous data collected from multiple places, times, and thematic scales. For example, research on climate change may involve exploring and analyzing observational data such as the migration of animals and temperature shifts across the earth, as well as various model-observation inter-comparison studies. Recently, DataONE, a federated data network built to facilitate access to and preservation of environmental and ecological data, has come to exist. ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for discovering and accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple data repositories and makes it searchable via a common search interface built upon cutting edge search engine technology, allowing users to interact with the system, intelligently filter the search results on the fly, and fetch the data from distributed data sources. Linking data from heterogeneous sources always has a cost. A problem that ONEMercury faces is the different levels of annotation in the harvested metadata records. Poorly annotated records tend to be missed during the search process as they lack meaningful keywords. Furthermore, such records would not be compatible with the advanced search functionality offered by ONEMercury as the interface requires a metadata record be semantically annotated. The explosion of the number of metadata records harvested from an increasing number of data repositories makes it impossible to annotate the harvested records manually, urging the need for a tool capable of automatically annotating poorly curated metadata records. In this paper, we propose a topic-model (TM) based approach for automatic metadata annotation. Our approach mines topics in the set of well annotated records and suggests keywords for poorly annotated records based on topic similarity. We utilize the
A Resource Guide for Teachers of Hindu Literature.

Science.gov (United States)

Meehan, John L., Ed.; And Others

Focusing on Hinduism, India, and Indian literature, this course guide offers help in organizing a course or units of instruction. The guide contains a historical outline of the evolution of Hinduism over a period of 3,000 years; a glossary listing terms in the guide and in Hindu literature; five courses of different time lengths (the literature of…
Annotated checklist and database for vascular plants of the Jemez Mountains

Energy Technology Data Exchange (ETDEWEB)

Foxx, T. S.; Pierce, L.; Tierney, G. D.; Hansen, L. A.

1998-03-01

Studies done in the last 40 years have provided information to construct a checklist of the Jemez Mountains. The present database and checklist builds on the basic list compiled by Teralene Foxx and Gail Tierney in the early 1980s. The checklist is annotated with taxonomic information, geographic and biological information, economic uses, wildlife cover, revegetation potential, and ethnographic uses. There are nearly 1000 species that have been noted for the Jemez Mountains. This list is cross-referenced with the US Department of Agriculture Natural Resource Conservation Service PLANTS database species names and acronyms. All information will soon be available on a Web Page.
RICD: A rice indica cDNA database resource for rice functional genomics

Directory of Open Access Journals (Sweden)

Zhang Qifa

2008-11-01

Full Text Available Abstract Background The Oryza sativa L. indica subspecies is the most widely cultivated rice. During the last few years, we have collected over 20,000 putative full-length cDNAs and over 40,000 ESTs isolated from various cDNA libraries of two indica varieties Guangluai 4 and Minghui 63. A database of the rice indica cDNAs was therefore built to provide a comprehensive web data source for searching and retrieving the indica cDNA clones. Results Rice Indica cDNA Database (RICD is an online MySQL-PHP driven database with a user-friendly web interface. It allows investigators to query the cDNA clones by keyword, genome position, nucleotide or protein sequence, and putative function. It also provides a series of information, including sequences, protein domain annotations, similarity search results, SNPs and InDels information, and hyperlinks to gene annotation in both The Rice Annotation Project Database (RAP-DB and The TIGR Rice Genome Annotation Resource, expression atlas in RiceGE and variation report in Gramene of each cDNA. Conclusion The online rice indica cDNA database provides cDNA resource with comprehensive information to researchers for functional analysis of indica subspecies and for comparative genomics. The RICD database is available through our website http://www.ncgr.ac.cn/ricd.
DOE Hydropower Program biennial report 1992--1993 (with an updated annotated bibliography)

Energy Technology Data Exchange (ETDEWEB)

Cada, G.F.; Sale, M.J. [Oak Ridge National Lab., TN (United States); Francfort, J.E.; Rinehart, B.N.; Sommers, G.L. [EG and G Idaho, Inc., Idaho Falls, ID (United States)

1993-07-01

This report, the latest in a series of annual/biennial Hydropower Program reports sponsored by the US Department of Energy, summarizes the research and development and technology transfer activities of fiscal years 1992 and 1993. The report discusses the activities in the four areas of the hydropower program: Environmental research; resource assessment; research coat shared with industry; and technology transfer. The report also offers an annotated bibliography of reports pertinent to hydropower, written by persons in Federal and state agencies, cities, metropolitan water districts, irrigation companies, and public and independent utilities. Most reports are available from the National Technical Information Service.
Cancer Genome Interpreter annotates the biological and clinical relevance of tumor alterations.

Science.gov (United States)

Tamborero, David; Rubio-Perez, Carlota; Deu-Pons, Jordi; Schroeder, Michael P; Vivancos, Ana; Rovira, Ana; Tusquets, Ignasi; Albanell, Joan; Rodon, Jordi; Tabernero, Josep; de Torres, Carmen; Dienstmann, Rodrigo; Gonzalez-Perez, Abel; Lopez-Bigas, Nuria

2018-03-28

While tumor genome sequencing has become widely available in clinical and research settings, the interpretation of tumor somatic variants remains an important bottleneck. Here we present the Cancer Genome Interpreter, a versatile platform that automates the interpretation of newly sequenced cancer genomes, annotating the potential of alterations detected in tumors to act as drivers and their possible effect on treatment response. The results are organized in different levels of evidence according to current knowledge, which we envision can support a broad range of oncology use cases. The resource is publicly available at http://www.cancergenomeinterpreter.org .
Prepare-Participate-Connect: Active Learning with Video Annotation

Science.gov (United States)

Colasante, Meg; Douglas, Kathy

2016-01-01

Annotation of video provides students with the opportunity to view and engage with audiovisual content in an interactive and participatory way rather than in passive-receptive mode. This article discusses research into the use of video annotation in four vocational programs at RMIT University in Melbourne, which allowed students to interact with…
Developing Annotation Solutions for Online Data Driven Learning

Science.gov (United States)

Perez-Paredes, Pascual; Alcaraz-Calero, Jose M.

2009-01-01

Although "annotation" is a widely-researched topic in Corpus Linguistics (CL), its potential role in Data Driven Learning (DDL) has not been addressed in depth by Foreign Language Teaching (FLT) practitioners. Furthermore, most of the research in the use of DDL methods pays little attention to annotation in the design and implementation…
Effects of Reviewing Annotations and Homework Solutions on Math Learning Achievement

Science.gov (United States)

Hwang, Wu-Yuin; Chen, Nian-Shing; Shadiev, Rustam; Li, Jin-Sing

2011-01-01

Previous studies have demonstrated that making annotations can be a meaningful and useful learning method that promote metacognition and enhance learning achievement. A web-based annotation system, Virtual Pen (VPEN), which provides for the creation and review of annotations and homework solutions, has been developed to foster learning process…
A Set of Annotation Interfaces for Alignment of Parallel Corpora

Directory of Open Access Journals (Sweden)

Singh Anil Kumar

2014-09-01

Full Text Available Annotation interfaces for parallel corpora which fit in well with other tools can be very useful. We describe a set of annotation interfaces which fulfill this criterion. This set includes a sentence alignment interface, two different word or word group alignment interfaces and an initial version of a parallel syntactic annotation alignment interface. These tools can be used for manual alignment, or they can be used to correct automatic alignments. Manual alignment can be performed in combination with certain kinds of linguistic annotation. Most of these interfaces use a representation called the Shakti Standard Format that has been found to be very robust and has been used for large and successful projects. It ties together the different interfaces, so that the data created by them is portable across all tools which support this representation. The existence of a query language for data stored in this representation makes it possible to build tools that allow easy search and modification of annotated parallel data.
LocusTrack: Integrated visualization of GWAS results and genomic annotation.

Science.gov (United States)

Cuellar-Partida, Gabriel; Renteria, Miguel E; MacGregor, Stuart

2015-01-01

Genome-wide association studies (GWAS) are an important tool for the mapping of complex traits and diseases. Visual inspection of genomic annotations may be used to generate insights into the biological mechanisms underlying GWAS-identified loci. We developed LocusTrack, a web-based application that annotates and creates plots of regional GWAS results and incorporates user-specified tracks that display annotations such as linkage disequilibrium (LD), phylogenetic conservation, chromatin state, and other genomic and regulatory elements. Currently, LocusTrack can integrate annotation tracks from the UCSC genome-browser as well as from any tracks provided by the user. LocusTrack is an easy-to-use application and can be accessed at the following URL: http://gump.qimr.edu.au/general/gabrieC/LocusTrack/. Users can upload and manage GWAS results and select from and/or provide annotation tracks using simple and intuitive menus. LocusTrack scripts and associated data can be downloaded from the website and run locally.
"Annotated Lectures": Student-Instructor Interaction in Large-Scale Global Education

Directory of Open Access Journals (Sweden)

Roger Diehl

2009-10-01

Full Text Available We describe an "Annotated Lectures" system, which will be used in a global virtual teaching and student collaboration event on embodied intelligence presented by the University of Zurich. The lectures will be broadcasted via video-conference to lecture halls of different universities around the globe. Among other collaboration features, an "Annotated Lectures" system will be implemented in a 3D collaborative virtual environment and used by the participating students to make annotations to the video-recorded lectures, which will be sent to and answered by their supervisors, and forwarded to the lecturers in an aggregated way. The "Annotated Lectures" system aims to overcome the issues of limited studentinstructor interaction in large-scale education, and to foster an intercultural and multidisciplinary discourse among students who review the lectures in a group. After presenting the concept of the "Annotated Lectures" system, we discuss a prototype version including a description of the technical components and its expected benefit for large-scale global education.
Semantic mashups intelligent reuse of web resources

CERN Document Server

Endres-Niggemeyer, Brigitte

2013-01-01

Mashups are mostly lightweight Web applications that offer new functionalities by combining, aggregating and transforming resources and services available on the Web. Popular examples include a map in their main offer, for instance for real estate, hotel recommendations, or navigation tools. Mashups may contain and mix client-side and server-side activity. Obviously, understanding the incoming resources (services, statistical figures, text, videos, etc.) is a precondition for optimally combining them, so that there is always some undercover semantics being used. By using semantic annotations
Genome-wide profiling of 24 hr diel rhythmicity in the water flea, Daphnia pulex: network analysis reveals rhythmic gene expression and enhances functional gene annotation.

Science.gov (United States)

Rund, Samuel S C; Yoo, Boyoung; Alam, Camille; Green, Taryn; Stephens, Melissa T; Zeng, Erliang; George, Gary F; Sheppard, Aaron D; Duffield, Giles E; Milenković, Tijana; Pfrender, Michael E

2016-08-18

Marine and freshwater zooplankton exhibit daily rhythmic patterns of behavior and physiology which may be regulated directly by the light:dark (LD) cycle and/or a molecular circadian clock. One of the best-studied zooplankton taxa, the freshwater crustacean Daphnia, has a 24 h diel vertical migration (DVM) behavior whereby the organism travels up and down through the water column daily. DVM plays a critical role in resource tracking and the behavioral avoidance of predators and damaging ultraviolet radiation. However, there is little information at the transcriptional level linking the expression patterns of genes to the rhythmic physiology/behavior of Daphnia. Here we analyzed genome-wide temporal transcriptional patterns from Daphnia pulex collected over a 44 h time period under a 12:12 LD cycle (diel) conditions using a cosine-fitting algorithm. We used a comprehensive network modeling and analysis approach to identify novel co-regulated rhythmic genes that have similar network topological properties and functional annotations as rhythmic genes identified by the cosine-fitting analyses. Furthermore, we used the network approach to predict with high accuracy novel gene-function associations, thus enhancing current functional annotations available for genes in this ecologically relevant model species. Our results reveal that genes in many functional groupings exhibit 24 h rhythms in their expression patterns under diel conditions. We highlight the rhythmic expression of immunity, oxidative detoxification, and sensory process genes. We discuss differences in the chronobiology of D. pulex from other well-characterized terrestrial arthropods. This research adds to a growing body of literature suggesting the genetic mechanisms governing rhythmicity in crustaceans may be divergent from other arthropod lineages including insects. Lastly, these results highlight the power of using a network analysis approach to identify differential gene expression and provide novel
Automatic Function Annotations for Hoare Logic

Directory of Open Access Journals (Sweden)

Daniel Matichuk

2012-11-01

Full Text Available In systems verification we are often concerned with multiple, inter-dependent properties that a program must satisfy. To prove that a program satisfies a given property, the correctness of intermediate states of the program must be characterized. However, this intermediate reasoning is not always phrased such that it can be easily re-used in the proofs of subsequent properties. We introduce a function annotation logic that extends Hoare logic in two important ways: (1 when proving that a function satisfies a Hoare triple, intermediate reasoning is automatically stored as function annotations, and (2 these function annotations can be exploited in future Hoare logic proofs. This reduces duplication of reasoning between the proofs of different properties, whilst serving as a drop-in replacement for traditional Hoare logic to avoid the costly process of proof refactoring. We explain how this was implemented in Isabelle/HOL and applied to an experimental branch of the seL4 microkernel to significantly reduce the size and complexity of existing proofs.
Automatically Annotated Mapping for Indoor Mobile Robot Applications

DEFF Research Database (Denmark)

Özkil, Ali Gürcan; Howard, Thomas J.

2012-01-01

This paper presents a new and practical method for mapping and annotating indoor environments for mobile robot use. The method makes use of 2D occupancy grid maps for metric representation, and topology maps to indicate the connectivity of the ‘places-of-interests’ in the environment. Novel use...... localization and mapping in topology space, and fuses camera and robot pose estimations to build an automatically annotated global topo-metric map. It is developed as a framework for a hospital service robot and tested in a real hospital. Experiments show that the method is capable of producing globally...... consistent, automatically annotated hybrid metric-topological maps that is needed by mobile service robots....
AGORA : Organellar genome annotation from the amino acid and nucleotide references.

Science.gov (United States)

Jung, Jaehee; Kim, Jong Im; Jeong, Young-Sik; Yi, Gangman

2018-03-29

Next-generation sequencing (NGS) technologies have led to the accumulation of highthroughput sequence data from various organisms in biology. To apply gene annotation of organellar genomes for various organisms, more optimized tools for functional gene annotation are required. Almost all gene annotation tools are mainly focused on the chloroplast genome of land plants or the mitochondrial genome of animals.We have developed a web application AGORA for the fast, user-friendly, and improved annotations of organellar genomes. AGORA annotates genes based on a BLAST-based homology search and clustering with selected reference sequences from the NCBI database or user-defined uploaded data. AGORA can annotate the functional genes in almost all mitochondrion and plastid genomes of eukaryotes. The gene annotation of a genome with an exon-intron structure within a gene or inverted repeat region is also available. It provides information of start and end positions of each gene, BLAST results compared with the reference sequence, and visualization of gene map by OGDRAW. Users can freely use the software, and the accessible URL is https://bigdata.dongguk.edu/gene_project/AGORA/.The main module of the tool is implemented by the python and php, and the web page is built by the HTML and CSS to support all browsers. gangman@dongguk.edu.
Mining the literature: new methods to exploit keyword profiles.

Science.gov (United States)

Andrade-Navarro, Miguel A

2012-01-01

Bibliographic records in the PubMed database of biomedical literature are annotated with Medical Subject Headings (MeSH) by curators, which summarize the content of the articles. Two recent publications explain how to generate profiles of MeSH terms for a set of bibliographic records and to use them to define any given concept by its associated literature. These concepts can then be related by their keyword profiles, and this can be used, for example, to detect new associations between genes and inherited diseases. http://www.biomedcentral.com/1471-2105/13/249/abstracthttp://genomemedicine.com/content/4/9/75/abstract.
Annotated Tsunami bibliography: 1962-1976

International Nuclear Information System (INIS)

Pararas-Carayannis, G.; Dong, B.; Farmer, R.

1982-08-01

This compilation contains annotated citations to nearly 3000 tsunami-related publications from 1962 to 1976 in English and several other languages. The foreign-language citations have English titles and abstracts
Methods for eliciting, annotating, and analyzing databases for child speech development.

Science.gov (United States)

Beckman, Mary E; Plummer, Andrew R; Munson, Benjamin; Reidy, Patrick F

2017-09-01

Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver-infant interaction, enabling advances in speech science that were unimaginable just a few decades ago. This paper centers on two main problems that must be addressed in order to have analogous resources for developing and exploiting databases of young children's speech. The first problem is to understand and appreciate the differences between adult and child speech that cause ASR models developed for adult speech to fail when applied to child speech. These differences include the fact that children's vocal tracts are smaller than those of adult males and also changing rapidly in size and shape over the course of development, leading to between-talker variability across age groups that dwarfs the between-talker differences between adult men and women. Moreover, children do not achieve fully adult-like speech motor control until they are young adults, and their vocabularies and phonological proficiency are developing as well, leading to considerably more within-talker variability as well as more between-talker variability. The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability. Indeed, standard acoustic characterizations applied to child speech reveal that adult-centered annotation schemas fail to capture phenomena such as the emergence of covert contrasts in children's developing phonological systems, while also revealing children's nonuniform progression toward community speech norms as they acquire the phonological systems of their native languages. Both problems point to the need for more basic research into the growth and development of the articulatory system (as well as of the lexicon and phonological system) that is oriented explicitly toward the construction of

Annotation-Based Whole Genomic Prediction and Selection

DEFF Research Database (Denmark)

Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc

Genomic selection is widely used in both animal and plant species, however, it is performed with no input from known genomic or biological role of genetic variants and therefore is a black box approach in a genomic era. This study investigated the role of different genomic regions and detected QTLs...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... classes. Predictive accuracy was 0.531, 0.532, 0.302, and 0.344 for DFI, RFI, ADG and BF, respectively. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP...
Vesiclepedia: a compendium for extracellular vesicles with continuous community annotation.

Directory of Open Access Journals (Sweden)

Hina Kalra

Full Text Available Extracellular vesicles (EVs are membraneous vesicles released by a variety of cells into their microenvironment. Recent studies have elucidated the role of EVs in intercellular communication, pathogenesis, drug, vaccine and gene-vector delivery, and as possible reservoirs of biomarkers. These findings have generated immense interest, along with an exponential increase in molecular data pertaining to EVs. Here, we describe Vesiclepedia, a manually curated compendium of molecular data (lipid, RNA, and protein identified in different classes of EVs from more than 300 independent studies published over the past several years. Even though databases are indispensable resources for the scientific community, recent studies have shown that more than 50% of the databases are not regularly updated. In addition, more than 20% of the database links are inactive. To prevent such database and link decay, we have initiated a continuous community annotation project with the active involvement of EV researchers. The EV research community can set a gold standard in data sharing with Vesiclepedia, which could evolve as a primary resource for the field.
Microbial Genome Analysis and Comparisons: Web-based Protocols and Resources

Science.gov (United States)

Fully annotated genome sequences of many microorganisms are publicly available as a resource. However, in-depth analysis of these genomes using specialized tools is required to derive meaningful information. We describe here the utility of three powerful publicly available genome databases and ana...
Creating New Medical Ontologies for Image Annotation A Case Study

CERN Document Server

Stanescu, Liana; Brezovan, Marius; Mihai, Cristian Gabriel

2012-01-01

Creating New Medical Ontologies for Image Annotation focuses on the problem of the medical images automatic annotation process, which is solved in an original manner by the authors. All the steps of this process are described in detail with algorithms, experiments and results. The original algorithms proposed by authors are compared with other efficient similar algorithms. In addition, the authors treat the problem of creating ontologies in an automatic way, starting from Medical Subject Headings (MESH). They have presented some efficient and relevant annotation models and also the basics of the annotation model used by the proposed system: Cross Media Relevance Models. Based on a text query the system will retrieve the images that contain objects described by the keywords.
Annotated checklist of Albanian butterflies (Lepidoptera, Papilionoidea and Hesperioidea

Directory of Open Access Journals (Sweden)

Rudi Verovnik

2013-08-01

Full Text Available The Republic of Albania has a rich diversity of flora and fauna. However, due to its political isolation, it has never been studied in great depth, and consequently, the existing list of butterfly species is outdated and in need of radical amendment. In addition to our personal data, we have studied the available literature, and can report a total of 196 butterfly species recorded from the country. For some of the species in the list we have given explanations for their inclusion and made other annotations. Doubtful records have been removed from the list, and changes in taxonomy have been updated and discussed separately. The purpose of our paper is to remove confusion and conflict regarding published records. However, the revised checklist should not be considered complete: it represents a starting point for further research.
Intra-species sequence comparisons for annotating genomes

Energy Technology Data Exchange (ETDEWEB)

Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

2004-07-15

Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study has been submitted to GenBank under accession nos. AY667278-AY667407.
Experimental-confirmation and functional-annotation of predicted proteins in the chicken genome

Directory of Open Access Journals (Sweden)

McCarthy Fiona M

2007-11-01

Full Text Available Abstract Background The chicken genome was sequenced because of its phylogenetic position as a non-mammalian vertebrate, its use as a biomedical model especially to study embryology and development, its role as a source of human disease organisms and its importance as the major source of animal derived food protein. However, genomic sequence data is, in itself, of limited value; generally it is not equivalent to understanding biological function. The benefit of having a genome sequence is that it provides a basis for functional genomics. However, the sequence data currently available is poorly structurally and functionally annotated and many genes do not have standard nomenclature assigned. Results We analysed eight chicken tissues and improved the chicken genome structural annotation by providing experimental support for the in vivo expression of 7,809 computationally predicted proteins, including 30 chicken proteins that were only electronically predicted or hypothetical translations in human. To improve functional annotation (based on Gene Ontology, we mapped these identified proteins to their human and mouse orthologs and used this orthology to transfer Gene Ontology (GO functional annotations to the chicken proteins. The 8,213 orthology-based GO annotations that we produced represent an 8% increase in currently available chicken GO annotations. Orthologous chicken products were also assigned standardized nomenclature based on current chicken nomenclature guidelines. Conclusion We demonstrate the utility of high-throughput expression proteomics for rapid experimental structural annotation of a newly sequenced eukaryote genome. These experimentally-supported predicted proteins were further annotated by assigning the proteins with standardized nomenclature and functional annotation. This method is widely applicable to a diverse range of species. Moreover, information from one genome can be used to improve the annotation of other genomes and
Automatically annotating topics in transcripts of patient-provider interactions via machine learning.

Science.gov (United States)

Wallace, Byron C; Laws, M Barton; Small, Kevin; Wilson, Ira B; Trikalinos, Thomas A

2014-05-01

Annotated patient-provider encounters can provide important insights into clinical communication, ultimately suggesting how it might be improved to effect better health outcomes. But annotating outpatient transcripts with Roter or General Medical Interaction Analysis System (GMIAS) codes is expensive, limiting the scope of such analyses. We propose automatically annotating transcripts of patient-provider interactions with topic codes via machine learning. We use a conditional random field (CRF) to model utterance topic probabilities. The model accounts for the sequential structure of conversations and the words comprising utterances. We assess predictive performance via 10-fold cross-validation over GMIAS-annotated transcripts of 360 outpatient visits (>230,000 utterances). We then use automated in place of manual annotations to reproduce an analysis of 116 additional visits from a randomized trial that used GMIAS to assess the efficacy of an intervention aimed at improving communication around antiretroviral (ARV) adherence. With respect to 6 topic codes, the CRF achieved a mean pairwise kappa compared with human annotators of 0.49 (range: 0.47-0.53) and a mean overall accuracy of 0.64 (range: 0.62-0.66). With respect to the RCT reanalysis, results using automated annotations agreed with those obtained using manual ones. According to the manual annotations, the median number of ARV-related utterances without and with the intervention was 49.5 versus 76, respectively (paired sign test P = 0.07). When automated annotations were used, the respective numbers were 39 versus 55 (P = 0.04). While moderately accurate, the predicted annotations are far from perfect. Conversational topics are intermediate outcomes, and their utility is still being researched. This foray into automated topic inference suggests that machine learning methods can classify utterances comprising patient-provider interactions into clinically relevant topics with reasonable accuracy.
Selected, annotated bibliography of studies relevant to the isolation of nuclear wastes. [705 references

Energy Technology Data Exchange (ETDEWEB)

Hyder, L.K.; Fore, C.S.; Vaughan, N.D.; Faust, R.A.

1980-09-01

This annotated bibliography of 705 references represents the first in a series to be published by the Ecological Sciences Information Center containing scientific, technical, economic, and regulatory information relevant to nuclear waste isolation. Most references discuss deep geologic disposal, with fewer studies of deep seabed disposal; space disposal is also included. The publication covers both domestic and foreign literature for the period 1954 to 1980. Major chapters selected are Chemical and Physical Aspects; Container Design and Performance; Disposal Site; Envirnmental Transport; General Studies and Reviews; Geology, Hydrology and Site Resources; Regulatory and Economic Aspects; Repository Design and Engineering; Transportation Technology; Waste Production; and Waste Treatment. Specialized data fields have been incorporated to improve the ease and accuracy of locating pertinent references. Specific radionuclides for which data are presented are listed in the Measured Radionuclides field, and specific parameters which affect the migration of these radionuclides are presented in the Measured Parameters field. The references within each chapter are arranged alphabetically by leading author, corporate affiliation, or title of the document. When the author is not given, the corporate affiliation appears first. If these two levels of authorship are not given, the title of the document is used as the identifying level. Indexes are provided for author(s), keywords, subject category, title, geographic location, measured parameters, measured radionuclides, and publication description.
Annotating risk factors for heart disease in clinical narratives for diabetic patients.

Science.gov (United States)

Stubbs, Amber; Uzuner, Özlem

2015-12-01

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on identifying risk factors for heart disease (specifically, Cardiac Artery Disease) in clinical narratives. For this track, we used a "light" annotation paradigm to annotate a set of 1304 longitudinal medical records describing 296 patients for risk factors and the times they were present. We designed the annotation task for this track with the goal of balancing annotation load and time with quality, so as to generate a gold standard corpus that can benefit a clinically-relevant task. We applied light annotation procedures and determined the gold standard using majority voting. On average, the agreement of annotators with the gold standard was above 0.95, indicating high reliability. The resulting document-level annotations generated for each record in each longitudinal EMR in this corpus provide information that can support studies of progression of heart disease risk factors in the included patients over time. These annotations were used in the Risk Factor track of the 2014 i2b2/UTHealth shared task. Participating systems achieved a mean micro-averaged F1 measure of 0.815 and a maximum F1 measure of 0.928 for identifying these risk factors in patient records. Copyright © 2015 Elsevier Inc. All rights reserved.
Transcriptome survey of Patagonian southern beech Nothofagus nervosa (= N. Alpina: assembly, annotation and molecular marker discovery

Directory of Open Access Journals (Sweden)

Torales Susana L

2012-07-01

Full Text Available Abstract Background Nothofagus nervosa is one of the most emblematic native tree species of Patagonian temperate forests. Here, the shotgun RNA-sequencing (RNA-Seq of the transcriptome of N. nervosa, including de novo assembly, functional annotation, and in silico discovery of potential molecular markers to support population and associations genetic studies, are described. Results Pyrosequencing of a young leaf cDNA library generated a total of 111,814 high quality reads, with an average length of 447 bp. De novo assembly using Newbler resulted into 3,005 tentative isotigs (including alternative transcripts. The non-assembled sequences (singletons were clustered with CD-HIT-454 to identify natural and artificial duplicates from pyrosequencing reads, leading to 21,881 unique singletons. 15,497 out of 24,886 non-redundant sequences or unigenes, were successfully annotated against a plant protein database. A substantial number of simple sequence repeat markers (SSRs were discovered in the assembled and annotated sequences. More than 40% of the SSR sequences were inside ORF sequences. To confirm the validity of these predicted markers, a subset of 73 SSRs selected through functional annotation evidences were successfully amplified from six seedlings DNA samples, being 14 polymorphic. Conclusions This paper is the first report that shows a highly precise representation of the mRNAs diversity present in young leaves of a native South American tree, N. nervosa, as well as its in silico deduced putative functionality. The reported Nothofagus transcriptome sequences represent a unique resource for genetic studies and provide a tool to discover genes of interest and genetic markers that will greatly aid questions involving evolution, ecology, and conservation using genetic and genomic approaches in the genus.
WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

Directory of Open Access Journals (Sweden)

Parichit Sharma

Full Text Available The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture
WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

Science.gov (United States)

Sharma, Parichit; Mantri, Shrikant S

2014-01-01

The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design
Annotating Emotions in Meetings

NARCIS (Netherlands)

Reidsma, Dennis; Heylen, Dirk K.J.; Ordelman, Roeland J.F.

We present the results of two trials testing procedures for the annotation of emotion and mental state of the AMI corpus. The first procedure is an adaptation of the FeelTrace method, focusing on a continuous labelling of emotion dimensions. The second method is centered around more discrete
Microtask crowdsourcing for disease mention annotation in PubMed abstracts.

Science.gov (United States)

Good, Benjamin M; Nanis, Max; Wu, Chunlei; Su, Andrew I

2015-01-01

Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses. Many biological natural language processing (BioNLP) projects attempt to address this challenge, but the state of the art still leaves much room for improvement. Progress in BioNLP research depends on large, annotated corpora for evaluating information extraction systems and training machine learning models. Traditionally, such corpora are created by small numbers of expert annotators often working over extended periods of time. Recent studies have shown that workers on microtask crowdsourcing platforms such as Amazon's Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text. Here, we investigated the use of the AMT in capturing disease mentions in PubMed abstracts. We used the NCBI Disease corpus as a gold standard for refining and benchmarking our crowdsourcing protocol. After several iterations, we arrived at a protocol that reproduced the annotations of the 593 documents in the 'training set' of this gold standard with an overall F measure of 0.872 (precision 0.862, recall 0.883). The output can also be tuned to optimize for precision (max = 0.984 when recall = 0.269) or recall (max = 0.980 when precision = 0.436). Each document was completed by 15 workers, and their annotations were merged based on a simple voting method. In total 145 workers combined to complete all 593 documents in the span of 9 days at a cost of $.066 per abstract per worker. The quality of the annotations, as judged with the F measure, increases with the number of workers assigned to each task; however minimal performance gains were observed beyond 8 workers per task. These results add further evidence that microtask crowdsourcing can be a valuable tool for generating well-annotated corpora in BioNLP. Data produced for this analysis are available at http://figshare.com/articles/Disease_Mention_Annotation_with_Mechanical_Turk/1126402.
DOE Hydropower Program biennial report 1994--1995 with an updated annotated bibliography

Energy Technology Data Exchange (ETDEWEB)

Rinehart, B.N.; Francfort, J.E.; Sommers, G.L. [Lockheed Idaho Technologies Co., Idaho Falls, ID (United States); Cada, G.F.; Sale, M.J. [Oak Ridge National Lab., TN (United States)

1995-05-01

This report, the latest in a series of annual/biennial Hydropower Program reports sponsored by the US Department of Energy, summarizes the research and development and technology transfer activities of fiscal years 1994 and 1995. The report discusses the activities in the four areas of the hydropower program: Environmental Research; Resource Assessment; Research Cost-Shared with Industry; and Technology Transfer. The report also includes an annotated bibliography of reports pertinent to hydropower, written by the staff of the Idaho National Engineering Laboratory, Oak Ridge National Laboratory, Federal and state agencies, cities, metropolitan water districts, irrigation companies, and public and independent utilities. Most reports are available from the National Technical Information Service.
DOE Hydropower Program biennial report 1994--1995 with an updated annotated bibliography

International Nuclear Information System (INIS)

Rinehart, B.N.; Francfort, J.E.; Sommers, G.L.; Cada, G.F.; Sale, M.J.

1995-05-01

This report, the latest in a series of annual/biennial Hydropower Program reports sponsored by the US Department of Energy, summarizes the research and development and technology transfer activities of fiscal years 1994 and 1995. The report discusses the activities in the four areas of the hydropower program: Environmental Research; Resource Assessment; Research Cost-Shared with Industry; and Technology Transfer. The report also includes an annotated bibliography of reports pertinent to hydropower, written by the staff of the Idaho National Engineering Laboratory, Oak Ridge National Laboratory, Federal and state agencies, cities, metropolitan water districts, irrigation companies, and public and independent utilities. Most reports are available from the National Technical Information Service
DrugSig: A resource for computational drug repositioning utilizing gene expression signatures.

Directory of Open Access Journals (Sweden)

Hongyu Wu

Full Text Available Computational drug repositioning has been proved as an effective approach to develop new drug uses. However, currently existing strategies strongly rely on drug response gene signatures which scattered in separated or individual experimental data, and resulted in low efficient outputs. So, a fully drug response gene signatures database will be very helpful to these methods. We collected drug response microarray data and annotated related drug and targets information from public databases and scientific literature. By selecting top 500 up-regulated and down-regulated genes as drug signatures, we manually established the DrugSig database. Currently DrugSig contains more than 1300 drugs, 7000 microarray and 800 targets. Moreover, we developed the signature based and target based functions to aid drug repositioning. The constructed database can serve as a resource to quicken computational drug repositioning. Database URL: http://biotechlab.fudan.edu.cn/database/drugsig/.
Consumer energy research: an annotated bibliography. Vol. 3

Energy Technology Data Exchange (ETDEWEB)

Anderson, D.C.; McDougall, G.H.G.

1983-04-01

This annotated bibliography attempts to provide a comprehensive package of existing information in consumer related energy research. A concentrated effort was made to collect unpublished material as well as material from journals and other sources, including governments, utilities research institutes and private firms. A deliberate effort was made to include agencies outside North America. For the most part the bibliography is limited to annotations of empiracal studies. However, it includes a number of descriptive reports which appear to make a significant contribution to understanding consumers and energy use. The format of the annotations displays the author, date of publication, title and source of the study. Annotations of empirical studies are divided into four parts: objectives, methods, variables and findings/implications. Care was taken to provide a reasonable amount of detail in the annotations to enable the reader to understand the methodology, the results and the degree to which the implications fo the study can be generalized to other situations. Studies are arranged alphabetically by author. The content of the studies reviewed is classified in a series of tables which are intended to provide a summary of sources, types and foci of the various studies. These tables are intended to aid researchers interested in specific topics to locate those studies most relevant to their work. The studies are categorized using a number of different classification criteria, for example, methodology used, type of energy form, type of policy initiative, and type of consumer activity. A general overview of the studies is also presented. 17 tabs.
Image annotation based on positive-negative instances learning

Science.gov (United States)

Zhang, Kai; Hu, Jiwei; Liu, Quan; Lou, Ping

2017-07-01

Automatic image annotation is now a tough task in computer vision, the main sense of this tech is to deal with managing the massive image on the Internet and assisting intelligent retrieval. This paper designs a new image annotation model based on visual bag of words, using the low level features like color and texture information as well as mid-level feature as SIFT, and mixture the pic2pic, label2pic and label2label correlation to measure the correlation degree of labels and images. We aim to prune the specific features for each single label and formalize the annotation task as a learning process base on Positive-Negative Instances Learning. Experiments are performed using the Corel5K Dataset, and provide a quite promising result when comparing with other existing methods.

An Annotated Dataset of 14 Meat Images

DEFF Research Database (Denmark)

Stegmann, Mikkel Bille

2002-01-01

This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.......This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given....
Update of the FANTOM web resource

DEFF Research Database (Denmark)

Lizio, Marina; Harshbarger, Jayson; Abugessaisa, Imad

2017-01-01

Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to facilitate researchers to explore...... transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and functionalities of the database systems enhanced. We believe that our data and web systems are invaluable resources, and we think the scientific community will benefit...... for this recent update to deepen their understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives....
neXtA5: accelerating annotation of articles via automated approaches in neXtProt.

Science.gov (United States)

Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick

2016-01-01

The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. Our system, neXtA5, is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE over some predefined axes. This report focuses on three axes: Diseases, the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named-entities as stored in the Biomed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is the following: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein-protein interactions, which require specific relationship extraction capabilities. In parallel, user-friendly interfaces powered with a set of JSON web services are currently being
MicroScope in 2017: an expanding and evolving integrated resource for community expertise of microbial genomes.

Science.gov (United States)

Vallenet, David; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Lajus, Aurélie; Josso, Adrien; Mercier, Jonathan; Renaux, Alexandre; Rollin, Johan; Rouy, Zoe; Roche, David; Scarpelli, Claude; Médigue, Claudine

2017-01-04

The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Graph-based sequence annotation using a data integration approach

Directory of Open Access Journals (Sweden)

Pesch Robert

2008-06-01

Full Text Available The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara- Cyc which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation.
An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets.

Science.gov (United States)

Hosseini, Parsa; Tremblay, Arianne; Matthews, Benjamin F; Alkharouf, Nadim W

2010-07-02

The data produced by an Illumina flow cell with all eight lanes occupied, produces well over a terabyte worth of images with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. Very easily, one can get flooded with such a great volume of textual, unannotated data irrespective of read quality or size. CASAVA, a optional analysis tool for Illumina sequencing experiments, enables the ability to understand INDEL detection, SNP information, and allele calling. To not only extract from such analysis, a measure of gene expression in the form of tag-counts, but furthermore to annotate such reads is therefore of significant value. We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag-counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA-build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag-counting and annotation. The end result produces output containing the homology-based functional annotation and respective gene expression measure signifying how many times sequenced reads were found within the genomic ranges of functional annotations. TASE is a powerful tool to facilitate the process of annotating a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved in very efficient times, providing researchers to delve deep in a given CASAVA-build and maximize information extraction from a sequencing dataset. TASE is specially designed to translate sequence data
BEACON: automated tool for Bacterial GEnome Annotation ComparisON

KAUST Repository

Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

2015-01-01

We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/
Automatic medical image annotation and keyword-based image retrieval using relevance feedback.

Science.gov (United States)

Ko, Byoung Chul; Lee, JiHyeon; Nam, Jae-Yeal

2012-08-01

This paper presents novel multiple keywords annotation for medical images, keyword-based medical image retrieval, and relevance feedback method for image retrieval for enhancing image retrieval performance. For semantic keyword annotation, this study proposes a novel medical image classification method combining local wavelet-based center symmetric-local binary patterns with random forests. For keyword-based image retrieval, our retrieval system use the confidence score that is assigned to each annotated keyword by combining probabilities of random forests with predefined body relation graph. To overcome the limitation of keyword-based image retrieval, we combine our image retrieval system with relevance feedback mechanism based on visual feature and pattern classifier. Compared with other annotation and relevance feedback algorithms, the proposed method shows both improved annotation performance and accurate retrieval results.
MIPS: analysis and annotation of proteins from whole genomes in 2005.

Science.gov (United States)

Mewes, H W; Frishman, D; Mayer, K F X; Münsterkötter, M; Noubibou, O; Pagel, P; Rattei, T; Oesterheld, M; Ruepp, A; Stümpflen, V

2006-01-01

The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).
ORCAN-a web-based meta-server for real-time detection and functional annotation of orthologs.

Science.gov (United States)

Zielezinski, Andrzej; Dziubek, Michal; Sliski, Jan; Karlowski, Wojciech M

2017-04-15

ORCAN (ORtholog sCANner) is a web-based meta-server for one-click evolutionary and functional annotation of protein sequences. The server combines information from the most popular orthology-prediction resources, including four tools and four online databases. Functional annotation utilizes five additional comparisons between the query and identified homologs, including: sequence similarity, protein domain architectures, functional motifs, Gene Ontology term assignments and a list of associated articles. Furthermore, the server uses a plurality-based rating system to evaluate the orthology relationships and to rank the reference proteins by their evolutionary and functional relevance to the query. Using a dataset of ∼1 million true yeast orthologs as a sample reference set, we show that combining multiple orthology-prediction tools in ORCAN increases the sensitivity and precision by 1-2 percent points. The service is available for free at http://www.combio.pl/orcan/ . wmk@amu.edu.pl. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Design and Applications of a GeoSemantic Framework for Integration of Data and Model Resources in Hydrologic Systems

Science.gov (United States)

Elag, M.; Kumar, P.

2016-12-01

Hydrologists today have to integrate resources such as data and models, which originate and reside in multiple autonomous and heterogeneous repositories over the Web. Several resource management systems have emerged within geoscience communities for sharing long-tail data, which are collected by individual or small research groups, and long-tail models, which are developed by scientists or small modeling communities. While these systems have increased the availability of resources within geoscience domains, deficiencies remain due to the heterogeneity in the methods, which are used to describe, encode, and publish information about resources over the Web. This heterogeneity limits our ability to access the right information in the right context so that it can be efficiently retrieved and understood without the Hydrologist's mediation. A primary challenge of the Web today is the lack of the semantic interoperability among the massive number of resources, which already exist and are continually being generated at rapid rates. To address this challenge, we have developed a decentralized GeoSemantic (GS) framework, which provides three sets of micro-web services to support (i) semantic annotation of resources, (ii) semantic alignment between the metadata of two resources, and (iii) semantic mediation among Standard Names. Here we present the design of the framework and demonstrate its application for semantic integration between data and models used in the IML-CZO. First we show how the IML-CZO data are annotated using the Semantic Annotation Services. Then we illustrate how the Resource Alignment Services and Knowledge Integration Services are used to create a semantic workflow among TopoFlow model, which is a spatially-distributed hydrologic model and the annotated data. Results of this work are (i) a demonstration of how the GS framework advances the integration of heterogeneous data and models of water-related disciplines by seamless handling of their semantic
Resources for Developing Acquaintance Rape Prevention Programs for Men.

Science.gov (United States)

Earle, James P.; Nies, Charles T.

1994-01-01

Provides an annotated bibliography of videos and printed materials that may be used as educational tools in rape prevention programs. Focuses on sources that are aimed directly at men. Also outlines the use of consultants or lecturers as one of many resources in the construction and implementation of rape prevention programs. (KW)
MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.

Science.gov (United States)

Holt, Carson; Yandell, Mark

2011-12-22

Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.
Annotation: The Savant Syndrome

Science.gov (United States)

Heaton, Pamela; Wallace, Gregory L.

2004-01-01

Background: Whilst interest has focused on the origin and nature of the savant syndrome for over a century, it is only within the past two decades that empirical group studies have been carried out. Methods: The following annotation briefly reviews relevant research and also attempts to address outstanding issues in this research area.…
Analysis of LYSA-calculus with explicit confidentiality annotations

DEFF Research Database (Denmark)

Gao, Han; Nielson, Hanne Riis

2006-01-01

Recently there has been an increased research interest in applying process calculi in the verification of cryptographic protocols due to their ability to formally model protocols. This work presents LYSA with explicit confidentiality annotations for indicating the expected behavior of target...... malicious activities performed by attackers as specified by the confidentiality annotations. The proposed analysis approach is fully automatic without the need of human intervention and has been applied successfully to a number of protocols....
Semantic representation of scientific literature: bringing claims, contributions and named entities onto the Linked Open Data cloud

Directory of Open Access Journals (Sweden)

Bahar Sateli

2015-12-01

Full Text Available Motivation. Finding relevant scientific literature is one of the essential tasks researchers are facing on a daily basis. Digital libraries and web information retrieval techniques provide rapid access to a vast amount of scientific literature. However, no further automated support is available that would enable fine-grained access to the knowledge ‘stored’ in these documents. The emerging domain of Semantic Publishing aims at making scientific knowledge accessible to both humans and machines, by adding semantic annotations to content, such as a publication’s contributions, methods, or application domains. However, despite the promises of better knowledge access, the manual annotation of existing research literature is prohibitively expensive for wide-spread adoption. We argue that a novel combination of three distinct methods can significantly advance this vision in a fully-automated way: (i Natural Language Processing (NLP for Rhetorical Entity (RE detection; (ii Named Entity (NE recognition based on the Linked Open Data (LOD cloud; and (iii automatic knowledge base construction for both NEs and REs using semantic web ontologies that interconnect entities in documents with the machine-readable LOD cloud.Results. We present a complete workflow to transform scientific literature into a semantic knowledge base, based on the W3C standards RDF and RDFS. A text mining pipeline, implemented based on the GATE framework, automatically extracts rhetorical entities of type Claims and Contributions from full-text scientific literature. These REs are further enriched with named entities, represented as URIs to the linked open data cloud, by integrating the DBpedia Spotlight tool into our workflow. Text mining results are stored in a knowledge base through a flexible export process that provides for a dynamic mapping of semantic annotations to LOD vocabularies through rules stored in the knowledge base. We created a gold standard corpus from computer
Improving amphibian genomic resources: a multitissue reference transcriptome of an iconic invader.

Science.gov (United States)

Richardson, Mark F; Sequeira, Fernando; Selechnik, Daniel; Carneiro, Miguel; Vallinoto, Marcelo; Reid, Jack G; West, Andrea J; Crossland, Michael R; Shine, Richard; Rollins, Lee A

2018-01-01

Cane toads (Rhinella marina) are an iconic invasive species introduced to 4 continents and well utilized for studies of rapid evolution in introduced environments. Despite the long introduction history of this species, its profound ecological impacts, and its utility for demonstrating evolutionary principles, genetic information is sparse. Here we produce a de novo transcriptome spanning multiple tissues and life stages to enable investigation of the genetic basis of previously identified rapid phenotypic change over the introduced range. Using approximately 1.9 billion reads from developing tadpoles and 6 adult tissue-specific cDNA libraries, as well as a transcriptome assembly pipeline encompassing 100 separate de novo assemblies, we constructed 62 202 transcripts, of which we functionally annotated ∼50%. Our transcriptome assembly exhibits 90% full-length completeness of the Benchmarking Universal Single-Copy Orthologs data set. Robust assembly metrics and comparisons with several available anuran transcriptomes and genomes indicate that our cane toad assembly is one of the most complete anuran genomic resources available. This comprehensive anuran transcriptome will provide a valuable resource for investigation of genes under selection during invasion in cane toads, but will also greatly expand our general knowledge of anuran genomes, which are underrepresented in the literature. The data set is publically available in NCBI and GigaDB to serve as a resource for other researchers. © The Authors 2017. Published by Oxford University Press.
First generation annotations for the fathead minnow (Pimephales promelas) genome

Science.gov (United States)

Ab initio gene prediction and evidence alignment were used to produce the first annotations for the fathead minnow SOAPdenovo genome assembly. Additionally, a genome browser hosted at genome.setac.org provides simplified access to the annotation data in context with fathead minno...
Literature searches on Ayurveda: An update.

Science.gov (United States)

Aggithaya, Madhur G; Narahari, Saravu R

2015-01-01

The journals that publish on Ayurveda are increasingly indexed by popular medical databases in recent years. However, many Eastern journals are not indexed biomedical journal databases such as PubMed. Literature searches for Ayurveda continue to be challenging due to the nonavailability of active, unbiased dedicated databases for Ayurvedic literature. In 2010, authors identified 46 databases that can be used for systematic search of Ayurvedic papers and theses. This update reviewed our previous recommendation and identified current and relevant databases. To update on Ayurveda literature search and strategy to retrieve maximum publications. Author used psoriasis as an example to search previously listed databases and identify new. The population, intervention, control, and outcome table included keywords related to psoriasis and Ayurvedic terminologies for skin diseases. Current citation update status, search results, and search options of previous databases were assessed. Eight search strategies were developed. Hundred and five journals, both biomedical and Ayurveda, which publish on Ayurveda, were identified. Variability in databases was explored to identify bias in journal citation. Five among 46 databases are now relevant - AYUSH research portal, Annotated Bibliography of Indian Medicine, Digital Helpline for Ayurveda Research Articles (DHARA), PubMed, and Directory of Open Access Journals. Search options in these databases are not uniform, and only PubMed allows complex search strategy. "The Researches in Ayurveda" and "Ayurvedic Research Database" (ARD) are important grey resources for hand searching. About 44/105 (41.5%) journals publishing Ayurvedic studies are not indexed in any database. Only 11/105 (10.4%) exclusive Ayurveda journals are indexed in PubMed. AYUSH research portal and DHARA are two major portals after 2010. It is mandatory to search PubMed and four other databases because all five carry citations from different groups of journals. The hand
Resource Directory of Hispanic Educational Materials on Child Abuse Prevention.

Science.gov (United States)

Hill, Nancy; And Others

This annotated resource directory lists brochures, booklets, audiovisual materials, charts, and other educational materials, most of which are available in both English and Spanish, that address the following issues: (1) child abuse; (2) child development; (3) parenting skills; (4) mental health; (5) self-esteem; (6) stress management; (7) family…

Extracting Cross-Ontology Weighted Association Rules from Gene Ontology Annotations.

Science.gov (United States)

Agapito, Giuseppe; Milano, Marianna; Guzzi, Pietro Hiram; Cannataro, Mario

2016-01-01

Gene Ontology (GO) is a structured repository of concepts (GO Terms) that are associated to one or more gene products through a process referred to as annotation. The analysis of annotated data is an important opportunity for bioinformatics. There are different approaches of analysis, among those, the use of association rules (AR) which provides useful knowledge, discovering biologically relevant associations between terms of GO, not previously known. In a previous work, we introduced GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules from ontology-based annotated datasets. We here adapt the GO-WAR algorithm to mine cross-ontology association rules, i.e., rules that involve GO terms present in the three sub-ontologies of GO. We conduct a deep performance evaluation of GO-WAR by mining publicly available GO annotated datasets, showing how GO-WAR outperforms current state of the art approaches.
"Tropic of Cancer" and the Censors: A Case Study and Bibliographic Guide to the Literature.

Science.gov (United States)

Kincaid, Larry; Koger, Grove

1997-01-01

Traces the history of Henry Miller's novel "Tropic of Cancer"--censored in England and America for being too obscene--from its inception in 1932 to its vindication by the United States judicial system 30 years later. Also includes an annotated bibliography of related literature. (AEF)
Jannovar: a java library for exome annotation.

Science.gov (United States)

Jäger, Marten; Wang, Kai; Bauer, Sebastian; Smedley, Damian; Krawitz, Peter; Robinson, Peter N

2014-05-01

Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar. © 2014 WILEY PERIODICALS, INC.
Graph-based sequence annotation using a data integration approach.

Science.gov (United States)

Pesch, Robert; Lysenko, Artem; Hindle, Matthew; Hassani-Pak, Keywan; Thiele, Ralf; Rawlings, Christopher; Köhler, Jacob; Taubert, Jan

2008-08-25

The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara-Cyc) which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system which is freely available from http://ondex.sf.net/.
Linking Disparate Datasets of the Earth Sciences with the SemantEco Annotator

Science.gov (United States)

Seyed, P.; Chastain, K.; McGuinness, D. L.

2013-12-01

Use of Semantic Web technologies for data management in the Earth sciences (and beyond) has great potential but is still in its early stages, since the challenges of translating data into a more explicit or semantic form for immediate use within applications has not been fully addressed. In this abstract we help address this challenge by introducing the SemantEco Annotator, which enables anyone, regardless of expertise, to semantically annotate tabular Earth Science data and translate it into linked data format, while applying the logic inherent in community-standard vocabularies to guide the process. The Annotator was conceived under a desire to unify dataset content from a variety of sources under common vocabularies, for use in semantically-enabled web applications. Our current use case employs linked data generated by the Annotator for use in the SemantEco environment, which utilizes semantics to help users explore, search, and visualize water or air quality measurement and species occurrence data through a map-based interface. The generated data can also be used immediately to facilitate discovery and search capabilities within 'big data' environments. The Annotator provides a method for taking information about a dataset, that may only be known to its maintainers, and making it explicit, in a uniform and machine-readable fashion, such that a person or information system can more easily interpret the underlying structure and meaning. Its primary mechanism is to enable a user to formally describe how columns of a tabular dataset relate and/or describe entities. For example, if a user identifies columns for latitude and longitude coordinates, we can infer the data refers to a point that can be plotted on a map. Further, it can be made explicit that measurements of 'nitrate' and 'NO3-' are of the same entity through vocabulary assignments, thus more easily utilizing data sets that use different nomenclatures. The Annotator provides an extensive and searchable
Annotation of phenotypic diversity: decoupling data curation and ontology curation using Phenex.

Science.gov (United States)

Balhoff, James P; Dahdul, Wasila M; Dececchi, T Alexander; Lapp, Hilmar; Mabee, Paula M; Vision, Todd J

2014-01-01

Phenex (http://phenex.phenoscape.org/) is a desktop application for semantically annotating the phenotypic character matrix datasets common in evolutionary biology. Since its initial publication, we have added new features that address several major bottlenecks in the efficiency of the phenotype curation process: allowing curators during the data curation phase to provisionally request terms that are not yet available from a relevant ontology; supporting quality control against annotation guidelines to reduce later manual review and revision; and enabling the sharing of files for collaboration among curators. We decoupled data annotation from ontology development by creating an Ontology Request Broker (ORB) within Phenex. Curators can use the ORB to request a provisional term for use in data annotation; the provisional term can be automatically replaced with a permanent identifier once the term is added to an ontology. We added a set of annotation consistency checks to prevent common curation errors, reducing the need for later correction. We facilitated collaborative editing by improving the reliability of Phenex when used with online folder sharing services, via file change monitoring and continual autosave. With the addition of these new features, and in particular the Ontology Request Broker, Phenex users have been able to focus more effectively on data annotation. Phenoscape curators using Phenex have reported a smoother annotation workflow, with much reduced interruptions from ontology maintenance and file management issues.
Automated evaluation of annotators for museum collections using subjective login

NARCIS (Netherlands)

Ceolin, D.; Nottamkandath, A.; Fokkink, W.J.; Dimitrakos, Th.; Moona, R.; Patel, Dh.; Harrison McKnight, D.

2012-01-01

Museums are rapidly digitizing their collections, and face a huge challenge to annotate every digitized artifact in store. Therefore they are opening up their archives for receiving annotations from experts world-wide. This paper presents an architecture for choosing the most eligible set of
A Guide to Outdoor Education Resources and Programs for the Handicapped. Outdoor Education for the Handicapped.

Science.gov (United States)

Kentucky Univ., Lexington.

The resource guide is designed to assist educators, park resource persons, and parents of disabled children in locating and identifying sources of information for developing, implementing, and evaluating outdoor education programs for all disabled children and youth. The guide has two main parts. The first part contains an annotated bibliography…
Genome Annotation and Transcriptomics of Oil-Producing Algae

Science.gov (United States)

2015-03-16

AFRL-OSR-VA-TR-2015-0103 GENOME ANNOTATION AND TRANSCRIPTOMICS OF OIL-PRODUCING ALGAE Sabeeha Merchant UNIVERSITY OF CALIFORNIA LOS ANGELES Final...2010 To 12-31-2014 4. TITLE AND SUBTITLE GENOME ANNOTATION AND TRANSCRIPTOMICS OF OIL-PRODUCING ALGAE 5a. CONTRACT NUMBER FA9550-10-1-0095 5b...NOTES 14. ABSTRACT Most algae accumulate triacylglycerols (TAGs) when they are starved for essential nutrients like N, S, P (or Si in the case of some
Animal Rights: Selected Resources and Suggestions for Further Study.

Science.gov (United States)

Davidoff, Donald J.

1989-01-01

Presents an annotated list of selected resources intended to serve as a guide to the growing amount of material on animal rights. Suggestions to aid in additional research include subject headings used to find books, indexes used to locate periodical articles, sources for locating organizations, and a selected list of animal rights organizations.…
Annotating smart environment sensor data for activity learning.

Science.gov (United States)

Szewcyzk, S; Dwan, K; Minor, B; Swedlove, B; Cook, D

2009-01-01

The pervasive sensing technologies found in smart homes offer unprecedented opportunities for providing health monitoring and assistance to individuals experiencing difficulties living independently at home. In order to monitor the functional health of smart home residents, we need to design technologies that recognize and track the activities that people perform at home. Machine learning techniques can perform this task, but the software algorithms rely upon large amounts of sample data that is correctly labeled with the corresponding activity. Labeling, or annotating, sensor data with the corresponding activity can be time consuming, may require input from the smart home resident, and is often inaccurate. Therefore, in this paper we investigate four alternative mechanisms for annotating sensor data with a corresponding activity label. We evaluate the alternative methods along the dimensions of annotation time, resident burden, and accuracy using sensor data collected in a real smart apartment.
Annotation Method (AM): SE7_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE7_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data
Annotation Method (AM): SE36_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE36_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat
Annotation Method (AM): SE14_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE14_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat
Annotation Method (AM): SE33_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE33_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat
Annotation Method (AM): SE12_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE12_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat
Annotation Method (AM): SE20_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE20_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat
Annotation Method (AM): SE2_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE2_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data
Annotation Method (AM): SE28_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE28_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat
Annotation Method (AM): SE11_AM1 [Metabolonote[Archive

Lifescience Database Archive (English)

Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE11_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

Some links on this page may take you to non-federal websites. Their policies may differ from this site.